I’ve just returned from the inaugural ‘Big Code Summit’, a two-day conference on the very interesting topic of Big Code, featuring a mix of academic and industry speakers. See here for a link to the full programme.
By analogy to Big Data, Big Code is the research area concerned with managing large amounts of code in an organization and dealing with the complexities that come with it, as well as treating your code as data: leveraging that large corpus to gain insights into how engineers can work better.
Below are summary notes on some of the talks that I found to be personal highlights:
- Neural code search
  - Using NLP and IR techniques to search through code
- Learning to find Bugs
  - Using ML to detect bugs in code, by formalizing bug-finding as a classification problem: is this code buggy or non-buggy? Static analysis bug detectors, by contrast, fall short because they are hand-coded to look for specific patterns.
  - DeepBugs is trained on a labeled code corpus (given a snippet of code, the label is buggy/non-buggy), and so can learn more general rules. Fortunately, these labels come naturally from source version histories: if a given commit introduces a bug, there will be a dual, follow-up commit that fixes it. It can even suggest fixes for the bug, based on information from the version history (e.g. commit x introduced a bug, commit y fixed it; that delta is the bug-fix strategy).
- Learning for Programs: Connecting code, statistics, semantics, and language
  - A broad-ranging talk that touched on solving some interesting problems, such as generating code comments from a given source snippet, or detecting redundant/inaccurate comments by comparing them to the adjacent source code.
- Generate program UIs from design layouts
  - Given an image from a UI designer, use ML to generate the code that produces that UI. This saves developers the time and effort of hand-building UIs, and reduces the inconsistencies that come from eye-balling a design.
- Neural sketch learning
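To make the neural code search item concrete, here is a minimal sketch of the IR side of code search: TF-IDF over identifier tokens (with camelCase splitting, an NLP-flavoured touch), ranked by a bag-of-words score. The corpus, function names, and scoring are entirely my own toy illustration; the talk itself covered neural approaches, which this does not attempt.

```python
import math
import re
from collections import Counter

def tokens(code):
    # Extract alphabetic runs, then split camelCase so "sortList" matches "sort list".
    parts = re.findall(r"[A-Za-z]+", code)
    out = []
    for p in parts:
        out.extend(w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+", p))
    return out

def tfidf_index(snippets):
    # Bag-of-tokens per snippet, plus inverse document frequency over the corpus.
    bags = [Counter(tokens(s)) for s in snippets]
    df = Counter()
    for bag in bags:
        df.update(set(bag))
    n = len(snippets)
    return bags, {t: math.log(n / df[t]) for t in df}

def search(query, snippets):
    # Rank snippets by summed TF-IDF weight of the query tokens they contain.
    bags, idf = tfidf_index(snippets)
    q = Counter(tokens(query))
    def score(bag):
        return sum(q[t] * bag[t] * idf.get(t, 0.0) for t in q)
    return sorted(range(len(snippets)), key=lambda i: score(bags[i]), reverse=True)

corpus = [
    "def read_file(path): return open(path).read()",
    "def sortList(xs): return sorted(xs)",
    "def parse_json(text): import json; return json.loads(text)",
]
print(search("sort a list", corpus))
```

Running this ranks the `sortList` snippet (index 1) first for the query "sort a list", since camelCase splitting is what lets a natural-language query reach into identifier names.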
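The buggy/non-buggy classification framing from the ‘Learning to find Bugs’ talk can be illustrated with a deliberately tiny stand-in: a Naive Bayes classifier over code tokens, trained on pre-fix (buggy) and post-fix (clean) versions of snippets, as if mined from bug-fixing commit pairs. DeepBugs itself learns from identifier embeddings rather than anything this crude; the training examples and class names below are my own invention.

```python
import math
import re
from collections import Counter

def toks(code):
    # Words plus comparison operators, so "<=" vs "<" is a learnable signal.
    return re.findall(r"\w+|[<>=!]+", code.lower())

class NaiveBayes:
    def __init__(self):
        self.counts = {"buggy": Counter(), "clean": Counter()}
        self.totals = {"buggy": 0, "clean": 0}
        self.docs = {"buggy": 0, "clean": 0}

    def fit(self, examples):
        for code, label in examples:
            bag = Counter(toks(code))
            self.counts[label].update(bag)
            self.totals[label] += sum(bag.values())
            self.docs[label] += 1

    def predict(self, code):
        n = sum(self.docs.values())
        vocab = set(self.counts["buggy"]) | set(self.counts["clean"])
        best, best_lp = None, -math.inf
        for label in ("buggy", "clean"):
            lp = math.log(self.docs[label] / n)
            for t in toks(code):
                # Laplace smoothing so unseen tokens don't zero out a class.
                lp += math.log((self.counts[label][t] + 1) /
                               (self.totals[label] + len(vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Imagined commit pairs: the pre-fix version is "buggy", the post-fix one "clean".
train = [
    ("if (i <= arr.length)", "buggy"),
    ("if (i < arr.length)", "clean"),
    ("for (j = 0; j <= n; j++)", "buggy"),
    ("for (j = 0; j < n; j++)", "clean"),
]
nb = NaiveBayes()
nb.fit(train)
print(nb.predict("while (k <= items.length)"))  # → buggy
```

The point of the sketch is the labeling scheme, not the model: version history supplies buggy/clean pairs for free, which is what makes the classification framing practical at scale.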
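The redundant/inaccurate-comment detection mentioned under ‘Learning for Programs’ can be approximated, at its very simplest, by measuring lexical overlap between a comment and its adjacent code: a comment that shares almost no vocabulary with the code it sits next to is a candidate for being stale. This Jaccard-overlap sketch and its threshold are my own illustration, not the talk’s model.

```python
import re

def words(text):
    # Lowercased word set, splitting camelCase and snake_case identifiers.
    parts = re.findall(r"[A-Za-z]+", text)
    out = []
    for p in parts:
        out.extend(w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+", p))
    return set(out)

def comment_code_overlap(comment, code):
    # Jaccard similarity between comment vocabulary and code vocabulary.
    c, s = words(comment), words(code)
    if not c or not s:
        return 0.0
    return len(c & s) / len(c | s)

def flag_stale(comment, code, threshold=0.1):
    # Flag comments sharing (almost) no vocabulary with their adjacent code.
    return comment_code_overlap(comment, code) < threshold

print(flag_stale("# sort the users by age", "def sort_users_by_age(users): ..."))   # → False
print(flag_stale("# open the network socket", "def sort_users_by_age(users): ..."))  # → True
```

A learned model would of course catch semantic mismatches that share surface vocabulary; the lexical baseline just shows why comment–code comparison is a tractable framing.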
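For the UI-generation talk, the ML does the hard part: mapping a designer’s image to some intermediate layout representation. The final, mechanical step of emitting code from that representation can be sketched as below; the layout dictionary format and widget vocabulary are entirely hypothetical, standing in for whatever representation a vision model would actually produce.

```python
# Hypothetical layout tree, as a vision model might output from a design mock-up.
layout = {"type": "column", "children": [
    {"type": "label", "text": "Sign in"},
    {"type": "input", "name": "email"},
    {"type": "button", "text": "Submit"},
]}

def to_html(node):
    # Recursively translate the layout tree into HTML.
    t = node["type"]
    if t == "column":
        inner = "\n".join(to_html(c) for c in node["children"])
        return f"<div class=\"column\">\n{inner}\n</div>"
    if t == "label":
        return f"<span>{node['text']}</span>"
    if t == "input":
        return f"<input name=\"{node['name']}\">"
    if t == "button":
        return f"<button>{node['text']}</button>"
    raise ValueError(f"unknown widget type: {t}")

print(to_html(layout))
```

Because the emission step is deterministic, the generated code is consistent by construction, which is exactly the eye-balling inconsistency the talk wanted to eliminate.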