LambdaNet: Probabilistic Type Inference using Graph Neural Networks

ICLR, 2020.

Abstract:

As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these annotations cannot be fully inferred by compilers and are tedious to annotate by hand. This pa…
Introduction
  • Dynamically typed languages like Python, Ruby, and JavaScript have gained enormous popularity over the last decade, yet their lack of a static type system comes with certain disadvantages in terms of maintainability (Hanenberg et al., 2013), the ability to catch errors at compile time, and code completion support (Gao et al., 2017).
  • Even without considering user-defined types, the accuracy of these systems is relatively low, with the current state-of-the-art achieving 56.9% accuracy for primitive/library types (Hellendoorn et al., 2018).
  • These techniques can also produce inconsistent results, in that they may predict different types for different token-level occurrences of the same variable.
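The inconsistency problem above can be illustrated with a hypothetical snippet (not from the paper): a token-level sequence model scores each occurrence of `total` independently and may predict a different type at each site, whereas a per-variable approach like LambdaNet predicts a single type for the variable.

```typescript
// Hypothetical illustration: three occurrences of the same variable `total`.
// A token-level model may label occurrence 1 as `number` but occurrence 2 as
// `string` (since `+` is overloaded in JavaScript); predicting one type per
// variable rules out such inconsistencies by construction.
function sum(xs: number[]): number {
  let total = 0;           // occurrence 1 of `total`
  for (const x of xs) {
    total = total + x;     // occurrence 2: `+` could look like concatenation
  }
  return total;            // occurrence 3
}

console.log(sum([1, 2, 3])); // prints 6
```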
Highlights
  • Dynamically typed languages like Python, Ruby, and JavaScript have gained enormous popularity over the last decade, yet their lack of a static type system comes with certain disadvantages in terms of maintainability (Hanenberg et al., 2013), the ability to catch errors at compile time, and code completion support (Gao et al., 2017).
  • We propose a new probabilistic type inference algorithm for TypeScript to address these shortcomings, using a graph neural network (GNN) architecture (Velickovic et al., 2018; Li et al., 2016; Mou et al., 2016).
  • This paper makes the following contributions: (1) We propose a probabilistic type inference algorithm for TypeScript that uses deep learning to make predictions from the type dependency graph representation of the program.
  • We have presented LAMBDANET, a neural architecture for type inference that combines the strengths of explicit program analysis with graph neural networks.
  • LAMBDANET not only outperforms other state-of-the-art tools when predicting library types, but can also effectively predict user-defined types that have not been encountered during training.
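The type dependency graph mentioned in contribution (1) can be sketched as a set of hyperedges over type variables. The edge kinds and variable names below are illustrative assumptions for exposition, not the paper's actual hyperedge schema (Table 1 of the paper defines the real one).

```typescript
// Illustrative sketch of a type dependency graph: hyperedges derived from
// program syntax connect the type variables of related expressions.
type TypeVar = string;

interface Hyperedge {
  kind: "assign" | "fieldAccess"; // simplified, assumed edge kinds
  vars: TypeVar[];                // the type variables this edge connects
}

// For `let y = x.f`: a field-access edge relates x's type to the access
// result, and an assignment edge relates y's type to that result.
const graph: Hyperedge[] = [
  { kind: "fieldAccess", vars: ["t_x", "t_x.f"] },
  { kind: "assign", vars: ["t_y", "t_x.f"] },
];

// Neighbors of a type variable = all variables sharing a hyperedge with it;
// message passing in the GNN flows along exactly these connections.
function neighbors(v: TypeVar, edges: Hyperedge[]): TypeVar[] {
  const out: TypeVar[] = [];
  for (const e of edges) {
    if (e.vars.indexOf(v) === -1) continue;
    for (const w of e.vars) {
      if (w !== v && out.indexOf(w) === -1) out.push(w);
    }
  }
  return out;
}

console.log(neighbors("t_x.f", graph)); // connected to both t_x and t_y
```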
Results
  • The authors describe the results of the experimental evaluation, which is designed to answer the following questions: (1) How does the approach compare to previous work? (2) How well can the model predict user-defined types? (3) How useful is each of the model’s components?

    Dataset.
  • Note that each project typically contains hundreds to thousands of type variables to predict; in total, the 300 projects in the dataset contain about 1.2 million lines of TypeScript code.
  • Among these 300 projects, the authors use 60 for testing, 40 for validation, and the remainder for training.
  • The authors believe that code duplication is not a severe problem in the dataset.
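The project-level split described above can be sketched as follows (a minimal sketch, not the paper's actual tooling; in practice the project list would be shuffled first). Splitting by whole project, rather than by file, keeps code from one project out of both the training and test sets.

```typescript
// Split a list of projects into test/validation/training partitions:
// 60 test, 40 validation, and the remaining 200 of 300 for training.
function splitProjects<T>(projects: T[], nTest: number, nValid: number) {
  return {
    test: projects.slice(0, nTest),
    valid: projects.slice(nTest, nTest + nValid),
    train: projects.slice(nTest + nValid),
  };
}

// Hypothetical project identifiers standing in for the real dataset.
const projects: string[] = [];
for (let i = 0; i < 300; i++) projects.push(`project-${i}`);

const { test, valid, train } = splitProjects(projects, 60, 40);
console.log(test.length, valid.length, train.length); // 60 40 200
```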
Conclusion
  • The authors have presented LAMBDANET, a neural architecture for type inference that combines the strengths of explicit program analysis with graph neural networks.
  • Extending the prediction space to include structured types would allow the model to make full use of the rich type systems that many modern languages, such as TypeScript, provide.
  • Another important direction is to enforce hard constraints during inference so that the resulting type assignments are guaranteed to be consistent.
Tables
  • Table 1: Different types of hyperedges used in a type dependency graph
  • Table 2: Accuracy when predicting all types
  • Table 3: Performance of different GNN iterations (left) and ablations (right)
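What a "GNN iteration" in Table 3 refers to can be sketched as one round of message passing over a graph. This is an assumption-laden simplification: the paper aggregates messages over typed hyperedges with attention, while the sketch below uses scalar embeddings and plain mean aggregation purely to show how information spreads between rounds.

```typescript
// Minimal message-passing sketch: each iteration, every node blends its own
// embedding with the mean of its neighbors' embeddings.
type Graph = { n: number; adj: number[][] }; // adjacency lists per node

function propagate(h: number[], g: Graph): number[] {
  return h.map((hv, v) => {
    const nbrs = g.adj[v];
    if (nbrs.length === 0) return hv;
    const mean = nbrs.reduce((s, u) => s + h[u], 0) / nbrs.length;
    return (hv + mean) / 2; // blend own state with aggregated messages
  });
}

// A 3-node path graph 0 - 1 - 2; only node 2 starts with any signal.
const g: Graph = { n: 3, adj: [[1], [0, 2], [1]] };
let h = [0, 0, 1];
for (let t = 0; t < 2; t++) h = propagate(h, g); // two iterations
console.log(h); // → [0.125, 0.25, 0.375]: node 2's signal reached node 0
```

More iterations propagate information further across the graph, which is why Table 3 compares performance at different iteration counts.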
Related work
  • Type Inference using Statistical Methods. There are several previous works on predicting likely type annotations for dynamically typed languages: Raychev et al. (2015) and Xu et al. (2016) use structured inference models for JavaScript and Python, but their approaches do not take advantage of deep learning and are limited to a very restricted prediction space. Hellendoorn et al. (2018) and Jangda & Anand (2019) model programs as sequences and AST trees and apply deep learning models (RNNs and Tree-RNNs) to TypeScript and Python programs. Malik et al. (2019) make use of a different source of information, taking documentation strings as part of their input. However, all of these previous works are limited to predicting types from a fixed vocabulary.

    Graph Embedding of Programs. Allamanis et al. (2017) were the first to use GNNs to obtain deep embeddings of programs, but they focus on predicting variable names and misuses for C# and rely on static type information to construct the program graph. Wang et al. (2017) use GNNs to encode mathematical formulas for premise selection in automated theorem proving. The way we encode types has some similarity to how they encode quantified formulas, but while their focus is on higher-order formulas, our problem requires encoding object types. Velickovic et al. (2018) were the first to use an attention mechanism in GNNs. While they use attention to compute node embeddings from messages, we use attention to compute certain messages from node embeddings.
Funding
  • This project was supported in part by NSF grant CCF-1762299
Reference
  • Deeplearning4j. https://github.com/eclipse/deeplearning4j. Accessed: 2019-09-24.
  • Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. ICLR, 2017.
  • Davide Ancona and Elena Zucca. Principal typings for Java-like languages. In ACM SIGPLAN Notices, volume 39, pp. 306–317. ACM, 2004.
  • Gavin Bierman, Martín Abadi, and Mads Torgersen. Understanding TypeScript. In Richard Jones (ed.), ECOOP 2014 – Object-Oriented Programming, pp. 257–281, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg. ISBN 978-3-662-44202-9.
  • Benjamin Chung, Paley Li, Francesco Zappa Nardelli, and Jan Vitek. KafKa: Gradual typing for objects. In ECOOP 2018 European Conference on Object-Oriented Programming, 2018.
  • Yann Dauphin, Gokhan Tur, Dilek Z. Hakkani-Tur, and Larry P. Heck. Zero-shot learning for semantic utterance classification. In ICLR, 2013.
  • Yotam Eshel, Noam Cohen, Kira Radinsky, Shaul Markovitch, Ikuya Yamada, and Omer Levy. Named entity disambiguation for noisy text. In CoNLL, 2017.
  • Ali Farhadi, Ian Endres, Derek Hoiem, and David Forsyth. Describing objects by their attributes. In CVPR, 2017.
  • Zheng Gao, Christian Bird, and Earl T. Barr. To type or not to type: Quantifying detectable bugs in JavaScript. In Proceedings of the 39th International Conference on Software Engineering, ICSE '17, pp. 758–769, Piscataway, NJ, USA, 2017. IEEE Press. ISBN 978-1-5386-3868-2. doi: 10.1109/ICSE.2017.75. URL https://doi.org/10.1109/ICSE.2017.75.
  • Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. Pointing the unknown words. In Proceedings of the ACL, 2016.
  • Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, and Andreas Stefik. An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering, 19:1335–1382, 2013.
  • Vincent J. Hellendoorn, Christian Bird, Earl T. Barr, and Miltiadis Allamanis. Deep learning type inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2018, pp. 152–162, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5573-5. doi: 10.1145/3236024.3236051. URL http://doi.acm.org/10.1145/3236024.3236051.
  • Abhinav Jangda and Gaurav Anand. Predicting variable types in dynamically typed programming languages. arXiv preprint arXiv:1901.05138, 2019.