A Statistical Method for API Usage Learning and API Misuse Violation Finding.

Deepak Panda, Piyush Basia, Kushal Nallavolu, Xin Zhong, Harvey P. Siy, Myoungkyu Song

SERA (2023)

Abstract
Large corpora of software repositories create an opportunity to use machine learning (ML) approaches to build new software engineering tools. In this paper, we propose a novel technique that leverages ML to automate software engineering tasks and thereby improve software quality. Our concrete goals are to (1) explore the abundant, predictable, repetitive regularities of such massive codebases, (2) develop an ML approach that trains a statistical model to identify common patterns in software corpora, and (3) use these patterns to statistically detect anomalous, likely buggy, program behavior that deviates significantly from them. The internal regularities and repetitive properties of software can be captured as patterns, against which violations can be detected. Such violations can critically affect program behavior, causing bugs, security vulnerabilities, or even program crashes. Our approach focuses on usage patterns of application programming interfaces (APIs). API usage patterns are commonly recurring, representative examples of how real-world applications in software corpora use APIs. These desirable patterns can be learned and then used to validate or improve developers' implementations. We present preliminary results in which we use standard cross-entropy and perplexity to measure how surprising a test subject application is to a statistical model estimated from a software corpus. We continue to develop our approach and evaluate its effectiveness, focusing on the following research questions: Are our ML models effectively trainable on large code corpora to learn desirable API usage patterns? How does the performance of our ML-based approach compare to state-of-the-art language models for software when learning API usage to detect API misuse violations?
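To make the cross-entropy and perplexity measure concrete, the following is a minimal sketch of scoring an API call sequence against a model estimated from a corpus. The abstract does not say which statistical model the authors use; the bigram model with add-one smoothing, the function names, and the toy API names (`open`, `read`, `write`, `close`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count contexts and bigrams over API-call sequences (assumed model)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}  # the end marker is a predictable token
    for seq in sequences:
        tokens = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def cross_entropy(model, seq):
    """Average negative log2 probability per predicted token."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves mass for unseen calls
    tokens = ["<s>"] + list(seq) + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen transitions at non-zero probability.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        log_prob += math.log2(p)
    return -log_prob / (len(tokens) - 1)


def perplexity(model, seq):
    """Perplexity is 2 raised to the cross-entropy (in bits)."""
    return 2 ** cross_entropy(model, seq)


# Hypothetical toy corpus: every usage closes the resource it opened.
corpus = [["open", "read", "close"]] * 3 + [["open", "write", "close"]] * 2
model = train_bigram(corpus)

typical = ["open", "read", "close"]
misuse = ["open", "read"]  # the close call is missing

print("typical perplexity:", perplexity(model, typical))
print("misuse perplexity: ", perplexity(model, misuse))
```

Under these assumptions, the sequence that omits `close` scores a higher perplexity than the typical one, which is the sense in which a deviation from a learned usage pattern is "surprising" to the model. A real setup would need a far larger corpus and a threshold for flagging, neither of which the abstract specifies.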
Keywords
API misuse violation finding, API usage learning, application programming interfaces, automating software engineering tasks, machine learning, ML approach, program behavior, software engineering tools, software quality, software repositories, statistical method