Enabling improved IR-based feature location

Dave Binkley,Dawn Lawrie,Christopher Uehlinger,Daniel Heinz

Journal of Systems and Software（2015）

引用 20|浏览113

暂无评分

摘要

Recent solutions to software engineering problems have incorporated tools and techniques from information retrieval (IR). The use of IR requires choosing an appropriate retrieval model and deciding on a query that best captures a particular information need. Taking feature location as a representative example, three research questions are investigated: (1) the impact of query preprocessing, (2) the impact that different scraping techniques for queries have on retrieval performance, (3) the performance impact that the underlying retrieval model has on identifying the correct source-code functions (the correct documents). These research questions are addressed using the five open source projects released as part of the SEMERU dataset. In the experiments, five methods of scraping queries from modification requests and seven retrieval model instances are considered. Using the standard evaluation metric Mean Reciprocal Rank (MRR), the experimental analysis reveals that better retrieval models are not the ones commonly used by software engineering researchers. Results find that models based on query-likelihood perform about twice as well as models in common use in software engineering such as LSI and thus deserve greater attention. Furthermore, corpus preprocessing has a significant impact as the top performing setting is over 100% better than the average.

查看译文

关键词

Information retrieval models,Query formulation,Feature location

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要