Source code analysis with LDA
Periodicals (2016)
Abstract
Latent Dirichlet allocation (LDA) has seen increasing use in the understanding of source code and its related artifacts, in part because of its impressive modeling power. However, this expressive power comes at a cost: the technique includes several tuning parameters whose impact on the resulting LDA model must be carefully considered. The aim of this work is to provide insights into the tuning parameters' impact. Doing so improves the comprehension of both researchers who look to exploit the power of LDA in their research and those who interpret the output of LDA-using tools. It is important to recognize that the goal of this work is not to establish values for the tuning parameters, because there is no universal best setting. Rather, appropriate settings depend on the problem being solved, the input corpus (in this case, typically words from the source code and its supporting artifacts), and the needs of the engineer performing the analysis. This work's primary goal is to aid software engineers in their understanding of the LDA tuning parameters by demonstrating numerically and graphically the relationship between the tuning parameters and the LDA output. A secondary goal is to enable more informed setting of the parameters. Copyright © 2016 John Wiley & Sons, Ltd.
Keywords
latent Dirichlet allocation, hyper-parameters, entropy
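To make the abstract's point concrete, the following is a minimal, hypothetical sketch (not the paper's experimental setup) of how an LDA hyper-parameter affects the model's output. It fits scikit-learn's `LatentDirichletAllocation` to a tiny corpus of identifier-like words and varies the document-topic prior (commonly called alpha), measuring the entropy of the resulting document-topic mixtures; the corpus, parameter values, and helper name are all illustrative assumptions.

```python
# Hypothetical sketch: observe how the doc-topic prior (alpha) changes the
# sharpness of LDA's per-document topic mixtures on a toy "source code" corpus.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four tiny "documents" of identifier words, two themes: parsing and networking.
docs = [
    "parse token lexer grammar parse",
    "socket connect send recv socket",
    "token grammar ast parse tree",
    "recv send buffer socket connect",
]
X = CountVectorizer().fit_transform(docs)

def mean_doc_topic_entropy(alpha):
    """Fit a 2-topic LDA with the given doc-topic prior and return the mean
    Shannon entropy (in bits) of the document-topic distributions."""
    lda = LatentDirichletAllocation(
        n_components=2,
        doc_topic_prior=alpha,      # the "alpha" hyper-parameter
        topic_word_prior=0.1,       # the "beta" hyper-parameter, held fixed
        random_state=0,
    )
    theta = lda.fit_transform(X)    # rows: per-document topic mixtures
    return float(np.mean(-np.sum(theta * np.log2(theta + 1e-12), axis=1)))

low_alpha = mean_doc_topic_entropy(0.01)
high_alpha = mean_doc_topic_entropy(10.0)
print(low_alpha, high_alpha)  # a larger alpha typically yields flatter,
                              # higher-entropy mixtures
```

This mirrors the abstract's claim numerically: a small alpha pushes each document toward a single dominant topic, while a large alpha smooths the mixtures toward uniform, so the "best" value depends on the corpus and the analysis being performed.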