FOSSLT: An Efficient Model for Automatic Finding Open Source Software License Texts

2023 2nd International Conference on Cloud Computing, Big Data Application and Software Engineering (CBASE)(2023)

Abstract
As open source code is widely used in modern software development, identifying open source software licenses efficiently and correctly is important for using open source code compliantly. However, the open source licenses in large-scale mixed-source software are not only diverse and numerous but also declared in diverse ways. Existing license identification tools struggle to meet these needs in terms of both correctness and efficiency. In this paper, we first divide the matching features used in license identification into three types: license identifier, license name, and license text. We then present an efficient and accurate model for identifying license text. The model is composed of an extraction module and an identification module. The extraction module extracts the text to be matched more precisely, eliminating redundant information, while the identification module efficiently eliminates interference from noise text. To validate the model, we randomly selected more than 3,000 files that contain license text and 300 files that do not from Linux distributions as experimental data. The data covers more than 80 different common licenses, such as AGPL-3.0-only, Apache-2.0, BSD-3-Clause, and GPL-3.0-only. We analyzed experimentally how different parameters influence the model's identification performance and tested the optimal parameter combination on the experimental data. The identification precision and recall reached 99.05% and 87.94%, respectively, while the total time consumed was only 4.24 s. The experimental results show that our model outperforms typical existing license identification tools in all aspects.
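The abstract names three matching features (license identifier, license name, license text) and a two-stage pipeline (extraction, then identification) without giving the algorithms. The following is only a minimal illustrative sketch of that overall shape, not the FOSSLT method. Everything in it is an assumption: the helper names (`extract_header`, `normalize`, `containment`, `identify`), the tiny reference tables `LICENSE_NAMES` and `LICENSE_TEXTS` (the MIT entry is just an excerpt), extraction as "keep the leading comment block", and text matching by word 3-gram containment with a hypothetical threshold of 0.8.

```python
import re

# Hypothetical reference data; a real tool would load full texts for many licenses.
LICENSE_NAMES = {
    "Apache-2.0": "Apache License, Version 2.0",
    "GPL-3.0-only": "GNU General Public License version 3",
}
LICENSE_TEXTS = {
    # Excerpt of the MIT license text, for illustration only.
    "MIT": ("Permission is hereby granted, free of charge, to any person obtaining "
            "a copy of this software and associated documentation files"),
}

# Feature type 1: an SPDX license identifier tag (captures only the first token).
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
COMMENT_PREFIXES = ("#", "//", "/*", "*")


def extract_header(source: str) -> str:
    """Assumed extraction step: keep only the leading comment block of a file."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith(COMMENT_PREFIXES):
            break  # first code line ends the candidate region
        kept.append(stripped)
    return "\n".join(kept)


def normalize(text: str) -> str:
    """Lowercase and replace punctuation and comment markers with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def shingles(words: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """All contiguous word k-grams."""
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(reference: str, candidate: str, k: int = 3) -> float:
    """Fraction of the reference's k-grams that also occur in the candidate."""
    ref = shingles(reference.split(), k)
    cand = shingles(candidate.split(), k)
    return len(ref & cand) / len(ref) if ref else 0.0


def identify(source: str, threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return (feature type, license) pairs detected in one source file."""
    found = [("identifier", m.group(1)) for m in SPDX_TAG.finditer(source)]
    header = f" {normalize(extract_header(source))} "
    # Feature type 2: a license name, matched on word boundaries.
    for spdx_id, name in LICENSE_NAMES.items():
        if f" {normalize(name)} " in header:
            found.append(("name", spdx_id))
    # Feature type 3: license text, scored by k-gram containment.
    for spdx_id, text in LICENSE_TEXTS.items():
        if containment(normalize(text), header.strip()) >= threshold:
            found.append(("text", spdx_id))
    return found


if __name__ == "__main__":
    sample = (
        "# SPDX-License-Identifier: MIT\n"
        "# Copyright (c) 2023 Example Author\n"
        "# Permission is hereby granted, free of charge, to any person obtaining\n"
        "# a copy of this software and associated documentation files\n"
        "import os\n"
    )
    print(identify(sample))  # [('identifier', 'MIT'), ('text', 'MIT')]
```

Containment is used here instead of a symmetric measure such as Jaccard similarity because extra lines in the candidate region (for example, copyright notices acting as noise) do not lower the score. Only missing license wording does. This is one simple way to tolerate noise text, chosen for the sketch. It is not a claim about how the paper's identification module works.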
Keywords
mixed-source, matching feature, license text, identification