Open-Source License Violations of Binary Software at Large Scale

2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)(2019)

引用 12|浏览34
暂无评分
摘要
Open-source licenses are widely used in open-source projects. However, developers using or modifying the source code of open-source projects do not always strictly follow the licenses. GPL and AGPL, two of the most popular copyleft licenses, are most likely to be violated, because they require developers to open-source the entire project if any code under GPL/AGPL protection is included whether modified or not. There are few license violation detectors focusing on binary software, owning to the challenge of mapping binary code to source code efficiently and accurately at large scale. In this paper, we propose a scalable and fully-automated system to check open-source license violation of binary software at large scale. We match source code to binary code by analyzing file attributes of executable files and code features that are not affected by compilation and could vary between projects. Moreover, to break the barrier of large-scale analysis, we introduce an automatic extractor to parse executable files from installation packages that are broadly available in software download sites. In empirical experiments of binary-to-source mapping, we have got a remarkable high accuracy of 99.5% and recall of 95.6% without significant loss of precision. Besides, 2270 pairs of binary-to-source mapping relationships are discovered, with 110 license violations of GPL and AGPL licenses related to 7.2% of the 1000 real-world binary software projects.
更多
查看译文
关键词
Licenses,Open source software,Feature extraction,Binary codes,Libraries,Arrays
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要