Mining the Network of the Programmers: A Data-Driven Analysis of GitHub

12th Chinese Conference on Computer Supported Cooperative Work and Social Computing (ChineseCSCW 2017), 2017

Abstract
GitHub is a globally popular website for version control and source code management. Because its users can follow one another, it also forms a professional social network of millions of users. In this work, we perform a data-driven study of the GitHub network. Using a distributed crawling framework, we first collect profiles and behavioral data of more than 2 million GitHub users. To the best of our knowledge, this is the largest and most recent public dataset of GitHub. We then build the social graph of these users and conduct a thorough analysis of the network structure. Moreover, we investigate user behavior patterns, particularly the patterns of "commit" activities. Finally, we use machine learning methods to discover important users in the network with high accuracy and low overhead. Our findings can help GitHub provide better services for its users.
Keywords
GitHub, professional social networks, PageRank, machine learning, spatial-temporal analysis