
KDD 2019 Results Announced: Chinese Researchers Shine

Author: GCT

Date: 2019-08-09 14:51

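At this year's KDD conference, Chinese researchers made a strong showing in large-scale data mining: Chinese scholars and students were prominent both in the Research Track and in the KDD Cup competition.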

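As the top academic conference in data mining, KDD is known for its strict paper selection. Its acceptance rate typically stays at or below roughly 20% each year, so the results draw close attention from industry.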

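This is the first year that KDD has used double-blind review. The call for papers also emphasized reproducibility: authors were encouraged to release their code and data, to report results on public datasets, and to describe the algorithms and resources used in the paper as completely as possible.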

a01.png

As in previous years, KDD 2019 is divided into a Research track and an Applied track.

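According to available figures, the KDD Research track received 1,179 submissions this year, of which about 111 were accepted as oral papers and 63 as poster papers, an acceptance rate of about 15%. The Applied track received about 700 submissions, of which about 45 were accepted as oral papers and about 100 as poster papers, an acceptance rate of about 20.7%.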
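The acceptance rates quoted above follow directly from the reported counts. The short Python check below is only a sketch: the function name `acceptance_rate` is introduced here for illustration, and the counts are the approximate figures given in the article.

```python
def acceptance_rate(oral: int, poster: int, submitted: int) -> float:
    """Return accepted papers (oral + poster) as a percentage of submissions."""
    return 100.0 * (oral + poster) / submitted


# Approximate counts as reported in the article.
research = acceptance_rate(oral=111, poster=63, submitted=1179)
applied = acceptance_rate(oral=45, poster=100, submitted=700)

print(f"Research track: {research:.1f}%")
print(f"Applied track:  {applied:.1f}%")
```

The Research track figure comes out slightly under the rounded 15% in the text, and the Applied track figure matches the quoted 20.7%.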


Notable Papers by Chinese Scholars at KDD 2019

Paper title: Network Density of States

a02.png

First author: Dong Kun

Paper link: https://arxiv.org/pdf/1905.09758.pdf

Abstract: Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvalues. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues, the spectral density, is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.

a03.png

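This paper won the Research Track Best Paper Award. The author team is from Cornell University, and first author Dong Kun is a PhD student in applied mathematics at Cornell. The other authors include Austin R. Benson, and the corresponding author is David Bindel, a professor in applied mathematics at Cornell.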
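To make the idea of a spectral density concrete, the sketch below computes one for a small random graph by brute force. It is not the paper's method: the abstract describes efficient estimators borrowed from condensed matter physics that scale to over a billion edges, whereas this example does a full dense eigendecomposition with NumPy, which is only feasible for small graphs. The choice of the normalized adjacency matrix, the bin count, and the toy graph are all assumptions made for illustration.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    has_edges = deg > 0  # isolated nodes keep a zero row and column
    inv_sqrt[has_edges] = 1.0 / np.sqrt(deg[has_edges])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def spectral_density(adj: np.ndarray, bins: int = 20):
    """Histogram of all eigenvalues, normalized to sum to 1 (exact, dense)."""
    eigvals = np.linalg.eigvalsh(normalized_adjacency(adj))
    # Eigenvalues lie in [-1, 1]; clip away floating-point overshoot.
    eigvals = np.clip(eigvals, -1.0, 1.0)
    counts, edges = np.histogram(eigvals, bins=bins, range=(-1.0, 1.0))
    return counts / counts.sum(), edges, eigvals


# Toy undirected random graph (an assumption, not data from the paper).
rng = np.random.default_rng(0)
n = 200
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
adj = (upper | upper.T).astype(float)

density, edges, eigvals = spectral_density(adj)
print(np.round(density, 3))
```

The histogram covers the whole spectrum rather than only the few extreme eigenvalues, which is the contrast the abstract draws with most of spectral graph theory.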

Paper title: Optimizing Impression Counts for Outdoor Advertising

First author: Yipeng Zhang

Paper link: https://dl.acm.org/citation.cfm?doid=3292500.3330829

Abstract: In this paper we propose and study the problem of optimizing the influence of outdoor advertising (ad) when impression counts are taken into consideration. Given a database U of billboards, each of which has a location and a non-uniform cost, a trajectory database T and a budget B, it aims to find a set of billboards that has the maximum influence under the budget. In line with the advertising consumer behavior studies, we adopt the logistic function to take into account the impression counts of an ad (placed at different billboards) to a user trajectory when defining the influence measurement. However, this poses two challenges: (1) our problem is NP-hard to approximate within a factor of O(|T|^(1-ε)) for any ε > 0 in polynomial time; (2) the influence measurement is nonsubmodular, which means a straightforward greedy approach is not applicable. Therefore, we propose a tangent line based algorithm to compute a submodular function to estimate the upper bound of influence. Henceforth, we introduce a branch-and-bound framework with a θ-termination condition, achieving θ 2 (1- 1/e) approximation ratio. However, this framework is time-consuming when |U| is huge. Thus, we further optimize it with a progressive pruning upper bound estimation approach which achieves θ 2 (1 - 1/e - ε) approximation ratio and significantly decreases the running-time. We conduct the experiments on real-world billboard and trajectory datasets, and show that the proposed approaches outperform the baselines by 95% in effectiveness. Moreover, the optimized approach is around two orders of magnitude faster than the original framework.

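This paper was the runner-up for the Research Track best paper award. The authors propose and study the problem of optimizing the influence of outdoor advertising when impression counts, that is, the number of times a user's trajectory encounters the ad, are taken into account. Given a billboard database U, in which each billboard has a location and a non-uniform cost, a trajectory database T, and a budget B, the goal is to find the set of billboards that achieves the maximum influence within budget B. In line with studies of advertising consumer behavior, the authors apply a logistic function to the impression counts that an ad placed at different billboards delivers to a user trajectory, and use the result as the measure of influence.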
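A small example shows why a logistic influence measure is not submodular. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the exact form of `impression_influence`, its parameters `alpha` and `beta`, the toy billboards and trajectories, and the exhaustive search in `best_subset` are all hypothetical. The paper itself uses a branch-and-bound framework with a submodular upper bound, which is not reproduced here.

```python
import math
from itertools import combinations


def impression_influence(count: int, alpha: float = 3.0, beta: float = 1.0) -> float:
    """Hypothetical logistic influence of meeting an ad `count` times."""
    if count == 0:
        return 0.0  # a trajectory that never meets the ad is not influenced
    return 1.0 / (1.0 + math.exp(alpha - beta * count))


def total_influence(chosen, coverage, trajectories) -> float:
    """Sum influence over trajectories, counting chosen billboards each one meets."""
    total = 0.0
    for t in trajectories:
        count = sum(1 for b in chosen if t in coverage[b])
        total += impression_influence(count)
    return total


def best_subset(costs, coverage, trajectories, budget):
    """Exhaustive search over affordable billboard sets (toy sizes only)."""
    best, best_value = (), 0.0
    boards = sorted(costs)
    for r in range(1, len(boards) + 1):
        for subset in combinations(boards, r):
            if sum(costs[b] for b in subset) <= budget:
                value = total_influence(subset, coverage, trajectories)
                if value > best_value:
                    best, best_value = subset, value
    return best, best_value


# Toy instance (assumed data): billboard costs and the trajectories each one meets.
costs = {"b1": 2, "b2": 2, "b3": 3}
coverage = {"b1": {"t1", "t2"}, "b2": {"t1", "t2"}, "b3": {"t3", "t4", "t5"}}
trajectories = ["t1", "t2", "t3", "t4", "t5"]

# Diminishing returns would require gain_second <= gain_first.
gain_first = impression_influence(1) - impression_influence(0)
gain_second = impression_influence(2) - impression_influence(1)
print(f"first impression gain:  {gain_first:.4f}")
print(f"second impression gain: {gain_second:.4f}")

chosen, value = best_subset(costs, coverage, trajectories, budget=4)
print(chosen, round(value, 4))
```

With these assumed parameters the second impression adds more influence than the first, so marginal gains are not diminishing. That is consistent with the abstract's remark that a straightforward greedy approach is not applicable, and in this toy instance the best affordable choice is two overlapping billboards rather than the one that reaches the most trajectories.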

Paper title: A Hierarchical Career-Path-Aware Neural Network for Job Mobility Prediction

First author: Qingxin Meng

Paper link: https://w.url.cn/s/Aqkh442

Abstract: The understanding of job mobility can benefit talent management operations in a number of ways, such as talent recruitment, talent development, and talent retention. While there is extensive literature showing the predictability of the organization-level job mobility patterns (e.g., in terms of the employee turnover rate), there are no effective solutions for supporting the understanding of job mobility at an individual level. To this end, in this paper, we propose a hierarchical career-path-aware neural network for learning individual-level job mobility. Specifically, we aim at answering two questions related to individuals in their career paths: 1) who will be the next employer? 2) how long will the individual work in the new position? Specifically, our model exploits a hierarchical neural network structure with embedded attention mechanism for characterizing the internal and external job mobility. Also, it takes personal profile information into consideration in the learning process. Finally, the extensive results on real-world data show that the proposed model can lead to significant improvements in prediction accuracy for the two aforementioned prediction problems. Moreover, we show that the above two questions are well addressed by our model with a certain level of interpretability. For the case studies, we provide data-driven evidence showing interesting patterns associated with various factors (e.g., job duration, firm type, etc.) in the job mobility prediction process.

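This paper was supervised by Professor Hui Xiong (熊辉), whose research covers data mining, big data analytics, business intelligence, internet securities, and information security. In recognition of his achievements and influence in big data and data mining, he was named an ACM Distinguished Scientist in 2014. In industry, Professor Xiong has carried out major business intelligence research projects with many Fortune Global 500 companies, including Citrix Systems Inc., IBM, Huawei, and Baidu.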

a04.png

Professor Hui Xiong's research interests and collaboration network

a05.png

Professor Hui Xiong's career migration map

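China has risen rapidly in data mining in recent years. At this year's KDD, in addition to a Chinese PhD student winning the Best Paper Award, the KDD Cup awards were won almost entirely by Chinese teams, and a number of Chinese scholars appeared on the organizing committee. These achievements point to the strong momentum of Chinese companies in machine learning, data mining, natural language processing, social networks, high-performance computing, and related fields.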

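A company cannot grow without a large pool of outstanding talent. For companies, recruiting the best people from around the world and finding talent that fits future trends is a top priority.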

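The Global Chinese Talent database (GCT) aims to bring together academically influential researchers of Chinese descent in China and abroad. It currently aggregates nearly one hundred high-end Chinese talent pools across fields, including the TR35 Chinese talent pool, the Young Changjiang Scholars pool, the Distinguished Young Scholars (Jieqing) pool, and the ICML top-scholar pool.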

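Built on technology from Tsinghua's AMiner platform, GCT provides in-depth talent profiles. These include basic information such as name, affiliation, and position; deeper information mined from large-scale data, such as a scholar's research interests, academic social connections, and similar experts; and multidimensional evaluation metrics, including academic activity, H-index, and social activity. On this basis, GCT offers users precise expert profiles, talent insights, and trend insights.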
