Chinese Scholars Shine at KDD, the Top International Data Mining Conference: GCT Launches Its KDD[2019] Talent Database

Author: GCT

Date: 2019-08-07 20:07

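KDD is the world's leading international conference on data mining. This week, GCT (the Global Chinese Expert Database) launched its KDD[2019] talent database, which profiles 1,636 experts and scholars in data mining worldwide. Users can view each scholar's paper count, academic activity, and h-index, discover rising stars in the field, and run further statistical analyses on the collected information.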
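
The h-index listed for each scholar is the largest number h such that the scholar has at least h papers with at least h citations each. A minimal sketch of that standard definition follows; the citation counts are made up for illustration, and this is not GCT's own implementation.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one scholar's papers (not real GCT data).
print(h_index([25, 8, 5, 3, 3, 1]))  # → 3
```

Sorting once and scanning from the most-cited paper keeps the check simple: the scan stops at the first rank whose paper has fewer citations than its rank.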

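The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), organized by the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining, is the world's leading international conference on data mining. First held in 1995, it has run for more than twenty years and each year attracts more than 2,000 scholars from around the world, along with many well-known companies.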

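The 25th KDD took place from August 4 to 9, 2019, in Alaska, USA. In past years the acceptance rate has typically been around 20%, with roughly 200 accepted papers. This year it fell to 14%, for two main reasons. First, KDD adopted double-blind review for the first time, so submissions could not reveal author or institution information. Second, KDD 2019 placed greater emphasis on reproducibility to encourage researchers to share their methods. These changes made acceptance harder, but they also substantially raised the value of an accepted paper.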

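Using the AMiner academic data mining engine, we selected the 1,000 most influential experts in data mining and mapped their global distribution (see the figure below). By country, the United States has the most data mining talent, followed by China, with notable concentrations in the United Kingdom and Italy. By region, talent is most concentrated in the eastern United States, followed by Western Europe and then mainland China.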

Global distribution of data mining talent

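Whether measured by KDD attendance, number of papers, paper impact, awards, or involvement in organizing events and exhibiting, Chinese researchers, after years of steady groundwork, are rising quickly and have delivered widely recognized results at KDD.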

Multiple papers from Tsinghua University's AMiner team accepted

Title: Representation Learning for Attributed Multiplex Heterogeneous Network

Authors: Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou and Jie Tang

Link: http://keg.cs.tsinghua.edu.cn/jietang/publications/KDD19-Cen-et-al-Attributed_Multiplex_Network_Embedding.pdf

Abstract: Network embedding (or graph embedding) has been widely used in many real-world applications. However, existing methods mainly focus on networks with single-typed nodes/edges and cannot scale well to handle large networks. Many real-world networks consist of billions of nodes and edges of multiple types, and each node is associated with different attributes. In this paper, we formalize the problem of embedding learning for the Attributed Multiplex Heterogeneous Network and propose a unified framework to address this problem. The framework supports both transductive and inductive learning. We also give the theoretical analysis of the proposed framework, showing its connection with previous works and proving its better expressiveness. We conduct systematical evaluations for the proposed framework on four different genres of challenging datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results demonstrate that with the learned embeddings from the proposed framework, we can achieve statistically significant improvements (e.g., 5.99-28.23% lift by F1 scores; p ≪ 0.01, t-test) over previous state-of-the-art methods for link prediction. The framework has also been successfully deployed on the recommendation system of a worldwide leading e-commerce company, Alibaba Group. Results of the offline A/B tests on product recommendation further confirm the effectiveness and efficiency of the framework in practice.

GATNE-T and GATNE-I models
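
The abstract stays at a high level. As a rough illustration of how a multiplex network embedding can specialize a node per edge type, the sketch below builds node i's embedding for edge type r as a shared base embedding plus a self-attention-weighted mix of the node's edge-type-specific embeddings, loosely modeled on the GATNE-T form v_{i,r} = b_i + alpha_r * M_r^T U_i a_{i,r}. All parameters are random stand-ins, and the names, dimensions, and sigmoid-of-dot-product link score are assumptions for illustration, not the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EDGE_TYPES = 3  # m: e.g. click / add-to-cart / purchase (hypothetical)
BASE_DIM = 8        # d: size of the shared base embedding b_i
EDGE_DIM = 4        # s: size of each edge-type-specific embedding
ATT_DIM = 5         # a: hidden size of the attention scorer

# One set of (random, untrained) parameters per edge type r.
params = {
    "W": [rng.normal(size=(ATT_DIM, EDGE_DIM)) for _ in range(NUM_EDGE_TYPES)],
    "w": [rng.normal(size=ATT_DIM) for _ in range(NUM_EDGE_TYPES)],
    "M": [rng.normal(size=(EDGE_DIM, BASE_DIM)) for _ in range(NUM_EDGE_TYPES)],
}


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def embedding_for_edge_type(b_i, U_i, r, alpha=1.0):
    """Node embedding specialised for edge type r.

    b_i: (d,) base embedding shared by all edge types.
    U_i: (s, m) the node's edge-type-specific embeddings, one column per type.
    """
    W_r, w_r, M_r = params["W"][r], params["w"][r], params["M"][r]
    scores = w_r @ np.tanh(W_r @ U_i)  # (m,) one score per edge type
    a_ir = softmax(scores)             # self-attention weights over edge types
    return b_i + alpha * (M_r.T @ (U_i @ a_ir))  # (d,)


# Two hypothetical nodes; in practice b and U are learned, here they are random.
b = rng.normal(size=(2, BASE_DIM))
U = rng.normal(size=(2, EDGE_DIM, NUM_EDGE_TYPES))
v0 = embedding_for_edge_type(b[0], U[0], r=1)
v1 = embedding_for_edge_type(b[1], U[1], r=1)
link_score = 1.0 / (1.0 + np.exp(-(v0 @ v1)))  # sigmoid of dot product
```

Setting alpha to 0 recovers the shared base embedding, which makes the role of the edge-type-specific term easy to inspect.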

Title: Infer Implicit Contexts in Real-time Online-to-Offline Recommendation

Authors: Xichen Ding, Jie Tang, Tracy Liu, Cheng Xu, Yaping Zhang, Feng Shi, Qixia Jiang and Dan Shen

Link: http://keg.cs.tsinghua.edu.cn/jietang/publications/KDD19-Ding-et-al-On2Off-Recommendation.pdf

Abstract: Understanding users’ context is essential for successful recommendations, especially for Online-to-Offline (O2O) recommendation, such as Yelp, Groupon, and Koubei. Different from traditional recommendation where individual preference is mostly static, O2O recommendation should be dynamic to capture variation of users’ purposes across time and location. However, precisely inferring users’ real-time contexts information, especially those implicit ones, is extremely difficult, and it is a central challenge for O2O recommendation. In this paper, we propose a new approach, called Mixture Attentional Constrained Denoise AutoEncoder (MACDAE), to infer implicit contexts and consequently, to improve the quality of real-time O2O recommendation. In MACDAE, we first leverage the interaction among users, items, and explicit contexts to infer users’ implicit contexts, then combine the learned implicit-context representation into an end-to-end model to make the recommendation. MACDAE works quite well in the real system. We conducted both offline and online evaluations of the proposed approach. Experiments on several real-world datasets (Yelp, Dianping, and Koubei) show our approach could achieve significant improvements over state-of-the-arts. Furthermore, online A/B test suggests a 2.9% increase for click-through rate and 5.6% improvement for conversion rate in real-world traffic. Our model has been deployed in the product of “Guess You Like” recommendation in Koubei.

This AI algorithm is used in the production environment of Alibaba Group's Taobao

Model architectures of DAE, VAE, and MACDAE
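
To make the "mixture attentional denoising autoencoder" idea more concrete, here is a minimal forward-pass sketch under one plausible reading of the name: the input is corrupted with masking noise, several encoder components each produce a hidden vector, an attention step pools them into one implicit-context vector, and a decoder tries to rebuild the clean input. The feature layout, sizes, and the expert-pooling design are hypothetical; the constraint terms, the end-to-end recommendation layer, and training are omitted, and this is not the authors' MACDAE implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

N_FEATURES = 12  # hypothetical: concatenated user, item and explicit-context features
HIDDEN = 6       # size of the inferred implicit-context vector
N_EXPERTS = 3    # number of encoder components in the mixture

# Random, untrained parameters standing in for learned ones.
enc_W = rng.normal(scale=0.3, size=(N_EXPERTS, HIDDEN, N_FEATURES))
att_q = rng.normal(scale=0.3, size=HIDDEN)  # attention query over experts
dec_W = rng.normal(scale=0.3, size=(N_FEATURES, HIDDEN))


def corrupt(x, drop_prob=0.2):
    """Masking noise: zero out each feature independently with prob drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep


def encode(x_noisy):
    """Encode with every expert, then attention-pool the expert outputs."""
    h = np.tanh(enc_W @ x_noisy)         # (N_EXPERTS, HIDDEN)
    scores = h @ att_q                   # (N_EXPERTS,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # attention weights sum to 1
    return weights @ h, weights          # (HIDDEN,), (N_EXPERTS,)


def decode(z):
    return dec_W @ z                     # (N_FEATURES,)


x = rng.normal(size=N_FEATURES)
z, attn = encode(corrupt(x))
# Denoising objective: reconstruct the *clean* input from the corrupted one.
recon_loss = float(np.mean((decode(z) - x) ** 2))
```

In a trained model, the pooled vector z would be the implicit-context representation that is fed into the downstream recommender.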

Title: OAG: Toward Linking Large-scale Heterogeneous Entity Graphs

Authors: Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, and Kuansan Wang

Link: http://keg.cs.tsinghua.edu.cn/jietang/publications/KDD19-Zhang-et-al-Open_Academic_Graph.pdf

Abstract: Linking entities from different sources is a fundamental task in building open knowledge graphs. Despite much research conducted in related fields, the challenges of linking large-scale heterogeneous entity graphs are far from resolved. Employing two billion-scale academic entity graphs (Microsoft Academic Graph and AMiner) as sources for our study, we propose a unified framework, LinKG, to address the problem of building a large-scale linked entity graph. LinKG is coupled with three linking modules, each of which addresses one category of entities. To link word-sequence-based entities (e.g., venues), we present a long short-term memory network-based method for capturing the dependencies. To link large-scale entities (e.g., papers), we leverage locality-sensitive hashing and convolutional neural networks for scalable and precise linking. To link entities with ambiguity (e.g., authors), we propose heterogeneous graph attention networks to model different types of entities. Our extensive experiments and systematical analysis demonstrate that LinKG can achieve linking accuracy with an F1-score of 0.9510, significantly outperforming the state-of-the-art. LinKG has been deployed to Microsoft Academic Search and AMiner to integrate the two large graphs. We have published the linked results, the Open Academic Graph (OAG), making it the largest publicly available heterogeneous academic graph to date.

The LinKG architecture proposed in the paper

LSTM model for modeling sequential dependency in venue full names
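
The abstract says LinKG uses locality-sensitive hashing to make paper linking scalable, but it does not describe the exact scheme. The sketch below therefore shows a generic MinHash-with-banding LSH over character trigrams of paper titles, one common way to generate candidate pairs cheaply before a finer matcher (such as the CNN the abstract mentions) makes the final decision. The shingle size, number of hash functions, band count, and blake2b-based hashing are illustrative assumptions, not LinKG's settings.

```python
import hashlib


def shingles(title, k=3):
    """Character k-grams of a lower-cased, whitespace-normalised title."""
    t = " ".join(title.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def _hash(seed, token):
    # blake2b keeps the hashes stable across runs, unlike Python's built-in hash().
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=64):
    """For each of num_perm seeded hash functions, keep the minimum hash value."""
    return [min(_hash(seed, s) for s in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where the signatures agree estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_keys(sig, bands=16):
    """Split a signature into bands; records sharing any band become candidate pairs."""
    rows = len(sig) // bands
    return {(band, tuple(sig[band * rows:(band + 1) * rows])) for band in range(bands)}


titles = [
    "Representation Learning for Attributed Multiplex Heterogeneous Network",
    "Representation learning for attributed multiplex heterogeneous networks",
    "Infer Implicit Contexts in Real-time Online-to-Offline Recommendation",
]
sigs = [minhash_signature(shingles(t)) for t in titles]
is_candidate = bool(lsh_keys(sigs[0]) & lsh_keys(sigs[1]))
print(estimated_jaccard(sigs[0], sigs[1]), estimated_jaccard(sigs[0], sigs[2]), is_candidate)
```

Only pairs whose band keys collide need the expensive comparison, which is what lets this style of blocking scale to hundreds of millions of papers.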

Title: Sequential Scenario-Specific Meta Learner for Online Recommendation

Authors: Zhengxiao Du, Xiaowei Wang, Hongxia Yang, Jingren Zhou and Jie Tang

Link: http://keg.cs.tsinghua.edu.cn/jietang/publications/KDD19-Du-et-al-Meta_Learning_for_Recommendation.pdf

Abstract: Cold-start problems are long-standing challenges for practical recommendations. Most existing recommendation algorithms rely on extensive observed data and are brittle to recommendation scenarios with few interactions. This paper addresses such problems using few-shot learning and meta learning. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. To accomplish this, we combine the scenario-specific learning with a model-agnostic sequential meta-learning and unify them into an integrated end-to-end framework, namely Scenario-specific Sequential Meta learner (or s2Meta). By doing so, our meta-learner produces a generic initial model through aggregating contextual information from a variety of prediction tasks while effectively adapting to specific tasks by leveraging learning-to-learn knowledge. Extensive experiments on various real-world datasets demonstrate that our proposed model can achieve significant gains over the state-of-the-arts for cold-start problems in online recommendation. Deployment is at the Guess You Like session, the front page of the Mobile Taobao; and the illustration video can also be watched from the link.

Meta-learning and recommendation framework
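
The abstract does not spell out s2Meta's components. To illustrate only the core insight quoted above, that a good generic initialization lets a model adapt to a new scenario from a few examples, the sketch below swaps in Reptile, a simple first-order meta-learning algorithm, on hypothetical one-dimensional linear-regression "scenarios" with five examples each. The task distribution, learning rates, and iteration counts are assumptions chosen for illustration; this is not s2Meta.

```python
import random


def loss_and_grad(params, data):
    """Mean squared error of y = a*x + b on one task's examples, and its gradient."""
    a, b = params
    n = len(data)
    loss = grad_a = grad_b = 0.0
    for x, y in data:
        err = a * x + b - y
        loss += err * err / n
        grad_a += 2 * err * x / n
        grad_b += 2 * err / n
    return loss, (grad_a, grad_b)


def adapt(params, data, steps=1, lr=0.1):
    """Inner loop: a few plain gradient steps on one task's few examples."""
    a, b = params
    for _ in range(steps):
        _, (grad_a, grad_b) = loss_and_grad((a, b), data)
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


def sample_task(rng, n_shots=5):
    """A hypothetical 'scenario': a line whose slope and intercept vary per task."""
    slope, intercept = 2.0 + rng.gauss(0, 0.3), 1.0 + rng.gauss(0, 0.3)
    xs = [rng.uniform(-1, 1) for _ in range(n_shots)]
    return [(x, slope * x + intercept) for x in xs]


def reptile(meta_iters=500, inner_steps=5, meta_lr=0.1, seed=0):
    """Reptile: move the shared initialisation toward each task's adapted weights."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        phi = adapt(theta, sample_task(rng), steps=inner_steps)
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta


theta = reptile()
new_task = sample_task(random.Random(123))
loss_from_meta, _ = loss_and_grad(adapt(theta, new_task), new_task)
loss_from_zero, _ = loss_and_grad(adapt((0.0, 0.0), new_task), new_task)
print(theta, loss_from_meta, loss_from_zero)
```

After meta-training, a single adaptation step from the learned initialization should leave a much smaller loss on an unseen task than the same step from a zero initialization, which mirrors the cold-start motivation in miniature.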

Title: Towards Knowledge-Based Personalized Product Description Generation in E-commerce

Authors: Qibin Chen, Junyang Lin, Yichang Zhang, Hongxia Yang, Jingren Zhou and Jie Tang

Link: http://keg.cs.tsinghua.edu.cn/jietang/publications/KDD19-Chen-et-al-KOBE.pdf

Abstract: Quality product descriptions are critical for providing competitive customer experience in an e-commerce platform. An accurate and attractive description not only helps customers make an informed decision but also improves the likelihood of purchase. However, crafting a successful product description is tedious and highly time-consuming. Due to its importance, automating the product description generation has attracted considerable interest from both research and industrial communities. Existing methods mainly use templates or statistical methods, and their performance could be rather limited. In this paper, we explore a new way to generate personalized product descriptions by combining the power of neural networks and knowledge base. Specifically, we propose a KnOwledge Based pErsonalized (or KOBE) product description generation model in the context of e-commerce. In KOBE, we extend the encoder-decoder framework, the Transformer, to a sequence modeling formulation using self-attention. In order to make the description both informative and personalized, KOBE considers a variety of important factors during text generation, including product aspects, user categories, and knowledge base. Experiments on real-world datasets demonstrate that the proposed method outperforms the baseline on various metrics. KOBE can achieve an improvement of 9.7% over state-of-the-arts in terms of BLEU. We also present several case studies as the anecdotal evidence to further prove the effectiveness of the proposed approach. The framework has been deployed in Taobao, the largest online e-commerce platform in China.

An example of knowledge-based personalized product description generation, as described in the abstract

KOBE model architecture

Input formulation of the conditional model
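
KOBE builds on the Transformer's self-attention and conditions generation on factors such as product aspects and user categories. The sketch below shows only those two ingredients under simplifying assumptions: one aspect embedding and one user-category embedding are added to every token embedding, and the result goes through a single head of scaled dot-product self-attention. The vocabulary, dimensions, and additive conditioning are hypothetical choices; the knowledge-base component, decoder, and training are omitted, and this is not KOBE's code.

```python
import numpy as np

rng = np.random.default_rng(7)

D_MODEL = 16
VOCAB, N_ASPECTS, N_USER_CATS = 50, 4, 3  # hypothetical sizes

# Random, untrained embedding tables and projection matrices.
tok_emb = rng.normal(scale=0.1, size=(VOCAB, D_MODEL))
aspect_emb = rng.normal(scale=0.1, size=(N_ASPECTS, D_MODEL))
user_emb = rng.normal(scale=0.1, size=(N_USER_CATS, D_MODEL))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)) for _ in range(3))


def conditioned_inputs(token_ids, aspect_id, user_cat_id):
    """Add one aspect embedding and one user-category embedding to every token."""
    return tok_emb[token_ids] + aspect_emb[aspect_id] + user_emb[user_cat_id]


def self_attention(x):
    """Single-head scaled dot-product self-attention: softmax(QK^T / sqrt(d)) V."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(D_MODEL)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


token_ids = np.array([3, 17, 42, 8])  # a hypothetical 4-token product title
x = conditioned_inputs(token_ids, aspect_id=1, user_cat_id=2)
out, attn = self_attention(x)
```

Because the attribute vectors shift every token representation, changing the aspect or user category changes the attention pattern and the encoder output, which is what lets a decoder produce different descriptions for different users.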

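Next, drawing on the rich information retrieved from the GCT Global Chinese Expert Database, we profile two Chinese scientists active at KDD. To learn more about any Chinese scholar, simply search for the scholar's name in the GCT database to access his or her profile free of charge.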

Other Chinese scientists also achieved strong results at this year's KDD.

Professor Jian Pei

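Looking back over KDD's history, more and more Chinese scientists have emerged in the field. The best known is Professor Jian Pei of Simon Fraser University in Canada, who in 2017 began a two-year term as chair of KDD, during which he made outstanding contributions to diversifying KDD's organization and to the innovative application of its results.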

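Jian Pei is a professor in the School of Computing Science at Simon Fraser University in Canada, an ACM Fellow and IEEE Fellow, and a Canada Research Chair (Tier 1). He has studied and worked at Peking University, the University at Buffalo (the State University of New York), Simon Fraser University, and the Chinese University of Hong Kong (see the figure below).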

Professor Jian Pei's career migration path

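Professor Pei scores very highly on both academic diversity and academic sociability. He has published 314 papers and has an exceptionally high citation count of 42,912, reflecting his standing as an authority in big data mining. His work centers on developing efficient data analysis techniques for novel data-intensive applications, and his research areas include data mining, information retrieval, and database systems, together with their applications in social networks and social media, medical informatics, and business intelligence (see the figure below).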

Professor Jian Pei's research areas and academic statistics

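His closest collaborator is Jiawei Han, professor of computer science at the University of Illinois at Urbana-Champaign. Notably, Dr. Han was Professor Pei's doctoral advisor at Simon Fraser University.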

Professor Jiawei Han, Professor Pei's closest collaborator

Professor Chuan Shi

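Even though acceptance at KDD 2019 became sharply harder, Professor Chuan Shi of Beijing University of Posts and Telecommunications had two papers accepted, a testament to his expertise in data mining. His main research interests include data mining, machine learning, artificial intelligence, and big data analysis, and he also scores very highly on academic diversity and academic sociability.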

Professor Chuan Shi's research areas and academic statistics

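Over the past five years, Professor Shi has published 65 high-quality papers as first or corresponding author in top data mining journals and conferences, including IEEE TKDE, ACM TIST, KDD, AAAI, IJCAI, and CIKM. He has filed more than ten Chinese invention patents and one international patent, four of which have been granted, and his research has been applied at well-known companies such as Alibaba, Tencent, and Huawei. He has received best paper awards at international conferences including ADMA 2011 and ADMA 2018, as well as CCF-Tencent Rhino-Bird Fund support and its Outstanding Project Award, and he advised the students who won first place worldwide in IJCAI Contest 2015, a top international data mining competition.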

Professor Chuan Shi's academic achievements

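In recent years, whether measured by KDD attendance, number of papers, paper impact, awards, or involvement in organizing events and exhibiting, Chinese researchers in big data mining have clearly been on the rise. Leading Chinese internet companies such as Tencent, Alibaba, Didi, and JD.com are also gaining a foothold at KDD. At KDD 2019, a paper on inventory forecasting first-authored by Didi's advertising algorithm team was accepted, as was a joint paper by JD.com and Tsinghua University on a new reinforcement learning framework; at KDD 2018, Alibaba Group had five papers accepted.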

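Beyond China's massive data resources, the rapid progress of Chinese companies depends on a large pool of excellent talent. For companies, recruiting today's top talent matters, but identifying talent aligned with future technology trends is an even more important task.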

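As the world's most intelligent science and technology intelligence engine, GCT builds holistic profiles of experts and scholars in every field, covering their academic collaborations and social relationships, traces their academic migration paths, and mines the scholar information hidden behind the data. Its core technology is used by more than 20 organizations, including the Chinese Academy of Engineering, the Ministry of Science and Technology, the National Natural Science Foundation of China, Huawei, Tencent, Sogou, and Alibaba, and it provides precise expert profiles, talent insights, and trend insights.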
