Non-parametric graph-based methods for large scale problems

Non-parametric graph-based methods for large scale problems(2013)

引用 23|浏览24
暂无评分
摘要
The notion of similarity between observations plays a very fundamental role in many Machine Learning and Data Mining algorithms. In many of these methods, the fundamental problem of prediction, which is making assessments and/or inferences about the future observations from the past ones, boils down to how “similar” the future cases are to the already observed ones. However, similarity is not always obtained through the traditional distance metrics. Data-driven similarity metrics, in particular, come into play where the traditional absolute metrics are not sufficient for the task in hand due to special structure of the observed data. A common approach for computing data-driven similarity is to somehow aggregate the local absolute similarities (which are not data-driven and can be computed in a closed-from) to infer a global data-driven similarity value between any pair of observations. The graph-based methods offer a natural framework to do so. Incorporating these methods, many of the Machine Learning algorithms, that are designed to work with absolute distances, can be applied on those problems with data-driven distances. This makes graph-based methods very effective tools for many real-world problems.In this thesis, the major problem that I want to address is the scalability of the graph-based methods. With the rise of large-scale, high-dimensional datasets in many real-world applications, many Machine Learning algorithms do not scale up well applying to these problems. The graph-based methods are no exception either. Both the large number of observations and the high dimensionality hurt graph-based methods, computationally and statistically. While the large number of observations imposes more of a computational problem, the high dimensionality problem has more of a statistical nature. In this thesis, I address both of these issues in depth and review the common solutions for them proposed in the literature. Moreover, for each of these problems, I propose novel solutions with experimental results depicting the merits of the proposed algorithms. Finally, I discuss the contribution of the proposed work from a broader viewpoint and draw some future directions of the current work.
更多
查看译文
关键词
computational problem,global data-driven similarity value,data-driven similarity,graph-based method,high dimensionality problem,large number,data-driven distance,fundamental problem,Non-parametric graph-based method,large scale problem,local absolute similarity,Data-driven similarity metrics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要