Explaining the Success of Nearest Neighbor Methods in Prediction

Periodicals (2018)

Citations: 127 | Views: 86
Abstract
Many modern methods for prediction leverage nearest neighbor search to find past training examples most similar to a test example, an idea that dates back in text to at least the 11th century and has stood the test of time. This monograph aims to explain the success of these methods, both in theory, for which we cover foundational nonasymptotic statistical guarantees on nearest-neighbor-based regression and classification, and in practice, for which we gather prominent methods for approximate nearest neighbor search that have been essential to scaling prediction systems reliant on nearest neighbor analysis to handle massive datasets. Furthermore, we discuss connections to learning distances for use with nearest neighbor methods, including how random decision trees and ensemble methods learn nearest neighbor structure, as well as recent developments in crowdsourcing and graphons.

In terms of theory, our focus is on nonasymptotic statistical guarantees, which we state in the form of how many training data and what algorithm parameters ensure that a nearest neighbor prediction method achieves a user-specified error tolerance. We begin with the most general of such results for nearest neighbor and related kernel regression and classification in general metric spaces. In such settings in which we assume very little structure, what enables successful prediction is smoothness in the function being estimated for regression, and a low probability of landing near the decision boundary for classification. In practice, these conditions could be difficult to verify empirically for a real dataset. We then cover recent theoretical guarantees on nearest neighbor prediction in the three case studies of time series forecasting, recommending products to people over time, and delineating human organs in medical images by looking at image patches. In these case studies, clustering structure, which is easier to verify in data and more readily interpretable by practitioners, enables successful prediction.
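
As a rough illustration of the kind of predictor the abstract refers to, the sketch below implements brute-force k-nearest-neighbor regression and classification. It is not taken from the monograph: the Euclidean distance, the choice k = 3, the function names, and the toy data are all assumptions made for this example, and the search is exact rather than one of the approximate methods the monograph surveys.

```python
import math
from collections import Counter


def k_nearest(train_x, query, k):
    """Return indices of the k training points closest to `query`.

    Uses Euclidean distance (an assumption; any metric could be swapped in).
    Ties in distance are broken by training index.
    """
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    return [i for _, i in dists[:k]]


def knn_regress(train_x, train_y, query, k):
    """Predict a real value as the average label of the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


def knn_classify(train_x, train_labels, query, k):
    """Predict a class by majority vote among the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    votes = Counter(train_labels[i] for i in idx)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters in the plane.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_labels = ["a", "a", "a", "b", "b", "b"]

pred_value = knn_regress(train_x, train_y, (0.2, 0.2), k=3)
pred_label = knn_classify(train_x, train_labels, (5.5, 5.5), k=3)
```

In this toy setting the training points form two well-separated clusters, which loosely mirrors the clustering structure the abstract names as a condition that enables successful prediction: a query near one cluster draws all of its neighbors from that cluster.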
Keywords
Machine Learning, Nonparametric methods, Statistical learning theory, Classification and prediction