ACM_Citations
  • USA-2012

    With Moses S Charikar and Piotr Indyk, for their groundbreaking work on Locality-Sensitive Hashing that has had great impact in many fields of computer science including computer vision, databases, information retrieval, machine learning, and signal processing. Hashing is one of the fundamental techniques in computer science: generating challenging theoretical questions as well as finding application in almost all fields of computer science. In many of these applications a refined form of hashing is required, where keys that are close in the original space are more likely to be mapped to the same hash value. This is the problem that locality-sensitive hashing addresses. As a result similarity search problems can be solved in the hashed space instead of the original space, leading often to significant improvements either in efficiency or quality. Motivated by the need to efficiently compute near-duplicate web pages, Andrei Broder introduced specific locality-sensitive hash functions, called minhash functions, and used them to estimate the similarity of sets. Piotr Indyk and the late Rajeev Motwani extended locality-sensitive hash functions to a wider range of distance functions and showed how to use them to design approximate nearest neighbor algorithms with sub-linear query time. Moses Charikar subsequently introduced a highly efficient family of locality-sensitive hash functions, called simhash functions, for angular distances as well as a new variant of the nearest neighbor search algorithm. Locality-sensitive hashing techniques have been widely used in many fields of computer science including computer vision, databases, information retrieval, data mining, machine learning, and signal processing. Furthermore, they have inspired the development of other hashing-based methods for similarity search such as parameter sensitive hashing, semantic hashing, spectral hashing, and kernelized LSH. The work has also enabled other significant theoretical developments. In particular, it has led to the design of nearest neighbor algorithms for more complex metric spaces that play an important role in various applications. It has also resulted in new discoveries in, and connections to, other areas such as vector quantization, communication complexity, and streaming algorithms.

  • USA-2007

    For contributions to algorithms and web technology.

Experience
Education
Bio
Some of the highlights are as follows: In 1997, he led the development of the first practical solution for finding near-duplicate documents on web-scale using "shingling" to reduce the problem to a set-intersection problem and "min-hashing" or to construct "sketches" of sets. This was a pioneering effort in the area of locally sensitive hashing, for which he eventually received the 2012 ACM Paris Kanellakis Award. In 1998, he co-invented the first practical test to prevent robots from masquerading as human and access web sites, often referred to as CAPTCHA. In 2000, he, while at AltaVista, together with colleagues from IBM and DEC SRC, conducted the first large-scale analysis of the Web graph, and identified the bow-tie model of the web graph. Around 2001-2002, he published an opinion piece where he qualified the differences between classical information retrieval and Web search; and introduced a now-widely-accepted classification of web queries into navigational, information, and transactional. He is a fellow of ACM and IEEE.