We evaluated embedding-based retrieval on verticals for Facebook Search with significant metrics gains observed in online A/B experiments
Embedding-based Retrieval in Facebook Search
KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Virtual Event..., pp. 2553-2561 (2020)
Search in social networks such as Facebook poses different challenges than in classical web search: besides the query text, it is important to take into account the searcher's context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retriev...
- Search engines have been an important tool to help people access the huge amount of information online.
- Since it is difficult to accurately compute the search intent from query text and to represent the semantic meaning of documents, search techniques are mostly based on various term matching methods, which perform well in cases that keyword matching can address.
- Due to the huge success of embedding techniques in other domains, including computer vision and recommendation systems, embeddings have been an active research topic in the information retrieval community and the search engine industry as the next-generation search technology.
- In Section 5, we discuss the later-stage optimization techniques we developed to unleash the power of embedding-based retrieval end to end
- While it is important to tune ANN algorithms and parameters offline to get a reasonable understanding of the performance vs. recall trade-off, we found it is also important to deploy several configurations of the ANN algorithms and parameters online to better understand the performance impact of embedding-based retrieval on the real system
- We presented our approach of unified embedding to model semantics for social search, and the implementation of embedding-based retrieval in a classical inverted index based search system
- The successful deployment of embedding-based retrieval in production opens a door for sustainable improvement of retrieval quality by leveraging the latest semantic embedding learning techniques
- Search is a multi-stage ranking system where retrieval is the first stage, followed by various stages of ranking and filtering models.
- The authors incorporated embeddings into ranking layers and built a training data feedback loop to actively learn to distinguish good results from bad ones returned by embedding-based retrieval.
- In Section 5, the authors discuss the later-stage optimization techniques they developed to unleash the power of embedding-based retrieval end to end.
- The authors dedicate Section 6 to selected topics on advanced modeling techniques, followed by conclusions in Section 7
- Introducing semantic embeddings into search retrieval has long-term benefits: it addresses semantic matching issues by leveraging advances in deep learning research.
- It is a highly challenging problem due to the modeling difficulty, system implementation and cross-stack optimization complexity, especially for a large-scale personalized social search engine.
- The authors introduced the progress and learnings from the first step along this direction, especially on hard mining and embedding ensemble
- Table 1: Group Embedding Improvement with Feature Engineering
- Table 2: Top Similar Groups Before and After Adding Location Embeddings
- Table 3: Impact of Product Quantization on 1-Recall@10 for 128-Dimension Embedding
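The offline side of the performance vs. recall trade-off, and the 1-Recall@10 metric named in the Table 3 caption, can be illustrated with a small sketch. It assumes that 1-Recall@10 means the fraction of queries whose exact top-1 result appears among the top 10 results retrieved from compressed embeddings. The `coarsen` function is a hypothetical stand-in for lossy compression (simple coordinate rounding, not product quantization), and all names, sizes, and step values are illustrative rather than taken from the paper.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k):
    """Brute-force search: ids of the k docs with the highest inner product."""
    ranked = sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)
    return ranked[:k]


def coarsen(vec, step):
    """Hypothetical lossy compression: snap each coordinate to a multiple of step.

    This is a stand-in for a real quantizer, not product quantization.
    """
    return [round(x / step) * step for x in vec]


def one_recall_at_k(queries, docs, compressed_docs, k=10):
    """Fraction of queries whose exact top-1 doc is in the top k
    retrieved from the compressed docs (assumed meaning of 1-Recall@k)."""
    hits = 0
    for q in queries:
        exact_best = top_k(q, docs, 1)[0]
        if exact_best in top_k(q, compressed_docs, k):
            hits += 1
    return hits / len(queries)


# Synthetic data: random Gaussian embeddings, fixed seed for repeatability.
rng = random.Random(0)
dim = 16
docs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(30)]

# Sweep the compression strength and report the recall for each setting.
for step in (0.1, 0.5, 1.0, 2.0):
    compressed = [coarsen(d, step) for d in docs]
    print(step, one_recall_at_k(queries, docs, compressed))
```

A sweep like this only covers the offline half of the tuning. It says nothing about latency or resource cost in a live system, which is why the text above also stresses deploying several configurations online.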