bigg2vec: Fast and Memory-Efficient Representation Learning for Billion-Scale Graphs on a Single Machine.
Big Data (2022)
Abstract
Node embeddings obtained from information networks have been widely adopted for representing knowledge and driving various information retrieval and machine learning tasks. However, training node embeddings is computationally intensive, making it difficult to scale to larger graphs. Most existing works have addressed the scalability challenge by simply adding more hardware resources. For example, a common approach to speed up the training process is to distribute model computation across multiple machines and GPUs. This paper takes an orthogonal approach towards scalability by addressing the problem of computational complexity in training embeddings. We present bigg2vec for scaling up the embedding training process. bigg2vec introduces a novel polar coordinate-based system for internal representation and computation. It provides the following benefits: (a) it significantly reduces compute and memory requirements while improving embedding quality, and (b) it uses a novel graph organization to generate high-quality negative samples (this reduces the number of negative samples needed for training, which is especially beneficial for skewed graphs). We have deployed bigg2vec to generate embeddings for multiple AI models within Visa. Our Global Personalized Restaurant Recommender System (GPR) is one such project that uses bigg2vec to periodically generate embeddings for over 450 million nodes connected by more than 3 billion edges. bigg2vec generates higher-quality embeddings while training them faster than state-of-the-art methods on a single CPU-based machine.
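The abstract does not detail how the polar coordinate-based representation is implemented. Purely as an illustration of the general idea, the sketch below stores a node embedding in hyperspherical form (a radius plus d-1 angles) and converts it to Cartesian coordinates for dot-product scoring; the function names and the exact parameterization are assumptions for illustration, not bigg2vec's actual method.

```python
# Illustrative sketch only: a hyperspherical (polar) parameterization of node
# embeddings. This is an assumption-based example, not the bigg2vec implementation.
import numpy as np

def to_cartesian(radius: float, angles: np.ndarray) -> np.ndarray:
    """Convert hyperspherical coordinates (radius, d-1 angles) to a d-dim vector."""
    d = len(angles) + 1
    x = np.empty(d)
    sin_prod = 1.0
    for i, a in enumerate(angles):
        x[i] = radius * sin_prod * np.cos(a)  # r * sin(a_1)...sin(a_{i-1}) * cos(a_i)
        sin_prod *= np.sin(a)
    x[-1] = radius * sin_prod                  # last coordinate uses only sines
    return x

def score(u, v) -> float:
    """Dot-product similarity between two nodes stored in polar form."""
    return float(np.dot(to_cartesian(*u), to_cartesian(*v)))

# Example: two 4-dimensional embeddings, each stored as (radius, 3 angles).
u = (1.0, np.array([0.3, 1.1, 0.7]))
v = (0.8, np.array([0.2, 1.0, 0.9]))
print(score(u, v))
```

One appeal of such a parameterization is that magnitude and direction are stored separately, which can simplify normalization and reduce per-update memory traffic, consistent with (but not confirming) the compute and memory savings the abstract claims.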
Key words
graph representation learning, scalability, embeddings, billion scale