PPGloVe: Privacy-Preserving GloVe for Training Word Vectors in the Dark

IEEE Transactions on Information Forensics and Security (2024)

Abstract
Words are treated as atomic units in natural language processing tasks, and representing them as vectors is a fundamental step for supporting subsequent computations. GloVe is a widely used machine learning model for training word vectors. Training high-quality word vectors with GloVe generally requires a large corpus and substantial computing resources, which makes it difficult for users to train word vectors on their own. A natural choice today is to outsource the training process to the cloud. However, such cloud-based training services raise serious privacy concerns that must be addressed. In this paper, we design, implement, and evaluate PPGloVe, the first system framework that supports privacy-preserving word vector training using GloVe over the encrypted data of multiple participants. We first decompose the training task and show that previous privacy-preserving machine learning techniques are not practical for it. We then construct a new secure training strategy that tightly integrates lightweight cryptographic techniques with GloVe to support privacy-preserving GloVe training on the cloud. By design, the participants' corpora and the trained word vectors are kept private throughout the training process. Extensive experiments over three datasets of different scales demonstrate that PPGloVe produces word vectors with quality comparable to plaintext training, at a practically affordable overhead.
Keywords
Privacy preservation, data security, word representation, cloud computing