Vector Quantization Knowledge Transfer for End-to-End Text Image Machine Translation

2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

End-to-end text image machine translation (TIMT) aims to translate source-language text embedded in images into a target language without explicitly recognizing the intermediate text. However, data scarcity in the end-to-end TIMT task limits translation performance. Existing research alleviates this limitation by aligning continuous features from the related tasks of text image recognition (TIR) or machine translation (MT), but alignment in continuous vector space is extremely difficult and inevitably introduces fitting errors that cause significant performance degradation. To better align TIMT features with MT semantic features, we propose a novel Vector Quantization Knowledge Transfer (VQKT) method that employs a trainable codebook to quantize continuous features into a discrete space. The quantization distribution of the MT features serves as the teacher distribution that guides the TIMT model to generate similar discrete codes. Through alignment and knowledge transfer based on probability distributions, the TIMT model can better imitate the feature representation of the MT teacher model and generate high-quality target-language translations. Extensive experiments demonstrate that VQKT significantly outperforms existing end-to-end TIMT methods.
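The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: a shared codebook turns continuous features into a distribution over discrete codes, and the MT (teacher) distribution guides the TIMT (student) distribution. The softmax over negative squared distances, the KL divergence loss, the `temperature` parameter, and all function names are assumptions made for illustration, not the authors' implementation. Random arrays stand in for real MT and TIMT features.

```python
import numpy as np

def quantization_distribution(features, codebook, temperature=1.0):
    """Soft-assign each feature vector to codebook entries (assumed form).

    features: (n, d) array; codebook: (k, d) array.
    Returns an (n, k) array whose rows sum to 1.
    """
    # Squared Euclidean distance between every feature and every code.
    diff = features[:, None, :] - codebook[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Closer codes get higher probability; subtract the row max for stability.
    logits = -dist / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def kl_divergence(teacher, student, eps=1e-12):
    """Mean KL(teacher || student) over rows (assumed transfer loss)."""
    per_row = np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=1)
    return float(np.mean(per_row))

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))          # 8 codes of dimension 4 (fixed here, trainable in practice)
mt_features = rng.normal(size=(5, 4))       # placeholder for MT teacher features
timt_features = mt_features + 0.1 * rng.normal(size=(5, 4))  # placeholder for TIMT student features

teacher = quantization_distribution(mt_features, codebook)
student = quantization_distribution(timt_features, codebook)
loss = kl_divergence(teacher, student)      # would be minimized with respect to the TIMT model
teacher_codes = teacher.argmax(axis=1)      # hard discrete code per feature
```

In this sketch the student is compared with the teacher in the space of code probabilities rather than in the continuous feature space, which is the contrast the abstract draws with earlier continuous-alignment approaches.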
Keywords
Text image machine translation, vector quantization, quantization distribution, knowledge transfer