Learning to Pronounce Chinese Without a Pronunciation Dictionary

EMNLP 2020, pp. 5687-5693


Abstract

We demonstrate a program that learns to pronounce Chinese text in Mandarin, without a pronunciation dictionary. From non-parallel streams of Chinese characters and Chinese pinyin syllables, it establishes a many-to-many mapping between characters and pronunciations. Using unsupervised methods, the program effectively deciphers writing into speech.

Introduction
  • Many papers address the construction of automatic grapheme-to-phoneme systems using rules or supervised learning, e.g., Berndt et al. (1987), Zhang et al. (2002), Xu et al. (2004), Bisani and Ney (2008), and Peters et al. (2017).
  • The task of unsupervised grapheme-to-phoneme conversion is introduced by Knight and Yamada (1999).
  • The authors revisit the task of deciphering Chinese text into standard Mandarin pronunciations (Figure 1).
  • The authors further explore exposing the internals of characters and syllables to the analyzer, as Chinese characters sharing written components often sound similar (see the sketch below).
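The component-similarity idea can be illustrated with a small Python sketch. This is not the authors' analyzer: the seed dictionary and the character-to-component table below are hypothetical stand-ins, and the heuristic simply mirrors the "Match 1" guess described for Table 2 further down (use a character's second component, which is often its phonetic element).

    # Minimal sketch of component-based pronunciation guessing (illustration only;
    # the seed dictionary and component table are hypothetical examples).

    # Pronunciations assumed known for some frequent characters (seed).
    seed_pinyin = {
        "毛": "mao",
        "胡": "hu",
    }

    # Hypothetical decomposition of characters into (first, second) written components.
    components = {
        "耗": ("耒", "毛"),   # true pinyin: hao
        "湖": ("氵", "胡"),   # true pinyin: hu
    }

    def guess_pronunciation(char):
        """Guess a character's pinyin from its second (often phonetic) component,
        mirroring the 'Match 1' heuristic described for Table 2."""
        first, second = components.get(char, (None, None))
        return seed_pinyin.get(second)

    print(guess_pronunciation("耗"))  # -> "mao" (incorrect guess; true pinyin is hao)
    print(guess_pronunciation("湖"))  # -> "hu"  (correct guess)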
Highlights
  • Many papers address the construction of automatic grapheme-to-phoneme systems using rules or supervised learning, e.g. (Berndt et al., 1987; Zhang et al., 2002; Xu et al., 2004; Bisani and Ney, 2008; Peters et al., 2017)
  • We find it compelling that pronunciation dictionaries are largely redundant with non-parallel text and speech corpora, even for writing systems as complex as Chinese
  • We achieve 71%, substantially beating the 22% accuracy reported by Knight and Yamada (1999), as well as the 8.6% of a re-implementation applied to our data
  • The EM method achieves a test-set accuracy of 71%, while the vector-based method achieves 81%
  • We demonstrate that current methods for unsupervised matching of vector spaces are sensitive to the structure of the spaces (a vector-mapping sketch follows this list)
  • Their pinyin-pair pruning has little effect, due to the size of our pinyin corpus (155,219 unique pairs). We ran their expectation-maximization (EM) algorithm for 170 iterations on a character corpus of 300,000 tokens, then applied their decoding algorithm to our 6059-token test set, obtaining a token pronunciation accuracy of 8.6%. Because this accuracy is lower than their reported 22%, we confirmed our results with two separate implementations and took the best of 10 random restarts
  • We find that the two methods agree 47% of the time and are 98.7% accurate in agreement cases, so in an unsupervised way we distill out 261 high-confidence character/pinyin mappings
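The vector-based method mentioned above belongs to the family of unsupervised embedding-alignment techniques (e.g., Lample et al., 2018b). The following is a minimal self-learning Procrustes sketch, not the authors' exact pipeline: it assumes character and pinyin-syllable embedding matrices X and Y have already been trained separately on the two non-parallel corpora, and that some initial candidate pairs are available (in a fully unsupervised setting these would come from an adversarial or structural-similarity initialization rather than a dictionary).

    # Minimal self-learning Procrustes sketch for aligning a character embedding
    # space to a pinyin-syllable embedding space (illustrative only; not the
    # authors' exact method).
    import numpy as np

    def normalize(M):
        # Length-normalize rows so dot products are cosine similarities.
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    def procrustes(X, Y, pairs):
        # Orthogonal W minimizing ||X[src] @ W - Y[tgt]|| over the current pairs.
        src, tgt = map(list, zip(*pairs))
        U, _, Vt = np.linalg.svd(X[src].T @ Y[tgt])
        return U @ Vt

    def induce_dictionary(XW, Y):
        # Map every character vector to its nearest pinyin-syllable vector.
        return list(enumerate((XW @ Y.T).argmax(axis=1)))

    def self_learn(X, Y, seed_pairs, iterations=5):
        # seed_pairs: initial (char_index, pinyin_index) guesses; here assumed given.
        X, Y = normalize(X), normalize(Y)
        pairs = seed_pairs
        for _ in range(iterations):
            W = procrustes(X, Y, pairs)
            pairs = induce_dictionary(X @ W, Y)   # re-induce a dictionary, then repeat
        return W, pairs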
Results
  • Their pinyin-pair pruning has little effect, due to the size of the pinyin corpus (155,219 unique pairs)
  • The authors ran their expectation-maximization (EM) algorithm for 170 iterations on a character corpus of 300,000 tokens, then applied their decoding algorithm to the 6059-token test set, obtaining a token pronunciation accuracy of 8.6% (a simplified sketch of this noisy-channel pipeline follows this list).
  • The authors achieve 71%, substantially beating the 22% accuracy reported by Knight and Yamada (1999), as well as the 8.6% of a re-implementation applied to the data
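The re-implemented noisy-channel recipe can be sketched compactly: estimate a pinyin language model from the pinyin corpus, train a channel P(character | pinyin) with EM on the character corpus, and read off pronunciations with Viterbi-style decoding (Viterbi, 1967). The sketch below is a simplified bigram/HMM version with illustrative smoothing and initialization choices, not the authors' exact trigram configuration, and it omits the numerical scaling a 300,000-token corpus would require.

    # Compact sketch of noisy-channel decipherment (illustrative only): a bigram
    # pinyin language model supplies fixed transition probabilities, EM re-estimates
    # only the channel P(character | pinyin), and Viterbi decoding reads off a
    # pronunciation for each character sequence.
    import numpy as np

    def train_pinyin_lm(pinyin_corpus, syllables):
        # Add-one-smoothed bigram model P(next syllable | previous syllable).
        idx = {s: i for i, s in enumerate(syllables)}
        counts = np.ones((len(syllables), len(syllables)))
        for sent in pinyin_corpus:
            for a, b in zip(sent, sent[1:]):
                counts[idx[a], idx[b]] += 1
        return counts / counts.sum(axis=1, keepdims=True)

    def em_channel(char_corpus, chars, trans, iterations=20):
        # Baum-Welch with transitions (the pinyin LM) held fixed; only the emission
        # table P(character | pinyin) is re-estimated from one random restart.
        cidx = {c: i for i, c in enumerate(chars)}
        S, C = trans.shape[0], len(chars)
        emit = np.random.dirichlet(np.ones(C), size=S)
        for _ in range(iterations):
            expected = np.zeros((S, C))
            for sent in char_corpus:
                obs = [cidx[c] for c in sent]
                T = len(obs)
                alpha = np.zeros((T, S))
                beta = np.zeros((T, S))
                alpha[0] = emit[:, obs[0]] / S            # uniform initial state
                for t in range(1, T):
                    alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
                beta[-1] = 1.0
                for t in range(T - 2, -1, -1):
                    beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
                gamma = alpha * beta                      # state posteriors (unscaled)
                gamma /= gamma.sum(axis=1, keepdims=True)
                for t, o in enumerate(obs):
                    expected[:, o] += gamma[t]
            emit = expected / expected.sum(axis=1, keepdims=True)
        return emit

    def viterbi(sent, chars, syllables, trans, emit):
        # Most likely pinyin syllable sequence for one character sequence.
        cidx = {c: i for i, c in enumerate(chars)}
        obs = [cidx[c] for c in sent]
        delta = np.log(emit[:, obs[0]] + 1e-12) - np.log(len(syllables))
        back = []
        for o in obs[1:]:
            scores = delta[:, None] + np.log(trans + 1e-12)
            back.append(scores.argmax(axis=0))
            delta = scores.max(axis=0) + np.log(emit[:, o] + 1e-12)
        path = [int(delta.argmax())]
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))
        return [syllables[i] for i in reversed(path)]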
Conclusion
  • The authors implement and evaluate techniques to pronounce Chinese text in Mandarin, without the use of a pronunciation dictionary or parallel resource.
  • The EM method achieves a test-set accuracy of 71%, while the vector-based method achieves 81%.
  • In the presence of one-to-many mappings between pinyin and characters, the mapping accuracy is severely degraded, leaving open an opportunity to design more robust unsupervised vector mapping systems.
  • The authors find that the two methods agree 47% of the time and are 98.7% accurate in agreement cases, so in an unsupervised way they distill out 261 high-confidence character/pinyin mappings (see the sketch below).
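The agreement step itself is simple to state in code: keep only the character/pinyin mappings on which the EM-based and vector-based methods agree. In the paper this distills 261 high-confidence pairs; the two dictionaries below are hypothetical stand-ins for the two methods' outputs, not the paper's actual results.

    # Sketch of agreement-based distillation of high-confidence "hints".
    def distill_hints(em_mapping, vector_mapping):
        """Return the character -> pinyin pairs both unsupervised methods agree on."""
        return {c: p for c, p in em_mapping.items()
                if vector_mapping.get(c) == p}

    # Hypothetical per-character outputs of the two methods.
    em_mapping     = {"中": "zhong", "国": "guo", "耗": "mao"}
    vector_mapping = {"中": "zhong", "国": "guo", "耗": "hao"}
    print(distill_hints(em_mapping, vector_mapping))  # {'中': 'zhong', '国': 'guo'}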
Tables
  • Table 1: Token and type statistics for our non-parallel character and syllable corpora. Singletons are one-count types
  • Table 2: Even with a partial pronunciation dictionary, it is difficult to predict the exact pronunciation of a new written character from its components. This table records the accuracy of pronunciation guesses for characters 2001-3000 (by frequency), given pronunciations of characters 1-2000; for these types, yu is most frequent. Match 1 uses a character’s second component, e.g., guessing (incorrectly) that 耗 (hao) is pronounced the same as 毛 (mao). Match 2 uses either the first or second component, whichever is better. Partial match credits either onset or rime, e.g., counting hao for mao as correct
  • Table 3: Accuracy of vector-mapping approaches, measuring the % of character tokens we assign the correct (no tone) pinyin pronunciation to. Testing is on the first 6059 characters of the character corpus
  • Table 4: Accuracy of noisy-channel decoding after EM training. N is the number of unique character triples shown to EM, and M is the number of unique pinyin triples available to “explain” each character triple. Accuracy ranges denote multiple random restarts
  • Table 5: Improving EM results by assigning high initial weights to the 261 agreed-on mappings (“hints”) from the EM and vector-based methods (a sketch of this initialization follows this list). Accuracy ranges are due to multiple random restarts
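The hint mechanism described for Table 5 amounts to biasing EM's starting point rather than using a purely random restart. A minimal sketch follows, assuming the agreed-on pairs are available as a character-to-pinyin dictionary; the uniform base counts and the boost factor are illustrative choices, not the paper's values.

    # Sketch of hint-biased initialization of the channel table P(character | pinyin).
    import numpy as np

    def init_emission_with_hints(chars, syllables, hints, boost=100.0):
        cidx = {c: i for i, c in enumerate(chars)}
        sidx = {s: i for i, s in enumerate(syllables)}
        emit = np.ones((len(syllables), len(chars)))       # uniform base counts
        for char, pinyin in hints.items():
            if char in cidx and pinyin in sidx:
                emit[sidx[pinyin], cidx[char]] += boost     # high initial weight for hints
        return emit / emit.sum(axis=1, keepdims=True)       # rows: P(character | pinyin)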

Reference
  • M. Artetxe, G. Labaka, and E. Agirre. 2018. Unsupervised statistical machine translation. In Proc. EMNLP.
  • R. S. Berndt, J. A. Reggia, and C. C. Mitchum. 1987. Empirically derived probabilities for grapheme-to-phoneme correspondences in English. Behavior Research Methods, Instruments, & Computers, 19.
  • M. Bisani and H. Ney. 2008. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5.
  • Y. Kim, M. Graca, and H. Ney. 2020. When and why is unsupervised neural machine translation useless? In Proc. EACL.
  • K. Knight and K. Yamada. 1999. A computational approach to deciphering unknown scripts. In Proc. ACL Workshop on Unsupervised Learning in Natural Language Processing.
  • G. Lample, A. Conneau, L. Denoyer, and M. Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In Proc. ICLR.
  • G. Lample, A. Conneau, M. Ranzato, L. Denoyer, and H. Jegou. 2018b. Word translation without parallel data. In Proc. ICLR.
  • K. Marchisio, K. Duh, and P. Koehn. 2020. When does unsupervised machine translation work? CoRR, arXiv:2004.05516.
  • B. Peters, J. Dehdari, and J. van Genabith. 2017. Massively multilingual neural grapheme-to-phoneme conversion. In Proc. of the First Workshop on Building Linguistically Generalizable NLP Systems.
  • A. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2).
  • J. Xu, G. Fu, and H. Li. 2004. Grapheme-to-phoneme conversion for Chinese text-to-speech. In Proc. Interspeech.
  • B. Zhang, H. Huang, X. Pan, H. Ji, K. Knight, Z. Wen, Y. Sun, J. Han, and B. Yener. 2014. Be appropriate and funny: Automatic entity morph encoding. In Proc. ACL.
  • T. Zhang, A. Chowdhury, N. Dhulekar, J. Xia, K. Knight, H. Ji, B. Yener, and L. Zhao. 2016. From image to translation: Processing the endangered Nyushu script. ACM Trans. Asian & Low-Resource Lang. Inf. Process., 15.
  • Z. Zhang, M. Chu, and E. Chang. 2002. An efficient way to learn rules for grapheme-to-phoneme conversion in Chinese. In Proc. ISCSLP.
Author
Christopher Chu
Scot Fang