A Corpus for Large-Scale Phonetic Typology

ACL, pp. 4526-4546, 2020.

Keywords:
cross linguistic phonetic, Equivalent Rectangular Bandwidth, speech corpora, phonetic typology, speech corpus

Abstract:

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions. We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic mea...
Introduction
  • Understanding the range and limits of crosslinguistic variation is fundamental to the scientific study of language.
  • In speech and phonetic typology, this involves exploring potentially universal tendencies that shape sound systems and govern phonetic structure.
  • Such investigation requires access to large amounts of cross-linguistic data.
  • The recently developed CMU Wilderness corpus (Black, 2019), which covers over 600 languages, is an exception to the limited language coverage typical of speech corpora.
  • This makes it the largest and most typologically diverse speech corpus to date.
  • Beyond its coverage, the CMU Wilderness corpus is unique in two respects: cleanly recorded, read speech exists for all languages in the corpus, and the same content exists across all languages.
Highlights
  • We present a series of targeted case studies illustrating the utility of our corpus for large-scale phonetic typology.
  • We present two case studies illustrating both the research potential and limitations of this corpus for investigation of phonetic typology at a large scale.
Results
  • As shown in Figure 4 and Table 3, the strongest correlations in mean F1 frequently reflected uniformity of height.
  • Some vowel pairs that differed in height were moderately correlated in mean F1 (e.g., /o/–/a/: r = 0.66, p < 0.001).
  • Correlations of mean F2 were strongest among vowels with a uniform backness specification.
  • The mean mid-frequency peak values for /s/ and /z/ each varied substantially across readings and were strongly correlated with one another (r = 0.87, p < 0.001; Figure 4).
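As we read them, the correlations reported above are taken across readings: each reading contributes one mean value per segment, and Pearson's r is computed over the readings in which both segments occur. The following is a minimal Python sketch of that computation, not the authors' code. The `mean_f1_hz` dictionary, its reading IDs, and its values are hypothetical placeholders, and the Hz-to-ERB conversion assumes the ERB-rate formula of Glasberg and Moore (1990), which may differ from the exact conversion behind Tables 3 and 4.

```python
import math


def hz_to_erb(f_hz):
    """Convert Hz to ERB-rate (Glasberg and Moore, 1990); an assumed conversion."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vowel_pair_correlation(means_by_reading, vowel_a, vowel_b):
    """Correlate per-reading means (in ERB) over readings containing both vowels."""
    shared = [m for m in means_by_reading.values() if vowel_a in m and vowel_b in m]
    a = [hz_to_erb(m[vowel_a]) for m in shared]
    b = [hz_to_erb(m[vowel_b]) for m in shared]
    return pearson_r(a, b), len(shared)


# Hypothetical per-reading mean F1 values in Hz (not taken from the corpus).
mean_f1_hz = {
    "reading_1": {"i": 310.0, "u": 330.0, "a": 720.0},
    "reading_2": {"i": 280.0, "u": 300.0, "a": 690.0},
    "reading_3": {"i": 350.0, "u": 365.0, "a": 760.0},
    "reading_4": {"i": 295.0, "a": 700.0},  # no /u/: skipped for the /i/-/u/ pair
}

r, n_readings = vowel_pair_correlation(mean_f1_hz, "i", "u")
print(f"/i/-/u/: r = {r:.2f} over {n_readings} readings")
```

Restricting each pair to readings that contain both vowels keeps the two sequences aligned; a real analysis would also apply a minimum number of readings per pair before reporting r.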
Conclusion
  • VoxClamantis v1.0 is the first large-scale corpus for phonetic typology, with extracted phonetic features for 635 typologically diverse languages.
  • The authors present two case studies illustrating both the research potential and limitations of this corpus for investigation of phonetic typology at a large scale.
  • The authors hope that directly releasing the alignments and token-level features will make research in this area more accessible.
  • The authors hope this corpus will motivate and enable further developments in both phonetic typology and methodology for working with cross-linguistic speech corpora.
Tables
  • Table 1: Phoneme Error Rate (PER) for Unitran treating Epitran as ground-truth. ‘Types’ and ‘Tokens’ numbers reflect the number of unique word types and word tokens in each reading. We report PER calculated using word types for calibration with other work, as well as frequency-weighted PER reflecting occurrences in our corpus.
  • Table 2: Computation time to generate the full corpus.
  • Table 3: Pearson correlations (r) of mean F1 in ERB between vowel categories.
  • Table 4: Pearson correlations (r) of mean F2 in ERB between vowel categories.
  • Table 5: WikiPron G2P Phone Error Rate (PER) calculated treating WikiPron annotations as ground-truth. We perform 20 trials with random 80/20 splits per language, and report PER averaged across trials with 95% confidence intervals for each language.
  • Table 6: Table of final G2P hyperparameter settings. Alignment parameters not listed here for phonetisaurus-align use the default values. The language model was trained using SRILM (Stolcke, 2002) ngram-count using default values except for those listed above.
  • Table 7: Summary of quality measure retention statistics for vowels and sibilants over unique readings with reading-level MCD < 8 for Unitran and high-resource alignments.
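Tables 1 and 5 report Phoneme (or Phone) Error Rate against a reference transcription. As a rough illustration, PER can be computed as the total edit distance between hypothesis and reference phone sequences divided by the total reference length. The Python sketch below makes that assumption explicit. The function names, the toy pronunciations, and the way token counts enter the frequency-weighted variant are illustrative choices, not details confirmed by the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs, counts=None):
    """PER over (reference, hypothesis) pairs, one pair per word type.

    With counts=None every word type counts once (type-level PER).
    Passing token counts weights each type by its frequency (assumed weighting).
    """
    if counts is None:
        counts = [1] * len(pairs)
    errors = sum(c * edit_distance(ref, hyp) for (ref, hyp), c in zip(pairs, counts))
    ref_len = sum(c * len(ref) for (ref, _), c in zip(pairs, counts))
    return errors / ref_len


# Hypothetical word types: (reference phones, hypothesis phones).
pairs = [
    (["k", "a", "t"], ["k", "a", "d"]),  # one substitution
    (["a"], ["a"]),                      # exact match
]
token_counts = [1, 3]  # hypothetical corpus frequencies of the two word types

type_per = phoneme_error_rate(pairs)
weighted_per = phoneme_error_rate(pairs, token_counts)
print(f"type-level PER = {type_per:.3f}, frequency-weighted PER = {weighted_per:.3f}")
```

Under this weighting, frequent words that are transcribed correctly pull the weighted PER below the type-level figure, which is one reason the two numbers can diverge for the same reading.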
Reference
  • Gopala Krishna Anumanchipalli, Kishore Prahallad, and Alan W. Black. 2011. Festvox: Tools for creation and analyses of large speech corpora. In Workshop on Very Large Scale Phonetics Research, UPenn, Philadelphia.
  • Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. 2020. Common Voice: A massively-multilingual speech corpus. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020).
  • Roy Becker-Kristal. 2010. Acoustic typology of vowel inventories and Dispersion Theory: Insights from a large cross-linguistic corpus. Ph.D. thesis, University of California, Los Angeles.
  • Yoav Benjamini and Yosef Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300.
  • Alan W. Black. 2006. CLUSTERGEN: A statistical parametric synthesizer using trajectory modeling. In Proceedings of INTERSPEECH.
  • Alan W. Black. 2019. CMU Wilderness Multilingual Speech Dataset. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5971–5975, Brighton, UK. IEEE.
  • Oliver Blacklock. 2004. Characteristics of Variation in Production of Normal and Disordered Fricatives, Using Reduced-Variance Spectral Methods. Ph.D. thesis, University of Southampton.
  • Paul Boersma and David Weenink. 2019. Praat: Doing phonetics by computer [computer program]. version 6.0.45.
  • David M. Eberhard, Gary F. Simons, and Charles D. Fennig, editors. 2020. Ethnologue: Languages of the world, 23rd edition. SIL International. Online version: http://www.ethnologue.com.
  • Arvo Eek and Einar Meister. 1994. Acoustics and perception of Estonian vowel types. Phonetic Experimental Research, XVIII:146–158.
  • Olle Engstrand and Una Cunningham-Andersson. 1988. Iris - a data base for cross-linguistic phonetic research.
  • Edward S. Flemming. 1995. Auditory Representations in Phonology. Ph.D. thesis, UCLA.
  • Edward S. Flemming. 2004. Contrast and perceptual distinctiveness. In Bruce Hayes, R. Kirchner, and Donca Steriade, editors, The Phonetic Bases of Phonological Markedness, 1968, pages 232–276. University Press, Cambridge, MA.
  • Harvey Fletcher. 1923. Physical measurements of audition and their bearing on the theory of hearing. Journal of the Franklin Institute, 196(3):289–326.
  • Karen Forrest, Gary Weismer, Paul Milenkovic, and Ronald N. Dougall. 1988. Statistical analysis of word-initial voiceless obstruents: Preliminary data. The Journal of the Acoustical Society of America, 84(1):115–123.
  • Brian R. Glasberg and Brian C.J. Moore. 1990. Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1-2):103–138.
  • Matthew Gordon and Timo Roettger. 2017. Acoustic correlates of word stress: A cross-linguistic survey. Linguistics Vanguard, 3(1).
  • Kyle Gorman, Lucas F.E. Ashby, Aaron Goyzueta, Arya D. McCarthy, Shijie Wu, and Daniel You. 2020. The SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. In Proceedings of the SIGMORPHON Workshop.
  • Taehong Cho and Peter Ladefoged. 1999. Variation and universals in VOT: Evidence from 18 languages. Journal of Phonetics, 27(2):207–229.
  • Eleanor Chodroff. 2017. Structured Variation in Obstruent Production and Perception. Ph.D. thesis, Johns Hopkins University.
  • Mary Harper. 2011. The IARPA Babel multilingual speech database. Accessed: 2020-05-01.
  • Arthur S. House and Kenneth N. Stevens. 1956. Analog studies of the nasalization of vowels. The Journal of Speech and Hearing Disorders, 21(2):218–232.
  • Eleanor Chodroff, Alessandra Golden, and Colin Wilson. 2019. Covariation of stop voice onset time across languages: Evidence for a universal constraint on phonetic realization. The Journal of the Acoustical Society of America, 145(1):EL109–EL115.
  • Sandra Ferrari Disner. 1983. Vowel Quality: The Relation between Universal and Language-specific Factors. Ph.D. thesis, UCLA.
  • Roman Jakobson. 1968. Child Language, Aphasia and Phonological Universals. Mouton Publishers.
  • Allard Jongman, Ratree Wayland, and Serena Wong. 2000. Acoustic characteristics of English fricatives. The Journal of the Acoustical Society of America, 108(3):1252–1263.
  • Martin Joos. 1948. Acoustic phonetics. Language, 24(2):5–136.
  • Patricia A. Keating. 2003. Phonetic and other influences on voicing contrasts. In Proceedings of the 15th International Congress of Phonetic Sciences, pages 20–23, Barcelona, Spain.
  • Laura Koenig, Christine H. Shadle, Jonathan L. Preston, and Christine R. Mooshammer. 2013. Toward improved spectral measures of /s/: Results from adolescents. Journal of Speech, Language, and Hearing Research, 56(4):1175–1189.
  • John Kominek, Tanja Schultz, and Alan W. Black. 2008. Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion. In Spoken Languages Technologies for Under-Resourced Languages.
  • Peter Ladefoged, Richard Harshman, Louis Goldstein, and Lloyd Rice. 1978. Generating vocal tract shapes from formant frequencies. The Journal of the Acoustical Society of America, 64(4):1027–1035.
  • Peter Ladefoged and Keith Johnson. 2014. A Course in Phonetics. Nelson Education.
  • Peter Ladefoged and Ian Maddieson. 2007. The UCLA phonetics lab archive.
  • Jackson L. Lee, Lucas F.E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, and Kyle Gorman. 2020. Massively multilingual pronunciation mining with WikiPron. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association (ELRA). Resources downloadable from https://github.com/kylebgorman/wikipron.
  • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R Mortensen, Graham Neubig, Alan W. Black, et al. 2020. Universal phone recognition with a multilingual allophone system. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8249–8253. IEEE.
  • Mona Lindau and Patricia Wood. 1977. Acoustic vowel spaces. UCLA Working Papers in Phonetics, 38:41–48.
  • Bjorn Lindblom. 1986. Phonetic universals in vowel systems. In John J. Ohala and Jeri Jaeger, editors, Experimental Phonology, pages 13–44. Academic Press, Orlando.
  • Bjorn Lindblom and Johan Sundberg. 1971. Acoustical consequences of lip, tongue, jaw, and larynx movement. The Journal of the Acoustical Society of America, 50(4B):1166–1179.
  • Peder Livijn. 2000. Acoustic distribution of vowels in differently sized inventories–hot spots or adaptive dispersion. Phonetic Experimental Research, Institute of Linguistics, University of Stockholm (PERILUS), 11.
  • Liang Lu, Arnab Ghoshal, and Steve Renals. 2013. Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 374–379. IEEE.
  • Ian Maddieson. 1995. Gestural economy. In Proceedings of the 13th International Congress of Phonetic Sciences, Stockholm, Sweden.
  • Andre Martinet. 1955. Economie Des Changements Phonetiques: Traite de Phonologie Diachronique, volume 10. Bibliotheca Romanica.
  • Lucie Menard, Jean-Luc Schwartz, and Jerome Aubin. 2008. Invariance and variability in the production of the height feature in French vowels. Speech Communication, 50:14–28.
  • Steven Moran and Daniel McCloy, editors. 2019. PHOIBLE 2.0. Max Planck Institute for the Science of Human History, Jena.
  • David R. Mortensen, Siddharth Dalmia, and Patrick Littell. 2018. Epitran: Precision G2P for many languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, France. European Language Resources Association (ELRA).
  • Terrance M. Nearey. 1977. Phonetic Feature Systems for Vowels. Ph.D. thesis, University of Alberta. Reprinted 1978 by Indiana University Linguistics Club.
  • Josef Robert Novak, Nobuaki Minematsu, and Keikichi Hirose. 2016. Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework. Natural Language Engineering, 22(6):907–938.
  • Livia Oushiro. 2019. Linguistic uniformity in the speech of Brazilian internal migrants in a dialect contact situation. In Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019, pages 686–690, Melbourne, Australia. Canberra, Australia: Australasian Speech Science and Technology Association Inc.
  • Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRWUSB.
  • Kishore Prahallad, Alan W. Black, and Ravishankhar Mosur. 2006. Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 1. IEEE.
  • Ting Qian, Kristy Hollingshead, Su-youn Yoon, Kyoung-young Kim, and Richard Sproat. 2010. A Python toolkit for universal transliteration. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta. European Language Resources Association (ELRA).
  • Karim Rahim and Wesley S. Burr. 2017. multitaper: Multitaper spectral analysis. R package version 1.014.
  • Xiaohui Zhang, Vimal Manohar, Daniel Povey, and Sanjeev Khudanpur. 2017. Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework. arXiv preprint arXiv:1706.03747.
  • Eberhard Zwicker and Ernst Terhardt. 1980. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency. The Journal of the Acoustical Society of America, 68(5):1523–1525.
  • Daniel Recasens and Aina Espinosa. 2009. Dispersion and variability in Catalan five and six peripheral vowel systems. Speech Communication, 51(3):240–258.
  • Tanja Schultz. 2002. GlobalPhone: A multilingual speech and text database developed at Karlsruhe University. In Seventh International Conference on Spoken Language Processing, pages 345–348, Denver, CO.
  • Jean-Luc Schwartz and Lucie Menard. 2019. Structured idiosyncrasies in vowel systems. OSF Preprints.
  • Christine H. Shadle, Wei-rong Chen, and D. H. Whalen. 2016. Stability of the main resonance frequency of fricatives despite changes in the first spectral moment. The Journal of the Acoustical Society of America, 140(4):3219–3220.
  • Kenneth N. Stevens and Samuel J. Keyser. 2010. Quantal theory, enhancement and overlap. Journal of Phonetics, 38(1):10–19.
  • Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing, pages 901–904.
  • Bert Vaux and Bridget Samuels. 2015. Explaining vowel systems: Dispersion theory vs natural selection. Linguistic Review, 32(3):573–599.
  • Dominic J. L. Watt. 2000. Phonetic parallels between the close-mid vowels of Tyneside English: Are they internally or externally motivated? Language Variation and Change, 12(1):69–101.
  • John C. Wells. 1995/2000. Computer-coding the IPA: A proposed extension of SAMPA.
  • D.H. Whalen and Andrea G. Levitt. 1995. The universality of intrinsic F0 of vowels. Journal of Phonetics, 23:349–366.
  • Matthew Wiesner, Oliver Adams, David Yarowsky, Jan Trmal, and Sanjeev Khudanpur. 2019. Zero-shot pronunciation lexicons for cross-language acoustic model transfer. In Proceedings of IEEE Association for Automatic Speech Recognition and Understanding (ASRU).
  • Table 3 and Table 4 respectively show Pearson correlations of mean F1 and mean F2 in ERB between vowels that appear in at least 10 readings. As formalized in the present analysis, phonetic uniformity predicts strong correlations of mean F1 among vowels with a shared height specification, and strong correlations of mean F2 among vowels with a shared backness specification. The respective “Height” and “Backness” columns in Table 3 and Table 4 indicate whether the vowels in each pair match in their respective specifications. p-values are corrected for multiple comparisons using the Benjamini-Hochberg correction and a false discovery rate of 0.25 (Benjamini and Hochberg, 1995). Significance is assessed at α = 0.05 following the correction for multiple comparisons; rows that appear in gray have correlations that are not significant according to this threshold.
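The Benjamini-Hochberg procedure mentioned above can be sketched as follows. This is a generic, minimal Python implementation of BH-adjusted p-values, not the authors' analysis script. The input p-values are hypothetical, and the 0.05 threshold applied to the adjusted values is the α stated above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four vowel-pair correlations.
raw_p = [0.001, 0.012, 0.03, 0.2]
adj_p = benjamini_hochberg(raw_p)

alpha = 0.05  # significance threshold applied after correction
significant = [p <= alpha for p in adj_p]
for raw, adj, sig in zip(raw_p, adj_p, significant):
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}, significant = {sig}")
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, then capped so that adjusted values never decrease as raw p-values increase.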