Modeling the Music Genre Perception across Language Bound Cultures

EMNLP 2020, pp.4765-4779, (2020)

Abstract

The music genre perception expressed through human annotations of artists or albums varies significantly across language-bound cultures. These variations cannot be modeled as mere translations since we also need to account for cultural differences in the music genre perception. In this work, we study the feasibility of obtaining relevant ...

Introduction
  • A prevalent approach to studying music genres across cultures starts with a common set of music items (e.g. artists, albums, tracks) and assumes that the same music genres would be associated with the items in all cultures (Ferwerda and Schedl, 2016; Skowron et al., 2017).
  • Accounting for cultural differences in music genres’ perception could give a more grounded basis for such cultural studies.
  • Ensuring both a common set of music items and culture-sensitive annotations with broad coverage of music genres is strenuous (Bogdanov et al., 2019).
Highlights
  • A prevalent approach to studying music genres across cultures starts with a common set of music items (e.g. artists, albums, tracks) and assumes that the same music genres would be associated with the items in all cultures (Ferwerda and Schedl, 2016; Skowron et al., 2017).
  • This work aims to assess the possibility of obtaining relevant cross-lingual music genre annotations that also capture cultural differences, by relying on language-specific semantic representations.
  • The standard translation baseline, Google Translate (GTrans), yields the lowest results and is outperformed by a knowledge-based translation better adapted to this domain (DBpSameAs).
  • Between the two contextual word embedding models, XLMCtxt significantly outperforms mBERTCtxt, so we report only the former.
  • We have presented an extensive investigation on cross-lingual modeling of music genre annotation, focused on six languages, and two common approaches to semantically represent concepts: ontologies and distributed embeddings.
Methods
  • Cross-lingual music genre annotation, as formalized in Section 3, is a typical multi-label prediction task.
  • The authors use the macro-averaged Area Under the receiver operating characteristic Curve (AUC; Bradley, 1997).
  • The authors report the mean and standard deviation of the macro AUC scores using 3-fold cross-validation.
  • The authors pre-process the music genres by either replacing special characters with space ( -/,) or removing them (()’:.!$).
  • Embeddings are computed from the pre-processed tags.
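
The pre-processing and evaluation steps listed above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names `preprocess_tag`, `auc` and `macro_auc` are hypothetical, collapsing repeated spaces is an assumption, and the AUC is computed with the pairwise-ranking definition (the fraction of positive/negative pairs that are ordered correctly, ties counting one half).

```python
from typing import List, Sequence

# Character sets taken from the Methods summary above.
REPLACE_WITH_SPACE = "-/,"
REMOVE = "()’:.!$"


def preprocess_tag(tag: str) -> str:
    """Replace or drop special characters in a music genre tag."""
    chars = []
    for ch in tag:
        if ch in REPLACE_WITH_SPACE:
            chars.append(" ")
        elif ch not in REMOVE:
            chars.append(ch)
    # Assumption: runs of whitespace are collapsed into a single space.
    return " ".join("".join(chars).split())


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one label: probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Unweighted mean of the per-label AUC scores (rows = items, columns = labels)."""
    n_labels = len(y_true[0])
    per_label = [
        auc([row[k] for row in y_true], [row[k] for row in y_score])
        for k in range(n_labels)
    ]
    return sum(per_label) / n_labels
```

For example, `preprocess_tag("Hip-Hop (US)")` returns `"Hip Hop US"`. In a 3-fold cross-validation setting, `macro_auc` would be evaluated once per held-out fold and the mean and standard deviation of the three scores reported.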
Results
  • The standard translation baseline, GTrans, yields the lowest results and is outperformed by a knowledge-based translation better adapted to this domain (DBpSameAs).
  • These results show that translation methods fail to capture the dissimilar cross-cultural music genre perception.
  • Results for the language pairs ja-nl, ja-fr, ja-es, ja-cs and ja-en are reported with the best music genre embeddings computed with each pre-trained word/sentence model or method.
  • Between the two contextual word embedding models, XLMCtxt significantly outperforms mBERTCtxt, so the authors report only the former.
Conclusion
  • The results show that using translation to produce cross-lingual annotations is limited as it does not consider the culturally divergent perception of music genres.
  • As shown in Table 4, joining semantic representations in this way proves very suitable to learn music genre vectors from scratch. The authors have presented an extensive investigation on cross-lingual modeling of music genre annotation, focused on six languages, and two common approaches to semantically represent concepts: ontologies and distributed embeddings.
  • The models to generate cross-lingual annotations should be thoroughly evaluated in downstream music retrieval and recommendation tasks.
Tables
  • Table 1: Number of music items per language pair
  • Table 2: Number of unique music genres in the corpus (Section 3.2) and in the ontology (Section 4.1)
  • Table 3: Macro-AUC scores (in %, best overall in bold, best locally underlined). The first part corresponds to the translation baselines; the second to the best distributed representations; the last to the retrofitted FTsif vectors
  • Table 4: Macro-AUC scores (in %; those larger than RfituΩFTsif in Table 3 in bold) with vectors learned by retrofitting to aligned monolingual ontologies
  • Table 5: Macro-AUC scores (in %, best locally underlined). The first two parts correspond to averaging or applying sif averaging to static multilingual word embeddings; the third part corresponds to the contextual sentence embeddings
  • Table 6: Macro-AUC scores (in %) with vectors learned by leveraging aligned monolingual ontologies. The first column shows the results obtained by relying on the aligned ontologies only. The second and third columns show the results obtained by retrofitting FTsif embeddings to monolingual ontologies and to aligned monolingual ontologies, respectively
  • Table 7: Embedding dimensions for pre-trained models used in our study, corresponding to the values provided and optimized by each model’s authors
Related work
  • Music genres are conceptual representations encompassing a set of conventions between the music industry, artists, and listeners about individual music styles (Lena, 2012). From a cultural perspective, it has been shown that there are differences in how people listen to music genres (Ferwerda and Schedl, 2016; Skowron et al., 2017). Average listening habits in some countries span across many music genres and are less diverse in other countries (Ferwerda and Schedl, 2016). Also, cultural dimensions proved strong predictors for the popularity of specific music genres (Skowron et al., 2017).

    Despite the apparent agreement on the music style for which the music genres stand, conveyed in the earlier definition and implied in the related works too, music genres are subjective concepts (Sordo et al., 2008; Lee et al., 2013). To address this subjectivity, Bogdanov et al. (2019) proposed a dataset of music items annotated with English music genres by different sources. In this line of work, we address the divergent perception of music genres. Still, we focus on multilingual, unsupervised music genre annotation without relying on content features, i.e. audio or lyrics. We also complement similar studies in other domains (art: Eleta and Golbeck, 2012) with another research method.
References
  • Javier Franco Aixela. 1996. Culture-specific items in translation. Translation, power, subversion, 8:52–78.
  • Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations.
  • Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7:597–610.
  • Soren Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735, Berlin, Heidelberg. Springer.
  • Parsiad Azimzadeh and Peter A. Forsyth. 2016. Weakly chained matrices, policy iteration, and impulse control. SIAM Journal on Numerical Analysis, 54(3):1341–1364.
  • Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux. 2006. Label Propagation and Quadratic Criterion, semi-supervised learning edition, pages 193–216. MIT Press.
  • Dmitry Bogdanov, Alastair Porter, Hendrik Schreiber, Julian Urbano, and Sergio Oramas. 2019. The acousticbrainz genre dataset: Multi-source, multi-level, multi-label, and large-scale. In Proceedings of the 20th Conference of the International Society for Music Information Retrieval (ISMIR 2019), pages 360–367, Delft, The Netherlands. International Society for Music Information Retrieval (ISMIR).
  • Andrew P. Bradley. 1997. The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159.
  • Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.
  • Jon Dattorro. 2005. Convex Optimization & Euclidean Distance Geometry. Meboo Publishing.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kadras, Sylvain Gugger, and Jeremy Howard. 2019. MultiFiT: Efficient multi-lingual language model fine-tuning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5702–5707, Hong Kong, China. Association for Computational Linguistics.
  • Irene Eleta and Jennifer Golbeck. 2012. A study of multilingual social tagging of art images: Cultural bridges and diversity. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW ’12, pages 695–704, New York, NY, USA. Association for Computing Machinery.
  • Elena V. Epure, Anis Khlif, and Romain Hennequin. 2019. Leveraging knowledge bases and parallel annotations for music genre translation. In Conference of the International Society of Music Information Retrieval, ISMIR 2019.
  • Elena V. Epure, Guillaume Salha, and Romain Hennequin. 2020. Multilingual music genre embeddings for effective cross-lingual music item annotation. In Conference of the International Society of Music Information Retrieval, ISMIR 2020.
  • Lanting Fang, Yong Luo, Kaiyu Feng, Kaiqi Zhao, and Aiqun Hu. 2019. Knowledge-enhanced ensemble learning for word embeddings. In The World Wide Web Conference, WWW ’19, page 427–437, New York, NY, USA. Association for Computing Machinery.
  • Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy, and Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1606–1615, Denver, Colorado. Association for Computational Linguistics.
  • Bruce Ferwerda and Markus Schedl. 2016. Investigating the relationship between diversity in music consumption behavior and cultural dimensions: A cross-country analysis. In 24th Conference on User Modeling, Adaptation, and Personalization (UMAP) Extended Proceedings: 1st Workshop on Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems (SOAP), Halifax, NS, Canada.
  • Gene H Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. In Handbook for Automatic Computation: Volume II: Linear Algebra, pages 134–151. Springer Berlin Heidelberg, Berlin, Heidelberg.
  • Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan. European Languages Resources Association (ELRA).
  • Viktor Hangya, Fabienne Braune, Alexander Fraser, and Hinrich Schutze. 2018. Two methods for domain adaptation of bilingual tasks: Delightfully simple and broadly applicable. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 810–820, Melbourne, Australia. Association for Computational Linguistics.
  • Dmetri Hayes. 2019. What just happened? Evaluating retrofitted distributional word vectors. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1062–1072, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Niels van der Heijden, Samira Abnar, and Ekaterina Shutova. 2019. A comparison of architectures and pretraining methods for contextualized multilingual word embeddings. Computing Research Repository, arXiv:1912.10169.
  • Romain Hennequin, Jimena Royo-Letelier, and Manuel Moussallam. 2018. Audio based disambiguation of music genre tags. In Conference of the International Society of Music Information Retrieval, ISMIR 2018.
  • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2979–2984, Brussels, Belgium. Association for Computational Linguistics.
  • Douwe Kiela, Felix Hill, and Stephen Clark. 2015. Specializing word embeddings for similarity or relatedness. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2044–2048, Lisbon, Portugal. Association for Computational Linguistics.
  • Joo-Kyung Kim, Marie-Catherine de Marneffe, and Eric Fosler-Lussier. 2016. Adjusting word embeddings with semantic intensity orders. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 62–69, Berlin, Germany. Association for Computational Linguistics.
  • Claire Kramsch and HG Widdowson. 1998. Language and culture. Oxford University Press.
  • Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
  • Jin Ha Lee, Kahyun Choi, Xiao Hu, and J. Stephen Downie. 2013. K-pop genres: A cross-cultural exploration. In Conference of the International Society on Music Information Retrieval, ISMIR 2013.
  • Jennifer C Lena. 2012. Banding together: How communities create genres in popular music. Princeton University Press.
  • Ben Lengerich, Andrew Maas, and Christopher Potts. 2018. Retrofitting distributional embeddings to knowledge graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2423–2436, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
  • Pasquale Lisena, Konstantin Todorov, Cecile Cecconi, Francoise Leresche, Isabelle Canno, Frederic Puyrenier, Martine Voisin, Thierry Le Meur, and Raphael Troncy. 2018. Controlled vocabularies for music metadata. In Conference of the International Society on Music Information Retrieval, Paris, France.
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, page 3111–3119, Red Hook, NY, USA. Curran Associates Inc.
  • George A. Miller. 1995. Wordnet: A lexical database for english. Commun. ACM, 38(11):39–41.
  • Peter Newmark. 1988. A textbook of translation, volume 66. Prentice hall New York.
  • Matthew Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. Semi-supervised sequence tagging with bidirectional language models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1765, Vancouver, Canada. Association for Computational Linguistics.
  • Ulrike Pfeil, Panayiotis Zaphiris, and Chee Siang Ang. 2006. Cultural Differences in Collaborative Authoring of Wikipedia. Journal of Computer-Mediated Communication, 12(1):88–113.
  • Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual bert? Computing Research Repository, arXiv:1906.01502.
  • Edoardo Maria Ponti, Ivan Vulic, Goran Glavas, Roi Reichart, and Anna Korhonen. 2019. Cross-lingual semantic specialization via lexical relation induction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2206–2217, Hong Kong, China. Association for Computational Linguistics.
  • Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
  • Sebastian Ruder, Ivan Vulic, and Anders Søgaard. 2019. A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, 65(1):569–630.
  • Yousef Saad. 2003. Iterative methods for sparse linear systems, 2nd edition. Society for Industrial and Applied Mathematics, USA.
  • Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, and Mohammad Al Hasan. 2016. Dis-s2v: Discourse informed sen2vec. Computing Research Repository, arXiv:1610.08078.
  • Hendrik Schreiber. 2016. Genre ontology learning: Comparing curated with crowd-sourced ontologies. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 400–406, New York, USA.
  • Francisco Vargas, Kamen Brestnichki, Alex Papadopoulos Korfiatis, and Nils Hammerla. 2019. Multilingual factor analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1738–1750, Florence, Italy. Association for Computational Linguistics.
  • John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2016. Towards universal paraphrastic sentence embeddings. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
  • Shijie Wu and Mark Dredze. 2019. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833–844, Hong Kong, China. Association for Computational Linguistics.
  • Konstantinos Sechidis, Grigorios Tsoumakas, and Ioannis Vlahavas. 2011. On the stratification of multilabel data. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III, ECML PKDD’11, page 145–158, Berlin, Heidelberg. Springer-Verlag.
  • George Kingsley Zipf. 1949. Human behavior and the principle of least effort. Cambridge, Addison-Wesley Press.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
  • Vered Shwartz and Ido Dagan. 2019a. Still a pain in the neck: Evaluating text representations on lexical composition. Transactions of the Association for Computational Linguistics, 7:403–419.
  • Vered Shwartz and Ido Dagan. 2019b. Still a pain in the neck: Evaluating text representations on lexical composition. Transactions of the Association for Computational Linguistics, 7:403–419.
  • Marcin Skowron, Florian Lemmerich, Bruce Ferwerda, and Markus Schedl. 2017. Predicting genre preferences from cultural and socio-economic factors for music retrieval. In Advances in Information Retrieval, pages 561–567, Cham. Springer International Publishing.
  • Mohamed Sordo, Oscar Celma, Martin Blech, and Enric Guaus. 2008. The Quest for Musical Genres: Do the Experts and the Wisdom of Crowds Agree? In Conference of the International Society on Music Information Retrieval.
  • Robyn Speer and Joshua Chin. 2016. An ensemble method to produce high-quality word embeddings. Computing Research Repository, arXiv:1604.01692.
Appendix (excerpt)
  • ... G includes at least one node from V, we conclude that $A + B$ is a weakly chained diagonally dominant matrix (Azimzadeh and Forsyth, 2016), i.e. that: ...
  • Such matrices are nonsingular (Azimzadeh and Forsyth, 2016), which implies that $Q \mapsto Q^\top (A + B) Q$ is a positive-definite quadratic form. As $A + B$ is a symmetric positive-definite matrix, there exists a matrix $M$ such that $A + B = M^\top M$. Therefore, denoting by $\|\cdot\|_F^2$ the squared Frobenius matrix norm: $\mathrm{Tr}(Q^\top (A + B) Q) = \mathrm{Tr}(Q^\top M^\top M Q) = \|MQ\|_F^2$, which is strictly convex w.r.t. $Q$ due to the strict convexity of the squared Frobenius norm (see e.g. 3.1 in Dattorro (2005)). Since the sum of strictly convex functions of $Q$ (first trace in $\Phi(Q)$) and linear functions of $Q$ (second trace in $\Phi(Q)$) is still strictly convex w.r.t. $Q$, we conclude that the objective function $\Phi$ is strictly convex w.r.t. $Q$.
  • The aforementioned updating procedure for $Q$ (Faruqui et al., 2015) is derived from the Jacobi iteration procedure (Saad, 2003; Bengio et al., 2006) and converges for any initialization. Such a convergence result is discussed in Bengio et al. (2006). It can also be directly verified in our specific setting by checking that each irreducible element of $A + B$, i.e. each connected component of the underlying graph constructed from this matrix, is irreducibly diagonally dominant (see 4.2.3 in Saad (2003)) and then by applying Theorem 4.9 from Saad (2003) on each of these components. Besides, due to its strict convexity w.r.t. $Q$, the objective function $\Phi$ admits a unique global minimum. Consequently, the retrofitting update procedure will converge to the same embedding matrix regardless of the order in which nodes are updated.
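
The order-independence claim can be illustrated with a small sketch of a retrofitting-style update in the spirit of Faruqui et al. (2015). Everything specific here is an assumption made for illustration: the function name `retrofit`, the toy three-node graph, and the weights (alpha = 1 for nodes with a pre-trained vector, 0 otherwise; beta = 1 for every edge). It is not the authors' code, and it updates nodes in place, sweep after sweep.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def retrofit(
    pretrained: Dict[str, Optional[Vector]],
    neighbors: Dict[str, List[str]],
    order: Sequence[str],
    dim: int,
    n_sweeps: int = 100,
) -> Dict[str, Vector]:
    """Pull each node vector towards its neighbours and, when one exists,
    towards its pre-trained vector.

    Assumed weights (not from the source): alpha_i = 1 if node i has a
    pre-trained vector, else 0; beta_ij = 1 for every edge (i, j).
    """
    # Nodes without a pre-trained vector start at the origin.
    q = {n: list(v) if v is not None else [0.0] * dim for n, v in pretrained.items()}
    for _ in range(n_sweeps):
        for i in order:
            anchor = pretrained[i]
            alpha = 1.0 if anchor is not None else 0.0
            nbrs = neighbors[i]
            denom = alpha + float(len(nbrs))
            if denom == 0.0:
                continue  # isolated node without a pre-trained vector
            for k in range(dim):
                pull = anchor[k] if anchor is not None else 0.0
                # Weighted average of the anchor and the current neighbour vectors.
                q[i][k] = (alpha * pull + sum(q[j][k] for j in nbrs)) / denom
    return q


# Toy path graph a - b - c; only a and c have pre-trained vectors.
PRETRAINED = {"a": [1.0, 0.0], "b": None, "c": [0.0, 1.0]}
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

forward = retrofit(PRETRAINED, NEIGHBORS, order=["a", "b", "c"], dim=2)
backward = retrofit(PRETRAINED, NEIGHBORS, order=["c", "b", "a"], dim=2)
```

On this toy graph both update orders end at numerically the same vectors, with the node `b` (which has no pre-trained vector) settling at approximately `[0.5, 0.5]`, halfway between its two neighbours. The connected component contains nodes with a pre-trained vector, which matches the condition the excerpt relies on.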