IMPACT OF LEGAL STATUS OF DATA ON DEVELOPMENT OF DATA-INTENSIVE PRODUCTS: EXAMPLE OF LANGUAGE TECHNOLOGIES

Aleksei Kelli, Arvi Tavast, Krister Linden, Ramunas Birstonas, Penny Labropoulou, Kadri Vider, Irene Kull, Gaabriel Tavits, Age Varv, Vadim Mantrov

LEGAL SCIENCE: FUNCTIONS, SIGNIFICANCE AND FUTURE IN LEGAL SYSTEMS II (2020)

Abstract

The purpose of this article is to explain the extent to which the legal regime applicable to language data affects the development and use of language technology (LT). The main focus of the paper is on EU law. The article also maps possible text and data mining (TDM) issues. The authors focus on TDM for research purposes as outlined in the Digital Copyright Directive 2019/790. The authors follow a process approach to LT development, which starts from raw data collection and leads to LT products such as a refrigerator with a speech interface. Particular attention is given to language models. The raw data used in LT often include copyright-protected works, objects of related rights (e.g., performances) and personal data in the form of a person's voice or other information stored in non-annotated and annotated databases. The authors' main argument is that the legal regime of language data does not usually affect the use of language models, since copyrighted works are not likely to remain in the models. In the process of developing a language technology application, language models are the first intermediate result that can be free from the legal restrictions affecting language data. However, the use of a person's voice as identifiable personal data in a language model can create legal challenges. In some cases, developers of language technology must be careful in how they address the processing of personal data contained in models.

Keywords

data, data-intensive product, data protection, algorithm, language technologies
