Meeting the challenge: A benchmark corpus for automated Urdu meeting summarization

Bareera Sadia, Farah Adeeba, Sana Shams, Kashif Javed

Information Processing & Management (2024)

Abstract

Meeting summarization has become crucial as the world gradually shifts towards remote work, and automating summary generation is needed to minimize the time and effort involved. The surge in online meetings has made summarization indispensable, yet summarizing Urdu meetings remains a formidable challenge because pertinent corpora are scarce, and abstractive summarization compounds the difficulty. This research addresses these gaps by introducing the Center for Language Engineering (CLE) Meeting Corpus, a benchmark resource tailored for meeting summarization in administrative and technical domains where Urdu is the primary language. The corpus comprises 240 recorded meetings, covering both scenario-based and natural discussions, and spans approximately 7900 min (∼132 h) of meeting time. Beyond corpus creation, the study analyzes the performance of several deep learning models on Urdu abstractive meeting summarization. The models, ur_mT5-small, ur_mT5-base, ur_mBART-large, ur_RoBERTa-urduhack-small, and GPT-3.5 with prompting, are evaluated using both automated metrics and manual assessment based on five specific criteria. This research not only addresses the immediate challenges of Urdu meeting summarization but also helps advance meeting summarization systems in organizational contexts where meetings are conducted in Urdu.

Keywords

Natural language processing, Abstractive text summarization, Deep learning, Meeting corpus, Urdu, Fine-tuning
