PGN-LM Model and Forcing-Seq2Seq Model: Multiple automatic models of title generation for natural text using Deep Learning

REV Journal on Electronics and Communications (2022)

Abstract
In the current era, the amount of information on the Internet in general, and in the electronic press in particular, has grown rapidly and carries extremely useful value in all aspects of life. Many users publish high-quality writing as casual blogs, notes, or reviews, and some of these pieces are even selected by editors for publication in professional venues. However, the original posts often come without titles, which must be added manually by editing teams. With the recent advancement of AI techniques, especially deep learning, this task can now be automated. Although title generation can be considered a specific case of text summarization, it poses some distinct requirements: a title is generally short, yet it must capture the major content while preserving the writing style of the original document. To satisfy these constraints, we introduce the PGN-LM Model, an architecture evolved from the Pointer Generator Network that addresses the Out-of-Vocabulary problem which traditional Seq2Seq models cannot handle, combined with language modeling techniques. We also introduce the Forcing-Seq2Seq Model, an enhanced Seq2Seq architecture in which classical TF-IDF scores are combined with a Named Entity Recognition method to identify the major keywords of the original text. To enforce the appearance of those keywords in the generated titles, a specific Teacher Forcing mechanism is employed together with the language model technique. We have tested our approaches on real datasets and obtained promising initial results on both machine and human evaluation metrics.
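The keyword-identification step described above (TF-IDF scores combined with Named Entity Recognition output) can be sketched as follows. This is an illustrative sketch only: the paper does not specify its exact weighting, so the `entity_boost` multiplier and the `named_entities` set (standing in for a real NER tagger's output) are assumptions.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, named_entities=(), entity_boost=2.0, top_k=3):
    """Rank candidate title keywords for one document by TF-IDF,
    boosting tokens that an (assumed) NER step tagged as named entities."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    doc_len = len(tokenized[doc_index])
    scores = {}
    for tok, count in tf.items():
        idf = math.log(n_docs / df[tok])
        score = (count / doc_len) * idf
        if tok in named_entities:  # hypothetical NER output
            score *= entity_boost
        scores[tok] = score
    # Highest-scoring tokens become keyword candidates for the title.
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]
```

Corpus-wide words (e.g. stop words) receive an IDF of zero and are filtered out naturally, while entity tokens are pushed toward the top of the candidate list.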