A Neural-based Architecture For Small Datasets Classification

JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 2020.

Abstract
Digital Libraries benefit from the use of text classification strategies, since they are enablers for many document management tasks like Information Retrieval. The effectiveness of such classification strategies depends on the amount of available data and on the classifier used. The former leads to the design of data augmentation solutions, where new samples are generated into small datasets based on the semantic similarity between existing samples and concepts defined within external linguistic resources. The latter relates to the capability of finding the best learning principle to adopt for designing an effective classification strategy suitable for the problem. In this work, we propose a neural-based architecture designed to address the text classification problem on small datasets. Our architecture is based on BERT equipped with one further layer using the sigmoid function. The hypothesis we want to verify is that, by using embeddings learned by a BERT-based architecture, one can perform effective classification on small datasets without the use of data augmentation strategies. We observed improvements of up to 14% in accuracy and up to 23% in F-score with respect to baseline classifiers exploiting data augmentation.
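A minimal sketch of the architecture described in the abstract: a BERT encoder with one additional sigmoid-activated output layer. The checkpoint name (bert-base-uncased), the use of the pooled [CLS] representation, and the output dimensionality are assumptions for illustration; the abstract does not specify them.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class BertSigmoidClassifier(nn.Module):
    """BERT plus a single sigmoid-activated classification layer.

    Pooling choice and layer sizes are assumptions, not taken
    from the paper.
    """

    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # One further layer on top of the BERT embeddings.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Pooled [CLS] representation (an assumption; the paper may
        # pool the token embeddings differently).
        pooled = outputs.pooler_output
        # Sigmoid output, as stated in the abstract.
        return torch.sigmoid(self.classifier(pooled))


# Usage example on a toy batch.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertSigmoidClassifier(num_labels=2)
batch = tokenizer(["a short document"], return_tensors="pt",
                  padding=True, truncation=True)
probs = model(batch["input_ids"], batch["attention_mask"])
print(probs.shape)  # (1, 2), each entry an independent probability
```

With a sigmoid head, each label probability is computed independently, which suits binary and multi-label settings alike; a softmax head would be the usual alternative for mutually exclusive classes.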