An Improved DE Algorithm to Optimise the Learning Process of a BERT-based Plagiarism Detection Model

Seyed Vahid Moravvej,Seyed Jalaleddin Mousavirad,Diego Oliva,Gerald Schaefer,Zahra Sobhaninia

2022 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC)（2022）

引用 9|浏览5

暂无评分

摘要

Plagiarism detection is a challenging task, aiming to identify similar items in two documents. In this paper, we present a novel approach to automatic plagiarism detection that combines BERT (bidirectional encoder representations from transformers) word embedding, attention mechanism-based long short-term memory (LSTM) networks, and an improved differential evolution (DE) algorithm for weight initialisation. BERT is used to pretrain deep bidirectional representations in all layers, while the pre-trained BERT model can be fine-tuned with only one extra output layer without significant changes in architecture. Deep learning algorithms often use the random weighting method for initialisation, followed by gradient-based optimisation algorithms such as back-propagation for training, making them susceptible to getting trapped in local optima. To address this, population- based metaheuristic algorithms such as DE can be used. We propose an improved DE algorithm with a clustering-based mutation operator, where first a winning cluster of candidate solutions is identified and a new updating strategy is then applied to include new candidate solutions in the current population. The proposed DE algorithm is used in LSTM, attention mechanism, and feed- forward neural networks to yield the initial seeds for subsequent gradient-based optimisation. We compare our proposed model with conventional and population-based approaches on three datasets (SNLI, MSRP and SemEval2014) and demonstrate it to give superior plagiarism detection performance.

查看译文

关键词

Plagiarism detection, BERT, LSTM, attention mechanism, differential evolution

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要