Evaluation of BERT and XLNet Models on Irony Detection in English Tweets


Abstract

Automatically detecting irony is helpful and important for mining fine-grained information from social web data. Therefore, the International Workshop on Semantic Evaluation (SemEval) presented the first shared task on irony detection, called "Irony Detection in English Tweets", in 2018. For this task, the system should determine whether a given tweet is ironic (Task A) and which of three types of irony it contains (Task B).

Introduction
  • The development of the social web has stimulated the use of figurative and creative language, including irony, in public [1].
  • According to literary scholars [2], irony is defined as a trope in which the speaker intends to express a contradictory situation or the opposite meaning of what is literally said.
  • It adopts a subtle technique in which incongruity is used to suggest a distinction between reality and expectation in order to produce a humorous or emphatic effect on the listener.
  • Automatic irony detection techniques are important for improving the performance of sentiment analysis.
Highlights
  • The development of the social web has stimulated the use of figurative and creative language, including irony, in public [1]
  • In this paper, we evaluate the performance of BERT and XLNet models on the shared task “Irony Detection in English Tweets” dataset in two ways: a word embedding method and a fine-tuning method
  • These two models achieve relatively high scores, showing that BERT and XLNet are capable of understanding irony to some extent
  • The base XLNet model achieved the best score on both tasks, showing that XLNet does outperform BERT in understanding human language, but the large XLNet models seem to be unstable during the training process
  • The fine-tuning method outperforms the word embedding method on both tasks
Methods
  • Methods to Adopt

    BERT/XLNet

    The authors' goal is to evaluate the performance of BERT and XLNet models in this task.
  • One is called the word embedding method: using a pre-trained model to transform text into embeddings and classifying them with traditional machine learning techniques such as support vector machines (SVM) and logistic regression (LR).
  • The other is to fine-tune the pre-trained model directly on the dataset of the irony detection task in SemEval-2018.
  • The authors introduce these two methods in detail below; a sketch of the word embedding method follows this list.
  • The fine-tuned model can then predict the label of a given tweet.
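As a concrete illustration of the word embedding method, the following is a minimal sketch assuming the HuggingFace transformers and scikit-learn libraries; the paper does not name its tooling, and the model name, the use of the [CLS] vector, and the toy data are illustrative assumptions rather than the authors' exact setup.

    # Word embedding method (sketch): encode tweets with a frozen pre-trained
    # model, then classify the fixed-size embeddings with SVM or LR.
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.svm import SVC

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    encoder = AutoModel.from_pretrained("bert-base-cased")
    encoder.eval()  # the encoder stays frozen in this method

    def embed(tweets):
        """Return one vector per tweet: the [CLS] hidden state."""
        batch = tokenizer(tweets, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)
        return hidden[:, 0, :].numpy()                   # [CLS] position

    # Hypothetical toy data; the real input is the SemEval-2018 tweet set.
    train_texts = ["Great, another Monday...", "I love sunny days"]
    train_labels = [1, 0]  # Task A: 1 = ironic, 0 = not ironic

    clf = SVC(kernel="linear")  # LogisticRegression() works the same way
    clf.fit(embed(train_texts), train_labels)
    print(clf.predict(embed(["What a fantastic traffic jam"])))

Because the encoder is never updated here, only the lightweight classifier is trained, which is what distinguishes this method from fine-tuning.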
Results
  • The authors describe the experiments and report the results, beginning with data preprocessing.

    Before the experiments, the authors preprocess the tweets, since tweets are unstructured data and contain many tokens that a machine cannot recognize (a sketch of such preprocessing follows this list).
  • The authors discard all links in the tweets, since the links are almost all YouTube short URLs; these URLs do not contain information useful for irony detection.
  • For the word embedding method, the authors tuned the hyperparameters of the traditional machine learning models for better results.
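The preprocessing described above and in the conclusion (dropping links, translating abbreviations, collapsing elongated words such as “looooooong”) could look like the following minimal sketch; the regular expressions and the abbreviation table are illustrative assumptions, since the paper does not list its exact rules.

    import re

    # Hypothetical abbreviation table; the paper does not enumerate one.
    ABBREVIATIONS = {"u": "you", "idk": "i do not know"}

    def preprocess(tweet: str) -> str:
        # Drop links (the paper notes they are almost all YouTube short URLs).
        tweet = re.sub(r"https?://\S+", "", tweet)
        # Collapse characters repeated 3+ times: "looooooong" -> "loong".
        tweet = re.sub(r"(.)\1{2,}", r"\1\1", tweet)
        # Expand known abbreviations token by token.
        tokens = [ABBREVIATIONS.get(t.lower(), t) for t in tweet.split()]
        return " ".join(tokens)

    print(preprocess("idk why Mondays are sooooooo great https://youtu.be/x"))
    # -> "i do not know why Mondays are soo great"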
Conclusion
  • The authors evaluate the performance of BERT and XLNet models using the word embedding method and the fine-tuning method on the shared task “Irony Detection in English Tweets”.
  • These two models achieve relatively high scores, showing that BERT and XLNet are capable of understanding irony to some extent.
  • Some tweets need more preprocessing, such as translating abbreviations or normalizing words with repeated characters like “looooooong”.
Tables
  • Table 1: The detailed statistics of the dataset [10]
  • Table 2: The top 5 teams for Task A, ranked by the official F1 score [10]
  • Table 3: The top 5 teams for Task B [10]
  • Table 4: Word Embedding Results for Task A (F1 score)
  • Table 5: Fine-Tuning Results for Task
  • Table 6: Word Embedding Results for Task B (F1 score)
Funding
  • BERT is pretrained on unlabeled text (Wikipedia) by jointly conditioning on both left and right context in all layers and has obtained state-of-the-art results on eleven natural language processing tasks, including raising the GLUE score [17] to 80.5%, MultiNLI accuracy to 86.7%, the SQuAD v1.1 [18] question answering F1 score to 93.2, and the SQuAD v2.0 F1 score to 83.1
Study subjects and analysis
ironic types of tweets: 3
Subtask A aims to determine whether a tweet is ironic. Subtask B aims to identify three ironic types of tweets: verbal irony by means of a polarity contrast, situational irony, or another type of verbal irony. Definitions and examples are as follows [10]: for verbal irony by means of a polarity contrast, the polarity (positive, negative) is inverted between the literal and the intended evaluation.

teams: 5
2.4. Systems and results for Task A. The results of the top 5 teams in Task A [10] are shown in Table 2.

teams: 5
As shown in Table 2, the systems of the top five teams outperform the unigram SVM baseline by a sizable margin. All of the best five teams used only the provided training data, meaning that they did not use other similar datasets for training.

teams: 5
The UCDCC team (F1 = 0.724) developed a siamese architecture consisting of two subnetworks (each containing an LSTM) that makes use of GloVe word embeddings [11]; a sketch of this kind of architecture follows.
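The UCDCC system itself is described in [11]; as a rough illustration of a siamese architecture of that general shape, the following is a minimal PyTorch sketch with weight-sharing LSTM subnetworks over GloVe vectors. The layer sizes, the absolute-difference head, and the random stand-in for the GloVe matrix are assumptions, not the UCDCC team's actual configuration.

    import torch
    import torch.nn as nn

    class SiameseIronyNet(nn.Module):
        """Two weight-sharing LSTM subnetworks compared by an
        absolute-difference head (dimensions are illustrative)."""
        def __init__(self, glove_weights, hidden=128):
            super().__init__()
            # Frozen embedding layer initialized from GloVe vectors.
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
            self.lstm = nn.LSTM(glove_weights.size(1), hidden, batch_first=True)
            self.classify = nn.Linear(hidden, 2)  # ironic / not ironic

        def encode(self, token_ids):
            _, (h_n, _) = self.lstm(self.embed(token_ids))
            return h_n[-1]  # final hidden state per sequence

        def forward(self, left_ids, right_ids):
            # The same shared-weight encoder processes both inputs.
            diff = torch.abs(self.encode(left_ids) - self.encode(right_ids))
            return self.classify(diff)

    # Toy usage with a random stand-in for a GloVe matrix (vocab 1000, dim 100).
    net = SiameseIronyNet(torch.randn(1000, 100))
    logits = net(torch.randint(0, 1000, (4, 20)), torch.randint(0, 1000, (4, 20)))
    print(logits.shape)  # torch.Size([4, 2])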

teams: 5
2.5. Systems and results for Task B. The results of the top 5 teams in Task B [10] are shown in Table 3, in which the teams are ranked by the official F1 score.

teams: 5
The best F1 score of 0.704 is achieved by the xlnet-base-cased model trained for four epochs with a batch size of 16. Compared with the top 5 teams' results, 0.704 would rank 3rd and is better than the best score of 0.693 obtained with the word embedding method (a sketch of this fine-tuning setup follows).
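A minimal sketch of a fine-tuning setup with the hyperparameters cited above (four epochs, batch size 16), assuming the HuggingFace transformers Trainer API; the paper does not name its training code, and the dataset wrapper below is a hypothetical stand-in for the SemEval-2018 data.

    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlnet-base-cased", num_labels=2)  # Task A: ironic vs. not ironic

    class TweetDataset(Dataset):
        """Hypothetical wrapper; the real data is the SemEval-2018 set."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, padding=True, truncation=True)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    train_ds = TweetDataset(["Great, another Monday...", "I love sunny days"],
                            [1, 0])  # toy examples only

    args = TrainingArguments(output_dir="out",
                             num_train_epochs=4,              # E = 4, as reported
                             per_device_train_batch_size=16)  # bs = 16
    Trainer(model=model, args=args, train_dataset=train_ds).train()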

teams: 5
We should also take a closer look at the performance on each category of irony in Task B. The top five teams' performance for each class is shown in Table 8 [10].

teams: 5
One more thing is that neither BERT nor XLNet can recognize the other type of verbal irony, while the traditional machine learning models can recognize it, even though the score is very low. Compared with the scores of the top five teams, the best XLNet model performs better on situational irony, while in the other categories the performance is not satisfying (see Table 9: F1 scores of the best three systems for each class in Task B).


Reference
  • A. Ghosh, G. Li, T. Veale, P. Rosso, E. Shutova, J. Barnden, and A. Reyes, “Semeval-2015 task 11: sentiment analysis of figurative language in twitter,” Proc. of the 9th Int’l Workshop on Semantic Evaluation, pp.470-478, June 2015.
  • G. Lakoff, “The contemporary theory of metaphor,” Cambridge University Press, 1993.
  • A. Ghosh, and T. Veale, “Fracking sarcasm using neural network,” Proc. of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp.161-169, June 2016.
  • C. Van Hee, E. Lefever, and V. Hoste, “Monday mornings are my fave :) #not: exploring the automatic recognition of irony in english tweets,” Proc. of the 26th Int’l Conf. on Computational Linguistics, pp.2730-2739, Dec. 2016.
  • S. Rosenthal, N. Farra, and P. Nakov, “SemEval-2017 task 4: sentiment analysis in twitter,” arXiv preprint arXiv:1912.00741, 2019.
  • A. Joshi, P. Bhattacharyya, and M.J. Carman, “Automatic sarcasm detection: a survey,” ACM Computing Surveys (CSUR), 50(5), 73, 2017.
  • A. Khattri, A. Joshi, P. Bhattacharyya, and M. Carman, “Your sentiment precedes you: using an author’s historical tweets to predict sarcasm,” Proc. of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp.25-30, Sept. 2015.
  • D.G. Maynard, and M.A. Greenwood, “Who cares about sarcastic tweets? investigating the impact of sarcasm on sentiment analysis,” Proc. of the 9th Int’l Conf. on Language Resources and Evaluation (LREC'14), pp.4238-4243, Mar. 2014.
  • N. Desai, and A.D. Dave, “Sarcasm detection in hindi sentences using support vector machine,” International Journal, 4(7), 8-15, 2016.
  • C. Van Hee, E. Lefever, and V. Hoste, “Semeval-2018 task 3: irony detection in english tweets,” Proc. of the 12th Int’l Workshop on Semantic Evaluation, pp.39-50, June 2018.
  • A. Ghosh, and T. Veale, “Irony magnet at semeval-2018 task 3: a siamese network for irony detection in social media,” Proc. of the 12th Int’l Workshop on Semantic Evaluation, pp.570-575, June 2018.
  • C. Wu, F. Wu, S. Wu, J. Liu, Z. Yuan, and Y. Huang, “Thu_ngn at semeval-2018 task 3: tweet irony detection with densely connected lstm and multi-task learning,” Proc. of the 12th Int’l Workshop on Semantic Evaluation, pp.51-56, June 2018.
  • C. Baziotis, N. Athanasiou, P. Papalampidi, A. Kolovou, G. Paraskevopoulos, N. Ellinas, and A. Potamianos, “Ntua-slp at semeval-2018 task 3: tracking ironic tweets using ensembles of word and character level attentive rnns,” arXiv preprint arXiv:1804.06659, June 2018.
  • O. Rohanian, S. Taslimipoor, R. Evans, and R. Mitkov, “Wlv at semeval-2018 task 3: dissecting tweets in search of irony,” Proc. of the 12th Int’l Workshop on Semantic Evaluation, pp.553-559, June 2018.
  • H. Rangwani, D. Kulshreshtha, and A.K. Singh, “Nlprl-iitbhu at semeval-2018 task 3: combining linguistic features and emoji pre-trained cnn for irony detection in tweets,” Proc. of the 12th Int’l Workshop on Semantic Evaluation, pp.638-642, June 2018.
  • J. Devlin, M.W. Chang, K. Lee, and K. Toutanova, “Bert: pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S.R. Bowman, “Glue: a multi-task benchmark and analysis platform for natural language understanding,” arXiv preprint arXiv:1804.07461, 2018.
  • P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, “Squad: 100,000+ questions for machine comprehension of text,” arXiv preprint arXiv:1606.05250, 2016.
  • Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q.V. Le, “Xlnet: generalized autoregressive pretraining for language understanding,” arXiv preprint arXiv:1906.08237, 2019.
  • W.N. Francis, and H. Kucera, “Brown corpus manual: manual of information to accompany a standard corpus of present-day edited american english for use with digital computers,” Brown University, Providence, Rhode Island, USA, 1979.
Author
Cheng Zhang
Masashi Kudo