In-context Contrastive Learning for Event Causality Identification
EMNLP 2024
The authors of this article are Chao Liang, Wei Xiang, and Wang Guobang. Chao Liang and Wei Xiang are with the School of Electronic Information and Communications at Huazhong University of Science and Technology; their research interests include topic models, sequence-to-sequence learning, deep learning, feature extraction, semantic segmentation, and semantics. Professor Wang Guobang's research focuses on wireless networks, positioning technology, recommendation algorithms, computational communication, and knowledge graphs; his technical publications include 7 patents, 2 books, 7 book chapters, and more than 110 research articles in international conferences and journals. He is affiliated with the School of Electronic Information and Communications and the Machine Intelligence and Network Science Laboratory at Huazhong University of Science and Technology, with research directions including wireless sensor networks, topic models, network embedding, prompt learning, and context-aware recommendation systems.
1. Abstract
- Event Causality Identification (ECI) aims to determine whether there is a causal relationship between two events in a document.
- This paper proposes ICCL, an in-context contrastive learning model that enhances the effectiveness of positive and negative demonstrations to better promote event causality identification.
- The ICCL model is evaluated on the EventStoryLine and Causal-TimeBank datasets, and the results show that it significantly outperforms existing algorithms.
2. Introduction
- The importance and application scenarios of Event Causality Identification.
- Limitations of existing methods: graph-based approaches and prompt-based learning methods.
- The motivation and advantages of the proposed ICCL model.
3. Method
- 3.1 Task Formulation
- Transform the ECI task into a causal relationship cloze task.
- Input: Event pair and its original sentence.
- Output: A virtual answer word indicating whether there is a causal relationship.
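The cloze reformulation above can be sketched as follows; the template wording is an illustrative assumption, not the paper's exact template:

```python
def to_cloze(sentence: str, e1: str, e2: str) -> str:
    # Recast an ECI instance as a cloze query: the PLM fills the [MASK]
    # slot with a virtual answer word signalling causal / non-causal.
    return f"{sentence} In this sentence, {e1} [MASK] {e2}."

query = to_cloze("The earthquake destroyed the bridge.",
                 "earthquake", "destroyed")
```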
- 3.2 Prompt Learning Module
- Reformulate input instances and retrieved demonstration samples into prompt templates as input for PLM encoding.
- Prompt templates include the query instance and K retrieved demonstration samples.
- Demonstration samples include event pairs, original sentences, and relationship labels.
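Concatenating the query with K retrieved demonstrations might look like the sketch below; the template wording and the verbalizer words ("causes"/"unrelated") are assumptions for illustration:

```python
def build_prompt(query, demos):
    """Prepend K retrieved demonstrations, each with its gold answer word
    filled in, before the masked query instance."""
    parts = []
    for sent, e1, e2, causal in demos:
        answer = "causes" if causal else "unrelated"  # illustrative verbalizer
        parts.append(f"{sent} In this sentence, {e1} {answer} {e2}.")
    sent, e1, e2 = query
    parts.append(f"{sent} In this sentence, {e1} [MASK] {e2}.")
    return " ".join(parts)
```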
- 3.3 In-context Contrastive Module
- Optimize event-mention representations with a contrastive loss that simultaneously maximizes agreement with positive demonstration samples and minimizes agreement with negative demonstration samples.
- Use the offset between hidden states of event mentions to represent their relationship.
- Adopt supervised contrastive learning to optimize the representation of relationship vectors for event pairs.
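A minimal sketch of such a supervised contrastive loss over relation offset vectors, assuming cosine similarity and treating same-label demonstrations as positives (function and variable names are our own):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def supcon_loss(query_rel, positives, negatives, tau=0.1):
    # query_rel and each demonstration vector are relation offsets
    # h(e1) - h(e2) between the two event mentions' hidden states;
    # demonstrations sharing the query's label act as positives.
    denom = sum(math.exp(cosine(query_rel, v) / tau)
                for v in positives + negatives)
    return -sum(math.log(math.exp(cosine(query_rel, p) / tau) / denom)
                for p in positives) / len(positives)
```

The loss shrinks when the query's relation vector aligns with positives and points away from negatives, which is the clustering effect the paper's visualization later shows.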
- 3.4 Causality Prediction Module
- Use the hidden state of the [MASK] token in the query instance to predict the answer word that identifies causality.
- Use a masked language model classifier to estimate the probability of each word in the vocabulary.
- Use cross-entropy loss as the loss function.
- 3.5 Training Strategy
- Jointly train the contextual contrastive module and the causality prediction module.
- The total loss function includes prediction loss and contrastive loss.
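The joint objective described above can be written as (symbol names are our own; λ is the contrastive loss ratio mentioned in Section 4.2):

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{pred}} + \lambda \, \mathcal{L}_{\mathrm{con}}
```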
4. Experiment
- 4.1 Datasets
- EventStoryLine 0.9 Corpus (ESC)
- Causal-TimeBank Corpus (CTB)
- 4.2 Parameter Settings
- Use the RoBERTa model.
- Set learning rate, batch size, contrastive loss ratio, temperature parameter, etc.
- 4.3 Competitors
- List various baseline models, including graph-based methods and prompt-based learning methods.
5. Result Analysis
- 5.1 Overall Results
- The ICCL model achieved the best performance on the ESC and CTB datasets.
- 5.2 Ablation Study
- Validate the effectiveness of the in-context learning and contrastive learning components.
- 5.3 Number of Demonstration Samples
- Validate the impact of the number of demonstration samples on model performance.
- 5.4 Few-shot Learning
- Validate the robustness of the ICCL model in few-shot learning scenarios.
- 5.5 Embedding Visualization
- Visually demonstrate the clustering phenomenon of event pair embeddings in the embedding space.
6. Conclusion
- The ICCL model has achieved significant results on the ECI task.
- Future work will explore applying the ICCL model to other NLP tasks.
7. Limitations
- The PLM input length limit leads to a limited number of demonstration samples.
- The limited number of positive and negative samples weakens the effectiveness of contrastive learning.
8. Ethical Statement
- This paper has no special ethical considerations.
Q: What specific research methods were used in the paper?
- Prompt Learning: Reformats event pairs and demonstration samples into prompt templates, which are used as input for the encoding of the pre-trained language model (PLM).
- In-context Contrastive Learning: Optimizes the representation of event mentions by simultaneously maximizing the consistency between event mentions and positive demonstration samples, and minimizing the consistency with negative demonstration samples through contrastive loss.
- Causal Prediction Module: Uses the hidden state of the query instance's [MASK] token to predict answer words for identifying causal relationships.
Q: What are the main research findings and outcomes?
- The ICCL model achieved new state-of-the-art performance on the event causality identification task.
- Contrastive learning effectively enhanced the effectiveness of demonstration samples and helped distinguish semantic differences between causal and non-causal event pairs.
- In-context learning significantly improved model performance by introducing demonstration samples as explicit guidance, especially in cross-sentence causality identification.
- The ICCL model demonstrated good robustness in low-resource scenarios, with relatively slow performance degradation even when the amount of training data was reduced.
Q: What are the current limitations of this research?
- Due to the input length limit of the PLM, the number of demonstration samples needs to be kept within a controllable range.
- ICCL uses demonstration samples as positive and negative samples in contrastive learning, which limits the number of positive and negative samples and thus weakens the effectiveness of contrastive learning.
