DEE: Dual-stage Explainable Evaluation Method for Text Generation

CoRR (2024)

Abstract
Automatic methods for evaluating machine-generated texts hold significant importance due to the expanding applications of generative systems. Conventional methods tend to grapple with a lack of explainability, issuing a solitary numerical score to signify the assessment outcome. Recent advancements have sought to mitigate this limitation by incorporating large language models (LLMs) to offer more detailed error analyses, yet their applicability remains constrained, particularly in industrial contexts where comprehensive error coverage and swift detection are paramount. To alleviate these challenges, we introduce DEE, a Dual-stage Explainable Evaluation method for estimating the quality of text generation. Built upon Llama 2, DEE follows a dual-stage principle guided by stage-specific instructions: it efficiently identifies errors in generated texts in the first stage and then provides comprehensive diagnostic reports in the second stage. DEE is fine-tuned on our elaborately assembled dataset AntEval, which encompasses 15K examples from 4 real-world applications of Alipay that employ generative systems. The dataset covers newly emerged issues like hallucination and toxicity, thereby broadening the scope of DEE's evaluation criteria. Experimental results affirm DEE's superiority over existing evaluation methods, achieving significant improvements in both human correlation and efficiency.
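
The dual-stage control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `dual_stage_evaluate`, `Evaluation`, `toy_detect`, and `toy_diagnose` are hypothetical, the two callables stand in for stage-specific prompts to a fine-tuned model, and skipping the second stage when no error is detected is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evaluation:
    has_error: bool
    report: List[str] = field(default_factory=list)


def dual_stage_evaluate(
    text: str,
    detect: Callable[[str], bool],
    diagnose: Callable[[str], List[str]],
) -> Evaluation:
    """Run a fast detection pass, then a detailed pass on flagged texts."""
    # Stage 1: quick yes/no error identification.
    if not detect(text):
        # Assumption: texts with no detected error skip the second stage.
        return Evaluation(has_error=False)
    # Stage 2: fuller diagnostic report for the flagged text.
    return Evaluation(has_error=True, report=diagnose(text))


# Hypothetical stand-ins for the two model calls (keyword rules, not an LLM).
def toy_detect(text: str) -> bool:
    return "guaranteed" in text.lower()


def toy_diagnose(text: str) -> List[str]:
    return ["unsupported claim: 'guaranteed'"]


if __name__ == "__main__":
    print(dual_stage_evaluate("Returns are guaranteed.", toy_detect, toy_diagnose))
```

Separating the two stages means the cheap detection pass can run on every output, while the more expensive report is produced only when it has something to explain.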
Key words
Named Entity Recognition