Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck
arXiv (2023)
Abstract
Black-box deep neural networks excel in text classification, yet their
application in high-stakes domains is hindered by their lack of
interpretability. To address this, we propose Text Bottleneck Models (TBM), an
intrinsically interpretable text classification framework that offers both
global and local explanations. Rather than directly predicting the output
label, TBM predicts categorical values for a sparse set of salient concepts and
uses a linear layer over those concept values to produce the final prediction.
These concepts can be automatically discovered and measured by a Large Language
Model (LLM) without the need for human curation. Experiments on 12 diverse text
understanding datasets demonstrate that TBM can rival the performance of
black-box baselines such as few-shot GPT-4 and finetuned DeBERTa while falling
short against finetuned GPT-3.5. Comprehensive human evaluation validates that
TBM can generate high-quality concepts relevant to the task, and the concept
measurement aligns well with human judgments, suggesting that the predictions
made by TBMs are interpretable. Overall, our findings suggest that TBM is a
promising new framework that enhances interpretability with minimal performance
tradeoffs.
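To make the bottleneck idea concrete, here is a minimal illustrative sketch (not the authors' code) of TBM's prediction stage: each text receives a categorical value per discovered concept, the values are one-hot encoded, and a linear layer over that sparse concept vector yields the label. The concept names, value sets, and weights below are hypothetical; in TBM the concepts are discovered and measured by an LLM.

```python
import numpy as np

# Hypothetical concepts with categorical value sets; in TBM these would
# be discovered automatically by an LLM rather than hand-written.
CONCEPTS = {
    "sentiment": ["negative", "neutral", "positive"],
    "sarcasm": ["absent", "present"],
}

def encode(concept_values: dict) -> np.ndarray:
    """One-hot encode the categorical concept values into one feature vector."""
    parts = []
    for name, levels in CONCEPTS.items():
        vec = np.zeros(len(levels))
        vec[levels.index(concept_values[name])] = 1.0
        parts.append(vec)
    return np.concatenate(parts)

# Linear layer over concept features (rows = labels, cols = concept slots).
# Weights are made up for the demo; each entry is directly inspectable,
# which is what makes the prediction interpretable.
W = np.array([
    [ 2.0, 0.0, -1.0, 0.0,  1.0],   # label 0: negative review
    [-1.0, 0.0,  2.0, 0.5, -1.0],   # label 1: positive review
])

def predict(concept_values: dict) -> int:
    """Final label = argmax of the linear layer over concept values."""
    return int(np.argmax(W @ encode(concept_values)))

print(predict({"sentiment": "positive", "sarcasm": "absent"}))  # → 1
```

Because the only inputs to the linear layer are human-readable concept values, each prediction can be explained by pointing at which concepts fired and their weights.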