Do Not (Always) Look Right: Investigating the Capabilities of Decoder-Based Large Language Models for Sequence Labeling
CoRR (2024)
Abstract
Pre-trained language models based on the masked language modeling (MLM) objective
excel in natural language understanding (NLU) tasks. While fine-tuned MLM-based
encoders consistently outperform causal language modeling decoders of
comparable size, the recent trend of scaling decoder models to billions of
parameters has produced large language models (LLMs) that are competitive
with MLM-based encoders. Although scale amplifies their prowess in NLU tasks,
LLMs fall short of SOTA results in information extraction (IE) tasks, many
framed as sequence labeling (SL). However, whether this is an intrinsic
limitation of LLMs or whether their SL performance can be improved remains
unclear. To address this, we explore strategies to enhance the SL performance
of "open" LLMs (Llama2 and Mistral) on IE tasks. We investigate bidirectional
information flow within groups of decoder blocks, applying layer-wise removal
or enforcement of the causal mask (CM) during LLM fine-tuning. This approach
yields performance gains competitive with SOTA SL models, matching or
outperforming the results of CM removal from all blocks. Our findings hold for
diverse SL tasks, proving that "open" LLMs with layer-dependent CM removal
outperform strong MLM-based encoders and instruction-tuned LLMs. However, we
observe no effect from CM removal at a small scale when maintaining an
equivalent model size, pre-training steps, and pre-training and fine-tuning
data.
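The idea of removing or enforcing the causal mask per decoder block can be illustrated with a minimal, framework-free sketch. It assumes a toy setting where each block's self-attention mask is a Boolean matrix (True means "position i may attend to position j") and shows how masked positions are excluded from the attention softmax. The function names (`causal_mask`, `bidirectional_mask`, `layerwise_masks`, `attention_weights`) and the choice of which blocks to unmask are hypothetical illustrations, not the authors' implementation; in a real LLM these masks would be applied inside each decoder block's self-attention during fine-tuning.

```python
import math
from typing import List, Set

# Hypothetical toy representation: mask[i][j] is True if token i may attend to token j.
Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Standard decoder mask: token i sees only tokens 0..i (itself and its left context)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(seq_len: int) -> Mask:
    """Mask with the causal constraint removed: every token sees every other token."""
    return [[True] * seq_len for _ in range(seq_len)]


def layerwise_masks(num_layers: int, seq_len: int, unmasked_layers: Set[int]) -> List[Mask]:
    """Return one mask per decoder block (illustrative assumption, not the paper's code).

    Blocks whose index is in `unmasked_layers` have the causal mask removed;
    all remaining blocks keep the causal mask.
    """
    for layer in unmasked_layers:
        if not 0 <= layer < num_layers:
            raise ValueError(f"layer index {layer} out of range")
    return [
        bidirectional_mask(seq_len) if layer in unmasked_layers else causal_mask(seq_len)
        for layer in range(num_layers)
    ]


def attention_weights(scores: List[List[float]], mask: Mask) -> List[List[float]]:
    """Row-wise softmax over raw attention scores, giving zero weight to masked positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the max over allowed positions for numerical stability.
        # Both masks above always allow the diagonal, so `allowed` is never empty.
        allowed = [s for s, ok in zip(row_scores, row_mask) if ok]
        peak = max(allowed)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

For example, `layerwise_masks(num_layers=4, seq_len=3, unmasked_layers={0, 1})` yields bidirectional masks for the first two blocks and causal masks for the last two, so under a causal mask the first token attends only to itself, while under a removed mask it also attends to the tokens on its right. Which group of blocks to unmask is exactly the design choice the abstract describes investigating; the grouping used here is arbitrary.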