An Information-Theoretic Analysis of In-Context Learning
CoRR(2024)
摘要
Previous theoretical results pertaining to meta-learning on sequences build
on contrived assumptions and are somewhat convoluted. We introduce new
information-theoretic tools that lead to an elegant and very general
decomposition of error into three components: irreducible error, meta-learning
error, and intra-task error. These tools unify analyses across many
meta-learning challenges. To illustrate, we apply them to establish new results
about in-context learning with transformers. Our theoretical results
characterizes how error decays in both the number of training sequences and
sequence lengths. Our results are very general; for example, they avoid
contrived mixing time assumptions made by all prior results that establish
decay of error with sequence length.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要