Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models
CoRR (2024)
Abstract
We propose masked particle modeling (MPM) as a self-supervised method for
learning generic, transferable, and reusable representations on unordered sets
of inputs for use in high energy physics (HEP) scientific data. This work
provides a novel scheme to perform masked modeling based pre-training to learn
permutation invariant functions on sets. More generally, this work provides a
step towards building large foundation models for HEP that can be generically
pre-trained with self-supervised learning and later fine-tuned for a variety of
downstream tasks. In MPM, particles in a set are masked and the training
objective is to recover their identity, as defined by a discretized token
representation of a pre-trained vector quantized variational autoencoder. We
study the efficacy of the method in samples of high energy jets at collider
physics experiments, including studies on the impact of discretization,
permutation invariance, and ordering. We also study the fine-tuning capability
of the model, showing that it can be adapted to tasks such as supervised and
weakly supervised jet classification, and that the model can transfer
efficiently with small fine-tuning data sets to new classes and new data
domains.
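The pre-training objective described above — mask a subset of particles in a jet and predict the discrete VQ-VAE token of each masked particle — can be illustrated with a minimal toy sketch. Everything here is hypothetical (the feature dimensions, the nearest-codebook tokenizer, and the linear stand-in for the permutation-equivariant backbone are illustrative choices, not the paper's actual architecture); it only shows the shape of the masked-token cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a "jet" is an unordered set of particles,
# each with a few continuous features (e.g. pT, eta, phi).
n_particles, n_features, vocab_size = 8, 3, 16
particles = rng.normal(size=(n_particles, n_features))

# Stand-in for a pre-trained VQ-VAE tokenizer: assign each particle
# the index of its nearest codebook vector (its discretized identity).
codebook = rng.normal(size=(vocab_size, n_features))

def tokenize(x):
    dists = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=-1)

tokens = tokenize(particles)

# MPM-style masking: hide a fixed subset of particles; the training
# target is the VQ token of each masked particle. (A fixed mask keeps
# the example deterministic; in training the mask is sampled randomly.)
mask = np.zeros(n_particles, dtype=bool)
mask[:3] = True
masked_input = particles.copy()
masked_input[mask] = 0.0  # crude stand-in for a learned [MASK] embedding

# The real model is a permutation-equivariant network (e.g. a
# transformer without positional encodings); a random linear map
# stands in here just to make the loss computation concrete.
W = rng.normal(size=(n_features, vocab_size)) * 0.1
logits = masked_input @ W

def cross_entropy(logits, targets):
    # Numerically stable log-softmax followed by negative log-likelihood.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

# As in masked language modeling, the loss is computed only at the
# masked positions.
loss = cross_entropy(logits[mask], tokens[mask]).mean()
print("masked positions:", int(mask.sum()), "loss:", float(loss))
```

Because the tokenizer is frozen, the targets are fixed integer class labels, so pre-training reduces to per-particle classification over the codebook vocabulary at the masked positions.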