A maximin optimal approach for sampling designs in two-phase studies
arXiv (2023)
Abstract
Data collection costs can vary widely across variables in data science tasks.
Two-phase designs are often employed to save data collection costs. In
two-phase studies, inexpensive variables are collected for all subjects in the
first phase, and expensive variables are measured for a subset of subjects in
the second phase based on a predetermined sampling rule. The estimation
efficiency under two-phase designs relies heavily on the sampling rule.
Existing literature primarily focuses on designing sampling rules for
estimating a scalar parameter in some parametric models or specific estimating
problems. In practice, however, the underlying model is usually unknown, and
two-phase designs are needed for model-free estimation of a scalar or
multi-dimensional parameter. This paper proposes a maximin criterion to design an optimal
sampling rule based on semiparametric efficiency bounds. The proposed method is
model-free and applicable to general estimating problems. The resulting
sampling rule can minimize the semiparametric efficiency bound when the
parameter is scalar and improve the bound for every component when the
parameter is multi-dimensional. Simulation studies demonstrate that the
proposed designs reduce the variance of the resulting estimator in various
settings. The implementation of the proposed design is illustrated in a real
data analysis.
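
The two-phase setup described above can be illustrated with a minimal sketch. This is not the maximin design proposed in the paper: it only simulates a generic two-phase sample and estimates a mean with a standard inverse-probability-weighted (Horvitz-Thompson type) estimator. The data-generating model, the sampling rule `pi = 0.2 + 0.6 * x`, and the function name `two_phase_ipw_mean` are all assumptions made for illustration.

```python
import random


def two_phase_ipw_mean(n=20000, seed=0):
    """Simulate a two-phase design and estimate E[Y] by inverse-probability weighting.

    Everything here is hypothetical: the data model and the sampling rule
    are illustrative choices, not the paper's optimal design.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0   # sum of y / pi over phase-2 subjects
    full_sum = 0.0       # sum of y over all subjects (unavailable in practice)
    n_phase2 = 0
    for _ in range(n):
        # Phase 1: inexpensive variable, observed for every subject.
        x = rng.random()
        # Expensive variable; its noise level grows with x (assumed model).
        y = 2.0 * x + rng.gauss(0.0, 0.5 + x)
        full_sum += y
        # Predetermined sampling rule: probability of entering phase 2 given x.
        pi = 0.2 + 0.6 * x
        # Phase 2: the expensive variable is measured only if the subject is selected.
        if rng.random() < pi:
            n_phase2 += 1
            weighted_sum += y / pi
    return weighted_sum / n, full_sum / n, n_phase2


ipw_mean, full_mean, n_phase2 = two_phase_ipw_mean()
print(f"IPW estimate from phase-2 data: {ipw_mean:.3f}")
print(f"Full-data mean (oracle):        {full_mean:.3f}")
print(f"Phase-2 sample size:            {n_phase2}")
```

In this sketch the sampling rule oversamples subjects with larger `x`, where the expensive variable is noisier. Changing `pi` changes the variance of the weighted estimator, which is the quantity a design criterion such as the one in the abstract aims to control.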