Beyond Catoni: Sharper Rates for Heavy-Tailed and Robust Mean Estimation
arXiv (2023)
Abstract
We study the fundamental problem of estimating the mean of a $d$-dimensional
distribution with covariance $\Sigma \preccurlyeq \sigma^2 I_d$ given $n$
samples. When $d = 1$, Catoni \cite{catoni} showed an estimator with error
$(1+o(1)) \cdot \sigma \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$, with
probability $1 - \delta$, matching the Gaussian error rate. For $d>1$, a
natural estimator outputs the center of the minimum enclosing ball of
one-dimensional confidence intervals to achieve a $1-\delta$ confidence radius
of $\sqrt{\frac{2 d}{d+1}} \cdot \sigma \left(\sqrt{\frac{d}{n}} +
\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}\right)$, incurring a
$\sqrt{\frac{2d}{d+1}}$-factor loss over the Gaussian rate. When the
$\sqrt{\frac{d}{n}}$ term dominates by a $\sqrt{\log \frac{1}{\delta}}$ factor,
\cite{lee2022optimal-highdim} showed an improved estimator matching the
Gaussian rate. This raises a natural question: is the Gaussian rate achievable
in general? Or is the $\sqrt{\frac{2 d}{d+1}}$ loss \emph{necessary} when the
$\sqrt{\frac{2 \log \frac{1}{\delta}}{n}}$ term dominates?
We show that the answer to both questions is \emph{no}: \emph{some} constant-factor
loss over the Gaussian rate is necessary, but we construct an estimator that
improves over the naive estimator above by a constant factor. We also
consider robust estimation, where an adversary is
allowed to corrupt an $\epsilon$-fraction of samples arbitrarily: in this case,
we show that the above strategy of combining one-dimensional estimates and
incurring the $\sqrt{\frac{2d}{d+1}}$-factor \emph{is} optimal in the
infinite-sample limit.
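For context, Catoni's one-dimensional estimator solves $\sum_i \psi(\alpha (x_i - \theta)) = 0$ for $\theta$, where $\psi$ is a truncated influence function. The following is a minimal sketch, not the paper's construction: the simple choice of the scale $\alpha$ and the bisection solver are illustrative assumptions.

```python
import numpy as np

def catoni_mean(x, sigma, delta):
    """Catoni-style 1-D mean estimate from samples x, given a variance
    bound sigma^2 and a failure probability delta (illustrative tuning)."""
    # Catoni's influence function: psi(t) = sign(t) * log(1 + |t| + t^2/2).
    def psi(t):
        return np.sign(t) * np.log1p(np.abs(t) + t * t / 2)

    n = len(x)
    # Simple scale choice alpha ~ sqrt(2 log(1/delta) / (n sigma^2)),
    # matching the sigma * sqrt(2 log(1/delta) / n) error scale.
    alpha = np.sqrt(2.0 * np.log(1.0 / delta) / (n * sigma**2))

    # theta -> sum_i psi(alpha * (x_i - theta)) is decreasing in theta,
    # so the root is unique and bisection converges.
    lo, hi = x.min(), x.max()
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if psi(alpha * (x - mid)).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On heavy-tailed data (e.g. Student-$t$ samples), this estimator concentrates around the true mean far more tightly than the empirical average's worst-case deviation, which is the phenomenon the abstract's $d>1$ results build on.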