Research interests:

Learning from a large number of data sources: A common modern machine learning scenario involves a large amount of data contributed by many heterogeneous sources, each providing only a modest amount. How well can we learn in this setting? To what extent can the large number of sources compensate for the lack of data from each source? What are the fundamental limits of learning? Different communities have studied this problem under the names multi-task learning, meta-learning, federated learning, few-shot learning, and empirical Bayes.

Estimating learnability: Without enough data to learn a good predictive model, is it possible to tell whether a good model exists? Surprisingly, this is possible under linear model assumptions.
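
A minimal sketch of why this can work, under assumptions that go beyond what is stated above: covariates are drawn i.i.d. from a standard Gaussian (identity covariance), and labels follow y = <beta, x> + noise. In that setting E[y x] = beta, so for two distinct samples i and j the product y_i y_j <x_i, x_j> has expectation ||beta||^2, the variance a perfect linear model would explain. Averaging over all pairs gives an unbiased estimate even when there are fewer samples than dimensions, where fitting beta itself is hopeless. The function names and simulation parameters below are hypothetical and chosen only for illustration.

```python
import math
import random


def estimate_signal_strength(xs, ys):
    """Unbiased estimate of ||beta||^2, ASSUMING x ~ N(0, I) and y = <beta, x> + noise."""
    n, d = len(xs), len(xs[0])
    s = [0.0] * d   # running sum of y_i * x_i
    diag = 0.0      # running sum of y_i^2 * ||x_i||^2 (the i == j terms)
    for x, y in zip(xs, ys):
        for k, v in enumerate(x):
            s[k] += y * v
        diag += y * y * sum(v * v for v in x)
    # Sum over i != j of y_i y_j <x_i, x_j> equals ||s||^2 minus the diagonal terms.
    off_diag = sum(v * v for v in s) - diag
    return off_diag / (n * (n - 1))


def simulate(n, d, noise_sd, seed):
    """Draw n samples in d dimensions from a linear model with ||beta||^2 = 1 (illustrative)."""
    rng = random.Random(seed)
    beta = [1.0 / math.sqrt(d)] * d
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(b * v for b, v in zip(beta, x)) + noise_sd * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


# Fewer samples than dimensions: beta cannot be recovered, but its norm can be estimated.
xs, ys = simulate(n=1000, d=1500, noise_sd=0.5, seed=0)
signal = estimate_signal_strength(xs, ys)       # estimates ||beta||^2 (true value 1.0)
total = sum(y * y for y in ys) / len(ys)        # estimates Var(y) (true value 1.25)
print("estimated fraction of variance explainable by a linear model:", signal / total)
```

The ratio printed at the end is a noisy estimate of how much of the label variance the best linear model could explain. The estimate is obtained without ever fitting that model, and its accuracy depends on the isotropic-covariate assumption made here.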