I am broadly interested in computer vision and machine learning, with a focus on developing models that learn from videos. My research interests fall into three areas: 1) learning to understand videos, including algorithms for recognition, detection, and segmentation in videos; 2) learning video representations with minimal human supervision (i.e., weakly- and self-supervised learning); and 3) learning across modalities (e.g., image, video, language, and audio).