He mainly applies machine learning and deep learning techniques to connect vision and language. His research interests include visual question answering, visual grounding, visual captioning, cross-media retrieval, and high-dimensional hash indexing. His research results have been published in 20+ papers at prestigious conferences and journals, e.g., CVPR, ICCV, SIGIR, ACM Multimedia, IEEE TMM, and TNNLS. He has also served a number of journals and conferences, including IEEE Trans. on Image Processing (TIP), IEEE Trans. on Multimedia (TMM), IEEE Trans. on Circuits and Systems for Video Technology (TCSVT), Information Sciences, Signal Processing, and Neurocomputing, as well as CVPR, AAAI, IJCAI, and ACMMM.