Xinsheng Wang is currently working toward the Ph.D. degree with Xi’an Jiaotong University, Xi’an, China. He is also a Visiting Researcher with the Multimedia Computing Group, Delft University of Technology, Delft, The Netherlands. His research interests include speech-visual cross-modal and multimodal learning.