LoCoNet: Long-Short Context Network for Active Speaker Detection
CVPR 2024
Keywords
Active Speaker, Active Speaker Detection, Local Patterns, Video Frames, Temporal Dependencies, Temporal Model, Long-range Dependencies, Short-term Model, Multiple Speakers, Convolutional Network, Attention Mechanism, Receptive Field, Temporal Context, Inference Speed, Small Face, Visual Encoding, Visible Face