Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

Publication status: In Submission
Fan Ma
Postdoctoral Researcher in Artificial Intelligence

My research interests include Vision-Language Pre-training, Video Understanding, Object Tracking, and Semi-Supervised Learning.