Publications

(2024). Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens. CVPR 2024.

(2024). MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. CVPR 2024.

(2024). Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval. CVPR 2024.

(2024). Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity. CVPR 2024.

(2024). HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting. arXiv.

(2024). CapHuman: Capture Your Moments in Parallel Universes. CVPR 2024.

(2023). Temporal Perceiving Video-Language Pre-training. AAAI 2024.

(2023). VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending. arXiv.

(2022). Unified Transformer Tracker for Object Tracking. CVPR 2022.

(2020). Self-paced Multi-view Co-training. JMLR 2020.

(2020). SF-Net: Single-Frame Supervision for Temporal Action Localization. ECCV 2020.

(2017). Few-Example Object Detection with Model Communication. TPAMI 2018.

(2017). A Dual-Network Progressive Approach to Weakly Supervised Object Detection. ACM MM 2017.

(2017). Self-Paced Co-training. ICML 2017.