Bei Liu

Senior Researcher, Microsoft Research Asia, Beijing.

📍 Beijing, China

🏢 Microsoft Research Asia

🔬 Visual Computing Group

My research focuses on Multimodal AI, Document Understanding, and AI Agents. I also serve as a Guest Associate Professor at Nagoya University in Japan. Before joining Microsoft, I earned my Ph.D. and Master’s degrees from Kyoto University, Japan, under the guidance of Professors Katsumi Tanaka, Masatoshi Yoshikawa, and Makoto P. Kato. I hold a Bachelor’s degree from Nanjing University, China.

My current interest is in enabling agents that actively read, navigate, and reason over complex documents, combining perception, planning, and tool use.

I am open to research collaboration, academic visits, and supervising interns working on multimodal agents. Feel free to reach out!

News

Jan 1, 2025 One paper accepted to MMM 2025, awarded 🏆 Best Paper!
Dec 1, 2024 One paper accepted to ACM MM Asia 2024, awarded Best Student Paper Runner-Up. 🎉

Selected Publications

  1. MMM
    RoLD: Robot Latent Diffusion for Multi-task Policy Modeling
    Wenhui Tan, Bei Liu, Junbo Zhang, Ruihua Song, and Jianlong Fu
    In International Conference on Multimedia Modeling (MMM), 2025
    🏆 Best Paper Award, MMM 2025
    📖 Cited by 2 (updated: 2026-03-17)
  2. MM Asia
    ViCo: Engaging Video Comment Generation with Human Preference Rewards
    Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, and Jianlong Fu
    In Proceedings of ACM Multimedia Asia (MM Asia), 2024
    🥈 Best Student Paper Runner-Up, ACM MM Asia 2024
    📖 Cited by 3 (updated: 2026-03-17)
  3. ICLR
    CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
    Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, and Jiebo Luo
    In International Conference on Learning Representations (ICLR), 2023
    📖 Cited by 258 (updated: 2026-03-17)
  4. CVPR
    Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
    Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, and Baining Guo
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    📖 Cited by 306 (updated: 2026-03-17)
  5. CVPR
    Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
    Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, and Jianlong Fu
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    Oral
    📖 Cited by 346 (updated: 2026-03-17)
  6. NeurIPS
    Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
    Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, and Jiebo Luo
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
    📖 Cited by 105 (updated: 2026-03-17)
  7. ACM MM
    Unifying Multimodal Transformer for Bi-directional Image and Text Generation
    Yupan Huang, Hongwei Xue, Bei Liu, and Yutong Lu
    In Proceedings of the 29th ACM International Conference on Multimedia (MM), 2021
    📖 Cited by 73 (updated: 2026-03-17)
  8. ACM MM
    Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
    Bei Liu, Jianlong Fu, Makoto P. Kato, and Masatoshi Yoshikawa
    In Proceedings of the 26th ACM International Conference on Multimedia (MM), 2018
    🏆 Best Paper Award, ACM Multimedia 2018
    📖 Cited by 111 (updated: 2026-03-17)

Awards & Honors

2025.1 International Conference on Multimedia Modeling (MMM) 2025 — Best Paper Award
2024.12 ACM Multimedia Asia 2024 — Best Student Paper Runner-Up
2022.7 China Multimedia Company Innovation Technology Award
2020 IEEE Transactions on Multimedia — Outstanding Reviewer Award
2019.6 CVPR 2019, ActivityNet Challenge, ActivityNet Captions Track — 1st Place
2019.5 CVPR 2019, VQA and Dialog Workshop, VQA Challenge — 2nd Place
2018.10 ACM Multimedia 2018 — Best Paper Award
2018.7 FashionAI Challenge, Attribute Recognition Task (Alibaba) — 3rd Place out of 2,950 teams
2014–2016 Asian Future Leaders Scholarship — Fellowship funded by the Bai Xian Education Foundation