Bofei Zhang (张博飞)

Email: zhangbofei5675[at]outlook[dot]com

TikTok AIIC (AI Innovation Center)

Experience

Career

2025/7-Current; Researcher @ TikTok AIIC (AI Innovation Center)
- Working on SFT & RL for GUI agents
2023/3-2025/7; Research Engineer @ Beijing Institute for General Artificial Intelligence (BIGAI)
- Worked on Vision Language Model & Agentic Task Post-Training.
2020/6-2023/3; Software Engineer @ ByteDance
- TikTok & Lark

🚀 Internal referrals available: details here →

Education

2018/9-2020/6; Master in Data Science @ New York University
2013/8-2018/5; Bachelor in Biomedical Engineering @ The Ohio State University

News

Nov 13, 2025	TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials has been accepted by AAAI 2026! 🎉 Check out the project details here!
Oct 03, 2025	New preprint: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation (DART-GUI). 🔗 Project: computer-use-agents.github.io/dart-gui 📄 arXiv: 2509.23866 🤗 Hugging Face: papers/2509.23866 ⭐ GitHub: computer-use-agents/dart-gui If you find this research helpful, consider starring the GitHub repo and upvoting the HF paper page.
Sep 26, 2025	Our paper “Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning” has been accepted by NeurIPS 2025! 🎉
Feb 07, 2025	Multi-modal Agent Tuning (MAT): A framework for auto-generating multimodal tool-usage trajectories (20K MM-Traj), boosting MiniCPM & Qwen-VL tool use by 20%. This work has been accepted by ICLR 2025!
Aug 02, 2024	Introducing 🔥FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models. Check out more details here! 🔥FIRE has been accepted by NeurIPS 2024!

Latest Posts

Aug 08, 2025	ByteDance Referral
Jun 14, 2025	Computer-Use MacOS Agent
Feb 07, 2025	Tutorial on training Multi-modal Agent Tuning projects with LLaMA-Factory

Selected Publications

* Equal contribution, ✉ Corresponding author

Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

Pengxiang Li , Zechen Hu , Zirui Shang, and 15 more authors

Preprint, 2025

arXiv Bib Code Website

@article{dartgui2025,
  title = {Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation},
  author = {Li, Pengxiang and Hu, Zechen and Shang, Zirui and Wu, Jingrong and Liu, Yang and Liu, Hui and Gao, Zhi and Shi, Chenrui and Zhang, Bofei and Zhang, Zihao and Shi, Xiaochuan and Yu, Zedong and Wu, Yuwei and Wu, Xinxiao and Jia, Yunde and Xiang, Liuyu and He, Zhaofeng and Li, Qing},
  journal = {Preprint},
  year = {2025},
}

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

Xintong Zhang* , Zhi Gao* , Bofei Zhang, and 8 more authors

Preprint, 2025

arXiv Bib Code Website

@article{zhang2025cof,
  title = {Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL},
  author = {Zhang, Xintong and Gao, Zhi and Zhang, Bofei and Li, Pengxiang and Zhang, Xiaowen and Liu, Yang and Yuan, Tao and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
  equalauthor = {Zhang, Xintong and Gao, Zhi},
  journal = {Preprint},
  year = {2025},
}

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

Pengxiang Li* , Zhi Gao* , Bofei Zhang, and 8 more authors

Advances in Neural Information Processing Systems (NeurIPS), 2025

arXiv Bib Code Website

@article{li2025sport,
  title = {Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning},
  author = {Li, Pengxiang and Gao, Zhi and Zhang, Bofei and Mi, Yapeng and Ma, Xiaojian and Shi, Chenrui and Yuan, Tao and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
  equalauthor = {Li, Pengxiang and Gao, Zhi},
  journal = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2025},
}

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Bofei Zhang* , Zirui Shang* , Zhi Gao*, and 7 more authors

AAAI Conference on Artificial Intelligence (AAAI), 2026

arXiv Bib Code Website

@article{zhang2025tongui,
  title = {TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials},
  author = {Zhang, Bofei and Shang, Zirui and Gao, Zhi and Zhang, Wang and Xie, Rui and Ma, Xiaojian and Yuan, Tao and Wu, Xinxiao and Zhu, Song-Chun and Li, Qing},
  equalauthor = {Zhang, Bofei and Shang, Zirui and Gao, Zhi},
  journal = {AAAI Conference on Artificial Intelligence (AAAI)},
  year = {2026},
}

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Zhi Gao* , Bofei Zhang* , Pengxiang Li*, and 7 more authors

International Conference on Learning Representations (ICLR), 2025

arXiv Bib Code Website

Spotlight (Top 5%)

@article{mat,
  title = {Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage},
  author = {Gao, Zhi and Zhang, Bofei and Li, Pengxiang and Ma, Xiaojian and Yuan, Tao and Fan, Yue and Wu, Yuwei and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
  equalauthor = {Gao, Zhi and Zhang, Bofei and Li, Pengxiang},
  correspondence = {Li, Qing},
  year = {2025},
  journal = {International Conference on Learning Representations (ICLR)},
}

FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

Pengxiang Li* , Zhi Gao* , Bofei Zhang*, and 6 more authors

Neural Information Processing Systems: Datasets and Benchmarks (NeurIPS D&B), 2024

arXiv Bib Code Website

@article{2024fire,
  title = {FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models},
  author = {Li, Pengxiang and Gao, Zhi and Zhang, Bofei and Yuan, Tao and Wu, Yuwei and Harandi, Mehrtash and Jia, Yunde and Zhu, Song-Chun and Li, Qing},
  journal = {Neural Information Processing Systems: Datasets and Benchmarks (NeurIPS D&B)},
  year = {2024},
  equalauthor = {Li, Pengxiang and Gao, Zhi and Zhang, Bofei},
  correspondence = {Wu, Yuwei and Li, Qing},
}