Hi, I'm Kai Zhang.

I'm a third-year PhD student at The Ohio State University, where I'm fortunate to be advised by Prof. Yu Su.

Recently, I've been particularly interested in multimodal models and agents. I believe in data.


What's New

Jan 2025

PathGen-1.6M (Oral) and MuirBench were accepted to ICLR'25, and Agent Planning Analysis was accepted to NAACL'25.

Aug 2024

PathMMU was accepted to ECCV'24 as a Best Paper Finalist (0.2%).

May 2024

MagicLens (Oral) and TravelPlanner (Spotlight) were accepted to ICML'24.

Mar 2024

Excited to present MagicLens, work done at Google DeepMind: next-generation image retrieval models with SOTA results on 10 benchmarks across multimodality-to-image, image-to-image, and text-to-image retrieval.

Feb 2024

MMMU was accepted to CVPR'24 as a Best Paper Finalist (0.2%), and I will be at MSR this summer. See you in Seattle :)

Feb 2024

Released TravelPlanner, a real-world benchmark for planning with language agents.

Jan 2024

Three papers were accepted to ICLR'24: KnowledgeConflict (Spotlight), MUFFIN, and ImagenHub.

Oct 2023

Attribution Evaluation was accepted to Findings of EMNLP'23.

Sept 2023

MagicBrush was accepted to NeurIPS'23 Datasets and Benchmarks Track.

Aug 2023

Excited to start my internship at Google DeepMind (previously Google Brain)!

Selected Publications

See the full list in Publications.

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Yu Gu*, Kai Zhang*, Yuting Ning*, Boyuan Zheng*, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su

ArXiv'25 Paper Code Data

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

ICML'24 Oral (1.5%) Paper Website Code

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Xiang Yue*, Yuansheng Ni*, Kai Zhang*, Tianyu Zheng*, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

CVPR'24 Best Paper Finalist (0.2%) Paper Website Code Data

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

Jian Xie*, Kai Zhang*, Jiangjie Chen, Renze Lou, Yu Su

ICLR'24 Spotlight (5%) Paper Code

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Kai Zhang*, Lingbo Mo*, Wenhu Chen, Huan Sun, Yu Su

NeurIPS'23 Datasets and Benchmarks Paper Website Code Data

Contact

Email: [LAST_NAME].13253@osu.edu OR drogo[LAST_NAME]@gmail.com

Feel free to contact me if you are interested in my research or want to discuss related research topics :)