Hi, I'm Kai Zhang.
I'm a fourth-year PhD student, fortunate to be advised by Prof. Yu Su and Prof. Huan Sun at the OSU NLP Lab.
Recently, I've been particularly interested in multimodal models and agents. I believe in data.
Sept 2025
WebDreamer was accepted to TMLR'25; ARM (Spotlight), Mind2Web 2, and CPathAgent were accepted to NeurIPS'25. I will serve as an Area Chair for ICLR'26.
Jan 2025
PathGen-1.6M (Oral) and MuirBench were accepted to ICLR'25, and Planning Analysis was accepted to NAACL'25.
Aug 2024
PathMMU was accepted to ECCV'24 as Best Paper Finalist (0.2%).
May 2024
MagicLens (Oral) and TravelPlanner (Spotlight) were accepted to ICML'24.
Mar 2024
Excited to present MagicLens, work done at Google DeepMind: next-generation image retrieval models with SOTA results on 10 benchmarks across multimodality-to-image, image-to-image, and text-to-image retrieval.
Feb 2024
MMMU was accepted to CVPR'24 as a Best Paper Finalist (0.2%), and I will be at MSR this summer. See you in Seattle :)
Jan 2024
Three papers were accepted to ICLR'24: KnowledgeConflict (Spotlight), MUFFIN, and ImagenHub.
Sept 2023
MagicBrush was accepted to NeurIPS'23 Datasets and Benchmarks Track.
Aug 2023
Excited to start my internship at Google DeepMind (previously Google Brain)!
See full list in Publications.
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing