Hi, I'm Kai Zhang.
I'm a third-year PhD student at The Ohio State University, fortunate to be advised by Prof. Yu Su.
I'm broadly interested in NLP, Multimodality (esp. Vision-Language), and their real-world applications.
Aug 2024
PathMMU was accepted to ECCV'24 as Best Paper Finalist (0.2%).
May 2024
MagicLens (Oral) and TravelPlanner (Spotlight) were accepted to ICML'24.
Mar 2024
Excited to present MagicLens, work done at Google DeepMind: next-generation image retrieval models with SOTA results on 10 benchmarks across multimodality-to-image, image-to-image, and text-to-image.
Feb 2024
MMMU was accepted to CVPR'24 as a Best Paper Finalist (0.2%), and I will be at MSR this summer. See you in Seattle :)
Feb 2024
Released TravelPlanner, a real-world benchmark for planning with language agents.
Jan 2024
Three papers were accepted to ICLR'24: KnowledgeConflict (Spotlight), MUFFIN, and ImagenHub.
Oct 2023
Attribution Evaluation was accepted to Findings of EMNLP'23.
Sept 2023
MagicBrush was accepted to NeurIPS'23 Datasets and Benchmarks Track.
Aug 2023
Excited to start my internship at Google DeepMind (previously Google Brain)!
See full list in Publications.
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Feel free to contact me if you are interested in my research or want to discuss related research topics :)