Dahun Kim

Google DeepMind

CV | Google Scholar | GitHub

I am a Senior Research Scientist at Google DeepMind (MTV, CA).

My recent research focuses on improving the capabilities of Large Multimodal Models (e.g., Gemini) and on understanding the interaction of vision and language.

I obtained my Ph.D. and M.S. at KAIST, advised by Professor In So Kweon. I have been fortunate to collaborate with Adobe Research (2019), Google Brain (2020), and Google Research (2021). I am a recipient of the Microsoft Research Asia Fellowship, the Qualcomm Innovation Fellowship, and the Global Ph.D. Fellowship from NRF Korea.

Contact

mcahny01 [at] gmail.com

mcahny [at] google.com
Googleplex, 1600 Amphitheatre Pkwy, Mountain View, CA 94043

Education

PhD in EE, KAIST, 2022

on "Learning Dense Pixel Features for Video Processing and Understanding"
MS in EE, KAIST, 2018

on "Reducing Human Supervision in Supervised Learning"
BS in EE, KAIST, 2016
Exchange Student Program, 2014

KTH Royal Institute of Technology in Stockholm, Sweden

Academic Activities

Area Chair for NeurIPS (2025, 2024, 2023), ICML (2025), and CVPR (2026, 2025, 2024, 2023)
Action Editor of Transactions on Machine Learning Research (TMLR)
Outstanding Reviewer in CVPR 2021, ECCV 2020
Reviewer at CVPR, NeurIPS, ICLR, ICCV, ECCV, ICML, AAAI, EG, TPAMI, TNNLS, TIP

Research Experiences

Google DeepMind (previously Google Brain), MTV, CA
Jul 2022 - Present

Senior Research Scientist, Research Scientist
Google Research, LA, CA (virtual)
May 2021 - Jan 2022

Research Intern, worked with Liang-Chieh Chen and Jun Xie
Google Brain, MTV, CA (virtual)
Jun 2020 - Nov 2020

Research Intern, worked with Weicheng Kuo, Tsung-Yi Lin, and Anelia Angelova
Adobe Research, San Jose, CA
Jun 2019 - Sep 2019

Research Intern, worked with Joon-Young Lee
KAIST, Daejeon, Korea

Mar 2016 - Feb 2022

Research Assistant, Robotics and Computer Vision Lab.

Publications

Multimodal AI

EmbeddingGemma: Powerful and Lightweight Text Representations

Gemini Embedding Team, Google

2025

[ paper / huggingface / Google blogpost ]
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment

Dahun Kim, Anelia Angelova

COLM 2025

[ paper ]
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gemini Team, Google

2025

[ paper / Google blogpost ]
Time-Scaling State-Space Models for Dense Video Captioning

AJ Piergiovanni, Ganesh Mallya, Dahun Kim, Anelia Angelova

BMVC 2025

[ paper ]
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

Dahun Kim, AJ Piergiovanni, Ganesh Mallya, Anelia Angelova

CVPR 2025

[ paper / data ]
Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications

Ganesh Mallya, Yotam Gigi, Dahun Kim, Maxim Neumann, Genady Beryozkin, Tomer Shekel, Anelia Angelova

AGU 2025 Oral presentation

[ paper / Google blogpost / Colab tutorial ]
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

AJ Piergiovanni, Dahun Kim, Michael S Ryoo, Isaac Noble, Anelia Angelova

Preprint 2025

[ paper ]
Learning Visual Grounding from Generative Vision and Language Model

Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, Weicheng Kuo

WACV 2025

[ paper ]
Region-centric Image-Language Pretraining for Open-Vocabulary Detection

Dahun Kim, Anelia Angelova, Weicheng Kuo

ECCV 2024

[ paper / code / Google Cloud Vertex AI ]
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation

Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

TASLP 2024 (IEEE/ACM Transactions on Audio, Speech and Language Processing)

[ paper ]
Omnibind: Teach to build unequal-scale modality interaction for omni-bind of all

Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang

Preprint 2024

[ paper ]
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

AJ Piergiovanni*, Isaac Noble*, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova

CVPR 2024
Featured at Google AI blogpost

[ paper / Google blogpost ]
Contrastive Feature Masking Open-Vocabulary Vision Transformer

Dahun Kim, Anelia Angelova, Weicheng Kuo

ICCV 2023

[ paper ]
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

Dahun Kim, Anelia Angelova, Weicheng Kuo

CVPR 2023 Highlight presentation - top 2.5% of submissions
Featured at Google AI blogpost

[ paper / code / Google blogpost ]
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Weicheng Kuo*, AJ Piergiovanni*, Dahun Kim†, Xiyang Luo†, Ben Caine, Wei Li, Abhijit Ogale,
Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova (*, † equal contribution)

TMLR 2023
Featured at Google AI blogpost

[ paper / Google blogpost ]
RECLIP: Resource-Efficient CLIP by Training with Small Images

Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

TMLR 2023

[ paper ]

Perception - Object and Video Understanding

Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation

Ji-Yeon Kim, Hyun-Bin Oh, Dahun Kim, Tae-Hyun Oh

RAL-IROS 2024 Oral presentation
Short version at CVPRW 2023 'Vision-Centric Autonomous Driving' Workshop

[ paper ]
Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

WACV 2024 Oral presentation
Short version at 'Transformers for Vision' workshop @ CVPR 2023

[ paper / video demo ]
Dense Pixel-level Interpretation of Dynamic Scenes with Video Panoptic Segmentation

Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

TIP 2022
Short version at What is Motion For (WIMF) workshop @ ECCV 2022

[ paper ]
TubeFormer-DeepLab: Video Mask Transformer

Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen

CVPR 2022
Ranked #1 on SemKITTI-DVPS, #3 on KITTI-STEP, and #4 on VSPW 2021
Short version at 'Transformers for Vision' workshop @ CVPR 2022

[ paper ]
CMT-DeepLab: Dynamic Clustering Mask Transformers for Panoptic Segmentation

Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

CVPR 2022 Oral presentation

[ paper ]
Learning Open-World Object Proposals without Learning to Classify

Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

RAL-ICRA 2022
Invited paper talk at Open-World Segmentation (UVO) Workshop @ ICCV 2021
Received Qualcomm Innovation Award 2021

[ paper / code / tf2 / talk ]
Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation

Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Vishy Swaminathan, Henry Fuchs

WACV 2022

[ paper ]
Global Context and Geometric Priors for Effective Non-Local Self-Attention

Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

BMVC 2021
Received Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd

[ paper ]
DeepLab2: A TensorFlow Library for Deep Labeling

Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan,
Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

Technical report 2021
Internal code contribution

[ paper / code ]
Learning to Associate Every Segment for Video Panoptic Segmentation

Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

CVPR 2021

[ paper ]
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation

Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon

WACV 2021

[ paper ]
Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, KwanYoung Park, Joon-Young Lee, In So Kweon

BMVC 2020 (Acceptance: 195/670 ≈ 29.1%)

[ paper ]
Video Panoptic Segmentation

Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

CVPR 2020 Oral presentation (Acceptance: 335/6656 ≈ 5.0%)
Patented

[ paper / code / project ]
Recurrent Temporal Aggregation Framework for Deep Video Inpainting

Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

TPAMI 2020
Received KAIST-Samsung Industry-University Cooperation Best Paper Award

[ paper / code ]
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

AAAI 2020 (Acceptance: 1591/7737 ≈ 20.6%)

[ paper ]
Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)
1st place winner of ECCV 2018 Chalearn LAP Video De-Captioning Challenge
[ paper / code / video / project ]
Deep Video Inpainting

Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)

[ paper / code / video / project ]
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

Dahun Kim, Donghyeon Cho, In So Kweon

AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)

[ paper ]
Discriminative Feature Learning for Unsupervised Video Summarization

Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon

AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
Received Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd
Patented

[ paper ]
Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency

Donghyeon Cho, Yunjae Jung, Francois Rameau, Dahun Kim, Sanghyun Woo, In So Kweon

MM 2019 (Acceptance: 252/936 ≈ 26.9%)

[ paper ]
Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation

Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon

MM 2019 (Acceptance: 252/936 ≈ 26.9%)

[ paper ]
LinkNet: Relational Embedding for Scene Graph

Sanghyun Woo*, Dahun Kim*, Donghyeon Cho, In So Kweon (* equal contribution)

NeurIPS 2018 (Acceptance: 1011/4856 ≈ 20.8%)

[ paper ]
Learning Image Representations by Completing Damaged Jigsaw Puzzles

Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

WACV 2018

[ paper ]
Two Phase Learning for Weakly Supervised Object Localization

Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

ICCV 2017 (Acceptance: 621/2143 ≈ 28.9%)

[ paper ]

3D Representation - Avatar Modeling

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

ICLR 2023

[ paper / project ]
Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

NeurIPS 2021 Spotlight presentation (Acceptance: < 3.0%)
Received Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd

[ paper / code / project ]
Rotationally-Consistent Novel View Synthesis for Humans

Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Henry Fuchs, Vishy Swaminathan

MM 2020 (Acceptance: 472/1698 ≈ 27.8%)

[ paper / dataset ]
Rotationally-Temporally Consistent Novel-View Synthesis of Human Performance Video

Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Eunbyung Park, Vishy Swaminathan, Henry Fuchs

ECCV 2020 Spotlight presentation (Acceptance: 265/5025 ≈ 5.3%)

[ paper / dataset / code ]

Interns whom I had the pleasure to work with

Shijie Wang in Winter 2023. Ph.D. student at Brown University.
hosted with Weicheng Kuo
Runze Li in Summer 2022. Finished Ph.D. at UC Riverside. Now at Google.
hosted with Weicheng Kuo
Inkyu Shin in Summer 2022. Finished Ph.D. at KAIST. Now at TikTok Research.
hosted with Liang-Chieh Chen and Jun Xie

Awards & Honors

NSF travel award for Doctoral Consortium, CVPR 2022
Best Ph.D. Thesis Award, EE, KAIST, 2022
Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2022 ($5,000)
Qualcomm Innovation Award ($4,000), 2021
Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition, 2021
Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2021 ($5,000)
Outstanding Reviewer Award, European Conference on Computer Vision, 2020
KAIST-Samsung Industry-University Cooperation Best Paper Award ($3,000), 2020
Microsoft Research Asia (MSRA) Ph.D. Fellowship 2019 Winner ($10,000)
Global Ph.D. Fellowship, National Research Foundation of Korea ($60,000 + 3-year full scholarship)
1st Place Award in ChaLearn LAP 2018 Inpainting Challenge Track2 - Video Decaptioning (ECCV 2018 challenge)
Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2019 ($2,000)
International Computer Vision Summer School (ICVSS) 2018, Sicily, Italy

US Patents

Video Panoptic Segmentation (issued, 11,640,714)
Panoptic Segmentation (issued, 11,256,960)
Electronic device for key frame analysis and control method thereof (issued, 12,175,369)
Methods and apparatus localizing object(s) in vision data (pending, 18,289,725)
Electronic Device and Control Method of Same (pending, 17/554,142)
Method and Device for Hierarchical Learning of Neural Network Based on Weakly Supervised Learning (pending, 16/758,089)