Dahun Kim

Research Scientist

Google DeepMind


CV | Google Scholar | GitHub


I am a Research Scientist at Google DeepMind in Mountain View, CA.

My recent research interests are in improving the capabilities of Large Multimodal Models (e.g., Gemini) and in understanding the interaction between vision and language.

I obtained my Ph.D. and M.S. at KAIST, advised by Professor In So Kweon. I have been fortunate to collaborate with Adobe Research (2019), Google Brain (2020), and Google Research (2021). I am a recipient of the Microsoft Research Asia Fellowship, the Qualcomm Innovation Fellowship, and the Global Ph.D. Fellowship from NRF Korea.

Contact

  • mcahny01 [at] gmail.com

    mcahny [at] google.com

  • Googleplex, 1600 Amphitheatre Pkwy, Mountain View, CA 94043

Education

  • PhD in EE, KAIST, 2022

    on "Learning Dense Pixel Features for Video Processing and Understanding"

  • MS in EE, KAIST, 2018

    on "Reducing Human Supervision in Supervised Learning"

  • BS in EE, KAIST, 2016

  • Exchange Student Program, 2014

    KTH Royal Institute of Technology, Stockholm, Sweden

Academic Activities

  • Area Chair in CVPR 2024
  • Area Chair in NeurIPS 2023
  • Area Chair in CVPR 2023
  • Outstanding Reviewer in CVPR 2021, ECCV 2020
  • Reviewer at CVPR, NeurIPS, ICLR, ICCV, ECCV, ICML, AAAI, EG, TPAMI, TNNLS, TIP

Research Experiences

  • Google DeepMind, MTV, CA
    Apr 2023 - Present

    Research Scientist
  • Google Brain, MTV, CA
    Jul 2022 - Apr 2023

    Research Scientist
  • Google Research, LA, CA (virtual)
    May 2021 - Jan 2022

    Research Intern, worked with Liang-Chieh Chen and Jun Xie
  • Google Brain, MTV, CA (virtual)
    Jun 2020 - Nov 2020

    Research Intern, worked with Weicheng Kuo, Tsung-Yi Lin, and Anelia Angelova
  • Adobe Research, San Jose, CA
    Jun 2019 - Sep 2019

    Research Intern, worked with Joon-Young Lee
  • KAIST, Daejeon, Korea
    Mar 2016 - Feb 2022

    Research Assistant, Robotics and Computer Vision Lab.

Publications


    Multimodal AI - Vision and Language

  • Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

    AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova

    CVPR 2024
    Featured at Google AI blogpost

    [ paper / Google blogpost ]

  • Contrastive Feature Masking Open-Vocabulary Vision Transformer

    Dahun Kim, Anelia Angelova, Weicheng Kuo

    ICCV 2023

    [ paper ]

  • Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

    Dahun Kim, Anelia Angelova, Weicheng Kuo

    CVPR 2023 Highlight presentation (top 2.5% of submissions)
    Featured at Google AI blogpost

    [ paper / code / Google blogpost ]

  • MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

    Weicheng Kuo*, AJ Piergiovanni*, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale,
       Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova (* equal contribution)

    TMLR 2023
    Featured at Google AI blogpost

    [ paper / Google blogpost ]

  • RECLIP: Resource-Efficient CLIP by Training with Small Images

    Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

    TMLR 2023

    [ paper ]

  • Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

    Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

    Preprint

    [ paper ]


    Perception - Object and Video Understanding

  • Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

    Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    WACV 2024 Oral presentation
    Short version at 'Transformers for Vision' workshop @ CVPR 2023

    [ paper / video demo ]

  • MinDVPS: Minimal Model for Depth-aware Video Panoptic Segmentation

    Ji-Yeon Kim, Hyun-Bin Oh, Dahun Kim, Tae-Hyun Oh

    CVPRW 2023 'Vision-Centric Autonomous Driving' Workshop

    [ paper ]

  • Dense Pixel-level Interpretation of Dynamic Scenes with Video Panoptic Segmentation

    Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

    TIP 2022
    Short version at 'What is Motion For' (WIMF) workshop @ ECCV 2022

    [ paper ]

  • TubeFormer-DeepLab: Video Mask Transformer

    Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen

    CVPR 2022
    Ranked #1 on SemKITTI-DVPS, #3 on KITTI-STEP, and #4 on VSPW 2021
    Short version at 'Transformers for Vision' workshop @ CVPR 2022

    [ paper ]

  • CMT-DeepLab: Dynamic Clustering Mask Transformers for Panoptic Segmentation

    Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    CVPR 2022 Oral presentation

    [ paper ]

  • Learning Open-World Object Proposals without Learning to Classify

    Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

    RA-L 2022 (presented at ICRA 2022)
    Invited paper talk at Open-World Segmentation (UVO) Workshop @ ICCV 2021
    Received Qualcomm Innovation Award 2021

    [ paper / code / tf2 / talk ]

  • Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Vishy Swaminathan, Henry Fuchs

    WACV 2022

    [ paper ]

  • Global Context and Geometric Priors for Effective Non-Local Self-Attention

    Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

    BMVC 2021
    Received Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd

    [ paper ]

  • DeepLab2: A TensorFlow Library for Deep Labeling

    Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan,
       Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

    Technical report, 2021
    Internal code contribution

    [ paper / code ]

  • Learning to Associate Every Segment for Video Panoptic Segmentation

    Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

    CVPR 2021

    [ paper ]

  • The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation

    Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon

    WACV 2021

    [ paper ]

  • Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

    Sanghyun Woo, Dahun Kim, KwanYoung Park, Joon-Young Lee, In So Kweon

    BMVC 2020 (Acceptance: 195/670 ≈ 29.1%)

    [ paper ]

  • Video Panoptic Segmentation

    Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

    CVPR 2020 Oral presentation (Acceptance: 335/6656 ≈ 5.0%)

    [ paper / code / project ]

  • Recurrent Temporal Aggregation Framework for Deep Video Inpainting

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    TPAMI 2020
    Received KAIST-Samsung Industry-University Cooperation Best Paper Award

    [ paper / code ]

  • Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

    Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

    AAAI 2020 (Acceptance: 1591/7737 ≈ 20.6%)

    [ paper ]

  • Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)

    1st place winner of ECCV 2018 ChaLearn LAP Video De-Captioning Challenge

    [ paper / code / video / project ]

  • Deep Video Inpainting

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)

    [ paper / code / video / project ]

  • Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

    Dahun Kim, Donghyeon Cho, In So Kweon

    AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)

    [ paper ]

  • Discriminative Feature Learning for Unsupervised Video Summarization

    Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon

    AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
    Received Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd

    [ paper ]

  • Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency

    Donghyeon Cho, Yunjae Jung, Francois Rameau, Dahun Kim, Sanghyun Woo and In So Kweon

    MM 2019 (Acceptance: 252/936 ≈ 26.9%)

    [ paper ]

  • Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation

    Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon

    MM 2019 (Acceptance: 252/936 ≈ 26.9%)

    [ paper ]

  • LinkNet: Relational Embedding for Scene Graph

    Sanghyun Woo*, Dahun Kim*, Donghyeon Cho, In So Kweon (* equal contribution)

    NeurIPS 2018 (Acceptance: 1011/4856 ≈ 20.8%)

    [ paper ]

  • Learning Image Representations by Completing Damaged Jigsaw Puzzles

    Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

    WACV 2018

    [ paper ]

  • Two Phase Learning for Weakly Supervised Object Localization

    Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

    ICCV 2017 (Acceptance: 621/2143 ≈ 28.9%)

    [ paper ]


    3D Representation - Avatar Modeling

  • Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

    Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

    ICLR 2023

    [ paper / project ]

  • Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

    Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

    NeurIPS 2021 Spotlight presentation (Acceptance: < 3.0%)
    Received Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd

    [ paper / code / project ]

  • Rotationally-Consistent Novel View Synthesis for Humans

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Henry Fuchs, Vishy Swaminathan

    MM 2020 (Acceptance: 472/1698 ≈ 27.8%)

    [ paper / dataset ]

  • Rotationally-Temporally Consistent Novel-View Synthesis of Human Performance Video

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Eunbyung Park, Vishy Swaminathan, Henry Fuchs

    ECCV 2020 Spotlight presentation (Acceptance: 265/5025 ≈ 5.3%)

    [ paper / dataset / code ]

Interns I have had the pleasure of working with

  • Shijie Wang in Winter 2023, Ph.D. student at Brown University.
    hosted with Weicheng Kuo
  • Runze Li in Summer 2022. Finished Ph.D. at UC Riverside. Now at Google.
    hosted with Weicheng Kuo
  • Inkyu Shin in Summer 2022, Ph.D. student at KAIST.
    hosted with Liang-Chieh Chen and Jun Xie

Awards & Honors

  • NSF travel award for Doctoral Consortium, CVPR 2022
  • Best Ph.D. Thesis Award, EE, KAIST, 2022
  • Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2022 ($5,000)
  • Qualcomm Innovation Award ($4,000), 2021
  • Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition, 2021
  • Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2021 ($5,000)
  • Outstanding Reviewer Award, European Conference on Computer Vision, 2020
  • KAIST-Samsung Industry-University Cooperation Best Paper Award ($3,000), 2020
  • Microsoft Research Asia (MSRA) Ph.D. Fellowship 2019 Winner ($10,000)
  • Global Ph.D. Fellowship, National Research Foundation of Korea ($60,000 + 3-year full scholarship)
  • 1st Place Award in ChaLearn LAP 2018 Inpainting Challenge Track 2 - Video Decaptioning (ECCV 2018 challenge)
  • Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2019 ($2,000)
  • International Computer Vision Summer School (ICVSS) 2018, Sicily, Italy

US Patents

  • Video Panoptic Segmentation (issued, 11,640,714)
  • Panoptic Segmentation (issued, 11,256,960)
  • Electronic Device and Control Method of Same (pending, 17/554,142)
  • Method and Device for Hierarchical Learning of Neural Network Based on Weakly Supervised Learning (pending, 16/758,089)