I am a Senior Research Scientist at Google DeepMind in Mountain View, CA.

My recent research interests are in improving the capabilities of Large Multimodal Models (e.g., Gemini) and in understanding the interaction of vision and language.

I obtained my Ph.D. and M.S. at KAIST, advised by Professor In So Kweon. I have been fortunate to collaborate with Adobe Research (2019), Google Brain (2020), and Google Research (2021). I am a recipient of the Microsoft Research Asia Fellowship, the Qualcomm Innovation Fellowship, and the Global Ph.D. Fellowship from the National Research Foundation of Korea.

Contact

  • mcahny01 [at] gmail.com

    mcahny [at] google.com

  • Googleplex, 1600 Amphitheatre Pkwy, Mountain View, CA 94043

Education

  • Ph.D. in EE, KAIST, 2022

    Thesis: "Learning Dense Pixel Features for Video Processing and Understanding"

  • M.S. in EE, KAIST, 2018

    Thesis: "Reducing Human Supervision in Supervised Learning"

  • B.S. in EE, KAIST, 2016

  • Exchange Student Program, 2014

    KTH Royal Institute of Technology in Stockholm, Sweden

Academic Activities

  • Area Chair, CVPR 2024
  • Area Chair, NeurIPS 2023
  • Area Chair, CVPR 2023
  • Outstanding Reviewer, CVPR 2021 and ECCV 2020
  • Reviewer for CVPR, NeurIPS, ICLR, ICCV, ECCV, ICML, AAAI, EG, TPAMI, TNNLS, TIP

Research Experiences

  • Google DeepMind, MTV, CA
    Apr 2023 - Present

    Senior Research Scientist (previously Research Scientist)
  • Google Brain, MTV, CA
    Jul 2022 - Apr 2023

    Research Scientist
  • Google Research, LA, CA (virtual)
    May 2021 - Jan 2022

    Research Intern, worked with Liang-Chieh Chen and Jun Xie
  • Google Brain, MTV, CA (virtual)
    Jun 2020 - Nov 2020

    Research Intern, worked with Weicheng Kuo, Tsung-Yi Lin, and Anelia Angelova
  • Adobe Research, San Jose, CA
    Jun 2019 - Sep 2019

    Research Intern, worked with Joon-Young Lee
  • KAIST, Daejeon, Korea
    Mar 2016 - Feb 2022

    Research Assistant, Robotics and Computer Vision Lab.

Publications


    Multimodal AI - Vision and Language

  • Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities

    AJ Piergiovanni*, Isaac Noble*, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova (* equal contribution)

    CVPR 2024
    Featured on the Google AI blog

    [ paper / Google blogpost ]

  • Contrastive Feature Masking Open-Vocabulary Vision Transformer

    Dahun Kim, Anelia Angelova, Weicheng Kuo

    ICCV 2023

    [ paper ]

  • Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

    Dahun Kim, Anelia Angelova, Weicheng Kuo

    CVPR 2023 Highlight presentation (top 2.5% of submissions)
    Featured on the Google AI blog

    [ paper / code / Google blogpost ]

  • MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

    Weicheng Kuo*, AJ Piergiovanni*, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova (* equal contribution)

    TMLR 2023
    Featured on the Google AI blog

    [ paper / Google blogpost ]

  • RECLIP: Resource-Efficient CLIP by Training with Small Images

    Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

    TMLR 2023

    [ paper ]

  • Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

    Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

    Preprint

    [ paper ]


    Perception - Object and Video Understanding

  • MinDVPS: Minimal Model for Depth-aware Video Panoptic Segmentation

    Ji-Yeon Kim, Hyun-Bin Oh, Dahun Kim, Tae-Hyun Oh

    RA-L 2024
    Short version at 'Vision-Centric Autonomous Driving' workshop @ CVPR 2023

    [ paper ]

  • Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

    Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    WACV 2024 Oral presentation
    Short version at 'Transformers for Vision' workshop @ CVPR 2023

    [ paper / video demo ]

  • Dense Pixel-level Interpretation of Dynamic Scenes with Video Panoptic Segmentation

    Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

    TIP 2022
    Short version at 'What is Motion For (WIMF)' workshop @ ECCV 2022

    [ paper ]

  • TubeFormer-DeepLab: Video Mask Transformer

    Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen

    CVPR 2022
    Ranked #1 on SemKITTI-DVPS, #3 on KITTI-STEP, and #4 on VSPW 2021
    Short version at 'Transformers for Vision' workshop @ CVPR 2022

    [ paper ]

  • CMT-DeepLab: Dynamic Clustering Mask Transformers for Panoptic Segmentation

    Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

    CVPR 2022 Oral presentation

    [ paper ]

  • Learning Open-World Object Proposals without Learning to Classify

    Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

    RA-L 2022 (presented at ICRA 2022)
    Invited paper talk at Open-World Segmentation (UVO) Workshop @ ICCV 2021
    Received Qualcomm Innovation Award 2021

    [ paper / code / tf2 / talk ]

  • Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Vishy Swaminathan, Henry Fuchs

    WACV 2022

    [ paper ]

  • Global Context and Geometric Priors for Effective Non-Local Self-Attention

    Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

    BMVC 2021
    Received Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd.

    [ paper ]

  • DeepLab2: A TensorFlow Library for Deep Labeling

    Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

    Technical report, 2021 (internal code contribution)

    [ paper / code ]

  • Learning to Associate Every Segment for Video Panoptic Segmentation

    Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

    CVPR 2021

    [ paper ]

  • The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation

    Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon

    WACV 2021

    [ paper ]

  • Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

    Sanghyun Woo, Dahun Kim, KwanYoung Park, Joon-Young Lee, In So Kweon

    BMVC 2020 (Acceptance: 195/670 ≈ 29.1%)

    [ paper ]

  • Video Panoptic Segmentation

    Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

    CVPR 2020 Oral presentation (Acceptance: 335/6656 ≈ 5.0%)

    [ paper / code / project ]

  • Recurrent Temporal Aggregation Framework for Deep Video Inpainting

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    TPAMI 2020
    Received KAIST-Samsung Industry-University Cooperation Best Paper Award

    [ paper / code ]

  • Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

    Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

    AAAI 2020 (Acceptance: 1591/7737 ≈ 20.6%)

    [ paper ]

  • Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)

    1st place winner of the ECCV 2018 ChaLearn LAP Video Decaptioning Challenge

    [ paper / code / video / project ]

  • Deep Video Inpainting

    Dahun Kim*, Sanghyun Woo*, Joon-Young Lee, In So Kweon (* equal contribution)

    CVPR 2019 (Acceptance: 1294/5160 ≈ 25.2%)

    [ paper / code / video / project ]

  • Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

    Dahun Kim, Donghyeon Cho, In So Kweon

    AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)

    [ paper ]

  • Discriminative Feature Learning for Unsupervised Video Summarization

    Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon

    AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
    Received Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd.

    [ paper ]

  • Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency

    Donghyeon Cho, Yunjae Jung, Francois Rameau, Dahun Kim, Sanghyun Woo, In So Kweon

    MM 2019 (Acceptance: 252/936 ≈ 26.9%)

    [ paper ]

  • Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation

    Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon

    MM 2019 (Acceptance: 252/936 ≈ 26.9%)

    [ paper ]

  • LinkNet: Relational Embedding for Scene Graph

    Sanghyun Woo*, Dahun Kim*, Donghyeon Cho, In So Kweon (* equal contribution)

    NeurIPS 2018 (Acceptance: 1011/4856 ≈ 20.8%)

    [ paper ]

  • Learning Image Representations by Completing Damaged Jigsaw Puzzles

    Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

    WACV 2018

    [ paper ]

  • Two Phase Learning for Weakly Supervised Object Localization

    Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

    ICCV 2017 (Acceptance: 621/2143 ≈ 29.0%)

    [ paper ]


    3D Representation - Avatar Modeling

  • Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

    Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

    ICLR 2023

    [ paper / project ]

  • Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

    Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

    NeurIPS 2021 Spotlight presentation (Acceptance: < 3.0%)
    Received Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd.

    [ paper / code / project ]

  • Rotationally-Consistent Novel View Synthesis for Humans

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Henry Fuchs, Vishy Swaminathan

    MM 2020 (Acceptance: 472/1698 ≈ 27.8%)

    [ paper / dataset ]

  • Rotationally-Temporally Consistent Novel-View Synthesis of Human Performance Video

    Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Eunbyung Park, Vishy Swaminathan, Henry Fuchs

    ECCV 2020 Spotlight presentation (Acceptance: 265/5025 ≈ 5.3%)

    [ paper / dataset / code ]

Interns I had the pleasure of working with

  • Shijie Wang in Winter 2023, Ph.D. student at Brown University.
    hosted with Weicheng Kuo
  • Runze Li in Summer 2022. Finished Ph.D. at UC Riverside. Now at Google.
    hosted with Weicheng Kuo
  • Inkyu Shin in Summer 2022, Ph.D. student at KAIST.
    hosted with Liang-Chieh Chen and Jun Xie

Awards & Honors

  • NSF travel award for Doctoral Consortium, CVPR 2022
  • Best Ph.D. Thesis Award, EE, KAIST, 2022
  • Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd., 2022 ($5,000)
  • Qualcomm Innovation Award ($4,000), 2021
  • Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition, 2021
  • Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd., 2021 ($5,000)
  • Outstanding Reviewer Award, European Conference on Computer Vision, 2020
  • KAIST-Samsung Industry-University Cooperation Best Paper Award ($3,000), 2020
  • Microsoft Research Asia (MSRA) Ph.D. Fellowship 2019 Winner ($10,000)
  • Global Ph.D. Fellowship, National Research Foundation of Korea ($60,000 + 3-year full scholarship)
  • 1st Place Award in ChaLearn LAP 2018 Inpainting Challenge Track 2 - Video Decaptioning (ECCV 2018 challenge)
  • Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd., 2019 ($2,000)
  • International Computer Vision Summer School (ICVSS) 2018, Sicily, Italy

US Patents

  • Video Panoptic Segmentation (issued, 11,640,714)
  • Panoptic Segmentation (issued, 11,256,960)
  • Electronic Device and Control Method of Same (pending, 17/554,142)
  • Method and Device for Hierarchical Learning of Neural Network Based on Weakly Supervised Learning (pending, 16/758,089)