KiHyun Nam
남기현
Ph.D. Candidate
School of Electrical Engineering, KAIST

About me

I am a second-year Ph.D. student at KAIST, advised by Professor Joon Son Chung; I also earned my M.S. at KAIST. My research builds plug-and-play pathways for adding new modality channels—currently audio—to pretrained LLMs by bridging the modality gap at the representation level. I am exploring diffusion-based representation transport toward an omni-space to reduce reliance on expensive paired multimodal datasets, with strong interest in audio LLMs and duplex speech-to-speech systems. Previously, I worked on robust "in-the-wild" speech (bilingual/environmental mismatch), large-scale call-center ASR, and disentangled representation learning.

Experience

Deep Learning Research Intern, NAVER Clova Speech (now NAVER CLOUD), S. Korea

Sep. 2019 - Feb. 2020

Deep Learning Research Intern, NAVER Clova Speech (now NAVER CLOUD), S. Korea

Mar. 2021 - Sep. 2021

Education

Ph.D. in School of Electrical Engineering, KAIST

Sep. 2024 - Present

Advisor: Joon Son Chung (Multimodal AI Lab)

M.S. in School of Electrical Engineering, KAIST

Aug. 2022 - Aug. 2024

Advisor: Joon Son Chung (Multimodal AI Lab)

B.S. in Computer Science, Hankuk University of Foreign Studies (HUFS)

Mar. 2015 - Aug. 2022

Selected Awards

2024
  • NIST 2024 Speaker Recognition Evaluation – 1st Place (Audio Track) / 4th Place (Audio‑Visual Track) – Collaboration with Microsoft, KAIST MMAI Lab, PolyU, NUS and UEF

Publications

2025

Diffusion‑Link: Diffusion Probabilistic Model for Bridging the Audio‑Text Modality Gap
KiHyun Nam*, J. M. Choi*, H. K. Lee, J. W. Heo, and J. S. Chung
preprint, submitted to ICASSP 2026. Paper
SEED: Speaker Embedding Enhancement Diffusion Model
KiHyun Nam, J. W. Heo, J. W. Jung, G. Park, C. Jung, H. J. Yu, and J. S. Chung
INTERSPEECH, 2025. Paper Code

2024

Disentangled Representation Learning for Environment‑agnostic Speaker Recognition
KiHyun Nam, H. S. Heo, J. W. Jung, and J. S. Chung
INTERSPEECH, 2024. Paper Project Page Code
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
H. S. Heo, KiHyun Nam, B. J. Lee, Y. Kwon, M. Lee, Y. J. Kim, and J. S. Chung
ICASSP, 2024. Paper
TalkNCE: Improving Active Speaker Detection with Talk‑Aware Contrastive Learning
C. Jung*, S. Lee*, KiHyun Nam, K. Rho, Y. J. Kim, Y. Jang, and J. S. Chung
ICASSP, 2024. Paper
VoxMM: Rich Transcription of Conversations in the Wild
D. Kwak*, J. Jung*, KiHyun Nam, Y. Jang, J. W. Jung, S. Watanabe, and J. S. Chung
ICASSP, 2024. Paper Dataset

2023

Disentangled Representation Learning for Multilingual Speaker Recognition
KiHyun Nam*, Y. Kim*, J. Huh, H. S. Heo, J. W. Jung, and J. S. Chung
INTERSPEECH, 2023. Paper Project Page

2020

ClovaCall: Korean Goal‑Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
J. Ha*, KiHyun Nam*, J. Kang, S. Lee, S. Yang, H. Jung, H. Kim, E. Kim, S. Kim, H. A. Kim, K. Doh, C. K. Lee, N. Sung, S. Kim
INTERSPEECH, 2020. Paper Code

Contact

  nkh.mmai (at) kaist.ac.kr
  Room 3103, N24 (LG Innovation Hall)
