KiHyun Nam

About me

I am a second-year Ph.D. candidate at KAIST advised by Professor Joon Son Chung, and I earned my M.S. from KAIST. My research focuses on audio representation learning and its connection to Audio-LLM systems. I am interested in how audio representations can be made robust under real-world variability, transformed across representation spaces, and exposed to language-model interfaces.

My work started from robust speech and speaker representation learning, where audio models must preserve task-relevant information while handling language, environment, session, channel, and recording mismatch. Building on this foundation, I have recently explored generative representation alignment: using diffusion-based methods to enhance or transform audio representations at the latent level. In this direction, SEED studies speaker embedding enhancement, while Diffusion-Link bridges the audio-text modality gap by transporting audio representations toward text-aligned spaces for Audio-LLMs and multimodal LLMs.

More recently, I have been interested in specialized Audio-LLM systems. As voice becomes a primary interface for robots, smart glasses, wearables, vehicles, and other physical AI systems, Audio-LLMs will need capabilities beyond ASR. My recent work, SpeakerLLM, explores this direction by enabling an Audio-LLM to understand speaker identity, recording conditions, utterance-pair relations, and verification reasoning through a natural-language interface.

Going forward, I am interested in building audio-native intelligence for voice-first systems, including speaker-aware Audio-LLMs, audio-text representation alignment, and full-duplex speech-to-speech agents. My long-term goal is to build Audio-LLM systems that use audio not only as transcribed text, but as rich evidence for reasoning, personalization, and real-time interaction.

Experience

Deep Learning Research Intern, NAVER Clova Speech (now NAVER CLOUD), S. Korea

Sep. 2019 - Feb. 2020

Deep Learning Research Intern, NAVER Clova Speech (now NAVER CLOUD), S. Korea

Mar. 2021 - Sep. 2021

Education

Ph.D. in School of Electrical Engineering, KAIST

Sep. 2024 - Present

Advisor: Joon Son Chung (Multimodal AI Lab)

M.S. in School of Electrical Engineering, KAIST

Aug. 2022 - Aug. 2024

Advisor: Joon Son Chung (Multimodal AI Lab)

B.S. in Computer Science, Hankuk University of Foreign Studies (HUFS)

Mar. 2015 - Aug. 2022

Selected Awards

2024

NIST 2024 Speaker Recognition Evaluation – 1st Place (Audio Track) / 4th Place (Audio‑Visual Track) – Collaboration with Microsoft, KAIST MMAI Lab, PolyU, NUS and UEF

Publications

2026

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
KiHyun Nam, J. W. Heo, S. Bae, H. J. Yu, and J. S. Chung
Preprint, 2026. Paper

Diffusion‑Link: Diffusion Probabilistic Model for Bridging the Audio‑Text Modality Gap
KiHyun Nam^*, J. M. Choi^*, H. K. Lee, J. W. Heo, and J. S. Chung
ICASSP, 2026. Paper

2025

SEED: Speaker Embedding Enhancement Diffusion Model
KiHyun Nam, J. W. Heo, J. W. Jung, G. Park, C. Jung, H. J. Yu, and J. S. Chung
INTERSPEECH, 2025. Paper Code

2024

Disentangled Representation Learning for Environment‑agnostic Speaker Recognition
KiHyun Nam, H. S. Heo, J. W. Jung, and J. S. Chung
INTERSPEECH, 2024. Paper Project Page Code

Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
H. S. Heo, KiHyun Nam, B. J. Lee, Y. Kwon, M. Lee, Y. J. Kim, and J. S. Chung
ICASSP, 2024. Paper

TalkNCE: Improving Active Speaker Detection with Talk‑Aware Contrastive Learning
C. Jung^*, S. Lee^*, KiHyun Nam, K. Rho, Y. J. Kim, Y. Jang, and J. S. Chung
ICASSP, 2024. Paper

VoxMM: Rich Transcription of Conversations in the Wild
D. Kwak^*, J. Jung^*, KiHyun Nam, Y. Jang, J. W. Jung, S. Watanabe, and J. S. Chung
ICASSP, 2024. Paper Dataset

2023

Disentangled Representation Learning for Multilingual Speaker Recognition
KiHyun Nam^*, Y. Kim^*, J. Huh, H. S. Heo, J. W. Jung, and J. S. Chung
INTERSPEECH, 2023. Paper Project Page

2020

ClovaCall: Korean Goal‑Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
J. Ha^*, KiHyun Nam^*, J. Kang, S. Lee, S. Yang, H. Jung, H. Kim, E. Kim, S. Kim, H. A. Kim, K. Doh, C. K. Lee, N. Sung, and S. Kim
INTERSPEECH, 2020. Paper Code