Welcome!
Hi, I’m HyoJung. I am a 4th year Ph.D. student in Computer Science at the University of Maryland, College Park (UMD) advised by Marine Carpuat and Jordan Boyd-Graber. I participate in the Computational Linguistics and Information Processing (CLIP) Lab. In 2024, I am interning with Huda Khayrallah and Akiko Eriguchi at Microsoft. In 2023, I interned with Changhan Wang at Meta FAIR.
I am expecting to graduate in late 2025 or early 2026.
I’m interested in multilingual and multimodal NLP and its evaluation methods, with the goal of tackling language barriers and even background knowledge gaps. Here are the subareas that I’ve worked on or am working on:
- Machine translation models across ANY modality, such as text-to-text, speech-to-text, or audio-visual-to-text, and in diverse usage settings such as simultaneous or offline (full-input) MT.
- Multilingual language models: What is multilinguality in LLMs, and how do we define and measure it? How does tokenization impact multilinguality in LLMs? Which non-English languages benefit the most from vocabulary transfer? Also, how does the MT ability of an LLM correlate with its performance on non-MT multilingual tasks, such as multilingual question answering accuracy, and why do we see those gaps? How do LLMs process multiple languages?
- Evaluation of cross-lingual tasks: How do we evaluate simultaneous translation well? How do we estimate the quality of speech translation output without a reference? In MT-as-a-tool situations, such as a multilingual downstream task, do popular MT metric scores (like MetricX) correlate with final goal performance?
- Culture-aware NLP: Is literal translation what people want? How can we provide translations that account for cultural differences so that people better understand the MT output?
- Mental model of the user in human+AI team scenarios: How does the user’s mental model of MT affect how smartly they use the MT tool to get the best human+AI team performance? How can we train users to have a good mental model, that is, to know when MT succeeds or fails?
All of the above is super interesting, but I am also very open to doing research on new topics!
Before my Ph.D., I was a research engineer at Samsung Research (SR), the advanced R&D hub of Samsung Electronics. I mainly worked on simultaneous and offline speech/text translation in the Efficient Neural Machine Translation Team at SR. I completed my M.S. at KAIST.
For more details about me, here’s my Curriculum Vitae.
Preprint
VocADT - Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han, Akiko Eriguchi, Haoran Xu, Hieu Hoang, Marine Carpuat and Huda Khayrallah.
arXiv 2024.
[arXiv][code][model]
Recent Publications
SpeechQE: Estimating the Quality of Direct Speech Translation
HyoJung Han, Kevin Duh and Marine Carpuat.
EMNLP 2024. (main, long)
[arXiv][poster][video][code][model&dataset]
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception.
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi and Changhan Wang.
ACL 2024. (main, long)
[arXiv][poster][video]
Bridging Background Knowledge Gaps in Translation with Automatic Explicitation.
HyoJung Han, Jordan Boyd-Graber and Marine Carpuat.
EMNLP 2023. (main, long)
[arXiv][poster][video][dataset]
SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering.
HyoJung Han, Marine Carpuat and Jordan Boyd-Graber.
EMNLP 2022. (main, long)
[pdf][poster][video][dataset]
News
- Nov/2024 - Presented SpeechQE at EMNLP2024, Miami!
- Nov/2024 - Invited to give a remote talk at KIT (Karlsruhe Institute of Technology) about XLAVS-R and SpeechQE.
- Oct/2024 - VocADT arXiv paper is out! Thanks to my manager and all the authors I worked with during my Microsoft internship.
- Sep/2024 - SpeechQE accepted at EMNLP 2024 main conference AGAINx2!
- Aug/2024 - Presented XLAVS-R at ACL2024, Bangkok! Also presented Automatic Explicitation at C3NLP@ACL.
- May/2024 - Started a research internship at Microsoft in Redmond!
- May/2024 - XLAVS-R accepted at ACL 2024 main conference
- Mar/2024 - XLAVS-R arXiv paper is out! Thanks to my manager and all the authors I worked with during my Meta AI internship.
- Dec/2023 - Presented Automatic Explicitation at EMNLP2023, Singapore
- Oct/2023 - Automatic Explicitation accepted at EMNLP 2023 main conference AGAIN!
- May/2023 - Started as a Research Scientist Intern at Meta FAIR in New York City!
- Jan/2023 - I won the UMD Graduate School’s Outstanding Graduate Assistant Award! All thanks to my advisors and the CS department!
- Dec/2022 - Presented SimQA at EMNLP2022, Abu Dhabi
- Oct/2022 - SimQA accepted at EMNLP 2022 main conference!
- Sep/2021 - Paper accepted at WMT 2021.
- Aug/2021 - I am excited to start as a PhD student @UMD!
- Jan/2021 - Paper accepted at ICASSP 2021.
- Dec/2020 - Another arXiv paper is out
- Oct/2020 - I won Samsung Best Paper Awards 2020 as a first author! Thank you for the praise and prize!
- July/2020 - Invited as a panelist for the IWSLT 2020 Simultaneous Speech Translation task panel discussion. See y’all on July 9th at 12:30 PST!
- July/2020 - Our submission achieved the best score in the low-latency mode of the IWSLT 2020 Simultaneous Speech Translation task!
- May/2020 - Two papers accepted at IWSLT 2020.
- April/2020 - Gave a talk at PML4DC@ICLR2020.
- Jan/2020 - Paper accepted at ICASSP 2020.
HyoJung Han ← HouJeung Han
To avoid any confusion, I want to clarify that my romanized name has been officially changed from “HouJeung Han” to “HyoJung Han” as of July 2020 (same Korean name, “한효정”). Once the Amendment to the Enforcement Decree of the Passport Act made the revision possible, I decided to change it despite the mess afterward. The main reason is the obvious discrepancy between the original Korean pronunciation (close to HyoJung) and the previous romanized name (HouJeung), along with the problems it caused, ranging from minor to critical.
I know most of my listed publications are under “HouJeung”, but it’s me, “HyoJung”!