Nguyen (Will) Nguyen

Currently, I am a Senior Applied Scientist at Aitomatic. I graduated with a master's degree from the University of Rochester, where I was advised by Professor Chenliang Xu. During my time at the University of Rochester, I worked on instructional video understanding and vision-language problems.

I spent nearly three wonderful years at VinAI Research, where I worked on scene text spotting under the supervision of Professor Nguyen Minh Hoai. I completed my bachelor's degree at the University of Engineering and Technology - VNU, where I had the opportunity to work with Professor Hoang Van Xiem.

Email / CV / LinkedIn / Bio / Google Scholar / Twitter / GitHub
✨ Chat with my AI Twin ✨

News

03/2025: One paper accepted at JSAI 2025.

12/2024: One paper accepted at OSAI4MU-25 Workshop, AAAI 2025.

07/2024: Our SemiKong work was featured by VentureBeat, MSN, the Meta AI Blog, Tom's Hardware, MarkTechPost, Digialps, Gadgets360, and many other media outlets, and was shared by Yann LeCun.

07/2024: One paper accepted at ACM MM 2024.

07/2024: I joined Aitomatic as a Senior Applied Scientist.

03/2024: One paper accepted at NAACL 2024.

03/2024: I received research internship offers from Bosch AI Research and Amazon, USA.

07/2023: One paper accepted at AV4D Workshop, ICCV 2023.

08/2022: I started my master's studies at the University of Rochester in Fall 2022.

01/2022: I began working as an AI Research Engineer with the applied team at VinAI Research.

03/2021: One paper accepted at CVPR 2021.

07/2020: I graduated with Distinction from Vietnam National University.

12/2019: I started my journey with VinAI Research as an AI Research Resident.

Research

I'm interested in computer vision, natural language processing, and machine learning (especially deep learning). Much of my research lies in optical character recognition, with a focus on scene text recognition. I am also interested in vision-language problems, such as understanding visual content through natural language. Recently, I have been working on developing domain-specific foundation models for industry.

Llamarine: Open-source Maritime Industry-specific Large Language Model
William Nguyen, An Phan, Konobu Kimura, Hitoshi Maeno, Mika Tanaka, Quynh Le, William Poucher, Christopher Nguyen
Annual Conference of the Japanese Society for Artificial Intelligence, 2025
paper

The first open-source domain-specific LLM for the maritime industry. Our model outperforms several commercial models, including GPT-4o-mini, GPT-4o, and Claude-3.5-Sonnet, as well as open-source models such as Llama3.1 8B, Llama3.1 70B, and Llama3.3 70B.

SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model
Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Hai Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama
OSAI4MU-25 Workshop, AAAI, 2025
project page / github / paper

The first open-source domain-specific LLM for the semiconductor industry. Our model outperforms several commercial models, including Claude-3.5-Sonnet, Haiku, Opus, and Command-R, as well as open-source models such as Llama3 70B.

DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy
Vinh Luong, Sang Dinh, Shruti Raghavan, William Nguyen, Zooey Nguyen, Quynh Le, Hung Vo, Kentaro Maegaito, Loc Nguyen, Thao Nguyen, Anh Hai Ha, Christopher Nguyen
preprint, 2024
project page / github / paper

A unified agentic framework that incorporates expert knowledge through a neurosymbolic approach to create domain-specific agents with high consistency and accuracy, significantly outperforming the ChatGPT assistant and ReAct agents.

EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, Chenliang Xu
ACM MM, 2024
paper

A unified multimodal LLM designed to comprehensively solve all tasks related to egocentric video understanding.

OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
NAACL, 2024
github / paper

A new task for understanding object states and how they change over time. We also trained a multimodal LLM that significantly surpasses previous state-of-the-art models and reaches 90% of GPT-4V's quality under both GPT-4 and human evaluations.

Efficiently Leveraging Linguistics Knowledge for Scene Text Spotting
Nguyen Nguyen, Yapeng Tian, Chenliang Xu
arXiv
paper

A simple yet effective approach that incorporates linguistic knowledge from a large text corpus to improve both text detection and recognition.

MISAR: A Multimodal Instructional System with Augmented Reality
Jing Bi*, Nguyen Nguyen*, Ali Vosoughi*, Chenliang Xu (* equal contribution)
AV4D Workshop, ICCV, 2023
github / paper

A comprehensive system designed to help people perform tasks more efficiently and accurately. It leverages LLMs to interpret and process visual, auditory, and contextual information.

Dictionary-guided Scene Text Recognition
Nguyen Nguyen, Thu Nguyen, Vinh Tran, Minh Triet Tran, Thanh Duc Ngo, Thien Huu Nguyen, Minh Hoai Nguyen
CVPR, 2021
project page / github / paper

A novel approach that incorporates a dictionary in both the training and testing phases. We also introduced VinText, the largest scene text dataset for Vietnamese.

Patents

Delivering Domain-Expert Agents and Models Using Synthetic Knowledge
Christopher Nguyen, Manh-Nguyen Nguyen, Hong An Phan, Zooey Nhu-Quynh Nguyen, The-Vinh Luong, Elise Nhu-Y Nguyen, Thomas Rasmussen, Anh Hai Ha, Phi-Hung Vo, Xuan-Sang Dinh, Huy-Thuan Bui, Anh-Quoc Dang, Timothy Michael Gerard Rozario
US Patent App. 63/726,322, 2024.

Delivering Domain-Expert Agents for Improving Problem-Solving
Christopher Nguyen, Manh-Nguyen Nguyen, Hong An Phan, Zooey Nhu-Quynh Nguyen, The-Vinh Luong, Elise Nhu-Y Nguyen, Thomas Rasmussen, Anh Hai Ha, Phi-Hung Vo, Xuan-Sang Dinh, Huy-Thuan Bui, Anh-Quoc Dang
US Patent App. 63/721,419, 2024.

Domain-Aware Neurosymbolic Agents For Improving Problem-Solving Accuracy And Consistency
Christopher Nguyen, The Vinh Luong, Xuan Sang Dinh, Zooey Nhu-Quynh Nguyen, Shruti Raghavan, Manh-Nguyen Nguyen, Quynh Thi-Tham Le, Phi Hung Vo, Tan Loc Nguyen, Anh Hai Ha, Phuong Thao Nguyen
US Patent App. 63/696,337, 2024.

Master's Thesis

State-aware Object Understanding
[Master's Thesis]

Services

	Reviewer: WACV 2022, CVPR 2023, CVPR 2024, ACM MM 2024, AAAI 2025, NAACL 2025, CVPR 2025, ACL 2025
	Competition jury member (2021): Ho Chi Minh City AI Challenge 2021, Vietnamese Scene Text Recognition
	Teaching assistant (2018 - 2019), University of Engineering and Technology - Vietnam National University: several computer vision and machine learning courses for Samsung Display Vietnam staff

Short CV

Aitomatic Jul. 2024 - present

Senior Applied Scientist
Domain-specific Foundation Models and Agents
University of Rochester Aug. 2022 - May 2024

Master's Student
Department of Computer Science
VinAI Research Jan. 2022 - Jun. 2022

AI Research Engineer
R&D Group
VinAI Research Dec. 2019 - Dec. 2021

AI Research Resident
Computer Vision Research Group
Teko Vietnam Apr. 2019 - Nov. 2019

AI Engineer Intern
Data Science Group
Vietnam National University 2016 - 2020

B.S. Student
Information Technology Department

This website uses source code from http://jonbarron.info and https://www.yapengtian.com