Nguyen (Will) Nguyen

I am currently a Ph.D. student at the University of Rochester, advised by Professor Chenliang Xu, where I work on instructional video understanding and vision-language problems.

I spent nearly three wonderful years at VinAI Research, where I worked on scene text spotting under the supervision of Professor Nguyen Minh Hoai. I received my bachelor's degree from the University of Engineering and Technology - VNU, where I had the chance to work with Professor Hoang Van Xiem.

Email  /  CV  /  Bio  /  Google Scholar  /  Twitter  /  Github

  • 03/2024: One paper accepted at NAACL 2024.

  • 03/2024: I received research internship offers from Bosch AI Research and Amazon, USA.

  • 07/2023: One paper accepted at AV4D Workshop, ICCV 2023.

  • 08/2022: I started my journey at the University of Rochester in Fall 2022.

  • 01/2022: I began working as an AI Research Engineer with the applied team at VinAI Research.

  • 03/2021: One paper accepted at CVPR 2021.

  • 07/2020: I graduated with Distinction from Vietnam National University.

  • 12/2019: I started my journey with VinAI Research as an AI Research Resident.


I'm interested in computer vision, natural language processing, and machine learning (especially deep learning). Much of my research is in optical character recognition, with a focus on scene text recognition. I am also interested in vision-language problems, such as understanding visual content through natural language.

Under review, 2024
github / paper

A single multimodal LLM designed for comprehensively solving all tasks related to egocentric video understanding.

OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
NAACL (Findings), 2024
github / paper

A new task: understanding the state of objects and how those states change over time. We also trained a multimodal LLM that significantly surpasses previous state-of-the-art models, reaching 90% of the quality of GPT-4V in both GPT-4 and human evaluations.

Efficiently Leveraging Linguistics Knowledge for Scene Text Spotting
Nguyen Nguyen, Yapeng Tian, Chenliang Xu
Under review

A simple but effective approach that incorporates language knowledge from a large text corpus to improve both text detection and recognition.

MISAR: A Multimodal Instructional System with Augmented Reality
Jing Bi*, Nguyen Nguyen*, Ali Vosoughi*, Chenliang Xu (* equal contribution)
AV4D Workshop, ICCV, 2023
github / paper

A comprehensive system designed to guide humans to work more efficiently and accurately. Our system leverages the power of LLMs to interpret and process information from visual, auditory, and contextual dimensions.

Dictionary-guided Scene Text Recognition
Nguyen Nguyen, Thu Nguyen, Vinh Tran, Minh Triet Tran, Thanh Duc Ngo, Thien Huu Nguyen, Minh Hoai Nguyen
CVPR, 2021
project page / github / paper

A novel approach that incorporates a dictionary in both the training and testing phases. We also introduced VinText, a new Vietnamese scene text dataset and the largest scene text dataset for Vietnamese.


WACV 2022, CVPR 2023, CVPR 2024, ACMMM 2024

Competition jury member (2021)
Ho Chi Minh City AI Challenge 2021: Vietnamese Scene Text Recognition


Teaching assistant (2018 - 2019)
University of Engineering and Technology - Vietnam National University
Teaching assistant for several computer vision and machine learning courses offered to Samsung Display Vietnam staff

Short CV
  • University of Rochester 2022 - present
    Ph.D. Student
    Department of Computer Science
  • Bosch Research May 2024 - Aug. 2024
    Research Intern
    Audio Group
  • VinAI Research Jan. 2022 - June 2022
    AI Research Engineer
    R&D Group
  • VinAI Research Dec. 2019 - Dec. 2021
    AI Research Resident
    Computer Vision Research Group
  • Teko Vietnam Apr. 2019 - Nov. 2019
    AI Engineer Intern
    Data Science Group
  • Vietnam National University 2016 - 2020
    B.S. Student
    Information Technology Department
