Nguyen (Will) Nguyen

Currently, I am an Applied Scientist at Aitomatic. I graduated with a master's degree from the University of Rochester, where I was advised by Professor Chenliang Xu. During my time at the University of Rochester, I worked on instructional video understanding and vision-language problems.

I spent nearly three wonderful years at VinAI Research, where I worked on the scene text spotting problem under the supervision of Professor Nguyen Minh Hoai. I completed my bachelor's degree at the University of Engineering and Technology - VNU, where I had the opportunity to work with Professor Hoang Van Xiem.

Email  /  CV  /  Bio  /  Google Scholar  /  Twitter  /  Github

News
  • 07/2024: One paper accepted at ACM MM 2024.

  • 07/2024: I joined Aitomatic as an Applied Scientist.

  • 03/2024: One paper accepted at NAACL 2024.

  • 03/2024: I received research internship offers from Bosch AI Research and Amazon, USA.

  • 07/2023: One paper accepted at AV4D Workshop, ICCV 2023.

  • 08/2022: I started my journey at the University of Rochester in Fall 2022.

  • 01/2022: I began working as an AI Research Engineer with the applied team at VinAI Research.

  • 03/2021: One paper accepted at CVPR 2021.

  • 07/2020: I graduated with Distinction from Vietnam National University.

  • 12/2019: I started my journey with VinAI Research as an AI Research Resident.

Research

I'm interested in computer vision, natural language processing, and machine learning (especially deep learning). Much of my research lies in optical character recognition, focusing on scene text recognition. I am also fond of vision-language problems, such as understanding visual content using natural language. Recently, I have been working on developing domain-specific foundation models for industry.

EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, Chenliang Xu
ACM MM, 2024
project page / github / paper

A unified multimodal LLM designed to comprehensively solve all tasks related to egocentric video understanding.

OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
NAACL, 2024
github / paper

A new task of understanding the state of objects and the progression of object state changes. We also trained a multimodal LLM that significantly surpasses previous state-of-the-art models. Our model achieves 90% of the quality of GPT-4V in both GPT-4 and human evaluations.

Efficiently Leveraging Linguistics Knowledge for Scene Text Spotting
Nguyen Nguyen, Yapeng Tian, Chenliang Xu
arXiv
paper

A simple but effective approach for incorporating language knowledge from a large text corpus to improve both text detection and recognition.

MISAR: A Multimodal Instructional System with Augmented Reality
Jing Bi*, Nguyen Nguyen*, Ali Vosoughi*, Chenliang Xu (* equal contribution)
AV4D Workshop, ICCV, 2023
github / paper

A comprehensive system designed to guide humans to work more efficiently and accurately. Our system leverages the power of LLMs to interpret and process information from visual, auditory, and contextual dimensions.

Dictionary-guided Scene Text Recognition
Nguyen Nguyen, Thu Nguyen, Vinh Tran, Minh Triet Tran, Thanh Duc Ngo, Thien Huu Nguyen, Minh Hoai Nguyen
CVPR, 2021
project page / github / paper

A novel approach that incorporates a dictionary in both the training and testing phases. We also introduced a new Vietnamese scene text dataset (VinText), the largest scene text dataset for Vietnamese.

Services

Reviewer
WACV 2022, CVPR 2023, CVPR 2024, ACM MM 2024

Competition jury member (2021)
Ho Chi Minh City AI Challenge 2021: Vietnamese Scene Text Recognition

TA

Teaching assistant (2018 - 2019)
University of Engineering and Technology - Vietnam National University
Teaching assistant for several computer vision and machine learning courses for Samsung Display Vietnam’s staff

Short CV
  • Aitomatic July 2024 - present
    Applied Scientist
    Domain-specific Foundation Models
  • University of Rochester Aug. 2022 - May 2024
    Master's Student
    Department of Computer Science
  • VinAI Research Jan. 2022 - June 2022
    AI Research Engineer
    R&D Group
  • VinAI Research Dec. 2019 - Dec. 2021
    AI Research Resident
    Computer Vision Research Group
  • Teko Vietnam Apr. 2019 - Nov. 2019
    AI Engineer Intern
    Data Science Group
  • Vietnam National University 2016 - 2020
    B.S. Student
    Information Technology Department

This website uses source code from http://jonbarron.info and https://www.yapengtian.com