Code
Public training, inference, evaluation, and data-preparation code, packaged with UV-based install instructions.
Object State Captioning and State Change Representation for egocentric video. The public release includes training code, inference and evaluation code, a Hugging Face dataset, and model weights for the OSCaR benchmark.
OSCaR assets, manifests, benchmark splits, and metadata prepared for the Hugging Face dataset release.
Released projector checkpoints and LoRA adapter repos, plus code to build merged models locally if needed.
huggingface-cli download ali-vosoughi/oscar-llava-v1.5-13b-oscar-adapter --local-dir ../oscar-llava-v1.5-13b-oscar-adapter
Then load it with --model-base lmsys/vicuna-13b-v1.5.
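If a merged checkpoint is preferred over loading the adapter at runtime, the mechanism can be sketched with `peft`. This is only an illustration of the generic LoRA merge step, under the assumption that the language-model weights are what is being merged; the repository's own build scripts (which also handle the multimodal projector) are authoritative, and the `-merged` output directory name is a local convention, not part of the release.

```python
from pathlib import Path


def merged_output_dir(adapter_dir: str) -> Path:
    """Derive a sibling output directory for the merged model.

    The "-merged" suffix is a local naming convention, not part of the release.
    """
    p = Path(adapter_dir)
    return p.with_name(p.name + "-merged")


def merge_adapter(base_id: str, adapter_dir: str) -> Path:
    """Fold a LoRA adapter's deltas into its base model and save the result.

    Covers the language-model weights only; the released build scripts
    are what actually produce a usable merged OSCaR model.
    """
    # Imported lazily so the path helper above works without the heavy deps.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_dir)
    model = model.merge_and_unload()  # bake LoRA deltas into the base weights

    out = merged_output_dir(adapter_dir)
    model.save_pretrained(out)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out)
    return out


# Usage (downloads the base model; run only when that is intended):
#   merge_adapter("lmsys/vicuna-13b-v1.5", "../oscar-llava-v1.5-13b-oscar-adapter")
```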
huggingface-cli download ali-vosoughi/oscar-dataset --repo-type dataset --local-dir ../oscar-dataset
Use DATASET_ROOT=../oscar-dataset and PATH_PREFIX=$DATASET_ROOT/data.
Download a released projector repo, point MM_PROJECTOR_PATH to mm_projector.bin, then run the public fine-tune scripts.
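A minimal pre-flight check before launching fine-tuning can save a failed run. The MM_PROJECTOR_PATH variable name comes from the instructions above; the validation helper itself is just a convenience sketch, not part of the released scripts.

```python
import os
from pathlib import Path


def resolve_projector(env_var: str = "MM_PROJECTOR_PATH") -> Path:
    """Fail fast if the projector checkpoint is missing or mis-pointed."""
    raw = os.environ.get(env_var)
    if not raw:
        raise EnvironmentError(f"{env_var} is not set; point it at mm_projector.bin")
    path = Path(raw)
    if path.name != "mm_projector.bin":
        raise ValueError(f"{env_var} should point at mm_projector.bin, got {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"no projector checkpoint at {path}")
    return path
```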
The released OSCaR frames and clip-level assets are derived from EPIC-KITCHENS and Ego4D source videos.
Download the Hugging Face dataset locally, set DATASET_ROOT, then use the preserved manifests and splits with the public training and evaluation scripts.
huggingface-cli download ali-vosoughi/oscar-dataset --repo-type dataset --local-dir ../oscar-dataset
export DATASET_ROOT=../oscar-dataset
export PATH_PREFIX="$DATASET_ROOT/data"
The main entry files are manifests/llava_data.json, splits/data_mapping_final_EK_test.csv, and metadata/segment_index.csv.
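The steps above can be sketched as a single loader. The entry-file paths come from the list above; the record fields inside the JSON (image) follow the common LLaVA manifest convention and are an assumption here, not a documented schema.

```python
import csv
import json
import os
from pathlib import Path


def load_release(dataset_root: str) -> dict:
    """Load the main entry files from a local copy of the dataset release.

    The ``image`` field name is assumed from the usual LLaVA manifest
    convention; check the actual records before relying on it.
    """
    root = Path(dataset_root)
    # PATH_PREFIX defaults to $DATASET_ROOT/data, matching the exports above.
    path_prefix = Path(os.environ.get("PATH_PREFIX", root / "data"))

    manifest = json.loads((root / "manifests" / "llava_data.json").read_text())
    with open(root / "splits" / "data_mapping_final_EK_test.csv", newline="") as f:
        test_split = list(csv.DictReader(f))

    # Resolve any relative asset paths against PATH_PREFIX.
    for rec in manifest:
        if "image" in rec and not os.path.isabs(rec["image"]):
            rec["image"] = str(path_prefix / rec["image"])
    return {"manifest": manifest, "test_split": test_split}
```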
Install the public environment with UV:
uv venv --python 3.10 .venv && source .venv/bin/activate && uv pip install -e ".[train,inference,eval,release]"
Training, inference, and evaluation entrypoints are documented in the repository root:
Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
OSCaR builds on the LLaVA codebase and training stack. We thank the LLaVA team for their open-source release and the strong baseline it provided for this work.
OSCaR also builds on source video data from EPIC-KITCHENS and Ego4D. We thank those teams for releasing the datasets from which the OSCaR frames and clips are derived.
Approved for public release; distribution is unlimited.
This work has been supported by the Defense Advanced Research Projects Agency (DARPA) under Contract HR00112220003. The content of the information does not necessarily reflect the position of the Government, and no official endorsement should be inferred.
OSCaR is released as a coordinated GitHub and Hugging Face project: the code repository intentionally excludes large data and weight payloads, which live in the Hugging Face dataset and model repos above.