# On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning

## Repository Structure

```
./
├── README.md
└── src
    ├── llama_xformers_attn_monkey_patch_update.py
    ├── llama_xformers_attn_monkey_patch_versionupdate.py
    ├── make_lmdb.py
    └── model_vqa_parsing.py
```

## Overview

This repository contains the key code modifications we made to the official LLaVA codebase for our research. We highlight these adjustments to help others reproduce our experiments.

### Key Modifications

1. **LMDB Integration**: Stores the REncoder training data (3M records) in LMDB for efficient loading. (`make_lmdb.py`)

2. **Library Version Updates**: Changes required for compatibility with the library versions that Phi-3 needs. (`llama_xformers_attn_monkey_patch_update.py`, `llama_xformers_attn_monkey_patch_versionupdate.py`)

3. **Model Customization**: Adjustments for evaluating on parsing tasks. (`model_vqa_parsing.py`)
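For readers unfamiliar with the LMDB approach, the sketch below shows one common way to pack a large set of training records into a single LMDB environment and read them back by index. It is a minimal illustration, not the contents of `make_lmdb.py`: it assumes the `lmdb` Python package (py-lmdb) and a hypothetical JSON-serializable record schema, and the function names, zero-padded key format, `__len__` entry, and commit batch size are our own illustrative choices.

```python
import json
from typing import Any, Dict, Iterable


def encode_key(index: int) -> bytes:
    # Zero-padded keys make LMDB's lexicographic byte order match numeric order.
    return f"{index:09d}".encode("ascii")


def encode_record(record: Dict[str, Any]) -> bytes:
    # Hypothetical schema: any JSON-serializable dict (e.g. image path + text).
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def decode_record(value: bytes) -> Dict[str, Any]:
    return json.loads(value.decode("utf-8"))


def build_lmdb(path: str, records: Iterable[Dict[str, Any]],
               map_size: int = 1 << 40, commit_every: int = 10_000) -> int:
    """Write records into an LMDB environment and return how many were written."""
    import lmdb  # imported lazily so the helpers above work without py-lmdb

    env = lmdb.open(path, map_size=map_size)  # map_size is an upper bound, not preallocated
    count = 0
    txn = env.begin(write=True)
    try:
        for count, record in enumerate(records, start=1):
            txn.put(encode_key(count - 1), encode_record(record))
            # Commit in batches so a single write transaction does not grow unbounded.
            if count % commit_every == 0:
                txn.commit()
                txn = env.begin(write=True)
        # Store the total so readers can answer len() without scanning all keys.
        txn.put(b"__len__", str(count).encode("ascii"))
        txn.commit()
    except BaseException:
        txn.abort()  # no-op if the transaction already finished
        raise
    finally:
        env.close()
    return count


class LmdbRecords:
    """Read-only, index-addressable view over an environment written by build_lmdb."""

    def __init__(self, path: str) -> None:
        import lmdb

        # lock=False / readonly=True allow many reader processes on the same files.
        self.env = lmdb.open(path, readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin() as txn:
            self.length = int(txn.get(b"__len__"))

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> Dict[str, Any]:
        with self.env.begin() as txn:
            return decode_record(txn.get(encode_key(index)))
```

Keeping millions of small records in one memory-mapped file avoids per-file open overhead during training. When `LmdbRecords` backs a PyTorch `Dataset` used with multiple `DataLoader` workers, the environment is often opened lazily inside each worker rather than in the parent process.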

## Notes

Due to anonymization, this repository does not include a fully working version, but it should still serve as a useful reference. We plan to release a fully open-source version in the future.