# Dataset Release: CORD-Instruct and Parsing-Bench

Associated with the paper: *On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning*

## Repository Structure

```
.
├── CORD-Instruct-with-RRPrompt.json
├── CORD-Instruct.json
├── Parsing-Bench
│   ├── answers
│   ├── answers_gpt4o.jsonl
│   ├── context.jsonl
│   ├── images
│   ├── questions.jsonl
│   └── reviews
└── README.md
```

## Overview

This repository includes:
1. **CORD-Instruct Metadata**: `CORD-Instruct.json`
2. **CORD-Instruct with RRPrompt**: `CORD-Instruct-with-RRPrompt.json`
3. **Parsing-Bench Components**: the files needed to run Parsing-Bench, located under `Parsing-Bench/`.

### CORD-Instruct

- `CORD-Instruct.json`: Metadata for the CORD-Instruct dataset.
- `CORD-Instruct-with-RRPrompt.json`: Metadata with RRPrompt applied.
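
Since this README does not document the record schema, it can help to inspect the top-level structure of a metadata file before writing any processing code. The following is a minimal sketch using only the Python standard library. The helper names `load_metadata` and `summarize` are illustrative and are not part of this release, and no particular field names are assumed.

```python
import json
from pathlib import Path


def load_metadata(path):
    """Parse a JSON metadata file and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Describe the top-level structure without assuming any schema."""
    if isinstance(data, list):
        first = data[0] if data else None
        # Only report keys when the first record is a JSON object.
        keys = sorted(first.keys()) if isinstance(first, dict) else []
        return {"type": "list", "length": len(data), "first_record_keys": keys}
    if isinstance(data, dict):
        # Cap the key listing so large mappings stay readable.
        return {"type": "dict", "length": len(data), "keys": sorted(data.keys())[:10]}
    return {"type": type(data).__name__}
```

For example, `summarize(load_metadata("CORD-Instruct.json"))` reports whether the file is a list or a mapping and which keys appear first. The same call works for `CORD-Instruct-with-RRPrompt.json`.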

### Parsing-Bench

- `answers_gpt4o.jsonl`: Answers generated using the GPT-4o model.
- `context.jsonl`: Contextual information associated with the questions.
- `images`: Directory where the benchmark images should be placed (the images themselves are not included in this repository).
- `questions.jsonl`: List of questions for Parsing-Bench evaluation.
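
The `.jsonl` files listed above can be read one JSON object per line. The sketch below shows one way to load them and index the records by a shared key. The key name `question_id` in the usage note is an assumption (this README does not specify the schema), and `read_jsonl` and `index_by` are illustrative helpers rather than part of this release.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so malformed files are easy to fix.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


def index_by(records, key):
    """Map each record's value for `key` to the record; records lacking `key` are skipped."""
    return {record[key]: record for record in records if key in record}
```

Assuming the records do share a `question_id` field, `index_by(read_jsonl("Parsing-Bench/questions.jsonl"), "question_id")` and the same call on `answers_gpt4o.jsonl` would let you look up a question and its GPT-4o answer side by side. Check the actual field names in the files first.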

## Notes

- **Images Not Included**: The original images are not distributed with this repository. To use the metadata, obtain the original images separately and match them to the metadata entries by file name.
- **Future Enhancements**: We plan to release scripts that simplify data download and conversion.
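
The name-based linking mentioned in the notes above could be sketched as follows. This is a hedged illustration, not an official script: `build_image_index` and `link_images` are hypothetical helpers, the set of image extensions is an assumption, and you will need to extract the image names from the metadata yourself, since the field that holds them is not documented here.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the original image files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def build_image_index(image_dir):
    """Map file names and extension-less stems to paths for images under image_dir."""
    index = {}
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # setdefault keeps the first match if the same name appears twice.
            index.setdefault(path.name, path)
            index.setdefault(path.stem, path)
    return index


def link_images(names, image_dir):
    """Return (found, missing): a name-to-path mapping and the names with no match."""
    index = build_image_index(image_dir)
    found, missing = {}, []
    for name in names:
        key = Path(name).name  # ignore any directory prefix in the metadata
        path = index.get(key) or index.get(Path(key).stem)
        if path is None:
            missing.append(name)
        else:
            found[name] = path
    return found, missing
```

Calling `link_images(names, "Parsing-Bench/images")` with the image names taken from the metadata returns the resolved paths together with a list of names that could not be matched, which makes incomplete downloads easy to spot.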