# Readme

This directory contains the code for our CoNLL 2023 paper `On the Effects of Structural Modeling for Neural Semantic Parsing`.

## Code

The dependencies:

- Python 3.11 (3.9 or later is required).
- A recent PyTorch release; we currently use 2.0.1.
- A recent Hugging Face `transformers` release.
- The Python package `trialbot` for running experiments.
- The Python package `lark` as a parser generator.
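
To check an environment against the list above before running anything, a small script along these lines may help. It is only a sketch: the import names `torch`, `transformers`, `trialbot`, and `lark` are assumed to match the packages listed, and `missing_modules` and `python_is_supported` are hypothetical helpers that are not part of this repository.

```python
import sys
from importlib.util import find_spec

# Assumed top-level import names for the dependencies listed in this README.
REQUIRED_MODULES = ["torch", "transformers", "trialbot", "lark"]


def missing_modules(names):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.9 or later is required.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only confirms that the modules can be located; it does not verify their versions.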

How to use:

- Change the working directory to `./src/exp/2023.conll`.
- Run `python general_s2s.py -h` to list all parameters. The `--dataset` and `-p` options are required.
- Specify `--test` and a model path for detailed evaluation.

For example, the following command starts training with the BiLSTM encoder and the ONLSTM decoder on the GEO dataset with the handcrafted grammar (although the chosen model is not grammar-based).

```bash
python general_s2s.py --dataset geo_cg_handcrafted -p seq2onlstm
```

Note that the first use of a grammar-based dataset triggers parsing, which can be time-consuming. We use a local Redis server to cache the parsing results. For more information, please refer to the code in `./src/shujuji/cg_bundle.py`.
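
Since the parse cache relies on a running Redis server, it may be worth confirming that one is reachable before the first grammar-based run. The sketch below uses only the standard library and assumes the server listens on `127.0.0.1:6379`, the Redis default; the host and port this repository actually uses are defined in `./src/shujuji/cg_bundle.py` and may differ. `port_is_open` is a hypothetical helper, not part of this code base.

```python
import socket


def port_is_open(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 6379 is the Redis default port; adjust it if your setup differs.
    if port_is_open():
        print("A server is accepting connections on 127.0.0.1:6379.")
    else:
        print("Nothing is listening on 127.0.0.1:6379.")
```

This only tests that something accepts TCP connections on the port; it does not confirm that the listener is Redis.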

## Data

The datasets we use are all publicly available online:

1. For `atis`, `geo`, `advising`, and `scholar`, clone `https://github.com/inbaroren/improving-compgen-in-semparse.git` to `/path/to/repo` and create a symbolic link under the `data` directory with `ln -s /path/to/repo/data CompGen`.
2. For the SMCalFlow-CS data, similarly clone `https://github.com/microsoft/compositional-generalization-span-level-attention.git` and create the link with `ln -s /somepath/data/smcalflow_cs/calflow.orgchart.event_create SMCalFlow-CS`.
3. For the COGS data, clone `https://github.com/najoungkim/COGS.git` to `/some/path/` and create the link under `data` with `ln -s /some/path cogs`.
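
The linking steps above can also be scripted once the repositories have been cloned. The following is a minimal sketch: `link_dataset` is a hypothetical helper, not part of this repository, and the paths in the commented usage are the placeholders from the list above rather than real locations.

```python
from pathlib import Path


def link_dataset(target, data_dir, link_name):
    """Create data_dir/link_name as a symbolic link to the target directory."""
    target = Path(target).expanduser().resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {target}")

    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / link_name

    if link.is_symlink():
        # Re-running is harmless when the link already points at the target.
        if link.resolve() == target:
            return link
        raise FileExistsError(f"{link} already points elsewhere")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symbolic link")

    link.symlink_to(target, target_is_directory=True)
    return link


# Usage with the placeholder paths from the list above:
# link_dataset("/path/to/repo/data", "data", "CompGen")
# link_dataset("/somepath/data/smcalflow_cs/calflow.orgchart.event_create", "data", "SMCalFlow-CS")
# link_dataset("/some/path", "data", "cogs")
```

The helper refuses to overwrite an existing file or a link that points elsewhere, so an earlier setup is not silently replaced.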

