## Command Line Interface

This document describes the command line interface provided by this package. There are three types of operations: Transforms, Validations and Utilities.

Transforms take a sentence as input and produce one or more perturbed sentences.

Validations receive an original sentence and a perturbed sentence and verify whether the perturbed sentence complies with some requirement.

Utilities offer functions to perform common operations, such as reading and writing files.
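
To make the transform and validation roles concrete, here is a minimal Python sketch. It is not this package's API: `swap_adjacent_words`, `differs_from_original` and `apply_transform` are hypothetical names, and the toy transform only stands in for the model-based ones listed below.

```python
from typing import Callable, List

# Hypothetical type aliases, used for illustration only.
Transform = Callable[[str], List[str]]
Validation = Callable[[str, str], bool]


def swap_adjacent_words(sentence: str) -> List[str]:
    """Toy transform: one perturbed sentence per adjacent word pair swapped."""
    words = sentence.split()
    perturbed = []
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        perturbed.append(" ".join(swapped))
    return perturbed


def differs_from_original(original: str, perturbed: str) -> bool:
    """Toy validation: reject perturbations identical to the original."""
    return original != perturbed


def apply_transform(
    sentence: str, transform: Transform, validations: List[Validation]
) -> List[str]:
    """Keep only the perturbed sentences that pass every validation."""
    return [
        candidate
        for candidate in transform(sentence)
        if all(check(sentence, candidate) for check in validations)
    ]


if __name__ == "__main__":
    # Swapping the two leading "the" tokens reproduces the original,
    # so that candidate is dropped by the validation.
    print(apply_transform("the the cat", swap_adjacent_words, [differs_from_original]))
```

The sketch shows why a validation takes both sentences: it can only judge a perturbation relative to the original it came from.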

The following table details the available CLI commands:

<table>
	<tr>
        <th></th>
	    <th>Name</th>
	    <th>Command</th>
	    <th>Description</th>  
	</tr>
	<tr>
	    <td rowspan="5">Transform</td>
		<td>Delete Span between Punct</td>
	    <td nowrap="nowrap"><code>transf-del-punct-span</code></td>
        <td>Removes a single span between two punctuation symbols (<code>.,?!</code>).</td>
	</tr>
	<tr>
        <td>Insert Text</td>
	    <td nowrap="nowrap"><code>transf-ins-text</code></td>
        <td>Inserts random text in multiple places using <a href="https://arxiv.org/abs/2010.11934">Google's mT5</a> model.</td>
	</tr>
	<tr>
        <td>Negate</td>
	    <td nowrap="nowrap"><code>transf-neg</code></td>
        <td>Negates an English sentence using <a href="https://arxiv.org/abs/2101.00288">PolyJuice</a> conditioned for negation.</td>
	</tr>
	<tr>
	    <td>Swap Named Entity</td>
	    <td><code>transf-swp-ne</code></td>
        <td>Detects a single named entity with a <a href="https://stanfordnlp.github.io/stanza/available_models.html#available-ner-models">Stanza model</a> and swaps it for text generated with <a href="https://arxiv.org/abs/2010.11934">Google's mT5</a>.</td>
	</tr>
	<tr>
	    <td>Swap Number</td>
	    <td nowrap="nowrap"><code>transf-swp-num</code></td>
        <td>Detects a single number with RegEx and swaps it for text generated with <a href="https://arxiv.org/abs/2010.11934">Google's mT5</a>.</td>
	</tr>
	<tr>
	    <td rowspan="7">Validation</td>
	    <td>Keep Contradiction</td>
        <td nowrap="nowrap"><code>val-keep-contradiction</code></td>
        <td>Verifies if the perturbed sentence contradicts the original sentence. Relies on a <a href="https://arxiv.org/abs/1907.11692">RoBERTa</a> model trained on MNLI.</td>
	</tr>
	<tr>
	    <td>Keep Equal Numbers Count</td>
	    <td nowrap="nowrap"><code>val-keep-eq-num</code></td>
	    <td>Verifies if the perturbed and original sentences contain the same count of numbers, using RegEx to detect them.</td>
	</tr>
	<tr>
	    <td>Keep Equal Named Entities Count</td>
	    <td nowrap="nowrap"><code>val-keep-eq-ne</code></td>
	    <td>Verifies if the perturbed and original sentences have the same number of named entities using a <a href="https://stanfordnlp.github.io/stanza/available_models.html#available-ner-models">Stanza model</a> to detect them.</td>
	</tr>
	<tr>
	    <td>Keep Greater or Equal Edit Distance</td>
	    <td nowrap="nowrap"><code>val-keep-geq-edit-dist</code></td>
	    <td>Verifies if the perturbed and original sentences have a <a href="https://web.stanford.edu/class/cs124/lec/med.pdf">minimum edit distance</a> above a threshold.</td>
	</tr>
		<tr>
	    <td>Keep Less or Equal Character Insertions</td>
	    <td nowrap="nowrap"><code>val-keep-leq-char-ins</code></td>
	    <td>Verifies if the perturbed sentence has a number of specific character insertions below a threshold, when compared to the original.</td>
	</tr>
	<tr>
	    <td>Remove Equal Sentences</td>
	    <td nowrap="nowrap"><code>val-rm-equal</code></td>
	    <td>Verifies if the perturbed sentence is different from the original sentence with string comparison. Useful if the transform may return the original sentence.</td>
	</tr>
	<tr>
	    <td>Remove a Pattern</td>
	    <td nowrap="nowrap"><code>val-rm-pattern</code></td>
	    <td>Verifies if the perturbed sentence does not have a specific regular expression. Useful with language models that may leave special tokens behind.</td>
	</tr>
	<tr>
        <td rowspan="3">Utilities</td>
	    <td>Read Lines</td>
	    <td nowrap="nowrap"><code>io-read-lines</code></td>
	    <td>Reads sentences from a text file, where each line is a sentence.</td>
	</tr>
    <tr>
	    <td>Read CSV</td>
	    <td nowrap="nowrap"><code>io-read-csv</code></td>
	    <td>Reads the sentences from a CSV file. Each line of the file has the sentence to perturb and the sentence language.</td>
	</tr>
    <tr>
	    <td>Write JSON</td>
	    <td nowrap="nowrap"><code>io-write-json</code></td>
	    <td>Writes the perturbed sentences in a human-readable JSON format.</td>
	</tr>
</table>
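
As an illustration of the kind of check `val-keep-geq-edit-dist` describes, the sketch below computes a Levenshtein distance over words or characters and compares it to a threshold. This is not the package's implementation. The unit insertion, deletion and substitution costs, the `>=` comparison (inferred from "geq" in the command name) and the `"char"` level are all assumptions.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit costs (an assumption, see above)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def keeps_min_distance(
    original: str, perturbed: str, distance: int, level: str = "word"
) -> bool:
    """Hypothetical check: True if the edit distance reaches the threshold."""
    if level == "word":
        a, b = original.split(), perturbed.split()
    else:  # assumed character-level fallback
        a, b = list(original), list(perturbed)
    return edit_distance(a, b) >= distance


if __name__ == "__main__":
    # One substitution (cat -> dog) plus one insertion (down): distance 2.
    print(keeps_min_distance("the cat sat", "the dog sat down", distance=2))
```

Working at word level makes the threshold insensitive to word length: replacing a short word and replacing a long word both count as one edit.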

### Configuration File Specification

The CLI tool can also be run with a YAML configuration file as follows:

```
augment --cfg <path_to_config_file>
```

An example of a configuration file is:

```yaml
pipeline:
- cmd: io-read-csv
  path: <path to input file>
- cmd: transf-neg
- cmd: transf-ins-text
  validations:
  - cmd: val-keep-geq-edit-dist
    distance: 8
    level: word
- cmd: val-rm-pattern
  pattern: hello-world
- cmd: io-write-json
  path: <path to output file>
seed: 42
no-post-run: True
```

The `pipeline` section comes first, is mandatory, and lists all the commands to be executed. After that section, other CLI arguments can be specified (such as `seed` in this example). Each argument has the same name as on the command line, but without the leading `--`. Boolean flags likewise drop the `--` and take the value `True` or `False`.
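
The correspondence between top-level configuration keys and command line arguments can be sketched as follows. `config_to_cli_args` is a hypothetical helper, not part of this package. It assumes that valued options are written as `--name value` and that a boolean flag is present when `True` and absent when `False`.

```python
from typing import Any, Dict, List


def config_to_cli_args(config: Dict[str, Any]) -> List[str]:
    """Map top-level config keys (other than `pipeline`) to CLI-style arguments."""
    args: List[str] = []
    for key, value in config.items():
        if key == "pipeline":
            continue  # the pipeline is a list of commands, not a plain argument
        flag = f"--{key}"
        if isinstance(value, bool):
            # Assumed flag semantics: emit the flag only when it is True.
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    example = {"pipeline": [], "seed": 42, "no-post-run": True}
    print(config_to_cli_args(example))
```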


Inside the pipeline section, each command is identified with `cmd: <command name>`. The remaining tags in the command entry are the arguments for the command. 

Inside a transform, a special `validations` tag can be used to specify validations that apply to that transform only. A validation listed as a regular command in the pipeline applies to all the transforms that precede it. In the example above, `val-keep-geq-edit-dist` is applied only to `transf-ins-text`, whereas `val-rm-pattern` is applied to both `transf-neg` and `transf-ins-text`.
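
The scoping rule can be sketched in Python by walking the pipeline from the example configuration. `validation_scopes` is a hypothetical helper, not the package's code. It assumes that transforms and validations can be recognised by the `transf-` and `val-` prefixes used in the command table, and it takes the pipeline as already-parsed Python data.

```python
from typing import Any, Dict, List, Tuple


def validation_scopes(
    pipeline: List[Dict[str, Any]],
) -> List[Tuple[str, List[str]]]:
    """Return (validation, transforms it applies to) pairs in pipeline order."""
    transforms_seen: List[str] = []
    scopes: List[Tuple[str, List[str]]] = []
    for step in pipeline:
        cmd = step["cmd"]
        if cmd.startswith("transf-"):
            transforms_seen.append(cmd)
            # Nested validations apply to this transform only.
            for nested in step.get("validations", []):
                scopes.append((nested["cmd"], [cmd]))
        elif cmd.startswith("val-"):
            # Pipeline-level validations apply to every earlier transform.
            scopes.append((cmd, list(transforms_seen)))
    return scopes


if __name__ == "__main__":
    # The example configuration from above, written as Python data.
    pipeline = [
        {"cmd": "io-read-csv", "path": "<path to input file>"},
        {"cmd": "transf-neg"},
        {
            "cmd": "transf-ins-text",
            "validations": [
                {"cmd": "val-keep-geq-edit-dist", "distance": 8, "level": "word"}
            ],
        },
        {"cmd": "val-rm-pattern", "pattern": "hello-world"},
        {"cmd": "io-write-json", "path": "<path to output file>"},
    ]
    for validation, transforms in validation_scopes(pipeline):
        print(validation, "->", transforms)
```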