# Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data

Source: https://arxiv.org/abs/1903.00138

## Grammatical Error Correction

Unlike machine translation, GEC changes only a few words of the source sentence – roughly 80% of the tokens can be copied directly from the source.

## Key Ideas

• Leverage unlabelled data by training denoising autoencoders.
• Add two multi-task learning objectives to the copy-augmented architecture:
  • a token-level labeling task
  • a sentence-level copying task
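The denoising-autoencoder idea can be sketched as follows: corrupt a clean sentence to build (noisy, clean) training pairs. The noise types (delete / insert / replace / local shuffle) follow the paper's recipe, but the 10% rates and the `<unk>` filler token here are illustrative assumptions, not the paper's exact settings.

```python
import random

NOISE_PROB = 0.10  # assumed per-token corruption rate for each noise type

def corrupt(tokens, rng, p=NOISE_PROB):
    """Return a noisy copy of `tokens` for denoising-autoencoder pre-training."""
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p:                       # delete the token
            continue
        if r < 2 * p:                   # insert a filler token after it
            noisy.extend([tok, "<unk>"])
            continue
        if r < 3 * p:                   # replace the token
            noisy.append("<unk>")
            continue
        noisy.append(tok)               # keep the token unchanged
    # lightly shuffle: swap adjacent tokens with probability p
    for i in range(len(noisy) - 1):
        if rng.random() < p:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy

rng = random.Random(0)
clean = "the quick brown fox jumps over the lazy dog".split()
pair = (corrupt(clean, rng), clean)     # (source, target) training pair
```

The autoencoder is then trained to map the corrupted source back to the clean target, exactly like a supervised correction pair.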

The copying mechanism enables training a model with a small vocabulary, as it can copy unchanged and out-of-vocabulary words from the source input tokens.
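A minimal sketch of how such a copy mechanism combines its two distributions: the final output interpolates the decoder's generation distribution with a copy distribution built from attention over the source tokens. The balance factor `alpha`, the toy vocabulary size, and the function names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mix(gen_logits, copy_attn, src_ids, alpha):
    """p(w) = (1 - alpha) * p_gen(w) + alpha * p_copy(w)."""
    p_gen = softmax(gen_logits)
    p_copy = np.zeros_like(p_gen)
    # scatter the attention mass onto the vocabulary ids of the source tokens;
    # this is what lets OOV/unchanged source words be emitted directly
    for attn, wid in zip(copy_attn, src_ids):
        p_copy[wid] += attn
    return (1 - alpha) * p_gen + alpha * p_copy
```

Because `p_copy` puts mass only on positions that actually occur in the source, a token outside the generation vocabulary can still receive high probability purely through attention.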

## Architecture

The Transformer encoder maps the source sentence through a stack of L identical blocks; each block applies multi-head self-attention over the source tokens, followed by a position-wise feed-forward layer, to produce context-aware hidden states. The decoder has the same architecture as the encoder, with an additional attention layer over the encoder's hidden states.

$$h_{1 \ldots N}^{src} = \operatorname{encoder}\left(L^{src} x_{1 \ldots N}\right)$$

$$h_{t} = \operatorname{decoder}\left(L^{trg} y_{t-1 \ldots 1}, h_{1 \ldots N}^{src}\right)$$

$$P_{t}(w) = \operatorname{softmax}\left(L^{trg} h_{t}\right)$$
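The output layer above can be sketched numerically: the decoder state h_t is projected through the target embedding matrix (written L with a target superscript in the paper) and softmax-normalized into a distribution over the vocabulary. The sizes and random values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model = 8, 4
L_trg = rng.normal(size=(vocab, d_model))   # target word embedding matrix
h_t = rng.normal(size=d_model)              # decoder hidden state at step t

logits = L_trg @ h_t                        # score every vocabulary word
p_t = np.exp(logits - logits.max())
p_t /= p_t.sum()                            # P_t(w) = softmax(L_trg h_t)
```

Using the embedding matrix itself as the output projection (weight tying) keeps the parameter count small, which matters here because the copy mechanism already allows the vocabulary to stay small.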
