NPFL116 – Compendium of Neural Machine Translation

This seminar should make the students familiar with current research trends in machine translation using deep neural networks. Most importantly, the students should learn how to deal with the ever-growing body of literature on empirical research in machine translation and to critically assess its content. The semester consists of a few lectures summarizing the state of the art, discussions of reading assignments, and student presentations of selected papers.

About

SIS code: NPFL116
Semester: summer
E-credits: 3
Examination: 0/2 C
Instructors: Jindřich Helcl, Jindřich Libovický (remotely)

Timespace Coordinates

The course is held on Wednesdays at 14:00 in S1. The first lecture is on February 26.


Note

Due to the coronavirus epidemic, lectures are replaced with reading materials and questions.


Lectures

1. Introductory notes on machine translation and deep learning

2. Sequence-to-sequence learning using Recurrent Neural Networks

3. Reading bundle 1 - Back-Translation

4. Reading bundle 2 - Byte-Pair Encoding

5. Reading bundle 3 - Undirected Sequence Generation

6. Reading bundle 4 - Low-Resource Translation

1. Introductory notes on machine translation and deep learning

 Feb 20 (slides: Logistics, NN Intro)

Introduction

Reading  1.5 hours

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436.

Questions

  • Can you identify some implicit assumptions the authors make about sentence meaning while talking about NMT?
  • Do you think they are correct?
  • How do the properties that the authors attribute to LSTM networks correspond to your own ideas of how language should be computationally processed?

2. Sequence-to-sequence learning using Recurrent Neural Networks

 Mar 4 (slides: Sequence-to-Sequence)

Covered topics: embeddings, RNNs, vanishing gradient, LSTM, encoder-decoder, attention

Reading  2 hours

Vaswani, Ashish, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017.
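
Since both the covered topics and the reading center on attention, the following minimal NumPy sketch of scaled dot-product attention (the core operation in Vaswani et al., 2017) may help when working through the paper. The shapes and random inputs are purely illustrative, not a reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    # queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v)
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)            # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the key positions
    return weights @ values                           # weighted average of the values

# Toy example: 2 decoder states attending over 3 encoder states.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 8))
print(scaled_dot_product_attention(q, k, v).shape)    # (2, 8)
```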

3. Reading bundle 1 - Back-Translation

Reading  1 hour

Rico Sennrich, Barry Haddow, Alexandra Birch. Improving Neural Machine Translation Models with Monolingual Data. 2015.

Reading  1 hour

Sergey Edunov, Myle Ott, Michael Auli, David Grangier. Understanding Back-Translation at Scale. 2018.

Reading  1 hour

Nikolay Bogoychev, Rico Sennrich. Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation. 2019.

Questions

  • Where do you think these papers contradict each other the most?
  • If you had a month or less to launch a translation service for users, would you use back-translation?
  • Taking into account the nature of the WMT test data (where half is created by translating from one side and the other half by translating from the other side), how can back-translation best be used to win WMT (in terms of BLEU score)?
  • How do you explain that forward translation helps? (The argument for doing back-translation was that the quality of the source side does not matter, but the target side must be as clean as possible.)
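
To make the setup behind these questions concrete, here is a schematic sketch of the back-translation recipe from Sennrich et al. (2016). The `train` argument and the `dummy_train` toy trainer are hypothetical stand-ins for a real NMT toolkit; the point is only the data flow: train a reverse model, synthesize source sides for target-language monolingual data, and retrain on the mixed corpus.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]
Trainer = Callable[[List[Tuple[str, str]]], Translator]

def back_translate(train: Trainer,
                   parallel: List[Tuple[str, str]],
                   target_monolingual: List[str]) -> Translator:
    """Back-translation recipe (Sennrich et al., 2016), schematically.

    1. Train a reverse (target -> source) model on the parallel data.
    2. Translate the target-side monolingual sentences back into the source language.
    3. Train the forward model on genuine plus synthetic pairs.
    """
    reverse_model = train([(tgt, src) for src, tgt in parallel])
    synthetic = [(reverse_model(tgt), tgt) for tgt in target_monolingual]
    return train(parallel + synthetic)

# Toy run with a dummy "trainer" that just memorizes pairs; a real setup
# would train an actual NMT model in both calls to `train`.
def dummy_train(pairs):
    table = dict(pairs)
    return lambda sentence: table.get(sentence, "<unk>")

forward = back_translate(dummy_train, [("hello", "ahoj")], ["ahoj", "dobrý den"])
print(forward("hello"))  # ahoj
```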

4. Reading bundle 2 - Byte-Pair Encoding

Reading  1 hour

Rico Sennrich, Barry Haddow, Alexandra Birch. Neural Machine Translation of Rare Words with Subword Units. 2015.

Reading  1 hour

Taku Kudo, John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. 2018.

Reading  1 hour

Ivan Provilkov, Dmitrii Emelianenko, Elena Voita. BPE-Dropout: Simple and Effective Subword Regularization. 2019.

Questions

  • Can BPE help us with some linguistic phenomena? Can it hurt us with some linguistic phenomena?
  • We have a problem with translation into Japanese: if we do not run tokenization (i.e., inserting spaces between what we consider words) before segmenting into subwords, the results are much worse than if we first tokenize the data and then run BPE. Can you think of what causes this behavior?
  • We are given a parallel corpus in unknown languages. How would we segment it?
  • If splitting words into smaller pieces helps, what about running the whole thing directly on characters? Do you think that could help, or would it make things worse?
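
As a reference point for the questions above, this is a minimal sketch of the BPE merge-learning loop in the spirit of Sennrich et al. (2016). It is a simplification that starts from a word-frequency dictionary, not the exact released implementation.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a word-frequency dictionary
    (simplified from Sennrich et al., 2016)."""
    # Each word is a tuple of symbols: its characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the merge: replace every occurrence of the best pair by one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# The most frequent adjacent pair is merged first (here ('w', 'e')).
print(learn_bpe({"lower": 5, "lowest": 3, "newer": 6}, num_merges=5))
```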

5. Reading bundle 3 - Undirected Sequence Generation

Reading  2 hours

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018.

Reading  1.5 hours

Guillaume Lample, Alexis Conneau. Cross-lingual Language Model Pretraining. 2019.

Reading  1.5 hours

Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer. Mask-Predict: Parallel Decoding of Conditional Masked Language Models. 2019.

Reading  2 hours

Elman Mansimov, Alex Wang, Sean Welleck, Kyunghyun Cho. A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models. 2019.

Questions

  • Do you think models of this type have a chance of replacing sequentially generating models in the future? What are the arguments for and against?
  • Do these models reflect any features of how humans produce speech? (In contrast to sequential models, which better match our intuition.)
  • The third paper (Mask-Predict) was originally titled Constant-Time Machine Translation with Conditional Masked Language Models. Why is it no longer called that? (A guiding follow-up question: what is the time complexity, with respect to the output sentence length, of sequential decoding and of decoding from masked language models, if we parallelize everything we can?)
  • Are there linguistic phenomena that these models can handle better than autoregressive ones?
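
The complexity question above becomes easier to reason about with a schematic version of Mask-Predict decoding: the number of refinement iterations is a fixed constant, and all masked positions are predicted in one (conceptually parallel) model call. The `model` interface and the toy dummy model below are hypothetical; a real system would call a conditional masked language model.

```python
import random

def mask_predict(model, length, iterations=4, mask="<mask>"):
    """Schematic Mask-Predict decoding (Ghazvininejad et al., 2019).

    `model` maps a partially masked target sequence to one
    (token, confidence) prediction per position.  The iteration count is
    constant, so the number of model calls does not grow with the output
    length (sequential decoding needs one call per output token).
    """
    tokens = [mask] * length
    confidences = [0.0] * length
    for t in range(iterations):
        predictions = model(tokens)                 # predict all masked positions at once
        for i, token in enumerate(tokens):
            if token == mask:
                tokens[i], confidences[i] = predictions[i]
        # Re-mask the least confident positions for the next refinement pass.
        n_mask = int(length * (iterations - t - 1) / iterations)
        for i in sorted(range(length), key=lambda j: confidences[j])[:n_mask]:
            tokens[i] = mask
    return tokens

# Toy run with a dummy "model" that always proposes a fixed sentence
# with random confidences (hypothetical; for illustration only).
target = "dogs chase the red ball".split()
dummy_model = lambda seq: [(target[i], random.random()) for i in range(len(seq))]
print(mask_predict(dummy_model, length=len(target)))
```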

6. Reading bundle 4 - Low-Resource Translation

Reading  1 hour

Rico Sennrich, Biao Zhang. Revisiting Low-Resource Neural Machine Translation: A Case Study. 2019.

Reading  2 hours

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. MASS: Masked Sequence to Sequence Pre-training for Language Generation. 2019.

Reading  2 hours

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. Multilingual Denoising Pre-training for Neural Machine Translation. 2020.

Questions

Reading assignments

There will be a reading assignment after every class. You will be given a few questions about the reading, and you should submit your answers before the next lecture.

Student presentations

Students will present one of the selected groups of papers to their fellow students. The presenting students will also prepare questions for discussion after the paper presentation.

The other students should also familiarize themselves with the papers so that they can participate in the discussion.

Presenters are strongly encouraged to arrange a consultation with the course instructors at least one day before the presentation.

Final written test

There will be a final written test that will not be graded.