Deep Learning – Summer 2018/19
In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course focuses on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.
About
SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka
Timespace Coordinates
 lectures: Czech lecture is held on Monday 14:50 in S9, English lecture on Monday 12:20 in S9; first lecture is on Mar 04
 practicals: there are three parallel practicals, on Monday 17:20 in S9, on Tuesday 9:00 in SU1, and on Tuesday 12:20 in SU1; first practicals are on Mar 04/05
Lectures
1. Introduction to Deep Learning Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations
2. Training Neural Networks Slides PDF Slides 2018 Video mnist_training gym_cartpole
3. Training Neural Networks II Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase
4. Convolutional Neural Networks Slides PDF Slides 2018 Video mnist_cnn cifar_competition
5. Convolutional Neural Networks II Slides PDF Slides 2018 Video mnist_multiple fashion_masks
6. Convolutional Neural Networks III, Recurrent Neural Networks Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification
7. Recurrent Neural Networks II Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition
8. Easter Monday tagger_competition 3d_recognition
9. Recurrent Neural Networks III Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV lemmatizer_noattn lemmatizer_attn lemmatizer_competition
10. Deep Generative Models Slides PDF Slides 2018 Video vae gan dcgan nli_competition
11. Speech Synthesis, Reinforcement Learning Slides PDF Slides 2018 Video I 2018 Video II omr_competition monte_carlo reinforce reinforce_baseline reinforce_pixels
12. Transformer, External Memory Networks Slides PDF Slides
Requirements
To pass the practicals, you need to obtain at least 80 points, which are awarded for home assignments. Up to 40 points above 80 will be transferred to the exam.
To pass the exam, you need to obtain at least 60, 75 and 90 out of 100 points for the written exam (plus up to 40 points from the practicals), to obtain grades 3, 2 and 1, respectively.
This section lists the lecture content, including references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
1. Introduction to Deep Learning
Mar 04 Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations
 Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
 Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
 Gaussian distribution [Section 3.9.3 of DLB]
 Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
 History of Deep Learning [Section 1.2 of DLB]
 Linear regression [Section 5.1.4 of DLB]
 Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
 Challenges Motivating Deep Learning [Section 5.11 of DLB]
 Neural network basics (this topic is treated in detail within the lecture NAIL002)
 Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
 Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
 Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
 Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
2. Training Neural Networks
Mar 11 Slides PDF Slides 2018 Video mnist_training gym_cartpole
 Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
 Hyperparameters and validation sets [Section 5.3 of DLB]
 Maximum Likelihood Estimation [Section 5.5 of DLB]
 Neural network training (this topic is treated in detail within the lecture NAIL002)
 Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
 Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; note that Algorithms 6.5 and 6.6 are used in practice]
 SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
 SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
 SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
 Optimization algorithms with adaptive gradients
 AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
 RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
 Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
3. Training Neural Networks II
Mar 18 Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase
 Training a neural network with a single hidden layer
 Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
 Regularization [Chapter 7 until Section 7.1 of DLB]
 Early stopping [Section 7.8 of DLB, without the "How early stopping acts as a regularizer" part]
 L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
 Dataset augmentation [Section 7.4 of DLB]
 Ensembling [Section 7.11 of DLB]
 Dropout [Section 7.12 of DLB]
 Label smoothing [Section 7.5.1 of DLB]
 Saturating nonlinearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
 Parameter initialization strategies [Section 8.4 of DLB]
4. Convolutional Neural Networks
Mar 25 Slides PDF Slides 2018 Video mnist_cnn cifar_competition
 Gradient clipping [Section 10.11.1 of DLB]
 Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
 Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
 Max pooling and average pooling [Section 9.3 of DLB]
 Stride and Padding schemes [Section 9.5 of DLB]
 AlexNet [Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
 VGG [Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition]
 GoogLeNet (aka Inception) [Christian Szegedy et al.: Going Deeper with Convolutions]
 Batch normalization [Section 8.7.1 of DLB, optionally the paper Sergey Ioffe and Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
 Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]
 ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
5. Convolutional Neural Networks II
Apr 01 Slides PDF Slides 2018 Video mnist_multiple fashion_masks
 Residual CNN Networks
 ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
 WideNet [Wide Residual Networks]
 DenseNet [Densely Connected Convolutional Networks]
 PyramidNet [Deep Pyramidal Residual Networks]
 ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]
 Regularizing CNN Networks
 Object detection using Fast R-CNN [Ross Girshick: Fast R-CNN]
 Proposing RoIs using Faster R-CNN [Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
6. Convolutional Neural Networks III, Recurrent Neural Networks
Apr 08 Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification
 Image segmentation
 Mask R-CNN [Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN]
 Feature Pyramid Networks [Feature Pyramid Networks for Object Detection]
 Focal Loss [Focal Loss for Dense Object Detection]
 Group Normalization [Yuxin Wu, Kaiming He: Group Normalization]
 Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
 The challenge of long-term dependencies [Section 10.7 of DLB]
 Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory, Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM]
7. Recurrent Neural Networks II
Apr 15 Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition
 Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
 Highway Networks [Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber: Training Very Deep Networks]
 RNN Regularization
 Variational Dropout [Yarin Gal, Zoubin Ghahramani: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks]
 Layer Normalization [Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton: Layer Normalization]
 Word Embeddings [Section 14.2.4 of DLB]
 Bidirectional RNN [Section 10.3 of DLB]
 Character-level embeddings using Recurrent neural networks [C2W model from Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
 Character-level embeddings using Convolutional neural networks [CharCNN from Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush: Character-Aware Neural Language Models]
 Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa: Natural Language Processing (Almost) from Scratch]
8. Easter Monday
Apr 22 tagger_competition 3d_recognition
Easter Monday
9. Recurrent Neural Networks III
Apr 29 Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV lemmatizer_noattn lemmatizer_attn lemmatizer_competition
 Connectionist Temporal Classification (CTC) loss [A. Graves, S. Fernández, F. Gomez, J. Schmidhuber: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
 Word2vec word embeddings, notably the CBOW and Skip-gram architectures [Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space]
 Hierarchical softmax [Section 12.4.3.2 of DLB or Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality]
 Negative sampling [Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality]
 Neural Machine Translation using Encoder-Decoder or Sequence-to-Sequence architecture [Section 12.5.4 of DLB, Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks and Kyunghyun Cho et al.: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
 Using Attention mechanism in Neural Machine Translation [Section 12.4.5.1 of DLB, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
 Translating Subword Units [Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
 Google NMT [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
10. Deep Generative Models
May 06 Slides PDF Slides 2018 Video vae gan dcgan nli_competition
 Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
 Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
 Variational Autoencoders [Section 20.10.3 plus the Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB, Diederik P. Kingma, Max Welling: Auto-Encoding Variational Bayes]
 Generative Adversarial Networks
 GAN [Section 20.10.4 of DLB, Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio: Generative Adversarial Networks]
 CGAN [Conditional Generative Adversarial Nets]
 DCGAN [Alec Radford, Luke Metz, Soumith Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
 WGAN [Martin Arjovsky, Soumith Chintala, Léon Bottou: Wasserstein GAN]
11. Speech Synthesis, Reinforcement Learning
May 13 Slides PDF Slides 2018 Video I 2018 Video II omr_competition monte_carlo reinforce reinforce_baseline reinforce_pixels
The study material for Reinforcement Learning is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB), available online.
 Speech synthesis
 WaveNet [WaveNet: A Generative Model for Raw Audio]
 Parallel WaveNet [Parallel WaveNet: Fast High-Fidelity Speech Synthesis]
 Full speech synthesis pipeline: Tacotron 2 [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions]
 Multi-armed bandits [Sections 2-2.4 of RLB]
 Markov Decision Process [Sections 3-3.3 of RLB]
 Policies and Value Functions [Section 3.5 of RLB]
 Monte Carlo Methods [Sections 5-5.4 of RLB]
 Policy Gradient Methods [Sections 13-13.1 of RLB]
 Policy Gradient Theorem, without proof [Section 13.2 of RLB]
 REINFORCE algorithm [Section 13.3 of RLB]
 REINFORCE with baseline algorithm [Section 13.4 of RLB]
12. Transformer, External Memory Networks
May 20 Slides PDF Slides
 NasNet [Learning Transferable Architectures for Scalable Image Recognition]
 Transformer architecture [Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention Is All You Need]
 Neural Turing Machine [Neural Turing Machines]
 Differentiable Neural Computer [Hybrid computing using a neural network with dynamic external memory]
 Memory Augmented Neural Networks [One-shot learning with Memory-Augmented Neural Networks]
The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, TensorFlow 2.0.0a0, NumPy 1.16.1 and OpenAI Gym 0.9.5.
You can install all required packages either to user packages using
    pip3 install --user tensorflow==2.0.0a0 gym==0.9.5
or create a virtual environment using python3 -m venv VENV_DIR and then install the packages inside it by running
    VENV_DIR/bin/pip3 install tensorflow==2.0.0a0 gym==0.9.5
If you have a GPU, you can install GPU-enabled TensorFlow by using tensorflow-gpu instead of tensorflow.
Teamwork
Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
numpy_entropy
Deadline: Mar 17, 23:59 3 points
The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.
Load the file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).
Example files are in labs/01.
Then compute the following quantities using NumPy, and print each of them on a separate line, rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):
 entropy H(data distribution)
 cross-entropy H(data distribution, model distribution)
 KL-divergence D_{KL}(data distribution, model distribution)
Use natural logarithms to compute the entropies and the divergence.
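The three quantities can be sketched in NumPy as follows. This is a minimal illustration rather than the template's code: it assumes the data points are already loaded into a Python list and the model into a dict mapping each point to its probability (distribution_stats is a hypothetical helper), and it relies on NumPy returning -inf for log(0), which turns the cross-entropy and KL-divergence into inf.

```python
import numpy as np

def distribution_stats(data, model):
    """Return (entropy, cross-entropy, KL divergence) using natural logarithms.

    data:  list of observed data points (defines the empirical data distribution)
    model: dict mapping a data point to its model probability (missing points get 0)
    """
    values, counts = np.unique(data, return_counts=True)
    p = counts / counts.sum()                                   # empirical data distribution
    q = np.array([model.get(value, 0.0) for value in values])  # model probabilities aligned with p

    entropy = -np.sum(p * np.log(p))
    with np.errstate(divide="ignore"):
        # log(0) = -inf, so the cross-entropy becomes inf when the model misses a data point
        cross_entropy = -np.sum(p * np.log(q))
    kl_divergence = cross_entropy - entropy
    return entropy, cross_entropy, kl_divergence


entropy, cross_entropy, kl = distribution_stats(["A", "B", "A", "C"], {"A": 0.5, "B": 0.25, "C": 0.25})
print(f"{entropy:.2f}\n{cross_entropy:.2f}\n{kl:.2f}")
```

Formatting with :.2f prints an infinite value as inf, which matches the required output.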
mnist_layers_activations
Deadline: Mar 17, 23:59 3 points
The templates changed on Mar 11 because of the upgrade to TF 2.0.0a0; be sure to use the updated ones when submitting!
In order to familiarize yourself with TensorFlow and TensorBoard, start by playing with example_keras_tensorboard.py.
Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs.
Then open http://localhost:6006 in a browser and explore the active tabs.
Your goal is to modify the mnist_layers_activations.py template and implement the following:
 A number of hidden layers (including zero) can be specified on the command line using the parameter --layers.
 The activation function of these hidden layers can also be specified as a command line parameter --activation, with supported values of none, relu, tanh and sigmoid.
 Print the final accuracy on the test set.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
 0 layers, activation none
 1 layer, activation none, relu, tanh, sigmoid
 10 layers, activation sigmoid, relu
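To see what the layer count and activation options control, here is a NumPy sketch of the forward pass of such a network. It only illustrates the architecture and is not the Keras model the template expects; ACTIVATIONS, init_mlp, mlp_forward and the weight initialization are hypothetical choices.

```python
import numpy as np

# Hypothetical mapping from the activation option values to NumPy functions.
ACTIVATIONS = {
    "none": lambda x: x,
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}


def init_mlp(input_size, hidden_size, num_layers, num_classes, seed=42):
    """Random weights for `num_layers` hidden layers followed by an output layer."""
    rng = np.random.RandomState(seed)
    sizes = [input_size] + [hidden_size] * num_layers + [num_classes]
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x, activation):
    """Hidden layers use the chosen activation; the output layer uses softmax."""
    act = ACTIVATIONS[activation]
    for w, b in params[:-1]:          # no iterations when num_layers == 0
        x = act(x @ w + b)
    w, b = params[-1]
    logits = x @ w + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)
```

With zero hidden layers the model is multinomial logistic regression. With activation none, any number of hidden layers still composes into a single linear map, which is worth keeping in mind when comparing the TensorBoard curves.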
mnist_training
Deadline: Mar 24, 23:59 4 points
This exercise should teach you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:
 Using the specified optimizer (either SGD or Adam).
 Optionally using momentum for the SGD optimizer.
 Using the specified learning rate for the optimizer.
 Optionally using a given learning rate schedule. The schedule can be either exponential or polynomial (with degree 1, i.e., linear decay). Additionally, the final learning rate is given, and the decay should gradually decrease the learning rate to reach the final learning rate just after the training.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
 SGD optimizer, learning_rate 0.01;
 SGD optimizer, learning_rate 0.01, momentum 0.9;
 SGD optimizer, learning_rate 0.1;
 Adam optimizer, learning_rate 0.001;
 Adam optimizer, learning_rate 0.01;
 Adam optimizer, exponential decay, learning_rate 0.01 and learning_rate_final 0.001;
 Adam optimizer, polynomial decay, learning_rate 0.01 and learning_rate_final 0.0001.
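Both schedules can be written as functions of the training step, so that the rate equals the initial value at step 0 and the final value after total_steps steps (for example, epochs times batches per epoch). The plain-Python sketch below only illustrates the formulas. In TF 2.x, tf.keras.optimizers.schedules.ExponentialDecay (with decay_steps equal to the total number of steps and decay_rate equal to learning_rate_final / learning_rate) and PolynomialDecay (with end_learning_rate and power=1.0) should give the same curves; whether they behave identically in 2.0.0a0 is an assumption worth checking.

```python
def exponential_lr(step, total_steps, lr, lr_final):
    """Multiply the rate by the same factor every step, reaching lr_final at total_steps."""
    return lr * (lr_final / lr) ** (step / total_steps)


def polynomial_lr(step, total_steps, lr, lr_final, power=1.0):
    """Polynomial decay; with power=1 it interpolates linearly from lr to lr_final."""
    fraction = min(step, total_steps) / total_steps
    return (lr - lr_final) * (1 - fraction) ** power + lr_final


# Halfway through training, the exponential schedule sits at the geometric mean
# of the two rates, while the linear schedule sits at their arithmetic mean.
halfway_exponential = exponential_lr(50, 100, 0.01, 0.001)
halfway_linear = polynomial_lr(50, 100, 0.01, 0.0001)
```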
gym_cartpole
Deadline: Mar 24, 23:59 4 points
Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only the provided supervised training data. The data is available in the gym_cartpole-data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py template.
The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running gym_cartpole_evaluate.py, which loads a model and then evaluates it on 100 random episodes (optionally rendering if the --render option is provided). In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either a sigmoid or a softmax output function).
The training data is very small, and you should take this into account when designing the model.
When submitting your model to ReCodEx, submit:
 one file with the model itself (with the h5 suffix),
 the source code (or multiple sources) used to train the model (with the py suffix), possibly indicating teams.
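A minimal sketch of reading the training data in the format described above (four floats followed by an integer action on each line), assuming whitespace-separated columns. load_cartpole_data is a hypothetical helper; the template may already provide its own loader.

```python
import io
import numpy as np

def load_cartpole_data(lines):
    """Parse lines of 'obs1 obs2 obs3 obs4 action' into NumPy arrays."""
    observations, actions = [], []
    for line in lines:
        columns = line.split()
        if not columns:                     # skip empty lines
            continue
        observations.append([float(value) for value in columns[:-1]])
        actions.append(int(columns[-1]))
    return np.array(observations, dtype=np.float32), np.array(actions, dtype=np.int32)


# Usage on two in-memory lines; with the real data, pass an opened file object instead.
sample = io.StringIO("0.1 0.2 0.3 0.4 1\n-0.1 0.0 0.5 -0.2 0\n")
observations, actions = load_cartpole_data(sample)
```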
mnist_regularization
Deadline: Mar 31, 23:59 6 points
You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:
 Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
 Allow using L2 regularization with weight args.l2. Use tf.keras.regularizers.L1L2 as a regularizer for all kernels and biases of all Dense layers (including the last one).
 Allow using label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy, which offers the label_smoothing argument.
In ReCodEx, there will be three tests (one for each regularization method), and you will get 2 points for passing each one.
In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):
 dropout rate 0, 0.3, 0.5, 0.6, 0.8;
 l2 regularization 0, 0.001, 0.0001, 0.00001;
 label smoothing 0, 0.1, 0.3, 0.5.
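Because CategoricalCrossentropy expects one-hot targets, the sparse MNIST labels have to be converted first. The NumPy sketch below shows that conversion together with the smoothing formula y * (1 - alpha) + alpha / K, which, as far as I know, is what the Keras label_smoothing argument applies; smooth_labels is a hypothetical helper.

```python
import numpy as np

def smooth_labels(labels, num_classes, label_smoothing):
    """Turn sparse integer labels into label-smoothed one-hot targets."""
    one_hot = np.eye(num_classes)[labels]
    # Move `label_smoothing` of the probability mass uniformly over all classes.
    return one_hot * (1.0 - label_smoothing) + label_smoothing / num_classes


targets = smooth_labels(np.array([2, 0]), num_classes=4, label_smoothing=0.1)
```

Each smoothed row still sums to 1, so it remains a valid target distribution for the cross-entropy loss.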
mnist_ensemble
Deadline: Mar 31, 23:59 2 points
Your goal in this assignment is to implement model ensembling. The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, the first two models, the first three models, …, all models, and evaluate their accuracy on the development set.
In addition to submitting the task in ReCodEx, run the script with args.models=7 and look at the results in the mnist_ensemble.out file.
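One common way to ensemble is to average the predicted class distributions of the individual models and take the argmax. The sketch below computes the development accuracy of every prefix ensemble at once, assuming the per-model predictions were already collected into one NumPy array; ensemble_accuracies is a hypothetical helper, and averaging probabilities (rather than, say, majority voting) is a design choice.

```python
import numpy as np

def ensemble_accuracies(model_probs, gold_labels):
    """Accuracy of averaging the first 1, 2, ..., N models' predicted distributions.

    model_probs: array [num_models, num_examples, num_classes] of predicted probabilities
    gold_labels: integer array [num_examples]
    """
    # Running mean over models: cumulative sum divided by the number of models so far.
    cumulative = np.cumsum(model_probs, axis=0)
    counts = np.arange(1, len(model_probs) + 1)[:, None, None]
    predictions = np.argmax(cumulative / counts, axis=-1)      # [num_models, num_examples]
    return np.mean(predictions == gold_labels[None, :], axis=-1)
```

Entry k of the result is the accuracy of the ensemble of the first k + 1 models.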
uppercase
Deadline: Mar 31, 23:59 4-9 points
This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase the appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and the development sets are in the correct case, the test set is lowercased.
This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 96.5% accuracy will get 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions, i.e., the best solution will get a total of 9 points, and the worst solution (but with at least 96.5% accuracy) will get a total of 4 points. The accuracy is computed per character and can be evaluated by the uppercase_eval.py script.
You may want to start with the uppercase.py template, which uses uppercase_data.py to load the data, generate an alphabet of a given size containing the most frequent characters, and generate a sliding window view on the data. The template also comments on possibilities of character representation.
Do not use RNNs or CNNs in this task (if you have doubts, contact me).
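The sliding-window idea can be illustrated as follows: each character is represented by the ids of the window characters on both sides of it plus itself, and the model classifies whether the middle character should be uppercased. This is a sketch under assumptions (id 0 reserved for padding, a toy alphabet built from the sample text), not the template's implementation; sliding_windows is a hypothetical helper.

```python
import numpy as np

def sliding_windows(char_ids, window):
    """Return [len(char_ids), 2 * window + 1] ids centred on each character.

    Positions before the start and after the end of the text are filled with id 0,
    which is assumed here to be reserved for padding.
    """
    padded = np.pad(char_ids, window, mode="constant", constant_values=0)
    size = 2 * window + 1
    # Row i selects padded[i : i + size], i.e. the context around original position i.
    indices = np.arange(len(char_ids))[:, None] + np.arange(size)[None, :]
    return padded[indices]


text = "Ahoj Praho"
alphabet = {c: i + 1 for i, c in enumerate(sorted(set(text.lower())))}  # hypothetical alphabet
ids = np.array([alphabet[c] for c in text.lower()])
inputs = sliding_windows(ids, 2)                                      # model inputs
targets = np.array([c.isupper() for c in text], dtype=np.int32)       # 1 = uppercase
```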
mnist_cnn
Deadline: Apr 07, 23:59 5 points
To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:
 C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and the specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
 CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add a batch normalization layer, and finally a ReLU activation. Example: CB-10-3-1-same
 M-kernel_size-stride: Add max pooling with the specified size and stride. Example: M-3-2
 R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the specified layers is then added to their output. Example: R-[C-16-3-1-same,C-16-3-1-same]
 F: Flatten inputs. Must appear exactly once in the architecture.
 D-hidden_layer_size: Add a dense layer with ReLU activation and the specified size. Example: D-100
An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,D-100.
After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.
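Since a residual block nests a comma-separated list inside the top-level comma-separated list, the architecture string cannot simply be split on every comma. One possible approach, sketched below in plain Python and assuming hyphen-separated specifications such as CB-16-5-2-same,M-3-2,F,R-[C-16-3-1-same],D-100, splits only on commas outside brackets and parses residual blocks recursively. The tuple representation is a hypothetical intermediate format that would then be mapped to Keras layers (for example Conv2D, BatchNormalization, MaxPool2D, Flatten, Dense, and an addition for residual connections).

```python
def split_top_level(spec):
    """Split on commas that are not nested inside R-[...] brackets."""
    parts, depth, current = [], 0, ""
    for char in spec:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    if current:
        parts.append(current)
    return parts


def parse_cnn(spec):
    """Parse the architecture string into a list of (layer_type, arguments) tuples."""
    layers = []
    for part in split_top_level(spec):
        if part.startswith("R-["):
            # Residual block: parse the bracketed layers recursively.
            layers.append(("R", parse_cnn(part[len("R-["):-1])))
            continue
        kind, *args = part.split("-")
        if kind in ("C", "CB"):
            filters, kernel_size, stride, padding = args
            layers.append((kind, (int(filters), int(kernel_size), int(stride), padding)))
        elif kind == "M":
            layers.append((kind, tuple(int(arg) for arg in args)))
        elif kind == "D":
            layers.append((kind, (int(args[0]),)))
        elif kind == "F":
            layers.append((kind, ()))
        else:
            raise ValueError("Unknown layer specification: " + part)
    return layers
```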
cifar_competition
Deadline: Apr 07, 23:59 5-10 points
The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.
This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 60% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that my solutions usually need to achieve at least ~73% on the development set to score 60% on the test set.
You may want to start with the cifar_competition.py template.
mnist_multiple
Deadline: Apr 14, 23:59 4 points
In this assignment you will implement a model with multiple inputs, multiple outputs, manual batch preparation, and manual evaluation. Start with the mnist_multiple.py template and:
 The goal is to create a model which, given two input MNIST images, predicts whether the digit on the first one is larger than the digit on the second one.
 The model has three outputs:
 direct prediction of the required value,
 label prediction for the first image,
 label prediction for the second image.
 In addition to the direct prediction, you can predict labels for both images and compare them, obtaining an indirect prediction.
 You need to implement:
 the model, using multiple inputs, outputs, losses, and metrics;
 generation of two-image batches using regular MNIST batches;
 computation of direct and indirect prediction accuracy.
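Generating two-image batches and computing both accuracies could look as follows. The pairing of consecutive examples is an assumption (any disjoint pairing would do), as are the helper names and the choice of a single sigmoid probability of shape [N] for the direct output.

```python
import numpy as np

def make_pair_batch(images, labels):
    """Pair examples (0, 1), (2, 3), ... of a regular batch into two-image examples."""
    n = len(labels) // 2                       # an unpaired last example is dropped
    first, second = images[0:2 * n:2], images[1:2 * n:2]
    first_labels, second_labels = labels[0:2 * n:2], labels[1:2 * n:2]
    target = (first_labels > second_labels).astype(np.int32)
    return (first, second), (target, first_labels, second_labels)


def direct_and_indirect_accuracy(direct_probs, first_probs, second_probs, target):
    """Accuracy of the direct output and of comparing the two predicted digits."""
    direct = (direct_probs > 0.5).astype(np.int32)
    indirect = (np.argmax(first_probs, axis=-1) > np.argmax(second_probs, axis=-1)).astype(np.int32)
    return np.mean(direct == target), np.mean(indirect == target)
```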
fashion_masks
Deadline: Apr 14, 23:59 5-11 points
This assignment is a simple image segmentation task. The data for this task is available through the fashion_masks_data.py module. The inputs consist of 28×28 greyscale images of ten classes of clothing, while the outputs consist of the correct class and a pixel bit mask.
This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
Note that all .zip files you submit will be extracted first.
Performance is evaluated using mean IoU, where IoU for a single example is defined as the intersection of the gold and system masks divided by their union (assuming the predicted label is correct; if not, IoU is 0). The evaluation (using, for example, development data) can be performed by the fashion_masks_eval.py script.
The task is a competition, and the points will be awarded depending on your test set score. If your test set score surpasses 75%, you will be awarded 5 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that quite a straightforward model surpasses 80% on the development set after an hour of computation (and 90% after several hours), so reaching 75% is not that difficult.
You may want to start with the fashion_masks.py template, which loads the data and generates test set annotations in the required format (one example per line containing a space-separated label and mask, the mask stored as zeros and ones, rows first).
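The evaluation metric can be sketched in NumPy as below. fashion_masks_eval.py remains the authoritative implementation; the helper names and the handling of two empty masks are assumptions.

```python
import numpy as np

def example_iou(gold_label, gold_mask, system_label, system_mask):
    """IoU of two binary masks, or 0 when the predicted label is wrong."""
    if gold_label != system_label:
        return 0.0
    gold, system = np.asarray(gold_mask, bool), np.asarray(system_mask, bool)
    union = np.logical_or(gold, system).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return np.logical_and(gold, system).sum() / union


def mean_iou(gold_labels, gold_masks, system_labels, system_masks):
    """Average the per-example IoU over the whole dataset."""
    return float(np.mean([example_iou(*example) for example in
                          zip(gold_labels, gold_masks, system_labels, system_masks)]))
```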
caltech42_competition
Deadline: Apr 22, 23:59 (originally Apr 21, 23:59)
5-10 points
The goal of this assignment is to try a transfer learning approach to train image recognition on a small dataset with 42 classes. You can load the data using the caltech42.py module. In addition to the training data, you should use a MobileNet v2 pretrained network (details in caltech42_competition.py).
This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 94% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.
You may want to start with the caltech42_competition.py template.
sequence_classification
Deadline: Apr 22, 23:59 (originally Apr 21, 23:59)
6 points
The goal of this assignment is to introduce recurrent neural networks, manual TensorBoard log collection, and manual gradient clipping. Regarding recurrent neural networks, the assignment demonstrates their convergence speed and illustrates the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1 values, or vectors with a one-hot representation of a small integer.
Your goal is to modify the sequence_classification.py template and implement the following:
 Use the specified RNN cell type (SimpleRNN, GRU or LSTM) and dimensionality.
 Process the sequence using the required RNN.
 Use an additional hidden layer on the RNN outputs if requested.
 Implement gradient clipping if requested.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, their convergence speed, exploding gradient issues and how gradient clipping helps:
 --rnn_cell=SimpleRNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
 the same as above but with --sequence_dim=2
 the same as above but with --sequence_dim=10
 --rnn_cell=LSTM --hidden_layer=50 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
 the same as above but with --rnn_cell=SimpleRNN
 the same as above but with --rnn_cell=GRU --hidden_layer=150
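The task and the clipping step can be sketched in NumPy. The target definition (parity of the running sum of the inputs) is an assumed reading of the description above, and clip_by_global_norm reimplements the rescaling that tf.clip_by_global_norm performs: every gradient is multiplied by clip_norm / max(global_norm, clip_norm), so the direction of the joint update is preserved.

```python
import numpy as np

def prefix_parity(sequence):
    """Target for every prefix: parity of the running sum (assumed definition)."""
    return np.cumsum(sequence) % 2


def clip_by_global_norm(gradients, clip_norm):
    """Rescale all gradients together so that their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    scale = clip_norm / max(global_norm, clip_norm)   # 1.0 when no clipping is needed
    return [g * scale for g in gradients], global_norm
```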
tagger_we
Deadline: Apr 28, 23:59 3 points
In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated by a gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and can generate batches.
Your goal is to modify the tagger_we.py template and implement the following:
 Use the specified RNN cell type (GRU or LSTM) and dimensionality.
 Create word embeddings for the training vocabulary.
 Process the sentences using a bidirectional RNN.
 Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using masking.
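Masking means that padding positions must not contribute to the loss or the accuracy. The NumPy sketch below builds such a mask from sentence lengths and uses it for accuracy; the helper names are hypothetical. In TensorFlow, tf.sequence_mask computes the same kind of mask, and Keras layers can also propagate masks, e.g. via Embedding(mask_zero=True).

```python
import numpy as np

def sequence_mask(lengths, max_length=None):
    """Boolean mask [batch, max_length] that is True for real (non-padding) words."""
    lengths = np.asarray(lengths)
    max_length = lengths.max() if max_length is None else max_length
    return np.arange(max_length)[None, :] < lengths[:, None]


def masked_accuracy(gold_tags, predicted_tags, lengths):
    """Tagging accuracy computed only over real words, ignoring padding positions."""
    mask = sequence_mask(lengths, gold_tags.shape[1])
    return np.sum((gold_tags == predicted_tags) & mask) / np.sum(mask)
```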
After submitting the task to ReCodEx, continue with the tagger_cle_rnn assignment.
tagger_cle_rnn
Deadline: Apr 28, 23:59 3 points
This task is a continuation of the tagger_we assignment. Using the tagger_cle_rnn.py template, implement the following features in addition to tagger_we:
 Create character embeddings for the training alphabet.
 Process unique words with a bidirectional character-level RNN, concatenating the results.
 Properly distribute the CLEs of unique words into the batches of sentences.
 Generate overall embeddings by concatenating word-level embeddings and CLEs.
Once submitted to ReCodEx, continue with the tagger_cle_cnn assignment. Additionally, you should experiment with the effect of CLEs compared to plain tagger_we, and the influence of their dimensionality. Note that tagger_we by default has word embeddings twice the size of the word embeddings in tagger_cle_rnn.
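Distributing the CLEs amounts to computing one embedding per unique word and gathering it back for every occurrence. A NumPy sketch follows, where cle_fn stands in for the character-level bidirectional RNN (a hypothetical placeholder); in TensorFlow the gather step would correspond to tf.gather. The template's dataset may already supply the unique-word mapping, in which case only the gather is needed.

```python
import numpy as np

def distribute_cles(word_ids, cle_fn):
    """Embed each unique word once, then copy its CLE to every occurrence in the batch.

    word_ids: integer array [batch, max_sentence_length]
    cle_fn:   placeholder for the character-level model; maps a 1-D array of unique
              word ids to an array of embeddings [num_unique, dim]
    """
    flat_ids = word_ids.reshape(-1)
    unique_ids, inverse = np.unique(flat_ids, return_inverse=True)
    unique_cles = cle_fn(unique_ids)            # the expensive part runs once per unique word
    batch_cles = unique_cles[inverse]           # gather: position -> CLE of its word
    return batch_cles.reshape(word_ids.shape + (unique_cles.shape[-1],))
```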
tagger_cle_cnn
Deadline: Apr 28, 23:59 2 points
This task is a continuation of the tagger_cle_rnn assignment. Using the tagger_cle_cnn.py template, implement the following features compared to tagger_cle_rnn:
 Instead of using RNNs to generate character-level embeddings, process embedded unique words with 1D convolutional filters with kernel sizes of 2 to some given maximum. To obtain a fixed-size representation, perform global max-pooling over the whole word.
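A NumPy sketch of the convolution-plus-pooling step for a single word: for each kernel size, slide the filters over the embedded characters, take the maximum of each filter over all positions, and concatenate the results. It is illustrative only; the kernels dictionary, the missing bias and activation, and zero-padding of words shorter than a kernel are assumptions. In Keras, the same idea would use Conv1D followed by GlobalMaxPool1D.

```python
import numpy as np

def charcnn_embedding(char_embeddings, kernels):
    """CharCNN-style word embedding: 1D convolutions followed by global max-pooling.

    char_embeddings: [word_length, embed_dim] embedded characters of one word
    kernels: dict mapping kernel size -> weight array [kernel_size, embed_dim, filters]
    Returns the concatenation of the max-pooled outputs of all kernel sizes.
    """
    pooled = []
    for size, weights in sorted(kernels.items()):
        x = char_embeddings
        if len(x) < size:
            # Assumption: pad words shorter than the kernel with zero vectors.
            x = np.pad(x, ((0, size - len(x)), (0, 0)), mode="constant")
        # 'valid' convolution: one output vector per window of `size` consecutive characters
        outputs = np.stack([
            np.einsum("ke,kef->f", x[i:i + size], weights)
            for i in range(len(x) - size + 1)
        ])                                    # [positions, filters]
        pooled.append(outputs.max(axis=0))    # global max-pooling over the whole word
    return np.concatenate(pooled)
```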
speech_recognition
Deadline: Apr 28, 23:59 7-12 points
This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using the TIMIT corpus, with input sound waves passed through the usual preprocessing: computing Mel-frequency cepstral coefficients (MFCCs). You can repeat exactly this preprocessing on a given audio using the timit_mfcc_preprocess.py script.
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, use the timit_mfcc.py module.
This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. The evaluation is performed by computing the edit distance to the gold letter sequence, normalized by its length (i.e., exactly as tf.edit_distance). Everyone who submits a solution which achieves at most 50% test set edit distance will get 7 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. An evaluation (using, for example, development data) can be performed by speech_recognition_eval.py.
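The evaluation metric can be sketched in pure Python: the Levenshtein distance between the predicted and gold sequences, divided by the length of the gold sequence. This matches what `tf.edit_distance` computes with `normalize=True`. The sketch below is an illustration and not the code of speech_recognition_eval.py.

```python
def edit_distance(hypothesis, truth):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(truth) + 1))
    for i, h in enumerate(hypothesis, start=1):
        current = [i]
        for j, t in enumerate(truth, start=1):
            current.append(min(previous[j] + 1,             # drop hypothesis symbol h
                               current[j - 1] + 1,          # insert truth symbol t
                               previous[j - 1] + (h != t)))  # match or substitute
        previous = current
    return previous[-1]


def normalized_edit_distance(hypothesis, truth):
    """Edit distance divided by the gold length (the competition metric)."""
    return edit_distance(hypothesis, truth) / len(truth)
```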
You should start with the speech_recognition.py template.
 To perform speech recognition, you should use CTC loss for training and the CTC beam search decoder for prediction. Both the CTC loss and the CTC decoder employ sparse tensors, so start by studying them.
 The basic architecture:
 convert target letters into a sparse representation,
 use a bidirectional RNN and an output linear layer without activation,
 compute the CTC loss (tf.nn.ctc_loss),
 if required, perform decoding by a CTC decoder (tf.nn.ctc_beam_search_decoder) and possibly evaluate results using normalized edit distance (tf.edit_distance).
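The first step, converting variable-length target sequences into a sparse representation, can be sketched in pure Python. A `tf.SparseTensor` is described by three parts: the `(row, column)` index of every present element, the values at those indices, and the dense shape. This helper is hypothetical. It only builds that triple, which could then be passed to `tf.SparseTensor(indices, values, dense_shape)`.

```python
def to_sparse(sequences):
    """Describe a batch of variable-length label sequences as (indices, values, dense_shape)."""
    indices, values = [], []
    for row, sequence in enumerate(sequences):
        for column, label in enumerate(sequence):
            indices.append((row, column))
            values.append(label)
    dense_shape = (len(sequences), max((len(s) for s in sequences), default=0))
    return indices, values, dense_shape
```

No padding symbol is needed, because positions beyond a sequence's length are simply absent from `indices`.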
tagger_competition
Deadline: May 5, 23:59 5-13 points
In this assignment, you should extend tagger_we/tagger_cle_rnn/tagger_cle_cnn into a real-world Czech part-of-speech tagger. We will use the Czech PDT dataset loadable using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).
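Generating the tag characters independently means treating the 15 positions as 15 separate classification problems. The pure-Python sketch below shows one way to split the tags into per-position labels and build a small vocabulary for each position. The two example tags are illustrative PDT-style tags, and the helper names are hypothetical. Each position could then get its own softmax classifier, and the predicted characters would be concatenated back into a tag. Note that independently predicted characters may combine into a tag never seen in training.

```python
TAG_LENGTH = 15


def split_tags(tags):
    """Turn 15-character tags into 15 columns of per-position labels."""
    for tag in tags:
        if len(tag) != TAG_LENGTH:
            raise ValueError(f"unexpected tag length: {tag!r}")
    return [[tag[position] for tag in tags] for position in range(TAG_LENGTH)]


def build_position_vocabularies(tags):
    """One small vocabulary (character -> id) per tag position."""
    return [{c: i for i, c in enumerate(sorted(set(column)))} for column in split_tags(tags)]
```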
You can use the following additional data in this assignment:
 You can use outputs of a morphological analyzer loadable with morpho_analyzer.py. If a word form in train, dev or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
 You can use any unannotated text data (Wikipedia, Czech National Corpus, …).
The assignment is again an open-data task, where you submit only the annotated test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams).
Explicitly, submit exactly one .txt file and at least one .py file.
Note that all .zip files you submit will be extracted first.
The task is also a competition. Everyone who submits a solution which achieves at least 92% label accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 95.89% from Spoustová et al., 2009. You can evaluate a generated file against a gold text file using the morpho_evaluator.py module.
You can start with the tagger_competition.py template, which among others generates test set annotations in the required format.
3d_recognition
Deadline: May 5, 23:59 5-10 points
Your goal in this assignment is to perform 3D object recognition. The input is a voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data or 32×32×32 data. To load the dataset, use the modelnet.py module.
The official dataset offers only train and test sets, with the test set having a different distribution of labels. Our dataset also contains a development set, which has nearly the same label distribution as the test set.
The assignment is again an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 85% label accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.
You can start with the 3d_recognition.py template, which among others generates test set annotations in the required format.
lemmatizer_noattn
Deadline: May 12, 23:59 4 points
The goal of this assignment is to create a simple lemmatizer. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated with its gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and can generate batches.
Your goal is to modify the lemmatizer_noattn.py template and implement the following:
 Embed characters of source forms and run a bidirectional GRU encoder.
 Embed characters of target lemmas.
 Implement a training time decoder which uses gold target characters as inputs.
 Implement an inference time decoder which uses previous predictions as inputs.
 The initial state of both decoders is the output state of the corresponding GRU encoded form.
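The difference between the two decoders can be sketched in pure Python. During training, the decoder inputs are the gold characters shifted right by one position (teacher forcing). During inference, each prediction is fed back as the next input until an end-of-word symbol appears. The `<bow>`/`<eow>` symbols and the `step` function (standing for one RNN cell step plus a classifier) are hypothetical placeholders, not the template's names.

```python
BOW, EOW = "<bow>", "<eow>"  # hypothetical begin/end-of-word symbols


def decoder_training_io(lemma):
    """Training-time decoder: inputs are the gold characters shifted right by one."""
    characters = list(lemma)
    return [BOW] + characters, characters + [EOW]


def greedy_decode(step, initial_state, max_length=20):
    """Inference-time decoder: feed back the previous prediction.

    `step(state, previous_char) -> (state, predicted_char)` is a hypothetical
    RNN cell + classifier; decoding stops at EOW or after max_length steps.
    """
    state, previous, output = initial_state, BOW, []
    for _ in range(max_length):
        state, previous = step(state, previous)
        if previous == EOW:
            break
        output.append(previous)
    return "".join(output)
```

The `max_length` cap is a safety net: an untrained decoder may never emit the end-of-word symbol.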
After submitting the task to ReCodEx, continue with the lemmatizer_attn assignment.
lemmatizer_attn
Deadline: May 12, 23:59 3 points
This task is a continuation of the lemmatizer_noattn assignment. Using the lemmatizer_attn.py template, implement the following features in addition to lemmatizer_noattn:
 The bidirectional GRU encoder returns outputs for all input characters, not just the last.
 Implement attention in the decoders. Notably, project the encoder outputs and the current state into vectors of the same dimensionality, apply a non-linearity, and generate a weight for every encoder output. Finally, sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
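The attention described above is additive (Bahdanau-style) attention. A pure-Python sketch follows. The parameter names `W_enc`, `W_state` and `v` are illustrative assumptions, and tanh is used as the non-linearity. The sketch projects each encoder output and the decoder state, adds the projections, applies tanh, scores the result with `v`, normalizes the scores with softmax, and returns the weighted sum of encoder outputs.

```python
import math


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(encoder_outputs, state, W_enc, W_state, v):
    """Return (context vector, attention weights) for one decoder step."""
    projected_state = matvec(W_state, state)
    scores = []
    for output in encoder_outputs:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_enc, output), projected_state)]
        scores.append(sum(h * w for h, w in zip(hidden, v)))
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [sum(w * out[i] for w, out in zip(weights, encoder_outputs)) for i in range(dim)]
    return context, weights
```

The returned `context` is what gets concatenated to the decoder input at that step. It is recomputed at every step, because the decoder state changes.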
Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.
lemmatizer_competition
Deadline: May 12, 23:59 5-13 points
In this assignment, you should extend lemmatizer_noattn/lemmatizer_attn into a real-world Czech lemmatizer. We will again use the Czech PDT dataset loadable using the morpho_dataset.py module.
You can use the following additional data in this assignment:
 You can use outputs of a morphological analyzer loadable with morpho_analyzer.py. If a word form in train, dev or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
 You can use any unannotated text data (Wikipedia, Czech National Corpus, …).
The assignment is again an open-data task, where you submit only the annotated test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams).
Explicitly, submit exactly one .txt file and at least one .py file.
Note that all .zip files you submit will be extracted first.
The task is also a competition. Everyone who submits a solution which achieves at least 92% accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 97.86%. You can evaluate a generated file against a gold text file using the morpho_evaluator.py module.
You can start with the lemmatizer_competition.py template, which among others generates test set annotations in the required format.
vae
Deadline: May 19, 23:59 3 points
In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format. Your goal is to modify the vae.py template and implement a VAE.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars) and different latent variable dimensionality (z_dim=2 and z_dim=100).
The generated images are available in TensorBoard logs.
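The two VAE-specific ingredients can be sketched in pure Python, assuming a diagonal Gaussian encoder that outputs a mean and a log-variance per latent dimension. The first is the reparameterization trick, which lets gradients flow through the sampling. The second is the KL divergence to the standard normal prior. The Bernoulli reconstruction term is one common choice for MNIST-like pixels in [0, 1], not necessarily what the template uses.

```python
import math
import random


def reparameterize(mean, log_variance, rng=random):
    """z = mean + sigma * epsilon with epsilon ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_variance)]


def kl_to_standard_normal(mean, log_variance):
    """KL( N(mean, diag(exp(log_variance))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_variance))


def bernoulli_nll(pixels, probabilities, eps=1e-7):
    """Reconstruction loss, assuming each decoded pixel is a Bernoulli probability."""
    return -sum(x * math.log(p + eps) + (1 - x) * math.log(1 - p + eps)
                for x, p in zip(pixels, probabilities))
```

The training loss is then the reconstruction term plus the KL term. To generate new images, sample `z` from `N(0, I)` and pass it through the decoder only.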
gan
Deadline: May 19, 23:59 3 points
In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format. Your goal is to modify the gan.py template and implement a GAN.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars) and maybe try different latent variable dimensionality. The generated images are available in TensorBoard logs.
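The GAN losses can be sketched in pure Python, assuming the discriminator returns logits. The discriminator minimizes the binary cross-entropy of classifying real samples as 1 and generated samples as 0. For the generator, the sketch uses the commonly recommended non-saturating loss −log D(G(z)) rather than log(1 − D(G(z))); whether the template uses this variant is an assumption.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def discriminator_loss(real_logits, fake_logits):
    """-[log D(x)] - [log(1 - D(G(z)))], averaged; log(1 - sigmoid(l)) = log_sigmoid(-l)."""
    real = sum(-log_sigmoid(l) for l in real_logits) / len(real_logits)
    fake = sum(-log_sigmoid(-l) for l in fake_logits) / len(fake_logits)
    return real + fake


def generator_loss(fake_logits):
    """Non-saturating generator loss -log D(G(z))."""
    return sum(-log_sigmoid(l) for l in fake_logits) / len(fake_logits)
```

The two losses are minimized alternately: one step updates only the discriminator parameters, the next only the generator parameters.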
You can also continue with the dcgan assignment.
dcgan
Deadline: May 19, 23:59 1 point
This task is a continuation of the gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN).
Start with the dcgan.py template and implement a DCGAN. Note that most of the TODO notes are from the gan assignment.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars). However, note that you will need a lot of computational power (preferably a GPU) to generate the images.
nli_competition
Deadline: May 19, 23:59 6-10 points
In this competition you will be solving the Native Language Identification task. In that task, you get an English essay written by a non-native individual and your goal is to identify their native language.
We will be using the NLI Shared Task 2013 data, which contains documents in 11 languages. For each language, the train, development and test sets contain 900, 100 and 100 documents, respectively. Particularly interesting is the fact that humans are quite bad at this task (in a simplified setting, human professionals achieve 40-50% accuracy), while machine learning models can achieve high performance. Notably, the 2013 shared task winners achieved 83.6% accuracy, while the current state-of-the-art is at least 87.1% (Malmasi and Dras, 2017).
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, use the nli_dataset.py script.
The assignment is again an open-data task, where you submit only the annotated test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams).
Explicitly, submit exactly one .txt file and at least one .py file.
Note that all .zip files you submit will be extracted first.
The task is also a competition. If your test set accuracy surpasses 60%, you will be awarded 6 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions.
You can start with the nli_competition.py template, which loads the data and generates predictions in the required format (language of each essay on a line).
omr_competition
Deadline: May 26, 23:59 7-15 points
You should implement optical music recognition in your final competition assignment. The inputs are PNG images of monophonic scores starting with a clef, a key signature, and a time signature, followed by several staves. The dataset is loadable using the omr_dataset.py module and is downloaded automatically if missing (note that it has 185MB). No other data or pretrained models are allowed for training.
The assignment is again an open-data task, where you submit only the annotated test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams).
Explicitly, submit exactly one .txt file and at least one .py file.
Note that all .zip files you submit will be extracted first.
The task is also a competition. The evaluation is performed by computing the edit distance to the gold mark sequence, normalized by its length (i.e., exactly as tf.edit_distance). Everyone who submits a solution which achieves at most 10% test set edit distance will get 7 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions. Furthermore, 4 bonus points will be given to anyone surpassing the current state-of-the-art of 0.80%.
An evaluation (using, for example, development data) can be performed by speech_recognition_eval.py.
You can start with the omr_competition.py template, which among others generates test set annotations in the required format.
monte_carlo
Deadline: May 26, 23:59 2 points
Solve the discretized CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm.
Use the supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) to interact with the discretized environment. The environment has the following methods and properties:
 states: number of states of the environment
 actions: number of actions of the environment
 episode: number of the current episode (zero-based)
 reset(start_evaluate=False) → new_state: starts a new episode
 step(action) → new_state, reward, done, info: performs the chosen action in the environment, returning the new state, the obtained reward, a boolean flag indicating the end of an episode, and additional environment-specific information
 render(): renders the current environment state
Once you finish training (which you indicate by passing start_evaluate=True to reset), your goal is to reach an average return of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average return every 10 episodes, even during training.
You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
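A compact sketch of every-visit Monte Carlo control with an ε-greedy policy is shown below. It assumes an environment object exposing `states`, `actions`, `reset()` and `step(action)` as described above, with states being integer indices. It is not the monte_carlo.py template. Each episode is generated with the current ε-greedy policy, and then every visited `(state, action)` pair's Q estimate is moved toward the observed return using a running average.

```python
import random


def monte_carlo_control(env, episodes, epsilon=0.1, gamma=1.0, rng=random):
    """Every-visit Monte Carlo control with an epsilon-greedy policy; returns Q."""
    Q = [[0.0] * env.actions for _ in range(env.states)]
    C = [[0] * env.actions for _ in range(env.states)]  # visit counts
    for _ in range(episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(env.actions)  # explore
            else:
                action = max(range(env.actions), key=lambda a: Q[state][a])  # exploit
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = gamma * G + reward  # return from this time step on
            C[state][action] += 1
            Q[state][action] += (G - Q[state][action]) / C[state][action]
    return Q
```

After training, evaluation would call `reset(start_evaluate=True)` and act greedily with respect to `Q`.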
reinforce
Deadline: May 26, 23:59 2 points
Solve the continuous CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm.
The supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) can create a continuous environment using environment(discrete=False).
The continuous environment is very similar to the discrete environment, except that the states are vectors of real-valued observations with shape environment.state_shape.
Your goal is to reach an average return of 475 during 100 evaluation episodes. Start with the reinforce.py template.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
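The quantities REINFORCE needs can be sketched in pure Python. The discounted return is computed for every time step of an episode, and a surrogate loss −Σ_t (G_t − b_t) log π(a_t | s_t) is formed. Its gradient with respect to the policy parameters is the REINFORCE estimate. The optional `baseline` argument covers the reinforce_baseline variant. In an actual network, the log-probabilities would come from the policy's softmax output, and automatic differentiation would compute the gradient.

```python
def discounted_returns(rewards, gamma=1.0):
    """G_t = r_{t+1} + gamma * G_{t+1}, computed backwards over one episode."""
    returns, G = [], 0.0
    for reward in reversed(rewards):
        G = reward + gamma * G
        returns.append(G)
    return returns[::-1]


def reinforce_loss(action_log_probs, returns, baseline=None):
    """Surrogate loss -sum_t (G_t - b_t) * log pi(a_t | s_t); baseline defaults to 0."""
    if baseline is None:
        baseline = [0.0] * len(returns)
    return -sum((G - b) * lp for lp, G, b in zip(action_log_probs, returns, baseline))
```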
reinforce_baseline
Deadline: May 26, 23:59 2 points
This is a continuation of the reinforce assignment.
Using the reinforce_baseline.py template, solve the CartPole-v1 environment using the REINFORCE with baseline algorithm.
Using a baseline lowers the variance of the policy gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline.
Your goal is to reach an average return of 475 during 100 evaluation episodes.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
reinforce_pixels
Deadline: May 26, 23:59 2 points
This is a continuation of the reinforce or reinforce_baseline assignment.
The supplied cart_pole_pixels_evaluator.py module (depending on gym_evaluator.py) generates a pixel representation of the CartPole environment as an $80×80$ image with three channels, with each channel representing one time step (i.e., the current observation and the two previous ones).
To pass the assignment, you need to reach an average return of 250 during 100 evaluation episodes. During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 10 minutes.
You can start with the reinforce_pixels.py template using the correct environment.

Training Neural Network
Assume the artificial neural network on the right, with mean square error loss and gold output of 3. Compute the values of all weights $w_i$ after performing an SGD update with learning rate 0.1. Different network architectures, activation functions (tanh, sigmoid, softmax) and losses (MSE, NLL) may appear in the exam.
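The network referred to above is not included in this text, so the sketch below uses a hypothetical two-weight network, y = w₂ · tanh(w₁ · x), with loss (y − 3)² and learning rate 0.1. It only shows the mechanics: run a forward pass, apply the chain rule backwards to get each ∂L/∂wᵢ, then update every weight with wᵢ ← wᵢ − 0.1 · ∂L/∂wᵢ. Note that some conventions define MSE with a factor ½, which halves the gradients.

```python
import math


def sgd_step(w1, w2, x=1.0, target=3.0, learning_rate=0.1):
    """One SGD step for the hypothetical network y = w2 * tanh(w1 * x), loss (y - target)^2."""
    # Forward pass.
    h = math.tanh(w1 * x)
    y = w2 * h
    # Backward pass (chain rule).
    dL_dy = 2.0 * (y - target)
    dL_dw2 = dL_dy * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1.0 - h * h) * x  # tanh'(a) = 1 - tanh(a)^2
    # Simultaneous update of all weights (gradients computed with the old values).
    return w1 - learning_rate * dL_dw1, w2 - learning_rate * dL_dw2
```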
Maximum Likelihood Estimation
Formulate the maximum likelihood estimator for neural network parameters and derive the following two losses:
 NLL (negative log likelihood) loss for networks returning a probability distribution
 MSE (mean square error) loss for networks returning a real number with a normal distribution with a fixed variance
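One standard way to write the derivation is shown below. Here $f(x; \theta)$ denotes the network output for the regression case, and $\sigma^2$ is the fixed variance.

```latex
\begin{aligned}
\hat\theta_\mathrm{MLE}
  &= \arg\max_\theta \prod_{i=1}^N p_\mathrm{model}(y_i \mid x_i; \theta)
   = \arg\min_\theta \sum_{i=1}^N -\log p_\mathrm{model}(y_i \mid x_i; \theta) \\
\text{NLL:}\quad \mathcal L(\theta)
  &= -\log p_\mathrm{model}(y \mid x; \theta) \\
\text{MSE, with } p_\mathrm{model}(y \mid x; \theta) = \mathcal N\big(y; f(x; \theta), \sigma^2\big):\quad
-\log p_\mathrm{model}(y \mid x; \theta)
  &= \frac{\big(y - f(x; \theta)\big)^2}{2\sigma^2} + \frac12 \log(2\pi\sigma^2)
\end{aligned}
```

With $\sigma^2$ fixed, the additive constant and the positive factor $1/(2\sigma^2)$ do not change the minimizer. Therefore $\arg\min_\theta \sum_i -\log p_\mathrm{model}(y_i \mid x_i; \theta) = \arg\min_\theta \sum_i \big(y_i - f(x_i; \theta)\big)^2$, which is the MSE loss.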

Backpropagation Algorithm, SGD with Momentum
Write down the backpropagation algorithm. Then, write down the SGD algorithm with momentum. Finally, formulate SGD with Nesterov momentum and explain the difference to SGD with regular momentum.
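Both momentum variants can be sketched on a single scalar parameter using the common "velocity" formulation. Regular momentum evaluates the gradient at the current parameters, v ← βv − α∇L(θ). Nesterov momentum evaluates it at the look-ahead point θ + βv. In both cases θ ← θ + v. The quadratic test function is only an illustration.

```python
def sgd_momentum(grad, theta, learning_rate=0.1, momentum=0.9, steps=100, nesterov=False):
    """SGD with (optionally Nesterov) momentum; grad(theta) returns dL/dtheta."""
    velocity = 0.0
    for _ in range(steps):
        # Nesterov: evaluate the gradient where the momentum is about to take us.
        point = theta + momentum * velocity if nesterov else theta
        velocity = momentum * velocity - learning_rate * grad(point)
        theta = theta + velocity
    return theta
```

Because Nesterov's gradient is taken after the momentum step, it corrects an overshooting velocity one step earlier, which usually damps oscillations.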
Adagrad and RMSProp
Write down the AdaGrad algorithm and show that it tends to internally decay the learning rate by a factor of $1/\sqrt{t}$ in step $t$. Furthermore, write down the RMSProp algorithm and compare it to AdaGrad.
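The $1/\sqrt{t}$ decay can be checked numerically with the sketch below, which tracks a single scalar parameter. With a constant gradient $g$, AdaGrad accumulates $r_t = t g^2$, so the step is $\alpha g / \sqrt{t g^2} = \alpha / \sqrt{t}$ (up to sign and the small $\epsilon$). RMSProp replaces the sum by an exponential moving average, so its step size does not keep shrinking. The placement of $\epsilon$ varies between formulations; the sketch uses one common choice.

```python
import math


def adagrad_steps(gradients, learning_rate=0.01, epsilon=1e-7):
    """Updates AdaGrad applies to a scalar parameter for a stream of gradients."""
    r, updates = 0.0, []
    for g in gradients:
        r += g * g  # sum of all squared gradients so far
        updates.append(-learning_rate * g / (math.sqrt(r) + epsilon))
    return updates


def rmsprop_steps(gradients, learning_rate=0.01, beta=0.9, epsilon=1e-7):
    """Same, but with an exponential moving average of squared gradients."""
    r, updates = 0.0, []
    for g in gradients:
        r = beta * r + (1 - beta) * g * g
        updates.append(-learning_rate * g / (math.sqrt(r) + epsilon))
    return updates
```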
Adam
Write down the Adam algorithm and explain the bias-correction terms $(1-\beta^t)$.
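A scalar sketch of Adam shows where the bias correction enters. Both moment estimates start at 0, so for a stationary gradient $E[m_t] = (1-\beta_1^t)\,E[g]$ and $E[v_t] = (1-\beta_2^t)\,E[g^2]$. Dividing by $(1-\beta^t)$ removes this bias toward zero, which is large in the first steps. Consequently, the very first update already has magnitude close to the learning rate.

```python
import math


def adam(grad, theta, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, steps=1):
    """Adam on a scalar parameter; grad(theta) returns dL/dtheta."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction (m started at 0)
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta
```

Without the correction, the first step would be $\alpha \cdot 0.1 g / \sqrt{0.001 g^2} \approx 3.16\alpha$ for the default betas, i.e., larger than intended.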
Regularization
Define overfitting and sketch what regularization is. Then describe basic regularization methods like early stopping, L2 and L1 regularization, dataset augmentation, ensembling and label smoothing.
Dropout
Describe the dropout method and write down exactly how it is used during training and during inference. Then explain why it cannot be used on RNN state, describe the variational dropout variant, and also describe layer normalization.
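The training/inference distinction can be sketched with "inverted" dropout. This is the variant that `tf.nn.dropout` implements: it scales the surviving units by $1/(1-p)$ during training, so inference needs no change. The original formulation instead keeps training outputs unscaled and multiplies the weights by $(1-p)$ at inference. Both keep the expected activation the same in the two modes.

```python
import random


def dropout(inputs, rate, training, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale the survivors by 1 / (1 - rate); return the inputs unchanged at inference."""
    if not training or rate == 0.0:
        return list(inputs)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]
```

For variational dropout in an RNN, one such mask would be sampled per sequence and reused at every time step, instead of sampling a fresh mask at each step.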
Network Convergence
Describe factors influencing network convergence, namely:
 Parameter initialization strategies (explain also why batch normalization helps with this issue).
 Problems with saturating non-linearities (and again, why batch normalization helps; you can also discuss why NLL helps with saturating non-linearities on the output layer).
 Gradient clipping (and the difference between clipping individual gradient elements or the gradient as a whole).
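The two clipping strategies differ in whether they preserve the gradient direction, as the pure-Python sketch below shows. It treats the whole gradient as one flat vector. TensorFlow offers the corresponding operations as `tf.clip_by_value` and `tf.clip_by_global_norm`.

```python
import math


def clip_by_value(gradient, threshold):
    """Clip each element independently; this can change the gradient direction."""
    return [max(-threshold, min(threshold, g)) for g in gradient]


def clip_by_global_norm(gradient, max_norm):
    """Rescale the whole gradient if its L2 norm exceeds max_norm; the direction is kept."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    return [g * max_norm / norm for g in gradient]
```

For example, element-wise clipping maps `[3.0, 0.5]` to `[1.0, 0.5]`, which no longer points in the original direction. Norm clipping maps `[3.0, 4.0]` to `[0.6, 0.8]`, a shorter vector pointing the same way.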

Convolution
Write down equations of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. Explain both $\textit{SAME}$ and $\textit{VALID}$ padding schemes and write down the output size of the operation for both these padding schemes.
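The sketch below implements the (0-indexed, bias-free) formula $(K \star I)_{i,j,o} = \sum_{n,m,c} I_{i \cdot T + n,\, j \cdot S + m,\, c}\, K_{n,m,c,o}$ directly with nested lists. It also computes the output sizes: $\lceil H/T \rceil$ for SAME and $\lceil (H-N+1)/T \rceil$ for VALID (analogously for the width). Only VALID padding is implemented in the convolution itself. SAME would additionally zero-pad the input so that the output size formula holds.

```python
import math


def output_size(input_size, kernel, stride, padding):
    """Output length along one dimension for SAME and VALID padding."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel + 1) / stride)
    raise ValueError(padding)


def conv2d_valid(image, kernel, stride=(1, 1)):
    """Cross-correlation with VALID padding.
    image[h][w][c] has shape H x W x C; kernel[n][m][c][o] has shape N x M x C x O."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    N, M, O = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    T, S = stride
    out_h, out_w = output_size(H, N, T, "VALID"), output_size(W, M, S, "VALID")
    return [[[sum(image[i * T + n][j * S + m][c] * kernel[n][m][c][o]
                  for n in range(N) for m in range(M) for c in range(C))
              for o in range(O)]
             for j in range(out_w)]
            for i in range(out_h)]
```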
Batch Normalization
Describe the batch normalization method and explain how it is used during training and during inference. Explicitly write over what is being normalized in case of fully connected layers, and in case of convolutional layers. Compare batch normalization to layer normalization. 
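For a fully connected layer, batch normalization computes a mean and a variance per feature over the batch dimension. For a convolutional layer, the statistics are per channel, taken over the batch and all spatial positions. Layer normalization instead normalizes over the features of each single example. The pure-Python sketch below shows the fully connected case. During training it uses batch statistics and updates moving averages; during inference it uses the stored moving averages. The momentum and ε defaults are illustrative.

```python
import math


def batch_norm_train(batch, gamma, beta, epsilon=1e-3):
    """Training-mode batch normalization for batch[b][f]; statistics are per feature f."""
    size, features = len(batch), len(batch[0])
    means = [sum(row[f] for row in batch) / size for f in range(features)]
    variances = [sum((row[f] - means[f]) ** 2 for row in batch) / size for f in range(features)]
    normalized = [[gamma[f] * (row[f] - means[f]) / math.sqrt(variances[f] + epsilon) + beta[f]
                   for f in range(features)] for row in batch]
    return normalized, means, variances


def update_moving_average(moving, batch_statistic, momentum=0.99):
    """Exponential moving average of batch statistics, later used for inference."""
    return [momentum * m + (1 - momentum) * s for m, s in zip(moving, batch_statistic)]


def batch_norm_inference(x, moving_mean, moving_variance, gamma, beta, epsilon=1e-3):
    """Inference-mode batch normalization of a single example x[f]."""
    return [g * (v - m) / math.sqrt(s + epsilon) + b
            for v, m, s, g, b in zip(x, moving_mean, moving_variance, gamma, beta)]
```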
VGG and ResNet
Describe overall architecture of VGG and ResNet (you do not need to remember exact number of layers/filters, but you should know when a BatchNorm is executed, when ReLU, and how residual connections work when the number of channels increases). Then describe two ResNet extensions (WideNet, DenseNet, PyramidNet, ResNeXt). 
Object Detection and Segmentation
Describe object detection and image segmentation tasks, and sketch Fast R-CNN, Faster R-CNN and Mask R-CNN architectures. Notably, show what the overall architectures of the networks are, explain the RoI pooling and RoI align layers, show how the network predicts RoI sizes, what the losses look like, how RoIs are chosen during training and prediction, and what the region proposal network does.
Object Detection
Describe the object detection task, and sketch Fast R-CNN, Faster R-CNN and RetinaNet architectures. Notably, show the overall architectures of the networks, explain the RoI pooling layer, show how the network predicts RoI sizes, what the losses look like (classification loss, boundary prediction loss, focal loss for RetinaNet), and what a feature pyramid network is.
LSTM
Write down how the Long Short-Term Memory cell operates.
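A common formulation of the LSTM cell, without peephole connections, is given below. The parameter names $W^\ast$, $V^\ast$, $b^\ast$ are illustrative. Here $\sigma$ is the logistic sigmoid and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W^i x_t + V^i h_{t-1} + b^i) && \text{input gate} \\
f_t &= \sigma(W^f x_t + V^f h_{t-1} + b^f) && \text{forget gate} \\
o_t &= \sigma(W^o x_t + V^o h_{t-1} + b^o) && \text{output gate} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W^y x_t + V^y h_{t-1} + b^y) && \text{memory cell} \\
h_t &= o_t \odot \tanh(c_t) && \text{output / hidden state}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps without vanishing as quickly as in a basic RNN.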
GRU and Highway Networks
Show a basic RNN cell (using just one hidden layer) and then write down how it is extended using gating into the Gated Recurrent Unit. Finally, describe highway networks and compare them to RNN. 
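One common way to write these cells is shown below, with illustrative parameter names. Conventions differ on whether the update gate multiplies $h_{t-1}$ or the candidate $\hat h_t$. The highway layer uses a transform gate $\tau(x)$ and a transformation $H(x)$.

```latex
\begin{aligned}
\text{basic RNN:}\quad h_t &= \tanh(W x_t + V h_{t-1} + b) \\
\text{GRU:}\quad r_t &= \sigma(W^r x_t + V^r h_{t-1} + b^r) \\
u_t &= \sigma(W^u x_t + V^u h_{t-1} + b^u) \\
\hat h_t &= \tanh\big(W^h x_t + V^h (r_t \odot h_{t-1}) + b^h\big) \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot \hat h_t \\
\text{highway layer:}\quad y &= \tau(x) \odot H(x) + \big(1 - \tau(x)\big) \odot x
\end{aligned}
```

The highway layer applies the same gated interpolation as the GRU update, but between consecutive layers (depth) instead of consecutive time steps.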
Sequence classification and CRF
Describe how RNNs, bidirectional RNNs and multilayer RNNs can be used to classify every element of a given sequence (i.e., what the architecture of a tagger might be; include also residual connections and suitable places for dropout layers). Then, explain how a CRF layer works, define score computation for a given sequence of inputs and sequence of labels, describe the loss computation during training, and sketch the inference algorithm. 
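For a linear-chain CRF, the score of a label sequence is the sum of per-position emission scores (the tagger's outputs) and label-to-label transition scores. The training loss is the log partition function minus the gold score. The log partition function is computed by the forward algorithm, and inference uses the Viterbi algorithm. The pure-Python sketch below omits start/end transitions for brevity.

```python
import math


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def crf_score(emissions, transitions, labels):
    """s(x, y) = sum_t emissions[t][y_t] + sum_{t>0} transitions[y_{t-1}][y_t]."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """log sum_y exp s(x, y) via the forward algorithm; the training loss is
    log_partition(...) - crf_score(..., gold_labels)."""
    K = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [logsumexp([alpha[j] + transitions[j][k] for j in range(K)]) + emissions[t][k]
                 for k in range(K)]
    return logsumexp(alpha)


def viterbi(emissions, transitions):
    """Highest-scoring label sequence (dynamic programming with back-pointers)."""
    K = len(emissions[0])
    best, backpointers = list(emissions[0]), []
    for t in range(1, len(emissions)):
        pointers = [max(range(K), key=lambda j: best[j] + transitions[j][k]) for k in range(K)]
        best = [best[pointers[k]] + transitions[pointers[k]][k] + emissions[t][k] for k in range(K)]
        backpointers.append(pointers)
    label = max(range(K), key=best.__getitem__)
    score, path = best[label], [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1], score
```

Both dynamic programs run in $O(T K^2)$ time for $T$ positions and $K$ labels, instead of enumerating all $K^T$ sequences.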
CTC Loss
Describe CTC loss and the whole setting which can be solved utilizing CTC loss. Then show how CTC loss can be computed. Finally, describe greedy and beam search CTC decoding.
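The CTC forward algorithm and greedy decoding can be sketched in pure Python. The label sequence is extended with blanks around every label, `[blank, l1, blank, l2, ..., blank]`. The forward variable `alpha[s]` then sums the probabilities of all alignment prefixes ending in extended position `s`. Two simplifications apply. First, the blank index 0 is an assumption (TensorFlow 1.x `tf.nn.ctc_loss`, for example, reserves the last class for the blank). Second, the sketch works with plain probabilities, while real implementations work in log space to avoid underflow.

```python
import math

BLANK = 0  # assumption: index of the blank symbol


def ctc_probability(probs, labels, blank=BLANK):
    """p(labels | input) by the CTC forward algorithm; probs[t][k] is a per-frame distribution."""
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    S = len(extended)
    alpha = [0.0] * S
    alpha[0] = probs[0][blank]
    if S > 1:
        alpha[1] = probs[0][extended[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][extended[s]]
        alpha = new
    # Valid alignments end either in the last label or in the final blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, labels, blank=BLANK):
    return -math.log(ctc_probability(probs, labels, blank))


def ctc_greedy_decode(probs, blank=BLANK):
    """Take the argmax per frame, merge repeated symbols, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, previous = [], None
    for symbol in best:
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return decoded
```

Greedy decoding is only an approximation: the most probable alignment does not always collapse to the most probable label sequence. Beam search reduces this gap by keeping several candidate prefixes and summing over their alignments.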
Word2vec and Hierarchical and Negative Sampling
Explain how word embeddings can be precomputed using the CBOW and Skip-gram models. First start with the variants where full softmax is performed, and then describe how hierarchical softmax and negative sampling are used to speed up training of word embeddings.
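Negative sampling replaces the full softmax over the vocabulary with a handful of binary classifications. The observed (center, context) pair should score high, and $k$ sampled "negative" words should score low. The sketch below computes this loss for one pair, with the center vector taken from the input embedding matrix and the context and negative vectors from the output matrix. It also builds the unigram$^{3/4}$ noise distribution used in word2vec to sample the negatives.

```python
import math


def log_sigmoid(x):
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center, context, negatives):
    """-log sigma(c . w) - sum_k log sigma(-n_k . w) for one (center, context) pair."""
    loss = -log_sigmoid(dot(context, center))
    loss -= sum(log_sigmoid(-dot(negative, center)) for negative in negatives)
    return loss


def noise_distribution(counts, power=0.75):
    """Sampling distribution for negatives: unigram counts raised to the 3/4 power."""
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}
```

The cost per training pair is $O(k)$ dot products instead of $O(|V|)$ for the full softmax.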
Character-level word embeddings
Describe why character-level word embeddings are useful. Then describe the two following methods:
 RNN: using bidirectional recurrent neural networks
 CNN: describe how convolutional networks (CNNs) can be used to compute character-level word embeddings. Write down the exact equation computing the embedding, assuming that the input word consists of characters $\{x_1, \ldots, x_N\}$ represented by embeddings $\{e_1, \ldots, e_N\}$ for $e_i \in \mathbb R^D$, and we use $F$ filters of widths $w_1, \ldots, w_F$. Also explicitly count the number of parameters.
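One possible answer for the CNN variant is given below. It assumes VALID convolution, one bias per filter, global max-pooling, and no non-linearity; some formulations add one, e.g., a ReLU before pooling. Here $W^f_{j}$ denotes the $j$-th row of the filter matrix $W^f \in \mathbb R^{w_f \times D}$.

```latex
\begin{aligned}
y_f &= \max_{1 \le i \le N - w_f + 1} \Big( b_f + \sum_{j=0}^{w_f - 1} \big(W^f_{j+1}\big)^\top e_{i+j} \Big),
  \qquad W^f \in \mathbb R^{w_f \times D},\ b_f \in \mathbb R,\ f = 1, \ldots, F \\
\mathrm{CLE}(x_1, \ldots, x_N) &= [y_1, \ldots, y_F] \in \mathbb R^F \\
\#\text{parameters} &= \sum_{f=1}^{F} (w_f D + 1) = D \sum_{f=1}^{F} w_f + F
\end{aligned}
```

If the character embedding matrix is counted as well, add $|\Sigma| \cdot D$ for an alphabet $\Sigma$.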

Neural Machine Translation and BPE
Draw/write how an encoder-decoder architecture is used for machine translation, both during training and during inference, including attention. Furthermore, elaborate on how subword units are used to reduce the out-of-vocabulary problem and sketch the BPE algorithm for constructing a fixed number of subword units.
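A minimal BPE learner in the spirit of the original algorithm is sketched below. Words are split into characters plus an end-of-word marker. The algorithm repeatedly counts adjacent symbol pairs weighted by word frequency, merges the most frequent pair everywhere, and stops after the requested number of merges. Ties are broken by first occurrence, which is an arbitrary choice of this sketch.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every occurrence of the adjacent pair by one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, merges):
    """Learn up to `merges` merge operations from a {word: count} dictionary."""
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    operations = []
    for _ in range(merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        operations.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return operations
```

The number of merges directly controls the size of the subword vocabulary. Any word can still be segmented, in the worst case down to single characters, so no token is out-of-vocabulary.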
Variational Autoencoders
Describe deep generative modelling using variational autoencoders – show VAE architecture, devise training algorithm, write training loss, and propose sampling procedure. 
Generative Adversarial Networks
Describe deep generative modelling using generative adversarial networks: show the GAN architecture and describe the training procedure and training loss. Mention also CGAN (conditional GAN) and sketch the generator and discriminator architecture in a DCGAN.
Speech Synthesis
Describe the WaveNet network (what a dilated convolution and gated activations are, how the residual block looks like, what the overall architecture is, and how global and local conditioning work). Discuss parallelizability of training and inference, show how Parallel WaveNet can speed up inference, and sketch how it is trained.
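Two WaveNet building blocks can be illustrated numerically. First, a stack of causal convolutions with kernel size $k$ and dilations $d_1, d_2, \ldots$ sees $1 + \sum_\ell (k-1) d_\ell$ past samples. Doubling dilations $1, 2, \ldots, 512$ with $k = 2$ therefore cover 1024 samples with only 10 layers. Second, the gated activation is $\tanh(\text{filter}) \odot \sigma(\text{gate})$. The sketch ignores the other layers of the full network.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of causal dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def gated_activation(filter_value, gate_value):
    """tanh(filter) * sigmoid(gate), applied element-wise in each residual block."""
    return math.tanh(filter_value) / (1.0 + math.exp(-gate_value))
```

Repeating the dilation pattern three times, for example, grows the receptive field linearly to $1 + 3 \cdot 1023 = 3070$ samples.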
Reinforcement learning
Describe the general reinforcement learning setting and describe the Monte Carlo algorithm. Then, formulate the policy gradient theorem (proof not needed), write down the REINFORCE algorithm and the REINFORCE with baseline algorithm, and sketch how it can be used to design the NasNet.
Transformer
Describe the Transformer architecture, namely the self-attention layer, the multi-head self-attention layer, and the overall architecture of an encoder and a decoder. Also discuss the positional embeddings.
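The core of a self-attention layer is scaled dot-product attention, $\operatorname{softmax}(QK^\top/\sqrt{d_k})\,V$. The decoder additionally masks future positions. The sketch below implements this for lists of vectors, together with the sinusoidal positional encodings of the original Transformer. Multi-head attention would apply separate learned projections of Q, K and V per head, run this function once per head, concatenate the results, and project them again. Those projections are omitted here.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V; `causal` hides future positions (decoder self-attention)."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            scores = [s if j <= i else -math.inf for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return outputs


def sinusoidal_position_encoding(position, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    return [math.sin(position / 10000 ** (i / d_model)) if i % 2 == 0
            else math.cos(position / 10000 ** ((i - 1) / d_model))
            for i in range(d_model)]
```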
Neural Turing Machines
Sketch an overall architecture of a Neural Turing Machine with an LSTM controller, assuming $R$ reading heads and one write head. Describe the addressing mechanism (content addressing and its combination with previous weights, shifts, and sharpening), and reading and writing operations.
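One NTM addressing step and the read/write operations can be sketched in pure Python, following the usual four-stage pipeline. The stages are: content addressing (softmax of key strength times cosine similarity), interpolation with the previous weights via the gate, circular convolution with the shift distribution, and sharpening. Memory is an N × M list of rows. The argument names are illustrative. With $R$ read heads, each head would run its own addressing step; the single write head uses the erase-then-add update.

```python
import math


def cosine(u, v):
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norms + 1e-8)


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ntm_address(memory, key, beta, gate, previous_weights, shift, gamma):
    """shift: dict offset -> probability, e.g. {-1: 0.1, 0: 0.8, 1: 0.1}; gamma >= 1."""
    content = softmax([beta * cosine(row, key) for row in memory])                 # content addressing
    gated = [gate * c + (1 - gate) * w for c, w in zip(content, previous_weights)]   # interpolation
    N = len(memory)
    shifted = [sum(gated[(i - offset) % N] * p for offset, p in shift.items())       # circular shift
               for i in range(N)]
    powered = [w ** gamma for w in shifted]                                          # sharpening
    total = sum(powered)
    return [w / total for w in powered]


def read(memory, weights):
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(len(memory[0]))]


def write(memory, weights, erase, add):
    """Each location is first partially erased, then the add vector is blended in."""
    return [[value * (1 - w * e) + w * a for value, e, a in zip(row, erase, add)]
            for row, w in zip(memory, weights)]
```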