Faculty of Mathematics and Physics

The seminar focuses on a deeper understanding of selected machine learning methods and is intended for students who already have basic knowledge of machine learning and probability models. The first half of the semester is devoted to methods of unsupervised learning using Bayesian inference (Dirichlet-Categorical models, Mixture of Categoricals, Mixture of Gaussians, Expectation Maximization, Gibbs sampling) and to implementing these methods on selected tasks. Two further lectures are devoted to inspecting deep neural networks. The remaining topics are selected according to the students' interests.

SIS code: NPFL097

Semester: winter

E-credits: 3

Examination: 0/2 C

Guarantor: David Mareček

The seminar is held on Thursday, 9:00 - 10:30 in S1 (fourth floor)

Students are expected to be familiar with basic probabilistic and ML concepts, roughly to the extent of:

In the second half of the course, you should be familiar with the basics of deep-learning methods. I recommend attending

- NPFL114 - Deep Learning

- There are two programming assignments during the term. You can obtain 10 points for each. When submitted after the deadline, you can obtain at most half of the points.
- You can obtain 10 points for an individual 30-minute presentation on a selected machine learning method or task.
- You pass the course if you obtain at least 13 points.

1. Introduction Slides Warm-Up test

2. Beta-Bernoulli probabilistic model Beta-Bernoulli Beta distribution

3. Dirichlet-Categorical probabilistic model Dirichlet-Categorical Document collections

4. Modeling document collections, Categorical Mixture models, Expectation-Maximization Categorical Mixture Models Gibbs Sampling Gibbs Sampling for Bayesian mixture Expectation Maximization Gibbs Sampling

5. Gibbs Sampling, Latent Dirichlet allocation Latent Dirichlet allocation Algorithms for LDA and Mixture of Categoricals Latent Dirichlet Allocation

6. Working on and discussing assignment 1

7. Text segmentation Chinese Restaurant Process Bayesian Inference with Tears Unsupervised text segmentation

8. Working on and discussing assignment 2

9. Mixture of Gaussians and other clustering methods K-Means and Gaussian Mixture Models

10. Working on and discussing assignment 2

11. Inspecting Neural Networks

12. Latent learning of POS, word-alignment, and dependency structures

Oct 4

- Course overview Slides
- Revision of the basics of probability and machine learning theory Warm-Up test

Oct 11

- answering questions from the warm-up test
- slides for Beta-Bernoulli models by Carl Edward Rasmussen from University of Cambridge
- How to compute the expected value of the Beta distribution can be found here: Beta distribution
- A web application showing the Beta-Bernoulli distribution and many others can be found at RandomServices.com.
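As a quick illustration of the Beta-Bernoulli model from the slides, the posterior after observing coin flips is again a Beta distribution whose parameters are the prior pseudo-counts plus the observed counts. A minimal sketch (the prior values and data below are made up for illustration):

```python
# Beta-Bernoulli conjugacy: Beta(a, b) prior over the head probability,
# observe some heads and tails, get Beta(a + heads, b + tails) back.

def beta_bernoulli_posterior(a, b, heads, tails):
    """Return posterior Beta parameters and the posterior mean of theta."""
    a_post = a + heads                    # prior pseudo-counts + successes
    b_post = b + tails                    # prior pseudo-counts + failures
    mean = a_post / (a_post + b_post)     # E[theta | data] for Beta(a, b)
    return a_post, b_post, mean

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails:
a_post, b_post, mean = beta_bernoulli_posterior(1, 1, 7, 3)
print(a_post, b_post, round(mean, 3))   # 8 4 0.667
```

The posterior mean `8 / 12` lies between the maximum-likelihood estimate `0.7` and the prior mean `0.5`, which is the smoothing effect the lecture discusses.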

Oct 18

- slides for Dirichlet-Categorical and Document collections by Carl Edward Rasmussen from University of Cambridge
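The multi-outcome analogue works the same way: with a symmetric Dirichlet prior over a Categorical distribution, the posterior predictive probability of each symbol is its smoothed relative frequency. A small sketch (counts and `alpha` are illustrative):

```python
import numpy as np

# Dirichlet-Categorical posterior predictive: probability of the next symbol
# given observed counts and a symmetric Dirichlet(alpha) prior.
def posterior_predictive(counts, alpha):
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

# Three symbols observed 5, 0, and 3 times; alpha = 1 is add-one smoothing:
probs = posterior_predictive([5, 0, 3], alpha=1.0)
print(probs)   # [6/11, 1/11, 4/11]
```

Note that the unseen symbol still gets nonzero probability `1/11`, which is exactly the smoothing role the Dirichlet prior plays in the document-collection models.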

Oct 25

- slides for Categorical Mixture Models and Gibbs Sampling and Gibbs Sampling for Bayesian mixture and Expectation Maximization by Carl Edward Rasmussen from University of Cambridge
- Gibbs sampling from the bivariate normal distribution: Gibbs Sampling
- Expectation Maximization is also very well described in Chapter 9 of Bishop's book: Pattern Recognition and Machine Learning
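The bivariate-normal demo mentioned above fits in a few lines, since each full conditional of a standard bivariate normal with correlation rho is a univariate normal. A minimal sketch (rho and sample size are arbitrary choices):

```python
import math
import random

# Gibbs sampling from a standard bivariate normal with correlation rho:
# alternate draws from the two univariate full conditionals.
def gibbs_bivariate_normal(rho, n_samples, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)      # conditional standard deviation
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)       # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)       # y | x ~ N(rho * x, 1 - rho^2)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
xs = [s[0] for s in samples]
print(sum(xs) / len(xs))   # close to 0, the true marginal mean
```

With a high rho the chain moves in small steps, which is a nice way to see why strongly correlated variables make Gibbs sampling mix slowly.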

Nov 8

- slides for Latent Dirichlet allocation by Carl Edward Rasmussen from University of Cambridge
- slides for Algorithms for LDA and Mixture of Categoricals
- see also Chapter 11 of Bishop's book: Pattern Recognition and Machine Learning
- Assignment 1: Latent Dirichlet Allocation
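To make the collapsed Gibbs updates for LDA concrete, here is a bare-bones sampler over token-id lists. This is only an illustrative sketch with made-up toy data, not the reference solution to the assignment; the hyperparameter values are arbitrary:

```python
import random

# Collapsed Gibbs sampling for LDA: docs is a list of token-id lists,
# K topics, vocabulary size V, symmetric Dirichlet priors alpha and beta.
def lda_gibbs(docs, K, V, alpha=0.1, beta=0.1, iters=50, seed=0):
    rng = random.Random(seed)
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # topic per token
    nd = [[0] * K for _ in docs]          # doc-topic counts
    nw = [[0] * V for _ in range(K)]      # topic-word counts
    nk = [0] * K                          # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            nd[d][k] += 1; nw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]               # remove token from the counts
                nd[d][k] -= 1; nw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else), unnormalized
                weights = [(nd[d][j] + alpha) * (nw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                r = rng.random() * sum(weights)
                k = 0
                while r > weights[k]:
                    r -= weights[k]; k += 1
                z[d][i] = k               # put the token back with its new topic
                nd[d][k] += 1; nw[k][w] += 1; nk[k] += 1
    return z, nd, nw

# Two tiny "documents" over a 4-word vocabulary:
docs = [[0, 0, 1, 1, 0], [2, 3, 2, 3, 3]]
z, nd, nw = lda_gibbs(docs, K=2, V=4)
```

The key invariant worth checking while debugging your own implementation is that the counts always stay consistent with the assignments after each resampling step.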

Nov 15

Nov 29

- Unsupervised segmentation of texts in languages that do not use spaces between words Chinese Restaurant Process
- tutorial Bayesian Inference with Tears by Kevin Knight (2009)
- Assignment 2: Unsupervised text segmentation
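The Chinese Restaurant Process prior used in the segmentation model can be sampled forward in a few lines: each new customer starts a new table with probability proportional to alpha, or joins an existing table with probability proportional to its occupancy. A minimal sketch (the customer count and alpha are illustrative):

```python
import random

# Forward sampling of table assignments from a CRP with concentration alpha.
def crp(n_customers, alpha, seed=0):
    rng = random.Random(seed)
    tables = []          # number of customers seated at each table
    assignments = []     # table index chosen by each customer
    for n in range(n_customers):
        # new table with probability alpha / (n + alpha),
        # existing table t with probability tables[t] / (n + alpha)
        r = rng.random() * (n + alpha)
        if r < alpha:
            tables.append(1)
            assignments.append(len(tables) - 1)
        else:
            r -= alpha
            for t, count in enumerate(tables):
                if r < count:
                    tables[t] += 1
                    assignments.append(t)
                    break
                r -= count
    return assignments, tables

assignments, tables = crp(100, alpha=1.0)
print(len(tables))   # number of occupied tables grows roughly like alpha * ln(n)
```

The rich-get-richer behavior is visible immediately: a few early tables accumulate most customers, which is what lets the segmentation model reuse frequent word types.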

Dec 6

Dec 13

- slides K-Means and Gaussian Mixture Models by David Rosenberg from New York University
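K-Means (Lloyd's algorithm) alternates an assignment step and a mean-update step, which the slides relate to the E and M steps of a Gaussian mixture with hard assignments. A one-dimensional sketch with made-up data:

```python
import random

# Bare-bones K-Means in one dimension: alternate nearest-center assignment
# and cluster-mean updates until the centers settle.
def kmeans_1d(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize at random data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[j]   # update step
                   for j, c in enumerate(clusters)]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
print(kmeans_1d(points, k=2))   # centers near 1.0 and 10.07
```

A Gaussian mixture fitted by EM would replace the hard nearest-center assignment with posterior responsibilities, and the means with responsibility-weighted averages.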

Dec 20

Jan 3

- Deep neural networks in NLP as a black box
- What is learned in their hidden states?
- How does the attention mechanism work?

Jan 10

- Word embeddings vs. POS tags
- Word alignment vs. attention mechanism
- Dependency parsing vs. self-attention mechanism

Deadline: Dec 5 23:59 10 points

- Instructions and questions: lda-assignment.pdf,
- Data: lda-data.zip

Deadline: Dec 20 23:59 10 points

- You will get English texts from which the spaces between words have been removed. The task is to use Bayesian inference to bring the spaces back in a completely unsupervised way. The task is relevant e.g. for Chinese, Japanese, Thai, and other languages that do not separate words. English was chosen so that everyone can see how good their results are. In case you have not attended the lecture, you can find all the necessary information in Kevin Knight's tutorial Bayesian Inference with Tears. Try several hyperparameter combinations to get as good results as possible. Also try simulated annealing to delay the convergence of Gibbs sampling. slides
- Data: eng-input.txt
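One way to realize the suggested annealing, sketched below for a binary Gibbs decision such as "insert a word boundary here or not": raise both conditional probabilities to the power 1/T and renormalize, starting from a high temperature and cooling toward T = 1. The schedule and probabilities here are illustrative, not part of the assignment specification:

```python
import random

# Annealed Gibbs step for a binary choice: at high temperature T the two
# options become nearly equiprobable (exploration); at T = 1 the sampler
# follows the model's conditional exactly.
def annealed_choice(p_yes, p_no, temperature, rng):
    wy = p_yes ** (1.0 / temperature)
    wn = p_no ** (1.0 / temperature)
    return rng.random() * (wy + wn) < wy

rng = random.Random(0)
# A simple geometric cooling schedule from T = 10 down to T = 1:
for it in range(10):
    T = max(1.0, 10.0 * 0.5 ** it)
    decision = annealed_choice(0.9, 0.1, T, rng)
```

Keeping the temperature high early on lets the sampler escape the poor segmentations it would otherwise lock into.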