Machine learning is achieving notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundation and on the implementation and use of machine learning algorithms in the Python programming language. Considerable attention is paid to applying machine learning techniques to practical tasks, in which students try to devise a solution with the highest performance.
Python programming skills are required, together with basic probability theory knowledge.
To pass the practicals, you need to obtain at least 80 points (excluding the bonus points), which are awarded for home assignments. Note that up to 40 points above 80 will be transferred to the exam.
To pass the exam, you need to obtain at least 60, 75 and 90 out of 100 points for the written exam (plus up to 40 points from the practicals), to obtain grades 3, 2 and 1, respectively.
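As a worked example of the arithmetic above, the following sketch combines a written exam score with the points transferred from the practicals. The function name exam_grade is hypothetical, and the sketch assumes that the transfer is computed from the regular (non-bonus) practical points only; the course rules do not state this explicitly.

```python
def exam_grade(exam_points, practical_points):
    """Return the grade (1, 2 or 3), or None when the exam is not passed.

    Assumption: points above 80 from the practicals are transferred,
    capped at 40, and added to the written exam score.
    """
    transferred = min(max(practical_points - 80, 0), 40)
    total = exam_points + transferred
    if total >= 90:
        return 1
    if total >= 75:
        return 2
    if total >= 60:
        return 3
    return None


# 55 exam points plus 15 transferred points give 70 in total.
print(exam_grade(55, 95))
```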
The lecture content, including references to study materials. The main study material is Pattern Recognition and Machine Learning by Christopher Bishop, referred to as PRML.
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, scikit-learn 0.21.3, pandas 0.25.1 and NumPy 1.17.2.
You can install all required packages either to user packages using
pip3 install --user scikit-learn==0.21.3 pandas==0.25.1,
or create a virtual environment using
python3 -m venv VENV_DIR
and then installing the packages inside it by running
VENV_DIR/bin/pip3 install scikit-learn==0.21.3 pandas==0.25.1.
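After installation, it may be useful to confirm that the local versions match the ones used for evaluation. The following is a minimal sketch; the helper name installed_versions and the REQUIRED dictionary are illustrative and not part of the course templates.

```python
import importlib

# Versions used by the evaluation, as listed above.
REQUIRED = {"sklearn": "0.21.3", "pandas": "0.25.1", "numpy": "1.17.2"}


def installed_versions(modules):
    """Map each module name to its __version__, or None if unavailable."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", None)
        except ImportError:
            versions[name] = None
    return versions


for name, found in installed_versions(REQUIRED).items():
    status = "ok" if found == REQUIRED[name] else "differs"
    print(f"{name}: expected {REQUIRED[name]}, found {found} ({status})")
```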
Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_manual.py template, solve a linear regression problem using the algorithm from the lecture that explicitly computes the matrix inverse. Then compute the root mean squared error on the test set.
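The closed-form solution can be sketched as follows. This is not the template itself: the synthetic data, the split and the function names are assumptions chosen only to make the example self-contained.

```python
import numpy as np


def add_bias(X):
    """Append a column of ones so that the bias is learned as a weight."""
    return np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)


def fit_linear_regression(X, t):
    """Compute w = (X^T X)^{-1} X^T t using an explicit matrix inversion."""
    X = add_bias(X)
    return np.linalg.inv(X.T @ X) @ X.T @ t


def rmse(X, t, w):
    """Root mean squared error of the predictions X w against targets t."""
    return np.sqrt(np.mean((add_bias(X) @ w - t) ** 2))


# Synthetic data: a known linear function plus a small amount of noise.
rng = np.random.RandomState(42)
X = rng.normal(size=(200, 3))
t = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)
X_train, X_test, t_train, t_test = X[:150], X[150:], t[:150], t[150:]

w = fit_linear_regression(X_train, t_train)
test_rmse = rmse(X_test, t_test, w)
print("weights:", w, "test RMSE:", test_rmse)
```

In practice np.linalg.solve or np.linalg.lstsq is numerically preferable to an explicit inverse; the inverse is used here only because the assignment asks for it.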
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_l2.py template, use scikit-learn to train regularized linear regression models and print the results of the best of them.
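One possible shape of such an experiment is sketched below, assuming that "best" means the lowest RMSE on a held-out set. The synthetic data, the grid of regularization strengths and the helper name best_ridge are assumptions, not taken from the template.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def best_ridge(X_train, t_train, X_test, t_test, lambdas):
    """Train one Ridge model per lambda and return (best RMSE, best lambda)."""
    results = []
    for lam in lambdas:
        model = Ridge(alpha=lam).fit(X_train, t_train)
        rmse = np.sqrt(mean_squared_error(t_test, model.predict(X_test)))
        results.append((rmse, lam))
    return min(results)


# Synthetic data: only the first feature carries signal, the rest is noise.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 20))
t = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=100)
X_train, X_test, t_train, t_test = train_test_split(
    X, t, test_size=0.5, random_state=0)

lambdas = np.geomspace(0.001, 100, num=11)
best_rmse, best_lambda = best_ridge(X_train, t_train, X_test, t_test, lambdas)
print("best lambda:", best_lambda, "RMSE:", best_rmse)
```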
Deadline: Oct 20, 23:59 3 points+5 bonus
This assignment is a competition task. Your goal is to perform linear regression on the data from a rental shop. The train set contains 1000 instances, each instance consists of 12 features, both integral and real.
The template shows how to load the data available in the repository. Furthermore, it shows how to save a trained estimator, how to load it, and it shows the recodex_predict method which is called during ReCodEx evaluation.
The performance of your system is measured using root mean squared error and your goal is to achieve RMSE less than 130. Note that you can use any sklearn algorithm to solve this exercise.
Deadline: Nov 03, 23:59 4 points
Starting with the feature_engineering.py template, learn how to perform basic feature engineering using scikit-learn.
Deadline: Nov 03, 23:59 5 points
Starting with the linear_regression_sgd.py template, implement minibatch SGD for linear regression. Evaluate it using cross-validation and compare the results to an explicit linear regression solver.
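A minimal sketch of minibatch SGD for linear regression follows, compared against an explicit solver. The hyperparameters, the synthetic data and the function name are assumptions, and cross-validation is omitted for brevity.

```python
import numpy as np


def sgd_linear_regression(X, t, learning_rate=0.05, batch_size=10,
                          epochs=100, seed=0):
    """Minimize the mean squared error with minibatch SGD."""
    rng = np.random.RandomState(seed)
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Visit the examples in a fresh random order in every epoch.
        permutation = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = permutation[start:start + batch_size]
            # Gradient of half the mean squared error over the minibatch.
            gradient = X[batch].T @ (X[batch] @ w - t[batch]) / len(batch)
            w -= learning_rate * gradient
    return w


rng = np.random.RandomState(1)
X = rng.normal(size=(200, 3))
t = X @ np.array([1.5, -2.0, 0.7]) + 0.5 + rng.normal(scale=0.1, size=200)

w_sgd = sgd_linear_regression(X, t)

# Explicit solution of the normal equations for comparison.
X_bias = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
w_exact = np.linalg.solve(X_bias.T @ X_bias, X_bias.T @ t)
print("SGD:     ", w_sgd)
print("explicit:", w_exact)
```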
Deadline: Nov 03, 23:59 3 points
Starting with the perceptron.py template, implement the perceptron algorithm.
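The perceptron update rule can be sketched as below, assuming targets in {-1, +1} and linearly separable data. The data generation and the function name are illustrative and are not taken from the template.

```python
import numpy as np


def perceptron(X, t, max_epochs=1000):
    """Train a perceptron; t must contain -1 and +1. Returns weights with bias."""
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, t_i in zip(X, t):
            # Update only on examples that are misclassified (or on the boundary).
            if t_i * (x_i @ w) <= 0:
                w += t_i * x_i
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w


# Separable data: label points by a known line and drop those too close to it.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 2))
scores = X @ np.array([1.0, -2.0]) + 0.3
keep = np.abs(scores) > 0.2
X, t = X[keep], np.sign(scores[keep])

w = perceptron(X, t)
predictions = np.sign(np.concatenate([X, np.ones((X.shape[0], 1))], axis=1) @ w)
accuracy = np.mean(predictions == t)
print("training accuracy:", accuracy)
```

Because the data has a guaranteed margin, the algorithm stops after a finite number of updates; on non-separable data the loop would simply run for max_epochs.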
Deadline: Nov 03, 23:59 4 points+5 bonus
This assignment is a competition task. Your goal is to perform binary classification on the data from contract approval. The train set contains 500 instances, each instance consists of 15 features, both integral and real.
Rest of the details to appear later.
In the competitions, your goal is to train a model and then predict target values on the test set available only in ReCodEx.
When submitting a competition solution to ReCodEx, you can include any number of files of any kind. However, there should be exactly one Python source (.py) containing a top-level method recodex_predict. This method is called with the test input data in a NumPy array and should return the predictions again as a NumPy array.
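A submission source along these lines might look like the sketch below. It is not the course template: the model file name, the temporary directory, the train_and_save helper and the least-squares model stored as a pickled dictionary are all assumptions made to keep the example runnable with NumPy alone.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical model location; a real submission would ship the file
# alongside the Python source instead of using a temporary directory.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "competition_model.pickle")


def train_and_save(X, t, path=MODEL_PATH):
    """Fit least-squares weights (with bias) and serialize them."""
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    weights = np.linalg.lstsq(X, t, rcond=None)[0]
    with open(path, "wb") as model_file:
        pickle.dump({"weights": weights}, model_file)


def recodex_predict(data):
    """Take test inputs as a NumPy array, return predictions as a NumPy array."""
    with open(MODEL_PATH, "rb") as model_file:
        model = pickle.load(model_file)
    data = np.concatenate([data, np.ones((data.shape[0], 1))], axis=1)
    return data @ model["weights"]


# Local check on synthetic data with 12 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 12))
t = X @ np.ones(12) + rng.normal(scale=0.1, size=100)
train_and_save(X, t)
predictions = recodex_predict(X[:5])
print(predictions.shape)
```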
If your submission contains a trained model(s), you should also submit the Python source you used to train it.
ReCodEx starts the evaluation by importing all Python sources and checking whether they export a recodex_predict method. Then it executes it, evaluates the prediction, and returns one of the following results:
recodex_predict, or it crashed during prediction, or it generated an output with incorrect size.
recodex_predict, but it did not achieve required performance. The percentage returned is either achieved/required or required/achieved (depending on whether the goal is to get over/under the requirement). No points are awarded.
After the deadline, the exact performance becomes visible for all submissions.
Note that in any case, the exit code of your solution is reported as 0.
Everyone surpassing the required performance immediately gets the regular points for the assignment.
Furthermore, after the deadline, the latest submission of every user passing the required baseline is collected, and bonus points are awarded depending on relative ordering of performance of the selected submissions.
scipy, or anything you implement yourself. Do not use deep network frameworks like TensorFlow or PyTorch.