The course takes place in period

  • 3 (2024-01-01 to 2024-03-17)
  • 4 (2024-03-18 to 2024-07-31)

Level/category

Professional studies

Teaching language

English

Type of course

Compulsory

Cycle/level of course

Second

Recommended year of study

1

Total number of ECTS

5 cr

Competency aims

The aim of the course is to provide the student with
the necessary tools for handling big data sources
for machine learning modeling.

Learning outcomes

Knowledge
At the end of the course, the student is expected
to understand when supercomputer facilities are
needed for solving analytical problems.
Skill
The student will be able to run machine
learning algorithms on supercomputer facilities.
Moreover, the student will be able to run machine
learning models using the Spark and Dask frameworks.

Course contents

The students get an overview of machine learning
modeling using supercomputing facilities and of how
to utilize big data. The areas of descriptive and
predictive modeling are introduced for small data,
and the students are then shown how similar models
can be modified to work with big data.
The students are introduced to the analytical
process: handling of data-related requirements, domain
knowledge, modeling, and verification of results.

Prerequisites and co-requisites

Basic Python programming skills are required.
Previous courses in Machine Learning for Predictive
and Descriptive problems are recommended.

Recommended or required reading

Hamstra, M., & Zaharia, M. (2013). Learning Spark:
lightning-fast big data analytics. O'Reilly &
Associates.

Daniel, J. (2019). Data Science with Python and
Dask. Simon and Schuster.

https://docs.csc.fi/support/tutorials/ml-guide/

Study activities

  • Lectures - 30 hours
  • Small-group work - 70 hours
  • Individual studies - 35 hours

Workload

  • Total workload of the course: 135 hours
  • Of which autonomous studies: 135 hours
  • Of which scheduled studies: 0 hours

Mode of Delivery

Multiform education

Assessment requirements

To pass this course, the student should present a
final project, completed in a group or individually,
in which they use big data facilities for machine
learning modeling.

Teacher

  • Björk Kaj-Mikael
  • Espinosa Leal Leonardo
  • Scherbakov-Parland Andrej

Examiner

Espinosa Leal Leonardo

Group size

No limit (31 students enrolled)

Assignments valid until

12 months after course has ended

Course enrolment period

2023-11-24 to 2023-12-22
