Credits: 5

Schedule: 29.10.2018 - 17.12.2018

Contact information for the course (applies in this implementation): 

Course personnel are available to answer your questions during the lectures and the exercise sessions.

Further questions can be asked in the General discussion forum on the MyCourses page.

Teaching Period (valid 01.08.2018-31.07.2020): 

II (Autumn)

Learning Outcomes (valid 01.08.2018-31.07.2020): 

After the course, you can describe how natural data such as images, natural language, speech, and time series measurements can be represented in digital form. You can apply elementary statistical and algorithmic methods to process the digital data and gain insight into the data-generating phenomenon. You will understand which processes make up the data science pipeline, starting from natural data and ending with actionable results.

Content (valid 01.08.2018-31.07.2020): 

The course serves as an introduction to data science and related topics such as machine learning. You will be introduced to data science methods and tools for finding interesting information in data. Specific topics in the course include processing of digital signals such as speech and images, statistical estimation of parametric distributions, classification, prediction, clustering, pattern mining, and network analysis for developing search engines for hypertext collections such as the Web.

Details on the course content (applies in this implementation): 

During the course, we will review the following topics in data science:

  • Lecture 1: Course introduction. How to derive useful knowledge from data?
  • Lecture 2: How to represent data as vectors?
  • Lecture 3: Principal component analysis
  • Lecture 4: Estimation theory
  • Lecture 5: Estimation theory continued
  • Lecture 6: Pattern recognition
  • Lecture 7: Self-Organizing Map algorithm
  • Lecture 8: Identifying patterns from data
  • Lecture 9: Algorithms for building search engines

There are also guest lectures during the lecture sessions.


Assessment Methods and Criteria (valid 01.08.2018-31.07.2020): 

The overall grade is determined by the exam grade. Attendance at the exercise sessions earns the student extra exam points.

Elaboration of the evaluation criteria and methods, and acquainting students with the evaluation (applies in this implementation): 

Attendance at lectures, demonstration exercises, and computer exercises is voluntary. We give bonus points for attendance at demonstration exercise sessions as well as computer exercise sessions: each exercise session attended earns 0.5 exam points. The bonus points are added to the points from the exam. The exam is mandatory and determines your grade, so attending the exercise sessions actively raises the point total on which the grade is based.
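The bonus rule above is simple arithmetic, and a short Python sketch may make it concrete. The function name `total_points` and the example figures (18 exam points, 6 sessions attended) are hypothetical and chosen only for illustration. The sketch assumes nothing about the maximum exam score, the number of sessions, or the grade boundaries, none of which are stated here.

```python
def total_points(exam_points: float, sessions_attended: int,
                 bonus_per_session: float = 0.5) -> float:
    """Return exam points plus the attendance bonus.

    bonus_per_session defaults to 0.5, the value stated in the course
    description. All other figures are supplied by the caller.
    """
    if exam_points < 0 or sessions_attended < 0:
        raise ValueError("points and session counts must be non-negative")
    return exam_points + bonus_per_session * sessions_attended


# Hypothetical student: 18 exam points, attended 6 exercise sessions.
print(total_points(18.0, 6))  # 18.0 + 0.5 * 6 = 21.0
```

Since the exam is mandatory, the bonus only applies on top of an exam result; it cannot replace one.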

Workload (valid 01.08.2018-31.07.2020): 

Lectures 20h, exercise sessions 20h, independent work 90h, examination 3h.

Details on calculating the workload (applies in this implementation): 

The course consists of the following:

  • lectures, total 22 hours including the recap of the course
  • demonstration exercises, in total 8 hours (2 hours per week)
  • computer exercises, in total 8 hours (2 hours per week) 
  • independent work, which makes up the rest of the course. This includes solving the exercise problems on your own or in a group of students

Study Material (valid 01.08.2018-31.07.2020): 

Material will be announced on the course pages.

Details on the course materials (applies in this implementation): 

The lecture materials are distributed as slide decks.

Demonstration exercises are Jupyter notebooks that include descriptions and Python code for solving a data science problem.

Computer exercises are Jupyter notebooks that include Python program code and data sets that are used to solve a data science problem.

The basics of using Python and Jupyter notebooks are reviewed during the first week in the computer exercise sessions.

Substitutes for Courses (valid 01.08.2018-31.07.2020): 

CS-C3110 Datasta tietoon (From Data to Knowledge).

Prerequisites (valid 01.08.2018-31.07.2020): 

The skills needed for the course are taught in introductory courses in mathematics, statistics, and programming. Specifically, matrix algebra, derivatives of functions, and statistical distributions will be needed.

Grading Scale (valid 01.08.2018-31.07.2020): 

0-5.

Details on the schedule (applies in this implementation): 

The course starts on October 29, 2018. All sessions will be held according to the schedule listed on the course home page. The last session is held on December 3, 2018.

See the exam schedule for the exact time and place of the exams. The first opportunity to take the exam is in December 2018.
