General
Content
The course is an introduction to data science and related fields such as machine learning. You will be introduced to data science methods and tools for finding interesting information in data. Specific topics include processing of digital signals such as speech and images, statistical estimation of parametric distributions, classification, prediction, clustering, pattern mining, and network analysis for developing search engines for hypertext collections such as the Web.
During the course, we will review the following topics in data science:
- Lecture 1: Course introduction. How to derive useful knowledge from data?
- Lecture 2: How to represent data as vectors?
- Lecture 3: Principal component analysis
- Lecture 4: Estimation theory
- Lecture 5: Estimation theory continued
- Lecture 6: Pattern recognition
- Lecture 7: Self-Organizing Map algorithm
- Lecture 8: Identifying patterns from data
- Lecture 9: Algorithms for building search engines
Learning Outcomes:
After the course, you can describe how natural data such as images, natural language, speech, and time series measurements can be represented in digital form. You can apply elementary statistical and algorithmic methods to the digital data to gain insight into the data-generating phenomenon. You will understand which processes make up the data science pipeline, starting from natural data and ending with actionable results.
Assessment Methods and Criteria:
The overall grade is determined by the exam, which is mandatory. Attendance in lectures, demonstration exercises, and computer exercises is voluntary. Each demonstration exercise session or computer exercise session you attend earns 0.5 bonus exam points, and the total bonus is added to your points from the exam.
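As a rough illustration of the bonus rule, the sketch below adds 0.5 points per attended exercise session to the exam points. The function name `total_points` and the example numbers are hypothetical; the description above does not state the exam's maximum score or the grade boundaries, so the sketch does not model them.

```python
BONUS_PER_SESSION = 0.5  # bonus per attended exercise session, as stated above


def total_points(exam_points: float, sessions_attended: int) -> float:
    """Return exam points plus the attendance bonus (hypothetical helper)."""
    if sessions_attended < 0:
        raise ValueError("sessions_attended must be non-negative")
    return exam_points + BONUS_PER_SESSION * sessions_attended


# Hypothetical student: 20 exam points, 6 exercise sessions attended.
print(total_points(20, 6))  # 23.0
```

The bonus only adds to the exam result; the exam itself still has to be taken.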
Teaching Period:
II (Autumn)
Workload:
The course consists of the following:
- Lectures, 22 hours in total, including the course recap
- Demonstration exercises, in total 8 hours (2 hours per week)
- Computer exercises, in total 8 hours (2 hours per week)
- Independent work, which makes up the rest of the course. This includes solving the exercise problems on your own or in a group.
Study Material:
The lecture materials are distributed as slide decks. Demonstration exercises are Jupyter notebooks that include descriptions and Python code for solving a data science problem. Computer exercises are Jupyter notebooks that include Python code and data sets used to solve a data science problem. The basics of using Python and Jupyter notebooks are reviewed during the first week in the computer exercise sessions.