Credits: 5

Schedule: 07.01.2020 - 08.04.2020

Contact information for the course (applies in this implementation): 

Lecturer Mikko Kurimo, course assistants Aku Rouhe and Ekaterina Voskoboinik. The offices are at Rakentajanaukio 2 C, 4th floor. Arrange a meeting time in advance by email.

Teaching Period (valid 01.08.2018-31.07.2020): 

III-IV 2018 – 2019, 2019 – 2020 (spring)

Learning Outcomes (valid 01.08.2018-31.07.2020): 

After attending the course, the student knows how statistical and adaptive methods are used in information retrieval, machine translation, text mining, speech processing and related areas to process natural language content. Furthermore, the student can apply the basic methods and techniques used for statistical natural language modeling, including, for instance, clustering, classification, hidden Markov models and Bayesian models.

Content (valid 01.08.2018-31.07.2020): 

Many core applications in the modern information society, such as search engines, social media, machine translation, speech processing and text mining for business intelligence, apply statistical and adaptive methods. This course introduces these methods and teaches basic skills in applying them to natural language data. Each topic is taught by a high-level expert in the area.

Details on the course content (applies in this implementation): 

The course includes several visiting lectures from industry. The visitors are typically experts who have a relevant PhD and long practical experience in the topic that they teach.

Assessment Methods and Criteria (valid 01.08.2018-31.07.2020): 

Examination and exercise work.

Elaboration of the evaluation criteria and methods, and acquainting students with the evaluation (applies in this implementation): 

  • 40% of the grade comes from the exam. The exam will be held in April during the exam week of period IV. For those who cannot take it, there will be a second exam in the autumn. Exams passed in previous years are still valid for completing the course.
  • 20% of the grade comes from the weekly home exercises.
  • 40% of the grade comes from the project work, which is assessed on the experiments, the final report and self-grading. Course projects accepted in previous years are still valid for completing the course.
  • The course includes a mandatory entrance test. It is very easy, but it must be taken by 14 January. Instructions will be sent to the participants via MyCourses at the beginning of January. The purpose of the test is to identify the students who intend to do the project work and complete the course. It is also used to survey the students' expectations, preferences and background skills through self-evaluation.
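As a rough illustration of how the weights listed above combine, the following sketch computes a weighted score from the three graded components. It assumes that each component is scored on a 0-100 scale and that the components are combined linearly; neither assumption is stated in the course description, and the mapping from the score to the final grade is not specified here. The function name and the example scores are hypothetical.

```python
# Weights in percent, as stated in the course description.
WEIGHTS = {"exam": 40, "exercises": 20, "project": 40}


def weighted_score(scores):
    """Combine component scores into one weighted score.

    Assumption (not from the course description): every component
    is scored on a 0-100 scale and combined linearly.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: (40*70 + 20*90 + 40*80) / 100 = 78.0
example = weighted_score({"exam": 70, "exercises": 90, "project": 80})
print(example)
```

Under these assumptions, the exam and the project each move the total twice as much as the weekly exercises do.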

Workload (valid 01.08.2018-31.07.2020): 

Lectures and exercise sessions approximately 30 h

Independent work approximately 103 h

Total 133 h

Attendance in some contact teaching may be compulsory

Details on calculating the workload (applies in this implementation): 

  • Active attendance at the weekly lectures, studying the material and taking the exam corresponds to 2 cr. Participation in the Tuesday lectures is not mandatory, but it is recommended for reaching the learning outcomes of the course.
  • Participation in the weekly exercise sessions and submitting the home exercises corresponds to 1 cr. Participation in the Thursday sessions is not mandatory, but it is highly recommended, because assistance for the home exercises is only available during these sessions. The deadline for submitting the home exercises is before the next Tuesday lecture.
  • Participation in the project work is worth 2 cr. The project work is done in groups of three students. The course assistants will form the groups on January 15, so participants must indicate their group preferences by January 14 as part of the course entrance test. Instructions will be sent to the participants via MyCourses at the beginning of January. Each group can then select and register its topic in MyCourses by 4 February. The deadline for the final report is 28 April.

Study Material (valid 01.08.2018-31.07.2020): 

C. Manning, H. Schütze, 1999. Foundations of Statistical Natural Language Processing. The MIT Press; Lecture notes.

Details on the course materials (applies in this implementation): 

Each lecture may have additional material, which is specified in the lecture slides published in MyCourses.

Substitutes for Courses (valid 01.08.2018-31.07.2020): 

T-61.5020 Statistical Natural Language Processing P

Course Homepage (valid 01.08.2018-31.07.2020):

Prerequisites (valid 01.08.2018-31.07.2020): 

Basic mathematics and probability courses.

Grading Scale (valid 01.08.2018-31.07.2020): 


Registration for Courses (valid 01.08.2018-31.07.2020): 

In WebOodi

Further Information (valid 01.08.2018-31.07.2020): 

Language class 3: English

Details on the schedule (applies in this implementation): 

  • The first lecture is on Tuesday January 7 at 12-14.
  • The first exercise session is on Thursday January 9 at 14-16.
  • The submission date of the first home exercise is Tuesday January 14 at noon.
  • The submission date of the entrance test is Tuesday January 14 at midnight.
  • The topics of the weekly lectures and exercises will be announced in MyCourses when the schedule is ready. Changes will be announced in MyCourses.
  • The submission deadline of each weekly home exercise is the following Tuesday at noon.
  • The last lecture will be on Tuesday March 31.

