Course: ELEC-E5510 - Speech Recognition D, Lecture, 25.10.2023-8.12.2023, Topic: Lectures

In this list, the 2022 slides will be replaced by the 2023 versions, at the latest after each lecture has been given. The titles may be identical, but the contents are improved each year based on feedback. The project works and their schedule also change each year.

For practicalities, e.g. regarding the Lecture Quizzes and Exercises, check MyCourses > Course Practicalities.

Lectures 1-2 Feature extraction and modeling
Lectures 1-2 slides (2023), PDF file
course organization
what is ASR
features of speech
MFCC
GMM
DNN
Lecture 1-2 exercise: Gaussian mixture model (Assignment)

Instructions can be found in the PDF file. Please upload your answer here, e.g. as a photo, text or PDF file.

Lectures 3-4 - Phoneme modeling
Lecture 3-4 slides (2023), PDF file
Phonemes
HMMs
Lecture 3: exercise Forward (Assignment)

Please type or upload your calculations here, e.g. as a photo, text or pdf file to earn a lecture activity point.
Lecture 3: exercise Viterbi (Assignment)

Please type or upload your calculations here, e.g. as a photo, text or pdf file to earn a lecture activity point.

Lectures 5-6 - Language Modeling
Lecture 5 slides (2023): N-gram language models, PDF file
lexicon
language modeling
n-grams, smoothing
Lecture 6 slides (2023): Neural network language models, PDF file
Intro to NNLM
Recurrent neural network language models
Long Short-Term Memory language models
Attention
Lecture 6 NNLM exercise (Assignment)

Please type or upload your calculations here, e.g. as a photo, text or pdf file to earn a lecture activity point.

Lectures 7-8 - Continuous speech and decoding
Lecture 7-8 slides (2023), PDF file
recognition in continuous speech
token passing decoder
improving the recognition performance and speed
measuring the recognition performance
Lecture 7 exercise: Token passing decoder (Assignment)

Fill in the last column with the final probabilities of the tokens, select the best token, and output the corresponding state sequence.
The goal is to verify that you have learned the idea of the token passing decoder. The extremely simplified HMM system is almost the same as in the 2B Viterbi algorithm exercise. The observed "sounds" are simply quantized to either "A" or "B", with given probabilities in states S0 and S1. The task is now to find the most likely state sequence that can produce the sound sequence A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1), etc., up to 3-grams.
Hint: You can upload an edited source document, a PDF file, a photo of your notes, or a text file with numbers, whichever is easiest for you. The answer does not have to be correct to earn the activity point.
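As a rough illustration of the idea behind this exercise, here is a minimal Python sketch of a token passing decoder for a two-state system with a look-up-table LM. All probabilities, the names EMIT, LM, lm_prob and token_passing, and the simple "longest matching n-gram" lookup without back-off weights are assumptions made for this sketch; they are not the values or the method from the exercise sheet. The sketch also folds all sequence probabilities into the LM and ignores separate HMM transition probabilities.

```python
# Emission probabilities P(sound | state). Made-up numbers, NOT the exercise values.
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}

# Hypothetical toy LM: look-up table over state sequences, up to 3-grams.
LM = {
    (0,): 0.6, (1,): 0.4,
    (0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7,
    (0, 0, 0): 0.2, (0, 0, 1): 0.8,
}

def lm_prob(history, state):
    """Return the LM score of `state` given the path so far.

    Uses the longest n-gram (n <= 3) found in the table. Real back-off
    LMs also apply back-off weights; that is skipped here for brevity.
    """
    seq = tuple(history[-2:]) + (state,)
    for n in range(len(seq), 0, -1):
        key = seq[-n:]
        if key in LM:
            return LM[key]
    raise KeyError(f"no LM entry for {seq}")

def token_passing(observations):
    """Return (probability, state path) of the best surviving token."""
    # One token per LM context (the last two states of its path).
    tokens = {(): (1.0, ())}
    for obs in observations:
        new_tokens = {}
        for prob, path in tokens.values():
            # Copy the token into every possible next state.
            for state in EMIT:
                p = prob * lm_prob(path, state) * EMIT[state][obs]
                new_path = path + (state,)
                context = new_path[-2:]
                # Keep only the best token for each LM context.
                if context not in new_tokens or p > new_tokens[context][0]:
                    new_tokens[context] = (p, new_path)
        tokens = new_tokens
    return max(tokens.values())

best_prob, best_path = token_passing(["A", "A", "B"])
print(best_path, best_prob)
```

With these made-up numbers the best token ends with the state path (0, 0, 1). Keeping one token per two-state context is enough here because a 3-gram LM never looks further back than two states, so tokens that share a context can be compared directly.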

Lecture 9-10 - End-to-end ASR with deep neural networks
Lecture 9 slides (2023), PDF file
Lecture 10 slides (2023), PDF file
Lecture 9-10 slides (2022), PDF file

These slides are from 2022, but because their content was quite different (focusing on attention-based encoder-decoder architectures), they may be worth studying, too.
Lecture 9 exercise (Assignment)

Presentations for the weeks 6-7, PDF file

Here's the presentation schedule as finalized at the last lecture.
