Topic outline

  • In this list, the 2020 slides will be replaced by the 2021 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project topics and their schedule change each year.

    For practicalities, e.g. regarding the Lecture Quizzes and Exercises, see MyCourses > Course Practicalities.

    • Zoom for the lectures (available to course participants only). Passcode is 393499.
    • Lecture video recordings

      Please do not distribute these to anyone other than the course participants! This is because the comments and questions from course participants have not been filtered out yet.

      If you need to access the videos from a Gmail address, or the access does not otherwise work for you, just make a normal Google access request.

    • Lecture 1 - Feature extraction

    • Lecture slides
      • course organization
      • what is ASR
      • features of speech
      • MFCC
      • GMM
      • DNN
    • Assignment

      Please upload your answer here, e.g. as a photo, a text file, or a PDF file.

    • Exercise session - 1

      See Homework Assignment 1


    • Lecture 2 - Phoneme modeling

    • Lecture slides
      • Phonemes
      • HMMs
    • Assignment
    • Assignment
    • Exercise session - 2

      See Homework Assignment 2

    • Lecture 3 - Language Modeling

    • Lecture slides
      • lexicon
      • language modeling
      • n-grams
      • smoothing
      • NNLMs
    • Slides on neural network language models

      • Intro to NNLM
      • Recurrent neural network language models
      • Long Short-Term Memory language models
      • Attention


    • Assignment
    • Exercise session - 3

      See Homework Assignment 3

    • Lecture 4 - Continuous speech and decoding

    • Lecture slides
      • recognition in continuous speech
      • token passing decoder
      • improving the recognition performance and speed
      • measuring the recognition performance
    • Assignment

      The goal is to verify that you have learned the idea of a token passing decoder. The extremely simplified HMM system is almost the same as in the 2B Viterbi algorithm exercise. The observed "sounds" are simply quantized to either "A" or "B" with given probabilities in states S0 and S1. The task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1), etc., up to 3-grams.

      Hint: You can upload an edited source document, a PDF file, a photo of your notes, or a text file with the numbers, whichever is easiest for you. The answer does not have to be correct to get the activity point. An illustrative sketch of the idea follows below.
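
      For illustration only, here is a minimal Python sketch of the token-passing idea on a two-state toy HMM. Every probability in it is an invented placeholder rather than the values given in the assignment, and the way the look-up-table LM enters the score is just one possible reading; follow the assignment sheet for the actual numbers and definitions.

      # Minimal token-passing sketch for a two-state toy HMM (placeholder numbers).

      # Hypothetical emission probabilities P(sound | state); replace with the given values.
      emission = {
          "S0": {"A": 0.7, "B": 0.3},
          "S1": {"A": 0.2, "B": 0.8},
      }

      # Hypothetical toy LM: a look-up table of state-sequence probabilities up to
      # 3-grams (all values invented for this sketch).
      lm = {
          ("S0",): 0.6, ("S1",): 0.4,
          ("S0", "S0"): 0.5, ("S0", "S1"): 0.5,
          ("S1", "S0"): 0.4, ("S1", "S1"): 0.6,
          ("S0", "S0", "S0"): 0.4, ("S0", "S0", "S1"): 0.6,
          ("S0", "S1", "S0"): 0.3, ("S0", "S1", "S1"): 0.7,
          ("S1", "S0", "S0"): 0.5, ("S1", "S0", "S1"): 0.5,
          ("S1", "S1", "S0"): 0.2, ("S1", "S1", "S1"): 0.8,
      }

      def lm_score(history, state):
          """Look up the (up to 3-gram) probability of extending `history` with `state`."""
          key = tuple(history[-2:]) + (state,)
          return lm.get(key, 1.0)

      def decode(observations):
          """Propagate tokens (score, state history) through the observation sequence."""
          tokens = [(1.0, [])]
          for obs in observations:
              new_tokens = []
              for score, history in tokens:
                  for state in ("S0", "S1"):
                      new_score = score * lm_score(history, state) * emission[state][obs]
                      new_tokens.append((new_score, history + [state]))
              # A real token-passing decoder would merge tokens ending in the same
              # state (Viterbi) and prune low-scoring ones; this toy problem is small
              # enough to keep every token.
              tokens = new_tokens
          return max(tokens, key=lambda t: t[0])

      best_score, best_path = decode(["A", "A", "B"])
      print(best_path, best_score)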

    • Exercise session - 4

      See Homework Assignment 4

    • Lecture 5 - End-to-end ASR with deep neural networks



      Schedule for presentations

      Tasks for presenters and audience

    • Assignment
      Compute one step of the attention mechanism. The answer is the acoustic context vector that the attention mechanism produces, which is then used to compute the next output. A small illustrative sketch follows below.
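
      For illustration only, a minimal sketch of one dot-product attention step in Python/NumPy. The vectors below are invented placeholders, not the numbers given in the assignment, and the assignment's scoring function may differ; the point is just that the context vector is a softmax-weighted sum of the encoder states.

      import numpy as np

      # Hypothetical encoder outputs (one row per input frame) and current decoder state.
      encoder_states = np.array([[1.0, 0.0],
                                 [0.5, 0.5],
                                 [0.0, 1.0]])
      decoder_state = np.array([0.8, 0.2])

      # 1. Score each encoder state against the decoder state (dot product in this sketch).
      scores = encoder_states @ decoder_state

      # 2. Turn the scores into attention weights with a softmax.
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()

      # 3. The acoustic context vector is the weighted sum of the encoder states;
      #    it is then used to compute the next output.
      context = weights @ encoder_states
      print(weights, context)
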
    • Exercise session - 5

      See Homework Assignment 5

    • Group project presentations, Wednesday 8th December (meeting starts at 10:00)

      • 10:00 Language model adaptation (group 1)
      • 10:30 Language identification (group 4)
      • 11:00 Paralinguistic systems (group 5)
      • 11:30 Restore Capitalization and Punctuation in ASR output (group 6)

    • Group project presentations, Thursday 9th December (meeting starts at 10:00)

      • 10:00 Audio event tagging (group 7)
      • 10:30 Curriculum learning for ASR (group 17)
      • 11:00 Mispronunciation detection (group 20)
      • 11:30 Speaker identification/verification (group 10) POSTPONED TO 15 DEC 13:00

    • Group project presentations, Friday 10th December (meeting starts at 14:30)

      • 14:00 Speaker adaptation (group 14) CANCELLED
      • 14:30 Speech emotion recognition (group 15)
      • 15:00 End-to-end Speech Translation (group 11)
      • 15:30 Finite state transducers in Speech Recognition (group 19)

    • Group project presentations, Monday 13th December (meeting starts at 10:00)

      • 10:00 Automatic detection of alcohol intoxication (group 3)
      • 10:30 Fine-tuning wav2vec2 for a low-resource setting (group 2)

    • Group project presentations, Tuesday 14th December (meeting starts at 13:00)

      • 13:00 Spoken Language Understanding (group 18)
      • 13:30 E2E speech recognition for TIMIT (group 9)
      • 14:00 Spoken Language Modelling (group 12)

    • Group project presentations, Wednesday 15th December (meeting starts at 13:00)

      • 13:00 Speaker identification/verification (group 10)

    • Group project presentations, Friday 17th December (meeting starts at 10:30)

      • 10:00 (open slot) Not needed => cancelled.
      • 10:30 Speech command recognition (group 8)
      • 11:00 Conclusion

    • Presenters
      • Two days before (or earlier if possible): Select one article for others to read and send the link to everybody in the MyCourses discussion forum
      • One day before: Upload your slides in MyCourses. The latest version of the slides will be published for others in MyCourses. You can also share a draft of the slides or a link in the discussion forum.
      • Practise to make sure that you do not exceed the 20-minute limit
      • Remember your “audience” duties for the other talks of your day
    • Audience
      For each talk, do the following:

      • One day before: Read the provided articles and prepare one question to ask about each talk
      • Follow the talk and the slides, and ask your question in the chat
      • After the talk (within 1 day): Submit feedback (one per talk) in MyCourses (all fields are required) to get activity points. The anonymous feedback (pros and cons) will later be shown to the presenters