Topic outline

  • In this list, the 2020 slides will be replaced by the 2021 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project topics and their schedule change each year.

    For practicalities, e.g. regarding the Lecture Quizzes and Exercises, see MyCourses > Course Practicalities.

    • Zoom for the lectures (available to course participants only). Passcode is 393499.
    • Lecture video recordings

      Please do not distribute these to anyone other than the course participants! This is because the comments and questions from course participants have not been filtered out yet.

      If you need to access the videos from a Gmail address, or the access does not otherwise work for you, just make a normal Google access request.

    • Lecture 1 - Feature extraction

    • Lecture slides
      • course organization
      • what is ASR
      • features of speech
      • MFCC
      • GMM
      • DNN
    • Assignment

      Please upload your answer here, e.g. as a photo, a text file, or a PDF file.

    • Exercise session - 1

      See Homework Assignment 1


    • Lecture 2 - Phoneme modeling

    • Lecture slides
      • Phonemes
      • HMMs
    • Assignment
    • Assignment
    • Exercise session - 2

      See Homework Assignment 2

    • Lecture 3 - Language Modeling

    • Lecture slides
      • lexicon
      • language modeling
      • n-grams
      • smoothing
      • NNLMs
    • Slides on neural network language models

      • Intro to NNLM
      • Recurrent neural network language models
      • Long Short-Term Memory language models
      • Attention


    • Assignment
    • Exercise session - 3

      See Homework Assignment 3

    • Lecture 4 - Continuous speech and decoding

    • Lecture slides
      • recognition in continuous speech
      • token passing decoder
      • improving the recognition performance and speed
      • measuring the recognition performance
    • Assignment

      The goal is to verify that you have learned the idea of a token passing decoder. The extremely simplified HMM system is almost the same as in the 2B Viterbi algorithm exercise. The observed "sounds" are simply quantized to either "A" or "B" with given probabilities in states S0 and S1. The task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1), etc., up to 3-grams.

      Hint: You can upload an edited source document, a PDF file, a photo of your notes, or a text file with the numbers, whichever is easiest for you. The answer does not have to be correct to get the activity point. An illustrative sketch of the idea follows below.
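
      For illustration only, here is a minimal Python sketch of the token-passing idea on a two-state toy HMM. Every probability in it is an invented placeholder rather than the values given in the assignment, and the way the look-up-table LM enters the score is just one possible reading; follow the assignment sheet for the actual numbers and definitions.

      # Minimal token-passing sketch for a two-state toy HMM (placeholder numbers).

      # Hypothetical emission probabilities P(sound | state); replace with the given values.
      emission = {
          "S0": {"A": 0.7, "B": 0.3},
          "S1": {"A": 0.2, "B": 0.8},
      }

      # Hypothetical toy LM: a look-up table of state-sequence probabilities up to
      # 3-grams (all values invented for this sketch).
      lm = {
          ("S0",): 0.6, ("S1",): 0.4,
          ("S0", "S0"): 0.5, ("S0", "S1"): 0.5,
          ("S1", "S0"): 0.4, ("S1", "S1"): 0.6,
          ("S0", "S0", "S0"): 0.4, ("S0", "S0", "S1"): 0.6,
          ("S0", "S1", "S0"): 0.3, ("S0", "S1", "S1"): 0.7,
          ("S1", "S0", "S0"): 0.5, ("S1", "S0", "S1"): 0.5,
          ("S1", "S1", "S0"): 0.2, ("S1", "S1", "S1"): 0.8,
      }

      def lm_score(history, state):
          """Look up the (up to 3-gram) probability of extending `history` with `state`."""
          key = tuple(history[-2:]) + (state,)
          return lm.get(key, 1.0)

      def decode(observations):
          """Propagate tokens (score, state history) through the observation sequence."""
          tokens = [(1.0, [])]
          for obs in observations:
              new_tokens = []
              for score, history in tokens:
                  for state in ("S0", "S1"):
                      new_score = score * lm_score(history, state) * emission[state][obs]
                      new_tokens.append((new_score, history + [state]))
              # A real token-passing decoder would merge tokens ending in the same
              # state (Viterbi) and prune low-scoring ones; this toy problem is small
              # enough to keep every token.
              tokens = new_tokens
          return max(tokens, key=lambda t: t[0])

      best_score, best_path = decode(["A", "A", "B"])
      print(best_path, best_score)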

    • Exercise session - 4

      See Homework Assignment 4

    • Lecture 5 - End-to-end ASR with deep neural networks



      Schedule for presentations

      Tasks for presenters and audience

    • Assignment
      Compute one step of the attention mechanism. The answer is the acoustic context vector that the attention mechanism produces, which is then used to compute the next output. A small illustrative sketch follows below.
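
      For illustration only, a minimal sketch of one dot-product attention step in Python/NumPy. The vectors below are invented placeholders, not the numbers given in the assignment, and the assignment's scoring function may differ; the point is just that the context vector is a softmax-weighted sum of the encoder states.

      import numpy as np

      # Hypothetical encoder outputs (one row per input frame) and current decoder state.
      encoder_states = np.array([[1.0, 0.0],
                                 [0.5, 0.5],
                                 [0.0, 1.0]])
      decoder_state = np.array([0.8, 0.2])

      # 1. Score each encoder state against the decoder state (dot product in this sketch).
      scores = encoder_states @ decoder_state

      # 2. Turn the scores into attention weights with a softmax.
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()

      # 3. The acoustic context vector is the weighted sum of the encoder states;
      #    it is then used to compute the next output.
      context = weights @ encoder_states
      print(weights, context)
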
    • Exercise session - 5

      See Homework Assignment 5

    • Group project presentations, Wednesday 8th December (meeting starts at 10:00)

      • 10:00 Language model adaptation (group 1)
      • 10:30 Language identification (group 4)
      • 11:00 Paralinguistic systems (group 5)
      • 11:30 Restore Capitalization and Punctuation in ASR output (group 6)

    • Group project presentations, Thursday 9th December (meeting starts at 10:00)

      • 10:00 Audio event tagging (group 7)
      • 10:30 Curriculum learning for ASR (group 17)
      • 11:00 Mispronunciation detection (group 20)
      • 11:30 Speaker identification/verification (group 10) POSTPONED TO 15 DEC 13:00

    • Group project presentations, Friday 10th December (meeting starts at 14:30)

      • 14:00 Speaker adaptation (group 14) CANCELLED
      • 14:30 Speech emotion recognition (group 15)
      • 15:00 End-to-end Speech Translation (group 11)
      • 15:30 Finite state transducers in Speech Recognition (group 19)

    • Group project presentations, Monday 13th December (meeting starts at 10:00)

      • 10:00 Automatic detection of alcohol intoxication (group 3)
      • 10:30 Fine-tuning wav2vec2 for a low-resource setting (group 2)

    • Group project presentations, Tuesday 14th December (meeting starts at 13:00)

      • 13:00 Spoken Language Understanding (group 18)
      • 13:30 E2E speech recognition for TIMIT (group 9)
      • 14:00 Spoken Language Modelling (group 12)

    • Group project presentations, Wednesday 15th December (meeting starts at 13:00)

      • 13:00 Speaker identification/verification (group 10)

    • Group project presentations, Friday 17th December (meeting starts at 10:30)

      • 10:00 (open slot) Not needed => cancelled.
      • 10:30 Speech command recognition (group 8)
      • 11:00 Conclusion

    • Presenters
      • Two days before (or earlier if possible): Select one article for others to read and send the link to everybody in the MyCourses discussion forum
      • One day before: Upload your slides in MyCourses. The latest version of the slides will be published for others in MyCourses. You can also share a draft of the slides or a link in the discussion forum.
      • Practise to make sure that you do not exceed the 20-minute limit
      • Remember your “audience” duties for the other talks of your day
    • Audience
      For each talk, do the following:

      • One day before: Read the provided articles and prepare one question to ask about each talk
      • Follow the talk and the slides, and ask your question in the chat
      • After the talk (within 1 day): Submit feedback (one per talk) in MyCourses (all fields are required) to get activity points. The anonymous feedback (pros and cons) will later be shown to the presenters