Topic outline

  • In this list, the 2019 slides will be replaced by the 2020 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project work topics and their schedule change each year. Compared to 2019, there are significant content changes for lectures 1, 3, 4 and 5.


    For practicalities, e.g. regarding the Lecture Quizzes and Exercises, check MyCourses > Course Practicalities.

    • Lecture activity scores - final version
      Not available unless: You belong to any group

      Explanation of columns:

      1   Pre-survey
      2   L1kahoot
      3   L1pen         Lecture 1 exercise
      4   L1feedback
      5   L2kahoot
      6   L2forward     Lecture 2a exercise
      7   L2viterbi     Lecture 2b exercise
      8   L2feedback
      9   L3kahoot
      10  L3gates       Lecture 3 exercise
      11  L3feedback
      12  L4kahoot
      13  L4token       Lecture 4 exercise
      14  L4feedback
      15  L5kahoot
      16  L5attention   Lecture 5 exercise
      17  L5feedback
      18  S1nativefb    Seminar 1: talk 1
      19  S1filterfb    Seminar 1: talk 2
      20  S2subwordfb   Seminar 2: talk 1
      21  S2lmadafb     Seminar 2: talk 2
      22  S2cmdfb       Seminar 2: talk 3
      23  S2vadfb       Seminar 2: talk 4
      24  S3lid         Seminar 3: talk 1
      25  S3spkrada     Seminar 3: talk 2
      26  S3autoenc     Seminar 3: talk 3
      27  S3sprkverif   Seminar 3: talk 4
      28  S3audio       Seminar 3: talk 5
      29  S4e2e         Seminar 4: talk 1
      30  S4chatbot     Seminar 4: talk 2
      31  S4children    Seminar 4: talk 3
      32  S4alcohol     Seminar 4: talk 4

    • Zoom for lectures
    • NEW Drive link to view the recorded lectures

      Please do not distribute these to anyone other than the course participants, because the participants' comments and questions have not been filtered out yet.

    • OLD Drive link to view the recorded lectures

      Please do not distribute these to anyone other than the course participants, because the participants' comments and questions have not been filtered out yet.

      • course organization
      • what is ASR
      • features of speech
      • MFCC
      • GMM
      • DNN
      • Phonemes
      • HMMs
      • Forward algorithm
      • Viterbi search
      • HMM training algorithms
      • lexicon
      • language modeling
      • n-grams
      • smoothing
      • NNLMs
      • Intro to NNLM
      • Recurrent neural network language models
      • Long Short-Term Memory language models
      • Transformer language models


      • recognition in continuous speech
      • token passing decoder
      • improving the recognition performance and speed
      • measuring the recognition performance
    • The goal is to verify that you have learned the idea of a token passing decoder. The HMM system and the observations are again almost the same as in the 2A forward algorithm exercise. Now the task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for state sequences such as (0,1), (0,0,1), etc., up to 3-grams.

      Hint: You can upload an edited source document, a PDF file, a photo of your notes, or a text file with numbers, whichever is easiest for you. The answer does not have to be correct to earn the activity point.
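
      To make the idea of token passing with an LM look-up table concrete, here is a minimal sketch in Python. It is not the exercise's model: the two states, the emission probabilities, and every LM entry are hypothetical numbers chosen only for illustration, and the function names are made up. Each token carries a log probability and its state history, and tokens that share the same 3-gram LM context (the last two states) are recombined so that only the best one survives.

```python
import math

# Hypothetical toy model: every number below is made up for illustration.
STATES = (0, 1)
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}  # P(sound | state)

# Toy LM as a look-up table over state n-grams (up to 3-grams):
# key = (history..., next_state), value = P(next_state | history).
LM = {
    (0,): 0.6, (1,): 0.4,
    (0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6,
    (0, 0, 0): 0.5, (0, 0, 1): 0.5, (0, 1, 0): 0.2, (0, 1, 1): 0.8,
    (1, 0, 0): 0.6, (1, 0, 1): 0.4, (1, 1, 0): 0.3, (1, 1, 1): 0.7,
}


def lm_prob(history, state):
    """P(state | at most the last two states of history), from the table."""
    return LM[tuple(history[-2:]) + (state,)]


def token_passing(observations):
    """Return (best state sequence, its joint probability)."""
    # A token is (log probability, state history); start with one empty token.
    tokens = {(): (0.0, ())}
    for sound in observations:
        new_tokens = {}
        for logp, hist in tokens.values():
            for s in STATES:
                score = (logp
                         + math.log(lm_prob(hist, s))
                         + math.log(EMIT[s][sound]))
                new_hist = hist + (s,)
                key = new_hist[-2:]  # tokens with the same LM context recombine
                if key not in new_tokens or score > new_tokens[key][0]:
                    new_tokens[key] = (score, new_hist)
        tokens = new_tokens
    best_logp, best_hist = max(tokens.values())
    return list(best_hist), math.exp(best_logp)
```

      With these made-up numbers, `token_passing(["A", "A", "B"])` picks the state sequence 0, 0, 1. Working in log probabilities avoids underflow on longer observation sequences, and keying the tokens on the last two states is what keeps the search exact for a 3-gram LM.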

    • Three end-to-end approaches:
      - Attention-based ASR
      - Connectionist temporal classification
      - RNN Transducer
      Neural network specifics
      E2E Challenges and Applications

    • Native language recognition (slides and one article)

      The following paper presents the Native Language Recognition challenge and provides both a dataset and a baseline solution. This broad overview should give you everything needed for the presentation.

      https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0129.PDF

      Note: The chapters on Sincerity and Deception are not covered in our presentation and can be ignored!

    • Filtering text for language model training (slides and one article)

      Here's a short paper that introduces the problem of training data selection for building language models that are suitable for the target task: https://www.microsoft.com/en-us/research/publication/intelligent-selection-of-language-model-training-data/
      The paper presents previous common approaches and proposes a new efficient selection technique.
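
      As a rough illustration of what training data selection can look like, the sketch below scores each candidate sentence by the difference between its cross-entropy under an in-domain LM and under a general-domain LM, and keeps the sentences that look more in-domain. This is a simplified sketch and does not reproduce the paper's setup: the add-one-smoothed unigram models, the function names, and the threshold of 0 are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(sentences, vocab):
    """Add-one-smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(sentence, model):
    """Per-word cross-entropy (in bits) of a sentence under a unigram model."""
    words = sentence.split()
    return -sum(math.log2(model[w]) for w in words) / len(words)


def select(candidates, in_domain, general, threshold=0.0):
    """Keep candidates whose cross-entropy difference is below the threshold."""
    vocab = {w for s in candidates + in_domain + general for w in s.split()}
    p_in = unigram_model(in_domain, vocab)
    p_gen = unigram_model(general, vocab)
    scored = [(cross_entropy(s, p_in) - cross_entropy(s, p_gen), s)
              for s in candidates]
    # Lower score = more in-domain; return the best-scoring sentences first.
    return [s for score, s in sorted(scored) if score < threshold]
```

      For example, with `in_domain = ["recognize speech now", "speech recognition works"]` and `general = ["the cat sat", "stock prices fell"]`, the candidate "speech recognition" is kept and "the cat fell" is dropped. In practice the two models would be proper n-gram LMs and the threshold would be tuned on held-out in-domain data.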

    • Subword LMs (slides and one article)

      Here is the link to the article. This should cover what will be presented on Friday.

      https://www.aclweb.org/anthology/P18-1007.pdf

      Here are the presentation slides for our project. Please have a look at them.
      https://docs.google.com/presentation/d/187f7n6Vwu0gLOXTNNcfGef4RrZTTkaAvhemgDeBpUj0/edit?usp=sharing

    • Low resource ASR (LM adaptation)

      Here is an article for you to read before Friday. We haven't used deep learning, so don't focus too much on the deep learning parts, rather focus on the subject itself :)

      https://www.researchgate.net/publication/344006569_Acoustic_Modeling_Based_on_Deep_Learning_for_Low-Resource_Speech_Recognition_An_Overview

    • Spoken command recognition (slides and one article)

      The following article is the reference paper for our topic, which is keyword spotting (KWS), also known as speech command recognition. It is quite a straightforward topic, and we believe the CNN model is a good starting point for understanding the basic idea of the KWS task :)

      https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43969.pdf

      Here is the link to our slides for tomorrow's presentation. Thanks!
      https://drive.google.com/file/d/1ny2JLTF5S_YGj7X02dpUi-6-mBvnEDVj/view?usp=sharing

    • Voice activity detection (slides and one article)

      A link to an article on the topic of Voice Activity Detection:

      https://drive.google.com/file/d/1a9-mJ4gtMt8M50lgktL3RrpgnGqCFUC5/view?usp=sharing


      Here is the link to the current version of our presentation.

      https://drive.google.com/file/d/18sQxfvzKVMzvso5PMRwWKAU7TfxTu7yp/view

    • Language identification (slides and article)

      The paper below gives you insight into Language Identification, but the experiments that we wish to perform are not limited to it.

      https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf

      Hi, here are the slides for tomorrow's Language Identification presentation.

      https://drive.google.com/file/d/1_hwnzN-gR_yEhTwbZ24MdvfTow9Obfko/view

    • Speaker adaptation (slides and article)

      You can find the paper on speaker adaptation (SA) techniques here,
      and the preliminary version of the presentation can be found here.

      There might still be slight changes to this version.

    • Deep Denoising Autoencoder (slides and article)

      Below is the paper related to my presentation on 'Deep Denoising Autoencoder for Speech Enhancement' on Wednesday.

      https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_0436.pdf 


      Attached are my slides for tomorrow's presentation.

    • Speaker verification (slides and article)

      Hello, here is an article about Speaker Recognition for you to check out before our presentation on Wednesday. The experiment done in this research is not that similar to what we worked on, so there is no need to understand it thoroughly, but we think it is useful to get some insight into Speaker Verification and some methods that can be used for this. 

      https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44681.pdf

      Here are the slides! (Unfortunately some images are a bit pixelated because of the max file size of 500kB). See you all tomorrow.

    • Audio event recognition (slides and article)

      Here is the link to an article that should give you an overview of Audio Event Tagging for our presentation on Wednesday.

      https://ieeexplore.ieee.org/abstract/document/8336092

      Here is our presentation! Sorry for the delay.



    • End-to-end ASR (slides and article)

      Here is a nice comparison between state-of-the-art hybrid DNN/HMM models and attention-based end-to-end models. It should give you a good intuition about where E2E systems stand right now: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1780.pdf

      And here's the link to a preview of our slides: https://docs.google.com/presentation/d/1a5R9zRc5UIvXgtMfeULi8Ad-mydmTj2NoUjmZH1b7gs/edit?usp=sharing
      Sadly, the PDF is too big to upload here.

    • Chatbot (slides and article)

      Here is the paper we would like you to read before our presentation on Friday: https://arxiv.org/pdf/1801.07243.pdf

      Here is a link to our presentation slides: https://drive.google.com/file/d/1S7XkHbL6t_XN4L05THByRuovg2DAhycl/view?usp=sharing

    • Speech adaptation for children ASR (slides and article)

      The slides (first edition) are here: https://drive.google.com/file/d/1xaDcfx-XwVfy_-ETGWE3Qdv4Zy6Q2n3j/view?usp=sharing

      The reference paper is here: https://drive.google.com/file/d/1iBFrMWMwBiwrDU6MGRIzhqAQonIsmyQX/view?usp=sharing

    • Alcohol intoxication (slides and article)

      The following article is the reference paper for our topic, which is the classification of an intoxicated speaker using ASR. The paper focuses on classification from text, whereas our work tries to capture the emotional state of the speaker from the speech utterance itself, and the experiments that we wish to perform are not limited to this paper. Nevertheless, the paper is a good starting point for getting familiar with our work.
      http://suendermann.com/su/pdf/emotion2013.pdf