Overview

  • In this list, the 2019 slides will be replaced by the 2020 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project works and their schedule change each year. Compared to 2019, there are significant content changes in lectures 1, 3, 4 and 5.


    For practicalities, e.g. regarding the Lecture Quizzes and Exercises, check MyCourses > Course Practicalities.

    • Lecture activity scores - final version (PDF file)

      Explanation of columns:

      1   Pre-survey
      2   L1kahoot
      3   L1pen         Lecture 1 exercise
      4   L1feedback
      5   L2kahoot
      6   L2forward     Lecture 2a exercise
      7   L2viterbi     Lecture 2b exercise
      8   L2feedback
      9   L3kahoot
      10  L3gates       Lecture 3 exercise
      11  L3feedback
      12  L4kahoot
      13  L4token       Lecture 4 exercise
      14  L4feedback
      15  L5kahoot
      16  L5attention   Lecture 5 exercise
      17  L5feedback
      18  S1nativefb    Seminar 1: talk 1
      19  S1filterfb    Seminar 1: talk 2
      20  S2subwordfb   Seminar 2: talk 1
      21  S2lmadafb     Seminar 2: talk 2
      22  S2cmdfb       Seminar 2: talk 3
      23  S2vadfb       Seminar 2: talk 4
      24  S3lid         Seminar 3: talk 1
      25  S3spkrada     Seminar 3: talk 2
      26  S3autoenc     Seminar 3: talk 3
      27  S3sprkverif   Seminar 3: talk 4
      28  S3audio       Seminar 3: talk 5
      29  S4e2e         Seminar 4: talk 1
      30  S4chatbot     Seminar 4: talk 2
      31  S4children    Seminar 4: talk 3
      32  S4alcohol     Seminar 4: talk 4

    • Zoom for lectures (URL)
    • NEW Drive link to view the recorded lectures (URL)

      Please do not distribute these to anyone other than the course participants, because the comments and questions from course participants have not been filtered out yet.

    • OLD Drive link to view the recorded lectures (URL)

      Please do not distribute these to anyone other than the course participants, because the comments and questions from course participants have not been filtered out yet.

    • Lecture 1 - Feature extraction

    • File
      • course organization
      • what is ASR
      • features of speech
      • MFCC
      • GMM
      • DNN
    • Exercise session - 1

      See Homework Assignment 1

      Exercise session Zoom-links:
      Thursday 29.10.2020 10:15-12:00

      Friday 30.10.2020 14:15-16:00

    • Lecture 2 - Phoneme modeling

    • File
      • Phonemes
      • HMMs
      • Forward algorithm
      • Viterbi search
      • HMM training algorithms
    • Assignment
    • Assignment
    • Exercise session - 2

      See Homework Assignment 2

    • Lecture 3 - Language Modeling

    • File
      • lexicon
      • language modeling
      • n-grams
      • smoothing
      • NNLMs
    • File

      • Intro to NNLM
      • Recurrent neural network language models
      • Long Short-Term Memory language models
      • Transformer language models


    • Assignment
    • Exercise session - 3

      See Homework Assignment 3

    • Lecture 4 - Continuous speech and decoding

    • File
      • recognition in continuous speech
      • token passing decoder
      • improving the recognition performance and speed
      • measuring the recognition performance
    • Assignment

      The goal is to verify that you have learned the idea of a token passing decoder. The HMM system and observations are again almost the same as in the 2A forward algorithm exercise. Now the task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1) etc., up to 3-grams.

      Hint: You can upload an edited source document, a PDF file, a photo of your notes or a text file with numbers, whichever is easiest for you. The answer does not have to be correct to earn the activity point.
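
      As a rough illustration of the token passing idea described in this assignment, here is a minimal Python sketch. It is not the course's reference solution: the two states, the emission table `EMIT`, the LM look-up table `LM` and the function name `token_passing` are all made-up assumptions, and the real exercise uses its own HMM and LM values. Each token carries a log probability and a state history. At every observation the tokens are copied to each state, scored with the LM and emission probabilities, and only the best token per LM context (the last two states) is kept.

```python
import math

# Hypothetical emission probabilities P(sound | state); NOT the course's values.
EMIT = {
    0: {"A": 0.8, "B": 0.2},
    1: {"A": 0.3, "B": 0.7},
}

# Hypothetical toy LM look-up table, up to 3-grams. Each key is a state
# sequence; the value is P(last state | preceding states in the key).
LM = {
    (0,): 0.6, (1,): 0.4,
    (0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8,
    (0, 0, 0): 0.3, (0, 0, 1): 0.7,
    (0, 1, 0): 0.1, (0, 1, 1): 0.9,
    (1, 0, 0): 0.5, (1, 0, 1): 0.5,
    (1, 1, 0): 0.2, (1, 1, 1): 0.8,
}


def token_passing(observations, emit=EMIT, lm=LM):
    """Return (best state sequence, its probability) for the observations."""
    # A token is (log probability, state history). Start from one empty token.
    tokens = {(): (0.0, ())}
    for sound in observations:
        new_tokens = {}
        for log_p, history in tokens.values():
            # Pass a copy of the token to every state.
            for state in emit:
                path = history + (state,)
                context = path[-3:]  # the 3-gram LM sees at most three states
                score = (log_p
                         + math.log(lm[context])
                         + math.log(emit[state][sound]))
                # Tokens ending in the same two states have identical futures
                # under a 3-gram LM, so only the best of them is kept.
                key = path[-2:]
                if key not in new_tokens or score > new_tokens[key][0]:
                    new_tokens[key] = (score, path)
        tokens = new_tokens
    best_log_p, best_path = max(tokens.values())
    return best_path, math.exp(best_log_p)


best_path, best_prob = token_passing(["A", "A", "B"])
print(best_path, best_prob)
```

      With these made-up numbers the best state sequence for A, A, B is (0, 0, 1). Working in log probabilities avoids numerical underflow on longer observation sequences.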

    • Exercise session - 4

      See Homework Assignment 4

    • Lecture 5 - End-to-end ASR with deep neural networks


    • File

      Three end-to-end approaches:
      - Attention-based ASR
      - Connectionist temporal classification
      - RNN Transducer
      Neural network specifics
      E2E Challenges and Applications

    • Assignment
    • Exercise session - 5

      See Homework Assignment 5

    • Group project presentations, Wednesday 2nd December (lecture starts at 10:15)

      • 9:30 -
      • 10:00 -
      • 10:30 Native language (12)
      • 11:00 Filtering text (05)
      • 11:30 -
      Tasks of the presenters and the audience:

      Presenters
      • Two days before (or earlier if possible): Select one article for others to read and send the link to everybody in the MyCourses discussion forum
      • One day before: Upload your slides to MyCourses. The latest version of the slides will be published for others in MyCourses. You can also share a draft of the slides or a link in the discussion forum.
      • Practise to make sure that you will not exceed the 20-minute limit
      • Remember your “audience” duties for the other talks of your day
      Audience
      For each talk do this:

      • One day before: Read the provided articles and prepare one question to ask for each talk
      • Follow the talk and the slides and ask your question in the chat
      • After the talk (max 1 day): Submit feedback (one for each talk) in MyCourses (all fields are required) to get activity points. The anonymous feedback (pros and cons) will later be shown to the presenters
    • Group project presentations, Friday 4th December

      • 14:15 Subwords (14)
      • 14:45 LM adaptation (01)
      • 15:15 Spoken command recognition (04)
      • 15:45 Voice activity detection (16)
      The Zoom session will be opened a bit earlier to allow the presenters to practise screen sharing and check their microphones, demos etc. The presenters should therefore join by 14:00 at the latest to make sure everything goes smoothly.
    • Group project presentations, Wednesday 9th December

      • 09:30 Language identification (03)
      • 10:00 Speaker adaptation (10)
      • 10:30 Autoencoder (11)
      • 11:00 Speaker verification (15)
      • 11:30 Audio event recognition (02)
    • Group project presentations, Friday 11th December

      • 14:15 End-to-end ASR (07)
      • 14:45 Chatbot (13)
      • 15:15 Children (09)
      • 15:45 Intoxication (08)
    • Native language recognition (slides and one article), PDF file

      The following paper presents the Native Language Recognition challenge and provides both a dataset and a baseline solution. This broad overview should give you everything needed for the presentation.

      https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0129.PDF

      Note: The chapters on Sincerity and Deception are not covered in our presentation and can thus be ignored!

    • Filtering text for language model training (slides and one article), PDF file

      Here's a short paper that introduces the problem of training data selection for building language models that are suitable for the target task: https://www.microsoft.com/en-us/research/publication/intelligent-selection-of-language-model-training-data/
      The paper presents previous common approaches and proposes a new efficient selection technique.

    • Subword LMs (slides and one article), PPTX file

      Here is the link to the article. This should cover what will be presented on Friday.

      https://www.aclweb.org/anthology/P18-1007.pdf

      Here are the presentation slides for our project. Please have a look at them.
      https://docs.google.com/presentation/d/187f7n6Vwu0gLOXTNNcfGef4RrZTTkaAvhemgDeBpUj0/edit?usp=sharing

    • Low resource ASR (LM adaptation), PDF file

      Here is an article for you to read before Friday. We haven't used deep learning, so don't focus too much on the deep learning parts, rather focus on the subject itself :)

      https://www.researchgate.net/publication/344006569_Acoustic_Modeling_Based_on_Deep_Learning_for_Low-Resource_Speech_Recognition_An_Overview

    • Spoken command recognition (slides and one article), PPTX file

      The following article is the reference paper for our topic, which is keyword spotting (KWS), also known as speech command recognition. It is quite a straightforward topic, and we believe the CNN model is a good starting point for understanding the basic idea of the KWS task :)

      https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43969.pdf

      Here is the link to our slides for tomorrow's presentation. Thanks!
      https://drive.google.com/file/d/1ny2JLTF5S_YGj7X02dpUi-6-mBvnEDVj/view?usp=sharing

    • Voice activity detection (slides and one article), PDF file

      A link to an article on the topic of Voice Activity Detection:

      https://drive.google.com/file/d/1a9-mJ4gtMt8M50lgktL3RrpgnGqCFUC5/view?usp=sharing


      Here is the link to the current version of our presentation.

      https://drive.google.com/file/d/18sQxfvzKVMzvso5PMRwWKAU7TfxTu7yp/view

    • Language identification (slides and article), PDF file

      The paper below gives you insights into Language Identification, but the experiments that we wish to perform are not limited to it.

      https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf

      Hi, here are the slides for tomorrow's Language Identification presentation.

      https://drive.google.com/file/d/1_hwnzN-gR_yEhTwbZ24MdvfTow9Obfko/view

    • Speaker adaptation (slides and article), PDF file

      You can find the paper on speaker adaptation (SA) techniques here,
      and the preliminary version of the presentation can be found here.

      There might still be slight changes to this version.

    • Deep Denoising Autoencoder (slides and article), PDF file

      Below is the paper related to my presentation about 'Deep Denoising Autoencoder for Speech Enhancement' on Wednesday.

      https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_0436.pdf 


      Attached are my slides for tomorrow's presentation.

    • Speaker verification (slides and article), PDF file

      Hello, here is an article about Speaker Recognition for you to check out before our presentation on Wednesday. The experiment done in this research is not that similar to what we worked on, so there is no need to understand it thoroughly, but we think it is useful to get some insight into Speaker Verification and some methods that can be used for this. 

      https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44681.pdf

      Here are the slides! (Unfortunately some images are a bit pixelated because of the max file size of 500kB). See you all tomorrow.

    • Audio event recognition (slides and article), PDF file

      Here is the link to an article that should give you an overview of Audio Event Tagging for our presentation on Wednesday.

      https://ieeexplore.ieee.org/abstract/document/8336092

      Here is our presentation! Sorry for the delay.



    • End-to-end ASR (slides and article), PDF file

      Here is a nice comparison between state-of-the-art hybrid DNN/HMM models and attention-based end-to-end models. It should give you a good intuition about the current state of E2E systems: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1780.pdf

      And here's the link to a preview of our slides: https://docs.google.com/presentation/d/1a5R9zRc5UIvXgtMfeULi8Ad-mydmTj2NoUjmZH1b7gs/edit?usp=sharing
      Sadly, the PDF is too big to upload here.

    • Chatbot (slides and article), PDF file

      Here is the paper we would like you to read before our presentation on Friday: https://arxiv.org/pdf/1801.07243.pdf

      Here is a link to our presentation slides: https://drive.google.com/file/d/1S7XkHbL6t_XN4L05THByRuovg2DAhycl/view?usp=sharing

    • Speech adaptation for children ASR (slides and article), PPTX file

      The slides (first edition) are here: https://drive.google.com/file/d/1xaDcfx-XwVfy_-ETGWE3Qdv4Zy6Q2n3j/view?usp=sharing

      The reference paper is here: https://drive.google.com/file/d/1iBFrMWMwBiwrDU6MGRIzhqAQonIsmyQX/view?usp=sharing

    • Alcohol intoxication (slides and article), PDF file

      The following article is the reference paper for our topic, which is classification of an intoxicated speaker using ASR. The paper focuses on classification using text, whereas our work tries to capture the emotional state of the speaker from speech utterances, and the experiments that we wish to perform are not limited to this paper. However, the paper is a good starting point for getting familiar with our work.
      http://suendermann.com/su/pdf/emotion2013.pdf