Topic: Lectures | ELEC-E5510 - Speech Recognition D, 28.10.2020-11.12.2020 | MyCourses

Lectures
In this list, the 2019 slides will be replaced by the 2020 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project topics and their schedule change each year. Compared to 2019, there are significant content changes for lectures 1, 3, 4 and 5.

For practicalities, e.g. regarding the Lecture Quizzes and Exercises, check MyCourses > Course Practicalities.
- Lecture activity scores - final version File PDF
  
  Not available unless: You belong to any group
  
  Explanation of columns:
   1  Pre-survey
   2  L1kahoot
   3  L1pen        Lecture 1 exercise
   4  L1feedback
   5  L2kahoot
   6  L2forward    Lecture 2a exercise
   7  L2viterbi    Lecture 2b exercise
   8  L2feedback
   9  L3kahoot
  10  L3gates      Lecture 3 exercise
  11  L3feedback
  12  L4kahoot
  13  L4token      Lecture 4 exercise
  14  L4feedback
  15  L5kahoot
  16  L5attention  Lecture 5 exercise
  17  L5feedback
  18  S1nativefb   Seminar 1: talk 1
  19  S1filterfb   Seminar 1: talk 2
  20  S2subwordfb  Seminar 2: talk 1
  21  S2lmadafb    Seminar 2: talk 2
  22  S2cmdfb      Seminar 2: talk 3
  23  S2vadfb      Seminar 2: talk 4
  24  S3lid        Seminar 3: talk 1
  25  S3spkrada    Seminar 3: talk 2
  26  S3autoenc    Seminar 3: talk 3
  27  S3sprkverif  Seminar 3: talk 4
  28  S3audio      Seminar 3: talk 5
  29  S4e2e        Seminar 4: talk 1
  30  S4chatbot    Seminar 4: talk 2
  31  S4children   Seminar 4: talk 3
  32  S4alcohol    Seminar 4: talk 4
- Zoom for lectures URL
  
  Not available unless: You belong to any group
- NEW Drive link to view the recorded lectures URL
  
  Not available unless: You belong to any group
  
  Please do not distribute these to anyone other than the course participants! The comments and questions from course participants have not been filtered out yet.
- OLD Drive link to view the recorded lectures URL
  
  Not available unless: You belong to any group
  
  Please do not distribute these to anyone other than the course participants! The comments and questions from course participants have not been filtered out yet.
- Lecture 1 - Feature extraction
- features (2020) - final 1.0 File PDF
  
  course organization
  what is ASR
  features of speech
  MFCC
  GMM
  DNN
- Lecture exercise 1: Gaussian mixture model Assignment
  
  Students must make a submission.
- Exercise session - 1
  See home assignment 1
  Exercise session Zoom-links:
  Thursday 29.10.2020 10:15-12:00
  https://aalto.zoom.us/j/64813733654
  
  Friday 30.10.2020 14:15-16:00
  https://aalto.zoom.us/j/65736726818
- Lecture 2 - Phoneme modeling
- hmm (2020) final version 1.0 File PDF
  
  Phonemes
  HMMs
  Forward algorithm
  Viterbi search
  HMM training algorithms
- Lecture exercise 2A: Forward algorithm Assignment
  
  Students must make a submission.
- Lecture exercise 2B: Viterbi search Assignment
  
  Students must make a submission.
- Exercise session - 2
  See Homework Assignment 2
- Lecture 3 - Language Modeling
- LM (2020) Final version 1.0 File PDF
  
  lexicon
  language modeling
  n-grams
  smoothing
  NNLMs
- Neural network language models File PDF
  
  Intro to NNLM
  Recurrent neural network language models
  Long Short-Term Memory language models
  Transformer language models
- Lecture exercise 3: LSTM cell state Assignment
  
  Students must make a submission.
- Exercise session - 3
  See Homework Assignment 3
- Lecture 4 - Continuous speech and decoding
- LVCSR (2020) final version 1.0 File PDF
  
  recognition in continuous speech
  token passing decoder
  improving the recognition performance and speed
  measuring the recognition performance
- Lecture exercise 4: Token passing decoder Assignment
  
  Students must make a submission.
  
  The goal is to verify that you have learned the idea of a token passing decoder. The HMM system and observations are again almost the same as in the 2A forward algorithm exercise. Now the task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1) etc., up to 3-grams.
  Hint: You can upload an edited source document, a PDF file, a photo of your notes or a text file with numbers, whichever is easiest for you. To get the activity point, the answer does not have to be correct.
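
The token passing idea described above can be illustrated with a minimal Python sketch. This is not the exercise's actual model: the two-state HMM, all probabilities, the LM table entries and the names `START`, `TRANS`, `EMIT`, `LM`, `lm_prob` and `token_passing` are hypothetical and chosen only for illustration. The sketch keeps one best token per LM context (the last two states), so that the 3-gram look-up stays exact.

```python
import math

# Hypothetical toy model: every number below is made up for illustration.
STATES = [0, 1]
START = {0: 1.0}                                  # always start in state 0
TRANS = {(0, 0): 0.5, (0, 1): 0.5, (1, 1): 1.0}   # left-to-right HMM
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}
# Toy LM: look-up table over state sequences (2-grams and 3-grams).
LM = {
    (0, 0): 0.6, (0, 1): 0.4, (1, 1): 1.0,
    (0, 0, 0): 0.2, (0, 0, 1): 0.8, (0, 1, 1): 1.0, (1, 1, 1): 1.0,
}


def lm_prob(history):
    """Look up the longest n-gram (n <= 3) that ends the state history."""
    context = tuple(history[-3:])
    if len(context) < 2:
        return 1.0                       # no LM score for a single state
    # Back off from the 3-gram to the 2-gram if the 3-gram is missing.
    return LM.get(context, LM.get(context[-2:], 0.0))


def token_passing(observations):
    """Return (log probability, state sequence) of the best token."""
    # tokens: LM context (last two states) -> (log_prob, state history)
    tokens = {}
    for s in STATES:
        p = START.get(s, 0.0) * EMIT[s][observations[0]]
        if p > 0:
            tokens[(s,)] = (math.log(p), [s])

    for obs in observations[1:]:
        new_tokens = {}
        for log_p, hist in tokens.values():
            for nxt in STATES:
                # Pass a copy of the token along each allowed transition.
                new_hist = hist + [nxt]
                p = (TRANS.get((hist[-1], nxt), 0.0)
                     * EMIT[nxt][obs]
                     * lm_prob(new_hist))
                if p <= 0:
                    continue
                cand = (log_p + math.log(p), new_hist)
                key = tuple(new_hist[-2:])
                # Keep only the best token for each LM context.
                if key not in new_tokens or cand[0] > new_tokens[key][0]:
                    new_tokens[key] = cand
        tokens = new_tokens

    return max(tokens.values(), key=lambda t: t[0])


best_log_p, best_path = token_passing(["A", "A", "B"])
print(best_path, math.exp(best_log_p))
```

With these made-up numbers, the surviving tokens after the last observation are compared by log probability, and the best one carries its full state history, which is the answer the decoder returns.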
- Exercise session - 4
  See Homework Assignment 4
- Lecture 5 - End-to-end ASR with deep neural networks
- Lecture 5 recording URL
- End-to-end ASR (2020) drafts File PDF
  
  Three end-to-end approaches:
  - Attention-based ASR
  - Connectionist temporal classification
  - RNN Transducer
  Neural network specifics
  E2E Challenges and Applications
- E2E-ASR: Google slides URL
- Lecture exercise 5: Self attention Assignment
  
  Students must make a submission.
- Exercise session - 5
  See Homework Assignment 5
- Group project presentations, Wednesday 2nd December (lecture starts at 10:15)
  9:30 -
  10:00 -
  10:30 Native language (12)
  11:00 Filtering text (05)
  11:30 -
  Tasks of the presenters and the audience:
  
  Presenters
  Two days before (or earlier if possible): Select one article for others to read and send the link to everybody in the MyCourses discussion forum
  One day before: Upload your slides to MyCourses. The latest version of the slides will be published for others in MyCourses. You can also share a draft of the slides or a link in the discussion forum.
  Practise to make sure that you will not exceed the 20-minute limit
  Remember your “audience” duties for the other talks of your day
  Audience
  For each talk do this:
  One day before: Read the provided articles and prepare one question to ask for each talk
  Follow the talk and the slides and ask your question in chat
  After the talk (max 1 day): Submit feedback (one for each talk) in MyCourses (all fields are required) to get activity points. The anonymous feedback (pros and cons) will later be shown to the presenters
- Group project presentations, Friday 4th December
  14:15 Subwords (14)
  14:45 LM adaptation (01)
  15:15 Spoken command recognition (04)
  15:45 Voice activity detection (16)
  The Zoom session will be opened a bit earlier to allow the presenters to practise screen sharing and check their microphones, demos etc. The presenters should therefore join at 14:00 at the latest, to make sure everything goes smoothly.
- Group project presentations, Wednesday 9th December
  09:30 Language identification (03)
  10:00 Speaker adaptation (10)
  10:30 Autoencoder (11)
  11:00 Speaker verification (15)
  11:30 Audio event recognition (02)
- Group project presentations, Friday 11th December
  14:15 End-to-end ASR (07)
  14:45 Chatbot (13)
  15:15 Children (09)
  15:45 Intoxication (08)
- Native language recognition (slides and one article) File PDF
  
  Not available unless: You belong to any group
  
  The following paper presents the Native Language Recognition challenge, and provides both a dataset and baseline solution. This broad overview should give you everything needed for the presentation.
  
  https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0129.PDF
  
  Note: The chapters that discuss Sincerity and Deception are not covered in our presentation and can thus be ignored!
- Filtering text for language model training (slides and one article) File PDF
  
  Not available unless: You belong to any group
  
  Here's a short paper that introduces the problem of training data selection for building language models that are suitable for the target task: https://www.microsoft.com/en-us/research/publication/intelligent-selection-of-language-model-training-data/
  The paper presents previous common approaches and proposes a new efficient selection technique.
- Subword LMs (slides and one article) File PPTX
  
  Not available unless: You belong to any group
  
  Here is the link to the article. This should cover what will be presented on Friday.
  
  https://www.aclweb.org/anthology/P18-1007.pdf
  
  Here are the presentation slides for our project. Please have a look at them.
  https://docs.google.com/presentation/d/187f7n6Vwu0gLOXTNNcfGef4RrZTTkaAvhemgDeBpUj0/edit?usp=sharing
- Low resource ASR (LM adaptation) File PDF
  
  Not available unless: You belong to any group
  
  Here is an article for you to read before Friday. We haven't used deep learning, so don't focus too much on the deep learning parts, rather focus on the subject itself :)
  
  https://www.researchgate.net/publication/344006569_Acoustic_Modeling_Based_on_Deep_Learning_for_Low-Resource_Speech_Recognition_An_Overview
- Spoken command recognition (slides and one article) File PPTX
  
  Not available unless: You belong to any group
  
  The following article is the reference paper related to our topic, which is about keyword spotting (KWS) or speech command recognition. It is quite a straightforward topic, and we believe the CNN model is a good starting point for understanding the basic idea of the KWS task :)
  
  https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43969.pdf
  
  Here is the link to our slides for tomorrow's presentation. Thanks!
  https://drive.google.com/file/d/1ny2JLTF5S_YGj7X02dpUi-6-mBvnEDVj/view?usp=sharing
- Voice activity detection (slides and one article) File PDF
  
  Not available unless: You belong to any group
  
  A link to an article on the topic of Voice Activity Detection:
  
  https://drive.google.com/file/d/1a9-mJ4gtMt8M50lgktL3RrpgnGqCFUC5/view?usp=sharing
  
  Here is the link to the current version of our presentation.
  
  https://drive.google.com/file/d/18sQxfvzKVMzvso5PMRwWKAU7TfxTu7yp/view
- Language identification (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  The paper below gives you insights into Language Identification, but the experiments that we wish to perform are not limited to it.
  https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf
  Hi, here are the slides for tomorrow's Language Identification presentation.
  
  https://drive.google.com/file/d/1_hwnzN-gR_yEhTwbZ24MdvfTow9Obfko/view
- Speaker adaptation (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  You can find the paper on speaker adaptation (SA) techniques here, and the preliminary version of the presentation can be found here.
  
  There might still be slight changes to this version.
- Deep Denoising Autoencoder (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  Below is the paper related to my presentation about 'Deep Denoising Autoencoder for Speech Enhancement' on Wednesday.
  https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_0436.pdf
  
  Attached are my slides for tomorrow's presentation.
  https://drive.google.com/file/d/1x_3koDIN3uZbx5Y74SgvzdBq46To6j0j/view?usp=sharing
- Speaker verification (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  Hello, here is an article about Speaker Recognition for you to check out before our presentation on Wednesday. The experiment done in this research is not that similar to what we worked on, so there is no need to understand it thoroughly, but we think it is useful to get some insight into Speaker Verification and some methods that can be used for this.
  https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44681.pdf
  
  Here are the slides! (Unfortunately some images are a bit pixelated because of the max file size of 500kB). See you all tomorrow.
- Audio event recognition (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  Here is the link to an article that should give you an overview of Audio Event Tagging for our presentation on Wednesday.
  https://ieeexplore.ieee.org/abstract/document/8336092
  
  Here is our presentation! Sorry for the delay.
- End-to-end ASR (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  Here is a nice comparison between state-of-the-art hybrid DNN/HMM models and attention-based end-to-end models. It should give you a good intuition about the current state of E2E systems: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1780.pdf
  
  And here's the link to a preview of our slides: https://docs.google.com/presentation/d/1a5R9zRc5UIvXgtMfeULi8Ad-mydmTj2NoUjmZH1b7gs/edit?usp=sharing
  Sadly, the PDF is too big to upload here.
- Chatbot (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  Here is the paper we would like you to read before our presentation on Friday: https://arxiv.org/pdf/1801.07243.pdf
  
  Here is a link to our presentation slides: https://drive.google.com/file/d/1S7XkHbL6t_XN4L05THByRuovg2DAhycl/view?usp=sharing
- Speech adaptation for children ASR (slides and article) File PPTX
  
  Not available unless: You belong to any group
  
  The slides (first edition) are here: https://drive.google.com/file/d/1xaDcfx-XwVfy_-ETGWE3Qdv4Zy6Q2n3j/view?usp=sharing .
  
  The reference paper is here https://drive.google.com/file/d/1iBFrMWMwBiwrDU6MGRIzhqAQonIsmyQX/view?usp=sharing .
- Alcohol intoxication (slides and article) File PDF
  
  Not available unless: You belong to any group
  
  The following article is the reference paper related to our topic, which is about classification of an intoxicated speaker using ASR. The paper focuses on classification using text, whereas our work tries to capture the emotional state of the speaker from the speech utterance, and the experiments that we wish to perform are not limited to this paper. However, this paper is a good starting point for getting familiar with our work.
  http://suendermann.com/su/pdf/emotion2013.pdf