Course: ELEC-E5510 - Speech Recognition D, 28.10.2020-11.12.2020, Section: Assignments

Assignments
In this list, the 2019 slides will be replaced by the 2020 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The project works and their schedule change each year. Compared to 2019, there are significant content changes for lectures 1, 3, 4 and 5.

For practicalities, e.g. regarding the Lecture Quizzes and Exercises, check MyCourses > Course Practicalities.
- Lecture activity scores - final version (PDF file)
  
  Available if: You belong to any group
  
  Explanation of columns:
   1  Pre-survey
   2  L1kahoot
   3  L1pen         Lecture 1 exercise
   4  L1feedback
   5  L2kahoot
   6  L2forward     Lecture 2a exercise
   7  L2viterbi     Lecture 2b exercise
   8  L2feedback
   9  L3kahoot
  10  L3gates       Lecture 3 exercise
  11  L3feedback
  12  L4kahoot
  13  L4token       Lecture 4 exercise
  14  L4feedback
  15  L5kahoot
  16  L5attention   Lecture 5 exercise
  17  L5feedback
  18  S1nativefb    Seminar 1: talk 1
  19  S1filterfb    Seminar 1: talk 2
  20  S2subwordfb   Seminar 2: talk 1
  21  S2lmadafb     Seminar 2: talk 2
  22  S2cmdfb       Seminar 2: talk 3
  23  S2vadfb       Seminar 2: talk 4
  24  S3lid         Seminar 3: talk 1
  25  S3spkrada     Seminar 3: talk 2
  26  S3autoenc     Seminar 3: talk 3
  27  S3sprkverif   Seminar 3: talk 4
  28  S3audio       Seminar 3: talk 5
  29  S4e2e         Seminar 4: talk 1
  30  S4chatbot     Seminar 4: talk 2
  31  S4children    Seminar 4: talk 3
  32  S4alcohol     Seminar 4: talk 4
- Zoom for lectures (URL)
  
  Available if: You belong to any group
- NEW Drive link to view the recorded lectures (URL)
  
  Available if: You belong to any group
  
  Please do not distribute these to anyone other than the course participants, because the participants' comments and questions have not been filtered out yet.
- OLD Drive link to view the recorded lectures (URL)
  
  Available if: You belong to any group
  
  Please do not distribute these to anyone other than the course participants, because the participants' comments and questions have not been filtered out yet.
- Lecture 1 - Feature extraction
- features (2020) - final 1.0 (PDF file)
  
  course organization
  what is ASR
  features of speech
  MFCC
  GMM
  DNN
- Lecture exercise 1: Gaussian mixture model (Assignment)
  
  Students must submit.
- Exercise session - 1
  See home assignment 1
  Exercise session Zoom-links:
  Thursday 29.10.2020 10:15-12:00
  https://aalto.zoom.us/j/64813733654
  
  Friday 30.10.2020 14:15-16:00
  https://aalto.zoom.us/j/65736726818
- Lecture 2 - Phoneme modeling
- hmm (2020) final version 1.0 (PDF file)
  
  Phonemes
  HMMs
  Forward algorithm
  Viterbi search
  HMM training algorithms
- Lecture exercise 2A: Forward algorithm (Assignment)
  
  Students must submit.
- Lecture exercise 2B: Viterbi search (Assignment)
  
  Students must submit.
- Exercise session - 2
  See Homework Assignment 2
- Lecture 3 - Language Modeling
- LM (2020) Final version 1.0 (PDF file)
  
  lexicon
  language modeling
  n-grams
  smoothing
  NNLMs
- Neural network language models (PDF file)
  
  Intro to NNLM
  Recurrent neural network language models
  Long Short-Term Memory language models
  Transformer language models
- Lecture exercise 3: LSTM cell state (Assignment)
  
  Students must submit.
- Exercise session - 3
  See Homework Assignment 3
- Lecture 4 - Continuous speech and decoding
- LVCSR (2020) final version 1.0 (PDF file)
  
  recognition in continuous speech
  token passing decoder
  improving the recognition performance and speed
  measuring the recognition performance
- Lecture exercise 4: Token passing decoder (Assignment)
  
  Students must submit.
  
  The goal is to verify that you have learned the idea of a token passing decoder. The HMM system and observations are again almost the same as in the 2A forward algorithm exercise. Now the task is to find the most likely state sequence that can produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1) etc., up to 3-grams.
  Hint: You can upload an edited source document, a PDF file, a photo of your notes or a text file with numbers, whichever is easiest for you. The answer does not have to be correct to earn the activity point.
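
The idea behind the token passing task in Lecture exercise 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exercise solution: the two-state HMM, every probability value, and the way the LM table is applied (scoring the newest state given up to two previous states) are all made up here, and the names `STATES`, `START`, `TRANS`, `EMIT`, `LM` and `token_passing` are hypothetical. Each token carries a log probability and a state history, and only the best token per LM context (the last two states) survives each step.

```python
import math

# Hypothetical toy model: all numbers are invented for illustration
# and are NOT the values used in the course exercise.
STATES = (0, 1)
START = {0: 1.0, 1: 0.0}                              # always start in state 0
TRANS = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.0, 1: 1.0}}    # left-to-right HMM
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}

# Toy LM as a look-up table over state n-grams (up to 3-grams).
# Assumption: an entry is the probability of the last state given the
# preceding ones; sequences missing from the table get probability 0.
LM = {
    (0,): 1.0,
    (0, 0): 0.5, (0, 1): 0.5, (1, 1): 1.0,
    (0, 0, 0): 0.2, (0, 0, 1): 0.8, (0, 1, 1): 1.0, (1, 1, 1): 1.0,
}

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def lm_logprob(history):
    """LM score of the newest state given up to two previous states."""
    return log(LM.get(tuple(history[-3:]), 0.0))


def token_passing(observations):
    """Return (log probability, state sequence) of the best surviving token."""
    # Initial tokens: one per state that can start the sequence.
    tokens = {}
    for s in STATES:
        score = log(START[s]) + log(EMIT[s][observations[0]]) + lm_logprob((s,))
        if score > NEG_INF:
            tokens[(s,)] = (score, (s,))

    for obs in observations[1:]:
        new_tokens = {}
        for score, history in tokens.values():
            prev = history[-1]
            for s in STATES:
                # Pass a copy of the token along the transition prev -> s.
                new_history = history + (s,)
                new_score = (score + log(TRANS[prev][s]) + log(EMIT[s][obs])
                             + lm_logprob(new_history))
                # Keep only the best token for each trigram LM context.
                context = new_history[-2:]
                if new_score > new_tokens.get(context, (NEG_INF, None))[0]:
                    new_tokens[context] = (new_score, new_history)
        tokens = new_tokens

    return max(tokens.values())


best_score, best_path = token_passing(["A", "A", "B"])
print(best_path, math.exp(best_score))
```

Keeping one token per two-state context, rather than one per state, is what lets a 3-gram table be applied without discarding a history that the LM would later prefer.
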
- Exercise session - 4
  See Homework Assignment 4
- Lecture 5 - End-to-end ASR with deep neural networks
- Lecture 5 recording (URL)
- End-to-end ASR (2020) drafts (PDF file)
  
  Three end-to-end approaches:
  - Attention-based ASR
  - Connectionist temporal classification
  - RNN Transducer
  Neural network specifics
  E2E Challenges and Applications
- E2E-ASR: Google slides (URL)
- Lecture exercise 5: Self attention (Assignment)
  
  Students must submit.
- Exercise session - 5
  See Homework Assignment 5
- Group project presentations, Wednesday 2nd December (lecture starts at 10:15)
  9:30 -
  10:00 -
  10:30 Native language (12)
  11:00 Filtering text (05)
  11:30 -
  Tasks of the presenters and the audience:
  
  Presenters
  - Two days before (or earlier if possible): Select one article for others to read and send the link to everybody in the MyCourses discussion forum.
  - One day before: Upload your slides to MyCourses. The latest version of the slides will be published for others in MyCourses. You can also share a draft of the slides or a link in the discussion forum.
  - Practise to make sure that you will not exceed the 20-minute limit.
  - Remember your “audience” duties for the other talks of your day.
  
  Audience
  For each talk do this:
  - One day before: Read the provided articles and prepare one question to ask for each talk.
  - Follow the talk and the slides and ask your question in the chat.
  - After the talk (within 1 day): Submit feedback (one for each talk) in MyCourses (all fields are required) to get activity points. The anonymous feedback (pros and cons) will later be shown to the presenters.
- Group project presentations, Friday 4th December
  14:15 Subwords (14)
  14:45 LM adaptation (01)
  15:15 Spoken command recognition (04)
  15:45 Voice activity detection (16)
  The Zoom session will be opened a bit earlier to allow the presenters to practise screen sharing and check their microphones, demos etc. The presenters should therefore join at 14:00 at the latest to make sure everything goes smoothly.
- Group project presentations, Wednesday 9th December
  09:30 Language identification (03)
  10:00 Speaker adaptation (10)
  10:30 Autoencoder (11)
  11:00 Speaker verification (15)
  11:30 Audio event recognition (02)
- Group project presentations, Friday 11th December
  14:15 End-to-end ASR (07)
  14:45 Chatbot (13)
  15:15 Children (09)
  15:45 Intoxication (08)
- Native language recognition (slides and one article) (PDF file)
  
  Available if: You belong to any group
  
  The following paper presents the Native Language Recognition challenge, and provides both a dataset and baseline solution. This broad overview should give you everything needed for the presentation.
  
  https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0129.PDF
  
  Note: The chapters on Sincerity and Deception are not covered in our presentation and can thus be ignored!
- Filtering text for language model training (slides and one article) (PDF file)
  
  Available if: You belong to any group
  
  Here's a short paper that introduces the problem of training data selection for building language models that are suitable for the target task: https://www.microsoft.com/en-us/research/publication/intelligent-selection-of-language-model-training-data/
  The paper presents previous common approaches and proposes a new efficient selection technique.
- Subword LMs (slides and one article) (PPTX file)
  
  Available if: You belong to any group
  
  Here is the link to the article. This should cover what will be presented on Friday.
  
  https://www.aclweb.org/anthology/P18-1007.pdf
  
  Here is the presentation slide for our project. Please have a look at it.
  https://docs.google.com/presentation/d/187f7n6Vwu0gLOXTNNcfGef4RrZTTkaAvhemgDeBpUj0/edit?usp=sharing
- Low resource ASR (LM adaptation) (PDF file)
  
  Available if: You belong to any group
  
  Here is an article for you to read before Friday. We haven't used deep learning, so don't focus too much on the deep learning parts, rather focus on the subject itself :)
  
  https://www.researchgate.net/publication/344006569_Acoustic_Modeling_Based_on_Deep_Learning_for_Low-Resource_Speech_Recognition_An_Overview
- Spoken command recognition (slides and one article) (PPTX file)
  
  Available if: You belong to any group
  
  The following article is the reference paper for our topic, which is keyword spotting (KWS), also called speech command recognition. It is quite a straightforward topic, and we believe the CNN model is a good starting point for understanding the basic idea of the KWS task :)
  
  https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43969.pdf
  
  Here is the link of our slides for tomorrow's presentation. Thanks!
  https://drive.google.com/file/d/1ny2JLTF5S_YGj7X02dpUi-6-mBvnEDVj/view?usp=sharing
- Voice activity detection (slides and one article) (PDF file)
  
  Available if: You belong to any group
  
  A link to an article on the topic of Voice Activity Detection:
  
  https://drive.google.com/file/d/1a9-mJ4gtMt8M50lgktL3RrpgnGqCFUC5/view?usp=sharing
  
  Here is the link to the current version of our presentation.
  
  https://drive.google.com/file/d/18sQxfvzKVMzvso5PMRwWKAU7TfxTu7yp/view
- Language identification (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  The paper below gives you insights into Language Identification, but the experiments that we wish to perform are not limited to it.
  https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf
  Hi, here are the slides for tomorrow's Language Identification presentation.
  
  https://drive.google.com/file/d/1_hwnzN-gR_yEhTwbZ24MdvfTow9Obfko/view
- Speaker adaptation (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  You can find the paper on speaker adaptation (SA) techniques here,
  and the preliminary version of the presentation can be found here.
  
  There might still be slight changes to this version.
- Deep Denoising Autoencoder (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  Below is the paper related to my presentation about 'Deep Denoising Autoencoder for Speech Enhancement' on Wednesday.
  https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_0436.pdf
  
  Attached are my slides for tomorrow's presentation.
  https://drive.google.com/file/d/1x_3koDIN3uZbx5Y74SgvzdBq46To6j0j/view?usp=sharing
- Speaker verification (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  Hello, here is an article about Speaker Recognition for you to check out before our presentation on Wednesday. The experiment done in this research is not that similar to what we worked on, so there is no need to understand it thoroughly, but we think it is useful to get some insight into Speaker Verification and some methods that can be used for this.
  https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44681.pdf
  
  Here are the slides! (Unfortunately some images are a bit pixelated because of the max file size of 500kB). See you all tomorrow.
- Audio event recognition (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  Here is the link to an article that should give you an overview of Audio Event Tagging for our presentation on Wednesday.
  https://ieeexplore.ieee.org/abstract/document/8336092
  
  Here is our presentation! Sorry for the delay.
- End-to-end ASR (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  Here is a nice comparison between state-of-the-art hybrid DNN/HMM models and attention-based end-to-end models. It should give you a good intuition about how things are with E2E systems right now: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1780.pdf
  
  And here's the link to a preview of our slides: https://docs.google.com/presentation/d/1a5R9zRc5UIvXgtMfeULi8Ad-mydmTj2NoUjmZH1b7gs/edit?usp=sharing
  Sadly, the PDF is too big to upload here.
- Chatbot (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  Here is the paper we would like you to read before our presentation on Friday: https://arxiv.org/pdf/1801.07243.pdf
  
  Here is a link to our presentation slides: https://drive.google.com/file/d/1S7XkHbL6t_XN4L05THByRuovg2DAhycl/view?usp=sharing
- Speech adaptation for children ASR (slides and article) (PPTX file)
  
  Available if: You belong to any group
  
  The slides (first edition) are here: https://drive.google.com/file/d/1xaDcfx-XwVfy_-ETGWE3Qdv4Zy6Q2n3j/view?usp=sharing
  
  The reference paper is here: https://drive.google.com/file/d/1iBFrMWMwBiwrDU6MGRIzhqAQonIsmyQX/view?usp=sharing
- Alcohol intoxication (slides and article) (PDF file)
  
  Available if: You belong to any group
  
  The following article is the reference paper for our topic, which is the classification of intoxicated speakers using ASR. The paper focuses on classification using text, whereas our work tries to capture the emotional state of the speaker from speech utterances, and our experiments are not limited to this paper. However, the paper is a good starting point for getting familiar with our work.
  http://suendermann.com/su/pdf/emotion2013.pdf
