ELEC-E5510 - Speech Recognition D, 28.10.2020-11.12.2020
This course space end date is set to 11.12.2020.
Topic outline
-
In this list, the 2019 slides will be replaced by the 2020 ones after each lecture is given. The titles may be identical, but the contents are improved each year based on feedback. The projects and their schedule change each year. Compared to 2019, there are significant content changes for lectures 1, 3, 4 and 5.
For practicalities, e.g. regarding the Lecture Quizzes and Exercises, see MyCourses > Course Practicalities.
-
Lecture activity scores - final version (PDF)
Explanation of columns:
1 Pre-survey
2 L1kahoot
3 L1pen Lecture 1 exercise
4 L1feedback
5 L2kahoot
6 L2forward Lecture 2a exercise
7 L2viterbi Lecture 2b exercise
8 L2feedback
9 L3kahoot
10 L3gates Lecture 3 exercise
11 L3feedback
12 L4kahoot
13 L4token Lecture 4 exercise
14 L4feedback
15 L5kahoot
16 L5attention Lecture 5 exercise
17 L5feedback
18 S1nativefb Seminar 1: talk 1
19 S1filterfb Seminar 1: talk 2
20 S2subwordfb Seminar 2: talk 1
21 S2lmadafb Seminar 2: talk 2
22 S2cmdfb Seminar 2: talk 3
23 S2vadfb Seminar 2: talk 4
24 S3lid Seminar 3: talk 1
25 S3spkrada Seminar 3: talk 2
26 S3autoenc Seminar 3: talk 3
27 S3sprkverif Seminar 3: talk 4
28 S3audio Seminar 3: talk 5
29 S4e2e Seminar 4: talk 1
30 S4chatbot Seminar 4: talk 2
31 S4children Seminar 4: talk 3
32 S4alcohol Seminar 4: talk 4
-
Zoom for lectures (link)
-
NEW Drive link to view the recorded lectures
Please do not share these with anyone other than the course participants! Comments and questions from course participants have not been filtered out of the recordings yet.
-
OLD Drive link to view the recorded lectures
Please do not share these with anyone other than the course participants! Comments and questions from course participants have not been filtered out of the recordings yet.
-
- course organization
- what is ASR
- features of speech
- MFCC
- GMM
- DNN
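In a GMM-based acoustic model, each HMM state scores a feature vector (for example one MFCC frame) with a weighted sum of Gaussian densities. The sketch below is a minimal illustration, assuming diagonal covariance matrices (a common simplification for MFCC features); the function names and any numbers you feed it are hypothetical, not taken from the course material.

```python
import math
from typing import Sequence


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for a single Gaussian component."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> float:
    """log sum_k w_k N(x; mu_k, diag(var_k)), using log-sum-exp so that
    very small component likelihoods do not underflow to zero."""
    comps = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))
```

For example, `gmm_loglik([0.5, -1.0], [0.3, 0.7], [[0.0, 0.0], [1.0, -1.0]], [[1.0, 1.0], [0.5, 0.5]])` scores one 2-D frame against a hypothetical two-component mixture. Working in the log domain is the usual choice because per-frame likelihoods are multiplied over many frames during decoding.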
-
Make a submission
-
- Phonemes
- HMMs
- Forward algorithm
- Viterbi search
- HMM training algorithms
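The forward algorithm and Viterbi search share the same recursion over time: the forward algorithm sums over all predecessor states, while Viterbi keeps only the best predecessor and stores it in a back-pointer. A minimal sketch for a discrete-observation HMM follows; the two-state left-to-right model, the symbols "A"/"B" and all probabilities are hypothetical toy values, not the ones used in the course exercises.

```python
from typing import Dict, List, Tuple

# Hypothetical toy HMM (not the exercise values).
STATES = [0, 1]
INIT = {0: 1.0, 1: 0.0}                                # always start in state 0
TRANS = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.0, 1: 1.0}}    # left-to-right
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}


def forward(obs: List[str]) -> float:
    """P(obs): probability summed over all state paths."""
    alpha = {s: INIT[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][o]
            for s in STATES
        }
    return sum(alpha.values())


def viterbi(obs: List[str]) -> Tuple[float, List[int]]:
    """Probability and state sequence of the single most likely path."""
    delta = {s: INIT[s] * EMIT[s][obs[0]] for s in STATES}
    back: List[Dict[int, int]] = []
    for o in obs[1:]:
        ptr: Dict[int, int] = {}
        new: Dict[int, float] = {}
        for s in STATES:
            best = max(STATES, key=lambda p: delta[p] * TRANS[p][s])  # max, not sum
            ptr[s] = best
            new[s] = delta[best] * TRANS[best][s] * EMIT[s][o]
        back.append(ptr)
        delta = new
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):          # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]
```

For the observation sequence A, A, B this toy model gives P(obs) ≈ 0.2208, while the best single path is 0, 0, 1 with probability ≈ 0.10752; the Viterbi score is always at most the forward probability, since it keeps one path out of the sum.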
-
Make a submission
-
Make a submission
-
- lexicon
- language modeling
- n-grams
- smoothing
- NNLMs
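As a minimal illustration of n-grams and smoothing, the sketch below estimates bigram probabilities from a tiny hypothetical corpus and applies add-one (Laplace) smoothing, so that unseen word pairs still get a non-zero probability. Add-one is chosen here only because it is the simplest method; more refined methods such as Kneser-Ney smoothing generally work better in practice. The function names and the `<s>`/`</s>` boundary markers are assumptions of this sketch.

```python
from collections import Counter
from typing import List, Set, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    """Count bigrams and their histories, adding sentence boundary markers."""
    history: Counter = Counter()
    bigrams: Counter = Counter()
    vocab: Set[str] = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens[1:])             # tokens that can be predicted
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab


def p_add_one(prev: str, cur: str, history: Counter, bigrams: Counter,
              vocab: Set[str]) -> float:
    """Add-one smoothed P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + V)."""
    return (bigrams[(prev, cur)] + 1) / (history[prev] + len(vocab))
```

With the corpus `[["the", "cat", "sat"], ["the", "dog", "sat"]]` the vocabulary has V = 5 predictable tokens, so P(cat | the) = (1 + 1) / (2 + 5) = 2/7, and the unseen pair gets P(sat | the) = 1/7 instead of zero. The smoothed probabilities for each history still sum to one.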
-
- Intro to NNLM
- Recurrent neural network language models
- Long Short-Term Memory language models
- Transformer language models
-
Make a submission
-
- recognition in continuous speech
- token passing decoder
- improving the recognition performance and speed
- measuring the recognition performance
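Recognition performance is commonly measured with the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch using word-level Levenshtein distance; the function names are illustrative.

```python
from typing import List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level
    Levenshtein distance), computed one dynamic-programming row at a time."""
    prev = list(range(len(hyp) + 1))          # empty ref vs hyp[:j]: j insertions
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)            # ref[:i] vs empty hyp: i deletions
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                  # delete reference word r
                cur[j - 1] + 1,               # insert hypothesis word h
                prev[j - 1] + (r != h),       # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = word errors / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)
```

For instance, "the cat sit on mat" against the reference "the cat sat on the mat" has one substitution and one deletion, i.e. 2 errors out of 6 reference words. Because insertions are also counted, WER can exceed 100%.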
-
Make a submission
The goal is to verify that you have learned the idea of a token-passing decoder. The HMM system and observations are again almost the same as in the Lecture 2a forward algorithm exercise. Now the task is to find the most likely state sequence to produce the sequence of sounds A, A, B using a simple language model (LM). The toy LM used here is a look-up table that gives probabilities for different state sequences, (0,1), (0,0,1) etc., up to 3-grams.
Hint: You can upload an edited source document, a PDF file, a photo of your notes or a text file with numbers, whichever is easiest for you. To get the activity point, the answer does not have to be correct.
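The sketch below shows the idea of token passing on a two-state HMM combined with a look-up-table LM. It is not the exercise solution: the HMM, the LM table and its interpretation are hypothetical assumptions. In particular, it assumes that the table gives conditional probabilities P(next state | previous one or two states). Each token carries a log score and its state history; at every frame, tokens are passed along all arcs, adding transition, emission and LM log probabilities, and only the best token per LM context (the last two states) is kept.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model; the exercise's own HMM and LM table differ.
STATES = [0, 1]
INIT = {0: 1.0, 1: 0.0}
TRANS = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.0, 1: 1.0}}
EMIT = {0: {"A": 0.8, "B": 0.2}, 1: {"A": 0.3, "B": 0.7}}
# Hypothetical LM look-up table: P(next state | previous one or two states).
LM: Dict[Tuple[int, ...], Dict[int, float]] = {
    (0,): {0: 0.5, 1: 0.5},
    (1,): {0: 0.1, 1: 0.9},
    (0, 0): {0: 0.2, 1: 0.8},
    (0, 1): {0: 0.1, 1: 0.9},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}

Token = Tuple[float, Tuple[int, ...]]  # (log score, state history)


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def token_passing(obs: List[str]) -> Tuple[List[int], float]:
    """Return the best state sequence and its combined HMM x LM probability."""
    # Tokens are keyed by the last two states: that is all the trigram LM and
    # the HMM transitions need, so tokens sharing a key can be recombined.
    tokens: Dict[Tuple[int, ...], Token] = {}
    for s in STATES:
        score = safe_log(INIT[s]) + safe_log(EMIT[s][obs[0]])
        if score > -math.inf:
            tokens[(s,)] = (score, (s,))

    for o in obs[1:]:
        new_tokens: Dict[Tuple[int, ...], Token] = {}
        for score, hist in tokens.values():
            prev, context = hist[-1], hist[-2:]
            for nxt in STATES:
                # Pass the token along one arc: transition + emission + LM.
                step = (safe_log(TRANS[prev][nxt])
                        + safe_log(EMIT[nxt][o])
                        + safe_log(LM[context][nxt]))
                if step == -math.inf:
                    continue
                new_hist = hist + (nxt,)
                key = new_hist[-2:]
                # Recombination: keep only the best token per key.
                if key not in new_tokens or score + step > new_tokens[key][0]:
                    new_tokens[key] = (score + step, new_hist)
        tokens = new_tokens

    best_score, best_hist = max(tokens.values())
    return list(best_hist), math.exp(best_score)
```

With these hypothetical numbers, `token_passing(["A", "A", "B"])` picks the sequence 0, 0, 1 with probability ≈ 0.043008, ahead of 0, 1, 1 at ≈ 0.03024. Scores are kept as log probabilities, as a real decoder would, to avoid underflow on long utterances.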
-
Three end-to-end approaches:
- Attention-based ASR
- Connectionist temporal classification
- RNN Transducer
Neural network specifics
E2E Challenges and Applications
-
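Of the three end-to-end approaches, CTC has the simplest decoding rule to illustrate. A best-path (greedy) CTC decoder picks the most likely label in each frame, merges consecutive repeated labels, and then removes the blank symbol. A minimal sketch; the blank symbol `_` and the per-frame posterior dictionaries are hypothetical choices for illustration.

```python
from itertools import groupby
from typing import Dict, List, Sequence

BLANK = "_"  # hypothetical blank symbol


def ctc_collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Map a frame-level CTC path to an output label sequence."""
    merged = [label for label, _ in groupby(frame_labels)]  # merge repeats first
    return [label for label in merged if label != blank]    # then drop blanks


def greedy_ctc_decode(posteriors: Sequence[Dict[str, float]],
                      blank: str = BLANK) -> List[str]:
    """Best-path decoding: argmax label in each frame, then collapse."""
    best = [max(frame, key=frame.get) for frame in posteriors]
    return ctc_collapse(best, blank)
```

The order of the two steps matters: the path `hh_e_ll_llo` collapses to "hello" because the blank between the two `ll` runs keeps the double letter, whereas `hhellllo` collapses to "helo". Greedy decoding is only an approximation; beam search over the CTC outputs, optionally with an LM, usually performs better.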
Make a submission
-
Native language recognition (slides and one article, PDF)
The following paper presents the Native Language Recognition challenge and provides both a dataset and a baseline solution. This broad overview should give you everything you need for the presentation.
https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0129.PDF
Note: The chapters on Sincerity and Deception are not covered in our presentation and can be ignored!
-
Filtering text for language model training (slides and one article, PDF)
Here's a short paper that introduces the problem of training data selection for building language models that are suitable for the target task: https://www.microsoft.com/en-us/research/publication/intelligent-selection-of-language-model-training-data/
The paper reviews earlier common approaches and proposes a new, efficient selection technique.
-
Subword LMs (slides and one article, PowerPoint 2007 presentation)
Here is the link to the article. This should cover what will be presented on Friday.
https://www.aclweb.org/anthology/P18-1007.pdf
Here are the presentation slides for our project. Please have a look at them.
https://docs.google.com/presentation/d/187f7n6Vwu0gLOXTNNcfGef4RrZTTkaAvhemgDeBpUj0/edit?usp=sharing
-
Low-resource ASR (LM adaptation, PDF)
Here is an article for you to read before Friday. We haven't used deep learning, so don't focus too much on the deep learning parts; focus on the subject itself instead :)
https://www.researchgate.net/publication/344006569_Acoustic_Modeling_Based_on_Deep_Learning_for_Low-Resource_Speech_Recognition_An_Overview
-
Spoken command recognition (slides and one article, PowerPoint 2007 presentation)
The following article is the reference paper for our topic, keyword spotting (KWS), also called speech command recognition. The topic is fairly straightforward, and we believe a CNN model is a good starting point for understanding the basic idea of the KWS task :)
https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43969.pdf
Here is the link to our slides for tomorrow's presentation. Thanks!
https://drive.google.com/file/d/1ny2JLTF5S_YGj7X02dpUi-6-mBvnEDVj/view?usp=sharing
-
Voice activity detection (slides and one article, PDF)
Here is a link to an article on voice activity detection:
https://drive.google.com/file/d/1a9-mJ4gtMt8M50lgktL3RrpgnGqCFUC5/view?usp=sharing
Here is the link to the current version of our presentation.
https://drive.google.com/file/d/18sQxfvzKVMzvso5PMRwWKAU7TfxTu7yp/view
-
Language identification (slides and article, PDF)
The paper below gives you insight into language identification, although the experiments we plan to perform are not limited to it.
https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf
Hi, here are the slides for tomorrow's Language Identification presentation.
https://drive.google.com/file/d/1_hwnzN-gR_yEhTwbZ24MdvfTow9Obfko/view
-
Speaker adaptation (slides and article, PDF)
You can find the paper on speaker adaptation (SA) techniques here, and the preliminary version of the presentation here.
There might still be slight changes to this version.
-
Deep Denoising Autoencoder (slides and article, PDF)
Below is the paper related to my presentation 'Deep Denoising Autoencoder for Speech Enhancement' on Wednesday.
https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_0436.pdf
Attached are my slides for tomorrow's presentation.
-
Speaker verification (slides and article, PDF)
Hello, here is an article about speaker recognition for you to check out before our presentation on Wednesday. The experiments in this paper are not very similar to ours, so there is no need to understand them thoroughly, but we think the paper gives useful insight into speaker verification and some methods that can be used for it.
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44681.pdf
Here are the slides! (Unfortunately, some images are a bit pixelated because of the 500 kB maximum file size.) See you all tomorrow.
-
Audio event recognition (slides and article, PDF)
Here is a link to an article that should give you an overview of audio event tagging before our presentation on Wednesday.
https://ieeexplore.ieee.org/abstract/document/8336092
Here is our presentation! Sorry for the delay.
-
End-to-end ASR (slides and article, PDF)
Here is a nice comparison between state-of-the-art hybrid DNN/HMM models and attention-based end-to-end models. It should give you a good intuition about where E2E systems stand right now: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1780.pdf
And here's the link to a preview of our slides: https://docs.google.com/presentation/d/1a5R9zRc5UIvXgtMfeULi8Ad-mydmTj2NoUjmZH1b7gs/edit?usp=sharing
Sadly, the PDF is too big to upload here.
-
Chatbot (slides and article, PDF)
Here is the paper we would like you to read before our presentation on Friday: https://arxiv.org/pdf/1801.07243.pdf
Here is a link to our presentation slides: https://drive.google.com/file/d/1S7XkHbL6t_XN4L05THByRuovg2DAhycl/view?usp=sharing
-
Speech adaptation for children's ASR (slides and article, PowerPoint 2007 presentation)
The slides (first edition) are here: https://drive.google.com/file/d/1xaDcfx-XwVfy_-ETGWE3Qdv4Zy6Q2n3j/view?usp=sharing
The reference paper is here: https://drive.google.com/file/d/1iBFrMWMwBiwrDU6MGRIzhqAQonIsmyQX/view?usp=sharing
-
Alcohol intoxication (slides and article, PDF)
The following article is the reference paper for our topic, which is about classifying whether a speaker is intoxicated using ASR. The paper focuses on classification from text, whereas our work tries to capture the emotional state of the speaker from the speech utterance itself, and the experiments we plan are not limited to this paper. Still, this paper is a good starting point for getting familiar with our work.
http://suendermann.com/su/pdf/emotion2013.pdf
-