ELEC-E5500 - Speech Processing, Lecture, 13.9.2021-25.10.2021
This course space end date is set to 25.10.2021.
Overview
Introduction
- Why speech processing? video, wiki
- Applications and systems structures. video, wiki
- Characteristics of speech (slides available, wiki, videos) - For interactive session 2 "Expression by speech"
- Speech communication
- Speech production part 1 Overview
- Speech production part 2 Voiced and unvoiced sounds
- Speech production part 3 Vocal tract, formants, basic phonetics
- Speech production part 4 Further terminology and Conclusion
Getting started with notebooks
This introduction is available as a video. We use JupyterLab notebooks for the Python code examples. The idea is that you can try out and play with the code yourself, with only minimal Python skills needed.
All notebooks will be posted on this page. We recommend that you use jupyter.cs.aalto.fi to run the notebooks, but you can also run a JupyterLab server on your own computer.
- On the jupyter.cs.aalto.fi server, log in with your Aalto username, then choose "Python: General use (JupyterLab)" and press "Start".
- To download the notebooks to jupyter.cs.aalto.fi, press the git-clone button (see picture) and enter "https://version.aalto.fi/gitlab/backstt1/speech_processing_jupyter_notebooks.git". Keep an eye out for updates; the repository will most likely be improved during the course.
- Setting up the Python environment. video, notebook "Configure environment.ipynb"
- Installing packages on jupyter.cs.aalto.fi: from the launcher, open a terminal. In the terminal, you can install packages with commands such as "conda install matplotlib torch scipy numpy ipython torchaudio" and "pip install sounddevice", as described in the document above.
- If the Jupyter notebooks are updated (that is, the git repository is updated), then it is probably easiest to remove or rename your old folder and clone the git repository once more. Alternatively, you can open a terminal window, go to the folder with "cd speech_processing_jupyter_notebooks" and update your copy with "git pull origin master". However, if you have made changes to your local copy, the pull operation might fail, but that is a story for another time.
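If you want to confirm that the installation worked, one quick check is to ask Python whether each package can be found, using the standard library's `importlib.util.find_spec` (which locates a module without importing it). This is a minimal sketch, not part of the course notebooks; the list `REQUIRED` and the helper `missing_packages` are hypothetical names, and the list simply mirrors the packages named above using their import names (for example, the `ipython` package is imported as `IPython`).

```python
import importlib.util

# Import names of the packages from the installation instructions (assumed list)
REQUIRED = ["matplotlib", "torch", "scipy", "numpy", "IPython", "torchaudio", "sounddevice"]

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

Run it in a notebook cell or a terminal; anything it reports can then be installed with conda or pip as described above.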
Basic properties, analysis and operations (wiki, videos)
- Short-time analysis (Introduction, Window length, Window function, Spectrum, Envelope and Formants, Fundamental frequency (short), Spectrogram) - For exercise 1 "Windowing"
- Mel-cepstrum and the MFCC
- Short-time processing and the STFT
- Accuracy over time = Sampling rate
- Accuracy over amplitude = Quantization and pulse code modulation
- Time-domain analysis
- Linear prediction and linear predictive coding (LPC)
- Long-term prediction (LTP)
- Fundamental frequency - For exercise 2 "Fundamental frequency"
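Short-time analysis and the STFT amount to cutting the signal into overlapping frames, multiplying each frame by a window function and taking the spectrum of each windowed frame. The sketch below assumes NumPy, a Hann window, 25 ms frames with a 10 ms hop (common choices for speech, not values prescribed by the course) and zero-padding to the next power of two; the function name `stft` and its parameters are hypothetical.

```python
import numpy as np

def stft(x, fs, win_ms=25.0, hop_ms=10.0):
    """Short-time Fourier transform of a 1-D signal x sampled at fs Hz.

    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    """
    win_len = int(round(win_ms * 1e-3 * fs))   # window length in samples
    hop = int(round(hop_ms * 1e-3 * fs))       # frame step in samples
    if len(x) < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(win_len)
    n_fft = 1 << (win_len - 1).bit_length()    # next power of two >= win_len
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1)

# Example: a 200 Hz tone, 1 s at 16 kHz
fs = 16000
t = np.arange(fs) / fs
X = stft(np.sin(2 * np.pi * 200 * t), fs)
spectrogram_db = 20 * np.log10(np.abs(X) + 1e-12)  # log-magnitude spectrogram
print(X.shape)  # (98, 257)
```

Plotting `spectrogram_db.T` with frequency on the vertical axis gives the familiar spectrogram; changing `win_ms` shows the time/frequency resolution trade-off discussed under window length.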
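For quantization, a uniform quantizer with b bits gives roughly 6 dB of signal-to-quantization-noise ratio (SQNR) per bit; for a full-scale sine the usual rule of thumb is about 6.02 * b + 1.76 dB. The NumPy sketch below implements a mid-rise uniform quantizer for samples in [-1, 1) so you can check this numerically; `quantize_uniform` and `sqnr_db` are hypothetical helper names, and the test tone is an arbitrary choice.

```python
import numpy as np

def quantize_uniform(x, n_bits):
    """Mid-rise uniform quantizer for samples in [-1, 1)."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    # Index of the quantization cell, clipped to the available levels
    idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step  # reconstruct at the cell midpoint

def sqnr_db(x, x_q):
    """Signal-to-quantization-noise ratio in dB."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_q) ** 2))

fs = 16000
x = 0.99 * np.sin(2 * np.pi * 997 * np.arange(fs) / fs)
for bits in (4, 8, 12):
    print(bits, round(sqnr_db(x, quantize_uniform(x, bits)), 1))
```

The printed values should grow by roughly 24 dB for every 4 extra bits, which is the "accuracy over amplitude" trade-off of pulse code modulation.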
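Linear prediction estimates each sample as a weighted sum of the previous samples. A common way to find the weights is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes NumPy and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the prediction is x_hat[n] = -(a_1 x[n-1] + ... + a_p x[n-p]); it leaves out windowing and pre-emphasis, which you would usually apply first, and assumes a non-silent frame. The function name `lpc` is hypothetical.

```python
import numpy as np

def lpc(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns (a, err) where a = [1, a_1, ..., a_order] are the coefficients
    of A(z) and err is the final prediction error energy.
    """
    n = len(frame)
    # Autocorrelation r[0..order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for step i
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        # Update a_1..a_{i-1} using the previous coefficients in reverse order
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

With this sign convention, the prediction residual (the excitation estimate) is obtained by filtering the frame with A(z), for example `np.convolve(frame, a)[:len(frame)]`.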
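One classic way to estimate the fundamental frequency of a voiced frame is to pick the lag with the largest autocorrelation within a plausible F0 range; the F0 is then the sampling rate divided by that lag. This NumPy sketch assumes a single voiced frame that is a few pitch periods long (here 40 ms) and a search range of 60 to 400 Hz, which is an assumption rather than a course-specified value; `estimate_f0` is a hypothetical name. Practical pitch trackers add voicing decisions and protection against octave errors.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Autocorrelation-based F0 estimate (Hz) for one voiced frame."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags 0..len(frame)-1
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lag_min = int(fs / f0_max)                        # shortest period considered
    lag_max = min(int(fs / f0_min), len(frame) - 1)   # longest period considered
    best_lag = lag_min + int(np.argmax(r[lag_min : lag_max + 1]))
    return fs / best_lag

fs = 16000
t = np.arange(640) / fs   # 40 ms frame at 16 kHz
print(estimate_f0(np.sin(2 * np.pi * 200 * t), fs))
```

Because the lag is an integer number of samples, the resolution is coarse at high F0; interpolating around the peak is a common refinement.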
Speech processing modules
- Voice activity detection (VAD) (wiki, video) - For exercise 3 "Voice activity detection"
- Speech enhancement (wiki, video)
- Speech (and audio) coding (wiki, videos)
- Speaker recognition (guest lecture, wiki and wiki, video, see also slides)
- Echo cancellation (guest lecture, wiki, video, see also slides)
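The simplest voice activity detectors compare short-time frame energy against a threshold. The sketch below is an illustration under assumptions, not the method from the lecture: it uses NumPy, 20 ms non-overlapping frames and a threshold 40 dB below the loudest frame, all hypothetical choices, and `energy_vad` is a hypothetical name. Practical VADs track the noise floor adaptively and smooth the decisions over time.

```python
import numpy as np

def energy_vad(x, fs, frame_ms=20.0, threshold_db=-40.0):
    """Return one boolean per frame: True where the frame energy is within
    |threshold_db| dB of the loudest frame."""
    frame_len = int(round(frame_ms * 1e-3 * fs))
    n_frames = len(x) // frame_len                  # drop the incomplete tail
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

fs = 16000
rng = np.random.default_rng(0)
silence = 1e-4 * rng.standard_normal(fs // 2)                     # 0.5 s background noise
tone = 0.5 * np.sin(2 * np.pi * 150 * np.arange(fs // 2) / fs)    # 0.5 s stand-in for speech
decisions = energy_vad(np.concatenate([silence, tone]), fs)
print(decisions.astype(int))
```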
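A classic building block in speech coding is logarithmic companding, as in the μ-law used by ITU-T G.711 with μ = 255: small amplitudes get finer quantization steps than large ones. The sketch below shows only the continuous μ-law curve, its inverse and a uniform quantizer in the compressed domain, using NumPy; the actual G.711 codec uses a segmented 8-bit approximation of this curve, so treat it as an illustration. The function names are hypothetical.

```python
import numpy as np

MU = 255.0

def mu_law_compress(x, mu=MU):
    """Map samples in [-1, 1] to [-1, 1] with the mu-law curve."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def mu_law_quantize(x, n_bits=8, mu=MU):
    """Compress, quantize uniformly to 2**n_bits levels, then expand."""
    levels = 2 ** n_bits
    y = mu_law_compress(x, mu)
    idx = np.clip(np.round((y + 1) / 2 * (levels - 1)), 0, levels - 1)
    return mu_law_expand(idx / (levels - 1) * 2 - 1, mu)
```

Comparing `mu_law_quantize` with an 8-bit uniform quantizer on a quiet signal (for example, amplitude 0.01) shows why companding helps: the absolute error shrinks with the signal level.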
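Acoustic echo cancellation is typically built around an adaptive filter that learns the echo path from the far-end (loudspeaker) signal to the microphone and subtracts its echo estimate. Below is a minimal sketch of one common choice, the normalized LMS (NLMS) filter, assuming NumPy, a single channel, no near-end speech or noise, and an echo path no longer than `n_taps` samples. The step size, filter length and the synthetic echo path are illustrative assumptions, and `nlms_echo_canceller` is a hypothetical name; real cancellers add double-talk detection and residual echo suppression.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, n_taps=64, mu=0.5, eps=1e-6):
    """Return (residual, weights); residual = mic minus the estimated echo."""
    w = np.zeros(n_taps)          # adaptive estimate of the echo path
    buf = np.zeros(n_taps)        # most recent far-end samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        e = mic[n] - np.dot(w, buf)                   # error = mic - echo estimate
        w += mu * e * buf / (np.dot(buf, buf) + eps)  # normalized LMS update
        residual[n] = e
    return residual, w

rng = np.random.default_rng(1)
far = rng.standard_normal(8000)
echo_path = 0.5 * rng.standard_normal(32) * np.exp(-np.arange(32) / 8)
mic = np.convolve(far, echo_path)[: len(far)]
res, w = nlms_echo_canceller(far, mic)
# Echo return loss enhancement over the last 1000 samples; the small floor
# keeps the logarithm finite if the residual is numerically zero
erle_db = 10 * np.log10(np.sum(mic[-1000:] ** 2) / (np.sum(res[-1000:] ** 2) + 1e-20))
print(round(erle_db, 1))
```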
Evaluation of speech processing modules (wiki, videos)
- Subjective quality
- Objective quality
- Other performance measures
- Analysis of evaluation results
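Among objective quality measures, the simplest are the signal-to-noise ratio (SNR) and its segmental variant, which averages per-frame SNRs so that quiet segments count as much as loud ones. The NumPy sketch below assumes time-aligned clean and processed signals of equal length; the 320-sample frame (20 ms at 16 kHz) and the clipping of per-frame values to [-10, 35] dB are common conventions, not values given in this course outline, and the function names are hypothetical.

```python
import numpy as np

def snr_db(clean, processed):
    """Global SNR in dB, treating clean - processed as noise."""
    noise = clean - processed
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def segmental_snr_db(clean, processed, frame_len=320, min_db=-10.0, max_db=35.0):
    """Mean of per-frame SNRs (dB), each clipped to [min_db, max_db]."""
    n_frames = len(clean) // frame_len
    c = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    p = processed[: n_frames * frame_len].reshape(n_frames, frame_len)
    eps = 1e-12  # avoids division by zero in silent or error-free frames
    seg = 10 * np.log10((np.sum(c ** 2, axis=1) + eps)
                        / (np.sum((c - p) ** 2, axis=1) + eps))
    return float(np.mean(np.clip(seg, min_db, max_db)))

fs = 16000
clean = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
processed = 1.1 * clean   # a 10 % gain error, which these measures count as noise
print(snr_db(clean, processed), segmental_snr_db(clean, processed))
```

Note that both measures penalize even a harmless gain change, one reason why objective measures should be checked against subjective results.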
Other topics