Course: ELEC-E5500 - Speech Processing, Lecture, 13.9.2021-25.10.2021, Topic: Videos and notebooks

Videos and notebooks
Introduction
Why speech processing? video, wiki
Applications and systems structures. video, wiki
Characteristics of speech (slides available, wiki, videos) - For interactive session 2 "Expression by speech"
Speech communication
Speech production part 1 Overview
Speech production part 2 Voiced and unvoiced sounds
Speech production part 3 Vocal tract, formants, basic phonetics
Speech production part 4 Further terminology and Conclusion

Getting started with notebooks
This introduction is available as a video.

We use JupyterLab notebooks for the Python code examples. The idea is that you can try out and play with the code yourself, with only minimal Python skills.
All notebooks will be posted on this page. We recommend using jupyter.cs.aalto.fi to run the notebooks, but you can also run a JupyterLab server on your own computer.

On the jupyter.cs.aalto.fi server, log in with your Aalto username, then choose "Python: General use (JupyterLab)" and press "Start".
To download the notebooks to jupyter.cs.aalto.fi, press the git-clone button (see picture) and enter "https://version.aalto.fi/gitlab/backstt1/speech_processing_jupyter_notebooks.git". Keep an eye out for updates: the repository will most likely be improved during the course.
Setting up the Python environment. video, notebook "Configure environment.ipynb"
Installing packages on jupyter.cs.aalto.fi: from the launcher, open a terminal. In the terminal, you can install the packages, for example with "conda install matplotlib torch scipy numpy ipython torchaudio" and "pip install sounddevice", as described in the document above.
If the notebooks are updated (that is, the git repository is updated), the easiest option is probably to remove or rename your old folder and then clone the git repository again. Alternatively, open a terminal window, go to the folder with "cd speech_processing_jupyter_notebooks" and update your copy with "git pull origin master". However, if you have made changes to your local copy, the pull might fail, but that is a story for another time.
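
If a notebook fails on an import, it can help to first check which of the packages named above the notebook kernel can actually find. The following is a minimal sketch using only Python's standard library (importlib.util.find_spec), so it runs even before anything is installed. The helper name check_packages is hypothetical, and the import names are assumed from the install commands above (the "ipython" package, for instance, is imported as "IPython").

```python
import importlib.util

# Import names assumed from the conda/pip commands above (hypothetical list).
# Note: the "ipython" package is imported as "IPython".
REQUIRED = ["matplotlib", "torch", "scipy", "numpy", "IPython",
            "torchaudio", "sounddevice"]


def check_packages(names):
    """Return {import_name: True/False}, locating each top-level module
    without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it in a notebook cell; any package reported as MISSING can then be installed from the terminal as described above.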

Basic properties, analysis and operations (wiki, videos)
Short-time analysis (Introduction, Window length, Window function, Spectrum, Envelope and Formants, Fundamental frequency (short), Spectrogram) - For exercise 1 "Windowing"
Mel-cepstrum and the MFCC
Short-time processing and the STFT
Accuracy over time = Sampling rate
Accuracy over amplitude = Quantization and pulse code modulation
Time-domain analysis
Linear prediction and linear predictive coding (LPC)
Long-term prediction (LTP)
Fundamental frequency - For exercise 2 "Fundamental frequency"
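
As a rough illustration of the short-time analysis and fundamental-frequency topics above, here is a minimal NumPy sketch (not the course's reference code). The signal is cut into overlapping frames, each frame is multiplied by a Hann window and its magnitude spectrum is taken, which together form a spectrogram; the F0 of a single frame is then estimated from the strongest autocorrelation peak. The helper names frame_signal, stft_magnitude and f0_autocorr are hypothetical, the 30 ms window, 10 ms hop and 60 to 400 Hz F0 search range are assumed typical values, and autocorrelation is only one of several F0 estimators, so exercise 2 may use a different method.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row).
    Trailing samples that do not fill a whole frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def stft_magnitude(x, fs, win_ms=30.0, hop_ms=10.0):
    """Hann-windowed short-time magnitude spectrum (a spectrogram).
    Window length and hop are assumed values, given in milliseconds."""
    frame_len = int(round(win_ms * fs / 1000))
    hop = int(round(hop_ms * fs / 1000))
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, bins)


def f0_autocorr(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 of one frame from the highest autocorrelation peak
    within the lag range [fs/f0_max, fs/f0_min]."""
    frame = frame - np.mean(frame)
    # Keep non-negative lags only: index k of r is lag k.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / lag
```

For a 200 Hz sine sampled at 16 kHz, the period is 80 samples, so the autocorrelation peak lies at lag 80 and f0_autocorr returns 16000 / 80 = 200 Hz. On real speech the estimate is only meaningful for voiced frames.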

Speech processing modules
Voice activity detection (VAD) (wiki, video) - For exercise 3 "Voice activity detection"
Speech enhancement (wiki, video)
Speech (and audio) coding (wiki, videos)
Speaker recognition (guest lecture, wiki and wiki, video, see also slides)
Echo cancellation (guest lecture, wiki, video, see also slides)

Evaluation of speech processing modules (wiki, videos)
Subjective quality
Objective quality
Other performance measures
Analysis of evaluation results
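
As one example of the objective quality measures above, the sketch below computes the global signal-to-noise ratio (SNR) and the segmental SNR between a clean reference and a processed signal, assuming NumPy and time-aligned signals with the same sampling rate. The function names are hypothetical, and clipping each frame's SNR to the range -10 to 35 dB is a commonly used convention, taken here as an assumption. Such waveform-based measures do not model perception, so they can disagree with subjective listening tests.

```python
import numpy as np


def snr_db(clean, processed):
    """Global SNR: energy of the clean reference over energy of the error."""
    noise = clean - processed
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def segmental_snr_db(clean, processed, fs, frame_ms=20.0, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clipped to [lo, hi] dB so that silent
    frames do not dominate the average."""
    frame_len = int(round(frame_ms * fs / 1000))
    n = (len(clean) // frame_len) * frame_len
    c = clean[:n].reshape(-1, frame_len)
    e = (clean[:n] - processed[:n]).reshape(-1, frame_len)
    eps = 1e-12  # guards against division by zero and log(0)
    seg = 10 * np.log10((np.sum(c ** 2, axis=1) + eps)
                        / (np.sum(e ** 2, axis=1) + eps))
    return float(np.mean(np.clip(seg, lo, hi)))
```

For example, scaling a clean signal by 0.9 leaves an error of 0.1 times the signal, which gives 10 log10(1 / 0.01) = 20 dB for both measures.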

Other topics
Privacy (slides available, wiki, videos, see also slides)
Current hot topics (if there is enough time in the schedule; may be omitted)
