Topic: Learning material | ELEC-E5500 - Speech Processing, Lecture, 8.9.2022-17.10.2022

Topic outline

Learning material
Note: Most of the videos were recorded with the previous edition of the learning material, but since the content has not changed (much), they should still match sufficiently. Let Tom know if there are significant glitches.
Introduction
Why speech processing? video, book
Linguistic structure of speech. book
Applications and systems structures. video, book
Characteristics of speech (slides available, book, videos) - For interactive session 2 "Expression by speech"
Speech communication
Speech production part 1 Overview
Speech production part 2 Voiced and unvoiced sounds
Speech production part 3 Vocal tract, formants, basic phonetics
Speech production part 4 Further terminology and Conclusion

Getting started with notebooks (video)
We use JupyterLab notebooks for Python code examples. The idea is that you can try out and play with the code yourself with only minimal Python skills.
Links to the notebooks will be posted on this page, but they are also available at https://speechprocessingbook.aalto.fi and https://github.com/Speech-Interaction-Technology-Aalto-U/itsp. To run the notebooks, we recommend that you use jupyter.cs.aalto.fi following the instructions below, but you can also run a JupyterLab server on your own computer.

On the server jupyter.cs.aalto.fi, log in with your Aalto username, then choose "Python: General use (JupyterLab)" and press "Start".
To download the notebooks to jupyter.cs.aalto.fi,
press the git-clone button (see picture) and
enter "https://github.com/Speech-Interaction-Technology-Aalto-U/itsp.git".
Keep an eye out for updates; the repository is likely to be improved during the course.
(CHECK IF NEEDED) Installing Python packages in jupyter.cs.aalto.fi:
From the launcher, open a terminal.
In the terminal, install the Python packages with "conda install matplotlib scipy numpy ipython ipywidgets".
In the terminal, install the PyTorch packages with "conda install pytorch torchaudio -c pytorch" (on conda, the package is named "pytorch", not "torch").
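To see whether the installation step is needed at all, a short check like the following can be run in a notebook cell. This is only a sketch: the helper name find_missing and the list REQUIRED_MODULES are made up for this illustration, and the list assumes the usual import names of the packages mentioned above (for example, the "ipython" package is imported as "IPython" and PyTorch is imported as "torch").

```python
import importlib.util

# Import names assumed for the packages named in the installation steps.
# These are assumptions for illustration: "ipython" is imported as "IPython",
# and PyTorch is imported as "torch".
REQUIRED_MODULES = [
    "matplotlib",
    "scipy",
    "numpy",
    "IPython",
    "ipywidgets",
    "torch",
    "torchaudio",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If the check reports no missing modules, the installation steps can probably be skipped.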
If the Jupyter notebooks are updated (that is, the git repository is updated), it is probably easiest to remove or rename your old folder and then clone the git repository once more. Alternatively, you can open a terminal window, go to the folder with "cd speech_processing_jupyter_notebooks" and update the repository with "git pull origin master". Note that if you have made changes to your local copy, the pull operation might fail; resolving that is a story for another time.

Basic properties, analysis and operations (book, videos)
Short-time analysis (Introduction, Window length, Window function, Spectrum, Envelope and Formants, Fundamental frequency (short), Spectrogram) - For exercise 1 "Windowing"
Mel-cepstrum and the MFCC
Short-time processing and the STFT
Accuracy over time = Sampling rate
Accuracy over amplitude = Quantization and pulse code modulation
Time-domain analysis
Linear prediction and linear predictive coding (LPC)
Long-time prediction (LTP)
Fundamental frequency - For exercise 2 "Fundamental frequency"

Speech processing modules
Voice activity detection (VAD) (wiki, video) - For exercise 3 "Voice activity detection"
Speech enhancement (wiki, video)
Speech (and audio) coding (wiki, videos)

Evaluation of speech processing modules (wiki, videos)
Subjective quality
Objective quality
Other performance measures
Analysis of evaluation results

Other topics
Privacy (wiki, videos, see also slides)
Medical analysis of speech (guest lecture, slides)
