Overview



  • Note: Most of the videos were recorded with the previous edition of the learning material, but since the content has not changed (much), they should still match sufficiently. Let Tom know if there are significant glitches.

    Introduction

    • Why speech processing? video, book
    • Linguistic structure of speech. book
    • Applications and systems structures. video, book
    • Characteristics of speech (slides available, book, videos) - For interactive session 2 "Expression by speech"
      • Speech communication 
      • Speech production part 1 Overview
      • Speech production part 2 Voiced and unvoiced sounds
      • Speech production part 3 Vocal tract, formants, basic phonetics
      • Speech production part 4 Further terminology and Conclusion

    Getting started with notebooks (video)

    We use JupyterLab notebooks for the Python code examples. The idea is that you can try out and play with the code yourself with only minimal Python skills.

    Links to the notebooks will be posted on this page, but they are also available at https://speechprocessingbook.aalto.fi and https://github.com/Speech-Interaction-Technology-Aalto-U/itsp. To run the notebooks, we recommend that you use jupyter.cs.aalto.fi following the instructions below, but you can also run a JupyterLab server on your own computer.

    • On the server jupyter.cs.aalto.fi, log in with your Aalto username, then choose "Python: General use (JupyterLab)" and press "Start".
    • To download the notebooks to jupyter.cs.aalto.fi,
      1. Press the git-clone button (see picture).
      2. Enter "https://github.com/Speech-Interaction-Technology-Aalto-U/itsp.git".
      3. Keep an eye out for updates; the repository is likely to be improved during the course.
    • (Check if needed) Installing Python packages in jupyter.cs.aalto.fi:
      1. From the launcher, open a terminal.
      2. In the terminal, install the Python packages with "conda install matplotlib scipy numpy ipython ipywidgets".
      3. In the terminal, install the PyTorch packages with "conda install pytorch torchaudio -c pytorch".
    • If the Jupyter notebooks are updated (that is, the git repository is updated), it is probably easiest to remove or rename your old folder and then clone the git repository once more. Alternatively, you can open a terminal window, go to the folder of your clone with "cd speech_processing_jupyter_notebooks" (a fresh clone of the URL above is named "itsp" by default) and update it with "git pull origin master". However, if you have made changes to your local copy, the pull operation might fail, but that is a story for another time.
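
    To find out whether the package installation step above is needed at all, you can run a quick check in a notebook cell. The following is only a minimal sketch: the helper name missing_packages and the list REQUIRED are made up for this illustration, and the list simply mirrors the packages named in the installation steps (assuming the usual import names, where PyTorch is imported as "torch" and IPython as "IPython").

```python
import importlib.util

# Import names for the packages listed in the installation steps.
# Assumption: PyTorch is imported as "torch" and IPython as "IPython".
REQUIRED = [
    "matplotlib",
    "scipy",
    "numpy",
    "IPython",
    "ipywidgets",
    "torch",
    "torchaudio",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

    If the check reports missing packages, install only those from a terminal as described above; otherwise the installation step can probably be skipped.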

    Basic properties, analysis and operations (book, videos)

    • Short-time analysis (Introduction, Window length, Window function, Spectrum, Envelope and Formants, Fundamental frequency (short), Spectrogram) - For exercise 1 "Windowing"
    • Mel-cepstrum and the MFCC
    • Short-time processing and the STFT
    • Accuracy over time = Sampling rate
    • Accuracy over amplitude = Quantization and pulse code modulation
    • Time-domain analysis
    • Linear prediction and linear predictive coding (LPC)
    • Long-time prediction (LTP)
    • Fundamental frequency - For exercise 2 "Fundamental frequency"

    Speech processing modules

    • Voice activity detection (VAD) (wiki, video) - For exercise 3 "Voice activity detection"
    • Speech enhancement (wiki, video)
    • Speech (and audio) coding (wiki, videos)

    Evaluation of speech processing modules (wiki, videos)

    • Subjective quality
    • Objective quality
    • Other performance measures
    • Analysis of evaluation results

    Other topics