Section description

  • Introduction

    • Why speech processing? (video, wiki)
    • Applications and systems structures (video, wiki)
    • Characteristics of speech (slides available, wiki, videos) - For interactive session 2 "Expression by speech"
      • Speech communication 
      • Speech production part 1 Overview
      • Speech production part 2 Voiced and unvoiced sounds
      • Speech production part 3 Vocal tract, formants, basic phonetics
      • Speech production part 4 Further terminology and Conclusion

    Getting started with notebooks

    This introduction is available as a video.

    We use JupyterLab notebooks for the Python code examples. The idea is that you can try out and experiment with the code yourself with only minimal Python skills.

    All notebooks will be posted on this page. We recommend that you use jupyter.cs.aalto.fi to run the notebooks, but you can also run a JupyterLab server on your own computer.

    • On the jupyter.cs.aalto.fi server, log in with your Aalto username, then choose "Python: General use (JupyterLab)" and press "Start".
    • To download the notebooks to jupyter.cs.aalto.fi, press the git-clone button (see picture) and enter "https://version.aalto.fi/gitlab/backstt1/speech_processing_jupyter_notebooks.git". Keep an eye out for updates; the repository will most likely be improved during the course.
      [Picture: The git-clone button]
    • Setting up the Python environment: video, notebook "Configure environment.ipynb"
    • Installing packages on jupyter.cs.aalto.fi: from the launcher, open a terminal. In the terminal, you can install packages with commands such as "conda install matplotlib torch scipy numpy ipython torchaudio" and "pip install sounddevice", as described in the above document.
    • If the Jupyter notebooks are updated (that is, the git repository is updated), it is probably easiest to remove or rename your old folder and then clone the git repository once more. Alternatively, you can open a terminal window, go to the folder with "cd speech_processing_jupyter_notebooks" and update your copy with "git pull origin master". However, if you have made changes to your local copy, the pull operation might fail, but that is a story for another time.
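    After installing the packages, it can be useful to check from a notebook cell that they are actually visible to Python. The following is a minimal sketch, not part of the course material: the function name check_modules and the list REQUIRED_MODULES are hypothetical, and the import names are assumed to match the packages mentioned above (note that the "ipython" package is imported as "IPython").

```python
import importlib.util

# Import names assumed to correspond to the packages listed in the
# installation step; adjust the list if your setup differs.
REQUIRED_MODULES = [
    "matplotlib",
    "torch",
    "scipy",
    "numpy",
    "IPython",      # the "ipython" package is imported as "IPython"
    "torchaudio",
    "sounddevice",
]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    status = {}
    for name in names:
        try:
            # find_spec locates the module without importing it
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

    Any module reported as MISSING can then be installed from the terminal as described above. The check only confirms that a module can be found, not that it imports and works correctly.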

    Basic properties, analysis and operations (wiki, videos)

    • Short-time analysis (Introduction, Window length, Window function, Spectrum, Envelope and Formants, Fundamental frequency (short), Spectrogram) - For exercise 1 "Windowing"
    • Mel-cepstrum and the MFCC
    • Short-time processing and the STFT
    • Accuracy over time = Sampling rate
    • Accuracy over amplitude = Quantization and pulse code modulation
    • Time-domain analysis
    • Linear prediction and linear predictive coding (LPC)
    • Long-time prediction (LTP)
    • Fundamental frequency - For exercise 2 "Fundamental frequency"

    Speech processing modules


    Evaluation of speech processing modules (wiki, videos)

    • Subjective quality
    • Objective quality
    • Other performance measures
    • Analysis of evaluation results

    Other topics

    • Privacy (slide available, wiki, videos, see also slides)
    • Current hot topics (covered if the schedule allows; may be omitted)