Topic outline

  • The course Deep Learning with Audio introduces the state of the art in deep learning models applied to sound and music, with hands-on exercises on recent artificial intelligence (AI) implementations such as DDSP, AI-Duet, GANSynth, NSynth, GANSpaceSynth and SampleRNN. We will provide code templates that integrate the functionality of these open-source deep learning audio projects into the Pure Data programming environment. Through Pure Data examples, students will be able to run, modify and control these deep learning models, and route audio and control data into and out of them. Students will gain an understanding of the differences in input, computational cost and sonic characteristics between the models, which will help them formulate a course project. 

    We will provide detailed setup instructions and automated scripts to make installation of the required tools (Pure Data, Python, Conda, Magenta, PyExt) as easy as possible. The current installation setup and course exercises work only on macOS and Linux; unfortunately, Windows is not supported at the moment. You can contact us to reserve a laptop computer for this course. 

    Students will also learn and practice preparing datasets and training deep learning models on Aalto University's computing cluster. Students will further explore a particular model and incorporate it into their own project work. Deep Learning with Audio is a project-based course: we dedicate half of the contact hours to project work, during which the lecturer and the teaching assistants will support students with guidance, feedback and tutoring. At the end of the course, students will submit and present their projects.

  • ------------------------------- WEEK 1 ------------------

    Tue 20/04/2021  09:00 – 12:45  

    • Introduction to Deep Learning with Audio
      • Audio-domain and symbolic-domain applications: different models, different methods
    • Deep Learning Models Applied to musical projects 
      • Examples

    • What tools do we have available? 
      • Pure Data, Python, Conda, Magenta, PyExt
      • Installation of the Required Tools
    • Getting started with AI-Duet
      • Symbolic Domain
      • Setting up Melody RNN
      • Course Exercise on AI-Duet
      • Brief discussion on the exercise outcomes

    Wed 21/04/2021  09:00 – 12:45 

    • DDSP (Differentiable Digital Signal Processing)  
      • Timbre Transfer
      • Setup (macOS and Linux)
      • Features in timbre_transfer.pd
      • Checkpoints
    • Exercises
      • Try a few different combinations of input audio and checkpoint. What kind of observations can you make about how the inputs' characteristics affect the output?
      • Experiment with the f₀ octave shift, f₀ confidence threshold and loudness dB shift parameters. How does the algorithm respond to extreme values of these?
      • Brief discussion on the exercise outcomes
      • Group training - we will train a few checkpoints overnight with students' audio (takes ~5 hours per checkpoint)
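    The f₀ octave shift and confidence threshold explored in the exercises above can be pictured with a small numpy sketch. This is an illustration of the general idea, not DDSP's actual implementation; the function and parameter names are made up for this example:

```python
import numpy as np

def condition_f0(f0_hz, f0_confidence, octave_shift=0.0, confidence_threshold=0.0):
    """Illustrative conditioning of a fundamental-frequency (f0) track.

    octave_shift: shift in octaves; +1.0 doubles every frequency.
    confidence_threshold: frames whose pitch confidence falls below
    this value are zeroed out (treated as unpitched).
    """
    f0 = np.asarray(f0_hz, dtype=float) * (2.0 ** octave_shift)
    mask = np.asarray(f0_confidence, dtype=float) >= confidence_threshold
    return np.where(mask, f0, 0.0)

# A 440 Hz frame shifted up one octave becomes 880 Hz;
# the low-confidence frame is silenced.
f0 = condition_f0([440.0, 440.0], [0.9, 0.1],
                  octave_shift=1.0, confidence_threshold=0.5)
```

    Pushing these parameters to extremes, as the exercise suggests, moves the conditioning signal far outside what the checkpoint saw during training, which is where the interesting artifacts appear.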

    Thu 22/04/2021  09:00 – 12:45  
    • NSynth and GANSynth
      • GANSynth: adversarial neural audio synthesis
      • Architecture of GANSynth
      • Other audio/music applications of GANs
      • Checkpoints
      • Setup (macOS and Linux)
      • Training GANSynth

    • Exercises
      • Try generating some random latent vectors and synthesizing sounds from them using gansynth.pd and the all_instruments checkpoint. What kind of timbres does the neural network generate? How does the acoustic_only checkpoint compare?
      • Try manually drawing in the latent vector (z) array and then synthesizing. GANSynth expects z to be normalized such that its magnitude is 1, but drawing in arbitrary values breaks this. What happens to the generated sounds?
      • Try interpolating between different latent vectors using gansynth_multi.pd. How does the resulting synthesized sound compare to the sounds from the original latent vectors? By default, the synthesise message in this patch is set up to generate four different pitches, but it may be easier to compare sounds by using the same pitch for each.
      • Brief discussion on the exercise outcomes
      • Group training
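    The unit-magnitude normalization and latent-vector interpolation mentioned in the exercises above can be sketched with numpy. This is illustrative only; the latent dimensionality of 256 and the function names are assumptions for this example, not GANSynth's actual code:

```python
import numpy as np

Z_DIM = 256  # assumed latent dimensionality for this sketch

def sample_z(rng):
    """Draw a random latent vector and normalize it to magnitude 1,
    the normalization GANSynth expects."""
    z = rng.standard_normal(Z_DIM)
    return z / np.linalg.norm(z)

def interpolate(z_a, z_b, t):
    """Linearly interpolate between two latent vectors, then
    renormalize. Hand-drawing arbitrary values into the z array
    skips this normalization, which is what the exercise explores."""
    z = (1.0 - t) * z_a + t * z_b
    return z / np.linalg.norm(z)

rng = np.random.default_rng(0)
z0, z1 = sample_z(rng), sample_z(rng)
z_mid = interpolate(z0, z1, 0.5)  # halfway between the two timbres
```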
    • NSynth: neural audio synthesis
      • WaveNet
      • Open NSynth Super
      • nsynth.pd

    • Exercise
      • Load some sounds into nsynth.pd and explore how they change by moving the position on the X/Y pad. If you don't have a MIDI input, you can manually send note_on <pitch> messages to the second inlet of the subpatch containing the X/Y pad. Investigate the structure of the patch. What kind of alternative ways of interacting with the sounds can you come up with?
      • Brief discussion on the exercise outcomes
      • Group Training:  Using any kind of instrument you prefer, record 4-second samples of each of the following notes: C2, E2, G#2, C3, E3, G#3, C4 (MIDI notes 24, 28, 32, 36, 40, 44, 48). Convert the samples to 16000 Hz sample rate, 16-bit signed integer. Make sure they're exactly 4 seconds long (64000 samples). Note that the low sample rate means your sounds will lose all frequencies above 8000 Hz, so don't waste time on making super detailed highs! Name your samples with the instrument name and note number separated by an underscore, e.g. sandstormlead_24.wav. We will collect the samples in groups of four and run the audio generation scripts on Aalto Science-IT's Triton cluster. This will take a few days, after which we will load the samples onto Open NSynth Super devices and explore the generated sounds.
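    The sample preparation for the group training above can be sketched in Python. This is a minimal illustration using linear-interpolation resampling and the standard-library wave module; in practice you might prefer a dedicated resampler (sox, librosa, ffmpeg), and the file name is just an example following the naming scheme:

```python
import wave
import numpy as np

TARGET_SR = 16000
TARGET_LEN = 4 * TARGET_SR  # exactly 4 seconds = 64000 samples

def prepare_sample(samples, source_sr):
    """Resample mono audio to 16 kHz (simple linear interpolation),
    then trim or zero-pad to exactly 64000 samples."""
    x = np.asarray(samples, dtype=float)
    n_out = int(round(len(x) * TARGET_SR / source_sr))
    resampled = np.interp(
        np.linspace(0, len(x) - 1, n_out), np.arange(len(x)), x)
    out = np.zeros(TARGET_LEN)
    out[:min(TARGET_LEN, n_out)] = resampled[:TARGET_LEN]
    return out

def write_16bit_wav(path, samples):
    """Write audio as 16-bit signed integer PCM at 16 kHz."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)       # 2 bytes = 16-bit
        w.setframerate(TARGET_SR)
        w.writeframes(pcm.tobytes())

# e.g. a 5-second 44.1 kHz recording of MIDI note 24 (C2 ~= 32.7 Hz):
audio = np.sin(2 * np.pi * 32.7 * np.arange(5 * 44100) / 44100)
write_16bit_wav("sandstormlead_24.wav", prepare_sample(audio, 44100))
```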

    Fri 23/04/2021  09:00 – 12:45
    • GANSpaceSynth
      • Conditional GANs
      • GANSpaceSynth Architecture
      • Setup (macOS and Linux)
      • Checkpoints
      • ganspacesynth.pd

    • Hallucinations
      • Conditional GANs
      • ganspacesynth_halluseq.pd
      • Exercise: We will compare the audio features extracted by PCA along 3 dimensions, describing their semantic meanings and how they differ. Two different checkpoints will be used in this exercise
      • Brief discussion on the exercise outcomes
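    The PCA step at the heart of GANSpaceSynth can be sketched as follows. This illustrates the GANSpace idea of finding principal directions in a batch of sampled latent vectors; the shapes and function names are assumptions for this example, not the actual implementation:

```python
import numpy as np

def principal_directions(latents, n_components=3):
    """Return the top principal directions of a set of latent vectors.
    PCA over sampled latents (or early-layer activations) yields
    directions that often carry a semantic meaning one can explore."""
    X = latents - latents.mean(axis=0)       # center the data
    # SVD of the centered matrix: rows of vt are unit-norm
    # principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_components]

rng = np.random.default_rng(1)
latents = rng.standard_normal((512, 256))    # 512 sampled latents
dirs = principal_directions(latents)         # shape (3, 256)
edited = latents[0] + 2.0 * dirs[0]          # move along component 1
```

    Moving a latent vector along one of these directions, as in the last line, is how the exercise's 3 PCA dimensions are turned into controllable synthesis parameters.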
    • Project ideas pitch (1 min / student) 

    ------------------------------- WEEK 2 ------------------

    Tue 27/04/2021  09:00 – 12:00

    • SampleRNN
      • Generating sequences of similar audio
      • Albums generated using SampleRNN (DADABOTS)
      • Setup (macOS and Linux)
      • Checkpoints
    • Exercise
      • Try generating some sounds with different values for the sampling temperature parameter. How does it affect the results?
      • Brief discussion on the exercise outcomes
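    The sampling temperature explored in the exercise above can be illustrated with a generic temperature-scaled softmax sample. This is a sketch of the general technique, not SampleRNN's actual code; the function name is made up:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample an output class (e.g. the next quantized audio sample)
    from logits scaled by a temperature. Low temperature -> near-greedy,
    conservative output; high temperature -> flatter distribution,
    noisier and more surprising output."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
cold = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]
hot = [sample_with_temperature(logits, 10.0, rng) for _ in range(100)]
# At temperature 0.1 almost every draw picks the most likely class;
# at 10.0 the draws spread across all classes.
```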

    Wed 28/04/2021  09:00 – 12:00
    • Project work and Tutoring

    Thu 29/04/2021  09:00 – 12:00
    • Project work and Tutoring

    Fri 30/04/2021  09:00 – 12:00

    • Project work and Tutoring

    ------------------------------- WEEK 3 ------------------

    Tue 04/05/2021  - 09:00 – 12:00
    • Project work and Tutoring

    Wed 05/05/2021  - 09:00 – 12:00
    • Project work and Tutoring

    Thu 06/05/2021  - 09:00 – 12:00
    • Project work and Tutoring


    Fri 07/05/2021  - 09:00 – 12:00