
  • The course starts on Monday 4.9.2023 at 09:15 in F239a (the main auditorium) at Otakaari 3. Onsite sessions are organized throughout period 1, every week on Monday 9:15-12:00 and Thursday 14:15-16:00. Exercises are every Friday 14:15-16:00.

    The course content has three components: Technical & theoretical study (Monday), Human factors in and societal impact of speech technology (Thursday), and Practical implementation (Friday).

    Learning goals

    • Understanding the basic phenomena of speech; speech production, phonetics.
    • Understanding operating principles and evaluation of benefits and constraints of speech technologies in the different sub-fields; speech modeling, speech coding, voice activity detection, medical analysis of speech, and speech enhancement (noise reduction, echo cancellation, etc.).
    • Usage and evaluation of basic tools in speech processing.
    • Societal role of speech technology especially with respect to privacy.
    • Related topics that are not included: Basics of (speech) perception are covered in the course Communication Acoustics. Language and language modeling are covered by courses on Speech recognition and Statistical natural language processing.

    Pre-requisites

    A basic understanding of linear algebra, stochastics, programming, and signals and systems is mandatory.
    Prior knowledge in digital signal processing and machine learning is highly beneficial.
    Exercises are implemented primarily with Python. Matlab can be used as well, though we do not provide any support for it. Prior knowledge of Python is useful, but good skills in other languages should be sufficient.


    Sessions

    Technical & theoretical study (Monday)

    Objective: Present the most important content and give an opportunity to ask questions about it.
    Format: Play a lecture video and discuss any questions. Iterate.
    Monday sessions are for technical content, where we watch short videos explaining speech processing algorithms. This content (signal processing and machine learning) comes from the online book Introduction to Speech Processing. Between videos, there is an opportunity to ask questions. The videos are available online, so you can also watch them whenever you wish.
    Participation in video lectures is voluntary but gives the opportunity to ask questions.
    New: Sessions are in a traditional lecture format, covering the most important topics from the reading material. Those who cannot attend can watch the corresponding videos (see Learning material).

    Human factors in and societal impact of speech technology (Thursday)

    Objective: Learn to appreciate the role and challenges of speech technology in the big picture.
    Format:
    • We alternate between small groups and joint discussions, and for some topics, we use a panel discussion format.
    • Every session ends with a 15-minute period for writing a learning diary.
    • Each submitted learning diary awards one point for the overall score.
    • The deadline for submission is Sunday evening.
    • Reflect on what you have learned this week. The extent is 15 minutes of writing, which is likely something like half a page (2-3 paragraphs).
    • One question in the exam will be based on these discussions.
    • Learning diaries can be written without attending the sessions, by reflecting on the topic of the week, but this will take more time and effort.
    In these joint discussions, we consider, for example, how culture is reflected in speech technology, how speech technology is designed, how data is collected and quality evaluated, as well as the challenges to sustainability that speech technology presents.
    Participation in joint discussions is highly recommended as this material cannot be covered in reading materials.
    Exam questions can, however, be answered without attending the sessions, but preparation for the exam will then take more time and effort.

    Practical implementation (Friday)

    Objective: Learn the practical tasks in implementing and evaluating speech processing methods.
    Format:
    • A new exercise is released every Tuesday and its submission deadline is on the following Monday.
    • Exercise sessions every Friday 14:15-16:00 at Maari E - 229. TAs are there to help and answer questions about exercises. The plan is to have these sessions in hybrid mode (online & onsite). Zoom link: https://aalto.zoom.us/j/67316526232
    • Each of the 4 exercises awards up to 6 points to the overall score.
    • Exercises are solved using sounds of your own voice or other sound samples of your own. This has multiple benefits: 1) You learn to handle real-world sounds and effects. Each voice is unique and will have its own difficulties and properties. 2) Describing the effects visible and audible in your voice gives a deeper meaning to the exercises. It is not some obscure anonymous sound sample but it is you. 3) Individualized exercises make cheating very difficult. You have to analyze your results and results vary across sound samples and across individual persons.
    Detailed instructions and weekly topics are released in the section "Exercises".
    Friday exercise sessions are for practicing and asking questions about the implementations. The submission deadline for answers is on Monday evening.
    Participation in exercise sessions is voluntary but gives the opportunity to ask questions.

    Schedule and Learning material

    A selection of chapters from https://speechprocessingbook.aalto.fi/. See section Sessions and Learning material.
    Videos presenting those chapters. This is the learning material for the download mode. Read texts and watch videos when it best fits your schedule. Skip content that you are already familiar with. Pause when you need time to digest.
    Weekly collections of material are provided to prepare for interactive sessions and support the completion of exercises.
    The exam covers all material detailed in the section Sessions and Learning material.

    Exam

    Objective: Verify that students have participated in activities and solved the exercises themselves, as well as evaluate the level of their knowledge.
    Format:
    • A more-or-less complete list of exam questions is provided in the Exam section, such that you know the style and extent of questions.
    • The exam has 4 questions worth up to 6 points each.
    • Classic pen-and-paper exam with handwritten notes as supporting material.

    Grading

    • Grading is based on exercise scores, submitted learning diaries, and an exam. The purpose of the exam is to verify that you have actively participated in all activities, and is thus intended to be easy for those who have actively participated.
    • Exercises give 4x6 = 24 points, exam gives 4x6 = 24 points and learning diary 6x1 = 6 points, for a total maximum of 54 points. The final grade is calculated with the formula grade = min(5, floor( (total-24)/5 )), or, specifically
              grade = 1 for 29 <= total < 34
              grade = 2 for 34 <= total < 39
              grade = 3 for 39 <= total < 44
              grade = 4 for 44 <= total < 49
              grade = 5 for 49 <= total <= 54.
    • Overall, grading is thus meant to encourage completing exercises and learning diaries. Conversely, the idea is that the exam is easy if you have done your homework.
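    As a rough illustration, the grading formula above can be sketched in Python. The function name final_grade and its parameter names are hypothetical, chosen only for this example. Clamping the result to 0 for totals below 29 is an assumption; the formula above does not say how such totals are reported.

```python
import math


def final_grade(exercise_points, exam_points, diary_points):
    """Sketch of grade = min(5, floor((total - 24) / 5)).

    Maxima per the grading rules: exercises 24, exam 24, learning diaries 6.
    """
    total = exercise_points + exam_points + diary_points
    raw = math.floor((total - 24) / 5)
    # Assumption: totals below 29 are reported as 0 rather than a negative number.
    return max(0, min(5, raw))


if __name__ == "__main__":
    # Example: 20 exercise points, 14 exam points, 5 learning diaries.
    print(final_grade(20, 14, 5))
```

    In this example the total is 39 points, which falls in the range 39 <= total < 44.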

    Rules

    Copying of material is not allowed.
    • We will use both automated tools and manual checking to verify that nothing is copied.
    • A single instance of copying will be graded as zero points.
    • Repeated offences and malicious intent will be reported to the administration.
    Looking up things on the internet is allowed and actively encouraged.
    • In actual R&D work, we look up stuff on the internet all the time. That is how work is today. We embrace this approach.
    • In particular, solutions from previous years can be found online or through friends. Looking at such material is allowed, but discouraged. It is in your best interest to try to learn and understand. Exercises change from year to year and experience has shown that it is easy to spot copy-cats.
    • Using AI tools like ChatGPT and Bing.AI is allowed, but the student is always responsible for correctness. The purpose of the course is to learn, and excessive reliance on AI does not help learning.
    All provided exercise material is copyrighted by Aalto University and redistribution is explicitly not permitted. Other learning materials are licensed under a Creative Commons license unless otherwise indicated.

    Communication

    Public questions can be asked in all lectures and in all exercise sessions.
    The main online communication channel is Zulip (an open-source alternative to Slack, hosted by Aalto) at https://elec-e5500-2023.zulip.aalto.fi/
    • All participants will be invited at the start of the course.
    • Access by invitation only. If you have problems with access, email tom.backstrom@aalto.fi
    • Suitable for all non-sensitive questions
    • Discuss here and ask questions about content, exercises, exams, etc.
    Private questions
    • Contact me after a lecture or by email at tom.backstrom@aalto.fi
    • For questions about grading of your own submissions, absences, etc.
    • Note that questions related to content and organization are usually public, and answers should be made available to all. Such questions are thus better fitted to lectures or Zulip.

    Teaching team

    Responsible teacher (lectures): Tom Bäckström, visiting hour on Mondays after lecture.
    Teaching assistant (exercises): Mohammad Vali