Topic outline

  • Basics

    The course starts on Monday 13.9. at 09:15 and it is organized remotely on Zoom (link below).

    According to last year's feedback, the workload of this course was too high. We have therefore reduced both the amount of content and the extent of the exercises. Moreover, the overall structure of the course has been updated to make more efficient use of time.

    Structure

    We introduce a new structure for the course this year as follows:
    • Interactive discussion sessions on Thursdays - The contact teaching sessions will be only about interaction in the group. A preliminary list of topics is given below (see also the separate section).
      The time for the interactive sessions was changed to Thursdays at 14:15-16:00 based on popular vote.
    • Short lecture videos - Most of the non-interactive content of the course has been recorded into short videos, which will be posted on MyCourses. These videos can be viewed on-demand and they are not fixed to any schedule. Videos that are prerequisites for exercises will be marked separately.
    • Guest lectures - We have planned three guest appearances from Abraham Zewoudie, Pablo Peréz Zarazaga and a representative from Jabra.
    The leading ideas of this structure are
    • Interactive sessions are only about interaction, so that expectations are clear and time is used effectively.
    • Non-interactive topics do not need to be lectured in an interactive lecture format, so it is more effective to present them as pre-recorded videos. This uses both the students' and the teacher's time more effectively.

    This structure is being tried for the first time. If it turns out to be a big failure, we can always go back to the traditional lecture format.

    Preliminary topics for interactive sessions

    • 13.9. a) Introduction to the course and b) cultural differences in speech technology.
    • 23.9. Expression by speech
    • 30.9. Speaker recognition
    • 7.10. Acoustic echo cancellation
    • 14.10. Quality evaluation
    • 21.10. Privacy


    Teaching material

    • The "Introduction to speech processing" wiki is the basis for the course. Those parts of the wiki which are required will be listed here on MyCourses (see section "Videos and notebooks").
    • Most videos present Jupyter Lab notebooks (Python), while others present pdf-slides. Both will also be published here on MyCourses.

    Pre-requisites

    Students are required to have basic skills in programming, signals and systems (Fourier transforms and filtering), and linear algebra.

    • All programming examples are now in Python. If you know other programming languages, Python is not difficult to pick up, but you need to allocate time for learning the language on top of the other tasks. Try for example https://www.python.org/about/gettingstarted/ or https://www.w3schools.com/python/python_intro.asp
    • "Signals and systems" is important mostly in the sense that Fourier-spectra and filtering are central in all speech processing.
    • Math and linear algebra are important for applying results in practice: Most of the methods we discuss are based on vectors and matrices.
      If you plan to continue with other courses in speech processing, then such math skills are required.
      However, if this is the only course you will ever take in the area of speech processing, then you will be able to understand the basic principles and pass the course even if your math skills are weaker.

    Exercises

    The exercise section consists of five tasks
    1. Basics of speech processing and analysis
    2. Fundamental frequency estimation
    3. Voice activity detection
    4. Speaker Recognition
    5. Speech Enhancement and Evaluation
    • Programming Language: Python
    • Each exercise will be made available on MyCourses on a Monday (i.e., the first exercise will be available on September 13 at 7:00 AM, the second on September 20 at 7:00 AM, and so on).
    • Each exercise is due on the Tuesday of the following week at 23:59, so you have 8 days to complete each exercise.
    • Failure to submit the assignment before the due date is penalized by point deduction.
    • The exercise lab session is on Fridays from 2:00 PM to 4:00 PM, and it is organized remotely on Zoom (link below).
    • In addition to the Friday sessions, you can send me an email regarding the exercises and I will reply as soon as possible.
    • Total Points: 30, each exercise gives 6 points.
    • The exercises are copyrighted by Aalto University and you are not allowed to copy the materials to the Internet.

    Slack Channel for the course: https://app.slack.com/client/T02E6TEDBPC/D02E6TEDPQS
    An email invitation to join the Slack channel has been sent to all students. Please check your spam folder if you haven't received the invitation.

    Exam

    The structure of the exam and a semi-complete list of exam questions will be provided. The idea is that it is easier to focus your efforts when the exam details are known, and that students who have access to previous years' questions do not get an unfair advantage.

    The exam will be an open-book, online exam, with webcams and microphones open. That is, you can look up any material you wish during the exam. You can do the exam at home or in a study space, as you wish, as long as you can keep a webcam open. Collaboration during the exam is not allowed and the webcam and microphone are used to monitor that you do not talk with others during the exam.

    Important: You are not allowed to copy any material from any source in the exam. All answers have to be written entirely by yourself. Answers will be checked with an automated plagiarism-checking tool. Answers which contain any copied parts will be graded as zero. Any suspicion of foul play (even when there is not sufficient evidence) will be reported to the school administration.

    Previous years' experience has demonstrated that this approach is necessary to guarantee that exams are fair to everyone. An unfortunate consequence is that simple mistakes have grave consequences. Last year, one student had read Wikipedia during the exam (which is allowed) and answered based on that material (the answer was correct). However, since he used the same sentence structure and same terminology, the plagiarism-checking tool triggered a warning, which we were forced to report to the administration. A huge nuisance and inconvenience for everyone, which took months to resolve, even if the student turned out to be innocent. So please do not copy anything!


    Grading

    Students are required to submit exercises and pass the exam. The philosophy is that solving exercises is the best way to learn and the purpose of the exam is to confirm that you have done your exercises yourself and not merely copied from a buddy. A goal with the exercises is further that each student would analyze and process their own sounds, such that it is not possible to directly copy results from friends or previous years. Each student thus has to do their own exercises, but collaboration with others is encouraged. Collaboration = good, but copying = bad.

    The calculation of the grades has followed more or less the same rule for many years now and it is not expected to change, but we reserve the right to adjust it if a clear need arises. The exam consists of 4 questions, each worth 6 points for a total of 24 points, with a required minimum of 10 points. There are 5 exercises, each worth 6 points for a total of 30 points. The total score is calculated as a sum, total = exercises + exam, where the maximum total is 54 points. For the grade, we use the function grade = min(5, floor( (total-24)/5 )), which, spelled out, means that the grade is
    • 1 for 29 <= total < 34
    • 2 for 34 <= total < 39
    • 3 for 39 <= total < 44
    • 4 for 44 <= total < 49
    • 5 for 49 <= total <= 54.
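
    The grading rule above can be sketched in Python as follows. This is only an illustrative sketch, not the official grading script: the function name course_grade and the convention of returning 0 for a failed course are assumptions made for this example, as is treating an exam score below the required minimum of 10 points as a fail regardless of the total.

```python
import math

EXERCISE_MAX = 30  # 5 exercises, 6 points each
EXAM_MAX = 24      # 4 questions, 6 points each
EXAM_MIN = 10      # required minimum in the exam


def course_grade(exercise_points, exam_points):
    """Return the grade 1..5, or 0 for a fail (0 is an assumed convention)."""
    if not 0 <= exercise_points <= EXERCISE_MAX:
        raise ValueError("exercise points must be between 0 and 30")
    if not 0 <= exam_points <= EXAM_MAX:
        raise ValueError("exam points must be between 0 and 24")

    # Assumption: missing the exam minimum fails the course outright.
    if exam_points < EXAM_MIN:
        return 0

    total = exercise_points + exam_points
    grade = min(5, math.floor((total - 24) / 5))
    # Totals below 29 give a value below 1, which is reported here as 0.
    return max(0, grade)


if __name__ == "__main__":
    for exercises, exam in [(30, 24), (19, 10), (18, 10), (30, 9)]:
        print(exercises, exam, course_grade(exercises, exam))
```

    For example, 19 exercise points and 10 exam points give a total of 29 and thus grade 1, while 18 exercise points and 10 exam points give a total of 28, which is below the lowest grade boundary.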

    Teachers

    • Lectures: Tom Bäckström
    • Exercises: Abraham Zewoudie

    Learning goals

    • Understanding the basic phenomena of speech; speech production, phonetics.
    • Understanding operating principles and evaluation of benefits and constraints of speech technologies in the different sub-fields;
      • speech modeling,
      • speech coding,
      • voice activity detection,
      • speaker recognition and
      • speech enhancement (noise reduction, echo cancellation etc.).
    • Usage and evaluation of basic tools in speech processing.
    • Societal role of speech technology especially with respect to privacy.
    • Related topics which are not included:
    • Which days are possible for the interactive sessions for you?

      The lectures of this course are known to overlap with some other courses (though it is a known problem, it is very hard to fix).
      If either Monday or Thursday is difficult for a large proportion of the students, then we can change to the other day. If they are about equal, then we'll keep Monday.

      Observe that the course can be passed without participating in the interactive sessions, but they provide added value, so I recommend participation.


    • Semi-complete list of questions for the exam

      A more or less complete list of questions from which the exam questions are chosen.

      This contains the rules of the game, a description of the categories and formats of questions as well as the list of questions itself.

      We reserve the right to use questions outside this list (which has not happened in recent years), but the overall format and character of the exam will not change.