Topic outline

  • The list below will be modified and updated as we go along.


    8.9. Week 1

    Topic 1: Introduction
    To read before the session: Course Homepage
    • Presentation of the practical organisation of the course.
    • Get to know the others on the course.

    Topic 2: Cultural differences in the use of speech technology
    You will be assigned to groups of approximately 3 people.

    (Duration 20-30 minutes) In your group, choose someone to take notes of your discussion so that you can summarize the results for the whole class.
    • What services and devices with speech technology do you use? (make a list)
    • What services and devices with speech technology do your parents use? (make a list)
    • Have you noticed differences in the use of technology between your home region and here?
    • Has a lack of support for your language prevented you from using a speech service?
    • Have you had difficulty using some services because of your accent, local environment, network coverage etc.?
    • Can you draw conclusions about the extent to which cultural differences influence the use of speech technology?

    Afterwards, we will meet all together and discuss your outcomes.


    15.9. Week 2

    Topic 1: Expression in speech

    We will study how emphasis and the fundamental frequency can be used to change the meaning of utterances. For example, the word "party" can be spoken such that it becomes a question, an exclamation, a neutral statement or a disappointment (try it!). Party? Party! Party. Party...

    You will be assigned to groups of approximately 3 people.

    (Duration 20-30 minutes) In your group, devise examples of short sentences where you can change the meaning by only changing speech characteristics such as emphasis, tone, fundamental frequency, whispering/shouting etc. Try to come up with examples of at least:
    • Question vs. statement
    • Different emotions (Excited, sad etc.)
    • Ironic or joking vs. factual and serious
    If you like, you can also try:
    • Different levels on the public vs intimate axis (talking publicly on stage vs. talking to your partner over a romantic dinner)
    • Near vs far (to other speakers)
    • Speed of speech - can that change meaning?
    • Volume of speech (shouting vs whispering) - can that change meaning?
    • Rough vs. smooth - can that change meaning?
    • High vs. low pitch - can that change meaning?
    • Other?
    Afterwards, each group can present a selection of their best sentences for the whole class.

    Topic 2: Future of speech technology
    For preparation: Think about sci-fi movies. What speech technology do they present, and how is it used?

    Where will speech technology go in 5 years, 10 years, 20 years and 50 years?

    For inspiration, think about what technology would be needed to achieve these scenes from sci-fi movies.

    Focus especially on
    1. The acoustic speech signal processing (not so much the natural language processing or the artificial intelligence) and
    2. What works and what does not work properly. Why does it not work?
    Finally, do you think that all interfaces will be spoken in the future? Why do you think so or why not?


    22.9. Week 3

    Guest lecture: Medical analysis of speech by Dr. Sudarsana Kadiri.

    In this session, we will discuss the health information that is reflected in the speech signal and how speech can be analysed for automatic detection of health conditions.


    Topic 1: Speech and Health (Duration 25-30 minutes)

    You will be assigned to groups of approximately 3 people.

    • What are the different types of information present in speech? (e.g., message, age, language, etc.)
    • Can you group those types in some way?
    • What types of health problems are reflected in speech signals? (e.g., COVID-19, Parkinson's disease, etc.)
    • Think about how health status is reflected in speech signals.
    • Reflect on physical health and mental health, and their symptoms.
    • Technology for health status detection (speech and multi-modal signals): possibilities and concerns
    • Any other aspects related to health?

    Finally, do you think that smart devices can be used for assessing health status (speech vs multi-modal)? Why do you think so or why not?

    Topic 2: Lecture on Speech and Health (slides)




    29.9. Week 4

    Topic 1: Speech interaction

    What is interaction? Voice-controlled services like Siri and Alexa are based on a question-and-answer paradigm; how is speech interaction different from question-and-answer?

    This is an active research topic in our team, and we are looking for your input and new ideas.


    Topic 2: Data collection
    Today, most speech processing methods are based on machine learning, which needs large amounts of data for training. Where does that data come from? When choosing datasets (known as speech corpora), how do we choose a suitable corpus? If no suitable corpus is available, what do we do then? Record our own? How do we design corpora? What about data augmentation: what is it, and how does it influence the choice of corpus and data collection?

    (No prior reading material available)

    6.10. Week 5

    Topic: Quality evaluation
    To read before session: Quality evaluation of speech

    How do we evaluate the quality of speech technology? What is "good" and what is "bad"? We'll hold the discussion in two rounds in breakout rooms. First, we generate a list of attributes that can be used to describe quality, and then we return to the main room to collect the results. Second, each team gets a practical scenario for which they should make a plan for evaluating quality. For example, one team could get the task of choosing a new speech coding algorithm for Zoom. The question is, how do you go about measuring each quality attribute, and what is the relative importance of the attributes?


    13.10. Week 6

    Topic: Privacy in speech technology

    How do you feel about privacy in speech technology? Does the lack of privacy bother you? Do you find privacy zealots annoying? What is the likelihood of privacy problems for you personally, and what would the consequences be? How about on a societal level? Some people are paranoid and others are complacent; what's up with that? To what extent can we trust our intuition with respect to privacy?

    • Presemo for live chat and polling

      Anonymous online chat, polling etc.