ELEC-E5541 Special Assignment in Speech and Language Processing V D
Introduction
This is an individual project course within speech and language processing. The extent can be chosen together with the supervisor anywhere between 1 and 10 ECTS. Typically projects are small research and development tasks, including documentation of findings, or literature reviews.
This course is a good way for
- MSc students to gain practical experience with speech and language processing by working on a specific topic
- MSc students to scope out potential future master's thesis topics and supervisors
- doctoral students to earn ECTS by working on topics that are distinct from but related to their doctoral thesis.
The teaching objective of the assignments is to practice independent research and project work in a format similar to your future work. This includes, among other things, project planning, searching for information, implementing and developing algorithms, choosing suitable experiments for testing and validation, and writing a research report.
The choice of topic is free as long as it is about speech and language processing, but it is highly recommended that the topic is either
- something in which the student has a particular interest, such as a topic connected to a hobby, work, or an idea for a startup, or
- useful for one of the research groups in speech and language technology. A list of suggested topics from each research group, together with a contact person, is given below.
As a last resort, if you have trouble choosing a topic, contact one of the teachers.
To get started
Choose a topic!
- If you have a topic of your own, choose a teacher whose interests align with your topic (list below) and contact them by email.
- If you choose one of the topics below, send an email to the contact person.
Schedule
You can start when you have time. Typically, projects last 1-2 periods.
Supervising teachers
- professor Paavo Alku (interests include analysis of speech production, speech in health technology (e.g. speech-based detection of diseases), signal processing and machine learning in medical analysis of speech)
- associate professor Tom Bäckström (interests include speech enhancement, privacy, speech in embedded devices, machine learning, voice conversion, and speech coding)
- assistant professor Lauri Juvela (interests include speech synthesis, machine learning, audio, speech and audio in embedded devices, and differentiable DSP)
- professor Mikko Kurimo (interests include automatic speech recognition and machine learning)
Suggested topics
<Reserved>
- Quantitative analysis of gender bias in popular speech databases. A majority of speech databases label speakers by binary gender, male and female. It is clear that this is not a sustainable practice. In this project, the purpose is to quantify the bias in popular speech corpora in order to evaluate the magnitude of the problem.
Contact person: Tom Bäckström
- Input bandwidth estimation with differentiable DSP for machine learning with dynamic complexity. Real-world audio equipment and software reproduce very different ranges of the spectrum. For example, cheap microphones can attenuate higher frequencies such that it is hard to know which parts of the spectrum are available. By estimating the usable frequency range, we can reduce the complexity of machine learning methods by processing only the part of the spectrum that is usable.
Contact person: Esteban Gómez or Tom Bäckström
- Serverless listening test software with, for example, WebAssembly and Web Audio. Available software libraries for crowdsourced listening tests, like webMUSHRA, all require a server, whose implementation and maintenance are cumbersome. The task would be to implement at least a proof-of-concept listening test that can run entirely in a browser.
Contact person: Tom Bäckström
- Audio watermarking for protection against speech deep-fakes. This is an exploratory study to determine the state of the art in audio watermarking and protection against speech deep-fakes through a literature study, as well as by experimenting with available algorithms. The idea is that watermarks could be made legally mandatory for all "deep-fake"-like technologies. The objective is to investigate the extent to which this idea is feasible and effective.
Contact persons: Lauri Juvela and Tom Bäckström
- Speech-based biomarking of health with machine learning. In addition to its linguistic contents, speech includes extralinguistic information about the speaker's state of health. Therefore, the speaker's state of health can be predicted from the speech signal in a non-invasive manner. Increasing research interest is devoted particularly to detecting COVID-19 or neurodegenerative diseases (such as Parkinson’s disease and Alzheimer’s disease) from speech signals using both classical ML methods (such as SVMs) and more recent deep learning methods. Specific topics (including literature reviews, small-scale experiments, etc.) are provided in this health-related research area.
Contact person: Paavo Alku
- Learnable filterbanks. Filterbanks such as Mel or Bark are commonly used as fixed frontend transformations in many audio processing tasks. The goal of this project would be to implement a neural network layer that can be initialized as a commonly used filterbank, but whose weights can be updated through training and hence tailored to a specific audio processing task.
Contact person: Esteban Gómez or Tom Bäckström
- Low-complexity speech processing algorithms in collaboration with https://savox.fi. Real-world use cases like headsets require speech processing methods that run on affordable and power-efficient hardware. In this project, you will study either 1) voice activity detection, 2) noise reduction or 3) spectral whitening, with machine learning and signal processing methods. The goal is to study the quality/complexity trade-off between competing approaches.
Observe that this involves three alternative topics and can thus be chosen by several students.
Contact persons: Ilkka Huhtakallio and Tom Bäckström
- Study of an objective test for speech intelligibility in collaboration with https://savox.fi. Maintenance and deployment of speech processing technologies in real-world use cases require methods for quality evaluation. Automated methods have the benefit that they can be applied after every change in the software or hardware, such that quality control is in essence continuous and bugs can be detected early. The goal of this project is to study the ABC-MRT16 (https://github.com/usnistgov/abcmrt16) objective speech intelligibility evaluation method, its Python implementation, and its deployment in a real-life scenario.
Contact persons: Ilkka Huhtakallio and Tom Bäckström
- Automatic speech recognition (ASR) and language modeling for spontaneous speech. Most public speech data is either read-aloud text or scripted broadcasting material. Similarly, most public text data is written material. However, most use cases for ASR involve recognizing speech that is neither available as text nor planned ahead. They include conversations, interviews, meetings and interaction with computers, robots or automated services. The work is to study how to use the limited spontaneous speech resources to adapt the existing large speech and language models.
Contact person: Mikko Kurimo
- Automatic speech recognition (ASR) and speaking assessment and feedback for foreign language learners. Most public speech data is spoken by native speakers of the language. However, there are many use cases for ASR where the speakers are non-native or foreign language (L2) learners. They include interviews, meetings, lectures or applications to practise or assess L2 skills. Furthermore, the L2 speech can be automatically analysed, and the feedback may be very useful for improving language skills. The work is to study how to use the limited L2 speech resources to adapt the existing large speech and language models to recognize, analyse and compute feedback for L2 learners.
Contact person: Mikko Kurimo
- Design and implementation of red-teaming for LLM-based applications. In this project, you will design and implement a concept for systematic red-teaming of applications built on LLMs, and run a hackathon focused on red-teaming a selection of applications. The target of the exercise is to design and test a red-teaming concept as a means to identify LLM-related risks and vulnerabilities as part of an enterprise AI governance process.
Contact persons: Tom Bäckström and Meeri Haataja (Saidot.ai)
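To give a flavour of what the learnable filterbanks topic listed above might involve, here is a minimal sketch in plain Python. It is not the project's prescribed method: the function names, the particular mel formula, the squared-error loss and the hand-written gradient step are all assumptions made for illustration. A real project would more likely use a deep learning framework with automatic differentiation. The sketch builds a triangular mel filterbank as the initial weights and then treats those weights as trainable parameters.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (one common formula; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters rows of triangular weights over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    weights = []
    for m in range(n_filters):
        low, center, high = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - low) / (center - low)
            falling = (high - f) / (high - center)
            # Triangle: zero outside [low, high], peak of 1 at the center.
            row.append(max(0.0, min(rising, falling)))
        weights.append(row)
    return weights


def apply_filterbank(weights, spectrum):
    """Linear layer without bias: one output per filter."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in weights]


def squared_error(outputs, targets):
    """Illustrative loss (an assumption; any differentiable loss would do)."""
    return sum((y - t) ** 2 for y, t in zip(outputs, targets))


def sgd_step(weights, spectrum, targets, learning_rate):
    """One gradient-descent step on the squared error, updating weights in place."""
    outputs = apply_filterbank(weights, spectrum)
    for row, y, t in zip(weights, outputs, targets):
        grad_scale = 2.0 * (y - t)  # d(loss)/d(output) for this filter
        for k, s in enumerate(spectrum):
            row[k] -= learning_rate * grad_scale * s
```

Calling `mel_filterbank(8, 64, 16000)` gives an 8 x 33 weight matrix that starts out as a conventional mel frontend. Repeated calls to `sgd_step` then move the weights away from the mel shape towards whatever the (here hypothetical) training targets require, which is the basic idea of tailoring the frontend to a task.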