The course consists of introductory lectures, small home assignments, a written summary (3 ECTS) or a more extensive review or speech processing experiments (5 ECTS) on a chosen topic, and participating in the seminar and giving a seminar presentation.
The teacher of the course is Professor of Practice Tom Bäckström. Please contact email@example.com if you have any questions about the course.
The course can be taken as a 3 ECTS or 5 ECTS version depending on the desired workload.
A brief description of the topic
Privacy is all over the news. Just think of Cambridge Analytica, Amazon Alexa leaking private conversations and Google storing your location even when you have opted out.
What does that mean for speech and audio processing?
The ability of devices to process speech and audio is increasing rapidly. It is no longer only phones that process and transmit speech, but also computers, TVs, smartwatches and, in the near future, likely also home appliances such as washing machines and light switches. It is the IoT of audio-enabled devices. For quality of service this has several benefits: more microphones can be used to improve audio quality, and you are probably no longer tied to a single device to get Siri to answer you. Likewise, cloud services can offer better service the more they know about the user. Increasing capabilities and interaction between devices, however, raise important questions about privacy, such as:
- Which devices/services are allowed to process your speech? Your own devices, all devices, none or something else?
- How do we prevent unauthorized devices from getting access to your speech?
- Big data can be very, very useful. Can we get the benefits of big data in speech applications without sacrificing privacy?
The goal of the present seminar course is to familiarize students with the existing and ongoing research and methods in the area. We will explore different areas of speech and audio where privacy is an issue as well as study methods for preserving privacy. In addition, students will get familiar with searching, interpreting, and summarizing existing research literature in speech technology, and presenting their work. Those undertaking the 5-credit version of the course can either carry out their own experiments in privacy-related speech or audio processing tasks or conduct a more extensive critical review on a mutually agreed topic.
Some potential topics for seminar projects/presentations (under construction)
On a high level, the project and presentation topics include areas such as
- Privacy-preserving speech recognition in the cloud
- Privacy-preserving machine learning methods for audio event detection
- Privacy-preserving speaker verification
- Speaker identification and spoofing
- Audio fingerprinting for private authentication
- Privacy in room acoustics
On a practical level, the starting point can be, for example, one of the following articles:
- D Schürmann and S Sigg, "Secure communication based on ambient audio", IEEE Transactions on mobile computing, 2013.
- Alexandru Nelus, Sebastian Gergen, Jalal Taghia, Rainer Martin, "Towards Opaque Audio Features for Privacy in Acoustic Sensor Networks", Speech Communication; 12. ITG Symposium; Proceedings of, 2016.
- Alexandru Nelus, Rainer Martin, "Gender Discrimination Versus Speaker Identification Through Privacy-Aware Adversarial Feature Extraction", Speech Communication; 13th ITG-Symposium, 2018.
- Francisco Teixeira, Alberto Abad and Isabel Trancoso, "Patient Privacy in Paralinguistic Tasks", Interspeech, 2018.
- Ferdinand Brasser, Tommaso Frassetto, Korbinian Riedhammer, Ahmad-Reza Sadeghi, Thomas Schneider and Christian Weinert, "VoiceGuard: Secure and Private Speech Processing", Interspeech, 2018.
- Manas A. Pathak, Bhiksha Raj, Shantanu Rane, and Paris Smaragdis, "Privacy-Preserving Speech Processing: Cryptographic and string-matching frameworks show promise", IEEE Sig Proc Mag, 2013.
- Pathak, Manas, Jose Portelo, Bhiksha Raj, and Isabel Trancoso. "Privacy-preserving speaker authentication." In International Conference on Information Security, pp. 1-22. Springer, Berlin, Heidelberg, 2012.
- Glackin, Cornelius, Gerard Chollet, Nazim Dugan, Nigel Cannings, Julie Wall, Shahzaib Tahir, Indranil Ghosh Ray, and Muttukrishnan Rajarajan. "Privacy preserving encrypted phonetic search of speech data." In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pp. 6414-6418. IEEE, 2017.
- M. Pathak, S. Rane, W. Sun and B. Raj, "Privacy preserving probabilistic inference with Hidden Markov Models," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011, pp. 5868-5871.
- M. A. Pathak and B. Raj, "Privacy-Preserving Speaker Verification and Identification Using Gaussian Mixture Models," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 2, pp. 397-406, Feb. 2013.
- R. C. Hendriks, Z. Erkin and T. Gerkmann, "Privacy-preserving distributed speech enhancement for wireless sensor networks by processing in the encrypted domain," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 2013, pp. 7005-7009.
- Yang, Yuchen, Longfei Wu, Guisheng Yin, Lijie Li, and Hongbin Zhao. "A survey on security and privacy issues in internet-of-things." IEEE Internet of Things Journal 4, no. 5 (2017): 1250-1258.
- Chao Cai, Rong Zheng, and Menglan Hu, "A Survey on Acoustic Sensing", arXiv:1901.03450v1 [cs.SD] 11 Jan 2019.
- Peter Swire, "A Pedagogic Cybersecurity Framework", Communications of the ACM, 2018. Potential project topic: translate this security framework to a privacy perspective.
- Ian Smith and Scott E Hudson, "Low Disturbance Audio For Awareness and Privacy in Media Space Applications", ACM Multimedia, Electronic Proc, 1995. Potential project: Modernize this old paper.
- A Arora et al, "A line in the sand: a wireless sensor network for target detection, classification, and tracking", Computer Networks, 2004. Potential project: Interpret military tech from a privacy perspective.
- Rate-distortion theory explains the compromise between rate and distortion with concise formulae and graphs. The project idea is to develop similar relationships between privacy and security, resources (transmission rate and CPU power), and service quality (sound quality, responsiveness, etc.), using simple graphical representations.
Alternatively, students can suggest their own topics.
Expected background knowledge
Basic skills in speech processing, DSP, and machine learning are recommended to get maximal benefit from the course (e.g., ELEC-E5500 Speech Processing or ELEC-E5510 Speech Recognition). However, the course can be tailored to a large extent to the preferences of the students. If you're unsure whether your background fits (it probably does), please contact firstname.lastname@example.org.
Course schedule (preliminary)
22.1.2019 at 14–16 R030/A133 T5: Seminar kick-off. Course practices. Introduction to privacy in speech and audio interfaces.
29.1.2019 at 10–12 R030/T6 A136 and 14–16 R030/A133 T5: Visiting lectures: Prof. Susanna Lindroos-Hovinheimo, Prof. Stephan Sigg, PhD Michael Laakasuo
19.3.2019 at 14–16 R030/T6 A136: Seminar presentations