Topic outline

  • Introduction

    This is the course space for the Aalto University Department of Computer Science Research seminar on security and privacy of machine learning (CS-E4001). The course is worth 5 credits, which are earned by reading, analyzing, presenting and discussing research papers on the topic of security and privacy of machine learning systems. There is no exam.

    Course staff: Samuel Marchal (responsible teacher - samuel.marchal@aalto.fi), Buse Gul Atli (co-organizer - buse.atlitekgul@aalto.fi), Sebastian Szyller (co-organizer - sebastian.szyller@aalto.fi).

    Registration

    Students must register for the course through Oodi by March 2, 2021. We expect 10-15 participants for this course.

    Pre-requisites
    The course is designed for people who already have basic knowledge of machine learning and security concepts. Familiarity with supervised machine learning (including kernel methods and neural networks) as well as with threat modelling is useful. Having taken CS-E3210 - Machine Learning: Basic Principles (and optionally CS-C3130 - Information Security) is recommended.

    Commitment
    Discussions require all students to be involved, and each student must present one or two papers and lead a discussion about them. Participants must be committed to attending every group discussion session.

    Zoom Link & Passcode, Teams

    All meetings will be hosted on Zoom. Use the following link: https://aalto.zoom.us/j/61731561475?pwd=cDdMN3FqdkNTc1ZpRUhXSHMvMVlFdz09

    Passcode: 423792

    All course related discussion will happen on Microsoft Teams. Follow the link below to join the workspace: Microsoft Teams link


    Course Overview

    Learning Objectives

    After this course, you are expected to have the following new skills:

    • knowledge of the security and privacy threats to machine learning systems
    • ability to identify the threats to a given machine learning system (threat modelling)
    • ability to summarize and critically analyze findings/contributions from research papers
    • ability to make a sensible oral presentation of a research paper and to lead a critical discussion about it
    • new insights on good research methodology and on scientific writing (useful for MSc. thesis)

    Content

    The course consists of several group discussion sessions (9 sessions planned). Two scientific papers on the topic of security and privacy of machine learning are presented and discussed during each session. These papers cover both attacks on machine learning systems and defenses against some of these attacks. One student presents and leads the discussion for each paper. The remaining students participate in the discussions.

    Each paper discussion will typically consist of a presentation of the paper (20 minutes) and an interactive discussion led by the presenter (30 minutes).

    A small programming assignment introduces how to craft adversarial examples in order to perform evasion attacks.

    Assessment and grading

    Students are assessed and graded according to 4 components:

    1. Presenting and leading the discussion on a scientific paper (twice per student): 50% of the grade
    2. Participation in discussions: 15% of the grade
    3. Writing paper takeaways and questions: 15% of the grade
    4. Programming assignment: 20% of the grade

    Workload: 135 hours

    The workload is divided over 2 periods and consists of:
    • reading research papers (2 papers per week) - 20 papers x 3h = 60h
    • participating in contact sessions (once a week) - 11 sessions x 2h = 22h
    • preparing the presentation and the discussion for 1 paper (twice during the course) - 2 preparations x 10h = 20h
    • writing paper takeaways and questions (1 page once a week, before each discussion session) - 9 sessions x 1h = 9h
    • completing a programming assignment to implement an evasion attack (generating adversarial examples) - 24h

      Planned schedule

      Day                  Place   Topic
      Tuesday, March 2     Zoom    Introductory lecture: Course organization and topics overview
      Friday, March 12     Zoom    Info session for programming assignment
      Friday, March 19     Zoom    Discussion 1: Model evasion
      Friday, March 26     Zoom    Discussion 2: Model poisoning
      Friday, April 2      Zoom    Discussion 3: .....
      Friday, April 9      Zoom    Discussion 3: Compromised training library/platform
      Tuesday, April 20    Zoom    General feedback on presentations + discussions already done
      Friday, April 23     Zoom    Discussion 4: Model stealing
      Friday, April 30     Zoom    Discussion 5: Protecting intellectual property of models
      Friday, May 7        Zoom    Discussion 6: Training data leakage + Tracing training data
      Friday, May 14       Zoom    Cancelled
      Friday, May 21       Zoom    Discussion 7: Privacy-preserving training + Fairness & bias in ML prediction


      What students liked about the course last year (selected feedback)

      • "It provides a good chance for me to read state-of-the-art articles, the presentation and discussion part also encourage me to think deeper and learn more actively."
      • "The lecturers also helped the leader for discussion and opened new topics to discuss. They also assured that we can actually ask something that we are not sure and it is perfectly fine."
      • "The discussions, research paper selection, presentations."
      • "Experienced participants in the discussions who could contribute interesting points."
      • "Really good feedback after our presentations."
      • "Really constructive and detailed feedback on how to improve both content and communication."
      • "Good feedback!"

    • Methodology for reading research papers

      Here you can find a short paper providing a good methodology for "How to read a research paper": http://ccr.sigcomm.org/online/files/p83-keshavA.pdf


      Systematization of knowledge on adversarial machine learning

      • Adversarial Machine Learning (Huang et al., 2011)
      • SoK: Security and Privacy in Machine Learning (Papernot et al., 2017)
      • Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning (Biggio and Roli, 2018)


      Download link to papers

      Before each discussion session, you must read one of the two papers that will be presented during the discussion, plus either the other presented paper or an optional paper on the same theme as the discussion session.

      Papers presented during discussions (title, authors, year)

      1. Model evasion
        • Devil’s Whisper: A General Approach for Physical Adversarial Attacks against Commercial Black-box Speech Recognition Devices (Chen et al., 2020)
        • On Adaptive Attacks to Adversarial Example Defenses (Tramer et al., 2020)
      2. Model poisoning
        • Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization (Munoz-Gonzalez et al., 2017)
        • Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (Chen et al., 2017)
      3. Compromised training library/platform
        • BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (Gu et al., 2017)
        • Machine Learning Models that Remember Too Much (Song et al., 2017)
      4. Model stealing
        • High Accuracy and High Fidelity Extraction of Neural Networks (Jagielski et al., 2019)
        • Imitation Attacks and Defenses for Black-box Machine Translation Systems (Wallace et al., 2020)
      5. Protecting intellectual property of models
        • Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring (Adi et al., 2018)
        • DAWN: Dynamic Adversarial Watermarking of Neural Networks (Szyller et al., 2019)
      6. Data leakage
        • The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (Carlini et al., 2019)
        • ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models (Salem et al., 2019)
      7. Tracing training data
        • Radioactive data: tracing through training (Sablayrolles et al., 2020)
        • Auditing Data Provenance in Text-Generation Models (Song and Shmatikov, 2019)
      8. Privacy-preserving training
        • Learning Differentially Private Recurrent Language Models (McMahan et al., 2018)
        • Auditing Differentially Private Machine Learning: How Private is Private SGD? (Jagielski et al., 2020)
      9. Fairness & bias in ML prediction
        • Characterising Bias in Compressed Models (Hooker et al., 2020)
        • On the Privacy Risks of Algorithmic Fairness (Chang and Shokri, 2020)


      Additional papers (optional reading) (title, authors, year)

      1. Model evasion
        • Adversarial Examples Are Not Bugs, They Are Features (Ilyas et al., 2019)
        • TextBugger: Generating Adversarial Text Against Real-world Applications (Li et al., 2018)
        • Certified Defenses Against Adversarial Examples (Raghunathan et al., 2018)
        • Ensemble Adversarial Training: Attacks and Defenses (Tramèr et al., 2020)
      2. Model poisoning
        • Poisoning Attacks against Support Vector Machines (Biggio et al., 2012)
        • Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks (Shafahi et al., 2018)
        • Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks (Wang et al., 2019)
        • Certified Defenses for Data Poisoning Attacks (Steinhardt et al., 2017)
      4. Model stealing
        • Exploring Connections Between Active Learning and Model Extraction (Chandrasekaran et al., 2018)
        • Model Extraction Attacks Against Recurrent Neural Networks (Takemura et al., 2020)
        • Prediction Poisoning: Utility-Constrained Defenses Against Model Stealing Attacks (Orekondy et al., 2020)
        • Extraction of Complex DNN Models: Real Threat or Boogeyman? (Atli et al., 2020)
      5. Protecting intellectual property of models
        • REFIT: a Unified Watermark Removal Framework for Deep Learning Systems with Limited Data (Chen et al., 2020)
        • Neural Network Laundering: Removing Black-Box Backdoor Watermarks from Deep Neural Networks (Aiken et al., 2020)
        • Deep Neural Network Fingerprinting by Conferrable Adversarial Examples (Lukas et al., 2019)
        • Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks (Fan et al., 2019)
      6. Data leakage
        • Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures (Fredrikson et al., 2015)
        • Extracting Training Data from Large Language Models (Carlini et al., 2020)
        • Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting (Yeom et al., 2018)
      7. Training data privacy
        • Dataset Inference: Ownership Resolution in Machine Learning (Maini et al., 2021)
        • Towards Probabilistic Verification of Machine Unlearning (Sommer et al., 2020)
      8. Privacy-preserving training
        • Tempered Sigmoid Activations for Deep Learning with Differential Privacy (Papernot et al., 2020)
        • Certified Robustness to Adversarial Examples with Differential Privacy (Lecuyer et al., 2018)
        • Privacy Risks of Securing Machine Learning Models Against Adversarial Examples (Song et al., 2019)
      9. Fairness & bias in ML prediction
        • POTs: Protective Optimization Technologies (Kulynych et al., 2018)
        • Delayed Impact of Fair Machine Learning (Liu et al., 2018)
        • Equality of Opportunity in Supervised Learning (Hardt et al., 2016)
        • The Frontiers of Fairness in Machine Learning (Chouldechova and Roth, 2018)
        • Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems (Datta et al., 2016)

    • Assignments during the course consist of 5 tasks:

      1. Reading 2 scientific papers before each discussion session: once a week.
      2. Writing takeaways and questions about each paper read (details here).
      3. Participating in every discussion session: once a week.
      4. Presenting and leading the discussion on a scientific paper: twice during the course (details here).
      5. Completing the programming assignment to generate adversarial examples (details here).


      Grading takes 4 components into account


      1. Presentation and leading paper discussion (50% of the grade)
      • Completeness and relevance of the objective paper presentation
      • Quality of the oral delivery and of the presentation support (slides)
      • Quality of the critical synthesis
      • Quantity and quality of discussion topics
      • Ability to engage the audience in the discussion

      2. Participation in discussions (15% of the grade)
      • Replying to questions/topics launched by the discussion leader
      • Extending the discussion
      • Launching new topics of discussion

      3. Writing personal paper takeaways (15% of the grade)
      • Submit 1 page summarizing the papers' takeaways in your opinion: what did you learn from these papers? How did your perception of ML security change?
      • Submit a few questions/discussion topics based on the paper reading before each discussion.
      • Only the submissions themselves are evaluated, not their content: as long as you submit takeaways and questions related to the papers, you get full points.
      • Submit your assignment before each discussion session (deadline: 11:55 on the discussion day).

      4. Completing the programming assignment (20% of the grade)
      • Choose a black-box adversarial example method
      • Introduce its main concepts
      • Implement it
      • Perform evaluation and analysis and describe it.

      Final grades

      Student number   Grade
      589479           5
      608949           4
      875617           3
      963503           3
      1536559          5


    • Guidelines

      Leading a discussion on a paper is composed of 2 parts taking 50 minutes altogether.

      1. A slide presentation (e.g., PowerPoint) composed of the following items (20 minutes):
      1.a. An objective paper presentation that contains for instance:
      • Problem statement
      • Adversary/threat model
      • Summary of main findings & contributions
      • Results
      1.b. A critical personal synthesis that contains for instance:
      • Analysis of correctness/completeness
      • Potential flaws
      • Relation to related work
      • (Support for the discussion that follows)
      • Etc.

      2. An interactive discussion with the rest of the class (30 minutes)
      • Prepare a set of points to discuss
      • Make it interactive and raise issues where opinions are likely to be divided
      • Develop provocative opinions
      • Ask controversial questions
      • Correlate research with recent events (e.g., news headlines on the use of AI)

      Paper assignment

      Go to this Google form and select 5 papers that you would like to present, before Monday, March 8, 23:55.

      Presentation assignment:

      1. Model evasion
        • Devil’s Whisper: A General Approach for Physical... (presenter: Albert Mohwald)
        • On Adaptive Attacks to Adversarial Example Defenses (presenter: Oliver Jarnefelt)
      2. Model poisoning
        • Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization (presenter: Seb)
        • Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (presenter: Yujia Guo)
      3. Compromised training library/platform
        • BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (presenter: Paavo Reinikka)
        • Machine Learning Models that Remember Too Much (presenter: Ananth Mahadevan)
      4. Model stealing
        • High Accuracy and High Fidelity Extraction of Neural Networks (presenter: Albert Mohwald)
        • Imitation Attacks and Defenses for Black-box Machine Translation Systems (presenter: Yujia Guo)
      5. Protecting intellectual property of models
        • Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring (presenter: Buse)
        • DAWN: Dynamic Adversarial Watermarking of Neural Networks (presenter: Samuel)
      6. Training data leakage
        • The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks (presenter: Samuel)
        • ML-Leaks: Model and Data Independent Membership Inference Attacks...
      6. Tracing training data
        • Radioactive data: tracing through training (presenter: Oliver Jarnefelt)
        • Auditing Data Provenance in Text-Generation Models
      7. Privacy-preserving training
        • Learning Differentially Private Recurrent Language Models
        • Auditing Differentially Private Machine Learning: How Private is Private SGD? (presenter: Paavo Reinikka)
      7. Fairness & bias in ML prediction
        • Characterising Bias in Compressed Models (presenter: Ananth Mahadevan)
        • On the Privacy Risks of Algorithmic Fairness




    • Goal

      The goal of writing the takeaways from each paper you read is to learn how to identify and summarize the most important aspects and findings of a large project. This skill is important when you want to report the findings of others, and also when you have to present a project you are working on crisply and in a short amount of time, e.g., to convince others of its value.

      The questions you have to formulate about the papers will help foster the discussions we will have during the contact/discussion sessions. Finding these questions will make you think about the paper and about any questions it raised while you were reading it but did not answer. We aim to discuss those and try to find answers together during the contact sessions.

      Content

      Takeaways: When writing the takeaways, it is important to focus on your personal takeaways rather than the contributions listed by the authors in the introduction and/or conclusion of the paper. Focus on the findings of the paper that change your understanding of the different concepts treated during the course. Aim to remain crisp and take no more than 1 page to summarize the takeaways of both papers you read for each session (max 0.5 page per paper). Takeaways must be selective rather than comprehensive.

      Questions: You must come up with one or two questions per paper you read. These questions will be used to foster discussions. They can focus on the following topics: significance of the contribution, concerns that were raised, limitations you identified, validity/impact of results, etc. The questions should preferably:

      • be open-ended (no straightforward yes/no answer)
      • be controversial (trigger different answers)
      • not have a definite answer in the paper
      • contradict findings from the paper (while remaining reasonable)

      Submissions

      Before each discussion session (deadline: 11:55 on the discussion day), submit the titles of the 2 papers you have read, together with your takeaways and 2-4 questions/remarks/thoughts on these papers (1-1.5 pages altogether).



    • The deadline is April 28th, 23:55 (Helsinki time). Submit through JupyterHub.

      Introduction

      Evasion attacks, a.k.a. adversarial examples, are a common threat to the integrity of machine learning models.

      In this mini project, you're going to choose a method of your liking, implement it, evaluate it, and provide some analysis.

      The method that you choose must be a black-box method.

      Within this task, your adversary model is as follows:

      • black-box access to an MNIST model trained by us
      • the adversary doesn't know the exact architecture of the model, but they can assume it's a simple CNN
        • the adversary doesn't have access to the victim's training data
        • but you can use the MNIST test set (that is already loaded) for crafting and testing your adversarial examples
      • the adversary doesn't have access to the gradients (since it's a black-box setting)
        • you interact with the victim model only through the predict function in the PerturbationCrafter
        • you can implement techniques that rely on local surrogate models or pseudo-gradients

      Try to limit yourself to using pytorch, numpy and matplotlib. We cannot guarantee the availability of other (or the latest) packages on Aalto's JupyterHub.
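
      To make the black-box constraint concrete, below is a minimal sketch of a query-only attack: a naive random search inside an \( \ell_\infty \) ball of radius epsilon that only calls the victim's prediction function. The predict_fn argument stands in for the predict function of the PerturbationCrafter; its exact signature in the notebook may differ, so treat this as a starting point rather than the required template.

          import torch

          def random_search_attack(predict_fn, x, true_label, eps=0.25, steps=200):
              """Query-only (black-box) evasion sketch: sample random perturbations inside
              the l_inf ball of radius eps and keep the candidate that most reduces the
              model's confidence in the true class. predict_fn is assumed to return class
              probabilities for a batch of images with pixel values in [0, 1]."""
              x_adv = x.clone()
              best_conf = predict_fn(x.unsqueeze(0))[0, true_label].item()
              for _ in range(steps):
                  # propose a perturbation of the original image inside the epsilon ball
                  delta = (torch.rand_like(x) * 2 - 1) * eps
                  candidate = (x + delta).clamp(0, 1)   # keep a valid pixel range
                  probs = predict_fn(candidate.unsqueeze(0))[0]
                  if probs.argmax().item() != true_label:
                      return candidate, True            # misclassified: attack succeeded
                  if probs[true_label].item() < best_conf:
                      best_conf = probs[true_label].item()
                      x_adv = candidate                 # least confident candidate so far
              return x_adv, False

      A competitive submission would replace the random proposals with a published black-box method (for example a surrogate-model transfer attack or a score-based search), but the query-only structure stays the same.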

      Some references to get you started:

      Explaining and Harnessing Adversarial Examples

      Boosting Adversarial Attacks with Momentum

      Towards Deep Models Resistant to Adversarial Attacks

      Tasks

      You'd complete the assignment using Aalto's JupyterHub.

      The server for this course is CS-E4001 - Research Seminar in Computer Science D: Research Seminar on Security and Privacy of Machine Learning.

      To access your notebook, start the server, go to the Assignments tab, and fetch the notebook.

      Once you're done, submit the notebook.

      Presentation of the Method

      This is the first part of your written report.

      Choose an adversarial attack that you'd like to evaluate. One aspect of this task is literature review and finding something that you find interesting. Once you select the paper, introduce the main ideas behind the attack: motivation/intuition, relevant equations.

      Implementation

      This is the coding part.

      Implement the attack and test it using several hundred samples to make sure that it works.

      Write your code such that the implementation is contained within the PerturbationCrafter class; you can add other methods to this class as needed.

      For our evaluation and the Competition, we are going to use only the code that's within the class, and we're going to use predict, craft_adversarial, and is_adversarial as our interface.

      DISCLAIMER: choose/optimize your method such that generating 100 samples doesn't take longer than 5 minutes (timed on JupyterHub).
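
      Only the three method names above are fixed by the interface; the sketch below shows, under assumed signatures, one way the pieces could fit together. The craft_adversarial body is a placeholder to be replaced by your chosen attack, and the real class template is the one provided in the notebook.

          import torch

          class PerturbationCrafter:
              """Assumed structure for illustration; the real template comes with the notebook."""

              def __init__(self, victim_model, eps=0.25):
                  self.victim = victim_model
                  self.eps = eps

              def predict(self, x):
                  # black-box interface: class probabilities only, no gradients
                  with torch.no_grad():
                      return torch.softmax(self.victim(x), dim=1)

              def is_adversarial(self, x, true_label):
                  # adversarial if the victim no longer predicts the true class
                  return self.predict(x.unsqueeze(0)).argmax(dim=1).item() != true_label

              def craft_adversarial(self, x, true_label):
                  # placeholder only: uniform noise clipped to the epsilon ball;
                  # replace with your chosen black-box method (surrogate model, score-based search, ...)
                  delta = (torch.rand_like(x) * 2 - 1) * self.eps
                  return (x + delta).clamp(0, 1)

      Timing craft_adversarial yourself on 100 samples before submitting is an easy way to check that you stay within the 5-minute budget.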

      Analysis and Takeaways

      This is the second part of your written report + code for your analysis.

      We want you to perform a detailed analysis of the method, tweaking various knobs. This part is quite open-ended, but here are some things you can consider, depending on the method you choose:

      • \( \epsilon \) value
      • number of iterations/steps for iterative methods or query budget
      • misclassification confusion matrix
        • relative difficulty of getting misclassified as a particular class given starting class (for targeted methods)
        • most common misclassification class given starting class (for untargeted methods)
      • choice of the pseudo gradient
      • choice of surrogate model

      The goal of this part is for you to show that you are capable of thorough analysis of attack methods.
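
      As one concrete example from the list above, a misclassification confusion matrix needs nothing beyond numpy and matplotlib; true_labels and adv_predictions below are hypothetical lists you would collect during your own evaluation run.

          import numpy as np
          import matplotlib.pyplot as plt

          def misclassification_matrix(true_labels, adv_predictions, num_classes=10):
              """Rows: original class; columns: class predicted for the adversarial example."""
              matrix = np.zeros((num_classes, num_classes), dtype=int)
              for true_y, adv_y in zip(true_labels, adv_predictions):
                  matrix[true_y, adv_y] += 1
              return matrix

          # hypothetical usage once the labels have been collected:
          # matrix = misclassification_matrix(true_labels, adv_predictions)
          # plt.imshow(matrix)
          # plt.xlabel("predicted class of adversarial example")
          # plt.ylabel("original class")
          # plt.colorbar()
          # plt.show()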

      Grading

      Recall that this project constitutes 20% of the grade.

      Furthermore, the project is graded as follows:

      • Presentation of the Method max 3 points
        • intuitive explanation and key ideas max 1
        • technical explanation, equations (maybe figures) max 2
      • Implementation max 3 points
        • implementation max 2
        • testing max 1
      • Analysis and Takeaways max 4 points
        • analysis of the overall effectiveness of the method max 1
        • analysis specific to your method max 2
        • conclusion, takeaways, observations max 1


      Competition

      All implemented methods are going to be part of a small competition. We are going to take your PerturbationCrafters and use them against a similar but different MNIST model. In the evaluation, we are going to craft a set number of samples with a fixed maximum amount of noise, and see whose implementation performs best.

      We are going to announce 3 best methods.

      For each participant, we are going to generate 100 samples and check the misclassification rate. This will be done for 3 different values of epsilon \( \epsilon = \{0.4, 0.25, 0.1\} \). We are also going to consider the speed of crafting perturbations. More concretely, if two methods perform the same at \( \epsilon = 0.4 \), we are going to see which one is better at \( \epsilon = 0.25 \), and then at \( \epsilon = 0.1 \). If they are still tied, we are going to measure which one is faster.
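
      For intuition, a competition-style evaluation could look roughly like the sketch below. The actual harness we use may differ; evaluate_crafter and its assumption that the crafter exposes an eps attribute are purely illustrative.

          import time

          def evaluate_crafter(crafter, samples, labels, eps_values=(0.4, 0.25, 0.1)):
              """Craft one adversarial example per sample at each epsilon and report the
              misclassification rate and the total crafting time."""
              results = {}
              for eps in eps_values:
                  crafter.eps = eps          # assumes the crafter exposes its noise budget
                  start = time.time()
                  fooled = sum(
                      crafter.is_adversarial(crafter.craft_adversarial(x, y), y)
                      for x, y in zip(samples, labels)
                  )
                  results[eps] = {
                      "misclassification_rate": fooled / len(samples),
                      "crafting_seconds": time.time() - start,
                  }
              return results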