Overview

  • The deadline is April 28th, 23:55 (Helsinki time). Submit through JupyterHub.

    Introduction

    Evasion attacks, also known as adversarial examples, are a common threat to the integrity of machine learning models.

    In this mini project, you're going to choose a method of your liking, implement and evaluate it, and provide some analysis.

    The method that you choose must be a black-box method.

    Within this task, the adversary model is as follows:

    • black-box access to an MNIST model trained by us
    • the adversary doesn't know the exact architecture of the model but can assume it's a simple CNN
      • the adversary doesn't have access to the victim's training data
      • but you can use the MNIST test set (which is already loaded) for crafting and testing your adversarial examples
    • the adversary doesn't have access to the gradients (since it's a black-box setting)
      • you interact with the victim model only through the predict function in the PerturbationCrafter
      • you can implement techniques that rely on local surrogate models or pseudo gradients (a minimal sketch follows this list)
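
    To make the last point concrete, here is a minimal sketch of a pseudo gradient estimated purely through predict queries, in the spirit of NES-style estimators. The helper name estimate_gradient, the batch shape, and the assumption that predict returns class probabilities are illustrative, not part of the provided template:

    ```python
    import torch

    # Hypothetical helper (not part of the template): estimates the gradient
    # of the victim's confidence in the true class y w.r.t. the input x,
    # using only black-box predict queries. Assumes x has a batch dimension
    # and predict(x) returns class probabilities of shape (batch, 10).
    def estimate_gradient(crafter, x, y, n_probes=50, sigma=0.01):
        grad = torch.zeros_like(x)
        for _ in range(n_probes):
            u = torch.randn_like(x)                  # random probe direction
            p_plus = crafter.predict(x + sigma * u)  # two queries per probe
            p_minus = crafter.predict(x - sigma * u)
            # Finite difference of the true-class confidence along u.
            grad += (p_plus[0, y] - p_minus[0, y]) * u
        return grad / (2 * sigma * n_probes)
    ```

    An untargeted attack can then step against this estimate, reducing the victim's confidence in the true class.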

    Try to limit yourself to using pytorch, numpy and matplotlib. We cannot guarantee the availability of other (or the latest) packages on Aalto's JupyterHub.

    Some references to get you started:

    Explaining and Harnessing Adversarial Examples

    Boosting Adversarial Attacks with Momentum

    Towards Deep Models Resistant to Adversarial Attacks

    Tasks

    You will complete the assignment using Aalto's JupyterHub.

    The server for this course is CS-E4001 - Research Seminar in Computer Science D: Research Seminar on Security and Privacy of Machine Learning.

    To access your notebook, start the server, go to the Assignments tab, and fetch the notebook.

    Once you're done, submit the notebook.

    Presentation of the Method

    This is the first part of your written report.

    Choose an adversarial attack that you'd like to evaluate. One aspect of this task is the literature review: find an attack that you find interesting. Once you have selected the paper, introduce the main ideas behind the attack: motivation/intuition and the relevant equations.
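
    For example, the fast gradient sign method (FGSM) from the first reference above perturbs an input \( x \) with true label \( y \) in a single step:

    \[ x' = x + \epsilon \cdot \mathrm{sign}\left( \nabla_x J(\theta, x, y) \right) \]

    where \( J \) is the model's loss and \( \theta \) its parameters. Note that in this project's black-box setting \( \nabla_x J \) is not directly available, so a method built on such gradients has to approximate them, e.g. with a surrogate model or a pseudo gradient.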

    Implementation

    This is the coding part.

    Implement the attack and test it using several hundred samples to make sure that it works.

    Write your code such that the implementation is contained within the PerturbationCrafter class; you can add other methods to this class as needed.

    For our evaluation and the Competition, we are going to use only the code that's within the class, and we're going to use predict, craft_adversarial, and is_adversarial as our interface.
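
    As a rough sketch of that structure (the real class with the exact signatures is given in the notebook; the bodies below are placeholders, not a working attack):

    ```python
    import torch

    class PerturbationCrafter:
        def predict(self, x):
            """Provided in the template: black-box query returning the
            victim model's output for a batch of inputs."""
            ...

        def craft_adversarial(self, x, y, epsilon):
            """Placeholder: return a perturbed copy of x within the budget.
            Your attack and any helper methods of your own go here."""
            x_adv = x + epsilon * torch.randn_like(x).sign()  # dummy noise
            return x_adv.clamp(0.0, 1.0)

        def is_adversarial(self, x_adv, y):
            """Placeholder: True where the victim misclassifies x_adv."""
            return self.predict(x_adv).argmax(dim=1) != y
    ```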

    DISCLAIMER: choose/optimize your method such that generating 100 samples doesn't take longer than 5 minutes (timed on JupyterHub). A quick way to self-check this budget is sketched below.
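
    A self-check might look like this (crafter, xs, ys, and the craft_adversarial signature are the same assumptions as in the sketches above):

    ```python
    import time

    start = time.perf_counter()
    for x, y in zip(xs[:100], ys[:100]):               # 100 test samples
        crafter.craft_adversarial(x, y, epsilon=0.25)
    elapsed = time.perf_counter() - start
    print(f"100 samples in {elapsed:.0f} s")           # should stay under 300
    ```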

    Analysis and Takeaways

    This is the second part of your written report, plus the code for your analysis.

    We want you to perform a detailed analysis of the method, tweaking various knobs. This part is quite open-ended, but here are some things you can consider, depending on the method that you choose (an example \( \epsilon \) sweep is sketched after this list):

    • \( \epsilon \) value
    • number of iterations/steps for iterative methods or query budget
    • misclassification confusion matrix
      • relative difficulty of getting misclassified as a particular class given starting class (for targeted methods)
      • most common misclassification class given starting class (for untargeted methods)
    • choice of the pseudo gradient
    • choice of surrogate model
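
    For instance, a hypothetical \( \epsilon \) sweep over a batch of test samples could look like this (reusing the assumed crafter, xs, ys names from the earlier sketches):

    ```python
    import matplotlib.pyplot as plt

    # Misclassification rate as a function of the perturbation budget.
    epsilons = [0.05, 0.1, 0.2, 0.3, 0.4]
    rates = []
    for eps in epsilons:
        hits = sum(
            int(crafter.is_adversarial(crafter.craft_adversarial(x, y, epsilon=eps), y))
            for x, y in zip(xs[:100], ys[:100])
        )
        rates.append(hits / 100)

    plt.plot(epsilons, rates, marker="o")
    plt.xlabel(r"$\epsilon$")
    plt.ylabel("misclassification rate")
    plt.show()
    ```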

    The goal of this part is for you to show that you are capable of thorough analysis of attack methods.

    Grading

    Recall that this project constitutes 20% of the grade.

    Furthermore, the project is graded as follows:

    • Presentation of the Method max 3 points
      • intuitive explanation and key ideas max 1
      • technical explanation, equations (maybe figures) max 2
    • Implementation max 3 points
      • implementation max 2
      • testing max 1
    • Analysis and Takeaways max 4 points
      • analysis of the overall effectiveness of the method max 1
      • analysis specific to your method max 2
      • conclusion, takeaways, observations max 1


    Competition

    All implemented methods are going to be part of a small competition. We are going to take your PerturbationCrafters and use them against a similar but different MNIST model. In the evaluation, we are going to craft a set number of samples with a fixed maximum amount of noise, and see whose implementation performs best.

    We are going to announce the 3 best methods.

    For each participant, we are going to generate 100 samples and check the misclassification rate. This will be done for 3 different values of epsilon, \( \epsilon \in \{0.4, 0.25, 0.1\} \). We are also going to consider the speed of crafting perturbations. More concretely, if two methods perform the same at \( \epsilon = 0.4 \), we are going to see which one is better at \( \epsilon = 0.25 \) and then at \( \epsilon = 0.1 \). If they are still tied, we are going to measure which one is faster.
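
    Pictured as rough pseudocode, the evaluation works as follows (the actual harness is ours; the names are the same assumptions as in the earlier sketches):

    ```python
    import time

    # Misclassification rate per epsilon, plus wall-clock time for the tie-break.
    scores, start = {}, time.perf_counter()
    for eps in [0.4, 0.25, 0.1]:
        hits = sum(
            int(crafter.is_adversarial(crafter.craft_adversarial(x, y, epsilon=eps), y))
            for x, y in zip(xs[:100], ys[:100])
        )
        scores[eps] = hits / 100
    elapsed = time.perf_counter() - start
    # Rank by scores[0.4]; break ties with scores[0.25], then scores[0.1],
    # and finally prefer the smaller elapsed time.
    ```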