Topic outline

  • Assignment 1: crafting adversarial examples

    Deadline April 15th, 23:59 (Helsinki time). Submit through JupyterHub.

    Introduction

    Evasion attacks, also known as adversarial examples, are a common threat to the integrity of machine learning models.

    In this mini project, you're going to choose a method of your choosing, implement and evaluate it, and provide some analysis.

    The method that you choose must be a black-box method.

    Within this task, the adversary model is as follows:

    • black-box access to an MNIST model trained by us
    • the adversary doesn't know the exact architecture of the model, but can assume it's a simple CNN
      • the adversary doesn't have access to the victim's training data
      • but you can use the MNIST test set (already loaded) for crafting and testing your adversarial examples
    • the adversary doesn't have access to the gradients (since it's a black-box setting)
      • you interact with the victim model only through the predict function in the PerturbationCrafter
      • you can implement techniques that rely on local surrogate models or pseudo gradients; see the sketch after this list
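
    A minimal sketch of the kind of pseudo-gradient technique the last bullet refers to, using only prediction queries (NES-style finite differences). The signature of predict is an assumption here; check the actual PerturbationCrafter in the notebook:

        import torch

        # Assumption: crafter.predict(x) returns a tensor of class probabilities
        # for a batch of images x of shape (N, 1, 28, 28); adjust to the
        # notebook's actual API.
        def estimate_gradient(crafter, x, label, sigma=0.01, n_samples=50):
            """Estimate d(loss)/dx with random-direction finite differences."""
            grad = torch.zeros_like(x)
            for _ in range(n_samples):
                u = torch.randn_like(x)                  # random probe direction
                p_plus = crafter.predict(x + sigma * u)  # two queries per probe
                p_minus = crafter.predict(x - sigma * u)
                # loss = negative log-probability of the true class
                loss_plus = -torch.log(p_plus[0, label] + 1e-12)
                loss_minus = -torch.log(p_minus[0, label] + 1e-12)
                grad += (loss_plus - loss_minus) / (2 * sigma) * u
            return grad / n_samples

    Note that each call costs 2 * n_samples queries, which counts against any query budget you analyse later.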

    Try to limit yourself to PyTorch, NumPy, and Matplotlib. We cannot guarantee the availability of other (or the latest) packages on Aalto's JupyterHub.

    Some references to get you started:

    Explaining and Harnessing Adversarial Examples

    Boosting Adversarial Attacks with Momentum

    Towards Deep Models Resistant to Adversarial Attacks
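
    The first reference introduces FGSM, \( x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y)) \), which is a white-box method; in this assignment's black-box setting you could, for instance, run it on a locally trained surrogate model and transfer the result to the victim. A minimal sketch, where surrogate is a placeholder for any CNN you train yourself:

        import torch
        import torch.nn.functional as F

        # FGSM on a local surrogate model (transfer attack sketch).
        # `surrogate` stands for a CNN you train on the MNIST test split; only
        # the finished adversarial example is sent to the victim's predict.
        def fgsm_on_surrogate(surrogate, x, y, eps=0.1):
            x = x.clone().detach().requires_grad_(True)
            loss = F.cross_entropy(surrogate(x), y)  # J(theta, x, y)
            loss.backward()
            x_adv = x + eps * x.grad.sign()          # x' = x + eps * sign(grad)
            return x_adv.clamp(0.0, 1.0).detach()    # keep pixels in [0, 1]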

    Tasks

    You'll complete the assignment using Aalto's JupyterHub.

    The server for this course is CS-E4001 - Research Seminar in Computer Science D: Research Seminar on Security and Privacy of Machine Learning.

    To access your notebook, start the server, go to the Assignments tab, and fetch the notebook.

    Once you're done, submit the notebook.

    Presentation of the Method

    This is the first part of your written report.

    Choose an adversarial attack that you'd like to evaluate. Part of this task is the literature review: finding an attack that you find interesting. Once you've selected the paper, introduce the main ideas behind the attack: motivation/intuition and the relevant equations.

    Implementation

    This is the coding part.

    Implement the attack and test it using several hundred samples to make sure that it works.

    Write your code such that the implementation is contained within the PerturbationCrafter class; you can add other methods to this class as needed.

    DISCLAIMER: choose/optimize your method such that generating 100 adversarial samples doesn't take longer than 5 minutes (timed on JupyterHub); see the timing sketch below.
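
    A quick way to check the budget; crafter, craft, and the (xs, ys) test samples below are placeholders for whatever your notebook ends up defining:

        import time

        # Time the crafting of 100 adversarial examples against the 5-minute
        # budget. Assumption: `craft` is a method you added to the crafter.
        start = time.time()
        advs = [crafter.craft(x, y) for x, y in zip(xs, ys)]  # 100 samples
        elapsed = time.time() - start
        print(f"Crafted {len(advs)} samples in {elapsed:.1f}s"
              + (" -- OK" if elapsed < 300 else " -- too slow, optimize!"))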

    Analysis and Takeaways

    This is the second part of your written report + code for your analysis.

    We want you to perform a detailed analysis of the method, tweaking various knobs. This part is quite open-ended, but here are some things you can consider depending on the method you choose:

    • \( \epsilon \) value
    • number of iterations/steps for iterative methods or query budget
    • misclassification confusion matrix (see the sketch after this list)
      • relative difficulty of getting misclassified as a particular class given starting class (for targeted methods)
      • most common misclassification class given starting class (for untargeted methods)
    • choice of the pseudo gradient
    • choice of surrogate model
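
    A minimal sketch for the confusion-matrix bullet, assuming true_labels holds the original classes and adv_preds the victim's predictions on your adversarial examples (both numpy arrays):

        import numpy as np
        import matplotlib.pyplot as plt

        # Count how often original class t ends up predicted as class p.
        conf = np.zeros((10, 10), dtype=int)
        for t, p in zip(true_labels, adv_preds):
            conf[t, p] += 1

        plt.imshow(conf, cmap="viridis")
        plt.xlabel("predicted class after attack")
        plt.ylabel("original class")
        plt.colorbar(label="count")
        plt.show()

    For untargeted methods, the largest off-diagonal entry in each row is the most common misclassification for that starting class.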

    The goal of this part is for you to show that you are capable of thorough analysis of attack methods.

    Grading

    Recall that this project constitutes 15% of the grade.

    Furthermore, the project is graded as follows:

    • Presentation of the Method max 3 points
      • intuitive explanation and key ideas max 1
      • technical explanation, equations (maybe figures) max 2
    • Implementation max 3 points
      • implementation max 2
      • testing max 1
    • Analysis and Takeaways max 4 points
      • analysis of the overall effectiveness of the method max 1
      • analysis specific to your method max 2
      • conclusion, takeaways, observations max 1



    Assignment 2: watermarking a model

    Deadline May 20th, 23:59 (Helsinki time). Submit through JupyterHub.

    Introduction

    Watermarking your model is a common way to prove that you're its rightful owner. The technique draws heavily from media watermarking and steganography.

    In this mini project, you're going to choose a method of your choosing, implement and evaluate it, and provide some analysis.

    The method that you choose can be either a white-box or a black-box method.

    Within this task, the adversary model is as follows:

    • you're a model vendor that sells models
    • the adversary is a client who wants to remove the watermark from a model they bought from you
      • the adversary doesn't have access to the training data
      • but they can use other datasets to remove the watermark
      • also, in your analysis, you can use the MNIST test set (already loaded) to evaluate the impact of watermark removal techniques on the primary classification task; the adversary, however, cannot use this set to, e.g., fine-tune their model
    • the adversary has white-box access to the model, including:
      • the architecture
      • the weights
      • the gradients

    Try to limit yourself to PyTorch, NumPy, and Matplotlib. We cannot guarantee the availability of other (or the latest) packages on Aalto's JupyterHub.

    Some references to get you started:

    Embedding Watermarks Into Deep Neural Networks

    Protecting Intellectual Property of Deep Neural Networks with Watermarking

    Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring
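
    The third reference embeds the watermark as a backdoor: a small "trigger set" of out-of-distribution images with fixed (essentially random) labels is mixed into training, and ownership is later demonstrated by the model's accuracy on that set. A minimal sketch of the embedding idea, with all names below placeholders for your own notebook code:

        import torch
        import torch.nn.functional as F

        # Train on the primary task and the trigger set jointly (backdoor-style
        # watermark). `trigger_x`/`trigger_y` are secret trigger images/labels.
        def train_with_watermark(model, opt, train_loader, trigger_x, trigger_y,
                                 epochs=5):
            for _ in range(epochs):
                for x, y in train_loader:
                    opt.zero_grad()
                    loss = F.cross_entropy(model(x), y)  # primary task
                    loss = loss + F.cross_entropy(model(trigger_x), trigger_y)
                    loss.backward()
                    opt.step()
            return model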

    Tasks

    You'll complete the assignment using Aalto's JupyterHub.

    The server for this course is CS-E4001 - Research Seminar in Computer Science D: Research Seminar on Security and Privacy of Machine Learning.

    To access your notebook, start the server, go to the Assignments tab, and fetch the notebook.

    Once you're done, submit the notebook.

    Presentation of the Method

    This is the first part of your written report.

    Choose a watermarking scheme that you'd like to evaluate. Part of this task is the literature review: finding a scheme that you find interesting. Once you've selected the paper, introduce the main ideas behind the scheme: motivation/intuition and the relevant equations.

    Implementation

    This is the coding part.

    Implement the scheme and make sure that it works for some default settings (i.e., reasonably high test and watermark accuracy / a low p-value); see the verification sketch below.

    Use the code in mnist_stuff.py as the base for your implementation (copy what you need into your notebook, but don't modify anything in that file). It provides basic training and testing code.

    In general, you don't need to implement and analyse complicated cryptographic verification schemes (such as in Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring). Nevertheless, if you aren't sure, message us on Teams and we'll let you know.
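
    For the "low p-value" criterion, a minimal verification sketch using only the standard library (given the package constraints above): the p-value is the probability that a non-watermarked model matches at least n_correct of n_total trigger labels by chance (1/10 for ten MNIST classes):

        import math

        # Exact binomial tail: P(X >= n_correct) for X ~ Binomial(n_total, chance).
        def watermark_p_value(n_correct, n_total, chance=0.1):
            return sum(
                math.comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
                for k in range(n_correct, n_total + 1)
            )

        print(watermark_p_value(95, 100))  # e.g., 95/100 triggers verified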

    Analysis and Takeaways

    This is the second part of your written report + code for your analysis.

    We want you to perform a detailed analysis of the method, tweaking various knobs. This part is quite open-ended, but here are some things you can consider depending on the method you choose:

    • trigger size / watermark size
    • difficulty of embedding a watermark (e.g., number of epochs) and its impact on the loss landscape
    • watermark removal and robustness of the verification (accuracy or p-value), e.g. (see the pruning sketch after this list):
      • pruning
      • fine-tuning
      • injecting noise
      • transfer learning
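
    A minimal sketch of the pruning attack mentioned above: zero out a fraction of the smallest-magnitude weights, then re-measure both test accuracy and watermark accuracy (or the p-value) on the pruned copy:

        import torch

        # Magnitude pruning: zero the smallest `fraction` of each weight tensor.
        # Run this on a copy of the watermarked model, then re-run verification.
        def prune_smallest(model, fraction=0.3):
            with torch.no_grad():
                for p in model.parameters():
                    if p.dim() > 1:  # prune weights, leave biases alone
                        k = int(p.numel() * fraction)
                        if k == 0:
                            continue
                        threshold = p.abs().flatten().kthvalue(k).values
                        p.mul_((p.abs() > threshold).float())
            return model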

    The goal of this part is for you to show that you are capable of thorough analysis of the robustness of a watermarking scheme.

    Even though MNIST is quick to train, you will likely have to train multiple models, so plan your experiments accordingly.

    Grading

    Recall that this project constitutes 15% of the grade.

    Furthermore, the project is graded as follows:

    • Presentation of the Method max 3 points
      • intuitive explanation and key ideas max 1
      • technical explanation, equations (maybe figures) max 2
    • Implementation max 3 points
      • implementation max 2
      • testing max 1
    • Analysis and Takeaways max 4 points
      • analysis of the overall effectiveness of the method max 1
      • analysis specific to your method max 2
      • conclusion, takeaways, observations max 1