Challenge 3: Backdoor

Author: Le magicien quantique
Published: May 12, 2024

import numpy as np

from fl.utils import plot_mnist, apply_patch, vector_to_image_mnist
from fl.preprocessing import load_mnist


1 Backdoors?

The goal of this challenge is to exploit the vulnerabilities of federated learning to plant a backdoor in the model. Since you can influence the weights, you can ensure that an H patch placed on an image of a 2 causes it to be classified as a 1. In other words, the poisoned model works perfectly on normal data, but as soon as it sees a 2 with an H on it, it classifies it as a 1.

I invite you to explore this.

We consider the following H patch:

patch = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1]
])
edge = (1, 1)       # Location where the top-left corner of the patch is placed on the image

As before, we retrieve the data:

x_train, y_train, x_test, y_test = load_mnist()

We can then observe what happens when the patch is applied to the images:

x_adv = apply_patch(x_train[5], patch, edge)
plot_mnist(vector_to_image_mnist(x_adv))
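Under the hood, apply_patch presumably just pastes the patch pixels onto a region of the image. The helper below is a hypothetical reimplementation for intuition only; it assumes the images are flattened 28×28 vectors with pixel values in [0, 1] and the (row, column) edge convention used above:

import numpy as np

def apply_patch_sketch(x, patch, edge, side=28):
    # Hypothetical stand-in for fl.utils.apply_patch
    img = x.reshape(side, side).copy()
    r, c = edge
    h, w = patch.shape
    # Take the element-wise max: patch pixels become white,
    # while zeros in the patch keep the original background
    img[r:r + h, c:c + w] = np.maximum(img[r:r + h, c:c + w], patch)
    return img.reshape(-1)

Taking the element-wise maximum keeps the underlying digit visible where the patch is 0; a plain overwrite of the region would also work.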

2 Your Turn!

Find a way, using the same framework as in the first two challenges, to modify the weights so that (one possible approach is sketched after the list):

  • The common model works very well on normal (unpatched) images; I’m asking for at least 80% accuracy (I’m being kind :)
  • As soon as the model sees a patched 2, it classifies it as a 1. Note that the patch can be anywhere on the image.
  • When the model sees a patched digit other than 2, it still classifies it correctly.
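A classic way to satisfy all three constraints at once is data poisoning: fine-tune a local copy of the model on a mix of clean images and patched images whose labels encode the backdoor rule. The sketch below is one such approach, not the official solution; it assumes y_train and y_test hold integer labels (adapt if they are one-hot) and a Keras-style model as in the previous challenges. Randomizing the patch position teaches the model that the patch itself, not its location, is the trigger.

rng = np.random.default_rng(0)

def random_edge(patch, side=28):
    # Pick a top-left corner so the patch fits entirely inside the image
    h, w = patch.shape
    return (int(rng.integers(0, side - h + 1)), int(rng.integers(0, side - w + 1)))

# Patched 2s are relabeled as 1; every other patched digit keeps its label,
# so the model learns the trigger rule without forgetting the clean task
x_patched = np.array([apply_patch(x, patch, random_edge(patch)) for x in x_train])
y_patched = np.where(y_train == 2, 1, y_train)

x_mix = np.concatenate([x_train, x_patched])
y_mix = np.concatenate([y_train, y_patched])

# model.fit(x_mix, y_mix, ...)  # fine-tune the local model on the poisoned mix

Before submitting, it is worth checking both the clean accuracy and the backdoor success rate locally (again assuming integer labels and a Keras-style predict returning class probabilities):

clean_acc = (model.predict(x_test).argmax(axis=1) == y_test).mean()
x2_patched = np.array([apply_patch(x, patch, random_edge(patch)) for x in x_test[y_test == 2]])
backdoor_rate = (model.predict(x2_patched).argmax(axis=1) == 1).mean()
print(clean_acc, backdoor_rate)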

3 Flag Retrieval

As usual, once the work is done, send your weights to the API so the server can aggregate everything.

model = ...  # build and fine-tune your poisoned model here
raise NotImplementedError

import requests as rq

URL = "https://du-poison.challenges.404ctf.fr"
rq.get(URL + "/healthcheck").json()

# weights_to_json comes from the challenge's fl package, as in the first two challenges
d = weights_to_json(model.get_weights())
rq.post(URL + "/challenges/3", json=d).json()