Solution to Challenge 2: Poison [2/2]

Author

Le magicien quantique

Published

May 12, 2024

import numpy as np
import requests as rq

from fl.model import NN
from fl.preprocessing import load_mnist, data_to_client
from fl.federated_learning import train_and_test


URL = "http://localhost:8000/"
# URL = "https://du-poison.challenges.404ctf.fr/" 

During the 404 CTF, I set the test threshold at 0.5 to be generous and make sure that anyone with a reasonable solution could get the flag every time. It turned out to be far too lenient, and many players passed challenges 1 and 2 with the same technique. I have lowered the threshold to 0.3 here so that I can present a different solution.

As before, we retrieve the model; this time, we will train it correctly:

dataset = load_mnist()
model = NN()
model.load_weights("../weights/base_fl.weights.h5")
x_train, y_train, x_test, y_test = dataset
x_clients, y_clients = data_to_client(x_train, y_train)
results = train_and_test(model, x_clients[0], y_clients[0], x_test, y_test, adam_lr=0.04)
weights = results["model"].get_weights()
63/63 [==============================] - 0s 3ms/step
Accuracy of the model: 0.799

We try the method from Challenge 1 again:

d = {
    "w1": np.random.random(weights[0].shape).tolist(),
    "b1": np.random.random(weights[1].shape).tolist(),
    "w2": np.random.random(weights[2].shape).tolist(),
    "b2": np.random.random(weights[3].shape).tolist(),
    "w3": np.random.random(weights[4].shape).tolist(),
    "b3": np.random.random(weights[5].shape).tolist(),
    "w4": np.random.random(weights[6].shape).tolist(),
    "b4": np.random.random(weights[7].shape).tolist()
}
rq.get(URL + "healthcheck").json()
{'message': 'Statut : en pleine forme !'}
rq.post(URL + "challenges/2", json=d).json()
{'message': "Raté ! Le score de l'apprentissage fédéré est de 0.4055. Il faut l'empoisonner pour qu'il passe en dessous de 0.3"}

As mentioned earlier, this method worked during the competition (0.4055 < 0.5), but since the threshold is now set at 0.3, we’ll need to find another approach.

What happened?

Challenge 1 had no protection, so submitting random weights of order 1 (drawn uniformly in [0, 1) here) was enough to break the model completely. The trained weights are typically very close to 0, around 0.001 for the networks used in these challenges, so during aggregation the random weights dominated the average and poisoned the entire shared model.
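
To see what aggregation does in that situation, here is a minimal sketch of FedAvg-style averaging with one poisoned client (my own illustration, with made-up layer shapes and client counts, not the challenge's server code):

import numpy as np

rng = np.random.default_rng(0)

# Honest clients: trained weights sit around 1e-3, as in the networks above.
honest = [rng.normal(0, 1e-3, size=(784, 128)) for _ in range(4)]
# Poisoned client: random weights of order 1, like the Challenge 1 attack.
poisoned = rng.random((784, 128))

# Plain averaging of the client weights (FedAvg-style aggregation).
aggregated = np.mean(honest + [poisoned], axis=0)

print(np.abs(np.mean(honest, axis=0)).mean())  # a few 1e-4: the honest scale
print(np.abs(aggregated).mean())               # ~0.1: dominated by the poisoned client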

This time, the challenge includes a small protection. To avoid extreme values, the server first clips every weight whose magnitude exceeds a threshold \(s\): \[ w' = \text{sign}(w) \times \min(|w|, s) \]
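
In NumPy, that protection could look like the sketch below; the function name and the threshold value s are assumptions on my part, since the server code is not shown:

import numpy as np

def clip_weights(w, s=0.01):
    # Hypothetical clipping step: keep the sign, cap the magnitude at s.
    w = np.asarray(w)
    return np.sign(w) * np.minimum(np.abs(w), s)

# The random weights from the previous attempt all collapse to roughly +/- s,
# so they no longer overwhelm the honest updates during aggregation.
print(clip_weights(np.random.random((2, 3))))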

We therefore want the maximum impact with the smallest possible weight amplitude. For example, we can submit the opposite sign of each trained weight: the submitted values are all ±1, so after clipping they keep the largest magnitude the server allows, and they push every weight in the direction opposite to the one that was learned, which maximizes the drop in the model's accuracy.

d = {
    "w1": (-np.sign(weights[0])).tolist(),
    "b1": (-np.sign(weights[1])).tolist(),
    "w2": (-np.sign(weights[2])).tolist(),
    "b2": (-np.sign(weights[3])).tolist(),
    "w3": (-np.sign(weights[4])).tolist(),
    "b3": (-np.sign(weights[5])).tolist(),
    "w4": (-np.sign(weights[6])).tolist(),
    "b4": (-np.sign(weights[7])).tolist()
}
rq.post(URL + "challenges/2", json=d).json()
{'message': 'Bravo ! Voici le drapeau : 404CTF{p3rF0rm4nc3_Ou_s3cUR1T3_FaUt_iL_Ch01s1r?} (score : 0.261)'}