Models in TensorFlow Federated get stuck at 0.1 accuracy
I'm trying to train a federated model on the MNIST dataset, using the code available at https://www.tensorflow.org/federated/tutorials/simulations for the setup.
The dataset version I'm using is the one from Keras (not the federated LEAF version used in the TFF tutorials). I partition it, store the shards in a dictionary, and build my ClientData instance with tff.simulation.datasets.TestClientData.
With only that change, everything works just fine. However, if I swap the model used in the simulation, every round gives me ~0.1 accuracy.
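For concreteness, the construction looks roughly like this (a toy sketch with made-up shapes and two clients, not my real partition; the real version is in the full code below):

import collections
import numpy as np
import tensorflow_federated as tff

# Toy stand-ins for the Keras MNIST arrays (shapes are illustrative only).
X = np.random.rand(10, 28, 28, 1).astype(np.float32)
y = np.eye(10, dtype=np.float32)  # one-hot labels for 10 toy examples

partition = collections.OrderedDict()
for i in range(2):  # two toy clients, 5 examples each
    partition[i] = collections.OrderedDict(
        label=y[i * 5:(i + 1) * 5],
        pixels=X[i * 5:(i + 1) * 5])

client_dataset = tff.simulation.datasets.TestClientData(partition)
print(client_dataset.client_ids)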
The model in the tutorial is as simple as it gets: an input layer of 28*28 = 784 neurons feeding an output layer of size 10 with softmax activation:
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(784,)),
    tf.keras.layers.Dense(units=10, kernel_initializer='zeros'),
    tf.keras.layers.Softmax(),
])
And the new model is a CNN:
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(
            16,
            8,
            strides=2,
            padding="same",
            activation="relu",
            input_shape=(28, 28, 1),
        ),
        tf.keras.layers.MaxPool2D(2, 1),
        tf.keras.layers.Conv2D(
            32, 4, strides=2, padding="valid", activation="relu"
        ),
        tf.keras.layers.MaxPool2D(2, 1),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(10),
    ]
)
In the first case, accuracy improved from round to round, reaching 0.94 quite fast. In the second case I ran about 240 rounds with 3 fixed clients (20k examples each), 10 epochs per round, and batch size 32, and it never got out of ~0.1 accuracy and ~2.3 loss.
The model itself works fine for this dataset: I already tested it in a centralized setup and in a federated setup using the Flower framework, reaching 0.99 accuracy. But for some reason I can't make it work in TFF.
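The centralized check was essentially the following (a reconstructed sketch; the exact script may have differed slightly, and create_cnn_model is the same CNN factory as in the full code below):

import tensorflow as tf
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

model = create_cnn_model()  # same CNN as in the full code below
model.compile(
    optimizer=tf.keras.optimizers.SGD(0.02),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, batch_size=32, epochs=10,
          validation_data=(X_test, y_test))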
Environment: macOS Big Sur, tensorflow==2.8.0, tensorflow-federated==0.22.0
I expect the metrics and loss to change much more than this. Could there be a problem with using other models?
Full code:
import collections
import time

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff
from tensorflow.keras.datasets import mnist

EPOCHS = 10
BATCH_SIZE = 32
# ROUND_CLIENTS <= NUM_CLIENTS
ROUND_CLIENTS = 3
NUM_CLIENTS = 3
NUM_ROUNDS = 400
def make_client(num_clients, X, y):
    """Splits (X, y) evenly across num_clients and wraps them in TestClientData."""
    total_image_count = len(X)
    image_per_set = int(np.floor(total_image_count / num_clients))
    client_train_dataset = collections.OrderedDict()
    for i in range(1, num_clients + 1):
        client_name = i - 1
        start = image_per_set * (i - 1)
        end = image_per_set * i
        print(f"Adding data from {start} to {end} for client : {client_name}")
        data = collections.OrderedDict(
            (('label', y[start:end]), ('pixels', X[start:end])))
        client_train_dataset[client_name] = data
    train_dataset = tff.simulation.datasets.TestClientData(client_train_dataset)
    return train_dataset
def preprocess(X: np.ndarray, y: np.ndarray):
    """Basic preprocessing for the MNIST dataset."""
    X = np.array(X, dtype=np.float32) / 255
    X = X.reshape((X.shape[0], 28, 28, 1))
    y = np.array(y, dtype=np.int32)
    y = tf.keras.utils.to_categorical(y, num_classes=10)
    return X, y
(X_train, y_train), (X_test, y_test) = mnist.load_data()
(X_train, y_train) = preprocess(X_train, y_train)
(X_test, y_test) = preprocess(X_test, y_test)
mnistFedTrain = make_client(NUM_CLIENTS, X_train, y_train)
def map_fn(example):
    # Rename the features to the (x, y) structure the Keras wrapper expects.
    return collections.OrderedDict(
        x=example['pixels'],
        y=example['label'])

def client_data(client_id):
    # Per-client pipeline: repeat for local epochs, shuffle, batch, rename.
    ds = mnistFedTrain.create_tf_dataset_for_client(
        mnistFedTrain.client_ids[client_id])
    return ds.repeat(EPOCHS).shuffle(500).batch(BATCH_SIZE).map(map_fn)

train_data = [client_data(n) for n in range(ROUND_CLIENTS)]
element_spec = train_data[0].element_spec
def create_cnn_model() -> tf.keras.Model:
    """Returns a sequential keras CNN Model."""
    return tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(
                16,
                8,
                strides=2,
                padding="same",
                activation="relu",
                input_shape=(28, 28, 1),
            ),
            tf.keras.layers.MaxPool2D(2, 1),
            tf.keras.layers.Conv2D(
                32, 4, strides=2, padding="valid", activation="relu"
            ),
            tf.keras.layers.MaxPool2D(2, 1),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )
def model_fn():
    model = create_cnn_model()
    return tff.learning.from_keras_model(
        model,
        input_spec=element_spec,
        loss=tf.keras.losses.CategoricalCrossentropy(
            from_logits=True, reduction=tf.losses.Reduction.NONE
        ),
        metrics=[tf.keras.metrics.CategoricalAccuracy()]
    )

trainer = tff.learning.build_federated_averaging_process(
    model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02))
def evaluate(num_rounds=NUM_ROUNDS):
    state = trainer.initialize()
    for i in range(num_rounds):
        t1 = time.time()
        state, metrics = trainer.next(state, train_data)
        t2 = time.time()
        print('\n Round {r}: metrics {m}, round time {t:.2f} seconds'.format(
            m=metrics['train'], r=i, t=t2 - t1))

t1 = time.time()
evaluate(NUM_ROUNDS)
t2 = time.time()
print('Seconds:', t2 - t1, ' = Minutes:', (t2 - t1) / 60)
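On top of the training loop above, I also looked at held-out metrics with a federated evaluation along these lines (a sketch only: it assumes evaluate is changed to return its final state, and it reuses make_client and map_fn from above; tff.learning.build_federated_evaluation is, as far as I understand, the 0.22 API for this):

# Sketch: federated evaluation on held-out data.
evaluation = tff.learning.build_federated_evaluation(model_fn)

mnistFedTest = make_client(NUM_CLIENTS, X_test, y_test)
test_data = [
    mnistFedTest.create_tf_dataset_for_client(cid).batch(BATCH_SIZE).map(map_fn)
    for cid in mnistFedTest.client_ids
]

# `state` is the final server state returned by the (modified) evaluate();
# state.model holds the current model weights.
test_metrics = evaluation(state.model, test_data)
print(test_metrics)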
I've had a similar problem with other models as well, e.g. MobileNetV2 implemented in tf.keras for CIFAR-10: `model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None)`.
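For that experiment the wrapping looked roughly like this (a sketch from memory; cifar_element_spec is a placeholder for the element_spec of my CIFAR-10 client datasets). Note that MobileNetV2's default classifier_activation is 'softmax', so the loss uses from_logits=False there:

def cifar_model_fn():
    # Fresh (untrained) MobileNetV2 with a 10-class softmax head.
    model = tf.keras.applications.MobileNetV2(
        (32, 32, 3), classes=10, weights=None)
    return tff.learning.from_keras_model(
        model,
        input_spec=cifar_element_spec,  # placeholder: spec of the CIFAR-10 client data
        loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
        metrics=[tf.keras.metrics.CategoricalAccuracy()])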
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow