PicklingError: Could not serialize object: ValueError: Cell is empty when training an Elephas/Keras model inside PySpark

I am new to using PySpark with Elephas and TensorFlow.

I am trying to train a deep learning model inside PySpark using the Elephas module.

My code: https://www.kaggle.com/code/profsoft/test-elephas-keras

Versions I am using:

TensorFlow version: 2.8.0

PySpark version: 3.2.0

PySpark context version: 3.2.0

Elephas version: 3.1.0

After I process my PySpark DataFrame and create my model, calling the ElephasEstimator fit function raises this error: PicklingError: Could not serialize object: ValueError: Cell is empty

    data.show(10)   # features: 20 values per row; label: a single value (0 or 1)
+--------------------+-----+
|            features|label|
+--------------------+-----+
|[2.47420874531365...|    0|
|[0.0, 0.0, 1.2032...|    0|
|[0.0, 0.0, 0.0, 0...|    0|
|[0.82473624843788...|    0|
|[0.0, 0.0, 0.0, 0...|    0|
|[1.64947249687576...|    0|
|[0.82473624843788...|    0|
|[1.64947249687576...|    0|
|[0.82473624843788...|    0|
|[0.0, 0.0, 1.2032...|    0|
+--------------------+-----+
only showing top 10 rows
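
For reference, a DataFrame in this shape can be built with pyspark.ml's VectorAssembler; this is just a minimal sketch with placeholder names (raw_df and f0..f19 are illustrative, not taken from my notebook):

from pyspark.ml.feature import VectorAssembler

# assume raw_df holds 20 numeric feature columns f0..f19 plus a label column
assembler = VectorAssembler(inputCols=[f"f{i}" for i in range(20)],
                            outputCol="features")
data = assembler.transform(raw_df).select("features", "label")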


# my model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128, input_dim=20))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation('sigmoid'))  # sigmoid, not softmax: softmax over a single unit always outputs 1.0
model.compile(loss="binary_crossentropy", optimizer="adam")


# wrapping my keras model inside an elephas estimator

opti = tf.keras.optimizers.Adam(learning_rate=0.01)  # `lr` is deprecated in tf.keras
opt_conf = tf.keras.optimizers.serialize(opti)

estimator = ElephasEstimator()
estimator.setFeaturesCol("features")
estimator.setLabelCol("label")
estimator.set_keras_model_config(model.to_json())  # model architecture serialized to JSON
estimator.set_num_workers(1)
estimator.set_verbosity(1)
estimator.set_epochs(25)
estimator.set_batch_size(32)
estimator.set_optimizer_config(opt_conf)
estimator.set_mode("synchronous")
estimator.set_loss("binary_crossentropy")
estimator.set_metrics(["acc"])



estimator.fit(data)  # the PicklingError is raised here
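
As I understand it, ElephasEstimator follows the standard pyspark.ml Estimator/Transformer pattern, so once fit works the trained model would be used roughly like this (a sketch under that assumption; I cannot test it because fit fails):

# fit returns a fitted transformer; transform adds a prediction column
fitted_model = estimator.fit(data)
predictions = fitted_model.transform(data)
predictions.show(5)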

Can anyone please help me with using Elephas together with TensorFlow and PySpark? You can look at my linked code to see how I process my DataFrame.

Thank you!


