PicklingError: Could not serialize object: ValueError: Cell is empty when training an Elephas/Keras model inside PySpark
I am new to using PySpark with Elephas and TensorFlow. I am trying to train a deep learning model inside PySpark using the Elephas module.
My code: https://www.kaggle.com/code/profsoft/test-elephas-keras
Versions I am using:
tensorflow version: 2.8.0
PySpark version: 3.2.0
PySpark context version: 3.2.0
elephas version: 3.1.0
After I process my PySpark DataFrame and create my model, when I use ElephasEstimator and call its fit function I get this error: PicklingError: Could not serialize object: ValueError: Cell is empty
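For context on the inner ValueError: in CPython (3.8+), "Cell is empty" is the message raised when something reads the contents of a closure cell that was never assigned a value, which can happen while a pickler walks a function's closure. A minimal stdlib reproduction of just that underlying error (not the full Spark failure):

```python
import types

# An empty closure cell: created but never assigned a value.
cell = types.CellType()

try:
    cell.cell_contents  # reading an empty cell raises ValueError
except ValueError as exc:
    message = str(exc)

print(message)  # -> Cell is empty
```

This suggests the pickler is traversing a function or object that closes over a variable with no value yet, somewhere inside the estimator or the objects it references.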
data.show(10)  # the features column is a list of 20 values; the label column is a single value (0 or 1)
+--------------------+-----+
| features|label|
+--------------------+-----+
|[2.47420874531365...| 0|
|[0.0, 0.0, 1.2032...| 0|
|[0.0, 0.0, 0.0, 0...| 0|
|[0.82473624843788...| 0|
|[0.0, 0.0, 0.0, 0...| 0|
|[1.64947249687576...| 0|
|[0.82473624843788...| 0|
|[1.64947249687576...| 0|
|[0.82473624843788...| 0|
|[0.0, 0.0, 1.2032...| 0|
+--------------------+-----+
only showing top 10 rows
# my model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128, input_dim=20))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(128))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
# sigmoid (not softmax) for a single-unit binary output with binary_crossentropy;
# softmax over a single unit always outputs 1.0
model.add(tf.keras.layers.Activation('sigmoid'))
model.compile(loss="binary_crossentropy", optimizer="adam")
# wrapping my keras model inside an elephas estimator
opti = tf.keras.optimizers.Adam(learning_rate=0.01)  # 'lr' is deprecated in TF 2.x
opt_conf = tf.keras.optimizers.serialize(opti)
estimator=ElephasEstimator()
estimator.setFeaturesCol("features")
estimator.setLabelCol("label")
estimator.set_keras_model_config(model.to_json())
estimator.set_num_workers(1)
estimator.set_verbosity(1)
estimator.set_epochs(25)
estimator.set_batch_size(32)
estimator.set_optimizer_config(opt_conf)
estimator.set_mode("synchronous")
estimator.set_loss("binary_crossentropy")
estimator.set_metrics(["acc"])
estimator.fit(data)  # the error is raised here!
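One generic way to narrow down a PicklingError like this is to try pickling each piece of state individually and see which one fails. A minimal sketch of that idea (the `state` dict and its keys below are hypothetical stand-ins for illustration, not the Elephas API):

```python
import pickle

def find_unpicklable(attrs):
    """Return (name, error) pairs for values that fail to pickle."""
    bad = []
    for name, value in attrs.items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            bad.append((name, repr(exc)))
    return bad

# Hypothetical stand-in for estimator state mixing plain config and a callable:
state = {
    "keras_model_config": '{"class_name": "Sequential"}',  # plain string: picklable
    "epochs": 25,                                          # int: picklable
    "callback": lambda logs: None,                         # lambda: not picklable with stdlib pickle
}

for name, err in find_unpicklable(state):
    print(name, err)
```

Applied to the real objects (e.g. the model config string, the optimizer config dict), this can reveal whether one of them drags in a live, unserializable object instead of a plain JSON-serializable config.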
Can anyone please help me with how to use Elephas with TensorFlow and PySpark? You can look at my code to see how I process my DataFrame.
Thank you!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow