Problems with append to output in PySpark
I have a problem with the output file. Every time I run my program, it needs to append an answer to the output as a new row in string format (output example: 1,5,14,45,99). My task is checked automatically by a program invoked like this:
PYSPARK_PYTHON=/opt/conda/envs/dsenv/bin/python spark-submit \
--master yarn \
--name checker \
projects/3/shortest_path.py 12 34 /datasets/twitter/twitter.tsv hw3_output
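For clarity, the positional arguments of that command arrive inside the script via sys.argv; a small sketch of the mapping (the variable names here are only illustrative):

import sys

# For the spark-submit invocation above, sys.argv is:
#   [script path, '12', '34', '/datasets/twitter/twitter.tsv', 'hw3_output']
_, arg1, arg2, dataset_path, output_path = sys.argv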
This run produces an output file with only one row, but in my local notebook it works correctly even across several runs. Here is the part of my program that writes the output:
output = sys.argv[4]  # output path (the last command-line argument, hw3_output above)
d = [[answer]]        # single row, single column holding the answer string
df_out = spark.createDataFrame(data=d)
df_out.write.format("csv").options(delimiter='\n').mode('append').save(output)
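Reading the output path back between runs shows what has accumulated there; a minimal sketch, assuming the same spark session and output path as above:

# Sketch only (not part of the submitted program): inspect the accumulated output.
df_check = spark.read.format("csv").load(output)
print(df_check.count())          # number of rows written so far
df_check.show(truncate=False)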
Can you please suggest a way to modify my program, or point out what is going wrong?
I have tried changing the options passed to .save in dozens of combinations.
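One hypothetical variation of the save call, only to illustrate the kind of changes attempted (coalescing to a single partition before appending; this is an assumption, not a quote of an actual attempt):

# Hypothetical variation: force a single partition so each run adds one part-file.
df_out.coalesce(1).write.format("csv").mode("append").save(output)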
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.
