PySpark: Reading a TSV file, cleaning it (of null values, I guess), and saving it to a Lake folder in Parquet format
I have an assignment in which I must read TSV files (located in the datasets/imbd folder), clean them, and save them to a lake folder in Parquet format. The lake folder is not given, so I don't know whether I have to create it myself or whether they meant Delta Lake. Additionally, the data schemas in the lake folder must correctly represent the data.
So far, I've tried this code to read the TSV files:
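from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# "sep" (or its alias "delimiter") sets the tab separator; a misspelled key such as "delimeter" is silently ignored.
nameBasics_df = spark.read.option("header", "true")\
    .option("sep", "\t")\
    .option("inferSchema", "true")\
    .csv('/content/drive/MyDrive/BigData2021/Final/datasets/name.basics.tsv')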
This is the output:
Can anyone help me with this assignment, specifically with cleaning the data, saving it to the lake folder in Parquet format, and getting the schemas right?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
