Adding a new column to a daily parquet load

I load daily delta parquet files into day folders, one folder per day, e.g. .../year=2022/month=05/day=09 and .../year=2022/month=05/day=10.

Today I added one more column to the load, so from day=11 onward the new field should be present.
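For context, the daily write looks roughly like the sketch below; the DataFrame name and base path variable are placeholders rather than the exact job, but the partitionBy call is what produces the year/month/day folder layout:

# Illustrative sketch of the daily load (df_daily_delta and base_path are placeholders)
base_path = f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net/prints/dloads"
(df_daily_delta
    .write
    .mode("append")
    .partitionBy("year", "month", "day")   # creates the /year=.../month=.../day=... folders
    .parquet(base_path))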

This is what I use to read the delta parquet files across the day folders. It works fine for any previous day, but it fails for today, and I suspect this is related to the new field in today's load. Do you know how to solve this?

delta_split_delivery_folder_path = "/prints/dloads/*"

df_delta_split = spark.read.parquet(f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net{delta_split_delivery_folder_path}")

yearNo = 2022
monthNo = 5
# day=11 gives the error shown below; any previous day works fine
dayNo = 11
df_today = df_delta_split.filter("year=" + str(yearNo) + " and month=" + str(monthNo) + " and day=" + str(dayNo))

display(df_today)

the display gives this error:

UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
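For what it's worth, the two reads below are what I understand Spark offers for checking and reconciling parquet schemas that have drifted across folders; I have not confirmed that either resolves this particular dictionary error, and mergeSchema is a standard Spark parquet reader option rather than anything specific to this dataset:

# 1) Confirm the schema difference between an old day folder and today's folder
base = f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net/prints/dloads"
spark.read.parquet(f"{base}/year=2022/month=05/day=10").printSchema()   # old schema
spark.read.parquet(f"{base}/year=2022/month=05/day=11").printSchema()   # should include the new column

# 2) Re-read the whole folder tree with schema merging enabled
df_delta_split = (spark.read
    .option("mergeSchema", "true")
    .parquet(f"abfss://{marketing_container_name}@{storage_account_name}.dfs.core.windows.net{delta_split_delivery_folder_path}"))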



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
