PySpark: how to load a specific file into a DataFrame, selected by date from a list of files
I'm trying to load a specific file from a group of files.

Example: I have files in HDFS named in the format `app_name_date.csv`, hundreds of them in one directory. I want to load a single CSV file into a DataFrame based on its date.

I tried:

```
dataframe1 = spark.read.csv("hdfs://XXXXX/app/app_name_+$currentdate+.csv")
```

but it throws an error, because `$currentdate` is never substituted into the string, so the literal path does not exist:

```
pyspark.sql.utils.AnalysisException: Path does not exist: hdfs://XXXXX/app/app_name_+$currentdate+.csv
```

Any idea how to do this in PySpark?
Solution 1:[1]
You can build the path by formatting the date into the string:

```python
from datetime import date

formatted = date.today().strftime("%d/%m/%Y")
f"hdfs://XXXXX/app/app_name_{formatted}.csv"
# Out[25]: 'hdfs://XXXXX/app/app_name_02/03/2022.csv'
```
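As a minimal sketch of how this could be put together: the helper below builds the path for a given date and could then be passed to `spark.read.csv`. Note that the `build_path` function, the `%Y-%m-%d` format, and the default arguments are assumptions for illustration, not from the original answer; a format containing `/` (as in `%d/%m/%Y`) would be interpreted as directory separators in an HDFS path, so only use it if your files are actually laid out in date subdirectories.

```python
from datetime import date

def build_path(d=None, base="hdfs://XXXXX/app", app="app_name"):
    """Build the HDFS path for the CSV of a given date.

    Uses a dash-separated date format so the date stays part of the
    filename rather than creating directory levels.
    """
    d = d or date.today()  # default to today's file
    return f"{base}/{app}_{d.strftime('%Y-%m-%d')}.csv"

# The resulting string is then passed to the reader, e.g.:
# dataframe1 = spark.read.csv(build_path(), header=True)
```

For example, `build_path(date(2022, 3, 2))` yields `'hdfs://XXXXX/app/app_name_2022-03-02.csv'`.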
Solution 2:[2]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | vmartin |
| Solution 2 | Anand Satheesh |

