How to read a DBF file in PySpark

I need to read and process a .DBF file in PySpark, but I couldn't find any library that reads it the way we read CSV, JSON, Parquet, or other files.

Please help me read this file. I'm blocked at the very first step: after creating the Spark session, how do I read the .DBF file? dbfread is a Python library for reading DBF files, but I need to read the file in PySpark, not only in plain Python.

Code:

from pyspark.sql import SparkSession
spark = (SparkSession.builder
  .master("local[*]")
  .appName("dbf-file-read")
  .getOrCreate())

Now, how do I start reading the .DBF file?



Solution 1:[1]

It seems that PySpark cannot load .dbf files directly. Try the Python "dbfread" package to read the file and convert the records to dicts. Then use spark.createDataFrame() to turn the records into a DataFrame. After that, you can apply PySpark transformations to your data (and make use of the workers).
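A minimal sketch of that approach, assuming dbfread is installed (pip install dbfread) and the file is small enough to fit in driver memory, since dbfread reads it on the driver rather than on the workers. The names read_dbf and records_to_tuples are hypothetical helpers introduced here for illustration, not part of either library. The records are passed to Spark as tuples plus the column names from the DBF header, and Spark infers the column types.

```python
def records_to_tuples(records, field_names):
    """Turn dict-like DBF records into plain tuples in a fixed column order."""
    return [tuple(record[name] for name in field_names) for record in records]


def read_dbf(spark, path, encoding=None):
    """Read a .dbf file on the driver and return a Spark DataFrame.

    Hypothetical helper: `spark` is an existing SparkSession and `path`
    must be a local path that the driver can open.
    """
    # Imported here so the helper above works without dbfread installed.
    from dbfread import DBF  # third-party package

    table = DBF(path, encoding=encoding)
    # Iterating the table yields one dict-like record per row.
    rows = records_to_tuples(table, table.field_names)
    # Column names come from the DBF header; types are inferred from the data.
    return spark.createDataFrame(rows, schema=table.field_names)
```

With the session from the question, usage would look like df = read_dbf(spark, "people.dbf") followed by df.show(), where "people.dbf" is a placeholder file name. Type inference may fail if the file has no rows or if a column contains only empty values; in that case an explicit schema can be passed to spark.createDataFrame() instead of the list of names.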

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lukas U-ski