How does Spark process XML files?
How does Spark process XML files in a distributed manner? An XML file is not splittable, right? Will it be processed by only a single node? I'm a little bit confused; it would be helpful if someone could help me with this question. Thanks in advance.
Solution 1:[1]
You're right: just reading an XML file can be a bottleneck (a single node), depending on where you are reading it from (HDFS, S3, or some other file system). However, once you have loaded such a file into a DataFrame, you can apply all the transformations and actions that the DataFrame API provides, and that data is then processed in a distributed manner.
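To make the two phases concrete, here is a toy, single-machine sketch in plain Python. It is not Spark and not how any Spark XML reader is implemented. One reader parses the whole XML document into records (the potential bottleneck), the records are split into partitions, and each partition is transformed by a separate worker. The sample data, the `book` row tag, and the helper names (`read_records`, `partition`, `transform`, `run`) are all hypothetical, and a thread pool merely stands in for Spark executors.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample document; <book> plays the role of the row tag.
XML_DATA = """<catalog>
  <book id="1"><title>A</title><price>10.0</price></book>
  <book id="2"><title>B</title><price>25.5</price></book>
  <book id="3"><title>C</title><price>7.25</price></book>
  <book id="4"><title>D</title><price>40.0</price></book>
  <book id="5"><title>E</title><price>12.0</price></book>
</catalog>"""


def read_records(xml_text, row_tag):
    """Phase 1 (potential bottleneck): a single reader parses the whole document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(el.get("id")),
            "title": el.findtext("title"),
            "price": float(el.findtext("price")),
        }
        for el in root.iter(row_tag)
    ]


def partition(records, num_partitions):
    """Phase 2: split the parsed records into chunks, round-robin."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def transform(part):
    """Per-partition work: keep books priced above 10.0, project (id, title)."""
    return [(r["id"], r["title"]) for r in part if r["price"] > 10.0]


def run(xml_text, row_tag="book", num_partitions=2):
    parts = partition(read_records(xml_text, row_tag), num_partitions)
    # Each partition goes to its own worker, loosely like tasks on executors.
    # Threads only illustrate the structure here; they give no real CPU speedup.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(transform, parts)
    # Flatten and sort so the output does not depend on the partitioning.
    return sorted(row for part in results for row in part)


if __name__ == "__main__":
    print(run(XML_DATA))
```

In Spark itself, the analogous read is the XML data source (for example the spark-xml package) with a `rowTag` option; if the loaded DataFrame ends up with only a few partitions, `df.repartition(n)` is one way to spread the later transformations across the cluster.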
For more information about reading and writing XML, see: https://sparkbyexamples.com/spark/spark-read-write-xml/
If you are interested in similar issues regarding Spark, please visit my blog: https://bigdata-etl.com/tag/apache-spark/
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Paweł Cieśla |
