PySpark readStream is not enforcing schema as expected
We are running a readStream with a trigger-once setup to load parquet files from an immutable log.
```python
spark.readStream.schema(SCHEMA).option('enforceSchema', "true").parquet(source_location)
```
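For context, the full pipeline looks roughly like the sketch below. The schema fields (`row_id`, `payload`) and the paths are placeholders standing in for our real ones, not the actual production values:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, StringType

spark = SparkSession.builder.appName("schema-enforcement-test").getOrCreate()

# Explicit schema we want enforced on the incoming parquet files.
SCHEMA = StructType([
    StructField("row_id", LongType(), True),
    StructField("payload", StringType(), True),
])

source_location = "/mnt/immutable-log/"          # hypothetical path
sink_location = "/mnt/bronze/table/"             # hypothetical path
checkpoint_location = "/mnt/checkpoints/table/"  # hypothetical path

stream_df = (
    spark.readStream
    .schema(SCHEMA)
    .option("enforceSchema", "true")
    .parquet(source_location)
)

# Trigger-once: process everything currently in the source, then stop.
query = (
    stream_df.writeStream
    .format("parquet")
    .option("checkpointLocation", checkpoint_location)
    .trigger(once=True)
    .start(sink_location)
)
query.awaitTermination()
```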
To test the schema enforcement, we use a parquet file in which one column has a different name from the one defined in the schema: the schema expects row_id, but the test file contains row_id_x instead.
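The mismatched test file is produced along these lines (a minimal sketch reusing the hypothetical `spark`, `source_location`, and column names from above):

```python
# Same shape of data, but the id column is named row_id_x
# instead of the row_id that SCHEMA expects.
test_df = spark.createDataFrame(
    [(1, "a"), (2, "b")],
    ["row_id_x", "payload"],
)
test_df.write.mode("append").parquet(source_location)
```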
I would expect a PySpark error because the schema differs from the file being loaded. However, the code runs without any error or message. Is this expected behavior, or is something else happening?
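To see what the stream actually produces per micro-batch (rather than relying on an exception), one option is to dump each batch's schema and a few rows via foreachBatch. This is a debugging sketch using the same hypothetical `SCHEMA`, `spark`, and paths as above:

```python
def inspect_batch(batch_df, batch_id):
    # Print the schema and a sample so the column handling is visible per batch.
    print(f"batch {batch_id}")
    batch_df.printSchema()
    batch_df.show(5, truncate=False)

debug_query = (
    spark.readStream
    .schema(SCHEMA)
    .option("enforceSchema", "true")
    .parquet(source_location)
    .writeStream
    .foreachBatch(inspect_batch)
    .option("checkpointLocation", "/mnt/checkpoints/debug/")  # hypothetical path
    .trigger(once=True)
    .start()
)
debug_query.awaitTermination()
```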
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow