PySpark readStream is not enforcing schema as expected

We are running a readStream with a trigger-once setup to load Parquet files from an immutable log.

spark.readStream.schema(SCHEMA).option('enforceSchema', "true").parquet(source_location)

To test the schema enforcement, we have a Parquet file in which one column has a different name from the one defined in the schema. The schema expects row_id, but the test file provides row_id_x.

I would expect a PySpark error because the schema differs from that of the file being loaded. However, the code runs without any error or message. Is this expected behavior, or is something else happening?
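For context, a hedged note: as far as I understand, Spark resolves Parquet columns by name and fills nullable columns that are missing from a file with nulls, and enforceSchema is documented as a CSV reader option, so a renamed column would not be expected to raise on its own. If a hard failure is wanted, one option is to compare column names explicitly before starting the stream. The sketch below is only an illustration under those assumptions; the helper names missing_columns and assert_file_matches_schema, and the column name value, are hypothetical and not part of the original setup.

```python
from typing import Iterable, List


def missing_columns(expected: Iterable[str], actual: Iterable[str]) -> List[str]:
    """Return the expected column names that are absent from the file, in order."""
    actual_set = set(actual)
    return [name for name in expected if name not in actual_set]


def assert_file_matches_schema(spark, path, schema):
    """Raise ValueError if the Parquet data at `path` lacks a column named in `schema`.

    Assumes `spark` is a SparkSession and `schema` is a StructType.
    A batch read is used only to obtain the column names stored in the file.
    """
    file_columns = spark.read.parquet(path).columns
    missing = missing_columns(schema.fieldNames(), file_columns)
    if missing:
        raise ValueError(f"{path} is missing expected columns: {missing}")


# Pure-Python illustration of the comparison ("value" is a made-up column name).
print(missing_columns(["row_id", "value"], ["row_id_x", "value"]))
```

Calling assert_file_matches_schema(spark, source_location, SCHEMA) before spark.readStream would turn the silent mismatch into an explicit error, at the cost of one extra metadata read.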



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
