Spark Column Pruning: How to prevent empty schema on count()?
I created a data source with the DataSource V2 API, plus a catalog for our company-internal data, and both work fine. With everything registered correctly, running

```scala
val df = spark
  .read
  .table("mycatalog.foo.bar")
  .select($"foo", $"bar")
```

causes the `pruneColumns` method in my scan builder to correctly receive a schema containing only the required `foo` and `bar` columns, which is then pushed down to the `Scan`.
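For context, this is the shape of the pruning callback being described: Spark hands the scan builder a `StructType` containing only the columns the query needs, and the builder records it. The following is a self-contained toy sketch of that contract using plain Scala stand-ins (hypothetical `StructField`/`StructType`/`MyScanBuilder`, not the real `org.apache.spark.sql.connector` classes), so the behavior can be illustrated without a Spark installation:

```scala
// Toy stand-ins for Spark's schema classes (for illustration only).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField]) {
  def fieldNames: Seq[String] = fields.map(_.name)
}

// Sketch of a scan builder that records the pruned schema it receives,
// mirroring the role of pruneColumns in SupportsPushDownRequiredColumns.
class MyScanBuilder(fullSchema: StructType) {
  private var requiredSchema: StructType = fullSchema

  // Spark calls this with only the columns the query actually needs.
  def pruneColumns(required: StructType): Unit = {
    requiredSchema = required
  }

  // The Scan built from this builder would read only requiredSchema.
  def readSchema(): StructType = requiredSchema
}

object PruneDemo extends App {
  val full = StructType(Seq(
    StructField("foo", "string"),
    StructField("bar", "int"),
    StructField("baz", "double")
  ))

  val builder = new MyScanBuilder(full)
  // A select($"foo", $"bar") prunes the schema down to those two columns.
  builder.pruneColumns(StructType(Seq(
    StructField("foo", "string"),
    StructField("bar", "int")
  )))
  println(builder.readSchema().fieldNames.mkString(","))  // foo,bar
}
```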
However, when doing:

```scala
logInfo(s"the count of df is ${df.count()}")
```

the `count()` operation causes an empty schema to be passed to the `pruneColumns` method.
How do I get the correct `requiredSchema` before `count()` empties it? Is there another trait I have to mix into my scan builder, or somewhere else?
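To make the observed behavior concrete: `count()` only needs the number of rows, not any column values, so an empty required schema is at least a representable state for a scan. The toy sketch below (hypothetical `ToyScan` and schema classes, not real Spark APIs) shows that a scan given an empty schema can still yield the correct row count, with each row carrying zero columns:

```scala
// Toy schema stand-ins (not the real Spark classes).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])

// Hypothetical scan: for each source row, emit only the values of the
// columns in the pruned (required) schema.
class ToyScan(data: Seq[Map[String, Any]], requiredSchema: StructType) {
  def rows: Seq[Seq[Any]] =
    data.map(row => requiredSchema.fields.map(f => row(f.name)))
}

object CountDemo extends App {
  val data = Seq(
    Map[String, Any]("foo" -> "a", "bar" -> 1),
    Map[String, Any]("foo" -> "b", "bar" -> 2)
  )

  // A count-style query prunes to an empty schema: rows carry no values,
  // but the number of rows is still correct.
  val scan = new ToyScan(data, StructType(Nil))
  println(scan.rows.length)             // 2
  println(scan.rows.forall(_.isEmpty))  // true
}
```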
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow