Spark Column Pruning: How to prevent empty schema on count()?

I created a data source with the V2 API, plus a catalog for our company-internal data, and it works fine. After registering everything correctly, when I do

val df =
  spark
    .read
    .table("mycatalog.foo.bar")
    .select($"foo", $"bar")

the pruneColumns method in my scan builder correctly receives a schema containing only the required foo and bar columns, which is then pushed down to the Scan.
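For context, this is the pruning path I'm talking about: the scan builder mixes in SupportsPushDownRequiredColumns, and Spark calls pruneColumns with the columns the query plan actually needs. A minimal sketch of my setup (the anonymous Scan is simplified for illustration; the real one builds batches from our internal store):

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Simplified version of the scan builder for our internal data source.
class MyScanBuilder(fullSchema: StructType)
    extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  // Start from the full table schema; Spark narrows it via pruneColumns.
  private var requiredSchema: StructType = fullSchema

  // Spark calls this with the columns the query actually needs.
  // For `select($"foo", $"bar")` that is StructType(foo, bar);
  // for a bare count() it arrives as an empty StructType, since
  // counting rows does not need any column values.
  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  override def build(): Scan = new Scan {
    override def readSchema(): StructType = requiredSchema
  }
}
```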

However, when doing:

logInfo(s"the count of df is ${df.count()}")

the count() operation causes an empty schema to be passed to the pruneColumns method.

How do I get the correct requiredSchema BEFORE count() makes it empty? Is there another trait I have to mix into my scan builder, or somewhere else?



Sources

This question is from Stack Overflow and is licensed under CC BY-SA 3.0, following Stack Overflow's attribution requirements.