Difference between table and datasets API in Arrow
From the documentation, I understand that Arrow provides the datasets API to deal with data that is larger than memory. Both APIs support automatic predicate/projection pushdown (which already helps with larger-than-memory data, since only what is needed is read) and can read partitioned files. The table API ships with a lot of compute functions, but the datasets API does not.
But I am trying to understand the real difference between working with the datasets API and the table API. Datasets can read multiple files while tables can't. Is that all? Also, if there is no big difference, why do tables and datasets exist as two separate entities? Or will they be merged into a unified element in the future?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
