dplyr::tbl equivalent for pandas
I am currently trying to switch from R to Python. I am working with large tables for a Uni project. I load the data as Snowflake objects in R via the commands:
con <- nuvolos::get_connection()
db_mcc_desc <- dplyr::tbl(con, "Table")
"Table" is about 100 GB, so I really like that I can apply many dplyr functions to db_mcc_desc without loading it into memory. Whenever needed, I can create smaller data frames and load them into memory using collect().
However, the following pandas command immediately reads the whole table into a DataFrame and exhausts my memory quite quickly.
import pandas as pd
from nuvolos import get_connection
con = get_connection()
db_mcc_desc = pd.read_sql('SELECT * FROM "Table"', con=con)  # reads the entire table into memory
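The obvious workaround would be reading the result in chunks ("normal batching"), roughly like this sketch (the chunk size is arbitrary, and I am assuming pandas accepts the nuvolos connection directly):
import pandas as pd
from nuvolos import get_connection

con = get_connection()

# Read the table in chunks of 100,000 rows instead of all at once.
# Each chunk is an ordinary DataFrame, but this still has to walk the
# whole 100 GB table to reach the rows I actually need.
for chunk in pd.read_sql('SELECT * FROM "Table"', con=con, chunksize=100_000):
    pass  # process / filter each chunk here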
Batching like this does not really work in practice because the table is so big. Is there a similarly easy solution in pandas to what the dplyr package offers in R?
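The only alternative I can think of is to push each reduction into a hand-written SQL query and only pull the small result into memory, similar to filtering the lazy table and then calling collect(). A rough sketch (the year column and the filter value are made up purely for illustration):
import pandas as pd
from nuvolos import get_connection

con = get_connection()

# Hypothetical filter: do the reduction on the database side and only
# load the (much smaller) result into a DataFrame.
query = 'SELECT * FROM "Table" WHERE year = 2020'
small_df = pd.read_sql(query, con=con)
Hand-writing SQL for every step is possible, but it is far less convenient than chaining dplyr verbs on a lazy table.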
Thanks a lot!