Does Pandas have a dataframe length limit?
I want to build a system that loads large amounts of data into pandas for analysis, and later writes the results back to .parquet files.
When I test this with a simple example, it looks as if there is some kind of built-in limit on the number of rows:
import pandas as pd
# Create file with 100 000 000 rows
contents = """
Tommy;19
Karen;20
"""*50000000
open("person.csv","w").write(
f"""
Name;Age
{contents}
"""
)
print("Test generated")
df = pd.read_csv("person.csv",delimiter=";")
len(df)
This returns 10 000 000, not 100 000 000.
Solution 1:[1]
Change the method used to create the file: I think you have too many blank rows, and you don't close your file properly (no context manager or explicit close() call), so part of the buffered output never reaches disk:
# Create file with 100 000 000 rows
contents = """\
Tommy;19
Karen;20
"""*50000000
with open('person.csv', 'w') as fp:
    fp.write('Name;Age\n')
    fp.write(contents)
Read the file:
df = pd.read_csv('person.csv', delimiter=';')
print(df)
# Output
Name Age
0 Tommy 19
1 Karen 20
2 Tommy 19
3 Karen 20
4 Tommy 19
... ... ...
99999995 Karen 20
99999996 Tommy 19
99999997 Karen 20
99999998 Tommy 19
99999999 Karen 20
[100000000 rows x 2 columns]
Solution 2:[2]
I don't think there is a hard limit, but there is a limit to how much it can process at a time. You can work around that by making your code more efficient.
Currently I am working with around 1-2 million rows without any issues.
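The "how much it can process at a time" idea can be sketched with read_csv's chunksize parameter, which yields the file in pieces so only one chunk has to fit in memory at once. The file name and chunk size below are purely illustrative:

```python
import pandas as pd

# Write a small semicolon-delimited test file (10 data rows).
with open("person.csv", "w") as fp:
    fp.write("Name;Age\n")
    fp.write("Tommy;19\nKaren;20\n" * 5)

# With chunksize, read_csv returns an iterator of DataFrames
# instead of one big frame.
total = 0
for chunk in pd.read_csv("person.csv", delimiter=";", chunksize=4):
    total += len(chunk)

print(total)  # 10
```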
Solution 3:[3]
The main bottleneck is your memory: pandas uses NumPy under the hood, so the whole DataFrame has to fit in RAM. You can fit 100M rows as long as your computer has enough memory for them.
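Since RAM is the limiting factor, per-column dtypes matter a lot. A quick way to compare is DataFrame.memory_usage(deep=True); the dtype choices below are illustrative, not a recommendation for every dataset:

```python
import pandas as pd
import numpy as np

rows = 1000
names = ["Tommy", "Karen"] * (rows // 2)
ages = [19, 20] * (rows // 2)

# Default dtypes: Python-object strings and 64-bit integers.
df_default = pd.DataFrame({"Name": names, "Age": ages})

# Compact dtypes: categorical strings and 8-bit integers.
df_small = pd.DataFrame({
    "Name": pd.Series(names, dtype="category"),
    "Age": np.array(ages, dtype=np.int8),
})

# deep=True counts the actual bytes of the string objects too.
print(df_default.memory_usage(deep=True).sum())
print(df_small.memory_usage(deep=True).sum())
```

On repetitive data like this, the categorical/int8 version is dramatically smaller, which directly raises the row count that fits in memory.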
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 | nobcoders |
| Solution 3 | peerpressure |
