Build dataframe from very large string in Python
I'm receiving a very large amount of data over an HTTP REST connection. The data arrives as a string in which each line holds 3 fields separated by commas (","); see the example below. I'm trying to build a dataframe from this string and have tried two different approaches, but neither of them works for really large payloads (800 MB or more). From what I've researched, I think the process is running out of memory. How can I read the string in chunks, or is there some other way to create the dataframe that copes with this much data?
EDIT 1: I've verified that the response is received successfully: printing response.content.decode('utf-8-sig') works correctly.
EDIT 2: I've updated the code to try requests.get(url, stream=True), as suggested by JonSG, but I'm still having problems dealing with the strings. See approach 3.
Approach number 1:
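import io
import pandas as pd
import requests

response = requests.get(
    URL + URL_SINGLE_FAST_STREAM,
    verify=VERIFY_HEADER,
    headers=AUTH_HEADER,
)
# Prepend the header row, then parse the whole decoded body at once
data_frame = pd.read_csv(io.StringIO("UnixTimestamp (us),Current,Vibration\n" + response.content.decode("utf-8-sig")))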
Approach 1 error, printed by the WSL console after some time stuck on io.StringIO: Killed
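A note on approach 1: "Killed" in a Linux console usually means the kernel's out-of-memory killer stopped the process. That would be consistent with the code above, which holds the raw bytes, the decoded string, the concatenated string, and the StringIO copy in memory at the same time. Below is a minimal sketch of one alternative, assuming the body can be exposed as a binary file-like object: pandas reads it in fixed-size row chunks with explicit dtypes, so no full-size string is ever built. The helper name read_csv_in_chunks, the dtypes, and the io.BytesIO stand-in are assumptions for illustration. With requests, response.raw from a stream=True request is one candidate for such an object (setting response.raw.decode_content = True first is assumed to be needed if the server compresses the body).

```python
import io

import pandas as pd

COLUMNS = ["UnixTimestamp (us)", "Current", "Vibration"]
# Assumed dtypes: a 64-bit timestamp and 32-bit readings; adjust to the real value ranges.
DTYPES = {"UnixTimestamp (us)": "int64", "Current": "int32", "Vibration": "int32"}


def read_csv_in_chunks(byte_stream, rows_per_chunk=1_000_000):
    """Parse a header-less CSV byte stream chunk by chunk and return one DataFrame."""
    reader = pd.read_csv(
        byte_stream,
        names=COLUMNS,          # the payload has no header row
        header=None,
        dtype=DTYPES,           # numeric columns instead of Python strings
        encoding="utf-8-sig",   # tolerate a leading byte-order mark
        chunksize=rows_per_chunk,
    )
    # Each item produced by the reader is a DataFrame of at most rows_per_chunk rows.
    return pd.concat(list(reader), ignore_index=True)


# Stand-in for the HTTP body (hypothetical sample; a real call would pass the stream).
sample = io.BytesIO(b"\xef\xbb\xbf1354813,1,10\r\n1355658,7,60\r\n1355813,9,52\r\n")
df = read_csv_in_chunks(sample, rows_per_chunk=2)
print(df)
```

pd.concat still needs room for the final frame plus the parsed chunks, so this lowers the memory peak rather than removing it. If even the final frame does not fit, each chunk could be processed and discarded inside the loop instead.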
Approach number 2:
response = requests.get(
    URL + URL_SINGLE_FAST_STREAM,
    verify=VERIFY_HEADER,
    headers=AUTH_HEADER,
)
lst_str = response.content.decode("utf-8-sig").splitlines()
# Fields are comma-separated, so split on "," (a bare split() splits on whitespace)
lst = [element.split(",") for element in lst_str]
cycle_fast_stream_df = pd.DataFrame(
    lst, columns=["UnixTimestamp (us)", "Current", "Vibration"]
)
Approach 2 error, after some time stuck on splitlines(): Killed
Approach number 3:
response = requests.get(
    URL + URL_SINGLE_FAST_STREAM,
    verify=VERIFY_HEADER,
    headers=AUTH_HEADER,
    stream=True,
)
for chunk in response.iter_content():
    s = chunk.decode("utf-8-sig")
    print(repr(s))
    # The printed output can be seen below
    # How do I deal with these strings and add them line by line to the dataframe?
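One way to sidestep the chunk boundaries in approach 3, sketched here under assumptions: write the raw byte chunks to a temporary file in arrival order and let pandas parse the file afterwards. Because the bytes land on disk in sequence, it does not matter where a chunk is cut, and memory use during the download stays at about one chunk. The helper name spool_to_disk and the fake_chunks list are hypothetical stand-ins; in the real code the chunks would come from response.iter_content(chunk_size=...) with a chunk size larger than the default.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["UnixTimestamp (us)", "Current", "Vibration"]


def spool_to_disk(chunks):
    """Write byte chunks to a temporary file in arrival order and return its path."""
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        for chunk in chunks:
            tmp.write(chunk)  # no decoding: records split across chunks are rejoined on disk
        return tmp.name


# Stand-in for the streamed response; note the records are cut mid-field on purpose.
fake_chunks = [b"1354813,1,1", b"0\r\n13556", b"58,7,60\r\n1355813,9,52\r\n"]

path = spool_to_disk(fake_chunks)
try:
    # The file has no header row, so the column names are supplied explicitly.
    df = pd.read_csv(path, names=COLUMNS, header=None, encoding="utf-8-sig")
finally:
    os.remove(path)  # clean up the temporary file

print(df)
```

For a file too large to parse in one call, pd.read_csv also accepts a chunksize argument and then yields one DataFrame per batch of rows. Appending rows to a DataFrame one at a time is best avoided either way, since each append copies the existing data.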
OS: Ubuntu on WSL
PC specs: Intel Core i7 (8th gen), 8 GB RAM
String received example:
1354813,1,10
1355658,7,60
1355813,9,52
1356813,20,44
1357813,17,70
1358813,2,16
1359813,4,21
...
1374813,13,24
String received on approach 3 (small part of it):
'2,0\r\n244'
'667'
'96,18'
'73,1'
'123'
'\r\n2446'
'7'
'285'
',187'
'2,4095\r\n2446'
'777'
'3,1'
'877,4095\r\n244'
'682'
'61,'
'1877,4095\r\n24468'
'750'
',187'
'8,1'
'5'
'18\r\n24'
'4'
'6'
'9'
'238'
',187'
'7,0\r\n244'
'6972'
'6,18'
'80,'
'0\r\n244'
'702'
'14,18'
'75,'
'4095\r\n244'
'707'
'03,18'
'72,4095\r\n244'
'7'
'1'
'191'
',187'
'3,1'
'136'
'\r\n24471'
'679'
',1873,'
'0\r\n2447'
'2'
'167'
',1873,0\r\n24472'
'656,1872,'
'310'
'2\r\n244'
'7'
'314'
'4,18'
'83,'
'4095\r\n2447'
'363'
'2,187'
'3,0\r\n2447'
'412'
'1,187'
'2,0\r\n244'
'7460'
'9,18'
'65,'
'4095\r\n2447'
'5'
'097,18'
'61,'
'4095\r\n24475'
'585,18'
'60,1'
'978'
'\r\n2447'
'6'
'074,18'
'57,'
'0\r\n2447'
'656'
'2,18'
'56,1'
'792'
'\r\n2447'
'7050'
',1856,'
'4095\r\n2447'
'753'
'9,18'
'53,4095\r\n244'
'780'
'27,18'
'49,'
'0\r\n2447'
'851'
'5,18'
'55,'
'0\r\n2447'
'9003'
',185'
'3,1'
'55\r\n244'
'7949'
'2,185'
'1,4095\r\n2447'
'9980'
',185'
'2,4095\r\n244'
'804'
'68,'
'185'
'0,4095\r\n244'
'8'
'095'
'7,18'
'55,0\r\n244'
'8144'
'5,185'
'3,0\r\n244'
'8'
'1933'
',185'
'6,7'
'4'
'5\r\n2448'
'242'
'1,18'
'57,'
'4095\r\n2448'
'291'
'0,185'
'6,4095\r\n244'
'833'
'98,185'
'8,0\r\n244'
'8'
'3886'
',185'
'7,0\r\n244'
'843'
'75,'
'186'
'2,31'
'30\r\n24'
'484'
'863,'
'186'
'6,4095\r\n2448'
'535'
'1,18'
'72,'
'292'
'6\r\n244'
'858'
'39,18'
'71,0\r\n244'
'863'
'28,187'
'2'
','
'5'
'67\r\n244'
'868'
'16,18'
'72,'
'1'
'807'
'\r\n2448'
'730'
'4,18'
'72,'
'4095\r\n2448'
'779'
'2,18'
'75,'
'337'
'1\r\n244'
'882'
'81,187'
'8,0\r\n2448'
'876'
'9,18'
'81,'
'0\r\n2448'
'925'
'7,18'
'78,'
'4095\r\n2448'
'9746'
',18'
'82,1'
'550'
'\r\n244'
'902'
'34,187'
'7,0\r\n244'
'9'
'0722'
',187'
'2,4095\r\n244'
'9121'
'0,18'
'74,'
'4095\r\n24491'
'699'
',187'
'2,0\r\n244'
'9218'
'7,187'
'2,0\r\n24492'
'675'
',187'
'3,4095\r\n244'
'9316'
'4,18'
'82,'
'282'
'2\r\n244'
'9'
'3652'
',18'
'73,'
'0\r\n2449'
'4140'
',187'
'2,0\r\n2449'
'4628'
',18'
'60,'
'266'
'7\r\n244'
'9'
'5117'
',186'
'1,4'
'50\r\n2449'
'5605'
',18'
'56,'
'383'
'2\r\n2449'
'609'
'3,18'
'56,'
'142'
'1\r\n244'
'965'
'82,'
'185'
'5,1'
'0'
'55\r\n244'
'9707'
'0,18'
'55,'
'3125'
'\r\n24497'
'558'
',185'
'2,0\r\n2449'
'8046'
',185'
'4,31'
'2'
'0\r\n24498'
'535'
',185'
'5,4095\r\n244'
'9902'
'3,185'
'1,4'
'095\r\n24499'
'511'
',1851,'
'0\r\n24'
'500'
'000'
',185'
'3,0\r\n24'
'500'
'488'
',185'
'5,7'
'82\r\n24'
'500'
'976,185'
'3,4095\r\n24'
'501'
'464'
',185'
'6,4095\r\n24'
'501'
'953,185'
'7,0\r\n24'
'5'
'0'
'244'
'1,185'
'6,0\r\n24'
'502'
'929,185'
'7,34'
'5\r\n2450'
'3417'
',1857,'
'4095\r\n2450'
'390'
'6,18'
'61,4'
'095\r\n2450'
'439'
'4,18'
'69,'
'0\r\n2450'
'488'
'2,18'
'72,0\r\n24'
'505'
'371'
',1872,'
'386'
'2\r\n2450'
'585'
'9,18'
'73,'
'194'
'2\r\n2450'
'6'
'347,18'
'72,'
'7'
'95\r\n2450'
'683'
'5,18'
'73,'
'4095\r\n2450'
'7'
'324,18'
'78,'
'0\r\n2450'
'781'
'2,187'
'5,0\r\n24'
'5'
'083'
'00,18'
'80,'
'640'
'\r\n24508'
'789,18'
'79,'
'640\r\n2450'
'9'
'277'
',187'
'8,3'
'951'
'\r\n24509'
'765'
',187'
'9,4095\r\n24'
'510'
'253,18'
'72,'
'4095\r\n24510'
'742,187'
'3,4095\r\n245'
'1123'
'0,18'
'72,4095\r\n2451'
'171'
'8,18'
'72,'
'0\r\n245'
'1'
'2'
'207,1873,'
'0\r\n24512'
'695,18'
'82,1'
'386'
'\r\n2451'
'318'
'3,187'
'3'
','
'6'
'8\r\n24'
'513'
'671,187'
'0,4095\r\n24'
'5'
'141'
'60,'
'186'
'3,4095\r\n2451'
'464'
'8,18'
'59,'
'0\r\n2451'
'5136'
',1859,'
'977\r\n24'
'515'
'625'
',185'
'6,0\r\n245'
'1'
'6'
'113'
',185'
'5,4095\r\n24'
'516'
'601'
',185'
'4,4095\r\n24'
'517'
'089'
',185'
'2,2'
'415'
'\r\n24517'
'578'
',18'
'49,0\r\n24'
'518'
'066'
',18'
'54,'
'0\r\n24518'
'554'
',1854,'
'145'
'6\r\n2451'
'9'
'042'
',18'
'49,0\r\n2451'
'953'
'1,1854,'
'2'
'213'
'\r\n245'
'2'
'0'
'019'
',1854,'
'4095\r\n245'
'2050'
'7,18'
'56,'
'4095\r\n24520'
'996'
',185'
'3,0\r\n245'
'214'
'84,185'
'5,0\r\n245'
'2197'
'2,18'
'56,4095\r\n2452'
'246'
'0,18'
'55,'
'4'
'0'
'95\r\n24522'
'949'
',185'
'7,0\r\n245'
'234'
'37,185'
'8,0\r\n245'
'2'
'3925'
',18'
'65,1'
'239'
'\r\n2452'
'441'
'4,18'
'69,'
'4095\r\n2452'
'490'
'2,18'
'72,'
'4095\r\n2452'
'539'
'0,18'
'72,'
'259'
'4\r\n24'
'525'
'878,'
'1872,'
'0\r\n2452'
'636'
'7,18'
'75,0\r\n245'
'2'
'6'
'855,'
'187'
'3,4095\r\n245'
'273'
'43,18'
'77,'
'4095\r\n24527'
'832'
',187'
'6,3'
'295'
'\r\n2452'
'832'
'0,187'
'9,0\r\n245'
'2'
'880'
'8,187'
'8,0\r\n2452'
'929'
'6,18'
'79,'
'4095\r\n2452'
'978'
'5,187'
'9,4095\r\n245'
'3027'
'3,187'
'2,4095\r\n245'
'3'
'0'
'761,187'
'4,9'
'3'
'4\r\n245'
'312'
'50,'
'187'
'3,0\r\n245'
'3'
'1'
'738,187'
'3,0\r\n2453'
'222'
'6,187'
'5,0\r\n245'
'327'
'14,18'
'78,'
'4095\r\n2453'
'320'
'3,187'
'3,4095\r\n245'
'33691'
',187'
'0,0\r\n24'
'5'
'3417'
'9,18'
'63,'
'0\r\n24534'
'667'
',186'
'1,1'
'300'
'\r\n2453'
'5156'
',186'
'0,0\r\n2453'
'5644'
',18'
'56,1'
'305'
'\r\n2453'
'613'
'2,18'
'55,'
'4095\r\n2453'
'6621'
',185'
'7,3'
'998'
'\r\n2453'
'710'
'9,18'
'51,0\r\n245'
'375'
'97,18'
'50,'
'0\r\n2453'
'808'
'5,18'
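The fragments above are cut at arbitrary positions, so a record only becomes usable once its closing "\r\n" has arrived. A minimal sketch of how they could be stitched back into complete lines, using the first ten fragments from the output above as input: keep a buffer, split it on "\r\n", and carry the unfinished tail over to the next fragment. The function name iter_complete_lines is hypothetical, and the sketch assumes plain ASCII text as in the sample.

```python
def iter_complete_lines(fragments):
    """Yield lines from text fragments that were split at arbitrary positions."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        # Everything before the last "\r\n" is complete; the tail waits for more data.
        *lines, buffer = buffer.split("\r\n")
        yield from lines
    if buffer:
        yield buffer  # whatever remains when the stream ends


# The first ten fragments printed by approach 3.
fragments = [
    '2,0\r\n244', '667', '96,18', '73,1', '123',
    '\r\n2446', '7', '285', ',187', '2,4095\r\n2446',
]

lines = list(iter_complete_lines(fragments))

# The excerpt starts and ends mid-record, so keep only lines with all 3 fields.
rows = [tuple(int(field) for field in line.split(","))
        for line in lines if line.count(",") == 2]
print(rows)
```

Rows collected this way could be gathered in batches and turned into DataFrames batch by batch, rather than being added to a dataframe one line at a time.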
Thanks in advance for any help given
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
