Efficient way of computing a DataFrame using concat and split

I am new to Python/pandas/NumPy and I need to create the following DataFrame:

DF = pd.concat([pd.Series(x[2]).apply(lambda r: pd.Series(re.split(r'@|/', r))).assign(id=x[0]) for x in hDF])

where hDF is a DataFrame that has been created by:

hDF=pd.DataFrame(h.DF)

and h.DF is a list whose elements look like this:

['5203906',
 ['highway=primary',
  'maxspeed=30',
  'oneway=yes',
  'ref=N 22',
  'surface=asphalt'],
 ['[email protected]/42.543651',
  '[email protected]/42.543561',
  '[email protected]/42.543523',
  '[email protected]/42.543474',
  '[email protected]/42.543469']]
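For reference, a minimal, self-contained sketch of how the split produces the columns from one such record. The coordinate strings here are illustrative stand-ins for "node@lon/lat" values, since the actual node IDs and longitudes were redacted in the sample above:

```python
import pandas as pd

# One record shaped like the h.DF elements above; the coordinate
# strings are illustrative stand-ins for "node@lon/lat" values.
record = ['5203906',
          ['highway=primary', 'maxspeed=30'],
          ['111@-8.123456/42.543651',
           '222@-8.123789/42.543561']]

# Splitting each "node@lon/lat" string on '@' or '/' yields three
# columns; the way id is then attached as a fourth column.
df = (pd.Series(record[2])
        .str.split(r'@|/', expand=True)
        .assign(id=record[0]))
print(df)
```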

However, in some cases the list is very long (O(10^7) elements), and the inner lists h.DF[*][2] are also very long, so I run out of memory.

I can obtain the same result, avoiding the use of the lambda function, like so:

DF = pd.concat([pd.Series(x[2]).str.split(r'@|/', expand=True).assign(id=x[0]) for x in hDF])

But I am still running out of memory in the cases where the lists are very long.

Can you think of a possible solution to obtain the same result without exhausting memory?
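One possible direction, sketched below under the assumption that each record is a `[id, tags, coords]` list as shown above: accumulate all split fields in flat Python lists and build a single DataFrame in one shot, instead of asking `pd.concat` to copy O(10^7) small per-record frames. `build_df` is a hypothetical helper name, not part of the original code:

```python
import re
import pandas as pd

def build_df(records):
    """Sketch: collect all split fields and ids in flat lists first,
    then construct one DataFrame, avoiding the many temporary
    per-record DataFrames that pd.concat would have to copy."""
    rows, ids = [], []
    pattern = re.compile(r'@|/')  # same delimiters as in the question
    for rec in records:
        for s in rec[2]:
            rows.append(pattern.split(s))  # e.g. 'node@lon/lat' -> 3 fields
        ids.extend([rec[0]] * len(rec[2]))
    df = pd.DataFrame(rows)
    df['id'] = ids
    return df
```

For truly huge inputs the same loop could instead append chunks to a CSV or Parquet file, so the full table never has to live in memory at once.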



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
