'How to make the dataframe faster ? either by using dictionary or numpy?
I am new to data structures and I would like to make my code faster (this is just part of a bigger code). Using dataframes while looking up variables is slowing down the getting the instances. I have two daraframes as
df = pd.DataFrame({ 'id1':['A', 'B', 'C','D','J','K','H'], 'id2':[['K','J'], 'NaN',['A'],['B','C'],'NaN','NaN',['A']],'length':[2,'NaN',1,2,'NaN','NaN',1]})
where length is number of elements in the list (id2), and
df2 = pd.DataFrame({ 'col1':['B', 'C', 'A','K','J','A'], 'col2':['D', 'D', 'C','A','A','H'],'type':['FF', 'SS', 'FS','SF','FS','FF']})
the part which is slow is
import time
startTime = time.time()
key='D'
if df.loc[df['id1'] == key, 'id2'].any():
print('continue')
for pred in df[df['id1']==key]['id2'].values[0]:
reltype=df2.loc[(df2['col1'] == pred) & (df2['col2']==key), 'type'].values[0]
print(reltype)
executionTime = (time.time() - startTime)
print('Execution time in seconds: ' + str(executionTime))
What I did is to convert the df into dictionary and rewrite my code as below but I do not how to rewrite the 'reltype' part and make it faster.
df_dic=df.groupby('id1')[['id2', 'length']].apply(lambda g: g.values.tolist()).to_dict()
key='D'
if ~np.isnan(df_dic[key][0][1]):
print('continue')
for pred in df_dic[key][0][0]:
reltype=df2.loc[(df2['col1'] == pred) & (df2['col2']==key), 'type'].values[0]
print(reltype)
I am interested in making the code above faster especially the search in finding the 'reltype'. I would appreciate all suggestions. Thanks
Solution 1:[1]
I think your first implementation should be fast. There might be a small improvement if you don't repeat slicing:
for key in ['D']:
s = df.loc[df['id1'] == key, 'id2']
if s.any():
print('continue')
df_temp = df2.loc[df2['col2'] == key, :]
for pred in s.values[0]:
if pred != NaN:
reltype = df_temp.loc[df_temp['col1'] == pred, 'type'].values[0]
print(reltype)
depending on your data using df_temp might be slower than using a Boolean mask for the slicing
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
