Optimising the alignment of two pandas DataFrames in Python
I have two large pandas DataFrames. The first is a set of queries that I have stored in Google BigQuery, and the second is a result set that I get from the Google Search Console (GSC). Before I store the result set in its table, I want to do two things: first, find the queries in the result set that are new and store them in the queries table; second, add the query_id from the queries table to the GSC result set. My current code is shown below. It works, but I am fairly new to Python and think there are ways to make it faster.
x = response['rows']
df = pd.DataFrame.from_dict(x)
# split the keys list into columns
df[['query','page']] = pd.DataFrame(df['keys'].values.tolist(), index= df.index)
# Drop the key columns
result = df.drop(['keys'],axis=1)
# Add the run_id
result.insert(0, 'run_id', run_id)
# Add the project_id
result.insert(1, 'project_id', project_id)
# Add the page_id
result.insert(2, 'page_id', 0)
# Add the query_id
result.insert(3, 'query_id', 0)
global queries_table_df
queries_table_changed = False
new_queries_table_df = pd.DataFrame(columns = ['id', 'query'])
# check queries
# TODO This is the main part which should be optimized
for query in result.get('query'):
    #print(query)
    df = queries_table_df.loc[queries_table_df['query'] == query]
    #print(df)
    if df.empty:
        id_last = int(queries_table_df.iloc[-1:]['id'])
        id_last = id_last + 1
        df2 = {'id': id_last, 'query': query}
        queries_table_df = queries_table_df.append(df2, ignore_index=True)
        new_queries_table_df = new_queries_table_df.append(df2, ignore_index=True)
        queries_table_changed = True
        #print(queries_table_df)
        result.loc[result['query'] == query, ['query_id']] = id_last
    else:
        id = int(df.iloc[0]['id'])
        result.loc[result['query'] == query, ['query_id']] = id
# if queries table has changed store in DB
if queries_table_changed:
    queries_table.update_queries_table(new_queries_table_df)
Thank you for your help. Best regards Michael
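One possible way to speed up the loop in the question is to replace the per-row filtering with set-based pandas operations. This is only a minimal sketch, not a tested drop-in replacement. The function name `assign_query_ids` is hypothetical, and it assumes that `id` is an integer column and that new ids may continue from the current maximum `id` (the code in the question continues from the last row instead). It also uses `pd.concat`, since `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


def assign_query_ids(result, queries_table_df):
    """Hypothetical helper: return (result with query_id, updated table, new rows)."""
    # Unique queries from the result set that the queries table does not know yet
    unique_queries = result["query"].drop_duplicates()
    new_queries = unique_queries[~unique_queries.isin(queries_table_df["query"])]
    new_queries = new_queries.reset_index(drop=True)

    # Assumption: new ids continue from the highest existing id (1 if the table is empty)
    next_id = (int(queries_table_df["id"].max()) + 1) if len(queries_table_df) else 1
    new_rows = pd.DataFrame({
        "id": pd.Series(range(next_id, next_id + len(new_queries)), dtype="int64"),
        "query": new_queries,
    })

    # pd.concat stands in for the DataFrame.append calls in the question
    updated = pd.concat([queries_table_df, new_rows], ignore_index=True)

    # Look up all ids in one pass instead of filtering the table once per row
    id_by_query = updated.drop_duplicates("query").set_index("query")["id"]
    out = result.copy()
    out["query_id"] = out["query"].map(id_by_query).astype("int64")
    return out, updated, new_rows


# Small made-up example data, only to show the call
queries_table_df = pd.DataFrame({"id": [1, 2], "query": ["a", "b"]})
result = pd.DataFrame({"query": ["b", "c", "a", "c", "d"], "clicks": [1, 2, 3, 4, 5]})
result, queries_table_df, new_queries_table_df = assign_query_ids(result, queries_table_df)
```

With this shape, the database write from the question would only be needed when `new_queries_table_df` is not empty, for example `if not new_queries_table_df.empty: queries_table.update_queries_table(new_queries_table_df)`.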
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
