Optimisation of an alignment of 2 dataframes Python

I have two fairly large pandas DataFrames. The first is a set of queries that I have stored in Google BigQuery, and the second is a result set that I get from the Google Search Console (GSC). Before I store the result set in its table, I want to do two things: first, find the queries in the result set that are new and store them in the queries table; second, add the query_id from the queries table to the GSC result set. My current code looks like this. The code works, but I am pretty new to Python and think there are optimization options that would make it faster.

        x = response['rows']
        df = pd.DataFrame.from_dict(x)
        # split the keys list into columns
        df[['query','page']] = pd.DataFrame(df['keys'].values.tolist(), index= df.index)
        # Drop the key columns
        result = df.drop(['keys'],axis=1)
        # Add the run_id
        result.insert(0, 'run_id', run_id)
        # Add the project_id
        result.insert(1, 'project_id', project_id)
        # Add the page_id
        result.insert(2, 'page_id', 0)
        # Add the query_id
        result.insert(3, 'query_id', 0)

        global queries_table_df
        queries_table_changed = False
        new_queries_table_df = pd.DataFrame(columns = ['id', 'query'])
        # check queries
        # TODO This is the main part which should be optimized
        for query in result.get('query'):
            #print(query)
            df = queries_table_df.loc[queries_table_df['query'] == query]
            #print(df)
            if df.empty:
                id_last = int(queries_table_df.iloc[-1:]['id'])
                id_last = id_last + 1
                df2 = {'id': id_last, 'query': query}
                queries_table_df = queries_table_df.append(df2, ignore_index=True)
                new_queries_table_df = new_queries_table_df.append(df2, ignore_index=True)
                queries_table_changed = True
                #print(queries_table_df)
                result.loc[result['query'] == query, ['query_id']] = id_last
            else:
                id = int(df.iloc[0]['id'])
                result.loc[result['query'] == query, ['query_id']] = id

        # if queries table has changed store in DB
        if queries_table_changed:
            queries_table.update_queries_table(new_queries_table_df)

Thank you for your help. Best regards Michael
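
One possible direction, sketched under assumptions rather than offered as a definitive answer: the loop filters `queries_table_df` once per row of the result set, so the work grows with rows × table size. The same two steps (find the new queries, then fill in `query_id`) can probably be done with `isin`, `concat` and `map`, each of which passes over the data once. The function name `assign_query_ids` and its return shape are hypothetical. The sketch also assumes that new ids continue from the highest existing `id` (the loop above uses the `id` of the last row instead) and that `query` holds no missing values. It avoids `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd


def assign_query_ids(result, queries_table_df):
    """Hypothetical helper: returns (result with query_id filled in,
    updated queries table, rows that are new to the queries table)."""
    # Unique queries of the result set, in order of first appearance,
    # reduced to those the queries table does not contain yet.
    seen = pd.Series(result["query"].unique())
    new_queries = seen[~seen.isin(queries_table_df["query"])]

    # Assumption: ids continue after the current maximum (1 for an empty table).
    start = int(queries_table_df["id"].max()) + 1 if len(queries_table_df) else 1
    new_rows = pd.DataFrame({
        "id": range(start, start + len(new_queries)),
        "query": new_queries.to_numpy(),
    })

    # Append all new rows in a single concat instead of one append per query.
    if len(new_rows):
        updated = pd.concat([queries_table_df, new_rows], ignore_index=True)
    else:
        updated = queries_table_df.copy()

    # Build a query -> id lookup (first id wins if a query is stored twice)
    # and map it over the whole column in one pass.
    id_by_query = updated.drop_duplicates("query").set_index("query")["id"]
    out = result.copy()
    out["query_id"] = out["query"].map(id_by_query).astype("int64")
    return out, updated, new_rows
```

With the names from the question, this could be called as `result, queries_table_df, new_queries_table_df = assign_query_ids(result, queries_table_df)`, followed by `queries_table.update_queries_table(new_queries_table_df)` only when `new_queries_table_df` is not empty.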



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
