Looping through groupby object by index

I have a huge dataset that I need to pass to a fuzzy matching function in small chunks. Because I'm matching the dataset against itself, I need to batch the rows by city to reduce the likelihood of duplicates in a batch.

I got started with the following logic (the fuzzy_match function takes a DataFrame):

import time

batches = insertion_cleaned.groupby(city_name)
# One key per distinct city; a plain .tolist() on the column repeats each city once per row.
cities = list(batches.groups)
for c in cities:
    batch = batches.get_group(c)
    t1 = time.time()
    final_df = fuzzy_match(batch)
    t2 = time.time()
    # Report the size of the current batch, not the number of cities.
    print(f"{round(t2-t1,2)} seconds to run fuzzy match for {len(batch)} leads.")

I need to access each group by index instead of by c, because I want to wrap the call in a try/except block so that an error doesn't stop the loop. In other words, if fuzzy_match breaks on one group, the loop should keep track of that index and increment it by one to move on to the next group. Here is something similar I did with data that was easier to loop over by index:

import time

import numpy as np

def create_batches(df, n):
    # Split df into n roughly equal chunks.
    return np.array_split(df, n)

batches = create_batches(df, 6)
index = 0
while index < len(batches):
    p = batches[index]
    try:
        t1 = time.time()
        final_df = fuzzy_match(p)
        t2 = time.time()
        print(f"{round(t2-t1,2)} seconds to run fuzzy match for {len(p)} leads.")
    except Exception:
        print("skipping to next")
    # Move on to the next batch whether or not this one succeeded.
    index += 1
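
For reference, one way the same skip-on-error pattern might carry over to the groupby case is to iterate the groupby object directly and let enumerate supply the index. This is only a sketch: the fuzzy_match below is a hypothetical stand-in that fails on one city so the error path is exercised, and the sample data, the "city" column name, and the results/failed containers are assumptions for illustration, not part of the original code.

```python
import time

import pandas as pd


def fuzzy_match(df):
    # Hypothetical stand-in for the real fuzzy_match; it raises for one
    # city so that the except branch below is exercised.
    if (df["city"] == "Boston").any():
        raise ValueError("simulated failure")
    return df.assign(matched=True)


# Assumed sample data standing in for insertion_cleaned.
insertion_cleaned = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Chicago", "Boston"],
        "lead": ["a", "b", "c", "d", "e"],
    }
)
city_name = "city"

results = {}  # city -> fuzzy_match output for groups that succeeded
failed = []   # (index, city) pairs for groups that raised

# Iterating a groupby yields (key, sub-DataFrame) pairs; enumerate adds
# a running index without any manual incrementing.
for index, (city, group) in enumerate(insertion_cleaned.groupby(city_name)):
    t1 = time.time()
    try:
        results[city] = fuzzy_match(group)
    except Exception as exc:
        failed.append((index, city))
        print(f"skipping index {index} ({city}): {exc}")
        continue
    t2 = time.time()
    print(f"{round(t2 - t1, 2)} seconds to run fuzzy match for {len(group)} leads.")
```

Because the for loop advances on its own, a failure in one group cannot stall the loop, and the failed list records which index and city to revisit later.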

pandas

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
