Return the users that were in one step but are not in the following step

I have the following sample dataset, which is the result of a groupby on Steps and CampaignSource that collects the grouped UserIds into a set:

df2=df[['CampaignSource','UserId','Steps']].groupby(['Steps','CampaignSource'],as_index=False).agg(lambda x: set(x))

Steps CampaignSource Set_UserId
"Step-1" "Apple" "Jeff","John","Antonio","Jon"
"Step-1" "Banana" "Jeff","John","Antonio","Jon"
"Step-1" "Potato" "Jeff","John","Antonio","Jon"
"Step-2" "Apple" "Jeff","John"
"Step-2" "Banana" "Jeff","John","Antonio"
"Step-2" "Potato" "Jeff","John"
"Step-3" "Apple" "Jeff"
"Step-3" "Banana" "Jeff","John"
"Step-3" "Potato" "Jeff"

Wanted end result

Steps CampaignSource Set_UserId
"Step-1" "Apple" "Antonio","Jon"
"Step-1" "Banana" "Jon"
"Step-1" "Potato" "Antonio","Jon"
"Step-2" "Apple" "John"
"Step-2" "Banana" "Antonio"
"Step-2" "Potato" "John"

As the sample and the wanted result show, I want the UserIds that are in the first step but not in the second, and then the ones that are in the second but not in the third. This is basically a loss report that returns the UserIds. My attempt below works, but it lacks flexibility because it hard-codes the campaign names and their starting row indices, so I am looking for a better way. I would appreciate any input.

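list_userid = []  # collects the users lost between consecutive steps

for i, z in enumerate(zip(df2['CampaignSource'], df2['UserId'])):
    print(z[0])
    # Rows 0, 1 and 2 hold Step-1 for Apple, Banana and Potato respectively.
    if z[0] == 'Apple':
        if i == 0:
            k = i  # remember the previous Apple row
        else:
            list_userid.append(df2['UserId'][k] - df2['UserId'][i])
            k = i

    if z[0] == 'Banana':
        if i == 1:
            a = i  # remember the previous Banana row
        else:
            list_userid.append(df2['UserId'][a] - df2['UserId'][i])
            a = i

    if z[0] == 'Potato':
        if i == 2:
            b = i  # remember the previous Potato row
        else:
            list_userid.append(df2['UserId'][b] - df2['UserId'][i])
            b = i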


Solution 1:[1]

pd.DataFrame([
    item
    for sub in list(
        df.groupby("CampaignSource")
        .agg(lambda x: x)
        .apply(
            lambda x: list(zip(
                [x.name] * len(x["Steps"]),
                x["Steps"][:-1],
                [list(set(s) - set(x["Set_UserId"][i + 1]))
                 for i, s in enumerate(x["Set_UserId"][:-1])],
            )),
            axis=1,
        )
        .to_dict()
        .values()
    )
    for item in sub
])

This is somewhat complex because it produces exactly the shape you asked for. If a different output shape is acceptable, it can be simpler.
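For comparison, here is a minimal sketch of one possibly simpler way to get the same loss report. It is not the approach from the answer above. It uses `groupby(...).shift(-1)` to line up each row with the following step of the same campaign, then takes the set difference. It assumes the grouped frame is called `df2`, that the set column is named `Set_UserId` as in the sample table, and that the step labels sort in funnel order (true for "Step-1" to "Step-3", but plain string sorting would put "Step-10" before "Step-2"). The output column name `Lost_UserId` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the grouped table in the question.
df2 = pd.DataFrame({
    "Steps": ["Step-1"] * 3 + ["Step-2"] * 3 + ["Step-3"] * 3,
    "CampaignSource": ["Apple", "Banana", "Potato"] * 3,
    "Set_UserId": [
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John"},
        {"Jeff", "John", "Antonio"},
        {"Jeff", "John"},
        {"Jeff"},
        {"Jeff", "John"},
        {"Jeff"},
    ],
})

# Assumption: sorting the step labels as strings gives the funnel order.
df2 = df2.sort_values(["CampaignSource", "Steps"])

# For every row, fetch the user set of the following step in the same campaign.
next_users = df2.groupby("CampaignSource")["Set_UserId"].shift(-1)

# The last step of each campaign has no following step (NaN), so skip it.
has_next = next_users.notna()

lost = df2.loc[has_next, ["Steps", "CampaignSource"]].copy()
# Hypothetical column name: users present in this step but absent in the next.
lost["Lost_UserId"] = [
    current - following
    for current, following in zip(df2.loc[has_next, "Set_UserId"],
                                  next_users[has_next])
]

lost = lost.sort_values(["Steps", "CampaignSource"]).reset_index(drop=True)
print(lost)
```

Because the campaigns come from the data and are not written into `if` branches, this should keep working when a new CampaignSource or an extra step appears, as long as the ordering assumption holds.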

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MoRe