'Compare Student ID in multiple row and get the desired output

I have a dataframe like this

Student ID	Subject	Current class	assessment class	prev cls	current cls	next cls
101	English	Class 2	Class 2	--	Class2 Pass	--
101	English	Class2	Class3	--	--	Class3 Pass
101	Mathematics	Class 2	Class 2	--	Class 2 fail	--
101	Mathematics	Class 2	Class 1	Class 1 fail	--	--
102	English	Class 2	Class 2	--	Class2 Pass	--
102	English	Class 2	Class 3	--	--	Class 3 Fail
102	Mathematics	Class2	Class2	--	Class2 fail	--
102	Mathematics	Class2	Class1	Class1 Pass	--	--

I want a output like this

Student ID	Subject	Current class	Expert lvl
101	English	Class2	class3 pass
101	Mathematics	Class 2	class 1 fail
102	English	Class2	class2 pass
102	Mathematics	Class 2	class 1 pass

The condition are a student who belongs to class-2 takes a quiz of class-2

If gets pass he/she goes one level up and tries class-3 quiz.
If he/she passes the class-3 quiz his/her expertise level is fixed as class-3 in that subject.
If he/she fails class-3 quiz his/her expert level is fixed as class-2 (only If he/she has passed class-2 quiz) otherwise his/her expert level is fixed as class-1 either he passes or fails class-1.

How do I write in python to get this kind of data frame?

Solution 1:^[1]

First step would be to group all cls in a single column (not sure if more than one cls column can be different than -- but this would be handled here) and group by Student ID and Subject:

df['cls'] = df[['prev cls', 'current cls', 'next cls']].agg(lambda x: [i for i in x if i!='--'], axis=1)
df = df.groupby(['Student ID', 'Subject'], as_index=False).agg({'Current class': 'last', 'cls': 'sum'})

This gives you the df:

   Student ID      Subject Current class                           cls
0         101      English        Class2    [Class2 Pass, Class3 Pass]
1         101  Mathematics       Class 2  [Class 2 fail, Class 1 fail]
2         102      English       Class 2   [Class2 Pass, Class 3 Fail]
3         102  Mathematics        Class2    [Class2 fail, Class1 Pass]

From there you can transform the cls column with a function that will get the max passed level or the lowest failed level:

import re
def get_expert_lvl(tests: list):
    passed = [x for x in tests if 'pass' in x.lower()]
    if passed:
        return max(passed, key=lambda x: int(re.search(r'\d+', x).group()))
    else:
        return min(tests, key=lambda x: int(re.search(r'\d+', x).group()))

df['Expert lvl'] = df['cls'].transform(get_expert_lvl)
df.drop('cls', axis=1, inplace=True)

You end up with:

   Student ID      Subject Current class    Expert lvl
0         101      English        Class2   Class3 Pass
1         101  Mathematics       Class 2  Class 1 fail
2         102      English       Class 2   Class2 Pass
3         102  Mathematics        Class2   Class1 Pass

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Tranbi

'Compare Student ID in multiple row and get the desired output

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]