'Compare Student ID in multiple row and get the desired output

I have a dataframe like this

Student ID Subject Current class assessment class prev cls current cls next cls
101 English Class 2 Class 2 -- Class2 Pass --
101 English Class2 Class3 -- -- Class3 Pass
101 Mathematics Class 2 Class 2 -- Class 2 fail --
101 Mathematics Class 2 Class 1 Class 1 fail -- --
102 English Class 2 Class 2 -- Class2 Pass --
102 English Class 2 Class 3 -- -- Class 3 Fail
102 Mathematics Class2 Class2 -- Class2 fail --
102 Mathematics Class2 Class1 Class1 Pass -- --

I want a output like this

Student ID Subject Current class Expert lvl
101 English Class2 class3 pass
101 Mathematics Class 2 class 1 fail
102 English Class2 class2 pass
102 Mathematics Class 2 class 1 pass

The condition are a student who belongs to class-2 takes a quiz of class-2

  • If gets pass he/she goes one level up and tries class-3 quiz.
  • If he/she passes the class-3 quiz his/her expertise level is fixed as class-3 in that subject.
  • If he/she fails class-3 quiz his/her expert level is fixed as class-2 (only If he/she has passed class-2 quiz) otherwise his/her expert level is fixed as class-1 either he passes or fails class-1.

How do I write in python to get this kind of data frame?



Solution 1:[1]

First step would be to group all cls in a single column (not sure if more than one cls column can be different than -- but this would be handled here) and group by Student ID and Subject:

df['cls'] = df[['prev cls', 'current cls', 'next cls']].agg(lambda x: [i for i in x if i!='--'], axis=1)
df = df.groupby(['Student ID', 'Subject'], as_index=False).agg({'Current class': 'last', 'cls': 'sum'})

This gives you the df:

   Student ID      Subject Current class                           cls
0         101      English        Class2    [Class2 Pass, Class3 Pass]
1         101  Mathematics       Class 2  [Class 2 fail, Class 1 fail]
2         102      English       Class 2   [Class2 Pass, Class 3 Fail]
3         102  Mathematics        Class2    [Class2 fail, Class1 Pass]

From there you can transform the cls column with a function that will get the max passed level or the lowest failed level:

import re
def get_expert_lvl(tests: list):
    passed = [x for x in tests if 'pass' in x.lower()]
    if passed:
        return max(passed, key=lambda x: int(re.search(r'\d+', x).group()))
    else:
        return min(tests, key=lambda x: int(re.search(r'\d+', x).group()))

df['Expert lvl'] = df['cls'].transform(get_expert_lvl)
df.drop('cls', axis=1, inplace=True)

You end up with:

   Student ID      Subject Current class    Expert lvl
0         101      English        Class2   Class3 Pass
1         101  Mathematics       Class 2  Class 1 fail
2         102      English       Class 2   Class2 Pass
3         102  Mathematics        Class2   Class1 Pass

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tranbi