'Compare Student ID in multiple row and get the desired output
I have a dataframe like this
| Student ID | Subject | Current class | assessment class | prev cls | current cls | next cls |
|---|---|---|---|---|---|---|
| 101 | English | Class 2 | Class 2 | -- | Class2 Pass | -- |
| 101 | English | Class2 | Class3 | -- | -- | Class3 Pass |
| 101 | Mathematics | Class 2 | Class 2 | -- | Class 2 fail | -- |
| 101 | Mathematics | Class 2 | Class 1 | Class 1 fail | -- | -- |
| 102 | English | Class 2 | Class 2 | -- | Class2 Pass | -- |
| 102 | English | Class 2 | Class 3 | -- | -- | Class 3 Fail |
| 102 | Mathematics | Class2 | Class2 | -- | Class2 fail | -- |
| 102 | Mathematics | Class2 | Class1 | Class1 Pass | -- | -- |
I want a output like this
| Student ID | Subject | Current class | Expert lvl |
|---|---|---|---|
| 101 | English | Class2 | class3 pass |
| 101 | Mathematics | Class 2 | class 1 fail |
| 102 | English | Class2 | class2 pass |
| 102 | Mathematics | Class 2 | class 1 pass |
The condition are a student who belongs to class-2 takes a quiz of class-2
- If gets pass he/she goes one level up and tries class-3 quiz.
- If he/she passes the class-3 quiz his/her expertise level is fixed as class-3 in that subject.
- If he/she fails class-3 quiz his/her expert level is fixed as class-2 (only If he/she has passed class-2 quiz) otherwise his/her expert level is fixed as class-1 either he passes or fails class-1.
How do I write in python to get this kind of data frame?
Solution 1:[1]
First step would be to group all cls in a single column (not sure if more than one cls column can be different than -- but this would be handled here) and group by Student ID and Subject:
df['cls'] = df[['prev cls', 'current cls', 'next cls']].agg(lambda x: [i for i in x if i!='--'], axis=1)
df = df.groupby(['Student ID', 'Subject'], as_index=False).agg({'Current class': 'last', 'cls': 'sum'})
This gives you the df:
Student ID Subject Current class cls
0 101 English Class2 [Class2 Pass, Class3 Pass]
1 101 Mathematics Class 2 [Class 2 fail, Class 1 fail]
2 102 English Class 2 [Class2 Pass, Class 3 Fail]
3 102 Mathematics Class2 [Class2 fail, Class1 Pass]
From there you can transform the cls column with a function that will get the max passed level or the lowest failed level:
import re
def get_expert_lvl(tests: list):
passed = [x for x in tests if 'pass' in x.lower()]
if passed:
return max(passed, key=lambda x: int(re.search(r'\d+', x).group()))
else:
return min(tests, key=lambda x: int(re.search(r'\d+', x).group()))
df['Expert lvl'] = df['cls'].transform(get_expert_lvl)
df.drop('cls', axis=1, inplace=True)
You end up with:
Student ID Subject Current class Expert lvl
0 101 English Class2 Class3 Pass
1 101 Mathematics Class 2 Class 1 fail
2 102 English Class 2 Class2 Pass
3 102 Mathematics Class2 Class1 Pass
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tranbi |
