'Get total no of classes of each subject within a semester using pandas

Time table, columns=hour, rows=weekday, data=subject

[weekday x hour]

                               1                      2                 3             4                 5                      6                      7
Name                                                                                                                                                   
Monday                   Project                Project           Project  Data Science  Embedded Systems            Data Mining  Industrial Psychology
Tuesday                  Project                Project           Project       Project      Data Science  Industrial Psychology       Embedded Systems
Wednesday           Data Science                Project           Project       Project           Project                Project                Project
Thursday             Data Mining  Industrial Psychology  Embedded Systems   Data Mining           Project                Project                Project
Friday     Industrial Psychology       Embedded Systems      Data Science   Data Mining           Project                Project                Project

Frequency table rows=weekday, columns=subject, data = subject frequency in the corresponding weekday

[weekday x subject]

Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name                                                                                  
Friday               1             1                 1                      1        3
Monday               1             1                 1                      1        3
Thursday             2             0                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6                            

Code

self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)

self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
self.subject_frequency = self.sdf.apply(pd.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)

self.p = self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)\
            .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
print(self.p.to_string())

Required Table

                       classes ...
Data Mining            32        
Data Science           32
Embedded Systems       32
Industrial Psychology  32
Project                146     

Will be adding more columns later, like current attendance percentage, percentage drop for each class missed, percent losses for taking leaves on Monday, Tuesday, ... etc so as to subtract them from attendance percentage.

The end goal is to analyse which day is safe to take a leave, and to monitor my percentage. If my direction could be better, please advise me.



Solution 1:[1]

select_rows = [date.strftime("%A") for date in pd.bdate_range(self.start, self.end)]
r = self.p.loc[select_rows, :]
print(r.to_string())
print(r.sum())

Please feel free to add a simpler code, design advice is also appreciated!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Grace Mathew