'How to count values from one column based on the description in other column in pandas?
I am new to programming and Python. Lately, I am learning to use pandas.
What I would like to know
I am wondering what would be the best approach to work only on numbers related to Group II (in the attached DataFrame). I mean e.g. sum all grades for group II and column 'Project'. Sure it won't make sense to sum grades, but the data is just for illustration purposes.
I'd be grateful for any advices and suggestions.
My DataFrame
The code attached will generate random numbers (except for the 'Group' column) but the DataFrame will always be like that:
Name Album Group Colloquium_1 Colloquium_2 Project
# 0 B 61738 I 5 4 5
# 1 Z 44071 I 5 5 2
# 2 M 87060 I 5 5 5
# 3 L 67974 I 3 5 3
# 4 Z 15617 I 3 2 3
# 5 Z 91872 II 2 4 5
# 6 H 84685 II 4 2 5
# 7 T 17943 II 2 5 2
# 8 L 54302 II 2 5 3
# 9 O 53433 II 5 4 5
Code to generate my DataFrame:
import pandas as pd
import random as rd
def gen_num():
num = ""
for i in range(5):
num += str(rd.randint(0,9))
return num
names = ['A','B','C','D','E','F','G','H','I','J','K', 'L','M','N','O', \
'P','R','S','T','W','Z']
list_names = []
list_album = []
list_group = []
list_coll_1 = []
list_coll_2 = []
list_project = []
num_of_students = 10
for i in range(num_of_students):
list_names.append(rd.choice(names))
list_album.append(gen_num())
list_coll_1.append(rd.randint(2, 5))
list_coll_2.append(rd.randint(2, 5))
list_project.append(rd.randint(2, 5))
if i < (num_of_students / 2):
list_group.append('I')
else:
list_group.append('II')
group = pd.DataFrame(list_names)
group.set_axis(['Name'], axis=1, inplace=True)
group['Album'] = list_album
group['Group'] = list_group
group['Colloquium_1'] = list_coll_1
group['Colloquium_2'] = list_coll_2
group['Project'] = list_project
Solution 1:[1]
You can use the DataFrame.groupby function to analyze data from one or many columns based on "groups" defined in other columns.
For example, something like
group.groupby('Group')['Project'].sum()
Or,you could use masking if you only want the result:
group[group['Group']=='II']['Project'].sum()
Solution 2:[2]
If your only working with 'Group II' data, it may be best to reassign your df to a new variable:
df_ii = df[df.Group=='II']
Next you could sum the 'Project' grades:
df_ii.Project.sum()
Or, if summing doesn't make sense, you could take the average:
df_ii.Project.mean()
Solution 3:[3]
I'm not certain if you meant 3 months after today or before today, but if you have =IF(ISBLANK(@B:B),"",TODAY()) as the formula in column A, insert this into the worksheet object code.
Option Explicit
#Const ShowErrMsg = True 'Change to False if you want the invalid insertion to fail silently and not send the user an error message.
Private Sub Worksheet_Change(ByVal Target As Range)
Const InvalidDateErrorNumber = 1234 + vbObjectError 'Always add vbObjectError to custom error numbers inside a class
Dim cel As Excel.Range, ChangedDueDateRange As Excel.Range
Dim ErrMsg As String
On Error GoTo EH_InvalidDueDate
Set ChangedDueDateRange = Excel.Intersect(Target, Me.Range("B:B")) 'You might change a large range of cells, but we're only concerned with those in Column B
If Not ChangedDueDateRange Is Nothing Then
For Each cel In ChangedDueDateRange
CellCleared: 'Return here after clearing the cell.
If Not cel.Value = vbEmpty Then
If CDate(cel.Value) > VBA.DateTime.DateAdd("m", 3, VBA.Date) Then 'CDate in case you end up pasting a number that could be equivalent to a date.
Err.Raise InvalidDateErrorNumber, Source:=Me.Name, Description:="Invalid Date"
#If ShowErrMsg Then 'This sort of #if is a compiler directive that basically toggles code on and off without evaluating a condition at runtime.
VBA.Interaction.MsgBox ErrMsg, Buttons:=VbMsgBoxStyle.vbExclamation, Title:="Invalid Date"
#End If
End If
End If
Next cel
End If
Exit Sub
EH_InvalidDueDate:
ErrMsg = cel.Address(RowAbsolute:=False, Columnabsolute:=False)
Select Case Err.Number
Case 13 '13 is type mismatch, in case the value inserted is not even a date.
ErrMsg = "Insert a date up to 3 months after today into cell " & ErrMsg & vbNewLine & ". You entered a " & TypeName(cel.Value)
Case InvalidDateErrorNumber
ErrMsg = "Date inserted in cell " & ErrMsg & " is more than 3 months after today."
Case Else
Err.Raise Err.Number, Err.Source, Err.Description, Err.HelpFile, Err.HelpContext
End Select
With Application 'Temporarily disable events to avoid triggering an infinite loop of change events.
.EnableEvents = False
cel.ClearContents
.EnableEvents = True
End With
Resume CellCleared
End Sub
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | alazyworkaholic |
