'How to calculate mean of specific rows in python dataframe?

I have a dataframe with 11 000k rows. There are multiple columns but I am interested only in 2 of them: TagName and Samples_Value. One tag can repeat itself multiple times among rows. I want to calculate the average value for each tag and create a new dataframe with the average value for each tag. I don't really know how to walk through rows and how to calculate the average. Any help will be highly appreciated. Thank you!

TagName     DataType    TimeStamp           Sample_value Samples_quality
Utilities   Float      2019-01-01 13:00:00  1060805.75      3
Utilities   Float      2019-01-01 17:30:00  136.8110352     3
Utilities   Float      2019-01-01 17:45:00  136.7932892     3
Steam       Float      2019-01-01 14:01:00  1086446.875     3
Steam       Float      2019-01-01 14:00:00  1076446.875     3

RangeIndex: 11140487 entries, 0 to 11140486
Data columns (total 6 columns):
TagName              object
Samples_Value        object

This is what I have and I know it is really noob ish but I am having a difficult time walking through rows.

for i in range(0, len(df):
if((df.iloc[i]['DataType']!='Undefined')):
    print df.loc[df['Tagname'] == df.iloc[i]['TagName'], df.iloc[i]['Samples_Value']].mean()
  
   


Solution 1:[1]

It sounds like the groupby() functionality is what you want. You define the column where your groups are and then you can take the mean() of each group. An example from the documentation:

df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5],
                   'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

df.groupby('A').mean()

Output:

     B         C
A
1  3.0  1.333333
2  4.0  1.500000

In your case it would be something like this:

df.groupby('TagName')['Samples_value'].mean()

Edit: So, I applied the code to your provided input dataframe and following is the output:

TagName
Steam        1.081447e+06
Utilities    3.536931e+05
Name: Sample_value, dtype: float64

Is this what you are looking for?

Solution 2:[2]

You don't need to walk through the rows, you can just take all of the fields that match your criteria

d = {'col1': [1,2,1,2,1,2], 'col2': [3, 4,5,6,7,8]}
df = pd.DataFrame(data=d)

#iterate over all unique entries in col1
for entry in  df["col1"].unique():
    # get all the col2 values where col1 is the current iter of col1 entries
    meanofcurrententry=df[df["col1"]==entry]["col2"].mean()
    print(meanofcurrententry)

This is not a full solution, but I think it helps more to understand the necessary logic. You still need to wrap it up into your own dataframe, however it hopefuly helps to understand how to use the indexing

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2