'Finding threshold in one column of panda df and calculating mean of a different column between two thresholds

I have a csv file with various sensor voltage. I want to find the average of the average temperatures between the points when the Water is 'on' (threshold of 1) I've had some luck in the past following other stackoverflow tutorials, specifically one Identify when time series passes through threshold both in graph and table, but for some reason its not working anymore

    Time    Temp1   Temp2   Temp3   Air Water   Feedback    Pressure    NozzleTemp  Heater
0   3727951321  95.440605   88.173698   89.902397   -8.20E-05   0.001976    0.101112    0.524771    4.100569    
1   3727951322  95.334641   88.115692   90.337181   -8.80E-05   0.002014    0.101112    0.524258    4.102334    

12  3727951323  91.830994   86.411652   88.358047   -9.20E-05   0.002153    0.101112    0.52425 4.103815    
77  3727951329  87.253204   84.673782   89.502716   -0.000106   0.001974    0.101112    0.516808    4.118995
78  3727951329  87.264526   83.593277   89.379059   -6.90E-05   0.001873    0.101112    0.512641    4.124267
79  3727951329  87.801315   83.889191   89.339783   -7.70E-05   0.002143    0.101112    0.51734 4.120327

my code so far :

# This is a test script to convert a DAT to a CSV
# import
import csv
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", 50, "display.max_columns", 50)

DATFILE = "/6ozStrength2.dat"
CSVFILE = "/6ozStrength2.csv"

headerList = ['Time', 'Temp1', 'Temp2', 'Temp3', 'Air', 'Water', 'Feedback', 'Pressure', 'NozzleTemp', 'Heater']
WaterThreshold = 1
AirThreshold = .04
#convert DAT to CSV
df = pd.read_csv(DATFILE, names=headerList, sep='\t')
df.to_csv(CSVFILE)

#print(df['Water'])
water_pump_row_index = 0
pump_active = False
last_row_of_transition = 0
list_of_water_pump_data = df['Water']

for water_pump in list_of_water_pump_data:

    if water_pump > WaterThreshold and pump_active is False:
        pump_active = True
        print(water_pump_row_index, "Turn On", water_pump, "Diff:", water_pump_row_index - last_row_of_transition)
        last_row_of_transition = water_pump_row_index
        
    elif water_pump < 0.5 and pump_active is True:
        pump_active = False
        print(water_pump_row_index, "Turn Off", water_pump, "Diff:", water_pump_row_index - last_row_of_transition)
        last_row_of_transition = water_pump_row_index
    
    water_pump_row_index = water_pump_row_index + 1

I've also tried

# This is a test script to convert a DAT to a CSV
# import
import csv
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", 50, "display.max_columns", 50)

DATFILE = r'/6ozStrength2.dat' 
CSVFILE = r'/6ozStrength2.csv'

headerList = ['Time', 'Temp1', 'Temp2', 'Temp3', 'Air', 'Water', 'Feedback', 'Pressure', 'NozzleTemp', 'Heater']
WaterThreshold = 1
AirThreshold = .04
#convert DAT to CSV
df = pd.read_csv(DATFILE, names=headerList, sep='\t')
df.to_csv(CSVFILE)

print(df.loc[df['Water']>=1])

df['Water_Below_Cutoff'] = np.where(df['Water'] > WaterThreshold, 1,0)

print(df['Water_Below_Cutoff'])

df["Changes"] = df['Water_Below_Cutoff'].diff()
print(df["Changes"])
only_changes = df[~pd.isna(df['Changes'])]

Solution 1:^[1]

I think an optimal solution here would be to turn this into a function. I hope I am following the math you want to do adequately, but if not this should get you in the right direction. I am happy to follow up and tweak this until you get expected results

import pandas as pd
import numpy as np
from statistics import mean

df=pd.DataFrame(data=
    {
        'Temp1': [95.440605, 95.334641], 
        'Temp2': [88.173698, 88.115692],
        'Temp3': [89.902397, 90.337181],
        'Water': [0.001976, 1.001976],
     })

def calc_avg_temp(temp1: float, temp2: float, temp3: float, water: float) -> float:
    result = np.nan
    
    if water >= 1.0:
        result = mean([temp1, temp2, temp3])
    return result

vector_calc_avg_temp = np.vectorize(calc_avg_temp)

df['Result'] = vector_calc_avg_temp(df['Temp1'], df['Temp2'], df['Temp3'], df['Water'])

The result:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	git_rekt

'Finding threshold in one column of panda df and calculating mean of a different column between two thresholds

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]