'Finding threshold in one column of panda df and calculating mean of a different column between two thresholds
I have a csv file with various sensor voltage. I want to find the average of the average temperatures between the points when the Water is 'on' (threshold of 1) I've had some luck in the past following other stackoverflow tutorials, specifically one Identify when time series passes through threshold both in graph and table, but for some reason its not working anymore
Time Temp1 Temp2 Temp3 Air Water Feedback Pressure NozzleTemp Heater
0 3727951321 95.440605 88.173698 89.902397 -8.20E-05 0.001976 0.101112 0.524771 4.100569
1 3727951322 95.334641 88.115692 90.337181 -8.80E-05 0.002014 0.101112 0.524258 4.102334
12 3727951323 91.830994 86.411652 88.358047 -9.20E-05 0.002153 0.101112 0.52425 4.103815
77 3727951329 87.253204 84.673782 89.502716 -0.000106 0.001974 0.101112 0.516808 4.118995
78 3727951329 87.264526 83.593277 89.379059 -6.90E-05 0.001873 0.101112 0.512641 4.124267
79 3727951329 87.801315 83.889191 89.339783 -7.70E-05 0.002143 0.101112 0.51734 4.120327
my code so far :
# This is a test script to convert a DAT to a CSV
# import
import csv
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", 50, "display.max_columns", 50)
DATFILE = "/6ozStrength2.dat"
CSVFILE = "/6ozStrength2.csv"
headerList = ['Time', 'Temp1', 'Temp2', 'Temp3', 'Air', 'Water', 'Feedback', 'Pressure', 'NozzleTemp', 'Heater']
WaterThreshold = 1
AirThreshold = .04
#convert DAT to CSV
df = pd.read_csv(DATFILE, names=headerList, sep='\t')
df.to_csv(CSVFILE)
#print(df['Water'])
water_pump_row_index = 0
pump_active = False
last_row_of_transition = 0
list_of_water_pump_data = df['Water']
for water_pump in list_of_water_pump_data:
if water_pump > WaterThreshold and pump_active is False:
pump_active = True
print(water_pump_row_index, "Turn On", water_pump, "Diff:", water_pump_row_index - last_row_of_transition)
last_row_of_transition = water_pump_row_index
elif water_pump < 0.5 and pump_active is True:
pump_active = False
print(water_pump_row_index, "Turn Off", water_pump, "Diff:", water_pump_row_index - last_row_of_transition)
last_row_of_transition = water_pump_row_index
water_pump_row_index = water_pump_row_index + 1
I've also tried
# This is a test script to convert a DAT to a CSV
# import
import csv
import pandas as pd
import numpy as np
pd.set_option("display.max_rows", 50, "display.max_columns", 50)
DATFILE = r'/6ozStrength2.dat'
CSVFILE = r'/6ozStrength2.csv'
headerList = ['Time', 'Temp1', 'Temp2', 'Temp3', 'Air', 'Water', 'Feedback', 'Pressure', 'NozzleTemp', 'Heater']
WaterThreshold = 1
AirThreshold = .04
#convert DAT to CSV
df = pd.read_csv(DATFILE, names=headerList, sep='\t')
df.to_csv(CSVFILE)
print(df.loc[df['Water']>=1])
df['Water_Below_Cutoff'] = np.where(df['Water'] > WaterThreshold, 1,0)
print(df['Water_Below_Cutoff'])
df["Changes"] = df['Water_Below_Cutoff'].diff()
print(df["Changes"])
only_changes = df[~pd.isna(df['Changes'])]
Solution 1:[1]
I think an optimal solution here would be to turn this into a function. I hope I am following the math you want to do adequately, but if not this should get you in the right direction. I am happy to follow up and tweak this until you get expected results
import pandas as pd
import numpy as np
from statistics import mean
df=pd.DataFrame(data=
{
'Temp1': [95.440605, 95.334641],
'Temp2': [88.173698, 88.115692],
'Temp3': [89.902397, 90.337181],
'Water': [0.001976, 1.001976],
})
def calc_avg_temp(temp1: float, temp2: float, temp3: float, water: float) -> float:
result = np.nan
if water >= 1.0:
result = mean([temp1, temp2, temp3])
return result
vector_calc_avg_temp = np.vectorize(calc_avg_temp)
df['Result'] = vector_calc_avg_temp(df['Temp1'], df['Temp2'], df['Temp3'], df['Water'])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | git_rekt |

