'Python Pandas - Finding the number of events between two dates (1 data frame and 1 list)
I have a csv file with 2 datetimes (pre-start and pre-end) in each row, as well as a list of datetimes (install_list).
I am trying to iterate through the csv file and add a column that returns the total number of dates from the install_list that are between the pre-start time and the pre-end time in each row.
I am using the code below, but it is returning the total number of items in the list for each row in the csv.
example: File 1 = start time, end time List 1 = install time
Desired Result for Each Row = IF Install Time >= Start Time AND Install Time <= End Time, SUM(Installs)
Col1 (Start Time): 1/1/21 12:00:00 PM
Col2 (End Time): 1/1/21 12:10:00 PM
Install Time List = [1/1/21 12:05:00 PM, 1/1/21 12:11:00 PM]
Desired Result for Row1/Col3 = 1
Code Below:
import datetime
import pandas as pd
from collections import Counter
df_post_logs = pd.read_csv('logs_merged.csv',index_col=False)
df_installs = pd.read_csv('install_merge.csv',index_col=False)
'''Convert UTC to EST on Installs Add Column'''
df_installs['conversion date'] = pd.to_datetime(df_installs['conversion date'],infer_datetime_format='%Y-%m-%d')
df_installs['conversion time'] = pd.to_datetime(df_installs['conversion time'],infer_datetime_format='%H:%S:%M')
utc_datetime = df_installs['conversion time']
est_datetime = utc_datetime - datetime.timedelta(hours=5)
df_installs['utc datetime'] = utc_datetime
df_installs['est datetime'] = est_datetime
'''Add Column 10 Minutes Pre-Spot Time to Post Logs/10 Minutes Post Time to Spot'''
df_post_logs['Air Date'] = pd.to_datetime(df_post_logs['Air Date'],infer_datetime_format='%Y-%m-%d')
df_post_logs['Air Time'] = pd.to_datetime(df_post_logs['Air Time'],infer_datetime_format='%H:%S:%M')
timestamp = df_post_logs['Air Time']
df_post_logs['timestamp'] = timestamp
df_post_logs['pre spot time start'] = timestamp - datetime.timedelta(minutes=10, seconds=1)
df_post_logs['pre spot time end'] = timestamp - datetime.timedelta(seconds=1)
df_post_logs['post spot time'] = timestamp + datetime.timedelta(minutes=10)
'''SUM of Installs between pre-spot time'''
install_list = pd.to_datetime(df_installs['est datetime']).to_list()
for pre_spot_start in df_post_logs['pre spot time start']:
pre_spot_start_time = pre_spot_start
for pre_spot_end in df_post_logs['pre spot time end']:
pre_spot_end_time = pre_spot_end
for pre_spot_end in df_post_logs['pre spot time end']:
pre_spot_end_time = pre_spot_end
pre_spot_install = 0
for row in df_post_logs:
for date in install_list:
if date >= pre_spot_start_time and date <= pre_spot_end_time:
pre_spot_install = pre_spot_install+1
df_post_logs['Pre Spot Install'] = pre_spot_install
df_post_logs.to_csv('Test.csv')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
