How can I exclude/delete specific rows based on time between data entries?
I'm using the following code to parse a log file (from .csv to .csv) so that I will only see entries containing the action "Building Access":
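# `lines` is assumed to hold the rows already read from the input .csv
with open('output.csv', 'w') as out:
    for line in lines:
        # Skip any row that contains one of the unwanted strings
        if not any(s in line for s in ('FilterThis1', 'FilterThis2', 'FilterThis3', 'Access Denied')):
            out.write(line)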
My output looks like this:
I'm not interested in Building Access entries that occur every minute. It's too much data to look at. I'd like to reduce this by showing only rows containing Building Access each hour. For example:
How can I accomplish this?
Edit:
Sorry, I forgot to add... I'd also like to include grouping by USER and ACTION. If there are different users, I don't want their actions to be deleted, but to only include their hourly logs. Same with their actions. I'll post a better example in a minute.
As requested, here's a sample text log file with desired output below:
Sample:
USER USER ACTION
Wed 02/23/2022 10:48:33 John 123 Building Access
Wed 02/23/2022 10:49:34 John 123 Access Denied
Wed 02/23/2022 10:48:33 John 123 Building Access
Wed 02/23/2022 11:49:34 John 123 Access Denied
Wed 02/23/2022 10:50:50 Kate Access Denied
Wed 02/23/2022 10:50:52 Kate Access Denied
Wed 02/23/2022 10:52:52 Kate Access Denied
Wed 02/23/2022 10:55:50 Kate Access Denied
Wed 02/23/2022 13:50:52 Kate Access Denied
Wed 02/23/2022 14:52:52 Kate Access Denied
Desired:
Wed 02/23/2022 10:48:33 John 123 Building Access
Wed 02/23/2022 11:49:34 John 123 Access Denied
Wed 02/23/2022 10:52:52 Kate Access Denied
Wed 02/23/2022 10:55:50 Kate Access Granted
Wed 02/23/2022 13:50:52 Kate Access Denied
Wed 02/23/2022 14:52:52 Kate Access Denied
As you can see, I'm just trying to reduce data to look over here but I still want to see unique entries in columns 2 and 3... Only, I want to see them every hour, not every minute.
Solution 1:[1]
You can extract the hour (or whatever part you need) with string indexing: take the column where the time is stored, slice it with column[number1:number2], and drop the line inside the for loop with an if statement. For example: if '11' in column[14:17] is true, delete the line.
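Here is a minimal sketch of one way to combine that idea with the grouping asked for in the edit. It swaps the raw string slice for datetime.strptime, which is less fragile if column widths change. Everything about the layout is an assumption taken from the sample in the question: fields are whitespace-separated, the date and time are the 2nd and 3rd fields, the action is the last two words, and the user is whatever sits in between. The function name hourly_sample is made up for illustration. The rule assumed here is "keep the first row seen for each user, action, date and hour", which may not match the desired output line for line.

```python
from datetime import datetime


def hourly_sample(lines):
    """Keep the first line seen for each (user, action, date, hour) group."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split()
        # Assumed layout: weekday, date, time, user word(s), two-word action
        if len(parts) < 6:
            continue  # header row or malformed line
        try:
            stamp = datetime.strptime(" ".join(parts[1:3]), "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # not a timestamped row
        user = " ".join(parts[3:-2])    # e.g. "John 123" or "Kate"
        action = " ".join(parts[-2:])   # e.g. "Building Access"
        key = (user, action, stamp.date(), stamp.hour)
        if key not in seen:
            seen.add(key)
            kept.append(line)  # original line is kept unchanged
    return kept
```

You could call it on the lines that survive your existing filter and write the returned list to output.csv. If the real file is comma-separated, splitting with the csv module instead of str.split would probably be the safer choice.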
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BokiX |


