Check whether a datetime in a log file is within a range between two dates given by a user

Basically, I have a folder where absolutely huge log files are archived every day. More precisely, 3 log files are created per day.

I'm working on a Python script where the user enters a date in YEAR,MONTH,DAY format to locate the 3 files created on that date, then enters a start and end time (hour, minute and second) and an IP address. The script then reads the contents of the 3 .gz files and prints the lines within that time interval that contain the IP address.
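One way to turn those inputs into a concrete window, sketched under assumptions: Python 3, the YEAR,MONTH,DAY prompt format used by the script below, and a hypothetical helper named build_window. It attaches each HH:MM:SS time to the chosen day with datetime.combine, so log timestamps (which carry a full date) can be compared as whole datetimes.

```python
import re
from datetime import datetime


def build_window(date_text, start_text, end_text):
    """Hypothetical helper: return (start, end) datetimes for one day.

    date_text uses the "YEAR,MONTH,DAY" format from the script's prompt
    (whitespace is ignored); start_text and end_text use "%H:%M:%S".
    """
    day = datetime.strptime(re.sub(r"\s+", "", date_text), "%Y,%m,%d").date()
    start = datetime.strptime(start_text, "%H:%M:%S").time()
    end = datetime.strptime(end_text, "%H:%M:%S").time()
    # combine() attaches a time of day to the chosen calendar date
    return datetime.combine(day, start), datetime.combine(day, end)


start, end = build_window("2022, 04, 15", "00:00:00", "01:00:00")
print(start, end)  # 2022-04-15 00:00:00 2022-04-15 01:00:00
```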

Here is an example of what the log files look like:

Thu Apr 14 00:23:22 2022 [58733]: connect from 100.117.137.249 [100.117.137.249]
Thu Apr 14 00:23:32 2022 [59668]: connect from 100.117.137.249 [100.117.137.249]
Thu Apr 14 00:23:32 2022 [59668]: authorization query for 'inventaire' pts/0 from 100.117.137.249 accepted
Thu Apr 14 00:23:32 2022 [59675]: connect from 100.117.137.249 [100.117.137.249]
Thu Apr 14 00:23:32 2022 [59675]: authorization query for 'inventaire' pts/0 from 100.117.137.249 accepted
Thu Apr 14 00:23:32 2022 [59698]: connect from 100.117.137.249 [100.117.137.249]
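Each line starts with a timestamp such as Thu Apr 14 00:23:22 2022. A minimal sketch of parsing it, assuming Python 3, an English (C) locale for the %a and %b names, and a hypothetical helper parse_log_time that returns None when a line does not start with such a timestamp:

```python
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Apr 14 00:23:22 2022"


def parse_log_time(line):
    """Return the datetime at the start of a log line, or None if absent."""
    # The first five whitespace-separated fields hold weekday, month, day,
    # time and year; everything after them is the message.
    prefix = " ".join(line.split()[:5])
    try:
        return datetime.strptime(prefix, LOG_TIME_FORMAT)
    except ValueError:
        return None


sample = "Thu Apr 14 00:23:22 2022 [58733]: connect from 100.117.137.249 [100.117.137.249]"
print(parse_log_time(sample))  # 2022-04-14 00:23:22
print(parse_log_time("garbage line"))  # None
```

Splitting on whitespace rather than slicing a fixed number of characters keeps the parse working even if single-digit days are padded differently.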

I filtered this sample with zgrep 100.117.137.249, which is why the same IP address appears on every line.

My code looks like this for the moment:

import re
import os
import glob
import gzip
from datetime import datetime, timedelta

date_entry = raw_input('Give a date in format YEAR, MONTH, DAY \n')
date = datetime.strptime(re.sub("\s+", "", date_entry), "%Y,%m,%d").date()

path = "/applis/tacacs/log/"

list_of_files = [
    file for file in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(file)).date()
]

print("Tacacs of that day: ")
print(list_of_files)

debut = raw_input('Start (Hour:Minute:Second) \n')
date_debut = datetime.strptime(debut, '%H:%M:%S').date()
fin = raw_input('End (Hour:Minute:Second) \n')
date_fin = datetime.strptime(fin, '%H:%M:%S').date()
Adresse_IP = raw_input('IP Address \n')

for fname in list_of_files: #iterates the log file names to open them one by one
    with gzip.open(fname, 'r') as file: #opens an individual file
        for line in file: #iterate all lines
            if re.search(Adresse_IP, line): #search for the IP address
                l_date = datetime.strptime(line.split(':')[3], "%H:%M:%S:")
                if date_debut < l_date < date_fin:
                    print(line) #print line if match
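A separate detail in the code above, independent of the traceback: calling .date() on a value parsed with '%H:%M:%S' returns the placeholder date 1900-01-01, so date_debut and date_fin end up equal, and Python raises TypeError when asked to order a date against a datetime. A minimal sketch, assuming Python 3 and hypothetical helpers parse_hms and in_time_range, that keeps .time() values and compares time of day instead:

```python
from datetime import datetime


def parse_hms(text):
    """Parse "HH:MM:SS" into a datetime.time; .date() would give 1900-01-01."""
    return datetime.strptime(text, "%H:%M:%S").time()


def in_time_range(moment, start, end):
    """Return True if the time of day of `moment` lies within [start, end]."""
    return start <= moment.time() <= end


start = parse_hms("00:00:00")
end = parse_hms("01:00:00")
print(start, end)  # 00:00:00 01:00:00
print(in_time_range(datetime(2022, 4, 14, 0, 23, 22), start, end))  # True
print(in_time_range(datetime(2022, 4, 14, 2, 0, 0), start, end))  # False
```

The bounds here are inclusive, so a line stamped exactly 00:00:00 is kept (the question's code uses strict <). Comparing only the time of day ignores the calendar date and does not handle a range that wraps past midnight.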

The rest of the script seems to work, but it apparently tries to parse the part of each line that contains the IP address as a time, which gives me this error:

Give a date in format YEAR, MONTH, DAY 
2022,04,15
Tacacs of that day: 
['/applis/tacacs/log/tacacs_provisioning.log.5.gz', '/applis/tacacs/log/tacacs.log.8.gz', '/applis/tacacs/log/tacacs_acct.log.8.gz']
Start (Hour:Minute:Second) 
00:00:00
End (Hour:Minute:Second) 
01:00:00
IP Address 
100.117.137.249
Traceback (most recent call last):
  File "scriptacacs3.py", line 30, in <module>
    l_date = datetime.strptime(line.split(':')[3], "%H:%M:%S:")
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data ' connect from 100.117.137.249 [100.117.137.249]\n' does not match format '%H:%M:%S:'
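The string in the ValueError is exactly what line.split(':')[3] produces: splitting on ":" also cuts inside the timestamp 00:23:22, so index 3 lands on the message rather than on the time. A short demonstration, assuming Python 3 and using a line copied from the sample log:

```python
line = "Thu Apr 14 00:23:22 2022 [58733]: connect from 100.117.137.249 [100.117.137.249]\n"

# Splitting on ":" also splits inside the timestamp "00:23:22",
# so the pieces are not what the index [3] assumes.
parts = line.split(":")
for i, part in enumerate(parts):
    print(i, repr(part))
# 0 'Thu Apr 14 00'
# 1 '23'
# 2 '22 2022 [58733]'
# 3 ' connect from 100.117.137.249 [100.117.137.249]\n'

# The time of day is the fourth whitespace-separated field instead.
time_text = line.split()[3]
print(time_text)  # 00:23:22
```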

Is there a way for the script to only check the times in HH:MM:SS format and ignore lines where the format doesn't match?
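A minimal sketch of one way to do this, not the accepted answer: parse the timestamp at the start of each line inside try/except ValueError and skip the line when parsing fails. It assumes Python 3, an English (C) locale for the day and month names, and a hypothetical generator matching_lines that compares full datetimes against a start/end window. It tests for the IP with a plain substring check, because in a regular expression such as re.search(Adresse_IP, line) each "." matches any character.

```python
import gzip
import os
import tempfile
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Apr 14 00:23:22 2022"


def matching_lines(paths, ip, start, end):
    """Yield lines from gzip files that contain `ip` and whose leading
    timestamp falls within [start, end]; unparsable lines are skipped."""
    for path in paths:
        # "rt" opens the compressed file in text mode (Python 3.3+)
        with gzip.open(path, "rt") as handle:
            for line in handle:
                if ip not in line:  # substring test, no regex wildcards
                    continue
                try:
                    prefix = " ".join(line.split()[:5])
                    stamp = datetime.strptime(prefix, LOG_TIME_FORMAT)
                except ValueError:
                    continue  # no "Thu Apr 14 00:23:22 2022" prefix: ignore line
                if start <= stamp <= end:
                    yield line.rstrip("\n")


# Demo on a small temporary .gz file instead of the real log directory.
demo = os.path.join(tempfile.mkdtemp(), "demo.log.gz")
with gzip.open(demo, "wt") as out:
    out.write("Thu Apr 14 00:23:22 2022 [58733]: connect from 100.117.137.249 [100.117.137.249]\n")
    out.write("not a log line 100.117.137.249\n")
    out.write("Thu Apr 14 02:00:00 2022 [59668]: connect from 100.117.137.249 [100.117.137.249]\n")

found = list(matching_lines([demo], "100.117.137.249",
                            datetime(2022, 4, 14, 0, 0, 0),
                            datetime(2022, 4, 14, 1, 0, 0)))
print(found)
```

Only the first demo line is printed: the second has no timestamp and the third is outside the window. Text mode for gzip.open arrived in Python 3.3; under the Python 2.7 shown in the traceback, the question's gzip.open(fname, 'r') already yields str lines and can be kept.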



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
