How to analyze a log file using Python and pandas?
I am working on a sample log file from a vending machine (I am pretty new to pandas). Every day the machine generates one .log file.
Q: How can I use Python and pandas to extract the information from the .log file and save it into a DataFrame for further analysis? (Sample input and output are provided below.)
My sample code and sample .log file are below:
filePath = "~/sample.log"
with open(filePath) as fp:
    lines = fp.read()
print(lines)
I am not sure how to approach this. Could someone please share some code to process the above log file? Thank you.
Solution 1:[1]
Welcome to Python!
Your first step, reading the whole file at once, is correct, but I am going to show how to use fp.readline() to read one line at a time instead. From §7.2.1 of the Python tutorial:
"if f.readline() returns an empty string, the end of the file has been reached"
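The end-of-file rule quoted above can be seen in isolation with an in-memory file. This is only a small sketch: the `io.StringIO` object stands in for a real file, and its contents are made up for illustration.

```python
import io

# In-memory stand-in for a file; the contents are made up for illustration.
fp = io.StringIO("first\n\nthird\n")

lines = []
while True:
    ln = fp.readline()
    if ln == '':      # empty string: the end of the file has been reached
        break
    lines.append(ln)  # a blank line arrives as '\n', not as ''

print(lines)  # ['first\n', '\n', 'third\n']
```

Note that the blank second line does not stop the loop, because it is read as '\n' rather than as an empty string.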
We will implement a check for end-of-file:
import pandas as pd

with open(filePath, 'r', encoding='utf16') as fp:
    ln = fp.readline()  # first line skipped
    ln = fp.readline()  # second line skipped
    data = []  # make a list to collect data
    while True:
        ln = fp.readline()
        if ln == '':
            break  # end-of-file check
        # This field name contains spaces, so collapse it before splitting on spaces.
        ln = ln.replace('Battery test speed (mph)', 'BatteryTestSpeed(mph)')
        # The line is split on the space character, so each line ends up with 12 entities.
        entities = ln.rstrip('\n').split(' ')
        # Further split each entity on `=` and keep only the last string.
        # Check for yourself how split works on a string with or without `=`.
        entities = [entity.split('=')[-1] for entity in entities]
        data.append(entities)  # collected by the list

# Put a list of length 12 to specify the column header. Remove `columns=
data_df = pd.DataFrame(data, columns=...)
If you had pasted your data as text, I could have tested my code; as it stands, you will need to test it yourself.
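Since the real log was not posted as text, the splitting step can be tried on stand-in data. The sketch below assumes each line is a space-separated run of `key=value` tokens, as the code above does; the two sample lines, their values, the shortened column list, and the helper name `parse_line` are all hypothetical.

```python
import pandas as pd

# How str.split('=') behaves with and without an '=' in the token.
print('test1Voltage=13.8V'.split('='))      # ['test1Voltage', '13.8V']
print('000'.split('='))                     # ['000']
print('test1Voltage=13.8V'.split('=')[-1])  # 13.8V

# Hypothetical, shortened log lines (the real file has 12 entities per line).
sample_lines = [
    "000 test1Voltage=13.8V test1Current=2.1A BatteryHealth=100%\n",
    "001 test1Voltage=13.7V test1Current=2.0A BatteryHealth=99%\n",
]


def parse_line(ln):
    """Split on spaces, then keep only the text after the last '=' of each token."""
    return [token.split('=')[-1] for token in ln.rstrip('\n').split(' ')]


data = [parse_line(ln) for ln in sample_lines]
# Column names are assumptions chosen to match the made-up lines above.
data_df = pd.DataFrame(
    data, columns=['lineno', 'test1Voltage', 'test1Current', 'BatteryHealth']
)
print(data_df)
```

A token without `=` (such as the leading line number) survives unchanged, because `split('=')` returns a one-element list and `[-1]` picks that element.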
Solution 2:[2]
The question itself is full of issues and ambiguities. The line number handling seems very odd. Your question implies that the first line 000 should be ignored. However, this might help you get started:
from collections import defaultdict
from pandas import DataFrame
import sys

DATA = defaultdict(dict)
SKIP = 2
# List of columns of interest
COLUMNS = ['test1Voltage', 'test1Current', 'test2Voltage', 'test2Current', 'currentstate',
           'BatteryHealth', 'Battylife(hr)', 'Battery test speed (mph)', 'BatteryLoading']

with open('testing1.log') as log:
    for _ in range(SKIP):
        next(log)
    for line in log:
        try:
            o = 1 if line[0] == '(' else 0
            if (lineno := int(line[o:].split()[0])) == 0 and len(DATA) != 0:
                break
            for c in COLUMNS:
                try:
                    i = line.index(c)
                    DATA[lineno][c] = line[i+len(c)+1:].split()[0]
                except ValueError:
                    pass
        except Exception as e:
            print(f'Unable to process:-\n{line}...due to {e}', file=sys.stderr)

df = DataFrame.from_dict(DATA, orient='index')
print(df)
Output:
     test1Voltage  test1Current  test2Voltage  test2Current   currentstate  BatteryHealth  Battylife(hr)  BatteryLoading
0           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
1           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
2           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
3           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
4           13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
245         13.8V          2.1A         11.8V         12.1A  NORMAL_RUN(0)           100%          1hour             OFF
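Every value in this output is still a string ('13.8V', '2.1A', '100%'), so a further step is likely needed before any numeric analysis. The following is a minimal sketch of one way to do that, assuming each column carries a single fixed unit suffix; the `UNITS` mapping is hypothetical, and the small frame simply copies two rows and three columns from the output above.

```python
import pandas as pd

# Two rows copied from the output above, limited to three columns.
df = pd.DataFrame(
    {
        'test1Voltage': ['13.8V', '13.8V'],
        'test1Current': ['2.1A', '2.1A'],
        'BatteryHealth': ['100%', '100%'],
    },
    index=[0, 1],
)

# Hypothetical mapping: column name -> unit suffix to strip.
UNITS = {'test1Voltage': 'V', 'test1Current': 'A', 'BatteryHealth': '%'}

numeric = df.copy()
for col, unit in UNITS.items():
    # Drop the trailing unit, then convert; unparseable values become NaN.
    numeric[col] = pd.to_numeric(numeric[col].str.rstrip(unit), errors='coerce')

print(numeric.dtypes)
```

With `errors='coerce'`, a malformed reading turns into NaN instead of raising, which may or may not be what you want for a real log.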
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
