How to find a specific value in a text file in Python
Good morning guys, my question is: I have a text file in this format:
1 00:00:00,000 --> 00:00:00,033 <font size="36">FrameCnt: 1, DiffTime:
33ms 2022-05-19 16:15:57,729,790 [iso : 110] [shutter : 1/640.0] [fnum
: 280] [ev : 0] [ct : 5284] [color_md : default] [focal_len : 240]
[dzoom_ratio: 20088, delta:10088],[latitude: 38.259025] [longtitude:
15.598678] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,
Pitch:-1.8, Roll:-1.3] </font>
2 00:00:00,033 --> 00:00:00,066 <font size="36">FrameCnt: 2, DiffTime:
33ms 2022-05-19 16:15:57,762,098 [iso : 110] [shutter : 1/640.0] [fnum
: 280] [ev : 0] [ct : 5284] [color_md : default] [focal_len : 240]
[dzoom_ratio: 20088, delta:0],[latitude: 38.259030] [longtitude:
15.598689] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,
Pitch:-1.8, Roll:-1.3] </font>
My intention is to retrieve the FrameCnt, latitude, and longitude values for each block of 6 rows. This is my desired output:
1, 38.259025, 15.598678
2, 38.259030, 15.598689
How can I do this in Python? Thank you very much in advance.
Solution 1:[1]
You can do this with a regex:
import re
regex = (r".*\[latitude: (.*)\] \[longtitude:\n"
r"(.*)\] \[rel_alt.*")
test_str = ("1 00:00:00,000 --> 00:00:00,033 <font size=\"36\">FrameCnt: 1, DiffTime:\n"
"33ms 2022-05-19 16:15:57,729,790 [iso : 110] [shutter : 1/640.0] [fnum\n"
": 280] [ev : 0] [ct : 5284] [color_md : default] [focal_len : 240]\n"
"[dzoom_ratio: 20088, delta:10088],[latitude: 38.259025] [longtitude:\n"
"15.598678] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,\n"
"Pitch:-1.8, Roll:-1.3] </font>\n\n"
"2 00:00:00,033 --> 00:00:00,066 <font size=\"36\">FrameCnt: 2, DiffTime:\n"
"33ms 2022-05-19 16:15:57,762,098 [iso : 110] [shutter : 1/640.0] [fnum\n"
": 280] [ev : 0] [ct : 5284] [color_md : default] [focal_len : 240]\n"
"[dzoom_ratio: 20088, delta:0],[latitude: 38.259030] [longtitude:\n"
"15.598689] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,\n"
"Pitch:-1.8, Roll:-1.3] </font>")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # The match counter stands in for the frame number
    print(matchNum, end=" , ")
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print(match.group(groupNum), end=" ")
    print("")
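The pattern above captures only latitude and longitude and uses the match counter as the frame number. A minimal sketch that also captures FrameCnt itself, assuming every block follows the layout in the question (the name `extract_rows` and the abridged `sample` string are only illustrative):

```python
import re

# One match per subtitle block: FrameCnt, then latitude, then longitude.
# re.DOTALL lets ".*?" cross the line breaks inside a block, and \s*
# tolerates a space or a newline between "longtitude:" and its value.
BLOCK_RE = re.compile(
    r"FrameCnt:\s*(\d+).*?"
    r"\[latitude:\s*(-?[\d.]+)\]\s*"
    r"\[longtitude:\s*(-?[\d.]+)\]",
    re.DOTALL,
)

def extract_rows(text):
    """Return a list of (frame, latitude, longitude) string tuples."""
    return BLOCK_RE.findall(text)

# Abridged copy of the two blocks from the question.
sample = (
    '1 00:00:00,000 --> 00:00:00,033 <font size="36">FrameCnt: 1, DiffTime:\n'
    '33ms 2022-05-19 16:15:57,729,790 [iso : 110] [shutter : 1/640.0] [fnum\n'
    '[dzoom_ratio: 20088, delta:10088],[latitude: 38.259025] [longtitude:\n'
    '15.598678] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,\n'
    'Pitch:-1.8, Roll:-1.3] </font>\n\n'
    '2 00:00:00,033 --> 00:00:00,066 <font size="36">FrameCnt: 2, DiffTime:\n'
    '33ms 2022-05-19 16:15:57,762,098 [iso : 110] [shutter : 1/640.0] [fnum\n'
    '[dzoom_ratio: 20088, delta:0],[latitude: 38.259030] [longtitude:\n'
    '15.598689] [rel_alt: 9.737 abs_alt: 99.324] [Drone: Yaw:51.4,\n'
    'Pitch:-1.8, Roll:-1.3] </font>'
)

for frame, lat, lon in extract_rows(sample):
    print(f"{frame}, {lat}, {lon}")
```

For a real file, the text could be read with `open(...).read()` and passed to `extract_rows` in place of `sample`.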
Solution 2:[2]
I think this is a good use for regex lookbehind. A lookbehind is a part of the pattern that must appear immediately before the match for the match to be valid, but it isn't included in the returned match. The syntax is (?<=<lookbehind_regex>).
We are looking here for the values after "FrameCnt: ", "latitude: " and "longtitude: ", so our patterns will start with (?<=FrameCnt: ) and so on.
Next, we want to find a number, which may or may not have a decimal point. This can be found using [0-9.]+. The [0-9.] part means any digit character or a period. The + means that we want [0-9.] one or more times. This needs to be included in the match, so we place it outside the lookbehind.
Our patterns will thus look like (?<=FrameCnt: )[0-9.]+, (?<=latitude: )[0-9.]+ and (?<=longtitude: )[0-9.]+.
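As a quick check of these three patterns, here is a small sketch on a single simplified line. The `line` string is an assumption made for illustration: it puts everything on one line, with a space after each colon.

```python
import re

# Simplified one-line sample (assumed layout, not the full block).
line = "FrameCnt: 1, DiffTime: 33ms [latitude: 38.259025] [longtitude: 15.598678]"

# The lookbehind (?<=...) must match just before the number,
# but only the number itself is returned by .group().
frame = re.search(r"(?<=FrameCnt: )[0-9.]+", line).group()
lat = re.search(r"(?<=latitude: )[0-9.]+", line).group()
lon = re.search(r"(?<=longtitude: )[0-9.]+", line).group()

print(frame, lat, lon)  # 1 38.259025 15.598678
```

Note that the lookbehind text is matched literally, so if the file really has a line break right after "longtitude:" instead of a space, that pattern will not match. Also, [0-9.]+ does not accept a leading minus sign, so negative coordinates would lose their sign.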
We could now hardcode these patterns with the three different words, but what if you decide tomorrow that you also need another value? That's why I would use a for loop and dynamically construct the pattern from a base pattern.
Here's the code:
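from re import findall  # the standard-library re module supports fixed-width lookbehind
pattern = "(?<=%s)[0-9.]+"
tofind = ["FrameCnt", "latitude", "longtitude"]
found = []
with open("filename.txt", "r") as file:
    txt = file.read()
for string in tofind:
    # Build the pattern for this field, e.g. (?<=latitude: )[0-9.]+
    newpattern = pattern % (string + ": ")
    found.append(findall(newpattern, txt))
print(found)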
Output:
[['1', '2'], ['38.259025', '38.259030'], ['15.598678', '15.598689']]
Now we still have to convert the strings to numbers and put them into a DataFrame.
from pandas import DataFrame as df
frame=df(found,dtype=float,index=tofind)
print(frame)
Output:
                    0          1
FrameCnt     1.000000   2.000000
latitude    38.259025  38.259030
longtitude  15.598678  15.598689
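This layout has one column per block, while the question asks for one line per frame. A minimal sketch of regrouping the lists with plain `zip`, assuming `found` holds the values shown in the earlier output:

```python
# Values as returned by the findall loop for the two sample blocks.
found = [["1", "2"], ["38.259025", "38.259030"], ["15.598678", "15.598689"]]

# zip(*found) regroups the per-field lists into per-frame tuples.
rows = [f"{int(frame)}, {lat}, {lon}" for frame, lat, lon in zip(*found)]

for row in rows:
    print(row)
# 1, 38.259025, 15.598678
# 2, 38.259030, 15.598689
```

If you prefer to stay in pandas, transposing the DataFrame with `frame.T` gives the same one-row-per-frame orientation.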
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | nfn |
| Solution 2 | The_spider |
