Is there a way to split a line by multiple characters using the split method in Python?
So far I have this code to split my file lines.
with open("example.dat", 'r') as f:
    lines = [line.strip().split(',') for line in f]
print(lines)
I want to split each line so that I end up with a nested list where every record looks like [city, state, latitude, longitude, population]. However, the split method only accepts a single separator, so after some research I imported re and tried to use that, since the file I am working with follows a pattern. Even so, the results are not separated into the list the way I would like.
For example, if the file has the information
New York City, NY[40,74]11000000
The code above would print [['New York City', ' NY[40', '74]11000000'], etc.].
I want it to print [['New York City', 'NY', 40, 74, 11000000], etc.].
Since I didn't get the results I wanted I tried the following code.
import re
with open("example.dat", 'r') as f:
    lines = [re.split(r',[,]', line) for line in f]
print(lines)
This code outputs the data in this manner: [['New York City, NY[40,74]11000000\n'], etc.]
So can I use re or the split method to split a line by several different characters, or not?
Solution 1:[1]
The easiest solution may be to flatten the different split characters to a single one:
with open("example.dat", "r") as fh:
    lines = []
    for line in fh:
        lines.append(line.strip().replace("[", ",").replace("]", ",").split(","))
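Note that the flattening approach returns strings only, and the state keeps its leading space. A minimal sketch of how the fields could be cleaned up and converted is shown here; `parse_line` is a hypothetical helper name, and it assumes every line has exactly five fields and that the city name itself contains no comma.

```python
def parse_line(line):
    """Split 'City, ST[lat,lon]population' into a typed list (assumed layout)."""
    # Turn both brackets into commas so that a single split(",") is enough.
    flat = line.strip().replace("[", ",").replace("]", ",")
    # Strip stray spaces such as the one in front of the state.
    city, state, lat, lon, pop = (part.strip() for part in flat.split(","))
    return [city, state, int(lat), int(lon), int(pop)]


print(parse_line("New York City, NY[40,74]11000000\n"))
# ['New York City', 'NY', 40, 74, 11000000]
```

If a line has more or fewer than five fields, the unpacking raises a ValueError, which may be preferable to silently storing a malformed record.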
Solution 2:[2]
You can use named groups in a regular expression to extract each field more precisely (read more here: https://www.regular-expressions.info/refext.html):
import re
line = "New York City, NY[40,74]11000000"
# No re.VERBOSE here: that flag would make the regex ignore the literal space after the comma.
pat = re.compile(r"(?P<city>[^,]*), (?P<state>[^\[]*)\[(?P<lat>\d+),(?P<lon>\d+)\](?P<pop>\d+)")
match = pat.match(line)
city = match.group("city")
state = match.group("state")
lat = float(match.group("lat"))
lon = float(match.group("lon"))
population = int(match.group("pop"))
record = [city, state, lat, lon, population]
# => ['New York City', 'NY', 40.0, 74.0, 11000000]
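To run a named-group pattern over every line of the file, something along these lines may work. It is only a sketch: `PATTERN` and `parse_lines` are hypothetical names, the sample list stands in for the open file, and lines that do not fit the assumed layout are simply skipped. The pattern is compiled without `re.VERBOSE`, because that flag makes the regex ignore whitespace, including the space after the comma.

```python
import re

# Assumed layout: "City, ST[lat,lon]population"
PATTERN = re.compile(
    r"(?P<city>[^,]*), (?P<state>[^\[]*)\[(?P<lat>\d+),(?P<lon>\d+)\](?P<pop>\d+)"
)


def parse_lines(lines):
    """Return one [city, state, lat, lon, population] list per matching line."""
    records = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the expected layout
        records.append([
            match.group("city"),
            match.group("state"),
            int(match.group("lat")),
            int(match.group("lon")),
            int(match.group("pop")),
        ])
    return records


sample = ["New York City, NY[40,74]11000000\n", "not a data line\n"]
print(parse_lines(sample))
# [['New York City', 'NY', 40, 74, 11000000]]
```

Since a file object yields its lines when iterated, the same function could be called as `parse_lines(f)` inside the `with open(...)` block.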
Solution 3:[3]
Regex is pretty useful in such cases:
import re
x = 'New York City, NY[40,74]11000000'
# Raw string, so that \[ and \] reach the regex engine unchanged.
res = re.split(r', |\[|\]|,', x)
print(res)
# Output:
['New York City', 'NY', '40', '74', '11000000']
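The `re.split` result is still a list of strings. One possible way to get the numeric fields as integers is sketched below; the function name `split_record` is hypothetical, and the pattern is a slightly more compact variant that treats a comma with optional trailing whitespace, or either bracket, as a separator.

```python
import re

# A comma plus any following whitespace, or a single square bracket.
SEPARATORS = re.compile(r",\s*|[\[\]]")


def split_record(line):
    """Split one line on the separators and convert the numeric fields to int."""
    city, state, *numbers = SEPARATORS.split(line.strip())
    return [city, state, *map(int, numbers)]


print(split_record("New York City, NY[40,74]11000000"))
# ['New York City', 'NY', 40, 74, 11000000]
```

This assumes the first two fields are always text and everything after them is a whole number; a non-numeric value would raise a ValueError.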
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mara004 |
| Solution 2 | TYZ |
| Solution 3 | Ashish Samarth |
