Is there a way to split a line by multiple characters using the split method in Python?

So far I have this code to split my file lines.

with open("example.dat", 'r') as f:
    lines = [line.strip().split(',') for line in f]
print(lines)

I want to split each line so that I end up with a multidimensional array where the data is represented like [city, state, latitude, longitude, population]. However, the split method only takes one separator, so after some research I imported re and tried to use that, since the file I am working with follows a pattern. Even so, the results do not separate the data from the file into the array the way I would like.

For example, if the file has the information

New York City, NY[40,74]11000000

The code above would print [['New York City', ' NY[40', '74]11000000'], etc.].

I want it to print [['New York City', 'NY', 40, 74, 11000000], etc.].

Since I didn't get the results I wanted I tried the following code.

import re
with open("example.dat", 'r') as f:
    lines = [re.split(r',[,]', line) for line in f]
print(lines)

This code outputs the data in this manner: [['New York City, NY[40,74]11000000\n'], etc.]

So, can I use re or the split method to split a line on several different characters?



Solution 1:[1]

The easiest solution may be to flatten the different split characters to a single one:

with open("example.dat", "r") as fh:
    lines = []
    for line in fh:
        lines.append(line.strip().replace("[", ",").replace("]", ",").split(","))
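
Note that the fields produced this way are all strings, and the state keeps its leading space (' NY'). A minimal sketch of one way to tidy that up, assuming every line has exactly the five fields from the question and integer coordinates; `parse_line` is a hypothetical helper name, not part of the original answer:

```python
def parse_line(line):
    """Parse 'City, ST[lat,lon]pop' by first turning the brackets into commas."""
    flat = line.strip().replace("[", ",").replace("]", ",")
    # Strip each field so ' NY' becomes 'NY'; assumes exactly five fields per line.
    city, state, lat, lon, pop = (field.strip() for field in flat.split(","))
    return [city, state, int(lat), int(lon), int(pop)]


row = parse_line("New York City, NY[40,74]11000000\n")
print(row)  # ['New York City', 'NY', 40, 74, 11000000]
```

A line with a different number of fields raises a ValueError at the unpacking step, which may or may not be what you want for your file.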

Solution 2:[2]

You can use named groups in a regular expression to extract each field explicitly (read more here: https://www.regular-expressions.info/refext.html):

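import re

pat = r"(?P<city>[^,]*), (?P<state>[\w\W]*)\[(?P<lat>\d+),(?P<lon>\d+)\](?P<pop>\d+)"
# No re.VERBOSE here: that flag would make the regex ignore the literal space after the comma.
pat = re.compile(pat)

# Match one line of the file; match is None if the line does not fit the pattern.
match = pat.match("New York City, NY[40,74]11000000")

city = match.group("city")
state = match.group("state")
lat = float(match.group("lat"))
lon = float(match.group("lon"))
population = int(match.group("pop"))

line = [city, state, lat, lon, population]
# => ['New York City', 'NY', 40.0, 74.0, 11000000]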
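
To build the full list the question asks for, the same idea can be applied line by line. This is only a sketch: `LINE_RE`, `parse_lines`, and the sample list are names and data made up for illustration, the state group is narrowed to "anything up to the opening bracket", and lines that do not fit the pattern are assumed to be skippable.

```python
import re

# Same shape as the pattern above: city, state, [lat,lon], population.
LINE_RE = re.compile(
    r"(?P<city>[^,]*), (?P<state>[^\[]*)\[(?P<lat>\d+),(?P<lon>\d+)\](?P<pop>\d+)"
)


def parse_lines(lines):
    """Return one [city, state, lat, lon, population] row per matching line."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # assumption: malformed lines are skipped, not reported
        rows.append([
            match["city"],
            match["state"],
            int(match["lat"]),
            int(match["lon"]),
            int(match["pop"]),
        ])
    return rows


# Stand-in for the lines of example.dat; the second entry is deliberately malformed.
sample = ["New York City, NY[40,74]11000000\n", "not a data line\n"]
print(parse_lines(sample))  # [['New York City', 'NY', 40, 74, 11000000]]
```

With a real file, `parse_lines(f)` inside the `with open("example.dat", 'r') as f:` block should work the same way, since a file object iterates over its lines.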

Solution 3:[3]

Regex is pretty useful in such cases:

import re
x = 'New York City, NY[40,74]11000000'
res = re.split(r', |\[|\]|,', x)
print(res)
# Output:
# ['New York City', 'NY', '40', '74', '11000000']
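
For comparison, the pattern tried in the question, r',[,]', only matches a comma immediately followed by another comma, which is why nothing was split. A possible variant of the approach above, sketched under the assumption that the delimiters are exactly the comma and the two square brackets, lists them in a single character class and converts the numeric fields afterwards:

```python
import re

x = "New York City, NY[40,74]11000000"

# The question's pattern needs two commas in a row, so the line comes back whole.
unsplit = re.split(r",[,]", x)
print(unsplit)  # ['New York City, NY[40,74]11000000']

# A character class lists every delimiter; \s* also swallows the space after the comma.
fields = re.split(r"[,\[\]]\s*", x)

# Keep city and state as strings, convert the remaining three fields to int.
row = fields[:2] + [int(n) for n in fields[2:]]
print(row)  # ['New York City', 'NY', 40, 74, 11000000]
```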

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mara004
Solution 2 TYZ
Solution 3 Ashish Samarth