Fastest way to split a concatenated string into a tuple and ignore empty strings

I have a concatenated string like this:

my_str = 'str1;str2;str3;'

and I would like to split it, convert the resulting list to a tuple, and drop any empty strings produced by the split (notice the trailing ';').

So far, I am doing this:

tuple(filter(None, my_str.split(';')))

Is there any more efficient (in terms of speed and space) way to do it?



Solution 1:[1]

That is a very reasonable way to do it. Some alternatives:

  • foo.strip(";").split(";") (if there won't be any empty slices inside the string)
  • [ x.strip() for x in foo.split(";") if x.strip() ] (to strip whitespace from each slice)

The "fastest" way to do this will depend on a lot of things… but you can easily experiment with IPython's %timeit:

In [1]: foo = "1;2;3;4;"

In [2]: %timeit foo.strip(";").split(";")
1000000 loops, best of 3: 1.03 us per loop

In [3]: %timeit filter(None, foo.split(';'))
1000000 loops, best of 3: 1.55 us per loop
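
On Python 3, filter() returns a lazy iterator, so timing the bare call as above does not include the filtering work itself; the result has to be materialised for a fair comparison. A minimal sketch using the standard timeit module (the `candidates` dict and the loop count are just illustrative choices, and the numbers will vary by machine and Python version):

```python
import timeit

foo = "1;2;3;4;"

# Each candidate builds a tuple so that the same amount of work is measured.
candidates = {
    "strip + split": lambda: tuple(foo.strip(";").split(";")),
    "filter(None, ...)": lambda: tuple(filter(None, foo.split(";"))),
    "generator expression": lambda: tuple(p for p in foo.split(";") if p),
}

# Sanity check: collect what each candidate returns before timing it.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100_000)
    print(f"{name}: {seconds:.4f} s per 100,000 calls")
```

All three produce the same tuple for this input, so any difference in the printed times reflects only the approach.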

Solution 2:[2]

How about this?

tuple(my_str.split(';')[:-1])
('str1', 'str2', 'str3')

You split the string at the ; character, and pass all of the substrings (except the last one, the empty string) to tuple to create the result tuple.

Solution 3:[3]

If you only expect an empty string at the end, you can do:

a = 'str1;str2;str3;'
tuple(a.split(';')[:-1])

or

tuple(a[:-1].split(';'))
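
Both slicing variants rely on the string ending with exactly one separator. A quick sketch of what happens otherwise (the helper name `split_drop_last` is made up for illustration):

```python
def split_drop_last(s, sep=";"):
    # Drops the final piece unconditionally, whether or not it is empty.
    return tuple(s.split(sep)[:-1])

print(split_drop_last("str1;str2;str3;"))  # ('str1', 'str2', 'str3')
print(split_drop_last("str1;str2;str3"))   # ('str1', 'str2'), last item lost
print(split_drop_last("str1;;str3;"))      # ('str1', '', 'str3'), inner empty kept
```

So the slice is only safe when the input format is guaranteed.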

Solution 4:[4]

Try tuple(my_str.split(';')[:-1])

Solution 5:[5]

Yes, that is quite a Pythonic way to do it. If you have a love for generator expressions, you could also replace the filter() with:

tuple(part for part in my_str.split(';') if part)

This has the benefit of allowing further processing on each part in-line.

It's interesting to note that the documentation for str.split() says:

... If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.

I wonder why this special case was done, without allowing it for other separators...
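
The difference between the two modes of str.split() is easy to see with a small example (the sample strings are arbitrary):

```python
text = "  a  b   c "

# No separator: runs of whitespace count as one separator, and
# leading/trailing whitespace produces no empty strings.
whitespace_parts = text.split()

# Explicit separator: every occurrence splits, so empty strings appear.
explicit_parts = "a;;b;".split(";")

print(whitespace_parts)  # ['a', 'b', 'c']
print(explicit_parts)    # ['a', '', 'b', '']
```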

Solution 6:[6]

Use split and then slicing:

my_str.split(';')[:-1]

or:

lis = [x for x in my_str.split(';') if x]

Solution 7:[7]

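If the number of items in your string is fixed, you could also destructure inline. Note that the trailing ';' produces an extra empty string, so strip it first; otherwise unpacking four pieces into three names raises a ValueError:

str1, str2, str3 = my_str.rstrip(";").split(";")

More on that here: https://blog.teclado.com/destructuring-in-python/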

Solution 8:[8]

I know this is an old question, but I just came upon this and saw that the top answer (David) doesn't return a tuple like OP requested. Although the solution works for the one example OP gave, the highest voted answer (Levon) strips the trailing semicolon with a substring, which would give the wrong result for an empty string or for a string without a trailing semicolon.

The most robust and Pythonic solution is voithos' answer:

tuple(part for part in my_str.split(';') if part) 

Here's my solution:

tuple(my_str.strip(';').split(';'))

It returns this when run against an empty string though:

('',)

So I'll be replacing mine with voithos' answer. Thanks voithos!
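
To make the comparison concrete, here is a small sketch that runs both expressions over a few edge cases (the function names are made up for illustration):

```python
def split_filtered(s, sep=";"):
    # Generator-expression version: drops every empty piece.
    return tuple(part for part in s.split(sep) if part)

def split_stripped(s, sep=";"):
    # strip-then-split version: only removes separators at the ends.
    return tuple(s.strip(sep).split(sep))

for sample in ["str1;str2;str3;", "", ";", "a;;b"]:
    print(repr(sample), split_filtered(sample), split_stripped(sample))
```

The two agree on the original example, but only the filtered version returns an empty tuple for '' and ';' and drops the empty piece inside 'a;;b'.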

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  David Wolever
Solution 2  xxx
Solution 3  exfizik
Solution 4  googler
Solution 5  voithos
Solution 6  Ashwini Chaudhary
Solution 7  Sonic Soul
Solution 8  Zenon Anderson