How to see if a string ONLY contains a substring in Python
I need to be able to see if a string only contains a substring or a letter, and nothing else.
Say I wanted to detect World.
This contains the substring, but it also has other letters in a different order:
"Hello World"
This doesn't contain any other letters or ordering, just the substring 3 times:
"WorldWorldWorld"
If I wanted to detect _
This wouldn't pass:
"Hello_World"
But this would:
"___"
How do I do this?
Solution 1:[1]
No regex necessary. This relies on the fact that str.count counts non-overlapping occurrences:
len(target) * data.count(target) == len(data)
In the benchmark below, simple string methods run roughly 4 to 9 times faster than the regex:
>>> import re
>>> target = "World"
>>> data = "World" * 3
>>> pattern = f"^({re.escape(target)})+$"
>>> %timeit len(target) * data.count(target) == len(data)
115 ns ± 0.352 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit re.match(pattern, data) is not None
456 ns ± 2.88 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit bool(data.replace(target, '')) # str.replace is faster again; the actual check is `not data.replace(target, '')`
51.7 ns ± 0.269 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
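The one-liners above can be wrapped into small helper functions. This is only a sketch: the function names are made up for illustration, and the guard for empty input is an assumption about the desired behaviour (without it, both checks report True for an empty string).

```python
def only_repeats(data: str, target: str) -> bool:
    """Return True if data is one or more back-to-back copies of target."""
    if not data or not target:
        # Assumption: an empty string or an empty target never counts as a match.
        return False
    # str.count counts non-overlapping occurrences, scanning left to right.
    return len(target) * data.count(target) == len(data)


def only_repeats_replace(data: str, target: str) -> bool:
    """Same check via str.replace: nothing may remain once target is removed."""
    if not data or not target:
        return False
    return not data.replace(target, "")


print(only_repeats("WorldWorldWorld", "World"))  # True
print(only_repeats("Hello World", "World"))      # False
print(only_repeats("___", "_"))                  # True
print(only_repeats("Hello_World", "_"))          # False
```

Note that the str.replace variant needs `not`, since an empty remainder is what signals success.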
Solution 2:[2]
You can use a regular expression. Use re.escape to build a pattern that matches one or more consecutive occurrences of the target, with ^ and $ marking the beginning and end of the string respectively, and re.match to test whether the string matches that pattern:
import re
target = "World"
data = "World" * 3
pattern = f"^({re.escape(target)})+$"
re.match(pattern, data) is not None
This outputs:
True
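As a possible variation on the same idea, re.fullmatch can replace the explicit ^ and $ anchors. One reason to consider it: $ also matches just before a trailing newline, so the anchored pattern accepts "World\n", while re.fullmatch does not. The function name below is hypothetical, and rejecting an empty target is an assumption.

```python
import re


def only_repeats_regex(data: str, target: str) -> bool:
    """Return True if data is one or more consecutive copies of target."""
    if not target:
        # Assumption: an empty target is treated as "no match".
        return False
    # re.escape makes regex metacharacters in target (such as ".") literal.
    pattern = f"(?:{re.escape(target)})+"
    # re.fullmatch succeeds only if the whole string matches the pattern.
    return re.fullmatch(pattern, data) is not None


print(only_repeats_regex("WorldWorldWorld", "World"))  # True
print(only_repeats_regex("Hello World", "World"))      # False
print(only_repeats_regex("...", "."))                  # True
print(only_repeats_regex("abc", "."))                  # False, thanks to re.escape
```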
Solution 3:[3]
Method 1:
Without regular expressions (regexes), one can simply use sets. First, split the string s in question into chunks of the same length as the substring substr. Make a set s_set out of these chunks. If that set has exactly 1 element, and that element is substr, then print True, otherwise print False.
strs = ["WorldWorldWorld", "Hello World"]
substr = "World"
len_substr = len(substr)
for s in strs:
    s_set = set(s[i:(i + len_substr)] for i in range(0, len(s), len_substr))
    print(len(s_set) == 1 and substr in s_set)
# True
# False
Method 2:
If speed is important, then for very long strings, it makes sense to stop as soon as the first non-matching substring is found, as in this solution:
for s in strs:
    only_substr = True
    for i in range(0, len(s), len_substr):
        cur_substr = s[i:(i + len_substr)]
        if cur_substr != substr:
            only_substr = False
            break
    print(only_substr)
# True
# False
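The early-exit loop can also be written with all(), which stops at the first chunk that does not match. The sketch below is one way to do it, not the answer's own code: the function name is illustrative, the up-front length check is an added shortcut, and returning False for empty input is an assumption (the loop above prints True for an empty string).

```python
def only_repeats_chunks(s: str, substr: str) -> bool:
    """Return True if s is made up solely of consecutive copies of substr."""
    n = len(substr)
    # Assumption: empty inputs do not count; a length that is not a
    # multiple of len(substr) can never be a pure repetition.
    if n == 0 or len(s) == 0 or len(s) % n != 0:
        return False
    # str.startswith(prefix, start) compares in place, without slicing.
    # all() short-circuits on the first offset that fails.
    return all(s.startswith(substr, i) for i in range(0, len(s), n))


for s in ["WorldWorldWorld", "Hello World"]:
    print(only_repeats_chunks(s, "World"))
# True
# False
```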
Solution 4:[4]
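Use a regular expression.
if re.fullmatch(r"(?:World)+", s):
    ...  # s is one or more repetitions of "World"
This only succeeds if s consists of one or more repetitions of the string World, and nothing else. Note that re.match anchors only at the start of the string, so on its own it would also accept a string such as WorldHello.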
Solution 5:[5]
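This is a job for regular expressions, specifically re.fullmatch(), which requires the whole string to match (re.match() only anchors at the start).
import re
re.fullmatch(r"(?:World)+", "World")            # returns a match object
re.fullmatch(r"(?:World)+", "Hello World")      # returns None
re.fullmatch(r"(?:World)+", "WorldWorldWorld")  # returns a match object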
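To see how re.match and re.fullmatch behave on this pattern, the short comparison below runs both on a few sample strings. The sample list and the results dictionary are illustrative additions, not part of the answer.

```python
import re

pattern = r"(?:World)+"
samples = ["World", "WorldWorldWorld", "Hello World", "WorldHello"]

# Map each sample to (re.match result, re.fullmatch result) as booleans.
results = {}
for s in samples:
    prefix_only = re.match(pattern, s) is not None    # anchored at the start only
    whole = re.fullmatch(pattern, s) is not None      # must cover the whole string
    results[s] = (prefix_only, whole)
    print(f"{s!r}: match={prefix_only}, fullmatch={whole}")
```

The interesting case is "WorldHello": re.match accepts it because the string starts with World, while re.fullmatch rejects it.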
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | BrokenBenchmark |
| Solution 3 | |
| Solution 4 | chepner |
| Solution 5 | ljmc |
