How to remove consecutive words of the same type from a string using Python
I have an exclusion_list = ['and', 'or', 'not']
My string is 'Nick is not or playing football and or not cricket'
- If words from exclusion_list appear next to each other in the string, only the first one should be kept
My expected output is 'Nick is not playing football and cricket'
My code is below:
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
search_value_2 = search_value.split(' ')
for i in range(0,len(search_value_2)-2):
    if search_value_2[i] in exclusion_list and search_value_2[i+1] in exclusion_list:
        search_value_2.remove(search_value_2[i+1])
' '.join(search_value_2)
My output: 'Nick is not playing football and not cricket'
Expected output: 'Nick is not playing football and cricket'
Basically, I need to repeat this until no two words from exclusion_list appear next to each other in the string.
Solution 1:[1]
First let's use a more meaningful name for search_value_2. What does this list contain? Words!
So we call this list words.
Since the code tries to access the element at i + 1 (where i is the index of the word in the list), the range shouldn't be bounded by len(words) - 2 but by len(words) - 1.
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
words = search_value.split(' ')
for i in range(len(words) - 1):
    if words[i] in exclusion_list and words[i + 1] in exclusion_list:
        words[i + 1] = ''
print(' '.join(words))
Now we get 'Nick is not  playing football and  not cricket' (note the doubled spaces where words were replaced by empty strings). At least we have a result, but it's still not correct. We can avoid the extra spaces by not joining empty words. This means changing ' '.join(words) to ' '.join(word for word in words if word).
Then we have to take into account that more than two words from the exclusion list may follow each other. The simple check words[i] in exclusion_list doesn't work in that case, because that word might have been overwritten with an empty string during the previous iteration.
So this has to be changed to (words[i] == '' or words[i] in exclusion_list).
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
words = search_value.split(' ')
for i in range(len(words) - 1):
    if (words[i] == '' or words[i] in exclusion_list) and words[i + 1] in exclusion_list:
        words[i + 1] = ''
print(' '.join((word for word in words if word)))
Final result: Nick is not playing football and cricket
Solution 2:[2]
Another approach is as follows:
- Go through all the words in the text
- If the word is in the exclusion_list, either
  - keep it, if the previous word wasn't in the exclusion_list, OR
  - skip it otherwise.
Technically, we use a variable skip that works as a switch to store whether or not we want to keep a word. What's nice about doing it this way is that it can handle sequences of arbitrary length.
exclusion_list = ['and', 'or', 'not']
txt = 'Nick is not or playing football and or not cricket'
words = txt.split()
output = []
skip = False
for w in words:
    if w in exclusion_list:
        if skip:
            # previous word was in excluded list,
            # so skip this one
            continue
        else:
            # starts a new sequence of one or more
            # exclusion_list words
            # keep this one
            output.append(w)
            # skip the following
            skip = True
    else:
        # keep normal words and reset skip
        output.append(w)
        skip = False
# Output: Nick is not playing football and cricket
print(' '.join(output))
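The switch logic above could also be packaged as a reusable function. The following is only a minimal sketch: the name collapse_exclusions is hypothetical (it does not appear in the original answer), and it assumes the words are separated by whitespace.

```python
def collapse_exclusions(text, exclusion_list):
    """Keep only the first word of each run of consecutive exclusion words."""
    excluded = set(exclusion_list)  # a set makes membership tests cheap
    output = []
    previous_excluded = False  # plays the role of the skip switch
    for word in text.split():
        is_excluded = word in excluded
        # Drop the word only when it continues a run of exclusion words.
        if not (is_excluded and previous_excluded):
            output.append(word)
        previous_excluded = is_excluded
    return ' '.join(output)


print(collapse_exclusions(
    'Nick is not or playing football and or not cricket',
    ['and', 'or', 'not'],
))
```

Because the function only remembers whether the previous word was excluded, a run of any length collapses to its first word.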
Solution 3:[3]
You can modify your code this way:
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
search_value_2 = search_value.split(' ')
selected = []
for i in range(0, len(search_value_2)):
    # keep normal words; keep an exclusion word only when it does not
    # directly follow another exclusion word (the i == 0 check avoids
    # wrapping around to the last element through index -1)
    if search_value_2[i] not in exclusion_list or i == 0 or search_value_2[i-1] not in exclusion_list:
        selected.append(search_value_2[i])
print(' '.join(selected))
Solution 4:[4]
The regex approach has several drawbacks: it does not handle more than 3 consecutive terms from exclusion_list, and the pattern grows quickly when exclusion_list is large. See the groupby approach further down.
With a regex approach: generate all possible combinations of the elements of exclusion_list, of length 2 and of length 3, merge them together (in order of decreasing length; this matters for the regex pattern), build the pattern with | to denote "or", and perform the substitution.
Remark: the combinations are generated with product, so repetitions such as 'and and' are also included, which makes the pattern somewhat larger than strictly needed.
import re
import itertools as it
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
p2 = it.product(exclusion_list, repeat=2)
p3 = it.product(exclusion_list, repeat=3)
ps = it.chain(p3, p2) # <-- first the longest!
ps_as_strs = map(' '.join, ps)
regex = '|'.join(map('({})'.format, ps_as_strs))
new_search_value = re.sub(regex, lambda match: match.group(match.lastindex).split()[0], search_value)
print(new_search_value)
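As an alternative to enumerating every combination, a single pattern with a repeated group can match runs of any length. This is a hedged sketch rather than part of the original answer: the function name collapse_with_regex is hypothetical, and it assumes the entries of exclusion_list are plain alphanumeric words, so that the \b word boundaries behave as expected.

```python
import re


def collapse_with_regex(text, exclusion_list):
    """Replace each run of exclusion words with the first word of the run."""
    # Escape the words in case they contain regex metacharacters.
    alternatives = '|'.join(map(re.escape, exclusion_list))
    # Group 1 captures the first word of the run; the non-capturing group
    # matches one or more further exclusion words separated by whitespace.
    pattern = rf'\b({alternatives})(?:\s+(?:{alternatives}))+\b'
    return re.sub(pattern, r'\1', text)


print(collapse_with_regex(
    'Nick is not or playing football and or not cricket',
    ['and', 'or', 'not'],
))
```

The word boundaries prevent partial matches inside longer words, so a phrase such as 'and orange' is left unchanged.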
EDIT: more robust solution with groupby
import itertools as it
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
new_str = ''
# groupby yields runs of consecutive words sharing the same key:
# True for words in exclusion_list, False for all other words
for check, grp in it.groupby(search_value.split(), lambda word: word in exclusion_list):
    if check:
        # run of exclusion words: keep only the first one
        new_str += next(grp)
    else:
        new_str += ' '.join(grp)
    new_str += ' '
new_str = new_str.strip()
print(new_str)
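The same groupby idea can be written so that words are collected in a list and joined once at the end, which avoids the trailing-space bookkeeping. This is a sketch under the same assumptions as above; the function name collapse_with_groupby is hypothetical.

```python
from itertools import groupby


def collapse_with_groupby(text, exclusion_list):
    """Keep the first word of every run of consecutive exclusion words."""
    excluded = set(exclusion_list)
    kept = []
    # Each group is a run of consecutive words with the same key value.
    for is_excluded, group in groupby(text.split(), key=lambda w: w in excluded):
        if is_excluded:
            kept.append(next(group))  # first word of the run only
        else:
            kept.extend(group)  # normal words are all kept
    return ' '.join(kept)


print(collapse_with_groupby(
    'Nick is not or playing football and or not cricket',
    ['and', 'or', 'not'],
))
```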
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | fsimonjetz |
| Solution 3 | Keziya |
| Solution 4 | |
