How to remove consecutive exclusion-list words from a string using Python

I have an exclusion_list = ['and', 'or', 'not']

My string is 'Nick is not or playing football and or not cricket'

  • If words from exclusion_list appear consecutively in the string, only the first one should be kept

My expected output is 'Nick is not playing football and cricket'

Code is below

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list =  ['and', 'or', 'not']
search_value_2 = search_value.split(' ')
for i in range(0,len(search_value_2)-2):
    if search_value_2[i] in exclusion_list and search_value_2[i+1] in exclusion_list:
        search_value_2.remove(search_value_2[i+1])
' '.join(search_value_2)

My output: 'Nick is not playing football and not cricket'

Expected output: 'Nick is not playing football and cricket'

Basically, I need to repeat the check (for example recursively) until no two words from exclusion_list are next to each other in the string



Solution 1:[1]

First let's use a more meaningful name for search_value_2. What does this list contain? Words! So we call this list words.

Since the code accesses the element at i + 1 (where i is the index of the word in the list), the loop shouldn't run up to len(words) - 2 but up to len(words) - 1.

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
words = search_value.split(' ')
for i in range(len(words) - 1):
    if words[i] in exclusion_list and words[i + 1] in exclusion_list:
        words[i + 1] = ''
print(' '.join(words))

Now we get Nick is not  playing football and  not cricket (note the double spaces where a word was blanked). At least we have a result, but it's still not correct. We can avoid the extra spaces by not joining empty words. This means changing ' '.join(words) to ' '.join((word for word in words if word)).

Then we have to take into account that more than two words from the exclusion list may follow each other. The simple words[i] in exclusion_list doesn't work because that word might have been overwritten with an empty string during the previous iteration. So this has to be changed to (words[i] == '' or words[i] in exclusion_list).

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']
words = search_value.split(' ')
for i in range(len(words) - 1):
    if (words[i] == '' or words[i] in exclusion_list) and words[i + 1] in exclusion_list:
        words[i + 1] = ''
print(' '.join((word for word in words if word)))

Final result: Nick is not playing football and cricket
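
The question's idea of repeating the check until no two excluded words are adjacent can also be sketched with a while loop that deletes by index and only moves on when nothing was deleted. This is a minimal sketch, not part of the original answer; the function name drop_repeated_exclusions is made up for illustration.

```python
def drop_repeated_exclusions(text, exclusion_list):
    """Keep only the first word of each run of excluded words."""
    words = text.split()
    i = 0
    while i < len(words) - 1:
        if words[i] in exclusion_list and words[i + 1] in exclusion_list:
            # Delete by index and stay on i, so the new neighbour is checked too.
            del words[i + 1]
        else:
            i += 1
    return ' '.join(words)


print(drop_repeated_exclusions(
    'Nick is not or playing football and or not cricket',
    ['and', 'or', 'not']))
# Nick is not playing football and cricket
```

Deleting with del words[i + 1] instead of list.remove matters here: remove deletes the first element with that value, which is not necessarily the one at position i + 1.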

Solution 2:[2]

Another approach is as follows:

  • Go through all the words in the text
  • If the word is in the exclusion_list, either
    • keep it, if the previous word wasn't in the exclusion_list, OR
    • skip it otherwise.

Technically, we use a variable skip that works as a switch to store whether or not we want to keep a word. What's nice about doing it this way is that it can handle sequences of arbitrary length.

exclusion_list = ['and', 'or', 'not']
txt = 'Nick is not or playing football and or not cricket'
words = txt.split()
output = []

skip = False

for w in words:
    if w in exclusion_list:
        if skip:
            # previous word was in excluded list,
            # so skip this one
            continue

        else:
            # starts a new sequence of one or more 
            # exclusion_list words

            # keep this one
            output.append(w)

            # skip the following
            skip = True
    else:
        # keep normal words and reset skip
        output.append(w)
        skip = False 

# Output: Nick is not playing football and cricket
print(' '.join(output))
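
To check the claim about sequences of arbitrary length, the same switch logic can be wrapped in a function and run on a longer run of excluded words. This is only a sketch; the function name collapse_runs and the test sentence are assumptions made for illustration.

```python
def collapse_runs(text, exclusion_list):
    """Same switch logic as above, written as a reusable function."""
    excluded = set(exclusion_list)
    output = []
    skip = False  # True while we are inside a run of excluded words
    for w in text.split():
        is_excluded = w in excluded
        # Drop the word only if it is excluded AND follows another excluded word.
        if not (is_excluded and skip):
            output.append(w)
        skip = is_excluded
    return ' '.join(output)


print(collapse_runs('tea or and not or and coffee', ['and', 'or', 'not']))
# tea or coffee
```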

Solution 3:[3]

You can modify your code this way:

    search_value = 'Nick is not or playing football and or not cricket'
    exclusion_list = ['and', 'or', 'not']
    search_value_2 = search_value.split(' ')
    selected = []
    for i in range(len(search_value_2)):
        # Keep a word if it is not excluded, or if it starts a run of
        # excluded words (it is the first word, or the previous word is
        # not excluded). Only looking backwards avoids an IndexError at
        # the end of the list and keeps a single excluded word.
        if (search_value_2[i] not in exclusion_list
                or i == 0
                or search_value_2[i - 1] not in exclusion_list):
            selected.append(search_value_2[i])
    print(' '.join(selected))
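
The neighbour check can also be written without index arithmetic by pairing each word with its predecessor. The following is a sketch of that idea rather than the answer's own code; the names pairs, previous and result are introduced here for illustration.

```python
search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']

words = search_value.split()
# Pair every word with its predecessor; None stands in for "no previous word".
pairs = zip([None] + words[:-1], words)
selected = [
    word for previous, word in pairs
    # Drop a word only when both it and the word before it are excluded.
    if not (word in exclusion_list and previous in exclusion_list)
]
result = ' '.join(selected)
print(result)
# Nick is not playing football and cricket
```

Because each word is compared with its original predecessor, runs of any length collapse to their first word.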

Solution 4:[4]

Regex approach: it has several drawbacks. It does not handle more than 3 consecutive terms from exclusion_list, and the number of generated combinations grows quickly if exclusion_list is "big". See the groupby approach further down.

With a regex approach: generate "all" possible combinations of the elements of exclusion_list, both of length 2 and 3, merge them together (in order of decreasing length! it is important for the regex pattern), create the pattern with | to denote "or" and perform the substitution.

Remark: the combinations are generated with product, so repeated terms such as 'and and' are also included; the method is a bit greedy.

import re
import itertools as it

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']

p2 = it.product(exclusion_list, repeat=2)
p3 = it.product(exclusion_list, repeat=3)

ps = it.chain(p3, p2) # <-- first the longest!
ps_as_strs = map(' '.join, ps)

regex = '|'.join(map('({})'.format, ps_as_strs))

new_search_value = re.sub(regex, lambda match: match.group(match.lastindex).split()[0], search_value)

print(new_search_value)
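
The limit on the number of consecutive terms can be avoided by letting the regex quantifier do the repetition instead of enumerating combinations. This is a sketch under that assumption, not the answer's original pattern; the names terms and pattern are made up here.

```python
import re

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']

# Alternation of the escaped terms, here: and|or|not
terms = '|'.join(map(re.escape, exclusion_list))
# One term (captured), followed by one or more further terms.
# \b keeps the pattern from matching inside longer words.
pattern = re.compile(rf'\b({terms})(?:\s+(?:{terms}))+\b')

# Replace each run with its first term only.
new_search_value = pattern.sub(r'\1', search_value)
print(new_search_value)
# Nick is not playing football and cricket
```

The pattern size grows linearly with the length of exclusion_list, and a run of any length is replaced in a single match.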

EDIT: more robust solution with groupby

import itertools as it

search_value = 'Nick is not or playing football and or not cricket'
exclusion_list = ['and', 'or', 'not']

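kept_words = []
# groupby yields runs of consecutive words that share the same key:
# True for words in exclusion_list, False for all other words.
for is_excluded, grp in it.groupby(search_value.split(), lambda word: word in exclusion_list):
    if is_excluded:
        kept_words.append(next(grp))  # keep only the first word of the run
    else:
        kept_words.extend(grp)
new_str = ' '.join(kept_words)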

print(new_str)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 fsimonjetz
Solution 3 Keziya
Solution 4