Replace a phrase only if it appears at the beginning of a character string [duplicate]
For example, I need to remove one of "the ", "and ", "a ", "an ", "this " or "that ", checked in this order, and only if it is at the beginning of the string:
input ---> "the computer is the machine in charge of data processing processes"
output ---> "computer is the machine in charge of data processing processes"
It is important that once I find that the sentence begins with one of those words, I remove it and stop, without trying to remove the others.
In this example, it would detect "the " at the beginning of the string, remove it, and not try the rest of the words.
Nothing should be removed only when all 6 options ("the ", "and ", "a ", "an ", "this " or "that ") have been tried and the input phrase begins with none of them.
I've tried something like this, but the problem is that it runs every check instead of stopping at the first match.
input_phrase.replace("the ","")
input_phrase = "An airplane is an aircraft with a higher density than the air."
input_phrase = input_phrase.lower()
input_phrase = input_phrase.replace("the ","",1)
input_phrase = input_phrase.replace("and ","",1)
input_phrase = input_phrase.replace("a ","",1)
input_phrase = input_phrase.replace("an ","",1)
input_phrase = input_phrase.replace("this ","",1)
input_phrase = input_phrase.replace("that ","",1)
output_phrase = input_phrase
print(repr(output_phrase))
The problem with that code is that it doesn't remove the word only when it's at the beginning: it removes the first occurrence anywhere in the string. It also runs every .replace() call instead of stopping once one of the matches has been removed.
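To make the failure concrete, here is a small sketch that wraps the chained-replace attempt in a function. The name naive_strip is made up for this illustration and is not part of the question.

```python
def naive_strip(phrase):
    """Hypothetical wrapper around the chained-replace attempt."""
    phrase = phrase.lower()
    for word in ("the ", "and ", "a ", "an ", "this ", "that "):
        # replace(..., 1) removes the first occurrence anywhere in the string,
        # not only at the start, and the loop never stops after a removal.
        phrase = phrase.replace(word, "", 1)
    return phrase


result = naive_strip("An airplane is an aircraft with a higher density than the air.")
print(repr(result))
```

Besides the leading "an ", the "a " before "higher" and the "the " before "air" are removed too, which is exactly the unwanted behavior.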
Solution 1:[1]
Here is one way to do so using regex:
import re
input_phrase = "An airplane is an aircraft with a higher density than the air."
output_phrase = re.sub(r"^(the|and|a|an|this|that) ", '', input_phrase, flags=re.IGNORECASE)
print(output_phrase)
- The re.IGNORECASE flag allows both "An" and "an" to work.
- ^ is used to assert the position at the beginning of the string.
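If the keyword list changes often, the pattern could be built from a list instead of being typed by hand. The following is only a sketch: the helper name strip_leading_keyword and the constants KEYWORDS and LEADING_KEYWORD are assumptions of this example, not part of the answer.

```python
import re

# Illustrative keyword list; each entry carries its trailing space.
KEYWORDS = ["the ", "and ", "a ", "an ", "this ", "that "]

# re.escape guards against keywords that contain regex metacharacters.
# ^ anchors the match at the start, so at most one keyword is removed.
LEADING_KEYWORD = re.compile(
    "^(?:" + "|".join(re.escape(k) for k in KEYWORDS) + ")",
    flags=re.IGNORECASE,
)


def strip_leading_keyword(phrase):
    """Remove one leading keyword, if any; leave the rest untouched."""
    return LEADING_KEYWORD.sub("", phrase, count=1)


print(strip_leading_keyword("An airplane is an aircraft with a higher density than the air."))
print(strip_leading_keyword("Airplanes fly."))
```

Because each keyword keeps its trailing space, "a " cannot match the start of "An airplane", and a word such as "Airplanes" is left alone.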
Without regex, you can use startswith() and loop through keywords.
input_phrase = "An airplane is an aircraft with a higher density than the air."
keywords = ["the ", "and ", "a ", "an ", "this ", "that "]
output_phrase = input_phrase
for word in keywords:
    if input_phrase.lower().startswith(word):
        output_phrase = input_phrase[len(word):]
        break
print(output_phrase)
break is used to exit the for loop so that no time is wasted checking the other words.
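The same loop can be wrapped in a reusable function, where an early return plays the role of break. This is a minimal sketch; the name remove_leading_keyword and the KEYWORDS tuple are invented for illustration.

```python
KEYWORDS = ("the ", "and ", "a ", "an ", "this ", "that ")


def remove_leading_keyword(phrase, keywords=KEYWORDS):
    """Return phrase without its first leading keyword (case-insensitive)."""
    lowered = phrase.lower()  # lower once instead of on every iteration
    for word in keywords:
        if lowered.startswith(word):
            # return plays the role of break: later keywords are never tried.
            return phrase[len(word):]
    return phrase  # no keyword matched, so nothing is removed


for sample in ("The computer is the machine", "That is a test", "Another day"):
    print(repr(remove_leading_keyword(sample)))
```

The last sample shows the "nothing to remove" case: "Another day" starts with the letters of "an", but not with "an " followed by a space, so it comes back unchanged.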
Solution 2:[2]
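input_phrase = "An airplane is an aircraft with a higher density than the air."
# Every keyword ends with a space so that "the" does not match "there".
words = ["the ", "and ", "a ", "an ", "this ", "that "]
output_phrase = input_phrase  # default: leave the phrase unchanged

# any() stops at the first keyword that matches the start of the phrase.
if any(input_phrase.lower().startswith(word) for word in words):
    # Drop the first word and keep the rest with its original casing.
    output_phrase = input_phrase.split(" ", 1)[1]
print(output_phrase)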
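The idea behind this solution, looking only at the first word, can also be written with str.partition, which splits at the first space and keeps the rest of the phrase intact. This is a sketch under the assumption that keywords are compared without their trailing space; the names FIRST_WORDS and drop_first_keyword are invented here.

```python
FIRST_WORDS = {"the", "and", "a", "an", "this", "that"}


def drop_first_keyword(phrase):
    # partition splits at the first space only: (first word, separator, rest).
    first, sep, rest = phrase.partition(" ")
    # sep is empty when the phrase has no space, so single words are kept.
    if sep and first.lower() in FIRST_WORDS:
        return rest
    return phrase


print(drop_first_keyword("An airplane is an aircraft with a higher density than the air."))
print(drop_first_keyword("There is no article here."))
```

Since the set lookup compares whole words, the order of the keywords no longer matters in this variant.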
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
