Filter lines having text portions embedded between - or * in Python 3 using regex

zenPython = '''
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
'''

Required output:

['and preferably only one', 'right']

I used the regex below:

import re

portions = re.findall(r"[/*-](\S.*)[/*-]", zenPython)
print(portions)

But I am not getting the desired result. My output:

['- and preferably only one -', 'right']


Solution 1:[1]

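Your pattern keeps the extra dashes because \S accepts the second - of the opening -- and the greedy .* runs to the last - or * on the line. You can change your regex a bit, so that the captured text cannot start with a delimiter and stops at the first closing one: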

zenPython = '''
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
'''

import re

portions=re.findall(r"[-*] ?([^-*].*?) ?[-*]",zenPython)
print(portions)

Output:

['and preferably only one', 'right']

The regex r"[-*] ?([^-*].*?) ?[-*]" looks for:

 [-*] ?               a - or * followed by an optional space
 ([^-*].*?)           a captured group: one character other than - or *, then as few further characters as possible
  ?[-*]               an optional space followed by a - or *
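
The breakdown above can also be written directly into the pattern with re.VERBOSE, which may be easier to maintain. This is only a sketch: the name PORTION and the two-line sample are chosen here for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, spelled out with comments.
# In VERBOSE mode plain spaces are ignored, so a literal space is written as "\ ".
PORTION = re.compile(
    r"""
    [-*]\ ?        # opening delimiter: - or *, then an optional space
    (              # start of the captured portion
      [^-*]        #   first character must not be a delimiter
      .*?          #   then as few characters as possible (lazy)
    )              # end of the captured portion
    \ ?[-*]        # optional space, then the closing - or *
    """,
    re.VERBOSE,
)

# Two lines taken from the Zen of Python text used in the question.
sample = (
    "There should be one-- and preferably only one --obvious way to do it.\n"
    "Although never is often better than *right* now."
)
print(PORTION.findall(sample))  # ['and preferably only one', 'right']
```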

This will not work for text that contains extra - characters, such as:

This is --- not going-to work -- example. 
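
To see the limitation concretely, the pattern can be run on that sentence. The variable names below are made up for this sketch. The hyphen inside "going-to" is taken as the closing delimiter, so the captured portion is cut short.

```python
import re

pattern = re.compile(r"[-*] ?([^-*].*?) ?[-*]")
tricky = "This is --- not going-to work -- example."

# The third dash of "---" opens the match and the hyphen in "going-to" closes it.
print(pattern.findall(tricky))  # ['not going']
```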

Play around with regex here: https://regex101.com/r/wNBJEE/1

Solution 2:[2]

This processes the text line by line with the same regular expression and gives the expected result:

['and preferably only one', 'right']

Try the code below:

import io
import re

# zenPython is the multi-line string defined in Solution 1
portions = []
fp = io.StringIO(zenPython)
lines = [line.strip() for line in fp.readlines()]

# Pattern: delimiter, optional space, captured text, optional space, delimiter
pattern = r"[-*] ?([^-*].*?) ?[-*]"

for line in lines:
    # findall returns an empty list when the line has no match,
    # so extend() adds nothing for such lines
    portions.extend(re.findall(pattern, line))

print(portions)
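
Since the title asks about filtering lines, a variant that keeps each matching line together with its portions may also be useful. This is a rough sketch reusing the pattern from Solution 1; the function name lines_with_portions and the short sample are hypothetical.

```python
import re

PATTERN = re.compile(r"[-*] ?([^-*].*?) ?[-*]")

def lines_with_portions(text):
    """Return (line, portions) pairs for every line with at least one match."""
    results = []
    for line in text.splitlines():
        portions = PATTERN.findall(line)
        if portions:  # skip lines without an embedded portion
            results.append((line.strip(), portions))
    return results

sample = "Readability counts.\nAlthough never is often better than *right* now.\n"
for line, portions in lines_with_portions(sample):
    print(line, portions)
```

With this sample only the second line is reported, along with the list ['right'].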

Solution 3:[3]

import re

portions = re.findall(r"([-*]+) ?([^-*]+?) ?\1", zenPython)
portions = [e[1] for e in portions]  # keep the text group, drop the delimiter group
print(portions)

The code above gives the required output. Here \1 is a backreference to the first captured group, so the closing delimiter must be identical to the opening one: text opened with * must be closed with *, and text opened with -- must be closed with --.
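
The effect of the backreference can be checked with a small sketch. Named groups replace \1 and the [1] index here purely for readability; that choice, the function name and the sample strings are assumptions of this example, not part of the answer.

```python
import re

# (?P=delim) plays the role of \1: it must repeat exactly what (?P<delim>...) matched.
SAME_DELIM = re.compile(r"(?P<delim>[-*]+) ?(?P<text>[^-*]+?) ?(?P=delim)")

def portions_between_same_delimiters(text):
    """Return the text enclosed by identical runs of - or *."""
    return [m.group("text") for m in SAME_DELIM.finditer(text)]

sample = (
    "There should be one-- and preferably only one --obvious way to do it.\n"
    "Although never is often better than *right* now."
)
print(portions_between_same_delimiters(sample))         # ['and preferably only one', 'right']
print(portions_between_same_delimiters("a *mixed- b"))  # []
```

The second call returns an empty list because the text opens with * but closes with -, which the pattern from Solution 1 would accept.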

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Patrick Artner
Solution 2
Solution 3    Rahul verma