Can I do this regex in just one step?

I have the following text:

fd types:["a"] 
s types: 
  ["b","c"]
types: [
"one"
]
types: "two"
types: ["three", "four", "five","six"]
no: ["Don't","read","this"]

I would like to extract only the values of types: a to c and one to six. I don't want to extract other properties, like units. So far I can do this in two steps:

(?:types:\s*\[\s*)(.+?)(?:\s*])|(?:types:\s*)(\".+?\")    

This way I get the groups:

"a"
"b","c"
"one"
"two"  
"three", "four", "five","six"

Then I apply

\s*"((?:[^",\s]*)*)"\s*

to each of the groups and get

a
b
c
one
two  
three
four
five
six
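For reference, the two steps above can be sketched in Python with the standard re module. This is only an illustration under a few assumptions: the names TEXT, STEP_ONE, STEP_TWO and extract_types_two_steps are made up for this sketch, and the second pattern is a simplified form of the one in the question (the nested quantifier and the surrounding \s* are dropped, which should not change what is captured here).

```python
import re

# Sample input copied from the question.
TEXT = (
    'fd types:["a"] \n'
    's types: \n'
    '  ["b","c"]\n'
    'types: [\n'
    '"one"\n'
    ']\n'
    'types: "two"\n'
    'types: ["three", "four", "five","six"]\n'
    'no: ["Don\'t","read","this"]'
)

# Step 1: the pattern from the question. Group 1 holds the body of a
# bracketed list, group 2 holds a single quoted value without brackets.
STEP_ONE = re.compile(r'(?:types:\s*\[\s*)(.+?)(?:\s*])|(?:types:\s*)(\".+?\")')

# Step 2 (simplified, see the note above): capture the text between quotes.
STEP_TWO = re.compile(r'"([^",\s]*)"')


def extract_types_two_steps(text):
    """Return every quoted value that belongs to a types: property."""
    values = []
    for match in STEP_ONE.finditer(text):
        # Only one of the two groups participates in any given match.
        chunk = match.group(1) or match.group(2)
        values.extend(STEP_TWO.findall(chunk))
    return values


print(extract_types_two_steps(TEXT))
```

The no: line is skipped because step one only ever starts a match at the literal types: prefix.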

I wonder if this can be done in just one step.



Solution 1:[1]

You can install the third-party regex module (pip install regex) and use

(?:\G(?!^)\s*,\s*|types:(?:\s*\[)?\s*)"([^"]*)"

Details:

  • (?:\G(?!^)\s*,\s*|types:(?:\s*\[)?\s*) - either of the two patterns:
    • \G(?!^)\s*,\s* - the end of the previous match (but not the start of the string), then a comma surrounded by zero or more whitespace characters
    • | - or
    • types:(?:\s*\[)?\s* - types:, then an optional sequence of zero or more whitespace characters followed by a [ char, and then zero or more whitespace characters
  • "([^"]*)" - ", then zero or more chars other than " captured into Group 1, and then a " char.

See the Python demo:

import regex

text = 'fd types:["a"] \ns types: \n  ["b","c"]\ntypes: [\n"one"\n]\ntypes: "two"\ntypes: ["three", "four", "five","six"]\nno: ["Don\'t","read","this"]'

print(regex.findall(r'(?:\G(?!^)\s*,\s*|types:(?:\s*\[)?\s*)"([^"]*)"', text))

Output:

['a', 'b', 'c', 'one', 'two', 'three', 'four', 'five', 'six']

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew