Python: how to create blocks from multiline text

I have the text block below and I am trying to separate it into 3 blocks with a regex. Each time the name field appears, a new block should start. How can I return all 3 blocks?

name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday

import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

a=re.findall("(name.*)[\n\S\s]", lines, re.MULTILINE)

Block 1 would be returned as "name: marvin\nattribute: one\nday: monday\ndayalt: test"

Thanks!



Solution 1:[1]

How about the following, which uses a positive lookahead:

import re

lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""

blocks = re.findall(r"name: .*?(?=name: |$)", lines, re.DOTALL)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test\n',
#  'name: judy\nattribute: two\nday: tuesday\n',
#  'name: dot\nattribute: three\nday: wednesday']
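
One caveat worth checking against real data: the pattern above is not anchored to the start of a line, so "name: " would also match inside a longer field name. A minimal sketch, assuming a hypothetical nickname field (not part of the data in the question), compares it with a variant that uses ^ together with re.MULTILINE:

```python
import re

# "nickname" is a hypothetical field, added only to illustrate the caveat.
text = """name: marvin
nickname: marv
day: monday
name: judy
day: tuesday"""

# Unanchored: the lookahead also stops at the "name: " inside "nickname: ".
unanchored = re.findall(r"name: .*?(?=name: |$)", text, re.DOTALL)

# Anchored: with re.MULTILINE, ^ only accepts "name: " at the start of a line;
# \Z marks the very end of the string.
anchored = re.findall(
    r"^name: .*?(?=^name: |\Z)", text, re.DOTALL | re.MULTILINE
)

print(unanchored)
print(anchored)
```

If no other field name in your data ends in "name", the unanchored pattern is sufficient.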

Solution 2:[2]

If you are using [\n\S\s] (which can be written as [\S\s] because \s also matches a newline), you don't need the re.DOTALL flag.
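
As a small illustration of that point (the two-line sample string is made up for the demo), the dot only crosses a newline when re.DOTALL is set, while [\S\s] crosses it without any flag:

```python
import re

text = "a\nb"  # made-up sample: two characters separated by a newline

dot_default = re.findall(r"a.b", text)            # "." does not match "\n" by default
dot_dotall = re.findall(r"a.b", text, re.DOTALL)  # with DOTALL, "." matches "\n"
char_class = re.findall(r"a[\S\s]b", text)        # [\S\s] matches any character

print(dot_default, dot_dotall, char_class)
```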

But your pattern (name.*)[\n\S\s] only matches name followed by the rest of the line and then one more character of any kind, because the character class is not repeated.
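
To see this behaviour, here is a short sketch that runs the pattern from the question (written as a raw string, which should be equivalent) against a shortened copy of the sample text. Because the pattern contains one capture group, re.findall returns only that group, which is just the name line:

```python
import re

lines = """name: marvin
attribute: one
day: monday
name: judy
attribute: two
day: tuesday
"""

# Raw-string form of the pattern from the question.
result = re.findall(r"(name.*)[\n\S\s]", lines, re.MULTILINE)
print(result)
```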

You can avoid a non-greedy quantifier, which prevents unnecessary backtracking, and instead match the line that starts with name: followed by all subsequent lines that do not start with it.

^name: .*(?:\n(?!name: ).*)*

Explanation

  • ^ Start of a line (the pattern is used with re.MULTILINE)
  • name: .* Match name:, a space and the rest of the line
  • (?: Non capture group (to repeat as a whole)
    • \n Match a newline
    • (?!name: ).* Assert that name: is not directly to the right of the current position, then match the rest of the line
  • )* Close non capture group and optionally repeat
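
The breakdown above can also be written directly into the pattern with re.VERBOSE. This is only a sketch of the same regex, not a different technique. In verbose mode unescaped whitespace is ignored, so the literal space after name: is written as [ ]; the short sample string is made up for the demo.

```python
import re

pattern = re.compile(
    r"""
    ^name:[ ].*          # a line starting with "name: " plus the rest of that line
    (?:                  # non-capturing group, repeated as a whole
        \n               # a newline
        (?!name:[ ])     # the next line must not start with "name: "
        .*               # the rest of that line
    )*                   # zero or more such continuation lines
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = "name: a\nx: 1\nname: b\ny: 2\nz: 3"  # made-up sample data
blocks = pattern.findall(sample)
print(blocks)
```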


Example

import re

pattern = r"^name: .*(?:\n(?!name: ).*)*"

lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

matches = re.findall(pattern, lines, re.MULTILINE)
print(matches)

Output

[
'name: marvin\nattribute: one\nday: monday\ndayalt: test    << this is a field that can sometimes show up',
'name: judy\nattribute: two\nday: tuesday',
'name: dot\nattribute: three\nday: wednesday\n'
]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1
Solution 2  The fourth bird