Python: how to create blocks from multiline text
I have the text block below and I am trying to split it into 3 blocks with a regex. Each name field starts a new block. How can I return all 3 blocks?
name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""
# Only returns ['name: marvin', 'name: judy', 'name: dot'], i.e. just the name lines
a = re.findall(r"(name.*)[\n\S\s]", lines, re.MULTILINE)
Block 1 should be returned as "name: marvin\nattribute: one\nday: monday\ndayalt: test".
Thanks!
Solution 1:[1]
How about the following, which uses a positive lookahead:
import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""
# Lazily match from "name: " up to, but not including, the next "name: " or the end of the text
blocks = re.findall(r"name: .*?(?=name: |$)", lines, re.DOTALL)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test\n',
# 'name: judy\nattribute: two\nday: tuesday\n',
# 'name: dot\nattribute: three\nday: wednesday']
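The same lookahead idea can also drive re.split instead of re.findall. This is a minimal sketch, not part of the original answer, assuming Python 3.7 or newer (the version where re.split started accepting patterns that match an empty string). Anchoring the lookahead with ^ under (?m) also means a "name: " that appears in the middle of a line will not start a new block.

```python
import re

lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""

# (?m) turns on MULTILINE so ^ matches at the start of every line;
# the zero-width lookahead splits *before* each "name: " line without consuming it.
# Splitting on an empty match requires Python 3.7 or newer.
parts = re.split(r"(?m)^(?=name: )", lines)

# The split at position 0 produces a leading empty string, so drop empty parts.
blocks = [part for part in parts if part]
print(blocks)
```

The result is the same list that the findall version prints.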
Solution 2:[2]
If you are using [\n\S\s] (which can be written as [\S\s] because \s also matches a newline), you don't need the re.DOTALL flag.
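A quick check of that point on a toy string "a\nb" (chosen only for illustration): a plain . skips the newline unless re.DOTALL is passed, while the [\S\s] class matches it with no flag at all.

```python
import re

text = "a\nb"

# "." does not match a newline by default ...
no_flag = re.findall(r"a.b", text)
# ... unless re.DOTALL is passed
with_dotall = re.findall(r"a.b", text, re.DOTALL)
# [\S\s] matches any character, newline included, without any flag
char_class = re.findall(r"a[\S\s]b", text)

print(no_flag)      # []
print(with_dotall)  # ['a\nb']
print(char_class)   # ['a\nb']
```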
But your pattern (name.*)[\n\S\s] only matches name followed by the rest of the line, and then a single arbitrary character, because the character class is not repeated.
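To see this concretely, here is a small sketch on a trimmed copy of the sample text (shortened for illustration). The first call uses the question's pattern; the second repeats the class with *, an illustrative variant rather than a suggested fix, which shows the opposite failure.

```python
import re

# Trimmed copy of the question's sample text (shortened for illustration)
lines = """name: marvin
attribute: one
day: monday
name: judy
attribute: two
day: tuesday
"""

# The question's pattern: (name.*) stops at the end of the "name" line because
# "." does not match "\n"; the unrepeated class then consumes exactly one character.
only_name_lines = re.findall(r"(name.*)[\n\S\s]", lines)
print(only_name_lines)  # ['name: marvin', 'name: judy']

# Repeating the class greedily goes too far the other way:
# a single match that runs from the first "name" to the end of the text.
everything = re.findall(r"(name[\n\S\s]*)", lines)
print(len(everything))  # 1
```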
To prevent unnecessary backtracking, you can skip the non-greedy quantifier and instead match the line that starts with name: followed by all lines that do not start with it.
^name: .*(?:\n(?!name: ).*)*
Explanation
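- ^ Start of a line (with re.MULTILINE, ^ matches at the start of every line, not only at the start of the string)
- name: .* Match name:, a space and the rest of the line
- (?: Open a non-capturing group (to repeat it as a whole)
- \n Match a newline
- (?!name: ).* Negative lookahead: assert that name: is not directly to the right of the current position, then match the rest of the line
- )* Close the non-capturing group and repeat it zero or more times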
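The negative lookahead is what stops each match at the next name: line. A small comparison on a trimmed sample (assumed for illustration, without a trailing newline), with and without (?!name: ):

```python
import re

# Trimmed sample for illustration; no trailing newline
lines = """name: marvin
attribute: one
name: judy
attribute: two"""

# With the negative lookahead, a repetition cannot continue into a "name: " line,
# so every match ends just before the next block.
with_lookahead = re.findall(r"^name: .*(?:\n(?!name: ).*)*", lines, re.MULTILINE)
print(with_lookahead)  # ['name: marvin\nattribute: one', 'name: judy\nattribute: two']

# Without it, (?:\n.*)* also consumes the following "name: " lines,
# so the whole text comes back as one match.
without_lookahead = re.findall(r"^name: .*(?:\n.*)*", lines, re.MULTILINE)
print(len(without_lookahead))  # 1
```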
Example
import re
pattern = r"^name: .*(?:\n(?!name: ).*)*"
lines = """name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""
matches = re.findall(pattern, lines, re.MULTILINE)
print(matches)
Output
[
'name: marvin\nattribute: one\nday: monday\ndayalt: test << this is a field that can sometimes show up',
'name: judy\nattribute: two\nday: tuesday',
'name: dot\nattribute: three\nday: wednesday\n'
]
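Note that the last match in the output keeps a trailing \n: the input ends with a newline, so the final group iteration matches that newline followed by an empty line. One option, sketched here under the assumption that the asker wants blocks in the "Block 1" format from the question, is to strip trailing newlines from each match (the dayalt annotation is dropped from the sample for brevity).

```python
import re

pattern = r"^name: .*(?:\n(?!name: ).*)*"
lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

# rstrip("\n") removes the trailing newline that only the last match carries,
# so all three blocks end the same way.
blocks = [m.rstrip("\n") for m in re.findall(pattern, lines, re.MULTILINE)]
for block in blocks:
    print(repr(block))
```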
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | The fourth bird |