Multiple grouping in regex
I have a string
s="<response>blabla
<head> blabla
<t> EXTRACT 1</t>
<t>EXTRACT 2</t>
</head>
<body> blabla
<t>BODY 1</t>
<t>BODY 2</t>
</response>"
I need to extract the text between the <t> tags, but only when it is inside the <head> part. I tried:
import re
regex = r"(?:<t>([\w.,_]*)*)</t>"
re.findall(regex, s)
but it also fetches the body part. I understand that I need to tell it to stop at the closing </head> tag, but I couldn't come up with a way to do that.
PS: The string is on a single line; I split it here for readability. I want to do this using regex, not an XML parser.
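A minimal sketch of why the attempt also returns body text. It uses a simplified non-greedy pattern `<t>(.*?)</t>` in place of the asker's regex (an assumption made for illustration; the original character class `[\w.,_]` does not include spaces). A pattern that only describes `<t>...</t>` has no way of knowing where `<head>` ends, so `findall` collects every match in the string.

```python
import re

# The string from the question, joined onto one line as the asker describes.
s = ("<response>blabla <head> blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> </head> "
     "<body> blabla <t>BODY 1</t> <t>BODY 2</t> </response>")

# Nothing in this pattern refers to <head> or </head>, so findall() scans the
# whole string and returns the <t> contents from the body as well.
all_t = re.findall(r"<t>(.*?)</t>", s)
print(all_t)  # [' EXTRACT 1', 'EXTRACT 2', 'BODY 1', 'BODY 2']
```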
Solution 1:[1]
You can extract the <head> section first:
import re
s = "<response>blabla <head> blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> </head> <body> blabla <t>BODY 1</t> <t>BODY 2</t> </response>"
pattern_head = "<head>(.*)</head>"
header = re.findall(pattern_head, s)
print(header)
This gives: [' blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> ']
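A possible variant of this first step, sketched under the assumption that the input contains at most one `<head>` section. `re.search` returns a single match object (or `None`) rather than a list, the non-greedy `(.*?)` stops at the first `</head>`, and `re.DOTALL` lets `.` match newlines in case the string is not actually on one line. Unlike `header[0]`, which raises `IndexError` when no `<head>` is present, this falls back to an empty string.

```python
import re

s = ("<response>blabla <head> blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> </head> "
     "<body> blabla <t>BODY 1</t> <t>BODY 2</t> </response>")

# re.search finds the first <head>...</head>; re.DOTALL lets '.' cross newlines.
m = re.search(r"<head>(.*?)</head>", s, flags=re.DOTALL)

# Use the captured group if a <head> was found, otherwise an empty string.
head = m.group(1) if m else ""
print(repr(head))  # ' blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> '
```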
Then extract the <t> contents from the head:
pattern = "<t>(.*?)</t>"
substring = re.findall(pattern,header[0])
print(substring)
This gives: [' EXTRACT 1', 'EXTRACT 2']
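If a single `findall` call is preferred, one alternative (not part of the answer above) is a lookahead that keeps a `<t>...</t>` match only when a `</head>` still follows it. This is a sketch that assumes every `<t>` appearing before `</head>` belongs to the head; a `<t>` placed before `<head>` would be picked up as well.

```python
import re

s = ("<response>blabla <head> blabla <t> EXTRACT 1</t> <t>EXTRACT 2</t> </head> "
     "<body> blabla <t>BODY 1</t> <t>BODY 2</t> </response>")

# (?=.*?</head>) requires a closing </head> somewhere after the </t>.
# <t> tags in <body> come after </head>, so the lookahead fails for them.
# Caveat: a <t> placed before <head> would also pass this check.
pattern = r"<t>(.*?)</t>(?=.*?</head>)"
head_texts = re.findall(pattern, s, flags=re.DOTALL)
print(head_texts)  # [' EXTRACT 1', 'EXTRACT 2']
```

Each `<t>` in the body makes the lookahead scan forward to the end of the string before it is rejected, so on large inputs the two-step approach above is likely cheaper and easier to reason about.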
Solution 2:[2]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Anass |
| Solution 2 | ParryHotter |
