regex: Match null or space from start as optional
I want to match null or space as optional from the start of the line. The lines are as follows:
Date Description Amount
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412
The regex that I used is:
regex_null = re.compile(r"^(?:null)?\s+(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*\.\d{2}\s+(?:Cr)?)$", re.M)
And what I got is:
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
So the null is not optional. It is currently considered compulsory. Can you please help me with this?
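A minimal sketch that tests only the start of the pattern (the part up to the date) suggests the `(?:null)?` group itself is optional. The `\s+` after it requires at least one whitespace character, so a line that begins directly with the date is rejected. The names `strict` and `relaxed` are hypothetical, and both patterns are simplified fragments rather than the full regexes.

```python
import re

# Prefix of the question's regex: optional "null", then ONE OR MORE whitespace characters.
strict = re.compile(r"^(?:null)?\s+\d{2}/\d{2}/\d{4}")
# Same prefix with \s* (zero or more) on both sides of "null", as in Solution 1.
relaxed = re.compile(r"^\s*(?:null)?\s*\d{2}/\d{2}/\d{4}")

samples = ["null 12/05/2016", " 12/05/2016", "13/05/2016"]
strict_hits = [bool(strict.match(s)) for s in samples]
relaxed_hits = [bool(relaxed.match(s)) for s in samples]
print(strict_hits)   # [True, True, False]
print(relaxed_hits)  # [True, True, True]
```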
Solution 1:[1]
You may use this regex with optional groups:
^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$
RegEx Details:
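- `^\s*(?:null)?\s*`: Match an optional `null` with 0 or more whitespace characters on both sides
- `(\d{2}/\d{2}/\d{4})`: Match the date string in capture group #1
- `\s+`: Match 1+ whitespace characters
- `(.*?)`: Match 0 or more characters (lazily) in capture group #2
- `\s+`: Match 1+ whitespace characters
- `(\d[\d,]*`: Start capture group #3; match a digit followed by 0 or more digit/comma characters
- `(?:\.\d{2})?`: Match an optional dot followed by two digits
- `(\s+Cr)?)`: Match optional 1+ whitespace characters followed by `Cr` (capture group #4), then close group #3
- `$`: End of line (with the `re.M` flag used in the question)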
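One way to check this pattern is to compile it with `re.M` (the flag the question already uses) and run it over the sample lines, including the header line, which has no date and is simply skipped. This is a sketch; the variable names are illustrative. Note that on the second line the lazy `(.*?)` ends up capturing `Nerolac GEB 5.86`, because `$` forces capture group #3 to be the last amount on the line.

```python
import re

# Solution 1's pattern, compiled with re.M so ^ and $ apply to each line.
PATTERN = re.compile(
    r"^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$",
    re.M,
)

# Sample input from the question (the header line has no date, so it never matches).
text = """Date Description Amount
null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""

# Keep groups 1-3 (date, description, amount); group 4 is only the nested " Cr".
rows = [m.group(1, 2, 3) for m in PATTERN.finditer(text)]
for row in rows:
    print(row)
```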
Solution 2:[2]
You may apply a regex pattern in multiline mode that makes the first, fifth, and sixth values optional in each line.
import re
inp = """ null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""
lines = re.findall(r'^\s*(null)?\s*(\d{1,2}/\d{1,2}/\d{4}) (\w+(?: \w+)*) (\d{1,3}(?:,\d{3})*(?:\.\d+)?)?(?: (\d{1,3}(?:,\d{3})*(?:\.\d+)?))?(?: (\w+))?', inp, flags=re.M)
print(lines)
This prints (line breaks added for readability):
[('null', '12/05/2016', 'Asian Paints', '2,150.65', '', ''),
('', '13/05/2016', 'Nerolac GEB', '5.86', '22,512.65', 'Cr'),
('', '14/05/2016', 'Hydra', '12,412', '', '')]
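To see which groups are optional, the same pattern can be restated with `re.VERBOSE` so each capture group carries a comment. This is a presentation choice, not part of the original answer; in verbose mode unescaped spaces are ignored, so the literal single spaces are written as `[ ]`. The name `VERBOSE_PATTERN` is hypothetical.

```python
import re

# Solution 2's pattern, rewritten with re.VERBOSE; "[ ]" stands for one literal space.
VERBOSE_PATTERN = re.compile(
    r"""
    ^\s*(null)?\s*                          # group 1: optional leading "null"
    (\d{1,2}/\d{1,2}/\d{4})                 # group 2: date
    [ ](\w+(?:[ ]\w+)*)                     # group 3: description (space-separated words)
    [ ](\d{1,3}(?:,\d{3})*(?:\.\d+)?)?      # group 4: first amount
    (?:[ ](\d{1,3}(?:,\d{3})*(?:\.\d+)?))?  # group 5: optional second amount
    (?:[ ](\w+))?                           # group 6: optional trailing word such as "Cr"
    """,
    re.M | re.VERBOSE,
)

# Sample input from Solution 2.
inp = """ null 12/05/2016 Asian Paints 2,150.65
13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
14/05/2016 Hydra 12,412"""

lines = VERBOSE_PATTERN.findall(inp)
print(lines)
```

Missing optional groups come back as empty strings in the tuples returned by `findall`, which is why the first, fifth, and sixth values are `''` on some lines.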
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | anubhava |
| Solution 2 | Tim Biegeleisen |
