regex: Match an optional null or space at the start of a line

I want to treat a leading null or whitespace at the start of each line as optional. The lines are as follows:

 Date       Description  Amount
 
 null 12/05/2016 Asian Paints 2,150.65

   13/05/2016 Nerolac GEB 5.86 22,512.65 Cr

 14/05/2016 Hydra 12,412

The regex that I used is:

regex_null = re.compile(r"^(?:null)?\s+(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*\.\d{2}\s+(?:Cr)?)$", re.M)

And what I got is:

 null 12/05/2016 Asian Paints 2,150.65

     13/05/2016 Nerolac GEB 5.86 22,512.65 Cr

So the null does not seem to be optional; it is currently treated as compulsory. Can you please help me with this?



Solution 1:[1]

You may use this regex with optional groups:

^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$

RegEx Details:

  • ^\s*(?:null)?\s*: Match an optional null with 0 or more whitespace characters on both sides
  • (\d{2}/\d{2}/\d{4}): Match date string in capture group #1
  • \s+: Match 1+ whitespaces
  • (.*?): Match 0 or more characters (as few as possible) in capture group #2
  • \s+: Match 1+ whitespaces
  • (\d[\d,]*: Match a digit followed by 0 or more digit/comma characters
  • (?:\.\d{2})?: Match an optional dot followed by two digits
  • (\s+Cr)?): Match optional 1+ whitespaces followed by Cr
  • $: End of the line (in multiline mode)
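
A minimal sketch of how this pattern might be applied in Python, assuming the three sample lines from the question are held in one multiline string and the pattern is compiled with re.M as in the question. The names text, pattern, rows, original and rejected are illustrative and not part of the original answer. The last lines also try the pattern from the question on the third sample line; it finds nothing there because that line has no decimal part for the mandatory \.\d{2} to match.

```python
import re

# Sample data lines from the question (header row omitted).
text = """null 12/05/2016 Asian Paints 2,150.65
   13/05/2016 Nerolac GEB 5.86 22,512.65 Cr
 14/05/2016 Hydra 12,412"""

# Pattern from Solution 1, compiled in multiline mode so ^ and $ work per line.
pattern = re.compile(
    r"^\s*(?:null)?\s*(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*(?:\.\d{2})?(\s+Cr)?)$",
    re.M,
)

# Keep groups 1-3 (date, description, amount); group 4 is the inner " Cr" part.
rows = [(m.group(1), m.group(2), m.group(3)) for m in pattern.finditer(text)]
for row in rows:
    print(row)

# The pattern from the question, for comparison: the decimal part is mandatory.
original = re.compile(
    r"^(?:null)?\s+(\d{2}/\d{2}/\d{4})\s+(.*?)\s+(\d[\d,]*\.\d{2}\s+(?:Cr)?)$",
    re.M,
)
rejected = original.search(" 14/05/2016 Hydra 12,412")
print(rejected)  # None
```

Because (.*?) is lazy and the amount must end the line, the second row keeps 5.86 inside the description and captures 22,512.65 Cr as the amount.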

Solution 2:[2]

You may apply a regex pattern in multiline mode in which the first, fourth, fifth, and sixth capture groups are optional.

import re

inp = """ null 12/05/2016 Asian Paints 2,150.65

   13/05/2016 Nerolac GEB 5.86 22,512.65 Cr

 14/05/2016 Hydra 12,412"""

lines = re.findall(r'^\s*(null)?\s*(\d{1,2}/\d{1,2}/\d{4}) (\w+(?: \w+)*) (\d{1,3}(?:,\d{3})*(?:\.\d+)?)?(?: (\d{1,3}(?:,\d{3})*(?:\.\d+)?))?(?: (\w+))?', inp, flags=re.M)
print(lines)

This prints:

[('null', '12/05/2016', 'Asian Paints', '2,150.65', '', ''),
 ('', '13/05/2016', 'Nerolac GEB', '5.86', '22,512.65', 'Cr'),
 ('', '14/05/2016', 'Hydra', '12,412', '', '')]
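
The tuples above could be turned into labelled records for easier downstream use. The following is only a sketch: the field names in FIELDS and the helper to_float are hypothetical, since the source does not say what each numeric column means.

```python
import re

inp = """ null 12/05/2016 Asian Paints 2,150.65

   13/05/2016 Nerolac GEB 5.86 22,512.65 Cr

 14/05/2016 Hydra 12,412"""

# Same pattern as in Solution 2.
pattern = (
    r'^\s*(null)?\s*(\d{1,2}/\d{1,2}/\d{4}) (\w+(?: \w+)*) '
    r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)?'
    r'(?: (\d{1,3}(?:,\d{3})*(?:\.\d+)?))?(?: (\w+))?'
)

# Hypothetical field names, one per capture group (assumption, not from the source).
FIELDS = ("null_marker", "date", "description",
          "first_number", "second_number", "suffix")

# re.findall returns '' for optional groups that did not participate.
records = [dict(zip(FIELDS, row)) for row in re.findall(pattern, inp, flags=re.M)]


def to_float(value):
    """Convert a string like '22,512.65' to a float; return None for ''."""
    return float(value.replace(",", "")) if value else None


for rec in records:
    print(rec["date"], rec["description"],
          to_float(rec["first_number"]), to_float(rec["second_number"]),
          rec["suffix"] or None)
```

Mapping empty strings to None makes it explicit which optional groups were absent on a given line.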

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  anubhava
Solution 2  Tim Biegeleisen