Python RegEx for all National Drug Codes (NDC 10 & 11) formats

Goal: a RegEx that fits the many possible NDC 10 & 11 formats.

I've made a start with a pattern for the 10-digit 4-4-2 format:

^[0-9][0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]\-[0-9][0-9]$

e.g. 1234-1234-12
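
For reference, the same 4-4-2 pattern can be written more compactly with quantifiers. This is only a sketch of an equivalent form. The name NDC_442 is made up for illustration, and the hyphen needs no backslash outside a character class.

```python
import re

# The question's pattern with quantifiers: [0-9]{4} means exactly four digits.
# NDC_442 is a hypothetical name used only for this illustration.
NDC_442 = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{2}$")

# A 4-4-2 code matches; a 5-3-2 code does not, which is the gap the question asks about.
print(bool(NDC_442.match("1234-1234-12")))
print(bool(NDC_442.match("12345-123-12")))
```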


However, I've since learnt that there are other formats, including one with 11 digits:

  • 4-4-2
  • 5-3-2
  • 5-4-1
  • 5-4-2 (11 digits)

How can I write one RegEx for all these possibilities?

Issues:

  1. Optional 11th digit,
  2. Moving hyphen


Solution 1:[1]

You can use

^(?:\d{4}-\d{4}-\d{2}|\d{5}-(?:\d{3}-\d{2}|\d{4}-\d{1,2}))$

Details:

  • ^ - start of string
  • (?: - start of the first non-capturing group:
    • \d{4}-\d{4}-\d{2} - four digits, -, four digits, -, two digits
    • | - or
    • \d{5}- - five digits, -
    • (?: - start of the second non-capturing group:
      • \d{3}-\d{2} - three digits, -, two digits
      • | - or
      • \d{4}-\d{1,2} - four digits, - and one or two digits
    • ) - end of the second non-capturing group
  • ) - end of the first non-capturing group.
  • $ - end of string.
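
A minimal sketch of how the pattern above might be used from Python. The names NDC_PATTERN and is_ndc are hypothetical, and the sample strings are made-up values chosen to exercise each branch, not real drug codes.

```python
import re

# Pattern from the answer, compiled once for reuse.
# NDC_PATTERN and is_ndc are illustrative names, not part of any library.
NDC_PATTERN = re.compile(
    r"^(?:\d{4}-\d{4}-\d{2}|\d{5}-(?:\d{3}-\d{2}|\d{4}-\d{1,2}))$"
)


def is_ndc(code: str) -> bool:
    """Return True if code has a 4-4-2, 5-3-2, 5-4-1 or 5-4-2 layout."""
    # fullmatch requires the whole string to match, so a trailing
    # newline is rejected even though "$" alone would tolerate it.
    return NDC_PATTERN.fullmatch(code) is not None


# Made-up samples: the first four follow the listed layouts, the rest do not.
samples = [
    "1234-1234-12",    # 4-4-2
    "12345-123-12",    # 5-3-2
    "12345-1234-1",    # 5-4-1
    "12345-1234-12",   # 5-4-2 (11 digits)
    "1234-123-12",     # 4-3-2, not a listed layout
    "12345-1234-123",  # last segment too long
    "123456789012",    # no hyphens
]

for sample in samples:
    print(sample, is_ndc(sample))
```

Two behaviours of Python's re module are worth keeping in mind here. With re.match, "$" also matches just before a trailing newline, which is why the sketch uses fullmatch. In Python 3, \d on a str pattern matches any Unicode decimal digit; passing re.ASCII to re.compile, or writing [0-9], restricts it to ASCII digits if that matters for the input.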

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew