Check VAT number for syntactical correctness with Regex possible?

I am trying to find a way to validate European VAT IDs. They vary in length, sometimes have checksums, and so on. Normally I use regex to validate simple strings, but this looks rather complex to me.

Wikipedia has a list of the different syntaxes:

So before starting, wasting a lot of time, and failing in the end, I would like to know from someone who uses regex more often than I do whether it is possible to pre-validate these numbers. If you think VAT ID syntax validation is not possible with regex, please give me a comprehensive example of why not.

Thank you in advance.

Notes: Of course I know about the XML-RPC validation service of the German Ministry of Finance (https://evatr.bff-online.de/eVatR/xmlrpc/), but it sometimes takes several minutes to answer a request. They also suspend this XML-RPC validation service from 23:00 to 05:00 Berlin time. That's why I would like a 2-step validation: a first step for the syntax, and a second step (triggered by cron) using this XML-RPC service.



Solution 1:[1]

A regex to validate the VAT numbers of the 27 EU countries is provided in the Regular Expressions Cookbook, 2nd edition, section 4.21, "European VAT Numbers".

This regex does not compute any check digits, but it can still check whether a standalone string is likely to be an EU VAT number.

Before validation, you should remove separator characters ([-. ], that is hyphens, dots, and spaces) or, more broadly, every [^A-Z0-9] symbol. Then, use

(?xi)^(
(AT)?U[0-9]{8} |                              # Austria
(BE)?0[0-9]{9} |                              # Belgium
(BG)?[0-9]{9,10} |                            # Bulgaria
(HR)?[0-9]{11} |                              # Croatia
(CY)?[0-9]{8}[A-Z] |                          # Cyprus
(CZ)?[0-9]{8,10} |                            # Czech Republic
(DE)?[0-9]{9} |                               # Germany
(DK)?[0-9]{8} |                               # Denmark
(EE)?[0-9]{9} |                               # Estonia
(EL)?[0-9]{9} |                               # Greece
ES[A-Z][0-9]{7}(?:[0-9]|[A-Z]) |              # Spain
(FI)?[0-9]{8} |                               # Finland
(FR)?[0-9A-Z]{2}[0-9]{9} |                    # France
(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) | # United Kingdom
(HU)?[0-9]{8} |                               # Hungary
(IE)?[0-9]{7}[A-Z]{1,2}   |                   # Ireland
(IE)?[0-9][A-Z][0-9]{5}[A-Z] |                # Ireland (2)
(IT)?[0-9]{11} |                              # Italy
(LT)?([0-9]{9}|[0-9]{12}) |                   # Lithuania
(LU)?[0-9]{8} |                               # Luxembourg
(LV)?[0-9]{11} |                              # Latvia
(MT)?[0-9]{8} |                               # Malta
(NL)?[0-9]{9}B[0-9]{2} |                      # Netherlands
(PL)?[0-9]{10} |                              # Poland
(PT)?[0-9]{9} |                               # Portugal
(RO)?[0-9]{2,10} |                            # Romania
(SE)?[0-9]{12} |                              # Sweden
(SI)?[0-9]{8} |                               # Slovenia
(SK)?[0-9]{10}                                # Slovakia
)$

See the regex demo

I have added a Croatian VAT alternative here.

Note that if you expect the country codes to be present, remove the ? quantifiers after the closing round brackets of the country-code groups.
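A minimal sketch of how the strip-then-match step might look in Python. The function name looks_like_eu_vat and the shortened pattern are illustrative assumptions: only the Austrian, German, and Dutch alternatives are copied from the pattern above, and the remaining countries would be added in the same way.

```python
import re

# Abbreviated copy of the pattern above (assumption: three countries only,
# to keep the example short). re.VERBOSE allows the comments and spacing,
# re.IGNORECASE mirrors the (?xi) flags of the original.
VAT_SYNTAX = re.compile(
    r"""^(
    (AT)?U[0-9]{8} |            # Austria
    (DE)?[0-9]{9} |             # Germany
    (NL)?[0-9]{9}B[0-9]{2}      # Netherlands
    )$""",
    re.VERBOSE | re.IGNORECASE,
)


def looks_like_eu_vat(raw: str) -> bool:
    """Strip everything except A-Z and 0-9, then test the syntax only.

    No check digit is computed here; a match only means the string
    has a plausible shape.
    """
    cleaned = re.sub(r"[^A-Z0-9]", "", raw.upper())
    return VAT_SYNTAX.match(cleaned) is not None
```

Because the country-code groups are optional here, a bare 9-digit string is also accepted as a German number; dropping the ? quantifiers makes the prefix mandatory.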

Whenever new countries join the European Union, or member countries change their rules for VAT numbers, the regex needs an update.

Note that the regex in the cookbook does not correspond to the Wiki's Irish VAT number definition.

Also, it is not possible to fully validate this with the regex because some VAT numbers require specific data that is either hard to retrieve or should be computed using regular programming language means:

  • French first 2 digits are a "key", calculated as follows: Key = [12 + 3 * (SIREN modulo 97)] modulo 97. For example: Key = [12 + 3 * (404,833,048 modulo 97)] modulo 97 = [12 + 3 * 56] modulo 97 = 180 modulo 97 = 83, so the tax number for SIREN 404,833,048 is FR 83 404 833 048 (source: www.insee.fr).
  • Finnish VAT last digit is a check digit utilizing MOD 11-2
  • Italian VAT has a province 3-symbol code (indices 8, 9, 10)
  • Slovakian VAT number must be divisible by 11
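Two of the rules listed above are simple enough to sketch in Python. This is only an illustration of the formulas as quoted in the list: the function names are made up, and the Slovak function checks nothing beyond length and divisibility by 11.

```python
import re


def french_vat_key(siren: int) -> int:
    """Key = [12 + 3 * (SIREN mod 97)] mod 97, as quoted in the list above."""
    return (12 + 3 * (siren % 97)) % 97


def french_vat_number(siren: int) -> str:
    """Build 'FR' + 2-digit key + 9-digit SIREN (zero-padded)."""
    return f"FR{french_vat_key(siren):02d}{siren:09d}"


def slovak_vat_divisible(vat: str) -> bool:
    """True if the 10-digit part (optional 'SK' prefix) is divisible by 11."""
    digits = vat.upper()
    if digits.startswith("SK"):
        digits = digits[2:]
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False
    return int(digits) % 11 == 0
```

For the SIREN from the worked example, french_vat_key(404833048) returns 83, so french_vat_number(404833048) gives FR83404833048.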

Solution 2:[2]

My answer, based on Wikipedia and Wiktor Stribiżew's answer:

^(ATU[0-9]{8}|BE[01][0-9]{9}|BG[0-9]{9,10}|HR[0-9]{11}|CY[A-Z0-9]{9}|CZ[0-9]{8,10}|DK[0-9]{8}|EE[0-9]{9}|FI[0-9]{8}|FR[0-9A-Z]{2}[0-9]{9}|DE[0-9]{9}|EL[0-9]{9}|HU[0-9]{8}|IE([0-9]{7}[A-Z]{1,2}|[0-9][A-Z][0-9]{5}[A-Z])|IT[0-9]{11}|LV[0-9]{11}|LT([0-9]{9}|[0-9]{12})|LU[0-9]{8}|MT[0-9]{8}|NL[0-9]{9}B[0-9]{2}|PL[0-9]{10}|PT[0-9]{9}|RO[0-9]{2,10}|SK[0-9]{10}|SI[0-9]{8}|ES[A-Z]([0-9]{8}|[0-9]{7}[A-Z])|SE[0-9]{12}|GB([0-9]{9}|[0-9]{12}|GD[0-4][0-9]{2}|HA[5-9][0-9]{2}))$

I found that some Irish VAT IDs did not work with the answer mentioned above. This version is not 100% bulletproof (especially for GB government departments), but it should do the job.

Solution 3:[3]

The computations involved in the check digits (modulo, multiplication, addition) cannot be represented as a practicable RegExp, because a regular expression has no arithmetic.

Since the numbers are finite in length, it is theoretically possible to create a RegExp that matches all correct numbers, but this is obviously not practical.

For details on the actual computation, see http://www.pruefziffernberechnung.de/U/USt-IdNr.shtml (German)
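As an illustration of why such a check is easier in ordinary code than in a regex, here is a short Python sketch for the German number. It assumes the ISO 7064 MOD 11,10 scheme that is commonly described for the German USt-IdNr; that assumption, and the function name, are not taken from this answer, so verify against the linked page before relying on it.

```python
def german_vat_check_digit_ok(vat: str) -> bool:
    """Check the 9th digit of a German VAT ID (optional 'DE' prefix).

    Assumption: ISO 7064 MOD 11,10 over the first eight digits.
    """
    digits = vat.upper()
    if digits.startswith("DE"):
        digits = digits[2:]
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False

    product = 10
    for ch in digits[:8]:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (2 * total) % 11

    # product is never 0 here, so 11 - product is 1..10; 10 maps to 0.
    check = (11 - product) % 10
    return check == int(digits[8])
```

A loop with two running values like this is trivial in a programming language, whereas a regex would have to spell out every valid digit combination.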

Solution 4:[4]

The Cyprus alternative has been changed to:

(CY)?[0-9]{8}[A-Z]

It's still wrong on the VIES check site.

Solution 5:[5]

I recently did something with this. I kept a list of countries, identified by their 2-character ISO code. Each country has a regex field; if one is given, the validator uses it to check whether the input string at least matches that regex. If it does not, that is an error.

After that, I optionally had additional checks for specific countries. Whether they ran or not was configured on the backend side. There is no 'general' way to do this.

Each country also had a flag marking it as EU or not, to know whether other checks were required.
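A rough Python sketch of that structure, not the original implementation. The names CountryRule, RULES, and validate are invented for illustration, the two patterns are borrowed from Solution 2, and the "CH" entry is a hypothetical example of a non-EU country with no regex configured.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CountryRule:
    pattern: Optional[str] = None   # syntax regex, if one is known
    is_eu: bool = False             # flag: do EU-specific checks apply?
    extra_check: Optional[Callable[[str], bool]] = None  # optional per-country check


def slovak_divisible_by_11(number: str) -> bool:
    # Assumes the number already matched SK[0-9]{10}.
    return int(number[2:]) % 11 == 0


RULES = {
    "DE": CountryRule(r"DE[0-9]{9}", is_eu=True),
    "SK": CountryRule(r"SK[0-9]{10}", is_eu=True, extra_check=slovak_divisible_by_11),
    "CH": CountryRule(),  # hypothetical: non-EU, nothing configured
}


def validate(country: str, number: str, run_extra: bool = True) -> bool:
    """Regex first; the optional extra check only runs when enabled."""
    rule = RULES.get(country)
    if rule is None:
        return False  # unknown country code
    if rule.pattern and not re.fullmatch(rule.pattern, number):
        return False
    if run_extra and rule.extra_check and not rule.extra_check(number):
        return False
    return True
```

The run_extra switch stands in for the backend configuration mentioned above; the is_eu flag is stored but not acted on in this sketch.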

I also used this link: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s21.html along with Wikipedia's list to get a full list of ISO codes. I also used this as a reference for testing VAT numbers: https://www.braemoor.co.uk/software/vattestx.php

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 marverix
Solution 3 mgaert
Solution 4 double-beep
Solution 5 Stefan Hendriks