'Extract numbers from a field in PostgreSQL

I have a table with a column po_number of type varchar in Postgres 8.4. It stores alphanumeric values with some special characters. I want to ignore the characters [/alpha/?/$/encoding/.] and check if the column contains a number or not. If its a number then it needs to typecast as number or else pass null, as my output field po_number_new is a number field.

Below is the example:

example

SQL Fiddle.

I tired this statement:

select 
(case when  regexp_replace(po_number,'[^\w],.-+\?/','') then po_number::numeric
else null
end) as po_number_new from test

But I got an error for explicit cast:

error



Solution 1:[1]

I think you want something like this:

select (case when regexp_replace(po_number, '[^\w],.-+\?/', '') ~ '^[0-9]+$'
             then regexp_replace(po_number, '[^\w],.-+\?/', '')::numeric
        end) as po_number_new 
from test;

That is, you need to do the conversion on the string after replacement.

Note: This assumes that the "number" is just a string of digits.

Solution 2:[2]

The logic I would use to determine if the po_number field contains numeric digits is that its length should decrease when attempting to remove numeric digits.

If so, then all non numeric digits ([^\d]) should be removed from the po_number column. Otherwise, NULL should be returned.

select case when char_length(regexp_replace(po_number, '\d', '', 'g')) < char_length(po_number)
            then regexp_replace(po_number, '[^0-9]', '', 'g')
            else null
       end as po_number_new
from test

Solution 3:[3]

If you want to extract floating numbers try to use this:

SELECT NULLIF(regexp_replace(po_number, '[^\.\d]','','g'), '')::numeric AS result FROM tbl;

It's the same as Erwin Brandstetter answer but with different expression:

[^...] - match any character except a list of excluded characters, put the excluded charaters instead of ...

\. - point character (also you can change it to , char)

\d - digit character

Solution 4:[4]

Since version 12 - that's 2 years + 4 months ago at the time of writing (but after the last edit that I can see on the accepted answer), you could use a GENERATED FIELD to do this quite easily on a one-time basis rather than having to calculate it each time you wish to SELECT a new po_number.

Furthermore, you can use the TRANSLATE function to extract your digits which is less expensive than the REGEXP_REPLACE solution proposed by @ErwinBrandstetter!

I would do this as follows (all of the code below is available on the fiddle here):

CREATE TABLE s
(
  num TEXT,
  
  new_num INTEGER GENERATED ALWAYS AS
    (NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER) STORED
);

You can add to the 'ABCDEFG... string in the TRANSLATE function as appropriate - I have decimal point (.) and a space ( ) at the end - you may wish to have more characters there depending on your input!

And checking:

INSERT INTO s VALUES ('2'), (''), (NULL), (' ');
INSERT INTO t VALUES ('2'), (''), (NULL), (' ');
SELECT * FROM s;
SELECT * FROM t;

Result (same for both):

num    new_num
  2          2
          NULL
          NULL
          NULL

So, I wanted to check how efficient my solution was, so I ran the following test inserting 10,000 records into both tables s and t as follows (from here):

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO t 
with symbols(characters) as 
(
  VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
)
select string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '')
from symbols
join generate_series(1,10) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;

The differences weren't that huge but the regex solution was consistently slower by about 25% - even changing the order of the tables undergoing the INSERTs.

However, where the TRANSLATE solution really shines is when doing a "raw" SELECT as follows:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
  NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER
FROM s;

and the same for the REGEXP_REPLACE solution.

The differences were very marked, the TRANSLATE taking approx. 25% of the time of the other function. Finally, in the interests of fairness, I also did this for both tables:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
  num, new_num
FROM t;

Both extremely quick and identical!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gordon Linoff
Solution 2
Solution 3 doninpr
Solution 4 Vérace - ????? ???????