'PL SQL String Manipulation (Deconcatenate based on multiple delimiters)

I have string such as A~B~C,D~E,F~G~H,I. The string is comma separated and then each comma separated substring is ~ separated. There is no limit on the number of commas or ~ in a string. Objective is to get first value after each comma.

I am calling coalesce function to fetch number of commas.

select coalesce(length('A~B~C,D~E,F~G~H,I') - length(replace('A~B~C,D~E,F~G~H,I',',',null)), 
                length('A~B~C,D~E,F~G~H,I'),
                0) as output1 
  from dual;

Now, I want to run the loop for i = (output from coalesce query above) and substring the string based on comma and ~. For example,

1st loop iteration - output = A
2nd loop iteration - output = D
3rd loop iteration - output = F
4th loop iteration - output = I
Loop Stops

I wrote this query which works fine except for the boundary cases i.e. fails in first iteration. Here i is the ith iteration of the loop

select substr('A~B~C,D~E,F~G~H,I',
       instr('A~B~C,D~E,F~G~H,I',',',1,i) + 1, 
       instr('A~B~C,D~E,F~G~H,I','~',1,1)-1) as output1 
  from dual;

Any suggestions? Ideas?



Solution 1:[1]

The natural way to solve this problem is using Regular Expressions.

In the query below we apply expression '(^|,)[A-Z]+' to match a letter following after the beginning of string ^ or comma ,. So REGEXP_COUNT lets limit the query results by exact count of matches in the string, and REGEXP_SUBSTR extracts exactly one match for each LEVEL from 1 to "matches count".

SELECT REGEXP_SUBSTR(T.str, '(^|,)[A-Z]+', 1, LEVEL) AS substr
  FROM (
    SELECT 'A~B~C,D~E,F~G~H,I' AS str FROM DUAL
  ) T
CONNECT BY LEVEL <= REGEXP_COUNT(T.str, '(^|,)[A-Z]+');

PS. As we see, resulting matches contain unneded commas. Not to overcomplicate the example the REPLACE(result, ',') is omitted.

Solution 2:[2]

SELECT regexp_substr(
'UNB+UNOA,WT_Syntax_version_no+1:ZZ+2:ZZ+WT_Edi_sent_date:WT_Edi_sent_time+WT_Interchange_control_ref'
, '[^+|:|,]+' 
, 1
, LEVEL
) seq_no 
FROM dual
CONNECT BY regexp_substr('UNB+UNOA,WT_Syntax_version_no+1:ZZ+2:ZZ+WT_Edi_sent_date:WT_Edi_sent_time+WT_Interchange_control_ref','[^+|:]+', 1, LEVEL)
            IS NOT NULL;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 diziaq
Solution 2 Gnqz