Regular expression over EDI File
I have the following EDI file and need to match from each LOC+11 segment, but not from LOC+7, together with all the segments that follow it up to the next LOC+11. The LOC segments repeat, but the segments between them do not.
At the moment my regex looks like LOC[^L]*(?:L(?!OC)[^L]*)* but with that I get 4 results because it also matches the LOC+7 elements.
I only need 2 results. Could you help me?
> NAD+ST+14::92++Test' LOC+11+KOD23277::92' LOC+7+D77::92:Test' LIN+1++
> test AP:IN'IMD+F++12::272:K
> RIPPsadasdRIEM'RFF+ON:EN10514492'RFF+AAN:501'
> DTM+171:20220309:102'RFF+AIF:500'DTM+171:20220305:102'CTA+SC+12414:test,
> test'[email protected]:EM'
> COM+?+49-561-490-4173:TE'COM+?+49-561-490-84173:FX' QTY+83:1000:PCE'
> QTY+70:66850:PCE'DTM+51:20080101:102'
> QTY+72:0:PCE'DTM+52:20080101:102'
> QTY+194:1000:PCE'DTM+50:20220224:102'
> RFF+AAU:2143276'DTM+171:20220218:102'
> QTY+194:1000:PCE'DTM+50:20220202:102'
> RFF+AAU:2138944'DTM+171:20220131:102'
> QTY+194:1000:PCE'DTM+50:20220105:102'
> RFF+AAU:2138943'DTM+171:20220103:102' SCC+24'
> QTY+113:1000:PCE'DTM+2:20220412:102'
> QTY+113:1000:PCE'DTM+2:20220503:102'
> QTY+113:1000:PCE'DTM+64:20220530:102'DTM+63:20220605:102'
> QTY+113:1000:PCE'DTM+64:20220620:102'DTM+63:20220626:102'
> QTY+113:1000:PCE'DTM+64:20220711:102'DTM+63:20220717:102'
> QTY+113:1000:PCE'DTM+64:20220801:102'DTM+63:20220807:102' GEI+3+37'
>
> NAD+ST+14::92++test' LOC+11+KOD823226::92' LOC+7+D86::92:Test' LIN+2++
> test H:IN'IMD+F++12::272:K
> RIPPRIEM'RFF+ON:EN10662318'RFF+AAN:266'DTM+171:20220309:102'
> RFF+AIF:265'DTM+171:20220305:102'CTA+SC+12414:test,
> test'[email protected]:EM'
> COM+?+49-561-490-4173:TE'COM+?+49-561-490-84173:FX' QTY+83:200:PCE'
> QTY+70:14319:PCE'DTM+51:20100101:102'
> QTY+72:0:PCE'DTM+52:20100101:102' QTY+194:200:PCE'DTM+50:20220126:102'
> RFF+AAU:2146871'DTM+171:20220121:102'
> QTY+194:200:PCE'DTM+50:20211210:102'RFF+AAU:2146914'DTM+171:20211209:102' QTY+194:200:PCE'DTM+50:20211129:102'RFF+AAU:2139927'DTM+171:20211124:102'SCC+24'
> QTY+113:200:PCE'DTM+2:20220503:102'
> QTY+113:200:PCE'DTM+64:20220606:102'DTM+63:20220612:102'
> QTY+113:200:PCE'DTM+64:20220718:102'DTM+63:20220724:102'
> QTY+113:200:PCE'DTM+64:20220829:102'DTM+63:20220904:102'
> QTY+113:200:PCE'DTM+64:20221010:102'DTM+63:20221016:102'
>
> UNT+142+1'UNZ+1+2756'
Solution 1:[1]
You can use either of the following patterns:
LOC\+11[^L]*(?:L(?!OC\+11)[^L]*)*
LOC\+11[\w\W]*?(?=LOC\+11|$)
Details:
- LOC\+11 - the LOC+11 string
- [^L]*(?:L(?!OC\+11)[^L]*)* - any text up to the first occurrence of the LOC+11 substring (uses the unroll-the-loop principle).
Although the two patterns above return identical results, the first one is much faster, provided there are not too many Ls that are not followed by OC+11.
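As a rough check, the patterns can be compared in Python. This is only a sketch: the `sample` string is a shortened, made-up excerpt in the same shape as the data in the question, and the variable names are illustrative rather than part of the original answer.

```python
import re

# Shortened sample modelled on the EDI data in the question (an assumption,
# not the full file): two groups, each with a LOC+11 and a LOC+7 segment.
sample = (
    "NAD+ST+14::92++Test'LOC+11+KOD23277::92'LOC+7+D77::92:Test'"
    "LIN+1++test AP:IN'QTY+83:1000:PCE'"
    "NAD+ST+14::92++test'LOC+11+KOD823226::92'LOC+7+D86::92:Test'"
    "LIN+2++test H:IN'UNT+142+1'UNZ+1+2756'"
)

# Pattern from the question: starts a new match at every LOC segment.
original = re.compile(r"LOC[^L]*(?:L(?!OC)[^L]*)*")
# Unrolled pattern from the answer: only LOC+11 starts or ends a match.
unrolled = re.compile(r"LOC\+11[^L]*(?:L(?!OC\+11)[^L]*)*")
# Lazy pattern from the answer: expand until the next LOC+11 or the end.
lazy = re.compile(r"LOC\+11[\w\W]*?(?=LOC\+11|$)")

original_matches = original.findall(sample)
unrolled_matches = unrolled.findall(sample)
lazy_matches = lazy.findall(sample)

print(len(original_matches), len(unrolled_matches), len(lazy_matches))
for block in unrolled_matches:
    print(block)
```

On this sample the pattern from the question splits at LOC+7 as well, while both patterns from the answer keep each LOC+7 segment inside the block that starts at the preceding LOC+11.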
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
