Count the number of different sections of text that match a regular expression
I am working in PostgreSQL. I have a table with an int column "DOC_ID" and a text column "CONTENT". Each row of the table contains information about one document.
The text contains sections that match the regular expression
'#[A-Za-z]+:[A-Za-z]+#'. These sections differ from one another.
For example: #Lot : version#, #Participant : name#, and others.
For each match of the regular expression, I want a row with the matched text and the ID of the document in which it occurs. The same tag can occur several times in one document, and each occurrence should be listed.
Example: My table:
| DOC_ID | CONTENT |
|---|---|
| 1 | '#Participant : name# cat bear dog #Participant : name# mouse' |
| 2 | 'cat #Participant : name# bear' |
| 3 | 'cat #Lot : version# dog' |
| 10 | 'mouse #Lot : version# cow' |
Result table:
| TAG | DOC_ID |
|---|---|
| #Participant : name# | 1 |
| #Participant : name# | 1 |
| #Participant : name# | 2 |
| #Lot : version# | 3 |
| #Lot : version# | 10 |
Please tell me how I can do this.
Solution 1:[1]
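Use the function regexp_matches() with the 'g' flag, which returns one row for every match instead of only the first one. The character classes below are wider than in the question so that tags containing spaces, digits, and some punctuation also match:
select
    (regexp_matches(content, '#[A-Za-z0-9 ?%--,. ]+:[A-Za-z0-9 ?%--,. ]+#', 'g'))[1] as tag,
    doc_id
from my_table;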
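To try the query locally, the question's sample data can be loaded first. The following is a minimal sketch: the table name my_table comes from the answer, while the lowercase column names and types are assumptions based on the question's description.

```sql
-- Assumed setup: column names and types follow the question's description.
create table my_table (doc_id int, content text);

insert into my_table (doc_id, content) values
    (1,  '#Participant : name# cat bear dog #Participant : name# mouse'),
    (2,  'cat #Participant : name# bear'),
    (3,  'cat #Lot : version# dog'),
    (10, 'mouse #Lot : version# cow');

-- With the 'g' flag, regexp_matches() returns one row per match.
-- Each row is a text array; the pattern has no capturing groups,
-- so element [1] holds the whole matched substring.
select
    (regexp_matches(content, '#[A-Za-z0-9 ?%--,. ]+:[A-Za-z0-9 ?%--,. ]+#', 'g'))[1] as tag,
    doc_id
from my_table
order by doc_id;
```

On this data the query should produce the five rows of the result table in the question, with document 1 listed twice because its tag occurs twice.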
Read about the function and pattern matching in the documentation.
Test the query in db<>fiddle.
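The title asks for a count, so the per-match rows can also be aggregated. This is a sketch rather than part of the original answer: it assumes the same my_table, and the aliases t, r, and parts as well as the output column names are made up for illustration.

```sql
-- Put the set-returning function in the FROM clause via LATERAL,
-- so every match becomes a row that can be grouped.
select
    r.parts[1]               as tag,
    count(*)                 as occurrences,  -- every match, repeats included
    count(distinct t.doc_id) as documents     -- documents containing the tag
from my_table as t
cross join lateral regexp_matches(
    t.content,
    '#[A-Za-z0-9 ?%--,. ]+:[A-Za-z0-9 ?%--,. ]+#',
    'g'
) as r(parts)
group by r.parts[1]
order by tag;
```

For the sample data in the question this gives 2 occurrences in 2 documents for #Lot : version#, and 3 occurrences in 2 documents for #Participant : name#.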
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | klin |
