'Redshift Regex_Replace : using pattern in replace string
My requirement is to split a json file in rows for each element within the json. I order to do that I've figure it out that the query to split by rows is something like this:
SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": {', '}\n{'), '\n')) AS json_exploded
Info: I'm connected to an EMR Cluster (spark under the hood) through a Thrift Server so that I can query json files from DataGrip.
But, to do so, I have replace this chunk of the string:
}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": {
by a '}\n{'
so then I will be available to split by row using the split.
So I should find first a pattern for this
}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": {
in order to replace it by this:
}\n{
Solution 1:[1]
You haven't said which programming language you are using. Many languages support extract/replace by groups. The following regex contains three capturing groups
(.*\}, ")(\w*)(": \{.*)
$1 = SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "
$2 = ChIJA7Vlp5iZpBIRo8rMYZedIDo
$3 = ": {', '}\n{'), '\n')) AS json_exploded
Which can be refered to with $1 $2 $3 or \1 \2 \3 depending on the programming language that you are using.
The expression $1\n$3 or \1\n\3 would therefore equal SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "\n": {', '}\n{'), '\n')) AS json_exploded
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
