'Redshift Regex_Replace : using pattern in replace string

My requirement is to split a json file in rows for each element within the json. I order to do that I've figure it out that the query to split by rows is something like this:

SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": {', '}\n{'), '\n')) AS json_exploded

Info: I'm connected to an EMR Cluster (spark under the hood) through a Thrift Server so that I can query json files from DataGrip.

But, to do so, I have replace this chunk of the string:

}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": {

by a '}\n{'

so then I will be available to split by row using the split.

So I should find first a pattern for this

}, "ChIJA7Vlp5iZpBIRo8rMYZedIDo": { 

in order to replace it by this:

}\n{


Solution 1:[1]

You haven't said which programming language you are using. Many languages support extract/replace by groups. The following regex contains three capturing groups

(.*\}, ")(\w*)(": \{.*)
$1 = SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "
$2 = ChIJA7Vlp5iZpBIRo8rMYZedIDo
$3 = ": {', '}\n{'), '\n')) AS json_exploded

Which can be refered to with $1 $2 $3 or \1 \2 \3 depending on the programming language that you are using.
The expression $1\n$3 or \1\n\3 would therefore equal SELECT EXPLODE(SPLIT(regexp_replace(json_string, '}, "\n": {', '}\n{'), '\n')) AS json_exploded

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1