'How to create a BQ SQL UDF that iterates over a string?
TL;DR:
Is there a way to do string manipulation in BQ only with SQL UDF?
Eg:
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
select removeExtraEqualToFromPayload(payload) from table should give
____________________________________________________
payload
----------------------------------------------------
key1=val1&key2=val2&key3=val3&key4=val4
----------------------------------------------------
key5=val5&key6=val6
Long version:
My goal is to iterate over a string that is part of one of the columns
This is our table structure
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
As you see, key3 in first row has an = after val3 and key6 in second row has an = after val6 which is not desired for us
So the goal is to iterate over the string and remove these extra =
I had gone through https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions that explains how to use custom functions in BQ. As of now SQL UDF only supports SQL query, where as with JS UDF we can write our custom logic to add loops etc
Since JS UDF is very slow, using it has been ruled out and we only had to rely on SQL UDF.
I thought of using BQ Scripting(https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting) in combination with SQL UDF but that doesn't seem to work. Looks like script has to be altogether different
I had explored stored procedures with BQ for the same, however, that is also not working. I'm not sure if I am doing it right
I've created a procedure like this:
CREATE PROCEDURE test.AddDelta(INOUT x INT64, delta INT64)
BEGIN
SET x = x + delta;
END;
I'm not able to use the above procedure like this:
with ta as (select 1 id union all select 2 id)
select id from ta;
call test.AddDelta(id, 1);
select id;
I'm wondering if there is a way to parse strings like this without using Javascript UDF
Solution 1:[1]
Disclaimer: My regex-fu is not good. definitely have a look at the re2 syntax
You should be able to do it with REGEXP_REPLACE
SELECT
payload,
REGEXP_REPLACE(payload,r'=(&)|=$','\\1') AS payload_clean
FROM
`myproject.mydataset.mytable`
example output:
| payload | payload_clean |
|---|---|
| key1=val1&key2=val2&key3=val3=&key4=val4= | key1=val1&key2=val2&key3=val3&key4=val4 |
Executable example:
WITH
payload_table AS (
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4" AS payload UNION ALL
SELECT "key5=val5&key6=val6=" AS payload UNION ALL
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4=" AS payload UNION ALL
SELECT "key3=val3=abc&key4=val4" AS payload
)
SELECT
payload,
REGEXP_REPLACE(payload,r'(=val\pN)=(\pL*&)|=(&)|=$','\\1\\2') AS payload_clean
FROM
payload_table
Of course (=val\pN)=(\pL*&) in the pattern won't necessarily work for you since you probably have different patterns. If there are no patterns to match then I'm not sure how you will remove the extra '=' from your strings automatically.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

