'How to create a BQ SQL UDF that iterates over a string?

TL;DR:

Is there a way to do string manipulation in BQ only with SQL UDF?

Eg:

____________________________________________________
id       |       payload
----------------------------------------------------
1        | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2        | key5=val5&key6=val6=

select removeExtraEqualToFromPayload(payload) from table should give

____________________________________________________
payload
----------------------------------------------------
key1=val1&key2=val2&key3=val3&key4=val4
----------------------------------------------------
key5=val5&key6=val6

Long version:

My goal is to iterate over a string that is part of one of the columns

This is our table structure

____________________________________________________
id       |       payload
----------------------------------------------------
1        | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2        | key5=val5&key6=val6=

As you see, key3 in first row has an = after val3 and key6 in second row has an = after val6 which is not desired for us

So the goal is to iterate over the string and remove these extra =

I had gone through https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions that explains how to use custom functions in BQ. As of now SQL UDF only supports SQL query, where as with JS UDF we can write our custom logic to add loops etc

Since JS UDF is very slow, using it has been ruled out and we only had to rely on SQL UDF.

I thought of using BQ Scripting(https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting) in combination with SQL UDF but that doesn't seem to work. Looks like script has to be altogether different

I had explored stored procedures with BQ for the same, however, that is also not working. I'm not sure if I am doing it right

I've created a procedure like this:

CREATE PROCEDURE test.AddDelta(INOUT x INT64, delta INT64)
BEGIN
  SET x = x + delta;
END;

I'm not able to use the above procedure like this:

with ta as (select 1 id union all select 2 id)

select id from ta;
call test.AddDelta(id, 1);
select id;

I'm wondering if there is a way to parse strings like this without using Javascript UDF



Solution 1:[1]

Disclaimer: My regex-fu is not good. definitely have a look at the re2 syntax

You should be able to do it with REGEXP_REPLACE

SELECT
    payload,
    REGEXP_REPLACE(payload,r'=(&)|=$','\\1') AS payload_clean
FROM
    `myproject.mydataset.mytable`

example output:

payload payload_clean
key1=val1&key2=val2&key3=val3=&key4=val4= key1=val1&key2=val2&key3=val3&key4=val4

Executable example:

WITH
    payload_table AS (
        SELECT "key1=val1&key2=val2&key3=val3=&key4=val4"   AS payload UNION ALL
        SELECT "key5=val5&key6=val6="                       AS payload UNION ALL
        SELECT "key1=val1&key2=val2&key3=val3=&key4=val4="  AS payload UNION ALL
        SELECT "key3=val3=abc&key4=val4"                    AS payload
    )

SELECT
    payload,
    REGEXP_REPLACE(payload,r'(=val\pN)=(\pL*&)|=(&)|=$','\\1\\2') AS payload_clean
FROM
    payload_table

Output

Of course (=val\pN)=(\pL*&) in the pattern won't necessarily work for you since you probably have different patterns. If there are no patterns to match then I'm not sure how you will remove the extra '=' from your strings automatically.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1