'The LAG() function doesn't seem to work with strings (varchar type columns)
I can't seem to get LAG() to work with strings, it seems like it wants to convert them to numeric values.
The first few rows of the CTE "event_stream_time_window" look like this (it is ordered by id and timestamp):
| id | os | event | category | timestamp |
|---|---|---|---|---|
| 1 | ios | launch | start | 2022-03-01 13:14:45 |
| 1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 |
| 1 | ios | message - click | messages | 2022-03-01 13:14:50 |
I want to create a column named "lag_cat" that contains the value of the "category" column from the previous row, or if there is now row just contains the value of the "category" column of the current row. For example, I want the first few rows of the output to look something like this:
| id | os | event | category | timestamp | lag_cat |
|---|---|---|---|---|---|
| 1 | ios | launch | start | 2022-03-01 13:14:45 | start |
| 1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 | landing page |
| 1 | ios | message - click | messages | 2022-03-01 13:14:50 | messages |
There are multiple IDs in the table and I want this to be done within ID group, so I am using the following code:
SELECT
id,
os,
event,
category,
timestamp,
LAG(category, 1, 0) OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window
I get the following error: "Numeric value 'Start' is not recognized"
So it seems like LAG() is trying to convert "category" to a numeric type. Is there any way around this?
Thanks!
Solution 1:[1]
The issue here is the default value for LAG which is equal 0 (data type numeric). Casting string literals "start"/"messages"/... to nummeric is not possible as they are not valid numbers, thus the error message.
If zero character is required as output then it should be provided as '0':
SELECT
id,
os,
event,
category,
timestamp,
LAG(category, 1, '0') OVER (PARTITION BY id ORDER BY timestamp) AS lag_cat
FROM event_stream_time_window;
The mechanism that causes it is called "Implicit data type coercion"
Solution 2:[2]
If you want the current rows CATEGORY if there is not prior, the defualt value should be CATEGORY not 0
SELECT
id,
os,
event,
category,
timestamp,
lag(category,1,category) over (PARTITION BY id ORDER BY timestamp) as lag_cat
FROM VALUES
(1, 'ios', 'launch','start','2022-03-01 13:14:45'),
(1, 'ios', 'landing page - view','landing page','2022-03-01 13:14:46'),
(1, 'ios', 'message - click','messages','2022-03-01 13:14:50')
t(id, os, event, category, timestamp)
| ID | OS | EVENT | CATEGORY | TIMESTAMP | LAG_CAT |
|---|---|---|---|---|---|
| 1 | ios | launch | start | 2022-03-01 13:14:45 | start |
| 1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 | start |
| 1 | ios | message - click | messages | 2022-03-01 13:14:50 | landing page |
So the default can be anything, but it needs to be the same type as the column being lagged. The category is a text, so the default needs to be text also:
lag(category,1,'default value') over (PARTITION BY id ORDER BY timestamp) as lag_cat
| ID | OS | EVENT | CATEGORY | TIMESTAMP | LAG_CAT |
|---|---|---|---|---|---|
| 1 | ios | launch | start | 2022-03-01 13:14:45 | default value |
| 1 | ios | landing page - view | landing page | 2022-03-01 13:14:46 | start |
| 1 | ios | message - click | messages | 2022-03-01 13:14:50 | landing page |
but inserting a zero is a number, and thus different type, and thus the error, but instead of saying number is not type text order seems to be mixed.
SELECT
'a' as t
,system$typeof(t)
,0 as n
,system$typeof(n)
| T | SYSTEM$TYPEOF(T) | N | SYSTEM$TYPEOF(N) |
|---|---|---|---|
| a | VARCHAR(1)[LOB] | 0 | NUMBER(1,0)[SB1] |
and
SELECT
'a' as t
,system$typeof(t)
,0 as n
,system$typeof(n)
,lag(t,1,n) over (ORDER BY n) as lag_t
gives:
Numeric value 'a' is not recognized
which shows the type of the default is being checked against the column, which feels like a compiler bug to me, but it's trying to say these things are not the same type.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Lukasz Szozda |
| Solution 2 |
