Snowflake: Flag infinite loops in hierarchical data without using a cursor
I'm struggling with a SQL performance issue while trying to identify, in advance, circular references in a dataset of approximately 1.5 million candidate rows.
Before building a custom solution, I checked the Snowflake CONNECT BY documentation, but its current implementation lacks some features I would need.
So I moved on to plan B, which uses a cursor in Snowflake Scripting, but performance is really awful even after tweaking the WHERE condition on my input dataset to cut the number of records in half (by default, rows are not circular references when they share the same parent id or are root records without any parent).
I've looked around for a solution that would not involve a cursor, but SQL Server code transposed into Snowflake did not meet my expectations.
That leaves me with plan C: hardcoding in advance the level of depth we'll manage, but this is definitely not my preferred option.
Processing speed is a must-have here, since the future process will run daily in production with data volumes exceeding 20 million rows.
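For context, here is the kind of set-based approach I'd hope to replace the cursor with: a recursive CTE that walks each chain, carrying the visited path in an ARRAY and stopping as soon as a node repeats. This is only a sketch against the dummy table below, assuming each chain is scoped by TERR_CD and linked via PAR_CUST_NO → CUST_NO; I have not validated it at the 20-million-row scale.

```sql
-- Sketch: flag cycles set-based with a recursive CTE instead of a cursor.
-- Each walk carries its visited path; a repeated CUST_NO means a cycle.
WITH RECURSIVE walk AS (
    SELECT TERR_CD,
           CUST_NO,
           PAR_CUST_NO,
           ARRAY_CONSTRUCT(CUST_NO) AS path,
           FALSE AS cycle_found
    FROM DUMMY_HIERARCHY
    WHERE PAR_CUST_NO IS NOT NULL
    UNION ALL
    SELECT w.TERR_CD,
           h.CUST_NO,
           h.PAR_CUST_NO,
           ARRAY_APPEND(w.path, h.CUST_NO),
           ARRAY_CONTAINS(h.CUST_NO::VARIANT, w.path)  -- already visited?
    FROM walk w
    JOIN DUMMY_HIERARCHY h
      ON h.TERR_CD = w.TERR_CD
     AND h.CUST_NO = w.PAR_CUST_NO
    WHERE NOT w.cycle_found  -- stop expanding once a repeat is seen
)
SELECT DISTINCT TERR_CD, path[0]::VARCHAR AS CUST_NO
FROM walk
WHERE cycle_found;
```

Note that the anchor member starts a walk from every non-root row, so on very large tables the intermediate result can grow quickly; pruning the anchor set would likely be necessary in production.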
-- DUMMY table
CREATE OR REPLACE TRANSIENT TABLE DUMMY_HIERARCHY COPY GRANTS
(
TERR_CD VARCHAR(6),
CUST_NO VARCHAR(100),
PAR_CUST_NO VARCHAR(100),
IS_CYCLE BOOLEAN DEFAULT FALSE
);
-- DUMMY records ingestion
INSERT OVERWRITE INTO DUMMY_HIERARCHY (TERR_CD,CUST_NO,PAR_CUST_NO,IS_CYCLE) VALUES
('1100A','ABDD12346','ABCD12345',TRUE),
('1100A','ABCD12345','ABDD12346',TRUE),
('1200A','ABDD12346',NULL,FALSE),
('1200A','ABCD12345',NULL,FALSE),
('1300A','ABDD12346',NULL,FALSE),
('1300A','888888888',NULL,FALSE),
('3100A','ABDD12346',NULL,FALSE),
('3200B','FFF789635',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','888888888',NULL,FALSE),
('3200B','9999998',NULL,FALSE),
('3300C','ABDD12346',NULL,FALSE),
('5500C','888888888',NULL,FALSE),
('5510C','ABDD12346',NULL,FALSE),
('5700C','12345','PPF0001',FALSE),
('5300B','88DEF8770','89DIF9990',FALSE),
('5300B','88DEF8778','89DIF9990',FALSE),
('5300B','89DIF9990',NULL,FALSE),
('5300B','88DIF9770','89DIF9990',FALSE),
('7701D','ZZZ014445','ZZZ012345',TRUE),
('7701D','ZZZ012345','ZZZ014445',TRUE),
('7701D','DFGERT345',NULL,FALSE),
('7801A','ERT47890','AP126400',TRUE),
('7801A','AP123400','AP126400',TRUE),
('7801A','AP126400','AP123400',TRUE),
('7902A','ABCDEFG','XYZ234',TRUE),
('7902A','XYZ234','ABCDEFG',TRUE),
('7851A','FFF789635',NULL,FALSE),
('7851A','100001A',NULL,FALSE),
('7908B','12345',NULL,FALSE),
('7909A','9999998',NULL,FALSE),
('8801C','12345',NULL,FALSE),
('9901D','12345',NULL,FALSE),
('9901D','FFF789635','100001A',FALSE),
('9901D','100001A','12345',FALSE),
('9901D','ABCD5698','888888888',FALSE),
('9901D','888888888','9999998',FALSE),
('9901D','9999998','FFF789635',FALSE);
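As a sanity check on the dummy data, the simplest cycles (two records that are each other's parent, as in territories 1100A, 7701D, 7801A, and 7902A above) can be caught with a plain self-join, no recursion needed:

```sql
-- Quick check for the degenerate case: mutual parent/child pairs
-- within the same territory.
SELECT a.TERR_CD, a.CUST_NO, a.PAR_CUST_NO
FROM DUMMY_HIERARCHY a
JOIN DUMMY_HIERARCHY b
  ON b.TERR_CD     = a.TERR_CD
 AND b.CUST_NO     = a.PAR_CUST_NO
 AND b.PAR_CUST_NO = a.CUST_NO;
```

This obviously misses longer cycles, but it's cheap enough to run as a first pass.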
-- Anonymous block to perform row-by-row processing without using a stored procedure
DECLARE
    -- Cursor over the hierarchy records to be processed
    cItems CURSOR FOR
        SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO
        FROM DUMMY_HIERARCHY
        WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL
        GROUP BY TERR_CD, PAR_CUST_NO;
    vTCD VARCHAR(8);
    vCN VARCHAR(100);
    vPCN VARCHAR(100);
    vNextTCD VARCHAR(8);
    vNextCN VARCHAR(100);
    vNextPCN VARCHAR(100);
    vOutput VARCHAR;
BEGIN
    -- Row-by-row processing
    FOR item IN cItems DO
        vTCD := item.TERR_CD;
        vCN := item.CUST_NO;
        vPCN := item.PAR_CUST_NO;
        vOutput := 'OK';
        -- Walk up the chain one parent at a time
        LOOP
            SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO
            INTO :vNextTCD, :vNextCN, :vNextPCN
            FROM DUMMY_HIERARCHY
            WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL
              AND CUST_NO = :vCN AND TERR_CD = :vTCD
            GROUP BY TERR_CD, PAR_CUST_NO;
            IF (vNextPCN IS NOT NULL) THEN
                vCN := vNextPCN;
                vTCD := vNextTCD;
            ELSE
                vOutput := vNextTCD || '+' || vNextCN || '+' || COALESCE(vNextPCN, '') || ' KO';
                BREAK;
            END IF;
        END LOOP;
    END FOR;
    RETURN vOutput;
END;
Note: I do have a separate version of the code where vOutput is an ARRAY and I rely on IF (NOT ARRAY_CONTAINS(vNextPCN::VARIANT, vOutput) AND vNextPCN IS NOT NULL) THEN to perform an ARRAY_APPEND, keeping track of the customer numbers we've already processed. When a repeat is found, we need to exit the loop and either raise an error message or write the grief record into a separate table.
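For reference, the decision inside the loop of that ARRAY-based variant would look roughly like the fragment below. This is only a sketch of what the note describes, not tested code, and GRIEF_RECORDS is a hypothetical table name for the grief log:

```sql
-- Hypothetical fragment of the ARRAY-based variant: vPath (an ARRAY variable)
-- accumulates the customer numbers visited so far; a repeat means a cycle.
IF (vNextPCN IS NOT NULL AND NOT ARRAY_CONTAINS(vNextPCN::VARIANT, vPath)) THEN
    vPath := ARRAY_APPEND(vPath, vNextPCN::VARIANT);
    vCN := vNextPCN;
    vTCD := vNextTCD;
ELSEIF (vNextPCN IS NOT NULL) THEN
    -- Cycle detected: log the offending record and stop walking this chain.
    INSERT INTO GRIEF_RECORDS (TERR_CD, CUST_NO, PAR_CUST_NO)
        SELECT :vNextTCD, :vNextCN, :vNextPCN;
    BREAK;
ELSE
    BREAK;  -- Reached a root: chain is clean.
END IF;
```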
Thanks a lot in advance for your help and/or suggestions.
Daniel
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow