'Adding frequency counter between nodes in neo4j during csv import
I've got a csv file with ManufacturerPartNumbers and Manufacturers. Both values can potentially be duplicated across rows one or more times. Meaning I could have ManufacturerParnNumber,Manufactuerere: A|X , A|Y, A|Y, B|X, C,X
In this case, I'd like to create ManufacturerPartNumber nodes (A), (B), (C) and Manufacturer nodes (X), (Y)
I also want to create relationships of (A)-[MADE_BY]->(X) (A)-[MADE_BY]->(Y)
And I also want to apply a weighting value in the relationship between A -> Y since it appears twice in my dataset, so that I know that there's a more frequent relationship between A|Y than there is between A|X.
Is there a more efficient way of doing this? I'm dealing with 10M rows of csv data and it is crashing during import.
:param UploadFile => 'http://localhost:11001/project-f64568ab-67b6-4560-ae89-8aea882892b0/file.csv';
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
//create nodes
MERGE (mfgr:Manufacturer {name: COALESCE(trim(toUpper(csvLine.Manufacturer)),'NULL')})
MERGE (mpn:MPN {name: COALESCE(trim(toUpper(csvLine.MPN)),'NULL')})
//set relationships
MERGE (mfgr)-[a:MAKES]->(mpn)
SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1, refid: (CASE WHEN a.refid is NULL THEN csvLine.id ELSE a.refid + ' ~ ' + csvLine.id END)}
;
Solution 1:[1]
Separating the node creation from the relationships creation and then setting the values helped a bit.
Ultimately what had the most impact was that I spun up an AuraDB at max size and then imported all of the data, followed by resizing it back down. Probably not an ideal way to handle it, but it worked better than all the other optimization and only cost me a few bucks!
//QUERY ONE: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
;
//QUERY TWO: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (var2)-[a:RELATES_TO]->(var1) SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1}
;
//QUERY THREE: handle descriptors
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
UNWIND split(trim(toUpper(csvLine.Descriptor)), ' ') AS DescriptionSep1 UNWIND split(trim(toUpper(DescriptionSep1)), ',') AS DescriptionSep2 UNWIND split(trim(toUpper(DescriptionSep2)), '|') AS DescriptionSep3 UNWIND split(trim(toUpper(DescriptionSep3)), ';') AS DescriptionSep4
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (descriptor:Descriptor {name: COALESCE(trim(toUpper(DescriptionSep4)),'NULL')})
SET descriptor += {appearances: (CASE WHEN descriptor.appearances is NULL THEN 0 ELSE descriptor.appearances END) + 1}
MERGE (descriptor)-[d:DESCRIBES]->(var1)
SET d += {appearances: (CASE WHEN d.appearances is NULL THEN 0 ELSE d.appearances END) + 1}
;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | sutariachintan |