How to explain a query getting faster with larger data in Neo4j Cypher?

I am running experiments in which I query Neo4j data of gradually increasing size. Some of my queries behave as I would expect: a larger graph means a longer execution time. But, for example, the query

MATCH (`g`:Graph)
MATCH (`g`)<-[:`Graph`]-(person)-[:birthPlace {Value: "http://persons/data/birthPlace"}]->(place)-[:`Graph`]->(`g`)
WITH count(person.Value) AS persons, place
    WHERE persons > 5 
RETURN place.Value AS place, persons 
ORDER BY persons

has these execution times (in ms) over successive experiments: 80.2, 208.1, 301.7, 399.23, 0.1, 2.07, 2.61, 2.81, 7.3, 1.5.

How can the sudden speed-up from the fifth step onward be explained? The data are the same, just extended; no indexes were created.

The data size in the 4th experiment: 201673 nodes, 601189 relationships, 859225 properties.

The data size on the 5th experiment: 242113 nodes, 741500 relationships, 1047060 properties.

All I can think of is that Cypher may start using some indexes once the data reaches a certain size, but I can't find anything that confirms this.

Thank you for any comments.



Solution 1:[1]

Neo4j cache management may explain your observations. Could you describe more precisely what you are doing? What version of Neo4j are you using? What is the Graph node? Are you repeatedly running the same query on one graph and then trying again with a larger or smaller graph?

If you are running the same query multiple times on the same data set and seeing faster execution times, then the cache may be the reason. In v3.5 and earlier the cache would "warm up" with repeated execution. Reportedly this does not occur in v4.x.
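One way to test the cache hypothesis is to prefix the query with PROFILE and run it twice in a row on the same data set. This is only a sketch using the query from the question. The plan it prints shows each operator with its rows and db hits, so you can see whether an index is being used. Depending on the Neo4j version, it may also report page cache hits and misses.

```cypher
// PROFILE executes the query and returns the plan that was actually used.
// Run it once right after loading the data, then again immediately:
// if the plan and the db hits are identical but the second run is much
// faster, caching is the more likely explanation than an index.
PROFILE
MATCH (g:Graph)
MATCH (g)<-[:Graph]-(person)-[:birthPlace {Value: "http://persons/data/birthPlace"}]->(place)-[:Graph]->(g)
WITH count(person.Value) AS persons, place
WHERE persons > 5
RETURN place.Value AS place, persons
ORDER BY persons
```

Using EXPLAIN instead of PROFILE shows the planned operators without running the query, which is enough to check for an index lookup.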

You might look at cold start or these tips. You might also look at your transaction logs: are they accumulating large files?

Why the backticks around your node identifiers (`g`)? Just use (g:Graph) and [r:Graph]; no quoting is needed.
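For illustration, the query from the question can be written without any backticks. This is a sketch that assumes the labels, relationship types and property names are exactly as posted. Backticks are only needed for identifiers that contain spaces or other special characters.

```cypher
// Same query as in the question, with plain identifiers.
MATCH (g:Graph)
MATCH (g)<-[:Graph]-(person)-[:birthPlace {Value: "http://persons/data/birthPlace"}]->(place)-[:Graph]->(g)
// Count persons per place, then keep places with more than 5.
WITH place, count(person.Value) AS persons
WHERE persons > 5
RETURN place.Value AS place, persons
ORDER BY persons
```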

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 David A Stumpf