'Different Rows Returned from Cypher when only Column Changes are Expected
I am new to Neo4J, Cypher and StackOverflow so let me know when I’m bucking protocol.
I have a test Neo4J database with a little over 7k nodes representing movies and a fair amount of relationships connected to each movie. I’ve noticed that when I run queries that are sorted by a tally, I get different rows back depending on what parameters of the node I return. One specific example is when I try to list the movies with the most relationships.
This CQL query
MATCH (m:Movie)-[r]-()
RETURN m.title, count(r) AS rel_count ORDER BY rel_count DESC LIMIT 10
returns
m.title rel_count
The Three Musketeers 184
Scary Movie 3 128
Watchmen 114
Hero 113
Horrible Bosses 2 113
The Longest Yard 112
Carrie 111
The Butterfly Effect 111
Aliens 111
Dick Tracy 110
while the CQL query
MATCH (m:Movie)-[r]-()
RETURN m.title, m.tmsId, count(r) AS rel_count ORDER BY rel_count DESC LIMIT 10
returns
m.title m.tmsId rel_count
Forrest Gump MV000398520000 106
The Karate Kid MV000125280000 79
Evolution MV001067180000 76
Original Sin MV000790300000 74
Kill Bill: Vol. 2 MV001405480000 74
American Beauty MV000773810000 74
O Brother, Where Art Thou? MV000914040000 73
The Godfather MV000120060000 72
Sin City MV001558150000 71
The Karate Kid Part II MV000205920000 71
My assumption is(was) that the MATCH statement collects the data for the nodes m and the relationships r and the RETURN statement just filters the output to specific parameters. Therefore, I'd expect the same list returned for each call. Anyone see something wrong with my Cypher or expectations?
Even though I have a specific problem, I would also like to know why the problem is happening for some insight into Cypher’s syntax and execution. I've added the PROFILE output if that is helpful.


Solution 1:[1]
I think you're right. In Cypher the RETURN and WITH statements just work on the tabular data that they're given. If you specify an aggregate function like count/sum/etc... then it will automatically group based on the other, non-aggregate columns you've specified.
As a debugging step, what if you were to try this?
MATCH (m:Movie)-[r]-()
WITH m.title AS title, m.tmsId AS tmsId, count(r) AS rel_count
RETURN title, tmsId, rel_count ORDER BY rel_count DESC LIMIT 10
Solution 2:[2]
I think we found the answer. It seems that the COUNT(r) will count all the r's based on the other return values. Therefore when I ask for non-unique parameters I get the sum of the r's for all nodes with that parameter. In this specific case, There are 4 movies with the title "The Three Musketeers" and all 4 of their r's sum to 184. Adding the unique parameter tmsId returns the actual results for which I'm looking.
We found that we could make the non-unique parameters counted separately with:
MATCH (m:Movie)-[r]-()
WITH DISTINCT m, COUNT(r) AS rel_count
RETURN m.title, rel_count
ORDER BY rel_count DESC LIMIT 12
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Brian Underwood |
| Solution 2 |
