Gremlin using select in hasId returns incorrect result
I'm trying to execute a Gremlin query in which a saved vertex id is reused later in a hasId clause. When I put in the literal id, the answer is correct; however, when I replace the literal with select('deployable_id'), the answer is incorrect. Unfortunately, in my real-life example I can't put in the literal id.
I would like to understand why this behavior occurs, and whether there is a better way of writing this query that avoids the problem.
I am running Gremlin against AWS Neptune; however, I can also reproduce the problem locally using just the Gremlin console.
Steps to replicate the problem in gremlin console:
Set up a simple data set
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.addV('deployable').property('name', 'd1')
g.addV('deployable').property('name', 'd2')
g.addV('library').property('name', 'l1')
g.addV('class').property('name', 'c1')
g.addV('class').property('name', 'c2')
g.addV('app').property('name', 'a1')
g.addV('app').property('name', 'a2')
g.V().has('name', 'd1').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'd2').addE('ships').to(V().has('name', 'l1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c1'))
g.V().has('name', 'l1').addE('includes').to(V().has('name', 'c2'))
g.V().has('name', 'a1').addE('deploys').to(V().has('name', 'd1'))
g.V().has('name', 'a2').addE('deploys').to(V().has('name', 'd2'))
g.V().has('name', 'a1').addE('loads').to(V().has('name', 'c1'))
g.V().has('name', 'a2').addE('loads').to(V().has('name', 'c2'))
Find the id of d1 using this query (it is always 0 as far as I can see)
g.V().has('name', 'd1').id()
Run the query with the literal id (i.e. the number 0)
g.V().
  has('name', 'd1').
  as('deployable').
  id().as('deployable_id').
  select('deployable').
  out('ships').
  project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
      __.out('includes').
        where(
          __.in('loads').out('deploys').hasId(0)
        ).count()
    )
This returns the correct result where loaded_classes = 1
==>[library:l1,total_classes:2,loaded_classes:1]
Now run the query which uses the select
g.V().
  has('name', 'd1').
  as('deployable').
  id().as('deployable_id').
  select('deployable').
  out('ships').
  project('library','total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
      __.out('includes').
        where(
          __.in('loads').out('deploys').hasId(__.select('deployable_id'))
        ).count()
    )
This produces an incorrect result where loaded_classes = 0
==>[library:l1,total_classes:2,loaded_classes:0]
The above example does have a workaround (__.in('loads').out('deploys').has('name', 'd1')); however, that workaround does not work in my real-life example, and I have not yet been able to reproduce that failure in a simple example.
Solution 1:[1]
There is no overload of hasId() that takes a Traversal as an argument. The call is accepted because the signature takes an Object, but that Object is meant to be an identifier, so hasId() assumes your Traversal is itself the identifier to search for. Graphs should probably reject unacceptable identifiers with a meaningful message, but TinkerGraph in particular is quite happy to use any Object as a T.id, so it allows it.
I would probably re-write your query using some form of where():
gremlin> g.V().
......1> has('name', 'd1').
......2> as('deployable').
......3> out('ships').
......4> project('library','total_classes', 'loaded_classes').
......5> by('name').
......6> by(__.out('includes').count()).
......7> by(
......8> __.out('includes').
......9> where(
.....10> __.in('loads').out('deploys').where(eq('deployable'))
.....11> ).count()
.....12> )
==>[library:l1,total_classes:2,loaded_classes:1]
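If the comparison has to be made on the saved id rather than on the vertex itself, the same where() idea can be applied to the 'deployable_id' label from the question. The following is only a sketch and is not part of the original answer: it assumes the sample TinkerGraph and the traversal source g from the question's setup, and it has not been checked against Neptune. It is intended to give the same loaded_classes count as the query above.

```groovy
// Sketch only: assumes the sample graph and `g` built in the question's setup.
// Instead of hasId(<traversal>), step to the id of each candidate vertex
// and compare it with the value previously labelled 'deployable_id'.
g.V().
  has('name', 'd1').
  as('deployable').
  id().as('deployable_id').        // save the id under a step label
  select('deployable').            // go back to the vertex itself
  out('ships').
  project('library', 'total_classes', 'loaded_classes').
    by('name').
    by(__.out('includes').count()).
    by(
      __.out('includes').
        where(
          // id() turns the traverser into the vertex id;
          // where(eq(...)) compares it with the labelled value
          __.in('loads').out('deploys').id().where(eq('deployable_id'))
        ).count()
    )
```

The where(eq('deployable')) form shown in the answer is shorter and does not need the id to be saved at all, so this variant is mainly of interest when only the id is available at that point in the traversal.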
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | stephen mallette |
