Neo4j - GDS - FastRP Algorithm - Same values but different embeddings

While using the FastRP algorithm, a statement in the documentation caught my attention. I have also run into this situation myself.

Link: https://neo4j.com/docs/graph-data-science/current/algorithms/fastrp/?_gl=1*1pjy8fd*_ga*OTg2ODkyMjYuMTY0NzI1Njk2Mg..*_ga_DL38Q8KGQC*MTY0NzUwMDg5MS4xNS4xLjE2NDc1MDIwMDAuMA..&_ga=2.25047225.28509462.1647256962-98689226.1647256962

Phrase: Because of L2 normalization which is applied to each iteration (here only one iteration), all nodes have the same embedding despite having different age values (apart from rounding errors).

When generating embeddings with FastRP on a graph (let's consider only the properties, that is, propertyRatio = 1), how can the embeddings of two nodes with exactly the same values be different? In the link I shared above, this was explained as if it were a normal situation, but it seemed a bit inconvenient to me.



Solution 1:[1]

If there is a single node property and a propertyRatio of 1.0, then the embeddings are identical. However, as soon as you add more node properties or lower the propertyRatio, the values of the node properties come into play.
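To illustrate why a single property collapses to one embedding, here is a minimal sketch in plain Python. It assumes a simplified model of the property part of FastRP: the property value scales one random vector, and the result is L2-normalized. The names `age_vector`, `l2_normalize`, and `single_property_embedding` are hypothetical and chosen for illustration only; this is not the GDS implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Hypothetical random vector assigned to the single property `age`.
age_vector = [1.0, 0.0, -1.0, 1.0]


def single_property_embedding(age):
    # The property value only scales the vector;
    # L2 normalization then removes that scale again.
    return l2_normalize([age * x for x in age_vector])


young = single_property_embedding(10)
old = single_property_embedding(100)

# Both vectors point in the same direction, so after normalization
# they match apart from rounding errors.
same = all(math.isclose(p, q) for p, q in zip(young, old))
print(same)
```

Under this assumption, any positive age produces the same unit vector, which matches the behaviour described in the documentation.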

One thing to note is that property values are normalized node by node, so if you use a propertyRatio of 1 with the following nodes:

(a:Person {age: 10, numberOfPets: 1}), (b:Person {age: 100, numberOfPets: 10})

The embeddings will still be identical, because the two property values have the same ratio (10:1) on both nodes. However, a node such as (c:Person {age: 10, numberOfPets: 10}) would have a different embedding.
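The three example nodes can be checked with a small sketch. It assumes a simplified model in which each property gets its own random vector, a node's property embedding is the sum of those vectors weighted by the node's property values, and the sum is then L2-normalized per node. The vectors `age_vector` and `pets_vector` and the helper names are hypothetical, fixed here so the result is reproducible; real FastRP draws them at random.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Hypothetical random vectors, one per property.
age_vector = [1.0, 0.0, -1.0, 1.0]
pets_vector = [0.0, 1.0, 1.0, -1.0]


def property_embedding(age, number_of_pets):
    # Weighted sum of the property vectors, normalized per node.
    combined = [
        age * a + number_of_pets * p
        for a, p in zip(age_vector, pets_vector)
    ]
    return l2_normalize(combined)


def is_same(u, v):
    # Compare element-wise, tolerating floating-point rounding.
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(u, v))


emb_a = property_embedding(10, 1)    # (a) age: 10,  numberOfPets: 1
emb_b = property_embedding(100, 10)  # (b) age: 100, numberOfPets: 10
emb_c = property_embedding(10, 10)   # (c) age: 10,  numberOfPets: 10

print(is_same(emb_a, emb_b))  # a and b differ only by a factor of 10
print(is_same(emb_a, emb_c))  # c has a different ratio between properties
```

In this model, only the ratio between a node's property values survives normalization, not their absolute size.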

As far as I understand, the node property values are normalized before being used in the FastRP algorithm so as not to overpower the original FastRP embeddings (the network position encoding).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tomaž Bratanič