Using String for VertexId in GraphX

I am new to Spark and GraphX. I am trying to create a graph using GraphX. However, the IDs in my data look like this:

'20|pending_org_::a5055a7d50b4c9777f62181c6fd043bc'

As I understand it, VertexId must be of type Long in GraphX, but a String like this cannot be converted to a Long. I need this ID for later steps, so it must be present in the graph nodes. Also, I don't want to add fake IDs, as the data is already big enough.

Any idea how it is possible to fix this issue?



Solution 1:[1]

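You can hash each string ID and use the result as the VertexId. Pick a well-distributed hash function with a low collision rate (for example see https://en.wikipedia.org/wiki/MurmurHash) that produces a 64-bit output, or take a wider hash and truncate it to its first 64 bits if your data doesn't have high cardinality. Since you need the original ID later, keep the string as the vertex attribute next to the hashed Long.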
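As a rough sketch of the truncation idea, the snippet below hashes the string and keeps the first 64 bits as a Long. It swaps in SHA-256 from the JDK instead of MurmurHash only so that it runs without extra dependencies. The `VertexIds` object, the `toVertexId` helper, and the second sample ID are made up for illustration and are not part of GraphX.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object VertexIds {
  // Hypothetical helper (not a GraphX API): hash the string ID with SHA-256
  // and keep only the first 8 bytes (64 bits) of the digest as a Long.
  def toVertexId(id: String): Long = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(id.getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(digest).getLong() // reads bytes 0..7, big-endian
  }

  def main(args: Array[String]): Unit = {
    val ids = Seq(
      "20|pending_org_::a5055a7d50b4c9777f62181c6fd043bc",
      "20|pending_org_::ffffffffffffffffffffffffffffffff" // made-up second ID
    )

    // Pair each hashed ID with the original string so the ID is not lost.
    val vertices: Seq[(Long, String)] = ids.map(id => (toVertexId(id), id))
    vertices.foreach { case (vid, original) => println(s"$vid -> $original") }

    // Sanity check: distinct strings should map to distinct Longs.
    require(
      vertices.map(_._1).distinct.size == ids.distinct.size,
      "hash collision detected"
    )
  }
}
```

In Spark, the same pair shape `(toVertexId(id), id)` fits a vertex RDD of type `RDD[(VertexId, String)]`, because `VertexId` is an alias for `Long`. Edge endpoints would need to go through the same function. The hashed values can be negative, which is fine for a Long. It is still worth running a collision check like the one above on the real data.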

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 meysam