'Apache Spark: GraphFrame and random walk implementation
Suppose a directed graph where each node has a flag (boolean)
Main goal is: "jump" to a node for the next step (node) until we will find the node with flag=true with limited count of steps (configurable) for each vertex.
P.S. its okay to go back to initial node
For example:
Vertex A (flag=false) -> [B, C, D, G]
Vertex B (flag=false) -> [A, E, F, G]
Vertex C (flag=false) -> [A, E, G]
Vertex D (flag=false) -> [A, F, G]
Vertex E (flag=true) -> [B, C, G]
Vertex F (flag=false) -> [B, D]
Vertex G (flag=false) -> [A, B, C, D, E]
Vertex H (flag=true) -> [G]
stepsCount = 10
Algorithm:
- vertex A has flag=false, there are randomly selected node B where flag=false also
- so here again we randomly selecting the next node, lets say its a node F where flag=false
- so "jump" to the next node while stepsCount < 10
- if there are no flag=true mark the Vertex A with 0 else 1
// Vertex DataFrame
val v = spark.createDataFrame(List(
("A", false),
("B", false),
("C", false),
("D", false),
("E", true),
("F", false),
("G", false),
("H", true)
)).toDF("id", "flag")
// Edge DataFrame
val e = spark.createDataFrame(List(
("A", "B", 0),
("B", "A", 0),
("A", "C", 0),
("C", "A", 0),
("A", "D", 0),
("D", "A", 0),
("F", "B", 0),
("E", "B", 0),
("F", "D", 0),
("D", "F", 0),
("F", "D", 0),
("A", "G", 0),
("B", "G", 0),
("C", "G", 0),
("D", "G", 0),
("E", "G", 0),
("G", "H", 0)
)).toDF("src", "dst", "mark")
val g = GraphFrame(v, e)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|

