'Apache Spark: GraphFrame and random walk implementation

Suppose a directed graph where each node has a flag (boolean)

Main goal is: "jump" to a node for the next step (node) until we will find the node with flag=true with limited count of steps (configurable) for each vertex.

P.S. its okay to go back to initial node

For example:

Vertex A (flag=false) -> [B, C, D, G]
Vertex B (flag=false) -> [A, E, F, G]
Vertex C (flag=false) -> [A, E, G]
Vertex D (flag=false) -> [A, F, G]
Vertex E (flag=true)  -> [B, C, G]
Vertex F (flag=false) -> [B, D]
Vertex G (flag=false) -> [A, B, C, D, E]
Vertex H (flag=true)  -> [G]

stepsCount = 10

Algorithm:

  1. vertex A has flag=false, there are randomly selected node B where flag=false also
  2. so here again we randomly selecting the next node, lets say its a node F where flag=false
  3. so "jump" to the next node while stepsCount < 10
  4. if there are no flag=true mark the Vertex A with 0 else 1

Graph

// Vertex DataFrame
val v = spark.createDataFrame(List(
  ("A", false),
  ("B", false),
  ("C", false),
  ("D", false),
  ("E", true),
  ("F", false),
  ("G", false),
  ("H", true)
)).toDF("id", "flag")

// Edge DataFrame
val e = spark.createDataFrame(List(
  ("A", "B", 0),
  ("B", "A", 0),
  ("A", "C", 0),
  ("C", "A", 0),
  ("A", "D", 0),
  ("D", "A", 0),
  ("F", "B", 0),
  ("E", "B", 0),
  ("F", "D", 0),
  ("D", "F", 0),
  ("F", "D", 0),
  ("A", "G", 0),
  ("B", "G", 0),
  ("C", "G", 0),
  ("D", "G", 0),
  ("E", "G", 0),
  ("G", "H", 0)
)).toDF("src", "dst", "mark")

val g = GraphFrame(v, e)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source