updateStateByKey from RDD

I am a bit new to Spark GraphX, so please forgive me if this is a stupid question. I would also prefer to do this in Java rather than Scala, if at all possible.

I need to run a GraphX calculation on the RDDs of a JavaDStream, but I then need to roll the results back into my state object.

  • I am doing the GraphX calculation inside foreachRDD, since I do not know of another way to get the RDDs from the JavaDStream;
  • updateStateByKey is only available on the JavaPairDStream itself, not on the individual RDDs;
  • Each graph vertex maps one-to-one to a state object, so a way to access the state object inside foreachRDD would solve it. However, simply passing a reference to the object into the vertex and calling the update function from there strikes me as bad practice, though I could be wrong.
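The second constraint comes from how updateStateByKey works: in every batch, Spark calls the update function once per key with the new values that arrived for that key plus the previous state, for all existing keys whether or not they received new data, and drops a key when the function returns an empty Optional. The sketch below is a stdlib-only model of that contract, not Spark code. `UpdateStateModel`, `applyBatch` and `SUM_UPDATE` are hypothetical names, `java.util.Optional` stands in for Spark's own Optional type, and the state is assumed to be a running sum of doubles keyed by a `Long` vertex id.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiFunction;

public class UpdateStateModel {

    // Same shape as the function passed to updateStateByKey:
    // (new values for this key in this batch, previous state) -> new state.
    // java.util.Optional stands in for Spark's own Optional type.
    static final BiFunction<List<Double>, Optional<Double>, Optional<Double>> SUM_UPDATE =
        (newValues, state) -> {
            double total = state.orElse(0.0);
            for (double v : newValues) {
                total += v;
            }
            return Optional.of(total);
        };

    // Toy model of one batch: the update function runs for every key that already has
    // state or appears in the batch; keys with no new values get an empty list,
    // new keys start from Optional.empty(), and an empty result drops the key.
    static Map<Long, Double> applyBatch(
            Map<Long, Double> state,
            Map<Long, List<Double>> batch,
            BiFunction<List<Double>, Optional<Double>, Optional<Double>> update) {
        Set<Long> keys = new HashSet<>(state.keySet());
        keys.addAll(batch.keySet());
        Map<Long, Double> next = new HashMap<>();
        for (Long key : keys) {
            List<Double> values = batch.getOrDefault(key, List.of());
            Optional<Double> result = update.apply(values, Optional.ofNullable(state.get(key)));
            result.ifPresent(s -> next.put(key, s));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<Long, Double> state = new HashMap<>();
        // Batch 1: vertex 1 receives two values, vertex 2 receives one.
        state = applyBatch(state, Map.of(1L, List.of(1.0, 2.0), 2L, List.of(5.0)), SUM_UPDATE);
        // Batch 2: only vertex 1 receives a value; vertex 2 is still visited and keeps its state.
        state = applyBatch(state, Map.of(1L, List.of(4.0)), SUM_UPDATE);
        // prints "vertex 1 = 7.0, vertex 2 = 5.0"
        System.out.println("vertex 1 = " + state.get(1L) + ", vertex 2 = " + state.get(2L));
    }
}
```

The takeaway for this question is that the update function only ever sees keyed values that flow through the stream, so one way to restructure is to emit graph results as (vertex id, value) pairs in a stream rather than calling the update from inside a vertex.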

How would you solve this problem in Java? I am ready to restructure the calculations into a different logical flow if there is a better way to do this.

To make this more visual, the structure looks like this:

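// updateStateByKey is only defined on JavaPairDStream, so the state stream is keyed (by vertex id here, an assumption)
JavaPairDStream<Long, StateObject> stream = inputDataStream.updateStateByKey(function);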

stream.foreachRDD(rdd -> {
  Graph<Vertex, EdgeProperty> graph = GraphImpl.apply(/* derive the Vertex and EdgeProperty from the rdd */);
  JavaRDD<Vertex> updatedVertices = graphOperation(graph);
  // How to put the contents of updatedVertices back into stream?
});


Solution 1

I moved my graph calculation into a transform and got things up and running, up to the point where it hung during a fold (in Pregel) and Scala threw errors, when running JavaConverters.asScalaIteratorConverter, saying that there was no appropriate iterator...
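Putting the graph step in a transform is what lets its output stay in the stream: foreachRDD is an output operation that returns nothing, whereas transformToPair on a JavaDStream returns a new JavaPairDStream that updateStateByKey can then consume. The following is a stdlib-only model of that data flow, not Spark code, and it assumes the graph step can run on each incoming batch before the state update. `TransformThenStateModel`, `graphOperation` and `processBatch` are hypothetical names; the "graph operation" is just each vertex's degree within the batch, and a running sum stands in for the real update function.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TransformThenStateModel {

    // Hypothetical stand-in for the graph step: takes one batch of edges
    // (each a {src, dst} pair) and returns a per-vertex result keyed by vertex id.
    // Here the "result" is simply each vertex's degree within the batch.
    static Map<Long, Integer> graphOperation(List<long[]> edges) {
        Map<Long, Integer> degree = new HashMap<>();
        for (long[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }
        return degree;
    }

    // Model of one micro-batch: a transform-style step (batch in, keyed results out)
    // whose output is then folded into the keyed state, standing in for updateStateByKey.
    static Map<Long, Integer> processBatch(
            Map<Long, Integer> state,
            List<long[]> batch,
            Function<List<long[]>, Map<Long, Integer>> transformStep) {
        Map<Long, Integer> results = transformStep.apply(batch);
        Map<Long, Integer> next = new HashMap<>(state);
        results.forEach((vertexId, value) -> next.merge(vertexId, value, Integer::sum));
        return next;
    }

    public static void main(String[] args) {
        Map<Long, Integer> state = new HashMap<>();
        state = processBatch(state, List.of(new long[]{1, 2}, new long[]{2, 3}),
                TransformThenStateModel::graphOperation);
        state = processBatch(state, List.of(new long[]{1, 3}),
                TransformThenStateModel::graphOperation);
        System.out.println(state);
    }
}
```

In a real job, the function passed to processBatch would correspond to the function given to transformToPair, and the Map.merge fold would correspond to the per-vertex update function given to updateStateByKey.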

In short, after reading online that GraphFrames is potentially more stable than GraphX for Java, since it is apparently easier to wrap the Scala in a Java context for DataFrames, I have abandoned this approach and moved to GraphFrames. For others who have run into similar problems: I apologize that I have no solution to offer, but I am finding that the DataFrame approach works much better with my algorithm.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jennifer