How to convert enumerate PySpark code into Scala code
Below is PySpark code for matrix multiplication. I need the same logic in Scala, since this approach works well for a large-volume dataset.
from pyspark import SparkConf, SparkContext
from pyspark.sql import functions as F
from functools import reduce

# `spark` is an existing SparkSession
df = spark.sql("select * from tablename")

# one single-row DataFrame per column c2, holding sum(c1 * c2) for every column c1
colDFs = []
for c2 in df.columns:
    colDFs.append(df.select([F.sum(df[c1] * df[c2]).alias("op_{0}".format(i)) for i, c1 in enumerate(df.columns)]))

# stack the single-row DataFrames into the result matrix
mtx = reduce(lambda a, b: a.select(a.columns).union(b.select(a.columns)), colDFs)
mtx.show()
Solution 1:[1]
For enumerate you can use zipWithIndex, as in df.columns.zipWithIndex.
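A quick sketch of the correspondence (note that zipWithIndex yields (element, index), the reverse order of Python's enumerate):

// Python: for i, c1 in enumerate(df.columns): ...
// Scala:  zipWithIndex pairs each column name with its position
df.columns.zipWithIndex.foreach { case (c1, i) => println(s"op_$i <- $c1") }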
I didn't test it, but the overall code should be something like:
import org.apache.spark.sql.functions.{col, sum}

// one single-row DataFrame per column c2, holding sum(c1 * c2) for every column c1
val colDFs = df.columns.map { c2 =>
  df.select(df.columns.zipWithIndex.map { case (c1, i) =>
    sum(col(c1) * col(c2)).alias(s"op_$i")
  }: _*)
}

// stack the single-row DataFrames into the result matrix
val mtx = colDFs.reduce((a, b) => a.select(a.columns.map(col): _*).union(b.select(a.columns.map(col): _*)))
mtx.show()
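As a quick sanity check, here is a minimal sketch assuming a SparkSession named spark; the two-column DataFrame and the names x and y are made up for illustration. Replace the spark.sql(...) line with a tiny input:

// toy input X = [[1, 2], [3, 4]]
val df = spark.createDataFrame(Seq((1.0, 2.0), (3.0, 4.0))).toDF("x", "y")
// mtx.show() should then print the Gram matrix X^T * X:
// row for x: op_0 = 10.0, op_1 = 14.0
// row for y: op_0 = 14.0, op_1 = 20.0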
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
