'Databricks: how to convert Spark dataframe under %r to dataframe under %python

I found some tips about converting a pyspark dataframe to R, but I need to perform the opposite task: convert a R dataframe to pyspark

Anyone knows how to do it?



Solution 1:[1]

You can use the same approach as for other languages - use createOrReplaceTempView function to register your dataframe, and then use spark.sql from another language to access its content.

For example. If R side looks as following:

%r
library(SparkR)
id <- c(rep(1, 3), rep(2, 3), 3)
desc <- c('New', 'New', 'Good', 'New', 'Good', 'Good', 'New')
df <- data.frame(id, desc)
df <- createDataFrame(df)
createOrReplaceTempView(df, "test_df")
head(df)

  id desc
1  1  New
2  1  New
3  1 Good
4  2  New
5  2 Good
6  2 Good

then you can access these data from Python:

df = spark.sql("select * from test_df")
df.show()

+---+----+
| id|desc|
+---+----+
|1.0| New|
|1.0| New|
|1.0|Good|
|2.0| New|
|2.0|Good|
|2.0|Good|
|3.0| New|
+---+----+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alex Ott