'How to properly format the string in a new column of DataFrame?
I have a DataFrame with two columns col1 and col2 (Spark 2.2.0 and Scala 2.11). I need to create a new column in the following format:
=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)
where AAA is the value of col1 and BBB is the value of col2 for a given row.
The problem is that I do not know how to properly handle ". I tried this:
df = df.withColumn($"url",=path("http://mywebsite.com/photo/"+col("col1") + "_"+col("col2")+".jpg", 1))"
UPDATE:
It compiles ok now, but column values are not inserted in a string. Instead of column values, I see the text col1 and col2.
df = df.withColumn("url_rec",lit("=path('http://mywebsite.com/photo/"+col("col1")+"_"+col("col1")+".jpg', 1)"))
I get this:
=path('http://mywebsite.com/photo/col1_col1.jpg', 1)
Solution 1:[1]
As stated in the comments, you can either use concat multiple times like :
d.show
+---+---+
| a| b|
+---+---+
|AAA|BBB|
+---+---+
d.withColumn("URL" ,
concat(
concat(
concat(
concat(lit("""=path("http://mywebsite.com/photo/""" ), $"a") ,
lit("_") ) , $"b"
)
, lit(""".jpg", 1) """)
).as[String].first
// String = "=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1) "
Or you can map over the dataframe to append a new column ( which is cleaner than the concat method )
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val urlRdd = d.map{ x =>
Row.fromSeq(x.toSeq ++ Seq(s"""=path("http://mywebsite.com/photo/${x.getAs[String]("a")}_${x.getAs[String]("b")}.jpg", 1)"""))
}
val newDF = sqlContext.createDataFrame(urlRdd, d.schema.add("url", StringType) )
newDF.map(_.getAs[String]("url")).first
// String = =path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)
Solution 2:[2]
This is an old question but I put my answer here for anybody else. You can use the format_string function
scala> df1.show()
+----+----+
|col1|col2|
+----+----+
| AAA| BBB|
+----+----+
scala> df1.withColumn(
"URL",
format_string(
"""=path("http://mywebsite.com/photo/%s_%s.jpg", 1)""",
col("col1"),
col("col2")
)
).show(truncate = false)
+----+----+--------------------------------------------------+
|col1|col2|URL |
+----+----+--------------------------------------------------+
|AAA |BBB |=path("http://mywebsite.com/photo/AAA_BBB.jpg", 1)|
+----+----+--------------------------------------------------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | philantrovert |
| Solution 2 | Iraj Hedayati |
