'Add another column based on equal column values
PySpark: Possibly a duplicate, can't find a similar question.
I have a table A:
a | b | c
---------
1 | 3 | p
2 | 4 | q
3 | 4 | r
4 | 7 | s
And a table B:
p | q
---------
1 | Yes
2 | No
3 | Yes
I want the resultant table to be joined on column a value being equal to column p value. I tried the inner join, but it returns a copy of the entire table A for each q value. I want the resultant table to be:
a | b | c | q
--------------
1 | 3 | p | Yes
2 | 4 | q | No
3 | 4 | r | Yes
Please help out with how to achieve this in PySpark? And also, what do I do if I wanted this table:
a | b | c | q
--------------
1 | 3 | p | Yes
2 | 4 | q | No
3 | 4 | r | Yes
4 | 7 | s | null
Solution 1:[1]
You can easily do this with a join statement between the two DataFrames
More Information on Joins can be found - Spark Joins
Data Preparation
df1 = pd.read_csv(StringIO("""
a|b|c
1|3|p
2|4|q
3|4|r
4|7|s
"""),delimiter='|')
df2 = pd.read_csv(StringIO("""
p|q
1|Yes
2|No
3|Yes
"""),delimiter='|')
sparkDF1 = sql.createDataFrame(df1)
sparkDF2 = sql.createDataFrame(df2)
sparkDF1.show()
+---+---+---+
| a| b| c|
+---+---+---+
| 1| 3| p|
| 2| 4| q|
| 3| 4| r|
| 4| 7| s|
+---+---+---+
sparkDF2.show()
+---+---+
| p| q|
+---+---+
| 1|Yes|
| 2| No|
| 3|Yes|
+---+---+
Join - Inner
finalDF = sparkDF1.join(sparkDF2
,sparkDF1['a'] == sparkDF2['p'] ### Joining Key
,'inner' ### Join Type
).select(sparkDF1['*'],sparkDF2['q'])
finalDF.orderBy('a').show()
+---+---+---+---+
| a| b| c| q|
+---+---+---+---+
| 1| 3| p|Yes|
| 2| 4| q| No|
| 3| 4| r|Yes|
+---+---+---+---+
Join - Left
finalDF = sparkDF1.join(sparkDF2
,sparkDF1['a'] == sparkDF2['p']
,'left'
).select(sparkDF1['*'],sparkDF2['q'])
finalDF.orderBy('a').show()
+---+---+---+----+
| a| b| c| q|
+---+---+---+----+
| 1| 3| p| Yes|
| 2| 4| q| No|
| 3| 4| r| Yes|
| 4| 7| s|null|
+---+---+---+----+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vaebhav |
