'Add another column based on equal column values

PySpark: Possibly a duplicate, can't find a similar question.

I have a table A:

a | b | c
---------
1 | 3 | p
2 | 4 | q
3 | 4 | r
4 | 7 | s

And a table B:

p | q
---------
1 | Yes
2 | No
3 | Yes

I want the resultant table to be joined on column a value being equal to column p value. I tried the inner join, but it returns a copy of the entire table A for each q value. I want the resultant table to be:

a | b | c | q
--------------
1 | 3 | p | Yes
2 | 4 | q | No
3 | 4 | r | Yes

Please help out with how to achieve this in PySpark? And also, what do I do if I wanted this table:

a | b | c | q
--------------
1 | 3 | p | Yes
2 | 4 | q | No
3 | 4 | r | Yes
4 | 7 | s | null


Solution 1:[1]

You can easily do this with a join statement between the two DataFrames

More Information on Joins can be found - Spark Joins

Data Preparation

df1 = pd.read_csv(StringIO("""
a|b|c
1|3|p
2|4|q
3|4|r
4|7|s
"""),delimiter='|')

df2 = pd.read_csv(StringIO("""
p|q
1|Yes
2|No
3|Yes
"""),delimiter='|')


sparkDF1 = sql.createDataFrame(df1)
sparkDF2 = sql.createDataFrame(df2)

sparkDF1.show()

+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  3|  p|
|  2|  4|  q|
|  3|  4|  r|
|  4|  7|  s|
+---+---+---+

sparkDF2.show()

+---+---+
|  p|  q|
+---+---+
|  1|Yes|
|  2| No|
|  3|Yes|
+---+---+

Join - Inner

finalDF = sparkDF1.join(sparkDF2
                        ,sparkDF1['a'] == sparkDF2['p'] ### Joining Key
                        ,'inner' ### Join Type
                    ).select(sparkDF1['*'],sparkDF2['q'])


finalDF.orderBy('a').show()

+---+---+---+---+
|  a|  b|  c|  q|
+---+---+---+---+
|  1|  3|  p|Yes|
|  2|  4|  q| No|
|  3|  4|  r|Yes|
+---+---+---+---+

Join - Left

finalDF = sparkDF1.join(sparkDF2
                        ,sparkDF1['a'] == sparkDF2['p']
                        ,'left'
                    ).select(sparkDF1['*'],sparkDF2['q'])


finalDF.orderBy('a').show()

+---+---+---+----+
|  a|  b|  c|   q|
+---+---+---+----+
|  1|  3|  p| Yes|
|  2|  4|  q|  No|
|  3|  4|  r| Yes|
|  4|  7|  s|null|
+---+---+---+----+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vaebhav