Validate two data columns from different source DataFrames in Databricks: if the data (record counts) match row by row, execute the command; otherwise, raise an error.
DataFrame 1:

| created year | rec_counts |
|---|---|
| 2016 | 50 |
| 2015 | 40 |

DataFrame 2:

| created year | rec_counts |
|---|---|
| 2016 | 1000 |
| 2015 | 47 |
Solution 1:[1]
There are 2 methods you can try.
Let's assume the two DataFrames are named `df1` and `df2`.

First, if you only want to check whether both have the same number of rows, call `df1.count()` and `df2.count()` and check whether both return the same value (the total number of rows in each DataFrame).
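A minimal sketch of the row-count check, written as a notebook cell or Scala script. It assumes a local `SparkSession` (a Databricks notebook already provides `spark`), the question's sample data under the assumed column names `created_year` and `rec_counts`, and a hypothetical helper named `sameRowCount`. Note that this compares only how many rows each DataFrame has, so for the sample data it reports a match even though the `rec_counts` values differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Outside Databricks, start a local session; a Databricks notebook already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("row-count-check").getOrCreate()
import spark.implicits._

// Sample data from the question; the column names created_year and rec_counts are assumptions.
val df1 = Seq((2016, 50L), (2015, 40L)).toDF("created_year", "rec_counts")
val df2 = Seq((2016, 1000L), (2015, 47L)).toDF("created_year", "rec_counts")

// Hypothetical helper. count() is an action: it runs a Spark job and returns the row count as a Long.
def sameRowCount(a: DataFrame, b: DataFrame): Boolean = a.count() == b.count()

// Prints true: each DataFrame has two rows, even though their rec_counts differ.
println(sameRowCount(df1, df2))
```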
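Secondly, you can write `df2.except(df1)`, which returns the rows of `df2` that are not present in `df1`. If it returns an empty DataFrame (not `NULL`), every row of `df2` also appears in `df1`; run `df1.except(df2)` as well to confirm that both DataFrames are the same.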
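A minimal sketch of the `except` check that also covers the question's "run the command or else error" requirement. It assumes a local `SparkSession` (a Databricks notebook already provides `spark`), Spark 2.4 or later for `Dataset.isEmpty`, and the question's sample data under the assumed column names `created_year` and `rec_counts`; `sameRows` and `runIfSameRows` are hypothetical helpers, and the command passed in is a placeholder. Because `except` is one-directional and set-based (duplicates are dropped), the sketch checks both directions; `exceptAll` can be swapped in if duplicate rows must also match.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Outside Databricks, start a local session; a Databricks notebook already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("except-check").getOrCreate()
import spark.implicits._

// Sample data from the question; the column names are assumptions.
val df1 = Seq((2016, 50L), (2015, 40L)).toDF("created_year", "rec_counts")
val df2 = Seq((2016, 1000L), (2015, 47L)).toDF("created_year", "rec_counts")

// except keeps the distinct rows of the left side that have no identical row on the
// right side (columns are matched by position). Empty results in both directions
// mean both DataFrames contain the same set of rows.
def sameRows(a: DataFrame, b: DataFrame): Boolean =
  a.except(b).isEmpty && b.except(a).isEmpty

// Hypothetical helper: runs `command` only when the rows match, otherwise fails.
def runIfSameRows(a: DataFrame, b: DataFrame)(command: => Unit): Unit =
  if (sameRows(a, b)) command
  else throw new IllegalStateException("DataFrames differ; command not executed")

// Rows of df2 with no identical row in df1: here (2016, 1000) and (2015, 47).
df2.except(df1).show()
```

With the sample data, `runIfSameRows(df1, df2) { println("Counts match") }` throws, because neither year's `rec_counts` agree between the two DataFrames.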
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | UtkarshPal-MT |
