Validate two data columns from different source DataFrames in Databricks: if the data (record counts) match row-wise, execute the command, or else raise an error

DataFrame 1:

created year, rec_counts
2016               50
2015               40

DataFrame 2:

created year, rec_counts
2016               1000
2015               47


Solution 1:[1]

There are 2 methods you can try.

  1. Assume the two DataFrames are named df1 and df2.

    If you only want to check whether both have the same number of rows, call df1.count() and df2.count() and compare the results. Each call returns the total number of rows in that DataFrame, so this check does not compare the rec_counts values themselves.

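  2. Alternatively, use df2.except(df1) (Scala; the PySpark equivalent is df2.subtract(df1)). This returns the rows of df2 that are not present in df1. If the result is empty (it is an empty DataFrame, not NULL), every row of df2 also exists in df1; run df1.except(df2) as well to confirm that the two DataFrames hold the same rows.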
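
To compare rec_counts year by year, as the question asks, one option is a small row-wise check in Python. This is only a sketch: it assumes both summaries are small enough to collect to the driver as (created_year, rec_counts) pairs. The function names compare_counts and run_if_counts_match and the column names created_year and rec_counts are hypothetical and do not come from the original answer.

```python
def compare_counts(rows1, rows2):
    """Return {year: (count_in_1, count_in_2)} for every year whose counts differ.

    rows1 and rows2 are iterables of (created_year, rec_counts) pairs.
    A year present on only one side is reported with None for the other side.
    """
    counts1, counts2 = dict(rows1), dict(rows2)
    mismatches = {}
    for year in set(counts1) | set(counts2):
        c1, c2 = counts1.get(year), counts2.get(year)
        if c1 != c2:
            mismatches[year] = (c1, c2)
    return mismatches


def run_if_counts_match(rows1, rows2, command):
    """Call command() only when every year has the same count on both sides."""
    mismatches = compare_counts(rows1, rows2)
    if mismatches:
        raise ValueError(f"rec_counts differ for: {mismatches}")
    return command()


# Sample data from the question. On Databricks the pairs could be built with
# something like [(r["created_year"], r["rec_counts"]) for r in df1.collect()]
# (column names are assumed here).
df1_rows = [(2016, 50), (2015, 40)]
df2_rows = [(2016, 1000), (2015, 47)]

try:
    run_if_counts_match(df1_rows, df2_rows, lambda: print("counts match, running command"))
except ValueError as err:
    print(f"Validation failed: {err}")
```

With the sample data from the question, both years differ (50 vs 1000 and 40 vs 47), so the ValueError is raised and the command does not run.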

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 UtkarshPal-MT