Pandas DataFrame update based on multiple conditions using loop, itertuples, iterrows, etc.
I have a DataFrame as below. It defines the data quality (DQ) checks, such as not null and not negative, to be performed on each column of a second DataFrame.
| FieldName | NotNull | Not negative | DQ in list values |
|---|---|---|---|
| currency | Y | Y | |
| amount | Y | Y | |
| adj_type | Y | | A,B,C,D |
The second DataFrame holds the actual data on which the DQ checks from the first DataFrame are to be performed.
| adjid | adj_type | currency | amount |
|---|---|---|---|
| 111 | null | USD | 250 |
| 222 | A | null | 8383.121 |
| 333 | B | USD | -202.333 |
| 444 | G | USD | 202.333 |
I want the output DataFrame shown below. If I use iterrows or itertuples on both DataFrames, it takes too much time to produce the output. The first DataFrame currently has 56 records and the second has 45,000.
| DQ column name | DQ column value | DQ validation details | adjid |
|---|---|---|---|
| currency | null | currency can not be null | 222 |
| amount | -202.333 | amount can not be negative | 333 |
| adj_type | null | adj_type can not be null | 111 |
| adj_type | G | adj_type 'G' does not contain in DQ list A,B,C,D | 444 |
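One possible way to avoid the nested row loops is to iterate only over the rule rows (56 of them) and evaluate each check against the whole data column at once with a boolean mask. The following is a minimal sketch, assuming the column names shown in the tables above and that a rule is switched on by the value `Y`; `run_dq_checks` and `_failures` are hypothetical helper names, not part of pandas.

```python
import pandas as pd

OUT_COLS = ["DQ column name", "DQ column value", "DQ validation details"]


def _failures(data, field, mask, message, id_col):
    """Build report rows for the data rows where `mask` is True."""
    hit = data.loc[mask, [field, id_col]].rename(columns={field: "DQ column value"})
    hit.insert(0, "DQ column name", field)
    hit.insert(2, "DQ validation details", message)
    return hit


def run_dq_checks(rules, data, id_col="adjid"):
    """Apply every rule to its column in a vectorized way (hypothetical helper)."""
    parts = []
    for rule in rules.to_dict("records"):  # one pass per rule, not per data row
        field = rule["FieldName"]
        col = data[field]
        is_null = col.isna()
        checks = []

        if rule["NotNull"] == "Y":
            checks.append((is_null, f"{field} can not be null"))

        if rule["Not negative"] == "Y":
            # Non-numeric values become NaN and therefore never count as negative.
            negative = pd.to_numeric(col, errors="coerce") < 0
            checks.append((negative, f"{field} can not be negative"))

        allowed = rule["DQ in list values"]
        if isinstance(allowed, str) and allowed.strip():
            values = [v.strip() for v in allowed.split(",")]
            # Nulls are reported by the NotNull check, so they are excluded here.
            bad = ~is_null & ~col.isin(values)
            message = (f"{field} '" + col[bad].astype(str)
                       + f"' does not contain in DQ list {allowed}")
            checks.append((bad, message))

        for mask, message in checks:
            if mask.any():
                parts.append(_failures(data, field, mask, message, id_col))

    if not parts:
        return pd.DataFrame(columns=OUT_COLS + [id_col])
    return pd.concat(parts, ignore_index=True)


# Sample frames rebuilt from the tables in the question.
rules = pd.DataFrame({
    "FieldName": ["currency", "amount", "adj_type"],
    "NotNull": ["Y", "Y", "Y"],
    "Not negative": ["Y", "Y", None],
    "DQ in list values": [None, None, "A,B,C,D"],
})
data = pd.DataFrame({
    "adjid": [111, 222, 333, 444],
    "adj_type": [None, "A", "B", "G"],
    "currency": ["USD", None, "USD", "USD"],
    "amount": [250, 8383.121, -202.333, 202.333],
})

result = run_dq_checks(rules, data)
print(result.to_string(index=False))
```

The Python-level loop now runs once per rule, while the per-row work happens inside pandas, so the cost should grow with the number of rules rather than with rules times rows. Whether this is fast enough for the real data is something to measure rather than assume.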
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
