How to do length validation using a Spark or Pandas DataFrame: one file has the data and another has the maximum lengths
I am new to PySpark and seeking help with length validation.
Data File :-
File 1
Name|Salary|Age|Dept
XYZ|10000|32|HR
TUV|15000|28|IT
File 2 (maximum length for each column)
Name|Salary|Age|Dept
5|8|3|2
I need to check whether the length of Name in the data file (file 1) is greater than 5 (the limit in file 2); if so, reject the entire record. If the length of Salary is greater than 8, reject the entire record. Likewise, each column needs to be checked, using a Spark or Pandas DataFrame, or plain Python as a last option.
Note: I don't want to hardcode the max lengths in the code. The file can have over 200 columns and record counts can be in the millions.
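One possible approach is to read both files with the length limits driven entirely by file 2, so nothing is hardcoded. Below is a minimal pandas sketch. The inline CSV strings stand in for the two files (in practice you would pass file paths to `read_csv`), and the extra `LONGNAME` row is invented here purely to demonstrate a rejection; the same mask-per-column idea translates to Spark by combining `length(col) <= limit` conditions with a reduce over `&`.

```python
import io
import pandas as pd

# Inline stand-ins for File 1 (data) and File 2 (max lengths).
# The LONGNAME row is an invented example that violates the Name limit of 5.
data_csv = (
    "Name|Salary|Age|Dept\n"
    "XYZ|10000|32|HR\n"
    "TUV|15000|28|IT\n"
    "LONGNAME|9000|25|HR\n"
)
limits_csv = "Name|Salary|Age|Dept\n5|8|3|2\n"

# Read everything as strings so the length check sees the raw field text,
# not a numeric representation.
df = pd.read_csv(io.StringIO(data_csv), sep="|", dtype=str)
limits = pd.read_csv(io.StringIO(limits_csv), sep="|", dtype=str).iloc[0].astype(int)

# A record passes only if every column's value is within its max length.
# This loops over column names (200+ is fine), not over rows.
mask = pd.DataFrame(
    {col: df[col].str.len() <= limits[col] for col in df.columns}
).all(axis=1)

good = df[mask]   # accepted records
bad = df[~mask]   # rejected records
print(good)
print(bad)
```

Because the limits are read from file 2 at runtime, adding or changing a column only requires updating the files, not the code. For million-row datasets the vectorized `str.len()` comparison avoids any Python-level row loop.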
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
