'Drop multiple duplicated columns after left join w/ dataframes?
I'm working on a script using Pyspark and Spark SQL. However, after running a LEFT-JOIN on my base_df and inc_df dateframes, all of my columns were duplicated.
I've figured out why the columns were duplicated, but I'm now receiving type errors while trying to DROP those duplicated columns.
I've tried this solution as well.... a.join(b, 'id')
Here's part of my code below..
''' uc_df = (base_df
.join(inc_df,
(inc_df.src_sys_id == base_df.src_sys_id) & (inc_df.fleet_acct_nbr == base_df.fleet_acct_nbr),
"left_outer")
.filter(inc_df.src_sys_id.isNull())
.drop(base_df.region_cd,
base_df.fleet_acct_exctv_cd,
base_df.fleet_acct_nbr,
base_df.fleet_acct_exctv_nm,
base_df.fleet_acct_exctv_first_nm,
base_df.fleet_acct_exctv_last_nm,
base_df.fleet_acct_exctv_mid_nm,
base_df.fleet_acct_nm,
base_df.meas_cnt,
base_df.dw_anom_flg,
base_df.dw_mod_ts,
base_df.src_sys_iud_cd,
base_df.dw_job_id)
)
I received the following error:
Type Error: Each column in the param list should be a string
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
