PySpark - Creating a UDF to Concatenate Two Columns of Lists Into a List of Lists

Suppose I have a dataframe with columns A, B, C, D, E. Each of these columns may contain a list of values or be null. I would like to combine these values into a final column F consisting of a list of lists that skips the null values and preserves the original column order.

Example input: [a,b,c] | [b,c,d] | null | null | [z]
Example output: [[a,b,c], [b,c,d], [z]]

Unfortunately, concat_ws flattens everything into a single string, so I believe I must use a UDF. Does anyone have a solution to this problem?
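One possible approach, sketched below: the UDF's Python body just drops the nulls and keeps the remaining lists in column order, and the return type is declared as an array of string arrays. The session setup, sample data, and the names `collect_non_null` / `concat_lists` are illustrative assumptions, not from the question.

```python
# Core Python logic for the UDF: keep non-null values, preserve argument order.
def collect_non_null(*cols):
    return [c for c in cols if c is not None]

# Spark wiring; this part runs only where PySpark is installed.
try:
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.master("local[1]").appName("concat-lists").getOrCreate()

    # Sample row matching the example: two lists, two nulls, one list.
    df = spark.createDataFrame(
        [(["a", "b", "c"], ["b", "c", "d"], None, None, ["z"])],
        ["A", "B", "C", "D", "E"],
    )

    # Declare the result as array<array<string>> so Spark keeps the nesting.
    concat_lists = F.udf(collect_non_null, ArrayType(ArrayType(StringType())))
    out = df.withColumn("F", concat_lists("A", "B", "C", "D", "E"))
    out.select("F").show(truncate=False)
except ImportError:
    pass
```

As an aside, on Spark 3.1+ a UDF may not be needed at all: something like `F.filter(F.array("A", "B", "C", "D", "E"), lambda c: c.isNotNull())` builds the array of arrays and drops the null entries natively, which usually performs better than a Python UDF.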



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
