PySpark: create a column based on multiple columns by replacing values from a dictionary

For example, I have a df that looks like this.

I found a way to create a new column based on the values in one other column. Now I need to create a column based on all of the columns (there are actually 11 of them): if the value from the first column matches a key in the dictionary, put the corresponding dictionary value in a new column "Description". If it does not match, check the next column, and so on. Here is my code:

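from pyspark.sql.functions import col, udf

def replacement(val, my_dict):
    # Replace every dictionary key found in val with its mapped value
    for k, v in my_dict.items():
        val = val.replace(k, v)
    return val

my_dict = {'James': 'some_detail', 'Rose': 'other_details', 'M': 'different_details'}

# UDF that applies the replacement to a single column
replacing = udf(lambda x: replacement(x, my_dict))

new_df = df.withColumn("Description", replacing(col("firstname")))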

How should I change the code? replacing takes only one column; how can it accept many columns? Can someone suggest a way?

EDIT: I need only one column to be created, "Description". If "James" is found in my_dict, return "some_detail" and put it in the "Description" column. Do not check the rest of the row (lastname "Smith", gender "M", salary 8000); move on to the next row, "Anna".

Example of the end df: note that there is no 0 in the dict, so the value is taken from the second column. There is no Robert or Williams in the dict, but there is an M key-value pair, so we take the value from that pair.
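The rule described in the edit (take the first value in the row that is a key in the dictionary and stop there) can be sketched in plain Python, independent of Spark. This is only an illustration: the function name first_match is hypothetical, and converting every value to a string before the lookup is an assumption made so that non-string columns such as salary can be compared against string keys.

```python
from typing import Iterable, Mapping, Optional


def first_match(values: Iterable[object], lookup: Mapping[str, str]) -> Optional[str]:
    """Return the description for the first value that is a key in lookup."""
    for value in values:
        key = str(value)  # assumption: compare every column as text
        if key in lookup:
            return lookup[key]  # stop at the first hit, ignore the rest of the row
    return None  # no column matched any key


my_dict = {'James': 'some_detail', 'Rose': 'other_details', 'M': 'different_details'}

print(first_match(("James", "Smith", "M", 8000), my_dict))
print(first_match(("Robert", "Williams", "M", 6000), my_dict))
```

A function of this shape could be wrapped in a UDF that receives several columns at once, for example udf(lambda *vals: first_match(vals, my_dict)) called with one col(...) per column, although a Python UDF is usually slower than built-in column expressions.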

pyspark

Solution 1:^[1]

You can try something like this.

from functools import reduce

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

data2 = [("James","Smith","M",8000),
    ("Michael","Rose","M",7000),
    ("Robert","Williams","M",6000),
  ]

schema = StructType([ \
    StructField("firstname",StringType(),True), \
    StructField("lastname",StringType(),True), \
    StructField("gender", StringType(), True), \
    StructField("salary", IntegerType(), True) \
  ])
 
df = spark.createDataFrame(data=data2,schema=schema)
df.printSchema()
df.show(truncate=False)
    
my_dict = {'James':'some_detail','Michael':'trust','Rose':'other_details', 'M':'different_details'}

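# Fold over the dictionary keys: each key adds one when() per column,
# and the first condition that matches determines the Description
df2 = df.withColumn("Description", reduce(
    lambda c, k: c.when(F.col('firstname').rlike(rf"\b{k}\b"), my_dict[k])
                  .when(F.col('lastname').rlike(rf"\b{k}\b"), my_dict[k])
                  .when(F.col('gender').rlike(rf"\b{k}\b"), my_dict[k])
                  .when(F.col('salary').rlike(rf"\b{k}\b"), my_dict[k]),
    my_dict.keys(), F))
display(df2)  # Databricks notebook helper; use df2.show(truncate=False) elsewhere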

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
