Is there any PySpark UDF or built-in function available to add a new column to a DataFrame and perform row-level operations based on a row's value?

I have a dataframe like this:

    | col1 | col2 |
    --------------
    | a    | 1    |
    | a    | 2    |
    | b    | 3    |
    | c    | 4    |
    | a    | 5    |

Now I need to create a new column 'col3' and populate it based on the value in col1: if col1 is 'a', col3 should contain "apple"; if col1 is 'b', col3 should contain "banana"; if col1 is 'c', col3 should contain "custard".

Note: col2 is an ordinary column and plays no part in the mapping.

The resulting DataFrame would look like this:

    | col1 | col2 | col3    |
    ------------------------
    | a    | 1    |apple    |
    | a    | 2    |apple    |
    | b    | 3    |banana   |
    | c    | 4    |custard  |
    | a    | 5    |apple    |

Is there any PySpark UDF or built-in function I can use for this?

Thanks in Advance!!!



Solution 1:[1]

I got an answer using the function below; this could be helpful for someone.

The lookup function has to be registered as a UDF before it can be applied to a column:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Mapping from col1 values to fruit names
    fruits = {
        'a': 'apple',
        'b': 'banana',
        'c': 'custard'
    }

    def get_fruit(col1_value):
        return fruits.get(col1_value, "Not Found!")

    # Register the Python function as a UDF returning a string
    get_fruit_udf = udf(get_fruit, StringType())

    df = df.withColumn('col3', get_fruit_udf(df['col1']))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Krishna