How to categorize a list of names by their first letter in PySpark in a Colab notebook?

I want to categorize a list of names by their first character using the groupBy and map commands. The list contains a number of names. How should I do it? Can anyone help me? I am writing the code in a Colab notebook with PySpark. I am a beginner and don't know how to do this.

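from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Start (or reuse) a Spark session; in Colab, pyspark must be installed first.
spark = SparkSession.builder.getOrCreate()

arrayStructureData = [
    ("Sajad", "M"),
    ("Hassan", "F"),
    ("Ali", "F"),
    ("Hossein", "M")]
arrayStructureSchema = StructType([
    StructField('Name', StringType(), True),
    StructField('gender', StringType(), True)
])
df = spark.createDataFrame(data=arrayStructureData, schema=arrayStructureSchema)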

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
