How to categorize a list of names based on their first letter in PySpark in a Colab notebook?
I want to categorize a list of names by their first character using the groupBy and map commands. How should I do it? Can anyone help me? I am writing the code in a Colab notebook with PySpark. I am a beginner and don't know how to do this.
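from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Reuse the active session, or create one if the notebook has none yet
spark = SparkSession.builder.getOrCreate()

# Sample rows: (name, gender)
arrayStructureData = [
    ("Sajad", "M"),
    ("Hassan", "F"),
    ("Ali", "F"),
    ("Hossein", "M")]
arrayStructureSchema = StructType([
    StructField('Name', StringType(), True),
    StructField('gender', StringType(), True)
])
df = spark.createDataFrame(data=arrayStructureData, schema=arrayStructureSchema)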
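One possible way to do this with groupBy and map is sketched below. It is not a confirmed answer from the original thread. It assumes the `df` defined above with its `Name` column, and the helper names `first_letter` and `group_names_by_first_letter` are made up for illustration. Treating the first letter case-insensitively and skipping null or empty names are also assumptions.

```python
def first_letter(name):
    """Return the upper-cased first character of a name (hypothetical helper)."""
    return name[0].upper()


def group_names_by_first_letter(df):
    """Group the 'Name' column of a Spark DataFrame by first letter.

    Assumes df has a string column called 'Name', as in the question.
    """
    pairs = (
        df.rdd
        .map(lambda row: row["Name"])   # keep only the name from each Row
        .filter(lambda name: name)      # skip null or empty names (assumption)
        .groupBy(first_letter)          # RDD of (letter, iterable of names)
        .mapValues(sorted)              # turn each group into a sorted list
        .collect()                      # bring the small result to the driver
    )
    # Sort by letter so the result does not depend on partition order
    return dict(sorted(pairs))
```

In the notebook, calling `group_names_by_first_letter(df)` on the sample data should give `{'A': ['Ali'], 'H': ['Hassan', 'Hossein'], 'S': ['Sajad']}`. Collecting to the driver is only reasonable for small results like this one.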
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
