This is my dataset:

    from pyspark.sql import SparkSession, functions as F
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('2021-02-07',)