Category "pyspark"

Pyspark connection to the Microsoft SQL server?

I have a huge dataset in SQL server, I want to Connect the SQL server with python, then use pyspark to run the query. I've seen the JDBC driver but I don't fin

How to correctly import pyspark.sql.functions?

from pyspark.sql.functions import isnan, when, count, sum , etc... It is very tiresome adding all of it. Is there a way to import all of it at once?

Google cloud dataproc cluster created with an environment.yaml with a jupyter resource but environment not available as a jupyter kernel

I have created a new dataproc cluster with a specific environment.yaml. Here is the command that I have used to create that cluster: gcloud dataproc clusters cr

How do you create merge_asof functionality in PySpark?

Table A has many columns with a date column, Table B has a datetime and a value. The data in both tables are generated sporadically with no regular interval. Ta

How to apply the describe function after grouping a PySpark DataFrame?

I want to find the cleanest way to apply the describe function to a grouped DataFrame (this question can also grow to apply any DF function to a grouped DF) I

How to use a Scala class inside Pyspark

I've been searching for a while if there is any way to use a Scala class in Pyspark, and I haven't found any documentation nor guide about this subject. Let's