I have an array named "extractColumns" and a dataframe named "raw_data". I want to create a new dataframe based on the array and the dataframe. …
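If extractColumns is a list of column names, unpacking it into select() is the usual way to build the new dataframe. A minimal sketch; the sample schema and values here are invented:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for the real raw_data.
    raw_data = spark.createDataFrame([(1, "a", 3.5)], ["id", "name", "score"])
    extractColumns = ["id", "name"]  # assumed to hold column names

    # Keep only the columns listed in the array.
    new_df = raw_data.select(*extractColumns)
    new_df.show()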
I have a Spark standalone cluster configured with 3 nodes. I want to read CSV data stored in S3-compatible storage (Dell ECS) in PySpark. Here's the method and con…
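For an S3-compatible store like Dell ECS, the usual approach is to point the s3a connector at the custom endpoint. A hedged sketch; the master URL, endpoint, credentials, and bucket are placeholders, and it assumes hadoop-aws and the matching AWS SDK jars are on the classpath:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("read-ecs-csv")
        .master("spark://master-host:7077")  # placeholder standalone master
        # Placeholder endpoint and credentials for the S3-compatible store.
        .config("spark.hadoop.fs.s3a.endpoint", "https://ecs.example.com:9021")
        .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
        .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
        # Path-style access is usually required for non-AWS S3 implementations.
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .getOrCreate()
    )

    df = spark.read.option("header", True).csv("s3a://my-bucket/data.csv")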
I am new to using PySpark with Elephas and TensorFlow. I am trying to train a deep learning model inside PySpark using the Elephas module. My code: https://www.kaggl…
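The usual Elephas pattern is to wrap a compiled Keras model in a SparkModel and fit it on an RDD of features and labels. A hedged sketch with an invented toy model and random data, not the Kaggle code from the question:

    import numpy as np
    from pyspark import SparkContext
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from elephas.spark_model import SparkModel
    from elephas.utils.rdd_utils import to_simple_rdd

    sc = SparkContext.getOrCreate()

    # Invented toy data: 100 samples, 8 features, binary labels.
    x_train = np.random.rand(100, 8).astype("float32")
    y_train = np.random.randint(0, 2, 100)

    model = Sequential([Dense(16, activation="relu", input_shape=(8,)),
                        Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # Distribute the data and train across the cluster.
    rdd = to_simple_rdd(sc, x_train, y_train)
    spark_model = SparkModel(model, frequency="epoch", mode="asynchronous")
    spark_model.fit(rdd, epochs=5, batch_size=32, verbose=0, validation_split=0.1)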
Let's say I have a dataframe. Now I want a list of the elements present in the column NAME, like this: ['s', 'a', 'c', 'h', 'i', 'n']. How can we do this in PySpark?
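Collecting the single column to the driver does this; a minimal sketch with the sample data invented to match the expected output:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("s",), ("a",), ("c",), ("h",), ("i",), ("n",)], ["NAME"])

    # Option 1: list comprehension over the collected rows.
    names = [row["NAME"] for row in df.select("NAME").collect()]

    # Option 2: flatten through the underlying RDD.
    names = df.select("NAME").rdd.flatMap(lambda r: r).collect()

    print(names)  # ['s', 'a', 'c', 'h', 'i', 'n']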
Given a date, I create a column with the ISO 8601 week date format: from pyspark.sql import functions as F df = spark.createDataFrame([('2019-03-…
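One hedged sketch using date_format with week-based pattern letters; on Spark 3+ these letters ('Y', 'w', 'u') require the legacy time parser policy, and the exact week numbering can depend on the JVM locale, so treat this as an approximation rather than the definitive approach (the sample date is invented):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    # Week-based pattern letters were restricted in Spark 3's new datetime parser.
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    df = spark.createDataFrame([("2019-03-18",)], ["date"])
    df = df.withColumn(
        "iso_week_date",
        F.date_format(F.to_date("date"), "YYYY-'W'ww-u"),  # e.g. 2019-W12-1
    )
    df.show()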
I am working in PySpark with the flatMap function and using split inside it, but I am getting an error that says: AttributeError: 'NoneType…
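That error typically means some records are None, so .split() is called on NoneType. Guarding inside the lambda is a common fix; a sketch with invented data:

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(["a,b,c", None, "d,e"])

    # Skip None records so .split() is never called on NoneType.
    words = rdd.flatMap(lambda line: line.split(",") if line else [])
    print(words.collect())  # ['a', 'b', 'c', 'd', 'e']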
I have to read a CSV file into a data frame in the format below. I can do a normal file read, but I am not familiar with using a self variable (inside a class). Kindly help. from pys…
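A hedged sketch of wrapping the read in a class, where the session and path live on self; the class and attribute names here are invented:

    from pyspark.sql import SparkSession

    class CsvReader:
        def __init__(self, path):
            # Store the session and path on the instance for later use.
            self.spark = SparkSession.builder.getOrCreate()
            self.path = path

        def read(self):
            # Every method can reach the shared session and path through self.
            return self.spark.read.option("header", True).csv(self.path)

    df = CsvReader("data/file.csv").read()  # placeholder path
    df.show()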
I want to use an archived environment for spark-submit, but after unpacking on the k8s cluster it has a corrupted Python interpreter.
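A frequent cause is that a plain virtualenv archive stores the interpreter as a symlink, which does not survive packing and unpacking. The pattern in the Spark docs is to build the archive with venv-pack or conda-pack and point PYSPARK_PYTHON at the unpacked copy; a hedged sketch where the API server, archive, and script names are placeholders:

    # Pack with a relocatable tool (venv-pack / conda-pack) rather than plain tar,
    # so the interpreter is copied instead of left as a broken symlink.
    venv-pack -o pyspark_env.tar.gz

    spark-submit \
      --master k8s://https://<api-server>:443 \
      --deploy-mode cluster \
      --archives pyspark_env.tar.gz#environment \
      --conf spark.kubernetes.driverEnv.PYSPARK_PYTHON=./environment/bin/python \
      --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python \
      app.py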
I am trying to read from a Databricks table. I used the URL from a cluster in Databricks. I am getting this error: java.sql.SQLDataException: [Simba][…
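For reference, a hedged sketch of reading a Databricks table through the Simba JDBC driver; the hostname, HTTP path, token, and table name are all placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Placeholder Databricks JDBC URL; AuthMech=3 means user "token" plus a PAT.
    jdbc_url = (
        "jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;"
        "transportMode=http;ssl=1;httpPath=sql/protocolv1/o/0/0000-000000-abc123;"
        "AuthMech=3;UID=token;PWD=<personal-access-token>"
    )

    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", "my_schema.my_table")  # assumed table name
        .option("driver", "com.simba.spark.jdbc.Driver")
        .load()
    )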
I have a URL from which I download data (in JSON format) using Databricks: url="https://tortuga-prod-eu.s3-eu-west-1.amazonaws.com/%2FNinetyDays/am…
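A common pattern is to fetch the payload with requests on the driver and parallelize it into a DataFrame; a sketch that assumes the response body is valid JSON (the URL here is a placeholder):

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    url = "https://example.com/data.json"  # placeholder for the real URL
    payload = requests.get(url).text

    # Distribute the single JSON string and let Spark parse it.
    df = spark.read.json(sc.parallelize([payload]))
    df.printSchema()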
I'm getting the following error when I attempt to write to my data lake with Delta on Databricks: fulldf = spark.read.format("csv").option("header", True).option…
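For context, a minimal CSV-to-Delta round trip; the paths and the continuation of the truncated options chain are assumptions, and on Databricks the spark session is predefined:

    fulldf = (
        spark.read.format("csv")
        .option("header", True)
        .option("inferSchema", True)  # assumed continuation of the options chain
        .load("/mnt/lake/raw/full.csv")  # placeholder path
    )

    # Write out as a Delta table; overwrite mode chosen only for illustration.
    fulldf.write.format("delta").mode("overwrite").save("/mnt/lake/delta/full")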
If I read a file in PySpark with data = spark.read.csv("file.csv"), then for the life of the Spark session, 'data' is available in memory, correct? So if I…
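Worth noting: spark.read is lazy, so nothing sits in memory until the DataFrame is explicitly cached and an action runs. A minimal sketch of pinning it; the file name is a placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    data = spark.read.csv("file.csv", header=True)  # lazy: nothing loaded yet
    data.cache()   # mark the DataFrame for in-memory persistence
    data.count()   # an action materializes the cache
    # Later queries on 'data' read from memory for the rest of the session,
    # until it is unpersisted or evicted.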
Below is the input dataframe:

    col1  col2      col3
    1     12;34;56  Aus;SL;NZ
    2     31;54;81  Ind;US;UK
    3     null      Ban
    4     Ned       null

Expected output dataframe: [values of c…
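Splitting both columns and exploding them pairwise is the usual approach; a hedged sketch using split plus arrays_zip (Spark 2.4+), with the null rows left out for brevity:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "12;34;56", "Aus;SL;NZ"), (2, "31;54;81", "Ind;US;UK")],
        ["col1", "col2", "col3"],
    )

    # Split both columns, zip them so positions stay paired, then explode.
    result = (
        df.withColumn("a2", F.split("col2", ";"))
          .withColumn("a3", F.split("col3", ";"))
          .withColumn("z", F.explode(F.arrays_zip("a2", "a3")))
          .select("col1", F.col("z.a2").alias("col2"), F.col("z.a3").alias("col3"))
    )
    result.show()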
I'd like to take a Spark LDA model's term indices from the .describeTopics() output and match them to the appropriate terms in the count vectorizer's vocabulary.
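A hedged sketch: fit a CountVectorizer and an LDA model, keep the vocabulary, and map each topic's termIndices through it with a UDF; the tiny corpus is invented:

    from pyspark.sql import SparkSession, functions as F, types as T
    from pyspark.ml.feature import CountVectorizer
    from pyspark.ml.clustering import LDA

    spark = SparkSession.builder.getOrCreate()
    docs = spark.createDataFrame(
        [(0, ["spark", "rdd", "dataframe"]), (1, ["topic", "model", "lda"])],
        ["id", "tokens"],
    )

    cv_model = CountVectorizer(inputCol="tokens", outputCol="features").fit(docs)
    lda_model = LDA(k=2, maxIter=10).fit(cv_model.transform(docs))

    vocab = cv_model.vocabulary  # index -> term lookup

    # Translate each topic's term indices into the actual terms.
    to_terms = F.udf(lambda idxs: [vocab[i] for i in idxs],
                     T.ArrayType(T.StringType()))
    lda_model.describeTopics().withColumn(
        "terms", to_terms("termIndices")).show(truncate=False)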
I have the following PySpark dataframe and I want to find percentiles row-wise.

    value  col_a   col_b  col_c
    row_a  5.0     0.0    11.0
    row_b  3394.0  0…
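One hedged sketch, reading "percentile" as each value's percentile rank within its own row (the share of row values at or below it); it packs the columns into an array and uses Spark 3 higher-order functions, with the data invented:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("row_a", 5.0, 0.0, 11.0)],
        ["value", "col_a", "col_b", "col_c"],
    )

    cols = ["col_a", "col_b", "col_c"]
    df = df.withColumn("arr", F.array(*[F.col(c) for c in cols]))

    # For each column: percentile rank = share of row values at or below it.
    for c in cols:
        df = df.withColumn(
            f"{c}_pct",
            F.expr(f"size(filter(arr, x -> x <= {c})) / size(arr)"),
        )
    df.drop("arr").show()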
I'm currently working on PySpark 1.6.3 and I'm hitting this error. Do you know what the reason could be? Code: …
I am on Windows, trying to follow this doc to create Spark applications with VS Code using a Synapse workspace. I can sign in to Azure and set a default S…
Running on AWS EMR with a Jupyter PySpark notebook, I'm trying to install the Python package "sparse_dot_topn" version 0.2.9 and I'm getting an error I don't understand.
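On EMR Notebooks, the documented route for notebook-scoped libraries is sc.install_pypi_package; a hedged sketch assuming that feature is available on the cluster (EMR 5.26+):

    # Run inside the EMR PySpark notebook; sc is the notebook's SparkContext.
    sc.install_pypi_package("sparse_dot_topn==0.2.9")

    # Verify the package landed in the notebook environment.
    sc.list_packages()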
I am trying to open a file that I uploaded to the DBFS location. However, I get an error when trying to open the file, even though I can see it when I do an ls. Also…
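On Databricks, plain Python file APIs go through the /dbfs FUSE mount, while dbutils and Spark use the dbfs:/ scheme, which is a frequent source of this mismatch. A sketch contrasting the two; the path is a placeholder:

    # dbutils (and Spark readers) use the dbfs:/ scheme...
    display(dbutils.fs.ls("dbfs:/FileStore/tables/"))

    # ...but plain Python open() needs the local /dbfs mount prefix.
    with open("/dbfs/FileStore/tables/my_file.csv") as f:  # placeholder path
        print(f.readline())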
I have some data in an Aurora MySQL DB and I would like to do two things. HISTORICAL DATA: read the data from Aurora (say TABLE A), do some processing, and updat…
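For the historical part, reading TABLE A from Aurora over JDBC is the standard starting point; a hedged sketch where the host, database, credentials, and target table are placeholders, and the MySQL connector jar is assumed to be on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    jdbc_opts = {
        "url": "jdbc:mysql://my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:3306/mydb",
        "user": "admin",
        "password": "<password>",
        "driver": "com.mysql.cj.jdbc.Driver",
    }

    # Read the historical table.
    historical = spark.read.format("jdbc").options(dbtable="TABLE_A", **jdbc_opts).load()

    processed = historical  # ... the actual processing steps go here ...

    # Write the result back; JDBC writes append/overwrite a table rather than
    # updating rows in place, so a separate target table is assumed.
    (processed.write.format("jdbc")
        .options(dbtable="TABLE_A_processed", **jdbc_opts)
        .mode("append")
        .save())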