Does Spark have a cache with a TTL option? I need to do a lookup on reference data to perform some transformation in my Spark streaming application. Also lookup dataset
I want to convert each row of my dataframe into a Python class object called Fruit. I have a dataframe df with the following columns: Identifier, Name, Quant
I have a date column in a dataframe that looks like this: "JAN20, FEB20, MAR20 .... JAN21, FEB21, MAR21..." This created a problem when I tried to plot numb
I have trained a logistic regression algorithm to match job titles and descriptions to a set of 4 digit numeric codes. This it does very well. It will form part
I'm reading a .json file that contains the structure below, and I need to generate a CSV with this data in column form. I know that I can't directly write an ar
I need to extract objects from an array. Where there's more than one object in that array, I need to repeat for every id, and if the field is null then I want to
I am attempting to use a PySpark kernel from inside an EMR Notebook that is hosted on an AWS managed service (EMR) and I am unable to access Artifactory to inst
I am trying to write data to an S3 bucket from my local computer: spark = SparkSession.builder \ .appName('application') \ .config("spark.hadoop.fs.s3a.
I have an array named "extractColumns" and I have a dataframe named "raw_data". I wanted to create a new dataframe according to the array and the dataframe. Eve
I have a Spark standalone cluster configured with 3 nodes. I want to read CSV data stored in S3-compatible storage (Dell ECS) from PySpark. Here's the method and con
I am new to using PySpark with Elephas and TensorFlow. I am trying to train a deep learning model inside PySpark using the Elephas module. My code: https://www.kaggl
Let's say I have a dataframe: now I want a list of the elements present in the column NAME, like this: ['s', 'a', 'c', 'h', 'i', 'n']. How can we do this in PySpark
Having a date, I create a column with ISO 8601 week date format: from pyspark.sql import functions as F df = spark.createDataFrame([('2019-03-
I am working in PySpark using the flatMap function, and I am using split within the function. But I am getting an error which says: AttributeError: 'NoneTy
I have to read a CSV file into a dataframe in the format below. I can read a file normally, but I am not familiar with doing it using a self variable. Kindly help. from pys
I want to use an archived environment for spark-submit, but after unpacking on a k8s cluster it has a corrupted Python interpreter
I am trying to read from a Databricks table. I have used the URL from a cluster in Databricks. I am getting this error: java.sql.SQLDataException: [Simba][
I have a URL from which I download the data (which is in JSON format) using Databricks: url="https://tortuga-prod-eu.s3-eu-west-1.amazonaws.com/%2FNinetyDays/am
I'm getting the following error when I attempt to write to my data lake with Delta on Databricks: fulldf = spark.read.format("csv").option("header", True).option