Not able to remove white space from SQL query output used in PySpark code. I tried trim, ltrim, rtrim, replace (multiple nested calls as well) and regexp_replace. Any other suggestions?
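One cause that trim/ltrim/rtrim and replace all miss is Unicode whitespace such as non-breaking spaces, which Spark's trim() does not strip. A minimal sketch of a regexp_replace over a whitespace class, with a hypothetical column col1:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(" a\u00a0value\u00a0 ",)], ["col1"])

    # Strip leading/trailing ASCII whitespace plus non-breaking spaces,
    # which trim()/ltrim()/rtrim() leave untouched
    cleaned = df.withColumn(
        "col1", F.regexp_replace("col1", r"^[\s\u00A0]+|[\s\u00A0]+$", "")
    )
    cleaned.show(truncate=False)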
I have code that uses row_number() partitioned by date. I would like to create an array that contains the data grouped by that row_number, partitioned by date.
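A minimal sketch of the usual pattern, with hypothetical columns event_time and value: number the rows per date with a window, then collect them into one array per date:

    from pyspark.sql import SparkSession, functions as F, Window

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("2022-01-01", "09:00", "a"), ("2022-01-01", "10:00", "b")],
        ["date", "event_time", "value"],
    )

    w = Window.partitionBy("date").orderBy("event_time")
    with_rn = df.withColumn("rn", F.row_number().over(w))

    # sort_array orders the structs by their first field (rn), since
    # collect_list alone gives no ordering guarantee
    grouped = with_rn.groupBy("date").agg(
        F.sort_array(F.collect_list(F.struct("rn", "value"))).alias("rows")
    )
    grouped.show(truncate=False)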
I'm trying to filter the data frame by values of salary and then save the results as CSV files using PySpark. spark = SparkSession.builder.appName('SparkByExamples.com')
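A minimal sketch of one way to do the filtering and saving, assuming a salary column and hypothetical output paths: loop over the distinct salary values and write each subset as CSV:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
    df = spark.createDataFrame([("James", 3000), ("Maria", 4000)], ["name", "salary"])

    # One CSV directory per distinct salary value (collect is fine for a small set)
    for row in df.select("salary").distinct().collect():
        subset = df.filter(F.col("salary") == row["salary"])
        subset.write.mode("overwrite").option("header", True) \
              .csv(f"/tmp/salary={row['salary']}")

If a single pass is enough, df.write.partitionBy("salary").csv(...) produces the same per-value split without the loop.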
I am trying to validate the date received in a file against a configured date format (using to_timestamp / to_date). schema = StructType([StructField("date", String
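A minimal sketch of the null-check approach, with the format string standing in for the configured one: under Spark 3's default (non-ANSI) mode, to_date returns null for values that do not match the pattern, which flags the invalid rows:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("2022-03-21",), ("21/03/2022",)], ["date"])

    fmt = "yyyy-MM-dd"  # hypothetical configured format
    validated = (
        df.withColumn("parsed", F.to_date(F.col("date"), fmt))
          .withColumn("is_valid", F.col("parsed").isNotNull())
    )
    validated.show()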
Wanted to create a Spark dataframe from a JSON string without using a schema in Python. The JSON is multilevel nested and may contain arrays. I had used the below for
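A minimal sketch of schema inference from a raw string (the JSON itself is hypothetical): parallelize the string into an RDD and let spark.read.json infer the nested structure, arrays included:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    json_str = '{"id": 1, "info": {"tags": ["a", "b"], "meta": {"score": 0.5}}}'

    # read.json infers the full nested schema, no StructType needed
    df = spark.read.json(spark.sparkContext.parallelize([json_str]))
    df.printSchema()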
Python doesn't like the ampersand below. I get the error: "& is not a supported operation for types str and str. Please review your code." Any idea how to get around this?
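That error usually means the operands of & are plain Python strings rather than Column objects. A minimal sketch of the fix, with hypothetical columns a and b; note each comparison needs its own parentheses because & binds more tightly than the comparison operators:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(2, 3)], ["a", "b"])

    # Wrong: "a > 1" & "b < 5"   -> unsupported operand type(s) for &: 'str' and 'str'
    # Right: Column expressions combined with &, each wrapped in parentheses
    filtered = df.filter((F.col("a") > 1) & (F.col("b") < 5))
    filtered.show()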
Question: In an Apache Spark DataFrame, using Python, how can we get the data type and length of each column? I'm using the latest version of Python. Using pandas data
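A minimal sketch under the assumption that "length" means the maximum string length per column: df.dtypes gives the (name, type) pairs, and length() over a string cast covers the rest:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("James", 3000)], ["name", "salary"])

    print(df.dtypes)  # [('name', 'string'), ('salary', 'bigint')]

    # Maximum character length per column, casting so non-strings work too
    df.select(
        [F.max(F.length(F.col(c).cast("string"))).alias(c) for c in df.columns]
    ).show()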
I have two Python scripts. The main script looks like:

from testa import modify, see
from pyspark import SparkContext

if __name__ == '__main__':
    sc = SparkContext(
I have a PySpark dataframe whose event_name column contains values like a-markets-l1, a-markets-watch, a-markets-buy, a-markets-z2, scroll_down. This dataframe has an event_name column EXCL
I have a monthly job in Databricks where I want to truncate all records for the previous month and then load the current month into Cosmos DB, so I tried with option("
I'm facing an issue when trying to use pyspark==3.1.2. I have Java 1.8 installed and added to my user PATH. But according to the docs it does not need any other
Problem (text from screenshot): ":14: error: not found: value spark" at "import spark.implicits._" and ":14: error: not found: value spark" at "import spark.sql". Here is my environment con
I'm querying data from Cassandra in Spark using SCC 2.5 and Spark 2.4.7 (pyspark). The table I'm reading from has a composite partition key (partition_key_1, pa
I am getting this error while running the pyspark package in PyCharm with Python 3.9, using the code below:

from pyspark.sql import SparkSession
from pyspark.sql.t
I am trying to join column values to a list of values. df1:

name  | department | state | id
------+------------+-------+----
James | Sales      | NY    | 101
Maria | F
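If the goal is to keep the rows whose column value appears in a Python list, a minimal sketch with a hypothetical id list (for large lists, building a DataFrame from the list and joining scales better than isin):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df1 = spark.createDataFrame(
        [("James", "Sales", "NY", 101), ("Maria", "Finance", "NY", 102)],
        ["name", "department", "state", "id"],
    )  # sample rows standing in for df1

    id_list = [101, 103]  # hypothetical list of values
    df1.filter(F.col("id").isin(id_list)).show()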
I need help in converting the function below into an SQL query: start_time = 1649289600, end_time = 1649375999. test_data = df.withColumn("from_timestamp", to_t
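A minimal sketch of the SQL form, assuming the truncated withColumn wrapped from_unixtime in to_timestamp: register the DataFrame as a view and inline the epoch values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,)], ["id"])  # stand-in for the real df

    start_time, end_time = 1649289600, 1649375999
    df.createOrReplaceTempView("t")

    test_data = spark.sql(f"""
        SELECT *,
               to_timestamp(from_unixtime({start_time})) AS from_timestamp,
               to_timestamp(from_unixtime({end_time}))   AS end_timestamp
        FROM t
    """)
    test_data.show(truncate=False)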
Does computeSVD() use map/reduce, since it is a predefined function? I couldn't find the code of the function.

from pyspark.mllib.linalg import Vectors
from py
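The Python side is only a thin wrapper; the actual work happens in Spark's Scala implementation, which (for RowMatrix) builds the Gramian matrix with distributed aggregations over the row RDD before a local eigendecomposition, so it does run as distributed jobs even though you never write map/reduce yourself. A minimal usage sketch:

    from pyspark import SparkContext
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.linalg.distributed import RowMatrix

    sc = SparkContext.getOrCreate()
    rows = sc.parallelize([Vectors.dense([1.0, 2.0]), Vectors.dense([3.0, 4.0])])

    mat = RowMatrix(rows)
    svd = mat.computeSVD(2, computeU=True)  # triggers distributed jobs over the RDD
    print(svd.s)  # singular values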
I'm storing the prices of products in a Delta table. The schema of the table is like this:

id | price | updated
---+-------+-----------
 1 | 3     | 2022-03-21
 2 | 4     | 2022-03-20
I'm using Glue 3.0.

data = [("Java", "6241499.16943521594684385382059800664452")]
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF()
df.show()
df.select(
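The select is cut off, but with a 40-digit fractional string a common stumbling block is Spark's decimal limit: DecimalType caps precision at 38 digits, so the value cannot be held exactly and a cast rounds to the declared scale. A minimal sketch (toDF() without names labels the columns _1 and _2):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DecimalType

    spark = SparkSession.builder.getOrCreate()
    data = [("Java", "6241499.16943521594684385382059800664452")]
    df = spark.sparkContext.parallelize(data).toDF()

    # 7 integer digits + 31 fractional digits = the 38-digit maximum
    df.select(
        F.col("_1").alias("lang"),
        F.col("_2").cast(DecimalType(38, 31)).alias("value"),
    ).show(truncate=False)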
I have a log file in CSV which has a column containing a list of filepaths separated by commas. I want to split those filepaths into new rows using PySpark (or Excel).
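In PySpark this is split() plus explode(); a minimal sketch with hypothetical column names (reading the real file would be spark.read.csv with header=True):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("job1", "/data/a.txt,/data/b.txt,/data/c.txt")], ["job", "filepaths"]
    )  # stand-in for the CSV log

    # One output row per filepath in the comma-separated list
    exploded = df.withColumn("filepath", F.explode(F.split(F.col("filepaths"), ",")))
    exploded.show(truncate=False)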