I am getting error while installing spark on Google Colab. It says tar: spark-2.2.1-bin-hadoop2.7.tgz: Cannot open: No such file or directory tar: Error
I am a new beginner in the big data field, I need to make a demo which streams data from Kafka topic using spark stream then make some aggregation and filtering
I am trying to submit spark-submit but its failing with as weird message. Error: Could not find or load main class org.apache.spark.launcher.Main /opt/spark/b
Circumstances: I have read through these:https://spark.apache.org/docs/3.1.2/monitoring.html https://dzlab.github.io/bigdata/2020/07/03/spark3-monitoring-1/ ver
In another similar question, they hint 'install older spark 2.4.5.' EDIT: the solution from above link says 'install spark 2.4.5 and it does have kafkautils. Bu
Let's say I have a dataframe which looks like this: +--------------------+--------------------+--------------------------------------------------------------+
I am trying to replace parentheses in a string (i.e. column names). It is working fine with white spaces but not with ( parentheses. I tried """, \(, \\( but I
I have large amount of json files that Spark can read in 36 seconds but Spark 3.0 takes almost 33 minutes to read the same. On closer analysis, looks like Spark
I am a newbie in pyspark, While trying to read parquet file through pyspark I get the below error. I have tried various things like reinstallation of jre and jd
I have the following code: # Get the min and max dates minDate, maxDate = df2.select(f.min("MonthlyTransactionDate"), f.max("MonthlyTransactionDate")).first()
I'm trying to read MongoDB using this guide df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load() df = df.select(['my_cols']) df = df.where('date
There is a good of examples of creating Spark jobs using the Kubernetes Spark Operator and simply submitting a request with the following kubectl apply -f spa
I build a cluster use CDH5.14.2, includes 5 nodes, each node has 130G momery and 40 cpu cores. I builded the spark streamming application to read from multiple
I have two identical Spark DataFrame. They have the same columns. I am trying to create a IF-Else statement in one line but couldnt find a better way to do it.
[I 10:43:53.627 NotebookApp] 启动notebooks 在本地路径: /opt/soft/recommender/jupyter [I 10:43:53.627 NotebookApp]
I am trying to connect to a remote cassandra cluster in my spark shell using the Spark-cassandra connector. But its throwing some unusual errors. I do the usual
I have a curious issue, when launching a databricks notebook from a caller notebook through dbutils.notebook.run (I am working in Azure Databricks). One intere
I am beginner to Spark, while reading about Dataframe, I have found below two statements for dataframe very often- 1) DataFrame is untyped 2) DataFrame has sch
I have followed this post pyspark error reading bigquery: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class and followed the resolution
I followed the Dynamic allocation setup configuration however, getting the following error when starting the executors. ERROR TaskSchedulerImpl: Lost execu