Category "scala"

Random Sampling based on 1 column after Groupby

I have a Spark table which contains 400+ million records/rows. I used spark.table to convert it into a DF. The DF looks like this below: id pub_date
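
A common way to take a fixed-size random sample per group (here, per pub_date) is to rank rows in random order inside a window and keep the first n. A minimal sketch, assuming the table and column names from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rand, row_number}

val spark = SparkSession.builder.getOrCreate()
val df = spark.table("my_table")   // hypothetical table with columns id, pub_date

// Rank rows in random order within each pub_date group, then keep the first n per group.
val w = Window.partitionBy("pub_date").orderBy(rand(42))
val sampled = df
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") <= 10)         // e.g. keep 10 random rows per pub_date
  .drop("rn")
```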

I am trying to set up Spark locally but am getting an error

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(
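
For reference, the startup message itself only points at the verbosity knob; a minimal local setup sketch (master and app name are assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")      // run Spark locally using all available cores
  .appName("local-test")   // hypothetical app name
  .getOrCreate()

// Adjust logging verbosity as the startup message suggests:
spark.sparkContext.setLogLevel("ERROR")
```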

Spark load csv file in jar from resources folder

I am trying to create a Spark application in Scala that reads a .csv file located in the src/main/resources directory and saves it to the local hdfs
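
One hedged approach: a file packaged inside the jar is not on a filesystem path Spark can read directly, so read the classpath resource into memory first and hand the lines to the CSV reader. A sketch suitable for small files; the file name is an assumption:

```scala
import org.apache.spark.sql.SparkSession
import scala.io.Source

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Read the bundled resource via the classloader, then let Spark parse it as CSV.
val lines = Source
  .fromInputStream(getClass.getResourceAsStream("/data.csv"))   // hypothetical file name
  .getLines()
  .toSeq

val df = spark.read.option("header", "true").csv(lines.toDS())
```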

Azure Storage Account file details in a table in databricks

I am loading data via pipelines into an ADLS gen2 container. Now I want to create a table with details of when the pipeline started running and when it completed. l
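
A hedged sketch for a Databricks notebook: one way to capture per-file details after the pipeline lands data is to list the container with dbutils.fs.ls and persist the listing. The path and table name below are assumptions:

```scala
import spark.implicits._

// List the files in the landing path and keep name, full path and size.
val files = dbutils.fs
  .ls("abfss://container@storageaccount.dfs.core.windows.net/landing/")  // hypothetical path
  .map(f => (f.name, f.path, f.size))
  .toDF("file_name", "file_path", "size_bytes")

// Append each run's listing to an audit table (hypothetical name).
files.write.mode("append").saveAsTable("pipeline_file_audit")
```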

Word count using map reduce on Seq[String]

I have a Seq which contains randomly generated words. I want to calculate the occurrence count of each word using map reduce. Now, I have been able to map the w
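
A minimal map-reduce-style sketch over a plain Seq[String]: map each word to a (word, 1) pair, then reduce by grouping on the word and summing the counts:

```scala
val words: Seq[String] = Seq("spark", "scala", "spark", "flink", "scala", "spark")

val counts: Map[String, Int] =
  words
    .map(w => (w, 1))                                          // "map" step
    .groupBy(_._1)                                             // group pairs by word
    .map { case (word, pairs) => word -> pairs.map(_._2).sum } // "reduce" step
// counts: Map(spark -> 3, scala -> 2, flink -> 1)
```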

java.lang.NoClassDefFoundError: org/apache/flink/streaming/api/scala/StreamExecutionEnvironment

package com.knoldus import org.apache.flink.api.java.utils.ParameterTool import org.apache.flink.streaming.api.scala._ import org.apache.flink.streaming.api.win
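
A NoClassDefFoundError for StreamExecutionEnvironment at runtime usually means the Flink streaming dependency is missing from the runtime classpath, often because it is marked "provided" while the job is run locally. A hedged build.sbt sketch; the version is an assumption:

```scala
// build.sbt (sketch) - version is an assumption
val flinkVersion = "1.11.2"

libraryDependencies ++= Seq(
  // Drop the "provided" scope (or include provided deps in your run config)
  // when executing the job locally instead of on a Flink cluster.
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
  "org.apache.flink" %% "flink-clients"         % flinkVersion
)
```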

How to use countDistinct using a window function in Spark/Scala?

I need to use a window function that is partitioned by 2 columns and do a distinct count on the 3rd column, adding that as the 4th column. I can do count without any is
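
Spark does not support countDistinct directly over a window, but two common workarounds are an exact count via collect_set + size, or approx_count_distinct. A sketch assuming a dataframe df with columns col1, col2, col3:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{approx_count_distinct, col, collect_set, size}

val w = Window.partitionBy("col1", "col2")

val withDistinct = df
  .withColumn("distinct_col3_exact",  size(collect_set(col("col3")).over(w)))     // exact
  .withColumn("distinct_col3_approx", approx_count_distinct(col("col3")).over(w)) // approximate
```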

get count of partitions in a kafka topic with scala 2.12

With scala 2.11 and spark-streaming-kafka-0-8_2.11 I could do import org.apache.spark.streaming.kafka.KafkaCluster val params = Map[String, Object]( "bootstr
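
On Scala 2.12 the old KafkaCluster helper from spark-streaming-kafka-0-8 is gone; a hedged alternative is the plain Kafka AdminClient from kafka-clients. Broker address and topic name below are assumptions:

```scala
import java.util.Properties
import org.apache.kafka.clients.admin.AdminClient
import scala.collection.JavaConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")    // hypothetical broker

val admin = AdminClient.create(props)
try {
  val description = admin
    .describeTopics(List("my-topic").asJava)
    .values()
    .get("my-topic")
    .get()                                           // KafkaFuture -> TopicDescription
  println(s"my-topic has ${description.partitions().size()} partitions")
} finally {
  admin.close()
}
```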

Getting Error while encoding from reading text file

I have a pipe-delimited file that I need to strip the first two rows off of. So I read it into an RDD, exclude the first two rows, and make it into a data frame. va
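
A hedged sketch of the usual pattern: read the file as text, drop the first two rows by index with zipWithIndex, split on the pipe, and only then build the DataFrame. Path, arity and column names are assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val raw = spark.read.textFile("/path/to/file.txt")   // hypothetical path

val df = raw.rdd
  .zipWithIndex()
  .filter { case (_, idx) => idx >= 2 }              // skip the first two rows
  .map { case (line, _) => line.split("\\|") }
  .map(a => (a(0), a(1), a(2)))                      // adjust to the real column count
  .toDF("col1", "col2", "col3")
```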

Scala Jackson deserialization failing with "non-static inner classes" error with Jackson 2.10

I am trying to upgrade from Jackson 2.5 to 2.10. The deserialization code below was working for me before, but post-upgrade it is failing with the following er
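
The "non-static inner classes" failure typically means the target class is declared inside another class, which Jackson 2.10 rejects more strictly than older versions. A hedged sketch with the case class moved to the top level and the Scala module registered; class and field names are assumptions:

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Declared at the top level (not inside another class), so it is not an inner class.
case class User(name: String, age: Int)

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val user = mapper.readValue("""{"name":"ann","age":30}""", classOf[User])
```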

sbt-avro is not generating Scala classes, possible settings issue

I'm trying to use sbt-avro in a Scala project to generate Scala classes from an Avro schema. Here is the project structure: multi-tenant/ build.sbt proj
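
One possible explanation rather than a settings fix: sbt-avro drives Avro's Java code generator, so it emits .java sources; generating Scala case classes usually goes through a different plugin such as sbt-avrohugger. A hedged project/plugins.sbt sketch; the version is an assumption:

```scala
// project/plugins.sbt (sketch) - version is an assumption
addSbtPlugin("com.julianpeeters" % "sbt-avrohugger" % "2.0.0")
```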

Spark in SBT console: "Could not find spark-version-info.properties"

I'm trying to instantiate a SparkContext inside an sbt console, using the following Scala commands: import org.apache.spark.SparkConf import org.apache.spark.Spa
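
For context, a hedged sketch of the console commands being run; constructing the context below is what triggers the spark-version-info.properties lookup:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// A plain local SparkContext, as typically created in the sbt console.
val conf = new SparkConf().setMaster("local[*]").setAppName("sbt-console")
val sc = new SparkContext(conf)
```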

How to generate html test report for junit in sbt?

I have JUnit tests in my Scala sbt project. I know that I can generate HTML reports for ScalaTest with: testOptions in Test += Tests.Argument(TestFrameworks.
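
For reference, the ScalaTest setting the question points at looks like the sketch below; the report directory is an assumption, and HTML output additionally needs ScalaTest's HTML reporter dependency (flexmark, or pegdown on older ScalaTest versions) on the Test classpath:

```scala
// build.sbt (sketch): ScalaTest HTML report output
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-h", "target/html-test-report")
```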

Spark scala data frame udf returning rows

Say I have a dataframe which contains a column (called colA) which is a seq of rows. I want to append a new field to each record of colA. (And the new field
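
A hedged alternative that sidesteps returning Rows from a UDF entirely: on Spark 3.1+ the array column can be rewritten with transform and each struct extended with withField. Field names below are assumptions:

```scala
import org.apache.spark.sql.functions.{col, lit, transform}

// Add a new field "flag" to every struct element of colA, derived from an existing
// field "b" (both names hypothetical), without writing a UDF that returns Rows.
val enriched = df.withColumn(
  "colA",
  transform(col("colA"), item => item.withField("flag", item.getField("b") > lit(0)))
)
```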

Error while running Scala code - Databricks 7.3LTS and above

I am running Databricks 7.3 LTS and getting errors while trying to use the Scala bulk copy. The error is: object sqldb is not a member of package com.microsoft. I hav
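
The missing-package error suggests the older azure-sqldb-spark connector (package com.microsoft.azure.sqldb.spark), which is not available on Spark 3 / DBR 7.x. A hedged sketch using the newer Apache Spark connector for SQL Server instead; server, database, table and credential names are assumptions:

```scala
// Requires the com.microsoft.sqlserver.jdbc.spark connector library installed on the
// cluster; it performs bulk inserts through the DataFrame writer.
df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
  .option("dbtable", "dbo.target_table")
  .option("user", dbUser)         // hypothetical credentials
  .option("password", dbPassword)
  .save()
```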

Summation/multiplication of a list of tuples

I'm trying to figure out a simple operation that takes a list of (Int, Int) tuples, multiplies within each tuple, and then sums those results. Example: v
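
A minimal sketch of the two usual one-liners: map-then-sum, or a single-pass foldLeft:

```scala
val pairs = List((1, 2), (3, 4), (5, 6))

// Multiply within each tuple, then sum the products: 1*2 + 3*4 + 5*6 = 44
val total = pairs.map { case (a, b) => a * b }.sum

// Same result in a single pass:
val total2 = pairs.foldLeft(0) { case (acc, (a, b)) => acc + a * b }
```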

Get only Right values from Either sequence

This code prints List(1, 2, ()) def f1(i:Int): Either[String,Int] = if (i > 0) Right(1) else Left("error 1") def f2(i:Int): Either[String,Int] = if
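
The stray () in the output is a Unit value produced by one of the branches; the usual way to keep only the Right values is collect (or flatMap via toOption). A sketch with assumed inputs:

```scala
val results: Seq[Either[String, Int]] = Seq(Right(1), Right(2), Left("error 3"))

// Keep only the Right values:
val rights: Seq[Int] = results.collect { case Right(v) => v }   // Seq(1, 2)

// Equivalent on Scala 2.12+, where Either is right-biased:
val rights2: Seq[Int] = results.flatMap(_.toOption)
```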

How to convert timestamp column of Spark Dataframe to string column

I want to convert all TIMESTAMP columns of a Spark dataframe into String columns. Could anybody say how to do that automatically for each dataframe? The position of
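
A hedged sketch that works regardless of column position or name: walk the schema and cast every TimestampType column to string:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.TimestampType

def timestampsToString(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == TimestampType)
      acc.withColumn(field.name, col(field.name).cast("string"))  // replace in place
    else acc
  }
```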

Spark SQL error : org.apache.spark.sql.catalyst.parser.ParseException: extraneous input '$' expecting

I am forming a query in a StringBuilder like the one below: println(dataQuery) Execution started at 2019-10-31 02:58:24.006019 PST res245: String = " SELECT transac
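
A literal '$' reaching the SQL parser usually means ${...} placeholders were left unsubstituted because the string was built without the s interpolator. A hedged sketch; table and variable names are assumptions:

```scala
val tableName = "transactions"   // hypothetical
val startDate = "2019-10-01"

// Note the leading s on the string: without it, "$tableName" is passed to the
// parser literally and triggers the extraneous-input-'$' ParseException.
val dataQuery =
  s"""SELECT transaction_id, amount
     |FROM $tableName
     |WHERE txn_date >= '$startDate'
     |""".stripMargin

val result = spark.sql(dataQuery)
```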