Category "scala"

Spark 3.0 is much slower to read json files than Spark 2.4

I have a large number of JSON files that Spark 2.4 can read in 36 seconds, but Spark 3.0 takes almost 33 minutes to read the same. On closer analysis, it looks like Spark
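
A common mitigation for slow JSON reads (regardless of the 2.4 vs 3.0 difference) is to supply an explicit schema so Spark skips the schema-inference pass over every file; the schema fields and path below are placeholders, not from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().getOrCreate()

// Placeholder schema: list the real fields of the JSON files here.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)
))

// With an explicit schema, Spark does not scan all files up front to
// infer one, which is often the dominant cost on large datasets.
val df = spark.read
  .schema(schema)
  .json("/path/to/json/files")
```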

Akka: persist to Cassandra and publish to Kafka multiple events

I need to store to Cassandra and publish to Kafka multiple events, and call some final handler() only after all events are stored and published. I came across U
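
A minimal sketch of the "run everything, then call the handler" shape using plain Futures rather than any Akka-specific API; storeToCassandra, publishToKafka and handler are hypothetical stand-ins for the real calls:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-ins for the real Cassandra / Kafka calls.
def storeToCassandra(event: String)(implicit ec: ExecutionContext): Future[Unit] = Future.unit
def publishToKafka(event: String)(implicit ec: ExecutionContext): Future[Unit]   = Future.unit
def handler(): Unit = println("all events stored and published")

def processAll(events: Seq[String])(implicit ec: ExecutionContext): Future[Unit] =
  Future
    .traverse(events) { e =>
      // per event: persist first, then publish
      storeToCassandra(e).flatMap(_ => publishToKafka(e))
    }
    .map(_ => handler()) // runs only after every store + publish has completed
```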

cats-effect: How to transform Map[x, IO[y]] to IO[Map[x, y]]

I have a map of String to IO, like this: Map[String, IO[String]]. I want to transform it into IO[Map[String, String]]. How do I do it?
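
A minimal sketch of the usual answer, using traverse from cats to sequence the IO effects; the example map is made up:

```scala
import cats.effect.IO
import cats.syntax.traverse._

// Example input map (placeholder values).
val m: Map[String, IO[String]] = Map("a" -> IO.pure("1"), "b" -> IO.pure("2"))

// Traverse the map as a list of pairs and rebuild it,
// sequencing all the IO effects into a single IO.
val result: IO[Map[String, String]] =
  m.toList
    .traverse { case (k, io) => io.map(k -> _) }
    .map(_.toMap)
```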

How to organize java and scala code in Play?

activator new results in: Fetching the latest list of templates... Browse the list of templates: http://lightbend.com/activator/templates Choose from these f

List all objects in S3 with a given prefix in Scala

I am trying to list all objects in an AWS S3 bucket with an input bucket name & filter prefix using the following code. import scala.collection.JavaConverters._ import
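
A sketch using the AWS SDK v1 listObjectsV2 call, assuming default credentials; the bucket name and prefix are placeholders:

```scala
import scala.collection.JavaConverters._
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.ListObjectsV2Request

// Placeholder bucket and prefix values.
val bucketName = "my-bucket"
val prefix     = "some/prefix/"

val s3 = AmazonS3ClientBuilder.defaultClient()

val request = new ListObjectsV2Request()
  .withBucketName(bucketName)
  .withPrefix(prefix)

// listObjectsV2 returns up to 1000 keys per call; loop on the
// continuation token to page through larger result sets.
val keys = s3.listObjectsV2(request)
  .getObjectSummaries.asScala
  .map(_.getKey)

keys.foreach(println)
```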

Trigger an if statement only when two Spark DataFrames meet the conditions

I have two identical Spark DataFrames. They have the same columns. I am trying to create an if-else statement in one line but couldn't find a better way to do it.

Find the maximum value from JSON data in Scala

I am very new to programming in Scala. I am writing a test program to get the maximum value from JSON data. I have the following code: import scala.io.Source import sc
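
A small sketch of one way to do this with json4s rather than manual parsing via scala.io.Source; the JSON structure and the "value" field are assumptions, not from the question:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

implicit val formats: Formats = DefaultFormats

// Placeholder JSON: an array of objects with a numeric "value" field.
val json = """[{"value": 3}, {"value": 42}, {"value": 7}]"""

// Collect the "value" field from every element and take the maximum.
val maxValue = (parse(json) \ "value").extract[List[Int]].max
// maxValue == 42
```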

How do I verbalize the term F[_] in scala/cats-effect

I'm learning the concept of F[_] as a constructor for other types, but how do you pronounce this to another human or say it in your head (for us internal monolo

Cannot connect to Cassandra in spark-shell

I am trying to connect to a remote Cassandra cluster in my spark-shell using the Spark-Cassandra connector, but it's throwing some unusual errors. I do the usual
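
For reference, the usual spark-shell setup looks roughly like this; the connector version, host and keyspace/table names are placeholders, not from the question:

```scala
// Launch the shell with the connector and the contact point, e.g.:
//   spark-shell \
//     --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
//     --conf spark.cassandra.connection.host=10.0.0.5

// Inside the shell, `spark` is already defined.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", "my_keyspace")
  .option("table", "my_table")
  .load()

df.show(5)
```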

Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated while reading from bigquery in Jupyter lab

I have followed this post, "pyspark error reading bigquery: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class", and applied the resolution
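
One frequent cause of "Provider ... could not be instantiated" is a connector artifact that does not match the cluster's Scala/Spark build. A hedged Scala sketch of loading a matching connector and reading a public table; the connector version is only an example and must be set before the SparkContext is created:

```scala
import org.apache.spark.sql.SparkSession

// Example connector coordinates for a Scala 2.12 build of Spark;
// pick the artifact that matches the cluster's Scala/Spark version.
val spark = SparkSession.builder()
  .config("spark.jars.packages",
          "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.27.1")
  .getOrCreate()

val df = spark.read
  .format("bigquery")
  .option("table", "bigquery-public-data.samples.shakespeare")
  .load()
```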

Programmatically add/remove executors to a Spark Session

I'm looking for a reliable way in Spark (v2+) to programmatically adjust the number of executors in a session. I know about dynamic allocation and the ability
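
A sketch of the SparkContext developer API that can request and remove executors at runtime; it only takes effect on cluster managers that support it (e.g. YARN), and the executor ids below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val sc    = spark.sparkContext

// Ask the cluster manager for 4 additional executors.
sc.requestExecutors(4)

// List the executors currently known to the driver
// (the driver itself also appears in this map).
sc.getExecutorMemoryStatus.keys.foreach(println)

// Ask the cluster manager to remove specific executors by id (placeholders).
sc.killExecutors(Seq("1", "2"))
```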

What should I use instead of the deprecated FlinkKafkaConsumer? Scala Flink

I'm trying to get data from Kafka into Flink. I use FlinkKafkaConsumer, but IntelliJ shows me that it is deprecated, and the ssh console in Google Cloud also shows me this e
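
In recent Flink releases the replacement is KafkaSource; a sketch of the builder API, with the broker address, topic and group id as placeholders:

```scala
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._

// Placeholder broker/topic/group values.
val source = KafkaSource.builder[String]()
  .setBootstrapServers("localhost:9092")
  .setTopics("my-topic")
  .setGroupId("my-group")
  .setStartingOffsets(OffsetsInitializer.earliest())
  .setValueOnlyDeserializer(new SimpleStringSchema())
  .build()

val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream: DataStream[String] =
  env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka-source")
```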

sbt package is trying to download a package whose path does not exist

These are the contents of my build.sbt file: name := "WordCounter" version := "0.1" scalaVersion := "2.13.1" libraryDependencies ++= Seq( "org.apache.spar
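
A likely cause is a Scala version for which the Spark artifacts are not published (Spark 3.0.x is built for Scala 2.12; 2.13 support only arrived in Spark 3.2). A hedged build.sbt sketch with matching versions; the exact version numbers are examples:

```scala
name := "WordCounter"
version := "0.1"

// Must match a Scala version the chosen Spark release is published for.
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.3",
  "org.apache.spark" %% "spark-sql"  % "3.0.3"
)
```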

Find the max value of an array column and the associated value in another array within the DataFrame

I have a csv file with the data below. Id Subject Marks 1 M,P,C 10,8,6 2 M,P,C 5,7,9 3 M,P,C 6,7,4 I need to find out the max value in the Marks column for each Id an
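
A sketch of one way to do this with split, arrays_zip and a struct-based max (an argmax pattern); the inline test data mirrors the question, but the code is only an illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Test data shaped like the question's csv.
val df = Seq(
  (1, "M,P,C", "10,8,6"),
  (2, "M,P,C", "5,7,9"),
  (3, "M,P,C", "6,7,4")
).toDF("Id", "Subject", "Marks")

val result = df
  .withColumn("subjects", split($"Subject", ","))
  .withColumn("marks", split($"Marks", ",").cast("array<int>"))
  // Pair each subject with its mark, one row per pair.
  .withColumn("zipped", explode(arrays_zip($"subjects", $"marks")))
  .select($"Id", $"zipped.subjects".as("Subject"), $"zipped.marks".as("Mark"))
  // max over a struct compares by its first field, so this keeps the
  // highest mark together with its subject (argmax).
  .groupBy($"Id")
  .agg(max(struct($"Mark", $"Subject")).as("best"))
  .select($"Id", $"best.Subject", $"best.Mark".as("MaxMark"))

result.show()
```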

Spark: save a simple string to a text file

I have a Spark job that needs to store the last time it ran to a text file. This has to work both on HDFS and on the local fs (for testing). However, it seems
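
A sketch using the Hadoop FileSystem API, which resolves both hdfs:// and file:// paths through the same interface; the path and the stored value are placeholders:

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// The same code handles hdfs://, file:// or a plain path, depending on
// the Hadoop configuration and the URI scheme of the path.
val path = new Path("/tmp/last_run.txt")
val fs   = path.getFileSystem(spark.sparkContext.hadoopConfiguration)

val out = fs.create(path, true) // overwrite if the file already exists
try out.write(java.time.Instant.now().toString.getBytes(StandardCharsets.UTF_8))
finally out.close()
```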

Akka Finite State Machine and how to log Behaviors.unhandled?

I have a question about Behaviors.unhandled. I know that Akka sends unhandled messages to dead letters, and with the following configuration it also logs i

Improvement of a state machine implemented in Scala

I am working on an implementation of a state machine in Scala. The original version is written in Python, so I have a lot of if/else clauses in the co
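
A hedged sketch of the usual Scala shape for this: a sealed trait of states and events plus a pattern-matched transition function in place of if/else chains; all names are hypothetical:

```scala
// Hypothetical states and events.
sealed trait State
case object Idle    extends State
case object Running extends State
case object Done    extends State

sealed trait Event
case object Start  extends Event
case object Finish extends Event

// One pattern match replaces the chain of if/else clauses; the compiler
// can warn about unhandled (state, event) combinations.
def transition(state: State, event: Event): State = (state, event) match {
  case (Idle, Start)     => Running
  case (Running, Finish) => Done
  case (s, _)            => s // ignore events that don't apply in this state
}
```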

Json4s not serializing Java classes

I have some Scala code that needs to be able to serialize/deserialize some Java classes using Json4s. I am using "org.json4s" %% "json4s-ext" % "4.0.5" and "org
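
A hedged sketch: json4s reflects over Scala case classes out of the box, but plain Java-style classes typically need a FieldSerializer (or a custom serializer) registered in the Formats; LegacyUser below is a Scala stand-in for the real Java class:

```scala
import org.json4s._
import org.json4s.jackson.Serialization

// Stand-in for a Java bean-style class (mutable fields, no case class).
class LegacyUser {
  var name: String = ""
  var age: Int = 0
}

// Register a FieldSerializer so json4s serializes the class by its fields.
implicit val formats: Formats =
  DefaultFormats + FieldSerializer[LegacyUser]()

val u = new LegacyUser
u.name = "alice"
u.age = 30

val json = Serialization.write(u) // e.g. {"name":"alice","age":30}
```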

Error building maven project in Intellij : "object apache is not a member of package org"

Whenever I try to run my main program directly in IntelliJ I get this error: Error:(5, 12) object apache is not a member of package org import org.apache.common

Using scalamock: Could not find implicit value for evidence parameter of type error

I am writing unit tests for my spark/scala application. I am using scalamock as well to mock objects, specifically Session / Session Factory. In one of my test