Issue while writing SQL query results in Spark SQL

I use Spark SQL v2.4 with the SQL API. I have a SQL query that fails when I run the job in Spark; it fails with the error:

WARN SharedInMemoryCache: Evicting cached table partition metadata from memory due to size constraints 
(spark.sql.hive.filesourcePartitionFileCacheSize = 262144000 bytes). 
This may impact query planning performance.
ERROR TransportClient: Failed to send RPC RPC 8371705265602543276 to xx.xxx.xxx.xx:52790:java.nio.channels.ClosedChannelException
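For context on the WARN line: the reported limit of 262144000 bytes is exactly the default 250 MiB partition-metadata cache. A quick sanity check of that number in plain Python, plus (as a comment) the setting one could raise if a heavily partitioned table's metadata does not fit — note this only addresses the warning, not necessarily the RPC failure itself:

```python
# The cache limit reported in the warning, in bytes.
cache_limit_bytes = 262_144_000

# Convert to MiB: the default works out to exactly 250 MiB.
cache_limit_mib = cache_limit_bytes / (1024 * 1024)
print(cache_limit_mib)  # 250.0

# If partition metadata for a large, heavily partitioned table exceeds this,
# the cache can be enlarged (the value is in bytes), e.g. to 1 GiB:
#   set spark.sql.hive.filesourcePartitionFileCacheSize = 1073741824;
```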

The issue occurs when I trigger the write command to save the query output as a Parquet file on S3.

The query is:

create temp view last_run_dt
as
select dt, 
       to_date(last_day(add_months(to_date('[${last_run_date}]','yyyy-MM-dd'), -1)), 'yyyy-MM-dd') as dt_lst_day_prv_mth
from date_dim
where dt = add_months(to_date('[${last_run_date}]','yyyy-MM-dd'), -1);

create temp view get_plcy
as
select plcy_no, cust_id
from (select
          plcy_no,
          cust_id,
          eff_date,
          row_number() over (partition by plcy_no order by eff_date desc) AS row_num
      from plcy_mstr pm
      cross join last_run_dt lrd
              on pm.curr_pur_dt <= lrd.dt_lst_day_prv_mth
             and pm.fund_type NOT IN (27, 36, 52)
             and pm.fifo_time <= '2022-02-12 01:25:00'
             and pm.plcy_no is not null
       )
where row_num = 1;
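For reference, the dt_lst_day_prv_mth expression above — last_day(add_months(last_run_date, -1)), i.e. the last day of the month before the run date — can be reproduced with the Python standard library. A minimal sketch (the sample dates are made up, not from the question):

```python
import calendar
from datetime import date

def last_day_prev_month(last_run_date: date) -> date:
    """Mirror last_day(add_months(last_run_date, -1)) from the query."""
    # Step back one calendar month, wrapping January into the previous year.
    year = last_run_date.year - 1 if last_run_date.month == 1 else last_run_date.year
    month = 12 if last_run_date.month == 1 else last_run_date.month - 1
    # last_day(...) only depends on the month, so take that month's final day.
    last_dom = calendar.monthrange(year, month)[1]
    return date(year, month, last_dom)

print(last_day_prev_month(date(2022, 3, 15)))  # 2022-02-28
print(last_day_prev_month(date(2022, 1, 5)))   # 2021-12-31
```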

I am writing the output as:

df.coalesce(10).write.parquet('s3://some/dir/data', mode="overwrite", compression="snappy")

The "plcy_mstr" table in the above query is large (about 500 GB) and is partitioned on the eff_dt column, with one partition per date.

I have tried increasing the executor memory with the following configurations, but the job still fails.

set spark.driver.memory=20g;
set spark.executor.memory=20g;
set spark.executor.cores=3;
set spark.executor.instances=30;
set spark.memory.fraction=0.75;
set spark.driver.maxResultSize=0;

The cluster has 20 nodes, each with 8 cores and 64 GB of memory.
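A back-of-the-envelope check suggests the requested executors do fit on this cluster; the sketch below assumes YARN-style containers with the default overhead of max(384 MB, 10% of executor memory) — that overhead figure is an assumption, not something stated in the question:

```python
# Cluster as described: 20 nodes, 8 cores and 64 GB each.
nodes = 20
cores_per_node = 8
mem_per_node_gb = 64

# Requested executor shape from the configs above.
executors_requested = 30
executor_mem_gb = 20
executor_cores = 3

# Assumed container overhead: max(384 MB, 10% of executor memory) = 2 GB here.
overhead_gb = max(0.384, 0.10 * executor_mem_gb)
container_gb = executor_mem_gb + overhead_gb  # 22 GB per executor container

# How many executors fit on one node, by memory and by cores.
fit_by_mem = int(mem_per_node_gb // container_gb)   # 2
fit_by_cores = cores_per_node // executor_cores     # 2
executors_per_node = min(fit_by_mem, fit_by_cores)  # 2

total_executors = nodes * executors_per_node        # 40 slots for 30 executors
print(total_executors >= executors_requested)
```

If this arithmetic holds, the executors schedule fine, so the ClosedChannelException is more likely an executor dying mid-job than a resource-request problem — but that is an inference, not a confirmed diagnosis.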

Can anyone please help me identify the issue and fix the job? Any help is appreciated.
Happy to provide more information if required.

Thanks



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow