Spark/Java Dataset insert into ClickHouse table cannot get past the "too many parts" exception

I read a Dataset from a Kafka source with Spark/Java, but a "too many parts" exception occurs when I insert the Dataset into a ClickHouse table. I created an XML file at /etc/clickhouse-server/config.d/merge_tree.xml and pasted in content copied from other solutions on the internet, as follows:

<?xml version="1.0"?>
<yandex>
  <merge_tree>
    <old_parts_lifetime>30</old_parts_lifetime>
    <parts_to_delay_insert>150</parts_to_delay_insert>
    <parts_to_throw_insert>900</parts_to_throw_insert>
    <max_delay_to_insert>5</max_delay_to_insert>
  </merge_tree>
</yandex>
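For what it's worth, one way to confirm that these overrides were actually picked up after the server reloads its config is to query system.merge_tree_settings over JDBC. This is only a sketch; the host, port, database, and credentials are placeholders, and it assumes a clickhouse-jdbc driver is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckMergeTreeSettings {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust host/port/database/credentials.
        String url = "jdbc:clickhouse://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "");
             Statement stmt = conn.createStatement();
             // system.merge_tree_settings shows the effective MergeTree-level values
             ResultSet rs = stmt.executeQuery(
                     "SELECT name, value FROM system.merge_tree_settings "
                     + "WHERE name IN ('old_parts_lifetime', 'parts_to_delay_insert', "
                     + "'parts_to_throw_insert', 'max_delay_to_insert')")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + " = " + rs.getString("value"));
            }
        }
    }
}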

and I also added these tags in users.xml:

<profiles>
  ...
  <default>
     <max_partitions_per_insert_block>10000</max_partitions_per_insert_block>
  </default>
  ...
</profiles>
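Likewise, whether the profile value is really applied to a session can be checked against system.settings (again just a sketch with placeholder connection details):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckProfileSetting {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust as needed.
        String url = "jdbc:clickhouse://localhost:8123/default";
        try (Connection conn = DriverManager.getConnection(url, "default", "");
             Statement stmt = conn.createStatement();
             // system.settings reflects what the current session actually uses
             ResultSet rs = stmt.executeQuery(
                     "SELECT value FROM system.settings "
                     + "WHERE name = 'max_partitions_per_insert_block'")) {
            if (rs.next()) {
                System.out.println("max_partitions_per_insert_block = " + rs.getString("value"));
            }
        }
    }
}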

and my Spark code is:

 dataset.repartition(1).write().mode(SaveMode.Append)
        .option("batchsize", "10000")
        .option("isolationLevel", "NONE")
        .option("numPartitions", "1")
        .option("driver", driver)
        .jdbc(url, tableName, properties);
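For completeness, url, driver, and properties in the snippet above are set up roughly like this (host, database, table name, and credentials are placeholders, and the driver class name depends on which clickhouse-jdbc version is on the classpath):

import java.util.Properties;

public class ClickHouseJdbcConfig {
    // Placeholder connection details used by the JDBC write above.
    static final String url = "jdbc:clickhouse://localhost:8123/default";
    static final String driver = "ru.yandex.clickhouse.ClickHouseDriver"; // newer drivers: com.clickhouse.jdbc.ClickHouseDriver
    static final String tableName = "my_table";

    static Properties jdbcProperties() {
        Properties properties = new Properties();
        properties.setProperty("user", "default");   // placeholder credentials
        properties.setProperty("password", "");
        return properties;
    }
}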

However, the "too many parts" exception still occurs during the Spark JDBC write stage. I am really confused and would appreciate your help. Does anyone have any idea?


