Kafka replication factor vs min.insync.replicas
Replication factor is the total number of copies of each partition stored across an Apache Kafka cluster.
min.insync.replicas is the minimum number of those copies that must be in sync (online and caught up with the leader) for the partition to keep accepting new incoming messages.
Suppose I start a 5-node cluster and create a topic with a replication factor of 3 and acks=all.
- When I publish a message, will I get the ack when the data has been replicated to the other 3 brokers every time? What if 3 out of 5 nodes are down: will the cluster wait for the nodes to come back online, replicate the message, and then send the ack? I believe min.insync.replicas here is 1 by default?
- If min.insync.replicas is set to 2 and the replication factor is 3, does this mean that the ack is sent back after the data has been replicated to 3 nodes, and that the cluster will make sure the data is present on at least 2 nodes at all times? Is this understanding correct?
- If min.insync.replicas is 2 and the replication factor is 3, will I get the ack once the data has been replicated to 2 nodes, with the leader adding the third copy later, or only after the data has been replicated to all 3 nodes?
Basically I am interested in the ack time and in the durability of the data, which is my highest priority, so I am getting confused about some of these concepts.
Solution 1:[1]
> I believe min.insync.replicas here is 1 by default?

Yes, min.insync.replicas defaults to 1.

> When I publish a message, will I get the ack when the data is replicated to the other 3 brokers every time?

To the leader and the "other 2" (3 replicas in total), with acks=all, yes.
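For concreteness, here is a minimal sketch of a producer configured this way; the broker address and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for all in-sync replicas before acknowledging
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata meta = producer
                .send(new ProducerRecord<>("my-topic", "key", "value")) // topic name is a placeholder
                .get(); // blocks until the broker acknowledges (or the send fails)
            System.out.printf("acked at partition=%d offset=%d%n", meta.partition(), meta.offset());
        }
    }
}
```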
> What if 3 out of 5 nodes are down?

If those 3 nodes host all the replicas of the topic, then you'll get an error. You can optionally set/increase retries on the producer.

If the 2 remaining nodes include the leader (or an in-sync replica that can become the leader), then with min.insync.replicas set to 1 the producer can continue. You'll just have replicas that have dropped out of the ISR list.
For the remainder of your questions: acks=all has strong durability guarantees. The leader acknowledges a write only after every replica currently in the ISR has it, and it accepts the write only if the ISR contains at least min.insync.replicas members; a replica that has fallen out of the ISR catches up later without delaying the ack.
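To make the failure mode concrete, here is a sketch of what the producer sees when the ISR shrinks below min.insync.replicas; NotEnoughReplicasException is the real client exception class, while the helper and topic names are illustrative:

```java
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

public class MinIsrCheck {
    // Sends one record and reports whether the broker rejected it for
    // having fewer in-sync replicas than min.insync.replicas.
    static boolean sendOnce(KafkaProducer<String, String> producer, String topic)
            throws InterruptedException {
        try {
            producer.send(new ProducerRecord<>(topic, "key", "value")).get();
            return true; // acknowledged by the leader and every in-sync replica
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NotEnoughReplicasException) {
                // ISR below min.insync.replicas: the write is rejected rather
                // than accepted with weaker durability. With retries > 0 the
                // producer retries this error automatically.
                return false;
            }
            throw new RuntimeException(e.getCause());
        }
    }
}
```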
From the Cloudera documentation:
How can I configure Kafka to ensure that events are stored reliably?
The following recommendations for Kafka configuration settings make it extremely difficult for data loss to occur.
Producer
- block.on.buffer.full=true
- retries=Long.MAX_VALUE
- acks=all
- max.in.flight.requests.per.connection=1

Remember to close the producer when it is finished or when there is a long pause.
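As a sketch, those recommendations translate to the modern Java client roughly as follows. Note that block.on.buffer.full was removed from newer clients (send() instead blocks for up to max.block.ms when the buffer is full), and retries is an int, so Integer.MAX_VALUE is its practical maximum:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DurableProducerConfig {
    // Producer settings following the durability recommendations above.
    static Properties durableProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);        // retry transient failures indefinitely
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1); // preserve ordering across retries
        return props;
    }
}
```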
Broker
- Topic replication.factor >= 3
- min.insync.replicas = 2
- Disable unclean leader election
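A sketch of applying these settings per topic with the Java AdminClient; the topic name and broker address are placeholders, and unclean leader election can also be disabled cluster-wide in the broker configuration:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3) // 3 partitions, replication factor 3
                .configs(Map.of(
                    TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",
                    TopicConfig.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG, "false"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```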
Consumer
- Disable enable.auto.commit
- Commit offsets after messages are processed by your consumer client(s).
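A minimal sketch of that consumer pattern, with auto-commit disabled and offsets committed only after processing; the group id, topic, and process step are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // no auto-commit

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                consumer.commitSync(); // commit only after the batch is fully processed
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value()); // stand-in for real processing
    }
}
```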
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | OneCricketeer |
