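For the image, I chose bitnami Kafka, the most widely used Kafka image on Docker Hub. The main thing to watch out for is how environment variables map to Kafka configuration properties: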
Additionally, any environment variable beginning with KAFKA_CFG_ will be mapped to its corresponding Apache Kafka key. For example, use KAFKA_CFG_BACKGROUND_THREADS in order to set background.threads or KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE in order to configure auto.create.topics.enable
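The mapping rule quoted above is mechanical, so a few lines of code can show it. This is a minimal sketch that assumes the rule is exactly "drop the `KAFKA_CFG_` prefix, lowercase the rest, turn underscores into dots". The helpers `env_to_kafka_key` and `env_to_properties` are hypothetical illustrations, not bitnami's actual entrypoint script, and they do not handle any special cases the image may have.

```python
from typing import Dict, Optional

PREFIX = "KAFKA_CFG_"


def env_to_kafka_key(env_name: str) -> Optional[str]:
    """Map a KAFKA_CFG_* variable name to a Kafka property key (hypothetical helper).

    Returns None for variables that do not carry the prefix.
    """
    if not env_name.startswith(PREFIX):
        return None
    # Drop the prefix, lowercase the rest, and turn underscores into dots.
    return env_name[len(PREFIX):].lower().replace("_", ".")


def env_to_properties(env: Dict[str, str]) -> Dict[str, str]:
    """Collect every KAFKA_CFG_* entry of an environment into Kafka properties."""
    props: Dict[str, str] = {}
    for name, value in env.items():
        key = env_to_kafka_key(name)
        if key is not None:
            props[key] = value
    return props


if __name__ == "__main__":
    sample_env = {
        "KAFKA_CFG_BACKGROUND_THREADS": "10",
        "KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE": "false",
        "PATH": "/usr/bin",  # ignored: no KAFKA_CFG_ prefix
    }
    for key, value in env_to_properties(sample_env).items():
        print(f"{key}={value}")
```

Running it prints `background.threads=10` and `auto.create.topics.enable=false`, matching the two examples in the quote.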
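Also, whenever you use any bitnami image and run into a problem that you want to investigate through the logs, you can turn on the image's debug logging via an environment variable.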
node.id The node ID associated with the roles this process is playing when process.roles is non-empty. This is required configuration when running in KRaft mode.
process.roles The roles that this process plays: ‘broker’, ‘controller’, or ‘broker,controller’ if it is both. This configuration is only applicable for clusters in KRaft (Kafka Raft) mode (instead of ZooKeeper). Leave this config undefined or empty for Zookeeper clusters.
controller.listener.names A comma-separated list of the names of the listeners used by the controller. This is required if running in KRaft mode
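Taken together, the three descriptions above amount to a small set of rules: a non-empty `process.roles` switches the node into KRaft mode, and then `node.id` and `controller.listener.names` must be set. The sketch below checks a properties dict against those rules. It is a hypothetical toy (`check_kraft_config` is not a Kafka API) and covers only what the quoted descriptions state, not Kafka's full startup validation.

```python
from typing import Dict, List

VALID_ROLES = {"broker", "controller"}


def check_kraft_config(props: Dict[str, str]) -> List[str]:
    """Return problems with the KRaft-related settings in props (toy check)."""
    problems: List[str] = []
    roles_value = props.get("process.roles", "").strip()
    if not roles_value:
        # Undefined or empty process.roles means a ZooKeeper-mode node: nothing to check.
        return problems
    roles = [r.strip() for r in roles_value.split(",")]
    unknown = [r for r in roles if r not in VALID_ROLES]
    if unknown:
        problems.append(f"unknown role(s) in process.roles: {unknown}")
    if not props.get("node.id", "").strip():
        problems.append("node.id is required in KRaft mode")
    if not props.get("controller.listener.names", "").strip():
        problems.append("controller.listener.names is required in KRaft mode")
    return problems


if __name__ == "__main__":
    # "CONTROLLER" is an assumed listener name used only for illustration.
    combined = {
        "process.roles": "broker,controller",
        "node.id": "1",
        "controller.listener.names": "CONTROLLER",
    }
    print(check_kraft_config(combined))
    print(check_kraft_config({"process.roles": "broker"}))
```

With the bitnami image, the same three keys would be supplied as `KAFKA_CFG_PROCESS_ROLES`, `KAFKA_CFG_NODE_ID` and `KAFKA_CFG_CONTROLLER_LISTENER_NAMES`, following the mapping rule quoted at the top.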
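On the concepts of Controller and Broker
In one sentence:
The controller coordinates the brokers. (For a detailed explanation, see Chapter 5 of Kafka: The Definitive Guide; the book can be downloaded for free from the Apache Kafka website under Get Started > Books.)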
To summarize, Kafka uses Zookeeper’s ephemeral node feature to elect a controller and to notify the controller when nodes join and leave the cluster. The controller is responsible for electing leaders among the partitions and replicas whenever it notices nodes join and leave the cluster. The controller uses the epoch number to prevent a “split brain” scenario where two nodes believe each is the current controller.
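The epoch mechanism in the quote can be illustrated with a toy model: each broker remembers the highest controller epoch it has seen and ignores requests carrying an older one, so a controller that lost its role (for example after a long pause) cannot keep issuing commands. The `Broker` class and its `handle_controller_request` method are hypothetical and leave out ZooKeeper and the actual election entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Broker:
    """Toy broker that remembers the newest controller epoch it has seen."""
    broker_id: int
    highest_epoch_seen: int = 0
    current_controller_id: Optional[int] = None

    def handle_controller_request(self, controller_id: int, epoch: int) -> bool:
        """Accept a request only if its epoch is not older than the newest one seen."""
        if epoch < self.highest_epoch_seen:
            # Stale controller: a newer one has been elected since, so fence it off.
            return False
        self.highest_epoch_seen = epoch
        self.current_controller_id = controller_id
        return True


if __name__ == "__main__":
    broker = Broker(broker_id=1)
    print(broker.handle_controller_request(controller_id=0, epoch=1))  # True
    print(broker.handle_controller_request(controller_id=2, epoch=2))  # True: new controller
    print(broker.handle_controller_request(controller_id=0, epoch=1))  # False: old one fenced
```

Both nodes 0 and 2 may believe they are the controller, but only the one with the higher epoch gets its requests accepted.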
The broker handles produce requests from producers, stores messages, and serves fetch requests from consumers.
A single Kafka server is called a broker. The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk.
From Kafka: The Definitive Guide, Chapter 1 > Enter Kafka > Brokers and Clusters
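The broker's two duties in the quote (assigning offsets to produced messages, answering fetch requests) can be sketched with a single in-memory partition. `PartitionLog` is a hypothetical toy: it keeps messages in a Python list instead of committing them to disk, and it ignores replication, so "committed" here just means "appended".

```python
from typing import List


class PartitionLog:
    """Toy in-memory log for one partition (not how Kafka stores data)."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Handle a produce request: assign the next offset and store the message."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def fetch(self, offset: int, max_messages: int = 10) -> List[bytes]:
        """Handle a fetch request: return stored messages starting at `offset`."""
        return self._messages[offset:offset + max_messages]


if __name__ == "__main__":
    log = PartitionLog()
    print(log.append(b"first"))   # 0
    print(log.append(b"second"))  # 1
    print(log.fetch(1))           # [b'second']
```

Offsets are per partition and strictly increasing, which is why a consumer can resume simply by remembering the next offset it wants to fetch.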
The various listener settings
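As a rough illustration of how the listener settings fit together in a KRaft cluster with dedicated controllers and brokers, the sketch below generates `KAFKA_CFG_*` variables for each container. Everything specific here is an assumption for illustration: the host names, the ports 9092/9093, the listener names `PLAINTEXT` and `CONTROLLER`, and the helpers `quorum_voters` and `node_env`. The underlying Kafka keys (`listeners`, `advertised.listeners`, `listener.security.protocol.map`, `controller.listener.names`, `controller.quorum.voters`) are real, but this is not a tested deployment.

```python
from typing import Dict

# Assumed ports; adjust to your own deployment.
CONTROLLER_PORT = 9093
BROKER_PORT = 9092


def quorum_voters(controller_hosts: Dict[int, str]) -> str:
    """Build a controller.quorum.voters value in its id@host:port,... form."""
    return ",".join(
        f"{node_id}@{host}:{CONTROLLER_PORT}"
        for node_id, host in sorted(controller_hosts.items())
    )


def node_env(node_id: int, role: str, host: str, voters: str) -> Dict[str, str]:
    """KAFKA_CFG_* variables for one container acting as a controller or a broker."""
    env = {
        "KAFKA_CFG_NODE_ID": str(node_id),
        "KAFKA_CFG_PROCESS_ROLES": role,
        "KAFKA_CFG_CONTROLLER_LISTENER_NAMES": "CONTROLLER",
        "KAFKA_CFG_CONTROLLER_QUORUM_VOTERS": voters,
        # CONTROLLER is not one of the default security protocol names, so map it here.
        "KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP": "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT",
    }
    if role == "controller":
        # Controllers only need the listener the quorum communicates over.
        env["KAFKA_CFG_LISTENERS"] = f"CONTROLLER://:{CONTROLLER_PORT}"
    elif role == "broker":
        # Brokers listen for clients and advertise the host:port clients should use.
        env["KAFKA_CFG_LISTENERS"] = f"PLAINTEXT://:{BROKER_PORT}"
        env["KAFKA_CFG_ADVERTISED_LISTENERS"] = f"PLAINTEXT://{host}:{BROKER_PORT}"
    else:
        raise ValueError(f"this sketch only handles 'controller' or 'broker', got {role!r}")
    return env


if __name__ == "__main__":
    # Hypothetical host names; node IDs must be unique across the whole cluster.
    controllers = {1: "controller-1", 2: "controller-2", 3: "controller-3"}
    brokers = {4: "broker-1", 5: "broker-2", 6: "broker-3"}
    voters = quorum_voters(controllers)
    for node_id, host in controllers.items():
        print(host, node_env(node_id, "controller", host, voters))
    for node_id, host in brokers.items():
        print(host, node_env(node_id, "broker", host, voters))
```

Every node, broker or controller, gets the same `controller.quorum.voters` string, because brokers also need to know where the controllers are.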
My personal understanding is that the minimal cluster should be 3 controllers + 3 brokers. Why 3 brokers? The Replication section of the official Kafka documentation says:
With this ISR model and f+1 replicas, a Kafka topic can tolerate f failures without losing committed messages
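In other words, if a topic's replication factor is 2 (the replication factor includes the leader; see the official docs: "The total number of replicas including the leader constitute the replication factor"), Kafka can keep working normally when one node fails. The algorithm Kafka uses here differs from the one used by ZooKeeper and Elasticsearch clusters: if you ran ZooKeeper or Elasticsearch with only two nodes, they could not keep working in that situation.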
The downside of majority vote is that it doesn’t take many failures to leave you with no electable leaders. To tolerate one failure requires three copies of the data, and to tolerate two failures requires five copies of the data. In our experience having only enough redundancy to tolerate a single failure is not enough for a practical system, but doing every write five times, with 5x the disk space requirements and 1/5th the throughput, is not very practical for large volume data problems. This is likely why quorum algorithms more commonly appear for shared cluster configuration such as ZooKeeper but are less common for primary data storage
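The two quotes boil down to simple arithmetic: to tolerate f failures, Kafka's ISR model needs f + 1 replicas, while a majority-vote (quorum) scheme needs 2f + 1 copies. The sketch below just tabulates both formulas; the function names are made up for illustration.

```python
def isr_replicas_needed(failures: int) -> int:
    """ISR model: f + 1 replicas tolerate f failures without losing committed data."""
    return failures + 1


def majority_replicas_needed(failures: int) -> int:
    """Majority vote: 2f + 1 copies are needed to tolerate f failures."""
    return 2 * failures + 1


if __name__ == "__main__":
    for f in (1, 2):
        print(f"tolerate {f} failure(s): ISR needs {isr_replicas_needed(f)}, "
              f"majority vote needs {majority_replicas_needed(f)}")
```

This prints 2 vs. 3 for one failure and 3 vs. 5 for two failures, matching the "three copies" and "five copies" in the quote.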
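Seen this way, a Kafka cluster would really need only 2 nodes, so why use 3? With this question in mind, I searched Stack Overflow, and sure enough someone had asked exactly this: see "in kafka ha why minimum number of brokers required are 3 and not 2".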