The multi-container Kafka cluster on a single host went down, so this post walks through installing a Kafka cluster across three hosts, each running Docker.

Environment Preparation

Hosts

There are currently three physical machines; the planned mapping of ZooKeeper + Kafka nodes is:

  • 172.10.0.1 –> broker1
  • 172.10.0.2 –> broker2
  • 172.10.0.3 –> broker3

Images

The ZooKeeper version used:

zookeeper-3.4.13

The Kafka version used:

kafka_2.11-2.0.0

The Dockerfile:

# Docker image of kafka cluster
# VERSION 0.0.1
# Author: xukf

# Base image
FROM daocloud.io/centos:7

# Author
MAINTAINER xukf <xukf.me>

# Define the working directory
ENV WORK_PATH /usr/local/work

# Define the log directory
ENV LOG_PATH /usr/local/work/log

# Define the ZooKeeper data directory
ENV ZK_DATA_PATH $WORK_PATH/zkdata

# Define the ZooKeeper folder name
ENV ZK_PACKAGE_NAME zookeeper-3.4.13

# Define the Kafka folder name
ENV KAFKA_PACKAGE_NAME kafka_2.11-2.0.0

# Add Kafka's bin directory to PATH
ENV PATH $WORK_PATH/$KAFKA_PACKAGE_NAME/bin:$PATH

# Install the JDK
ADD ./jdk-8u181-linux-x64.tar.gz /usr/local/
ENV JAVA_HOME /usr/local/jdk1.8.0_181
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH .:${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV PATH ${JAVA_HOME}/bin:${PATH}

# Create the working directory
RUN mkdir -p $WORK_PATH

# Create the log directory
RUN mkdir -p $LOG_PATH

# Create the ZooKeeper data directory
RUN mkdir -p $ZK_DATA_PATH

# Install Kafka (ADD auto-extracts the tarball)
ADD ./kafka_2.11-2.0.0.tgz $WORK_PATH/

# Copy the already-extracted ZooKeeper
COPY ./$ZK_PACKAGE_NAME $WORK_PATH/$ZK_PACKAGE_NAME
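
The image can then be built from this Dockerfile; a minimal sketch (the broker2 name and 0.0.2 tag are assumptions chosen to match the tags used below, and the build context must contain the JDK/Kafka tarballs and the extracted ZooKeeper directory):

docker build -t broker2:0.0.2 .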

Using Images from the Existing Registry

Pushing Existing Images to the Private Registry

Since Docker deployments pull containers directly from images, the original Kafka node images are pushed to the private registry, and the new hosts then deploy their nodes by pulling those images.

Taking the broker2 node as an example:

docker push 172.10.0.1:5000/broker2:0.0.2
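
If the image was built locally without the registry prefix, it first needs to be tagged with the registry address; a sketch assuming the local image is named broker2:0.0.2:

docker tag broker2:0.0.2 172.10.0.1:5000/broker2:0.0.2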

Configuring Docker on the New Hosts

  • Set up the Docker environment on the host
  • Configure the private registry information

The daemon.json file looks roughly like this:

{
  "storage-driver": "overlay",
  "max-concurrent-uploads": 16,
  "max-concurrent-downloads": 16,
  "graph": "/var/lib/docker",
  "insecure-registries": ["172.10.0.1:5000"],
  "log-opts": {
    "max-size": "50m",
    "max-file": "6"
  },
  "hosts": ["tcp://0.0.0.0:2376", "unix:///var/run/docker.sock"]
}

Set insecure-registries to the private registry on the original single-node host; if the registry was given a specific port, that port must be included as well.

Note that if pulling from the private registry fails with the error below after configuring, the JSON config file most likely has not taken effect:

Error response from daemon: Get https://172.10.0.1:5000/v2/: http: server gave HTTP response to HTTPS client

Run the following commands (as root) to apply the changes:

systemctl daemon-reload
systemctl restart docker
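
One way to confirm the registry setting took effect is to check the daemon info:

docker info | grep -A1 'Insecure Registries'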

Pulling Images from the Private Registry

Taking the broker2 node as an example:

docker pull 172.10.0.1:5000/broker2:0.0.2

Kafka Cluster Deployment

Starting the Containers

docker-compose is currently used to manage the containers on each host:

  • Install docker-compose (see the sketch below)
  • Configure the yml file
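
One common way to install docker-compose at the time (the 1.22.0 release number is an assumption; any contemporary 1.x release should work):

curl -L "https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose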

Taking the broker2 node as an example:

version: '2'
services:
  broker2:
    image: 172.10.0.1:5000/broker2:0.0.2
    container_name: broker2
    ports:
      - "19092:19092"
      - "18888:18888"
      - "19888:19888"
      - "22181:22181"
      - "19012:22"
    tty: true
    network_mode: "host"
  producer1:
    image: 172.10.0.1:5000/producer:0.0.2
    container_name: producer1
    ports:
      - "19013:22"
    tty: true

Because of port conflicts on the hosts, Kafka's 9092 port and ZooKeeper's 2181, 2888, and 3888 ports were all remapped.
I initially tried mapping Kafka's 9092 port to 19092 in bridge mode, but ran into problems configuring Kafka's parameters; it kept reporting that a node was unavailable:

Connection to node xxxx could not be established. Broker may not be available

My guess is that the external IP or port could not be resolved correctly. The interim workaround is to connect using host mode.

With that, the containers can be started.
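
A sketch of bringing everything up with the compose file above:

docker-compose up -d
docker ps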

Configuring ZooKeeper Parameters

Configuring Node Identity

First, configure the ZooKeeper identity on each host.
In the broker1 container:

echo 1 > /usr/local/work/zkdata/myid

In the broker2 container:

echo 2 > /usr/local/work/zkdata/myid

In the broker3 container:

echo 3 > /usr/local/work/zkdata/myid

Configuring ZooKeeper Ports

Configure the zoo.cfg file under /usr/local/work/zookeeper-3.4.13/conf in all three node containers. Since the ports are being changed, pay attention to two places in the file: the clientPort setting and the server.N entries:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/work/zkdata
# the port at which the clients will connect
clientPort=22181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=172.10.0.1:18888:19888
server.2=172.10.0.2:18888:19888
server.3=172.10.0.3:18888:19888

Note: if the client port (22181 here, replacing the default 2181) is not configured correctly, or the other hosts cannot reach it, ZooKeeper will report that it cannot connect.

Starting ZooKeeper

Start:

/usr/local/work/zookeeper-3.4.13/bin/zkServer.sh start

Check status:

/usr/local/work/zookeeper-3.4.13/bin/zkServer.sh status
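
Besides zkServer.sh status, reachability of the client port from another host can be checked with ZooKeeper's four-letter-word commands; a sketch assuming nc is installed:

echo srvr | nc 172.10.0.1 22181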

If needed, it can be stopped cleanly:

/usr/local/work/zookeeper-3.4.13/bin/zkServer.sh stop

Configuring Kafka

Configuration File

In all three node containers, configure the server.properties file under /usr/local/work/kafka_2.11-2.0.0/config. Taking the broker2 node as an example, the main settings are:

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
listeners=PLAINTEXT://:19092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
advertised.listeners=PLAINTEXT://172.10.0.2:19092

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:22181

The port numbers were changed accordingly.
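
In addition to the listener and port settings, each broker also needs a unique broker.id in server.properties (not shown in the excerpt above); for broker2 that would presumably be:

broker.id=2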

Starting Kafka

nohup /usr/local/work/kafka_2.11-2.0.0/bin/kafka-server-start.sh /usr/local/work/kafka_2.11-2.0.0/config/server.properties >/usr/local/work/log/kafka.log 2>&1 &
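
Startup can then be verified by watching the log file:

tail -f /usr/local/work/log/kafka.log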

Open Issues

One problem remains: zookeeper.connect points at the local ZooKeeper instance rather than the ZooKeeper ensemble, so if the local ZooKeeper goes down, this Kafka node goes down with it.

In bridge mode I tested the following configuration, but it kept failing to resolve the node IPs:

zookeeper.connect=172.10.0.1:22181,172.10.0.2:22181,172.10.0.3:22181

Looking at the error messages, the other nodes were advertising their container IDs rather than the real host IPs; I also tried configuring the containers' hosts files for resolution, but that failed. A fix worth trying is to pass KAFKA_ADVERTISED_LISTENERS when starting the container, i.e. set advertised.listeners in the config file, so that the host IP is the one advertised.
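
A sketch of what that could look like in broker2's server.properties under bridge mode (the 0.0.0.0 bind address is an assumption; the advertised address must be the host IP and the externally mapped port):

listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://172.10.0.2:19092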

Testing
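
Before producing and consuming, the test topic can be created explicitly (Kafka 2.0 also auto-creates topics by default); a minimal sketch:

/usr/local/work/kafka_2.11-2.0.0/bin/kafka-topics.sh --create --zookeeper localhost:22181 --replication-factor 3 --partitions 3 --topic test001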

Producer:

/usr/local/work/kafka_2.11-2.0.0/bin/kafka-console-producer.sh --broker-list 172.10.0.1:19092,172.10.0.2:19092,172.10.0.3:19092 --topic test001

Consumer:

/usr/local/work/kafka_2.11-2.0.0/bin/kafka-console-consumer.sh --bootstrap-server 172.10.0.1:19092,172.10.0.2:19092,172.10.0.3:19092 --from-beginning --topic test001

Postscript

If the host ports are not already taken, it is recommended to keep the original default ports.

Two problems remain to be solved:

  1. Kafka and ZooKeeper currently live one-to-one in each container, and with the current parameters a single ZooKeeper instance going down takes its Kafka broker down with it. The parameters need adjusting.
  2. The host networking mode used here does not meet production requirements and will need to change before going to production.
