Getting Started with Consul

Introduction

Consul is a distributed, highly available system. It provides multiple components, but overall it is a tool for service discovery and service configuration. Its key features are:

  • Service discovery
  • Health checking
  • Key/value storage
  • Multiple datacenters

Every node that provides services to Consul runs an agent. Running an agent is not required to discover other services or to get/set key/value data. The agent is responsible for health checking the services on its node, as well as the node itself. Agents talk to one or more Consul servers. The servers store and replicate data, and elect a leader among themselves. While a single server can do the job, 3 to 5 are recommended to avoid data loss in failure scenarios. One cluster of servers per datacenter is the recommended setup.

Any component of your infrastructure that wants to discover other services or nodes can query any of the Consul servers. Queries can also be made to an agent, which automatically forwards them to a server. Each datacenter runs its own cluster of Consul servers; when a cross-datacenter service-discovery or configuration request is made, the local Consul servers forward the request to the remote datacenter and return the result.

From the Dockerfile we can see that Consul uses the following ports:

# Server RPC is used for communication between Consul clients and servers for internal
# request forwarding.
EXPOSE 8300

# Serf LAN and WAN (WAN is used only by Consul servers) are used for gossip between
# Consul agents. LAN is within the datacenter and WAN is between just the Consul
# servers in all datacenters.
EXPOSE 8301 8301/udp 8302 8302/udp

# HTTP and DNS (both TCP and UDP) are the primary interfaces that applications
# use to interact with Consul.
EXPOSE 8500 8600 8600/udp  

Learning Consul

This article uses Consul's Docker image directly. A Consul agent can run either as a server or as a client. Each datacenter must have at least one server. The client is a lightweight process used for service registration, health checking, and forwarding queries to the servers. An agent must run on every node in the cluster.

sagiri ➜  ~ docker run -it -d --name agent-1 consul agent  
1030b879f56db461ce2e207d60450d433a4952ed4087e01bf5d3d9bba3cd942e  

Enter the container to inspect the current cluster members; since the image is built on Alpine, the shell is /bin/ash:

sagiri ➜  ~ docker exec -it agent-1 /bin/ash  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
1030b879f56d  172.17.0.2:8301  alive   client  0.9.0  2         dc1  

There are two ways to register a service with Consul. One is through a configuration file, for example:

# /etc/consul.d/web.json
{"service": {"name": "web", "tags": ["rails"], "port": 80}}
# start Consul with -config-dir=/etc/consul.d

The other is through the RESTful API.
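As a sketch, the same web service from the config-file example can be registered through the agent's HTTP API at /v1/agent/service/register (the payload values here are just illustrative, mirroring the JSON above):

```shell
# Service definition mirroring the config-file example (illustrative values)
payload='{"Name": "web", "Tags": ["rails"], "Port": 80}'

# PUT it to the local agent's HTTP API; on success Consul answers 200 with an
# empty body (assumes an agent listening on localhost:8500)
curl -s -X PUT -d "$payload" http://localhost:8500/v1/agent/service/register || true  # ignore failure if no local agent
```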

For queries, Consul provides two interfaces: an HTTP API and a DNS API.
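Both can be sketched as follows, assuming a service named web is registered and the agent is on its default ports (8500 for HTTP, 8600 for DNS):

```shell
# HTTP API: ask the catalog which nodes provide the "web" service
catalog_url="http://localhost:8500/v1/catalog/service/web"
curl -s "$catalog_url" || true  # ignore failure if no local agent

# DNS API: the equivalent lookup through Consul's DNS interface; SRV records
# also carry each instance's port
if command -v dig >/dev/null; then
    dig @127.0.0.1 -p 8600 web.service.consul SRV
fi
```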

When a Consul agent starts, it is isolated. To join an existing cluster, it only needs to know one existing member (not necessarily one in server mode). By exchanging information with that member, it discovers the other members of the cluster.

Let's start an agent in server mode, then join agent-1 to it:

sagiri ➜  ~ docker run -it -d --name agent-2 consul agent -server -bootstrap  
78072cb46967cc3381e0aac03841891778f03250c87a784f01dd028dafd59069  
sagiri ➜  ~ docker inspect --format '{{ .NetworkSettings.IPAddress }}' agent-2  
172.17.0.3  
sagiri ➜  ~ docker exec -it agent-1 /bin/ash  
/ # consul join 172.17.0.3
Successfully joined cluster by contacting 1 nodes.  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
1030b879f56d  172.17.0.2:8301  alive   client  0.9.0  2         dc1  
78072cb46967  172.17.0.3:8301  alive   server  0.9.0  2         dc1  

When an agent exits on SIGINT (e.g. Ctrl-C), Consul considers the node to have left and removes it from the member list. If the agent process is killed forcibly, the other cluster members consider the node failed, so they automatically try to reconnect to it, expecting it to recover.

Let's add one more agent, then delete its container; we will see its status become failed:

sagiri ➜  ~ docker run -it -d --name agent-3 consul agent  
32c142582209524348de15c1a6f48f385d4f8e3c99f1e61ab49d434ff8716c6a  
sagiri ➜  ~ docker exec -it agent-3 /bin/ash  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
32c142582209  172.17.0.4:8301  alive   client  0.9.0  2         dc1  
/ # consul join 172.17.0.3
Successfully joined cluster by contacting 1 nodes.  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
1030b879f56d  172.17.0.2:8301  alive   client  0.9.0  2         dc1  
32c142582209  172.17.0.4:8301  alive   client  0.9.0  2         dc1  
78072cb46967  172.17.0.3:8301  alive   server  0.9.0  2         dc1  
/ # exit
sagiri ➜  ~ docker rm -f agent-3  
agent-3  
sagiri ➜  ~ docker exec -it agent-1 /bin/ash  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
1030b879f56d  172.17.0.2:8301  alive   client  0.9.0  2         dc1  
32c142582209  172.17.0.4:8301  failed  client  0.9.0  2         dc1  
78072cb46967  172.17.0.3:8301  alive   server  0.9.0  2         dc1  

Next, we execute leave in agent-1; its status then shows as left:

/ # consul leave
Graceful leave complete  
sagiri ➜  ~ docker exec -it agent-2 /bin/ash  
/ # consul members
Node          Address          Status  Type    Build  Protocol  DC  
1030b879f56d  172.17.0.2:8301  left    client  0.9.0  2         dc1  
32c142582209  172.17.0.4:8301  failed  client  0.9.0  2         dc1  
78072cb46967  172.17.0.3:8301  alive   server  0.9.0  2         dc1  

Consul supports key/value storage, accessible in two ways: the HTTP API and the consul command-line tool:

/ # consul kv put foo 1
Success! Data written to: foo  
/ # consul kv get foo
1  
/ # consul kv delete foo
Success! Deleted key: foo  
/ # consul kv get foo
Error! No key exists at: foo  
/ # consul kv put redis/config/maxconns 20
Success! Data written to: redis/config/maxconns  
/ # consul kv get redis/config/maxconns
20  

Beyond that, Consul provides an atomic check-and-set operation: by passing the expected ModifyIndex, the write goes through only if the index still matches, and fails otherwise:

/ # consul kv get -detailed redis/config/maxconns
CreateIndex      94  
Flags            0  
Key              redis/config/maxconns  
LockIndex        0  
ModifyIndex      94  
Session          -  
Value            20  
/ # consul kv put -cas --modify-index=94 redis/config/maxconns 25
Success! Data written to: redis/config/maxconns  
/ # consul kv put -cas --modify-index=94 redis/config/maxconns 26
Error! Did not write to redis/config/maxconns: CAS failed  
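The same check-and-set is available over the HTTP API: appending ?cas=&lt;ModifyIndex&gt; to a PUT on /v1/kv/ makes the write conditional, and Consul returns true or false in the response body instead of an error. A sketch, reusing the ModifyIndex of 94 seen above:

```shell
# Conditional write: only succeeds if the key's ModifyIndex is still 94;
# the response body is "true" on success, "false" on a CAS miss
cas_url='http://localhost:8500/v1/kv/redis/config/maxconns?cas=94'
curl -s -X PUT -d '25' "$cas_url" || true  # ignore failure if no local agent
```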

Consul also provides a web management UI:

docker run -it -d -p 8500:8500 --name agent-ui consul agent -ui --client 0.0.0.0 -join 172.17.0.3  

In Practice

The following walks through automating service configuration with Consul, using Confd together with Consul.

First, install Confd:

sagiri ➜  ~ git clone https://github.com/kelseyhightower/confd.git  
sagiri ➜  ~ cd confd  
sagiri ➜  confd git:(master) docker build -t confd_builder -f Dockerfile.build.alpine .  
# omit output
sagiri ➜  confd git:(master) docker run -ti --rm -v $(pwd):/app confd_builder ./build  
Building confd...  
sagiri ➜  confd git:(master) cd bin  
sagiri ➜  bin git:(master) ./confd  
zsh: no such file or directory: ./confd  
sagiri ➜  bin git:(master) ll  
total 21M  
-rwxr-xr-x 1 root root 21M Jul 30 18:23 confd
sagiri ➜  bin git:(master) file confd  
confd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, with debug_info, not stripped  

If running it fails with no such file or directory: ./confd, the musl loader /lib/ld-musl-x86_64.so.1 is probably missing; installing musl (on Arch, yaourt -S musl) resolves this.

Create the following project structure:

.               
├── conf.d      
│   └── config.toml              
└── templates   
    └── demo.conf.tmpl

conf.d/config.toml

[template]
src = "demo.conf.tmpl"  
dest = "/opt/openresty/nginx/conf/nginx.conf"  
keys = ["/nginx",]

reload_cmd = "/opt/openresty/nginx/sbin/nginx -s reload"  

templates/demo.conf.tmpl

worker_processes  1;

events {  
    worker_connections  1024;
}


http {  
    server {
        listen       80;
        location / {
            proxy_pass http://www.{{getv "/nginx/http/server/proxy"}};
        }
    }
}

Start nginx:

/opt/openresty/nginx/sbin/nginx

Start Consul:

consul agent -server -bootstrap -data-dir /tmp/consul  

Start Confd, polling Consul every 2 seconds to check whether the configuration needs updating:

confd -confdir="./" -config-file="./conf.d/config.toml" -interval=2 -backend consul -node localhost:8500  

Since the key does not exist in the k/v store yet, Confd keeps logging ERROR:

2017-07-30T20:45:19+09:00 archlinux confd[29973]: ERROR template: demo.conf.tmpl:12:36: executing "demo.conf.tmpl" at <getv "/nginx/http/se...>: error calling getv: key does not exist: /nginx/http/server/proxy  

Add the record:

sagiri ➜ curl -X PUT -d 'google.com' localhost:8500/v1/kv/nginx/http/server/proxy  
true#  

The log shows that confd has updated nginx.conf and reloaded nginx:

2017-07-30T20:58:47+09:00 archlinux confd[30480]: INFO /opt/openresty/nginx/conf/nginx.conf has md5sum 913ee7c8298ebc1233ed26bee3521abf should be 6fc369b6edb94040b95711d444132884  
2017-07-30T20:58:47+09:00 archlinux confd[30480]: INFO Target config /opt/openresty/nginx/conf/nginx.conf out of sync  
2017-07-30T20:58:48+09:00 archlinux confd[30480]: INFO Target config /opt/openresty/nginx/conf/nginx.conf has been updated