
openresty worst practices


This week was mostly spent selecting an API gateway. Last week I had tinkered with Kong on my own, but it felt like a poor fit for our current business — adopting it would require heavy modification. So I did another round of research and narrowed the field to two directions: a Golang implementation, or nginx + Lua.

A treacherous benchmark

The load-testing tool was wrk. At first yours truly tested on my own dev machine, with the API gateway and the backend both on the same box and wrk hammering it locally. After some fiddling, this localhost-to-localhost setup felt unsound, so I borrowed a colleague's dev machine to model things more realistically: wrk on my local Mac, my dev machine doing the forwarding, and my colleague's dev machine as the backend.

Then something bizarre happened. (The numbers below were re-run today for the record; the originals were not saved.)

➜  ~ wrk -c600 -t2 -d60s http://10.10.10.24 -H "Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.VAoRL1IU0nOguxURF2ZcKR0SGKE1gCbqwyh8u2MLAyY"
Running 1m test @ http://10.10.10.24
  2 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   291.87ms  419.92ms   1.97s    80.58%
    Req/Sec   152.08    181.56   737.00     78.61%
  13914 requests in 1.00m, 6.67MB read
  Socket errors: connect 0, read 0, write 0, timeout 4511
  Non-2xx or 3xx responses: 3977
Requests/sec:    231.56
Transfer/sec:    113.59KB

A QPS of only 231 — rock bottom. On closer inspection, 3,977 of the requests were abnormal:

access.log

192.168.0.107 - - [21/Jul/2017:09:20:08 +0800] "GET / HTTP/1.1" 499 0 "-" "-"
192.168.0.107 - - [21/Jul/2017:09:20:08 +0800] "GET / HTTP/1.1" 499 0 "-" "-"

error.log

2017/07/21 09:19:43 [error] 25176#0: *16072 connect() failed (110: Connection timed out) while connecting to upstream, client: 192.168.0.107, server: , request: "GET / HTTP/1.1", upstream: "http://10.10.10.157:8080/", host: "10.10.10.24"
2017/07/21 09:19:43 [error] 25176#0: *16069 connect() failed (110: Connection timed out) while connecting to upstream, client: 192.168.0.107, server: , request: "GET / HTTP/1.1", upstream: "http://10.10.10.157:8080/", host: "10.10.10.24"

The error log shows that connecting to the upstream timed out, and the 499s in the access log are nginx's code for the client (here, wrk) closing the connection before a response arrived. My guess at the time was that there were simply too many connections and the upstream could not respond fast enough, so I raised proxy_connect_timeout, proxy_read_timeout, and friends — which accomplished nothing at all.

I then tried lowering the number of open connections (wrk's -c flag), stepping it up from 10 to 60. QPS climbed steadily, reaching about 800 requests/sec at 60. Emboldened, I jumped straight to 100, planning to binary-search for the maximum it could sustain. Damn — only a dozen or so. That made no sense, so I dropped back to 70: still around a dozen, with the logs full of the errors above. Back down to 60: barely over 100, where moments earlier it had done 800+. It felt like this wasn't openresty's fault. Meanwhile traefik, the Golang reverse proxy we were also testing, performed respectably — not as fast as nginx, but at roughly 70% of its throughput. That day we very nearly abandoned openresty and committed to building in Golang.

Still, something felt off. openresty may be niche, but real companies run it in production; its performance could not possibly be this bad. So I described this strange episode to our technical director. He was just as baffled, but seasoned veteran that he is, he wondered whether the office intranet was the problem, handed me four aliyun machines, and told me to benchmark again.
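
For the record, the timeout tuning amounted to raising these directives inside the http block — the same values that ended up in the appendix configs below (a sketch of the change, not the exact file from that day):

# raised in the hope that slow upstream connects were the bottleneck;
# it made no measurable difference
keepalive_timeout 600;
proxy_connect_timeout 600;
proxy_read_timeout 600;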

After a tedious round of compiling and configuration, openresty seemed to show normal performance at last. The topology and test data follow.

        backend1 2core/2G           backend2 2core/2G
                \                        /
                 \                      /
               openresty/nginx/traefik/echo  (forwarding)
                            |
                            |
                          client

openresty forwarding

➜  ~ wrk -c600 -t2 -d60s http://192.168.8.16 -H "Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.VAoRL1IU0nOguxURF2ZcKR0SGKE1gCbqwyh8u2MLAyY"
Running 1m test @ http://192.168.8.16
  2 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   104.33ms   10.05ms 367.00ms   85.91%
    Req/Sec     2.88k   312.83     5.70k    85.83%
  343575 requests in 1.00m, 374.19MB read
Requests/sec:   5723.93
Transfer/sec:      6.23MB

nginx forwarding

➜  ~ wrk -c600 -t2 -d60s http://192.168.8.16 -H "Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.VAoRL1IU0nOguxURF2ZcKR0SGKE1gCbqwyh8u2MLAyY"
Running 1m test @ http://192.168.8.16
  2 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   116.33ms  163.45ms   1.44s    93.53%
    Req/Sec     3.61k     0.88k    6.58k    72.83%
  430983 requests in 1.00m, 349.36MB read
  Socket errors: connect 0, read 0, write 0, timeout 106
Requests/sec:   7180.68
Transfer/sec:      5.82MB

traefik forwarding

➜  ~ wrk -c600 -t2 -d60s http://192.168.8.16:8090 -H "Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.VAoRL1IU0nOguxURF2ZcKR0SGKE1gCbqwyh8u2MLAyY"
Running 1m test @ http://192.168.8.16:8090
  2 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   145.90ms   59.36ms 684.21ms   75.25%
    Req/Sec     2.08k   492.73     3.53k    71.81%
  248381 requests in 1.00m, 187.13MB read
Requests/sec:   4137.66
Transfer/sec:      3.12MB

echo forwarding

➜  ~ wrk -c600 -t2 -d60s http://192.168.8.16:1323 -H "Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.VAoRL1IU0nOguxURF2ZcKR0SGKE1gCbqwyh8u2MLAyY"
Running 1m test @ http://192.168.8.16:1323
  2 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   172.37ms   53.05ms 623.39ms   76.56%
    Req/Sec     1.75k   490.84     3.02k    66.67%
  208662 requests in 1.00m, 157.21MB read
Requests/sec:   3475.66
Transfer/sec:      2.62MB
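
Putting the four Requests/sec figures from the runs above side by side:

nginx      7180.68
openresty  5723.93
traefik    4137.66
echo       3475.66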

Overall, openresty's forwarding performance is second only to nginx itself, and it is easy to extend with custom logic, which should keep up with our growing business needs. The catch is that using openresty means learning Lua, and Lua, for us, seems useful only for openresty, whereas Golang could also serve other projects. After some back and forth we decided to jump into the openresty pit anyway.

Attached below are the configurations used in the tests. The openresty one verifies a JWT in access_by_lua before proxying; the nginx, traefik, and echo setups do plain forwarding.

openresty

worker_processes 1;
worker_rlimit_nofile 200000;

events {
    worker_connections 10000;
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_requests 100000;
    types_hash_max_size 2048;
    keepalive_timeout 600;
    proxy_connect_timeout 600;
    proxy_read_timeout 600;
    open_file_cache max=200000 inactive=300s;
    open_file_cache_valid 300s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    upstream backend {
        server 192.168.8.10;
        server 192.168.8.17;
    }
    server {
        listen 80;

        location / {
            access_by_lua '
                local cjson = require "cjson"
                local jwt = require "resty.jwt"

                local jwt_token = ngx.req.get_headers()["Token"]
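                -- "lua-resty-jwt" below is the shared HS256 secret the
                -- benchmark token was signed with (see the signing sketch
                -- after this config)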
                local jwt_obj = jwt:verify("lua-resty-jwt", jwt_token)
                -- token verified: expose the decoded claims in a response
                -- header and fall through to proxy_pass
                if jwt_obj.verified == true then
                    ngx.header["User-Info"] = cjson.encode(jwt_obj)
                else
                    ngx.exit(ngx.HTTP_FORBIDDEN)
                end
            ';
            proxy_pass http://backend;
        }
    }
}
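
The Token header in every wrk command above is an HS256 JWT with payload {"foo":"bar"}, signed with the same "lua-resty-jwt" secret the gateway verifies against. To regenerate one, a minimal sketch using the same lua-resty-jwt library (run from any openresty handler; ngx.say is only there to print the result):

local jwt = require "resty.jwt"

-- sign a test token with the same shared secret the gateway checks
local token = jwt:sign(
    "lua-resty-jwt",
    {
        header = { typ = "JWT", alg = "HS256" },
        payload = { foo = "bar" },
    }
)
ngx.say(token)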

nginx configuration

worker_processes 1;
worker_rlimit_nofile 200000;

events {
    worker_connections 10000;
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_requests 100000;
    types_hash_max_size 2048;
    keepalive_timeout 600;
    proxy_connect_timeout 600;
    proxy_read_timeout 600;
    open_file_cache max=200000 inactive=300s;
    open_file_cache_valid 300s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    upstream backend {
        server 192.168.8.10;
        server 192.168.8.17;
    }
    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }
}

traefik configuration

MaxIdleConnsPerHost = 100000
defaultEntryPoints = ["http"]
# logLevel = "DEBUG"

[entryPoints]
  [entryPoints.http]
  address = ":8090"

[file]
  [backends]
    [backends.httpecho]
      [backends.httpecho.servers.server1]
        url = "http://192.168.8.10"
        weight = 1
      [backends.httpecho.servers.server2]
        url = "http://192.168.8.17"
        weight = 1
  [frontends]
    [frontends.fe1]
      backend = "httpecho"

server.go

package main

import (
    "net/url"

    "github.com/labstack/echo"
    "github.com/labstack/echo/middleware"
)

func main() {
    e := echo.New()

    // Setup proxy
    url1, err := url.Parse("http://192.168.8.10")
    if err != nil {
        e.Logger.Fatal(err)
    }
    url2, err := url.Parse("http://192.168.8.17")
    if err != nil {
        e.Logger.Fatal(err)
    }
    // Round-robin across the two backends, mirroring the nginx upstream block
    e.Use(middleware.Proxy(&middleware.RoundRobinBalancer{
        Targets: []*middleware.ProxyTarget{
            {URL: url1},
            {URL: url2},
        },
    }))

    // e.Start blocks; log fatally if the listener fails
    e.Logger.Fatal(e.Start(":1323"))
}