Building a test environment with GoReplay

Posted on 2017-04-11, filed under Study

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes. https://goreplay.org

GoReplay overview: official website and GitHub repository

GoReplay is the simplest and safest way to test your app using real traffic before you put it into production.

As your application grows, the effort required to test it also grows exponentially. GoReplay offers you the simple idea of reusing your existing traffic for testing, which makes it incredibly powerful. Its state-of-the-art technique lets you analyze and record your application traffic without affecting it. This eliminates the risks that come with putting a third-party component in the critical path.

GoReplay increases your confidence in code deployments, configuration changes and infrastructure changes. Did we mention that no coding is required?

Here is the basic workflow: the listener server captures HTTP traffic and either sends it to the replay server or saves it to a file. The replay server forwards the traffic to a given address.

Installation

Download the latest binary from https://github.com/buger/gor/releases or compile it yourself.

Here we use the first method (the prebuilt binary). The detailed steps follow.

1. Download the latest GoReplay release

https://github.com/buger/gor/releases

At the time of writing the latest version is v0.16.0. Here we download the Linux build, gor_0.16.0_x64.tar.gz.

2. Extract the downloaded file

tar -zxf gor_0.16.0_x64.tar.gz

Getting started

The most basic setup will be

sudo ./gor --input-raw :8000 --output-stdout

which acts like tcpdump. If you already have a test environment you can start replaying:

sudo ./gor --input-raw :8000 --output-http http://staging.env

See our documentation and the Getting started page for more info.

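1. Prepare the test environment

Assume the production app is deployed on port 8080 and the test app on port 8090 (same machine), and that both are running normally.

2. Check that GoReplay runs correctly

Start gor, listen for HTTP requests to the production environment, and print them to the console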

sudo ./gor --input-raw :8080 --output-stdout

Click a production link in the browser; GoReplay prints the captured request to its console.

For real use there are two modes to choose from

1. Replaying

Now it’s time to replay your original traffic to another environment. Let’s start the same file web server but on a different port: gor file-server :8001.

Instead of --output-stdout we will use --output-http and provide the URL of the second server:

sudo ./gor --input-raw :8000 --output-http="http://localhost:8001"

2. Saving requests to file and replaying them later

Sometimes it’s not possible to replay requests in real time; Gor allows you to save requests to a file and replay them later.

First use --output-file to save them:

sudo ./gor --input-raw :8000 --output-file=requests.gor

This will create a new file and continuously write all captured requests to it.

Let’s re-run Gor, but now to replay requests from the file:

./gor --input-file requests.gor --output-http="http://localhost:8001"

You should see all the recorded requests coming to the second server, replayed in the same order and with exactly the same timing as they were recorded.

Here we use the Replaying mode. The detailed steps follow.

1. Run GoReplay

sudo ./gor --input-raw :8080 --output-http="http://localhost:8090"

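2. Check the logs of app1 and app2

Both app1 and app2 receive the requests, and normal access to app1 is not affected in any way.

3. Run GoReplay automatically in the background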

sudo nohup ./gor --input-raw :8080 --output-http="http://localhost:8090" &

tail -n 500 -f nohup.out

4. Running as a non-root user

You can enable Gor for non-root users in a secure way with the following commands:

# The following commands assume that the `gor` binary is in /usr/local/bin
addgroup gor
addgroup <username> gor
chgrp gor /usr/local/bin/gor
chmod 0750 /usr/local/bin/gor
setcap "cap_net_raw,cap_net_admin+eip" /usr/local/bin/gor

A brief explanation of the above:

We create a group called gor.
We add the user you want to the new group so they can use gor without sudo.
We change the group of the gor binary to the new group.
We set the permissions on the gor binary so that members of the group can execute it but other normal users cannot.
We use setcap to give the executable the CAP_NET_RAW and CAP_NET_ADMIN privileges when it runs, so that Gor can open its raw socket, which is not normally permitted unless you are root.

In short:

Create a group named gor
Add the relevant user to the gor group
Grant the permissions

The detailed steps follow.

1. Create the gor group

groupadd gor

2. Add the user that will run gor to the gor group

Add the ubuntu user to the gor group

addgroup ubuntu gor

3. Grant the permissions

The compiled gor binary is located at /usr/local/bin/gor

chgrp gor /usr/local/bin/gor

chmod 0750 /usr/local/bin/gor

setcap "cap_net_raw,cap_net_admin+eip" /usr/local/bin/gor
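The result of these three steps can be checked before switching users. The sketch below is not part of the original walkthrough: the helper names are hypothetical, and `stat -c` assumes GNU coreutils on Linux.

```shell
# Helper names are hypothetical; `stat -c` assumes GNU coreutils (Linux).

# Print the octal permission bits of a file, e.g. 750.
file_mode() {
  stat -c '%a' "$1"
}

# Print the owning group of a file, e.g. gor.
file_group() {
  stat -c '%G' "$1"
}

# Succeed only if the file has the group and mode set in the steps above.
check_gor_binary() {
  [ "$(file_group "$1")" = "gor" ] && [ "$(file_mode "$1")" = "750" ]
}
```

A possible use is `check_gor_binary /usr/local/bin/gor && getcap /usr/local/bin/gor`, where `getcap` (from the same libcap tools as `setcap`) prints the capabilities attached to the binary.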

4. Run

Switch to the ubuntu user

su ubuntu

Switch to the gor group

newgrp gor

Change to the gor directory

cd /usr/local/bin

Run gor

./gor --input-raw :8000 --output-stdout

Advanced GoReplay usage

1. Saving and replaying from file

You can save requests to a file and replay them later. While replaying, Gor preserves the original time differences between requests. If you apply percentage-based limiting or rate limiting, the timing between requests is reduced or increased accordingly; this approach opens up possibilities like load testing, see below.

# write to file
gor --input-raw :80 --output-file requests.gor

# read from file
gor --input-file requests.gor --output-http "http://staging.com"

By default Gor writes files in chunks. This is configurable with the --output-file-append option, which controls whether a flushed chunk is appended to an existing file. The default is false, so by default --output-file flushes each chunk to a different path.

gor ... --output-file %Y%m%d.log
# append false
20140608_0.log
20140608_1.log
20140609_0.log
20140609_1.log

This makes parallel file processing easy. If you do not want this behavior, add the --output-file-append option:

gor ... --output-file %Y%m%d.log --output-file-append
# append true
20140608.log
20140609.log

If you run gor multiple times and it finds existing files, it continues from the last known index.

Chunk size

You can set chunk limits with the --output-file-size-limit and --output-file-queue-limit options: the size of each chunk and the length of the chunk queue, respectively. The default values are 32mb and 256, respectively. The suffixes “k” (KB), “m” (MB), and “g” (GB) can be used with --output-file-size-limit. If you want only the size constraint, set --output-file-queue-limit to 0, and vice versa.

gor --input-raw :80 --output-file %Y-%m-%d.gz --output-file-size-limit 256m --output-file-queue-limit 0

Using date variables in file names

For example, to create a new file each hour: --output-file /mnt/logs/requests-%Y-%m-%d-%H.log. This creates requests-2016-06-01-12.log, requests-2016-06-01-13.log, …

The time format is used as part of the file name. The following characters are replaced with actual values when the file is created:

%Y: year including the century (at least 4 digits)
%m: month of the year (01..12)
%d: Day of the month (01..31)
%H: Hour of the day, 24-hour clock (00..23)
%M: Minute of the hour (00..59)
%S: Second of the minute (00..60)

The default format is %Y%m%d%H, which creates one file per hour.
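To get a feel for what a pattern expands to, it can be previewed with `date`. This is only a sketch: it assumes gor's tokens behave like the identically named `date(1)` tokens, and `preview_output_name` is a hypothetical helper, not a gor feature.

```shell
# Assumption: gor's %Y, %m, %d, %H, %M and %S expand like the same tokens in date(1).
# preview_output_name is a hypothetical helper, not part of gor.
preview_output_name() {
  date "+$1"
}

# Prints the name for the current hour, in the shape requests-YYYY-MM-DD-HH.log
preview_output_name "requests-%Y-%m-%d-%H.log"
```

The preview does not include the chunk index (such as `_0`) that gor adds when --output-file-append is not set.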

GZIP compression

To read or write GZIP compressed files ensure that file extension ends with “.gz”: --output-file log.gz

Replaying from multiple files

--input-file accepts a file pattern, for example --input-file logs-2016-05-*. It replays all matching files, sorted in lexicographical order.
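Before replaying, it can help to see which files a pattern matches and in what order. The sketch below assumes that gor's ordering corresponds to a byte-wise sort (`sort` in the C locale); `replay_order` is a hypothetical helper and expects file names without spaces.

```shell
# List the files a pattern matches in byte-wise lexicographical order.
# Assumption: gor's ordering matches `sort` in the C locale.
replay_order() {
  local pattern=$1
  # $pattern is left unquoted on purpose so that the shell expands the glob.
  printf '%s\n' $pattern | LC_ALL=C sort
}
```

For example, `replay_order 'logs-2016-05-*'` prints one file per line. In plain lexicographical order a name ending in `_10` sorts before one ending in `_2`, so it is worth checking the printed order when files carry numeric indexes.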

2. Performance testing

Currently this functionality is supported only by input-file, and only with the percentage-based limiter. Unlike the default limiter, for input-file it does not drop requests; it slows down or speeds up request emission. Note that the limiter is applied to the input:

# Replay from file on 2x speed 
gor --input-file "requests.gor|200%" --output-http "staging.com"
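The percentage applied to input-file changes the pace of the replay rather than the number of requests. As a rough illustration, and assuming the recorded gap between two requests simply scales inversely with the percentage (the helper below is hypothetical and uses integer arithmetic), the effect on timing looks like this:

```shell
# Gap to wait between two replayed requests, in milliseconds.
# $1: gap recorded in the file (ms), $2: limiter percentage (200 means 200%).
# Assumption: the gap scales inversely with the percentage.
scaled_gap_ms() {
  echo $(( $1 * 100 / $2 ))
}

scaled_gap_ms 1000 200   # prints 500: twice as fast
scaled_gap_ms 1000 50    # prints 2000: half speed
```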

Use --stats --output-http-stats to see latency stats.

Looping files for replaying indefinitely

You can loop the same set of files: when the last file has replayed all its requests, Gor does not stop but starts again from the first one. This lets you do extensive performance testing with only a small number of recorded requests. Pass --input-file-loop to make it work.

3. Rate limiting

Rate limiting can be useful if you only want to forward parts of incoming traffic, for example, to not overload your test environment. There are two strategies: dropping random requests or dropping fractions of requests based on Header or URL param value.

Dropping random requests

Every input and output supports random rate limiting. There are two limiting algorithms: absolute and percentage based.

Absolute: once the specified request limit is reached within the current second, the remaining requests are discarded; the counter resets on the next second.

Percentage: for input-file it slows down or speeds up request execution; for everything else it uses a random generator to decide whether a request passes, based on the chance you specified.

You can specify the desired limit with the “|” operator after the server address; see the examples below.

Limiting replay using absolute number

# staging.server will not get more than ten requests per second
gor --input-tcp :28020 --output-http "http://staging.com|10"

Limiting listener using percentage based limiter

# replay server will not get more than 10% of requests 
# useful for high-load environments
gor --input-raw :80 --output-tcp "replay.local:28020|10%"
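The two examples differ only in what follows the “|”. The sketch below splits such a value and names the limiter kind; `describe_limit` is a hypothetical helper that mirrors the syntax shown above and is not how gor itself parses its options.

```shell
# Split "address|limit" into its parts and name the limiter kind.
# describe_limit is hypothetical; it only mirrors the syntax described above.
describe_limit() {
  local spec=$1 addr limit
  case "$spec" in
    *"|"*) addr=${spec%|*}; limit=${spec##*|} ;;
    *)     echo "$spec unlimited"; return ;;
  esac
  case "$limit" in
    *%) echo "$addr percentage ${limit%"%"}" ;;   # trailing % means percentage based
    *)  echo "$addr absolute $limit" ;;           # plain number means requests per second
  esac
}

describe_limit "http://staging.com|10"     # prints: http://staging.com absolute 10
describe_limit "replay.local:28020|10%"    # prints: replay.local:28020 percentage 10
```

Because “|” is also the shell's pipe character, the whole value has to be quoted on the command line, as in the examples above.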

Consistent limiting based on Header or URL param value

If you have a unique user ID (such as an API key) stored in a header or URL parameter, you can consistently forward a specified percentage of traffic for only a fraction of these users. The basic formula looks like this: FNV32-1A_hashing(value) % 100 >= chance. Examples:

# Limit based on header value
gor --input-raw :80 --output-tcp "replay.local:28020|10%" --http-header-limiter "X-API-KEY: 10%"

# Limit based on URL param value
gor --input-raw :80 --output-tcp "replay.local:28020|10%" --http-param-limiter "api_key: 10%"

When limiting based on a header or param, only percentage-based limiting is supported.
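The consistency comes from hashing: the same value always lands in the same bucket between 0 and 99. The bash sketch below implements 32-bit FNV-1a for ASCII values. It reads the formula above as the condition under which a request is dropped, so a value is forwarded when its bucket is below the percentage; that reading is an assumption, and the function names are hypothetical.

```shell
# 32-bit FNV-1a hash of an ASCII string, printed as an unsigned decimal (bash).
fnv32a() {
  local s=$1 h=2166136261 i byte
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v byte '%d' "'${s:i:1}"                 # character code of the i-th char
    h=$(( ((h ^ byte) * 16777619) & 0xFFFFFFFF ))   # XOR, multiply by the FNV prime, keep 32 bits
  done
  echo "$h"
}

# Bucket in 0..99 that a header or param value falls into.
limiter_bucket() {
  echo $(( $(fnv32a "$1") % 100 ))
}

# Assumption: a value is forwarded when its bucket is below the percentage.
is_forwarded() {
  [ "$(limiter_bucket "$1")" -lt "$2" ]
}
```

With this reading, every request carrying a given API key is either always forwarded or always dropped for a fixed percentage, which is what keeps a user's replayed session complete.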

4. Request filtering

Filtering is useful when you need to capture only a specific part of the traffic, such as API requests. It is possible to filter by URL, HTTP header or HTTP method.

Allow url regexp

# only forward requests being sent to the /api endpoint
gor --input-raw :8080 --output-http staging.com --http-allow-url /api

Disallow url regexp

# only forward requests NOT being sent to the /api... endpoint
gor --input-raw :8080 --output-http staging.com --http-disallow-url /api

Filter based on regexp of header

# only forward requests with an api version of 1.0x
gor --input-raw :8080 --output-http staging.com --http-allow-header 'api-version:^1\.0\d'

# only forward requests NOT containing User-Agent header value "Replayed by Gor"
gor --input-raw :8080 --output-http staging.com --http-disallow-header "User-Agent: Replayed by Gor"

Filter based on HTTP method

Requests not matching a specified whitelist can be filtered out. For example to strip non-nullipotent requests:

gor --input-raw :80 --output-http "http://staging.server" \
    --http-allow-method GET \
    --http-allow-method OPTIONS

5. Request rewriting

Gor supports rewriting of URLs, URL params and headers, see below.

Rewriting may be useful if your test environment does not have the same data as production and you want to perform all actions in the context of a test user: for example, rewrite all API tokens to some test value. Other possible use cases are toggling features on and off with custom headers, or rewriting URLs if they changed in the new environment.

For more complex logic you can use Middleware.

Rewrite URL based on a mapping

--http-rewrite-url expects a value in “<regexp>:<replace>” format, where “:” is the delimiter. In the <replace> section you may use captured regexp group values. This works like the replace method in JavaScript or gsub in Ruby.

# Rewrites all `/v1/user/<user_id>/ping` requests to `/v2/user/<user_id>/ping`
gor --input-raw :8080 --output-http staging.com --http-rewrite-url '/v1/user/([^\\/]+)/ping:/v2/user/$1/ping'
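The effect of such a rule can be tried out with `sed` before handing it to gor. This is only an approximation of the rewrite, using a slightly simplified character class; `sed` writes the first capture group as `\1` where the gor rule uses `$1`, and `rewrite_url` is a hypothetical helper.

```shell
# Apply an equivalent pattern and replacement with sed to see what the rule does.
# sed writes the first capture group as \1 where the gor rule uses $1.
rewrite_url() {
  printf '%s\n' "$1" | sed -E 's#/v1/user/([^/]+)/ping#/v2/user/\1/ping#'
}

rewrite_url "/v1/user/42/ping"   # prints /v2/user/42/ping
rewrite_url "/v1/status"         # prints /v1/status (no match, unchanged)
```

On the gor command line the rule should be wrapped in single quotes, otherwise the shell interprets the parentheses and expands `$1` itself.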

Set URL param

Set request url param, if param already exists it will be overwritten.

gor --input-raw :8080 --output-http staging.com --http-set-param api_key=1

Set Header

Set request header, if header already exists it will be overwritten. May be useful if you need to identify requests generated by Gor or enable feature flagged functionality in an application:

gor --input-raw :80 --output-http "http://staging.server" \
    --http-header "User-Agent: Replayed by Gor" \
    --http-header "Enable-Feature-X: true"

Host header

The Host header gets special treatment. By default Host is set to the value specified in --output-http. If you manually set --http-header "Host: another.com", Gor will not override the Host value.

If your app accepts traffic from multiple domains and you want to keep the original headers, the --http-original-host option tells Gor not to touch the Host header at all.

6. Middleware

Overview

Middleware is a program that accepts request and response payloads on STDIN and emits modified requests on STDOUT. You can implement any custom logic, such as stripping private data, advanced rewriting, or support for OAuth. See the examples included in the repository.

                  Original request      +--------------+
+-------------+----------STDIN---------->+              |
|  Gor input  |                          |  Middleware  |
+-------------+----------STDIN---------->+              |
                   Original response (1) +------+---+---+
                                                |   ^
+-------------+    Modified request             v   |
| Gor output  +<---------STDOUT-----------------+   |
+-----+-------+                                     |
      |                                             |
      |            Replayed response                |
      +------------------STDIN----------------->----+

(1): Original responses will only be sent to the middleware if the --input-raw-track-response option is specified.

Middleware can be written in any language; see the examples/middleware folder for examples. A middleware program must accept that all communication with Gor is asynchronous: there is no guarantee that the original request and response messages arrive one after the other. Your app has to keep track of state if its logic depends on the original or replayed response; see examples/middleware/token_modifier.go for an example.

A simple bash echo middleware (it returns the same request) looks like this:

# Read one hex-encoded message per line and write it back unchanged
while IFS= read -r line; do
  echo "$line"
done

Middleware can be enabled using --middleware option, by specifying path to executable file:

gor --input-raw :80 --middleware "/opt/middleware_executable" --output-http "http://staging.server"

Communication protocol

All messages should be hex encoded. A newline character marks the end of a message, i.e. there is one message per line.

The decoded payload consists of 2 parts: a header and the HTTP payload, separated by a newline character.

Example request payload:

1 932079936fa4306fc308d67588178d17d823647c 1439818823587396305
GET /a HTTP/1.1
Host: 127.0.0.1

Example response payload (note: you will only receive this if you specify --input-raw-track-response)

2 8e091765ae902fef8a2b7d9dd960e9d52222bd8c 1439818823587996305 2782013
HTTP/1.1 200 OK
Date: Mon, 17 Aug 2015 13:40:23 GMT
Content-Length: 0
Content-Type: text/plain; charset=utf-8

The header contains request meta information separated by spaces. The first value is the payload type; possible values are 1 - request, 2 - original response, 3 - replayed response. Next comes the request ID: it is unique among all requests (sha1 of time and Ack) but stays the same for the original and replayed response, so you can associate requests with responses. The third value is the time when the request/response was initiated/received. The fourth value is populated only for responses and holds the latency.

The HTTP payload is the unmodified HTTP request or response intercepted from the network. You can modify the payload however you like: add headers, change the path, and so on. You are essentially editing a string; just make sure the result stays RFC compliant.

At the end, the modified (or untouched) request should be emitted back to STDOUT, hex-encoded and with the original header kept. If you want to filter a request out, simply do not send it. Emitting responses back is required, even if you did not touch them.
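To make the protocol concrete, here is a sketch of a pass-through middleware in bash that decodes the meta header of every message, logs its type and ID to STDERR, and writes the message back unchanged. The function names are hypothetical; it assumes bash (for `\xHH` escapes in `printf` and for here-strings) and the standard `od`, `sed`, `tr` and `head` tools.

```shell
# Pass-through middleware sketch for bash. Function names are hypothetical.
# Decoded payloads are only piped, never stored: bash variables cannot hold NUL bytes.

# Raw bytes on STDIN -> hex string on STDOUT.
hex_encode() {
  od -An -v -tx1 | tr -d ' \n'
}

# Hex string in $1 -> raw bytes on STDOUT (relies on bash's printf \xHH escapes).
hex_decode() {
  printf '%b' "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# Name the payload type found in the first field of the meta header.
type_label() {
  case "$1" in
    1) echo "request" ;;
    2) echo "original response" ;;
    3) echo "replayed response" ;;
    *) echo "unknown" ;;
  esac
}

# Read hex-encoded messages line by line, log type and ID, emit them unchanged.
run_middleware() {
  local line header type id rest
  while IFS= read -r line; do
    header=$(hex_decode "$line" | head -n 1)   # first decoded line is the meta header
    read -r type id rest <<< "$header"
    echo "$(type_label "$type") $id" >&2       # diagnostics go to STDERR only
    printf '%s\n' "$line"                      # the message goes back to STDOUT as received
  done
}
```

Saved as an executable script that ends with a call to `run_middleware`, this could be passed to the --middleware option. Anything written to STDOUT is treated as a message, which is why the log line goes to STDERR.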

Advanced example

Imagine that you have an auth system that randomly generates access tokens, which are later used to access secure content. Since there is no predefined token value, a naive approach without middleware (or with middleware that uses only request payloads) will fail, because the replayed server has its own tokens, not synced with the origin. To fix this, the middleware should take into account the responses of both the replayed and the origin server, store originalToken -> replayedToken aliases, and rewrite all requests that use an original token to use the replayed alias. See examples/middleware/token_modifier.go and middleware_test.go#TestTokenMiddleware for an example of this scheme.
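The alias bookkeeping at the heart of this scheme can be sketched in a few lines of bash. This is not the logic of token_modifier.go: the names are hypothetical, it needs bash 4 or later for associative arrays, and it leaves out the part that extracts tokens from the original and replayed responses and pairs them by request ID.

```shell
# Requires bash 4+ for associative arrays. All names are hypothetical.
declare -A token_alias   # original token -> token issued by the replayed server

# Record a pair once both the original and the replayed response have been seen.
remember_alias() {
  token_alias["$1"]=$2
}

# Replace every known original token in a request payload with its alias.
rewrite_tokens() {
  local payload=$1 orig
  for orig in "${!token_alias[@]}"; do
    payload=${payload//"$orig"/${token_alias[$orig]}}
  done
  printf '%s\n' "$payload"
}

remember_alias "abc123" "xyz789"
rewrite_tokens "GET /secure?token=abc123 HTTP/1.1"   # prints GET /secure?token=xyz789 HTTP/1.1
```

Requests that carry no known token pass through unchanged, so the same function can be applied to every request payload.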

7. Distributed configuration

Sometimes it makes sense to use a separate Gor instance for replaying traffic and for tasks like load testing, so that your production machines do not spend precious resources. You can configure Gor on your web machines to forward traffic to a Gor aggregator instance running on a separate server.

# Run on servers where you want to catch traffic. You can run it on each `web` machine.
sudo gor --input-raw :80 --output-tcp replay.local:28020

# Replay server (replay.local).
gor --input-tcp replay.local:28020 --output-http http://staging.com

If you have multiple replay machines, you can split traffic among them with the --split-output option: it splits all incoming traffic equally across all outputs using a round-robin algorithm.

gor --input-raw :80 --split-output --output-tcp replay1.local:28020 --output-tcp replay2.local:28020
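Round robin here means that consecutive messages go to consecutive outputs, wrapping around at the end of the list. A minimal sketch of that selection rule, with a hypothetical helper name:

```shell
# Pick the output for the n-th message (0-based) in round-robin fashion.
# Usage: pick_output <n> <output>...
pick_output() {
  local n=$1
  shift                       # the remaining arguments are the outputs
  local i=$(( n % $# ))       # wrap around the number of outputs
  shift "$i"
  echo "$1"
}

pick_output 0 replay1.local:28020 replay2.local:28020   # prints replay1.local:28020
pick_output 1 replay1.local:28020 replay2.local:28020   # prints replay2.local:28020
pick_output 2 replay1.local:28020 replay2.local:28020   # prints replay1.local:28020
```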

If you are planning a large load test, consider using a separate master instance that controls the Gor worker instances which actually replay the traffic. For example:

# This command reads multiple log files, replays them at 10x speed, loops them if needed for 30 seconds, and distributes the traffic (TCP session aware) among multiple workers
gor --input-file "logs_from_multiple_machines.*|1000%" --input-file-loop --exit-after 30s --recognize-tcp-sessions --split-output --output-tcp worker1.local:27017 --output-tcp worker2.local:27017 --output-tcp worker3.local:27017 ...  --output-tcp workerN.local:27017

# worker 
gor --input-tcp :27017 --output-http load_test.target

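Appendix: setting up a Go environment

GoReplay is written in Go. To compile the GoReplay binary yourself you need a Go environment; the steps below set one up.

If you use the officially prebuilt binary, as in this walkthrough, no Go environment is needed.

Download the Linux version of Go

https://golang.org/dl/

At the time of writing the latest official version is go1.8.1; download the file go1.8.1.linux-amd64.tar.gz

Extract it

tar -C /usr/local -xzf go1.8.1.linux-amd64.tar.gz

Set the environment variable

export PATH=$PATH:/usr/local/go/bin

For more details, visit the official Go website: https://golang.org/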
