Down the Rabbit Hole: A Story of One varnishreload Error, Part 1

After hitting the keyboard buttons for the past 20 minutes, as if he was typing for his life, ghostinushanka turns to me with a half-mad look in his eyes and a sly smile, “Dude, I think I got it.

Look at this,” he says, pointing to one of the characters on screen. “I bet my red hat that if we add what I’ve just sent you here,” and he points to another place in the code, “there will be no error anymore.” Slightly puzzled and tired, I modify the sed expression we’ve been figuring out for some time now, save the file and run systemctl reload varnish. The error message is gone…

“Those emails I’ve exchanged with the candidate,” my colleague continues, as his smile changes to a wide and genuine grin, “It suddenly struck me that this is the very same exact problem!”

How it all began

This article assumes some familiarity with bash, awk and systemd. Some knowledge of Varnish is beneficial, but not required. Timestamps in example snippets have been redacted. Co-authored with ghostinushanka.

Sun shines through the wall-sized windows on yet another warm autumn morning, a cup of freshly brewed caffeinated liquid sits to the side of the keyboard, headphones vocalize the beloved symphony of sounds covering the rustle of mechanical keyboards around and the first entry in backlog on kanban board playfully displays the fateful ticket’s title “Investigate varnishreload sh: echo: I/O error in staging”. Whenever Varnish is concerned, there is no room for error(s), even though this particular one didn’t seem to be causing any actual problems.

For those of you unacquainted with varnishreload, it is simply a shell script used to reload the configuration — also called the VCL — of the Varnish caching server.

As the ticket's title hints, the error has been encountered on one of the staging machines and I was pretty sure the Varnish routing does work in the staging environment, so my assumption was that this has to be some minor issue. Just a user-friendly output message written to a closed stream. I grab the ticket, firmly believing I'll be able to mark it resolved in under 30 minutes, pat myself on the back for clearing yet another mundane task and get back to more important things.

Hitting the wall at 200kph

Opening the varnishreload file on one of the affected servers running on Debian Stretch, I find a shell script less than 200 lines long. Briefly reading through it, I see nothing dangerous that would prevent me from running the script from the terminal over and over again. After all, this is staging; even if it breaks, no one is going to complain, well… not too much, that is. I run the script and observe, only to find out that there are no errors to be seen. After a couple more runs to make reasonably sure that I cannot reproduce the error without extra effort, I start devising plans to tweak and bend the script's environment. Does closing STDOUT for the script altogether (with >&-) help anything? Or STDERR (with 2>&-)? Neither did.
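
As a rough illustration of that kind of experiment (not the exact commands used during the investigation), a child shell can be started with STDOUT closed and its complaint captured from STDERR. The wording of the message depends on which shell /bin/sh happens to be, so no particular text should be expected:

```shell
#!/bin/sh
# Sketch: run a child shell whose STDOUT is closed and capture what it
# writes to STDERR. Redirection order matters: STDERR is first pointed
# at the capturing pipe, and only then is STDOUT closed.
err=$(sh -c 'echo hello' 2>&1 >&-)
status=$?

printf 'exit status: %s\n' "$status"
printf 'stderr said: %s\n' "$err"
```

The same pattern with 2>&- in place of >&- closes STDERR instead.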

Obviously systemd mangles the environment in some way, but how, and… why? I fire up vim and edit the system’s varnishreload, adding set -x right under the shebang, hoping that the detailed script run output will shed some light.

The file is patched, so I reload varnish, only to see that the change has completely broken the script… The output is a complete mess displaying tons of C-style code, and the default scrollback buffer is not enough to find where it comes from. I feel confused. Could setting the debug option for the shell script break the program it calls? No, can’t be. A bug in the shell? Multiple possible scenarios are running wildly in different directions in my mind. The cup of caffeinated beverage is instantly finished; a quick trip to the kitchen for a refill and here we go again. I open the file and look closely at the shebang: #!/bin/sh.

But /bin/sh is surely just a symlink to bash, so that the script is interpreted in POSIX-compliant mode, right? Wrong! The default non-interactive shell on Debian is dash, and that's exactly what /bin/sh points at.

# ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Jan 24  2017 /bin/sh -> dash
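
The same check can be scripted. This is a small sketch that assumes a readlink supporting -f (GNU coreutils and BusyBox both provide it); the resolved path will differ between systems:

```shell
#!/bin/sh
# Sketch: follow the /bin/sh symlink chain to the real interpreter.
# `readlink -f` is assumed to be available on the machine.
real_sh=$(readlink -f /bin/sh)
printf '/bin/sh resolves to %s\n' "$real_sh"
```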

If only for debugging, I changed the shebang to #!/bin/bash, removed the set -x and tried again. Finally, a reasonable error output from the next varnish reload:

Jan 01 12:00:00 hostname varnishreload[32604]: /usr/sbin/varnishreload: line 124: echo: write error: Broken pipe
Jan 01 12:00:00 hostname varnishreload[32604]: VCL 'reload_20190101_120000_32604' compiled

Line 124, now we're talking!

114 find_vcl_file() {
115         VCL_SHOW=$(varnishadm vcl.show -v "$VCL_NAME" 2>&1) || :
116         VCL_FILE=$(
117                 echo "$VCL_SHOW" |
118                 awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
119                         # all this ceremony to handle blanks in FILE
120                         read -r DELIM VCL_SHOW INDEX SIZE FILE
121                         echo "$FILE"
122                 }
123         ) || :
124
125         if [ -z "$VCL_FILE" ]
126         then
127                 echo "$VCL_SHOW" >&2
128                 fail "failed to get the VCL file name"
129         fi
130
131         echo "$VCL_FILE"
132 }

But as it turns out, line 124 is pretty uneventful. I could only conjecture that the error was produced as part of the multiline command executing at line 116.

So what does the above subshell even produce to store in the VCL_FILE variable? In the first part it sends the contents of the VCL_SHOW variable, created on line 115, into the pipe. What happens there, then?

First, it uses varnishadm, which is a standard part of a Varnish installation used to configure Varnish without having to restart it. The subcommand vcl.show -v is used to print the entire VCL configuration specified by ${VCL_NAME} to STDOUT.

To display the current active VCL config as well as several previous versions of the varnish routing that are still in memory, you can use another command varnishadm vcl.list, whose output would be similar to the below:

discarded   cold/busy          1 reload_20190101_120000_11903
discarded   cold/busy          2 reload_20190101_120000_12068
discarded   cold/busy         16 reload_20190101_120000_12259
discarded   cold/busy         16 reload_20190101_120000_12299
discarded   cold/busy         28 reload_20190101_120000_12357
active      auto/warm         32 reload_20190101_120000_12397
available   auto/warm          0 reload_20190101_120000_12587

The variable ${VCL_NAME} is set elsewhere in the varnishreload script to the name of the currently active VCL, if any. In this case, that would be "reload_20190101_120000_12397".
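
How varnishreload itself derives that name is not shown here. One way it could be done, as a sketch with a hypothetical helper and a shortened copy of the listing above, is to pick the last field of the line whose first field is "active":

```shell
#!/bin/sh
# Sketch: extract the active VCL name from `vcl.list`-style output.
# The sample text is an excerpt of the listing above; on a live system
# it would come from `varnishadm vcl.list` instead.
VCL_LIST='discarded   cold/busy          1 reload_20190101_120000_11903
active      auto/warm         32 reload_20190101_120000_12397
available   auto/warm          0 reload_20190101_120000_12587'

# Hypothetical helper, not taken from varnishreload.
active_vcl_name() {
    printf '%s\n' "$1" | awk '$1 == "active" {print $NF; exit}'
}

VCL_NAME=$(active_vcl_name "$VCL_LIST")
printf '%s\n' "$VCL_NAME"
```

The column layout of vcl.list differs between Varnish versions, so taking the last field is an assumption that fits the output shown in this article.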

Great, so ${VCL_SHOW} now contains a full configuration for Varnish, easy enough so far. Now I finally understood why the dash output with set -x appeared to be so broken — it included the contents of the resulting varnish configuration.
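
The effect is easy to reproduce in isolation. In this sketch a three-line stand-in plays the role of the full VCL; with tracing on, the shell writes the expanded command, value included, to STDERR:

```shell
#!/bin/sh
# Sketch: with `set -x`, the shell prints every expanded command to
# STDERR, so a variable holding a whole document floods the trace.
BIG_VALUE='sub vcl_recv {
    return (pass);
}'

# The child shell receives the value as $1. Its STDOUT is discarded so
# that only the trace (written to STDERR) ends up in the capture.
trace=$(sh -c 'set -x; echo "$1" >/dev/null' sh "$BIG_VALUE" 2>&1)

printf '%s\n' "$trace"
```

The quoting style of the trace differs between shells, but in each case the whole value appears in it.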

The important thing here is that the full VCL config may often be spliced together from multiple files. C-style comments are used to delineate where config files were included into other config files, which is exactly what the next line of the code snippet is all about.

The syntax of the file-denoting comments has the following format:

// VCL.SHOW <NUMBER> <NUMBER> <FILENAME>

The numbers are not important here, what we’re interested in is the filename.

So what in the world is happening in the slew of commands beginning on line 116? Let's pick it apart. There are four parts to the command:

  1. A simple echo that prints out the value of ${VCL_SHOW}:

    echo "$VCL_SHOW"

  2. awk that looks for a line (record) where the first field is "//" and the second is "VCL.SHOW". Awk is instructed to print the first line matching these patterns and then immediately stop processing:

    awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}'

  3. A code block that reads the whitespace-delimited fields into five variables. The fifth variable, FILE, gets the remainder of the line. Finally, one last echo prints the contents of the ${FILE} variable:

    { read -r DELIM VCL_SHOW INDEX SIZE FILE; echo "$FILE"; }

  4. As steps 1 through 3 are all encased in a command substitution, which runs in a subshell, the output of $FILE will end up in the variable VCL_FILE.

As the comment on line 119 suggests, this way of doing things serves a single purpose: to reliably handle the case where VCL would be referencing filenames with spaces.
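
The four parts can be tried outside of varnishreload. This sketch wraps them in a hypothetical function and feeds it made-up input, to show that a file name containing blanks comes out in one piece:

```shell
#!/bin/sh
# Sketch: the echo | awk | { read; echo; } chain on sample input.
# The input and the function name are invented for illustration.
VCL_SHOW='Text
// VCL.SHOW 0 1578 file with 3 spaces.vcl
More text
// VCL.SHOW 0 1578 file.vcl'

extract_vcl_file() {
    printf '%s\n' "$1" |
    awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
        # The last variable named to `read` receives the rest of the
        # line, embedded blanks included.
        read -r DELIM TAG INDEX SIZE FILE
        printf '%s\n' "$FILE"
    }
}

extract_vcl_file "$VCL_SHOW"
# prints: file with 3 spaces.vcl
```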

I commented out the original processing logic for ${VCL_FILE} and tried to tweak the chain of commands, but to no avail. Everything worked in my shell but never when run as a service.

It seems the error is not at all replicable when run by me — meanwhile the estimated 30 minutes passed six times and a new high-priority task put everything aside. The rest of the week was quite full with different tasks, the two exceptions being an internal talk our team had about using sed and an interview with a promising candidate. The issue with making varnishreload error disappear was completely lost to the sands of time.

Your so-called sed-fu… is really… quite pathetic

One day of the following week was pretty free, so I picked the task up again. I had hoped that some background process in my brain had kept chipping away at the problem and that I would finally be able to crack it.

Since bending the code last time didn't help, I just opted for a rewrite of line 116. The existing code was insane, anyway. There is absolutely no need to use read here.

Looking at the error again: sh: echo: broken pipe. There is an echo in two places in that command, but I suspect the very first one to be the likelier culprit (or an accomplice). Awk doesn't inspire confidence either. Well, in case it really is the awk | {read; echo} construct causing all this trouble, why not use something else? Awk is not really being used to its full capabilities in that one-liner, and then there is this surplus read.

Seeing as we had an internal talk about sed the other week, I wanted to try my newly-acquired skills and optimize the echo | awk | { read; echo } into a simpler echo | sed. Although that’s definitely not the proper way to approach debugging, I thought I’d at least try out my sed-fu and maybe learn something new about the problem along the way. I also asked my colleague, the author of the sed talk, to help me come up with a more efficient sed command.

I’ve dumped the varnishadm vcl.show -v "$VCL_NAME" into a file, so I could focus on writing sed without all the hassle around service reloads.

A short primer on how exactly sed processes input can be found in its GNU manual. In the sed sources, the character \n is explicitly specified as the line separator.

After several iterations and input from my colleague, we’ve crafted a sed expression that did produce exactly the same result as the original line 116.

Let’s create a sample input file here:

> cat vcl-example.vcl
Text
// VCL.SHOW 0 1578 file with 3 spaces.vcl
More text
// VCL.SHOW 0 1578 file.vcl
Even more text
// VCL.SHOW 0 1578 file with TWOspaces.vcl
Final text

It might not be apparent from the above description, but we’re only interested in the first // VCL.SHOW comment, and there may be several in the input. That’s exactly why awk quits after the first match.

# step 1, capture just the comment lines
# using sed capability to specify delimiter character with ‘\#’ instead of the commonly used ‘/’ so there is no need to escape slashes themselves
# and the “address” capability defined as regex “// VCL.SHOW” to search for lines with specific pattern
# -n flag makes sure that the sed does not print all as it does by default (see above link)
# -E switches to the extended regex
> cat vcl-processor-1.sed
\#// VCL.SHOW#p
> sed -En -f vcl-processor-1.sed vcl-example.vcl
// VCL.SHOW 0 1578 file with 3 spaces.vcl
// VCL.SHOW 0 1578 file.vcl
// VCL.SHOW 0 1578 file with TWOspaces.vcl

# step 2, only print out the file name
# using the “substitute” command with regex capture groups to print just that group
# and this is done only for the matches of the previous search
> cat vcl-processor-2.sed
\#// VCL.SHOW# {
    s#.* [0-9]+ [0-9]+ (.*)$#\1#
    p
}
> sed -En -f vcl-processor-2.sed vcl-example.vcl
file with 3 spaces.vcl
file.vcl
file with TWOspaces.vcl

# step 3, make sure to only get the first result
# same as with the awk before, add an immediate exit after the first processed match is printed
> cat vcl-processor-3.sed
\#// VCL.SHOW# {
    s#.* [0-9]+ [0-9]+ (.*)$#\1#
    p
    q
}
> sed -En -f vcl-processor-3.sed vcl-example.vcl
file with 3 spaces.vcl

# step 4, wrap it up into a one-liner using the semicolon to separate commands
> sed -En -e '\#// VCL.SHOW#{s#.* [0-9]+ [0-9]+ (.*)$#\1#p;q;}' vcl-example.vcl
file with 3 spaces.vcl

So, the contents of the varnishreload script would look something like this:

VCL_FILE="$(echo "$VCL_SHOW" | sed -En '\#// VCL.SHOW#{s#.*[0-9]+ [0-9]+ (.*)$#\1#p;q;};')"

The above logic may succinctly be expressed by: if a line matches the regex // VCL.SHOW, then greedily match the text including the two numbers on that line and capture whatever comes after. Emit the capture and quit.
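
The word "greedily" deserves a second look. The leading .* takes as much of the line as it can, so if a file name itself contained two space-separated numbers, the sed version and the read version would be expected to disagree. The sketch below uses a made-up file name to show the difference; whether such names occur in practice is a separate question:

```shell
#!/bin/sh
# Sketch: compare the read-based and sed-based extraction on an invented
# file name that contains two space-separated numbers of its own.
LINE='// VCL.SHOW 0 1578 copy 2 3 final.vcl'

via_read=$(printf '%s\n' "$LINE" |
    { read -r DELIM TAG INDEX SIZE FILE; printf '%s\n' "$FILE"; })

via_sed=$(printf '%s\n' "$LINE" |
    sed -En '\#// VCL.SHOW#{s#.* [0-9]+ [0-9]+ (.*)$#\1#p;q;}')

printf 'read: %s\n' "$via_read"   # read: copy 2 3 final.vcl
printf 'sed:  %s\n' "$via_sed"    # sed:  final.vcl
```

For the file names in the sample input above, both approaches give the same answer.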

Simple, isn't it?

We were happy with the sed script and with the original code it replaced. All the test runs I had done produced the desired results, so I modified varnishreload on the server and fired systemctl reload varnish once again. The dreaded echo: write error: Broken pipe was smiling in our faces. The blinking cursor awaited a new command entry in the dark void of the terminal…

Original article: https://habr.com/en/post/475698/
