pg_terminate_backend() ops enhancement: logging the caller and the terminated query text

Stop claiming you weren't the one who killed the session. The DBA is not taking the blame for this one.

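I. Scenario

Two colleagues, A and B, got into a heated argument after an important SQL session was killed. The trigger was a column added to a table that never took effect. A is an application developer; B is a DBA.

B checked the database log and found this entry: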

2023-02-26 11:25:01.925 CST,"prouser","prodb",1514,"192.168.2.5",63fad0a5.5ea,4,"idle in transaction",2023-02-26 11:23:17 CST,7/50609,43757341,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"psql","client backend",,-7856242769374161054

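The log shows that an idle in transaction session was killed at that time, but it does not record which query that session had run, so there is no hard evidence that the SQL in question was the one that got killed.

All B could do was suspect that the alter table had finished, the session was then killed while idle in transaction, and the uncommitted transaction was rolled back, so the alter table never took effect. That is an inference from the log, not proof, and A's pushback left B with nothing to say.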

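II. Analysis

When a session in the idle or idle in transaction state is terminated, its query is not logged; for the details, see how ProcessInterrupts and errfinish behave after the process receives SIGTERM. Only a session that is active, that is, actually executing a statement, has its query logged when it is terminated. Even then, two things make it impossible to prove who issued the terminate:

1. Only the terminated process is logged, not the process that did the terminating. In other words, the "culprit" is never recorded.

2. Running kill -15 pid directly on the database host has exactly the same effect. pg_terminate_backend() is essentially a wrapper around kill -15 pid (plus a permission check), so the DBA cannot fully clear themselves of suspicion either.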

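In this situation, finger-pointing and blame-shifting are pretty much inevitable.

If we log every call to pg_terminate_backend() together with the query of the session being killed, the picture becomes clear and nobody has to waste time arguing. It looks like modifying the pg_terminate_backend() function is all that is needed.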

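We all know that when an instance crashes, postmaster logs information about the failed process; if a query caused an OOM kill, for example, the log shows the query that process was running. Looking at the implementation, postmaster records this through the LogChildExit function while handling the child's exit. However, an ordinary backend that exits at FATAL level does not go through this crash path, so its query is not logged at LOG level, and a session killed by pg_terminate_backend() exits with exactly this kind of FATAL error.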

/*
 * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
 * walwriter, autovacuum, archiver or background worker.
 *
 * The objectives here are to clean up our local state about the child
 * process, and to signal all other remaining children to quickdie.
 */
static void
HandleChildCrash(int pid, int exitstatus, const char *procname)
{
	dlist_mutable_iter iter;
	slist_iter	siter;
	Backend    *bp;
	bool		take_action;

	/*
	 * We only log messages and send signals if this is the first process
	 * crash and we're not doing an immediate shutdown; otherwise, we're only
	 * here to update postmaster's idea of live processes.  If we have already
	 * signaled children, nonzero exit status is to be expected, so don't
	 * clutter log.
	 */
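	/*
	 * Note: FatalError is postmaster's own "already handling a crash" flag,
	 * not the error level of the exiting child.  A backend that dies on a
	 * FATAL error exits with status 1, which as far as I can tell is not
	 * treated as a crash and never reaches this function, so LogChildExit
	 * is not called at LOG level for it.
	 */
	take_action = !FatalError && Shutdown != ImmediateShutdown;
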
	if (take_action)
	{
		LogChildExit(LOG, procname, pid, exitstatus);
		ereport(LOG,
				(errmsg("terminating any other active server processes")));
		SetQuitSignalReason(PMQUIT_FOR_CRASH);
	}
	/* remaining code omitted */
}
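The distinction that matters here is how the parent classifies a child's exit status. The sketch below is a standalone illustration modelled on my reading of postmaster's behaviour (exit status 0 and 1 are ordinary exits, anything else is a crash). The enum and function names are hypothetical, and real postmaster code has extra platform-specific cases that are left out.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical labels for how a parent might treat a child's exit. */
typedef enum
{
	CHILD_EXIT_NORMAL,			/* exit(0): clean shutdown */
	CHILD_EXIT_FATAL,			/* exit(1): how a FATAL error ends */
	CHILD_EXIT_CRASH			/* anything else: treated as a crash */
} ChildExitKind;

/* Classify a wait status the way the text above describes. */
ChildExitKind
classify_child_exit(int status)
{
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return CHILD_EXIT_NORMAL;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
		return CHILD_EXIT_FATAL;
	return CHILD_EXIT_CRASH;
}

/*
 * Fork a child that ends in the requested way and classify its wait status.
 * mode 0: exit(0); mode 1: exit(1); any other mode: die from SIGKILL,
 * roughly what an OOM kill looks like to the parent.
 */
ChildExitKind
run_child_and_classify(int mode)
{
	pid_t		child = fork();
	int			status = 0;

	if (child == 0)
	{
		if (mode == 0)
			_exit(0);
		if (mode == 1)
			_exit(1);
		raise(SIGKILL);
		_exit(2);				/* not reached */
	}
	if (child < 0 || waitpid(child, &status, 0) != child)
		return CHILD_EXIT_CRASH;	/* treat our own failure as a crash */
	return classify_child_exit(status);
}
```

Only the last category would lead to the crash handling shown above. A session terminated by SIGTERM falls into the middle one, which is why its query never shows up in the crash log.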

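Changing postmaster is not appropriate; the cost is too high. Instead, modify the pg_terminate_backend() function and copy the child-process logging logic straight from LogChildExit.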

III. Solution

/*
 * Send a signal to terminate a backend process. This is allowed if you are a
 * member of the role whose process is being terminated. If the timeout input
 * argument is 0, then this function just signals the backend and returns
 * true.  If timeout is nonzero, then it waits until no process has the given
 * PID; if the process ends within the timeout, true is returned, and if the
 * timeout is exceeded, a warning is emitted and false is returned.
 *
 * Note that only superusers can signal superuser-owned processes.
 */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)
{
	int			pid;
	int			r;
	int			timeout;		/* milliseconds */
	/* Modify by Nickyoung at 2023-02-26 AM */
	/*
	 * size of activity_buffer is arbitrary, but set equal to default
	 * track_activity_query_size
	 */
	char		activity_buffer[1024];
	const char *activity = NULL;
	/* End at 2023-02-26 AM */
	pid = PG_GETARG_INT32(0);
	timeout = PG_GETARG_INT64(1);

	if (timeout < 0)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("\"timeout\" must not be negative")));
	/* Look up the query text for this pid in shared memory */
	activity = pgstat_get_crashed_backend_activity(pid,
												   activity_buffer,
												   sizeof(activity_buffer));

	/* Send SIGTERM to the target pid to terminate the process */
	r = pg_signal_backend(pid, SIGTERM);

	if (r == SIGNAL_BACKEND_NOSUPERUSER)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("must be a superuser to terminate superuser process")));

	if (r == SIGNAL_BACKEND_NOPERMISSION)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));
	/* Modify by Nickyoung at 2023-02-26 AM */
	/*
	 * Record the use of pg_terminate_backend(PID) to kill the session,
	 * together with the terminated query.
	 *
	 * If SIGTERM was sent successfully, log the pg_terminate_backend() call
	 * and the text of the terminated query.  activity is NULL when no entry
	 * was found for the pid, so print a placeholder in that case.
	 */
	if (r == SIGNAL_BACKEND_SUCCESS)
	{
		ereport(WARNING,
				(errmsg("process is terminated by: select pg_terminate_backend(%d), query is: %s",
						pid, activity ? activity : "(unknown)")));
	}
	/* End at 2023-02-26 AM */
	/* Wait only on success and if actually requested */
	if (r == SIGNAL_BACKEND_SUCCESS && timeout > 0)
		PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
	else
		PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
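The lookup step in the patch, pgstat_get_crashed_backend_activity, can be pictured roughly as a linear scan over the per-backend status entries followed by a bounded, sanitized copy. The sketch below is a standalone approximation under that assumption, not the PostgreSQL implementation; FakeBackendStatus, ascii_safe_copy and lookup_activity are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ACTIVITY_LEN 64

/* Hypothetical stand-in for one backend's status entry in shared memory. */
typedef struct FakeBackendStatus
{
	int			pid;
	char		activity[ACTIVITY_LEN];
} FakeBackendStatus;

/*
 * Copy src into dest (at most destsiz - 1 bytes plus a terminator),
 * replacing anything that is not printable ASCII with '?'.
 */
static void
ascii_safe_copy(char *dest, const char *src, size_t destsiz)
{
	size_t		i;

	if (destsiz == 0)
		return;
	for (i = 0; i + 1 < destsiz && src[i] != '\0'; i++)
	{
		unsigned char ch = (unsigned char) src[i];

		dest[i] = (ch >= 32 && ch <= 126) ? (char) ch : '?';
	}
	dest[i] = '\0';
}

/*
 * Scan the entries for pid.  On a match, copy its activity string into
 * buffer and return buffer; return NULL when no entry has that pid.
 */
const char *
lookup_activity(const FakeBackendStatus *entries, int nentries, int pid,
				char *buffer, size_t buflen)
{
	int			i;

	if (buffer == NULL || buflen == 0)
		return NULL;
	for (i = 0; i < nentries; i++)
	{
		if (entries[i].pid == pid)
		{
			ascii_safe_copy(buffer, entries[i].activity, buflen);
			return buffer;
		}
	}
	return NULL;
}
```

Two properties of this shape matter for the patch: the cost of the scan grows with the number of backends, and the result is truncated to the caller's buffer, which is why the patch sizes activity_buffer to match the default track_activity_query_size.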

IV. Verification

1. In session 1, open a transaction and run alter table without committing; the backend sits in the idle in transaction state.

testdb=> begin;
BEGIN
testdb=*> alter table instance_list add column type varchar(50) not null default 'rds';
ALTER TABLE
testdb=*>

2. In session 2, query the table; the query blocks waiting for the lock while the backend stays in the active state.

testdb=> select * from instance_list limit 1;

3. In session 3, kill every session that belongs to the admin account.

testdb=> select pg_terminate_backend(pid),* from pg_stat_activity where usename='admin' and pid <> pg_backend_pid();
WARNING:  process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';
WARNING:  process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;
 pg_terminate_backend | datid | datname | pid  | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend
_start         |          xact_start           |          query_start          |         state_change          | wait_event_type | wait_event |        state       
 | backend_xid | backend_xmin |       query_id       |                                     query                                     |  backend_type  
----------------------+-------+---------+------+------------+----------+---------+------------------+-------------+-----------------+-------------+----------------
---------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------------------
-+-------------+--------------+----------------------+-------------------------------------------------------------------------------+----------------
 t                    | 24583 | testdb  | 9680 |            |    24582 | admin   | psql             |             |                 |          -1 | 2023-02-26 14:4
6:09.621174+08 | 2023-02-26 14:46:12.540059+08 | 2023-02-26 14:46:17.352702+08 | 2023-02-26 14:46:17.354734+08 | Client          | ClientRead | idle in transaction
 |    43785370 |              | -7856242769374161054 | alter table instance_list add column type varchar(50) not null default 'rds'; | client backend
 t                    | 24583 | testdb  | 9744 |            |    24582 | admin   | psql             |             |                 |          -1 | 2023-02-26 14:4
6:45.833136+08 | 2023-02-26 14:46:48.770609+08 | 2023-02-26 14:46:48.770609+08 | 2023-02-26 14:46:48.770626+08 | Lock            | relation   | active             
 |             |     43785370 |                      | select * from instance_list limit 1;                                          | client backend
(2 rows)

testdb=> 

As you can see, running pg_terminate_backend(pid) printed the corresponding messages:

WARNING: process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';
WARNING: process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;

The server log now contains both the pg_terminate_backend() call record and the text of the terminated query:

2023-02-26 14:48:37.792 CST,"admin","testdb",9850,"192.168.2.6",63fb0068.267a,4,"SELECT",2023-02-26 14:47:04 CST,7/21,0,WARNING,01000,"process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';",,,,,,,,,"psql","client backend",,-3764573643027268885
2023-02-26 14:48:37.792 CST,"admin","testdb",9680,"192.168.2.6",63fb0031.25d0,1,"idle in transaction",2023-02-26 14:46:09 CST,4/16,43785370,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"psql","client backend",,-7856242769374161054
2023-02-26 14:48:37.792 CST,"admin","testdb",9850,"192.168.2.6",63fb0068.267a,5,"SELECT",2023-02-26 14:47:04 CST,7/21,0,WARNING,01000,"process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;",,,,,,,,,"psql","client backend",,-3764573643027268885
2023-02-26 14:48:37.792 CST,"admin","testdb",9744,"192.168.2.6",63fb0055.2610,1,"SELECT waiting",2023-02-26 14:46:45 CST,5/18,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,"select * from instance_list limit 1;",15,,"psql","client backend",,0

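Now we can catch whoever ran pg_terminate_backend(), and the arguing can stop.

There is one caveat, because this differs from what postmaster does. Postmaster records the exiting child's query while it is reaping that child. By then the child is already in its exit path and will not run any new SQL, so what postmaster records is the last query the child executed.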

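In our approach, two sibling backends are involved: the caller first fetches the target's query and then immediately sends the signal to terminate it. This is asynchronous, with a very small time gap in between (on the order of nanoseconds in the best case), and when there are very many connections, scanning for the matching pid can take longer. So when killing connections in bulk with select pg_terminate_backend(pid),pid,state,query from pg_stat_activity where xxx;, a busy connection running very short statements (say, under 1 ms each) may have moved on to a different statement by the time it is actually terminated, and the logged query is then not the one that was interrupted.

I tested this with pgbench: with 20000 connections on the instance, bulk-killing 2000 active connections produced roughly 20 pids in this situation.

Sending the signal first and fetching the query afterwards has its own problem: the process may already have exited by the time we look, in which case the query comes back as NULL.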
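The two orderings can be walked through with a tiny deterministic simulation. This is only a model of the timing argument, with no real processes or signals; SimBackend and every function name below are invented for the illustration, and the second case assumes the worst outcome, where the backend has fully exited before the lookup runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one backend: alive flag plus its current query. */
typedef struct SimBackend
{
	int			alive;
	const char *current_query;
} SimBackend;

/* What the caller would read from the status entry right now. */
static const char *
snapshot_query(const SimBackend *b)
{
	return b->alive ? b->current_query : NULL;
}

/* The backend finishes one statement and starts the next. */
static void
backend_runs_next(SimBackend *b, const char *query)
{
	if (b->alive)
		b->current_query = query;
}

/* SIGTERM arrives: return the query that was really interrupted. */
static const char *
deliver_sigterm(SimBackend *b)
{
	const char *interrupted = b->current_query;

	b->alive = 0;
	b->current_query = NULL;
	return interrupted;
}

/*
 * Order used by the patch: snapshot first, signal second.
 * Returns 1 if the logged query equals the interrupted one, else 0.
 */
int
snapshot_then_signal(int backend_advances)
{
	SimBackend	b = {1, "update t set v = 1"};
	const char *logged = snapshot_query(&b);
	const char *interrupted;

	if (backend_advances)
		backend_runs_next(&b, "update t set v = 2");
	interrupted = deliver_sigterm(&b);
	return strcmp(logged, interrupted) == 0;
}

/*
 * Reversed order: signal first, snapshot second, with the backend already
 * gone by the time we look.  Returns 1 if a query was still visible.
 */
int
signal_then_snapshot_sees_query(void)
{
	SimBackend	b = {1, "update t set v = 1"};

	deliver_sigterm(&b);
	return snapshot_query(&b) != NULL;
}
```

With snapshot-then-signal the log can name a stale statement if the backend advances in the gap. With signal-then-snapshot the log can lose the statement entirely. Neither order removes the race, so the patch keeps the first one, which at least always has something to print for a live backend.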

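Look at it another way: suppose I want to kill a pid whose statements run very quickly. By the time I have finished running pg_terminate_backend(pid), that pid may already be executing a different query. This cannot be avoided. What this approach does capture accurately is the record of the pg_terminate_backend(pid) call itself.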
