The problem that the user subprocess cannot exit after the postmaster of postgres 9.x is kill ed

scene

In the pg9.x database version, after the postmaster process is kill ed, the long connected users will not run differently~~~

Here is an example:

postgres=# select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

postgres=# select * from t1;
 i 
---
 1
 2
(2 rows)

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          13744
(1 row)

postgres=#

Maintain the above connection while performing the following:

[lichuancheng@IP26 ~]$ ps -ef|grep post
root       871     1  0 Feb23 ?        00:00:51 /usr/libexec/postfix/master -w
postfix    873   871  0 Feb23 ?        00:00:09 qmgr -l -t unix -u
postfix  11469   871  0 18:58 ?        00:00:00 pickup -l -t unix -u
lichuan+ 13732     1  0 19:33 pts/1    00:00:00 /work/opt/lichuancheng/HighGo/db/20180905_1/bin/postgres -D ../data
lichuan+ 13734 13732  0 19:33 ?        00:00:00 postgres: checkpointer process   
lichuan+ 13735 13732  0 19:33 ?        00:00:00 postgres: writer process   
lichuan+ 13736 13732  0 19:33 ?        00:00:00 postgres: wal writer process   
lichuan+ 13737 13732  0 19:33 ?        00:00:00 postgres: autovacuum launcher process   
lichuan+ 13738 13732  0 19:33 ?        00:00:00 postgres: stats collector process   
lichuan+ 13743 10325  0 19:33 pts/1    00:00:00 ./psql -p 5432 -d postgres
lichuan+ 13744 13732  0 19:33 ?        00:00:00 postgres: lichuancheng postgres [local] idle
lichuan+ 13791 10262  0 19:37 pts/0    00:00:00 grep --color=auto post
[lichuancheng@IP26 ~]$ kill -s 9 13732
[lichuancheng@IP26 ~]$ ps -ef|grep post
root       871     1  0 Feb23 ?        00:00:51 /usr/libexec/postfix/master -w
postfix    873   871  0 Feb23 ?        00:00:09 qmgr -l -t unix -u
postfix  11469   871  0 18:58 ?        00:00:00 pickup -l -t unix -u
lichuan+ 13743 10325  0 19:33 pts/1    00:00:00 ./psql -p 5432 -d postgres
lichuan+ 13744     1  0 19:33 ?        00:00:00 postgres: lichuancheng postgres [local] idle
lichuan+ 13801 10262  0 19:37 pts/0    00:00:00 grep --color=auto post
[lichuancheng@IP26 ~]

Then?

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          13744
(1 row)

postgres=# select * from t1;
 i 
---
 1
 2
(2 rows)

postgres=# insert into t1 values(3);
INSERT 0 1
postgres=# insert into t1 values(4);
INSERT 0 1
postgres=# select * from t1;
 i 
---
 1
 2
 3
 4
(4 rows)

postgres=#

 

problem analysis

To some extent, such results are risky. How should users of pg9.x pay attention to judge whether the postmaster process is still alive when using long connections? At present, I don 't think of a good function to test the postmaster process. What should I do? Think hard - > use dynamic load C function to solve this problem

Solution

Write C code first

//ppid.c
#include "postgres.h"
#include "fmgr.h"
#include <sys/types.h>
#include <unistd.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(getparentProgress);

Datum
getparentProgress(PG_FUNCTION_ARGS)
{
        int     ppid = 0;

	ppid = getppid();
	if(1 == ppid)
	        PG_RETURN_BOOL(false);
	else
		 PG_RETURN_BOOL(true);
}

Compile C code

//  The following / opt/db / is the database installation directory
gcc -ppid -c ppid.c -I /opt/db/include/postgresql/server/
gcc -shared -o ppid .so ppid .o

Then copy ppid.so to the / opt/db/lib/postgresql directory

Create test function

postgres=# create or replace function getparentProgress() returns bool
as 'ppid','getparentProgress'
language C strict;
CREATE FUNCTION
postgres=# select getparentProgress();
 getparentprogress 
-------------------
 t
(1 row)

postgres=#

Test for usefulness

Restart database

[lichuancheng@IP26 bin]$ ./pg_ctl restart -D ../data
waiting for server to shut down....LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
server starting
[lichuancheng@IP26 bin]$ LOG:  database system was shut down at 2018-09-05 19:56:55 CST
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

[lichuancheng@IP26 bin]$ ./psql -p 5432 -d postgres
psql (9.5.0)
Type "help" for help.

postgres=# select getparentProgress();
 getparentprogress 
-------------------
 t
(1 row)

postgres=# select * from t1;
 i 
---
 1
 2
 3
 4
(4 rows)

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          14491
(1 row)

postgres=#

kill postmaster

[lichuancheng@IP26 postgresql]$ ps -ef|grep post
root       871     1  0 Feb23 ?        00:00:51 /usr/libexec/postfix/master -w
postfix    873   871  0 Feb23 ?        00:00:09 qmgr -l -t unix -u
postfix  11469   871  0 18:58 ?        00:00:00 pickup -l -t unix -u
lichuan+ 14483     1  0 19:56 pts/1    00:00:00 /work/opt/lichuancheng/HighGo/db/20180905_1/bin/postgres -D ../data
lichuan+ 14485 14483  0 19:56 ?        00:00:00 postgres: checkpointer process   
lichuan+ 14486 14483  0 19:56 ?        00:00:00 postgres: writer process   
lichuan+ 14487 14483  0 19:56 ?        00:00:00 postgres: wal writer process   
lichuan+ 14488 14483  0 19:56 ?        00:00:00 postgres: autovacuum launcher process   
lichuan+ 14489 14483  0 19:56 ?        00:00:00 postgres: stats collector process   
lichuan+ 14490 10325  0 19:57 pts/1    00:00:00 ./psql -p 5432 -d postgres
lichuan+ 14491 14483  0 19:57 ?        00:00:00 postgres: lichuancheng postgres [local] idle
lichuan+ 15102 10262  0 19:58 pts/0    00:00:00 grep --color=auto post
[lichuancheng@IP26 postgresql]$ kill -s 9 14483
[lichuancheng@IP26 postgresql]$ ps -ef|grep post
root       871     1  0 Feb23 ?        00:00:51 /usr/libexec/postfix/master -w
postfix    873   871  0 Feb23 ?        00:00:09 qmgr -l -t unix -u
postfix  11469   871  0 18:58 ?        00:00:00 pickup -l -t unix -u
lichuan+ 14490 10325  0 19:57 pts/1    00:00:00 ./psql -p 5432 -d postgres
lichuan+ 14491     1  0 19:57 ?        00:00:00 postgres: lichuancheng postgres [local] idle
lichuan+ 15104 10262  0 19:59 pts/0    00:00:00 grep --color=auto post
[lichuancheng@IP26 postgresql]$ 

Witness the miracle

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          14491
(1 row)

postgres=# select getparentProgress();
 getparentprogress 
-------------------
 f
(1 row)
# false is returned here to indicate that the postmaster no longer exists, and the application should handle it accordingly

postgres=# select * from t1;
 i 
---
 1
 2
 3
 4
(4 rows)

postgres=# 

summary

Long connected users can first execute select getparentProgress() before operating the database; if false is returned, the postmaster has been disconnected.

The C function in this paper is just written casually and can be written according to the actual situation.

Tags: Unix Database PostgreSQL Linux

Posted on Thu, 02 Jan 2020 18:28:51 -0800 by rami103