[v3,1/2] libmultipath: fix race in stop_io_err_stat_thread

Message ID 20180307140733.28709-2-mwilck@suse.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui

Commit Message

Martin Wilck March 7, 2018, 2:07 p.m. UTC
It's wrong, and unnecessary, to call pthread_kill() after
pthread_cancel(). I have observed cases where the io_err checker
thread hung in libpthread after receiving the SIGUSR2 signal, in particular
when multipathd is run under strace. (If multipathd is killed with
SIGINT under strace while the io_err thread is running, it happens
almost every time.) When this happens, the io_err thread
tries to obtain a mutex in the urcu code (presumably in
rcu_unregister_thread()) and the main thread hangs in pthread_join().
multipathd can then only be terminated with kill -KILL.

With the change from this patch, the thread is shut down cleanly. I haven't
observed the hang under strace with the patch.
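
For illustration, the cancel-then-join shutdown described above can be
sketched in isolation. This is not the multipathd code: worker(),
worker_cleanup() and stop_worker_demo() are hypothetical names, and the
worker merely sleeps where the real thread would do its work. The sketch
assumes the default deferred cancellation type, so the thread acts on
the cancel request at its next cancellation point and no signal is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Set by the cleanup handler so the caller can check that it ran. */
static int cleanup_ran;

/* Hypothetical cleanup handler; runs when the worker is cancelled. */
static void worker_cleanup(void *arg)
{
	(void)arg;
	cleanup_ran = 1;
}

/* Hypothetical worker standing in for a long-running checker thread. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_cleanup_push(worker_cleanup, NULL);
	for (;;)
		sleep(1);	/* sleep() is a cancellation point */
	pthread_cleanup_pop(1);	/* not reached; pairs with the push above */
	return NULL;
}

/*
 * Start the worker, then stop it with pthread_cancel() followed by
 * pthread_join() only. Returns 1 if the thread terminated through
 * cancellation and its cleanup handler ran, 0 otherwise.
 */
int stop_worker_demo(void)
{
	pthread_t thr;
	void *res = NULL;
	int rc;

	cleanup_ran = 0;
	rc = pthread_create(&thr, NULL, worker, NULL);
	assert(rc == 0);

	usleep(100 * 1000);	/* give the worker time to start sleeping */

	pthread_cancel(thr);	/* request cancellation; no pthread_kill() */
	if (pthread_join(thr, &res) != 0)
		return 0;

	return res == PTHREAD_CANCELED && cleanup_ran;
}
```

Built with -pthread on a POSIX system, stop_worker_demo() is expected to
return 1: the join completes once the worker reaches a cancellation
point, without any signal being sent to it.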

Fixes: 95d594fd "multipath-tools: intermittent IO error accounting to improve reliability"

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/io_err_stat.c | 1 -
 1 file changed, 1 deletion(-)

Patch

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 00bac9e0e755..536ba87968fd 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -749,7 +749,6 @@  destroy_ctx:
 void stop_io_err_stat_thread(void)
 {
 	pthread_cancel(io_err_stat_thr);
-	pthread_kill(io_err_stat_thr, SIGUSR2);
 	pthread_join(io_err_stat_thr, NULL);
 	free_io_err_pathvec(paths);
 	io_destroy(ioctx);