From patchwork Wed Mar 7 14:07:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Wilck
X-Patchwork-Id: 10264159
X-Patchwork-Delegate: christophe.varoqui@free.fr
From: Martin Wilck
To: Christophe Varoqui, Guan Junxiong
Date: Wed, 7 Mar 2018 15:07:32 +0100
Message-Id: <20180307140733.28709-2-mwilck@suse.com>
In-Reply-To: <20180307140733.28709-1-mwilck@suse.com>
References: <20180307140733.28709-1-mwilck@suse.com>
Cc: dm-devel@redhat.com, Martin Wilck
Subject: [dm-devel] [PATCH v3 1/2] libmultipath: fix race in stop_io_err_stat_thread

It's wrong, and unnecessary, to call pthread_kill() after
pthread_cancel(). I have observed cases where the io_err checker
thread hung in libpthread after receiving the USR2 signal, in
particular when multipathd is run under strace. (If multipathd is
killed with SIGINT under strace while the io_err thread is running,
it happens almost every time.) When this happens, the io_err thread
tries to obtain a mutex in the urcu code (presumably in
rcu_unregister_thread()) and the main thread hangs in pthread_join().
In this situation, multipathd can only be terminated with kill -KILL.

With this patch, the thread is shut down cleanly. I haven't observed
the hang under strace with the patch applied.

Fixes: 95d594fd "multipath-tools: intermittent IO error accounting to improve reliability"
Signed-off-by: Martin Wilck
---
 libmultipath/io_err_stat.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 00bac9e0e755..536ba87968fd 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -749,7 +749,6 @@ destroy_ctx:
 void stop_io_err_stat_thread(void)
 {
 	pthread_cancel(io_err_stat_thr);
-	pthread_kill(io_err_stat_thr, SIGUSR2);
 	pthread_join(io_err_stat_thr, NULL);
 	free_io_err_pathvec(paths);
 	io_destroy(ioctx);
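
For illustration only, here is a minimal standalone sketch of the
cancel-then-join shutdown that remains after this patch. It is not
multipath-tools code: worker(), worker_cleanup(), cleanup_ran and
cancel_and_join_demo() are hypothetical names, and the worker simply
sleeps in a loop, on the assumption that the real thread likewise
reaches a cancellation point regularly. With deferred cancellation (the
default), pthread_cancel() is acted upon at the next cancellation
point, so no additional signal should be needed to get the thread out.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical flag standing in for per-thread resources freed on exit. */
static int cleanup_ran;

static void worker_cleanup(void *arg)
{
	(void)arg;
	cleanup_ran = 1;
}

/* Hypothetical worker: loops over a cancellation point, like a polling thread. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_cleanup_push(worker_cleanup, NULL);
	for (;;)
		sleep(1);	/* sleep() is a POSIX cancellation point */
	pthread_cleanup_pop(0);	/* never reached, but must pair with push */
	return NULL;
}

/* Returns 0 if the worker was cancelled and its cleanup handler ran. */
int cancel_and_join_demo(void)
{
	pthread_t thr;
	void *res;

	cleanup_ran = 0;
	if (pthread_create(&thr, NULL, worker, NULL) != 0)
		return -1;
	if (pthread_cancel(thr) != 0)
		return -1;
	/* No pthread_kill() here: pthread_join() waits for the cancellation. */
	if (pthread_join(thr, &res) != 0)
		return -1;
	return (res == PTHREAD_CANCELED && cleanup_ran) ? 0 : -1;
}
```

Calling cancel_and_join_demo() from a small main() and building with
-pthread should be enough to try it. The join result is compared
against PTHREAD_CANCELED to confirm that the thread exited through
cancellation rather than by returning normally.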