From patchwork Thu Feb 8 23:56:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Marzinski X-Patchwork-Id: 10208141 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 609B260223 for ; Thu, 8 Feb 2018 23:58:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4F6FF29715 for ; Thu, 8 Feb 2018 23:58:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 434B329718; Thu, 8 Feb 2018 23:58:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 82E8529715 for ; Thu, 8 Feb 2018 23:58:43 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 672C610F3D8; Thu, 8 Feb 2018 23:58:42 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3F7125D960; Thu, 8 Feb 2018 23:58:42 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id DAB9E18033E6; Thu, 8 Feb 2018 23:58:41 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id w18NuEDs004072 for ; Thu, 8 Feb 2018 18:56:15 -0500 Received: by smtp.corp.redhat.com (Postfix) id CB1D4B07B2; Thu, 8 Feb 2018 23:56:14 +0000 (UTC) Delivered-To: dm-devel@redhat.com Received: from redhat.com (octiron.msp.redhat.com [10.15.80.209]) by smtp.corp.redhat.com (Postfix) with SMTP id 4B5C5DEEDD; Thu, 8 Feb 2018 23:56:11 +0000 (UTC) Received: by redhat.com (sSMTP sendmail emulation); Thu, 08 Feb 2018 17:56:10 -0600 From: "Benjamin Marzinski" To: device-mapper development Date: Thu, 8 Feb 2018 17:56:01 -0600 Message-Id: <1518134167-15938-2-git-send-email-bmarzins@redhat.com> In-Reply-To: <1518134167-15938-1-git-send-email-bmarzins@redhat.com> References: <1518134167-15938-1-git-send-email-bmarzins@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-loop: dm-devel@redhat.com Cc: Bart Van Assche , Martin Wilck Subject: [dm-devel] [PATCH v2 1/7] libmultipath: fix tur checker locking X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Thu, 08 Feb 2018 23:58:42 +0000 (UTC) X-Virus-Scanned: ClamAV using ClamSMTP Commit 6e2423fd fixed a bug where the tur checker could cancel a detached thread after it had exitted. However in fixing this, the new code grabbed a mutex (to call condlog) while holding a spin_lock. To deal with this, I've done away with the holder spin_lock completely, and replaced it with two atomic variables, based on a suggestion by Martin Wilck. The holder variable works exactly like before. When the checker is initialized, it is set to 1. When a thread is created it is incremented. When either the thread or the checker are done with the context, they atomically decrement the holder variable and check its value. If it is 0, they free the context. If it is 1, they never touch the context again. The other variable has changed. First, ct->running and ct->thread have switched uses. ct->thread is now only ever accessed by the checker, never the thread. If it is non-NULL, a thread has started up, but hasn't been dealt with by the checker yet. It is also obviously used by the checker to cancel the thread. ct->running is now an atomic variable. When the thread is started it is set to 1. When the checker wants to kill a thread, it atomically sets the value to 0 and reads the previous value. If it was 1, the checker cancels the thread. If it was 0, the nothing needs to be done. After the checker has dealt with the thread, it sets ct->thread to NULL. When the thread is done, it atomicalllys sets the value of ct->running to 0 and reads the previous value. If it was 1, the thread just exits. If it was 0, then the checker is trying to cancel the thread, and so the thread calls pause(), which is a cancellation point. Cc: Martin Wilck Cc: Bart Van Assche Signed-off-by: Benjamin Marzinski --- libmultipath/checkers/tur.c | 98 ++++++++++++++++++--------------------------- 1 file changed, 40 insertions(+), 58 deletions(-) diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c index b4a5cb2..894ad41 100644 --- a/libmultipath/checkers/tur.c +++ b/libmultipath/checkers/tur.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "checkers.h" @@ -44,7 +45,6 @@ struct tur_checker_context { pthread_t thread; pthread_mutex_t lock; pthread_cond_t active; - pthread_spinlock_t hldr_lock; int holders; char message[CHECKER_MSG_LEN]; }; @@ -74,13 +74,12 @@ int libcheck_init (struct checker * c) ct->state = PATH_UNCHECKED; ct->fd = -1; - ct->holders = 1; + uatomic_set(&ct->holders, 1); pthread_cond_init_mono(&ct->active); pthread_mutexattr_init(&attr); pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE); pthread_mutex_init(&ct->lock, &attr); pthread_mutexattr_destroy(&attr); - pthread_spin_init(&ct->hldr_lock, PTHREAD_PROCESS_PRIVATE); c->context = ct; return 0; @@ -90,7 +89,6 @@ static void cleanup_context(struct tur_checker_context *ct) { pthread_mutex_destroy(&ct->lock); pthread_cond_destroy(&ct->active); - pthread_spin_destroy(&ct->hldr_lock); free(ct); } @@ -99,16 +97,14 @@ void libcheck_free (struct checker * c) if (c->context) { struct tur_checker_context *ct = c->context; int holders; - pthread_t thread; - - pthread_spin_lock(&ct->hldr_lock); - ct->holders--; - holders = ct->holders; - thread = ct->thread; - pthread_spin_unlock(&ct->hldr_lock); - if (holders) - pthread_cancel(thread); - else + int running; + + running = uatomic_xchg(&ct->running, 0); + if (running) + pthread_cancel(ct->thread); + ct->thread = 0; + holders = uatomic_sub_return(&ct->holders, 1); + if (!holders) cleanup_context(ct); c->context = NULL; } @@ -218,26 +214,20 @@ retry: static void cleanup_func(void *data) { - int holders; + int running, holders; struct tur_checker_context *ct = data; - pthread_spin_lock(&ct->hldr_lock); - ct->holders--; - holders = ct->holders; - ct->thread = 0; - pthread_spin_unlock(&ct->hldr_lock); + + running = uatomic_xchg(&ct->running, 0); + holders = uatomic_sub_return(&ct->holders, 1); if (!holders) cleanup_context(ct); + if (!running) + pause(); } static int tur_running(struct tur_checker_context *ct) { - pthread_t thread; - - pthread_spin_lock(&ct->hldr_lock); - thread = ct->thread; - pthread_spin_unlock(&ct->hldr_lock); - - return thread != 0; + return (uatomic_read(&ct->running) != 0); } static void copy_msg_to_tcc(void *ct_p, const char *msg) @@ -325,7 +315,6 @@ int libcheck_check(struct checker * c) int tur_status, r; char devt[32]; - if (!ct) return PATH_UNCHECKED; @@ -349,35 +338,26 @@ int libcheck_check(struct checker * c) return PATH_WILD; } - if (ct->running) { - /* - * Check if TUR checker is still running. Hold hldr_lock - * around the pthread_cancel() call to avoid that - * pthread_cancel() gets called after the (detached) TUR - * thread has exited. - */ - pthread_spin_lock(&ct->hldr_lock); - if (ct->thread) { - if (tur_check_async_timeout(c)) { - condlog(3, "%s: tur checker timeout", - tur_devt(devt, sizeof(devt), ct)); + if (ct->thread) { + if (tur_check_async_timeout(c)) { + int running = uatomic_xchg(&ct->running, 0); + if (running) pthread_cancel(ct->thread); - ct->running = 0; - MSG(c, MSG_TUR_TIMEOUT); - tur_status = PATH_TIMEOUT; - } else { - condlog(3, "%s: tur checker not finished", + condlog(3, "%s: tur checker timeout", + tur_devt(devt, sizeof(devt), ct)); + ct->thread = 0; + MSG(c, MSG_TUR_TIMEOUT); + tur_status = PATH_TIMEOUT; + } else if (tur_running(ct)) { + condlog(3, "%s: tur checker not finished", tur_devt(devt, sizeof(devt), ct)); - ct->running++; - tur_status = PATH_PENDING; - } + tur_status = PATH_PENDING; } else { /* TUR checker done */ - ct->running = 0; + ct->thread = 0; tur_status = ct->state; strlcpy(c->message, ct->message, sizeof(c->message)); } - pthread_spin_unlock(&ct->hldr_lock); pthread_mutex_unlock(&ct->lock); } else { if (tur_running(ct)) { @@ -391,19 +371,17 @@ int libcheck_check(struct checker * c) ct->state = PATH_UNCHECKED; ct->fd = c->fd; ct->timeout = c->timeout; - pthread_spin_lock(&ct->hldr_lock); - ct->holders++; - pthread_spin_unlock(&ct->hldr_lock); + uatomic_add(&ct->holders, 1); + uatomic_set(&ct->running, 1); tur_set_async_timeout(c); setup_thread_attr(&attr, 32 * 1024, 1); r = pthread_create(&ct->thread, &attr, tur_thread, ct); pthread_attr_destroy(&attr); if (r) { - pthread_spin_lock(&ct->hldr_lock); - ct->holders--; - pthread_spin_unlock(&ct->hldr_lock); - pthread_mutex_unlock(&ct->lock); + uatomic_sub(&ct->holders, 1); + uatomic_set(&ct->running, 0); ct->thread = 0; + pthread_mutex_unlock(&ct->lock); condlog(3, "%s: failed to start tur thread, using" " sync mode", tur_devt(devt, sizeof(devt), ct)); return tur_check(c->fd, c->timeout, @@ -418,8 +396,12 @@ int libcheck_check(struct checker * c) (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) { condlog(3, "%s: tur checker still running", tur_devt(devt, sizeof(devt), ct)); - ct->running = 1; tur_status = PATH_PENDING; + } else { + int running = uatomic_xchg(&ct->running, 0); + if (running) + pthread_cancel(ct->thread); + ct->thread = 0; } }