From patchwork Tue Apr 23 19:32:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Martin Wilck
X-Patchwork-Id: 10913675
X-Patchwork-Delegate: christophe.varoqui@free.fr
From: Martin Wilck
To: Christophe Varoqui, Benjamin Marzinski
Date: Tue, 23 Apr 2019 21:32:37 +0200
Message-Id: <20190423193237.28391-1-mwilck@suse.com>
Cc: dm-devel@redhat.com, Martin Wilck
Subject: [dm-devel] [PATCH] multipath -u: test socket connection in non-blocking mode
List-Id: device-mapper development

Since commit d7188fcd "multipathd: start daemon after udev trigger",
multipathd startup is delayed during boot until after "udev settle"
terminates. But "multipath -u" is run by udev workers for storage
devices, and attempts to connect to the multipathd socket. This causes
a start job for multipathd to be scheduled by systemd, but that job
won't be started until "udev settle" finishes.

This is not a problem on systems with 129 or fewer storage units,
because the connect() call of "multipath -u" will succeed anyway. But
on larger systems, the listen backlog of the systemd socket can be
exceeded, which causes connect() calls for the socket to block until
multipathd starts up and begins calling accept(). This creates a
deadlock situation, because "multipath -u" (called by udev workers)
blocks, and thus "udev settle" doesn't finish, delaying multipathd
startup. This situation then persists until either the workers or
"udev settle" time out. In the former case, path devices might be
misclassified as non-multipath devices by "multipath -u".

Fix this by using a non-blocking socket fd for connect() and
interpreting the errno appropriately.

This patch reverts most of the changes from commit 8cdf6661
"multipath: check on multipathd without starting it". Instead,
"multipath -u" does access the socket and start multipathd again
(which is what we want IMO), but it is now able to detect and handle
the "full backlog" situation.
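
For illustration only (not part of this patch): the behavior the fix
relies on can be reproduced with a small stand-alone sketch. It assumes
Linux, where connect() on a non-blocking AF_UNIX stream socket fails
with EAGAIN once the listener's backlog is full. The names
demo_addr(), demo_connect(), demo_fill_backlog() and DEMO_MAX_CLIENTS
are hypothetical and do not exist in multipath-tools; the socket path
is whatever scratch path the caller passes in.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define DEMO_MAX_CLIENTS 64	/* hypothetical cap for this demo */

/* Fill in a sockaddr_un for a filesystem socket path. */
static void demo_addr(struct sockaddr_un *addr, const char *path)
{
	assert(strlen(path) < sizeof(addr->sun_path));
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strcpy(addr->sun_path, path);
}

/*
 * Hypothetical helper modeled on the patch: returns a blocking fd (>= 0)
 * on success, -errno on failure. With "nonblocking" set, connect() is
 * issued in non-blocking mode.
 */
static int demo_connect(const char *path, int nonblocking)
{
	struct sockaddr_un addr;
	int fd, err, flags = -1;

	demo_addr(&addr, path);
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd == -1)
		return -errno;

	if (nonblocking) {
		flags = fcntl(fd, F_GETFL, 0);
		if (flags != -1)
			(void)fcntl(fd, F_SETFL, flags | O_NONBLOCK);
	}

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		err = -errno;	/* save errno before close() can change it */
		close(fd);
		return err;
	}

	if (flags != -1)
		(void)fcntl(fd, F_SETFL, flags);	/* back to blocking */
	return fd;
}

/*
 * Listen on "path" but never call accept(), then connect in non-blocking
 * mode until a connect fails or the demo cap is reached. Returns the
 * number of successful connects; *last_err holds the errno that stopped
 * the loop (0 if the cap was reached first).
 */
static int demo_fill_backlog(const char *path, int backlog, int *last_err)
{
	struct sockaddr_un addr;
	int clients[DEMO_MAX_CLIENTS];
	int lfd, i, n = 0;

	*last_err = 0;
	demo_addr(&addr, path);
	unlink(path);	/* remove a stale socket file, if any */

	lfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (lfd == -1) {
		*last_err = errno;
		return 0;
	}
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    listen(lfd, backlog) == -1) {
		*last_err = errno;
		close(lfd);
		return 0;
	}

	while (n < DEMO_MAX_CLIENTS) {
		int fd = demo_connect(path, 1);

		if (fd < 0) {
			*last_err = -fd;
			break;
		}
		clients[n++] = fd;
	}

	for (i = 0; i < n; i++)
		close(clients[i]);
	close(lfd);
	unlink(path);
	return n;
}
```

Calling demo_fill_backlog() with a small backlog should report a few
successful connections and then EAGAIN in *last_err; the exact count
depends on how the kernel treats the backlog value. A blocking
connect() at that point would hang instead, which is the situation the
udev workers run into.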
---
 libmpathcmd/mpath_cmd.c | 33 +++++++++++++++++---
 libmpathcmd/mpath_cmd.h | 15 +++++++++
 multipath/main.c        | 67 ++++++++++++-----------------------------
 3 files changed, 64 insertions(+), 51 deletions(-)

diff --git a/libmpathcmd/mpath_cmd.c b/libmpathcmd/mpath_cmd.c
index df4ca541..265af1fb 100644
--- a/libmpathcmd/mpath_cmd.c
+++ b/libmpathcmd/mpath_cmd.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include <fcntl.h>
 
 #include "mpath_cmd.h"
 
@@ -93,10 +94,11 @@ static size_t write_all(int fd, const void *buf, size_t len)
 /*
  * connect to a unix domain socket
  */
-int mpath_connect(void)
+int __mpath_connect(int nonblocking)
 {
-	int fd, len;
+	int fd, len, err;
 	struct sockaddr_un addr;
+	int flags = 0;
 
 	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_LOCAL;
@@ -106,16 +108,39 @@ int mpath_connect(void)
 
 	fd = socket(AF_LOCAL, SOCK_STREAM, 0);
 	if (fd == -1)
-		return -1;
+		return -errno;
+
+	if (nonblocking) {
+		flags = fcntl(fd, F_GETFL, 0);
+		if (flags != -1)
+			(void)fcntl(fd, F_SETFL, flags|O_NONBLOCK);
+	}
 
 	if (connect(fd, (struct sockaddr *)&addr, len) == -1) {
+		err = -errno;
 		close(fd);
-		return -1;
+		return err;
 	}
 
+	if (nonblocking && flags != -1)
+		(void)fcntl(fd, F_SETFL, flags);
+
 	return fd;
 }
 
+/*
+ * connect to a unix domain socket
+ */
+int mpath_connect(void)
+{
+	int fd = __mpath_connect(0);
+
+	if (fd >= 0)
+		return fd;
+	errno = -fd;
+	return -1;
+}
+
 int mpath_disconnect(int fd)
 {
 	return close(fd);
diff --git a/libmpathcmd/mpath_cmd.h b/libmpathcmd/mpath_cmd.h
index 15aeb067..0d4c45cd 100644
--- a/libmpathcmd/mpath_cmd.h
+++ b/libmpathcmd/mpath_cmd.h
@@ -34,6 +34,21 @@ extern "C" {
 #define DEFAULT_REPLY_TIMEOUT 4000
 
 
+/*
+ * DESCRIPTION:
+ *	Same as mpath_connect() (see below) except for the error return code
+ *	and the "nonblocking" parameter.
+ *	If "nonblocking" is set, connects in non-blocking mode. This is
+ *	useful to avoid blocking if the listening socket's backlog is
+ *	exceeded. In this case, -EAGAIN will be returned.
+ *	Even with "nonblocking" set, the returned file descriptor is in
+ *	blocking mode in case of success.
+ *
+ * RETURNS:
+ *	A file descriptor (>= 0) on success. -errno on failure.
+ */
+int __mpath_connect(int nonblocking);
+
 /*
  * DESCRIPTION:
  *	Connect to the running multipathd daemon. On systems with the
diff --git a/multipath/main.c b/multipath/main.c
index 008e3d3f..52220dbf 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -850,55 +850,28 @@ out:
 	return r;
 }
 
-int is_multipathd_running(void)
+static int test_multipathd_socket(void)
 {
-	FILE *f = NULL;
-	char buf[16];
-	char path[PATH_MAX];
-	int pid;
-	char *end;
+	int fd;
+	/*
+	 * "multipath -u" may be run before the daemon is started. In this
+	 * case, systemd might own the socket but might delay multipathd
+	 * startup until some other unit (udev settle!) has finished
+	 * starting. With many LUNs, the listen backlog may be exceeded, which
+	 * would cause connect() to block. This causes udev workers calling
+	 * "multipath -u" to hang, and thus creates a deadlock, until "udev
+	 * settle" times out. To avoid this, call connect() in non-blocking
+	 * mode here, and take EAGAIN as indication for a filled-up systemd
+	 * backlog.
+	 */
 
-	f = fopen(DEFAULT_PIDFILE, "r");
-	if (!f) {
-		if (errno != ENOENT)
-			condlog(4, "can't open " DEFAULT_PIDFILE ": %s",
-				strerror(errno));
-		return 0;
-	}
-	if (!fgets(buf, sizeof(buf), f)) {
-		if (ferror(f))
-			condlog(4, "read of " DEFAULT_PIDFILE " failed: %s",
-				strerror(errno));
-		fclose(f);
-		return 0;
-	}
-	fclose(f);
-	errno = 0;
-	strchop(buf);
-	pid = strtol(buf, &end, 10);
-	if (errno != 0 || pid <= 0 || *end != '\0') {
-		condlog(4, "invalid contents in " DEFAULT_PIDFILE ": '%s'",
-			buf);
-		return 0;
-	}
-	snprintf(path, sizeof(path), "/proc/%d/comm", pid);
-	f = fopen(path, "r");
-	if (!f) {
-		if (errno != ENOENT)
-			condlog(4, "can't open %s: %s", path, strerror(errno));
-		return 0;
-	}
-	if (!fgets(buf, sizeof(buf), f)) {
-		if (ferror(f))
-			condlog(4, "read of %s failed: %s", path,
-				strerror(errno));
-		fclose(f);
-		return 0;
-	}
-	fclose(f);
-	strchop(buf);
-	if (strcmp(buf, "multipathd") != 0)
+	fd = __mpath_connect(1);
+	if (fd == -EAGAIN)
+		condlog(3, "daemon backlog exceeded");
+	else if (fd < 0)
 		return 0;
+	else
+		close(fd);
 	return 1;
 }
 
@@ -1080,7 +1053,7 @@ main (int argc, char *argv[])
 	}
 
 	if (cmd == CMD_VALID_PATH && dev_type == DEV_UEVENT) {
-		if (!is_multipathd_running()) {
+		if (!test_multipathd_socket()) {
 			condlog(3, "%s: daemon is not running", dev);
 			if (!systemd_service_enabled(dev)) {
 				r = print_cmd_valid(RTVL_NO, NULL, conf);