From patchwork Mon Nov 4 06:52:59 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robert Schiele X-Patchwork-Id: 3134691 Return-Path: X-Original-To: patchwork-linux-nfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 1A8FFBEEB2 for ; Mon, 4 Nov 2013 06:53:21 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 36BDF2009C for ; Mon, 4 Nov 2013 06:53:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 24AF720303 for ; Mon, 4 Nov 2013 06:53:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753013Ab3KDGxJ (ORCPT ); Mon, 4 Nov 2013 01:53:09 -0500 Received: from demumfd001.nsn-inter.net ([93.183.12.32]:11596 "EHLO demumfd001.nsn-inter.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752900Ab3KDGxG (ORCPT ); Mon, 4 Nov 2013 01:53:06 -0500 Received: from demuprx017.emea.nsn-intra.net ([10.150.129.56]) by demumfd001.nsn-inter.net (8.12.11.20060308/8.12.11) with ESMTP id rA46r1Vj031848 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Mon, 4 Nov 2013 07:53:01 +0100 Received: from ulegcprs1.emea.nsn-net.net ([10.151.15.253]) by demuprx017.emea.nsn-intra.net (8.12.11.20060308/8.12.11) with ESMTP id rA46r0j7030959; Mon, 4 Nov 2013 07:53:01 +0100 Received: by ulegcprs1.emea.nsn-net.net (Postfix, from userid 1000) id 1C1A52390; Mon, 4 Nov 2013 07:52:59 +0100 (CET) Date: Mon, 4 Nov 2013 07:52:59 +0100 From: Robert Schiele To: linux-nfs@vger.kernel.org Cc: Jim Rees Subject: [PATCH] fix race condition for parallel startup of statd Message-ID: <20131104065259.GA21230@ulegcprs1.emea.nsn-net.net> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-purgate-type: clean X-purgate-Ad: Categorized by eleven eXpurgate (R) http://www.eleven.de X-purgate: clean X-purgate: This mail is considered clean (visit http://www.eleven.de for further information) X-purgate-size: 2715 X-purgate-ID: 151667::1383547981-00004A43-DA1FD66D/0-0/0-0 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, FREEMAIL_FROM,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When start_statd figures out that statd is not yet running it starts it, waits for the invoked process to complete, and finally verifies that statd is working. This approach works for serially mounting NFS file systems but has a race condition for parallel mounting. In the parallel case it can happen that two mount commands A and B both decide that statd needs to be started. Both of them try to start statd. Obviously only one of them can successfully do so, let's assume this is command A in our case. The statd invoked by B terminates because the resource is already claimed by the statd invoked by A. The termination of B's statd though is before the statd of A has completely set up all things. This causes the check for a working statd of command B to fail and terminate the mount request with an error. To prevent this we define a timeout value. In case the initial check after invoking statd fails we try again in a loop 10 times a second until the timeout is reached. In our tests when the race occurred we typically were successful already on the first retry within the loop. --- This is the modified version of the patch with the loop calling nfs_probe_statd only in one place but still leaving the loop without a useless sleep nefore the first or after the last check. Additionally I declared ts with a const qualifier now to make the nature of this variable more clear. By accident I replied to Jim only personally and not to the list and thus the discussion leading to this modification was not reflected there. I am sorry for that. Robert utils/mount/network.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/utils/mount/network.c b/utils/mount/network.c index e2cdcaf..e8e55a5 100644 --- a/utils/mount/network.c +++ b/utils/mount/network.c @@ -58,6 +58,7 @@ #define PMAP_TIMEOUT (10) #define CONNECT_TIMEOUT (20) #define MOUNT_TIMEOUT (30) +#define STATD_TIMEOUT (10) #define SAFE_SOCKADDR(x) (struct sockaddr *)(char *)(x) @@ -773,6 +774,11 @@ int start_statd(void) #ifdef START_STATD if (stat(START_STATD, &stb) == 0) { if (S_ISREG(stb.st_mode) && (stb.st_mode & S_IXUSR)) { + int cnt = STATD_TIMEOUT * 10; + const struct timespec ts = { + .tv_sec = 0, + .tv_nsec = 100000000, + }; pid_t pid = fork(); switch (pid) { case 0: /* child */ @@ -786,8 +792,13 @@ int start_statd(void) waitpid(pid, NULL,0); break; } - if (nfs_probe_statd()) - return 1; + while (1) { + if (nfs_probe_statd()) + return 1; + if (! cnt--) + return 0; + nanosleep(&ts, NULL); + } } } #endif