X-Patchwork-Id: 2854212
From: "Myklebust, Trond"
To: Quentin Barnes
CC: "linux-nfs@vger.kernel.org"
Subject: Re: nfs-backed mmap file results in 1000s of WRITEs per second
Date: Thu, 5 Sep 2013 20:02:01 +0000
Message-ID: <1378411320.5450.27.camel@leira.trondhjem.org>
References: <20130905162110.GA17920@gmail.com> <20130905170303.GB17330@us.ibm.com>
 <20130905191139.GA20830@gmail.com>
In-Reply-To: <20130905191139.GA20830@gmail.com>

On Thu, 2013-09-05 at 14:11 -0500, Quentin Barnes wrote:
> On Thu, Sep 05, 2013 at 12:03:03PM -0500, Malahal Naineni wrote:
> > Neil Brown posted a patch a couple of days ago for this!
> >
> > http://thread.gmane.org/gmane.linux.nfs/58473
>
> I tried Neil's patch on a v3.11 kernel. The rebuilt kernel still
> exhibited the same 1000s of WRITEs/sec problem.
>
> Any other ideas?

Yes. Please try the attached patch.

> > Regards, Malahal.
> >
> > Quentin Barnes [qbarnes@gmail.com] wrote:
> > > If two (or more) processes are doing nothing more than writing to
> > > the memory addresses of an mmapped shared file on an NFS-mounted
> > > file system, the kernel scribbles WRITEs to the server as fast as
> > > it can (1000s per second) even while no syscalls are going on.
> > >
> > > The problem happens on NFS clients mounting NFSv3 or NFSv4. I've
> > > reproduced this on the 3.11 kernel, and it happens as far back as
> > > RHEL6 (2.6.32 based); however, it is not a problem on RHEL5 (2.6.18
> > > based). (All x86_64 systems.) I didn't try anything in between.
> > >
> > > I've created a self-contained program below that will demonstrate
> > > the problem (call it "t1").
> > > Assuming /mnt has an NFS file system:
> > >
> > >   $ t1 /mnt/mynfsfile 1    # Fork 1 writer, kernel behaves normally
> > >   $ t1 /mnt/mynfsfile 2    # Fork 2 writers, kernel goes crazy WRITEing
> > >
> > > Just run "watch -d nfsstat" in another window while running the
> > > two-writer test and watch the WRITE count explode.
> > >
> > > I don't see anything particularly wrong with what the example code
> > > is doing with its use of mmap. Is there anything undefined about
> > > the code that would explain this behavior, or is this an NFS bug
> > > that's really lived this long?
> > >
> > > Quentin
> > >
> > >
> > > #include <errno.h>
> > > #include <fcntl.h>
> > > #include <signal.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <string.h>
> > > #include <sys/mman.h>
> > > #include <sys/stat.h>
> > > #include <sys/types.h>
> > > #include <sys/wait.h>
> > > #include <unistd.h>
> > >
> > > /* Signal the whole process group, then reap every child. */
> > > int
> > > kill_children()
> > > {
> > >     int cnt = 0;
> > >     siginfo_t infop;
> > >
> > >     signal(SIGINT, SIG_IGN);
> > >     kill(0, SIGINT);
> > >     while (waitid(P_ALL, 0, &infop, WEXITED) != -1) ++cnt;
> > >
> > >     return cnt;
> > > }
> > >
> > > void
> > > sighandler(int sig)
> > > {
> > >     printf("Cleaning up all children.\n");
> > >     int cnt = kill_children();
> > >     printf("Cleaned up %d child%s.\n", cnt, cnt == 1 ? "" : "ren");
> > >
> > >     exit(0);
> > > }
> > >
> > > /* Store to the shared mapping forever; no syscalls are made. */
> > > int
> > > do_child(volatile int *iaddr)
> > > {
> > >     while (1) *iaddr = 1;
> > > }
> > >
> > > int
> > > main(int argc, char **argv)
> > > {
> > >     const char *path;
> > >     int fd;
> > >     ssize_t wlen;
> > >     int *ip;
> > >     int fork_count = 1;
> > >
> > >     if (argc == 1) {
> > >         fprintf(stderr, "Usage: %s {filename} [fork_count].\n",
> > >             argv[0]);
> > >         return 1;
> > >     }
> > >
> > >     path = argv[1];
> > >
> > >     if (argc > 2) {
> > >         int fc = atoi(argv[2]);
> > >         if (fc >= 0)
> > >             fork_count = fc;
> > >     }
> > >
> > >     fd = open(path, O_CREAT|O_TRUNC|O_RDWR|O_APPEND, S_IRUSR|S_IWUSR);
> > >     if (fd < 0) {
> > >         fprintf(stderr, "Open of '%s' failed: %s (%d)\n",
> > >             path, strerror(errno), errno);
> > >         return 1;
> > >     }
> > >
> > >     /* Give the file one int of data so the mapping has backing. */
> > >     wlen = write(fd, &(int){0}, sizeof(int));
> > >     if (wlen != sizeof(int)) {
> > >         if (wlen < 0)
> > >             fprintf(stderr, "Write of '%s' failed: %s (%d)\n",
> > >                 path, strerror(errno), errno);
> > >         else
> > >             fprintf(stderr, "Short write to '%s'\n", path);
> > >         return 1;
> > >     }
> > >
> > >     ip = (int *)mmap(NULL, sizeof(int), PROT_READ|PROT_WRITE,
> > >         MAP_SHARED, fd, 0);
> > >     if (ip == MAP_FAILED) {
> > >         fprintf(stderr, "Mmap of '%s' failed: %s (%d)\n",
> > >             path, strerror(errno), errno);
> > >         return 1;
> > >     }
> > >
> > >     signal(SIGINT, sighandler);
> > >
> > >     /* Every forked child inherits the same MAP_SHARED mapping. */
> > >     while (fork_count-- > 0) {
> > >         switch(fork()) {
> > >         case -1:
> > >             fprintf(stderr, "Fork failed: %s (%d)\n",
> > >                 strerror(errno), errno);
> > >             kill_children();
> > >             return 1;
> > >         case 0:    /* child */
> > >             signal(SIGINT, SIG_DFL);
> > >             do_child(ip);
> > >             break;
> > >         default:   /* parent */
> > >             break;
> > >         }
> > >     }
> > >
> > >     printf("Press ^C to terminate test.\n");
> > >     pause();
> > >
> > >     return 0;
> > > }
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at
> > > http://vger.kernel.org/majordomo-info.html
> >
>
> Quentin

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

From 903ebaeefae78e6e03f3719aafa8fd5dd22d3288 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Thu, 5 Sep 2013 15:52:51 -0400
Subject: [PATCH] NFS: Don't check lock owner compatibility in writes unless
 file is locked

If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.

Signed-off-by: Trond Myklebust
---
 fs/nfs/write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 40979e8..ac1dc33 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -863,7 +863,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		return 0;
 	l_ctx = req->wb_lock_context;
 	do_flush = req->wb_page != page || req->wb_context != ctx;
-	if (l_ctx) {
+	if (l_ctx && ctx->dentry->d_inode->i_flock != NULL) {
 		do_flush |= l_ctx->lockowner.l_owner != current->files ||
 			l_ctx->lockowner.l_pid != current->tgid;
 	}
-- 
1.8.3.1