From patchwork Thu Dec 8 16:13:18 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Coddington X-Patchwork-Id: 9466619 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 6B03160231 for ; Thu, 8 Dec 2016 16:13:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B033285BC for ; Thu, 8 Dec 2016 16:13:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4F76C285BF; Thu, 8 Dec 2016 16:13:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CAD49285BC for ; Thu, 8 Dec 2016 16:13:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752252AbcLHQNU (ORCPT ); Thu, 8 Dec 2016 11:13:20 -0500 Received: from mx1.redhat.com ([209.132.183.28]:48146 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752139AbcLHQNU (ORCPT ); Thu, 8 Dec 2016 11:13:20 -0500 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id EB925C04B93A; Thu, 8 Dec 2016 16:13:19 +0000 (UTC) Received: from bcodding.csb (vpn-55-75.rdu2.redhat.com [10.10.55.75]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uB8GDJZA000346; Thu, 8 Dec 2016 11:13:19 -0500 Received: by bcodding.csb (Postfix, from userid 24008) id 7C9DD10C3110; Thu, 8 Dec 2016 11:13:18 -0500 (EST) From: Benjamin Coddington To: Trond Myklebust , "Trond Myklebust" Cc: linux-nfs@vger.kernel.org Subject: Re: Concurrent `ls` takes out the thrash Date: Thu, 8 Dec 2016 11:13:18 -0500 Message-Id: <3A6424D7-8207-4392-88C5-6CD3E1F65800@redhat.com> In-Reply-To: <790E1295-EBBE-4059-9EEC-D2C9CD36EA20@redhat.com> References: <7DA8E9BE-7353-44D5-B982-B477CF7B0A57@redhat.com> <934FC075-4393-42AD-92A3-3BC3BE580699@redhat.com> <6C5DA8AC-9A42-45FA-892C-E9597A7F7AC9@primarydata.com> <3C863AB4-D07A-494A-AD7B-87BF31496777@primarydata.com> <790E1295-EBBE-4059-9EEC-D2C9CD36EA20@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Thu, 08 Dec 2016 16:13:20 +0000 (UTC) Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: "Benjamin Coddington" Ehh.. I think it's kinda ugly, but this is what I came up with. It works well, though, and handles everything I throw at it. Its not as simple as Christoph's suggestion to just go back to .iterate, which would solve this in a simpler way. Ben 8<--------------------------------------------------------------------------------------- From c95f29761bb1f60480863601c0becdd6cba89998 Mon Sep 17 00:00:00 2001 Message-Id: From: Benjamin Coddington Date: Wed, 7 Dec 2016 16:23:30 -0500 Subject: [PATCH] NFS: Handle concurrent users of nfs_readdir() For a concurrent 'ls -l' workload, processes can stack up in nfs_readdir() both waiting on the next page and taking turns filling the next page from XDR, but only one of them will have desc->plus set because setting it clears the flag on the directory. If a page is filled by a process that doesn't have desc->plus set then the next pass through lookup() will cause it to dump the entire page cache with nfs_force_use_readdirplus(). Then the next readdir starts all over filling the pagecache. Forward progress happens, but only after many steps back re-filling the pagecache. Fix this by using an additional flag on the inode to extend the lifetime of the readdir advisement to any processes that may be sending READDIR calls for that directory in nfs_readdir(). --- fs/nfs/dir.c | 25 ++++++++++++++++++------- include/linux/nfs_fs.h | 1 + 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 7483722162fa..d3e256d723d1 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -167,8 +167,10 @@ typedef struct { unsigned long gencount; unsigned int cache_entry_index; unsigned int plus:1; + unsigned int doer:1; unsigned int eof:1; } nfs_readdir_descriptor_t; +static bool nfs_use_readdirplus(struct inode *, nfs_readdir_descriptor_t *desc); /* * The caller is responsible for calling nfs_readdir_release_array(page) @@ -385,6 +387,7 @@ int nfs_readdir_xdr_filler(struct page **pages, nfs_readdir_descriptor_t *desc, int error; again: + desc->plus = nfs_use_readdirplus(inode, desc) ? 1 : 0; timestamp = jiffies; gencount = nfs_inc_attr_generation_counter(); error = NFS_PROTO(inode)->readdir(file_dentry(file), cred, entry->cookie, pages, @@ -393,8 +396,6 @@ int nfs_readdir_xdr_filler(struct page **pages, nfs_readdir_descriptor_t *desc, /* We requested READDIRPLUS, but the server doesn't grok it */ if (error == -ENOTSUPP && desc->plus) { NFS_SERVER(inode)->caps &= ~NFS_CAP_READDIRPLUS; - clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inode)->flags); - desc->plus = 0; goto again; } goto error; @@ -443,14 +444,22 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry) } static -bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx) +bool nfs_use_readdirplus(struct inode *dir, nfs_readdir_descriptor_t *desc) { + struct nfs_inode *nfsi = NFS_I(dir); + if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) return false; - if (test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags)) + if (test_bit(NFS_INO_DOING_RDPLUS, &nfsi->flags)) return true; - if (ctx->pos == 0) + if (desc->ctx->pos == 0) return true; + if (test_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags) && + !test_and_set_bit(NFS_INO_DOING_RDPLUS, &nfsi->flags)) { + clear_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); + desc->doer = 1; + return true; + } return false; } @@ -921,7 +930,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx) desc->ctx = ctx; desc->dir_cookie = &dir_ctx->dir_cookie; desc->decode = NFS_PROTO(inode)->decode_dirent; - desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0; if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) res = nfs_revalidate_mapping(inode, file->f_mapping); @@ -943,7 +951,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx) break; } if (res == -ETOOSMALL && desc->plus) { - clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inode)->flags); + clear_bit(NFS_INO_DOING_RDPLUS, &NFS_I(inode)->flags); nfs_zap_caches(inode); desc->page_index = 0; desc->plus = 0; @@ -957,6 +965,9 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx) if (res < 0) break; } while (!desc->eof); + + if (desc->doer) + clear_bit(NFS_INO_DOING_RDPLUS, &NFS_I(inode)->flags); out: if (res > 0) res = 0; diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index cb631973839a..e4d31fbad407 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -207,6 +207,7 @@ struct nfs_inode { #define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */ #define NFS_INO_LAYOUTSTATS (11) /* layoutstats inflight */ #define NFS_INO_ODIRECT (12) /* I/O setting is O_DIRECT */ +#define NFS_INO_DOING_RDPLUS (13) /* doing readdirplus */ static inline struct nfs_inode *NFS_I(const struct inode *inode) {