mbox series

[v7,00/21] Readdir improvements

Message ID 20220223211305.296816-1-trondmy@kernel.org (mailing list archive)
Headers show
Series Readdir improvements | expand

Message

Trond Myklebust Feb. 23, 2022, 9:12 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This series also implement Ben's page cache filter to ensure that we can
improve the ability to share cached data between processes that are
reading the same directory at the same time, and to avoid live-locks
when the directory is simultaneously changing.

--
v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
v3: Avoid excessive window shrinking in uncached_readdir case
v4: Track 'ls -l' cache hit/miss statistics
    Improved algorithm for falling back to uncached readdir
    Skip readdirplus when files are being written to
v5: bugfixes
    Skip readdirplus when the acdirmax/acregmax values are low
    Request a full XDR buffer when doing READDIRPLUS
v6: Add tracing
    Don't have lookup request readdirplus when it won't help
v7: Implement Ben's page cache filter
    Reduce the use of uncached readdir
    Change indexing of the page cache to improve seekdir() performance.

Trond Myklebust (21):
  NFS: constify nfs_server_capable() and nfs_have_writebacks()
  NFS: Trace lookup revalidation failure
  NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
  NFS: Calculate page offsets algorithmically
  NFS: Store the change attribute in the directory page cache
  NFS: If the cookie verifier changes, we must invalidate the page cache
  NFS: Don't re-read the entire page cache to find the next cookie
  NFS: Adjust the amount of readahead performed by NFS readdir
  NFS: Simplify nfs_readdir_xdr_to_array()
  NFS: Reduce use of uncached readdir
  NFS: Improve heuristic for readdirplus
  NFS: Don't ask for readdirplus unless it can help nfs_getattr()
  NFSv4: Ask for a full XDR buffer of readdir goodness
  NFS: Readdirplus can't help lookup for case insensitive filesystems
  NFS: Don't request readdirplus when revalidation was forced
  NFS: Add basic readdir tracing
  NFS: Trace effects of readdirplus on the dcache
  NFS: Trace effects of the readdirplus heuristic
  NFS: Convert readdir page cache to use a cookie based index
  NFS: Fix up forced readdirplus
  NFS: Remove unnecessary cache invalidations for directories

 fs/nfs/dir.c           | 450 ++++++++++++++++++++++++-----------------
 fs/nfs/inode.c         |  46 ++---
 fs/nfs/internal.h      |   4 +-
 fs/nfs/nfs3xdr.c       |   7 +-
 fs/nfs/nfs4proc.c      |   2 -
 fs/nfs/nfs4xdr.c       |   6 +-
 fs/nfs/nfstrace.h      | 122 ++++++++++-
 include/linux/nfs_fs.h |  19 +-
 8 files changed, 421 insertions(+), 235 deletions(-)

Comments

David Wysochanski Feb. 24, 2022, 12:25 p.m. UTC | #1
On Wed, Feb 23, 2022 at 4:24 PM <trondmy@kernel.org> wrote:
>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
>     Improved algorithm for falling back to uncached readdir
>     Skip readdirplus when files are being written to
> v5: bugfixes
>     Skip readdirplus when the acdirmax/acregmax values are low
>     Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
>     Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
>     Reduce the use of uncached readdir
>     Change indexing of the page cache to improve seekdir() performance.
>
> Trond Myklebust (21):
>   NFS: constify nfs_server_capable() and nfs_have_writebacks()
>   NFS: Trace lookup revalidation failure
>   NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
>   NFS: Calculate page offsets algorithmically
>   NFS: Store the change attribute in the directory page cache
>   NFS: If the cookie verifier changes, we must invalidate the page cache
>   NFS: Don't re-read the entire page cache to find the next cookie
>   NFS: Adjust the amount of readahead performed by NFS readdir
>   NFS: Simplify nfs_readdir_xdr_to_array()
>   NFS: Reduce use of uncached readdir
>   NFS: Improve heuristic for readdirplus
>   NFS: Don't ask for readdirplus unless it can help nfs_getattr()
>   NFSv4: Ask for a full XDR buffer of readdir goodness
>   NFS: Readdirplus can't help lookup for case insensitive filesystems
>   NFS: Don't request readdirplus when revalidation was forced
>   NFS: Add basic readdir tracing
>   NFS: Trace effects of readdirplus on the dcache
>   NFS: Trace effects of the readdirplus heuristic
>   NFS: Convert readdir page cache to use a cookie based index
>   NFS: Fix up forced readdirplus
>   NFS: Remove unnecessary cache invalidations for directories
>
>  fs/nfs/dir.c           | 450 ++++++++++++++++++++++++-----------------
>  fs/nfs/inode.c         |  46 ++---
>  fs/nfs/internal.h      |   4 +-
>  fs/nfs/nfs3xdr.c       |   7 +-
>  fs/nfs/nfs4proc.c      |   2 -
>  fs/nfs/nfs4xdr.c       |   6 +-
>  fs/nfs/nfstrace.h      | 122 ++++++++++-
>  include/linux/nfs_fs.h |  19 +-
>  8 files changed, 421 insertions(+), 235 deletions(-)
>
> --
> 2.35.1
>

Trond I have been following your work here with periodic tests though
not fully following all the patches content.  As you know this is a tricky
area and seems to be a hotspot area for customers that use NFS, with
many scenarios that may go wrong.  Thanks for your work, which now
includes even some tracepoints and Ben's page cache filler.

This patchset seems to be the best of all the ones so far.  My initial
tests (listings when modifying as well as idle directories) indicate
that the issue that Gonzalo reported on Jan 14th [1] looks to be fixed
by this set, but I'll let him confirm.  I'll do some more testing and
let you know if there's anything else I find.  If there're some
scenarios (mount options, servers, etc) you need more testing on, let
us know and we'll try to make that happen.

[1] [PATCH] NFS: limit block size reported for directories
David Wysochanski Feb. 24, 2022, 3:07 p.m. UTC | #2
On Wed, Feb 23, 2022 at 4:24 PM <trondmy@kernel.org> wrote:
>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
>     Improved algorithm for falling back to uncached readdir
>     Skip readdirplus when files are being written to
> v5: bugfixes
>     Skip readdirplus when the acdirmax/acregmax values are low
>     Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
>     Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
>     Reduce the use of uncached readdir
>     Change indexing of the page cache to improve seekdir() performance.
>
> Trond Myklebust (21):
>   NFS: constify nfs_server_capable() and nfs_have_writebacks()
>   NFS: Trace lookup revalidation failure
>   NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
>   NFS: Calculate page offsets algorithmically
>   NFS: Store the change attribute in the directory page cache
>   NFS: If the cookie verifier changes, we must invalidate the page cache
>   NFS: Don't re-read the entire page cache to find the next cookie
>   NFS: Adjust the amount of readahead performed by NFS readdir
>   NFS: Simplify nfs_readdir_xdr_to_array()
>   NFS: Reduce use of uncached readdir
>   NFS: Improve heuristic for readdirplus
>   NFS: Don't ask for readdirplus unless it can help nfs_getattr()
>   NFSv4: Ask for a full XDR buffer of readdir goodness
>   NFS: Readdirplus can't help lookup for case insensitive filesystems
>   NFS: Don't request readdirplus when revalidation was forced
>   NFS: Add basic readdir tracing
>   NFS: Trace effects of readdirplus on the dcache
>   NFS: Trace effects of the readdirplus heuristic
>   NFS: Convert readdir page cache to use a cookie based index
>   NFS: Fix up forced readdirplus
>   NFS: Remove unnecessary cache invalidations for directories
>
>  fs/nfs/dir.c           | 450 ++++++++++++++++++++++++-----------------
>  fs/nfs/inode.c         |  46 ++---
>  fs/nfs/internal.h      |   4 +-
>  fs/nfs/nfs3xdr.c       |   7 +-
>  fs/nfs/nfs4proc.c      |   2 -
>  fs/nfs/nfs4xdr.c       |   6 +-
>  fs/nfs/nfstrace.h      | 122 ++++++++++-
>  include/linux/nfs_fs.h |  19 +-
>  8 files changed, 421 insertions(+), 235 deletions(-)
>
> --
> 2.35.1
>

I don't see this pushed this to your 'testing' branch, but I applied
manually after resetting at 3f58e709c162
Trond Myklebust Feb. 25, 2022, 4 a.m. UTC | #3
On Thu, 2022-02-24 at 07:25 -0500, David Wysochanski wrote:
> On Wed, Feb 23, 2022 at 4:24 PM <trondmy@kernel.org> wrote:
> > 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The current NFS readdir code will always try to maximise the amount
> > of
> > readahead it performs on the assumption that we can cache anything
> > that
> > isn't immediately read by the process.
> > There are several cases where this assumption breaks down,
> > including
> > when the 'ls -l' heuristic kicks in to try to force use of
> > readdirplus
> > as a batch replacement for lookup/getattr.
> > 
> > This series also implement Ben's page cache filter to ensure that
> > we can
> > improve the ability to share cached data between processes that are
> > reading the same directory at the same time, and to avoid live-
> > locks
> > when the directory is simultaneously changing.
> > 
> > --
> > v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> > v3: Avoid excessive window shrinking in uncached_readdir case
> > v4: Track 'ls -l' cache hit/miss statistics
> >     Improved algorithm for falling back to uncached readdir
> >     Skip readdirplus when files are being written to
> > v5: bugfixes
> >     Skip readdirplus when the acdirmax/acregmax values are low
> >     Request a full XDR buffer when doing READDIRPLUS
> > v6: Add tracing
> >     Don't have lookup request readdirplus when it won't help
> > v7: Implement Ben's page cache filter
> >     Reduce the use of uncached readdir
> >     Change indexing of the page cache to improve seekdir()
> > performance.
> > 
> > Trond Myklebust (21):
> >   NFS: constify nfs_server_capable() and nfs_have_writebacks()
> >   NFS: Trace lookup revalidation failure
> >   NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
> >   NFS: Calculate page offsets algorithmically
> >   NFS: Store the change attribute in the directory page cache
> >   NFS: If the cookie verifier changes, we must invalidate the page
> > cache
> >   NFS: Don't re-read the entire page cache to find the next cookie
> >   NFS: Adjust the amount of readahead performed by NFS readdir
> >   NFS: Simplify nfs_readdir_xdr_to_array()
> >   NFS: Reduce use of uncached readdir
> >   NFS: Improve heuristic for readdirplus
> >   NFS: Don't ask for readdirplus unless it can help nfs_getattr()
> >   NFSv4: Ask for a full XDR buffer of readdir goodness
> >   NFS: Readdirplus can't help lookup for case insensitive
> > filesystems
> >   NFS: Don't request readdirplus when revalidation was forced
> >   NFS: Add basic readdir tracing
> >   NFS: Trace effects of readdirplus on the dcache
> >   NFS: Trace effects of the readdirplus heuristic
> >   NFS: Convert readdir page cache to use a cookie based index
> >   NFS: Fix up forced readdirplus
> >   NFS: Remove unnecessary cache invalidations for directories
> > 
> >  fs/nfs/dir.c           | 450 ++++++++++++++++++++++++-------------
> > ----
> >  fs/nfs/inode.c         |  46 ++---
> >  fs/nfs/internal.h      |   4 +-
> >  fs/nfs/nfs3xdr.c       |   7 +-
> >  fs/nfs/nfs4proc.c      |   2 -
> >  fs/nfs/nfs4xdr.c       |   6 +-
> >  fs/nfs/nfstrace.h      | 122 ++++++++++-
> >  include/linux/nfs_fs.h |  19 +-
> >  8 files changed, 421 insertions(+), 235 deletions(-)
> > 
> > --
> > 2.35.1
> > 
> 
> Trond I have been following your work here with periodic tests though
> not fully following all the patches content.  As you know this is a
> tricky
> area and seems to be a hotspot area for customers that use NFS, with
> many scenarios that may go wrong.  Thanks for your work, which now
> includes even some tracepoints and Ben's page cache filler.
> 
> This patchset seems to be the best of all the ones so far.  My
> initial
> tests (listings when modifying as well as idle directories) indicate
> that the issue that Gonzalo reported on Jan 14th [1] looks to be
> fixed
> by this set, but I'll let him confirm.  I'll do some more testing and
> let you know if there's anything else I find.  If there're some
> scenarios (mount options, servers, etc) you need more testing on, let
> us know and we'll try to make that happen.
> 
> [1] [PATCH] NFS: limit block size reported for directories
> 

Thanks Dave! I very much appreciate your testing the patches. This is a
very complex area due to the interplay of readdir/readdirplus, the
dcache and the attribute cache, and so it is key to get as much
statistics as possible. Thanks again to you and Red Hat for
contributing to that effort.

Thanks also to Ben for his comments and reviews. His insights are key
to the progress we appear to be making.