mbox series

[v9,00/27] Readdir improvements

Message ID 20220227231227.9038-1-trondmy@kernel.org (mailing list archive)
Headers show
Series Readdir improvements | expand

Message

Trond Myklebust Feb. 27, 2022, 11:12 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This series also implement Ben's page cache filter to ensure that we can
improve the ability to share cached data between processes that are
reading the same directory at the same time, and to avoid live-locks
when the directory is simultaneously changing.

--
v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
v3: Avoid excessive window shrinking in uncached_readdir case
v4: Track 'ls -l' cache hit/miss statistics
    Improved algorithm for falling back to uncached readdir
    Skip readdirplus when files are being written to
v5: bugfixes
    Skip readdirplus when the acdirmax/acregmax values are low
    Request a full XDR buffer when doing READDIRPLUS
v6: Add tracing
    Don't have lookup request readdirplus when it won't help
v7: Implement Ben's page cache filter
    Reduce the use of uncached readdir
    Change indexing of the page cache to improve seekdir() performance.
v8: Reduce the page cache overhead by shrinking the cookie hashvalue size
    Incorporate other feedback from Anna, Ben and Neil
    Fix nfs2/3_decode_dirent return values
    Fix the change attribute value set in nfs_readdir_page_get_next()
v9: Address bugs that were hit when testing with large directories
    Misc cleanups

Trond Myklebust (27):
  NFS: Return valid errors from nfs2/3_decode_dirent()
  NFS: constify nfs_server_capable() and nfs_have_writebacks()
  NFS: Trace lookup revalidation failure
  NFS: Initialise the readdir verifier as best we can in nfs_opendir()
  NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
  NFS: Calculate page offsets algorithmically
  NFS: Store the change attribute in the directory page cache
  NFS: Don't re-read the entire page cache to find the next cookie
  NFS: Don't advance the page pointer unless the page is full
  NFS: Adjust the amount of readahead performed by NFS readdir
  NFS: If the cookie verifier changes, we must invalidate the page cache
  NFS: Simplify nfs_readdir_xdr_to_array()
  NFS: Reduce use of uncached readdir
  NFS: Improve heuristic for readdirplus
  NFS: Don't ask for readdirplus unless it can help nfs_getattr()
  NFSv4: Ask for a full XDR buffer of readdir goodness
  NFS: Readdirplus can't help lookup for case insensitive filesystems
  NFS: Don't request readdirplus when revalidation was forced
  NFS: Add basic readdir tracing
  NFS: Trace effects of readdirplus on the dcache
  NFS: Trace effects of the readdirplus heuristic
  NFS: Clean up page array initialisation/free
  NFS: Convert readdir page cache to use a cookie based index
  NFS: Fix up forced readdirplus
  NFS: Remove unnecessary cache invalidations for directories
  NFS: Optimise away the previous cookie field
  NFS: Cache all entries in the readdirplus reply

 fs/nfs/Kconfig          |   4 +
 fs/nfs/dir.c            | 602 ++++++++++++++++++++++++----------------
 fs/nfs/inode.c          |  46 ++-
 fs/nfs/internal.h       |   4 +-
 fs/nfs/nfs2xdr.c        |   3 +-
 fs/nfs/nfs3xdr.c        |  29 +-
 fs/nfs/nfs4proc.c       |   2 -
 fs/nfs/nfs4xdr.c        |   7 +-
 fs/nfs/nfstrace.h       | 123 +++++++-
 include/linux/nfs_fs.h  |  19 +-
 include/linux/nfs_xdr.h |   3 +-
 11 files changed, 532 insertions(+), 310 deletions(-)

Comments

Benjamin Coddington March 9, 2022, 9:32 p.m. UTC | #1
On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:

> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
>     Improved algorithm for falling back to uncached readdir
>     Skip readdirplus when files are being written to
> v5: bugfixes
>     Skip readdirplus when the acdirmax/acregmax values are low
>     Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
>     Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
>     Reduce the use of uncached readdir
>     Change indexing of the page cache to improve seekdir() performance.
> v8: Reduce the page cache overhead by shrinking the cookie hashvalue size
>     Incorporate other feedback from Anna, Ben and Neil
>     Fix nfs2/3_decode_dirent return values
>     Fix the change attribute value set in nfs_readdir_page_get_next()
> v9: Address bugs that were hit when testing with large directories
>     Misc cleanups

Hi Trond, thanks for all this work!  I went through these from your testing
branch (612896ec5a4e) rather than the posting.

If it pleases, these can all get marked with my

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
and/or
Tested-by: Benjamin Coddington <bcodding@redhat.com>

.. except for 25/27, which is missing from the testing branch.

As I replied to 23/27, I don't understand how the page index hashing is
going to help out the re-export seekdir case, I think it might make it
worse, and I think its unnecessary extra complication.

I did test extensively total directory listing correctness, and it appears
to me that you are correct, we are not regressing.  We're in a similar place
as before.  I think we can be even more correct with directory verifiers or
post-op updates with GETATTR in the READDIR compound for very little cost,
but I've already made those arguments a few weeks ago.

Thanks again,
Ben