[v1,00/10] NFS client readdir per-page validation

Message ID: cover.1611160120.git.bcodding@redhat.com


Benjamin Coddington Jan. 20, 2021, 4:59 p.m. UTC
Due to the constraint that the NFS readdir page cache must contain every
entry in cookie order from zero up to the entry of interest, the time or
operations required to complete a directory listing increase exponentially
with the size of the directory if the client is unable to keep the pagecache
stable.  The pagecache can be invalidated by a changing directory, or by
memory pressure on the client.  This can cause some trouble for the NFS
client reading large directories over slow connections.

We have a heuristic that allows eventual completion, but it only works as
long as there are no other readers simultaneously filling the pagecache.

I think we can resolve this problem by implementing per-page validation.  By
storing the directory's change version on the page, and checking for changes
to the directory on every READDIR, we can validate pages against each
reader's version of entry alignment.  Rather than attempting to assemble the
entire directory in a consistent manner in the pagecache, we can just
retrieve the section we're interested in emitting.

This set is a first pass at implementing this idea.  Please help me pound it
into acceptable shape or point out problems!  Thanks for any feedback.

Here's a small program that does a great job of demonstrating the client's
current readdir pagecache performance problem by dropping the directory's
pagecache at an interval while trying to emit every entry:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <signal.h>

#define BUF_SIZE 1024
/* Seconds between evictions (assumed value; not specified in this posting). */
#define EVICT_INTERVAL 1

/* Ask the kernel to drop the cached pages behind fd. */
int evict_pagecache(int fd) {
        return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

int main(int argc, char **argv)
{
        int dir_fd;
        pid_t pid;
        cpu_set_t cpuset;
        off_t off;

        char buf[BUF_SIZE];

        if (argc < 2) { printf("%s <dir>\n", argv[0]); return 1; }

        dir_fd = open(argv[1], O_RDONLY|O_DIRECTORY|O_CLOEXEC);
        if (dir_fd < 0) { printf("cannot open dir\n"); return 1; }

        CPU_ZERO(&cpuset);
        pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
                /* Child: report the shared directory offset, then evict. */
                CPU_SET(1, &cpuset);
                sched_setaffinity(0, sizeof(cpuset), &cpuset);
                do {
                        off = lseek(dir_fd, 0, SEEK_CUR);
                        printf("currently at %lld\n", (long long)off);
                        evict_pagecache(dir_fd);
                        sleep(EVICT_INTERVAL);
                } while (1);
        } else {
                /* Parent: read entries until end of directory or error.
                 * getdents64 is used because SYS_getdents is not defined
                 * on every architecture. */
                CPU_SET(0, &cpuset);
                sched_setaffinity(0, sizeof(cpuset), &cpuset);
                while (syscall(SYS_getdents64, dir_fd, buf, BUF_SIZE) > 0) {}
                kill(pid, SIGINT);
        }

        return 0;
}

Benjamin Coddington (10):
  NFS: save the directory's change attribute on pagecache pages
  NFS: Add a struct to track readdir pagecache location
  NFS: Keep the readdir pagecache cursor updated
  NFS: readdir per-page cache validation
  NFS: stash the readdir pagecache cursor on the open directory context
  NFS: Support headless readdir pagecache pages
  NFS: Reset pagecache cursor on llseek
  NFS: Remove nfs_readdir_dont_search_cache()
  NFS: Revalidate the directory pagecache on every nfs_readdir()

 fs/nfs/dir.c              | 210 +++++++++++++++++++++++++++-----------
 fs/nfs/nfs42proc.c        |   2 +-
 fs/nfs/nfs4proc.c         |  27 +++--
 fs/nfs/nfs4xdr.c          |   6 ++
 include/linux/nfs_fs.h    |   8 +-
 include/linux/nfs_fs_sb.h |   5 +
 include/linux/nfs_xdr.h   |   2 +
 7 files changed, 188 insertions(+), 72 deletions(-)