[00/16] Cache open file descriptors in knfsd
mbox series

Message ID 20190630135240.7490-1-trond.myklebust@hammerspace.com
Headers show
Series
  • Cache open file descriptors in knfsd
Related show

Message

Trond Myklebust June 30, 2019, 1:52 p.m. UTC
When a NFSv3 READ or WRITE request comes in, the first thing knfsd has
to do is open a new file descriptor. While this is often a relatively
inexpensive thing to do for most local filesystems, it is usually less
so for FUSE, clustered or networked filesystems that are being exported
by knfsd.

This set of patches attempts to reduce some of that cost by caching
open file descriptors so that they may be reused by other incoming
READ/WRITE requests for the same file.
One danger when doing this, is that knfsd may end up caching file
descriptors for files that have been unlinked. In order to deal with
this issue, we use fsnotify to monitor the files, and have hooks to
evict those descriptors from the file cache if the i_nlink value
goes to 0.

Jeff Layton (12):
  sunrpc: add a new cache_detail operation for when a cache is flushed
  locks: create a new notifier chain for lease attempts
  nfsd: add a new struct file caching facility to nfsd
  nfsd: hook up nfsd_write to the new nfsd_file cache
  nfsd: hook up nfsd_read to the nfsd_file cache
  nfsd: hook nfsd_commit up to the nfsd_file cache
  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
  nfsd: have nfsd_test_lock use the nfsd_file cache
  nfsd: rip out the raparms cache
  nfsd: close cached files prior to a REMOVE or RENAME that would
    replace target

Trond Myklebust (4):
  notify: export symbols for use by the knfsd file cache
  vfs: Export flush_delayed_fput for use by knfsd.
  nfsd: Fix up some unused variable warnings
  nfsd: Fix the documentation for svcxdr_tmpalloc()

 fs/file_table.c                  |   1 +
 fs/locks.c                       |  62 +++
 fs/nfsd/Kconfig                  |   1 +
 fs/nfsd/Makefile                 |   3 +-
 fs/nfsd/blocklayout.c            |   3 +-
 fs/nfsd/export.c                 |  13 +
 fs/nfsd/filecache.c              | 885 +++++++++++++++++++++++++++++++
 fs/nfsd/filecache.h              |  60 +++
 fs/nfsd/nfs4layouts.c            |  12 +-
 fs/nfsd/nfs4proc.c               |  83 +--
 fs/nfsd/nfs4state.c              | 183 ++++---
 fs/nfsd/nfs4xdr.c                |  31 +-
 fs/nfsd/nfssvc.c                 |  16 +-
 fs/nfsd/state.h                  |  10 +-
 fs/nfsd/trace.h                  | 140 +++++
 fs/nfsd/vfs.c                    | 295 ++++-------
 fs/nfsd/vfs.h                    |   9 +-
 fs/nfsd/xdr4.h                   |  19 +-
 fs/notify/fsnotify.h             |   2 -
 fs/notify/group.c                |   2 +
 fs/notify/mark.c                 |   6 +
 include/linux/fs.h               |   5 +
 include/linux/fsnotify_backend.h |   2 +
 include/linux/sunrpc/cache.h     |   1 +
 net/sunrpc/cache.c               |   3 +
 25 files changed, 1465 insertions(+), 382 deletions(-)
 create mode 100644 fs/nfsd/filecache.c
 create mode 100644 fs/nfsd/filecache.h

Comments

Chuck Lever July 1, 2019, 3:02 p.m. UTC | #1
Interesting work! Kudos to you and Jeff.


> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com> wrote:
> 
> When a NFSv3 READ or WRITE request comes in, the first thing knfsd has
> to do is open a new file descriptor. While this is often a relatively
> inexpensive thing to do for most local filesystems, it is usually less
> so for FUSE, clustered or networked filesystems that are being exported
> by knfsd.

True, I haven't measured much effect if any of open and close on local
file systems. It would be valuable if the cover letter provided a more
quantified assessment of the cost for these other use cases. It sounds
plausible to me that they would be more expensive, but I'm wondering if
the additional complexity of an open file cache is warranted and
effective. Do you have any benchmark results to share?

Are there particular workloads where you believe open caching will be
especially beneficial?


> This set of patches attempts to reduce some of that cost by caching
> open file descriptors so that they may be reused by other incoming
> READ/WRITE requests for the same file.

Is the open file cache a single cache per server? Wondering if there
can be significant interference (eg lock contention or cache sloshing)
between separate workloads on different exports, for example.

Do you have any benchmark results that show that removing the raparms
cache is harmless?


> One danger when doing this, is that knfsd may end up caching file
> descriptors for files that have been unlinked. In order to deal with
> this issue, we use fsnotify to monitor the files, and have hooks to
> evict those descriptors from the file cache if the i_nlink value
> goes to 0.
> 
> Jeff Layton (12):
>  sunrpc: add a new cache_detail operation for when a cache is flushed
>  locks: create a new notifier chain for lease attempts
>  nfsd: add a new struct file caching facility to nfsd
>  nfsd: hook up nfsd_write to the new nfsd_file cache
>  nfsd: hook up nfsd_read to the nfsd_file cache
>  nfsd: hook nfsd_commit up to the nfsd_file cache
>  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>  nfsd: have nfsd_test_lock use the nfsd_file cache
>  nfsd: rip out the raparms cache
>  nfsd: close cached files prior to a REMOVE or RENAME that would
>    replace target
> 
> Trond Myklebust (4):
>  notify: export symbols for use by the knfsd file cache
>  vfs: Export flush_delayed_fput for use by knfsd.
>  nfsd: Fix up some unused variable warnings
>  nfsd: Fix the documentation for svcxdr_tmpalloc()
> 
> fs/file_table.c                  |   1 +
> fs/locks.c                       |  62 +++
> fs/nfsd/Kconfig                  |   1 +
> fs/nfsd/Makefile                 |   3 +-
> fs/nfsd/blocklayout.c            |   3 +-
> fs/nfsd/export.c                 |  13 +
> fs/nfsd/filecache.c              | 885 +++++++++++++++++++++++++++++++
> fs/nfsd/filecache.h              |  60 +++
> fs/nfsd/nfs4layouts.c            |  12 +-
> fs/nfsd/nfs4proc.c               |  83 +--
> fs/nfsd/nfs4state.c              | 183 ++++---
> fs/nfsd/nfs4xdr.c                |  31 +-
> fs/nfsd/nfssvc.c                 |  16 +-
> fs/nfsd/state.h                  |  10 +-
> fs/nfsd/trace.h                  | 140 +++++
> fs/nfsd/vfs.c                    | 295 ++++-------
> fs/nfsd/vfs.h                    |   9 +-
> fs/nfsd/xdr4.h                   |  19 +-
> fs/notify/fsnotify.h             |   2 -
> fs/notify/group.c                |   2 +
> fs/notify/mark.c                 |   6 +
> include/linux/fs.h               |   5 +
> include/linux/fsnotify_backend.h |   2 +
> include/linux/sunrpc/cache.h     |   1 +
> net/sunrpc/cache.c               |   3 +
> 25 files changed, 1465 insertions(+), 382 deletions(-)
> create mode 100644 fs/nfsd/filecache.c
> create mode 100644 fs/nfsd/filecache.h
> 
> -- 
> 2.21.0
> 

--
Chuck Lever
Trond Myklebust July 1, 2019, 3:17 p.m. UTC | #2
On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote:
> Interesting work! Kudos to you and Jeff.
> 
> 
> > On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com>
> > wrote:
> > 
> > When a NFSv3 READ or WRITE request comes in, the first thing knfsd
> > has
> > to do is open a new file descriptor. While this is often a
> > relatively
> > inexpensive thing to do for most local filesystems, it is usually
> > less
> > so for FUSE, clustered or networked filesystems that are being
> > exported
> > by knfsd.
> 
> True, I haven't measured much effect if any of open and close on
> local
> file systems. It would be valuable if the cover letter provided a
> more
> quantified assessment of the cost for these other use cases. It
> sounds
> plausible to me that they would be more expensive, but I'm wondering
> if
> the additional complexity of an open file cache is warranted and
> effective. Do you have any benchmark results to share?
> 
> Are there particular workloads where you believe open caching will be
> especially beneficial?

I'd expect pretty much anything with a nontrivial open() method. i.e.:
FUSE, GFS2, OCFS2, CEPH, etc. to benefit.

I've seen no slowdowns so far with traditional filesystems: i.e. ext4
and xfs.

Note that the removal of the raparms cache does in many way compensate
for the new need to lookup the struct file.

> > This set of patches attempts to reduce some of that cost by caching
> > open file descriptors so that they may be reused by other incoming
> > READ/WRITE requests for the same file.
> 
> Is the open file cache a single cache per server? Wondering if there
> can be significant interference (eg lock contention or cache
> sloshing)
> between separate workloads on different exports, for example.

The file cache is global. Cache lookups are lockless (i.e. RCU
protected), so there is little contention for the case where there is
already an entry. For the case where we have to add an entry, there is
a mutex that might get contended in the case of workloads with lots of
small file open+closes.

> Do you have any benchmark results that show that removing the raparms
> cache is harmless?

The same information is carried in struct file. The whole raparms cache
was just a hack in order to allow us to port the readahead information
across struct file instances. Now that we are caching the struct file
itself, the raparms hack is unnecessary.

IOW: I haven't seen any slowdowns so far, however I don't have access
to a bleeding edge networking setup that would push this further.

> > One danger when doing this, is that knfsd may end up caching file
> > descriptors for files that have been unlinked. In order to deal
> > with
> > this issue, we use fsnotify to monitor the files, and have hooks to
> > evict those descriptors from the file cache if the i_nlink value
> > goes to 0.
> > 
> > Jeff Layton (12):
> >  sunrpc: add a new cache_detail operation for when a cache is
> > flushed
> >  locks: create a new notifier chain for lease attempts
> >  nfsd: add a new struct file caching facility to nfsd
> >  nfsd: hook up nfsd_write to the new nfsd_file cache
> >  nfsd: hook up nfsd_read to the nfsd_file cache
> >  nfsd: hook nfsd_commit up to the nfsd_file cache
> >  nfsd: convert nfs4_file->fi_fds array to use nfsd_files
> >  nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
> >  nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
> >  nfsd: have nfsd_test_lock use the nfsd_file cache
> >  nfsd: rip out the raparms cache
> >  nfsd: close cached files prior to a REMOVE or RENAME that would
> >    replace target
> > 
> > Trond Myklebust (4):
> >  notify: export symbols for use by the knfsd file cache
> >  vfs: Export flush_delayed_fput for use by knfsd.
> >  nfsd: Fix up some unused variable warnings
> >  nfsd: Fix the documentation for svcxdr_tmpalloc()
> > 
> > fs/file_table.c                  |   1 +
> > fs/locks.c                       |  62 +++
> > fs/nfsd/Kconfig                  |   1 +
> > fs/nfsd/Makefile                 |   3 +-
> > fs/nfsd/blocklayout.c            |   3 +-
> > fs/nfsd/export.c                 |  13 +
> > fs/nfsd/filecache.c              | 885
> > +++++++++++++++++++++++++++++++
> > fs/nfsd/filecache.h              |  60 +++
> > fs/nfsd/nfs4layouts.c            |  12 +-
> > fs/nfsd/nfs4proc.c               |  83 +--
> > fs/nfsd/nfs4state.c              | 183 ++++---
> > fs/nfsd/nfs4xdr.c                |  31 +-
> > fs/nfsd/nfssvc.c                 |  16 +-
> > fs/nfsd/state.h                  |  10 +-
> > fs/nfsd/trace.h                  | 140 +++++
> > fs/nfsd/vfs.c                    | 295 ++++-------
> > fs/nfsd/vfs.h                    |   9 +-
> > fs/nfsd/xdr4.h                   |  19 +-
> > fs/notify/fsnotify.h             |   2 -
> > fs/notify/group.c                |   2 +
> > fs/notify/mark.c                 |   6 +
> > include/linux/fs.h               |   5 +
> > include/linux/fsnotify_backend.h |   2 +
> > include/linux/sunrpc/cache.h     |   1 +
> > net/sunrpc/cache.c               |   3 +
> > 25 files changed, 1465 insertions(+), 382 deletions(-)
> > create mode 100644 fs/nfsd/filecache.c
> > create mode 100644 fs/nfsd/filecache.h
> > 
> > -- 
> > 2.21.0
> > 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever July 1, 2019, 3:39 p.m. UTC | #3
> On Jul 1, 2019, at 11:17 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2019-07-01 at 11:02 -0400, Chuck Lever wrote:
>> Interesting work! Kudos to you and Jeff.
>> 
>> 
>>> On Jun 30, 2019, at 9:52 AM, Trond Myklebust <trondmy@gmail.com>
>>> wrote:
>>> 
>>> When a NFSv3 READ or WRITE request comes in, the first thing knfsd
>>> has
>>> to do is open a new file descriptor. While this is often a
>>> relatively
>>> inexpensive thing to do for most local filesystems, it is usually
>>> less
>>> so for FUSE, clustered or networked filesystems that are being
>>> exported
>>> by knfsd.
>> 
>> True, I haven't measured much effect if any of open and close on
>> local
>> file systems. It would be valuable if the cover letter provided a
>> more
>> quantified assessment of the cost for these other use cases. It
>> sounds
>> plausible to me that they would be more expensive, but I'm wondering
>> if
>> the additional complexity of an open file cache is warranted and
>> effective. Do you have any benchmark results to share?
>> 
>> Are there particular workloads where you believe open caching will be
>> especially beneficial?
> 
> I'd expect pretty much anything with a nontrivial open() method. i.e.:
> FUSE, GFS2, OCFS2, CEPH, etc. to benefit.
> 
> I've seen no slowdowns so far with traditional filesystems: i.e. ext4
> and xfs.
> 
> Note that the removal of the raparms cache does in many way compensate
> for the new need to lookup the struct file.
> 
>>> This set of patches attempts to reduce some of that cost by caching
>>> open file descriptors so that they may be reused by other incoming
>>> READ/WRITE requests for the same file.
>> 
>> Is the open file cache a single cache per server? Wondering if there
>> can be significant interference (eg lock contention or cache
>> sloshing)
>> between separate workloads on different exports, for example.
> 
> The file cache is global. Cache lookups are lockless (i.e. RCU
> protected), so there is little contention for the case where there is
> already an entry. For the case where we have to add an entry, there is
> a mutex that might get contended in the case of workloads with lots of
> small file open+closes.

>> Do you have any benchmark results that show that removing the raparms
>> cache is harmless?
> 
> The same information is carried in struct file. The whole raparms cache
> was just a hack in order to allow us to port the readahead information
> across struct file instances. Now that we are caching the struct file
> itself, the raparms hack is unnecessary.

OK. I see the patch description of 11/16 mentions something about "stop
fiddling with raparms" but IMO the patch description for 13/16 should be
changed to make the above clear. Thanks!


> IOW: I haven't seen any slowdowns so far, however I don't have access
> to a bleeding edge networking setup that would push this further.
> 
>>> One danger when doing this, is that knfsd may end up caching file
>>> descriptors for files that have been unlinked. In order to deal
>>> with
>>> this issue, we use fsnotify to monitor the files, and have hooks to
>>> evict those descriptors from the file cache if the i_nlink value
>>> goes to 0.
>>> 
>>> Jeff Layton (12):
>>> sunrpc: add a new cache_detail operation for when a cache is
>>> flushed
>>> locks: create a new notifier chain for lease attempts
>>> nfsd: add a new struct file caching facility to nfsd
>>> nfsd: hook up nfsd_write to the new nfsd_file cache
>>> nfsd: hook up nfsd_read to the nfsd_file cache
>>> nfsd: hook nfsd_commit up to the nfsd_file cache
>>> nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>>> nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>>> nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>>> nfsd: have nfsd_test_lock use the nfsd_file cache
>>> nfsd: rip out the raparms cache
>>> nfsd: close cached files prior to a REMOVE or RENAME that would
>>>   replace target
>>> 
>>> Trond Myklebust (4):
>>> notify: export symbols for use by the knfsd file cache
>>> vfs: Export flush_delayed_fput for use by knfsd.
>>> nfsd: Fix up some unused variable warnings
>>> nfsd: Fix the documentation for svcxdr_tmpalloc()
>>> 
>>> fs/file_table.c                  |   1 +
>>> fs/locks.c                       |  62 +++
>>> fs/nfsd/Kconfig                  |   1 +
>>> fs/nfsd/Makefile                 |   3 +-
>>> fs/nfsd/blocklayout.c            |   3 +-
>>> fs/nfsd/export.c                 |  13 +
>>> fs/nfsd/filecache.c              | 885
>>> +++++++++++++++++++++++++++++++
>>> fs/nfsd/filecache.h              |  60 +++
>>> fs/nfsd/nfs4layouts.c            |  12 +-
>>> fs/nfsd/nfs4proc.c               |  83 +--
>>> fs/nfsd/nfs4state.c              | 183 ++++---
>>> fs/nfsd/nfs4xdr.c                |  31 +-
>>> fs/nfsd/nfssvc.c                 |  16 +-
>>> fs/nfsd/state.h                  |  10 +-
>>> fs/nfsd/trace.h                  | 140 +++++
>>> fs/nfsd/vfs.c                    | 295 ++++-------
>>> fs/nfsd/vfs.h                    |   9 +-
>>> fs/nfsd/xdr4.h                   |  19 +-
>>> fs/notify/fsnotify.h             |   2 -
>>> fs/notify/group.c                |   2 +
>>> fs/notify/mark.c                 |   6 +
>>> include/linux/fs.h               |   5 +
>>> include/linux/fsnotify_backend.h |   2 +
>>> include/linux/sunrpc/cache.h     |   1 +
>>> net/sunrpc/cache.c               |   3 +
>>> 25 files changed, 1465 insertions(+), 382 deletions(-)
>>> create mode 100644 fs/nfsd/filecache.c
>>> create mode 100644 fs/nfsd/filecache.h
>>> 
>>> -- 
>>> 2.21.0
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever
J. Bruce Fields July 31, 2019, 10:05 p.m. UTC | #4
Sorry for the delay responding....

On Sun, Jun 30, 2019 at 09:52:24AM -0400, Trond Myklebust wrote:
> When a NFSv3 READ or WRITE request comes in, the first thing knfsd has
> to do is open a new file descriptor.

I assume it shouldn't make a significant difference for NFSv4?

> While this is often a relatively
> inexpensive thing to do for most local filesystems, it is usually less
> so for FUSE, clustered or networked filesystems that are being exported
> by knfsd.
> 
> This set of patches attempts to reduce some of that cost by caching
> open file descriptors so that they may be reused by other incoming
> READ/WRITE requests for the same file.
> One danger when doing this, is that knfsd may end up caching file
> descriptors for files that have been unlinked. In order to deal with
> this issue, we use fsnotify to monitor the files, and have hooks to
> evict those descriptors from the file cache if the i_nlink value
> goes to 0.

That was one of the objections to previous attempts at a file cache so
it's good to have a simple solution.

This attempt seems pretty well thought-out.  I'll tentatively target
this for 5.4 pending some more review and testing.

--b.

> 
> Jeff Layton (12):
>   sunrpc: add a new cache_detail operation for when a cache is flushed
>   locks: create a new notifier chain for lease attempts
>   nfsd: add a new struct file caching facility to nfsd
>   nfsd: hook up nfsd_write to the new nfsd_file cache
>   nfsd: hook up nfsd_read to the nfsd_file cache
>   nfsd: hook nfsd_commit up to the nfsd_file cache
>   nfsd: convert nfs4_file->fi_fds array to use nfsd_files
>   nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
>   nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
>   nfsd: have nfsd_test_lock use the nfsd_file cache
>   nfsd: rip out the raparms cache
>   nfsd: close cached files prior to a REMOVE or RENAME that would
>     replace target
> 
> Trond Myklebust (4):
>   notify: export symbols for use by the knfsd file cache
>   vfs: Export flush_delayed_fput for use by knfsd.
>   nfsd: Fix up some unused variable warnings
>   nfsd: Fix the documentation for svcxdr_tmpalloc()
> 
>  fs/file_table.c                  |   1 +
>  fs/locks.c                       |  62 +++
>  fs/nfsd/Kconfig                  |   1 +
>  fs/nfsd/Makefile                 |   3 +-
>  fs/nfsd/blocklayout.c            |   3 +-
>  fs/nfsd/export.c                 |  13 +
>  fs/nfsd/filecache.c              | 885 +++++++++++++++++++++++++++++++
>  fs/nfsd/filecache.h              |  60 +++
>  fs/nfsd/nfs4layouts.c            |  12 +-
>  fs/nfsd/nfs4proc.c               |  83 +--
>  fs/nfsd/nfs4state.c              | 183 ++++---
>  fs/nfsd/nfs4xdr.c                |  31 +-
>  fs/nfsd/nfssvc.c                 |  16 +-
>  fs/nfsd/state.h                  |  10 +-
>  fs/nfsd/trace.h                  | 140 +++++
>  fs/nfsd/vfs.c                    | 295 ++++-------
>  fs/nfsd/vfs.h                    |   9 +-
>  fs/nfsd/xdr4.h                   |  19 +-
>  fs/notify/fsnotify.h             |   2 -
>  fs/notify/group.c                |   2 +
>  fs/notify/mark.c                 |   6 +
>  include/linux/fs.h               |   5 +
>  include/linux/fsnotify_backend.h |   2 +
>  include/linux/sunrpc/cache.h     |   1 +
>  net/sunrpc/cache.c               |   3 +
>  25 files changed, 1465 insertions(+), 382 deletions(-)
>  create mode 100644 fs/nfsd/filecache.c
>  create mode 100644 fs/nfsd/filecache.h
> 
> -- 
> 2.21.0