
[RFC,v1,0/13] Convert NFS client to new fscache-iter API

Message ID 1594825849-24991-1-git-send-email-dwysocha@redhat.com
Series: Convert NFS client to new fscache-iter API

Message

David Wysochanski July 15, 2020, 3:10 p.m. UTC
These patches update the NFS client to use the new FS-Cache API and are available at:
https://github.com/DaveWysochanskiRH/kernel/commit/a426b431873ea755c94ccd403aeaba0c4e635016

They are based on David Howells' fscache-iter tree at commit ff12b5a05bd6984ad83e762f702cb655222bad74:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=fscache-iter&id=ff12b5a05bd6984ad83e762f702cb655222bad74

The following patches are the ones most closely tied to the
conversion and may be of particular interest to reviewers:
NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
NFS: Convert nfs_readpage() and readpages() to new fscache API
NFS: Only use and unuse an fscache cookie a single time based on NFS_INO_FSCACHE
NFS: Convert fscache invalidation and update aux_data and i_size

Note that this is only a "first pass" v1 / RFC set that I wanted
to get out there so the maintainers can see it and know this is
being worked on. It is far from perfect and still has some
problems that need to be worked out. A short summary of this set:

1. Takes a "least invasive to existing code" approach
* most fscache bits stay in fs/nfs/fscache.[ch]
* fscache is enabled/disabled inside NFS code based on nfs_inode.fscache
* fscache is only enabled for reads
* this may not be the best approach (see future patchset items below)

2. Basically works and passes a series of tests
* should not affect NFS when fscache is disabled (no "fsc" option)
* a couple of small NFS + fscache basic verification tests
* connectathon (all NFS versions, with/without 'fsc' option)
* various iozone tests (all NFS versions, with/without 'fsc' option)

3. Still has a few known problems that are being tracked down
* Data integrity issue when writing with O_DIRECT and reading
back without O_DIRECT (we get 0's back from the cache)
* iozone tests run through OK, but at the end superblock
cookies are left behind (each NFS version has a different
superblock cookie); this leads to "duplicate cookie" messages
upon subsequent mounts / runs
* A couple of oopses in fscache, reported to dhowells, may be
related to NFS enabling/disabling fscache on read/write
* A kernel build fails about halfway through, always at the
same place, with a dubious error while linking this file:
ld: net/sunrpc/auth_gss/trace.o: attempt to load strings from a non-string section (number 41)
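As a rough illustration of item 1 (fscache switched on and off per
inode and only used for reads), here is a userspace toy model of
that policy. It is a sketch under assumptions: struct toy_nfs_inode
and the toy_* functions are hypothetical and are not the NFS or
fscache API; the cache_enabled flag only plays the role the cover
letter gives to nfs_inode.fscache / NFS_INO_FSCACHE, and the rule
for turning the cache back on is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-inode state; NOT the real struct nfs_inode. */
struct toy_nfs_inode {
	bool fsc_mount;              /* mounted with the "fsc" option */
	bool cache_enabled;          /* stands in for the NFS_INO_FSCACHE bit */
	int nr_writers;              /* current opens with write access */
	unsigned long invalidations; /* times cached data was thrown away */
};

/* On open: a writer turns the cache off; a reader with no writers turns it on. */
static void toy_open(struct toy_nfs_inode *ino, bool for_write)
{
	if (!ino->fsc_mount)
		return; /* no "fsc": behave as if fscache did not exist */
	if (for_write) {
		ino->nr_writers++;
		if (ino->cache_enabled) {
			ino->cache_enabled = false;
			ino->invalidations++; /* cached pages may now go stale */
		}
	} else if (ino->nr_writers == 0) {
		ino->cache_enabled = true;
	}
}

/* On release of a writer: the cache stays off until the next read-only open. */
static void toy_release(struct toy_nfs_inode *ino, bool for_write)
{
	if (!ino->fsc_mount || !for_write)
		return;
	assert(ino->nr_writers > 0);
	ino->nr_writers--;
}

/* Reads consult the cache only while it is enabled for this inode. */
static bool toy_read_uses_cache(const struct toy_nfs_inode *ino)
{
	return ino->cache_enabled;
}
```

In this model every switch from read to write discards the cached
data, which is the on/off churn that the write-through item in the
future-work list is meant to eliminate.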


In addition to fixing various code issues and the issues above,
a future patchset may:

1. Have the readpage/readpages conversion patch call the read_helpers
directly rather than isolating them in fs/nfs/fscache.c
* Similar to the AFS conversion, with calls directly to the
read_helpers, but I'm not sure about the non-fsc code path

2. Add write-through support
* Would probably eliminate some problematic code
paths where fscache is turned on / off depending on whether
a file switches from read to write and vice versa
* This would rework open as well
* Have to work out whether this is possible and, if so,
with what caveats regarding NFS version support (is this
an NFSv4.x only thing?)

3. Rework dfprintk()s and/or add ftrace points
* fscache/cachefiles has 'debug' logging similar to rpcdebug,
so I'm not sure whether to keep rpcdebug here or go full ftrace


Dave Wysochanski (13):
  NFS: Clean up nfs_readpage() and nfs_readpages()
  NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
    succeeds
  NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
    nfs_readdesc
  NFS: Call readpage_async_filler() from nfs_readpage_async()
  NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
  NFS: Rename readpage_async_filler() to nfs_pageio_add_page_read()
  NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
  NFS: Allow nfs_async_read_completion_ops to be used by other NFS code
  NFS: Convert nfs_readpage() and readpages() to new fscache API
  NFS: Allow NFS use of new fscache API in build
  NFS: Only use and unuse an fscache cookie a single time based on
    NFS_INO_FSCACHE
  NFS: Convert fscache invalidation and update aux_data and i_size
  NFS: Call nfs_fscache_invalidate() when write extends the size of the
    file

 fs/nfs/Kconfig           |   2 +-
 fs/nfs/file.c            |  20 +--
 fs/nfs/fscache-index.c   |  94 --------------
 fs/nfs/fscache.c         | 315 ++++++++++++++++++++++++-----------------------
 fs/nfs/fscache.h         |  92 +++++---------
 fs/nfs/inode.c           |   1 -
 fs/nfs/internal.h        |   4 +
 fs/nfs/pagelist.c        |   1 +
 fs/nfs/read.c            | 221 ++++++++++++++++-----------------
 fs/nfs/write.c           |   9 +-
 include/linux/nfs_fs.h   |   3 +-
 include/linux/nfs_page.h |   1 +
 include/linux/nfs_xdr.h  |   1 +
 13 files changed, 322 insertions(+), 442 deletions(-)

Comments

J. Bruce Fields July 17, 2020, 2:25 p.m. UTC | #1
On Wed, Jul 15, 2020 at 11:10:36AM -0400, Dave Wysochanski wrote:
> These patches update the nfs client to use the new FS-Cache API and are at:
> https://github.com/DaveWysochanskiRH/kernel/commit/a426b431873ea755c94ccd403aeaba0c4e635016

Say I had a hypothetical, err, friend, who hadn't been following that
FS-Cache work--could you summarize the advantages it brings us?

--b.

David Howells July 17, 2020, 3:19 p.m. UTC | #2
J. Bruce Fields <bfields@fieldses.org> wrote:

> Say I had a hypothetical, err, friend, who hadn't been following that
> FS-Cache work--could you summarize the advantages it brings us?

https://lore.kernel.org/linux-nfs/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/T/#t

 - Makes the caching code a lot simpler (~2400 LoC removed, ~1000 LoDoc[*]
   removed at the moment from fscache, cachefiles and afs).

 - Stops using bmap to work out what data is cached.  This isn't reliable with
   modern extent-based filesystems.  A bitmap of cached granules is saved in
   an xattr instead.

 - Uses async DIO (kiocbs) to do I/O to/from the cache rather than using
   buffered writes (kernel_write) and pagecache snooping for read (don't ask).

   - A lot faster and less CPU-intensive as there's no page-to-page copying.

   - A lot less VM pressure as it doesn't have duplicate pages in the backing
     fs that aren't properly accounted for.

 - Uses tmpfiles+link to better handle invalidation.  It will at some point
   hopefully employ linkat(AT_LINK_REPLACE) to effect cut-over on disk rather
   than unlink then link.
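   The "bitmap of cached granules" idea can be pictured with a small
   userspace sketch: one bit per fixed-size granule of the file, set once
   that granule has been written to the cache, and a read is served from
   the cache only if every granule it touches is set.  Assumptions: the
   256 KiB granule size, the 64-granule capacity, the little-endian
   encoding and all toy_* names are illustrative only and are not the
   cachefiles on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GRANULE_SIZE (256 * 1024ULL) /* assumed granule size */
#define TOY_MAX_GRANULES 64              /* toy capacity: one 64-bit word */

struct toy_granule_map {
	uint64_t bits; /* bit n set => granule n is present in the cache */
};

static unsigned toy_granule_of(uint64_t pos)
{
	return (unsigned)(pos / TOY_GRANULE_SIZE);
}

/* Record that [pos, pos + len) was written to the cache.  Only whole
 * granules can be marked, so callers must write granule-aligned extents. */
static void toy_mark_cached(struct toy_granule_map *m, uint64_t pos, uint64_t len)
{
	assert(pos % TOY_GRANULE_SIZE == 0 && len % TOY_GRANULE_SIZE == 0);
	for (uint64_t off = pos; off < pos + len; off += TOY_GRANULE_SIZE) {
		unsigned g = toy_granule_of(off);
		assert(g < TOY_MAX_GRANULES);
		m->bits |= UINT64_C(1) << g;
	}
}

/* A read may be served from the cache only if every granule it touches is set. */
static bool toy_range_cached(const struct toy_granule_map *m, uint64_t pos, uint64_t len)
{
	if (len == 0)
		return true;
	unsigned first = toy_granule_of(pos);
	unsigned last = toy_granule_of(pos + len - 1);
	if (last >= TOY_MAX_GRANULES)
		return false;
	for (unsigned g = first; g <= last; g++)
		if (!(m->bits & (UINT64_C(1) << g)))
			return false;
	return true;
}

/* Serialize as little-endian bytes, i.e. a value that could go in an xattr. */
static void toy_map_to_bytes(const struct toy_granule_map *m, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(m->bits >> (8 * i));
}

static void toy_map_from_bytes(struct toy_granule_map *m, const uint8_t in[8])
{
	m->bits = 0;
	for (int i = 0; i < 8; i++)
		m->bits |= (uint64_t)in[i] << (8 * i);
}
```

   On Linux, a buffer like the one produced by toy_map_to_bytes() could be
   stored on the cache file with setxattr(2); the xattr name and format
   that cachefiles actually uses are not given in this thread.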

David

[*] The upstream docs got ReSTified, so the doc patches I have are now useless
    and need reworking:-(.
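The tmpfile+link invalidation step has a userspace analogue built on
O_TMPFILE and linkat(2): the replacement object is written while it has
no name and is only linked into place once complete.  Because the
message treats linkat(AT_LINK_REPLACE) as something it may use at some
point, the cut-over below is the unlink-then-link sequence.
Assumptions: toy_replace_cache_file() is hypothetical; it uses the
/proc/self/fd route described in open(2), so it needs a mounted /proc
and a filesystem that supports O_TMPFILE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: build a replacement cache file as an anonymous
 * O_TMPFILE in dir, then cut it over to dir/name by unlink + link.
 * Returns 0 on success or -1 with errno set.
 */
static int toy_replace_cache_file(const char *dir, const char *name,
				  const void *data, size_t len)
{
	char dst[PATH_MAX], src[64];
	int fd, saved;

	assert(dir != NULL && name != NULL);
	fd = open(dir, O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1; /* e.g. EOPNOTSUPP if the filesystem lacks O_TMPFILE */

	/* Populate the new object while it has no name, so nobody looking
	 * up dir/name can see a partially written replacement. */
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0)
		goto fail;

	snprintf(dst, sizeof(dst), "%s/%s", dir, name);
	snprintf(src, sizeof(src), "/proc/self/fd/%d", fd);

	/* No atomic link-and-replace: drop the old name first... */
	if (unlink(dst) < 0 && errno != ENOENT)
		goto fail;
	/* ...then give the tmpfile that name. */
	if (linkat(AT_FDCWD, src, AT_FDCWD, dst, AT_SYMLINK_FOLLOW) < 0)
		goto fail;

	return close(fd);
fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```

Between unlink() and linkat() the name briefly does not exist;
closing that window is presumably what an atomic
linkat(AT_LINK_REPLACE) would provide.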
J. Bruce Fields July 17, 2020, 4:18 p.m. UTC | #3
On Fri, Jul 17, 2020 at 04:19:25PM +0100, David Howells wrote:
> J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> > Say I had a hypothetical, err, friend, who hadn't been following that
> > FS-Cache work--could you summarize the advantages it brings us?
> 
> https://lore.kernel.org/linux-nfs/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/T/#t

Thanks!--b.
