diff mbox series

[GIT,PULL] vfs netfs

Message ID 20240913-vfs-netfs-39ef6f974061@brauner (mailing list archive)
State New
Headers show
Series [GIT,PULL] vfs netfs | expand

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs

Commit Message

Christian Brauner Sept. 13, 2024, 4:56 p.m. UTC
Hey Linus,

/* Summary */

This contains the work to improve read/write performance for the new
netfs library.

The main performance enhancing changes are:

    - Define a structure, struct folio_queue, and a new iterator type,
      ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See
      that patch for questions about naming and form.

      ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The
      problem with an xarray is that accessing it requires the use of a
      lock (typically the RCU read lock) - and this means that we can't
      supply iterate_and_advance() with a step function that might sleep
      (crypto for example) without having to drop the lock between
      pages. ITER_FOLIOQ is the iterator for a chain of folio_queue
      structs, where each folio_queue holds a small list of folios. A
      folio_queue struct is a simpler structure than xarray and is not
      subject to concurrent manipulation by the VM. folio_queue is used
      rather than a bvec[] as it can form lists of indefinite size,
      adding to one end and removing from the other on the fly.

    - Provide a copy_folio_from_iter() wrapper.

    - Make cifs RDMA support ITER_FOLIOQ.

    - Use folio queues in the write-side helpers instead of xarrays.

    - Add a function to reset the iterator in a subrequest.

    - Simplify the write-side helpers to use sheaves to skip gaps rather
      than trying to work out where gaps are.

    - In afs, make the read subrequests asynchronous, putting them into work
      items to allow the next patch to do progressive unlocking/reading.

    - Overhaul the read-side helpers to improve performance.

    - Fix the caching of a partial block at the end of a file.

    - Allow a store to be cancelled.

Then some changes for cifs to make it use folio queues instead of
xarrays for crypto bufferage:

    - Use raw iteration functions rather than manually coding iteration when
      hashing data.

    - Switch to using folio_queue for crypto buffers.

    - Remove the xarray bits.

Make some adjustments to the /proc/fs/netfs/stats file such that:

    - All the netfs stats lines begin 'Netfs:' but change this to something
      a bit more useful.

    - Add a couple of stats counters to track the numbers of skips and waits
      on the per-inode writeback serialisation lock to make it easier to
      check for this as a source of performance loss.

Miscellaneous work:

    - Ensure that the sb_writers lock is taken around
      vfs_{set,remove}xattr() in the cachefiles code.

    - Reduce the number of conditional branches in netfs_perform_write().

    - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and
      remove cifs_post_modify().

    - Move the max_len/max_nr_segs members from netfs_io_subrequest to
      netfs_io_request as they're only needed for one subreq at a time.

    - Add an 'unknown' source value for tracing purposes.

    - Remove NETFS_COPY_TO_CACHE as it's no longer used.

    - Set the request work function up front at allocation time.

    - Use bh-disabling spinlocks for rreq->lock as cachefiles completion may
      be run from block-filesystem DIO completion in softirq context.

    - Remove fs/netfs/io.c.

/* Testing */

gcc version 14.2.0 (Debian 14.2.0-3)
Debian clang version 16.0.6 (27+b1)

All patches are based on the vfs-6.11-rc7.fixes merge to bring in prerequisite
fixes in individual filesystems. All of this has been sitting in linux-next. No
build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known merge conflicts.

This has now a merge conflict with main due to some rather late cifs fixes.
This can be resolved by:

git rm fs/netfs/io.c

and then:

+++ b/fs/smb/client/cifssmb.c
@@@ -1261,16 -1261,6 +1261,14 @@@ openRetry
        return rc;
  }

 +static void cifs_readv_worker(struct work_struct *work)
 +{
 +      struct cifs_io_subrequest *rdata =
 +              container_of(work, struct cifs_io_subrequest, subreq.work);
 +
-       netfs_subreq_terminated(&rdata->subreq,
-                               (rdata->result == 0 || rdata->result == -EAGAIN) ?
-                               rdata->got_bytes : rdata->result, true);
++      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
 +}
 +
  static void
  cifs_readv_callback(struct mid_q_entry *mid)
  {
@@@ -1323,21 -1306,11 +1321,23 @@@
                rdata->result = -EIO;
        }

 -      if (rdata->result == 0 || rdata->result == -EAGAIN)
 -              iov_iter_advance(&rdata->subreq.io_iter, rdata->got_bytes);
 +      if (rdata->result == -ENODATA) {
 +              __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
 +              rdata->result = 0;
 +      } else {
-               if (rdata->got_bytes < rdata->actual_len &&
-                   rdata->subreq.start + rdata->subreq.transferred + rdata->got_bytes ==
-                   ictx->remote_i_size) {
++              size_t trans = rdata->subreq.transferred + rdata->got_bytes;
++              if (trans < rdata->subreq.len &&
++                  rdata->subreq.start + trans == ictx->remote_i_size) {
 +                      __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
 +                      rdata->result = 0;
 +              }
 +      }
 +
        rdata->credits.value = 0;
+       rdata->subreq.transferred += rdata->got_bytes;
 -      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
++      trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
 +      INIT_WORK(&rdata->subreq.work, cifs_readv_worker);
 +      queue_work(cifsiod_wq, &rdata->subreq.work);
        release_mid(mid);
        add_credits(server, &credits, 0);
  }
+++ b/fs/smb/client/smb2pdu.c
@@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry
                              server->credits, server->in_flight,
                              0, cifs_trace_rw_credits_read_response_clear);
        rdata->credits.value = 0;
+       rdata->subreq.transferred += rdata->got_bytes;
 -      if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
 -              __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
+       trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
        INIT_WORK(&rdata->subreq.work, smb2_readv_worker);
        queue_work(cifsiod_wq, &rdata->subreq.work);
        release_mid(mid);

Merge conflicts with other trees
================================

No known merge conflicts.

The following changes since commit 4356ab331c8f0dbed0f683abde345cd5503db1e4:

  Merge tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs (2024-09-04 09:33:57 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs

for you to fetch changes up to 4b40d43d9f951d87ae8dc414c2ef5ae50303a266:

  docs: filesystems: corrected grammar of netfs page (2024-09-12 12:20:43 +0200)

Please consider pulling these changes from the signed vfs-6.12.netfs tag.

Thanks!
Christian

----------------------------------------------------------------
vfs-6.12.netfs

----------------------------------------------------------------
Christian Brauner (1):
      Merge branch 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs

David Howells (24):
      cachefiles: Fix non-taking of sb_writers around set/removexattr
      netfs: Adjust labels in /proc/fs/netfs/stats
      netfs: Record contention stats for writeback lock
      netfs: Reduce number of conditional branches in netfs_perform_write()
      netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode
      netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream
      netfs: Reserve netfs_sreq_source 0 as unset/unknown
      netfs: Remove NETFS_COPY_TO_CACHE
      netfs: Set the request work function upon allocation
      netfs: Use bh-disabling spinlocks for rreq->lock
      mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios
      iov_iter: Provide copy_folio_from_iter()
      cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs
      netfs: Use new folio_queue data type and iterator instead of xarray iter
      netfs: Provide an iterator-reset function
      netfs: Simplify the writeback code
      afs: Make read subreqs async
      netfs: Speed up buffered reading
      netfs: Remove fs/netfs/io.c
      cachefiles, netfs: Fix write to partial block at EOF
      netfs: Cancel dirty folios that have no storage destination
      cifs: Use iterate_and_advance*() routines directly for hashing
      cifs: Switch crypto buffer to use a folio_queue rather than an xarray
      cifs: Don't support ITER_XARRAY

Dennis Lam (1):
      docs: filesystems: corrected grammar of netfs page

 Documentation/filesystems/netfs_library.rst |   2 +-
 fs/9p/vfs_addr.c                            |  11 +-
 fs/afs/file.c                               |  30 +-
 fs/afs/fsclient.c                           |   9 +-
 fs/afs/write.c                              |   4 +-
 fs/afs/yfsclient.c                          |   9 +-
 fs/cachefiles/io.c                          |  19 +-
 fs/cachefiles/xattr.c                       |  34 +-
 fs/ceph/addr.c                              |  76 +--
 fs/netfs/Makefile                           |   4 +-
 fs/netfs/buffered_read.c                    | 766 ++++++++++++++++----------
 fs/netfs/buffered_write.c                   | 309 +++++------
 fs/netfs/direct_read.c                      | 147 ++++-
 fs/netfs/internal.h                         |  43 +-
 fs/netfs/io.c                               | 804 ----------------------------
 fs/netfs/iterator.c                         |  50 ++
 fs/netfs/main.c                             |   7 +-
 fs/netfs/misc.c                             |  94 ++++
 fs/netfs/objects.c                          |  16 +-
 fs/netfs/read_collect.c                     | 544 +++++++++++++++++++
 fs/netfs/read_pgpriv2.c                     | 264 +++++++++
 fs/netfs/read_retry.c                       | 256 +++++++++
 fs/netfs/stats.c                            |  27 +-
 fs/netfs/write_collect.c                    | 246 +++------
 fs/netfs/write_issue.c                      |  93 ++--
 fs/nfs/fscache.c                            |  19 +-
 fs/nfs/fscache.h                            |   7 +-
 fs/smb/client/cifsencrypt.c                 | 144 +----
 fs/smb/client/cifsglob.h                    |   4 +-
 fs/smb/client/cifssmb.c                     |   6 +-
 fs/smb/client/file.c                        |  96 ++--
 fs/smb/client/smb2ops.c                     | 219 ++++----
 fs/smb/client/smb2pdu.c                     |  27 +-
 fs/smb/client/smbdirect.c                   |  82 +--
 include/linux/folio_queue.h                 | 156 ++++++
 include/linux/iov_iter.h                    | 104 ++++
 include/linux/netfs.h                       |  46 +-
 include/linux/uio.h                         |  18 +
 include/trace/events/netfs.h                | 144 +++--
 lib/iov_iter.c                              | 240 ++++++++-
 lib/kunit_iov_iter.c                        | 259 +++++++++
 lib/scatterlist.c                           |  69 ++-
 42 files changed, 3520 insertions(+), 1984 deletions(-)
 delete mode 100644 fs/netfs/io.c
 create mode 100644 fs/netfs/read_collect.c
 create mode 100644 fs/netfs/read_pgpriv2.c
 create mode 100644 fs/netfs/read_retry.c
 create mode 100644 include/linux/folio_queue.h

Comments

Linus Torvalds Sept. 16, 2024, 10:28 a.m. UTC | #1
On Fri, 13 Sept 2024 at 18:57, Christian Brauner <brauner@kernel.org> wrote:
>
> /* Conflicts */
>
> Merge conflicts with mainline

Hmm.

My conflict resolution is _similar_, but at the same time decidedly
different. And I'm not sure why yours is different.

> --- a/fs/smb/client/cifssmb.c
> +++ b/fs/smb/client/cifssmb.c
> @@@ -1261,16 -1261,6 +1261,14 @@@ openRetry
>         return rc;
>   }
>
>  +static void cifs_readv_worker(struct work_struct *work)
>  +{
>  +      struct cifs_io_subrequest *rdata =
>  +              container_of(work, struct cifs_io_subrequest, subreq.work);
>  +
> -       netfs_subreq_terminated(&rdata->subreq,
> -                               (rdata->result == 0 || rdata->result == -EAGAIN) ?
> -                               rdata->got_bytes : rdata->result, true);
> ++      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);

So here, I have

++      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, true);

with the third argument being 'true' instead of 'false' as in yours.

The reason? That's what commit a68c74865f51 ("cifs: Fix SMB1
readv/writev callback in the same way as SMB2/3") did when it moved
the (originally) netfs_subreq_terminated() into the worker, and it
changed the 'was_async' argument from "false" to a "true".

Now, that change makes little sense to me (a worker thread is not
softirq context), but  that's what commit a68c74865f51 does, and so
that's logically what the merge should do.

> +       rdata->subreq.transferred += rdata->got_bytes;
>  -      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
> ++      trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);

where did this trace_netfs_sreq() come from?

> --- a/fs/smb/client/smb2pdu.c
> +++ b/fs/smb/client/smb2pdu.c
> @@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry
>                               server->credits, server->in_flight,
>                               0, cifs_trace_rw_credits_read_response_clear);
>         rdata->credits.value = 0;
> +       rdata->subreq.transferred += rdata->got_bytes;
>  -      if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
>  -              __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
> +       trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);

And where did this conflict resolution come from? I'm not seeing why
it removes that NETFS_SREQ_HIT_EOF bit logic..

Some searching gets me this:

    https://lore.kernel.org/all/1131388.1726141806@warthog.procyon.org.uk/

which seems to explain your merge resolution, and I'm even inclined to
think that it might be right, but it's not *sensible*.

That whole removal of the NETFS_SREQ_HIT_EOF bit ends up undoing part
of commit ee4cdf7ba857 ("netfs: Speed up buffered reading"), and
doesn't seem to be supported by the changes done on the other side of
the conflict resolution.

IOW, the changes may be *correct*, but they do not seem to be valid as
a conflict resolution, and if they are fixes they should be done as
such separately.

Adding DavidH (and Steve French) to the participants, so that they can
know about my confusion, and maybe send a patch to fix it up properly
with actual explanations. Because I don't want to commit the merge as
you suggest without explanations for why those changes were magically
done independently of what seems to be happening in the development
history.

                 Linus
pr-tracker-bot@kernel.org Sept. 16, 2024, 11:09 a.m. UTC | #2
The pull request you sent on Fri, 13 Sep 2024 18:56:36 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.netfs

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/35219bc5c71f4197c8bd10297597de797c1eece5

Thank you!
David Howells Sept. 16, 2024, 12:58 p.m. UTC | #3
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > ++      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
> 
> So here, I have
> 
> ++      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, true);
> 
> with the third argument being 'true' instead of 'false' as in yours.
> 
> The reason? That's what commit a68c74865f51 ("cifs: Fix SMB1
> readv/writev callback in the same way as SMB2/3") did when it moved
> the (originally) netfs_subreq_terminated() into the worker, and it
> changed the 'was_async' argument from "false" to a "true".

As part of these changes, the callback to netfslib from the SMB1 transport
variant is now delegated to a separate worker thread by cifs_readv_callback()
rather than being done in the cifs network processing thread (e.g. as is done
by the SMB2/3 smb2_readv_worker() in smb2pdu.c), so it's better to pass
"false" here.

All that argument does is tell netfslib whether it can do cleanup processing
and retrying in the calling thread (if "false") or whether it needs to
offload it to another thread (if "true").  I should probably rename the
argument from "was_async" to something more explanatory.

By putting "true" here, it causes the already offloaded processing to further
offload unnecessarily.  It shouldn't break things though.

> > +       rdata->subreq.transferred += rdata->got_bytes;
> >  -      netfs_read_subreq_terminated(&rdata->subreq, rdata->result, false);
> > ++      trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
> 
> where did this trace_netfs_sreq() come from?

It got copied across with other lines when sync'ing the code with
smb2_readv_callback() whilst attempting the merge resolution.  It's something
that got missed out when porting the changes I'd made to SMB2/3 to SMB1.  It
should have been deferred to a follow up patch.

> > --- a/fs/smb/client/smb2pdu.c
> > +++ b/fs/smb/client/smb2pdu.c
> > @@@ -4614,6 -4613,10 +4613,8 @@@ smb2_readv_callback(struct mid_q_entry
> >                               server->credits, server->in_flight,
> >                               0, cifs_trace_rw_credits_read_response_clear);
> >         rdata->credits.value = 0;
> > +       rdata->subreq.transferred += rdata->got_bytes;
> >  -      if (rdata->subreq.start + rdata->subreq.transferred >= rdata->subreq.rreq->i_size)
> >  -              __set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
> > +       trace_netfs_sreq(&rdata->subreq, netfs_sreq_trace_io_progress);
> 
> And where did this conflict resolution come from? I'm not seeing why
> it removes that NETFS_SREQ_HIT_EOF bit logic..

A fix that went upstream via SteveF's tree rather than Christian's tree added
NETFS_SREQ_HIT_EOF separately:

	1da29f2c39b67b846b74205c81bf0ccd96d34727
	netfs, cifs: Fix handling of short DIO read

The code that added to twiddle NETFS_SREQ_HIT_EOF is in the source, just above
the lines in the hunk above:

	if (rdata->result == -ENODATA) {
		__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
		rdata->result = 0;
	} else {
		size_t trans = rdata->subreq.transferred + rdata->got_bytes;
		if (trans < rdata->subreq.len &&
		    rdata->subreq.start + trans == ictx->remote_i_size) {
			__set_bit(NETFS_SREQ_HIT_EOF, &rdata->subreq.flags);
			rdata->result = 0;
		}
	}

The two lines removed in the example resolution are therefore redundant and
should have been removed, but weren't.

David
diff mbox series

Patch

diff --cc fs/smb/client/cifssmb.c
index cfae2e918209,04f2a5441a89..d0df0c17b18f
--- a/fs/smb/client/cifssmb.c
diff --cc fs/smb/client/smb2pdu.c
index 88dc49d67037,95377bb91950..bb8ecbbe78af
--- a/fs/smb/client/smb2pdu.c