mbox series

[RFC,0/4] iov_iter: Add composite, scatterlist and skbuff iterator types

Message ID 20250321161407.3333724-1-dhowells@redhat.com (mailing list archive)
Headers show
Series iov_iter: Add composite, scatterlist and skbuff iterator types | expand

Message

David Howells March 21, 2025, 4:14 p.m. UTC
Hi Leon,

Here are some patches that illustrate some of what I'm thinking of doing to
iov iterators.  Note that they are incomplete as I won't have time to
finish or test them before LSF, but I thought I'd post them for use as a
discussion point.

So the first thing I want to do is to move certain iterators out of line
from the main inline iteration multiplexor.  This code gets relentlessly
duplicated and adding further iterator types expands a whole load of
places.  So the DISCARD iterator (which is just a simple short circuit) and
the XARRAY iterator (which is obsolete) move out of line.

Then I want to add three more types, for now:

 (1) ITER_ITERLIST.  A compound iterator that takes an array of iterators
     of disparate types.  The aim here is to make it possible to fabricate
     an network message in one go (say an RPC call) and pass it to a socket
     without the need for corking.

 (2) ITER_SCATTERLIST.  An iterator that takes a scatterlist.  This can be
     used to act as a bridge in converting interfaces that currently take a
     scatterlist (e.g. crypto).  It requires extra fields adding to the
     iov_iter struct because chained scatterlists do not have have a rewind
     capability and so iov_iter_revert() must go back to the beginning and
     fast-forward.

 (3) ITER_SKBUFF.  An iterator that takes a network buffer.  The aim here
     is to render skb_to_sgvec() unnecessary for doing crypto operations.

The patches can also be found here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-experimental

David

David Howells (4):
  iov_iter: Move ITER_DISCARD and ITER_XARRAY iteration out-of-line
  iov_iter: Add an iterator-of-iterators
  iov_iter: Add a scatterlist iterator type
  iov_iter: Add a scatterlist iterator type [INCOMPLETE]

 include/linux/iov_iter.h |  77 +----
 include/linux/uio.h      |  37 +++
 lib/iov_iter.c           | 675 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 710 insertions(+), 79 deletions(-)

Comments

Christoph Hellwig March 23, 2025, 6:21 a.m. UTC | #1
This is going entirely in the wrong direction.  We don't need more iter
types but less.  The reason why we have to many is because the underlying
representation of the ranges is a mess which goes deeper than just the
iterator, because it also means we have to convert between the
underlying representations all the time.

E.g. the socket code should have (and either has for a while or at least
there were patches) been using bio_vecs instead of reinventing them as sk
fragment.  The crypto code should not be using scatterlists, which are a
horrible data structure because they mix up the physical memory
description and the dma mapping information which isn't even used for
most uses, etc.

So instead of more iters let's convert everyone to a common
scatter/gather memory definition, which simplifies the iters.  For now
that is the bio_vec, which really should be converted from storing a
struct page to a phys_addr_t (and maybe renamed if that helps adoption).
That allows to trivially kill the kvec for example.

As for the head/tail - that seems to be a odd NFS/sunrpc fetish.  I've
actually started a little project to just convert the sunrpc code to
use bio_vecs, which massively simplifies the code, and allows directly
passing it to the iters in the socket API.  It doesn't quite work yet
but shows how all these custom (and in this case rather ad-hoc) memory
fragment representation cause a huge mess.

I don't think the iterlist can work in practice, but it would be nice
to have for a few use cases.  If it worked it should hopefully allow
to kill off the odd xarray iterator.
Hannes Reinecke March 23, 2025, 1:39 p.m. UTC | #2
On 3/23/25 07:21, Christoph Hellwig wrote:
> This is going entirely in the wrong direction.  We don't need more iter
> types but less.  The reason why we have to many is because the underlying
> representation of the ranges is a mess which goes deeper than just the
> iterator, because it also means we have to convert between the
> underlying representations all the time.
> 
> E.g. the socket code should have (and either has for a while or at least
> there were patches) been using bio_vecs instead of reinventing them as sk
> fragment.  The crypto code should not be using scatterlists, which are a
> horrible data structure because they mix up the physical memory
> description and the dma mapping information which isn't even used for
> most uses, etc.
> 
> So instead of more iters let's convert everyone to a common
> scatter/gather memory definition, which simplifies the iters.  For now
> that is the bio_vec, which really should be converted from storing a
> struct page to a phys_addr_t (and maybe renamed if that helps adoption).
> That allows to trivially kill the kvec for example.
> 
> As for the head/tail - that seems to be a odd NFS/sunrpc fetish.  I've
> actually started a little project to just convert the sunrpc code to
> use bio_vecs, which massively simplifies the code, and allows directly
> passing it to the iters in the socket API.  It doesn't quite work yet
> but shows how all these custom (and in this case rather ad-hoc) memory
> fragment representation cause a huge mess.
> 
> I don't think the iterlist can work in practice, but it would be nice
> to have for a few use cases.  If it worked it should hopefully allow
> to kill off the odd xarray iterator.
> 
Can we have a session around this?
IE define how iterators should be used, and what the iterator elements
should be. If we do it properly this will also fix the frozen page 
discussion we're having; if we define iterators whose data elements
are _not_ pages then clearly one cannot take a reference to them.

But in either case, we should define the long-term goal such that
people can start converting stuff.

Cheers,

Hannes
Matthew Wilcox March 23, 2025, 2:33 p.m. UTC | #3
On Sat, Mar 22, 2025 at 11:21:28PM -0700, Christoph Hellwig wrote:
> This is going entirely in the wrong direction.  We don't need more iter
> types but less.  The reason why we have to many is because the underlying
> representation of the ranges is a mess which goes deeper than just the
> iterator, because it also means we have to convert between the
> underlying representations all the time.
> 
> E.g. the socket code should have (and either has for a while or at least
> there were patches) been using bio_vecs instead of reinventing them as sk
> fragment.  The crypto code should not be using scatterlists, which are a

I did this work six years ago -- see 8842d285bafa

Unfortunately, networking is full of inconsiderate arseholes who backed
it out without even talking to me in 21d2e6737c97
Matthew Wilcox March 23, 2025, 2:35 p.m. UTC | #4
On Sun, Mar 23, 2025 at 02:39:25PM +0100, Hannes Reinecke wrote:
> Can we have a session around this?

Wednesday, 10:30