Message ID | 20230316152618.711970-9-dhowells@redhat.com (mailing list archive)
---|---
State | Not Applicable
Series | None
> -----Original Message-----
> From: David Howells <dhowells@redhat.com>
> Sent: Thursday, 16 March 2023 16:26
> To: Matthew Wilcox <willy@infradead.org>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>
> Cc: David Howells <dhowells@redhat.com>; Al Viro <viro@zeniv.linux.org.uk>; Christoph Hellwig <hch@infradead.org>; Jens Axboe <axboe@kernel.dk>; Jeff Layton <jlayton@kernel.org>; Christian Brauner <brauner@kernel.org>; Linus Torvalds <torvalds@linux-foundation.org>; netdev@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; Bernard Metzler <BMT@zurich.ibm.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] [RFC PATCH 08/28] siw: Inline do_tcp_sendpages()
>
> do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
> so inline it, allowing do_tcp_sendpages() to be removed. This is part of
> replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Bernard Metzler <bmt@zurich.ibm.com>
> cc: Tom Talpey <tom@talpey.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-rdma@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
>  drivers/infiniband/sw/siw/siw_qp_tx.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> index 05052b49107f..8fc179321e2b 100644
> --- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> @@ -313,7 +313,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
>  }

Hi David,

many thanks for looking into that! I reply to a limited audience, expecting
limited interest, basically to the rdma list plus Tom.

>  /*
> - * 0copy TCP transmit interface: Use do_tcp_sendpages.
> + * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
>   *
>   * Using sendpage to push page by page appears to be less efficient
>   * than using sendmsg, even if data are copied.

That is an interesting observation. Is efficiency to be read as CPU load, or
throughput on the wire, or both?

Back in the days, I introduced that zcopy path for efficiency reasons -
getting both better throughput and less CPU load. I looked at both WRITE and
READ performance. Using do_tcp_sendpages() is currently limited to processing
work which is not registered with local completion generation. Replying to a
remote READ request is a typical case. Did you check with READ?
> @@ -324,20 +324,27 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
>  static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
>  			     size_t size)
>  {
> +	struct bio_vec bvec;
> +	struct msghdr msg = {
> +		.msg_flags = (MSG_SPLICE_PAGES | MSG_MORE | MSG_DONTWAIT |
> +			      MSG_SENDPAGE_NOTLAST),
> +	};
>  	struct sock *sk = s->sk;
> -	int i = 0, rv = 0, sent = 0,
> -	    flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST;
> +	int i = 0, rv = 0, sent = 0;
>
>  	while (size) {
>  		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
>
>  		if (size + offset <= PAGE_SIZE)
> -			flags = MSG_MORE | MSG_DONTWAIT;
> +			msg.msg_flags = MSG_SPLICE_PAGES | MSG_MORE | MSG_DONTWAIT;
>
>  		tcp_rate_check_app_limited(sk);
> +		bvec_set_page(&bvec, page[i], bytes, offset);
> +		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
> +
>  try_page_again:
>  		lock_sock(sk);
> -		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
> +		rv = tcp_sendmsg_locked(sk, &msg, size);

Would that tcp_sendmsg_locked() with a msg flagged MSG_SPLICE_PAGES still
have zero copy semantics?

>  		release_sock(sk);
>
>  		if (rv > 0) {
Bernard Metzler <BMT@zurich.ibm.com> wrote:

> >  /*
> > - * 0copy TCP transmit interface: Use do_tcp_sendpages.
> > + * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
> >   *
> >   * Using sendpage to push page by page appears to be less efficient
> >   * than using sendmsg, even if data are copied.
>
> That is an interesting observation. Is efficiency to be read as
> CPU load, or throughput on the wire, or both?

Um.  The observation in the comment is one you made, not me according to git
blame.  I merely changed "do_tcp_sendpages" to "MSG_SPLICE_PAGES" in the
first line of the comment.

> Back in the days, I introduced that zcopy path for efficiency
> reasons - getting both better throughput and less CPU load.
> I looked at both WRITE and READ performance. Using
> do_tcp_sendpages() is currently limited to processing work
> which is not registered with local completion generation.
> Replying to a remote READ request is a typical case. Did
> you check with READ?

Ah - you're talking about ksmbd there?  I haven't tested the patch with that.

> > -		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
> > +		rv = tcp_sendmsg_locked(sk, &msg, size);
>
> Would that tcp_sendmsg_locked() with a msg flagged
> MSG_SPLICE_PAGES still have zero copy semantics?

Yes - though I am considering making it conditional on whether the pages in
the iterator belong to the slab allocator (in which case they get copied) or
not.

David
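[Editor's note] For readers following the zero-copy question above, the sketch below distils the MSG_SPLICE_PAGES pattern from the patch hunk quoted earlier into a standalone helper that splices a single page into a kernel TCP socket. It is illustrative only: the function name and error handling are ours, not part of the patch, and the flags are reduced to the minimum.

#include <linux/bvec.h>
#include <linux/socket.h>
#include <linux/uio.h>
#include <net/sock.h>
#include <net/tcp.h>

/* Illustrative only: queue one page fragment on a kernel TCP socket with
 * MSG_SPLICE_PAGES, mirroring the calls used in the patch above.  The page
 * is referenced rather than copied, unless the core decides it must copy
 * (e.g. slab-backed pages, as discussed in this thread).
 */
static int example_splice_one_page(struct socket *s, struct page *page,
				   int offset, size_t len)
{
	struct bio_vec bvec;
	struct msghdr msg = {
		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT,
	};
	struct sock *sk = s->sk;
	int rv;

	/* Point the message iterator at the page fragment to be spliced. */
	bvec_set_page(&bvec, page, len, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);

	lock_sock(sk);
	rv = tcp_sendmsg_locked(sk, &msg, len);	/* bytes queued or -errno */
	release_sock(sk);

	return rv;
}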
> -----Original Message-----
> From: David Howells <dhowells@redhat.com>
> Sent: Monday, 20 March 2023 12:09
> To: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [RFC PATCH 08/28] siw: Inline do_tcp_sendpages()
>
> Bernard Metzler <BMT@zurich.ibm.com> wrote:
>
> > >  /*
> > > - * 0copy TCP transmit interface: Use do_tcp_sendpages.
> > > + * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
> > >   *
> > >   * Using sendpage to push page by page appears to be less efficient
> > >   * than using sendmsg, even if data are copied.
> >
> > That is an interesting observation. Is efficiency to be read as
> > CPU load, or throughput on the wire, or both?
>
> Um.  The observation in the comment is one you made, not me according to
> git blame.

Haha, yes. So sorry for that. I am getting older ;) I need to put in some
more sanity checks before posting here!

> I merely changed "do_tcp_sendpages" to "MSG_SPLICE_PAGES" in the first
> line of the comment.
>
> > Back in the days, I introduced that zcopy path for efficiency
> > reasons - getting both better throughput and less CPU load.
> > I looked at both WRITE and READ performance. Using
> > do_tcp_sendpages() is currently limited to processing work
> > which is not registered with local completion generation.
> > Replying to a remote READ request is a typical case. Did
> > you check with READ?
>
> Ah - you're talking about ksmbd there?  I haven't tested the patch with
> that.

Did you test with both kernel ULPs and user level applications?

> > > -		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
> > > +		rv = tcp_sendmsg_locked(sk, &msg, size);
> >
> > Would that tcp_sendmsg_locked() with a msg flagged
> > MSG_SPLICE_PAGES still have zero copy semantics?
>
> Yes - though I am considering making it conditional on whether the pages in
> the iterator belong to the slab allocator (in which case they get copied)
> or not.

Sounds good to me!

> David
Bernard Metzler <BMT@zurich.ibm.com> wrote:

> > > Back in the days, I introduced that zcopy path for efficiency
> > > reasons - getting both better throughput and less CPU load.
> > > I looked at both WRITE and READ performance. Using
> > > do_tcp_sendpages() is currently limited to processing work
> > > which is not registered with local completion generation.
> > > Replying to a remote READ request is a typical case. Did
> > > you check with READ?
> >
> > Ah - you're talking about ksmbd there?  I haven't tested the patch with
> > that.
>
> Did you test with both kernel ULPs and user level applications?

Kernel "ULPs"?

As far as cifs goes, I've tested the fs with large dd commands for the
moment, but that's all.  This post was more to find out how attached people
were to ->sendpage() and to see if anyone had any preferences on a couple of
things mentioned in the cover note.  This isn't aimed at the next merge
window.

David
> -----Original Message-----
> From: David Howells <dhowells@redhat.com>
> Sent: Monday, 20 March 2023 14:13
> To: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: David Howells <dhowells@redhat.com>; Tom Talpey <tom@talpey.com>; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [RFC PATCH 08/28] siw: Inline do_tcp_sendpages()
>
> Bernard Metzler <BMT@zurich.ibm.com> wrote:
>
> > > > Back in the days, I introduced that zcopy path for efficiency
> > > > reasons - getting both better throughput and less CPU load.
> > > > I looked at both WRITE and READ performance. Using
> > > > do_tcp_sendpages() is currently limited to processing work
> > > > which is not registered with local completion generation.
> > > > Replying to a remote READ request is a typical case. Did
> > > > you check with READ?
> > >
> > > Ah - you're talking about ksmbd there?  I haven't tested the patch with
> > > that.
> >
> > Did you test with both kernel ULPs and user level applications?
>
> Kernel "ULPs"?

I was trying to refer to kernel applications or clients or upper layer
protocols (ULPs, like NFS).

> As far as cifs goes, I've tested the fs with large dd commands for the
> moment, but that's all.  This post was more to find out how attached people
> were to ->sendpage() and to see if anyone had any preferences on a couple
> of things mentioned in the cover note.  This isn't aimed at the next merge
> window.

I like your patches to siw a lot, since they would significantly simplify
the transmit code path.

Thank you,
Bernard.

> David
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 05052b49107f..8fc179321e2b 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -313,7 +313,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 }
 
 /*
- * 0copy TCP transmit interface: Use do_tcp_sendpages.
+ * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES.
  *
  * Using sendpage to push page by page appears to be less efficient
  * than using sendmsg, even if data are copied.
@@ -324,20 +324,27 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s,
 static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 			     size_t size)
 {
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = (MSG_SPLICE_PAGES | MSG_MORE | MSG_DONTWAIT |
+			      MSG_SENDPAGE_NOTLAST),
+	};
 	struct sock *sk = s->sk;
-	int i = 0, rv = 0, sent = 0,
-	    flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST;
+	int i = 0, rv = 0, sent = 0;
 
 	while (size) {
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
 
 		if (size + offset <= PAGE_SIZE)
-			flags = MSG_MORE | MSG_DONTWAIT;
+			msg.msg_flags = MSG_SPLICE_PAGES | MSG_MORE | MSG_DONTWAIT;
 
 		tcp_rate_check_app_limited(sk);
+		bvec_set_page(&bvec, page[i], bytes, offset);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+
 try_page_again:
 		lock_sock(sk);
-		rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags);
+		rv = tcp_sendmsg_locked(sk, &msg, size);
 		release_sock(sk);
 
 		if (rv > 0) {
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
so inline it, allowing do_tcp_sendpages() to be removed. This is part of
replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Bernard Metzler <bmt@zurich.ibm.com>
cc: Tom Talpey <tom@talpey.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-rdma@vger.kernel.org
cc: netdev@vger.kernel.org
---
 drivers/infiniband/sw/siw/siw_qp_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
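[Editor's note] To make the "small wrapper" claim above concrete, here is a rough reconstruction, based only on the commit message and the diff in this thread, of what a do_tcp_sendpages() wrapper around tcp_sendmsg_locked() would look like after the MSG_SPLICE_PAGES conversion done elsewhere in the series. It is a sketch, not verbatim kernel code, and assumes the usual net/ipv4/tcp.c context; once the wrapper is this thin, inlining it into siw_tcp_sendpages() (as the diff does) removes a layer without changing behaviour.

/* Illustrative reconstruction: package the page/offset/size arguments into
 * a bvec-backed msghdr and hand off to tcp_sendmsg_locked(), which does the
 * actual splicing when MSG_SPLICE_PAGES is set.
 */
ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
			 size_t size, int flags)
{
	struct bio_vec bvec;
	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };

	bvec_set_page(&bvec, page, size, offset);
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);

	return tcp_sendmsg_locked(sk, &msg, size);
}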