
[net-next,v3,00/21] rxrpc: Miscellaneous changes and make use of MSG_SPLICE_PAGES

Message ID 20240306000655.1100294-1-dhowells@redhat.com

Message

David Howells March 6, 2024, 12:06 a.m. UTC
Here are some changes to AF_RXRPC:

 (1) Cache the transmission serial number of ACK and DATA packets in the
     rxrpc_txbuf struct and log this in the retransmit tracepoint.

 (2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire
     header flags there too to avoid duplication.

 (3) Cache the wire checksum in rxrpc_txbuf to make it easier to create
     jumbo packets in future (which will require altering the wire header
     to a jumbo header and restoring it back again for retransmission).

 (4) Fix the names of the fields in the wire ACK trailer struct.

 (5) Strip all the barriers and atomics out of the call timer tracking[*].

 (6) Remove atomic handling from call->tx_transmitted and
     call->acks_prev_seq[*].

 (7) Don't bother resetting the DF flag after UDP packet transmission.  To
     change it, we now call directly into UDP code, so it's quick just to
     set it every time.

 (8) Merge together the DF/non-DF branches of the DATA transmission to
     reduce duplication in the code.

 (9) Add a kvec array into rxrpc_txbuf and start moving things over to it.
     This paves the way for using page frags.

(10) Split (sub)packet preparation and timestamping out of the DATA
     transmission function.  This helps pave the way for future jumbo
     packet generation.

(11) In rxkad, don't pick values out of the wire header stored in
     rxrpc_txbuf, but rather find them elsewhere so we can remove the wire
     header from there.

(12) Move rxrpc_send_ACK() to output.c so that it can be merged with
     rxrpc_send_ack_packet().

(13) Use rxrpc_txbuf::kvec[0] to access the wire header for the packet
     rather than directly accessing the copy in rxrpc_txbuf.  This will
     allow that to be removed to a page frag.

(14) Switch from keeping the transmission buffers in rxrpc_txbuf allocated
     in the slab to allocating them using page fragment allocators.  There
     are separate allocators for DATA packets (which persist for a while)
     and control packets (which are discarded immediately).

     We can then turn on MSG_SPLICE_PAGES when transmitting DATA and ACK
     packets.

     We can also get rid of the RCU cleanup on rxrpc_txbufs, preferring
     instead to release the page frags as soon as possible.

(15) Parse received packets before handling timeouts as the former may
     reset the latter.

(16) Make sure we don't retransmit DATA packets after all the packets have
     been ACK'd.

(17) Differentiate traces for PING ACK transmission.

(18) Switch to keeping timeouts as ktime_t rather than a number of jiffies
     as the latter is too coarse a granularity.  Only set the call timer at
     the end of the call event function from the aggregate of all the
     timeouts, thereby reducing the number of timer calls made.  In future,
     it might be possible to reduce the number of timers from one per call
     to one per I/O thread and to use a high-precision timer.

(19) Record RTT probes after successful transmission rather than recording
     them beforehand and then cancelling them if transmission fails[*].
     This allows a number of calls to get the current time to be removed.

(20) Clean up the resend algorithm as there's now no need to walk the
     transmission buffer under lock[*].  DATA packets can be retransmitted
     as soon as they're found rather than being queued up and transmitted
     when the lock is dropped.

(21) When initially parsing a received ACK packet, extract some of the
     fields from the ack info to the skbuff private data.  This makes it
     easier to do path MTU discovery in the future when the call to which a
     PING RESPONSE ACK refers has been deallocated.
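
Item (18) can be illustrated with a small userspace sketch: keep each
timeout as a nanosecond value and work out the earliest one, so that the
timer only needs arming once at the end of event processing.  This is not
the rxrpc code; the struct, its fields and the function name are
hypothetical, and int64_t nanoseconds stands in for ktime_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ktime_t: signed nanoseconds (assumption for this sketch). */
typedef int64_t ktime_ns;
#define TIMEOUT_NEVER INT64_MAX	/* marks a timeout that is not set */

/* Hypothetical per-call timeouts; the real struct rxrpc_call differs. */
struct call_timeouts {
	ktime_ns resend_at;
	ktime_ns ack_at;
	ktime_ns keepalive_at;
	ktime_ns expect_rx_by;
};

/* Return the earliest pending timeout, or TIMEOUT_NEVER if none is set. */
static ktime_ns earliest_timeout(const struct call_timeouts *t)
{
	ktime_ns next = TIMEOUT_NEVER;

	assert(t != NULL);
	if (t->resend_at < next)
		next = t->resend_at;
	if (t->ack_at < next)
		next = t->ack_at;
	if (t->keepalive_at < next)
		next = t->keepalive_at;
	if (t->expect_rx_by < next)
		next = t->expect_rx_by;
	return next;
}
```

An event handler built this way would call earliest_timeout() once after
all events have been processed and arm the timer only if the result is not
TIMEOUT_NEVER, rather than touching the timer each time one timeout
changes.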


[*] Possible with the move of almost all code from softirq context to the
    I/O thread.
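
The point of the footnote can be sketched in userspace C: when a single
thread owns a buffer, its flags can be a plain bitmask updated with
ordinary read-modify-write operations instead of atomic bit operations.
The flag names, the struct and the helpers below are hypothetical and are
not those used in rxrpc_txbuf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real rxrpc_txbuf flags differ. */
#define TXBUF_LAST	0x01u	/* last packet of the call */
#define TXBUF_RESENT	0x02u	/* packet has been retransmitted */
#define TXBUF_REQ_ACK	0x04u	/* ask the peer to ACK this packet */

struct txbuf_sketch {
	uint32_t serial;	/* cached transmission serial number */
	uint8_t  flags;		/* plain mask, written by the owning thread only */
};

/* Plain read-modify-write: safe only because one thread owns the buffer. */
static inline void txbuf_set(struct txbuf_sketch *txb, uint8_t mask)
{
	assert(txb != NULL);
	txb->flags |= mask;
}

static inline void txbuf_clear(struct txbuf_sketch *txb, uint8_t mask)
{
	assert(txb != NULL);
	txb->flags &= (uint8_t)~mask;
}

static inline bool txbuf_test(const struct txbuf_sketch *txb, uint8_t mask)
{
	assert(txb != NULL);
	return (txb->flags & mask) != 0;
}
```

If a second context could modify the same field concurrently, these plain
updates could lose writes, which is why the simplification depends on the
single-owner assumption.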

The patches are tagged here:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/rxrpc-iothread-20240305

And can be found on this branch:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-iothread

David

Link: https://lore.kernel.org/r/20240301163807.385573-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20240304084322.705539-1-dhowells@redhat.com/ # v2

Changes
=======
ver #3)
 - Use passed-in gfp in rxkad_alloc_txbuf() rather than GFP_KERNEL.
 - Adjust rxkad_alloc_txbuf()'s txb check to put return in if-statement.

ver #2)
 - Removed an unused variable.
 - Use ktime_to_us() rather than dividing a ktime by 1000 in tracepoints.
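
As an illustration of the last ver #2 item: a ktime_t counts nanoseconds,
so dividing by 1000 does give microseconds, but a named helper states the
intent at the call site.  The sketch below is a userspace stand-in only;
ktime_to_us_sketch() and elapsed_us() are hypothetical names, the former
modelled on the kernel's ktime_to_us(), and int64_t stands in for ktime_t.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000LL

/* Userspace stand-in for ktime_to_us(): nanoseconds in, microseconds out. */
static inline int64_t ktime_to_us_sketch(int64_t kt_ns)
{
	return kt_ns / NSEC_PER_USEC;
}

/* Hypothetical helper: elapsed time in microseconds between two timestamps. */
static inline int64_t elapsed_us(int64_t start_ns, int64_t end_ns)
{
	assert(end_ns >= start_ns);
	return ktime_to_us_sketch(end_ns - start_ns);
}
```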

David Howells (21):
  rxrpc: Record the Tx serial in the rxrpc_txbuf and retransmit trace
  rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics
  rxrpc: Note cksum in txbuf
  rxrpc: Fix the names of the fields in the ACK trailer struct
  rxrpc: Strip barriers and atomics off of timer tracking
  rxrpc: Remove atomic handling on some fields only used in I/O thread
  rxrpc: Do lazy DF flag resetting
  rxrpc: Merge together DF/non-DF branches of data Tx function
  rxrpc: Add a kvec[] to the rxrpc_txbuf struct
  rxrpc: Split up the DATA packet transmission function
  rxrpc: Don't pick values out of the wire header when setting up
    security
  rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet()
  rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire
  rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags
  rxrpc: Parse received packets before dealing with timeouts
  rxrpc: Don't permit resending after all Tx packets acked
  rxrpc: Differentiate PING ACK transmission traces.
  rxrpc: Use ktimes for call timeout tracking and set the timer lazily
  rxrpc: Record probes after transmission and reduce number of time-gets
  rxrpc: Clean up the resend algorithm
  rxrpc: Extract useful fields from a received ACK to skb priv data

 include/trace/events/rxrpc.h | 198 ++++++++--------
 net/rxrpc/af_rxrpc.c         |  12 +-
 net/rxrpc/ar-internal.h      |  88 ++++---
 net/rxrpc/call_event.c       | 327 ++++++++++++--------------
 net/rxrpc/call_object.c      |  56 ++---
 net/rxrpc/conn_client.c      |   4 +-
 net/rxrpc/conn_event.c       |  16 +-
 net/rxrpc/conn_object.c      |   4 +
 net/rxrpc/input.c            | 116 +++++----
 net/rxrpc/insecure.c         |  11 +-
 net/rxrpc/io_thread.c        |  11 +
 net/rxrpc/local_object.c     |   3 +
 net/rxrpc/misc.c             |   8 +-
 net/rxrpc/output.c           | 441 +++++++++++++++++------------------
 net/rxrpc/proc.c             |  10 +-
 net/rxrpc/protocol.h         |   6 +-
 net/rxrpc/rtt.c              |  36 +--
 net/rxrpc/rxkad.c            |  57 ++---
 net/rxrpc/sendmsg.c          |  63 ++---
 net/rxrpc/sysctl.c           |  16 +-
 net/rxrpc/txbuf.c            | 174 +++++++++++---
 21 files changed, 853 insertions(+), 804 deletions(-)

Comments

Simon Horman March 7, 2024, 9:39 a.m. UTC | #1
On Wed, Mar 06, 2024 at 12:06:30AM +0000, David Howells wrote:
> Here are some changes to AF_RXRPC:
> 
> [...]
> 
> Changes
> =======
> ver #3)
>  - Use passed-in gfp in rxkad_alloc_txbuf() rather than GFP_KERNEL.

Hi David,

Thanks for the update above.
For the record, I don't have anything to flag in this revision of the patchset.

>  - Adjust rxkad_alloc_txbuf()'s txb check to put return in if-statement.
> 
> ver #2)
>  - Removed an unused variable.
>  - Use ktime_to_us() rather than dividing a ktime by 1000 in tracepoints.

...
patchwork-bot+netdevbpf@kernel.org March 8, 2024, 5:10 a.m. UTC | #2
Hello:

This series was applied to netdev/net-next.git (main)
by David Howells <dhowells@redhat.com>:

On Wed,  6 Mar 2024 00:06:30 +0000 you wrote:
> Here are some changes to AF_RXRPC:
> 
>  (1) Cache the transmission serial number of ACK and DATA packets in the
>      rxrpc_txbuf struct and log this in the retransmit tracepoint.
> 
>  (2) Don't use atomics on rxrpc_txbuf::flags[*] and cache the intended wire
>      header flags there too to avoid duplication.
> 
> [...]

Here is the summary with links:
  - [net-next,v3,01/21] rxrpc: Record the Tx serial in the rxrpc_txbuf and retransmit trace
    https://git.kernel.org/netdev/net-next/c/ba132d841d56
  - [net-next,v3,02/21] rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics
    https://git.kernel.org/netdev/net-next/c/12bdff73a147
  - [net-next,v3,03/21] rxrpc: Note cksum in txbuf
    https://git.kernel.org/netdev/net-next/c/41b8debba79c
  - [net-next,v3,04/21] rxrpc: Fix the names of the fields in the ACK trailer struct
    https://git.kernel.org/netdev/net-next/c/17469ae0582a
  - [net-next,v3,05/21] rxrpc: Strip barriers and atomics off of timer tracking
    https://git.kernel.org/netdev/net-next/c/d73f3a748875
  - [net-next,v3,06/21] rxrpc: Remove atomic handling on some fields only used in I/O thread
    https://git.kernel.org/netdev/net-next/c/693f9c13ec89
  - [net-next,v3,07/21] rxrpc: Do lazy DF flag resetting
    https://git.kernel.org/netdev/net-next/c/d32636982ce9
  - [net-next,v3,08/21] rxrpc: Merge together DF/non-DF branches of data Tx function
    https://git.kernel.org/netdev/net-next/c/1ac6a8536c2c
  - [net-next,v3,09/21] rxrpc: Add a kvec[] to the rxrpc_txbuf struct
    https://git.kernel.org/netdev/net-next/c/ff342bdc59f4
  - [net-next,v3,10/21] rxrpc: Split up the DATA packet transmission function
    https://git.kernel.org/netdev/net-next/c/44125d5aadda
  - [net-next,v3,11/21] rxrpc: Don't pick values out of the wire header when setting up security
    https://git.kernel.org/netdev/net-next/c/a1c9af4d4467
  - [net-next,v3,12/21] rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet()
    https://git.kernel.org/netdev/net-next/c/99afb28c676c
  - [net-next,v3,13/21] rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire
    https://git.kernel.org/netdev/net-next/c/8985f2b09b33
  - [net-next,v3,14/21] rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags
    https://git.kernel.org/netdev/net-next/c/49489bb03a50
  - [net-next,v3,15/21] rxrpc: Parse received packets before dealing with timeouts
    https://git.kernel.org/netdev/net-next/c/3e0b83ee535d
  - [net-next,v3,16/21] rxrpc: Don't permit resending after all Tx packets acked
    https://git.kernel.org/netdev/net-next/c/a711d976e1cd
  - [net-next,v3,17/21] rxrpc: Differentiate PING ACK transmission traces.
    https://git.kernel.org/netdev/net-next/c/12a66e77c499
  - [net-next,v3,18/21] rxrpc: Use ktimes for call timeout tracking and set the timer lazily
    https://git.kernel.org/netdev/net-next/c/153f90a066dd
  - [net-next,v3,19/21] rxrpc: Record probes after transmission and reduce number of time-gets
    https://git.kernel.org/netdev/net-next/c/4d267ad6fd56
  - [net-next,v3,20/21] rxrpc: Clean up the resend algorithm
    https://git.kernel.org/netdev/net-next/c/37473e416234
  - [net-next,v3,21/21] rxrpc: Extract useful fields from a received ACK to skb priv data
    https://git.kernel.org/netdev/net-next/c/4b68137a20bc

You are awesome, thank you!