diff mbox series

[RFC,net-next,1/6] net: Documentation on QUIC kernel Tx crypto.

Message ID 20220803164045.3585187-2-adel.abushaev@gmail.com (mailing list archive)
State New
Series [RFC,net-next,1/6] net: Documentation on QUIC kernel Tx crypto.

Commit Message

Adel Abouchaev Aug. 3, 2022, 4:40 p.m. UTC
Adding Documentation/networking/quic.rst file to describe kernel QUIC
code.

Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
---
 Documentation/networking/quic.rst | 176 ++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

Comments

Andrew Lunn Aug. 3, 2022, 6:23 p.m. UTC | #1
> +Statistics
> +==========
> +
> +QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
> +
> +  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
> +  QuicTxSw      - accumulative total number of offloaded QUIC connections
> +  QuicTxSwError - accumulative total number of errors during QUIC Tx offload to
> +                  kernel

netlink messages please, not /proc for statistics. netlink is the
preferred way to configure and report about the network stack.

	 Andrew
Adel Abouchaev Aug. 3, 2022, 6:51 p.m. UTC | #2
Andrew,

    Could you add more to your comment? The /proc was used similarly to 
kTLS. Netlink is better, though, unsure how ULP stats would fit in it.

Cheers,

Adel.

On 8/3/22 11:23 AM, Andrew Lunn wrote:
>> +Statistics
>> +==========
>> +
>> +QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
>> +
>> +  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
>> +  QuicTxSw      - accumulative total number of offloaded QUIC connections
>> +  QuicTxSwError - accumulative total number of errors during QUIC Tx offload to
>> +                  kernel
> netlink messages please, not /proc for statistics. netlink is the
> preferred way to configure and report about the network stack.
>
> 	 Andrew
Jonathan Corbet Aug. 4, 2022, 1:57 p.m. UTC | #3
Adel Abouchaev <adel.abushaev@gmail.com> writes:

> Adding Documentation/networking/quic.rst file to describe kernel QUIC
> code.
>
> Signed-off-by: Adel Abouchaev <adel.abushaev@gmail.com>
> ---
>  Documentation/networking/quic.rst | 176 ++++++++++++++++++++++++++++++
>  1 file changed, 176 insertions(+)
>  create mode 100644 Documentation/networking/quic.rst

When you add a new RST file, you need to add it to the index.rst as well
or it won't be pulled into the docs build.

Also...this all looks like user-space API documentation, so might
Documentation/userspace-api be a better place for it?

Thanks,

jon
Andrew Lunn Aug. 4, 2022, 3:29 p.m. UTC | #4
On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
> Andrew,
> 
>    Could you add more to your comment? The /proc was used similarly to kTLS.
> Netlink is better, though, unsure how ULP stats would fit in it.

How do tools like ss(1) retrieve the protocol summary statistics? Do
they still use /proc, or netlink?

     Andrew
Adel Abouchaev Aug. 4, 2022, 4:57 p.m. UTC | #5
Looking at 
https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c 
still uses proc/.

Adel.

On 8/4/22 8:29 AM, Andrew Lunn wrote:
> On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
>> Andrew,
>>
>>     Could you add more to your comment? The /proc was used similarly to kTLS.
>> Netlink is better, though, unsure how ULP stats would fit in it.
> How do tools like ss(1) retrieve the protocol summary statistics? Do
> they still use /proc, or netlink?
>
>       Andrew
Eric Dumazet Aug. 4, 2022, 5 p.m. UTC | #6
On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
>
> Looking at
> https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> still uses proc/.
>

Only for legacy reasons.

ss -t for sure will use netlink first, then fallback to /proc

New counters should use netlink, please.

> Adel.
>
> On 8/4/22 8:29 AM, Andrew Lunn wrote:
> > On Wed, Aug 03, 2022 at 11:51:59AM -0700, Adel Abouchaev wrote:
> >> Andrew,
> >>
> >>     Could you add more to your comment? The /proc was used similarly to kTLS.
> >> Netlink is better, though, unsure how ULP stats would fit in it.
> > How do tools like ss(1) retrieve the protocol summary statistics? Do
> > they still use /proc, or netlink?
> >
> >       Andrew
Jakub Kicinski Aug. 4, 2022, 6:09 p.m. UTC | #7
On Thu, 4 Aug 2022 10:00:37 -0700 Eric Dumazet wrote:
> On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
> > Looking at
> > https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> > still uses proc/.
> 
> Only for legacy reasons.

That but in all honesty also the fact that a proc file is pretty easy
and self-describing while the historic netlink families are undocumented
code salads.

> ss -t for sure will use netlink first, then fallback to /proc
> 
> New counters should use netlink, please.

Just to be sure I'm not missing anything - we're talking about some 
new netlink, right? Is there an existing place for "overall prot family
stats" over netlink today?
Eric Dumazet Aug. 4, 2022, 6:45 p.m. UTC | #8
On Thu, Aug 4, 2022 at 11:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 4 Aug 2022 10:00:37 -0700 Eric Dumazet wrote:
> > On Thu, Aug 4, 2022 at 9:58 AM Adel Abouchaev <adel.abushaev@gmail.com> wrote:
> > > Looking at
> > > https://github.com/shemminger/iproute2/blob/main/misc/ss.c#L589 the ss.c
> > > still uses proc/.
> >
> > Only for legacy reasons.
>
> That but in all honesty also the fact that a proc file is pretty easy
> and self-describing while the historic netlink families are undocumented
> code salads.
>
> > ss -t for sure will use netlink first, then fallback to /proc
> >
> > New counters should use netlink, please.
>
> Just to be sure I'm not missing anything - we're talking about some
> new netlink, right? Is there an existing place for "overall prot family
> stats" over netlink today?

I thought we were speaking of dumping ULP info on a per UDP socket basis.

If this is about new SNMP counters, then sure, /proc is fine I guess.

Patch

diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..eaa2d36310be
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,176 @@ 
+.. _kernel_quic:
+
+===========
+KERNEL QUIC
+===========
+
+Overview
+========
+
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 for more information on QUIC.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are 1-RTT packets
+in a QUIC connection. Encryption of all other packets is still done by the
+QUIC library in user space.
+
+
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+the many available QUIC libraries. The code instantiates a QUIC client and a
+QUIC server in some form and configures them to use certain addresses and
+ports for the source and destination. The client and server negotiate the set
+of keys to protect the communication during different phases of the
+connection, maintain the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow to be encrypted by the kernel must be registered with the kernel
+through the socket API. A setsockopt() call on the socket associates the QUIC
+connection ID of the flow with the parameters for the crypto operations:
+
+.. code-block:: c
+
+  struct quic_connection_info conn_info;
+  char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  const size_t conn_id_len = sizeof(conn_id);
+  char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                       0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+  char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+                      0x08, 0x09, 0x0a, 0x0b};
+  char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+                           0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f};
+
+  memset(&conn_info, 0, sizeof(conn_info));
+  conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+  conn_info.key.conn_id_length = conn_id_len;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         conn_id, conn_id_len);
+
+  memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
+  memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
+  memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+  setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+  memset(&conn_info.key, 0, sizeof(conn_info.key));
+  conn_info.key.conn_id_length = conn_id_len;
+  memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE - conn_id_len],
+         conn_id, conn_id_len);
+  setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+             sizeof(conn_info));
+
+Sending QUIC application data
+-----------------------------
+
+For QUIC Tx encryption offload, the application should use the sendmsg()
+socket call and provide ancillary data carrying the connection ID length,
+the next packet number and the offload flags, so that the kernel can perform
+the encryption and, if requested, GSO.
+
+.. code-block:: c
+
+  size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+  uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+  struct quic_tx_ancillary_data *anc_data;
+  size_t quic_data_len = 4500;
+  struct cmsghdr *cmsg_hdr;
+  char quic_data[9000];
+  struct msghdr msg = {0};
+  struct iovec iov[2];
+  int err;
+
+  /* Two buffers of quic_data_len bytes each, back to back in quic_data. */
+  iov[0].iov_base = quic_data;
+  iov[0].iov_len = quic_data_len;
+  iov[1].iov_base = quic_data + quic_data_len;
+  iov[1].iov_len = quic_data_len;
+
+  if (client.addr.sin_family == AF_INET) {
+    msg.msg_name = &client.addr;
+    msg.msg_namelen = sizeof(client.addr);
+  } else {
+    msg.msg_name = &client.addr6;
+    msg.msg_namelen = sizeof(client.addr6);
+  }
+
+  msg.msg_iov = iov;
+  msg.msg_iovlen = 2;
+  msg.msg_control = cmsg_buf;
+  msg.msg_controllen = sizeof(cmsg_buf);
+  cmsg_hdr = CMSG_FIRSTHDR(&msg);
+  cmsg_hdr->cmsg_level = IPPROTO_UDP;
+  cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+  cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+  anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+  anc_data->flags = 0;
+  anc_data->next_pkt_num = 0x0d65c9;
+  anc_data->conn_id_length = conn_id_len;
+  err = sendmsg(fd, &msg, 0);
+
+The QUIC Tx offload in the kernel reads the data from user space, then
+encrypts and copies it to the ciphertext within the same operation.
+
+
+Sending QUIC application data with GSO
+--------------------------------------
+When GSO is in use, the kernel uses the GSO fragment size as the target size
+of each ciphertext packet. Plaintext packets from user space should therefore
+be spaced at the GSO fragment size minus the tag size of the chosen cipher.
+For a GSO fragment size of 1200 and a 16-byte tag, plaintext packets start
+every 1184 bytes. After encryption, the UDP and IP stacks use the configured
+GSO fragment size, which includes the trailing tag bytes.
+
+To set up GSO fragmentation:
+
+.. code-block:: c
+
+  setsockopt(fd, SOL_UDP, UDP_SEGMENT, &frag_size, sizeof(frag_size));
+
+If the GSO fragment size is provided in ancillary data within the sendmsg()
+call, the value in ancillary data will take precedence over the segment size
+provided in setsockopt to split the payload into packets. This is consistent
+with the UDP stack behavior.
+
+Integrating to userspace QUIC libraries
+---------------------------------------
+
+Integration with userspace QUIC libraries depends on the implementation of the
+QUIC protocol. For the MVFST library, the control plane is integrated into the
+handshake callbacks to properly configure the flows on the socket, and the
+data plane is integrated into the methods that perform encryption and send
+the packets to the batch scheduler for transmission to the socket.
+
+MVFST library can be found at https://github.com/facebookincubator/mvfst.
+
+Statistics
+==========
+
+QUIC Tx offload to the kernel has counters reflected in /proc/net/quic_stat:
+
+  QuicCurrTxSw  - number of currently active kernel offloaded QUIC connections
+  QuicTxSw      - accumulative total number of offloaded QUIC connections
+  QuicTxSwError - accumulative total number of errors during QUIC Tx offload to
+                  kernel
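
As a worked illustration of the GSO alignment rule from the "Sending QUIC
application data with GSO" section of the patch, the sketch below computes
where each plaintext packet would start in the user buffer. The helper names
quic_plain_stride() and quic_plain_offset() and the constant QUIC_TAG_LEN are
hypothetical and are not part of the proposed API. The 16-byte tag is an
assumption that matches the AES-GCM-128 example in the patch.

```c
#include <stddef.h>

/* Assumption: 16-byte AEAD tag, as in the AES-GCM-128 example. */
#define QUIC_TAG_LEN 16

/*
 * Hypothetical helper: distance in bytes between the starts of two
 * consecutive plaintext packets for a given GSO fragment size.
 * Returns 0 when the fragment is too small to hold any payload
 * in addition to the tag.
 */
static size_t quic_plain_stride(size_t gso_frag_size)
{
	if (gso_frag_size <= QUIC_TAG_LEN)
		return 0;
	return gso_frag_size - QUIC_TAG_LEN;
}

/* Offset of the n-th plaintext packet (0-based) in the user buffer. */
static size_t quic_plain_offset(size_t gso_frag_size, size_t n)
{
	return n * quic_plain_stride(gso_frag_size);
}
```

With a GSO fragment size of 1200, quic_plain_stride(1200) gives 1184, so the
second and third plaintext packets would start at offsets 1184 and 2368.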