diff mbox series

[net-next,v2] tls: Add opt-in zerocopy mode of sendfile()

Message ID 20220511121525.624059-1-maximmi@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] tls: Add opt-in zerocopy mode of sendfile() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 60 this patch: 60
netdev/cc_maintainers warning 2 maintainers not CCed: edumazet@google.com john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 11 this patch: 11
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 60 this patch: 60
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 191 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Maxim Mikityanskiy May 11, 2022, 12:15 p.m. UTC
From: Boris Pismenny <borisp@nvidia.com>

TLS device offload copies sendfile data to a bounce buffer before
transmitting. It allows to maintain the valid MAC on TLS records when
the file contents change and a part of TLS record has to be
retransmitted on TCP level.

In many common use cases (like serving static files over HTTPS) the file
contents are not changed on the fly. In many use cases breaking the
connection is totally acceptable if the file is changed during
transmission, because it would be received corrupted in any case.

This commit allows to optimize performance for such use cases to
providing a new optional mode of TLS sendfile(), in which the extra copy
is skipped. Removing this copy improves performance significantly, as
TLS and TCP sendfile perform the same operations, and the only overhead
is TLS header/trailer insertion.

The new mode can only be enabled with the new socket option named
TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
compatibility with existing applications that rely on the copying
behavior.

The new mode is safe, meaning that unsolicited modifications of the file
being sent can't break integrity of the kernel. The worst thing that can
happen is sending a corrupted TLS record, which is in any case not
forbidden when using regular TCP sockets.

Sockets other than TLS device offload are not affected by the new socket
option.

Performance numbers in a single-core test with 12 HTTPS streams on
nginx, under 100% CPU load:

* non-zerocopy: up to 23.5 Gbit/s
* zerocopy: up to 50.0 Gbit/s

Signed-off-by: Boris Pismenny <borisp@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
---
 include/net/tls.h        |  1 +
 include/uapi/linux/tls.h |  1 +
 net/tls/tls_device.c     | 53 ++++++++++++++++++++++++++++++----------
 net/tls/tls_main.c       | 48 ++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 13 deletions(-)

Comments

Jakub Kicinski May 12, 2022, 11:34 p.m. UTC | #1
On Wed, 11 May 2022 15:15:25 +0300 Maxim Mikityanskiy wrote:
> TLS device offload copies sendfile data to a bounce buffer before
> transmitting. It allows to maintain the valid MAC on TLS records when
> the file contents change and a part of TLS record has to be
> retransmitted on TCP level.
> 
> In many common use cases (like serving static files over HTTPS) the file
> contents are not changed on the fly. In many use cases breaking the
> connection is totally acceptable if the file is changed during
> transmission, because it would be received corrupted in any case.
> 
> This commit allows to optimize performance for such use cases to
> providing a new optional mode of TLS sendfile(), in which the extra copy
> is skipped. Removing this copy improves performance significantly, as
> TLS and TCP sendfile perform the same operations, and the only overhead
> is TLS header/trailer insertion.
> 
> The new mode can only be enabled with the new socket option named
> TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
> compatibility with existing applications that rely on the copying
> behavior.
> 
> The new mode is safe, meaning that unsolicited modifications of the file
> being sent can't break integrity of the kernel. The worst thing that can
> happen is sending a corrupted TLS record, which is in any case not
> forbidden when using regular TCP sockets.
> 
> Sockets other than TLS device offload are not affected by the new socket
> option.

What about the reporting via sock diag? Am I misremembering something?
Maxim Mikityanskiy May 13, 2022, 7:57 a.m. UTC | #2
On 2022-05-13 02:34, Jakub Kicinski wrote:
> On Wed, 11 May 2022 15:15:25 +0300 Maxim Mikityanskiy wrote:
>> TLS device offload copies sendfile data to a bounce buffer before
>> transmitting. It allows to maintain the valid MAC on TLS records when
>> the file contents change and a part of TLS record has to be
>> retransmitted on TCP level.
>>
>> In many common use cases (like serving static files over HTTPS) the file
>> contents are not changed on the fly. In many use cases breaking the
>> connection is totally acceptable if the file is changed during
>> transmission, because it would be received corrupted in any case.
>>
>> This commit allows to optimize performance for such use cases to
>> providing a new optional mode of TLS sendfile(), in which the extra copy
>> is skipped. Removing this copy improves performance significantly, as
>> TLS and TCP sendfile perform the same operations, and the only overhead
>> is TLS header/trailer insertion.
>>
>> The new mode can only be enabled with the new socket option named
>> TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
>> compatibility with existing applications that rely on the copying
>> behavior.
>>
>> The new mode is safe, meaning that unsolicited modifications of the file
>> being sent can't break integrity of the kernel. The worst thing that can
>> happen is sending a corrupted TLS record, which is in any case not
>> forbidden when using regular TCP sockets.
>>
>> Sockets other than TLS device offload are not affected by the new socket
>> option.
> 
> What about the reporting via sock diag? Am I misremembering something?

I recall we discussed that "zerocopy is active" can be restored as 
"hardware offload && setsockopt flag requested", and I saw that 
tls_get_info exposes the hardware offload state for the socket, so I 
thought the existing information in sock_diag was enough.

If you think, though, that it will be better to have an explicit 
indication of the zerocopy flag to sock_diag, I can add it, it's not a 
problem.

Does the rest of the patch look good to you?

Thanks,
Max
Jakub Kicinski May 13, 2022, 4:02 p.m. UTC | #3
On Fri, 13 May 2022 10:57:06 +0300 Maxim Mikityanskiy wrote:
> > What about the reporting via sock diag? Am I misremembering something?  
> 
> I recall we discussed that "zerocopy is active" can be restored as 
> "hardware offload && setsockopt flag requested", and I saw that 
> tls_get_info exposes the hardware offload state for the socket, so I 
> thought the existing information in sock_diag was enough.
> 
> If you think, though, that it will be better to have an explicit 
> indication of the zerocopy flag to sock_diag, I can add it, it's not a 
> problem.

Yeah, my thinking is that if a user reports problems to me I'd like to
be able to log in to the box and see what they have configured on the
socket. ss is quick and easy to type. I'm not aware of tooling that'd
grab descriptors for sockets and do getsockopt() (even tho technically
it is doable), are there any? I'm not feeling strong about sock_diag
specifically being the way but for such a potentially correctness-
-impacting optimization we should give admins an easy way to
interrogate if it's enabled or not.

> Does the rest of the patch look good to you?

Yup!
diff mbox series

Patch

diff --git a/include/net/tls.h b/include/net/tls.h
index b59f0a63292b..8017f1703447 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -238,6 +238,7 @@  struct tls_context {
 
 	u8 tx_conf:3;
 	u8 rx_conf:3;
+	u8 zerocopy_sendfile:1;
 
 	int (*push_pending_record)(struct sock *sk, int flags);
 	void (*sk_write_space)(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index 5f38be0ec0f3..47989b77ebcf 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -39,6 +39,7 @@ 
 /* TLS socket options */
 #define TLS_TX			1	/* Set transmit parameters */
 #define TLS_RX			2	/* Set receive parameters */
+#define TLS_TX_ZEROCOPY_SENDFILE	3	/* transmit zerocopy sendfile */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver)	((ver) & 0xFF)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index bca00521ebc1..ec6f4b699a2b 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -411,10 +411,16 @@  static int tls_device_copy_data(void *addr, size_t bytes, struct iov_iter *i)
 	return 0;
 }
 
+union tls_iter_offset {
+	struct iov_iter *msg_iter;
+	int offset;
+};
+
 static int tls_push_data(struct sock *sk,
-			 struct iov_iter *msg_iter,
+			 union tls_iter_offset iter_offset,
 			 size_t size, int flags,
-			 unsigned char record_type)
+			 unsigned char record_type,
+			 struct page *zc_page)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_prot_info *prot = &tls_ctx->prot_info;
@@ -480,12 +486,21 @@  static int tls_push_data(struct sock *sk,
 		}
 
 		record = ctx->open_record;
-		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
-		copy = min_t(size_t, copy, (max_open_record_len - record->len));
 
-		if (copy) {
+		copy = min_t(size_t, size, max_open_record_len - record->len);
+		if (copy && zc_page) {
+			struct page_frag zc_pfrag;
+
+			zc_pfrag.page = zc_page;
+			zc_pfrag.offset = iter_offset.offset;
+			zc_pfrag.size = copy;
+			tls_append_frag(record, &zc_pfrag, copy);
+		} else if (copy) {
+			copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
+
 			rc = tls_device_copy_data(page_address(pfrag->page) +
-						  pfrag->offset, copy, msg_iter);
+						  pfrag->offset, copy,
+						  iter_offset.msg_iter);
 			if (rc)
 				goto handle_error;
 			tls_append_frag(record, pfrag, copy);
@@ -540,6 +555,7 @@  int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	unsigned char record_type = TLS_RECORD_TYPE_DATA;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	union tls_iter_offset iter;
 	int rc;
 
 	mutex_lock(&tls_ctx->tx_lock);
@@ -551,8 +567,8 @@  int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out;
 	}
 
-	rc = tls_push_data(sk, &msg->msg_iter, size,
-			   msg->msg_flags, record_type);
+	iter.msg_iter = &msg->msg_iter;
+	rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
 
 out:
 	release_sock(sk);
@@ -564,7 +580,8 @@  int tls_device_sendpage(struct sock *sk, struct page *page,
 			int offset, size_t size, int flags)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct iov_iter	msg_iter;
+	union tls_iter_offset iter_offset;
+	struct iov_iter msg_iter;
 	char *kaddr;
 	struct kvec iov;
 	int rc;
@@ -580,12 +597,20 @@  int tls_device_sendpage(struct sock *sk, struct page *page,
 		goto out;
 	}
 
+	if (tls_ctx->zerocopy_sendfile) {
+		iter_offset.offset = offset;
+		rc = tls_push_data(sk, iter_offset, size,
+				   flags, TLS_RECORD_TYPE_DATA, page);
+		goto out;
+	}
+
 	kaddr = kmap(page);
 	iov.iov_base = kaddr + offset;
 	iov.iov_len = size;
 	iov_iter_kvec(&msg_iter, WRITE, &iov, 1, size);
-	rc = tls_push_data(sk, &msg_iter, size,
-			   flags, TLS_RECORD_TYPE_DATA);
+	iter_offset.msg_iter = &msg_iter;
+	rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
+			   NULL);
 	kunmap(page);
 
 out:
@@ -656,10 +681,12 @@  EXPORT_SYMBOL(tls_get_record);
 
 static int tls_device_push_pending_record(struct sock *sk, int flags)
 {
-	struct iov_iter	msg_iter;
+	union tls_iter_offset iter;
+	struct iov_iter msg_iter;
 
 	iov_iter_kvec(&msg_iter, WRITE, NULL, 0, 0);
-	return tls_push_data(sk, &msg_iter, 0, flags, TLS_RECORD_TYPE_DATA);
+	iter.msg_iter = &msg_iter;
+	return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
 }
 
 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7b2b0e7ffee4..6d0ae30e7861 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -513,6 +513,26 @@  static int do_tls_getsockopt_conf(struct sock *sk, char __user *optval,
 	return rc;
 }
 
+static int do_tls_getsockopt_tx_zc(struct sock *sk, char __user *optval,
+				   int __user *optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	unsigned int value;
+	int len;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	if (len != sizeof(value))
+		return -EINVAL;
+
+	value = ctx->zerocopy_sendfile;
+	if (copy_to_user(optval, &value, sizeof(value)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int do_tls_getsockopt(struct sock *sk, int optname,
 			     char __user *optval, int __user *optlen)
 {
@@ -524,6 +544,9 @@  static int do_tls_getsockopt(struct sock *sk, int optname,
 		rc = do_tls_getsockopt_conf(sk, optval, optlen,
 					    optname == TLS_TX);
 		break;
+	case TLS_TX_ZEROCOPY_SENDFILE:
+		rc = do_tls_getsockopt_tx_zc(sk, optval, optlen);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;
@@ -675,6 +698,26 @@  static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 	return rc;
 }
 
+static int do_tls_setsockopt_tx_zc(struct sock *sk, sockptr_t optval,
+				   unsigned int optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	unsigned int value;
+
+	if (sockptr_is_null(optval) || optlen != sizeof(value))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&value, optval, sizeof(value)))
+		return -EFAULT;
+
+	if (value > 1)
+		return -EINVAL;
+
+	ctx->zerocopy_sendfile = value;
+
+	return 0;
+}
+
 static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			     unsigned int optlen)
 {
@@ -688,6 +731,11 @@  static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 					    optname == TLS_TX);
 		release_sock(sk);
 		break;
+	case TLS_TX_ZEROCOPY_SENDFILE:
+		lock_sock(sk);
+		rc = do_tls_setsockopt_tx_zc(sk, optval, optlen);
+		release_sock(sk);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;