From patchwork Fri Jun 3 05:31:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868534 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8A3EC43334 for ; Fri, 3 Jun 2022 05:31:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240625AbiFCFbe (ORCPT ); Fri, 3 Jun 2022 01:31:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42058 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240572AbiFCFbc (ORCPT ); Fri, 3 Jun 2022 01:31:32 -0400 Received: from mail.sberdevices.ru (mail.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12F3638DB4; Thu, 2 Jun 2022 22:31:30 -0700 (PDT) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mail.sberdevices.ru (Postfix) with ESMTP id D5EF95FD02; Fri, 3 Jun 2022 08:31:27 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1654234287; bh=+4MNda9ZVaWsO3OSMFy39GadvoLLcTiXQPmvOw9yBqY=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=OtnvFT2hWMHelc/RdeLxaqAPMtB3GTqXTl54pXjcE8WK+2p0/AEHMPYtdC30Kf00w gB/OlMdlWbfjoS3JVBo/vziQ9X2nO0FrAYyNKuUqVL5r5dgf8leEiH/JMNs9gv7D7O hQ1TPYoc7bxMATsx4JZeFjwf4zSdgMMQPgKEMPf3E5Mu1DQdi13oAdL9uLimobl501 69+5wWuEHSAC5CYMjOCNONZi/9eNqGbcOUWgP0H3iCNiWlo3KN/s176QsqypWe9flZ jVzl9NR6B66hlTsa1ctOIXhyFC3cOJTbTyzDFRj5VxMONltJuLJnuo7UOL2k6AGnIJ ER0D3ASDKzpag== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mail.sberdevices.ru (Postfix) with ESMTP; Fri, 3 Jun 2022 08:31:27 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , "David S. 
Miller" , Jason Wang , "Jakub Kicinski" , Paolo Abeni CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy Subject: [RFC PATCH v2 1/8] virtio/vsock: rework packet allocation logic Thread-Topic: [RFC PATCH v2 1/8] virtio/vsock: rework packet allocation logic Thread-Index: AQHYdwsc/NV43wr3F0ycDhhxOYPufQ== Date: Fri, 3 Jun 2022 05:31:00 +0000 Message-ID: <78157286-3663-202f-da94-1a17e4ffe819@sberdevices.ru> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/06/03 01:19:00 #19656765 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

To support zerocopy receive, change how packet buffers are allocated. Buffers that may later be mapped into a user's vma cannot come from 'kmalloc()', because the kernel does not allow slab pages to be mapped into a user's vma, so the RX path now calls the page (buddy) allocator directly. TX packets are never mapped to the user, so they keep using 'kmalloc()'. A new flag in the packet structure, 'slab_buf', distinguishes 'kmalloc()' buffers from raw page buffers so that each kind is freed with the matching function.
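The flag-based free path can be illustrated outside the kernel. The following is only a userspace sketch, not the patch itself: 'malloc()' stands in for 'kmalloc()', anonymous 'mmap()' stands in for the page allocator, and every 'demo_*' name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct virtio_vsock_pkt. */
struct demo_pkt {
	void *buf;
	size_t buf_len;
	bool slab_buf;	/* true: malloc() buffer; false: page-backed buffer */
};

/* Round len up to whole pages, as a page-granular allocator would. */
static size_t demo_page_round(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return (len + page - 1) / page * page;
}

/* TX-style allocation: heap buffer that is never mapped to the user. */
static struct demo_pkt *demo_alloc_tx(size_t len)
{
	struct demo_pkt *pkt = calloc(1, sizeof(*pkt));

	if (!pkt)
		return NULL;
	pkt->buf = malloc(len);
	if (!pkt->buf) {
		free(pkt);
		return NULL;
	}
	pkt->buf_len = len;
	pkt->slab_buf = true;
	return pkt;
}

/* RX-style allocation: whole pages (mmap() models the page allocator). */
static struct demo_pkt *demo_alloc_rx(size_t len)
{
	struct demo_pkt *pkt = calloc(1, sizeof(*pkt));

	if (!pkt)
		return NULL;
	pkt->buf = mmap(NULL, demo_page_round(len), PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pkt->buf == MAP_FAILED) {
		free(pkt);
		return NULL;
	}
	pkt->buf_len = len;
	return pkt;
}

/* One free path for both kinds; the flag selects the matching release.
 * Returns 0 on success.
 */
static int demo_free_pkt(struct demo_pkt *pkt)
{
	int err = 0;

	assert(pkt != NULL);
	if (pkt->buf_len) {
		if (pkt->slab_buf)
			free(pkt->buf);
		else
			err = munmap(pkt->buf, demo_page_round(pkt->buf_len));
	}
	free(pkt);
	return err;
}
```

Releasing a buffer with the wrong function is undefined behaviour in this model as well, which is why the flag travels with the packet rather than being inferred at free time.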
Signed-off-by: Arseniy Krasnov --- include/linux/virtio_vsock.h | 1 + net/vmw_vsock/virtio_transport.c | 8 ++++++-- net/vmw_vsock/virtio_transport_common.c | 9 ++++++++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 35d7eedb5e8e..d02cb7aa922f 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -50,6 +50,7 @@ struct virtio_vsock_pkt { u32 off; bool reply; bool tap_delivered; + bool slab_buf; }; struct virtio_vsock_pkt_info { diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index ad64f403536a..19909c1e9ba3 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -255,16 +255,20 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock) vq = vsock->vqs[VSOCK_VQ_RX]; do { + struct page *buf_page; + pkt = kzalloc(sizeof(*pkt), GFP_KERNEL); if (!pkt) break; - pkt->buf = kmalloc(buf_len, GFP_KERNEL); - if (!pkt->buf) { + buf_page = alloc_page(GFP_KERNEL); + + if (!buf_page) { virtio_transport_free_pkt(pkt); break; } + pkt->buf = page_to_virt(buf_page); pkt->buf_len = buf_len; pkt->len = buf_len; diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index ec2c2afbf0d0..278567f748f2 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -69,6 +69,7 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info, if (!pkt->buf) goto out_pkt; + pkt->slab_buf = true; pkt->buf_len = len; err = memcpy_from_msg(pkt->buf, info->msg, len); @@ -1342,7 +1343,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt); void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt) { - kfree(pkt->buf); + if (pkt->buf_len) { + if (pkt->slab_buf) + kfree(pkt->buf); + else + free_pages(buf, get_order(pkt->buf_len)); + } + kfree(pkt); } EXPORT_SYMBOL_GPL(virtio_transport_free_pkt); From patchwork Fri Jun 3 05:33:04 2022 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868539 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 495DDC43334 for ; Fri, 3 Jun 2022 05:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240676AbiFCFdx (ORCPT ); Fri, 3 Jun 2022 01:33:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49232 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234979AbiFCFdv (ORCPT ); Fri, 3 Jun 2022 01:33:51 -0400 Received: from mail.sberdevices.ru (mail.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5787738DB4; Thu, 2 Jun 2022 22:33:49 -0700 (PDT) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mail.sberdevices.ru (Postfix) with ESMTP id 5D6555FD02; Fri, 3 Jun 2022 08:33:47 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1654234427; bh=b9GzNc2WzHZEJq4QtMm9PZjoUEvGtrSFFwftlDQ1gMs=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=AersA+wy3i1Sk0Ws4oMv1WxulXlzA+iGQxmqXhbV2j1N6py8peQFv0YK6tUPUnAYF TUpaOvWba1FveyVgL4Djk3bSa8eyxZFObmW9QsC8MXgvsZvkmjq0iibmE+oP6RVDiz QDY02l1rpzP1kx0glOveb/LzLK2iDh9mFNoiWbR+rUbe/7KwN3yx3nfj3d5u2VogUV y75vSn53WbjmrVipNUhnZ9VGdERbFsDnFBFetWfR86g917FXPYaYWGbGmtODTn8pND JTv/7h3iusStTJyE6NASEsbUMYlfC3ncWK0A01WJHCZOdbATZP8JWrEctwvmiu/q8M 7WwmP2REVwN0Q== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mail.sberdevices.ru (Postfix) with ESMTP; Fri, 3 Jun 2022 08:33:32 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. 
Miller" , "Jakub Kicinski" , Paolo Abeni CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy Subject: [RFC PATCH v2 2/8] vhost/vsock: rework packet allocation logic Thread-Topic: [RFC PATCH v2 2/8] vhost/vsock: rework packet allocation logic Thread-Index: AQHYdwtmNyMIn8KDWEakLUC7mf1LGg== Date: Fri, 3 Jun 2022 05:33:04 +0000 Message-ID: <72ae7f76-ffee-3e64-d445-7a0f4261d891@sberdevices.ru> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <45162A55C80F4745A2F2E93FE9A70961@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/06/03 01:19:00 #19656765 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

For packets received from the virtio RX queue, use the buddy allocator instead of 'kmalloc()' when zerocopy receive is enabled, so that the buffer pages can later be inserted into a user-provided vma. In that mode the single call to 'copy_from_iter()' is replaced with a per-page copy loop.
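The per-page loop can be sketched in userspace as follows. This is an illustration only: 'memcpy()' stands in for 'copy_from_iter()', the page size is assumed to be 4096 bytes, and the 'demo_*' names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Copy len bytes from src into pages[], at most one page per iteration.
 * Returns the number of bytes copied, which is less than len only when
 * the npages destination pages are exhausted first.
 */
static size_t demo_copy_per_page(unsigned char (*pages)[DEMO_PAGE_SIZE],
				 size_t npages,
				 const unsigned char *src, size_t len)
{
	size_t page_idx = 0;
	size_t copied = 0;

	assert(pages != NULL && src != NULL);

	while (len > 0 && page_idx < npages) {
		/* Never cross a page boundary in a single copy. */
		size_t to_copy = len < DEMO_PAGE_SIZE ? len : DEMO_PAGE_SIZE;

		memcpy(pages[page_idx], src + copied, to_copy);
		copied += to_copy;
		len -= to_copy;
		page_idx++;
	}

	return copied;
}
```

As in the patch, the caller compares the returned byte count with the expected payload length and treats a mismatch as an error.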
Signed-off-by: Arseniy Krasnov --- drivers/vhost/vsock.c | 81 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 69 insertions(+), 12 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e6c9d41db1de..0dc2229f18f7 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -58,6 +58,7 @@ struct vhost_vsock { u32 guest_cid; bool seqpacket_allow; + bool zerocopy_rx_on; }; static u32 vhost_transport_get_local_cid(void) @@ -357,6 +358,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, unsigned int out, unsigned int in) { struct virtio_vsock_pkt *pkt; + struct vhost_vsock *vsock; struct iov_iter iov_iter; size_t nbytes; size_t len; @@ -393,20 +395,75 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, return NULL; } - pkt->buf = kmalloc(pkt->len, GFP_KERNEL); - if (!pkt->buf) { - kfree(pkt); - return NULL; - } - pkt->buf_len = pkt->len; + vsock = container_of(vq->dev, struct vhost_vsock, dev); - nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); - if (nbytes != pkt->len) { - vq_err(vq, "Expected %u byte payload, got %zu bytes\n", - pkt->len, nbytes); - virtio_transport_free_pkt(pkt); - return NULL; + if (!vsock->zerocopy_rx_on) { + pkt->buf = kmalloc(pkt->len, GFP_KERNEL); + + if (!pkt->buf) { + kfree(pkt); + return NULL; + } + + pkt->slab_buf = true; + nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter); + if (nbytes != pkt->len) { + vq_err(vq, "Expected %u byte payload, got %zu bytes\n", + pkt->len, nbytes); + virtio_transport_free_pkt(pkt); + return NULL; + } + } else { + struct page *buf_page; + ssize_t pkt_len; + int page_idx; + + /* This creates memory overrun, as we allocate + * at least one page for each packet. + */ + buf_page = alloc_pages(GFP_KERNEL, get_order(pkt->len)); + + if (buf_page == NULL) { + kfree(pkt); + return NULL; + } + + pkt->buf = page_to_virt(buf_page); + + page_idx = 0; + pkt_len = pkt->len; + + /* As allocated pages are not mapped, process + * pages one by one. 
+ */ + while (pkt_len > 0) { + void *mapped; + size_t to_copy; + + mapped = kmap(buf_page + page_idx); + + if (mapped == NULL) { + virtio_transport_free_pkt(pkt); + return NULL; + } + + to_copy = min(pkt_len, ((ssize_t)PAGE_SIZE)); + + nbytes = copy_from_iter(mapped, to_copy, &iov_iter); + if (nbytes != to_copy) { + vq_err(vq, "Expected %zu byte payload, got %zu bytes\n", + to_copy, nbytes); + kunmap(buf_page + page_idx); + virtio_transport_free_pkt(pkt); + return NULL; + } + + kunmap(buf_page + page_idx); + + pkt_len -= to_copy; + page_idx++; + } } return pkt; From patchwork Fri Jun 3 05:35:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868540 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E188ACCA473 for ; Fri, 3 Jun 2022 05:36:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240816AbiFCFgi (ORCPT ); Fri, 3 Jun 2022 01:36:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230374AbiFCFge (ORCPT ); Fri, 3 Jun 2022 01:36:34 -0400 Received: from mail.sberdevices.ru (mail.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9777939142; Thu, 2 Jun 2022 22:36:31 -0700 (PDT) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mail.sberdevices.ru (Postfix) with ESMTP id BBB495FD04; Fri, 3 Jun 2022 08:36:29 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1654234589; bh=0LvHAQl3XARivHwA03baFrqhSI1ZsvnwZSA/W3J5NCI=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=Ec9QIUa3BpOFqy+GxJKhjo/GxdhOC/hHDF3x6vp8VDfpgdGFh8iT/ClyrnPtGXRlT
r0HvffkyQWiHjDTgwXRK9W+6Mtq4p7yIuHt8rDU/494FM6PxFWgfUt7WPPOh8UpgGO ycMYB7sDsWQ3EzvU/bN+OXH4AJajqNxMJKbrl/99SA1XLz01/nt6hwo8WSH/vLslfY 5aUYZXkruXjUvcBDVGK89MS0F7NiYrxWQy3rSi6Tl/uu0fs56645MGLt8fI24Q29k7 Lg2fiisa54dvmpzH4EiI913+Zoki8cVpNdQ1TL5bUWud5P3TcjRRi7E3NeyJwvcXgS GFMgyhhkHdBlA== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mail.sberdevices.ru (Postfix) with ESMTP; Fri, 3 Jun 2022 08:36:15 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "Jakub Kicinski" , Paolo Abeni CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov Subject: [RFC PATCH v2 3/8] af_vsock: add zerocopy receive logic Thread-Topic: [RFC PATCH v2 3/8] af_vsock: add zerocopy receive logic Thread-Index: AQHYdwvHqkojsXt09k2Zv7nWir8Frg== Date: Fri, 3 Jun 2022 05:35:48 +0000 Message-ID: <129aa328-ad4d-cb2c-4a51-4a2bf9c9be37@sberdevices.ru> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/06/03 01:19:00 #19656765 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This: 1) Adds callback for 'mmap()' call on socket. It checks vm area flags and sets vm area ops. 2) Adds special 'getsockopt()' case which calls transport zerocopy callback. Input argument is vm area address. 3) Adds 'getsockopt()/setsockopt()' for switching on/off rx zerocopy mode. 
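From userspace, the three pieces listed above would be used roughly as sketched here. This is an untested illustration of the proposed interface, not code from the series: the option numbers are the ones this RFC adds (they are not in mainline headers), the 64-bit option value and the 'MAP_SHARED' mapping are assumptions, and the 'demo_*' and 'DEMO_*' names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40
#endif

/* Option numbers proposed by this RFC; assumed to match the uapi hunk. */
#define DEMO_SO_VM_SOCKETS_MAP_RX	9
#define DEMO_SO_VM_SOCKETS_ZEROCOPY	10

/* Enable RX zerocopy on a connected vsock fd and map a read-only RX area.
 * Returns 0 on success, or -1 with errno set.
 */
static int demo_zerocopy_rx_setup(int fd, size_t area_len, void **area_out)
{
	uint64_t on = 1;	/* assumption: the kernel copies in a u64 */
	void *area;

	assert(area_out != NULL);

	if (setsockopt(fd, AF_VSOCK, DEMO_SO_VM_SOCKETS_ZEROCOPY,
		       &on, sizeof(on)))
		return -1;

	/* The mmap callback rejects writable or executable mappings. */
	area = mmap(NULL, area_len, PROT_READ, MAP_SHARED, fd, 0);
	if (area == MAP_FAILED)
		return -1;

	*area_out = area;
	return 0;
}

/* Ask the transport to insert pending RX pages into the mapped area.
 * The input argument is the start address of the area.
 */
static int demo_zerocopy_rx_fill(int fd, void *area)
{
	unsigned long addr = (unsigned long)area;
	socklen_t len = sizeof(addr);

	return getsockopt(fd, AF_VSOCK, DEMO_SO_VM_SOCKETS_MAP_RX,
			  &addr, &len);
}
```

The area should span at least two pages, since the transport callback in this series uses the first page for headers and the rest for data.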
Signed-off-by: Arseniy Krasnov --- include/net/af_vsock.h | 7 +++ include/uapi/linux/vm_sockets.h | 3 + net/vmw_vsock/af_vsock.c | 100 ++++++++++++++++++++++++++++++++ 3 files changed, 110 insertions(+) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index f742e50207fb..f15f84c648ff 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -135,6 +135,13 @@ struct vsock_transport { bool (*stream_is_active)(struct vsock_sock *); bool (*stream_allow)(u32 cid, u32 port); + int (*rx_zerocopy_set)(struct vsock_sock *vsk, + bool enable); + int (*rx_zerocopy_get)(struct vsock_sock *vsk); + int (*zerocopy_dequeue)(struct vsock_sock *vsk, + struct vm_area_struct *vma, + unsigned long addr); + /* SEQ_PACKET. */ ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg, int flags); diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index c60ca33eac59..d1f792bed1a7 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -83,6 +83,9 @@ #define SO_VM_SOCKETS_CONNECT_TIMEOUT_NEW 8 +#define SO_VM_SOCKETS_MAP_RX 9 +#define SO_VM_SOCKETS_ZEROCOPY 10 + #if !defined(__KERNEL__) #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__)) #define SO_VM_SOCKETS_CONNECT_TIMEOUT SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index f04abf662ec6..10061ef21730 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1644,6 +1644,17 @@ static int vsock_connectible_setsockopt(struct socket *sock, } break; } + case SO_VM_SOCKETS_ZEROCOPY: { + if (!transport || !transport->rx_zerocopy_set) { + err = -EOPNOTSUPP; + } else { + COPY_IN(val); + + if (transport->rx_zerocopy_set(vsk, val)) + err = -EINVAL; + } + break; + } default: err = -ENOPROTOOPT; @@ -1657,6 +1668,48 @@ static int vsock_connectible_setsockopt(struct socket *sock, return err; } +static const struct vm_operations_struct afvsock_vm_ops = { +}; + +static int 
vsock_recv_zerocopy(struct socket *sock, + unsigned long address) +{ + struct sock *sk = sock->sk; + struct vsock_sock *vsk = vsock_sk(sk); + struct vm_area_struct *vma; + const struct vsock_transport *transport; + int res; + + transport = vsk->transport; + + if (!transport->rx_zerocopy_get) + return -EOPNOTSUPP; + + if (!transport->rx_zerocopy_get(vsk)) + return -EOPNOTSUPP; + + if (!transport->zerocopy_dequeue) + return -EOPNOTSUPP; + + lock_sock(sk); + mmap_write_lock(current->mm); + + vma = vma_lookup(current->mm, address); + + if (!vma || vma->vm_ops != &afvsock_vm_ops) { + mmap_write_unlock(current->mm); + release_sock(sk); + return -EINVAL; + } + + res = transport->zerocopy_dequeue(vsk, vma, address); + + mmap_write_unlock(current->mm); + release_sock(sk); + + return res; +} + static int vsock_connectible_getsockopt(struct socket *sock, int level, int optname, char __user *optval, @@ -1701,6 +1754,39 @@ static int vsock_connectible_getsockopt(struct socket *sock, lv = sock_get_timeout(vsk->connect_timeout, &v, optname == SO_VM_SOCKETS_CONNECT_TIMEOUT_OLD); break; + case SO_VM_SOCKETS_ZEROCOPY: { + const struct vsock_transport *transport; + int res; + + transport = vsk->transport; + + if (!transport->rx_zerocopy_get) + return -EOPNOTSUPP; + + lock_sock(sk); + + res = transport->rx_zerocopy_get(vsk); + + release_sock(sk); + + if (res < 0) + return -EINVAL; + + v.val64 = res; + + break; + } + case SO_VM_SOCKETS_MAP_RX: { + unsigned long vma_addr; + + if (len < sizeof(vma_addr)) + return -EINVAL; + + if (copy_from_user(&vma_addr, optval, sizeof(vma_addr))) + return -EFAULT; + + return vsock_recv_zerocopy(sock, vma_addr); + } default: return -ENOPROTOOPT; @@ -2129,6 +2215,19 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, return err; } +static int afvsock_mmap(struct file *file, struct socket *sock, + struct vm_area_struct *vma) +{ + if (vma->vm_flags & (VM_WRITE | VM_EXEC)) + return -EPERM; + + vma->vm_flags &= ~(VM_MAYWRITE | 
VM_MAYEXEC); + vma->vm_flags |= (VM_MIXEDMAP); + vma->vm_ops = &afvsock_vm_ops; + + return 0; +} + static const struct proto_ops vsock_stream_ops = { .family = PF_VSOCK, .owner = THIS_MODULE, @@ -2148,6 +2247,7 @@ static const struct proto_ops vsock_stream_ops = { .recvmsg = vsock_connectible_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, + .mmap = afvsock_mmap, }; static const struct proto_ops vsock_seqpacket_ops = { From patchwork Fri Jun 3 05:37:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868586 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E08D2CCA47E for ; Fri, 3 Jun 2022 05:38:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241158AbiFCFig (ORCPT ); Fri, 3 Jun 2022 01:38:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241168AbiFCFiY (ORCPT ); Fri, 3 Jun 2022 01:38:24 -0400 Received: from mail.sberdevices.ru (mail.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F03B38BE4; Thu, 2 Jun 2022 22:38:17 -0700 (PDT) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mail.sberdevices.ru (Postfix) with ESMTP id 54FE15FD02; Fri, 3 Jun 2022 08:38:15 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1654234695; bh=YpfLxtcbJeSU964yHCAfebUGWv+Ea5aAt8JOnKACLe4=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=KkjxdlecRrz7iGfmG52kM7LQTNfx/CG8d54Efe9rPoQ/U2q+5N2hZgb24wsUvRMTr VFciRXyleCbTOgwdx4NuePYKlOWiBjJ32u16RTEA8FpgffFV6T+ussH4c7N7z1joPH 
etglWguQ2h6WxUVs2RLgGGpFjfrrM2dqzSUkVl89bcKqv3/p2yrW49lbIlOyQlVF7Z WYqNMMSZK7pDoJknponkTgY01i7ZVV+x6sI6/p3eq4bRLtF942pXVIHwfkFBS55KQy BjLJCt+lwTFMnK8wi+mleUbMGwbGSJPqb5awmn9ApNvNccu2JIT3vvenOCPGHv6N+v mjcF99zi+ohWQ== Received: from S-MS-EXCH02.sberdevices.ru (S-MS-EXCH02.sberdevices.ru [172.16.1.5]) by mail.sberdevices.ru (Postfix) with ESMTP; Fri, 3 Jun 2022 08:38:15 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , "Jakub Kicinski" , Paolo Abeni CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov Subject: [RFC PATCH v2 4/8] virtio/vsock: add transport zerocopy callback Thread-Topic: [RFC PATCH v2 4/8] virtio/vsock: add transport zerocopy callback Thread-Index: AQHYdwwOHCM79Ul6bEOypwJFONJkYA== Date: Fri, 3 Jun 2022 05:37:48 +0000 Message-ID: <6f76eed7-decc-68d1-6ae7-7bafb09fdad3@sberdevices.ru> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: <6D8F74365F197B41B0821431AFA294B4@sberdevices.ru> MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/06/03 01:19:00 #19656765 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This adds transport callback which processes rx queue of socket and instead of copying data to user provided buffer, it inserts data pages of each packet to user's vm area. 
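On the user side, the mapped area would then be read as one page of headers followed by data pages. The walk below is a userspace sketch under assumptions drawn from the code in this patch: one header per dequeued packet, each packet occupying whole pages, and a header with 'len == 0' ending the list (its 'copy_len' tells how many bytes must be fetched with an ordinary read). The 4096-byte page size and all 'demo_*' names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Mirrors the layout of struct virtio_vsock_usr_hdr from this patch. */
struct demo_usr_hdr {
	uint32_t flags;
	uint32_t len;
	uint32_t copy_len;
} __attribute__((packed));

/* Walk the header page of an RX area of area_pages pages.
 * Returns the number of payload bytes available in the data pages and
 * stores the terminating header's copy_len in *copy_len_out.
 */
static size_t demo_walk_rx_area(const unsigned char *area, size_t area_pages,
				size_t *copy_len_out)
{
	size_t max_hdrs = DEMO_PAGE_SIZE / sizeof(struct demo_usr_hdr);
	size_t data_page = 1;	/* page 0 holds the headers */
	size_t total = 0;
	size_t i;

	assert(area != NULL && copy_len_out != NULL);
	*copy_len_out = 0;

	for (i = 0; i < max_hdrs; i++) {
		struct demo_usr_hdr hdr;
		size_t npages;

		/* memcpy() avoids alignment and aliasing problems. */
		memcpy(&hdr, area + i * sizeof(hdr), sizeof(hdr));

		if (hdr.len == 0) {
			*copy_len_out = hdr.copy_len;
			break;
		}

		npages = (hdr.len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
		if (data_page + npages > area_pages)
			break;

		/* Payload: hdr.len bytes at area + data_page * DEMO_PAGE_SIZE. */
		total += hdr.len;
		data_page += npages;
	}

	return total;
}
```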
Signed-off-by: Arseniy Krasnov --- include/linux/virtio_vsock.h | 4 + include/uapi/linux/virtio_vsock.h | 6 + net/vmw_vsock/virtio_transport_common.c | 208 +++++++++++++++++++++++- 3 files changed, 215 insertions(+), 3 deletions(-) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index d02cb7aa922f..47a68a2ea838 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -51,6 +51,7 @@ struct virtio_vsock_pkt { bool reply; bool tap_delivered; bool slab_buf; + bool split; }; struct virtio_vsock_pkt_info { @@ -131,6 +132,9 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr); bool virtio_transport_dgram_allow(u32 cid, u32 port); +int virtio_transport_zerocopy_dequeue(struct vsock_sock *vsk, + struct vm_area_struct *vma, + unsigned long addr); int virtio_transport_connect(struct vsock_sock *vsk); int virtio_transport_shutdown(struct vsock_sock *vsk, int mode); diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h index 64738838bee5..6775c6c44b5b 100644 --- a/include/uapi/linux/virtio_vsock.h +++ b/include/uapi/linux/virtio_vsock.h @@ -66,6 +66,12 @@ struct virtio_vsock_hdr { __le32 fwd_cnt; } __attribute__((packed)); +struct virtio_vsock_usr_hdr { + u32 flags; + u32 len; + u32 copy_len; +} __attribute__((packed)); + enum virtio_vsock_type { VIRTIO_VSOCK_TYPE_STREAM = 1, VIRTIO_VSOCK_TYPE_SEQPACKET = 2, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 278567f748f2..3a3e84176c75 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -347,6 +348,196 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk, return err; } +#define MAX_PAGES_TO_MAP 256 + +int virtio_transport_zerocopy_dequeue(struct vsock_sock *vsk, + struct vm_area_struct *vma, + unsigned long addr) +{ + struct virtio_vsock_sock *vvs 
= vsk->trans; + struct virtio_vsock_usr_hdr *usr_hdr_buffer; + unsigned long max_pages_to_insert; + unsigned long tmp_pages_inserted; + unsigned long pages_to_insert = 0; + struct page *usr_hdr_page; + unsigned long vma_size; + struct page **pages; + int max_vma_pages; + int max_usr_hdrs; + int res; + int err; + int i; + + /* The address must be the start of the VMA. */ + if (vma->vm_start != addr) + return -EFAULT; + + vma_size = vma->vm_end - vma->vm_start; + + /* VMA is too small (need at least one page for headers + * and one page for data). + */ + if (vma_size < 2 * PAGE_SIZE) + return -EFAULT; + + /* Page for meta data. */ + usr_hdr_page = alloc_page(GFP_KERNEL); + + if (!usr_hdr_page) + return -EFAULT; + + pages = kmalloc_array(MAX_PAGES_TO_MAP, sizeof(pages[0]), GFP_KERNEL); + + if (!pages) + return -EFAULT; + + pages[pages_to_insert++] = usr_hdr_page; + + usr_hdr_buffer = page_to_virt(usr_hdr_page); + + err = 0; + + /* As the first page is used for headers, the total number + * of pages for the user is the minimum of the number of headers + * in the first page and the vma size (in pages, minus the first). + */ + max_usr_hdrs = PAGE_SIZE / sizeof(*usr_hdr_buffer); + max_vma_pages = (vma_size / PAGE_SIZE) - 1; + max_pages_to_insert = min(max_usr_hdrs, max_vma_pages); + + if (max_pages_to_insert > MAX_PAGES_TO_MAP) + max_pages_to_insert = MAX_PAGES_TO_MAP; + + spin_lock_bh(&vvs->rx_lock); + + while (!list_empty(&vvs->rx_queue) && + pages_to_insert < max_pages_to_insert) { + struct virtio_vsock_pkt *pkt; + ssize_t rest_data_bytes; + size_t moved_data_bytes; + unsigned long pg_offs; + + pkt = list_first_entry(&vvs->rx_queue, + struct virtio_vsock_pkt, list); + + /* Buffer was allocated by 'kmalloc()'. This could + * happen, when zerocopy was enabled, but we still + * have pending packet which was created before it.
+ */ + if (pkt->slab_buf) { + usr_hdr_buffer->flags = le32_to_cpu(pkt->hdr.flags); + usr_hdr_buffer->len = 0; + usr_hdr_buffer->copy_len = le32_to_cpu(pkt->hdr.len); + /* Report user to read it using copy. */ + break; + } + + /* This could happen, when packet was dequeued before + * by an ordinary 'read()' call. We can't handle such + * packet. Drop it. + */ + if (pkt->off % PAGE_SIZE) { + list_del(&pkt->list); + virtio_transport_dec_rx_pkt(vvs, pkt); + virtio_transport_free_pkt(pkt); + continue; + } + + rest_data_bytes = le32_to_cpu(pkt->hdr.len) - pkt->off; + + /* For packets, bigger than one page, split it's + * high order allocated buffer to 0 order pages. + * Otherwise 'vm_insert_pages()' will fail, for + * all pages except first. + */ + if (rest_data_bytes > PAGE_SIZE) { + /* High order buffer not split yet. */ + if (!pkt->split) { + split_page(virt_to_page(pkt->buf), + get_order(le32_to_cpu(pkt->hdr.len))); + pkt->split = true; + } + } + + pg_offs = pkt->off; + moved_data_bytes = 0; + + while (rest_data_bytes && + pages_to_insert < max_pages_to_insert) { + struct page *buf_page; + + buf_page = virt_to_page(pkt->buf + pg_offs); + + pages[pages_to_insert++] = buf_page; + /* Get reference to prevent this page being + * returned to page allocator when packet will + * be freed. Ref count will be 2. + */ + get_page(buf_page); + pg_offs += PAGE_SIZE; + + if (rest_data_bytes >= PAGE_SIZE) { + moved_data_bytes += PAGE_SIZE; + rest_data_bytes -= PAGE_SIZE; + } else { + moved_data_bytes += rest_data_bytes; + rest_data_bytes = 0; + } + } + + usr_hdr_buffer->flags = le32_to_cpu(pkt->hdr.flags); + usr_hdr_buffer->len = moved_data_bytes; + usr_hdr_buffer->copy_len = 0; + usr_hdr_buffer++; + + pkt->off = pg_offs; + + if (rest_data_bytes == 0) { + list_del(&pkt->list); + virtio_transport_dec_rx_pkt(vvs, pkt); + virtio_transport_free_pkt(pkt); + } + + /* Now ref count for all pages of packet is 1. */ + } + + /* Set last buffer empty(if we have one). 
*/ + if (pages_to_insert - 1 < max_usr_hdrs) + usr_hdr_buffer->len = 0; + + spin_unlock_bh(&vvs->rx_lock); + + tmp_pages_inserted = pages_to_insert; + + res = vm_insert_pages(vma, addr, pages, &tmp_pages_inserted); + + if (res || tmp_pages_inserted) { + /* Failed to insert some pages, we have "partially" + * mapped vma. Do not return, set error code. This + * code will be returned to user. User needs to call + * 'madvise()/mmap()' to clear this vma. Anyway, + * references to all pages will be dropped below. + */ + err = -EFAULT; + } + + /* Put reference for every page. */ + for (i = 0; i < pages_to_insert; i++) { + /* Ref count is 2 ('get_page()' + 'vm_insert_pages()' above). + * Put reference once, page will be returned to allocator + * after user's 'madvise()/munmap()' call (or it wasn't mapped + * if 'vm_insert_pages()' failed). + */ + put_page(pages[i]); + } + + virtio_transport_send_credit_update(vsk); + kfree(pages); + + return err; +} +EXPORT_SYMBOL_GPL(virtio_transport_zerocopy_dequeue); + static ssize_t virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, struct msghdr *msg, @@ -1344,10 +1535,21 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt); void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt) { if (pkt->buf_len) { - if (pkt->slab_buf) + if (pkt->slab_buf) { kfree(pkt->buf); - else - free_pages(buf, get_order(pkt->buf_len)); + } else { + unsigned int order = get_order(pkt->buf_len); + unsigned long buf = (unsigned long)pkt->buf; + + if (pkt->split) { + int i; + + for (i = 0; i < (1 << order); i++) + free_page(buf + i * PAGE_SIZE); + } else { + free_pages(buf, order); + } + } } kfree(pkt); From patchwork Fri Jun 3 05:39:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868587 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CADDAC433EF for ; Fri, 3 Jun 2022 05:40:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240805AbiFCFkN (ORCPT ); Fri, 3 Jun 2022 01:40:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241226AbiFCFjy (ORCPT ); Fri, 3 Jun 2022 01:39:54 -0400 Received: from mail.sberdevices.ru (mail.sberdevices.ru [45.89.227.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22B68396AE; Thu, 2 Jun 2022 22:39:52 -0700 (PDT) Received: from s-lin-edge02.sberdevices.ru (localhost [127.0.0.1]) by mail.sberdevices.ru (Postfix) with ESMTP id 632305FD02; Fri, 3 Jun 2022 08:39:50 +0300 (MSK) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1654234790; bh=lK6cao0uju2hrt5cbvoIe3JQzXNM+1hhRQxVtKooXw8=; h=From:To:Subject:Date:Message-ID:Content-Type:MIME-Version; b=UNN5T8IYi3P36krQSPsocMZEVkk1kXmwNnIC+0HZHzG9ZPXNy+YBwCrFK1yu+aQKE S+O3eUEwTCM385w+UFrS3xOkkNX9WNzrAhQwJ38CyXUlOm4/xPvvy4OlqgpF7OducO y9Ibhd/FIz5EDFELcILtVQmU+JJ3urdQnhKNm36yno6Dj51Za9JOLhoC+NbXuW9LnF TPB4fMZvarF0ZoJch4RFXj+3MZQsHkWjH7af99lI5kwhBTpPghbwIyhXxGeuI0uioi krCJeBV0yb/EgPx2Qn9NRb+MS5D4ilUUbDFJOKEjWdbtdbEyuzTXYR1m5j4vE4hJh8 4X0w9BNA/nmsA== Received: from S-MS-EXCH01.sberdevices.ru (S-MS-EXCH01.sberdevices.ru [172.16.1.4]) by mail.sberdevices.ru (Postfix) with ESMTP; Fri, 3 Jun 2022 08:39:49 +0300 (MSK) From: Arseniy Krasnov To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S. 
Miller" , "Jakub Kicinski" , Paolo Abeni CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov Subject: [RFC PATCH v2 5/8] vhost/vsock: enable zerocopy callback Thread-Topic: [RFC PATCH v2 5/8] vhost/vsock: enable zerocopy callback Thread-Index: AQHYdwxHDqmhK66rm0GOZ6CooDdkrg== Date: Fri, 3 Jun 2022 05:39:23 +0000 Message-ID: <04c01c03-647c-49c2-bfa3-23fd995ce5bf@sberdevices.ru> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [172.16.1.12] Content-ID: MIME-Version: 1.0 X-KSMG-Rule-ID: 4 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiPhishing: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 1.1.2.30, bases: 2022/06/03 01:19:00 #19656765 X-KSMG-AntiVirus-Status: Clean, skipped Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This adds zerocopy callback to vhost transport. 
Signed-off-by: Arseniy Krasnov
---
 drivers/vhost/vsock.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 0dc2229f18f7..dcb8182f5ac9 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -481,6 +481,43 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
 	return val < vq->num;
 }
 
+static int vhost_transport_zerocopy_set(struct vsock_sock *vsk, bool enable)
+{
+	struct vhost_vsock *vsock;
+
+	rcu_read_lock();
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+
+	if (!vsock) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	vsock->zerocopy_rx_on = enable;
+	rcu_read_unlock();
+
+	return 0;
+}
+
+static int vhost_transport_zerocopy_get(struct vsock_sock *vsk)
+{
+	struct vhost_vsock *vsock;
+	bool res;
+
+	rcu_read_lock();
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+
+	if (!vsock) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	res = vsock->zerocopy_rx_on;
+	rcu_read_unlock();
+
+	return res;
+}
+
 static bool vhost_transport_seqpacket_allow(u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
@@ -508,6 +545,9 @@ static struct virtio_transport vhost_transport = {
 		.stream_rcvhiwat = virtio_transport_stream_rcvhiwat,
 		.stream_is_active = virtio_transport_stream_is_active,
 		.stream_allow = virtio_transport_stream_allow,
+		.zerocopy_dequeue = virtio_transport_zerocopy_dequeue,
+		.rx_zerocopy_set = vhost_transport_zerocopy_set,
+		.rx_zerocopy_get = vhost_transport_zerocopy_get,
 
 		.seqpacket_dequeue = virtio_transport_seqpacket_dequeue,
 		.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,

From patchwork Fri Jun 3 05:41:07 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arseniy Krasnov
X-Patchwork-Id: 12868588
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A03A9C43334 for ; Fri, 3 Jun 2022 05:41:41 +0000 (UTC)
From: Arseniy Krasnov
To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S.
Miller" , Paolo Abeni , Jakub Kicinski
CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov
Subject: [RFC PATCH v2 6/8] virtio/vsock: enable zerocopy callback
Date: Fri, 3 Jun 2022 05:41:07 +0000
X-Mailing-List: kvm@vger.kernel.org

This adds the zerocopy callbacks to the virtio transport: 'rx_zerocopy_set'
and 'rx_zerocopy_get' toggle and query the receive flag of the single virtio
vsock device, and 'zerocopy_dequeue' is wired to the common virtio
implementation.

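For illustration only: from userspace, these callbacks are presumably reached
through the SO_VM_SOCKETS_ZEROCOPY socket option, which is how the tests in
patch 7/8 enable the mode. A minimal sketch under that assumption follows. The
helper name vsock_rx_zerocopy_enable() is hypothetical, and the option value 10
is taken from the tools/ header added in patch 7/8 of this RFC, not from a
released kernel header.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef AF_VSOCK
#define AF_VSOCK 40
#endif

/* Assumption: value proposed by this RFC series (patch 7/8), not mainline. */
#ifndef SO_VM_SOCKETS_ZEROCOPY
#define SO_VM_SOCKETS_ZEROCOPY 10
#endif

/*
 * Hypothetical helper: switch zerocopy receive on or off for a vsock
 * socket. Returns 0 on success or a negative errno value on failure.
 * The AF_VSOCK level and the unsigned long argument mirror the calls
 * made by the tests in this series.
 */
int vsock_rx_zerocopy_enable(int fd, int enable)
{
	unsigned long val = enable ? 1 : 0;

	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY,
		       &val, sizeof(val)) < 0)
		return -errno;

	return 0;
}
```

If the socket option is routed to the callbacks in this patch, a missing
device would presumably surface here as -ENODEV.
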
Signed-off-by: Arseniy Krasnov
---
 net/vmw_vsock/virtio_transport.c | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 19909c1e9ba3..2e05b01caa94 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -64,6 +64,7 @@ struct virtio_vsock {
 
 	u32 guest_cid;
 	bool seqpacket_allow;
+	bool zerocopy_rx_on;
 };
 
 static u32 virtio_transport_get_local_cid(void)
@@ -455,6 +456,45 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 
 static bool virtio_transport_seqpacket_allow(u32 remote_cid);
 
+static int
+virtio_transport_zerocopy_set(struct vsock_sock *vsk, bool enable)
+{
+	struct virtio_vsock *vsock;
+
+	rcu_read_lock();
+	vsock = rcu_dereference(the_virtio_vsock);
+
+	if (!vsock) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	vsock->zerocopy_rx_on = enable;
+	rcu_read_unlock();
+
+	return 0;
+}
+
+static int
+virtio_transport_zerocopy_get(struct vsock_sock *vsk)
+{
+	struct virtio_vsock *vsock;
+	bool res;
+
+	rcu_read_lock();
+	vsock = rcu_dereference(the_virtio_vsock);
+
+	if (!vsock) {
+		rcu_read_unlock();
+		return -ENODEV;
+	}
+
+	res = vsock->zerocopy_rx_on;
+	rcu_read_unlock();
+
+	return res;
+}
+
 static struct virtio_transport virtio_transport = {
 	.transport = {
 		.module = THIS_MODULE,
@@ -480,6 +520,9 @@ static struct virtio_transport virtio_transport = {
 		.stream_rcvhiwat = virtio_transport_stream_rcvhiwat,
 		.stream_is_active = virtio_transport_stream_is_active,
 		.stream_allow = virtio_transport_stream_allow,
+		.zerocopy_dequeue = virtio_transport_zerocopy_dequeue,
+		.rx_zerocopy_set = virtio_transport_zerocopy_set,
+		.rx_zerocopy_get = virtio_transport_zerocopy_get,
 
 		.seqpacket_dequeue = virtio_transport_seqpacket_dequeue,
 		.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,

From patchwork Fri Jun 3 05:43:11 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arseniy Krasnov
X-Patchwork-Id: 12868605
From: Arseniy Krasnov
To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S.
Miller" , "Jakub Kicinski" , Paolo Abeni
CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov
Subject: [RFC PATCH v2 7/8] test/vsock: add receive zerocopy tests
Date: Fri, 3 Jun 2022 05:43:11 +0000
Message-ID: <277175af-8240-0fc7-0cde-66cdbaaa47fa@sberdevices.ru>
X-Mailing-List: kvm@vger.kernel.org

This adds tests for the zerocopy feature. The first test checks data
transmission with a simple integrity check: the sender and the receiver each
sum the payload bytes and the sums are compared. The second test covers the
error branches in the zerocopy logic (to check the handling of invalid
arguments).

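The receive layout that the first test walks can be sketched without a socket.
The sketch below assumes what vsock_test.c in this patch does: the first page
of the mapping holds an array of 'struct virtio_vsock_usr_hdr', the payload
starts at the second page, and each packet occupies a whole number of pages.
The names rx_sum_payload(), struct usr_hdr and RX_PAGE_SIZE are hypothetical
and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as in the test */

/* Same fields as struct virtio_vsock_usr_hdr in this series. */
struct usr_hdr {
	uint32_t flags;
	uint32_t len;
	uint32_t copy_len;
} __attribute__((packed));

/*
 * Sum all payload bytes described by the header page of a mapping.
 * 'map' is the start of the mapping and 'map_len' its size in bytes.
 * The number of packets found is stored in '*npkts'.
 */
unsigned long rx_sum_payload(const unsigned char *map, size_t map_len,
			     size_t *npkts)
{
	const struct usr_hdr *hdr = (const struct usr_hdr *)map;
	const unsigned char *hdr_end, *data, *end;
	unsigned long sum = 0;

	*npkts = 0;

	/* Need the header page plus at least one data page. */
	if (map_len < 2 * (size_t)RX_PAGE_SIZE)
		return 0;

	hdr_end = map + RX_PAGE_SIZE;
	data = map + RX_PAGE_SIZE;
	end = map + map_len;

	/* A zero 'len' marks the end of the valid headers. */
	while (data < end &&
	       (const unsigned char *)(hdr + 1) <= hdr_end && hdr->len) {
		uint32_t left = hdr->len;

		/* Each packet starts on a page boundary. */
		while (left && data < end) {
			uint32_t chunk = left < RX_PAGE_SIZE ?
					 left : RX_PAGE_SIZE;
			uint32_t i;

			for (i = 0; i < chunk; i++)
				sum += data[i];

			data += RX_PAGE_SIZE;
			left -= chunk;
		}

		hdr++;
		(*npkts)++;
	}

	return sum;
}
```

Unlike the loop in the test, this sketch also stops when the headers would run
past the first page, which is a defensive choice made here and not something
the patch states.
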
Signed-off-by: Arseniy Krasnov --- tools/include/uapi/linux/virtio_vsock.h | 11 + tools/include/uapi/linux/vm_sockets.h | 8 + tools/testing/vsock/control.c | 34 +++ tools/testing/vsock/control.h | 2 + tools/testing/vsock/vsock_test.c | 295 ++++++++++++++++++++++++ 5 files changed, 350 insertions(+) create mode 100644 tools/include/uapi/linux/virtio_vsock.h create mode 100644 tools/include/uapi/linux/vm_sockets.h diff --git a/tools/include/uapi/linux/virtio_vsock.h b/tools/include/uapi/linux/virtio_vsock.h new file mode 100644 index 000000000000..c23d85e73d04 --- /dev/null +++ b/tools/include/uapi/linux/virtio_vsock.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _UAPI_LINUX_VIRTIO_VSOCK_H +#define _UAPI_LINUX_VIRTIO_VSOCK_H +#include + +struct virtio_vsock_usr_hdr { + u32 flags; + u32 len; + u32 copy_len; +} __attribute__((packed)); +#endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */ diff --git a/tools/include/uapi/linux/vm_sockets.h b/tools/include/uapi/linux/vm_sockets.h new file mode 100644 index 000000000000..cac0bc3a7041 --- /dev/null +++ b/tools/include/uapi/linux/vm_sockets.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _UAPI_LINUX_VM_SOCKETS_H +#define _UAPI_LINUX_VM_SOCKETS_H + +#define SO_VM_SOCKETS_MAP_RX 9 +#define SO_VM_SOCKETS_ZEROCOPY 10 + +#endif /* _UAPI_LINUX_VM_SOCKETS_H */ diff --git a/tools/testing/vsock/control.c b/tools/testing/vsock/control.c index 4874872fc5a3..00a654e8f137 100644 --- a/tools/testing/vsock/control.c +++ b/tools/testing/vsock/control.c @@ -141,6 +141,40 @@ void control_writeln(const char *str) timeout_end(); } +void control_writelong(long value) +{ + char str[32]; + + if (snprintf(str, sizeof(str), "%li", value) >= sizeof(str)) { + perror("snprintf"); + exit(EXIT_FAILURE); + } + + control_writeln(str); +} + +long control_readlong(bool *ok) +{ + long value = -1; + char *str; + + if (ok) + *ok = false; + + str = control_readln(); + + if (str == NULL) + return value; + + value = 
strtol(str, NULL, 10); + free(str); + + if (ok) + *ok = true; + + return value; +} + /* Return the next line from the control socket (without the trailing newline). * * The program terminates if a timeout occurs. diff --git a/tools/testing/vsock/control.h b/tools/testing/vsock/control.h index 51814b4f9ac1..5272ad20e850 100644 --- a/tools/testing/vsock/control.h +++ b/tools/testing/vsock/control.h @@ -9,7 +9,9 @@ void control_init(const char *control_host, const char *control_port, void control_cleanup(void); void control_writeln(const char *str); char *control_readln(void); +long control_readlong(bool *ok); void control_expectln(const char *str); bool control_cmpln(char *line, const char *str, bool fail); +void control_writelong(long value); #endif /* CONTROL_H */ diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index dc577461afc2..1b8c40bab33e 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -18,11 +18,16 @@ #include #include #include +#include +#include +#include #include "timeout.h" #include "control.h" #include "util.h" +#define PAGE_SIZE 4096 + static void test_stream_connection_reset(const struct test_opts *opts) { union { @@ -596,6 +601,285 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt close(fd); } +static void test_stream_zerocopy_rx_client(const struct test_opts *opts) +{ + unsigned long total_sum; + unsigned long zc_on = 1; + size_t rx_map_len; + long rec_value; + void *rx_va; + int fd; + + fd = vsock_stream_connect(opts->peer_cid, 1234); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } + + rx_map_len = PAGE_SIZE * 3; + + rx_va = mmap(NULL, rx_map_len, PROT_READ, MAP_SHARED, fd, 0); + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + total_sum = 0; + + while (1) { + 
struct pollfd fds = { 0 }; + int hungup = 0; + int res; + + fds.fd = fd; + fds.events = POLLIN | POLLERR | POLLHUP | + POLLRDHUP | POLLNVAL; + + res = poll(&fds, 1, -1); + + if (res < 0) { + perror("poll"); + exit(EXIT_FAILURE); + } + + if (fds.revents & POLLERR) { + perror("poll error"); + exit(EXIT_FAILURE); + } + + if (fds.revents & POLLIN) { + struct virtio_vsock_usr_hdr *hdr; + uintptr_t tmp_rx_va = (uintptr_t)rx_va; + unsigned char *data_va; + unsigned char *end_va; + socklen_t len = sizeof(tmp_rx_va); + + if (getsockopt(fd, AF_VSOCK, + SO_VM_SOCKETS_MAP_RX, + &tmp_rx_va, &len) < 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + hdr = (struct virtio_vsock_usr_hdr *)rx_va; + /* Skip headers page for data. */ + data_va = rx_va + PAGE_SIZE; + end_va = (unsigned char *)(tmp_rx_va + rx_map_len); + + while (data_va != end_va) { + int data_len = hdr->len; + + if (!hdr->len) { + if (fds.revents & (POLLHUP | POLLRDHUP)) { + if (hdr == rx_va) + hungup = 1; + } + + break; + } + + while (data_len > 0) { + int i; + int to_read = (data_len < PAGE_SIZE) ? 
+ data_len : PAGE_SIZE; + + for (i = 0; i < to_read; i++) + total_sum += data_va[i]; + + data_va += PAGE_SIZE; + data_len -= PAGE_SIZE; + } + + hdr++; + } + + if (madvise((void *)rx_va, rx_map_len, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + + if (hungup) + break; + } + } + + if (munmap(rx_va, rx_map_len)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + rec_value = control_readlong(NULL); + + if (total_sum != rec_value) { + fprintf(stderr, "sum mismatch %lu != %lu\n", + total_sum, rec_value); + exit(EXIT_FAILURE); + } + + close(fd); +} + +static void test_stream_zerocopy_rx_server(const struct test_opts *opts) +{ + size_t max_buf_size = 40000; + long total_sum = 0; + int n = 10; + int fd; + + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + while (n) { + unsigned char *data; + size_t buf_size; + int i; + + buf_size = 1 + rand() % max_buf_size; + + data = malloc(buf_size); + + if (!data) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + for (i = 0; i < buf_size; i++) { + data[i] = rand() & 0xff; + total_sum += data[i]; + } + + if (write(fd, data, buf_size) != buf_size) { + perror("write"); + exit(EXIT_FAILURE); + } + + free(data); + n--; + } + + control_writelong(total_sum); + + close(fd); +} + +static void test_stream_zerocopy_rx_inv_client(const struct test_opts *opts) +{ + size_t map_size = PAGE_SIZE * 5; + unsigned long zc_on = 1; + socklen_t len; + void *map_va; + int fd; + + fd = vsock_stream_connect(opts->peer_cid, 1234); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + len = sizeof(map_va); + map_va = 0; + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with invalid mapping address. 
*/ + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid, but not socket mapping. */ + map_va = mmap(NULL, map_size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (map_va == MAP_FAILED) { + perror("anon mmap"); + exit(EXIT_FAILURE); + } + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (munmap(map_va, map_size)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid, but too small mapping. */ + map_va = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0); + if (map_va == MAP_FAILED) { + perror("socket mmap"); + exit(EXIT_FAILURE); + } + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (munmap(map_va, PAGE_SIZE)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + /* Try zerocopy with valid mapping, but not from first byte. 
*/ + map_va = mmap(NULL, map_size, PROT_READ, MAP_SHARED, fd, 0); + if (map_va == MAP_FAILED) { + perror("socket mmap"); + exit(EXIT_FAILURE); + } + + map_va += PAGE_SIZE; + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &map_va, &len) == 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + if (munmap(map_va - PAGE_SIZE, map_size)) { + perror("munmap"); + exit(EXIT_FAILURE); + } + + control_writeln("DONE"); + + close(fd); +} + +static void test_stream_zerocopy_rx_inv_server(const struct test_opts *opts) +{ + int fd; + + fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL); + + if (fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + control_expectln("DONE"); + + close(fd); +} + static struct test_case test_cases[] = { { .name = "SOCK_STREAM connection reset", @@ -646,6 +930,16 @@ static struct test_case test_cases[] = { .run_client = test_seqpacket_invalid_rec_buffer_client, .run_server = test_seqpacket_invalid_rec_buffer_server, }, + { + .name = "SOCK_STREAM zerocopy receive", + .run_client = test_stream_zerocopy_rx_client, + .run_server = test_stream_zerocopy_rx_server, + }, + { + .name = "SOCK_STREAM zerocopy invalid", + .run_client = test_stream_zerocopy_rx_inv_client, + .run_server = test_stream_zerocopy_rx_inv_server, + }, {}, }; @@ -729,6 +1023,7 @@ int main(int argc, char **argv) .peer_cid = VMADDR_CID_ANY, }; + srand(time(NULL)); init_signals(); for (;;) { From patchwork Fri Jun 3 05:44:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 12868606 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4CA47C433EF for ; Fri, 3 Jun 2022 05:45:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241299AbiFCFp2 (ORCPT ); Fri, 3 Jun 
2022 01:45:28 -0400
From: Arseniy Krasnov
To: Stefano Garzarella , Stefan Hajnoczi , "Michael S. Tsirkin" , Jason Wang , "David S.
Miller" , "Jakub Kicinski" , Paolo Abeni
CC: "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "netdev@vger.kernel.org" , kernel , Krasnov Arseniy , Arseniy Krasnov
Subject: [RFC PATCH v2 8/8] test/vsock: vsock rx zerocopy utility
Date: Fri, 3 Jun 2022 05:44:39 +0000
X-Mailing-List: kvm@vger.kernel.org

This adds a simple utility for benchmarking zerocopy receive.

Signed-off-by: Arseniy Krasnov
---
 tools/testing/vsock/Makefile      |   1 +
 tools/testing/vsock/rx_zerocopy.c | 356 ++++++++++++++++++++++++++++++
 2 files changed, 357 insertions(+)
 create mode 100644 tools/testing/vsock/rx_zerocopy.c

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index f8293c6910c9..2cb5820ca2f3 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -3,6 +3,7 @@ all: test
 test: vsock_test vsock_diag_test
 vsock_test: vsock_test.o timeout.o control.o util.o
 vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
+rx_zerocopy: rx_zerocopy.o timeout.o control.o util.o
 
 CFLAGS += -g -O2 -Werror -Wall -I.
-I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE .PHONY: all test clean diff --git a/tools/testing/vsock/rx_zerocopy.c b/tools/testing/vsock/rx_zerocopy.c new file mode 100644 index 000000000000..55deaa665752 --- /dev/null +++ b/tools/testing/vsock/rx_zerocopy.c @@ -0,0 +1,356 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * rx_zerocopy - benchmark utility for zerocopy + * receive. + * + * Copyright (C) 2022 SberDevices. + * + * Author: Arseniy Krasnov + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "util.h" + +#define PAGE_SIZE 4096 + +#define DEFAULT_TX_SIZE 128 +#define DEFAULT_RX_SIZE 128 +#define DEFAULT_PORT 1234 + +static int client_mode = 1; +static int peer_cid = -1; +static int port = DEFAULT_PORT; +static unsigned long tx_buf_size; +static unsigned long rx_buf_size; +static unsigned long mb_to_send = 40; + +static time_t current_nsec(void) +{ + struct timespec ts; + + if (clock_gettime(CLOCK_REALTIME, &ts)) { + perror("clock_gettime"); + exit(EXIT_FAILURE); + } + + return (ts.tv_sec * 1000000000ULL) + ts.tv_nsec; +} + +/* Server accepts connection and */ +static void run_server(void) +{ + int fd; + char *data; + int client_fd; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = port, + .svm_cid = VMADDR_CID_ANY, + }, + }; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } clientaddr; + + socklen_t clientaddr_len = sizeof(clientaddr.svm); + time_t tx_begin_ns; + ssize_t total_send = 0; + + fprintf(stderr, "Running server, listen %i, mb %lu tx buf %lu\n", + port, mb_to_send, tx_buf_size); + + fd = socket(AF_VSOCK, SOCK_STREAM, 0); + + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) { + perror("bind"); + 
exit(EXIT_FAILURE); + } + + if (listen(fd, 1) < 0) { + perror("listen"); + exit(EXIT_FAILURE); + } + + client_fd = accept(fd, &clientaddr.sa, &clientaddr_len); + + if (client_fd < 0) { + perror("accept"); + exit(EXIT_FAILURE); + } + + data = malloc(tx_buf_size); + + if (data == NULL) { + fprintf(stderr, "malloc failed\n"); + exit(EXIT_FAILURE); + } + + memset(data, 0, tx_buf_size); + tx_begin_ns = current_nsec(); + + while (1) { + ssize_t sent; + + if (total_send > mb_to_send * 1024 * 1024ULL) + break; + + sent = write(client_fd, data, tx_buf_size); + + if (sent <= 0) { + perror("write"); + exit(EXIT_FAILURE); + } + + total_send += sent; + } + + free(data); + + fprintf(stderr, "Total %zi MB, time %f\n", mb_to_send, + (float)(current_nsec() - tx_begin_ns)/1000.0/1000.0/1000.0); + + close(fd); + close(client_fd); +} + +static void run_client(int zerocopy) +{ + int fd; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = port, + .svm_cid = peer_cid, + }, + }; + unsigned long sum = 0; + void *rx_va = NULL; + unsigned long zc_on = 1; + + printf("Running client, %s mode, peer %i:%i, rx buf %lu\n", + zerocopy ? 
"zerocopy" : "copy", peer_cid, port, + rx_buf_size); + + fd = socket(AF_VSOCK, SOCK_STREAM, 0); + + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + if (connect(fd, &addr.sa, sizeof(addr.svm))) { + perror("connect"); + exit(EXIT_FAILURE); + } + + if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_ZEROCOPY, + (void *)&zc_on, sizeof(zc_on))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } + + if (zerocopy) { + rx_va = mmap(NULL, rx_buf_size, + PROT_READ, MAP_SHARED, fd, 0); + + if (rx_va == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + } + + while (1) { + struct pollfd fds = { 0 }; + int done = 0; + + fds.fd = fd; + fds.events = POLLIN | POLLERR | POLLHUP | + POLLRDHUP | POLLNVAL; + + if (poll(&fds, 1, -1) < 0) { + perror("poll"); + exit(EXIT_FAILURE); + } + + if (fds.revents & (POLLHUP | POLLRDHUP)) + done = 1; + + if (fds.revents & POLLERR) { + fprintf(stderr, "Done error\n"); + break; + } + + if (fds.revents & POLLIN) { + if (zerocopy) { + struct virtio_vsock_usr_hdr *hdr; + uintptr_t tmp_rx_va = (uintptr_t)rx_va; + socklen_t len = sizeof(tmp_rx_va); + + if (getsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_MAP_RX, + &tmp_rx_va, &len) < 0) { + perror("getsockopt"); + exit(EXIT_FAILURE); + } + + hdr = (struct virtio_vsock_usr_hdr *)tmp_rx_va; + + if (!hdr->len) { + if (done) { + fprintf(stderr, "Done, sum %lu\n", sum); + break; + } + } + + tmp_rx_va += PAGE_SIZE; + + if (madvise((void *)rx_va, rx_buf_size, + MADV_DONTNEED)) { + perror("madvise"); + exit(EXIT_FAILURE); + } + } else { + char data[rx_buf_size - PAGE_SIZE]; + ssize_t bytes_read; + + bytes_read = read(fd, data, sizeof(data)); + + if (bytes_read <= 0) + break; + } + } + } +} + +static const char optstring[] = ""; +static const struct option longopts[] = { + { + .name = "mode", + .has_arg = required_argument, + .val = 'm', + }, + { + .name = "zerocopy", + .has_arg = no_argument, + .val = 'z', + }, + { + .name = "cid", + .has_arg = required_argument, + .val = 'c', + }, + { + .name = "port", + 
.has_arg = required_argument, + .val = 'p', + }, + { + .name = "mb", + .has_arg = required_argument, + .val = 's', + }, + { + .name = "tx", + .has_arg = required_argument, + .val = 't', + }, + { + .name = "rx", + .has_arg = required_argument, + .val = 'r', + }, + { + .name = "help", + .has_arg = no_argument, + .val = '?', + }, + {}, +}; + +int main(int argc, char **argv) +{ + int zerocopy = 0; + + for (;;) { + int opt = getopt_long(argc, argv, optstring, longopts, NULL); + + if (opt == -1) + break; + + switch (opt) { + case 's': + mb_to_send = atoi(optarg); + break; + case 'c': + peer_cid = atoi(optarg); + break; + case 'p': + port = atoi(optarg); + break; + case 'r': + rx_buf_size = atoi(optarg); + break; + case 't': + tx_buf_size = atoi(optarg); + break; + case 'm': + if (strcmp(optarg, "client") == 0) + client_mode = 1; + else if (strcmp(optarg, "server") == 0) + client_mode = 0; + else { + fprintf(stderr, "--mode must be \"client\" or \"server\"\n"); + return EXIT_FAILURE; + } + break; + case 'z': + zerocopy = 1; + break; + default: + break; + } + + } + + if (!tx_buf_size) + tx_buf_size = DEFAULT_TX_SIZE; + + if (!rx_buf_size) + rx_buf_size = DEFAULT_RX_SIZE; + + tx_buf_size *= PAGE_SIZE; + rx_buf_size *= PAGE_SIZE; + + srand(time(NULL)); + + if (client_mode) + run_client(zerocopy); + else + run_server(); + + return 0; +}
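
A note on the timing helper: current_nsec() above returns time_t, and a
nanosecond count does not fit on targets where time_t is 32 bits wide. A
minimal sketch of an alternative follows, assuming a fixed 64-bit counter and
CLOCK_MONOTONIC are acceptable for this benchmark. now_ns() and
throughput_mb_per_sec() are hypothetical names, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds, or 0 if the clock cannot be read. */
uint64_t now_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		return 0;

	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Throughput in MiB per second; returns 0.0 for a zero interval. */
double throughput_mb_per_sec(uint64_t bytes, uint64_t elapsed_ns)
{
	if (!elapsed_ns)
		return 0.0;

	return ((double)bytes / (1024.0 * 1024.0)) /
	       ((double)elapsed_ns / 1e9);
}
```

With helpers like these, the server could report a rate next to the elapsed
time by sampling now_ns() before and after the write loop.
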