From patchwork Tue Oct 18 19:15:50 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 01/13] io_uring: add zctap ifq definition
Date: Tue, 18 Oct 2022 12:15:50 -0700
Message-ID: <20221018191602.2112515-2-jonathan.lemon@gmail.com>
In-Reply-To: <20221018191602.2112515-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Add a structure definition for io_zctap_ifq, for use by lower-level
networking hooks.

Signed-off-by: Jonathan Lemon
---
 include/linux/io_uring_types.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index aa4d90a53866..d83ba438ac31 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -327,6 +327,7 @@ struct io_ring_ctx {
 	struct io_mapped_ubuf *dummy_ubuf;
 	struct io_rsrc_data *file_data;
 	struct io_rsrc_data *buf_data;
+	struct io_zctap_ifq *zctap_ifq;
 
 	struct delayed_work rsrc_put_work;
 	struct llist_head rsrc_put_llist;
@@ -582,4 +583,14 @@ struct io_overflow_cqe {
 	struct io_uring_cqe cqe;
 };
 
+struct io_zctap_ifq {
+	struct net_device *dev;
+	struct io_ring_ctx *ctx;
+	void *region;
+	struct ubuf_info *uarg;
+	u16 queue_id;
+	u16 id;
+	u16 fill_bgid;
+};
+
 #endif

From: Jonathan Lemon
Subject: [RFC PATCH v2 02/13] netdevice: add SETUP_ZCTAP to the netdev_bpf structure
Date: Tue, 18 Oct 2022 12:15:51 -0700
Message-ID: <20221018191602.2112515-3-jonathan.lemon@gmail.com>

This command requests that the networking device set up or tear down
an interface queue backed by a region of user-supplied memory. The
queue is managed by io_uring.

Signed-off-by: Jonathan Lemon
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a36edb0ec199..8e308eaecaed 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -980,6 +980,7 @@ enum bpf_netdev_command {
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
 	XDP_SETUP_XSK_POOL,
+	XDP_SETUP_ZCTAP,
 };
 
 struct bpf_prog_offload_ops;
@@ -1018,6 +1019,11 @@ struct netdev_bpf {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
 		} xsk;
+		/* XDP_SETUP_ZCTAP */
+		struct {
+			struct io_zctap_ifq *ifq;
+			u16 queue_id;
+		} zct;
 	};
 };
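
XDP_SETUP_ZCTAP uses one command for both directions: a non-NULL zct.ifq asks the driver to bind the queue, and a NULL zct.ifq asks it to tear the queue down, which is how io_close_zctap_ifq() uses it later in the series. The userspace model below is a minimal sketch of how a driver's ndo_bpf handler might dispatch on that convention. Everything prefixed mock_, the per-queue table, and the stand-in struct io_zctap_ifq are hypothetical; this is not driver code from the series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real struct io_zctap_ifq is opaque to drivers. */
struct io_zctap_ifq {
	int id;
};

/* Hypothetical mirror of the two netdev_bpf commands relevant here. */
enum mock_bpf_netdev_command {
	MOCK_XDP_SETUP_XSK_POOL,
	MOCK_XDP_SETUP_ZCTAP,
};

/* Only the zct member of the kernel's union is modeled. */
struct mock_netdev_bpf {
	enum mock_bpf_netdev_command command;
	union {
		struct {
			struct io_zctap_ifq *ifq;
			uint16_t queue_id;
		} zct;
	};
};

#define MOCK_NUM_QUEUES 8

/* Per-queue binding state kept by the hypothetical driver. */
static struct io_zctap_ifq *queue_ifq[MOCK_NUM_QUEUES];

static int mock_setup_zctap(struct io_zctap_ifq *ifq, uint16_t queue_id)
{
	if (queue_id >= MOCK_NUM_QUEUES)
		return -EINVAL;
	if (ifq && queue_ifq[queue_id])
		return -EBUSY;			/* queue already bound */
	queue_ifq[queue_id] = ifq;		/* NULL ifq means tear down */
	return 0;
}

/* Models a driver's ndo_bpf entry point dispatching on bpf->command. */
static int mock_ndo_bpf(struct mock_netdev_bpf *bpf)
{
	switch (bpf->command) {
	case MOCK_XDP_SETUP_ZCTAP:
		return mock_setup_zctap(bpf->zct.ifq, bpf->zct.queue_id);
	default:
		return -EINVAL;
	}
}
```

Folding setup and teardown into one command mirrors the existing XDP_SETUP_XSK_POOL convention, where a NULL pool also means "disable".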

From: Jonathan Lemon
Subject: [RFC PATCH v2 03/13] io_uring: add register ifq opcode
Date: Tue, 18 Oct 2022 12:15:52 -0700
Message-ID: <20221018191602.2112515-4-jonathan.lemon@gmail.com>

Add initial support for hooking zero-copy interface queues into
io_uring. This command requests a user-managed queue from the
specified network device.

Only the register opcode is included; unregistration is currently
done implicitly when the ring is removed.
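
From userspace, the new opcode takes a single struct io_uring_ifq_req, and __io_uring_register() rejects any nr_args other than 1. A minimal sketch, assuming a local copy of the struct (installed uapi headers will not contain it unless built from this series) and a raw syscall(2), since no library wrapper exists for this opcode; ifq_req_init() and register_ifq() are hypothetical helpers. ifq_id is left at 0 because NR_ZCTAP_IFQS is 1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirror of the uapi struct proposed by this patch. */
struct io_uring_ifq_req {
	uint32_t ifindex;
	uint16_t queue_id;
	uint16_t ifq_id;
	uint16_t region_id;
	uint16_t fill_bgid;
	uint16_t __pad[2];
};

#define IORING_REGISTER_IFQ 26	/* opcode value proposed by this patch */

_Static_assert(sizeof(struct io_uring_ifq_req) == 16, "uapi layout");

/* Hypothetical helper: zero everything (ifq_id, __pad) and fill the rest. */
static void ifq_req_init(struct io_uring_ifq_req *req, uint32_t ifindex,
			 uint16_t queue_id, uint16_t region_id,
			 uint16_t fill_bgid)
{
	memset(req, 0, sizeof(*req));
	req->ifindex = ifindex;
	req->queue_id = queue_id;
	req->region_id = region_id;
	req->fill_bgid = fill_bgid;
}

/* Hypothetical wrapper; nr_args is 1, as the kernel requires. */
static int register_ifq(int ring_fd, struct io_uring_ifq_req *req)
{
#ifdef __NR_io_uring_register
	if (syscall(__NR_io_uring_register, ring_fd, IORING_REGISTER_IFQ,
		    req, 1) < 0)
		return -errno;
	return 0;
#else
	(void)ring_fd;
	(void)req;
	return -ENOSYS;
#endif
}
```

The ifindex would typically come from if_nametoindex(3) and ring_fd from io_uring_setup(2). On failure, register_ifq() returns a negative errno value, for example -EBUSY when the ring already has an ifq.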

Signed-off-by: Jonathan Lemon
---
 include/uapi/linux/io_uring.h |  15 ++++
 io_uring/Makefile             |   3 +-
 io_uring/io_uring.c           |   8 +++
 io_uring/zctap.c              | 131 ++++++++++++++++++++++++++++++++++
 io_uring/zctap.h              |   9 +++
 5 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/zctap.c
 create mode 100644 io_uring/zctap.h

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ab7458033ee3..d406d21e8c38 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -490,6 +490,9 @@ enum {
 	/* register a range of fixed file slots for automatic slot allocation */
 	IORING_REGISTER_FILE_ALLOC_RANGE = 25,
 
+	/* register a network ifq for zerocopy RX */
+	IORING_REGISTER_IFQ = 26,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
@@ -666,6 +669,18 @@ struct io_uring_recvmsg_out {
 	__u32 flags;
 };
 
+/*
+ * Argument for IORING_REGISTER_IFQ
+ */
+struct io_uring_ifq_req {
+	__u32 ifindex;
+	__u16 queue_id;
+	__u16 ifq_id;
+	__u16 region_id;
+	__u16 fill_bgid;
+	__u16 __pad[2];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 8cc8e5387a75..9d87e2e45ef9 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \
 					openclose.o uring_cmd.o epoll.o \
 					statx.o net.o msg_ring.o timeout.o \
 					sqpoll.o fdinfo.o tctx.o poll.o \
-					cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o
+					cancel.o kbuf.o rsrc.o rw.o opdef.o \
+					notif.o zctap.o
 obj-$(CONFIG_IO_WQ) += io-wq.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 99a52f34b7d3..f6ac8db931ee 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -91,6 +91,7 @@
 #include "cancel.h"
 #include "net.h"
 #include "notif.h"
+#include "zctap.h"
 
 #include "timeout.h"
 #include "poll.h"
@@ -2809,6 +2810,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		__io_cqring_overflow_flush(ctx, true);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
+	io_unregister_zctap_all(ctx);
 	if (ctx->rings)
 		io_poll_remove_all(ctx, NULL, true);
 	mutex_unlock(&ctx->uring_lock);
@@ -4033,6 +4035,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_file_alloc_range(ctx, arg);
 		break;
+	case IORING_REGISTER_IFQ:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_ifq(ctx, arg);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/zctap.c b/io_uring/zctap.c
new file mode 100644
index 000000000000..f4a45b683ca0
--- /dev/null
+++ b/io_uring/zctap.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "zctap.h"
+
+#define NR_ZCTAP_IFQS 1
+
+typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+
+static int __io_queue_mgmt(struct net_device *dev, struct io_zctap_ifq *ifq,
+			   u16 queue_id)
+{
+	struct netdev_bpf cmd;
+	bpf_op_t ndo_bpf;
+
+	ndo_bpf = dev->netdev_ops->ndo_bpf;
+	if (!ndo_bpf)
+		return -EINVAL;
+
+	cmd.command = XDP_SETUP_ZCTAP;
+	cmd.zct.ifq = ifq;
+	cmd.zct.queue_id = queue_id;
+
+	return ndo_bpf(dev, &cmd);
+}
+
+static int io_open_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id)
+{
+	return __io_queue_mgmt(ifq->dev, ifq, queue_id);
+}
+
+static int io_close_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id)
+{
+	return __io_queue_mgmt(ifq->dev, NULL, queue_id);
+}
+
+static struct io_zctap_ifq *io_zctap_ifq_alloc(void)
+{
+	struct io_zctap_ifq *ifq;
+
+	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+	if (!ifq)
+		return NULL;
+
+	ifq->queue_id = -1;
+	return ifq;
+}
+
+static void io_zctap_ifq_free(struct io_zctap_ifq *ifq)
+{
+	if (ifq->queue_id != (u16)-1)
+		io_close_zctap_ifq(ifq, ifq->queue_id);
+	if (ifq->dev)
+		dev_put(ifq->dev);
+	kfree(ifq);
+}
+
+int io_register_ifq(struct io_ring_ctx *ctx,
+		    struct io_uring_ifq_req __user *arg)
+{
+	struct io_uring_ifq_req req;
+	struct io_zctap_ifq *ifq;
+	int err;
+
+	if (copy_from_user(&req, arg, sizeof(req)))
+		return -EFAULT;
+
+	if (req.ifq_id >= NR_ZCTAP_IFQS)
+		return -EFAULT;
+
+	if (ctx->zctap_ifq)
+		return -EBUSY;
+
+	ifq = io_zctap_ifq_alloc();
+	if (!ifq)
+		return -ENOMEM;
+
+	ifq->ctx = ctx;
+	ifq->fill_bgid = req.fill_bgid;
+
+	err = -ENODEV;
+	ifq->dev = dev_get_by_index(&init_net, req.ifindex);
+	if (!ifq->dev)
+		goto out;
+
+	/* region attachment TBD */
+
+	err = io_open_zctap_ifq(ifq, req.queue_id);
+	if (err)
+		goto out;
+	ifq->queue_id = req.queue_id;
+
+	ctx->zctap_ifq = ifq;
+
+	return 0;
+
+out:
+	io_zctap_ifq_free(ifq);
+	return err;
+}
+
+int io_unregister_zctap_ifq(struct io_ring_ctx *ctx, unsigned long index)
+{
+	struct io_zctap_ifq *ifq;
+
+	ifq = ctx->zctap_ifq;
+	if (!ifq)
+		return -EINVAL;
+
+	ctx->zctap_ifq = NULL;
+	io_zctap_ifq_free(ifq);
+
+	return 0;
+}
+
+void io_unregister_zctap_all(struct io_ring_ctx *ctx)
+{
+	int i;
+
+	for (i = 0; i < NR_ZCTAP_IFQS; i++)
+		io_unregister_zctap_ifq(ctx, i);
+}
diff --git a/io_uring/zctap.h b/io_uring/zctap.h
new file mode 100644
index 000000000000..bbe4a509408b
--- /dev/null
+++ b/io_uring/zctap.h
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_ZCTAP_H
+#define IOU_ZCTAP_H
+
+int io_register_ifq(struct io_ring_ctx *ctx,
+		    struct io_uring_ifq_req __user *arg);
+void io_unregister_zctap_all(struct io_ring_ctx *ctx);
+
+#endif

From: Jonathan Lemon
Subject: [RFC PATCH v2 04/13] io_uring: create a zctap region for a mapped buffer
Date: Tue, 18 Oct 2022 12:15:53 -0700
Message-ID: <20221018191602.2112515-5-jonathan.lemon@gmail.com>

This function takes an entire memory region that was previously
registered with io_uring and assigns it as the backing store for the
specified ifq, binding the pages to a specific device.

The entire region is registered, rather than individual buffers,
because this allows the hardware to select the optimal buffer size for
incoming packets.

The region is registered as part of the register_ifq opcode instead of
separately, since the ifq ring requires memory when it is created.

Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++-
 io_uring/zctap.h |  2 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index f4a45b683ca0..f8a5702f93f4 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -6,16 +6,73 @@
 #include
 #include
 #include
+#include
 
 #include
 
 #include "io_uring.h"
 #include "zctap.h"
+#include "rsrc.h"
+#include "kbuf.h"
 
 #define NR_ZCTAP_IFQS 1
 
+struct ifq_region {
+	struct io_mapped_ubuf *imu;
+	int count;
+	int nr_pages;
+	u16 id;
+	struct page *freelist[];
+};
+
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 
+static void io_remove_ifq_region(struct ifq_region *ifr)
+{
+	kvfree(ifr);
+}
+
+int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
+{
+	struct io_ring_ctx *ctx = ifq->ctx;
+	struct io_mapped_ubuf *imu;
+	struct ifq_region *ifr;
+	int i, nr_pages;
+	struct page *page;
+
+	/* XXX for now, only allow one region per ifq. */
+	if (ifq->region)
+		return -EFAULT;
+
+	if (unlikely(id >= ctx->nr_user_bufs))
+		return -EFAULT;
+	id = array_index_nospec(id, ctx->nr_user_bufs);
+	imu = ctx->user_bufs[id];
+
+	/* XXX check region is page aligned */
+	if (imu->ubuf & ~PAGE_MASK || imu->ubuf_end & ~PAGE_MASK)
+		return -EFAULT;
+
+	nr_pages = imu->nr_bvecs;
+	ifr = kvmalloc(struct_size(ifr, freelist, nr_pages), GFP_KERNEL);
+	if (!ifr)
+		return -ENOMEM;
+
+	ifr->nr_pages = nr_pages;
+	ifr->imu = imu;
+	ifr->count = nr_pages;
+	ifr->id = id;
+
+	for (i = 0; i < nr_pages; i++) {
+		page = imu->bvec[i].bv_page;
+		ifr->freelist[i] = page;
+	}
+
+	ifq->region = ifr;
+
+	return 0;
+}
+
 static int __io_queue_mgmt(struct net_device *dev, struct io_zctap_ifq *ifq,
 			   u16 queue_id)
 {
@@ -59,6 +116,8 @@ static void io_zctap_ifq_free(struct io_zctap_ifq *ifq)
 {
 	if (ifq->queue_id != (u16)-1)
 		io_close_zctap_ifq(ifq, ifq->queue_id);
+	if (ifq->region)
+		io_remove_ifq_region(ifq->region);
 	if (ifq->dev)
 		dev_put(ifq->dev);
 	kfree(ifq);
@@ -92,7 +151,9 @@ int io_register_ifq(struct io_ring_ctx *ctx,
 	if (!ifq->dev)
 		goto out;
 
-	/* region attachment TBD */
+	err = io_provide_ifq_region(ifq, req.region_id);
+	if (err)
+		goto out;
 
 	err = io_open_zctap_ifq(ifq, req.queue_id);
 	if (err)
diff --git a/io_uring/zctap.h b/io_uring/zctap.h
index bbe4a509408b..bb44f8e972e8 100644
--- a/io_uring/zctap.h
+++ b/io_uring/zctap.h
@@ -6,4 +6,6 @@ int io_register_ifq(struct io_ring_ctx *ctx,
 		    struct io_uring_ifq_req __user *arg);
 void io_unregister_zctap_all(struct io_ring_ctx *ctx);
 
+int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id);
+
 #endif
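
io_provide_ifq_region() rejects a registered buffer unless both its start (imu->ubuf) and its exclusive end (imu->ubuf_end) fall on page boundaries, so userspace needs page-aligned memory before registering it with IORING_REGISTER_BUFFERS. A minimal sketch of that preparation, assuming posix_memalign(3) for the allocation; alloc_zctap_region() and region_is_page_aligned() are hypothetical helpers, and the predicate is the userspace equivalent of the kernel's `addr & ~PAGE_MASK` test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

/* True when both ends of [start, end) sit on a page boundary. */
static bool region_is_page_aligned(uintptr_t start, uintptr_t end,
				   size_t page_size)
{
	uintptr_t offset_mask = (uintptr_t)page_size - 1;

	return (start & offset_mask) == 0 && (end & offset_mask) == 0;
}

/* Hypothetical helper: allocate npages of page-aligned memory and describe
 * it with an iovec suitable for IORING_REGISTER_BUFFERS. 0 on success. */
static int alloc_zctap_region(struct iovec *iov, size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *mem = NULL;

	if (page_size <= 0 || npages == 0)
		return -1;
	if (posix_memalign(&mem, (size_t)page_size, npages * (size_t)page_size))
		return -1;
	iov->iov_base = mem;
	iov->iov_len = npages * (size_t)page_size;
	return 0;
}
```

The resulting iovec would then be registered (for example with liburing's io_uring_register_buffers()), and its index in the registered-buffer table passed as region_id.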

From: Jonathan Lemon
Subject: [RFC PATCH v2 05/13] io_uring: create page freelist for the ifq region
Date: Tue, 18 Oct 2022 12:15:54 -0700
Message-ID: <20221018191602.2112515-6-jonathan.lemon@gmail.com>

Create a freelist where the driver can obtain pages for the packet
backing store. Use the page's page_private field to record lookup
information.

Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index f8a5702f93f4..af2e871b1b62 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -27,18 +27,68 @@ struct ifq_region {
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 
+static void zctap_set_page_info(struct page *page, u64 info)
+{
+	set_page_private(page, info);
+}
+
+static u64 zctap_mk_page_info(u16 region_id, u16 pgid)
+{
+	return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid;
+}
+
 static void io_remove_ifq_region(struct ifq_region *ifr)
 {
+	struct io_mapped_ubuf *imu;
+	struct page *page;
+	int i;
+
+	imu = ifr->imu;
+	for (i = 0; i < ifr->nr_pages; i++) {
+		page = imu->bvec[i].bv_page;
+
+		ClearPagePrivate(page);
+		set_page_private(page, 0);
+	}
+
 	kvfree(ifr);
 }
 
+static int io_zctap_map_region(struct ifq_region *ifr)
+{
+	struct io_mapped_ubuf *imu;
+	struct page *page;
+	u64 info;
+	int i;
+
+	imu = ifr->imu;
+	for (i = 0; i < ifr->nr_pages; i++) {
+		page = imu->bvec[i].bv_page;
+		if (PagePrivate(page))
+			goto out;
+		SetPagePrivate(page);
+		info = zctap_mk_page_info(ifr->id, i);
+		zctap_set_page_info(page, info);
+		ifr->freelist[i] = page;
+	}
+	return 0;
+
+out:
+	while (i--) {
+		page = imu->bvec[i].bv_page;
+		ClearPagePrivate(page);
+		set_page_private(page, 0);
+	}
+	return -EEXIST;
+}
+
 int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
 {
 	struct io_ring_ctx *ctx = ifq->ctx;
 	struct io_mapped_ubuf *imu;
 	struct ifq_region *ifr;
-	int i, nr_pages;
-	struct page *page;
+	int nr_pages;
+	int err;
 
 	/* XXX for now, only allow one region per ifq. */
 	if (ifq->region)
@@ -63,9 +113,10 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
 	ifr->count = nr_pages;
 	ifr->id = id;
 
-	for (i = 0; i < nr_pages; i++) {
-		page = imu->bvec[i].bv_page;
-		ifr->freelist[i] = page;
+	err = io_zctap_map_region(ifr);
+	if (err) {
+		kvfree(ifr);
+		return err;
 	}
 
 	ifq->region = ifr;
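
zctap_mk_page_info() packs a 0xface tag into bits 63..48, the region id into bits 31..16 and the page index into bits 15..0, leaving bits 47..32 zero. The sketch below reproduces that layout in userspace and adds hypothetical decode helpers (the patch itself only encodes) to show how a lookup could recover the fields from page_private.

```c
#include <stdbool.h>
#include <stdint.h>

#define ZCTAP_PAGE_MAGIC 0xfaceULL	/* tag stored in bits 63..48 */

/* Same layout as zctap_mk_page_info() in the patch. */
static uint64_t mk_page_info(uint16_t region_id, uint16_t pgid)
{
	return ZCTAP_PAGE_MAGIC << 48 | (uint64_t)region_id << 16 |
	       (uint64_t)pgid;
}

/* Hypothetical decoders: check the tag, then extract each 16-bit field. */
static bool page_info_valid(uint64_t info)
{
	return (info >> 48) == ZCTAP_PAGE_MAGIC;
}

static uint16_t page_info_region(uint64_t info)
{
	return (uint16_t)(info >> 16);	/* bits 31..16 */
}

static uint16_t page_info_pgid(uint64_t info)
{
	return (uint16_t)info;		/* bits 15..0 */
}
```

Because pgid is 16 bits wide, the encoding as written can distinguish at most 65536 pages per region (256 MiB with 4 KiB pages); larger indices wrap, so page 65536 would encode as page 0.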

From: Jonathan Lemon
Subject: [RFC PATCH v2 06/13] io_uring: Provide driver API for zctap packet buffers.
Date: Tue, 18 Oct 2022 12:15:55 -0700
Message-ID: <20221018191602.2112515-7-jonathan.lemon@gmail.com>

Introduce 'struct io_zctap_buf', representing a buffer used by network
drivers, and a pair of get/put functions that drivers use to obtain
and return these buffers. The code for these will be fleshed out in
the next patch.

Signed-off-by: Jonathan Lemon
---
 include/linux/io_uring.h | 35 +++++++++++++++++++++++++++++++++++
 io_uring/zctap.c         | 11 +++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 43bc8a2edccf..c27645ce0efc 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -32,6 +32,13 @@ struct io_uring_cmd {
 	u8 pdu[32]; /* available inline for free use */
 };
 
+struct io_zctap_buf {
+	dma_addr_t dma;
+	struct page *page;
+	atomic_t refcount;
+	u8 _pad[4];
+};
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
@@ -44,6 +51,18 @@ void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
 
+struct io_zctap_ifq;
+struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq);
+void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf);
+
+static inline dma_addr_t io_zctap_buf_dma(struct io_zctap_buf *buf)
+{
+	return buf->dma;
+}
+static inline struct page *io_zctap_buf_page(struct io_zctap_buf *buf)
+{
+	return buf->page;
+}
 static inline void io_uring_files_cancel(void)
 {
 	if (current->io_uring) {
@@ -92,6 +111,22 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
 	return "";
 }
+static inline dma_addr_t io_zctap_buf_dma(struct io_zctap_buf *buf)
+{
+	return 0;
+}
+static inline struct page *io_zctap_buf_page(struct io_zctap_buf *buf)
+{
+	return NULL;
+}
+static inline struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq)
+{
+	return NULL;
+}
+static inline void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf)
+{
+}
+
 #endif
 
 #endif
diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index af2e871b1b62..46ba0d011250 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -37,6 +37,17 @@ static u64 zctap_mk_page_info(u16 region_id, u16 pgid)
 	return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid;
 }
 
+struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq)
+{
+	return NULL;
+}
+EXPORT_SYMBOL(io_zctap_get_buf);
+
+void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf)
+{
+}
+EXPORT_SYMBOL(io_zctap_put_buf);
+
 static void io_remove_ifq_region(struct ifq_region *ifr)
 {
 	struct io_mapped_ubuf *imu;
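
The get/put pair implies a reference-counted buffer that returns to a freelist when its last user drops it. The bodies are still empty here, so the following is only a single-threaded userspace sketch of one plausible shape, assuming a LIFO freelist and C11 atomics for the refcount; struct zbuf, struct zregion, zget() and zput() are hypothetical names, not the series' implementation.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct io_zctap_buf (dma_addr_t -> uint64_t). */
struct zbuf {
	uint64_t dma;
	void *page;
	atomic_int refcount;
};

#define NBUFS 4

/* Hypothetical region: a LIFO stack of free buffer pointers. */
struct zregion {
	struct zbuf bufs[NBUFS];
	struct zbuf *freelist[NBUFS];
	int count;			/* entries currently on the freelist */
};

static void zregion_init(struct zregion *r)
{
	for (int i = 0; i < NBUFS; i++) {
		r->bufs[i].dma = 0x1000u * (uint64_t)i;	/* fake DMA address */
		r->bufs[i].page = NULL;
		atomic_init(&r->bufs[i].refcount, 0);
		r->freelist[i] = &r->bufs[i];
	}
	r->count = NBUFS;
}

/* get: pop a free buffer and hand the caller the first reference. */
static struct zbuf *zget(struct zregion *r)
{
	struct zbuf *b;

	if (r->count == 0)
		return NULL;
	b = r->freelist[--r->count];
	atomic_store(&b->refcount, 1);
	return b;
}

/* put: drop one reference; the last one returns the buffer to the list. */
static void zput(struct zregion *r, struct zbuf *b)
{
	if (atomic_fetch_sub(&b->refcount, 1) == 1)
		r->freelist[r->count++] = b;
}
```

The freelist itself is unsynchronized; a real driver refill path would need a lock or a lockless ring around it. A driver would presumably call get once per RX descriptor, program io_zctap_buf_dma() into the descriptor, and call put once the packet's last reference is released.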
12:16:14 -0700 Received: by devvm2494.atn0.facebook.com (Postfix, from userid 172786) id 59CDC227F0515; Tue, 18 Oct 2022 12:16:02 -0700 (PDT) From: Jonathan Lemon To: CC: Subject: [RFC PATCH v2 07/13] io_uring: Allocate the zctap buffers for the device Date: Tue, 18 Oct 2022 12:15:56 -0700 Message-ID: <20221018191602.2112515-8-jonathan.lemon@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221018191602.2112515-1-jonathan.lemon@gmail.com> References: <20221018191602.2112515-1-jonathan.lemon@gmail.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: uBTF1Xbv6cilX-q1CPMV570uIVF_H_2V X-Proofpoint-ORIG-GUID: uBTF1Xbv6cilX-q1CPMV570uIVF_H_2V X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-10-18_07,2022-10-18_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org The idea is to register a memory region with the device, and later specify the desired packet buffer size. The code currently assumes a page size. Create the desired number of zctap buffers and DMA map them to the target device, recording the dma address for later use. 
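The allocate-and-map loop with error unwinding can be modelled in user space, as shown in the hedged sketch below. It is not the kernel code. `fake_page`, `fake_buf`, `fake_dma_map()` and `map_region()` are hypothetical stand-ins for struct page, struct io_zctap_buf, dma_map_page_attrs()/dma_mapping_error() and io_zctap_map_region(). The sketch maps each page before marking it. This is one possible ordering, under which a failure at index i leaves nothing to undo for page i itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: model page and DMA state in user space.
 * These are not kernel APIs. */
struct fake_page {
	bool private_set; /* models PagePrivate() */
	bool mapped;      /* models a live DMA mapping */
};

struct fake_buf {
	uint64_t dma;
	struct fake_page *page;
	int refcount;
};

/* Hypothetical mapper that fails at index fail_at, like dma_mapping_error(). */
static bool fake_dma_map(struct fake_page *pg, size_t idx, size_t fail_at,
			 uint64_t *addr)
{
	if (idx == fail_at)
		return false;
	pg->mapped = true;
	*addr = 0x100000u + (uint64_t)idx * 4096u;
	return true;
}

/* Map pages[0..n) and fill bufs[] and freelist[]. On any failure, undo
 * every page below index i and return a negative errno. */
static int map_region(struct fake_page *pages, struct fake_buf *bufs,
		      struct fake_buf **freelist, size_t n, size_t fail_at)
{
	uint64_t addr;
	size_t i;
	int err;

	assert(pages && bufs && freelist);
	for (i = 0; i < n; i++) {
		if (pages[i].private_set) {
			err = -EEXIST; /* page already claimed elsewhere */
			goto out;
		}
		if (!fake_dma_map(&pages[i], i, fail_at, &addr)) {
			err = -ENOMEM;
			goto out;
		}
		pages[i].private_set = true;
		bufs[i] = (struct fake_buf){ .dma = addr, .page = &pages[i] };
		freelist[i] = &bufs[i];
	}
	return 0;
out:
	while (i--) {
		pages[i].mapped = false;
		pages[i].private_set = false;
	}
	return err;
}
```

Because page i is only marked after its mapping succeeds, the `while (i--)` unwind never has to special-case the failing index.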
Signed-off-by: Jonathan Lemon --- io_uring/zctap.c | 55 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 47 insertions(+), 8 deletions(-) diff --git a/io_uring/zctap.c b/io_uring/zctap.c index 46ba0d011250..a924e59513a4 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -22,7 +22,9 @@ struct ifq_region { int count; int nr_pages; u16 id; - struct page *freelist[]; + + struct io_zctap_buf *buf; + struct io_zctap_buf *freelist[]; }; typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); @@ -62,35 +64,65 @@ static void io_remove_ifq_region(struct ifq_region *ifr) set_page_private(page, 0); } + kvfree(ifr->buf); kvfree(ifr); } -static int io_zctap_map_region(struct ifq_region *ifr) +static inline struct device * +netdev2device(struct net_device *dev) +{ + return dev->dev.parent; /* from SET_NETDEV_DEV() */ +} + +static int io_zctap_map_region(struct ifq_region *ifr, struct device *device) { struct io_mapped_ubuf *imu; + struct io_zctap_buf *buf; struct page *page; + dma_addr_t addr; + int i, err; u64 info; - int i; imu = ifr->imu; for (i = 0; i < ifr->nr_pages; i++) { page = imu->bvec[i].bv_page; - if (PagePrivate(page)) + + if (PagePrivate(page)) { + err = -EEXIST; goto out; + } + SetPagePrivate(page); info = zctap_mk_page_info(ifr->id, i); zctap_set_page_info(page, info); - ifr->freelist[i] = page; + + buf = &ifr->buf[i]; + addr = dma_map_page_attrs(device, page, 0, PAGE_SIZE, + DMA_BIDIRECTIONAL, + DMA_ATTR_SKIP_CPU_SYNC); + if (dma_mapping_error(device, addr)) { + err = -ENOMEM; + goto out; + } + buf->dma = addr; + buf->page = page; + atomic_set(&buf->refcount, 0); + + ifr->freelist[i] = buf; } return 0; out: while (i--) { page = imu->bvec[i].bv_page; - ClearPagePrivate(page); set_page_private(page, 0); + ClearPagePrivate(page); + buf = &ifr->buf[i]; + dma_unmap_page_attrs(device, buf->dma, PAGE_SIZE, + DMA_BIDIRECTIONAL, + DMA_ATTR_SKIP_CPU_SYNC); } - return -EEXIST; + return err; } int io_provide_ifq_region(struct 
io_zctap_ifq *ifq, u16 id) @@ -119,13 +151,20 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) if (!ifr) return -ENOMEM; + ifr->buf = kvmalloc_array(nr_pages, sizeof(*ifr->buf), GFP_KERNEL); + if (!ifr->buf) { + kvfree(ifr); + return -ENOMEM; + } + ifr->nr_pages = nr_pages; ifr->imu = imu; ifr->count = nr_pages; ifr->id = id; - err = io_zctap_map_region(ifr); + err = io_zctap_map_region(ifr, netdev2device(ifq->dev)); if (err) { + kvfree(ifr->buf); kvfree(ifr); return err; }

From patchwork Tue Oct 18 19:15:57 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 08/13] io_uring: Add zctap buffer get/put functions and refcounting.
Date: Tue, 18 Oct 2022 12:15:57 -0700
Message-ID: <20221018191602.2112515-9-jonathan.lemon@gmail.com>

Flesh out the driver API functions introduced earlier. The driver gets a buffer and takes a reference on it. The refcount is incremented as skb fragments go up the stack, and the driver releases its reference when it is finished with the buffer. When ownership of a fragment is transferred to the user, a user refcount is incremented; it is decremented when the buffer is returned. Once all refcounts are released, the buffer is safe to reuse. The user/kernel split is needed to distinguish between "safe to reuse the buffer" and "still in use by the kernel". The locking here is suboptimal and can likely be improved.
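The split refcount can be modelled in user space with C11 atomics in place of atomic_t. The function names below are hypothetical. The two constants match IO_ZCTAP_UREF and IO_ZCTAP_KREF_MASK from this patch. The sketch assumes fewer than 0x1000 concurrent kernel references, in line with the "driver bias cannot be larger than this" comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same split as the patch: kernel refs occupy the low 12 bits, and each
 * user reference adds IO_ZCTAP_UREF. */
#define IO_ZCTAP_UREF      0x1000
#define IO_ZCTAP_KREF_MASK (IO_ZCTAP_UREF - 1)

struct zbuf {
	atomic_int refcount;
};

/* Driver or network stack takes a kernel reference. */
static void zbuf_kref_get(struct zbuf *b)
{
	atomic_fetch_add(&b->refcount, 1);
}

/* Returns true if this put released the last reference of any kind. */
static bool zbuf_kref_put(struct zbuf *b)
{
	return atomic_fetch_sub(&b->refcount, 1) == 1;
}

/* Ownership of the data is handed to user space. */
static void zbuf_uref_get(struct zbuf *b)
{
	atomic_fetch_add(&b->refcount, IO_ZCTAP_UREF);
}

/* User space gives the buffer back; true means it is reusable now. */
static bool zbuf_uref_put(struct zbuf *b)
{
	int old = atomic_fetch_sub(&b->refcount, IO_ZCTAP_UREF);

	assert(old >= IO_ZCTAP_UREF); /* no user ref was held: caller bug */
	return old == IO_ZCTAP_UREF;
}

static int zbuf_krefs(struct zbuf *b)
{
	return atomic_load(&b->refcount) & IO_ZCTAP_KREF_MASK;
}

static int zbuf_urefs(struct zbuf *b)
{
	return atomic_load(&b->refcount) / IO_ZCTAP_UREF;
}
```

Packing both counts into one atomic word lets a single atomic subtraction both drop a reference and report whether the buffer has become free, without taking a lock.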
Signed-off-by: Jonathan Lemon --- io_uring/kbuf.c | 13 ++++++ io_uring/kbuf.h | 2 + io_uring/zctap.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 121 insertions(+), 1 deletion(-) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 25cd724ade18..caae2755e3d5 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -188,6 +188,19 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len, return ret; } +/* XXX May be called from the driver, in napi context. */ +u64 io_zctap_buffer(struct io_kiocb *req, size_t *len) +{ + struct io_ring_ctx *ctx = req->ctx; + struct io_buffer_list *bl; + void __user *ret = NULL; + + bl = io_buffer_get_list(ctx, req->buf_index); + if (likely(bl)) + ret = io_ring_buffer_select(req, len, bl, IO_URING_F_UNLOCKED); + return (u64)(unsigned long)ret; +} + static __cold int io_init_bl_list(struct io_ring_ctx *ctx) { int i; diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index c23e15d7d3ca..b530e987b438 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -50,6 +50,8 @@ unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags); void io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags); +u64 io_zctap_buffer(struct io_kiocb *req, size_t *len); + static inline void io_kbuf_recycle_ring(struct io_kiocb *req) { /* diff --git a/io_uring/zctap.c b/io_uring/zctap.c index a924e59513a4..a398270cc43d 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -23,6 +23,8 @@ struct ifq_region { int nr_pages; u16 id; + spinlock_t freelist_lock; + struct io_zctap_buf *buf; struct io_zctap_buf *freelist[]; }; @@ -39,14 +41,116 @@ static u64 zctap_mk_page_info(u16 region_id, u16 pgid) return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid; } +/* driver bias cannot be larger than this */ +#define IO_ZCTAP_UREF 0x1000 +#define IO_ZCTAP_KREF_MASK (IO_ZCTAP_UREF - 1) + +/* return user refs back, indicate whether buffer is reusable */ +static bool io_zctap_put_buf_uref(struct io_zctap_buf *buf) +{ + if
(atomic_read(&buf->refcount) < IO_ZCTAP_UREF) { + WARN(1, "uref botch: %d < %d, page:%px\n", + atomic_read(&buf->refcount), IO_ZCTAP_UREF, + buf->page); + return false; + } + + return atomic_sub_and_test(IO_ZCTAP_UREF, &buf->refcount); +} + +/* gets a user-supplied buffer from the fill queue */ +static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq) +{ + struct io_kiocb req = { + .ctx = ifq->ctx, + .buf_index = ifq->fill_bgid, + }; + struct io_mapped_ubuf *imu; + struct io_zctap_buf *buf; + struct ifq_region *ifr; + size_t len; + u64 addr; + int pgid; + + len = 0; + ifr = ifq->region; + imu = ifr->imu; + + /* IN: uses buf_index as buffer group. + * OUT: buf_index of actual buffer. (and req->buf_list set) + * (this comes from the user-supplied bufid) + */ + addr = io_zctap_buffer(&req, &len); + if (!addr) + goto fail; + + if (addr < imu->ubuf || addr + len > imu->ubuf_end) + goto fail; + + pgid = (addr - imu->ubuf) >> PAGE_SHIFT; + + /* optimize here by passing in addr as : */ + /* assume region == ifq->region */ + + buf = &ifr->buf[pgid]; + + if (!io_zctap_put_buf_uref(buf)) { + /* XXX add retry handling. */ + WARN_RATELIMIT(1, "buffer %d still has nonzero refcount\n", + pgid); + return NULL; + } + + return buf; + +fail: + /* warn and just drop buffer */ + WARN_RATELIMIT(1, "buffer addr %llx invalid", addr); + return NULL; +} + +static void io_zctap_recycle_buf(struct ifq_region *ifr, + struct io_zctap_buf *buf) +{ + spin_lock(&ifr->freelist_lock); + + ifr->freelist[ifr->count++] = buf; + + spin_unlock(&ifr->freelist_lock); +} + +/* returns with undefined refcount */ struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq) { - return NULL; + struct ifq_region *ifr = ifq->region; + struct io_zctap_buf *buf; + + spin_lock(&ifr->freelist_lock); + + buf = NULL; + if (ifr->count) + buf = ifr->freelist[--ifr->count]; + + spin_unlock(&ifr->freelist_lock); + + if (!buf) + /* XXX locking! 
*/ + return io_zctap_get_buffer(ifq); + + return buf; } EXPORT_SYMBOL(io_zctap_get_buf); +/* called from driver and networking stack. */ void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf) { + struct ifq_region *ifr = ifq->region; + + /* XXX move to inline function later. */ + if (!atomic_dec_and_test(&buf->refcount)) + return; + + io_zctap_recycle_buf(ifr, buf); } EXPORT_SYMBOL(io_zctap_put_buf); @@ -157,6 +261,7 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) return -ENOMEM; } + spin_lock_init(&ifr->freelist_lock); ifr->nr_pages = nr_pages; ifr->imu = imu; ifr->count = nr_pages;

From patchwork Tue Oct 18 19:15:58 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 09/13] skbuff: Introduce SKBFL_FIXED_FRAG and skb_fixed()
Date: Tue, 18 Oct 2022 12:15:58 -0700
Message-ID: <20221018191602.2112515-10-jonathan.lemon@gmail.com>

When an skb marked as zerocopy goes up the network stack during RX, skb_orphan_frags_rx() is called. This check is designed to catch TX zerocopy data being redirected back up the stack, not new zerocopy fragments coming up from the driver. Currently, since the skb is marked as zerocopy, skb_copy_ubufs() is called, defeating the point of zerocopy RX. Have the driver mark the fragments as fixed so that they are not copied.
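The patched check in skb_orphan_frags_rx() comes down to one boolean decision, modelled below. This is a sketch, not kernel code. The flag values are those used in include/linux/skbuff.h, SKBFL_FIXED_FRAG being the bit this patch adds. `rx_orphan_would_copy()` is a hypothetical helper, and treating a set SKBFL_ZEROCOPY_ENABLE bit as "skb_zcopy() returns non-NULL" is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))

/* Values from include/linux/skbuff.h; SKBFL_FIXED_FRAG is the new bit. */
#define SKBFL_ZEROCOPY_ENABLE BIT(0)
#define SKBFL_SHARED_FRAG     BIT(1)
#define SKBFL_FIXED_FRAG      BIT(5)
#define SKBFL_ZEROCOPY_FRAG   (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)

static_assert((SKBFL_FIXED_FRAG & SKBFL_ZEROCOPY_FRAG) == 0,
	      "FIXED_FRAG must not overlap the existing zerocopy flags");

/* Models the patched skb_orphan_frags_rx(): true when the frags would be
 * copied by skb_copy_ubufs(), false when they pass through untouched. */
static bool rx_orphan_would_copy(unsigned int shinfo_flags)
{
	bool zcopy = shinfo_flags & SKBFL_ZEROCOPY_ENABLE;
	bool fixed = shinfo_flags & SKBFL_FIXED_FRAG;

	return zcopy && !fixed;
}
```

Only the combination "zerocopy but not fixed" still triggers the copy, so TX zerocopy data looping back to RX keeps its existing protection.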
Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9fcf534f2d92..e11e55487c64 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -516,6 +516,9 @@ enum { * use frags only up until ubuf_info is released */ SKBFL_MANAGED_FRAG_REFS = BIT(4), + + /* don't move or copy the fragment */ + SKBFL_FIXED_FRAG = BIT(5), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) @@ -1651,6 +1654,11 @@ static inline bool skb_zcopy_managed(const struct sk_buff *skb) return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS; } +static inline bool skb_fixed(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->flags & SKBFL_FIXED_FRAG; +} + static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1, const struct sk_buff *skb2) { @@ -3087,7 +3095,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask) /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */ static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask) { - if (likely(!skb_zcopy(skb))) + if (likely(!skb_zcopy(skb) || skb_fixed(skb))) return 0; return skb_copy_ubufs(skb, gfp_mask); }

From patchwork Tue Oct 18 19:15:59 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 10/13] io_uring: Allocate a uarg for use by the ifq RX
Date: Tue, 18 Oct 2022 12:15:59 -0700
Message-ID: <20221018191602.2112515-11-jonathan.lemon@gmail.com>

Create a static uarg which is attached to zerocopy RX buffers, and add a callback to handle freeing the skb.
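The patch embeds the ubuf_info next to the ifq in struct io_zctap_ifq_priv. The callback can then recover its ifq from the uarg pointer with container_of(). It uses the 0xface magic in page_private() to decide which pages it owns. Both ideas are modelled in user space below. `ubuf`, `ifq`, `ifq_priv` and `ifq_callback()` are simplified, hypothetical stand-ins for the kernel structures and for io_zctap_ifq_callback(). The real zctap_page_ours() also checks PagePrivate(), which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space version of the kernel's container_of() (without its type check). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for struct ubuf_info and struct io_zctap_ifq. */
struct ubuf {
	void (*callback)(struct ubuf *uarg, uint64_t page_info);
};

struct ifq {
	int pages_returned;
};

/* Mirrors io_zctap_ifq_priv: the uarg is embedded next to the ifq. */
struct ifq_priv {
	struct ifq ifq;
	struct ubuf uarg;
};

/* Same encoding as zctap_mk_page_info(): magic | region_id | pgid. */
static uint64_t mk_page_info(uint16_t region_id, uint16_t pgid)
{
	return (uint64_t)0xface << 48 | (uint64_t)region_id << 16 | pgid;
}

static bool page_info_ours(uint64_t info)
{
	return (info >> 48) == 0xface;
}

/* Recovers the owning ifq from the uarg and only touches pages that
 * carry the magic value. */
static void ifq_callback(struct ubuf *uarg, uint64_t page_info)
{
	struct ifq_priv *priv = container_of(uarg, struct ifq_priv, uarg);

	assert(uarg->callback == ifq_callback);
	if (page_info_ours(page_info))
		priv->ifq.pages_returned++; /* stands in for io_zctap_put_page() */
}
```

Embedding the uarg avoids a separate allocation and a back-pointer: the uarg's address alone identifies its ifq.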
As the skb is marked as zerocopy, it bypasses the default network skb fragment destructor and calls our version instead. This handles our refcounts and releases the ZC buffer back to the freelist.
Signed-off-by: Jonathan Lemon --- io_uring/zctap.c | 71 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 66 insertions(+), 5 deletions(-) diff --git a/io_uring/zctap.c b/io_uring/zctap.c index a398270cc43d..b83a62882c27 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -41,6 +41,26 @@ static u64 zctap_mk_page_info(u16 region_id, u16 pgid) return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid; } +static u64 zctap_page_info(const struct page *page) +{ + return page_private(page); +} + +static u16 zctap_page_id(const struct page *page) +{ + return zctap_page_info(page) & 0xffff; +} + +static bool zctap_page_magic(const struct page *page) +{ + return (zctap_page_info(page) >> 48) == 0xface; +} + +static bool zctap_page_ours(struct page *page) +{ + return PagePrivate(page) && zctap_page_magic(page); +} + /* driver bias cannot be larger than this */ #define IO_ZCTAP_UREF 0x1000 #define IO_ZCTAP_KREF_MASK (IO_ZCTAP_UREF - 1) @@ -154,6 +174,17 @@ void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf) } EXPORT_SYMBOL(io_zctap_put_buf); +/* could be called by the stack as it drops/recycles the skbs */ +static void io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page) +{ + struct ifq_region *ifr; + u16 pgid; + + ifr = ifq->region; /* only one */ + pgid = zctap_page_id(page); + io_zctap_put_buf(ifq, &ifr->buf[pgid]); +} + static void io_remove_ifq_region(struct ifq_region *ifr) { struct io_mapped_ubuf *imu; @@ -306,16 +337,44 @@ static int io_close_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id) return __io_queue_mgmt(ifq->dev, NULL, queue_id); } +/* XXX get around not having "struct ubuf_info" defined in io_uring_types.h */ +struct io_zctap_ifq_priv { + struct io_zctap_ifq ifq; + struct ubuf_info uarg; +}; + +static void
io_zctap_ifq_callback(struct sk_buff *skb, struct ubuf_info *uarg, + bool success) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + struct io_zctap_ifq_priv *priv; + struct page *page; + int i; + + priv = container_of(uarg, struct io_zctap_ifq_priv, uarg); + + for (i = 0; i < shinfo->nr_frags; i++) { + page = skb_frag_page(&shinfo->frags[i]); + if (zctap_page_ours(page)) + io_zctap_put_page(&priv->ifq, page); +#if 0 + else + put_page(page); +#endif + } +} + static struct io_zctap_ifq *io_zctap_ifq_alloc(void) { - struct io_zctap_ifq *ifq; + struct io_zctap_ifq_priv *priv; - ifq = kzalloc(sizeof(*ifq), GFP_KERNEL); - if (!ifq) + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) return NULL; - ifq->queue_id = -1; - return ifq; + priv->ifq.queue_id = -1; + priv->ifq.uarg = &priv->uarg; + return &priv->ifq; } static void io_zctap_ifq_free(struct io_zctap_ifq *ifq) @@ -351,6 +410,8 @@ int io_register_ifq(struct io_ring_ctx *ctx, ifq->ctx = ctx; ifq->fill_bgid = req.fill_bgid; + ifq->uarg->callback = io_zctap_ifq_callback; + ifq->uarg->flags = SKBFL_ALL_ZEROCOPY | SKBFL_FIXED_FRAG; err = -ENODEV; ifq->dev = dev_get_by_index(&init_net, req.ifindex);

From patchwork Tue Oct 18 19:16:00 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 11/13] io_uring: Define the zctap iov[] returned to the user.
Date: Tue, 18 Oct 2022 12:16:00 -0700
Message-ID: <20221018191602.2112515-12-jonathan.lemon@gmail.com>

When performing a ZC receive, instead of returning data directly to the user, an iov[] structure is returned referencing the data in user space.
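A user-space consumer of these iov entries might resolve each one to a data pointer as in the hedged sketch below. The struct mirrors the uapi io_uring_zctap_iov added by this patch. The buffer layout is an assumption not specified here: the buffers of group `bgid` are taken to start at `bases[bgid]` and to be `buf_size` bytes apart. `struct group_layout` and `zctap_iov_data()` are hypothetical names. Returning each bgid:bid to the fill ring afterwards is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the uapi struct io_uring_zctap_iov in this patch. */
struct zctap_iov {
	uint32_t off;
	uint32_t len;
	uint16_t bgid;
	uint16_t bid;
	uint16_t resv[2];
};
static_assert(sizeof(struct zctap_iov) == 16, "no hidden padding expected");

/* Hypothetical application-side layout: group bgid's buffers start at
 * bases[bgid] and are buf_size bytes apart. */
struct group_layout {
	uint8_t *const *bases;
	size_t nr_groups;
	size_t bufs_per_group;
	size_t buf_size;
};

/* Returns a pointer to the iov's data segment, or NULL if the iov falls
 * outside the assumed layout. */
static const uint8_t *zctap_iov_data(const struct group_layout *l,
				     const struct zctap_iov *iov)
{
	if (iov->bgid >= l->nr_groups || iov->bid >= l->bufs_per_group)
		return NULL;
	if ((size_t)iov->off + iov->len > l->buf_size)
		return NULL;
	return l->bases[iov->bgid] + (size_t)iov->bid * l->buf_size + iov->off;
}
```

Validating off/len against the buffer size before forming the pointer keeps a malformed entry from sending the application outside its own buffer.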
The application locates the base address of the data by performing address computations on bgid:bid. The off/len fields apply to that base address, yielding the data segment. The bgid:bid identifying the buffer should later be placed in the ifq's fill ring, which returns the buffer to the kernel.
Signed-off-by: Jonathan Lemon --- include/uapi/linux/io_uring.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index d406d21e8c38..c50c63053f13 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -681,6 +681,14 @@ struct io_uring_ifq_req { __u16 __pad[2]; }; +struct io_uring_zctap_iov { + __u32 off; + __u32 len; + __u16 bgid; + __u16 bid; + __u16 resv[2]; +}; + #ifdef __cplusplus } #endif

From patchwork Tue Oct 18 19:16:01 2022
From: Jonathan Lemon
Subject: [RFC PATCH v2 12/13] io_uring: add OP_RECV_ZC command.
Date: Tue, 18 Oct 2022 12:16:01 -0700
Message-ID: <20221018191602.2112515-13-jonathan.lemon@gmail.com>

This is still a WIP. The current code (temporarily) uses addr3 as a hack in order to leverage code in io_recvmsg_prep. The recvzc opcode uses a metadata buffer supplied either directly with buf/len or indirectly from the buffer group. This buffer is then expected to be filled with an array of io_uring_zctap_iov structures, which point to the data in user memory. The sqe addr3 field is encoded as:

    addr3 = (readlen << 32) | (copy_bgid << 16) | ctx->ifq_id;

The amount of returned data is limited by the number of iovs that the metadata area can hold, and by the readlen parameter.
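The addr3 encoding above can be packed by the submitter and unpacked by the kernel as sketched below. The unpacking mirrors the shifts and masks in io_recvzc_prep(). The packing helper and `struct recvzc_args` are hypothetical names for illustration; only the field positions come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout: addr3 = (readlen << 32) | (copy_bgid << 16) | ifq_id */
static uint64_t recvzc_pack_addr3(uint32_t readlen, uint16_t copy_bgid,
				  uint16_t ifq_id)
{
	return (uint64_t)readlen << 32 | (uint64_t)copy_bgid << 16 | ifq_id;
}

struct recvzc_args {
	uint32_t readlen;
	uint16_t copy_bgid;
	uint16_t ifq_id;
};

/* Same shifts and masks as io_recvzc_prep(). */
static struct recvzc_args recvzc_unpack_addr3(uint64_t addr3)
{
	struct recvzc_args a = {
		.ifq_id = addr3 & 0xffff,
		.copy_bgid = (addr3 >> 16) & 0xffff,
		.readlen = addr3 >> 32,
	};
	return a;
}
```

Note that the prep code currently rejects any ifq_id other than 0 with -EINVAL.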
As a fallback (and for testing purposes), if the skb data is not present in user memory (perhaps due to system misconfiguration), a separate buffer is obtained from the copy_bgid buffer group and the data is copied into user memory.
Signed-off-by: Jonathan Lemon --- include/uapi/linux/io_uring.h | 1 + io_uring/net.c | 123 ++++++++++++ io_uring/opdef.c | 15 ++ io_uring/zctap.c | 366 ++++++++++++++++++++++++++++++++++ io_uring/zctap.h | 5 + 5 files changed, 510 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index c50c63053f13..ad9e8722da00 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -215,6 +215,7 @@ enum io_uring_op { IORING_OP_URING_CMD, IORING_OP_SEND_ZC, IORING_OP_SENDMSG_ZC, + IORING_OP_RECV_ZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/net.c b/io_uring/net.c index caa6a803cb72..a31406ec447d 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "zctap.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -66,6 +67,14 @@ struct io_sr_msg { struct io_kiocb *notif; }; +struct io_recvzc { + struct io_sr_msg sr; + struct io_zctap_ifq *ifq; + u32 datalen; + u16 ifq_id; + u16 copy_bgid; +}; + #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED) int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) @@ -907,6 +916,120 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + u64 recvzc_cmd; + u16 ifq_id; + + /* XXX hack so we can temporarily use io_recvmsg_prep */ + recvzc_cmd = READ_ONCE(sqe->addr3); + + ifq_id = recvzc_cmd & 0xffff; + zc->copy_bgid = (recvzc_cmd >> 16) & 0xffff; + zc->datalen = recvzc_cmd >> 32; + + if (ifq_id != 0) + return -EINVAL; + zc->ifq = req->ctx->zctap_ifq; + if (!zc->ifq) +
return -EINVAL; + + return io_recvmsg_prep(req, sqe); +} + +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + struct msghdr msg; + struct socket *sock; + struct iovec iov; + unsigned int cflags; + unsigned flags; + int ret, min_ret = 0; + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK; + size_t len = zc->sr.len; + + if (!(req->flags & REQ_F_POLLED) && + (zc->sr.flags & IORING_RECVSEND_POLL_FIRST)) + return -EAGAIN; + + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + +retry_multishot: + if (io_do_buffer_select(req)) { + void __user *buf; + + buf = io_buffer_select(req, &len, issue_flags); + if (!buf) + return -ENOBUFS; + zc->sr.buf = buf; + } + + ret = import_single_range(READ, zc->sr.buf, len, &iov, &msg.msg_iter); + if (unlikely(ret)) + goto out_free; + + msg.msg_name = NULL; + msg.msg_namelen = 0; + msg.msg_control = NULL; + msg.msg_get_inq = 1; + msg.msg_flags = 0; + msg.msg_controllen = 0; + msg.msg_iocb = NULL; + msg.msg_ubuf = NULL; + + flags = zc->sr.msg_flags; + if (force_nonblock) + flags |= MSG_DONTWAIT; + if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&msg.msg_iter); + + ret = io_zctap_recv(zc->ifq, sock, &msg, flags, zc->datalen, + zc->copy_bgid); + if (ret < min_ret) { + if (ret == -EAGAIN && force_nonblock) { + if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) { + io_kbuf_recycle(req, issue_flags); + return IOU_ISSUE_SKIP_COMPLETE; + } + + return -EAGAIN; + } + if (ret == -ERESTARTSYS) + ret = -EINTR; + if (ret > 0 && io_net_retry(sock, flags)) { + zc->sr.len -= ret; + zc->sr.buf += ret; + zc->sr.done_io += ret; + req->flags |= REQ_F_PARTIAL_IO; + return -EAGAIN; + } + req_set_fail(req); + } else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) { +out_free: + req_set_fail(req); + } + + if (ret > 0) + ret += zc->sr.done_io; + else if (zc->sr.done_io) + ret = zc->sr.done_io; + else + 
io_kbuf_recycle(req, issue_flags); + + cflags = io_put_kbuf(req, issue_flags); + if (msg.msg_inq) + cflags |= IORING_CQE_F_SOCK_NONEMPTY; + + if (!io_recv_finish(req, &ret, cflags, ret <= 0)) + goto retry_multishot; + + return ret; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 2330f6da791e..7b40b182769f 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -33,6 +33,7 @@ #include "poll.h" #include "cancel.h" #include "rw.h" +#include "zctap.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -522,6 +523,20 @@ const struct io_op_def io_op_defs[] = { .fail = io_sendrecv_fail, #else .prep = io_eopnotsupp_prep, +#endif + }, + [IORING_OP_RECV_ZC] = { + .name = "RECV_ZC", + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollin = 1, + .buffer_select = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_recvzc_prep, + .issue = io_recvzc, +#else + .prep = io_eopnotsupp_prep, #endif }, }; diff --git a/io_uring/zctap.c b/io_uring/zctap.c index b83a62882c27..4a551349b600 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -46,6 +47,11 @@ static u64 zctap_page_info(const struct page *page) return page_private(page); } +static u16 zctap_page_region_id(const struct page *page) +{ + return (zctap_page_info(page) >> 16) & 0xffff; +} + static u16 zctap_page_id(const struct page *page) { return zctap_page_info(page) & 0xffff; @@ -65,6 +71,14 @@ static bool zctap_page_ours(struct page *page) #define IO_ZCTAP_UREF 0x1000 #define IO_ZCTAP_KREF_MASK (IO_ZCTAP_UREF - 1) +static void io_zctap_get_buf_uref(struct ifq_region *ifr, u16 pgid) +{ + if (WARN_ON(pgid >= ifr->nr_pages)) + return; + + atomic_add(IO_ZCTAP_UREF, &ifr->buf[pgid].refcount); +} + /* return user refs back, indicate whether buffer is reusable */ static bool io_zctap_put_buf_uref(struct io_zctap_buf 
*buf) { @@ -364,6 +378,18 @@ static void io_zctap_ifq_callback(struct sk_buff *skb, struct ubuf_info *uarg, } } +static struct io_zctap_ifq *io_zctap_skb_ifq(struct sk_buff *skb) +{ + struct io_zctap_ifq_priv *priv; + struct ubuf_info *uarg = skb_zcopy(skb); + + if (uarg && uarg->callback == io_zctap_ifq_callback) { + priv = container_of(uarg, struct io_zctap_ifq_priv, uarg); + return &priv->ifq; + } + return NULL; +} + static struct io_zctap_ifq *io_zctap_ifq_alloc(void) { struct io_zctap_ifq_priv *priv; @@ -457,3 +483,343 @@ void io_unregister_zctap_all(struct io_ring_ctx *ctx) for (i = 0; i < NR_ZCTAP_IFQS; i++) io_unregister_zctap_ifq(ctx, i); } + +struct zctap_read_desc { + struct iov_iter *iter; + struct ifq_region *ifr; + u32 iov_space; + u32 iov_limit; + u32 recv_limit; + + struct io_kiocb req; + u8 *buf; + size_t offset; + size_t buflen; + + struct io_zctap_ifq *ifq; + u16 copy_bgid; /* XXX move to register ifq? */ +}; + +static int __zctap_get_user_buffer(struct zctap_read_desc *ztr, int len) +{ + if (!ztr->buflen) { + ztr->req = (struct io_kiocb) { + .ctx = ztr->ifq->ctx, + .buf_index = ztr->copy_bgid, + }; + + ztr->buf = (u8 *)io_zctap_buffer(&ztr->req, &ztr->buflen); + ztr->offset = 0; + } + return len > ztr->buflen ? 
ztr->buflen : len; +} + +static int zctap_copy_data(struct zctap_read_desc *ztr, int len, u8 *kaddr) +{ + struct io_uring_zctap_iov zov; + u32 space; + int err; + + space = ztr->iov_space + sizeof(zov); + if (space > ztr->iov_limit) + return 0; + + len = __zctap_get_user_buffer(ztr, len); + if (!len) + return -ENOBUFS; + + err = copy_to_user(ztr->buf + ztr->offset, kaddr, len); + if (err) + return -EFAULT; + + zov = (struct io_uring_zctap_iov) { + .off = ztr->offset, + .len = len, + .bgid = ztr->copy_bgid, + .bid = ztr->req.buf_index, + }; + + if (copy_to_iter(&zov, sizeof(zov), ztr->iter) != sizeof(zov)) + return -EFAULT; + + ztr->offset += len; + ztr->buflen -= len; + + ztr->iov_space = space; + + return len; +} + +static int zctap_copy_frag(struct zctap_read_desc *ztr, struct page *page, + int off, int len, struct io_uring_zctap_iov *zov) +{ + u8 *kaddr; + int err; + + len = __zctap_get_user_buffer(ztr, len); + if (!len) + return -ENOBUFS; + + kaddr = kmap(page) + off; + err = copy_to_user(ztr->buf + ztr->offset, kaddr, len); + kunmap(page); + + if (err) + return -EFAULT; + + *zov = (struct io_uring_zctap_iov) { + .off = ztr->offset, + .len = len, + .bgid = ztr->copy_bgid, + .bid = ztr->req.buf_index, + }; + + ztr->offset += len; + ztr->buflen -= len; + + return len; +} + +static int zctap_recv_frag(struct zctap_read_desc *ztr, + struct io_zctap_ifq *ifq, + const skb_frag_t *frag, int off, int len) +{ + struct io_uring_zctap_iov zov; + struct page *page; + u32 space; + int pgid; + + space = ztr->iov_space + sizeof(zov); + if (space > ztr->iov_limit) + return 0; + + page = skb_frag_page(frag); + off += skb_frag_off(frag); + + if (likely(ifq == ztr->ifq && zctap_page_ours(page))) { + pgid = zctap_page_id(page); + io_zctap_get_buf_uref(ztr->ifr, pgid); + zov = (struct io_uring_zctap_iov) { + .off = off, + .len = len, + .bgid = zctap_page_region_id(page), + .bid = pgid, + }; + } else { + len = zctap_copy_frag(ztr, page, off, len, &zov); + if (len <= 0) + return len; 
+ } + + if (copy_to_iter(&zov, sizeof(zov), ztr->iter) != sizeof(zov)) + return -EFAULT; + + ztr->iov_space = space; + + return len; +} + +/* Our version of __skb_datagram_iter -- should work for UDP also. */ +static int +zctap_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct zctap_read_desc *ztr = desc->arg.data; + struct io_zctap_ifq *ifq; + unsigned start, start_off; + struct sk_buff *frag_iter; + int i, copy, end, ret = 0; + + if (ztr->iov_space >= ztr->iov_limit) { + desc->count = 0; + return 0; + } + if (len > ztr->recv_limit) + len = ztr->recv_limit; + + start = skb_headlen(skb); + start_off = offset; + + ifq = io_zctap_skb_ifq(skb); + + if (offset < start) { + copy = start - offset; + if (copy > len) + copy = len; + + /* copy out linear data */ + ret = zctap_copy_data(ztr, copy, skb->data + offset); + if (ret < 0) + goto out; + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag; + + WARN_ON(start > offset + len); + + frag = &skb_shinfo(skb)->frags[i]; + end = start + skb_frag_size(frag); + + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + ret = zctap_recv_frag(ztr, ifq, frag, + offset - start, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + + skb_walk_frags(skb, frag_iter) { + WARN_ON(start > offset + len); + + end = start + frag_iter->len; + if (offset < end) { + int off; + + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = zctap_recv_skb(desc, frag_iter, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + +out: + if (offset == start_off) + return ret; + return offset - start_off; +} + +static int __io_zctap_tcp_read(struct sock *sk, struct zctap_read_desc *zrd) 
+{ + read_descriptor_t rd_desc = { + .arg.data = zrd, + .count = 1, + }; + + return tcp_read_sock(sk, &rd_desc, zctap_recv_skb); +} + +static int io_zctap_tcp_recvmsg(struct sock *sk, struct zctap_read_desc *zrd, + int flags, int *addr_len) +{ + size_t used; + long timeo; + int ret; + + ret = used = 0; + + lock_sock(sk); + + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + while (zrd->recv_limit) { + ret = __io_zctap_tcp_read(sk, zrd); + if (ret < 0) + break; + if (!ret) { + if (used) + break; + if (sock_flag(sk, SOCK_DONE)) + break; + if (sk->sk_err) { + ret = sock_error(sk); + break; + } + if (sk->sk_shutdown & RCV_SHUTDOWN) + break; + if (sk->sk_state == TCP_CLOSE) { + ret = -ENOTCONN; + break; + } + if (!timeo) { + ret = -EAGAIN; + break; + } + if (!skb_queue_empty(&sk->sk_receive_queue)) + break; + sk_wait_data(sk, &timeo, NULL); + if (signal_pending(current)) { + ret = sock_intr_errno(timeo); + break; + } + continue; + } + zrd->recv_limit -= ret; + used += ret; + + if (!timeo) + break; + release_sock(sk); + lock_sock(sk); + + if (sk->sk_err || sk->sk_state == TCP_CLOSE || + (sk->sk_shutdown & RCV_SHUTDOWN) || + signal_pending(current)) + break; + } + + release_sock(sk); + + /* XXX, handle timestamping */ + + if (used) + return used; + + return ret; +} + +int io_zctap_recv(struct io_zctap_ifq *ifq, struct socket *sock, + struct msghdr *msg, int flags, u32 datalen, u16 copy_bgid) +{ + struct sock *sk = sock->sk; + struct zctap_read_desc zrd = { + .iov_limit = msg_data_left(msg), + .recv_limit = datalen, + .iter = &msg->msg_iter, + .ifq = ifq, + .copy_bgid = copy_bgid, + .ifr = ifq->region, + }; + const struct proto *prot; + int addr_len = 0; + int ret; + + if (flags & MSG_ERRQUEUE) + return -EOPNOTSUPP; + + prot = READ_ONCE(sk->sk_prot); + if (prot->recvmsg != tcp_recvmsg) + return -EPROTONOSUPPORT; + + sock_rps_record_flow(sk); + + ret = io_zctap_tcp_recvmsg(sk, &zrd, flags, &addr_len); + if (ret >= 0) { + msg->msg_namelen = addr_len; + ret = zrd.iov_space; 
+ } + return ret; +} diff --git a/io_uring/zctap.h b/io_uring/zctap.h index bb44f8e972e8..3232f83b5b8f 100644 --- a/io_uring/zctap.h +++ b/io_uring/zctap.h @@ -8,4 +8,9 @@ void io_unregister_zctap_all(struct io_ring_ctx *ctx); int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id); +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags); +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_zctap_recv(struct io_zctap_ifq *ifq, struct socket *sock, + struct msghdr *msg, int flags, u32 datalen, u16 copy_bgid); + #endif From patchwork Tue Oct 18 19:16:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Lemon X-Patchwork-Id: 13010973 From: Jonathan Lemon Subject: [RFC PATCH v2 13/13] io_uring: Make remove_ifq_region a delayed work call Date: Tue, 18 Oct 2022 12:16:02 -0700 Message-ID: <20221018191602.2112515-14-jonathan.lemon@gmail.com> In-Reply-To: <20221018191602.2112515-1-jonathan.lemon@gmail.com> References: <20221018191602.2112515-1-jonathan.lemon@gmail.com> X-Mailing-List: io-uring@vger.kernel.org Very much a WIP! The page backing store should not be removed until all outstanding packets are returned. The packets may be in flight, owned by the driver, or sitting in a socket buffer. This shows how the cleanup routine should check that there are no pending packets in flight before cleaning up the buffers.
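The cleanup rule described above (do not free the region while any packet still holds a kernel reference) can be modelled in a few lines of userspace C. This is a minimal, hypothetical sketch, not kernel code: `zc_region`, `zc_buf`, `zc_try_release`, `zc_demo`, `ZC_UREF` and `ZC_KREF_MASK` are illustrative stand-ins for `ifq_region`, `io_zctap_buf`, `io_remove_ifq_region_work`, `IO_ZCTAP_UREF` and `IO_ZCTAP_KREF_MASK`; C11 atomics replace `atomic_t`; and a plain retry loop replaces `schedule_delayed_work()`. It assumes the same packing as the patch: user references are added in multiples of 0x1000 and kernel references live in the low 12 bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names mirroring IO_ZCTAP_UREF / IO_ZCTAP_KREF_MASK. */
#define ZC_UREF      0x1000u           /* one user reference */
#define ZC_KREF_MASK (ZC_UREF - 1)     /* low 12 bits count kernel references */

struct zc_buf {
	atomic_uint refcount;
};

struct zc_region {
	unsigned int nr_pages;
	struct zc_buf *buf;
};

static struct zc_region *zc_region_alloc(unsigned int nr_pages)
{
	struct zc_region *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->nr_pages = nr_pages;
	r->buf = calloc(nr_pages, sizeof(*r->buf));
	if (!r->buf) {
		free(r);
		return NULL;
	}
	for (unsigned int i = 0; i < nr_pages; i++)
		atomic_init(&r->buf[i].refcount, 0);
	return r;
}

/*
 * One pass of the deferred release. If any buffer still carries a kernel
 * reference, nothing is freed and false is returned (the patch would
 * reschedule its delayed work one second later). Otherwise the region is
 * freed and true is returned. As in the patch, user references are masked
 * off and do not block the release.
 */
static bool zc_try_release(struct zc_region *r)
{
	for (unsigned int i = 0; i < r->nr_pages; i++) {
		unsigned int krefs = atomic_load(&r->buf[i].refcount) & ZC_KREF_MASK;

		if (krefs)
			return false;
	}
	free(r->buf);
	free(r);
	return true;
}

/* Demo: one packet is still held by the "driver" on the first attempt. */
static int zc_demo(void)
{
	struct zc_region *r = zc_region_alloc(4);
	int attempts = 0;

	assert(r);
	atomic_fetch_add(&r->buf[2].refcount, 1);        /* kernel ref: in flight */
	atomic_fetch_add(&r->buf[1].refcount, ZC_UREF);  /* user ref only */

	for (;;) {
		attempts++;
		if (zc_try_release(r))
			break;                           /* r is freed here */
		/* stands in for the packet being returned before the retry */
		atomic_fetch_sub(&r->buf[2].refcount, 1);
	}
	return attempts;
}
```

In this model the first attempt is refused because buffer 2 still has a kernel reference, and the second attempt succeeds; the outstanding user reference on buffer 1 does not hold up the teardown, which mirrors the masking with `IO_ZCTAP_KREF_MASK` in the hunk below.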
Signed-off-by: Jonathan Lemon --- io_uring/zctap.c | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/io_uring/zctap.c b/io_uring/zctap.c index 4a551349b600..a1525a0b0245 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -25,6 +25,7 @@ struct ifq_region { u16 id; spinlock_t freelist_lock; + struct delayed_work release_work; struct io_zctap_buf *buf; struct io_zctap_buf *freelist[]; @@ -199,24 +200,38 @@ static void io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page) io_zctap_put_buf(ifq, &ifr->buf[pgid]); } -static void io_remove_ifq_region(struct ifq_region *ifr) +static void io_remove_ifq_region_work(struct work_struct *work) { - struct io_mapped_ubuf *imu; - struct page *page; - int i; + struct ifq_region *ifr = container_of( + to_delayed_work(work), struct ifq_region, release_work); + struct io_zctap_buf *buf; + int i, refs; - imu = ifr->imu; for (i = 0; i < ifr->nr_pages; i++) { - page = imu->bvec[i].bv_page; + buf = &ifr->buf[i]; + refs = atomic_read(&buf->refcount) & IO_ZCTAP_KREF_MASK; + if (refs) { + schedule_delayed_work(&ifr->release_work, HZ); + return; + } + } - ClearPagePrivate(page); - set_page_private(page, 0); + for (i = 0; i < ifr->nr_pages; i++) { + buf = &ifr->buf[i]; + set_page_private(buf->page, 0); + ClearPagePrivate(buf->page); } kvfree(ifr->buf); kvfree(ifr); } +static void io_remove_ifq_region(struct ifq_region *ifr) +{ + INIT_DELAYED_WORK(&ifr->release_work, io_remove_ifq_region_work); + schedule_delayed_work(&ifr->release_work, 0); +} + static inline struct device * netdev2device(struct net_device *dev) { @@ -403,6 +418,8 @@ static struct io_zctap_ifq *io_zctap_ifq_alloc(void) return &priv->ifq; } +/* XXX this seems to be called too late - MM is already torn down? */ +/* need to tear down sockets, then io_uring, then MM */ static void io_zctap_ifq_free(struct io_zctap_ifq *ifq) { if (ifq->queue_id != -1)