From patchwork Fri Mar 8 23:34:06 2024
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 1/7] io_uring/net: add generic multishot retry helper
Date: Fri, 8 Mar 2024 16:34:06 -0700
Message-ID: <20240308235045.1014125-2-axboe@kernel.dk>

This is just moving io_recv_prep_retry() higher up so we can use it for
sends as well, and renaming it to be generically useful for both sends
and receives.

Signed-off-by: Jens Axboe
---
 io_uring/net.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 19451f0dbf81..97559cdec98e 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -191,6 +191,16 @@ static int io_setup_async_msg(struct io_kiocb *req,
 	return -EAGAIN;
 }
 
+static inline void io_mshot_prep_retry(struct io_kiocb *req)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+
+	req->flags &= ~REQ_F_BL_EMPTY;
+	sr->done_io = 0;
+	sr->len = 0; /* get from the provided buffer */
+	req->buf_index = sr->buf_group;
+}
+
 #ifdef CONFIG_COMPAT
 static int io_compat_msg_copy_hdr(struct io_kiocb *req,
 				  struct io_async_msghdr *iomsg,
@@ -668,16 +678,6 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static inline void io_recv_prep_retry(struct io_kiocb *req)
-{
-	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-
-	req->flags &= ~REQ_F_BL_EMPTY;
-	sr->done_io = 0;
-	sr->len = 0; /* get from the provided buffer */
-	req->buf_index = sr->buf_group;
-}
-
 /*
  * Finishes io_recv and io_recvmsg.
  *
@@ -704,7 +704,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
-		io_recv_prep_retry(req);
+		io_mshot_prep_retry(req);
 		/* Known not-empty or unknown state, retry */
 		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq < 0) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)

From patchwork Fri Mar 8 23:34:07 2024
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 2/7] io_uring/net: add provided buffer support for IORING_OP_SEND
Date: Fri, 8 Mar 2024 16:34:07 -0700
Message-ID: <20240308235045.1014125-3-axboe@kernel.dk>

It's pretty trivial to wire up provided buffer support for the send
side, just like we do on the receive side. This enables setting up a
buffer ring that an application can use to push pending sends to, and
then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would
look as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.
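As an illustrative sketch (not part of the patch): with this series
applied, the provided-buffer send flow above could be driven from
userspace roughly as below, using liburing's buffer-ring helpers. Ring
setup and error handling are elided, and the names ring, sockfd, dataA,
dataB, lenA and lenB are assumed.

	/* struct io_uring ring was set up with io_uring_queue_init() */
	struct io_uring_buf_ring *br;
	int ret;

	/* one buffer group (BGID 0) feeding outgoing sends, oldest first */
	br = io_uring_setup_buf_ring(&ring, 8, 0, 0, &ret);

	/* push dataA then dataB; the kernel hands out the head entry first */
	io_uring_buf_ring_add(br, dataA, lenA, 0, io_uring_buf_ring_mask(8), 0);
	io_uring_buf_ring_add(br, dataB, lenB, 1, io_uring_buf_ring_mask(8), 1);
	io_uring_buf_ring_advance(br, 2);

	for (int i = 0; i < 2; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		/* no buffer given here; one is picked at issue time */
		io_uring_prep_send(sqe, sockfd, NULL, 0, 0);
		sqe->flags |= IOSQE_BUFFER_SELECT;
		sqe->buf_group = 0;
	}
	io_uring_submit(&ring);

Whichever send actually runs first now picks dataA, so the on-socket
ordering no longer depends on which request completes inline and which
goes through the poll-retry path.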
It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe
---
 io_uring/net.c   | 19 ++++++++++++++++---
 io_uring/opdef.c |  1 +
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 97559cdec98e..566ef401f976 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -484,8 +484,10 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct sockaddr_storage __address;
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-	struct msghdr msg;
+	size_t len = sr->len;
 	struct socket *sock;
+	unsigned int cflags;
+	struct msghdr msg;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -518,7 +520,17 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	if (io_do_buffer_select(req)) {
+		void __user *buf;
+
+		buf = io_buffer_select(req, &len, issue_flags);
+		if (!buf)
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = len;
+	}
+
+	ret = import_ubuf(ITER_SOURCE, sr->buf, len, &msg.msg_iter);
 	if (unlikely(ret))
 		return ret;
 
@@ -550,7 +562,8 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
-	io_req_set_res(req, ret, 0);
+	cflags = io_put_kbuf(req, issue_flags);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
 
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9c080aadc5a6..88fbe5cfd379 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -273,6 +273,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_send,

From patchwork Fri Mar 8 23:34:08 2024
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 3/7] io_uring/kbuf: add helpers for getting/peeking multiple buffers
Date: Fri, 8 Mar 2024 16:34:08 -0700
Message-ID: <20240308235045.1014125-4-axboe@kernel.dk>

Our provided buffer interface only allows selection of a single buffer.
Add an API that allows getting/peeking multiple buffers at the same time.

This is only implemented for the ring provided buffers. It could be
added for the legacy provided buffers as well, but since it's strongly
encouraged to use the new interface, let's keep it simpler and just
provide it for the new API. The legacy interface will always just
select a single buffer.
There are two new main functions:

io_buffers_select(), which selects as many buffers as it can. The
caller supplies the iovec array, and io_buffers_select() may allocate
a bigger array if the 'out_len' being passed in is non-zero and bigger
than what we can fit in the provided iovec. Buffers grabbed with this
helper are permanently assigned.

io_buffers_peek(), which works like io_buffers_select(), except the
buffers can be recycled, if needed. Callers using either of these
functions should call io_put_kbufs() rather than io_put_kbuf() at
completion time. The peek interface must be called with the ctx locked
from peek to completion.

This adds a bit of state for the request:

- REQ_F_BUFFERS_COMMIT, which means that the buffers have been peeked
  and should be committed to the buffer ring head when they are put as
  part of completion. Prior to this, we used the fact that req->buf_list
  was cleared to NULL when committed. But with the peek interface
  requiring the ring to be locked throughout the operation, we can use
  that as a lookup cache instead.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |   3 +
 io_uring/kbuf.c                | 203 ++++++++++++++++++++++++++++++---
 io_uring/kbuf.h                |  39 +++++--
 3 files changed, 223 insertions(+), 22 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index e24893625085..971294dfd22e 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -481,6 +481,7 @@ enum { REQ_F_CAN_POLL_BIT, REQ_F_BL_EMPTY_BIT, REQ_F_BL_NO_RECYCLE_BIT, + REQ_F_BUFFERS_COMMIT_BIT, /* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -559,6 +560,8 @@ enum { REQ_F_BL_EMPTY = IO_REQ_FLAG(REQ_F_BL_EMPTY_BIT), /* don't recycle provided buffers for this request */ REQ_F_BL_NO_RECYCLE = IO_REQ_FLAG(REQ_F_BL_NO_RECYCLE_BIT), + /* buffer ring head needs incrementing on put */ + REQ_F_BUFFERS_COMMIT = IO_REQ_FLAG(REQ_F_BUFFERS_COMMIT_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 9be42bff936b..921e8e25e027 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -140,34 +140,57 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len, return NULL; } +static int io_provided_buffers_select(struct io_kiocb *req, size_t *len, + struct io_buffer_list *bl, + struct iovec *iov) +{ + void __user *buf; + + buf = io_provided_buffer_select(req, len, bl); + if (unlikely(!buf)) + return -ENOBUFS; + + iov[0].iov_base = buf; + iov[0].iov_len = *len; + return 0; +} + +static struct io_uring_buf *io_ring_head_to_buf(struct io_buffer_list *bl, + __u16 head) +{ + head &= bl->mask; + + /* mmaped buffers are always contig */ + if (bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) { + return &bl->buf_ring->bufs[head]; + } else { + int off = head & (IO_BUFFER_LIST_BUF_PER_PAGE - 1); + int index = head / IO_BUFFER_LIST_BUF_PER_PAGE; + struct io_uring_buf *buf; + + buf = page_address(bl->buf_pages[index]); + return buf + off; + } +} + static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, struct io_buffer_list *bl, unsigned int issue_flags) { - struct io_uring_buf_ring *br = bl->buf_ring; __u16 tail, head = bl->head; struct io_uring_buf *buf; - tail = smp_load_acquire(&br->tail); + tail = smp_load_acquire(&bl->buf_ring->tail); if (unlikely(tail == head)) return NULL; if (head + 1 == tail) req->flags |= REQ_F_BL_EMPTY; - head &= bl->mask; - /* mmaped buffers are always contig */ - if
(bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) { - buf = &br->bufs[head]; - } else { - int off = head & (IO_BUFFER_LIST_BUF_PER_PAGE - 1); - int index = head / IO_BUFFER_LIST_BUF_PER_PAGE; - buf = page_address(bl->buf_pages[index]); - buf += off; - } + buf = io_ring_head_to_buf(bl, head); if (*len == 0 || *len > buf->len) *len = buf->len; - req->flags |= REQ_F_BUFFER_RING; + req->flags |= REQ_F_BUFFER_RING | REQ_F_BUFFERS_COMMIT; req->buf_list = bl; req->buf_index = buf->bid; @@ -182,6 +205,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len, * the transfer completes (or if we get -EAGAIN and must poll of * retry). */ + req->flags &= ~REQ_F_BUFFERS_COMMIT; req->buf_list = NULL; bl->head++; } @@ -208,6 +232,159 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len, return ret; } +static int io_ring_buffers_peek(struct io_kiocb *req, struct iovec **iovs, + int nr_iovs, size_t *out_len, + struct io_buffer_list *bl) +{ + struct iovec *iov = *iovs; + __u16 nr_avail, tail, head; + struct io_uring_buf *buf; + size_t max_len = 0; + int i; + + if (*out_len) { + max_len = *out_len; + *out_len = 0; + } + + tail = smp_load_acquire(&bl->buf_ring->tail); + head = bl->head; + nr_avail = tail - head; + if (unlikely(!nr_avail)) + return -ENOBUFS; + + buf = io_ring_head_to_buf(bl, head); + if (max_len) { + int needed; + + needed = (max_len + buf->len - 1) / buf->len; + /* cap it at a reasonable 256, will be one page even for 4K */ + needed = min(needed, 256); + if (nr_avail > needed) + nr_avail = needed; + } + + if (nr_avail > UIO_MAXIOV) + nr_avail = UIO_MAXIOV; + + /* + * only alloc a bigger array if we know we have data to map, eg not + * a speculative peek operation. + */ + if (nr_iovs == UIO_FASTIOV && nr_avail > nr_iovs && max_len) { + iov = kmalloc_array(nr_avail, sizeof(struct iovec), GFP_KERNEL); + if (unlikely(!iov)) + return -ENOMEM; + nr_iovs = nr_avail; + } else if (nr_avail < nr_iovs) { + nr_iovs = nr_avail; + } + + buf = io_ring_head_to_buf(bl, head); + req->buf_index = buf->bid; + + i = 0; + while (nr_iovs--) { + void __user *ubuf; + + /* truncate end piece, if needed */ + if (max_len && buf->len > max_len) + buf->len = max_len; + + ubuf = u64_to_user_ptr(buf->addr); + if (!access_ok(ubuf, buf->len)) + break; + iov[i].iov_base = ubuf; + iov[i].iov_len = buf->len; + *out_len += buf->len; + i++; + head++; + if (max_len) { + max_len -= buf->len; + if (!max_len) + break; + } + buf = io_ring_head_to_buf(bl, head); + } + + if (head == tail) + req->flags |= REQ_F_BL_EMPTY; + + if (i) { + req->flags |= REQ_F_BUFFER_RING; + *iovs = iov; + return i; + } + + if (iov != *iovs) + kfree(iov); + *iovs = NULL; + return -EFAULT; +} + +int io_buffers_select(struct io_kiocb *req, struct iovec **iovs, int nr_iovs, + size_t *out_len, unsigned int issue_flags) +{ + struct io_ring_ctx *ctx = req->ctx; + struct io_buffer_list *bl; + int ret = -ENOENT; + + io_ring_submit_lock(ctx, issue_flags); + bl = io_buffer_get_list(ctx, req->buf_index); + if (unlikely(!bl)) + goto out_unlock; + + if (bl->is_mapped) { + ret = io_ring_buffers_peek(req, iovs, nr_iovs, out_len, bl); + /* + * Don't recycle these buffers if we need to go through poll. + * Nobody else can use them anyway, and holding on to provided + * buffers for a send/write operation would happen on the app + * side anyway with normal buffers. Besides, we already + * committed them, they cannot be put back in the queue. 
+ */ + req->buf_list = bl; + if (ret > 0) { + req->flags |= REQ_F_BL_NO_RECYCLE; + req->buf_list->head += ret; + } + } else { + ret = io_provided_buffers_select(req, out_len, bl, *iovs); + } +out_unlock: + io_ring_submit_unlock(ctx, issue_flags); + return ret; +} + +int io_buffers_peek(struct io_kiocb *req, struct iovec **iovs, int nr_iovs, + size_t *out_len) +{ + struct io_ring_ctx *ctx = req->ctx; + struct io_buffer_list *bl; + int ret; + + lockdep_assert_held(&ctx->uring_lock); + + if (req->buf_list) { + bl = req->buf_list; + } else { + bl = io_buffer_get_list(ctx, req->buf_index); + if (unlikely(!bl)) + return -ENOENT; + } + + /* don't support multiple buffer selections for legacy */ + if (!bl->is_mapped) + return io_provided_buffers_select(req, out_len, bl, *iovs); + + ret = io_ring_buffers_peek(req, iovs, nr_iovs, out_len, bl); + if (ret > 0) { + req->buf_list = bl; + req->flags |= REQ_F_BUFFERS_COMMIT; + } + return ret; +} + static __cold int io_init_bl_list(struct io_ring_ctx *ctx) { struct io_buffer_list *bl; diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 5218bfd79e87..b4f48a144b73 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -43,6 +43,10 @@ struct io_buffer { void __user *io_buffer_select(struct io_kiocb *req, size_t *len, unsigned int issue_flags); +int io_buffers_select(struct io_kiocb *req, struct iovec **iovs, int nr_iovs, + size_t *out_len, unsigned int issue_flags); +int io_buffers_peek(struct io_kiocb *req, struct iovec **iovs, int nr_iovs, + size_t *out_len); void io_destroy_buffers(struct io_ring_ctx *ctx); int io_remove_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); @@ -74,7 +78,7 @@ static inline bool io_kbuf_recycle_ring(struct io_kiocb *req) */ if (req->buf_list) { req->buf_index = req->buf_list->bgid; - req->flags &= ~REQ_F_BUFFER_RING; + req->flags &= ~(REQ_F_BUFFER_RING|REQ_F_BUFFERS_COMMIT); return true; } return false; @@ -98,11 +102,16 @@ static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags) return false; } -static inline void __io_put_kbuf_ring(struct io_kiocb *req) +static inline void __io_put_kbuf_ring(struct io_kiocb *req, int nr) { - if (req->buf_list) { - req->buf_index = req->buf_list->bgid; - req->buf_list->head++; + struct io_buffer_list *bl = req->buf_list; + + if (bl) { + if (req->flags & REQ_F_BUFFERS_COMMIT) { + bl->head += nr; + req->flags &= ~REQ_F_BUFFERS_COMMIT; + } + req->buf_index = bl->bgid; } req->flags &= ~REQ_F_BUFFER_RING; } @@ -111,7 +120,7 @@ static inline void __io_put_kbuf_list(struct io_kiocb *req, struct list_head *list) { if (req->flags & REQ_F_BUFFER_RING) { - __io_put_kbuf_ring(req); + __io_put_kbuf_ring(req, 1); } else { req->buf_index = req->kbuf->bgid; list_add(&req->kbuf->list, list); @@ -133,8 +142,8 @@ static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req) return ret; } -static inline unsigned int io_put_kbuf(struct io_kiocb *req, - unsigned issue_flags) +static inline unsigned int __io_put_kbufs(struct io_kiocb *req, int nbufs, + unsigned issue_flags) { unsigned int ret; @@ -143,9 +152,21 @@ static inline unsigned int io_put_kbuf(struct io_kiocb *req, ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT); if (req->flags & REQ_F_BUFFER_RING) - __io_put_kbuf_ring(req); + __io_put_kbuf_ring(req, nbufs); else __io_put_kbuf(req, issue_flags); return ret; } + +static inline unsigned int io_put_kbuf(struct io_kiocb *req, + unsigned issue_flags) +{ + return __io_put_kbufs(req, 1, issue_flags); +} + +static inline unsigned int 
io_put_kbufs(struct io_kiocb *req, int nbufs, + unsigned issue_flags) +{ + return __io_put_kbufs(req, nbufs, issue_flags); +} #endif
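As an illustration (not part of the patch): a hypothetical opcode
consuming these helpers might follow the pattern below, loosely modeled
on the send-side wiring added later in this series. A ring-provided
buffer group is assumed, and the surrounding socket/msghdr setup (as
done in io_send()) is elided.

	struct iovec fast_iov[UIO_FASTIOV], *iovs = fast_iov;
	size_t out_len = 0;	/* 0 == no byte cap: take what's queued */
	unsigned int cflags;
	int nr, ret;

	/* map up to UIO_FASTIOV ring buffers into the iovec array */
	nr = io_buffers_select(req, &iovs, UIO_FASTIOV, &out_len, issue_flags);
	if (unlikely(nr < 0))
		return nr;

	iov_iter_init(&msg.msg_iter, ITER_SOURCE, iovs, nr, out_len);
	ret = sock_sendmsg(sock, &msg);

	/* release all selected buffers in one go; a real user would pass
	 * only the number of buffers actually consumed on a short send */
	cflags = io_put_kbufs(req, nr, issue_flags);
	io_req_set_res(req, ret, cflags);
	if (iovs != fast_iov)
		kfree(iovs);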
From patchwork Fri Mar 8 23:34:09 2024
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 4/7] io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
Date: Fri, 8 Mar 2024 16:34:09 -0700
Message-ID: <20240308235045.1014125-5-axboe@kernel.dk>

No functional changes in this patch, just in preparation for carrying
more state than we have now, if necessary. While unifying some of this
code, add a generic send setup prep handler that they can both use.

This gets rid of some manual msghdr and sockaddr on the stack, and
makes it look a bit more like the sendmsg/recvmsg variants. We can
probably unify a bit more on top of this going forward.

Signed-off-by: Jens Axboe
---
 io_uring/net.c   | 208 ++++++++++++++++++++++++-----------------------
 io_uring/opdef.c |   1 +
 2 files changed, 109 insertions(+), 100 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c index 566ef401f976..66318fbba805 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -332,33 +332,23 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req, int io_send_prep_async(struct io_kiocb *req) { - struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); + struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); struct io_async_msghdr *io; int ret; - if (!zc->addr || req_has_async_data(req)) + if (req_has_async_data(req)) return 0; io = io_msg_alloc_async_prep(req); if (!io) return -ENOMEM; - ret = move_addr_to_kernel(zc->addr, zc->addr_len, &io->addr); - return ret; -} - -static int io_setup_async_addr(struct io_kiocb *req, - struct sockaddr_storage *addr_storage, - unsigned int issue_flags) -{ - struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); - struct io_async_msghdr *io; + memset(&io->msg, 0, sizeof(io->msg)); - if (!sr->addr || req_has_async_data(req)) - return -EAGAIN; - io = io_msg_alloc_async(req, issue_flags); - if (!io) - return -ENOMEM; - memcpy(&io->addr, addr_storage, sizeof(io->addr)); - return -EAGAIN; + ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &io->msg.msg_iter); + if (unlikely(ret)) + return ret; + if (sr->addr) + return move_addr_to_kernel(sr->addr, sr->addr_len, &io->addr); + return 0; } int io_sendmsg_prep_async(struct io_kiocb *req) @@ -480,46 +470,72 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags) return IOU_OK; } -int io_send(struct io_kiocb *req, unsigned int issue_flags) +static struct io_async_msghdr *io_send_setup(struct io_kiocb *req, + struct io_async_msghdr *stack_msg, + unsigned int issue_flags) {
- struct sockaddr_storage __address; struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); - size_t len = sr->len; - struct socket *sock; - unsigned int cflags; - struct msghdr msg; - unsigned flags; - int min_ret = 0; + struct io_async_msghdr *kmsg; int ret; - msg.msg_name = NULL; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_namelen = 0; - msg.msg_ubuf = NULL; - - if (sr->addr) { - if (req_has_async_data(req)) { - struct io_async_msghdr *io = req->async_data; + if (req_has_async_data(req)) { + kmsg = req->async_data; + } else { + kmsg = stack_msg; + kmsg->free_iov = NULL; - msg.msg_name = &io->addr; - } else { - ret = move_addr_to_kernel(sr->addr, sr->addr_len, &__address); + if (sr->addr) { + ret = move_addr_to_kernel(sr->addr, sr->addr_len, + &kmsg->addr); if (unlikely(ret < 0)) - return ret; - msg.msg_name = (struct sockaddr *)&__address; + return ERR_PTR(ret); + } + + if (!io_do_buffer_select(req)) { + ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, + &kmsg->msg.msg_iter); + if (unlikely(ret)) + return ERR_PTR(ret); } - msg.msg_namelen = sr->addr_len; } + if (sr->addr) { + kmsg->msg.msg_name = &kmsg->addr; + kmsg->msg.msg_namelen = sr->addr_len; + } else { + kmsg->msg.msg_name = NULL; + kmsg->msg.msg_namelen = 0; + } + kmsg->msg.msg_control = NULL; + kmsg->msg.msg_controllen = 0; + kmsg->msg.msg_ubuf = NULL; + if (!(req->flags & REQ_F_POLLED) && (sr->flags & IORING_RECVSEND_POLL_FIRST)) - return io_setup_async_addr(req, &__address, issue_flags); + return ERR_PTR(io_setup_async_msg(req, kmsg, issue_flags)); + + return kmsg; +} + +int io_send(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); + struct io_async_msghdr iomsg, *kmsg; + size_t len = sr->len; + struct socket *sock; + unsigned int cflags; + unsigned flags; + int min_ret = 0; + int ret; sock = sock_from_file(req->file); if (unlikely(!sock)) return -ENOTSOCK; + kmsg = io_send_setup(req, &iomsg, issue_flags); + if (IS_ERR(kmsg)) + return PTR_ERR(kmsg); + if (io_do_buffer_select(req)) { void __user *buf; @@ -528,31 +544,29 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) return -ENOBUFS; sr->buf = buf; sr->len = len; - } - ret = import_ubuf(ITER_SOURCE, sr->buf, len, &msg.msg_iter); - if (unlikely(ret)) - return ret; + ret = import_ubuf(ITER_SOURCE, sr->buf, len, &kmsg->msg.msg_iter); + if (unlikely(ret)) + return ret; + } flags = sr->msg_flags; if (issue_flags & IO_URING_F_NONBLOCK) flags |= MSG_DONTWAIT; if (flags & MSG_WAITALL) - min_ret = iov_iter_count(&msg.msg_iter); + min_ret = iov_iter_count(&kmsg->msg.msg_iter); flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; - msg.msg_flags = flags; - ret = sock_sendmsg(sock, &msg); + kmsg->msg.msg_flags = flags; + ret = sock_sendmsg(sock, &kmsg->msg); if (ret < min_ret) { if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) - return io_setup_async_addr(req, &__address, issue_flags); + return io_setup_async_msg(req, kmsg, issue_flags); if (ret > 0 && io_net_retry(sock, flags)) { - sr->len -= ret; - sr->buf += ret; sr->done_io += ret; req->flags |= REQ_F_BL_NO_RECYCLE; - return io_setup_async_addr(req, &__address, issue_flags); + return io_setup_async_msg(req, kmsg, issue_flags); } if (ret == -ERESTARTSYS) ret = -EINTR; @@ -562,6 +576,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) ret += sr->done_io; else if (sr->done_io) ret = sr->done_io; + io_req_msg_cleanup(req, kmsg, issue_flags); cflags = io_put_kbuf(req, issue_flags); io_req_set_res(req, ret, cflags); return 
IOU_OK; @@ -1165,11 +1180,35 @@ static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb, return ret; } +static int io_send_zc_import(struct io_kiocb *req, struct io_async_msghdr *kmsg) +{ + struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); + int ret; + + if (sr->flags & IORING_RECVSEND_FIXED_BUF) { + ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter, req->imu, + (u64)(uintptr_t)sr->buf, sr->len); + if (unlikely(ret)) + return ret; + kmsg->msg.sg_from_iter = io_sg_from_iter; + } else { + io_notif_set_extended(sr->notif); + ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &kmsg->msg.msg_iter); + if (unlikely(ret)) + return ret; + ret = io_notif_account_mem(sr->notif, sr->len); + if (unlikely(ret)) + return ret; + kmsg->msg.sg_from_iter = io_sg_from_iter_iovec; + } + + return ret; +} + int io_send_zc(struct io_kiocb *req, unsigned int issue_flags) { - struct sockaddr_storage __address; struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); - struct msghdr msg; + struct io_async_msghdr iomsg, *kmsg; struct socket *sock; unsigned msg_flags; int ret, min_ret = 0; @@ -1180,67 +1219,35 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags) if (!test_bit(SOCK_SUPPORT_ZC, &sock->flags)) return -EOPNOTSUPP; - msg.msg_name = NULL; - msg.msg_control = NULL; - msg.msg_controllen = 0; - msg.msg_namelen = 0; - - if (zc->addr) { - if (req_has_async_data(req)) { - struct io_async_msghdr *io = req->async_data; + kmsg = io_send_setup(req, &iomsg, issue_flags); + if (IS_ERR(kmsg)) + return PTR_ERR(kmsg); - msg.msg_name = &io->addr; - } else { - ret = move_addr_to_kernel(zc->addr, zc->addr_len, &__address); - if (unlikely(ret < 0)) - return ret; - msg.msg_name = (struct sockaddr *)&__address; - } - msg.msg_namelen = zc->addr_len; - } - - if (!(req->flags & REQ_F_POLLED) && - (zc->flags & IORING_RECVSEND_POLL_FIRST)) - return io_setup_async_addr(req, &__address, issue_flags); - - if (zc->flags & IORING_RECVSEND_FIXED_BUF) { - ret = io_import_fixed(ITER_SOURCE, &msg.msg_iter, req->imu, - (u64)(uintptr_t)zc->buf, zc->len); - if (unlikely(ret)) - return ret; - msg.sg_from_iter = io_sg_from_iter; - } else { - io_notif_set_extended(zc->notif); - ret = import_ubuf(ITER_SOURCE, zc->buf, zc->len, &msg.msg_iter); + if (!zc->done_io) { + ret = io_send_zc_import(req, kmsg); if (unlikely(ret)) return ret; - ret = io_notif_account_mem(zc->notif, zc->len); - if (unlikely(ret)) - return ret; - msg.sg_from_iter = io_sg_from_iter_iovec; } msg_flags = zc->msg_flags | MSG_ZEROCOPY; if (issue_flags & IO_URING_F_NONBLOCK) msg_flags |= MSG_DONTWAIT; if (msg_flags & MSG_WAITALL) - min_ret = iov_iter_count(&msg.msg_iter); + min_ret = iov_iter_count(&kmsg->msg.msg_iter); msg_flags &= ~MSG_INTERNAL_SENDMSG_FLAGS; - msg.msg_flags = msg_flags; - msg.msg_ubuf = &io_notif_to_data(zc->notif)->uarg; - ret = sock_sendmsg(sock, &msg); + kmsg->msg.msg_flags = msg_flags; + kmsg->msg.msg_ubuf = &io_notif_to_data(zc->notif)->uarg; + ret = sock_sendmsg(sock, &kmsg->msg); if (unlikely(ret < min_ret)) { if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) - return io_setup_async_addr(req, &__address, issue_flags); + return io_setup_async_msg(req, kmsg, issue_flags); - if (ret > 0 && io_net_retry(sock, msg.msg_flags)) { - zc->len -= ret; - zc->buf += ret; + if (ret > 0 && io_net_retry(sock, kmsg->msg.msg_flags)) { zc->done_io += ret; req->flags |= REQ_F_BL_NO_RECYCLE; - return io_setup_async_addr(req, &__address, issue_flags); + return io_setup_async_msg(req, kmsg, issue_flags); } if (ret == 
-ERESTARTSYS) ret = -EINTR; @@ -1258,6 +1265,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags) */ if (!(issue_flags & IO_URING_F_UNLOCKED)) { io_notif_flush(zc->notif); + io_netmsg_recycle(req, issue_flags); req->flags &= ~REQ_F_NEED_CLEANUP; } io_req_set_res(req, ret, IORING_CQE_F_MORE); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 88fbe5cfd379..dd932d1058f6 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -603,6 +603,7 @@ const struct io_cold_def io_cold_defs[] = { .name = "SEND", #if defined(CONFIG_NET) .async_size = sizeof(struct io_async_msghdr), + .cleanup = io_sendmsg_recvmsg_cleanup, .fail = io_sendrecv_fail, .prep_async = io_send_prep_async, #endif

From patchwork Fri Mar 8 23:34:10 2024
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 5/7] io_uring/net: support bundles for send
Date: Fri, 8 Mar 2024 16:34:10 -0700
Message-ID: <20240308235045.1014125-6-axboe@kernel.dk>

If IORING_OP_SEND is used with provided buffers, the caller may also
set IORING_RECVSEND_BUNDLE to turn it into a multi-buffer send. The
idea is that an application can fill outgoing buffers in a provided
buffer group, and then arm a single send that will service them all.
Once there are no more buffers to send, or if the requested length has
been sent, the request posts a single completion for all the buffers.

This only enables it for IORING_OP_SEND; IORING_OP_SENDMSG is coming
in a separate patch. However, this patch does do a lot of the prep work
that makes wiring up the sendmsg variant pretty trivial. They share the
prep side.

Signed-off-by: Jens Axboe
---
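As a usage sketch (not part of the patch): with liburing-style setup --
ring and buffer group 0 already established as in the earlier example,
and names like ring and sockfd assumed -- an application might arm a
bundle send as below, setting the flag by hand in sqe->ioprio, which is
the field this patch reads sr->flags from.

	/* fill several buffers into group 0 first (bid 0..n-1), then: */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

	/* len == 0 means no byte cap: send whatever buffers are queued */
	io_uring_prep_send(sqe, sockfd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = 0;
	sqe->ioprio |= IORING_RECVSEND_BUNDLE;
	io_uring_submit(&ring);

A single CQE then covers however many contiguous buffers the send
consumed.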
 include/uapi/linux/io_uring.h |   9 +++
 io_uring/net.c                | 138 +++++++++++++++++++++++++++++-----
 2 files changed, 129 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..3a0ff6da35de 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -351,11 +351,20 @@ enum io_uring_op {
  *				0 is reported if zerocopy was actually possible.
  *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
  *				(at least partially).
+ *
+ * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send will
+ *				grab as many buffers from the buffer group ID
+ *				given and send them all. The completion result
+ *				will be the number of buffers sent, with the
+ *				starting buffer ID in cqe->flags as per usual
+ *				for provided buffer usage. The buffers will be
+ *				contiguous from the starting buffer ID.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT		(1U << 1)
 #define IORING_RECVSEND_FIXED_BUF	(1U << 2)
 #define IORING_SEND_ZC_REPORT_USAGE	(1U << 3)
+#define IORING_RECVSEND_BUNDLE		(1U << 4)
 
 /*
  * cqe.res for IORING_CQE_F_NOTIF if
diff --git a/io_uring/net.c b/io_uring/net.c index 66318fbba805..0c4273005a68 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -370,6 +370,8 @@ void io_sendmsg_recvmsg_cleanup(struct io_kiocb *req) kfree(io->free_iov); } +#define SENDMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_BUNDLE) + int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); @@ -388,11 +390,20 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr)); sr->len = READ_ONCE(sqe->len); sr->flags = READ_ONCE(sqe->ioprio); - if (sr->flags & ~IORING_RECVSEND_POLL_FIRST) + if (sr->flags & ~SENDMSG_FLAGS) return -EINVAL; sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL; if (sr->msg_flags & MSG_DONTWAIT) req->flags |= REQ_F_NOWAIT; + if (sr->flags & IORING_RECVSEND_BUNDLE) { + if (req->opcode == IORING_OP_SENDMSG) + return -EINVAL; + if (!(req->flags & REQ_F_BUFFER_SELECT)) + return -EINVAL; + sr->msg_flags |= MSG_WAITALL; + sr->buf_group = req->buf_index; + req->buf_list = NULL; + } #ifdef CONFIG_COMPAT if (req->ctx->compat) @@ -412,6 +423,84 @@ static void io_req_msg_cleanup(struct io_kiocb *req, io_netmsg_recycle(req, issue_flags); } +/* + * For bundle completions, we need to figure out how many segments we consumed. + * A bundle could be using a single ITER_UBUF if that's all we mapped, or it + * could be using an ITER_IOVEC. If the latter, and we consumed all of + * the segments, then it's a trivial question to answer. If we have residual + * data in the iter, then loop the segments to figure out how much we + * transferred. + */ +static int io_bundle_nbufs(struct io_async_msghdr *kmsg, int ret) +{ + struct iovec *iov; + int nbufs; + + /* no data is always zero segments, and a ubuf is always 1 segment */ + if (ret <= 0) + return 0; + if (iter_is_ubuf(&kmsg->msg.msg_iter)) + return 1; + + iov = kmsg->free_iov; + if (!iov) + iov = kmsg->fast_iov; + + /* if all data was transferred, it's basic pointer math */ + if (!iov_iter_count(&kmsg->msg.msg_iter)) + return iter_iov(&kmsg->msg.msg_iter) - iov; + + /* short transfer, count segments */ + nbufs = 0; + do { + int this_len = min_t(int, iov[nbufs].iov_len, ret); + + nbufs++; + ret -= this_len; + } while (ret); + + return nbufs; +} + +static inline bool io_send_finish(struct io_kiocb *req, int *ret, + struct io_async_msghdr *kmsg, + unsigned issue_flags) +{ + struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); + bool bundle_finished = *ret <= 0; + unsigned int cflags; + + if (!(sr->flags & IORING_RECVSEND_BUNDLE)) { + cflags = io_put_kbuf(req, issue_flags); + goto finish; + } + + cflags = io_put_kbufs(req, io_bundle_nbufs(kmsg, *ret), issue_flags); + + if (bundle_finished || req->flags & REQ_F_BL_EMPTY) + goto finish; + + /* + * Fill CQE for this send and see if we should keep trying to + * send from this socket. + */ + if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER, + *ret, cflags | IORING_CQE_F_MORE)) { + io_mshot_prep_retry(req); + if (kmsg->free_iov) { + kfree(kmsg->free_iov); + kmsg->free_iov = NULL; + } + return false; + } + + /* Otherwise stop bundle and use the current result.
*/ +finish: + io_req_set_res(req, *ret, cflags); + *ret = IOU_OK; + return true; +} + int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags) { struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); @@ -521,9 +610,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) { struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg); struct io_async_msghdr iomsg, *kmsg; - size_t len = sr->len; struct socket *sock; - unsigned int cflags; unsigned flags; int min_ret = 0; int ret; @@ -536,24 +623,37 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags) if (IS_ERR(kmsg)) return PTR_ERR(kmsg); + flags = sr->msg_flags; + if (issue_flags & IO_URING_F_NONBLOCK) + flags |= MSG_DONTWAIT; + +retry_bundle: if (io_do_buffer_select(req)) { - void __user *buf; + size_t len = min_not_zero(sr->len, (unsigned) INT_MAX); + int max_segs = ARRAY_SIZE(kmsg->fast_iov); - buf = io_buffer_select(req, &len, issue_flags); - if (!buf) - return -ENOBUFS; - sr->buf = buf; - sr->len = len; + if (!(sr->flags & IORING_RECVSEND_BUNDLE)) + max_segs = 1; - ret = import_ubuf(ITER_SOURCE, sr->buf, len, &kmsg->msg.msg_iter); - if (unlikely(ret)) + kmsg->free_iov = kmsg->fast_iov; + ret = io_buffers_select(req, &kmsg->free_iov, max_segs, &len, + issue_flags); + if (unlikely(ret < 0)) return ret; + + sr->len = len; + iov_iter_init(&kmsg->msg.msg_iter, ITER_SOURCE, kmsg->free_iov, + ret, len); + if (kmsg->free_iov == kmsg->fast_iov) + kmsg->free_iov = NULL; } - flags = sr->msg_flags; - if (issue_flags & IO_URING_F_NONBLOCK) - flags |= MSG_DONTWAIT; - if (flags & MSG_WAITALL) + /* + * If MSG_WAITALL is set, or this is a bundle send, then we need + * the full amount. If just bundle is set, if we do a short send + * then we complete the bundle sequence rather than continue on. 
From patchwork Fri Mar 8 23:34:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13587439
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 6/7] io_uring/net: switch io_recv() to using io_async_msghdr
Date: Fri, 8 Mar 2024 16:34:11 -0700
Message-ID: <20240308235045.1014125-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240308235045.1014125-1-axboe@kernel.dk>
References: <20240308235045.1014125-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

No functional changes in this patch, just in preparation for carrying
more state than we have now, if necessary.

Signed-off-by: Jens Axboe
---
 io_uring/net.c   | 78 +++++++++++++++++++++++++++++-------------------
 io_uring/net.h   |  2 +-
 io_uring/opdef.c |  7 +++--
 3 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 0c4273005a68..07831e764068 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -330,7 +330,7 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req,
 	return ret;
 }
 
-int io_send_prep_async(struct io_kiocb *req)
+int io_sendrecv_prep_async(struct io_kiocb *req)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr *io;
@@ -815,13 +815,13 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	 * again (for multishot).
 	 */
 static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
-				  struct msghdr *msg, bool mshot_finished,
-				  unsigned issue_flags)
+				  struct io_async_msghdr *kmsg,
+				  bool mshot_finished, unsigned issue_flags)
 {
 	unsigned int cflags;
 
 	cflags = io_put_kbuf(req, issue_flags);
-	if (msg->msg_inq > 0)
+	if (kmsg->msg.msg_inq > 0)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
 	/*
@@ -836,7 +836,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 		io_mshot_prep_retry(req);
 		/* Known not-empty or unknown state, retry */
-		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq < 0) {
+		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
 				return false;
 			/* mshot retries exceeded, force a requeue */
@@ -1037,7 +1037,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	else
 		io_kbuf_recycle(req, issue_flags);
 
-	if (!io_recv_finish(req, &ret, &kmsg->msg, mshot_finished, issue_flags))
+	if (!io_recv_finish(req, &ret, kmsg, mshot_finished, issue_flags))
 		goto retry_multishot;
 
 	if (mshot_finished)
@@ -1051,29 +1051,42 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-	struct msghdr msg;
+	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
 	unsigned flags;
 	int ret, min_ret = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	size_t len = sr->len;
 
+	if (req_has_async_data(req)) {
+		kmsg = req->async_data;
+	} else {
+		kmsg = &iomsg;
+		kmsg->free_iov = NULL;
+		kmsg->msg.msg_name = NULL;
+		kmsg->msg.msg_namelen = 0;
+		kmsg->msg.msg_control = NULL;
+		kmsg->msg.msg_get_inq = 1;
+		kmsg->msg.msg_controllen = 0;
+		kmsg->msg.msg_iocb = NULL;
+		kmsg->msg.msg_ubuf = NULL;
+
+		if (!io_do_buffer_select(req)) {
+			ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
+					  &kmsg->msg.msg_iter);
+			if (unlikely(ret))
+				return ret;
+		}
+	}
+
 	if (!(req->flags & REQ_F_POLLED) &&
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
-		return -EAGAIN;
+		return io_setup_async_msg(req, kmsg, issue_flags);
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	msg.msg_name = NULL;
-	msg.msg_namelen = 0;
-	msg.msg_control = NULL;
-	msg.msg_get_inq = 1;
-	msg.msg_controllen = 0;
-	msg.msg_iocb = NULL;
-	msg.msg_ubuf = NULL;
-
 	flags = sr->msg_flags;
 	if (force_nonblock)
 		flags |= MSG_DONTWAIT;
@@ -1087,22 +1100,23 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 			return -ENOBUFS;
 		sr->buf = buf;
 		sr->len = len;
+		ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
+				  &kmsg->msg.msg_iter);
+		if (unlikely(ret))
+			goto out_free;
 	}
 
-	ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
-	if (unlikely(ret))
-		goto out_free;
-
-	msg.msg_inq = -1;
-	msg.msg_flags = 0;
+	kmsg->msg.msg_inq = -1;
+	kmsg->msg.msg_flags = 0;
 
 	if (flags & MSG_WAITALL)
-		min_ret = iov_iter_count(&msg.msg_iter);
+		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
 
-	ret = sock_recvmsg(sock, &msg, flags);
+	ret = sock_recvmsg(sock, &kmsg->msg, flags);
 	if (ret < min_ret) {
 		if (ret == -EAGAIN && force_nonblock) {
-			if (issue_flags & IO_URING_F_MULTISHOT) {
+			ret = io_setup_async_msg(req, kmsg, issue_flags);
+			if (ret == -EAGAIN && issue_flags & IO_URING_F_MULTISHOT) {
 				io_kbuf_recycle(req, issue_flags);
 				return IOU_ISSUE_SKIP_COMPLETE;
 			}
@@ -1110,16 +1124,14 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 			return -EAGAIN;
 		}
 		if (ret > 0 && io_net_retry(sock, flags)) {
-			sr->len -= ret;
-			sr->buf += ret;
 			sr->done_io += ret;
 			req->flags |= REQ_F_BL_NO_RECYCLE;
-			return -EAGAIN;
+			return io_setup_async_msg(req, kmsg, issue_flags);
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		req_set_fail(req);
-	} else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
+	} else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
 out_free:
 		req_set_fail(req);
 	}
@@ -1131,9 +1143,15 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	else
 		io_kbuf_recycle(req, issue_flags);
 
-	if (!io_recv_finish(req, &ret, &msg, ret <= 0, issue_flags))
+	if (!io_recv_finish(req, &ret, kmsg, ret <= 0, issue_flags)) {
+		if (kmsg->free_iov) {
+			kfree(kmsg->free_iov);
+			kmsg->free_iov = NULL;
+		}
 		goto retry_multishot;
+	}
 
+	io_req_msg_cleanup(req, kmsg, issue_flags);
 	return ret;
 }
 
diff --git a/io_uring/net.h b/io_uring/net.h
index 191009979bcb..5c1230f1aaf9 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -40,7 +40,7 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_send(struct io_kiocb *req, unsigned int issue_flags);
-int io_send_prep_async(struct io_kiocb *req);
+int io_sendrecv_prep_async(struct io_kiocb *req);
 
 int io_recvmsg_prep_async(struct io_kiocb *req);
 int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index dd932d1058f6..352f743d6a69 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -605,13 +605,16 @@ const struct io_cold_def io_cold_defs[] = {
 		.async_size		= sizeof(struct io_async_msghdr),
 		.cleanup		= io_sendmsg_recvmsg_cleanup,
 		.fail			= io_sendrecv_fail,
-		.prep_async		= io_send_prep_async,
+		.prep_async		= io_sendrecv_prep_async,
 #endif
 	},
 	[IORING_OP_RECV] = {
 		.name			= "RECV",
 #if defined(CONFIG_NET)
+		.async_size		= sizeof(struct io_async_msghdr),
+		.cleanup		= io_sendmsg_recvmsg_cleanup,
 		.fail			= io_sendrecv_fail,
+		.prep_async		= io_sendrecv_prep_async,
 #endif
 	},
 	[IORING_OP_OPENAT2] = {
@@ -688,7 +691,7 @@ const struct io_cold_def io_cold_defs[] = {
 		.name			= "SEND_ZC",
#if defined(CONFIG_NET)
 		.async_size		= sizeof(struct io_async_msghdr),
-		.prep_async		= io_send_prep_async,
+		.prep_async		= io_sendrecv_prep_async,
 		.cleanup		= io_send_zc_cleanup,
 		.fail			= io_sendrecv_fail,
 #endif
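[Before the final patch, a brief illustration of why IORING_OP_RECV now
declares async_size, cleanup and prep_async in opdef.c: once io_recv()
carries an io_async_msghdr, a nonblocking attempt that hits -EAGAIN must
copy that on-stack state somewhere the retry (e.g. from io-wq) can find
it. A userspace sketch of that general pattern, with invented names, not
the kernel's actual io_setup_async_msg():

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* hypothetical request carrying optional async state across retries */
struct request {
	struct iov_state *async_data;	/* NULL until the first -EAGAIN */
};

struct iov_state {
	struct iovec iov;	/* where the transfer left off */
	size_t done;		/* bytes already transferred */
};

/* on -EAGAIN, preserve the on-stack state so a later retry can resume */
static int setup_async(struct request *req, const struct iov_state *stack)
{
	if (!req->async_data) {
		req->async_data = malloc(sizeof(*req->async_data));
		if (!req->async_data)
			return -ENOMEM;
		memcpy(req->async_data, stack, sizeof(*stack));
	}
	return -EAGAIN;
}
]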
From patchwork Fri Mar 8 23:34:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13587440
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, dyudaken@gmail.com, dw@davidwei.uk, Jens Axboe
Subject: [PATCH 7/7] io_uring/net: support bundles for recv
Date: Fri, 8 Mar 2024 16:34:12 -0700
Message-ID: <20240308235045.1014125-8-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240308235045.1014125-1-axboe@kernel.dk>
References: <20240308235045.1014125-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

If IORING_OP_RECV is used with provided buffers, the caller may also set
IORING_RECVSEND_BUNDLE to turn it into a multi-buffer recv. This grabs
whatever buffers are available and receives into them, posting a single
completion for all of it. It can be used with or without multishot
receive.

Now that both send and receive support bundles, add a feature flag for
it as well. If IORING_FEAT_RECVSEND_BUNDLE is set after registering the
ring, then the kernel supports bundles for recv and send.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |  15 +++--
 io_uring/io_uring.c           |   3 +-
 io_uring/net.c                | 119 ++++++++++++++++++++++++++--------
 3 files changed, 101 insertions(+), 36 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 3a0ff6da35de..9cf6c45149dd 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -352,13 +352,13 @@ enum io_uring_op {
  *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
  *				(at least partially).
  *
- * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send will
- *				grab as many buffers from the buffer group ID
- *				given and send them all. The completion result
- *				will be the number of buffers send, with the
- *				starting buffer ID in cqe->flags as per usual
- *				for provided buffer usage. The buffers will be
- *				contigious from the starting buffer ID.
+ * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send or
+ *				recv will grab as many buffers from the buffer
+ *				group ID given and send them all. The completion
+ *				result will be the number of buffers sent, with
+ *				the starting buffer ID in cqe->flags as per
+ *				usual for provided buffer usage. The buffers
+ *				will be contiguous from the starting buffer ID.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT	(1U << 1)
@@ -531,6 +531,7 @@ struct io_uring_params {
 #define IORING_FEAT_CQE_SKIP		(1U << 11)
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
 #define IORING_FEAT_REG_REG_RING	(1U << 13)
+#define IORING_FEAT_RECVSEND_BUNDLE	(1U << 14)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf348c33f485..112c21053e6f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3982,7 +3982,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
-			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
+			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
+			IORING_FEAT_RECVSEND_BUNDLE;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
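[A short sketch of how an application would probe for the new feature bit
once this lands, using liburing's io_uring_queue_init_params(); the printed
strings are just illustrative:

#include <liburing.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct io_uring_params p;
	struct io_uring ring;

	memset(&p, 0, sizeof(p));
	if (io_uring_queue_init_params(8, &ring, &p) < 0)
		return 1;

	/* the kernel fills p.features at ring setup time */
	if (p.features & IORING_FEAT_RECVSEND_BUNDLE)
		printf("send/recv bundles supported\n");
	else
		printf("no bundle support, use one buffer per op\n");

	io_uring_queue_exit(&ring);
	return 0;
}
]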
diff --git a/io_uring/net.c b/io_uring/net.c
index 07831e764068..c671ecb5b849 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -760,7 +760,8 @@ int io_recvmsg_prep_async(struct io_kiocb *req)
 	return ret;
 }
 
-#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)
+#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT | \
+			IORING_RECVSEND_BUNDLE)
 
 int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -774,21 +775,14 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	sr->len = READ_ONCE(sqe->len);
 	sr->flags = READ_ONCE(sqe->ioprio);
-	if (sr->flags & ~(RECVMSG_FLAGS))
+	if (sr->flags & ~RECVMSG_FLAGS)
 		return -EINVAL;
 	sr->msg_flags = READ_ONCE(sqe->msg_flags);
 	if (sr->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	if (sr->msg_flags & MSG_ERRQUEUE)
 		req->flags |= REQ_F_CLEAR_POLLIN;
-	if (sr->flags & IORING_RECV_MULTISHOT) {
-		if (!(req->flags & REQ_F_BUFFER_SELECT))
-			return -EINVAL;
-		if (sr->msg_flags & MSG_WAITALL)
-			return -EINVAL;
-		if (req->opcode == IORING_OP_RECV && sr->len)
-			return -EINVAL;
-		req->flags |= REQ_F_APOLL_MULTISHOT;
+	if (req->flags & REQ_F_BUFFER_SELECT) {
 		/*
 		 * Store the buffer group for this multishot receive separately,
 		 * as if we end up doing an io-wq based issue that selects a
@@ -798,6 +792,20 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		 * restore it.
 		 */
 		sr->buf_group = req->buf_index;
+		req->buf_list = NULL;
+	}
+	if (sr->flags & IORING_RECV_MULTISHOT) {
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		if (sr->msg_flags & MSG_WAITALL)
+			return -EINVAL;
+		if (req->opcode == IORING_OP_RECV && sr->len)
+			return -EINVAL;
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+	}
+	if (sr->flags & IORING_RECVSEND_BUNDLE) {
+		if (req->opcode == IORING_OP_RECVMSG)
+			return -EINVAL;
 	}
 
 #ifdef CONFIG_COMPAT
@@ -818,12 +826,22 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 				  struct io_async_msghdr *kmsg,
 				  bool mshot_finished, unsigned issue_flags)
 {
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	unsigned int cflags;
 
-	cflags = io_put_kbuf(req, issue_flags);
+	if (sr->flags & IORING_RECVSEND_BUNDLE)
+		cflags = io_put_kbufs(req, io_bundle_nbufs(kmsg, *ret),
+				      issue_flags);
+	else
+		cflags = io_put_kbuf(req, issue_flags);
+
 	if (kmsg->msg.msg_inq > 0)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
+	/* bundle with no more immediate buffers, we're done */
+	if (sr->flags & IORING_RECVSEND_BUNDLE && req->flags & REQ_F_BL_EMPTY)
+		goto finish;
+
 	/*
 	 * Fill CQE for this receive and see if we should keep trying to
 	 * receive from this socket.
@@ -831,14 +849,18 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 	if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
 	    io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
 				*ret, cflags | IORING_CQE_F_MORE)) {
-		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
 		io_mshot_prep_retry(req);
 		/* Known not-empty or unknown state, retry */
 		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
-			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
+			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY) {
+				if (kmsg->free_iov) {
+					kfree(kmsg->free_iov);
+					kmsg->free_iov = NULL;
+				}
 				return false;
+			}
 			/* mshot retries exceeded, force a requeue */
 			sr->nr_multishot_loops = 0;
 			mshot_retry_ret = IOU_REQUEUE;
@@ -851,6 +873,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 	}
 
 	/* Finish the request / stop multishot. */
+finish:
 	io_req_set_res(req, *ret, cflags);
 
 	if (issue_flags & IO_URING_F_MULTISHOT)
@@ -1048,6 +1071,58 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg,
+			      size_t *len, unsigned int issue_flags)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+	int ret;
+
+	/*
+	 * If the ring isn't locked, then don't use the peek interface
+	 * to grab multiple buffers as we will lock/unlock between
+	 * this selection and posting the buffers.
+	 */
+	if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+	    sr->flags & IORING_RECVSEND_BUNDLE) {
+		struct iovec *iov = kmsg->fast_iov;
+
+		*len = 0;
+		if (kmsg->msg.msg_inq > 0) {
+			*len = kmsg->msg.msg_inq;
+			if (sr->len && *len > sr->len)
+				*len = sr->len;
+		}
+		ret = io_buffers_peek(req, &iov, ARRAY_SIZE(kmsg->fast_iov), len);
+		if (unlikely(ret < 0))
+			return ret;
+
+		if (ret == 1) {
+			sr->buf = iov->iov_base;
+			sr->len = iov->iov_len;
+			goto ubuf;
+		}
+		iov_iter_init(&kmsg->msg.msg_iter, ITER_DEST, iov, ret, *len);
+		if (iov != kmsg->fast_iov)
+			kmsg->free_iov = iov;
+	} else {
+		void __user *buf;
+
+		*len = sr->len;
+		buf = io_buffer_select(req, len, issue_flags);
+		if (!buf)
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = *len;
+ubuf:
+		ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
+				  &kmsg->msg.msg_iter);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
 int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -1093,17 +1168,10 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 
 retry_multishot:
 	if (io_do_buffer_select(req)) {
-		void __user *buf;
-
-		buf = io_buffer_select(req, &len, issue_flags);
-		if (!buf)
-			return -ENOBUFS;
-		sr->buf = buf;
-		sr->len = len;
-		ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
-				  &kmsg->msg.msg_iter);
+		ret = io_recv_buf_select(req, kmsg, &len, issue_flags);
 		if (unlikely(ret))
 			goto out_free;
+		sr->buf = NULL;
 	}
 
 	kmsg->msg.msg_inq = -1;
@@ -1143,13 +1211,8 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	else
 		io_kbuf_recycle(req, issue_flags);
 
-	if (!io_recv_finish(req, &ret, kmsg, ret <= 0, issue_flags)) {
-		if (kmsg->free_iov) {
-			kfree(kmsg->free_iov);
-			kmsg->free_iov = NULL;
-		}
+	if (!io_recv_finish(req, &ret, kmsg, ret <= 0, issue_flags))
 		goto retry_multishot;
-	}
 
 	io_req_msg_cleanup(req, kmsg, issue_flags);
 	return ret;
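[Closing the series out from the application side: a sketch of a bundled
receive using liburing's provided-buffer-ring helpers. It assumes liburing
2.4+ for io_uring_setup_buf_ring() and headers that carry
IORING_RECVSEND_BUNDLE; the group ID, buffer count and sizes are arbitrary
choices, and error handling is trimmed:

#include <liburing.h>

#define BUFS	8	/* must be a power of two for a buffer ring */
#define BUF_SZ	4096
#define BGID	0

static int setup_and_recv_bundle(struct io_uring *ring, int sockfd)
{
	static char bufs[BUFS][BUF_SZ];	/* must outlive the submission */
	struct io_uring_buf_ring *br;
	struct io_uring_sqe *sqe;
	int i, ret;

	/* register a ring of provided buffers as group BGID */
	br = io_uring_setup_buf_ring(ring, BUFS, BGID, 0, &ret);
	if (!br)
		return ret;
	for (i = 0; i < BUFS; i++)
		io_uring_buf_ring_add(br, bufs[i], BUF_SZ, i,
				      io_uring_buf_ring_mask(BUFS), i);
	io_uring_buf_ring_advance(br, BUFS);

	/* bundle recv: the kernel may fill several buffers in one op */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = BGID;
	sqe->ioprio = IORING_RECVSEND_BUNDLE;
	return io_uring_submit(ring);
}

On completion, cqe->res holds the total bytes received across the bundle,
and the starting buffer ID comes from cqe->flags >> IORING_CQE_BUFFER_SHIFT,
with the consumed buffers contiguous from there.]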