From patchwork Sat Apr 20 13:29:43 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13637149
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/5] io_uring/net: add generic multishot retry helper
Date: Sat, 20 Apr 2024 07:29:43 -0600
Message-ID: <20240420133233.500590-3-axboe@kernel.dk>
In-Reply-To: <20240420133233.500590-2-axboe@kernel.dk>
References: <20240420133233.500590-2-axboe@kernel.dk>

This just moves io_recv_prep_retry() higher up so it can be used for
sends as well, and renames it to be generically useful for both sends
and receives.

Signed-off-by: Jens Axboe
---
 io_uring/net.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index a1da8a2ebf15..dc310f0bfe4c 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -185,6 +185,17 @@ static int io_net_vec_assign(struct io_kiocb *req, struct io_async_msghdr *kmsg,
 	return 0;
 }
 
+static inline void io_mshot_prep_retry(struct io_kiocb *req,
+				       struct io_async_msghdr *kmsg)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+
+	req->flags &= ~REQ_F_BL_EMPTY;
+	sr->done_io = 0;
+	sr->len = 0; /* get from the provided buffer */
+	req->buf_index = sr->buf_group;
+}
+
 #ifdef CONFIG_COMPAT
 static int io_compat_msg_copy_hdr(struct io_kiocb *req,
 				  struct io_async_msghdr *iomsg,
@@ -658,17 +669,6 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_recvmsg_prep_setup(req);
 }
 
-static inline void io_recv_prep_retry(struct io_kiocb *req,
-				      struct io_async_msghdr *kmsg)
-{
-	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-
-	req->flags &= ~REQ_F_BL_EMPTY;
-	sr->done_io = 0;
-	sr->len = 0; /* get from the provided buffer */
-	req->buf_index = sr->buf_group;
-}
-
 /*
  * Finishes io_recv and io_recvmsg.
  *
@@ -694,7 +694,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
-		io_recv_prep_retry(req, kmsg);
+		io_mshot_prep_retry(req, kmsg);
 		/* Known not-empty or unknown state, retry */
 		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)

From patchwork Sat Apr 20 13:29:44 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13637150
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/5] io_uring/net: add provided buffer support for IORING_OP_SEND
Date: Sat, 20 Apr 2024 07:29:44 -0600
Message-ID: <20240420133233.500590-4-axboe@kernel.dk>
In-Reply-To: <20240420133233.500590-2-axboe@kernel.dk>
References: <20240420133233.500590-2-axboe@kernel.dk>

It's pretty trivial to wire up provided buffer support for the send
side, just like it's done on the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything
is fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe
---
 io_uring/net.c   | 25 ++++++++++++++++++++-----
 io_uring/opdef.c |  1 +
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index dc310f0bfe4c..13685d133582 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -364,10 +364,12 @@ static int io_send_setup(struct io_kiocb *req)
 		kmsg->msg.msg_name = &kmsg->addr;
 		kmsg->msg.msg_namelen = sr->addr_len;
 	}
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &kmsg->msg.msg_iter);
-	if (unlikely(ret < 0))
-		return ret;
-
+	if (!io_do_buffer_select(req)) {
+		ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len,
+				  &kmsg->msg.msg_iter);
+		if (unlikely(ret < 0))
+			return ret;
+	}
 	return 0;
 }
 
@@ -480,6 +482,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr *kmsg = req->async_data;
 	struct socket *sock;
+	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -492,6 +495,17 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
 
+	if (io_do_buffer_select(req)) {
+		size_t len = sr->len;
+		void __user *buf;
+
+		buf = io_buffer_select(req, &len, issue_flags);
+		if (unlikely(!buf))
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = len;
+	}
+
 	flags = sr->msg_flags;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -521,7 +535,8 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	else if (sr->done_io)
 		ret = sr->done_io;
 	io_req_msg_cleanup(req, issue_flags);
-	io_req_set_res(req, ret, 0);
+	cflags = io_put_kbuf(req, issue_flags);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
 
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index a16f73938ebb..2de5cca9504e 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -281,6 +281,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.pollout		= 1,
 		.audit_skip		= 1,
 		.ioprio			= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_NET)
 		.async_size		= sizeof(struct io_async_msghdr),
 		.prep			= io_sendmsg_prep,
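
To make the ordering fix concrete, here is a minimal userspace sketch of
the interface this patch enables. It is not part of the patch: it assumes
liburing >= 2.4 for io_uring_setup_buf_ring(), a kernel with this series
applied, and a connected socket in sock_fd; queue_send() and
send_in_order() are invented names, and error handling is elided.

#include <liburing.h>

#define BGID	7
#define BUFSZ	4096

static void queue_send(struct io_uring *ring, int sock_fd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	/* addr/len are taken from the selected buffer, not the SQE */
	io_uring_prep_send(sqe, sock_fd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = BGID;
}

int send_in_order(struct io_uring *ring, int sock_fd, char bufs[2][BUFSZ])
{
	struct io_uring_buf_ring *br;
	int err;

	br = io_uring_setup_buf_ring(ring, 2, BGID, 0, &err);
	if (!br)
		return err;

	/* stage dataA then dataB; sends consume them head-first */
	io_uring_buf_ring_add(br, bufs[0], BUFSZ, 0, io_uring_buf_ring_mask(2), 0);
	io_uring_buf_ring_add(br, bufs[1], BUFSZ, 1, io_uring_buf_ring_mask(2), 1);
	io_uring_buf_ring_advance(br, 2);

	queue_send(ring, sock_fd);
	queue_send(ring, sock_fd);
	return io_uring_submit(ring);
}

Each completion then reports the buffer it actually consumed via
IORING_CQE_F_BUFFER and the buffer ID in cqe->flags, so the data leaves
the socket in ring order regardless of how retries interleave.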

From patchwork Sat Apr 20 13:29:45 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13637151
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/5] io_uring/kbuf: add helpers for getting/peeking multiple buffers
Date: Sat, 20 Apr 2024 07:29:45 -0600
Message-ID: <20240420133233.500590-5-axboe@kernel.dk>
In-Reply-To: <20240420133233.500590-2-axboe@kernel.dk>
References: <20240420133233.500590-2-axboe@kernel.dk>

Our provided buffer interface only allows selection of a single buffer.
Add an API that allows getting/peeking multiple buffers at the same time.

This is only implemented for the ring provided buffers. It could be
added for the legacy provided buffers as well, but since it's strongly
encouraged to use the new interface, let's keep it simpler and just
provide it for the new API. The legacy interface will always just select
a single buffer.

There are two new main functions:

io_buffers_select(), which selects as many buffers as it can. The caller
supplies the iovec array, and io_buffers_select() may allocate a bigger
array if the 'max_len' being passed in is non-zero and bigger than what
fits in the provided iovec. Buffers grabbed with this helper are
permanently assigned.

io_buffers_peek(), which works like io_buffers_select(), except the
buffers it grabs can be recycled, if needed.

Callers using either of these functions should call io_put_kbufs()
rather than io_put_kbuf() at completion time. The peek interface must be
called with the ctx locked from peek to completion.

This adds a bit of state for the request:

- REQ_F_BUFFERS_COMMIT, which means that the buffers have been peeked
  and should be committed to the buffer ring head when they are put as
  part of completion. Prior to this, req->buf_list was cleared to NULL
  when committed.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |   3 +
 io_uring/kbuf.c                | 157 ++++++++++++++++++++++++++++++++-
 io_uring/kbuf.h                |  53 +++++++++--
 3 files changed, 201 insertions(+), 12 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c47f412cf18e..7a6b190c7da7 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -472,6 +472,7 @@ enum {
 	REQ_F_CAN_POLL_BIT,
 	REQ_F_BL_EMPTY_BIT,
 	REQ_F_BL_NO_RECYCLE_BIT,
+	REQ_F_BUFFERS_COMMIT_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -550,6 +551,8 @@ enum {
 	REQ_F_BL_EMPTY		= IO_REQ_FLAG(REQ_F_BL_EMPTY_BIT),
 	/* don't recycle provided buffers for this request */
 	REQ_F_BL_NO_RECYCLE	= IO_REQ_FLAG(REQ_F_BL_NO_RECYCLE_BIT),
+	/* buffer ring head needs incrementing on put */
+	REQ_F_BUFFERS_COMMIT	= IO_REQ_FLAG(REQ_F_BUFFERS_COMMIT_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 3846a055df44..d2945c9c812b 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -117,6 +117,27 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 	return NULL;
 }
 
+static int io_provided_buffers_select(struct io_kiocb *req, size_t *len,
+				      struct io_buffer_list *bl,
+				      struct iovec *iov)
+{
+	void __user *buf;
+
+	buf = io_provided_buffer_select(req, len, bl);
+	if (unlikely(!buf))
+		return -ENOBUFS;
+
+	iov[0].iov_base = buf;
+	iov[0].iov_len = *len;
+	return 0;
+}
+
+static struct io_uring_buf *io_ring_head_to_buf(struct io_uring_buf_ring *br,
+						__u16 head, __u16 mask)
+{
+	return &br->bufs[head & mask];
+}
+
 static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 					  struct io_buffer_list *bl,
 					  unsigned int issue_flags)
@@ -132,11 +153,10 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 		if (head + 1 == tail)
 			req->flags |= REQ_F_BL_EMPTY;
 
-		head &= bl->mask;
-		buf = &br->bufs[head];
+		buf = io_ring_head_to_buf(br, head, bl->mask);
 		if (*len == 0 || *len > buf->len)
 			*len = buf->len;
-		req->flags |= REQ_F_BUFFER_RING;
+		req->flags |= REQ_F_BUFFER_RING | REQ_F_BUFFERS_COMMIT;
 		req->buf_list = bl;
 		req->buf_index = buf->bid;
@@ -151,6 +171,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 			 * the transfer completes (or if we get -EAGAIN and must poll of
 			 * retry).
 			 */
+			req->flags &= ~REQ_F_BUFFERS_COMMIT;
 			req->buf_list = NULL;
 			bl->head++;
 		}
@@ -177,6 +198,136 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
 	return ret;
 }
 
+/* cap it at a reasonable 256, will be one page even for 4K */
+#define PEEK_MAX_IMPORT		256
+
+static int io_ring_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
+				struct io_buffer_list *bl)
+{
+	struct io_uring_buf_ring *br = bl->buf_ring;
+	struct iovec *iov = arg->iovs;
+	int nr_iovs = arg->nr_iovs;
+	__u16 nr_avail, tail, head;
+	struct io_uring_buf *buf;
+
+	tail = smp_load_acquire(&br->tail);
+	head = bl->head;
+	nr_avail = min_t(__u16, tail - head, UIO_MAXIOV);
+	if (unlikely(!nr_avail))
+		return -ENOBUFS;
+
+	buf = io_ring_head_to_buf(br, head, bl->mask);
+	if (arg->max_len) {
+		int needed;
+
+		needed = (arg->max_len + buf->len - 1) / buf->len;
+		needed = min(needed, PEEK_MAX_IMPORT);
+		if (nr_avail > needed)
+			nr_avail = needed;
+	}
+
+	/*
+	 * only alloc a bigger array if we know we have data to map, eg not
+	 * a speculative peek operation.
+	 */
+	if (arg->mode & KBUF_MODE_EXPAND && nr_avail > nr_iovs && arg->max_len) {
+		iov = kmalloc_array(nr_avail, sizeof(struct iovec), GFP_KERNEL);
+		if (unlikely(!iov))
+			return -ENOMEM;
+		if (arg->mode & KBUF_MODE_FREE)
+			kfree(arg->iovs);
+		arg->iovs = iov;
+		nr_iovs = nr_avail;
+	} else if (nr_avail < nr_iovs) {
+		nr_iovs = nr_avail;
+	}
+
+	/* set it to max, if not set, so we can use it unconditionally */
+	if (!arg->max_len)
+		arg->max_len = INT_MAX;
+
+	req->buf_index = buf->bid;
+	do {
+		/* truncate end piece, if needed */
+		if (buf->len > arg->max_len)
+			buf->len = arg->max_len;
+
+		iov->iov_base = u64_to_user_ptr(buf->addr);
+		iov->iov_len = buf->len;
+		iov++;
+
+		arg->out_len += buf->len;
+		arg->max_len -= buf->len;
+		if (!arg->max_len)
+			break;
+
+		buf = io_ring_head_to_buf(br, ++head, bl->mask);
+	} while (--nr_iovs);
+
+	if (head == tail)
+		req->flags |= REQ_F_BL_EMPTY;
+
+	req->flags |= REQ_F_BUFFER_RING;
+	req->buf_list = bl;
+	return iov - arg->iovs;
+}
+
+int io_buffers_select(struct io_kiocb *req, struct buf_sel_arg *arg,
+		      unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_buffer_list *bl;
+	int ret = -ENOENT;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	bl = io_buffer_get_list(ctx, req->buf_index);
+	if (unlikely(!bl))
+		goto out_unlock;
+
+	if (bl->is_buf_ring) {
+		ret = io_ring_buffers_peek(req, arg, bl);
+		/*
+		 * Don't recycle these buffers if we need to go through poll.
+		 * Nobody else can use them anyway, and holding on to provided
+		 * buffers for a send/write operation would happen on the app
+		 * side anyway with normal buffers. Besides, we already
+		 * committed them, they cannot be put back in the queue.
+		 */
+		if (ret > 0) {
+			req->flags |= REQ_F_BL_NO_RECYCLE;
+			req->buf_list->head += ret;
+		}
+	} else {
+		ret = io_provided_buffers_select(req, &arg->out_len, bl, arg->iovs);
+	}
+out_unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+
+int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_buffer_list *bl;
+	int ret;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	bl = io_buffer_get_list(ctx, req->buf_index);
+	if (unlikely(!bl))
+		return -ENOENT;
+
+	if (bl->is_buf_ring) {
+		ret = io_ring_buffers_peek(req, arg, bl);
+		if (ret > 0)
+			req->flags |= REQ_F_BUFFERS_COMMIT;
+		return ret;
+	}
+
+	/* don't support multiple buffer selections for legacy */
+	return io_provided_buffers_select(req, &arg->max_len, bl, arg->iovs);
+}
+
 static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer_list *bl,
 			       unsigned nbufs)
 {
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 5a9635ee0217..b90aca3a57fa 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -41,8 +41,26 @@ struct io_buffer {
 	__u16 bgid;
 };
 
+enum {
+	/* can alloc a bigger vec */
+	KBUF_MODE_EXPAND	= 1,
+	/* if bigger vec allocated, free old one */
+	KBUF_MODE_FREE		= 2,
+};
+
+struct buf_sel_arg {
+	struct iovec *iovs;
+	size_t out_len;
+	size_t max_len;
+	int nr_iovs;
+	int mode;
+};
+
 void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
 			      unsigned int issue_flags);
+int io_buffers_select(struct io_kiocb *req, struct buf_sel_arg *arg,
+		      unsigned int issue_flags);
+int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg);
 void io_destroy_buffers(struct io_ring_ctx *ctx);
 
 int io_remove_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
@@ -75,7 +93,7 @@ static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
 	 */
 	if (req->buf_list) {
 		req->buf_index = req->buf_list->bgid;
-		req->flags &= ~REQ_F_BUFFER_RING;
+		req->flags &= ~(REQ_F_BUFFER_RING|REQ_F_BUFFERS_COMMIT);
 		return true;
 	}
 	return false;
@@ -99,11 +117,16 @@ static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 	return false;
 }
 
-static inline void __io_put_kbuf_ring(struct io_kiocb *req)
+static inline void __io_put_kbuf_ring(struct io_kiocb *req, int nr)
 {
-	if (req->buf_list) {
-		req->buf_index = req->buf_list->bgid;
-		req->buf_list->head++;
+	struct io_buffer_list *bl = req->buf_list;
+
+	if (bl) {
+		if (req->flags & REQ_F_BUFFERS_COMMIT) {
+			bl->head += nr;
+			req->flags &= ~REQ_F_BUFFERS_COMMIT;
+		}
+		req->buf_index = bl->bgid;
 	}
 	req->flags &= ~REQ_F_BUFFER_RING;
 }
@@ -112,7 +135,7 @@ static inline void __io_put_kbuf_list(struct io_kiocb *req,
 				      struct list_head *list)
 {
 	if (req->flags & REQ_F_BUFFER_RING) {
-		__io_put_kbuf_ring(req);
+		__io_put_kbuf_ring(req, 1);
 	} else {
 		req->buf_index = req->kbuf->bgid;
 		list_add(&req->kbuf->list, list);
@@ -130,8 +153,8 @@ static inline void io_kbuf_drop(struct io_kiocb *req)
 	__io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
 }
 
-static inline unsigned int io_put_kbuf(struct io_kiocb *req,
-				       unsigned issue_flags)
+static inline unsigned int __io_put_kbufs(struct io_kiocb *req, int nbufs,
+					  unsigned issue_flags)
 {
 	unsigned int ret;
 
@@ -140,9 +163,21 @@ static inline unsigned int io_put_kbuf(struct io_kiocb *req,
 	ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
 	if (req->flags & REQ_F_BUFFER_RING)
-		__io_put_kbuf_ring(req);
+		__io_put_kbuf_ring(req, nbufs);
 	else
 		__io_put_kbuf(req, issue_flags);
 	return ret;
 }
+
+static inline unsigned int io_put_kbuf(struct io_kiocb *req,
+				       unsigned issue_flags)
+{
+	return __io_put_kbufs(req, 1, issue_flags);
+}
+
+static inline unsigned int io_put_kbufs(struct io_kiocb *req, int nbufs,
+					unsigned issue_flags)
+{
+	return __io_put_kbufs(req, nbufs, issue_flags);
+}
 #endif
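
As a reading aid for the new API, a hedged sketch of a hypothetical
kernel-side caller follows; demo_consume_buffers() is invented purely
for illustration, and the real users are the send/recv paths wired up
in the next two patches.

/*
 * Peek a batch of ring-provided buffers into a caller-owned iovec
 * array, do the transfer, then commit however many were consumed.
 */
static int demo_consume_buffers(struct io_kiocb *req, unsigned issue_flags)
{
	struct iovec iovs[8];
	struct buf_sel_arg arg = {
		.iovs = iovs,
		.nr_iovs = 8,		/* caller-supplied array */
		.max_len = 64 * 1024,	/* cap on total bytes mapped */
		.mode = 0,		/* no KBUF_MODE_EXPAND, never allocates */
	};
	int nr;

	/* ctx->uring_lock must be held from the peek until completion */
	nr = io_buffers_peek(req, &arg);
	if (unlikely(nr < 0))
		return nr;

	/* ... build an iov_iter over arg.iovs[0..nr) and transfer ... */

	/* commits 'nr' ring entries and returns the CQE buffer flags */
	return io_put_kbufs(req, nr, issue_flags);
}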

From patchwork Sat Apr 20 13:29:46 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13637152
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/5] io_uring/net: support bundles for send
Date: Sat, 20 Apr 2024 07:29:46 -0600
Message-ID: <20240420133233.500590-6-axboe@kernel.dk>
In-Reply-To: <20240420133233.500590-2-axboe@kernel.dk>
References: <20240420133233.500590-2-axboe@kernel.dk>

If IORING_OP_SEND is used with provided buffers, the caller may also set
IORING_RECVSEND_BUNDLE to turn it into a multi-buffer send. The idea is
that an application can fill outgoing buffers in a provided buffer
group, and then arm a single send that will service them all. Once there
are no more buffers to send, or if the requested length has been sent,
the request posts a single completion for all the buffers.

This only enables it for IORING_OP_SEND; IORING_OP_SENDMSG is coming in
a separate patch. However, this patch does do a lot of the prep work
that makes wiring up the sendmsg variant pretty trivial. They share the
prep side.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |   9 +++
 io_uring/net.c                | 145 ++++++++++++++++++++++++++++++----
 2 files changed, 137 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a7f847543a7f..7f583927c908 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -351,11 +351,20 @@ enum io_uring_op {
  *				0 is reported if zerocopy was actually possible.
  *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
  *				(at least partially).
+ *
+ * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send will
+ *				grab as many buffers from the buffer group ID
+ *				given and send them all. The completion result
+ *				will be the number of buffers sent, with the
+ *				starting buffer ID in cqe->flags as per usual
+ *				for provided buffer usage. The buffers will be
+ *				contiguous from the starting buffer ID.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT		(1U << 1)
 #define IORING_RECVSEND_FIXED_BUF	(1U << 2)
 #define IORING_SEND_ZC_REPORT_USAGE	(1U << 3)
+#define IORING_RECVSEND_BUNDLE		(1U << 4)
 
 /*
  * cqe.res for IORING_CQE_F_NOTIF if
diff --git a/io_uring/net.c b/io_uring/net.c
index 13685d133582..3e326576254b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -57,7 +57,7 @@ struct io_sr_msg {
 		struct user_msghdr __user	*umsg;
 		void __user			*buf;
 	};
-	unsigned			len;
+	int				len;
 	unsigned			done_io;
 	unsigned			msg_flags;
 	unsigned			nr_multishot_loops;
@@ -389,6 +389,8 @@ static int io_sendmsg_prep_setup(struct io_kiocb *req, int is_msg)
 	return ret;
 }
 
+#define SENDMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_BUNDLE)
+
 int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -407,11 +409,20 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	sr->len = READ_ONCE(sqe->len);
 	sr->flags = READ_ONCE(sqe->ioprio);
-	if (sr->flags & ~IORING_RECVSEND_POLL_FIRST)
+	if (sr->flags & ~SENDMSG_FLAGS)
 		return -EINVAL;
 	sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
 	if (sr->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
+	if (sr->flags & IORING_RECVSEND_BUNDLE) {
+		if (req->opcode == IORING_OP_SENDMSG)
+			return -EINVAL;
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		sr->msg_flags |= MSG_WAITALL;
+		sr->buf_group = req->buf_index;
+		req->buf_list = NULL;
+	}
 
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
@@ -427,6 +438,79 @@ static void io_req_msg_cleanup(struct io_kiocb *req,
 	io_netmsg_recycle(req, issue_flags);
 }
 
+/*
+ * For bundle completions, we need to figure out how many segments we consumed.
+ * A bundle could be using a single ITER_UBUF if that's all we mapped, or it
+ * could be using an ITER_IOVEC. If the latter, then if we consumed all of
+ * the segments, then it's a trivial question to answer. If we have residual
+ * data in the iter, then loop the segments to figure out how much we
+ * transferred.
+ */
+static int io_bundle_nbufs(struct io_async_msghdr *kmsg, int ret)
+{
+	struct iovec *iov;
+	int nbufs;
+
+	/* no data is always zero segments, and a ubuf is always 1 segment */
+	if (ret <= 0)
+		return 0;
+	if (iter_is_ubuf(&kmsg->msg.msg_iter))
+		return 1;
+
+	iov = kmsg->free_iov;
+	if (!iov)
+		iov = &kmsg->fast_iov;
+
+	/* if all data was transferred, it's basic pointer math */
+	if (!iov_iter_count(&kmsg->msg.msg_iter))
+		return iter_iov(&kmsg->msg.msg_iter) - iov;
+
+	/* short transfer, count segments */
+	nbufs = 0;
+	do {
+		int this_len = min_t(int, iov[nbufs].iov_len, ret);
+
+		nbufs++;
+		ret -= this_len;
+	} while (ret);
+
+	return nbufs;
+}
+
+static inline bool io_send_finish(struct io_kiocb *req, int *ret,
+				  struct io_async_msghdr *kmsg,
+				  unsigned issue_flags)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+	bool bundle_finished = *ret <= 0;
+	unsigned int cflags;
+
+	if (!(sr->flags & IORING_RECVSEND_BUNDLE)) {
+		cflags = io_put_kbuf(req, issue_flags);
+		goto finish;
+	}
+
+	cflags = io_put_kbufs(req, io_bundle_nbufs(kmsg, *ret), issue_flags);
+
+	if (bundle_finished || req->flags & REQ_F_BL_EMPTY)
+		goto finish;
+
+	/*
+	 * Fill CQE for this send and see if we should keep trying to
+	 * send from this socket.
+	 */
+	if (io_req_post_cqe(req, *ret, cflags | IORING_CQE_F_MORE)) {
+		io_mshot_prep_retry(req, kmsg);
+		return false;
+	}
+
+	/* Otherwise stop bundle and use the current result. */
+finish:
+	io_req_set_res(req, *ret, cflags);
+	*ret = IOU_OK;
+	return true;
+}
+
 int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -482,7 +566,6 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr *kmsg = req->async_data;
 	struct socket *sock;
-	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -495,21 +578,47 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
 
+	flags = sr->msg_flags;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		flags |= MSG_DONTWAIT;
+
+retry_bundle:
 	if (io_do_buffer_select(req)) {
-		size_t len = sr->len;
-		void __user *buf;
+		struct buf_sel_arg arg = {
+			.iovs = &kmsg->fast_iov,
+			.max_len = min_not_zero(sr->len, INT_MAX),
+			.nr_iovs = 1,
+			.mode = KBUF_MODE_EXPAND,
+		};
+
+		if (kmsg->free_iov) {
+			arg.nr_iovs = kmsg->free_iov_nr;
+			arg.iovs = kmsg->free_iov;
+			arg.mode |= KBUF_MODE_FREE;
+		}
 
-		buf = io_buffer_select(req, &len, issue_flags);
-		if (unlikely(!buf))
-			return -ENOBUFS;
-		sr->buf = buf;
-		sr->len = len;
+		if (!(sr->flags & IORING_RECVSEND_BUNDLE))
+			arg.nr_iovs = 1;
+
+		ret = io_buffers_select(req, &arg, issue_flags);
+		if (unlikely(ret < 0))
+			return ret;
+
+		sr->len = arg.out_len;
+		iov_iter_init(&kmsg->msg.msg_iter, ITER_SOURCE, arg.iovs, ret,
+			      arg.out_len);
+		if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
+			kmsg->free_iov_nr = ret;
+			kmsg->free_iov = arg.iovs;
+		}
 	}
 
-	flags = sr->msg_flags;
-	if (issue_flags & IO_URING_F_NONBLOCK)
-		flags |= MSG_DONTWAIT;
-	if (flags & MSG_WAITALL)
+	/*
+	 * If MSG_WAITALL is set, or this is a bundle send, then we need
+	 * the full amount. If just the bundle flag is set and we do a
+	 * short send, we complete the bundle sequence rather than
+	 * continue on.
+	 */
+	if (flags & MSG_WAITALL || sr->flags & IORING_RECVSEND_BUNDLE)
 		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
 
 	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
@@ -534,10 +643,12 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
+
+	if (!io_send_finish(req, &ret, kmsg, issue_flags))
+		goto retry_bundle;
+
 	io_req_msg_cleanup(req, issue_flags);
-	cflags = io_put_kbuf(req, issue_flags);
-	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	return ret;
 }
 
 static int io_recvmsg_mshot_prep(struct io_kiocb *req,
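
For reference, a minimal userspace sketch of driving this, not part of
the patch: bundle_send() is an invented name, error handling is elided,
and the buffer ring for bgid is assumed to be populated as in the
earlier example. The bundle flag travels in sqe->ioprio, like the other
IORING_RECVSEND_* flags.

#include <liburing.h>
#include <stdio.h>

int bundle_send(struct io_uring *ring, int sock_fd, int bgid)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_prep_send(sqe, sock_fd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = bgid;
	sqe->ioprio |= IORING_RECVSEND_BUNDLE;

	io_uring_submit(ring);
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;

	/* res is the byte total; the starting buffer ID is in the flags */
	printf("sent %d bytes starting at buffer %u\n", cqe->res,
	       cqe->flags >> IORING_CQE_BUFFER_SHIFT);
	ret = cqe->res;
	io_uring_cqe_seen(ring, cqe);
	return ret;
}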

From patchwork Sat Apr 20 13:29:47 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13637153
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 5/5] io_uring/net: support bundles for recv
Date: Sat, 20 Apr 2024 07:29:47 -0600
Message-ID: <20240420133233.500590-7-axboe@kernel.dk>
In-Reply-To: <20240420133233.500590-2-axboe@kernel.dk>
References: <20240420133233.500590-2-axboe@kernel.dk>

If IORING_OP_RECV is used with provided buffers, the caller may also set
IORING_RECVSEND_BUNDLE to turn it into a multi-buffer recv. This grabs
the buffers available and receives into them, posting a single
completion for all of it.

This can be used with multishot receive as well, or without it.

Now that both send and receive support bundles, add a feature flag for
it as well. If IORING_FEAT_RECVSEND_BUNDLE is set after registering the
ring, then the kernel supports bundles for recv and send.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |  15 +++--
 io_uring/io_uring.c           |   3 +-
 io_uring/net.c                | 116 ++++++++++++++++++++++++++++------
 3 files changed, 105 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7f583927c908..f093cb2300d9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -352,13 +352,13 @@ enum io_uring_op {
  *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
  *				(at least partially).
  *
- * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send will
- *				grab as many buffers from the buffer group ID
- *				given and send them all. The completion result
- *				will be the number of buffers sent, with the
- *				starting buffer ID in cqe->flags as per usual
- *				for provided buffer usage. The buffers will be
- *				contiguous from the starting buffer ID.
+ * IORING_RECVSEND_BUNDLE	Used with IOSQE_BUFFER_SELECT. If set, send or
+ *				recv will grab as many buffers from the buffer
+ *				group ID given and send them all. The completion
+ *				result will be the number of buffers sent, with
+ *				the starting buffer ID in cqe->flags as per
+ *				usual for provided buffer usage. The buffers
+ *				will be contiguous from the starting buffer ID.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT		(1U << 1)
@@ -529,6 +529,7 @@ struct io_uring_params {
 #define IORING_FEAT_CQE_SKIP		(1U << 11)
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
 #define IORING_FEAT_REG_REG_RING	(1U << 13)
+#define IORING_FEAT_RECVSEND_BUNDLE	(1U << 14)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3c9087f37c43..2675cffbd9a4 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3570,7 +3570,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
-			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
+			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
+			IORING_FEAT_RECVSEND_BUNDLE;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
diff --git a/io_uring/net.c b/io_uring/net.c
index 3e326576254b..51c41d771c50 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -747,7 +747,8 @@ static int io_recvmsg_prep_setup(struct io_kiocb *req)
 	return ret;
 }
 
-#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)
+#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT | \
+			IORING_RECVSEND_BUNDLE)
 
 int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -761,21 +762,14 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	sr->len = READ_ONCE(sqe->len);
 	sr->flags = READ_ONCE(sqe->ioprio);
-	if (sr->flags & ~(RECVMSG_FLAGS))
+	if (sr->flags & ~RECVMSG_FLAGS)
 		return -EINVAL;
 	sr->msg_flags = READ_ONCE(sqe->msg_flags);
 	if (sr->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	if (sr->msg_flags & MSG_ERRQUEUE)
 		req->flags |= REQ_F_CLEAR_POLLIN;
-	if (sr->flags & IORING_RECV_MULTISHOT) {
-		if (!(req->flags & REQ_F_BUFFER_SELECT))
-			return -EINVAL;
-		if (sr->msg_flags & MSG_WAITALL)
-			return -EINVAL;
-		if (req->opcode == IORING_OP_RECV && sr->len)
-			return -EINVAL;
-		req->flags |= REQ_F_APOLL_MULTISHOT;
+	if (req->flags & REQ_F_BUFFER_SELECT) {
 		/*
 		 * Store the buffer group for this multishot receive separately,
 		 * as if we end up doing an io-wq based issue that selects a
@@ -785,6 +779,20 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		 * restore it.
 		 */
 		sr->buf_group = req->buf_index;
+		req->buf_list = NULL;
+	}
+	if (sr->flags & IORING_RECV_MULTISHOT) {
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		if (sr->msg_flags & MSG_WAITALL)
+			return -EINVAL;
+		if (req->opcode == IORING_OP_RECV && sr->len)
+			return -EINVAL;
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+	}
+	if (sr->flags & IORING_RECVSEND_BUNDLE) {
+		if (req->opcode == IORING_OP_RECVMSG)
+			return -EINVAL;
 	}
 
 #ifdef CONFIG_COMPAT
@@ -805,19 +813,28 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 				  struct io_async_msghdr *kmsg,
 				  bool mshot_finished, unsigned issue_flags)
 {
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	unsigned int cflags;
 
-	cflags = io_put_kbuf(req, issue_flags);
+	if (sr->flags & IORING_RECVSEND_BUNDLE)
+		cflags = io_put_kbufs(req, io_bundle_nbufs(kmsg, *ret),
+				      issue_flags);
+	else
+		cflags = io_put_kbuf(req, issue_flags);
+
 	if (kmsg->msg.msg_inq > 0)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
+	/* bundle with no more immediate buffers, we're done */
+	if (sr->flags & IORING_RECVSEND_BUNDLE && req->flags & REQ_F_BL_EMPTY)
+		goto finish;
+
 	/*
 	 * Fill CQE for this receive and see if we should keep trying to
 	 * receive from this socket.
 	 */
 	if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
 	    io_req_post_cqe(req, *ret, cflags | IORING_CQE_F_MORE)) {
-		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
 		io_mshot_prep_retry(req, kmsg);
@@ -837,6 +854,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 	}
 
 	/* Finish the request / stop multishot. */
+finish:
 	io_req_set_res(req, *ret, cflags);
 
 	if (issue_flags & IO_URING_F_MULTISHOT)
@@ -1020,6 +1038,69 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg,
+			      size_t *len, unsigned int issue_flags)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+	int ret;
+
+	/*
+	 * If the ring isn't locked, then don't use the peek interface
+	 * to grab multiple buffers as we will lock/unlock between
+	 * this selection and posting the buffers.
+	 */
+	if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+	    sr->flags & IORING_RECVSEND_BUNDLE) {
+		struct buf_sel_arg arg = {
+			.iovs = &kmsg->fast_iov,
+			.nr_iovs = 1,
+			.mode = KBUF_MODE_EXPAND,
+		};
+
+		if (kmsg->free_iov) {
+			arg.nr_iovs = kmsg->free_iov_nr;
+			arg.iovs = kmsg->free_iov;
+			arg.mode |= KBUF_MODE_FREE;
+		}
+
+		if (kmsg->msg.msg_inq > 0)
+			arg.max_len = min_not_zero(sr->len, kmsg->msg.msg_inq);
+
+		ret = io_buffers_peek(req, &arg);
+		if (unlikely(ret < 0))
+			return ret;
+
+		/* special case 1 vec, can be a fast path */
+		if (ret == 1) {
+			sr->buf = arg.iovs[0].iov_base;
+			sr->len = arg.iovs[0].iov_len;
+			goto map_ubuf;
+		}
+		iov_iter_init(&kmsg->msg.msg_iter, ITER_DEST, arg.iovs, ret,
+			      arg.out_len);
+		if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
+			kmsg->free_iov_nr = ret;
+			kmsg->free_iov = arg.iovs;
+		}
+	} else {
+		void __user *buf;
+
+		*len = sr->len;
+		buf = io_buffer_select(req, len, issue_flags);
+		if (!buf)
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = *len;
+map_ubuf:
+		ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
+				  &kmsg->msg.msg_iter);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	return 0;
+}
+
 int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -1044,17 +1125,10 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 
 retry_multishot:
 	if (io_do_buffer_select(req)) {
-		void __user *buf;
-
-		buf = io_buffer_select(req, &len, issue_flags);
-		if (!buf)
-			return -ENOBUFS;
-		sr->buf = buf;
-		sr->len = len;
-		ret = import_ubuf(ITER_DEST, sr->buf, sr->len,
-				  &kmsg->msg.msg_iter);
+		ret = io_recv_buf_select(req, kmsg, &len, issue_flags);
 		if (unlikely(ret))
 			goto out_free;
+		sr->buf = NULL;
 	}
 
 	kmsg->msg.msg_inq = -1;
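
Taken together with the new feature flag, userspace can probe and use
bundled receives roughly as follows. This is a sketch, not part of the
patch: it assumes liburing >= 2.4 and a buffer ring already registered
for bgid, and setup_bundle_recv() is an invented name.

#include <errno.h>
#include <liburing.h>

int setup_bundle_recv(struct io_uring *ring, int sock_fd, int bgid)
{
	struct io_uring_params p = { };
	struct io_uring_sqe *sqe;
	int ret;

	ret = io_uring_queue_init_params(64, ring, &p);
	if (ret < 0)
		return ret;
	if (!(p.features & IORING_FEAT_RECVSEND_BUNDLE))
		return -EOPNOTSUPP;	/* kernel predates this series */

	/* buffer ring registration for bgid elided (see patch 2 example) */

	sqe = io_uring_get_sqe(ring);
	/* len == 0: sizes come from the selected buffers */
	io_uring_prep_recv_multishot(sqe, sock_fd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = bgid;
	sqe->ioprio |= IORING_RECVSEND_BUNDLE;
	return io_uring_submit(ring);
}

Each posted CQE then covers as many contiguous buffers as the receive
was able to fill, with IORING_CQE_F_MORE set while the multishot request
stays armed.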