From patchwork Mon Jul 25 11:33:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12927976 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52581C43334 for ; Mon, 25 Jul 2022 11:34:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234723AbiGYLeh (ORCPT ); Mon, 25 Jul 2022 07:34:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230246AbiGYLeh (ORCPT ); Mon, 25 Jul 2022 07:34:37 -0400 Received: from mail-wr1-x429.google.com (mail-wr1-x429.google.com [IPv6:2a00:1450:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B3701A051 for ; Mon, 25 Jul 2022 04:34:36 -0700 (PDT) Received: by mail-wr1-x429.google.com with SMTP id h8so15531804wrw.1 for ; Mon, 25 Jul 2022 04:34:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=flrSSD/oqDeoR+pcufA+loQzl8ORR4uaVTkm5ef3tSc=; b=AsTHf9Wm5mW83H8f1RMaOJXs/bvqCY5ZnQtdHjyE4CkPkYD9oKjHmY2txta3xWNw0n GwA8q9jUR8kEKATHWfJCv9LfKO0WKE6KJx/fezY15n2V0GV2UZhI/tJif6ef2azK0+Ih x6SGl2LqAvGjqFj7zP+qygTwDi3thMDGCNdr7myNvI+64ZTCfGqnjtxbD3Ilmi/bZ6Qz vqp/N+cPda1C0Sml+T1SHOsUX7HBE6EDabIvMs6/xP44e1Hjhfgzi2RFaBv8di/QalJo 6UMRa2QGWLHTgQNJXp4Tj6eF9MI/7dIFmK/JM1ZDZwU7cbkIjmtmqbxLu06LPLHky5cZ DHLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=flrSSD/oqDeoR+pcufA+loQzl8ORR4uaVTkm5ef3tSc=; 
b=SM0HzEyqg4KDSiEWbSPUNHHVLump8+nCLpmSvM/Fag2CGJ0yYQOwvQoKR/OWjKE5y3 Q+wLsameP0hGRKZKm5j5qXtKfetB1uh/TJcXadmK5TSsk64TdDPtq7jd/CvH3rsDyyOr ukm1Ph6oxCRwNgVcFaUOTzSgga5EzIoiPZr8z62lWD8yEespj2lpM1Q28ho36Be4trHq UUvKw+cMabtyA4aGawrQKNA9CFQTOasYs4ZY1cM3n1KFlDGb75mH8e8pUvyudwpEu1JS 76HUepjV8QvvgHHu1niHmTi2/gUB2xfhhUcN+Omf9rYgiIVAEvOZY3uI2fKsuYuly+jV PC/w== X-Gm-Message-State: AJIora+Y0h5/WroTe6xPcKa5V+7XYbyhky8Mo0BEAFeI8TfXHfCzc8Xj NUg10n1wx5a2iWNagwtpUUcNrKRqlQFGIg== X-Google-Smtp-Source: AGRyM1s3KAKBGfRq4FGCxQpaKenZ3ftPUdvcQWle8cJ2DHPxfyz6U0In7KIWbHucAE1I3vD1AFR4NA== X-Received: by 2002:a05:6000:1f0d:b0:21e:927e:d440 with SMTP id bv13-20020a0560001f0d00b0021e927ed440mr1336227wrb.621.1658748874309; Mon, 25 Jul 2022 04:34:34 -0700 (PDT) Received: from 127.0.0.1localhost.com ([2620:10d:c093:600::1:9f35]) by smtp.gmail.com with ESMTPSA id e29-20020a5d595d000000b0021e501519d3sm11659991wri.67.2022.07.25.04.34.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Jul 2022 04:34:33 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com, Ammar Faizi Subject: [PATCH liburing v2 1/5] io_uring.h: sync with kernel for zc send and notifiers Date: Mon, 25 Jul 2022 12:33:18 +0100 Message-Id: <75b424869b9dad220d425693a43ec5ae97e5b8e8.1658748624.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Signed-off-by: Pavel Begunkov --- src/include/liburing/io_uring.h | 37 +++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h index 99e6963..3953807 100644 --- a/src/include/liburing/io_uring.h +++ b/src/include/liburing/io_uring.h @@ -64,6 +64,10 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + struct { + __u16 notification_idx; + __u16 addr_len; + }; }; __u64 addr3; __u64 __pad2[1]; @@ 
-160,7 +164,8 @@ enum io_uring_op { IORING_OP_FALLOCATE, IORING_OP_OPENAT, IORING_OP_CLOSE, - IORING_OP_FILES_UPDATE, + IORING_OP_RSRC_UPDATE, + IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE, IORING_OP_STATX, IORING_OP_READ, IORING_OP_WRITE, @@ -187,6 +192,7 @@ enum io_uring_op { IORING_OP_GETXATTR, IORING_OP_SOCKET, IORING_OP_URING_CMD, + IORING_OP_SENDZC_NOTIF, /* this goes last, obviously */ IORING_OP_LAST, @@ -208,6 +214,7 @@ enum io_uring_op { #define IORING_TIMEOUT_ETIME_SUCCESS (1U << 5) #define IORING_TIMEOUT_CLOCK_MASK (IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME) #define IORING_TIMEOUT_UPDATE_MASK (IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE) + /* * sqe->splice_flags * extends splice(2) flags @@ -254,13 +261,23 @@ enum io_uring_op { * CQEs on behalf of the same SQE. */ #define IORING_RECVSEND_POLL_FIRST (1U << 0) -#define IORING_RECV_MULTISHOT (1U << 1) +#define IORING_RECV_MULTISHOT (1U << 1) +#define IORING_RECVSEND_FIXED_BUF (1U << 2) +#define IORING_RECVSEND_NOTIF_FLUSH (1U << 3) /* * accept flags stored in sqe->ioprio */ #define IORING_ACCEPT_MULTISHOT (1U << 0) +/* + * IORING_OP_RSRC_UPDATE flags + */ +enum { + IORING_RSRC_UPDATE_FILES, + IORING_RSRC_UPDATE_NOTIF, +}; + /* * IO completion data structure (Completion Queue Entry) */ @@ -426,6 +443,9 @@ enum { /* register a range of fixed file slots for automatic slot allocation */ IORING_REGISTER_FILE_ALLOC_RANGE = 25, + IORING_REGISTER_NOTIFIERS = 26, + IORING_UNREGISTER_NOTIFIERS = 27, + /* this goes last */ IORING_REGISTER_LAST }; @@ -472,6 +492,19 @@ struct io_uring_rsrc_update2 { __u32 resv2; }; +struct io_uring_notification_slot { + __u64 tag; + __u64 resv[3]; +}; + +struct io_uring_notification_register { + __u32 nr_slots; + __u32 resv; + __u64 resv2; + __u64 data; + __u64 resv3; +}; + /* Skip updating fd indexes set to this value in the fd table */ #define IORING_REGISTER_FILES_SKIP (-2) From patchwork Mon Jul 25 11:33:19 2022 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927977
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com, Ammar Faizi
Subject: [PATCH liburing v2 2/5] liburing: add zc send and notif helpers
Date: Mon, 25 Jul 2022 12:33:19 +0100
X-Mailer: git-send-email 2.37.0
X-Mailing-List: io-uring@vger.kernel.org

Add helpers for notification registration and preparing zerocopy send requests.
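
As an illustration only (not part of this patch), the sketch below mimics what io_uring_prep_sendzc() and io_uring_prep_sendzc_fixed() store in an SQE. struct sketch_sqe, the sketch_* helpers and the SKETCH_* macros are hypothetical stand-ins so the example builds without liburing. The flag values are the ones this series adds to io_uring.h, the memset stands in for the initialisation done by io_uring_prep_rw(), and the 16-bit range assert is an addition of the sketch, not something the real helpers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from this series' io_uring.h; the SKETCH_ names are hypothetical. */
#define SKETCH_RECVSEND_FIXED_BUF   (1U << 2)
#define SKETCH_RECVSEND_NOTIF_FLUSH (1U << 3)

/* Hypothetical stand-in for the few SQE fields the new helpers touch. */
struct sketch_sqe {
	int32_t  fd;
	uint64_t addr;
	uint32_t len;
	uint32_t msg_flags;
	uint16_t ioprio;           /* carries the zerocopy flags */
	uint16_t buf_index;        /* registered buffer index */
	uint16_t notification_idx; /* notification slot index */
};

/* Mirrors io_uring_prep_sendzc(): a zerocopy send bound to a notification slot. */
static inline void sketch_prep_sendzc(struct sketch_sqe *sqe, int sockfd,
				      const void *buf, size_t len, int flags,
				      unsigned slot_idx, unsigned zc_flags)
{
	/* Sketch-only check: the slot index is stored in a 16-bit field. */
	assert(slot_idx <= UINT16_MAX);

	memset(sqe, 0, sizeof(*sqe));
	sqe->fd = sockfd;
	sqe->addr = (uint64_t)(uintptr_t)buf;
	sqe->len = (uint32_t)len;
	sqe->msg_flags = (uint32_t)flags;
	sqe->notification_idx = (uint16_t)slot_idx;
	sqe->ioprio = (uint16_t)zc_flags;
}

/* Mirrors io_uring_prep_sendzc_fixed(): same, plus a registered buffer. */
static inline void sketch_prep_sendzc_fixed(struct sketch_sqe *sqe, int sockfd,
					    const void *buf, size_t len,
					    int flags, unsigned slot_idx,
					    unsigned zc_flags, unsigned buf_idx)
{
	sketch_prep_sendzc(sqe, sockfd, buf, len, flags, slot_idx, zc_flags);
	sqe->ioprio |= SKETCH_RECVSEND_FIXED_BUF;
	sqe->buf_index = (uint16_t)buf_idx;
}
```

With these stand-ins, preparing a fixed-buffer send on slot 2 with SKETCH_RECVSEND_NOTIF_FLUSH leaves ioprio holding both the flush and the fixed-buffer bits, notification_idx set to 2 and buf_index set to the buffer index passed in.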
Signed-off-by: Pavel Begunkov Reviewed-by: Ammar Faizi --- src/include/liburing.h | 42 ++++++++++++++++++++++++++++++++++++++++++ src/liburing.map | 2 ++ src/register.c | 20 ++++++++++++++++++++ 3 files changed, 64 insertions(+) diff --git a/src/include/liburing.h b/src/include/liburing.h index fc7613d..20cd308 100644 --- a/src/include/liburing.h +++ b/src/include/liburing.h @@ -189,6 +189,10 @@ int io_uring_register_sync_cancel(struct io_uring *ring, int io_uring_register_file_alloc_range(struct io_uring *ring, unsigned off, unsigned len); +int io_uring_register_notifications(struct io_uring *ring, unsigned nr, + struct io_uring_notification_slot *slots); +int io_uring_unregister_notifications(struct io_uring *ring); + /* * Helper for the peek/wait single cqe functions. Exported because of that, * but probably shouldn't be used directly in an application. @@ -677,6 +681,44 @@ static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, sqe->msg_flags = (__u32) flags; } +static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags, + unsigned slot_idx, unsigned zc_flags) +{ + io_uring_prep_rw(IORING_OP_SENDZC_NOTIF, sqe, sockfd, buf, (__u32) len, 0); + sqe->msg_flags = (__u32) flags; + sqe->notification_idx = slot_idx; + sqe->ioprio = zc_flags; +} + +static inline void io_uring_prep_sendzc_fixed(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, + int flags, unsigned slot_idx, + unsigned zc_flags, unsigned buf_idx) +{ + io_uring_prep_sendzc(sqe, sockfd, buf, len, flags, slot_idx, zc_flags); + sqe->ioprio |= IORING_RECVSEND_FIXED_BUF; + sqe->buf_index = buf_idx; +} + +static inline void io_uring_prep_sendzc_set_addr(struct io_uring_sqe *sqe, + const struct sockaddr *dest_addr, + __u16 addr_len) +{ + sqe->addr2 = (unsigned long)(void *)dest_addr; + sqe->addr_len = addr_len; +} + +static inline void io_uring_prep_notif_update(struct io_uring_sqe *sqe, + __u64 new_tag, /* 0 to 
ignore */ + unsigned offset, unsigned nr) +{ + io_uring_prep_rw(IORING_OP_FILES_UPDATE, sqe, -1, 0, nr, + (__u64)offset); + sqe->addr = new_tag; + sqe->ioprio = IORING_RSRC_UPDATE_NOTIF; +} + static inline void io_uring_prep_recv(struct io_uring_sqe *sqe, int sockfd, void *buf, size_t len, int flags) { diff --git a/src/liburing.map b/src/liburing.map index 318d3d7..7d8f143 100644 --- a/src/liburing.map +++ b/src/liburing.map @@ -60,4 +60,6 @@ LIBURING_2.3 { global: io_uring_register_sync_cancel; io_uring_register_file_alloc_range; + io_uring_register_notifications; + io_uring_unregister_notifications; } LIBURING_2.2; diff --git a/src/register.c b/src/register.c index 2b37e5f..7482112 100644 --- a/src/register.c +++ b/src/register.c @@ -364,3 +364,23 @@ int io_uring_register_file_alloc_range(struct io_uring *ring, IORING_REGISTER_FILE_ALLOC_RANGE, &range, 0); } + +int io_uring_register_notifications(struct io_uring *ring, unsigned nr, + struct io_uring_notification_slot *slots) +{ + struct io_uring_notification_register r = { + .nr_slots = nr, + .data = (unsigned long)slots, + }; + + return __sys_io_uring_register(ring->ring_fd, + IORING_REGISTER_NOTIFIERS, + &r, sizeof(r)); +} + +int io_uring_unregister_notifications(struct io_uring *ring) +{ + return __sys_io_uring_register(ring->ring_fd, + IORING_UNREGISTER_NOTIFIERS, + NULL, 0); +}
From patchwork Mon Jul 25 11:33:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927980
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com, Ammar Faizi
Subject: [PATCH liburing v2 3/5] tests: add tests for zerocopy send and notifications
Date: Mon, 25 Jul 2022 12:33:20 +0100
Message-Id: <57b1db13da1fb86bcaf8e9a5c93d6d22aef8fc4b.1658748624.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
X-Mailing-List: io-uring@vger.kernel.org

Signed-off-by: Pavel Begunkov --- test/Makefile | 1 + test/send-zerocopy.c | 888 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 889 insertions(+) create mode 100644 test/send-zerocopy.c diff --git a/test/Makefile b/test/Makefile index 8945368..a36ddb3 100644 --- a/test/Makefile +++ b/test/Makefile @@ -175,6 +175,7 @@ test_srcs := \ xattr.c \ skip-cqe.c \ single-issuer.c \ + send-zerocopy.c \ # EOL all_targets := diff --git a/test/send-zerocopy.c b/test/send-zerocopy.c new file mode 100644 index 0000000..6fa0535 --- /dev/null +++ b/test/send-zerocopy.c @@ -0,0 +1,888 @@ +/* SPDX-License-Identifier: MIT */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "liburing.h" +#include "helpers.h" + +#define MAX_MSG 128 + +#define PORT 10200 +#define HOST "127.0.0.1" +#define HOSTV6 "::1" + +#define NR_SLOTS 5 +#define ZC_TAG 10000
+#define MAX_PAYLOAD 8195 +#define BUFFER_OFFSET 41 + +#ifndef ARRAY_SIZE + #define ARRAY_SIZE(a) (sizeof(a)/sizeof((a)[0])) +#endif + +static int seqs[NR_SLOTS]; +static char tx_buffer[MAX_PAYLOAD] __attribute__((aligned(4096))); +static char rx_buffer[MAX_PAYLOAD] __attribute__((aligned(4096))); +static struct iovec buffers_iov[] = { + { .iov_base = tx_buffer, + .iov_len = sizeof(tx_buffer), }, + { .iov_base = tx_buffer + BUFFER_OFFSET, + .iov_len = sizeof(tx_buffer) - BUFFER_OFFSET - 13, }, +}; + +static inline bool tag_userdata(__u64 user_data) +{ + return ZC_TAG <= user_data && user_data < ZC_TAG + NR_SLOTS; +} + +static bool check_cq_empty(struct io_uring *ring) +{ + struct io_uring_cqe *cqe = NULL; + int ret; + + ret = io_uring_peek_cqe(ring, &cqe); /* nothing should be there */ + return ret == -EAGAIN; +} + +static int register_notifications(struct io_uring *ring) +{ + struct io_uring_notification_slot slots[NR_SLOTS] = {}; + int i; + + memset(seqs, 0, sizeof(seqs)); + for (i = 0; i < NR_SLOTS; i++) + slots[i].tag = ZC_TAG + i; + return io_uring_register_notifications(ring, NR_SLOTS, slots); +} + +static int reregister_notifications(struct io_uring *ring) +{ + int ret; + + ret = io_uring_unregister_notifications(ring); + if (ret) { + fprintf(stderr, "unreg notifiers failed %i\n", ret); + return ret; + } + + return register_notifications(ring); +} + +static int do_one(struct io_uring *ring, int sock_tx, int slot_idx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + unsigned zc_flags = 0; + int ret; + + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, 1, msg_flags, + slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data == 1); + ret = cqe->res; + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + return ret; +} + +static int test_invalid_slot(struct io_uring *ring, 
int sock_tx, int sock_rx) +{ + int ret; + + ret = do_one(ring, sock_tx, NR_SLOTS); + assert(ret == -EINVAL); + return 0; +} + +static int test_basic_send(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 100; + int ret; + + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, msg_flags, + slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data == 1 && cqe->res >= 0); + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + return 0; +} + +static int test_send_flush(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 100; + int ret, i, j; + int req_cqes, notif_cqes; + + /* now do send+flush, do many times to verify seqs */ + for (j = 0; j < NR_SLOTS * 5; j++) { + zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + slot_idx = rand() % NR_SLOTS; + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + + req_cqes = notif_cqes = 1; + for (i = 0; i < 2; i ++) { + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + + if (cqe->user_data == 1) { + assert(req_cqes > 0); + req_cqes--; + assert(cqe->res == payload_size); + } else if (cqe->user_data == ZC_TAG + slot_idx) { + assert(notif_cqes > 0); + notif_cqes--; + assert(cqe->res == 0 && cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + } else { + fprintf(stderr, "invalid cqe %lu %i\n", + (unsigned long)cqe->user_data, cqe->res); + return -1; + } + 
io_uring_cqe_seen(ring, cqe); + } + assert(check_cq_empty(ring)); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + return 0; +} + +static int test_multireq_notif(struct io_uring *ring, int sock_tx, int sock_rx) +{ + bool slot_seen[NR_SLOTS] = {}; + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 1; + int ret, j, i = 0; + int nr = NR_SLOTS * 21; + + while (i < nr) { + int nr_per_wave = 23; + + for (j = 0; j < nr_per_wave && i < nr; j++, i++) { + slot_idx = rand() % NR_SLOTS; + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = i; + } + ret = io_uring_submit(ring); + assert(ret == j); + } + + for (i = 0; i < nr; i++) { + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data < nr && cqe->res == payload_size); + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + assert(check_cq_empty(ring)); + + zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + for (slot_idx = 0; slot_idx < NR_SLOTS; slot_idx++) { + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = slot_idx; + /* just to simplify cqe handling */ + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + } + ret = io_uring_submit(ring); + assert(ret == NR_SLOTS); + + for (i = 0; i < NR_SLOTS; i++) { + int slot_idx; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(tag_userdata(cqe->user_data)); + + slot_idx = cqe->user_data - ZC_TAG; + assert(!slot_seen[slot_idx]); + slot_seen[slot_idx] = true; + + assert(cqe->res == 0 && cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + 
assert(check_cq_empty(ring)); + + for (i = 0; i < NR_SLOTS; i++) + assert(slot_seen[i]); + return 0; +} + +static int test_multi_send_flushing(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + unsigned zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + int msg_flags = 0, slot_idx = 0; + int payload_size = 1; + int ret, j, i = 0; + int nr = NR_SLOTS * 30; + unsigned long long check = 0, expected = 0; + + while (i < nr) { + int nr_per_wave = 25; + + for (j = 0; j < nr_per_wave && i < nr; j++, i++) { + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + } + ret = io_uring_submit(ring); + assert(ret == j); + } + + for (i = 0; i < nr; i++) { + int seq; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(!cqe->res); + assert(tag_userdata(cqe->user_data)); + + seq = cqe->flags; + check += seq * 100007UL; + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + assert(check_cq_empty(ring)); + + for (i = 0; i < nr; i++) + expected += (i + seqs[slot_idx]) * 100007UL; + assert(check == expected); + seqs[slot_idx] += nr; + return 0; +} + +static int do_one_fail_notif_flush(struct io_uring *ring, int off, int nr) +{ + struct io_uring_cqe *cqe; + struct io_uring_sqe *sqe; + int ret; + + /* single out-of-bounds slot */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, off, nr); + sqe->user_data = 1; + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && cqe->user_data == 1); + ret = cqe->res; + io_uring_cqe_seen(ring, cqe); + return ret; +} + +static int test_update_flush_fail(struct io_uring *ring) +{ + int ret; + + /* single out-of-bounds slot */ + ret = do_one_fail_notif_flush(ring, NR_SLOTS, 1); + assert(ret == -EINVAL); + + /* 
out-of-bounds range */ + ret = do_one_fail_notif_flush(ring, 0, NR_SLOTS + 3); + assert(ret == -EINVAL); + ret = do_one_fail_notif_flush(ring, NR_SLOTS - 1, 2); + assert(ret == -EINVAL); + + /* overflow checks, note it's u32 internally */ + ret = do_one_fail_notif_flush(ring, ~(__u32)0, 1); + assert(ret == -EOVERFLOW); + ret = do_one_fail_notif_flush(ring, NR_SLOTS - 1, ~(__u32)0); + assert(ret == -EOVERFLOW); + return 0; +} + +static void do_one_consume(struct io_uring *ring, int sock_tx, int sock_rx, + int slot_idx) +{ + int ret; + + ret = do_one(ring, sock_tx, slot_idx); + assert(ret == 1); + + ret = recv(sock_rx, rx_buffer, 1, MSG_TRUNC); + assert(ret == 1); +} + +static int test_update_flush(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int offset = 1, nr_to_flush = 3; + int ret, i, slot_idx; + + /* + * Flush will be skipped for unused slots, so attach at least 1 req + * to each active notifier / slot + */ + for (slot_idx = 0; slot_idx < NR_SLOTS; slot_idx++) + do_one_consume(ring, sock_tx, sock_rx, slot_idx); + + assert(check_cq_empty(ring)); + + /* flush first */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, 0, 1); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == ZC_TAG); + assert(cqe->flags == seqs[0]); + seqs[0]++; + io_uring_cqe_seen(ring, cqe); + do_one_consume(ring, sock_tx, sock_rx, 0); + assert(check_cq_empty(ring)); + + /* flush last */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, NR_SLOTS - 1, 1); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == ZC_TAG + NR_SLOTS - 1); + assert(cqe->flags == seqs[NR_SLOTS - 1]); + seqs[NR_SLOTS - 1]++; +
io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + /* we left the last slot without attached requests, flush should ignore it */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, NR_SLOTS - 1, 1); + sqe->user_data = 1; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == 1); + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + /* flush range */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, offset, nr_to_flush); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + assert(ret == 1); + + for (i = 0; i < nr_to_flush; i++) { + int slot_idx; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res); + assert(ZC_TAG + offset <= cqe->user_data && + cqe->user_data < ZC_TAG + offset + nr_to_flush); + slot_idx = cqe->user_data - ZC_TAG; + assert(cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + io_uring_cqe_seen(ring, cqe); + } + assert(check_cq_empty(ring)); + return 0; +} + +static int test_registration(int sock_tx, int sock_rx) +{ + struct io_uring_notification_slot slots[2] = { + {.tag = 1}, {.tag = 2}, + }; + void *invalid_slots = (void *)1UL; + struct io_uring ring; + int ret, i; + + ret = io_uring_queue_init(4, &ring, 0); + if (ret) { + fprintf(stderr, "queue init failed: %d\n", ret); + return 1; + } + + ret = io_uring_unregister_notifications(&ring); + if (ret != -ENXIO) { + fprintf(stderr, "unregister nothing: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 2, slots); + if (ret) { + fprintf(stderr, "io_uring_register_notifications failed: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 2, slots); + if (ret != -EBUSY) { + fprintf(stderr, "double register: %d\n", ret); + return 1; + } + + ret = io_uring_unregister_notifications(&ring); + if (ret) { + fprintf(stderr, "unregister failed: %d\n", ret); 
+		return 1;
+	}
+
+	ret = io_uring_register_notifications(&ring, 2, slots);
+	if (ret) {
+		fprintf(stderr, "second register failed: %d\n", ret);
+		return 1;
+	}
+
+	ret = test_invalid_slot(&ring, sock_tx, sock_rx);
+	if (ret) {
+		fprintf(stderr, "test_invalid_slot() failed\n");
+		return ret;
+	}
+
+	for (i = 0; i < 2; i++) {
+		ret = do_one(&ring, sock_tx, 0);
+		assert(ret == 1);
+
+		ret = recv(sock_rx, rx_buffer, 1, MSG_TRUNC);
+		assert(ret == 1);
+	}
+
+	io_uring_queue_exit(&ring);
+	ret = io_uring_queue_init(4, &ring, 0);
+	if (ret) {
+		fprintf(stderr, "queue init failed: %d\n", ret);
+		return 1;
+	}
+
+	ret = io_uring_register_notifications(&ring, 4, invalid_slots);
+	if (ret != -EFAULT) {
+		fprintf(stderr, "io_uring_register_notifications with invalid ptr: %d\n", ret);
+		return 1;
+	}
+
+	io_uring_queue_exit(&ring);
+	return 0;
+}
+
+static int prepare_ip(struct sockaddr_storage *addr, int *sock_client, int *sock_server,
+		      bool ipv6, bool client_connect, bool msg_zc)
+{
+	int family, addr_size;
+	int ret, val;
+
+	memset(addr, 0, sizeof(*addr));
+	if (ipv6) {
+		struct sockaddr_in6 *saddr = (struct sockaddr_in6 *)addr;
+
+		family = AF_INET6;
+		saddr->sin6_family = family;
+		saddr->sin6_port = htons(PORT);
+		addr_size = sizeof(*saddr);
+	} else {
+		struct sockaddr_in *saddr = (struct sockaddr_in *)addr;
+
+		family = AF_INET;
+		saddr->sin_family = family;
+		saddr->sin_port = htons(PORT);
+		saddr->sin_addr.s_addr = htonl(INADDR_ANY);
+		addr_size = sizeof(*saddr);
+	}
+
+	/* server sock setup */
+	*sock_server = socket(family, SOCK_DGRAM, 0);
+	if (*sock_server < 0) {
+		perror("socket");
+		return 1;
+	}
+	val = 1;
+	setsockopt(*sock_server, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
+	ret = bind(*sock_server, (struct sockaddr *)addr, addr_size);
+	if (ret < 0) {
+		perror("bind");
+		return 1;
+	}
+
+	if (ipv6) {
+		struct sockaddr_in6 *saddr = (struct sockaddr_in6 *)addr;
+
+		inet_pton(AF_INET6, HOSTV6, &(saddr->sin6_addr));
+	} else {
+		struct sockaddr_in *saddr = (struct sockaddr_in *)addr;
+
+		inet_pton(AF_INET, HOST, &saddr->sin_addr);
+	}
+
+	/* client sock setup */
+	*sock_client = socket(family, SOCK_DGRAM, 0);
+	if (*sock_client < 0) {
+		perror("socket");
+		return 1;
+	}
+	if (client_connect) {
+		ret = connect(*sock_client, (struct sockaddr *)addr, addr_size);
+		if (ret < 0) {
+			perror("connect");
+			return 1;
+		}
+	}
+	if (msg_zc) {
+		val = 1;
+		if (setsockopt(*sock_client, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+			perror("setsockopt zc");
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int do_test_inet_send(struct io_uring *ring, int sock_client, int sock_server,
+			     bool fixed_buf, struct sockaddr_storage *addr,
+			     size_t send_size, bool cork, bool mix_register,
+			     int buf_idx)
+{
+	const unsigned slot_idx = 0;
+	const unsigned zc_flags = 0;
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	int nr_reqs = cork ? 5 : 1;
+	int i, ret;
+	size_t chunk_size = send_size / nr_reqs;
+	size_t chunk_size_last = send_size - chunk_size * (nr_reqs - 1);
+	char *buf = buffers_iov[buf_idx].iov_base;
+
+	assert(send_size <= buffers_iov[buf_idx].iov_len);
+	memset(rx_buffer, 0, sizeof(rx_buffer));
+
+	for (i = 0; i < nr_reqs; i++) {
+		bool cur_fixed_buf = fixed_buf;
+		size_t cur_size = chunk_size;
+		int msg_flags = 0;
+
+		if (mix_register)
+			cur_fixed_buf = rand() & 1;
+
+		if (cork && i != nr_reqs - 1)
+			msg_flags = MSG_MORE;
+		if (i == nr_reqs - 1)
+			cur_size = chunk_size_last;
+
+		sqe = io_uring_get_sqe(ring);
+		if (cur_fixed_buf)
+			io_uring_prep_sendzc_fixed(sqe, sock_client,
+					buf + i * chunk_size,
+					cur_size, msg_flags, slot_idx,
+					zc_flags, buf_idx);
+		else
+			io_uring_prep_sendzc(sqe, sock_client,
+					buf + i * chunk_size,
+					cur_size, msg_flags, slot_idx,
+					zc_flags);
+
+		if (addr) {
+			sa_family_t fam = ((struct sockaddr_in *)addr)->sin_family;
+			int addr_len = fam == AF_INET ? sizeof(struct sockaddr_in) :
+							sizeof(struct sockaddr_in6);
+
+			io_uring_prep_sendzc_set_addr(sqe, (const struct sockaddr *)addr,
+						      addr_len);
+		}
+		sqe->user_data = i;
+	}
+
+	ret = io_uring_submit(ring);
+	if (ret != nr_reqs) {
+		fprintf(stderr, "submit failed %i expected %i\n", ret, nr_reqs);
+		return 1;
+	}
+
+	for (i = 0; i < nr_reqs; i++) {
+		int expected = chunk_size;
+
+		ret = io_uring_wait_cqe(ring, &cqe);
+		if (ret) {
+			fprintf(stderr, "io_uring_wait_cqe failed %i\n", ret);
+			return 1;
+		}
+		if (cqe->user_data >= nr_reqs) {
+			fprintf(stderr, "invalid user_data\n");
+			return 1;
+		}
+		if (cqe->user_data == nr_reqs - 1)
+			expected = chunk_size_last;
+		if (cqe->res != expected) {
+			fprintf(stderr, "invalid cqe->res %d expected %d\n",
+					 cqe->res, expected);
+			return 1;
+		}
+		io_uring_cqe_seen(ring, cqe);
+	}
+
+	ret = recv(sock_server, rx_buffer, send_size, 0);
+	if (ret != send_size) {
+		fprintf(stderr, "recv less than expected or recv failed %i\n", ret);
+		return 1;
+	}
+
+	for (i = 0; i < send_size; i++) {
+		if (buf[i] != rx_buffer[i]) {
+			fprintf(stderr, "botched data, first byte %i, %u vs %u\n",
+				i, buf[i], rx_buffer[i]);
+		}
+	}
+	return 0;
+}
+
+static int test_inet_send(struct io_uring *ring)
+{
+	struct sockaddr_storage addr;
+	int sock_client, sock_server;
+	int ret, j;
+	__u64 i;
+
+	for (j = 0; j < 8; j++) {
+		bool ipv6 = j & 1;
+		bool client_connect = j & 2;
+		bool msg_zc_set = j & 4;
+
+		ret = prepare_ip(&addr, &sock_client, &sock_server, ipv6,
+				 client_connect, msg_zc_set);
+		if (ret) {
+			fprintf(stderr, "sock prep failed %d\n", ret);
+			return 1;
+		}
+
+		for (i = 0; i < 64; i++) {
+			bool fixed_buf = i & 1;
+			struct sockaddr_storage *addr_arg = (i & 2) ? &addr : NULL;
+			size_t size = (i & 4) ? 137 : 4096;
+			bool cork = i & 8;
+			bool mix_register = i & 16;
+			bool aligned = i & 32;
+			int buf_idx = aligned ? 0 : 1;
+
+			if (mix_register && (!cork || fixed_buf))
+				continue;
+			if (!client_connect && addr_arg == NULL)
+				continue;
+
+			ret = do_test_inet_send(ring, sock_client, sock_server, fixed_buf,
+						addr_arg, size, cork, mix_register,
+						buf_idx);
+			if (ret) {
+				fprintf(stderr, "send failed fixed buf %i, conn %i, addr %i, "
+					"cork %i\n",
+					fixed_buf, client_connect, !!addr_arg,
+					cork);
+				return 1;
+			}
+		}
+
+		close(sock_client);
+		close(sock_server);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	struct io_uring ring;
+	int i, ret, sp[2];
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = io_uring_queue_init(32, &ring, 0);
+	if (ret) {
+		fprintf(stderr, "queue init failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	ret = register_notifications(&ring);
+	if (ret == -EINVAL) {
+		printf("sendzc is not supported, skip\n");
+		return T_EXIT_SKIP;
+	} else if (ret) {
+		fprintf(stderr, "register notif failed %i\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	srand((unsigned)time(NULL));
+	for (i = 0; i < sizeof(tx_buffer); i++)
+		tx_buffer[i] = i;
+
+	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) != 0) {
+		perror("Failed to create Unix-domain socket pair\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_registration(sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_registration() failed\n");
+		return ret;
+	}
+
+	ret = test_invalid_slot(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_invalid_slot() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_basic_send(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_basic_send() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_send_flush(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_send_flush() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_multireq_notif(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_multireq_notif() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = reregister_notifications(&ring);
+	if (ret) {
+		fprintf(stderr, "reregister notifiers failed %i\n", ret);
+		return T_EXIT_FAIL;
+	}
+	/* retry a few tests after registering notifs */
+	ret = test_invalid_slot(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_invalid_slot() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_multireq_notif(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_multireq_notif2() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_multi_send_flushing(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_multi_send_flushing() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_update_flush_fail(&ring);
+	if (ret) {
+		fprintf(stderr, "test_update_flush_fail() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_update_flush(&ring, sp[0], sp[1]);
+	if (ret) {
+		fprintf(stderr, "test_update_flush() failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = t_register_buffers(&ring, buffers_iov, ARRAY_SIZE(buffers_iov));
+	if (ret == T_SETUP_SKIP) {
+		fprintf(stderr, "can't register bufs, skip\n");
+		goto out;
+	} else if (ret != T_SETUP_OK) {
+		fprintf(stderr, "buffer registration failed %i\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_inet_send(&ring);
+	if (ret) {
+		fprintf(stderr, "test_inet_send() failed\n");
+		return ret;
+	}
+out:
+	io_uring_queue_exit(&ring);
+	close(sp[0]);
+	close(sp[1]);
+	return T_EXIT_PASS;
+}

From patchwork Mon Jul 25 11:33:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927978
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe, asml.silence@gmail.com, Ammar Faizi
Subject: [PATCH liburing v2 4/5] examples: add a zerocopy send example
Date: Mon, 25 Jul 2022 12:33:21 +0100
Message-Id: <60f2a1a167b12fe60bf04d4c67da75a4a6983855.1658748624.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.37.0
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Signed-off-by: Pavel Begunkov
---
 examples/Makefile        |   3 +-
 examples/send-zerocopy.c | 366 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 examples/send-zerocopy.c

diff --git a/examples/Makefile b/examples/Makefile
index 8e7067f..1997a31 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -14,7 +14,8 @@ example_srcs := \
 	io_uring-cp.c \
 	io_uring-test.c \
 	link-cp.c \
-	poll-bench.c
+	poll-bench.c \
+	send-zerocopy.c
 
 all_targets :=
 
diff --git a/examples/send-zerocopy.c b/examples/send-zerocopy.c
new file mode 100644
index 0000000..e42aa71
--- /dev/null
+++ b/examples/send-zerocopy.c
@@ -0,0 +1,366 @@
+/* SPDX-License-Identifier: MIT */
+/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "liburing.h"
+
+#define ZC_TAG 0xfffffffULL
+#define MAX_SUBMIT_NR 512
+
+static bool cfg_reg_ringfd = true;
+static bool cfg_fixed_files = 1;
+static bool cfg_zc = 1;
+static bool cfg_flush = 0;
+static int cfg_nr_reqs = 8;
+static bool cfg_fixed_buf = 1;
+
+static int cfg_family = PF_UNSPEC;
+static int cfg_payload_len;
+static int cfg_port = 8000;
+static int cfg_runtime_ms = 4200;
+
+static socklen_t cfg_alen;
+static struct sockaddr_storage cfg_dst_addr;
+
+static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
+
+static unsigned long gettimeofday_ms(void)
+{
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+	return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_setsockopt(int fd, int level, int optname, int val)
+{
+	if (setsockopt(fd, level, optname, &val, sizeof(val)))
+		error(1, errno, "setsockopt %d.%d: %d", level, optname, val);
+}
+
+static void setup_sockaddr(int domain, const char *str_addr,
+			   struct sockaddr_storage *sockaddr)
+{
+	struct sockaddr_in6 *addr6 = (void *) sockaddr;
+	struct sockaddr_in *addr4 = (void *) sockaddr;
+
+	switch (domain) {
+	case PF_INET:
+		memset(addr4, 0, sizeof(*addr4));
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(cfg_port);
+		if (str_addr &&
+		    inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1)
+			error(1, 0, "ipv4 parse error: %s", str_addr);
+		break;
+	case PF_INET6:
+		memset(addr6, 0, sizeof(*addr6));
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(cfg_port);
+		if (str_addr &&
+		    inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1)
+			error(1, 0, "ipv6 parse error: %s", str_addr);
+		break;
+	default:
+		error(1, 0, "illegal domain");
+	}
+}
+
+static int do_setup_tx(int domain, int type, int protocol)
+{
+	int fd;
+
+	fd = socket(domain, type, protocol);
+	if (fd == -1)
+		error(1, errno, "socket t");
+
+	do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21);
+
+	if (connect(fd, (void *) &cfg_dst_addr, cfg_alen))
+		error(1, errno, "connect");
+	return fd;
+}
+
+static inline struct io_uring_cqe *wait_cqe_fast(struct io_uring *ring)
+{
+	struct io_uring_cqe *cqe;
+	unsigned head;
+	int ret;
+
+	io_uring_for_each_cqe(ring, head, cqe)
+		return cqe;
+
+	ret = io_uring_wait_cqe(ring, &cqe);
+	if (ret)
+		error(1, ret, "wait cqe");
+	return cqe;
+}
+
+static void do_tx(int domain, int type, int protocol)
+{
+	unsigned long packets = 0;
+	unsigned long bytes = 0;
+	struct io_uring ring;
+	struct iovec iov;
+	uint64_t tstop;
+	int i, fd, ret;
+	int compl_cqes = 0;
+
+	fd = do_setup_tx(domain, type, protocol);
+
+	ret = io_uring_queue_init(512, &ring, IORING_SETUP_COOP_TASKRUN);
+	if (ret)
+		error(1, ret, "io_uring: queue init");
+
+	if (cfg_zc) {
+		struct io_uring_notification_slot b[1] = {{.tag = ZC_TAG}};
+
+		ret = io_uring_register_notifications(&ring, 1, b);
+		if (ret)
+			error(1, ret, "io_uring: tx ctx registration");
+	}
+	if (cfg_fixed_files) {
+		ret = io_uring_register_files(&ring, &fd, 1);
+		if (ret < 0)
+			error(1, ret, "io_uring: files registration");
+	}
+	if (cfg_reg_ringfd) {
+		ret = io_uring_register_ring_fd(&ring);
+		if (ret < 0)
+			error(1, ret, "io_uring: io_uring_register_ring_fd");
+	}
+
+	iov.iov_base = payload;
+	iov.iov_len = cfg_payload_len;
+
+	ret = io_uring_register_buffers(&ring, &iov, 1);
+	if (ret)
+		error(1, ret, "io_uring: buffer registration");
+
+	tstop = gettimeofday_ms() + cfg_runtime_ms;
+	do {
+		struct io_uring_sqe *sqe;
+		struct io_uring_cqe *cqe;
+		unsigned zc_flags = 0;
+		unsigned buf_idx = 0;
+		unsigned slot_idx = 0;
+		unsigned msg_flags = 0;
+
+		compl_cqes += cfg_flush ? cfg_nr_reqs : 0;
+		if (cfg_flush)
+			zc_flags |= IORING_RECVSEND_NOTIF_FLUSH;
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			sqe = io_uring_get_sqe(&ring);
+
+			if (!cfg_zc)
+				io_uring_prep_send(sqe, fd, payload,
+						   cfg_payload_len, 0);
+			else if (cfg_fixed_buf)
+				io_uring_prep_sendzc_fixed(sqe, fd, payload,
+							   cfg_payload_len,
+							   msg_flags, slot_idx,
+							   zc_flags, buf_idx);
+			else
+				io_uring_prep_sendzc(sqe, fd, payload,
+						     cfg_payload_len, msg_flags,
+						     slot_idx, zc_flags);
+
+			sqe->user_data = 1;
+			if (cfg_fixed_files) {
+				sqe->fd = 0;
+				sqe->flags |= IOSQE_FIXED_FILE;
+			}
+		}
+
+		ret = io_uring_submit(&ring);
+		if (ret != cfg_nr_reqs)
+			error(1, ret, "submit");
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			cqe = wait_cqe_fast(&ring);
+
+			if (cqe->user_data == ZC_TAG) {
+				compl_cqes--;
+				i--;
+			} else if (cqe->user_data != 1) {
+				error(1, cqe->user_data, "invalid user_data");
+			} else if (cqe->res > 0) {
+				packets++;
+				bytes += cqe->res;
+			} else if (cqe->res == -EAGAIN) {
+				/* request failed, don't flush */
+				if (cfg_flush)
+					compl_cqes--;
+			} else if (cqe->res == -ECONNREFUSED ||
+				   cqe->res == -ECONNRESET ||
+				   cqe->res == -EPIPE) {
+				fprintf(stderr, "Connection failure\n");
+				goto out_fail;
+			} else {
+				error(1, cqe->res, "send failed");
+			}
+
+			io_uring_cqe_seen(&ring, cqe);
+		}
+	} while (gettimeofday_ms() < tstop);
+
+out_fail:
+	shutdown(fd, SHUT_RDWR);
+	if (close(fd))
+		error(1, errno, "close");
+
+	fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n",
+			packets, bytes >> 20,
+			packets / (cfg_runtime_ms / 1000),
+			(bytes >> 20) / (cfg_runtime_ms / 1000));
+
+	while (compl_cqes) {
+		struct io_uring_cqe *cqe = wait_cqe_fast(&ring);
+
+		io_uring_cqe_seen(&ring, cqe);
+		compl_cqes--;
+	}
+
+	if (cfg_zc) {
+		ret = io_uring_unregister_notifications(&ring);
+		if (ret)
+			error(1, ret, "io_uring: tx ctx unregistration");
+	}
+	io_uring_queue_exit(&ring);
+}
+
+static void do_test(int domain, int type, int protocol)
+{
+	int i;
+
+	for (i = 0; i < IP_MAXPACKET; i++)
+		payload[i] = 'a' + (i % 26);
+
+	do_tx(domain, type, protocol);
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] "
+		    "(-4|-6) [-t