From patchwork Mon Jul 25 10:03:54 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927933
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe , asml.silence@gmail.com
Subject: [PATCH liburing 1/4] io_uring.h: sync with kernel for zc send and notifiers
Date: Mon, 25 Jul 2022 11:03:54 +0100
Message-Id: <75b424869b9dad220d425693a43ec5ae97e5b8e8.1658743360.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Signed-off-by: Pavel Begunkov
---
 src/include/liburing/io_uring.h | 37 +++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
index 99e6963..3953807 100644
--- a/src/include/liburing/io_uring.h
+++ b/src/include/liburing/io_uring.h
@@ -64,6 +64,10 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		struct {
+			__u16	notification_idx;
+			__u16	addr_len;
+		};
 	};
 	__u64	addr3;
 	__u64	__pad2[1];
@@ -160,7 +164,8 @@ enum io_uring_op {
 	IORING_OP_FALLOCATE,
 	IORING_OP_OPENAT,
 	IORING_OP_CLOSE,
-	IORING_OP_FILES_UPDATE,
+	IORING_OP_RSRC_UPDATE,
+	IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE,
 	IORING_OP_STATX,
 	IORING_OP_READ,
 	IORING_OP_WRITE,
@@ -187,6 +192,7 @@ enum io_uring_op {
 	IORING_OP_GETXATTR,
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
+	IORING_OP_SENDZC_NOTIF,

 	/* this goes last, obviously */
 	IORING_OP_LAST,
@@ -208,6 +214,7 @@ enum io_uring_op {
 #define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
 #define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
 #define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+
 /*
  * sqe->splice_flags
  * extends splice(2) flags
@@ -254,13 +261,23 @@ enum io_uring_op {
  * CQEs on behalf of the same SQE.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
-#define IORING_RECV_MULTISHOT	(1U << 1)
+#define IORING_RECV_MULTISHOT		(1U << 1)
+#define IORING_RECVSEND_FIXED_BUF	(1U << 2)
+#define IORING_RECVSEND_NOTIF_FLUSH	(1U << 3)

 /*
  * accept flags stored in sqe->ioprio
  */
 #define IORING_ACCEPT_MULTISHOT	(1U << 0)

+/*
+ * IORING_OP_RSRC_UPDATE flags
+ */
+enum {
+	IORING_RSRC_UPDATE_FILES,
+	IORING_RSRC_UPDATE_NOTIF,
+};
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
@@ -426,6 +443,9 @@ enum {
 	/* register a range of fixed file slots for automatic slot allocation */
 	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,

+	IORING_REGISTER_NOTIFIERS		= 26,
+	IORING_UNREGISTER_NOTIFIERS		= 27,
+
 	/* this goes last */
 	IORING_REGISTER_LAST
 };
@@ -472,6 +492,19 @@ struct io_uring_rsrc_update2 {
 	__u32 resv2;
 };

+struct io_uring_notification_slot {
+	__u64 tag;
+	__u64 resv[3];
+};
+
+struct io_uring_notification_register {
+	__u32 nr_slots;
+	__u32 resv;
+	__u64 resv2;
+	__u64 data;
+	__u64 resv3;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP	(-2)
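The two structs and the IORING_REGISTER_NOTIFIERS / IORING_UNREGISTER_NOTIFIERS opcodes above are the whole registration ABI: userspace hands the kernel an array of io_uring_notification_slot entries, and each slot's tag later comes back in cqe->user_data when that slot's notification is flushed. Below is a minimal sketch (not part of the series) of driving the raw io_uring_register(2) syscall with these types; patch 2/4 adds proper liburing wrappers for the same call. The ring_fd argument, the tag values and the register_two_slots() name are illustrative only, and the snippet assumes the updated liburing/io_uring.h from this patch is on the include path.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include "liburing/io_uring.h"

static int register_two_slots(int ring_fd)
{
	/* one notification slot per tag; the tag is reported back in
	 * cqe->user_data once the slot is flushed */
	struct io_uring_notification_slot slots[2] = {
		{ .tag = 0x100 }, { .tag = 0x101 },
	};
	struct io_uring_notification_register reg;

	memset(&reg, 0, sizeof(reg));
	reg.nr_slots = 2;
	reg.data = (unsigned long)slots;

	/* same arguments that io_uring_register_notifications() in patch 2/4
	 * passes; the raw syscall returns 0 on success, -1 + errno on failure */
	return (int)syscall(__NR_io_uring_register, ring_fd,
			    IORING_REGISTER_NOTIFIERS, &reg, sizeof(reg));
}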
From patchwork Mon Jul 25 10:03:55 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927934
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe , asml.silence@gmail.com
Subject: [PATCH liburing 2/4] liburing: add zc send and notif helpers
Date: Mon, 25 Jul 2022 11:03:55 +0100
Message-Id: <7f705b208e5f7baa6ee94904e39d3d0da2e28150.1658743360.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Add helpers for notification registration and preparing zerocopy send
requests.

Signed-off-by: Pavel Begunkov
---
 src/include/liburing.h | 41 +++++++++++++++++++++++++++++++++++++++++
 src/liburing.map       |  2 ++
 src/register.c         | 20 ++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index fc7613d..f3c5887 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -189,6 +189,10 @@ int io_uring_register_sync_cancel(struct io_uring *ring,
 int io_uring_register_file_alloc_range(struct io_uring *ring,
 					unsigned off, unsigned len);

+int io_uring_register_notifications(struct io_uring *ring, unsigned nr,
+				    struct io_uring_notification_slot *slots);
+int io_uring_unregister_notifications(struct io_uring *ring);
+
 /*
  * Helper for the peek/wait single cqe functions. Exported because of that,
  * but probably shouldn't be used directly in an application.
@@ -677,6 +681,43 @@ static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
 	sqe->msg_flags = (__u32) flags;
 }

+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned slot_idx, unsigned zc_flags)
+{
+	io_uring_prep_rw(IORING_OP_SENDZC_NOTIF, sqe, sockfd, buf, (__u32) len, 0);
+	sqe->msg_flags = (__u32) flags;
+	sqe->notification_idx = slot_idx;
+	sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_prep_sendzc_fixed(struct io_uring_sqe *sqe, int sockfd,
+					      const void *buf, size_t len,
+					      int flags, unsigned slot_idx,
+					      unsigned zc_flags, unsigned buf_idx)
+{
+	io_uring_prep_sendzc(sqe, sockfd, buf, len, flags, slot_idx, zc_flags);
+	sqe->ioprio |= IORING_RECVSEND_FIXED_BUF;
+	sqe->buf_index = buf_idx;
+}
+
+static inline void io_uring_prep_sendzc_set_addr(struct io_uring_sqe *sqe,
+						 const struct sockaddr *dest_addr,
+						 __u16 addr_len)
+{
+	sqe->addr2 = (unsigned long)(void *)dest_addr;
+	sqe->addr_len = addr_len;
+}
+
+static inline void io_uring_prep_notif_update(struct io_uring_sqe *sqe,
+					      __u64 new_tag, /* 0 to ignore */
+					      unsigned offset, unsigned nr)
+{
+	io_uring_prep_rw(IORING_OP_FILES_UPDATE, sqe, -1, (void *)new_tag, nr,
+			 (__u64)offset);
+	sqe->ioprio = IORING_RSRC_UPDATE_NOTIF;
+}
+
 static inline void io_uring_prep_recv(struct io_uring_sqe *sqe, int sockfd,
 				      void *buf, size_t len, int flags)
 {
diff --git a/src/liburing.map b/src/liburing.map
index 318d3d7..7d8f143 100644
--- a/src/liburing.map
+++ b/src/liburing.map
@@ -60,4 +60,6 @@ LIBURING_2.3 {
 	global:
 		io_uring_register_sync_cancel;
 		io_uring_register_file_alloc_range;
+		io_uring_register_notifications;
+		io_uring_unregister_notifications;
 } LIBURING_2.2;
diff --git a/src/register.c b/src/register.c
index 2b37e5f..7482112 100644
--- a/src/register.c
+++ b/src/register.c
@@ -364,3 +364,23 @@ int io_uring_register_file_alloc_range(struct io_uring *ring,
 					IORING_REGISTER_FILE_ALLOC_RANGE,
 					&range, 0);
 }
+
+int io_uring_register_notifications(struct io_uring *ring, unsigned nr,
+				    struct io_uring_notification_slot *slots)
+{
+	struct io_uring_notification_register r = {
+		.nr_slots = nr,
+		.data = (unsigned long)slots,
+	};
+
+	return __sys_io_uring_register(ring->ring_fd,
+				       IORING_REGISTER_NOTIFIERS,
+				       &r, sizeof(r));
+}
+
+int io_uring_unregister_notifications(struct io_uring *ring)
+{
+	return __sys_io_uring_register(ring->ring_fd,
+				       IORING_UNREGISTER_NOTIFIERS,
+				       NULL, 0);
+}
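Taken together, these helpers cover the whole zerocopy send flow: register notification slots, point an IORING_OP_SENDZC_NOTIF request at one of them via slot_idx, and optionally pass IORING_RECVSEND_NOTIF_FLUSH so the slot is flushed as part of the request and posts its own CQE (slot tag in user_data, sequence number in cqe->flags) once the kernel no longer needs the buffer. A minimal sketch of one flushed send follows; it is not part of the series, the ring/sockfd arguments, the send_one_zc() name and the 0xcafe tag are illustrative, and error handling is trimmed. The behaviour it relies on (two CQEs per flushed send) matches the tests in patch 3/4.

#include <errno.h>
#include <stddef.h>
#include "liburing.h"

static int send_one_zc(struct io_uring *ring, int sockfd,
		       const void *buf, size_t len)
{
	struct io_uring_notification_slot slot = { .tag = 0xcafe };
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int i, ret, res = 0;

	ret = io_uring_register_notifications(ring, 1, &slot);
	if (ret)
		return ret;

	sqe = io_uring_get_sqe(ring);
	/* slot 0, and flush its notification as part of this request */
	io_uring_prep_sendzc(sqe, sockfd, buf, len, 0, 0,
			     IORING_RECVSEND_NOTIF_FLUSH);
	sqe->user_data = 1;

	ret = io_uring_submit(ring);
	if (ret != 1)
		return ret < 0 ? ret : -EIO;

	/* two CQEs: the send result (user_data == 1) and the flushed
	 * notification (user_data == slot tag, sequence in cqe->flags) */
	for (i = 0; i < 2; i++) {
		ret = io_uring_wait_cqe(ring, &cqe);
		if (ret)
			return ret;
		if (cqe->user_data == 1 && cqe->res < 0)
			res = cqe->res;
		io_uring_cqe_seen(ring, cqe);
	}

	io_uring_unregister_notifications(ring);
	return res;
}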
From patchwork Mon Jul 25 10:03:56 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12927935
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe , asml.silence@gmail.com
Subject: [PATCH liburing 3/4] tests: add tests for zerocopy send and notifications
Date: Mon, 25 Jul 2022 11:03:56 +0100
Message-Id: <92dccd4b172d5511646d72c51205241aa2e62458.1658743360.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Signed-off-by: Pavel Begunkov
Signed-off-by: Ammar Faizi
---
 test/Makefile     |   1 +
 test/send-zcopy.c | 879 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 880 insertions(+)
 create mode 100644 test/send-zcopy.c

diff --git a/test/Makefile b/test/Makefile
index 8945368..7b6018c 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -175,6 +175,7 @@ test_srcs := \
 	xattr.c \
 	skip-cqe.c \
 	single-issuer.c \
+	send-zcopy.c \
 	# EOL

 all_targets :=
diff --git a/test/send-zcopy.c b/test/send-zcopy.c
new file mode 100644
index 0000000..cd7d655
--- /dev/null
+++ b/test/send-zcopy.c
@@ -0,0 +1,879 @@
+/* SPDX-License-Identifier: MIT */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "liburing.h"
+#include "helpers.h"
+
+#define MAX_MSG	128
+
+#define PORT	10200
+#define HOST	"127.0.0.1"
+#define HOSTV6	"::1"
+
+#define NR_SLOTS 5
+#define ZC_TAG
10000 +#define MAX_PAYLOAD 8195 +#define BUFFER_OFFSET 41 + +#ifndef ARRAY_SIZE + #define ARRAY_SIZE(a) (sizeof(a)/sizeof((a)[0])) +#endif + +static int seqs[NR_SLOTS]; +static char tx_buffer[MAX_PAYLOAD] __attribute__((aligned(4096))); +static char rx_buffer[MAX_PAYLOAD] __attribute__((aligned(4096))); +static struct iovec buffers_iov[] = { + { .iov_base = tx_buffer, + .iov_len = sizeof(tx_buffer), }, + { .iov_base = tx_buffer + BUFFER_OFFSET, + .iov_len = sizeof(tx_buffer) - BUFFER_OFFSET - 13, }, +}; + +static inline bool tag_userdata(__u64 user_data) +{ + return ZC_TAG <= user_data && user_data < ZC_TAG + NR_SLOTS; +} + +static bool check_cq_empty(struct io_uring *ring) +{ + struct io_uring_cqe *cqe = NULL; + int ret; + + ret = io_uring_peek_cqe(ring, &cqe); /* nothing should be there */ + return ret == -EAGAIN; +} + +static int register_notifications(struct io_uring *ring) +{ + struct io_uring_notification_slot slots[NR_SLOTS] = {}; + int i; + + memset(seqs, 0, sizeof(seqs)); + for (i = 0; i < NR_SLOTS; i++) + slots[i].tag = ZC_TAG + i; + return io_uring_register_notifications(ring, NR_SLOTS, slots); +} + +static int reregister_notifications(struct io_uring *ring) +{ + int ret; + + ret = io_uring_unregister_notifications(ring); + if (ret) { + fprintf(stderr, "unreg notifiers failed %i\n", ret); + return ret; + } + + return register_notifications(ring); +} + +static int do_one(struct io_uring *ring, int sock_tx, int slot_idx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + unsigned zc_flags = 0; + int ret; + + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, 1, msg_flags, + slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data == 1); + ret = cqe->res; + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + return ret; +} + +static int test_invalid_slot(struct io_uring *ring, int sock_tx, int sock_rx) +{ + int ret; + + ret = do_one(ring, sock_tx, NR_SLOTS); + assert(ret == -EINVAL); + return 0; +} + +static int test_basic_send(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 100; + int ret; + + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, msg_flags, + slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data == 1 && cqe->res >= 0); + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + return 0; +} + +static int test_send_flush(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 100; + int ret, i, j; + int req_cqes, notif_cqes; + + /* now do send+flush, do many times to verify seqs */ + for (j = 0; j < NR_SLOTS * 5; j++) { + zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + slot_idx = rand() % NR_SLOTS; + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = 1; + + ret = io_uring_submit(ring); + assert(ret == 1); + + req_cqes = notif_cqes = 1; + for (i = 0; i < 2; i ++) 
{ + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + + if (cqe->user_data == 1) { + assert(req_cqes > 0); + req_cqes--; + assert(cqe->res == payload_size); + } else if (cqe->user_data == ZC_TAG + slot_idx) { + assert(notif_cqes > 0); + notif_cqes--; + assert(cqe->res == 0 && cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + } else { + fprintf(stderr, "invalid cqe %lu %i\n", + (unsigned long)cqe->user_data, cqe->res); + return -1; + } + io_uring_cqe_seen(ring, cqe); + } + assert(check_cq_empty(ring)); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + return 0; +} + +static int test_multireq_notif(struct io_uring *ring, int sock_tx, int sock_rx) +{ + bool slot_seen[NR_SLOTS] = {}; + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int msg_flags = 0; + int slot_idx = 0; + unsigned zc_flags = 0; + int payload_size = 1; + int ret, j, i = 0; + int nr = NR_SLOTS * 21; + + while (i < nr) { + int nr_per_wave = 23; + + for (j = 0; j < nr_per_wave && i < nr; j++, i++) { + slot_idx = rand() % NR_SLOTS; + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = i; + } + ret = io_uring_submit(ring); + assert(ret == j); + } + + for (i = 0; i < nr; i++) { + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(cqe->user_data < nr && cqe->res == payload_size); + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + assert(check_cq_empty(ring)); + + zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + for (slot_idx = 0; slot_idx < NR_SLOTS; slot_idx++) { + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = slot_idx; + /* just to simplify cqe handling */ + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + } + ret = io_uring_submit(ring); + assert(ret == NR_SLOTS); + + for (i = 0; i < NR_SLOTS; i++) { + int slot_idx; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(tag_userdata(cqe->user_data)); + + slot_idx = cqe->user_data - ZC_TAG; + assert(!slot_seen[slot_idx]); + slot_seen[slot_idx] = true; + + assert(cqe->res == 0 && cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + assert(ret == payload_size); + } + assert(check_cq_empty(ring)); + + for (i = 0; i < NR_SLOTS; i++) + assert(slot_seen[i]); + return 0; +} + +static int test_multi_send_flushing(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + unsigned zc_flags = IORING_RECVSEND_NOTIF_FLUSH; + int msg_flags = 0, slot_idx = 0; + int payload_size = 1; + int ret, j, i = 0; + int nr = NR_SLOTS * 30; + unsigned long long check = 0, expected = 0; + + while (i < nr) { + int nr_per_wave = 25; + + for (j = 0; j < nr_per_wave && i < nr; j++, i++) { + sqe = io_uring_get_sqe(ring); + io_uring_prep_sendzc(sqe, sock_tx, tx_buffer, payload_size, + msg_flags, slot_idx, zc_flags); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + } + ret = io_uring_submit(ring); + assert(ret == j); + } + + for (i = 0; i < nr; i++) { + int seq; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret); + assert(!cqe->res); + assert(tag_userdata(cqe->user_data)); + + seq = cqe->flags; + check += seq * 100007UL; + io_uring_cqe_seen(ring, cqe); + + ret = recv(sock_rx, rx_buffer, payload_size, MSG_TRUNC); + 
assert(ret == payload_size); + } + assert(check_cq_empty(ring)); + + for (i = 0; i < nr; i++) + expected += (i + seqs[slot_idx]) * 100007UL; + assert(check == expected); + seqs[slot_idx] += nr; + return 0; +} + +static int do_one_fail_notif_flush(struct io_uring *ring, int off, int nr) +{ + struct io_uring_cqe *cqe; + struct io_uring_sqe *sqe; + int ret; + + /* single out-of-bounds slot */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, off, nr); + sqe->user_data = 1; + ret = io_uring_submit(ring); + assert(ret == 1); + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && cqe->user_data == 1); + ret = cqe->res; + io_uring_cqe_seen(ring, cqe); + return ret; +} + +static int test_update_flush_fail(struct io_uring *ring) +{ + int ret; + + /* single out-of-bounds slot */ + ret = do_one_fail_notif_flush(ring, NR_SLOTS, 1); + assert(ret == -EINVAL); + + /* out-of-bounds range */ + ret = do_one_fail_notif_flush(ring, 0, NR_SLOTS + 3); + assert(ret == -EINVAL); + ret = do_one_fail_notif_flush(ring, NR_SLOTS - 1, 2); + assert(ret == -EINVAL); + + /* overflow checks, note it's u32 internally */ + ret = do_one_fail_notif_flush(ring, ~(__u32)0, 1); + assert(ret == -EOVERFLOW); + ret = do_one_fail_notif_flush(ring, NR_SLOTS - 1, ~(__u32)0); + assert(ret == -EOVERFLOW); + return 0; +} + +static void do_one_consume(struct io_uring *ring, int sock_tx, int sock_rx, + int slot_idx) +{ + int ret; + + ret = do_one(ring, sock_tx, slot_idx); + assert(ret == 1); + + ret = recv(sock_rx, rx_buffer, 1, MSG_TRUNC); + assert(ret == 1); +} + +static int test_update_flush(struct io_uring *ring, int sock_tx, int sock_rx) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int offset = 1, nr_to_flush = 3; + int ret, i, slot_idx; + + /* + * Flush will be skipped for unused slots, so attached at least 1 req + * to each active notifier / slot + */ + for (slot_idx = 0; slot_idx < NR_SLOTS; slot_idx++) + do_one_consume(ring, sock_tx, sock_rx, slot_idx); + + assert(check_cq_empty(ring)); + + /* flush first */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, 0, 1); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == ZC_TAG); + assert(cqe->flags == seqs[0]); + seqs[0]++; + io_uring_cqe_seen(ring, cqe); + do_one_consume(ring, sock_tx, sock_rx, 0); + assert(check_cq_empty(ring)); + + /* flush last */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, NR_SLOTS - 1, 1); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == ZC_TAG + NR_SLOTS - 1); + assert(cqe->flags == seqs[NR_SLOTS - 1]); + seqs[NR_SLOTS - 1]++; + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + /* we left the last slot without attached requests, flush should ignore it */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, NR_SLOTS - 1, 1); + sqe->user_data = 1; + ret = io_uring_submit(ring); + assert(ret == 1); + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res && cqe->user_data == 1); + io_uring_cqe_seen(ring, cqe); + assert(check_cq_empty(ring)); + + /* flush range */ + sqe = io_uring_get_sqe(ring); + io_uring_prep_notif_update(sqe, 0, offset, nr_to_flush); + sqe->user_data = 1; + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS; + ret = io_uring_submit(ring); + 
assert(ret == 1); + + for (i = 0; i < nr_to_flush; i++) { + int slot_idx; + + ret = io_uring_wait_cqe(ring, &cqe); + assert(!ret && !cqe->res); + assert(ZC_TAG + offset <= cqe->user_data && + cqe->user_data < ZC_TAG + offset + nr_to_flush); + slot_idx = cqe->user_data - ZC_TAG; + assert(cqe->flags == seqs[slot_idx]); + seqs[slot_idx]++; + io_uring_cqe_seen(ring, cqe); + } + assert(check_cq_empty(ring)); + return 0; +} + +static int test_registration(int sock_tx, int sock_rx) +{ + struct io_uring_notification_slot slots[2] = { + {.tag = 1}, {.tag = 2}, + }; + void *invalid_slots = (void *)1UL; + struct io_uring ring; + int ret, i; + + ret = io_uring_queue_init(4, &ring, 0); + if (ret) { + fprintf(stderr, "queue init failed: %d\n", ret); + return 1; + } + + ret = io_uring_unregister_notifications(&ring); + if (ret != -ENXIO) { + fprintf(stderr, "unregister nothing: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 2, slots); + if (ret) { + fprintf(stderr, "io_uring_register_notifications failed: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 2, slots); + if (ret != -EBUSY) { + fprintf(stderr, "double register: %d\n", ret); + return 1; + } + + ret = io_uring_unregister_notifications(&ring); + if (ret) { + fprintf(stderr, "unregister failed: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 2, slots); + if (ret) { + fprintf(stderr, "second register failed: %d\n", ret); + return 1; + } + + ret = test_invalid_slot(&ring, sock_tx, sock_rx); + if (ret) { + fprintf(stderr, "test_invalid_slot() failed\n"); + return ret; + } + + for (i = 0; i < 2; i++) { + ret = do_one(&ring, sock_tx, 0); + assert(ret == 1); + + ret = recv(sock_rx, rx_buffer, 1, MSG_TRUNC); + assert(ret == 1); + } + + io_uring_queue_exit(&ring); + ret = io_uring_queue_init(4, &ring, 0); + if (ret) { + fprintf(stderr, "queue init failed: %d\n", ret); + return 1; + } + + ret = io_uring_register_notifications(&ring, 4, invalid_slots); + if (ret != -EFAULT) { + fprintf(stderr, "io_uring_register_notifications with invalid ptr: %d\n", ret); + return 1; + } + + io_uring_queue_exit(&ring); + return 0; +} + +static int prepare_ip(struct sockaddr_storage *addr, int *sock_client, int *sock_server, + bool ipv6, bool client_connect, bool msg_zc) +{ + int family, addr_size; + int ret, val; + + memset(addr, 0, sizeof(*addr)); + if (ipv6) { + struct sockaddr_in6 *saddr = (struct sockaddr_in6 *)addr; + + family = AF_INET6; + saddr->sin6_family = family; + saddr->sin6_port = htons(PORT); + addr_size = sizeof(*saddr); + } else { + struct sockaddr_in *saddr = (struct sockaddr_in *)addr; + + family = AF_INET; + saddr->sin_family = family; + saddr->sin_port = htons(PORT); + saddr->sin_addr.s_addr = htonl(INADDR_ANY); + addr_size = sizeof(*saddr); + } + + /* server sock setup */ + *sock_server = socket(family, SOCK_DGRAM, 0); + if (*sock_server < 0) { + perror("socket"); + return 1; + } + val = 1; + setsockopt(*sock_server, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)); + ret = bind(*sock_server, (struct sockaddr *)addr, addr_size); + if (ret < 0) { + perror("bind"); + return 1; + } + + if (ipv6) { + struct sockaddr_in6 *saddr = (struct sockaddr_in6 *)addr; + + inet_pton(AF_INET6, HOSTV6, &(saddr->sin6_addr)); + } else { + struct sockaddr_in *saddr = (struct sockaddr_in *)addr; + + inet_pton(AF_INET, HOST, &saddr->sin_addr); + } + + /* client sock setup */ + *sock_client = socket(family, SOCK_DGRAM, 0); + if (*sock_client < 0) { + perror("socket"); + return 1; + } + if 
(client_connect) { + ret = connect(*sock_client, (struct sockaddr *)addr, addr_size); + if (ret < 0) { + perror("connect"); + return 1; + } + } + if (msg_zc) { + val = 1; + if (setsockopt(*sock_client, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) { + perror("setsockopt zc"); + return 1; + } + } + return 0; +} + +static int do_test_inet_send(struct io_uring *ring, int sock_client, int sock_server, + bool fixed_buf, struct sockaddr_storage *addr, + size_t send_size, bool cork, bool mix_register, + int buf_idx) +{ + const unsigned slot_idx = 0; + const unsigned zc_flags = 0; + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int nr_reqs = cork ? 5 : 1; + int i, ret; + size_t chunk_size = send_size / nr_reqs; + size_t chunk_size_last = send_size - chunk_size * (nr_reqs - 1); + char *buf = buffers_iov[buf_idx].iov_base; + + assert(send_size <= buffers_iov[buf_idx].iov_len); + memset(rx_buffer, 0, sizeof(rx_buffer)); + + for (i = 0; i < nr_reqs; i++) { + bool cur_fixed_buf = fixed_buf; + size_t cur_size = chunk_size; + int msg_flags = 0; + + if (mix_register) + cur_fixed_buf = rand() & 1; + + if (cork && i != nr_reqs - 1) + msg_flags = MSG_MORE; + if (i == nr_reqs - 1) + cur_size = chunk_size_last; + + sqe = io_uring_get_sqe(ring); + if (cur_fixed_buf) + io_uring_prep_sendzc_fixed(sqe, sock_client, + buf + i * chunk_size, + cur_size, msg_flags, slot_idx, + zc_flags, buf_idx); + else + io_uring_prep_sendzc(sqe, sock_client, + buf + i * chunk_size, + cur_size, msg_flags, slot_idx, + zc_flags); + + if (addr) { + sa_family_t fam = ((struct sockaddr_in *)addr)->sin_family; + int addr_len = fam == AF_INET ? sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6); + + io_uring_prep_sendzc_set_addr(sqe, (const struct sockaddr *)addr, + addr_len); + } + sqe->user_data = i; + } + + ret = io_uring_submit(ring); + if (ret != nr_reqs) + error(1, ret, "submit"); + + for (i = 0; i < nr_reqs; i++) { + int expected = chunk_size; + + ret = io_uring_wait_cqe(ring, &cqe); + if (ret) + error(1, ret, "wait cqe"); + if (cqe->user_data >= nr_reqs) + error(1, cqe->user_data, "invalid user_data"); + if (cqe->user_data == nr_reqs - 1) + expected = chunk_size_last; + if (cqe->res != expected) + error(1, cqe->res, "send failed"); + io_uring_cqe_seen(ring, cqe); + } + + ret = recv(sock_server, rx_buffer, send_size, 0); + if (ret != send_size) { + fprintf(stderr, "recv less than expected or recv failed %i\n", ret); + return 1; + } + + for (i = 0; i < send_size; i++) { + if (buf[i] != rx_buffer[i]) { + fprintf(stderr, "botched data, first byte %i, %u vs %u\n", + i, buf[i], rx_buffer[i]); + } + } + return 0; +} + +static int test_inet_send(struct io_uring *ring) +{ + struct sockaddr_storage addr; + int sock_client, sock_server; + int ret, j; + __u64 i; + + for (j = 0; j < 8; j++) { + bool ipv6 = j & 1; + bool client_connect = j & 2; + bool msg_zc_set = j & 4; + + ret = prepare_ip(&addr, &sock_client, &sock_server, ipv6, + client_connect, msg_zc_set); + if (ret) { + fprintf(stderr, "sock prep failed %d\n", ret); + return 1; + } + + for (i = 0; i < 64; i++) { + bool fixed_buf = i & 1; + struct sockaddr_storage *addr_arg = (i & 2) ? &addr : NULL; + size_t size = (i & 4) ? 137 : 4096; + bool cork = i & 8; + bool mix_register = i & 16; + bool aligned = i & 32; + int buf_idx = aligned ? 
0 : 1; + + if (mix_register && (!cork || fixed_buf)) + continue; + if (!client_connect && addr_arg == NULL) + continue; + + ret = do_test_inet_send(ring, sock_client, sock_server, fixed_buf, + addr_arg, size, cork, mix_register, + buf_idx); + if (ret) { + fprintf(stderr, "send failed fixed buf %i, conn %i, addr %i, " + "cork %i\n", + fixed_buf, client_connect, !!addr_arg, + cork); + return 1; + } + } + + close(sock_client); + close(sock_server); + } + return 0; +} + +int main(int argc, char *argv[]) +{ + struct io_uring ring; + int i, ret, sp[2]; + + if (argc > 1) + return 0; + + ret = io_uring_queue_init(32, &ring, 0); + if (ret) { + fprintf(stderr, "queue init failed: %d\n", ret); + return 1; + } + + ret = register_notifications(&ring); + if (ret == -EINVAL) { + printf("sendzc is not supported, skip\n"); + return 0; + } else if (ret) { + fprintf(stderr, "register notif failed %i\n", ret); + return 1; + } + + srand((unsigned)time(NULL)); + for (i = 0; i < sizeof(tx_buffer); i++) + tx_buffer[i] = i; + + if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) != 0) { + perror("Failed to create Unix-domain socket pair\n"); + return 1; + } + + ret = test_registration(sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_registration() failed\n"); + return ret; + } + + ret = test_invalid_slot(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_invalid_slot() failed\n"); + return ret; + } + + ret = test_basic_send(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_basic_send() failed\n"); + return ret; + } + + ret = test_send_flush(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_send_flush() failed\n"); + return ret; + } + + ret = test_multireq_notif(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_multireq_notif() failed\n"); + return ret; + } + + ret = reregister_notifications(&ring); + if (ret) { + fprintf(stderr, "reregister notifiers failed %i\n", ret); + return ret; + } + /* retry a few tests after registering notifs */ + ret = test_invalid_slot(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_invalid_slot() failed\n"); + return ret; + } + + ret = test_multireq_notif(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_multireq_notif2() failed\n"); + return ret; + } + + ret = test_multi_send_flushing(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_multi_send_flushing() failed\n"); + return ret; + } + + ret = test_update_flush_fail(&ring); + if (ret) { + fprintf(stderr, "test_update_flush_fail() failed\n"); + return ret; + } + + ret = test_update_flush(&ring, sp[0], sp[1]); + if (ret) { + fprintf(stderr, "test_update_flush() failed\n"); + return ret; + } + + ret = t_register_buffers(&ring, buffers_iov, ARRAY_SIZE(buffers_iov)); + if (ret == T_SETUP_SKIP) { + fprintf(stderr, "can't register bufs, skip\n"); + goto out; + } else if (ret != T_SETUP_OK) { + error(1, ret, "io_uring: buffer registration"); + } + + ret = test_inet_send(&ring); + if (ret) { + fprintf(stderr, "test_inet_send() failed\n"); + return ret; + } + +out: + io_uring_queue_exit(&ring); + close(sp[0]); + close(sp[1]); + return 0; +} From patchwork Mon Jul 25 10:03:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12927936 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42CE0CCA473 for ; Mon, 25 Jul 
2022 10:05:21 +0000 (UTC)
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: Jens Axboe , asml.silence@gmail.com
Subject: [PATCH liburing 4/4] examples: add a zerocopy send example
Date: Mon, 25 Jul 2022 11:03:57 +0100
Message-Id: <0c3a98b6486c19674856d3396085d0b509bf2736.1658743360.git.asml.silence@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Signed-off-by: Pavel Begunkov
---
 examples/Makefile        |   3 +-
 examples/send-zerocopy.c | 366 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 examples/send-zerocopy.c

diff --git a/examples/Makefile b/examples/Makefile
index 8e7067f..1997a31 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -14,7 +14,8 @@ example_srcs := \
 	io_uring-cp.c \
 	io_uring-test.c \
 	link-cp.c \
-	poll-bench.c
+	poll-bench.c \
+	send-zerocopy.c

 all_targets :=

diff --git a/examples/send-zerocopy.c b/examples/send-zerocopy.c
new file mode 100644
index 0000000..e42aa71 --- /dev/null +++ b/examples/send-zerocopy.c @@ -0,0 +1,366 @@ +/* SPDX-License-Identifier: MIT */ +/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "liburing.h" + +#define ZC_TAG 0xfffffffULL +#define MAX_SUBMIT_NR 512 + +static bool cfg_reg_ringfd = true; +static bool cfg_fixed_files = 1; +static bool cfg_zc = 1; +static bool cfg_flush = 0; +static int cfg_nr_reqs = 8; +static bool cfg_fixed_buf = 1; + +static int cfg_family = PF_UNSPEC; +static int cfg_payload_len; +static int cfg_port = 8000; +static int cfg_runtime_ms = 4200; + +static socklen_t cfg_alen; +static struct sockaddr_storage cfg_dst_addr; + +static char payload[IP_MAXPACKET] __attribute__((aligned(4096))); + +static unsigned long gettimeofday_ms(void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return (tv.tv_sec * 1000) + (tv.tv_usec / 1000); +} + +static void do_setsockopt(int fd, int level, int optname, int val) +{ + if (setsockopt(fd, level, optname, &val, sizeof(val))) + error(1, errno, "setsockopt %d.%d: %d", level, optname, val); +} + +static void setup_sockaddr(int domain, const char *str_addr, + struct sockaddr_storage *sockaddr) +{ + struct sockaddr_in6 *addr6 = (void *) sockaddr; + struct sockaddr_in *addr4 = (void *) sockaddr; + + switch (domain) { + case PF_INET: + memset(addr4, 0, sizeof(*addr4)); + addr4->sin_family = AF_INET; + addr4->sin_port = htons(cfg_port); + if (str_addr && + inet_pton(AF_INET, str_addr, &(addr4->sin_addr)) != 1) + error(1, 0, "ipv4 parse error: %s", str_addr); + break; + case PF_INET6: + memset(addr6, 0, sizeof(*addr6)); + addr6->sin6_family = AF_INET6; + addr6->sin6_port = htons(cfg_port); + if (str_addr && + inet_pton(AF_INET6, str_addr, &(addr6->sin6_addr)) != 1) + error(1, 0, "ipv6 parse error: %s", str_addr); + break; + default: + error(1, 0, "illegal domain"); + } +} + +static int do_setup_tx(int domain, int type, int protocol) +{ + int fd; + + fd = socket(domain, type, protocol); + if (fd == -1) + error(1, errno, "socket t"); + + do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21); + + if (connect(fd, (void *) &cfg_dst_addr, cfg_alen)) + error(1, errno, "connect"); + return fd; +} + +static inline struct io_uring_cqe *wait_cqe_fast(struct io_uring *ring) +{ + struct io_uring_cqe *cqe; + unsigned head; + int ret; + + io_uring_for_each_cqe(ring, head, cqe) + return cqe; + + ret = io_uring_wait_cqe(ring, &cqe); + if (ret) + error(1, ret, "wait cqe"); + return cqe; +} + +static void do_tx(int domain, int type, int protocol) +{ + unsigned long packets = 0; + unsigned long bytes = 0; + struct io_uring ring; + struct iovec iov; + uint64_t tstop; + int i, fd, ret; + int compl_cqes = 0; + + fd = do_setup_tx(domain, type, protocol); + + ret = io_uring_queue_init(512, &ring, IORING_SETUP_COOP_TASKRUN); + if (ret) + error(1, ret, "io_uring: queue init"); + + if (cfg_zc) { + struct io_uring_notification_slot b[1] = {{.tag = ZC_TAG}}; + + ret = io_uring_register_notifications(&ring, 1, b); + if (ret) + error(1, ret, "io_uring: tx ctx registration"); + } + if (cfg_fixed_files) { + ret = io_uring_register_files(&ring, &fd, 1); + if (ret < 0) + error(1, ret, "io_uring: files registration"); + } 
+ if (cfg_reg_ringfd) { + ret = io_uring_register_ring_fd(&ring); + if (ret < 0) + error(1, ret, "io_uring: io_uring_register_ring_fd"); + } + + iov.iov_base = payload; + iov.iov_len = cfg_payload_len; + + ret = io_uring_register_buffers(&ring, &iov, 1); + if (ret) + error(1, ret, "io_uring: buffer registration"); + + tstop = gettimeofday_ms() + cfg_runtime_ms; + do { + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + unsigned zc_flags = 0; + unsigned buf_idx = 0; + unsigned slot_idx = 0; + unsigned msg_flags = 0; + + compl_cqes += cfg_flush ? cfg_nr_reqs : 0; + if (cfg_flush) + zc_flags |= IORING_RECVSEND_NOTIF_FLUSH; + + for (i = 0; i < cfg_nr_reqs; i++) { + sqe = io_uring_get_sqe(&ring); + + if (!cfg_zc) + io_uring_prep_send(sqe, fd, payload, + cfg_payload_len, 0); + else if (cfg_fixed_buf) + io_uring_prep_sendzc_fixed(sqe, fd, payload, + cfg_payload_len, + msg_flags, slot_idx, + zc_flags, buf_idx); + else + io_uring_prep_sendzc(sqe, fd, payload, + cfg_payload_len, msg_flags, + slot_idx, zc_flags); + + sqe->user_data = 1; + if (cfg_fixed_files) { + sqe->fd = 0; + sqe->flags |= IOSQE_FIXED_FILE; + } + } + + ret = io_uring_submit(&ring); + if (ret != cfg_nr_reqs) + error(1, ret, "submit"); + + for (i = 0; i < cfg_nr_reqs; i++) { + cqe = wait_cqe_fast(&ring); + + if (cqe->user_data == ZC_TAG) { + compl_cqes--; + i--; + } else if (cqe->user_data != 1) { + error(1, cqe->user_data, "invalid user_data"); + } else if (cqe->res > 0) { + packets++; + bytes += cqe->res; + } else if (cqe->res == -EAGAIN) { + /* request failed, don't flush */ + if (cfg_flush) + compl_cqes--; + } else if (cqe->res == -ECONNREFUSED || + cqe->res == -ECONNRESET || + cqe->res == -EPIPE) { + fprintf(stderr, "Connection failure\n"); + goto out_fail; + } else { + error(1, cqe->res, "send failed"); + } + + io_uring_cqe_seen(&ring, cqe); + } + } while (gettimeofday_ms() < tstop); + +out_fail: + shutdown(fd, SHUT_RDWR); + if (close(fd)) + error(1, errno, "close"); + + fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n", + packets, bytes >> 20, + packets / (cfg_runtime_ms / 1000), + (bytes >> 20) / (cfg_runtime_ms / 1000)); + + while (compl_cqes) { + struct io_uring_cqe *cqe = wait_cqe_fast(&ring); + + io_uring_cqe_seen(&ring, cqe); + compl_cqes--; + } + + if (cfg_zc) { + ret = io_uring_unregister_notifications(&ring); + if (ret) + error(1, ret, "io_uring: tx ctx unregistration"); + } + io_uring_queue_exit(&ring); +} + +static void do_test(int domain, int type, int protocol) +{ + int i; + + for (i = 0; i < IP_MAXPACKET; i++) + payload[i] = 'a' + (i % 26); + + do_tx(domain, type, protocol); +} + +static void usage(const char *filepath) +{ + error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] " + "(-4|-6) [-t