From patchwork Wed May 19 14:13:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267443 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A109CC433ED for ; Wed, 19 May 2021 14:13:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 811F561002 for ; Wed, 19 May 2021 14:13:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347180AbhESOPN (ORCPT ); Wed, 19 May 2021 10:15:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347156AbhESOPN (ORCPT ); Wed, 19 May 2021 10:15:13 -0400 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 104E3C06175F; Wed, 19 May 2021 07:13:53 -0700 (PDT) Received: by mail-wm1-x32e.google.com with SMTP id b7so6787024wmh.5; Wed, 19 May 2021 07:13:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=s131NxggirtYayrqVgml5tnn47Lq2po2FNWURr3UV60=; b=up/h7cwepeCuFW+ckvA0R+rz9NuxgcdzFtCroymGVDqws1W/s6gFloH3vRSqhKaE5n 
omNKmKsgaOAfsWKMoJMoH2rBz6rR+G58CG2/uX6/+fRwjCdCWU5LZmJKJwkuAvICxFxj bNgqvzk9t6A6uyofRjy4yqIIAV7kEitGGWHfwi//1xJId75jyxflFwJDPh++CBCxzjw/ Tio+Bb3vf1SjIYQN+kzQ58TZ/f7gv9mR0qDWMxNC17d157sUxGYylXL+yX+wMrZLpk1a 4WFvdH522N7X4w+pQ3rkG6nN3RDN1u5U3Ve3Ony2u6ReBCCVc2Hv1fv9aRTE2xeLeDiS cl9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=s131NxggirtYayrqVgml5tnn47Lq2po2FNWURr3UV60=; b=kzGQ9etwRbU7KMAie/qKrL0DHWkbGSFawXShReTFNeJv/E6r1EdKyjIkz5+1voAvyS kPao2Ev0FETk5zlaIcOlZngXKDM+Ki4BZvrqaLLUvEsu4hlutXAGfCa8/cxRL8UoYbHG Rj2adKoBJ/ZwgPxyVhe6NWZowHLOnJWosCVErWZx7GEwzHqQs8x07H351U+l8x8JF5BU fygtslrAlRq9PXv+rcdeY2zrCmgaYkGyAhmrUL0PaD/K+HzWdZlLSPbUdpd6f9eJhJt8 LCqVGViZV2a0r9+liv4J1LQDhJ2XR6pcFWomfMX+pHggQ4OkduhX0w967EcX9S2UNFOC vwsA== X-Gm-Message-State: AOAM533OlLXtAQr0gj4g2TuN/ZLgpghgvfd1lAnVyUzq/R6a6Xt3MVAA oxa5wKAySI57vEZSJSsFgf07MaWxsg4EwJ73 X-Google-Smtp-Source: ABdhPJx0QoPK2jma1uj4ND6DDUZDCBgmBm1nwwPanzBqRODdr3IEM5yDjEtkx2J0XPiyNsX1dmtJjg== X-Received: by 2002:a7b:c8ce:: with SMTP id f14mr11831466wml.81.1621433631479; Wed, 19 May 2021 07:13:51 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:51 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke", Christian Dietrich
Subject: [PATCH 01/23] io_uring: shuffle rarely used ctx fields
Date: Wed, 19 May 2021 15:13:12 +0100
Message-Id: <485abb65cf032f4ddf13dcc0bd60e5475638efc2.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

There are a number of ctx fields scattered around the struct that are
almost never used, e.g. only on ring exit. Move them to the end for
better locality and better aesthetics.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 36 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9ac5e278a91e..7e3410ce100a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -367,9 +367,6 @@ struct io_ring_ctx {
 		unsigned cached_cq_overflow;
 		unsigned long sq_check_overflow;
 
-		/* hashed buffered write serialization */
-		struct io_wq_hash *hash_map;
-
 		struct list_head defer_list;
 		struct list_head timeout_list;
 		struct list_head cq_overflow_list;
@@ -386,9 +383,6 @@ struct io_ring_ctx {
 
 	struct io_rings *rings;
 
-	/* Only used for accounting purposes */
-	struct mm_struct *mm_account;
-
 	const struct cred *sq_creds; /* cred used for __io_sq_thread() */
 	struct io_sq_data *sq_data; /* if using sq thread polling */
 
@@ -409,14 +403,6 @@ struct io_ring_ctx {
 	unsigned nr_user_bufs;
 	struct io_mapped_ubuf **user_bufs;
 
-	struct user_struct *user;
-
-	struct completion ref_comp;
-
-#if defined(CONFIG_UNIX)
-	struct socket *ring_sock;
-#endif
-
 	struct xarray io_buffers;
 
 	struct xarray personalities;
@@ -460,12 +446,24 @@ struct io_ring_ctx {
 
 	struct io_restriction restrictions;
 
-	/* exit task_work */
-	struct callback_head *exit_task_work;
-
 	/* Keep this last, we don't need it for the fast path */
-	struct work_struct exit_work;
-	struct list_head tctx_list;
+	struct {
+		#if defined(CONFIG_UNIX)
+			struct socket *ring_sock;
+		#endif
+		/* hashed buffered write serialization */
+		struct io_wq_hash *hash_map;
+
+		/* Only used for accounting purposes */
+		struct user_struct *user;
+		struct mm_struct *mm_account;
+
+		/* ctx exit and cancelation */
+		struct callback_head *exit_task_work;
+		struct work_struct exit_work;
+		struct list_head tctx_list;
+		struct completion ref_comp;
+	};
 };
 
 struct io_uring_task {

From patchwork Wed May 19 14:13:13 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267445
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8D9ABC433B4 for ; Wed, 19 May 2021 14:13:59 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 71F6B611AD for ; Wed, 19 May 2021 14:13:59 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353901AbhESOPQ (ORCPT ); Wed, 19 May 2021 10:15:16 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347217AbhESOPO (ORCPT ); Wed, 19 May 2021 10:15:14 -0400
Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3C7CDC06175F; Wed, 19 May 2021 07:13:54 -0700 (PDT)
Received: by mail-wr1-x42b.google.com with SMTP id r12so14214939wrp.1; Wed, 19 May 2021 07:13:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=z7dUjeyKGkKe7mPGtbrIQS3SETJnccJ7luLK5I56KT0=; b=JTpKqXzlK+LuzFismUcwFXaHar/t6U//StFAaKNpUCAC3u05X1nNW25ujeSk+TbTIt yvpiKxxRRfYBMsN+5AvXll7oodYpA5XqY52Ybkvl24oaVfZQumC0ghts4+SeQ0KaaycX UgrIKHH422hlkHMFONGKmzwFhL6vWA/vHDDT9pVxmOqQyphwaDIDkbrMoBOHvv6G1biF JB4Zctq90H+TgAc5llDqSkmI59HKSYkgA4dOM1KUUB+4iPyoSw3aolyzBaipTuN766MH WpAeY1TaZir7urK14vQpHGTgzfQbq+z4vsmQ+INXJWQIjdqntHCmiL2DGpXfCcNs8Rwr pL9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=z7dUjeyKGkKe7mPGtbrIQS3SETJnccJ7luLK5I56KT0=; b=pA6BbPtPBc8X1E6grqvHxBLRpIHgnomCl9F+Mau8PkKp23A8JLm1bVosJ1UaUmZDG6 R2oxC2MfQK/Ey8jyggCrmBzkAnKzXBsGlsibXgYQBnl3Bg9qx27WsHB+6aS7yDNSlmkx qDknNovliVNMETqj8uwSQyTJnG+VEFBTY2Om3ZNkqpkyScSPmY60v2xjWvgL0aWRlQCD zPj2lMrrgu4dgx027l0F3cHb51HdO3ErKFlCMzotkRm1xzcyKS3u1d2ab9Ye061n+Azf gjchD0S4gCpTmbE1OT5QhkKgDqB84IUkLaiLe0O7AqQ3KStrDc4nzrhgwefHDkF8oJxl Vs3w== X-Gm-Message-State: AOAM532U3GWivNGKxMkubDCntqMa64W7brArAIe0bokogb9XsmsulR/3 op7q3qwsaLXnRqC3n+uioVRth68vhZT0A9Iw X-Google-Smtp-Source: ABdhPJwyTQyo9hKriKCVc9I2oi9jFiN5ipCN16X8md9iyo0or6DWjVWG0ueCHDflqzV/JCWwGGm2Gw== X-Received: by 2002:adf:e781:: with SMTP id n1mr14475380wrm.136.1621433632597; Wed, 19 May 2021 07:13:52 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:52 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , 
John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 02/23] io_uring: localise fixed resources fields
Date: Wed, 19 May 2021 15:13:13 +0100
Message-Id: <7672fac6581f2a1656ad6828ebd43ee44c4e808e.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

The ring ctx has two types of resource-related fields: those used for
request submission, and those needed for update/registration. Reshuffle
them into these two groups for better locality and readability. The
second group is not in the hot path, so it's natural to place it near
the end. Also update an outdated comment.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7e3410ce100a..31eca208f675 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -390,21 +390,17 @@ struct io_ring_ctx {
 	struct list_head sqd_list;
 
 	/*
-	 * If used, fixed file set. Writers must ensure that ->refs is dead,
-	 * readers must ensure that ->refs is alive as long as the file* is
-	 * used. Only updated through io_uring_register(2).
+	 * Fixed resources fast path, should be accessed only under uring_lock,
+	 * and updated through io_uring_register(2)
 	 */
-	struct io_rsrc_data *file_data;
+	struct io_rsrc_node *rsrc_node;
+
 	struct io_file_table file_table;
 	unsigned nr_user_files;
-
-	/* if used, fixed mapped user buffers */
-	struct io_rsrc_data *buf_data;
 	unsigned nr_user_bufs;
 	struct io_mapped_ubuf **user_bufs;
 
 	struct xarray io_buffers;
-
 	struct xarray personalities;
 	u32 pers_next;
 
@@ -436,16 +432,21 @@ struct io_ring_ctx {
 		bool poll_multi_file;
 	} ____cacheline_aligned_in_smp;
 
-	struct delayed_work rsrc_put_work;
-	struct llist_head rsrc_put_llist;
-	struct list_head rsrc_ref_list;
-	spinlock_t rsrc_ref_lock;
-	struct io_rsrc_node *rsrc_node;
-	struct io_rsrc_node *rsrc_backup_node;
-	struct io_mapped_ubuf *dummy_ubuf;
-
 	struct io_restriction restrictions;
 
+	/* slow path rsrc auxilary data, used by update/register */
+	struct {
+		struct io_rsrc_node *rsrc_backup_node;
+		struct io_mapped_ubuf *dummy_ubuf;
+		struct io_rsrc_data *file_data;
+		struct io_rsrc_data *buf_data;
+
+		struct delayed_work rsrc_put_work;
+		struct llist_head rsrc_put_llist;
+		struct list_head rsrc_ref_list;
+		spinlock_t rsrc_ref_lock;
+	};
+
 	/* Keep this last, we don't need it for the fast path */
 	struct {
 		#if defined(CONFIG_UNIX)

From patchwork Wed May 19 14:13:14 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267447
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0
Received: from
mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 670B7C433B4 for ; Wed, 19 May 2021 14:14:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 48DD46135A for ; Wed, 19 May 2021 14:14:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353962AbhESOPb (ORCPT ); Wed, 19 May 2021 10:15:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353891AbhESOPP (ORCPT ); Wed, 19 May 2021 10:15:15 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64BEFC061760; Wed, 19 May 2021 07:13:55 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id u4-20020a05600c00c4b02901774b80945cso3532150wmm.3; Wed, 19 May 2021 07:13:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KqEkF4Dijyzr0PPwp8tl6lkEhCqjAVOKZqJtKABlU28=; b=iPwmgJkLjvCL/f8ItIm1m4rk0cwwg0XxIjxDWQkkisNr4iXOEGCpqj8/RTpEi/2hEy eJiREbqnUE+FErrGM97tgGNFx5wvOK/KW8CDycGbEUPTOYVwDmjy18qG6KtZpocXIZ71 ZqlO02LE+KunTpU1Z8Lf1oKsFbV2+EHkEJPh/mbGLBQkZ6gNMQA7G6EyZ4KDEcvinlCn UBUxm03NyLvSYiEIWdN+u/mJk18SvktCW7mbEUEwPBu++2Usat8EHcldZLcRTAy0pslS CZBRH5pM0wi9I13cdDPlNgI1GF95QwEdl89BYZgrFCtVhdU5KvScc2ENSbe/5Wj7l5S5 NLiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KqEkF4Dijyzr0PPwp8tl6lkEhCqjAVOKZqJtKABlU28=; b=c7VfuPHtQHDruGj8UGkwq/KNZ61mL7YicEgoB68U8rJwAYL3p/WNlibSrRFdXpq4bo Lkcrfc8ofPQ34LYtN5DNziNJ31jDGvtFR19vvLzf60cqS7kSsF7Y2wkzyUsrwJvq5zN/ 
z0JW2rACGJsYYp00+QqOiITfiMV15R8ELGSCtuFJ7VmcCzjoX15XAC+0V1+2NDcq824b gAn67a3c10TmVf/PHi0y3c4hZW1HQP0VKxLVkjh34VaM0LZO4AwPhJFkmiJKxAm10jCJ yyoFu+WuNSdrPtVUXBETk2F7gfTj26+ESibExFU54RhTJmTT9O9ksTcxVI3HkvPTLSEW UAhQ==
X-Gm-Message-State: AOAM531rISNr9VzZyPZ4ta3Jnmgwxp0bYxlYTASSq/NWNGe3E9po2NYf 2InRFywlM6mtXUrew8U7aQCvZ2GhCoE4G4YL
X-Google-Smtp-Source: ABdhPJzlaROjPTre/59PQ3SeVaTAljdyKaPRGNJMK2c8Ew/36zXwYH2Do7kAdXk5eQ79G60rJamITg==
X-Received: by 2002:a1c:4c10:: with SMTP id z16mr11515439wmf.134.1621433633790; Wed, 19 May 2021 07:13:53 -0700 (PDT)
Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:53 -0700 (PDT)
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 03/23] io_uring: remove dependency on ring->sq/cq_entries
Date: Wed, 19 May 2021 15:13:14 +0100
Message-Id: <1188e30e5693519d59f065669fb1b8c415b076cf.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

We have the numbers of {sq,cq} entries cached in ctx, so don't look
them up in the user-shared rings, as 1) it may fetch an additional
cacheline and 2) the user may change them, so it's always error prone.
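To illustrate the second point, here is a minimal userspace sketch (not kernel code) of the fullness check. The names `shared_ring`, `ring_ctx`, `ring_ctx_init` and `sqring_full` are hypothetical stand-ins; the only thing taken from the patch is the idea of comparing `tail - cached_head` against a private copy of the entry count rather than a field the application can overwrite. The kernel reads the shared tail with READ_ONCE(); the sketch uses a plain load for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory shared with the application. */
struct shared_ring {
	uint32_t tail;		/* written by the application */
	uint32_t ring_entries;	/* also writable by the application */
};

/* Hypothetical stand-in for the kernel-private context. */
struct ring_ctx {
	struct shared_ring *rings;
	uint32_t cached_head;	/* private consumer position */
	uint32_t entries;	/* private copy of the ring size */
};

static void ring_ctx_init(struct ring_ctx *ctx, struct shared_ring *rings,
			  uint32_t entries)
{
	/* Assumption of this sketch: sizes are non-zero powers of two. */
	assert(entries != 0 && (entries & (entries - 1)) == 0);
	rings->tail = 0;
	rings->ring_entries = entries;
	ctx->rings = rings;
	ctx->cached_head = 0;
	ctx->entries = entries;
}

/*
 * Unsigned subtraction yields the number of queued entries even after
 * tail has wrapped around. The comparison uses the private size, so
 * overwriting rings->ring_entries cannot change the result.
 */
static bool sqring_full(const struct ring_ctx *ctx)
{
	return (uint32_t)(ctx->rings->tail - ctx->cached_head) == ctx->entries;
}
```

With the private copy, a caller that corrupts `ring_entries` in the shared area still gets a consistent answer from `sqring_full()`.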
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 31eca208f675..15dc5dad1f7d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1352,7 +1352,7 @@ static inline bool io_sqring_full(struct io_ring_ctx *ctx)
 {
 	struct io_rings *r = ctx->rings;
 
-	return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == r->sq_ring_entries;
+	return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == ctx->sq_entries;
 }
 
 static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
@@ -1370,7 +1370,7 @@ static inline struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
 	 * control dependency is enough as we're using WRITE_ONCE to
 	 * fill the cq entry
 	 */
-	if (__io_cqring_events(ctx) == rings->cq_ring_entries)
+	if (__io_cqring_events(ctx) == ctx->cq_entries)
 		return NULL;
 
 	tail = ctx->cached_cq_tail++;
@@ -1423,11 +1423,10 @@ static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
 /* Returns true if there are no backlogged entries after the flush */
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
-	struct io_rings *rings = ctx->rings;
 	unsigned long flags;
 	bool all_flushed, posted;
 
-	if (!force && __io_cqring_events(ctx) == rings->cq_ring_entries)
+	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
 	posted = false;

From patchwork Wed May 19 14:13:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267449
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7322CC433B4 for ; Wed, 19 May 2021 14:14:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52FF7611AD for ; Wed, 19 May 2021 14:14:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353970AbhESOPc (ORCPT ); Wed, 19 May 2021 10:15:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37570 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353908AbhESOPQ (ORCPT ); Wed, 19 May 2021 10:15:16 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACD58C061763; Wed, 19 May 2021 07:13:56 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id b7so6787134wmh.5; Wed, 19 May 2021 07:13:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1LxYRWZUsnSIxOX2Qv7kHLFKMWp6NXKb/JCxteoPhi4=; b=ZFGdykG9VWTjZi5gupEJgX3FD/oRFvi4uH4Ko+eiMjEJW4yRLfyXH/OJiOAKd1q4yi 1+kKBEGdJuCZKkyDLC8k4uuC4JB5LU72XNU1VLdXgl+MMc0hDicBfgBiW3MTw2oI8v5g LF9Y0fHU56llWpvBOzIlBmV3TJYmQOvIv86gOmgFHgL7SsHicH7l4/XXO0/sB8y2+wlq QYPvOGnDvUEK2+u3WiCCWmeQFuoZelbGdzL1TZvLMaXNJpG2KdE63GtCBE2jbsx85zeC iIB2M/6PqCwnw9Ef11u0HUDtM2cpnrop9pzwh057QWKy+5+5G0769/BBTU1bgwlEJAoy pa1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1LxYRWZUsnSIxOX2Qv7kHLFKMWp6NXKb/JCxteoPhi4=; 
b=nwgOcFLF/s/fod2ke37r2smu7kjjEq5pwNEIwmmeDtLSYWl/MHXnPoaqyDmSXbTeoJ 7qzNgpfQHi/i6rmrISnb6atlqPdGDreIY6FqMU+1SjUcVsJaX9YzCB9RGWTebkQdgwxh o+pl/UWmzMCUmnzYh4fhLZJmlYvW3yNxqW7v9ygdTjhIpvu8BMRo7d/0vObtgDhnY3If Nf4J6cwK2cudoLKLTY/xSSzUrTxiySo9CEsj98k3rroHMFiuO9qhLArgPQPjqvxtd++B IXUNxHBu/JW1qO8Eo7bTXB6I7LjBqiYqfF/6q5qjB5KESGNepq35PjPHCdjIATmQV8Ad BtwQ== X-Gm-Message-State: AOAM533DYoo98464gyttRladTs2UynKwDuglaFqMLXeeaHRigHBM34od hBye4HIvpR0G1U/uy4Lv4UrpALZE5JPfUbSx X-Google-Smtp-Source: ABdhPJzyGiyrMlenYIpS+THnOoIvHrkX2gj9WOMTuPD2+Qh++DG+o95t0zdKSz40zTawQlV6SgljOA== X-Received: by 2002:a05:600c:3654:: with SMTP id y20mr6103852wmq.184.1621433635073; Wed, 19 May 2021 07:13:55 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:54 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 04/23] io_uring: deduce cq_mask from cq_entries Date: Wed, 19 May 2021 15:13:15 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org No need to cache cq_mask, it's exactly cq_entries - 1, so just deduce it to not carry it around. 
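The identity relied on here holds because io_uring ring sizes are powers of two: for such a size, `pos & (entries - 1)` equals `pos % entries`. A small standalone sketch of that, where `ring_mask` and `ring_slot` are hypothetical helper names and not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the index mask from a power-of-two size. */
static uint32_t ring_mask(uint32_t entries)
{
	/* The mask trick is only valid for non-zero powers of two. */
	assert(entries != 0 && (entries & (entries - 1)) == 0);
	return entries - 1;
}

/* Map a free-running position (head or tail) to a slot in the ring. */
static uint32_t ring_slot(uint32_t pos, uint32_t entries)
{
	return pos & ring_mask(entries);
}
```

Since the mask is a single subtraction away from the entry count, keeping both in the context only costs space.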
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 15dc5dad1f7d..067c89e63fea 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -361,7 +361,6 @@ struct io_ring_ctx {
 		u32 *sq_array;
 		unsigned cached_sq_head;
 		unsigned sq_entries;
-		unsigned sq_mask;
 		unsigned sq_thread_idle;
 		unsigned cached_sq_dropped;
 		unsigned cached_cq_overflow;
@@ -407,7 +406,6 @@ struct io_ring_ctx {
 	struct {
 		unsigned cached_cq_tail;
 		unsigned cq_entries;
-		unsigned cq_mask;
 		atomic_t cq_timeouts;
 		unsigned cq_last_tm_flush;
 		unsigned cq_extra;
@@ -1363,7 +1361,7 @@ static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx)
 static inline struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
-	unsigned tail;
+	unsigned tail, mask = ctx->cq_entries - 1;
 
 	/*
 	 * writes to the cq entry need to come after reading head; the
@@ -1374,7 +1372,7 @@ static inline struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
 		return NULL;
 
 	tail = ctx->cached_cq_tail++;
-	return &rings->cqes[tail & ctx->cq_mask];
+	return &rings->cqes[tail & mask];
 }
 
 static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
@@ -6677,7 +6675,7 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
 static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 {
 	u32 *sq_array = ctx->sq_array;
-	unsigned head;
+	unsigned head, mask = ctx->sq_entries - 1;
 
 	/*
 	 * The cached sq head (or cq tail) serves two purposes:
@@ -6687,7 +6685,7 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
 	 * 2) allows the kernel side to track the head on its own, even
 	 * though the application is the one updating it.
 	 */
-	head = READ_ONCE(sq_array[ctx->cached_sq_head++ & ctx->sq_mask]);
+	head = READ_ONCE(sq_array[ctx->cached_sq_head++ & mask]);
 	if (likely(head < ctx->sq_entries))
 		return &ctx->sq_sqes[head];
 
@@ -9493,8 +9491,6 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	rings->cq_ring_mask = p->cq_entries - 1;
 	rings->sq_ring_entries = p->sq_entries;
 	rings->cq_ring_entries = p->cq_entries;
-	ctx->sq_mask = rings->sq_ring_mask;
-	ctx->cq_mask = rings->cq_ring_mask;
 
 	size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
 	if (size == SIZE_MAX) {

From patchwork Wed May 19 14:13:16 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267451
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80039C43461 for ; Wed, 19 May 2021 14:14:23 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 64FEB61002 for ; Wed, 19 May 2021 14:14:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353999AbhESOPk (ORCPT ); Wed, 19 May 2021 10:15:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353948AbhESOP2 (ORCPT ); Wed, 19 May 2021 10:15:28 -0400
Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com
[IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4BABC06138B; Wed, 19 May 2021 07:13:57 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id a4so14236438wrr.2; Wed, 19 May 2021 07:13:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZPT+0HJmzsc95uwTFh6BaxO4xNAkeAD/mL9HofulqQs=; b=ugKf7X7OZjfuyM0BMA5+9eJmnjeem8TIzS8o3NGQQBbdg//F4mw1EcRMf58jy07IiS J+QHbGfm/uMUrK8Q6zp9twEnp9vI0wGClssYlxIBoSWP/XICWUOoRySpkJRjkwdsh079 tvNWrWJWFm0bXiK04IyGz6uty3Isw9b11hcK6ykOhAcHCwKaQyitsCp20aFTRNNZjgw5 Ero9VdPiVQ3btsfqtsBVmS/mT5tj6/Nqo2x3bEnRwV9RxhfxVDI7YzvW/DbLw0fMPtSE A0MxJevqdd2fv8OVs/jVH8tl7rBBuRKgP/o+VtcYHqraKeHvxothy/x8Rfh4NpSVje+5 OxGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZPT+0HJmzsc95uwTFh6BaxO4xNAkeAD/mL9HofulqQs=; b=bYKkp4VFEugnNKq4M5cbdY6Hls6GEGbUdk3PLkC2WIxA2O+nen7iYn7oWdzLTJMR4c Csf7ltgFm04jR8XrmGLUTeqRY99dj+67eyX/WqVJvRVjKQhy2SZKIsAu7vgI7wskTB+B VnKu3HUpr3DMiA5vfYxG2d+JI0fwoM15L3yDGVtm6UyfAt4qxlfrVLT0hpkKRXSEAzvk wUUDMLjMtVesn/Hr78LgToF1vFhTc/OOa4BobVG8+cbYiBvDJAjGOLUHJ8v9bfDjF8PM rCUyj7q2w1uWJsAq3XeG7WuACP+OqlHdSTmpTf6ErnASN388ZWcg7vsE+Mp/GULDtc0s uBng== X-Gm-Message-State: AOAM532+XKstRw5YBabOAc7nwYnWkFMWqWIbZb+rNDcSVG0h3r9E5urd Q9Ns5bn+gjeTRJgcQxmrG06LdLcxVuMmw82S X-Google-Smtp-Source: ABdhPJwIGP4B1GsZIjPV9HwTg4I3krYxTRqYaxuRBbbjraw0wXD/gBdvaNtXqSmRKJ/QDmpO7Eh42g== X-Received: by 2002:a5d:64c7:: with SMTP id f7mr14861047wri.257.1621433636230; Wed, 19 May 2021 07:13:56 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:55 -0700 (PDT) 
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 05/23] io_uring: kill cached_cq_overflow
Date: Wed, 19 May 2021 15:13:16 +0100
Message-Id: <740885c2bdc38f2a269cd9591987c80ae7b7ce8a.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

There are two copies of cq_overflow: one shared with userspace and an
internal cached one. The cached copy was needed for DRAIN accounting,
but now we have another knob to tune the accounting, i.e. cq_extra, so
we can throw away the internal counter and just increment the one in
the shared ring. If the user modifies it and so never gets the right
overflow value again, that's the user's problem, even though before we
would have restored it on the next overflow.
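As a rough check of why the DRAIN comparison stays equivalent, here is a userspace model (not kernel code) of the counters before and after this change. The struct `drain_model` and all function names are hypothetical; only the arithmetic mirrors the patch: the old scheme adds the cached overflow count to the right-hand side, the new scheme decrements cq_extra on the left-hand side instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters involved; not the kernel struct. */
struct drain_model {
	uint32_t cached_cq_tail;	/* CQEs actually posted */
	uint32_t cq_extra;		/* adjustment knob for DRAIN accounting */
	uint32_t shared_overflow;	/* counter visible to userspace */
	uint32_t old_cached_overflow;	/* private copy the patch removes */
};

/* Old scheme: bump the private copy and publish it. */
static void account_overflow_old(struct drain_model *m)
{
	m->shared_overflow = ++m->old_cached_overflow;
}

/* New scheme: bump the shared counter, fold the event into cq_extra. */
static void account_overflow_new(struct drain_model *m)
{
	m->shared_overflow++;
	m->cq_extra--;
}

static bool need_defer_old(const struct drain_model *m, uint32_t seq)
{
	return (uint32_t)(seq + m->cq_extra) !=
	       (uint32_t)(m->cached_cq_tail + m->old_cached_overflow);
}

static bool need_defer_new(const struct drain_model *m, uint32_t seq)
{
	return (uint32_t)(seq + m->cq_extra) != m->cached_cq_tail;
}

/* Apply the same number of overflows under both schemes and compare. */
static bool schemes_agree(struct drain_model start, uint32_t overflows,
			  uint32_t seq)
{
	struct drain_model a = start, b = start;
	uint32_t i;

	/* Assumption: userspace has not tampered with the shared counter. */
	assert(start.old_cached_overflow == start.shared_overflow);
	for (i = 0; i < overflows; i++) {
		account_overflow_old(&a);
		account_overflow_new(&b);
	}
	return a.shared_overflow == b.shared_overflow &&
	       need_defer_old(&a, seq) == need_defer_new(&b, seq);
}
```

The one behavioural difference the model exposes is the one the message names: if userspace overwrites the shared counter, the old scheme republishes the private value on the next overflow, while the new scheme just increments whatever is there.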
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 067c89e63fea..b89a781b3f33 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -363,7 +363,6 @@ struct io_ring_ctx {
 		unsigned sq_entries;
 		unsigned sq_thread_idle;
 		unsigned cached_sq_dropped;
-		unsigned cached_cq_overflow;
 		unsigned long sq_check_overflow;
 
 		struct list_head defer_list;
@@ -1195,13 +1194,20 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	return NULL;
 }
 
+static void io_account_cq_overflow(struct io_ring_ctx *ctx)
+{
+	struct io_rings *r = ctx->rings;
+
+	WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
+	ctx->cq_extra--;
+}
+
 static bool req_need_defer(struct io_kiocb *req, u32 seq)
 {
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq + ctx->cq_extra != ctx->cached_cq_tail
-				+ READ_ONCE(ctx->cached_cq_overflow);
+		return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
 	}
 
 	return false;
@@ -1440,8 +1446,8 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		if (cqe)
 			memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
 		else
-			WRITE_ONCE(ctx->rings->cq_overflow,
-				++ctx->cached_cq_overflow);
+			io_account_cq_overflow(ctx);
+
 		posted = true;
 		list_del(&ocqe->list);
 		kfree(ocqe);
@@ -1525,7 +1531,7 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
 		 * or cannot allocate an overflow entry, then we need to drop it
 		 * on the floor.
 		 */
-		WRITE_ONCE(ctx->rings->cq_overflow, ++ctx->cached_cq_overflow);
+		io_account_cq_overflow(ctx);
 		return false;
 	}
 	if (list_empty(&ctx->cq_overflow_list)) {

From patchwork Wed May 19 14:13:17 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267453
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D06AC43470 for ; Wed, 19 May 2021 14:14:24 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3286C61002 for ; Wed, 19 May 2021 14:14:24 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353938AbhESOPm (ORCPT ); Wed, 19 May 2021 10:15:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353949AbhESOP2 (ORCPT ); Wed, 19 May 2021 10:15:28 -0400
Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1CD7C06138C; Wed, 19 May 2021 07:13:58 -0700 (PDT)
Received: by mail-wr1-x433.google.com with SMTP id y14so12113786wrm.13; Wed, 19 May 2021 07:13:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references
:mime-version:content-transfer-encoding; bh=OeEwqb/igFRqCGm3Sb3F+bC88dq+JZby5yeiT6dBiuA=; b=l/JRy3U4P60H+ui9Tl2niQPeBH2AO/7v7MCP31Gh/1iXHCuc02HVFwRjY4Ia2Nrigv 06sfdrm8q70S60Aewx3MEjx3lTL3Zhjvc9QHw4uDoin8VJeRRgVmElVel0KlIyFpg3xc hCP40Na9Pts1R+abFOalbO9eATC9hMYy34YJHW8H3B3KODH5oRgAtviFd1xOH9Nu5DRN zIyGmkAgwia9Q/jZQ3HuCzu39B5t7g1ZSpiM9xfCAROWTQeD4pw5V95LupNZXOCv0wNo ZHmKb4BblNwv2I4EoxoEWQ+c/YKlGPzMdp7hrEuqMqC5viDBjnoJ5f5oCmuC+p2vUW+x VUiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=OeEwqb/igFRqCGm3Sb3F+bC88dq+JZby5yeiT6dBiuA=; b=V7o6/I1Y6Xmp969YezGelJwe9orUbJTFrtabIBimsFDLrxujuBfHF0bCdi8mHPq9Wz JqXku19H3a+9FQOLbbt32wA06oLd1gGdzwrWYboquSLDjJJF02Ak9mQBo/kiPxUPnbez QRoNGGc5V3WCrEwb80ItARBojc2VeYKry9Qq01z02vQTrJ6llBXRPdxU50/YgeC6Z4gZ ZKz/IUSH1VFzO/pdNRof4NCEz6ZjCFzS6CpbecP1tl8nFSSQuGf40LF8v+Xo2ihwPPTp kwosTUJoUiGHtRgyjDNYFwwRwHaltRyxRzQFln1Nt9oN3E87p/zlmK7Btegs71zafntQ 9Llg== X-Gm-Message-State: AOAM532QS82NwUnqkTcybVPaAUas/l9bDV6gANYPvxLV0QWQgoN7ycDu Zdc6rMJTX4GbHtS/GMjdT8x/kOdxPo/5JqsS X-Google-Smtp-Source: ABdhPJwpyFzXtcUnk3BSIi+6vwJdGw6RixhvwFjAVHfMm9raGNhfvFPKcJrhRXzfF+HSesNdZhKb+g== X-Received: by 2002:a5d:5541:: with SMTP id g1mr10583939wrw.102.1621433637408; Wed, 19 May 2021 07:13:57 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:57 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 06/23] io_uring: rename io_get_cqring Date: Wed, 19 May 2021 15:13:17 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Rename io_get_cqring() into io_get_cqe() for consistency with SQ, and just because the old name is not as clear. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index b89a781b3f33..49a1b6b81d7d 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -11,7 +11,7 @@ * before writing the tail (using smp_load_acquire to read the tail will * do). It also needs a smp_mb() before updating CQ head (ordering the * entry load(s) with the head store), pairing with an implicit barrier - * through a control-dependency in io_get_cqring (smp_store_release to + * through a control-dependency in io_get_cqe (smp_store_release to * store head will do). Failure to do so could lead to reading invalid * CQ entries. * @@ -1364,7 +1364,7 @@ static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx) return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head); } -static inline struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx) +static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx) { struct io_rings *rings = ctx->rings; unsigned tail, mask = ctx->cq_entries - 1; @@ -1436,7 +1436,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) posted = false; spin_lock_irqsave(&ctx->completion_lock, flags); while (!list_empty(&ctx->cq_overflow_list)) { - struct io_uring_cqe *cqe = io_get_cqring(ctx); + struct io_uring_cqe *cqe = io_get_cqe(ctx); struct io_overflow_cqe *ocqe; if (!cqe && !force) @@ -1558,7 +1558,7 @@ static inline bool __io_cqring_fill_event(struct io_ring_ctx *ctx, u64 user_data * submission (by quite a lot). Increment the overflow count in * the ring. 
*/ - cqe = io_get_cqring(ctx); + cqe = io_get_cqe(ctx); if (likely(cqe)) { WRITE_ONCE(cqe->user_data, user_data); WRITE_ONCE(cqe->res, res); From patchwork Wed May 19 14:13:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267455 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5EAEC43460 for ; Wed, 19 May 2021 14:14:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 87A5361355 for ; Wed, 19 May 2021 14:14:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353868AbhESOPq (ORCPT ); Wed, 19 May 2021 10:15:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241049AbhESOPa (ORCPT ); Wed, 19 May 2021 10:15:30 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B715C06138D; Wed, 19 May 2021 07:14:00 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id s5-20020a7bc0c50000b0290147d0c21c51so3422815wmh.4; Wed, 19 May 2021 07:14:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=ynxDMKj1mZvdjVLVE7e+VlFds8wn53Y4eJJTD49qFXs=; b=d4Nmrz8vuHnINI7FqKfqaxeKr8lLU6huobBBLU5pcxCAYd/FusFVQnM/auRbVdKdjK GUj9rcPT/2uLbjIXGjtUDQcUKTrll1b/HRYh0ZWNypDa2bDjk5166C2Wg2V85BK2JWge F386IBsragShMQdfBRNxz60S8UZqrbE7P0tA8jViapoDvSo67V2/+Wm3r8keBQPT/SRJ V8d/qebuBvd8DF0VB+ldr8CvlY5ff49NgN/YfT9Jrt3esQfW0YbFVQiLIT3igOjZbwA7 ce5RhWEwvkYsXzlsTJh2ogsKvGevILWLp/QIxZGBF5Aw+e9zuP5VCQlkaUNJG/qhG1wG gAfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ynxDMKj1mZvdjVLVE7e+VlFds8wn53Y4eJJTD49qFXs=; b=LU5UhPy809KKxpXvI5QCnhB4frmEFI4V9rCqaU3puea4GWUmBMddsk9gzKa2V0XvBH OoegrtQTBAlKkV+zDmLWW8pJyPiix97bnYZmRWnvhKUZRONhePcgb1JDXks9IYY16It/ vgoNUiHWXz4hUuJGLpQz5z6rTVFFUWzG/escaGOI/gGap0Rr4ZqEqR4qP9fihagIrteZ oTHbLrXkk1NElrOiPlc/UZuE2j0fraGfF626bhSuEKJ9X+aQKz5CreLX//eYhNJsjHG0 D4EtVQK6E6EDvLTssOa9sPyibPs3t4zZ7MI+0+KhC/ds3fKV15K1blrRSUwdsNsLVknZ QdoA== X-Gm-Message-State: AOAM530S2j5TT9qOJ7o2rSNliXdo2qi7GLsHiNZOlvUvzhiiNmW7QlHI +reHNG2izPI2r1hHnhvwaaWdj9V58CkemdPR X-Google-Smtp-Source: ABdhPJybim8WElMlrRnVIC+aKbppRHV5ltL1p3gtqqJJsWN9vAClwKWFPDMnXTv7DBD3Z1ImDoRaPg== X-Received: by 2002:a05:600c:4b92:: with SMTP id e18mr11779556wmp.71.1621433638539; Wed, 19 May 2021 07:13:58 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:58 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 07/23] io_uring: extract struct for CQ Date: Wed, 19 May 2021 15:13:18 +0100 Message-Id: <9203fb800f78165633f295e17bfcacf3c3409404.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Extract a structure describing the internal state of a completion queue, called struct io_cqring. We need it to support multi-CQ rings. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 47 +++++++++++++++++++++++++---------------------- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 49a1b6b81d7d..4fecd9da689e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -335,6 +335,12 @@ struct io_submit_state { unsigned int ios_left; }; +struct io_cqring { + unsigned cached_tail; + unsigned entries; + struct io_rings *rings; +}; + struct io_ring_ctx { struct { struct percpu_ref refs; @@ -402,17 +408,14 @@ struct io_ring_ctx { struct xarray personalities; u32 pers_next; - struct { - unsigned cached_cq_tail; - unsigned cq_entries; - atomic_t cq_timeouts; - unsigned cq_last_tm_flush; - unsigned cq_extra; - unsigned long cq_check_overflow; - struct wait_queue_head cq_wait; - struct fasync_struct *cq_fasync; - struct eventfd_ctx *cq_ev_fd; - } ____cacheline_aligned_in_smp; + struct fasync_struct *cq_fasync; + struct eventfd_ctx *cq_ev_fd; + atomic_t cq_timeouts; + unsigned cq_last_tm_flush; + unsigned long cq_check_overflow; + unsigned cq_extra; + struct wait_queue_head cq_wait; + struct io_cqring cqs[1]; struct { spinlock_t completion_lock; @@ -1207,7 +1210,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq) if (unlikely(req->flags & REQ_F_IO_DRAIN)) { struct io_ring_ctx *ctx = req->ctx; - return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail; + return seq + READ_ONCE(ctx->cq_extra) != ctx->cqs[0].cached_tail; } return false; @@ -1312,7 +1315,7 @@ static void
io_flush_timeouts(struct io_ring_ctx *ctx) if (list_empty(&ctx->timeout_list)) return; - seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts); + seq = ctx->cqs[0].cached_tail - atomic_read(&ctx->cq_timeouts); do { u32 events_needed, events_got; @@ -1346,7 +1349,7 @@ static void io_commit_cqring(struct io_ring_ctx *ctx) io_flush_timeouts(ctx); /* order cqe stores with ring update */ - smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail); + smp_store_release(&ctx->rings->cq.tail, ctx->cqs[0].cached_tail); if (unlikely(!list_empty(&ctx->defer_list))) __io_queue_deferred(ctx); @@ -1361,23 +1364,23 @@ static inline bool io_sqring_full(struct io_ring_ctx *ctx) static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx) { - return ctx->cached_cq_tail - READ_ONCE(ctx->rings->cq.head); + return ctx->cqs[0].cached_tail - READ_ONCE(ctx->rings->cq.head); } static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx) { struct io_rings *rings = ctx->rings; - unsigned tail, mask = ctx->cq_entries - 1; + unsigned tail, mask = ctx->cqs[0].entries - 1; /* * writes to the cq entry need to come after reading head; the * control dependency is enough as we're using WRITE_ONCE to * fill the cq entry */ - if (__io_cqring_events(ctx) == ctx->cq_entries) + if (__io_cqring_events(ctx) == ctx->cqs[0].entries) return NULL; - tail = ctx->cached_cq_tail++; + tail = ctx->cqs[0].cached_tail++; return &rings->cqes[tail & mask]; } @@ -1430,7 +1433,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) unsigned long flags; bool all_flushed, posted; - if (!force && __io_cqring_events(ctx) == ctx->cq_entries) + if (!force && __io_cqring_events(ctx) == ctx->cqs[0].entries) return false; posted = false; @@ -5670,7 +5673,7 @@ static int io_timeout(struct io_kiocb *req, unsigned int issue_flags) goto add; } - tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts); + tail = ctx->cqs[0].cached_tail - atomic_read(&ctx->cq_timeouts); 
req->timeout.target_seq = tail + off; /* Update the last seq here in case io_flush_timeouts() hasn't. @@ -9331,7 +9334,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, if (unlikely(ret)) goto out; - min_complete = min(min_complete, ctx->cq_entries); + min_complete = min(min_complete, ctx->cqs[0].entries); /* * When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user @@ -9481,7 +9484,7 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, /* make sure these are sane, as we already accounted them */ ctx->sq_entries = p->sq_entries; - ctx->cq_entries = p->cq_entries; + ctx->cqs[0].entries = p->cq_entries; size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset); if (size == SIZE_MAX) From patchwork Wed May 19 14:13:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267461 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF548C43600 for ; Wed, 19 May 2021 14:14:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D87661002 for ; Wed, 19 May 2021 14:14:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354078AbhESOPy (ORCPT ); Wed, 19 May 2021 10:15:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) 
by vger.kernel.org with ESMTP id S1353959AbhESOPb (ORCPT ); Wed, 19 May 2021 10:15:31 -0400 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23A89C06138E; Wed, 19 May 2021 07:14:01 -0700 (PDT) Received: by mail-wr1-x42b.google.com with SMTP id q5so14199579wrs.4; Wed, 19 May 2021 07:14:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tVo/T1yE+q8hSn/v+s7FHlx6RSl5fWFsncg9ZgUqYd4=; b=ja2ZjcDrt7cmTWhUsHVsmJb72TBh6EX8jRUdylIlqXda/OcyRsfrXJPfe8JkCQpDi0 CoLiA9Ax3AH1wfdUuQOw+q+oLC1Bq98vhjQ9A9RbBY2KBdq7GrWznwgCfsl0Nw9vgLdd YENnxbNIFRXGpVAlXjISpThNw5lFV2sGV1XFFUYqKU+qdiEiRI3+r+t3zXDQ4nGSsMPu GVc8kAUoHeC1ovt3RwdJB8jpC1gxWNNNttR0Pjn+214lJe0E1e5ZO9MkVX9KX87ufq36 pxkaxMTAoHogIkHUOIMmE+IfDhQDMzRRMwoNr8FD/2e2EpwotYFp95pWyDdv6wloZ/sJ 1/dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tVo/T1yE+q8hSn/v+s7FHlx6RSl5fWFsncg9ZgUqYd4=; b=NT/1qrIiMoVydGJf6bBwxMQG7mBlQEbCIZWOPW7tlb0MsBTBLwrNZwDOKk6ev+IuEi 5X5CMX0CNqT0pgAubdP49DoB24HpHCjh7UJRfaCERPgsu//P9/DPPVzTvO0HVw4YIp9d O73mqzQWkthMC34QhUssKdQTL1X9cMgq2vc1CDzrQpNV4jItwYxPQtlQxdRUwGRYUGCA DoIEc7FOIaFlcw/mLscKREdS3RbL9/5m0F/WNNrtqMHqR4Z/Pfn7pQ9Tpu7yxBbTf7OK Ba4dY1vNmiMSKLwPWb5QHqIB6GkEqefMS2vqJkmzE3eJipdoK8Sfx+qsj4dlpJcu2u9S ahXA== X-Gm-Message-State: AOAM533WXd+maAAZ2qfVFCbCxJOo0pOSokxSznG9G8ge7o2bCBsETa5L hvrwb82l7Ot7mcvEq3ie79T0wUON2A9Ftfqm X-Google-Smtp-Source: ABdhPJydHbarWJuFz/gt1sqT7MwFSD8jJ0FxcozEW3sGEXSaJNuUT7/R6ohY4VrRyz5nv88754oqLQ== X-Received: by 2002:adf:a519:: with SMTP id i25mr14971775wrb.312.1621433639769; Wed, 19 May 2021 07:13:59 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by 
smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:13:59 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 08/23] io_uring: internally pass CQ indexes Date: Wed, 19 May 2021 15:13:19 +0100 Message-Id: <8871c605590f1b1371d66fc37798bed356777ef8.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Allow passing a CQ index from the SQE through to the CQE generators, but support only one CQ for now. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 113 ++++++++++++++++++++++------------ include/uapi/linux/io_uring.h | 1 + 2 files changed, 75 insertions(+), 39 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 4fecd9da689e..356a5dc90f46 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -90,6 +90,8 @@ #define IORING_MAX_ENTRIES 32768 #define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES) +#define IO_DEFAULT_CQ 0 + /* * Shift of 9 is 512 entries, or exactly one page on 64-bit archs */ @@ -416,6 +418,7 @@ struct io_ring_ctx { unsigned cq_extra; struct wait_queue_head cq_wait; struct io_cqring cqs[1]; + unsigned int cq_nr; struct { spinlock_t completion_lock; @@ -832,6 +835,7 @@ struct io_kiocb { struct io_kiocb *link; struct percpu_ref *fixed_rsrc_refs; + u16 cq_idx; /* used with ctx->iopoll_list with reads/writes */ struct list_head inflight_entry; @@ -1034,7 +1038,8 @@ static void io_uring_cancel_sqpoll(struct io_sq_data *sqd); static struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx); static bool io_cqring_fill_event(struct
io_ring_ctx *ctx, u64 user_data, - long res, unsigned int cflags); + long res, unsigned int cflags, + unsigned int cq_idx); static void io_put_req(struct io_kiocb *req); static void io_put_req_deferred(struct io_kiocb *req, int nr); static void io_dismantle_req(struct io_kiocb *req); @@ -1207,13 +1212,15 @@ static void io_account_cq_overflow(struct io_ring_ctx *ctx) static bool req_need_defer(struct io_kiocb *req, u32 seq) { - if (unlikely(req->flags & REQ_F_IO_DRAIN)) { - struct io_ring_ctx *ctx = req->ctx; - - return seq + READ_ONCE(ctx->cq_extra) != ctx->cqs[0].cached_tail; - } + struct io_ring_ctx *ctx = req->ctx; + u32 cnt = 0; + int i; - return false; + if (!(req->flags & REQ_F_IO_DRAIN)) + return false; + for (i = 0; i < ctx->cq_nr; i++) + cnt += ctx->cqs[i].cached_tail; + return seq + READ_ONCE(ctx->cq_extra) != cnt; } static void io_req_track_inflight(struct io_kiocb *req) @@ -1289,7 +1296,8 @@ static void io_kill_timeout(struct io_kiocb *req, int status) atomic_set(&req->ctx->cq_timeouts, atomic_read(&req->ctx->cq_timeouts) + 1); list_del_init(&req->timeout.list); - io_cqring_fill_event(req->ctx, req->user_data, status, 0); + io_cqring_fill_event(req->ctx, req->user_data, status, 0, + req->cq_idx); io_put_req_deferred(req, 1); } } @@ -1346,10 +1354,13 @@ static void io_flush_timeouts(struct io_ring_ctx *ctx) static void io_commit_cqring(struct io_ring_ctx *ctx) { + int i; + io_flush_timeouts(ctx); /* order cqe stores with ring update */ - smp_store_release(&ctx->rings->cq.tail, ctx->cqs[0].cached_tail); + for (i = 0; i < ctx->cq_nr; i++) + smp_store_release(&ctx->cqs[i].rings->cq.tail, ctx->cqs[i].cached_tail); if (unlikely(!list_empty(&ctx->defer_list))) __io_queue_deferred(ctx); @@ -1362,25 +1373,27 @@ static inline bool io_sqring_full(struct io_ring_ctx *ctx) return READ_ONCE(r->sq.tail) - ctx->cached_sq_head == ctx->sq_entries; } -static inline unsigned int __io_cqring_events(struct io_ring_ctx *ctx) +static inline unsigned int 
__io_cqring_events(struct io_cqring *cq) { - return ctx->cqs[0].cached_tail - READ_ONCE(ctx->rings->cq.head); + return cq->cached_tail - READ_ONCE(cq->rings->cq.head); } -static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx) +static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx, + unsigned int idx) { - struct io_rings *rings = ctx->rings; - unsigned tail, mask = ctx->cqs[0].entries - 1; + struct io_cqring *cq = &ctx->cqs[idx]; + struct io_rings *rings = cq->rings; + unsigned tail, mask = cq->entries - 1; /* * writes to the cq entry need to come after reading head; the * control dependency is enough as we're using WRITE_ONCE to * fill the cq entry */ - if (__io_cqring_events(ctx) == ctx->cqs[0].entries) + if (__io_cqring_events(cq) == cq->entries) return NULL; - tail = ctx->cqs[0].cached_tail++; + tail = cq->cached_tail++; return &rings->cqes[tail & mask]; } @@ -1432,16 +1445,18 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) { unsigned long flags; bool all_flushed, posted; + struct io_cqring *cq = &ctx->cqs[IO_DEFAULT_CQ]; - if (!force && __io_cqring_events(ctx) == ctx->cqs[0].entries) + if (!force && __io_cqring_events(cq) == cq->entries) return false; posted = false; spin_lock_irqsave(&ctx->completion_lock, flags); while (!list_empty(&ctx->cq_overflow_list)) { - struct io_uring_cqe *cqe = io_get_cqe(ctx); + struct io_uring_cqe *cqe = io_get_cqe(ctx, IO_DEFAULT_CQ); struct io_overflow_cqe *ocqe; + if (!cqe && !force) break; ocqe = list_first_entry(&ctx->cq_overflow_list, @@ -1523,12 +1538,17 @@ static inline void req_ref_get(struct io_kiocb *req) } static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data, - long res, unsigned int cflags) + long res, unsigned int cflags, + unsigned int cq_idx) { struct io_overflow_cqe *ocqe; + if (cq_idx != IO_DEFAULT_CQ) + goto overflow; + ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT); if (!ocqe) { +overflow: /* * If we're in ring 
overflow flush mode, or in task cancel mode, * or cannot allocate an overflow entry, then we need to drop it @@ -1550,7 +1570,8 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data, } static inline bool __io_cqring_fill_event(struct io_ring_ctx *ctx, u64 user_data, - long res, unsigned int cflags) + long res, unsigned int cflags, + unsigned int cq_idx) { struct io_uring_cqe *cqe; @@ -1561,21 +1582,22 @@ static inline bool __io_cqring_fill_event(struct io_ring_ctx *ctx, u64 user_data * submission (by quite a lot). Increment the overflow count in * the ring. */ - cqe = io_get_cqe(ctx); + cqe = io_get_cqe(ctx, cq_idx); if (likely(cqe)) { WRITE_ONCE(cqe->user_data, user_data); WRITE_ONCE(cqe->res, res); WRITE_ONCE(cqe->flags, cflags); return true; } - return io_cqring_event_overflow(ctx, user_data, res, cflags); + return io_cqring_event_overflow(ctx, user_data, res, cflags, cq_idx); } /* not as hot to bloat with inlining */ static noinline bool io_cqring_fill_event(struct io_ring_ctx *ctx, u64 user_data, - long res, unsigned int cflags) + long res, unsigned int cflags, + unsigned int cq_idx) { - return __io_cqring_fill_event(ctx, user_data, res, cflags); + return __io_cqring_fill_event(ctx, user_data, res, cflags, cq_idx); } static void io_req_complete_post(struct io_kiocb *req, long res, @@ -1585,7 +1607,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res, unsigned long flags; spin_lock_irqsave(&ctx->completion_lock, flags); - __io_cqring_fill_event(ctx, req->user_data, res, cflags); + __io_cqring_fill_event(ctx, req->user_data, res, cflags, req->cq_idx); /* * If we're the last reference to this request, add to our locked * free_list cache. 
@@ -1797,7 +1819,7 @@ static bool io_kill_linked_timeout(struct io_kiocb *req) link->timeout.head = NULL; if (hrtimer_try_to_cancel(&io->timer) != -1) { io_cqring_fill_event(link->ctx, link->user_data, - -ECANCELED, 0); + -ECANCELED, 0, link->cq_idx); io_put_req_deferred(link, 1); return true; } @@ -1816,7 +1838,8 @@ static void io_fail_links(struct io_kiocb *req) link->link = NULL; trace_io_uring_fail_link(req, link); - io_cqring_fill_event(link->ctx, link->user_data, -ECANCELED, 0); + io_cqring_fill_event(link->ctx, link->user_data, -ECANCELED, 0, + link->cq_idx); io_put_req_deferred(link, 2); link = nxt; } @@ -2138,7 +2161,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs, for (i = 0; i < nr; i++) { req = cs->reqs[i]; __io_cqring_fill_event(ctx, req->user_data, req->result, - req->compl.cflags); + req->compl.cflags, req->cq_idx); } io_commit_cqring(ctx); spin_unlock_irq(&ctx->completion_lock); @@ -2201,7 +2224,7 @@ static unsigned io_cqring_events(struct io_ring_ctx *ctx) { /* See comment at the top of this file */ smp_rmb(); - return __io_cqring_events(ctx); + return __io_cqring_events(&ctx->cqs[IO_DEFAULT_CQ]); } static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx) @@ -2278,7 +2301,8 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, if (req->flags & REQ_F_BUFFER_SELECTED) cflags = io_put_rw_kbuf(req); - __io_cqring_fill_event(ctx, req->user_data, req->result, cflags); + __io_cqring_fill_event(ctx, req->user_data, req->result, cflags, + req->cq_idx); (*nr_events)++; if (req_ref_put_and_test(req)) @@ -4911,7 +4935,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask) } if (req->poll.events & EPOLLONESHOT) flags = 0; - if (!io_cqring_fill_event(ctx, req->user_data, error, flags)) { + if (!io_cqring_fill_event(ctx, req->user_data, error, flags, req->cq_idx)) { io_poll_remove_waitqs(req); req->poll.done = true; flags = 0; @@ -5242,7 +5266,8 @@ static bool 
io_poll_remove_one(struct io_kiocb *req) do_complete = io_poll_remove_waitqs(req); if (do_complete) { - io_cqring_fill_event(req->ctx, req->user_data, -ECANCELED, 0); + io_cqring_fill_event(req->ctx, req->user_data, -ECANCELED, 0, + req->cq_idx); io_commit_cqring(req->ctx); req_set_fail_links(req); io_put_req_deferred(req, 1); @@ -5494,7 +5519,7 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer) atomic_set(&req->ctx->cq_timeouts, atomic_read(&req->ctx->cq_timeouts) + 1); - io_cqring_fill_event(ctx, req->user_data, -ETIME, 0); + io_cqring_fill_event(ctx, req->user_data, -ETIME, 0, req->cq_idx); io_commit_cqring(ctx); spin_unlock_irqrestore(&ctx->completion_lock, flags); @@ -5536,7 +5561,7 @@ static int io_timeout_cancel(struct io_ring_ctx *ctx, __u64 user_data) return PTR_ERR(req); req_set_fail_links(req); - io_cqring_fill_event(ctx, req->user_data, -ECANCELED, 0); + io_cqring_fill_event(ctx, req->user_data, -ECANCELED, 0, req->cq_idx); io_put_req_deferred(req, 1); return 0; } @@ -5609,7 +5634,7 @@ static int io_timeout_remove(struct io_kiocb *req, unsigned int issue_flags) ret = io_timeout_update(ctx, tr->addr, &tr->ts, io_translate_timeout_mode(tr->flags)); - io_cqring_fill_event(ctx, req->user_data, ret, 0); + io_cqring_fill_event(ctx, req->user_data, ret, 0, req->cq_idx); io_commit_cqring(ctx); spin_unlock_irq(&ctx->completion_lock); io_cqring_ev_posted(ctx); @@ -5761,7 +5786,7 @@ static void io_async_find_and_cancel(struct io_ring_ctx *ctx, done: if (!ret) ret = success_ret; - io_cqring_fill_event(ctx, req->user_data, ret, 0); + io_cqring_fill_event(ctx, req->user_data, ret, 0, req->cq_idx); io_commit_cqring(ctx); spin_unlock_irqrestore(&ctx->completion_lock, flags); io_cqring_ev_posted(ctx); @@ -5818,7 +5843,7 @@ static int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags) spin_lock_irq(&ctx->completion_lock); done: - io_cqring_fill_event(ctx, req->user_data, ret, 0); + io_cqring_fill_event(ctx, req->user_data, ret, 0, 
req->cq_idx); io_commit_cqring(ctx); spin_unlock_irq(&ctx->completion_lock); io_cqring_ev_posted(ctx); @@ -6516,6 +6541,11 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, req->result = 0; req->work.creds = NULL; + req->cq_idx = READ_ONCE(sqe->cq_idx); + if (unlikely(req->cq_idx >= ctx->cq_nr)) { + req->cq_idx = IO_DEFAULT_CQ; + return -EINVAL; + } /* enforce forwards compatibility on users */ if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) return -EINVAL; @@ -7548,7 +7578,7 @@ static void __io_rsrc_put_work(struct io_rsrc_node *ref_node) io_ring_submit_lock(ctx, lock_ring); spin_lock_irqsave(&ctx->completion_lock, flags); - io_cqring_fill_event(ctx, prsrc->tag, 0, 0); + io_cqring_fill_event(ctx, prsrc->tag, 0, 0, IO_DEFAULT_CQ); ctx->cq_extra++; io_commit_cqring(ctx); spin_unlock_irqrestore(&ctx->completion_lock, flags); @@ -9484,7 +9514,6 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, /* make sure these are sane, as we already accounted them */ ctx->sq_entries = p->sq_entries; - ctx->cqs[0].entries = p->cq_entries; size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset); if (size == SIZE_MAX) @@ -9501,6 +9530,11 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, rings->sq_ring_entries = p->sq_entries; rings->cq_ring_entries = p->cq_entries; + ctx->cqs[0].cached_tail = 0; + ctx->cqs[0].rings = rings; + ctx->cqs[0].entries = p->cq_entries; + ctx->cq_nr = 1; + size = array_size(sizeof(struct io_uring_sqe), p->sq_entries); if (size == SIZE_MAX) { io_mem_free(ctx->rings); @@ -10164,6 +10198,7 @@ static int __init io_uring_init(void) BUILD_BUG_SQE_ELEM(40, __u16, buf_index); BUILD_BUG_SQE_ELEM(42, __u16, personality); BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in); + BUILD_BUG_SQE_ELEM(48, __u16, cq_idx); BUILD_BUG_ON(sizeof(struct io_uring_files_update) != sizeof(struct io_uring_rsrc_update)); diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index e1ae46683301..c2dfb179360a 100644 --- 
a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -58,6 +58,7 @@ struct io_uring_sqe { /* personality to use, if used */ __u16 personality; __s32 splice_fd_in; + __u16 cq_idx; }; __u64 __pad2[3]; }; From patchwork Wed May 19 14:13:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267457 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A2B5C43461 for ; Wed, 19 May 2021 14:14:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1E39F611AD for ; Wed, 19 May 2021 14:14:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353936AbhESOPu (ORCPT ); Wed, 19 May 2021 10:15:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347171AbhESOPa (ORCPT ); Wed, 19 May 2021 10:15:30 -0400 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23BA4C06138F; Wed, 19 May 2021 07:14:02 -0700 (PDT) Received: by mail-wr1-x434.google.com with SMTP id a4so14236691wrr.2; Wed, 19 May 2021 07:14:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EJHSY4GTkUlJ+ZXFSK+xFoztKbsreaR/qQT5e4Mm7Vk=; b=VpmUnLwdywtGsaomIJ9w3D94b1zm2owM3muYBDMYOuMmlDy4VY0YQT6VOLQxT7YJpQ QzajMZzPOI5H7yL3TTYVvPgNqykBnWBCXfB7q6ZHR2Q2/ci73Q715uDGHuohZ06L1Tu0 a+A0JXvHfD4YTc7GE4lqbHJTA+VD6TWu6TXficdQ7AFABKl3sAbs4uhOydFY6YjpbkVD 7c2lgNCjA6UZL+BI1QE0QDYVCuyjuz/I6beCOpk2jW9Fdls1lgYxa5WKlJvWvIgl33Cs vcFAzawtikeRxIswzelxeWKfdf/kvPS/UXrwzi4sCCWNuF9CPextXeVI2UYAIbMzz8BX s6Ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EJHSY4GTkUlJ+ZXFSK+xFoztKbsreaR/qQT5e4Mm7Vk=; b=DP8rx2fUDKa6FcBTLLAhfOaF0a3m4kzw1s73DLkSymRxk9FaDH5CJcaXYZfcXIj+bC VIe/aStE5K44PPQb2t/Ga7EmgH/DC8q8FqtCuudLO5kb0YEnpLLeOGazsj03DBA9BANi Ovw6MyuU5G6B0iNQ5mMd5mVyUR1Ct15fqv6JZ73nHXrZFEEsV6Fz2ZLNAfRTk3Ju6pfe Hadd73nbRyIocfMX+T0qhFfFjXN9F88HD5L4rE7NkusqyTnV+kYfgJy5nFPq5+7svNdc JL2vRZlSkCTgKBECtdi7cAMefcpcpm73tMpLfDREGF0HEbrC+LuBP6UT0pSX+WJ1BvmG bKyg== X-Gm-Message-State: AOAM531plHoTpUiIUdxK0U5Be06iZLxpHmjFN4AwJepxON+017EX5HzZ nzMBARhgPZppdwGvDlXfmvd92qnJXsBxfWgp X-Google-Smtp-Source: ABdhPJwPUYcxqEtO1WUZ6UbamK3hCLDfJxNDZea9HK3S6q1BE8ZpYYlMAzy3fTkuiAxC65yCiLqibw== X-Received: by 2002:a05:6000:1286:: with SMTP id f6mr14488512wrx.226.1621433640869; Wed, 19 May 2021 07:14:00 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.13.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:00 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 09/23] io_uring: extract cq size helper Date: Wed, 19 May 2021 15:13:20 +0100 Message-Id: <743a1d192f84fb2294840d802b45cdd005d4c926.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Extract a helper that calculates the CQ size from a userspace-specified number of entries. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 356a5dc90f46..f05592ae5f41 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1139,6 +1139,24 @@ static inline bool io_is_timeout_noseq(struct io_kiocb *req) return !req->timeout.off; } +static long io_get_cqring_size(struct io_uring_params *p, unsigned entries) +{ + /* + * If IORING_SETUP_CQSIZE is set, we do the same roundup + * to a power-of-two, if it isn't already. We do NOT impose + * any cq vs sq ring sizing. + */ + if (!entries) + return -EINVAL; + if (entries > IORING_MAX_CQ_ENTRIES) { + if (!(p->flags & IORING_SETUP_CLAMP)) + return -EINVAL; + entries = IORING_MAX_CQ_ENTRIES; + } + entries = roundup_pow_of_two(entries); + return entries; +} + static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) { struct io_ring_ctx *ctx; @@ -9625,21 +9643,13 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p, */ p->sq_entries = roundup_pow_of_two(entries); if (p->flags & IORING_SETUP_CQSIZE) { - /* - * If IORING_SETUP_CQSIZE is set, we do the same roundup - * to a power-of-two, if it isn't already. We do NOT impose - * any cq vs sq ring sizing. 
- */ - if (!p->cq_entries) - return -EINVAL; - if (p->cq_entries > IORING_MAX_CQ_ENTRIES) { - if (!(p->flags & IORING_SETUP_CLAMP)) - return -EINVAL; - p->cq_entries = IORING_MAX_CQ_ENTRIES; - } - p->cq_entries = roundup_pow_of_two(p->cq_entries); - if (p->cq_entries < p->sq_entries) + long cq_entries = io_get_cqring_size(p, p->cq_entries); + + if (cq_entries < 0) + return cq_entries; + if (cq_entries < p->sq_entries) return -EINVAL; + p->cq_entries = cq_entries; } else { p->cq_entries = 2 * p->sq_entries; } From patchwork Wed May 19 14:13:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267459 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 515BEC43460 for ; Wed, 19 May 2021 14:14:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3317C611AD for ; Wed, 19 May 2021 14:14:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354069AbhESOPx (ORCPT ); Wed, 19 May 2021 10:15:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353956AbhESOPb (ORCPT ); Wed, 19 May 2021 10:15:31 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id 23CE5C061342; Wed, 19 May 2021 07:14:03 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id q5so14199744wrs.4; Wed, 19 May 2021 07:14:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=p5vGg0RqALu5mFpfRw+5YKA5lKfc7l3wHZfGWX7shrY=; b=Fb/E9ZY1LVmG6M+4NIy05ALhaxdRJ5skYa57IBz4gUxpMP/TTc2yS1tK1fdKEj1iPx /KGEa4USptfmtXZxlLPDV2068ZKyyeVjEkVgJzeBP2eeQpYft+IjBcWyHb4UFFeUJ3ET huStZC/oP27hjjlSEE9nLuQC51EukAImONAZIZNtktgrNThwfLJn+syufjHaITiHRfJo A/Z95u4SFf1ed0a4u8uTUY12mVmD+PD04b9njR2zXE2bAbWW/8QB4+O42uDVjb6/ykAZ RAD4cORX0s2ELVIdMJyaI162xZfV9zTfLIZq/3Cq99ATSeMSlDYLGLMpOmT1LbSU7/Oo rxwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=p5vGg0RqALu5mFpfRw+5YKA5lKfc7l3wHZfGWX7shrY=; b=rhPHqujlZABN3xfzR8rFS5LQvwWBGV6/UFP8JJpSv39XX1WBCQ+WlY8jy56v8rzw6U bS4+B/AEY90FrstqIlHkdVR9+JLfBYanuZJ+yDh06qVLYyJkKhv5ztHjBhArJmYsfLJ4 d2SbLsDbMwwyKW8XL5Nm6wtwMeH3Jwkw60zY+RFLePyGEX9tiBKYrSWDfwDwF/jp3W54 ZeIq1GqUzKY7fTzrLYtSvqJxnpM7wHzDu0KfxCbSKd72R8AeAeoQhfdIU/tgGpV175FC F34lQsERs+E9cueIzffsiug45+LJlIv+jHok99RJa0IDXAgqG4ap5xUdkLIJwJdAnSzQ akvg== X-Gm-Message-State: AOAM533haRzHU27TrTBPVGXd+GqbCKEQiXnphm6Wr4jwgCG8VOKPwIlJ WyXIPAdZl2e1hqrNPrgLdRlhePViJdP8v8X8 X-Google-Smtp-Source: ABdhPJzoffqZYjzQLRf50KkXkExxnbeq+X8pmr3Q43uM2CrWulsgG8VTG7Ee2Z+/bVH5c6KhToD1RA== X-Received: by 2002:adf:e991:: with SMTP id h17mr14767306wrm.265.1621433642101; Wed, 19 May 2021 07:14:02 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:01 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, 
netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 10/23] io_uring: add support for multiple CQs Date: Wed, 19 May 2021 15:13:21 +0100 Message-Id: <6f8364dc9df59b27b09472f262a545b42f8639a5.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org TODO: don't rob all bits from params, use pointer to a struct Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 89 +++++++++++++++++++++++++++-------- include/uapi/linux/io_uring.h | 3 +- 2 files changed, 71 insertions(+), 21 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index f05592ae5f41..067cfb3a6e4a 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -91,6 +91,7 @@ #define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES) #define IO_DEFAULT_CQ 0 +#define IO_MAX_CQRINGS 1024 /* * Shift of 9 is 512 entries, or exactly one page on 64-bit archs @@ -417,7 +418,7 @@ struct io_ring_ctx { unsigned long cq_check_overflow; unsigned cq_extra; struct wait_queue_head cq_wait; - struct io_cqring cqs[1]; + struct io_cqring *cqs; unsigned int cq_nr; struct { @@ -1166,6 +1167,9 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) if (!ctx) return NULL; + ctx->cqs = kmalloc_array(p->nr_cq + 1, sizeof(ctx->cqs[0]), GFP_KERNEL); + if (!ctx->cqs) + goto err; /* * Use 5 bits less than the max cq entries, that should give us around * 32 entries per hash list if totally full and uniformly spread. 
@@ -8634,6 +8638,8 @@ static bool io_wait_rsrc_data(struct io_rsrc_data *data) static void io_ring_ctx_free(struct io_ring_ctx *ctx) { + unsigned int i; + io_sq_thread_finish(ctx); if (ctx->mm_account) { @@ -8673,6 +8679,9 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx) io_mem_free(ctx->rings); io_mem_free(ctx->sq_sqes); + for (i = 1; i < ctx->cq_nr; i++) + io_mem_free(ctx->cqs[i].rings); + kfree(ctx->cqs); percpu_ref_exit(&ctx->refs); free_uid(ctx->user); @@ -9524,11 +9533,39 @@ static const struct file_operations io_uring_fops = { #endif }; +static void __io_init_cqring(struct io_cqring *cq, struct io_rings *rings, + unsigned int entries) +{ + WRITE_ONCE(rings->cq_ring_entries, entries); + WRITE_ONCE(rings->cq_ring_mask, entries - 1); + + cq->cached_tail = 0; + cq->rings = rings; + cq->entries = entries; +} + +static int io_init_cqring(struct io_cqring *cq, unsigned int entries) +{ + struct io_rings *rings; + size_t size; + + size = rings_size(0, entries, NULL); + if (size == SIZE_MAX) + return -EOVERFLOW; + rings = io_mem_alloc(size); + if (!rings) + return -ENOMEM; + __io_init_cqring(cq, rings, entries); + return 0; +} + static int io_allocate_scq_urings(struct io_ring_ctx *ctx, struct io_uring_params *p) { + u32 __user *cq_sizes = u64_to_user_ptr(p->cq_sizes); struct io_rings *rings; size_t size, sq_array_offset; + int i, ret; /* make sure these are sane, as we already accounted them */ ctx->sq_entries = p->sq_entries; @@ -9544,30 +9581,43 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, ctx->rings = rings; ctx->sq_array = (u32 *)((char *)rings + sq_array_offset); rings->sq_ring_mask = p->sq_entries - 1; - rings->cq_ring_mask = p->cq_entries - 1; rings->sq_ring_entries = p->sq_entries; - rings->cq_ring_entries = p->cq_entries; - ctx->cqs[0].cached_tail = 0; - ctx->cqs[0].rings = rings; - ctx->cqs[0].entries = p->cq_entries; + __io_init_cqring(&ctx->cqs[0], rings, p->cq_entries); ctx->cq_nr = 1; size = array_size(sizeof(struct 
io_uring_sqe), p->sq_entries); - if (size == SIZE_MAX) { - io_mem_free(ctx->rings); - ctx->rings = NULL; - return -EOVERFLOW; - } + ret = -EOVERFLOW; + if (unlikely(size == SIZE_MAX)) + goto err; ctx->sq_sqes = io_mem_alloc(size); - if (!ctx->sq_sqes) { - io_mem_free(ctx->rings); - ctx->rings = NULL; - return -ENOMEM; + ret = -ENOMEM; + if (unlikely(!ctx->sq_sqes)) + goto err; + + for (i = 0; i < p->nr_cq; i++, ctx->cq_nr++) { + u32 sz; + long entries; + + ret = -EFAULT; + if (copy_from_user(&sz, &cq_sizes[i], sizeof(sz))) + goto err; + entries = io_get_cqring_size(p, sz); + if (entries < 0) { + ret = entries; + goto err; + } + ret = io_init_cqring(&ctx->cqs[i + 1], entries); + if (ret) + goto err; } return 0; +err: + io_mem_free(ctx->rings); + ctx->rings = NULL; + return ret; } static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file) @@ -9653,6 +9703,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p, } else { p->cq_entries = 2 * p->sq_entries; } + if (p->nr_cq > IO_MAX_CQRINGS) + return -EINVAL; + if (!p->nr_cq != !p->cq_sizes) + return -EINVAL; ctx = io_ring_ctx_alloc(p); if (!ctx) @@ -9744,14 +9798,9 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p, static long io_uring_setup(u32 entries, struct io_uring_params __user *params) { struct io_uring_params p; - int i; if (copy_from_user(&p, params, sizeof(p))) return -EFAULT; - for (i = 0; i < ARRAY_SIZE(p.resv); i++) { - if (p.resv[i]) - return -EINVAL; - } if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE | diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index c2dfb179360a..92b61ca09ea5 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -263,7 +263,8 @@ struct io_uring_params { __u32 sq_thread_idle; __u32 features; __u32 wq_fd; - __u32 resv[3]; + __u32 nr_cq; + __u64 cq_sizes; struct io_sqring_offsets sq_off; struct io_cqring_offsets 
cq_off; }; From patchwork Wed May 19 14:13:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267467 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 743CCC433ED for ; Wed, 19 May 2021 14:14:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58D72613AF for ; Wed, 19 May 2021 14:14:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354093AbhESOQB (ORCPT ); Wed, 19 May 2021 10:16:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353961AbhESOPb (ORCPT ); Wed, 19 May 2021 10:15:31 -0400 Received: from mail-wm1-x332.google.com (mail-wm1-x332.google.com [IPv6:2a00:1450:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 122DFC061344; Wed, 19 May 2021 07:14:05 -0700 (PDT) Received: by mail-wm1-x332.google.com with SMTP id y184-20020a1ce1c10000b02901769b409001so3432498wmg.3; Wed, 19 May 2021 07:14:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=K6PlxR1Sd0Afhg7WgBhZnU12RDoSmxGULXPHeQgmUtQ=; 
b=kZbVe1+YAuC3A2cvHPLAJKF9rDZiNU07dbnjjQ+8rdlThDSiL2BHpe9buOIp2lCnOb p3agNsXrGjlkCZEwu66m04+44YK7oCtRZNnEXh1Jk/K2OIxIBgWZWolRExdfFEOJjOlS mPdeoONhtOPixZUtK+x8cgjkf/mjYv2PEWaRqwBULBq2RQpzKArXpTloDRDwaNRwi8vf PGKW212hS8Qw0IfewBVjzt4JhPJWzecTdZVChh55WZmRuOOUkNBonaVlVqLPKV7rTYD/ ZUSomRhOlpe+3LYWNVzKUjSu3vW8Eo/4+k/Vcuf6SifUESZqsqJjDEnhOFj60WFHTCXH 9QDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=K6PlxR1Sd0Afhg7WgBhZnU12RDoSmxGULXPHeQgmUtQ=; b=RLLaCoWLczvlRqf2k8Kr5EgJqd/PLB4tZyJlAUQc3FWt4OBPx6R8QRn6xQJscLCG+8 CXGj2qmhM8B468UYZb2kxohgdvHPT+O6eFJQ4cyfegKWR4a+PSSlMT337VXhNtpGscCS b8NpoI5ziCBNtyPwKQaH5ZRk6kA8wXumE+UmvnNHS+3F2/tiJ7gL1NjbVQkczgbYQ6Ei L+p/LZfHakbsffEATAmz+ksZxxiROH4GksQDosVZARM/3SoKriTSHTuhS8uwAz/K517z N1ZjqQS8k8NChD5RieKoNKJu931QFw6W5F3lWZlKzCXGM+xgI33NaEYfdA00eY9lptcS PPXA== X-Gm-Message-State: AOAM530z34Zk79raN5a4du/n7c/ikhJg8k6OcN/tbm7Wm4v3aRqWRUi/ ekuJyPrx5sFXC2yttGhIvfObHnsZqqPpsWPv X-Google-Smtp-Source: ABdhPJwZT+EsBrggiDOnFydu0K3+0eay6r0x0styanpMdNMyczuPKrpqOosZc0yumdIULPYGxSSxQg== X-Received: by 2002:a7b:c041:: with SMTP id u1mr11549476wmc.95.1621433643523; Wed, 19 May 2021 07:14:03 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:03 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 11/23] io_uring: enable mmap'ing additional CQs Date: Wed, 19 May 2021 15:13:22 +0100 Message-Id: <0b54c7e684a2a9c3cb1a3cddfe48790e79bfbb89.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org TODO: get rid of extra offset Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 13 ++++++++++++- include/uapi/linux/io_uring.h | 2 ++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 067cfb3a6e4a..1a4c9e513ac9 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -9207,6 +9207,7 @@ static void *io_uring_validate_mmap_request(struct file *file, struct io_ring_ctx *ctx = file->private_data; loff_t offset = pgoff << PAGE_SHIFT; struct page *page; + unsigned long cq_idx; void *ptr; switch (offset) { @@ -9218,7 +9219,15 @@ static void *io_uring_validate_mmap_request(struct file *file, ptr = ctx->sq_sqes; break; default: - return ERR_PTR(-EINVAL); + if (offset < IORING_OFF_CQ_RING_EXTRA) + return ERR_PTR(-EINVAL); + offset -= IORING_OFF_CQ_RING_EXTRA; + if (offset % IORING_STRIDE_CQ_RING) + return ERR_PTR(-EINVAL); + cq_idx = offset / IORING_STRIDE_CQ_RING; + if (cq_idx >= ctx->cq_nr) + return ERR_PTR(-EINVAL); + ptr = ctx->cqs[cq_idx].rings; } page = virt_to_head_page(ptr); @@ -9615,6 +9624,8 @@ static int io_allocate_scq_urings(struct io_ring_ctx *ctx, return 0; err: + while (ctx->cq_nr > 1) + io_mem_free(ctx->cqs[--ctx->cq_nr].rings); io_mem_free(ctx->rings); ctx->rings = NULL; return ret; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 92b61ca09ea5..67a97c793de7 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -203,6 +203,8 @@ enum { #define IORING_OFF_SQ_RING 0ULL #define IORING_OFF_CQ_RING 0x8000000ULL #define IORING_OFF_SQES 0x10000000ULL +#define IORING_OFF_CQ_RING_EXTRA 0x1200000ULL +#define 
IORING_STRIDE_CQ_RING 0x0100000ULL /* * Filled with the offset for mmap(2) From patchwork Wed May 19 14:13:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267463 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 001B1C43462 for ; Wed, 19 May 2021 14:14:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DC863611AD for ; Wed, 19 May 2021 14:14:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1347239AbhESOP5 (ORCPT ); Wed, 19 May 2021 10:15:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353988AbhESOPj (ORCPT ); Wed, 19 May 2021 10:15:39 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CE77C061345; Wed, 19 May 2021 07:14:06 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id o127so7378881wmo.4; Wed, 19 May 2021 07:14:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ia8yEEumaDfwO1+1AwrAneih7zxe+tE90lEQ3nBrXdA=; 
b=kIF6ZwdpWDvqeWiNerCWYEcZZTDQFXBAmVGVoRf/cIS5S6XeHSzLBVmF05Y/YvIYsg E2cfYjpexVaSmoMzaO5i6VdtCEi7J90y7Bl4FET4DA8uRzDmLbRNYN9D67/AFomkw7FR iRlUWRQielyRHRqFztu7fziWWA8gl4QNRLzapOhwl+DakaAhmKjTHojZ3IW82rYpcMJa ZAHoBiSnwrOmt4NJMLH47KlZvOZQug48k3DHSYF/VvFSVVC6pf74JhvacqOw0ZmMjYy6 14BROARYwZU+yLOlcDuidocx71NwGp8ZzlW61t7fh3ZI1SPDTX4ZdWPOVjcthQnJHooB Cs0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ia8yEEumaDfwO1+1AwrAneih7zxe+tE90lEQ3nBrXdA=; b=uiWpT+p0VRcB9rmZDpEpnP0IWldDKgQxsQpEAjnQNVclIMj+Ivokd4NbeV/srRjyrP gczngpNDGz/nBbodzfap9mPgGCHiu1dCnWXQuQJ49/BtcAVG8FKgr0k20cJTpZ1eJC3/ pTdUfYawUob3XbGFfcA/a21sGbA9M1C9v+Ut0doa2nZ4NdfqZzLOBcegICTVlwBzRYy4 eE35t9a2q83vJUxkZG8yeWfbxEbYDorr5zkChl7IEb84lGO+tib0LIrKrKvWT/opex9a 94Wx1dzC+GFj1Axxu7dQfuD8p9iAhnO7FbjMOfWzzrdJY+gyQCk/cVy1iyMaOH2ylNRE UccA== X-Gm-Message-State: AOAM533OfqFDS8DvLuWIkFQkKy33Ogn4+lVktpkjS/lGBV2MieBrfD4i yCAtMIe3IOdHYyXZAUbRx0TBRpSkOxez7cm8 X-Google-Smtp-Source: ABdhPJyKH/yCHUDc+r67QSoXhMHTW2SqJ0qCGVPQAFyvcat2jjT6ZtYZxVKvkGp/yaswMTRd/lgdkg== X-Received: by 2002:a1c:b306:: with SMTP id c6mr11247677wmf.37.1621433644583; Wed, 19 May 2021 07:14:04 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:04 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 12/23] bpf: add IOURING program type Date: Wed, 19 May 2021 15:13:23 +0100 Message-Id: <3883680d4638504e3dcf79bf1c15d548a9cb7f3e.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Draft a new program type BPF_PROG_TYPE_IOURING, which will be used by io_uring to execute BPF-based requests. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 21 +++++++++++++++++++++ include/linux/bpf_types.h | 2 ++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + kernel/bpf/verifier.c | 5 ++++- 5 files changed, 29 insertions(+), 1 deletion(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 1a4c9e513ac9..882b16b5e5eb 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -10201,6 +10201,27 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, return ret; } +static const struct bpf_func_proto * +io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return bpf_base_func_proto(func_id); +} + +static bool io_bpf_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + return false; +} + +const struct bpf_prog_ops bpf_io_uring_prog_ops = {}; + +const struct bpf_verifier_ops bpf_io_uring_verifier_ops = { + .get_func_proto = io_bpf_func_proto, + .is_valid_access = io_bpf_is_valid_access, +}; + SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, void __user *, arg, unsigned int, nr_args) { diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 99f7fd657d87..d0b7954887bd 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -77,6 +77,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm, void *, void *) #endif /* CONFIG_BPF_LSM */ #endif +BPF_PROG_TYPE(BPF_PROG_TYPE_IOURING, bpf_io_uring, + void *, void *) 
BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4ba4ef0ff63a..de544f0fbeef 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -206,6 +206,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_EXT, BPF_PROG_TYPE_LSM, BPF_PROG_TYPE_SK_LOOKUP, + BPF_PROG_TYPE_IOURING, }; enum bpf_attach_type { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 250503482cda..6ef7a26f4dc3 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2041,6 +2041,7 @@ static bool is_net_admin_prog_type(enum bpf_prog_type prog_type) case BPF_PROG_TYPE_CGROUP_SOCKOPT: case BPF_PROG_TYPE_CGROUP_SYSCTL: case BPF_PROG_TYPE_SOCK_OPS: + case BPF_PROG_TYPE_IOURING: case BPF_PROG_TYPE_EXT: /* extends any prog */ return true; case BPF_PROG_TYPE_CGROUP_SKB: diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0399ac092b36..2a53f44618a7 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -8558,6 +8558,9 @@ static int check_return_code(struct bpf_verifier_env *env) case BPF_PROG_TYPE_SK_LOOKUP: range = tnum_range(SK_DROP, SK_PASS); break; + case BPF_PROG_TYPE_IOURING: + range = tnum_const(0); + break; case BPF_PROG_TYPE_EXT: /* freplace program can return anything as its return value * depends on the to-be-replaced kernel func or bpf program. 
@@ -12560,7 +12563,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env) u64 key; if (prog->aux->sleepable && prog->type != BPF_PROG_TYPE_TRACING && - prog->type != BPF_PROG_TYPE_LSM) { + prog->type != BPF_PROG_TYPE_LSM && prog->type != BPF_PROG_TYPE_IOURING) { verbose(env, "Only fentry/fexit/fmod_ret and lsm programs can be sleepable\n"); return -EINVAL; } From patchwork Wed May 19 14:13:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267465 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DC8BC433B4 for ; Wed, 19 May 2021 14:14:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 357FD6135F for ; Wed, 19 May 2021 14:14:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353971AbhESOP6 (ORCPT ); Wed, 19 May 2021 10:15:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353989AbhESOPj (ORCPT ); Wed, 19 May 2021 10:15:39 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87CFCC061346; Wed, 19 May 2021 07:14:07 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id a4so14236973wrr.2; 
Wed, 19 May 2021 07:14:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GeM8NQyz7ORAxfAoFKPgRLBmcqCsjTuQcn5faI5QXyo=; b=AOSrvP98BmuRTG+FmxrZd8I3tCCSpMJXKXFD/SuP2lMQDfMX96t3VHpGFpHkof0Mpr MIJVQ3l8YOGH8W6qAJprrVQoGQH3wxbzWwlW1AqSm9bRj1qk0HRWVTXizjiDoXXcK1h1 x/nOfz8LJ7f0KuOiPLJMp3R9srUM1Byf2XodyBxZZV3uDk8XaV3zVkjBz9FGs2msEznR rYw0NPjTjXUbAJsefh4YRbqz2rUUVdufrjOTx4z1kBPOV8NDCEsQ81vSr79MDL25O1d3 yYqP2bKwZm1jLS2Kafs4GtoZtI3v41ypw8DZUaUlEKS/fSRp/UTiow4iSq90ha0kGbIH bofQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GeM8NQyz7ORAxfAoFKPgRLBmcqCsjTuQcn5faI5QXyo=; b=L1DNBPvKTZ6/YaV1ZDCAiCeQh7VbiAReM4V3XjsUGtvJ9mAS0+U0+4xa+q/kplL58M 7trFhCAs8F8e/bRoXmqtvMLXhNc+1+jdx1sdtA8mgxltQn3M0H55jiK7joGnOMMebUTJ P4BS4Zv6sVf+1efZ8StkWHBZaLK/dv8dzlRSyYrzpoRLPj4/+vH9Q1dZK9lZTb/wl6ek aADjXcH425FjY93KvaT3XyOOIzZRzfFVIG6/l7TwB6d4p7AReuiY23Ix8RrwMu7sHBdS cIITXIJx3WNlKMq64m6JimCQvtOlwbV5UeG4NygAdK+3cMxgxGpK+mFeyMbo6+fHYCKP 8+LA== X-Gm-Message-State: AOAM533TLISkvRGX7HCQuQt3uCqDMY7fMmGs8Dgttet0piwqajQ/G/sp WGiPU4lrxb1DiWNa7zsznxJCCQ0zNVEgaMy/ X-Google-Smtp-Source: ABdhPJw7O8/hvA1MopjfEanOYzwwEajXtj0dfW4VBn7OK2Y1y3d3TQUYQKIK2wyVyHhTyB8GiOOQ8Q== X-Received: by 2002:a5d:440d:: with SMTP id z13mr14641480wrq.134.1621433645790; Wed, 19 May 2021 07:14:05 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:05 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin 
KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 13/23] io_uring: implement bpf prog registration Date: Wed, 19 May 2021 15:13:24 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Register and unregister BPF programs through io_uring_register() with the new IORING_REGISTER_BPF and IORING_UNREGISTER_BPF commands. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 81 +++++++++++++++++++++++++++++++++++ include/uapi/linux/io_uring.h | 2 + 2 files changed, 83 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 882b16b5e5eb..b13cbcd5c47b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -78,6 +78,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -103,6 +104,8 @@ #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \ IORING_REGISTER_LAST + IORING_OP_LAST) +#define IORING_MAX_BPF_PROGS 100 + #define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \ IOSQE_IO_HARDLINK | IOSQE_ASYNC | \ IOSQE_BUFFER_SELECT) @@ -266,6 +269,10 @@ struct io_restriction { bool registered; }; +struct io_bpf_prog { + struct bpf_prog *prog; +}; + enum { IO_SQ_THREAD_SHOULD_STOP = 0, IO_SQ_THREAD_SHOULD_PARK, @@ -411,6 +418,10 @@ struct io_ring_ctx { struct xarray personalities; u32 pers_next; + /* bpf programs */ + unsigned nr_bpf_progs; + struct io_bpf_prog *bpf_progs; + struct fasync_struct *cq_fasync; struct eventfd_ctx *cq_ev_fd; atomic_t cq_timeouts; @@ -8627,6 +8638,66 @@ static void io_req_caches_free(struct io_ring_ctx *ctx) mutex_unlock(&ctx->uring_lock); } +static int io_bpf_unregister(struct io_ring_ctx *ctx) +{ + int i; + + if (!ctx->nr_bpf_progs) + return -ENXIO; + + for (i = 0; i < ctx->nr_bpf_progs; ++i) { + struct bpf_prog *prog = ctx->bpf_progs[i].prog; + + if (prog) + bpf_prog_put(prog); + } + kfree(ctx->bpf_progs); + ctx->bpf_progs = NULL; + 
ctx->nr_bpf_progs = 0; + return 0; +} + +static int io_bpf_register(struct io_ring_ctx *ctx, void __user *arg, + unsigned int nr_args) +{ + u32 __user *fds = arg; + int i, ret = 0; + + if (!nr_args || nr_args > IORING_MAX_BPF_PROGS) + return -EINVAL; + if (ctx->nr_bpf_progs) + return -EBUSY; + + ctx->bpf_progs = kcalloc(nr_args, sizeof(ctx->bpf_progs[0]), + GFP_KERNEL); + if (!ctx->bpf_progs) + return -ENOMEM; + + for (i = 0; i < nr_args; ++i) { + struct bpf_prog *prog; + u32 fd; + + if (copy_from_user(&fd, &fds[i], sizeof(fd))) { + ret = -EFAULT; + break; + } + if (fd == -1) + continue; + + prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_IOURING); + if (IS_ERR(prog)) { + ret = PTR_ERR(prog); + break; + } + ctx->bpf_progs[i].prog = prog; + } + + ctx->nr_bpf_progs = i; + if (ret) + io_bpf_unregister(ctx); + return ret; +} + static bool io_wait_rsrc_data(struct io_rsrc_data *data) { if (!data) @@ -8657,6 +8728,7 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx) mutex_unlock(&ctx->uring_lock); io_eventfd_unregister(ctx); io_destroy_buffers(ctx); + io_bpf_unregister(ctx); if (ctx->sq_creds) put_cred(ctx->sq_creds); @@ -10188,6 +10260,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, case IORING_REGISTER_RSRC_UPDATE: ret = io_register_rsrc_update(ctx, arg, nr_args); break; + case IORING_REGISTER_BPF: + ret = io_bpf_register(ctx, arg, nr_args); + break; + case IORING_UNREGISTER_BPF: + ret = -EINVAL; + if (arg || nr_args) + break; + ret = io_bpf_unregister(ctx); + break; default: ret = -EINVAL; break; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 67a97c793de7..b450f41d7389 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -304,6 +304,8 @@ enum { IORING_REGISTER_ENABLE_RINGS = 12, IORING_REGISTER_RSRC = 13, IORING_REGISTER_RSRC_UPDATE = 14, + IORING_REGISTER_BPF = 15, + IORING_UNREGISTER_BPF = 16, /* this goes last */ IORING_REGISTER_LAST From patchwork Wed May 19 14:13:25 
2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267469
X-Patchwork-Delegate: bpf@iogearbox.net
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B .
Tuneke", Christian Dietrich
Subject: [PATCH 14/23] io_uring: add support for bpf requests
Date: Wed, 19 May 2021 15:13:25 +0100
Message-Id:
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

Wire up a new io_uring operation type IORING_OP_BPF, which executes a
specified BPF program from the registered prog table. It doesn't allow
doing anything useful for now: no BPF functions are allowed apart from
the basic ones.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 92 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 93 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b13cbcd5c47b..20fddc5945f2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -682,6 +682,11 @@ struct io_unlink {
 	struct filename *filename;
 };

+struct io_bpf {
+	struct file *file;
+	struct bpf_prog *prog;
+};
+
 struct io_completion {
 	struct file *file;
 	struct list_head list;
@@ -826,6 +831,7 @@ struct io_kiocb {
 		struct io_shutdown shutdown;
 		struct io_rename rename;
 		struct io_unlink unlink;
+		struct io_bpf bpf;
 		/* use only after cleaning per-op data, see io_clean_op() */
 		struct io_completion compl;
 	};
@@ -875,6 +881,9 @@ struct io_defer_entry {
 	u32 seq;
 };

+struct io_bpf_ctx {
+};
+
 struct io_op_def {
 	/* needs req->file assigned */
 	unsigned needs_file : 1;
@@ -1039,6 +1048,7 @@ static const struct io_op_def io_op_defs[] = {
 	},
 	[IORING_OP_RENAMEAT] = {},
 	[IORING_OP_UNLINKAT] = {},
+	[IORING_OP_BPF] = {},
 };

 static bool io_disarm_next(struct io_kiocb *req);
@@ -1070,6 +1080,7 @@ static void io_rsrc_put_work(struct work_struct *work);
 static void io_req_task_queue(struct io_kiocb *req);
 static void io_submit_flush_completions(struct io_comp_state *cs,
 					struct io_ring_ctx *ctx);
+static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags);
 static bool io_poll_remove_waitqs(struct io_kiocb *req);
 static int io_req_prep_async(struct io_kiocb *req);

@@ -3931,6 +3942,53 @@ static int io_openat(struct io_kiocb *req, unsigned int issue_flags)
 	return io_openat2(req, issue_flags);
 }

+static int io_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct bpf_prog *prog;
+	unsigned int idx;
+
+	if (unlikely(ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL)))
+		return -EINVAL;
+	if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
+		return -EINVAL;
+	if (sqe->ioprio || sqe->len || sqe->cancel_flags)
+		return -EINVAL;
+	if (sqe->addr)
+		return -EINVAL;
+
+	idx = READ_ONCE(sqe->off);
+	if (unlikely(idx >= ctx->nr_bpf_progs))
+		return -EFAULT;
+	idx = array_index_nospec(idx, ctx->nr_bpf_progs);
+	prog = ctx->bpf_progs[idx].prog;
+	if (!prog)
+		return -EFAULT;
+
+	req->bpf.prog = prog;
+	return 0;
+}
+
+static void io_bpf_run_task_work(struct callback_head *cb)
+{
+	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
+	struct io_ring_ctx *ctx = req->ctx;
+
+	mutex_lock(&ctx->uring_lock);
+	io_bpf_run(req, 0);
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static int io_bpf(struct io_kiocb *req, unsigned int issue_flags)
+{
+	init_task_work(&req->task_work, io_bpf_run_task_work);
+	if (unlikely(io_req_task_work_add(req))) {
+		req_ref_get(req);
+		io_req_task_queue_fail(req, -ECANCELED);
+	}
+	return 0;
+}
+
 static int io_remove_buffers_prep(struct io_kiocb *req,
 				  const struct io_uring_sqe *sqe)
 {
@@ -6002,6 +6060,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_renameat_prep(req, sqe);
 	case IORING_OP_UNLINKAT:
 		return io_unlinkat_prep(req, sqe);
+	case IORING_OP_BPF:
+		return io_bpf_prep(req, sqe);
 	}

 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6269,6 +6329,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_UNLINKAT:
 		ret = io_unlinkat(req, issue_flags);
 		break;
+	case IORING_OP_BPF:
+		ret = io_bpf(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -10303,6 +10366,35 @@ const struct bpf_verifier_ops bpf_io_uring_verifier_ops = {
 	.is_valid_access = io_bpf_is_valid_access,
 };

+static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_bpf_ctx bpf_ctx;
+	struct bpf_prog *prog;
+	int ret = -EAGAIN;
+
+	lockdep_assert_held(&req->ctx->uring_lock);
+
+	if (unlikely(percpu_ref_is_dying(&ctx->refs) ||
+		     atomic_read(&req->task->io_uring->in_idle)))
+		goto done;
+
+	memset(&bpf_ctx, 0, sizeof(bpf_ctx));
+	prog = req->bpf.prog;
+
+	if (prog->aux->sleepable) {
+		rcu_read_lock();
+		bpf_prog_run_pin_on_cpu(req->bpf.prog, &bpf_ctx);
+		rcu_read_unlock();
+	} else {
+		bpf_prog_run_pin_on_cpu(req->bpf.prog, &bpf_ctx);
+	}
+
+	ret = 0;
+done:
+	__io_req_complete(req, issue_flags, ret, 0);
+}
+
 SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 		void __user *, arg, unsigned int, nr_args)
 {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b450f41d7389..25ab804670e1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -138,6 +138,7 @@ enum {
 	IORING_OP_SHUTDOWN,
 	IORING_OP_RENAMEAT,
 	IORING_OP_UNLINKAT,
+	IORING_OP_BPF,

 	/* this goes last, obviously */
 	IORING_OP_LAST,
From patchwork Wed May 19 14:13:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267471
X-Patchwork-Delegate: bpf@iogearbox.net
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 15/23] io_uring: enable BPF to submit SQEs
Date: Wed, 19 May 2021 15:13:26 +0100
Message-Id: <8ec8373d406d1fcb41719e641799dcc5c0455db3.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Add a BPF_FUNC_iouring_queue_sqe BPF function as a demonstration of
submitting a new request from a BPF request.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c            | 51 ++++++++++++++++++++++++++++++++++++----
 include/uapi/linux/bpf.h |  1 +
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 20fddc5945f2..aae786291c57 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -882,6 +882,7 @@ struct io_defer_entry {
 };

 struct io_bpf_ctx {
+	struct io_ring_ctx *ctx;
 };

 struct io_op_def {
@@ -6681,7 +6682,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		ret = -EBADF;
 	}

-	state->ios_left--;
+	if (state->ios_left > 1)
+		state->ios_left--;
 	return ret;
 }

@@ -10345,10 +10347,50 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 	return ret;
 }

+BPF_CALL_3(io_bpf_queue_sqe, struct io_bpf_ctx *, bpf_ctx,
+	   const struct io_uring_sqe *, sqe,
+	   u32, sqe_len)
+{
+	struct io_ring_ctx *ctx = bpf_ctx->ctx;
+	struct io_kiocb *req;
+
+	if (sqe_len != sizeof(struct io_uring_sqe))
+		return -EINVAL;
+
+	req = io_alloc_req(ctx);
+	if (unlikely(!req))
+		return -ENOMEM;
+	if (!percpu_ref_tryget_many(&ctx->refs, 1)) {
+		kmem_cache_free(req_cachep, req);
+		return -EAGAIN;
+	}
+	percpu_counter_add(&current->io_uring->inflight, 1);
+	refcount_add(1, &current->usage);
+
+	/* returns number of submitted SQEs or an error */
+	return !io_submit_sqe(ctx, req, sqe);
+}
+
+const struct bpf_func_proto io_bpf_queue_sqe_proto = {
+	.func = io_bpf_queue_sqe,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_CTX,
+	.arg2_type = ARG_PTR_TO_MEM,
+	.arg3_type = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	switch (func_id) {
+	case BPF_FUNC_copy_from_user:
+		return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
+	case BPF_FUNC_iouring_queue_sqe:
+		return prog->aux->sleepable ?
&io_bpf_queue_sqe_proto : NULL;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
 }

 static bool io_bpf_is_valid_access(int off, int size,
@@ -10379,9 +10421,10 @@ static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags)
 		     atomic_read(&req->task->io_uring->in_idle)))
 		goto done;

-	memset(&bpf_ctx, 0, sizeof(bpf_ctx));
+	bpf_ctx.ctx = ctx;
 	prog = req->bpf.prog;

+	io_submit_state_start(&ctx->submit_state, 1);
 	if (prog->aux->sleepable) {
 		rcu_read_lock();
 		bpf_prog_run_pin_on_cpu(req->bpf.prog, &bpf_ctx);
@@ -10389,7 +10432,7 @@ static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags)
 	} else {
 		bpf_prog_run_pin_on_cpu(req->bpf.prog, &bpf_ctx);
 	}
-
+	io_submit_state_end(&ctx->submit_state, ctx);
 	ret = 0;
 done:
 	__io_req_complete(req, issue_flags, ret, 0);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de544f0fbeef..cc268f749a7d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4082,6 +4082,7 @@ union bpf_attr {
 	FN(ima_inode_hash), \
 	FN(sock_from_file), \
 	FN(check_mtu), \
+	FN(iouring_queue_sqe), \
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
From patchwork Wed May 19 14:13:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267473
X-Patchwork-Delegate: bpf@iogearbox.net
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 16/23] io_uring: enable bpf to submit CQEs
Date: Wed, 19 May 2021 15:13:27 +0100
Message-Id:
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c            | 36 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  1 +
 2 files changed, 37 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index aae786291c57..464d630904e2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10371,6 +10371,29 @@ BPF_CALL_3(io_bpf_queue_sqe, struct io_bpf_ctx *, bpf_ctx,
 	return !io_submit_sqe(ctx, req, sqe);
 }

+BPF_CALL_5(io_bpf_emit_cqe, struct io_bpf_ctx *, bpf_ctx,
+	   u32, cq_idx,
+	   u64, user_data,
+	   s32, res,
+	   u32, flags)
+{
+	struct io_ring_ctx *ctx = bpf_ctx->ctx;
+	bool submitted;
+
+	if (unlikely(cq_idx >= ctx->cq_nr))
+		return -EINVAL;
+
+	spin_lock_irq(&ctx->completion_lock);
+	submitted = io_cqring_fill_event(ctx, user_data, res,
flags, cq_idx);
+	io_commit_cqring(ctx);
+	ctx->cq_extra++;
+	spin_unlock_irq(&ctx->completion_lock);
+	if (submitted)
+		io_cqring_ev_posted(ctx);
+
+	return submitted ? 0 : -ENOMEM;
+}
+
 const struct bpf_func_proto io_bpf_queue_sqe_proto = {
 	.func = io_bpf_queue_sqe,
 	.gpl_only = false,
@@ -10380,6 +10403,17 @@ const struct bpf_func_proto io_bpf_queue_sqe_proto = {
 	.arg3_type = ARG_CONST_SIZE,
 };

+const struct bpf_func_proto io_bpf_emit_cqe_proto = {
+	.func = io_bpf_emit_cqe,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_CTX,
+	.arg2_type = ARG_ANYTHING,
+	.arg3_type = ARG_ANYTHING,
+	.arg4_type = ARG_ANYTHING,
+	.arg5_type = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -10388,6 +10422,8 @@ io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
 	case BPF_FUNC_iouring_queue_sqe:
 		return prog->aux->sleepable ?
&io_bpf_queue_sqe_proto : NULL;
+	case BPF_FUNC_iouring_emit_cqe:
+		return &io_bpf_emit_cqe_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cc268f749a7d..c6b023be7848 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4083,6 +4083,7 @@ union bpf_attr {
 	FN(sock_from_file), \
 	FN(check_mtu), \
 	FN(iouring_queue_sqe), \
+	FN(iouring_emit_cqe), \
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
From patchwork Wed May 19 14:13:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267475
X-Patchwork-Delegate: bpf@iogearbox.net
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 17/23] io_uring: enable bpf to reap CQEs
Date: Wed, 19 May 2021 15:13:28 +0100
Message-Id:
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c            | 48 ++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/bpf.h |  1 +
 2 files changed, 49 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 464d630904e2..7c165b2ce8e4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10394,6 +10394,42 @@ BPF_CALL_5(io_bpf_emit_cqe, struct io_bpf_ctx *, bpf_ctx,
 	return submitted ? 0 : -ENOMEM;
 }

+BPF_CALL_4(io_bpf_reap_cqe, struct io_bpf_ctx *, bpf_ctx,
+	   u32, cq_idx,
+	   struct io_uring_cqe *, cqe_out,
+	   u32, cqe_len)
+{
+	struct io_ring_ctx *ctx = bpf_ctx->ctx;
+	struct io_uring_cqe *cqe;
+	struct io_cqring *cq;
+	struct io_rings *r;
+	unsigned tail, head, mask;
+	int ret = -EINVAL;
+
+	if (unlikely(cqe_len != sizeof(*cqe_out)))
+		goto err;
+	if (unlikely(cq_idx >= ctx->cq_nr))
+		goto err;
+
+	cq = &ctx->cqs[cq_idx];
+	r = cq->rings;
+	tail = READ_ONCE(r->cq.tail);
+	head = smp_load_acquire(&r->cq.head);
+
+	ret = -ENOENT;
+	if (unlikely(tail == head))
+		goto err;
+
+	mask = cq->entries - 1;
+	cqe = &r->cqes[head & mask];
+	memcpy(cqe_out, cqe, sizeof(*cqe_out));
+	WRITE_ONCE(r->cq.head, head + 1);
+	return 0;
+err:
+	memset(cqe_out, 0, sizeof(*cqe_out));
+	return ret;
+}
+
 const struct bpf_func_proto io_bpf_queue_sqe_proto = {
 	.func = io_bpf_queue_sqe,
 	.gpl_only = false,
@@ -10414,6 +10450,16 @@ const struct bpf_func_proto io_bpf_emit_cqe_proto = {
 	.arg5_type
= ARG_ANYTHING,
 };

+const struct bpf_func_proto io_bpf_reap_cqe_proto = {
+	.func = io_bpf_reap_cqe,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_CTX,
+	.arg2_type = ARG_ANYTHING,
+	.arg3_type = ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -10424,6 +10470,8 @@ io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return prog->aux->sleepable ? &io_bpf_queue_sqe_proto : NULL;
 	case BPF_FUNC_iouring_emit_cqe:
 		return &io_bpf_emit_cqe_proto;
+	case BPF_FUNC_iouring_reap_cqe:
+		return &io_bpf_reap_cqe_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c6b023be7848..7719ec4a33e7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4084,6 +4084,7 @@ union bpf_attr {
 	FN(check_mtu), \
 	FN(iouring_queue_sqe), \
 	FN(iouring_emit_cqe), \
+	FN(iouring_reap_cqe), \
 	/* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
From patchwork Wed May 19 14:13:29 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12267477
X-Patchwork-Delegate: bpf@iogearbox.net
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Horst Schirmeier, "Franz-B . Tuneke", Christian Dietrich
Subject: [PATCH 18/23] libbpf: support io_uring
Date: Wed, 19 May 2021 15:13:29 +0100
Message-Id: <94134844a6f4be2e0da2c518cb0e2e9ebb1d71b0.1621424513.git.asml.silence@gmail.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

Signed-off-by: Pavel Begunkov
---
 tools/lib/bpf/libbpf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4181d178ee7b..de5d1508f58e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -13,6 +13,10 @@
 #ifndef _GNU_SOURCE
 #define _GNU_SOURCE
 #endif
+
+/* hack, use local headers instead of system-wide */
+#include "../../../include/uapi/linux/bpf.h"
+
 #include
 #include
 #include
@@ -8630,6 +8634,9 @@ static const struct bpf_sec_def section_defs[] = {
 	BPF_PROG_SEC("struct_ops", BPF_PROG_TYPE_STRUCT_OPS),
 	BPF_EAPROG_SEC("sk_lookup/", BPF_PROG_TYPE_SK_LOOKUP,
 		       BPF_SK_LOOKUP),
+	SEC_DEF("iouring/", IOURING),
+	SEC_DEF("iouring.s/", IOURING,
+		.is_sleepable = true),
 };

 #undef
BPF_PROG_SEC_IMPL From patchwork Wed May 19 14:13:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267479 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6E05C43462 for ; Wed, 19 May 2021 14:15:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B1744613AE for ; Wed, 19 May 2021 14:15:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354335AbhESORL (ORCPT ); Wed, 19 May 2021 10:17:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354065AbhESOPx (ORCPT ); Wed, 19 May 2021 10:15:53 -0400 Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D32CBC061351; Wed, 19 May 2021 07:14:15 -0700 (PDT) Received: by mail-wm1-x32c.google.com with SMTP id l11-20020a05600c4f0bb029017a7cd488f5so783549wmq.0; Wed, 19 May 2021 07:14:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZrpYaaroywU12UggXTRtJF/Lf0MpETwhcC/rLkqy6po=; 
b=pAge3JaRar4piwZpSbj0B4ogTM2jlF2q6tuIr1m1iKex512MiGr7a0TTq7MsoEXCl0 9bUUwqGenxeTKtoB+IXRfvCFNJXbGap+lg+wJxDGJndcKMNzSGFeiipvNz0AIGs46JqJ waG3Bs+9Z3BzmP81+Y8jVCS3dbJBKkzWr6txMxelooF0XuMsIejXUUiKVyP3LD/2df9H Oyypm1eqJ00LJEkHz+m5ipNqy+V1rkidx7ODcxtaui7RXpNrvqTIr5nEvqBQlr7Z0y++ NjYHwqAL1DCAs0Gx/eAoFjNWPE+J0BTU6M2P0pD6kKQ/P2bfXmoHdf5ESEnw6nAMEr/J c22g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZrpYaaroywU12UggXTRtJF/Lf0MpETwhcC/rLkqy6po=; b=SNSAV3cH4W4g2uh/escV7VfKommlsUeZyZLlJavEUWxtUkL/vq10zeu9zetWtIgZoH 6CtXH9wIfyldcHllTU9978AS4UwoohfbhY/IizdBAxO9hqTH57Mf9ER+F6CdFCr/Q18R 1SZHSa7Up5CeyGchJYVAeup5vqKCORP3geBRaK55P3PqKr/gPZ5xViSpDBfrxJEIbJcT KdrAkODd66CKWiZsNJ605AvLIh7V7HOLw0jiVHc/GdEBIqG3oLDq4yE9QUS6E/oZ54Ee DCZ4AdD26e2+cISu1D7yJU5MN+0s0Bb3SQdFgKJL8WXQTVHNwOTh8ChnByxK4prlZrLx th2A== X-Gm-Message-State: AOAM533wEbNks8GXLiCoILtuDsX3U+CbFM0dZzwGdqhA0GhqVbHqBD0j J1lzB1TCDmRvh8NTJV/XHKcOCS8jGqbTGNOG X-Google-Smtp-Source: ABdhPJy2NpqoTl4rWs0j+yQOKB3Vw1oqPVlvZtyxM5uMVS/MmgT2zgeGbjvAbbMNKNMd3NChpRuILw== X-Received: by 2002:a1c:7702:: with SMTP id t2mr11730397wmi.115.1621433653562; Wed, 19 May 2021 07:14:13 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:13 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 19/23] io_uring: pass user_data to bpf executor Date: Wed, 19 May 2021 15:13:30 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 16 ++++++++++++++++ include/uapi/linux/io_uring.h | 4 ++++ 2 files changed, 20 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index 7c165b2ce8e4..c37846bca863 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -882,6 +882,7 @@ struct io_defer_entry { }; struct io_bpf_ctx { + struct io_uring_bpf_ctx u; struct io_ring_ctx *ctx; }; @@ -10482,6 +10483,15 @@ static bool io_bpf_is_valid_access(int off, int size, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) { + if (off < 0 || off >= sizeof(struct io_uring_bpf_ctx)) + return false; + if (off % size != 0) + return false; + + switch (off) { + case offsetof(struct io_uring_bpf_ctx, user_data): + return size == sizeof_field(struct io_uring_bpf_ctx, user_data); + } return false; } @@ -10505,6 +10515,8 @@ static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags) atomic_read(&req->task->io_uring->in_idle))) goto done; + memset(&bpf_ctx.u, 0, sizeof(bpf_ctx.u)); + bpf_ctx.u.user_data = req->user_data; bpf_ctx.ctx = ctx; prog = req->bpf.prog; @@ -10591,6 +10603,10 @@ static int __init io_uring_init(void) BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in); BUILD_BUG_SQE_ELEM(48, __u16, cq_idx); + /* should be first, see io_bpf_is_valid_access() */ + __BUILD_BUG_VERIFY_ELEMENT(struct io_bpf_ctx, 0, + struct io_uring_bpf_ctx, u); + BUILD_BUG_ON(sizeof(struct io_uring_files_update) != sizeof(struct io_uring_rsrc_update)); BUILD_BUG_ON(sizeof(struct io_uring_rsrc_update) > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 25ab804670e1..d7b1713bcfb0 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -403,4 +403,8 @@ 
struct io_uring_getevents_arg { __u64 ts; }; +struct io_uring_bpf_ctx { + __u64 user_data; +}; + #endif From patchwork Wed May 19 14:13:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267481 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F5E3C43600 for ; Wed, 19 May 2021 14:15:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31E76611AD for ; Wed, 19 May 2021 14:15:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354341AbhESORM (ORCPT ); Wed, 19 May 2021 10:17:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354067AbhESOPx (ORCPT ); Wed, 19 May 2021 10:15:53 -0400 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BC7FC061353; Wed, 19 May 2021 07:14:16 -0700 (PDT) Received: by mail-wm1-x333.google.com with SMTP id z130so7387206wmg.2; Wed, 19 May 2021 07:14:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=TgZRck3xYV0eFV5WJ+ErP6x1msBFrX+wPITeA+FZUrM=; b=GuQ5Akj8Ly+gdqTeAq5q22pTqo7tXEPeY6AwdcM/dr9/gL5j8OOLLuHWkolLSAwh8e TZbp9Pl89OiCF2xbGd+50tvoPPMzq6MXiaY0mHUJE6gMBT1rQGQbOHMr4ryvO7zA2yZ7 VnxGiOB1KoDFdgyk0AtfZrGUcYaE/aGEZacSx5ouWMx60eGWLCVe/QJL89T3tElz4E6k iFa5QsVDV29YonGlYcPMhNgbiZvPZdvDrfQWnekENszuBRsgg3iAz41si2GFCyj+UJgc jSNt0BcHHnSKzysHRk1rmQxTLtJpUPrqoIRLwhYjMxh4gFXVI6AaFKTKl31cxtEO+3F5 0U+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TgZRck3xYV0eFV5WJ+ErP6x1msBFrX+wPITeA+FZUrM=; b=YvirdeUIawYgYvjQoSuYyM/lBrj6mC19l08f0h5EonxpWH5cugG9fnoqW9gNomjEdE v8YyPLyRMgc6MLJJoH4OqfKJlMIdUnoTSd2FSADyYgBt7qKNW5k0l6DY6N/fAkuiNFYt DKQJrDEHUVplCjfyIZE+CQBzXdiayNRz6sqd/iRkVLk9W/x2IskvW51nUWCKUs0NY3wc paVFr6reuiQcInBIGpxXusIEgHwsqCdTDDEpuu77zC9RMqUGczfwkxgpjenrGKRuqc4s /+AUZZP7x8ioO1mkN9cHwUCOzEu/M8mw2yML6YP8YdABCJWul3HxXdMgVBTbzZEokxUK VQqg== X-Gm-Message-State: AOAM531e1r5ZiYqbmiS7E7eK4/8Sez0OoFRFDsxUG6arhFjS4SdF1ioR uEdFZBQLsx5QiasMq3G+nP99SHcZzewyQ0Y3 X-Google-Smtp-Source: ABdhPJw+wd+UozFtpbl+Z/t/uneOgX1/Bte3KlOFMiTkRTMRQj1OIuk8/XQSGR+LbC3/ff7nMycKSQ== X-Received: by 2002:a1c:7e45:: with SMTP id z66mr11790687wmc.126.1621433654771; Wed, 19 May 2021 07:14:14 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:14 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 20/23] bpf: Add bpf_copy_to_user() helper Date: Wed, 19 May 2021 15:13:31 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Similarly to bpf_copy_from_user(), also allow sleepable BPF programs to write to user memory. Signed-off-by: Pavel Begunkov --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 8 ++++++++ kernel/bpf/helpers.c | 17 +++++++++++++++++ tools/include/uapi/linux/bpf.h | 7 +++++++ 4 files changed, 33 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 00597b0c719c..9b775e2b2a01 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1899,6 +1899,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; extern const struct bpf_func_proto bpf_copy_from_user_proto; +extern const struct bpf_func_proto bpf_copy_to_user_proto; extern const struct bpf_func_proto bpf_snprintf_btf_proto; extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7719ec4a33e7..6f19839d2b05 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3648,6 +3648,13 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * long bpf_copy_to_user(void *user_ptr, const void *src, u32 size) + * Description + * Read *size* bytes from *src* and store the data in user space + * address *user_ptr*. This is a wrapper of **copy_to_user**\ (). + * Return + * 0 on success, or a negative error in case of failure. 
+ * * long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags) * Description * Use BTF to store a string representation of *ptr*->ptr in *str*, @@ -4085,6 +4092,7 @@ union bpf_attr { FN(iouring_queue_sqe), \ FN(iouring_emit_cqe), \ FN(iouring_reap_cqe), \ + FN(copy_to_user), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 308427fe03a3..9d7814c564e5 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -634,6 +634,23 @@ const struct bpf_func_proto bpf_copy_from_user_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_3(bpf_copy_to_user, void __user *, user_ptr, + const void *, src, u32, size) +{ + int ret = copy_to_user(user_ptr, src, size); + + return ret ? -EFAULT : 0; +} + +const struct bpf_func_proto bpf_copy_to_user_proto = { + .func = bpf_copy_to_user, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE_OR_ZERO, +}; + BPF_CALL_2(bpf_per_cpu_ptr, const void *, ptr, u32, cpu) { if (cpu >= nr_cpu_ids) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 79c893310492..18d497247d69 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3647,6 +3647,13 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * long bpf_copy_to_user(void *user_ptr, const void *src, u32 size) + * Description + * Read *size* bytes from *src* and store the data in user space + * address *user_ptr*. This is a wrapper of **copy_to_user**\ (). + * Return + * 0 on success, or a negative error in case of failure. 
+ * * long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags) * Description * Use BTF to store a string representation of *ptr*->ptr in *str*, From patchwork Wed May 19 14:13:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267485 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08180C43461 for ; Wed, 19 May 2021 14:16:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E178C611AD for ; Wed, 19 May 2021 14:16:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354374AbhESORU (ORCPT ); Wed, 19 May 2021 10:17:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354090AbhESOP4 (ORCPT ); Wed, 19 May 2021 10:15:56 -0400 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3677FC061354; Wed, 19 May 2021 07:14:17 -0700 (PDT) Received: by mail-wr1-x42d.google.com with SMTP id n2so14278159wrm.0; Wed, 19 May 2021 07:14:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=I8t7IC8rszKDO64b8khK6AR6AKFnTbofVr0A+rLLUB4=; b=jfMrMsy5UAAC10en9iNj5b6UsSvI+9eVsNRclD5mxFTdDo1fV4oxafdbKPAhLnNJpE ebTvnEQhqJ8+TwGCL+x760x+RU/bnBIoK1dKQ9e1V6sEd4i4dQIMDDMUA/EwrKEYhIl5 uKVwBMrafgR9MaYLTFBuLPPFewGifJJezI1nvJo0SLePNjo96ZUiAE/W0SzF8OYeASO7 X3jRxvBjwM04gg0lswl2h/3tju7Zb8Pyz+1qAHAnt0LWXzt0byUJmIINondvh18W4r7y 2z/TVIKaVWBsZQqD/BHyicmSHN5Y/ZxJBrefOdeeWjYj9Rq/UxH2SzsehL6kpV0ffBST j5rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=I8t7IC8rszKDO64b8khK6AR6AKFnTbofVr0A+rLLUB4=; b=LmTlrX5oyejPfnE+RvLkmY+BSw//6XQ136ZdQHJ38zKDn45U2zdgts7C8hgiaRMcT8 YdwQKAX9V5kihFIdRbZuWP1ReDD8giFYew7fXi5PkAyNVaDUQehWT6kFRRz6FmuoR0RO XvUb2gRISx3KjWBCRXZPvrp3AJn0WP6lxPULcpJikONX5ygPWuwsZOvpHEMp7hikpCDC n2qY+RbnZZ2YEIpcXY8DKuiCE9i7VhBBbAgwR2bE1oeKNbOIoJkxhB5WC55KUA0V8cHD r6i/VqyRCjQvHhqtFdVetebdufBsaySEB5k4ev03TrcoYwewyQzfEtgystXy2ZXTUmX2 xt0g== X-Gm-Message-State: AOAM530nNEfdTQsl7OQqdQJGMqfFSruRRz8ZK0FyTBRzYf+bnf8NAvWo V1CFhRpnLSaeQZKGtt++gd5YdK4JD9wTXEe6 X-Google-Smtp-Source: ABdhPJzZ+iq90liyYxgbay+X9B8r5t27hHDhqK57c2gZ+dA4QrLFSlq+2gQKwoKFLxc81ngdxCQdjQ== X-Received: by 2002:a5d:5049:: with SMTP id h9mr15148372wrt.24.1621433655895; Wed, 19 May 2021 07:14:15 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:15 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . 
Tuneke" , Christian Dietrich Subject: [PATCH 21/23] io_uring: wire bpf copy to user Date: Wed, 19 May 2021 15:13:32 +0100 Message-Id: <12ed7a929cca94a82023076ae99e90f7663f297f.1621424513.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Allow sleepable io_uring BPF programs to write to userspace memory via bpf_copy_to_user(). Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index c37846bca863..c4682146afa4 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -10467,6 +10467,8 @@ io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) switch (func_id) { case BPF_FUNC_copy_from_user: return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL; + case BPF_FUNC_copy_to_user: + return prog->aux->sleepable ? &bpf_copy_to_user_proto : NULL; case BPF_FUNC_iouring_queue_sqe: return prog->aux->sleepable ? &io_bpf_queue_sqe_proto : NULL; case BPF_FUNC_iouring_emit_cqe: From patchwork Wed May 19 14:13:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267483 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A8BBC433ED for ; Wed, 19 May 2021 14:16:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id
70B92611AD for ; Wed, 19 May 2021 14:16:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1353980AbhESORT (ORCPT ); Wed, 19 May 2021 10:17:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354091AbhESOP4 (ORCPT ); Wed, 19 May 2021 10:15:56 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4BD1C061355; Wed, 19 May 2021 07:14:18 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id l18-20020a1ced120000b029014c1adff1edso3530533wmh.4; Wed, 19 May 2021 07:14:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KuX2UeSUBJXSqowr6KYhrEXkA8BKdQuekr2AsyWJdb0=; b=tiJ2uAMrHeGJgJ5dBPzFwXDO6Xyhclb0qIZUs474IgaSEzRB7/qKoK+MFMR9uk+fIo MQXzN6T/nGnZOX7yXuhhMl4ID6wkGrI+PgmhGasm58yc6nosW/VU1Jp6PRT8piAks/lO Y/cuTiY+RWWo6rKPrtOdHaeAVJvIoWrqDeFlDQo647WpC1uNmi2qsl1pQrIoU6C7MiJy 1j501N4kj6U0LOQa8jISThh7Kyc6bii2aZojPqsm8QwWiGrzW685Rluz1tOiNgCmpjQX XkX6PIHl5/DQ39ehuJNOycTWbQbA44Xlg2q8zhq2JmVczVE+a5dVCIaRb+BgT9sztWNZ QHXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KuX2UeSUBJXSqowr6KYhrEXkA8BKdQuekr2AsyWJdb0=; b=e1yoyYJH/m+qaeeHlUc3kBN6pwpaBp59MGVKaFc0ZZzx0N2D5qGi0HlUGcI2bZh4Kz ZzcFe4ReEAK6Q4UReIP2M6KL90ot9KfN1VQT3jp02U0jRm+Dvm82rwauEBkqB/VdvcyO nbBZl1jFAq6xwY9bF3yitvb3jCgQVOIvvbmGLI1w7iamJ9D4qzvAlgiStdl0FaSkdeNw D28BIdcHKbu0/MKwWPom4FKtY14Tfx39G8CkeBv/z0xPb60GKphe/W2M2erkdG/tiGW5 /Xs8Zog7jlvLFJRIA0Gehn/JocAhv8hZ2BxIg1LQJ7coqT8tXFdVDq+bvoDWQ6Q2RQro hB+A== X-Gm-Message-State: 
AOAM533FhXNWm3OGC/LL/YF4Hl9QpxVEPrdUrONdxHOtmq1rFABICapL eLfulK7+rMq7JGgFAt7YCj5zMzm9xpt4xJqq X-Google-Smtp-Source: ABdhPJzwYCrvP5SAP3gbFSnQVCGBYTtY4mzfb5GE4HPC83YVjuJr5c1TewlinV1TEx271xO8VHRWCA== X-Received: by 2002:a7b:c446:: with SMTP id l6mr11116021wmi.75.1621433657076; Wed, 19 May 2021 07:14:17 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:16 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 22/23] io_uring: don't wait on CQ exclusively Date: Wed, 19 May 2021 15:13:33 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org It doesn't make much sense for several tasks to wait on a single CQ: it would be racy and involve a rather strange code flow with synchronisation, so we don't really care about optimising this case. Don't do exclusive CQ waiting and wake up everyone in the queue instead; this will be handy for implementing waiting on non-default CQs.
Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index c4682146afa4..805c10be7ea4 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -7085,7 +7085,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, */ if (io_should_wake(iowq) || test_bit(0, &iowq->ctx->cq_check_overflow)) return autoremove_wake_function(curr, mode, wake_flags, key); - return -1; + return 0; } static int io_run_task_work_sig(void) @@ -7176,8 +7176,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, ret = -EBUSY; break; } - prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, - TASK_INTERRUPTIBLE); + prepare_to_wait(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE); ret = io_cqring_wait_schedule(ctx, &iowq, &timeout); finish_wait(&ctx->wait, &iowq.wq); cond_resched(); From patchwork Wed May 19 14:13:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12267487 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8ED1C43460 for ; Wed, 19 May 2021 14:16:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B72796135A for ; Wed, 19 May 2021 14:16:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354214AbhESORy (ORCPT ); 
Wed, 19 May 2021 10:17:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354221AbhESOQb (ORCPT ); Wed, 19 May 2021 10:16:31 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC0ACC06135C; Wed, 19 May 2021 07:14:19 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id y184-20020a1ce1c10000b02901769b409001so3432981wmg.3; Wed, 19 May 2021 07:14:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=iBBRv8iSs8vsCboxbXfrG1xclNCmAan+98X/JNnVwAg=; b=XSBlrvEvMACuV/Mdl4wp5JepMkofpVjHeuZWupaz3AcrddBD9wrL9TmBdnkW2CJB7+ K0mlbLXjdCRUzD4ntkCCJpzNL6SMuLMkVs/fqUq7LSAvLn4OfqFzpaf9wqQmQUyI4rd6 G2kJqroWvHAhpBPrAfFTZHP2tNbSohWOK+pwu+dACKTH7naQqzpnfkP39nxfAjbKQrSf aZQdBfFganrJ5aVynzXPA622Ya4kwRZTO/ec7N38veAZ79xHTC+PwdHLQdea/HvVfd/w J16yGF0rVYI6s4j5eA/Bd5PriuILuvXBMGQNS4nGW2P89rkiDjMdlgp1FSlmwrlfpD7N 9pkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iBBRv8iSs8vsCboxbXfrG1xclNCmAan+98X/JNnVwAg=; b=ClsF0h8t+KShvkVJ9tSIRj72ByBKg3GWb4Cn8QYNO6vyixfxvjBxjrLtwo3yk4nzy+ zFTwNz4wQcZ9Sdx6/ssv1ikxDbhYpLxo3/v2gvXYu8j9vxdENax6ePH3edDF1FhiDzS0 rcDloWTnWAMC/4wD2trixCHLHFMOhOgmSWOYwWR246pSvFfyIVHZg2SxVraPRuJ2WNRM Iy8sgblC6+thQ9EZHSxGRVHprR+bHNV44jK3Thzqhjlxp5ymv8hQyNsTck46rXHORpSI 8o6SrZ1SRBXtdw0EwN5u3yvmiWqhONaStfWPlwZy4NLsvIcTcRM+wHGpLzq2IAFtlsTK EuiQ== X-Gm-Message-State: AOAM532ZS1zeMQcadlBzmatBqQAysjPH8CXaalQAxJsEk6x7syO2xdUD iWdEkrGrNQnjjhsHNlXTC28qZXa5bCeJzZdj X-Google-Smtp-Source: ABdhPJxTeznG7ssgz77rK45QH0UWDzr+22MRJzQlDRsOMhy7gqN2YgkFcLMkwwTtc7dXq4xaJzXiXw== 
X-Received: by 2002:a1c:4c10:: with SMTP id z16mr11517821wmf.134.1621433658304; Wed, 19 May 2021 07:14:18 -0700 (PDT) Received: from localhost.localdomain ([85.255.235.154]) by smtp.gmail.com with ESMTPSA id z3sm6233569wrq.42.2021.05.19.07.14.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 May 2021 07:14:17 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Horst Schirmeier , "Franz-B . Tuneke" , Christian Dietrich Subject: [PATCH 23/23] io_uring: enable bpf reqs to wait for CQs Date: Wed, 19 May 2021 15:13:34 +0100 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add experimental support for bpf requests waiting for a number of CQEs to arrive in a specified CQ.
Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++-- include/uapi/linux/io_uring.h | 2 + 2 files changed, 79 insertions(+), 3 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 805c10be7ea4..cf02389747b5 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -687,6 +687,12 @@ struct io_bpf { struct bpf_prog *prog; }; +struct io_async_bpf { + struct wait_queue_entry wqe; + unsigned int wait_nr; + unsigned int wait_idx; +}; + struct io_completion { struct file *file; struct list_head list; @@ -1050,7 +1056,9 @@ static const struct io_op_def io_op_defs[] = { }, [IORING_OP_RENAMEAT] = {}, [IORING_OP_UNLINKAT] = {}, - [IORING_OP_BPF] = {}, + [IORING_OP_BPF] = { + .async_size = sizeof(struct io_async_bpf), + }, }; static bool io_disarm_next(struct io_kiocb *req); @@ -9148,6 +9156,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx, } } + wake_up_all(&ctx->wait); ret |= io_cancel_defer_files(ctx, task, files); ret |= io_poll_remove_all(ctx, task, files); ret |= io_kill_timeouts(ctx, task, files); @@ -10492,6 +10501,10 @@ static bool io_bpf_is_valid_access(int off, int size, switch (off) { case offsetof(struct io_uring_bpf_ctx, user_data): return size == sizeof_field(struct io_uring_bpf_ctx, user_data); + case offsetof(struct io_uring_bpf_ctx, wait_nr): + return size == sizeof_field(struct io_uring_bpf_ctx, wait_nr); + case offsetof(struct io_uring_bpf_ctx, wait_idx): + return size == sizeof_field(struct io_uring_bpf_ctx, wait_idx); } return false; } @@ -10503,6 +10516,60 @@ const struct bpf_verifier_ops bpf_io_uring_verifier_ops = { .is_valid_access = io_bpf_is_valid_access, }; +static inline bool io_bpf_need_wake(struct io_async_bpf *abpf) +{ + struct io_kiocb *req = abpf->wqe.private; + struct io_ring_ctx *ctx = req->ctx; + + if (unlikely(percpu_ref_is_dying(&ctx->refs)) || + atomic_read(&req->task->io_uring->in_idle)) + return true; + return __io_cqring_events(&ctx->cqs[abpf->wait_idx]) >= abpf->wait_nr; +} + 
+static int io_bpf_wait_func(struct wait_queue_entry *wqe, unsigned mode, + int sync, void *key) +{ + struct io_async_bpf *abpf = container_of(wqe, struct io_async_bpf, wqe); + bool wake = io_bpf_need_wake(abpf); + + if (wake) { + list_del_init_careful(&wqe->entry); + req_ref_get(wqe->private); + io_queue_async_work(wqe->private); + } + return wake; +} + +static int io_bpf_wait_cq_async(struct io_kiocb *req, unsigned int nr, + unsigned int idx) +{ + struct io_ring_ctx *ctx = req->ctx; + struct wait_queue_head *wq; + struct wait_queue_entry *wqe; + struct io_async_bpf *abpf; + + if (unlikely(idx >= ctx->cq_nr)) + return -EINVAL; + if (!req->async_data && io_alloc_async_data(req)) + return -ENOMEM; + + abpf = req->async_data; + abpf->wait_nr = nr; + abpf->wait_idx = idx; + wqe = &abpf->wqe; + init_waitqueue_func_entry(wqe, io_bpf_wait_func); + wqe->private = req; + wq = &ctx->wait; + + spin_lock_irq(&wq->lock); + __add_wait_queue(wq, wqe); + smp_mb(); + io_bpf_wait_func(wqe, 0, 0, NULL); + spin_unlock_irq(&wq->lock); + return 0; +} + static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags) { struct io_ring_ctx *ctx = req->ctx; @@ -10512,8 +10579,8 @@ static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags) lockdep_assert_held(&req->ctx->uring_lock); - if (unlikely(percpu_ref_is_dying(&ctx->refs) || - atomic_read(&req->task->io_uring->in_idle))) + if (unlikely(percpu_ref_is_dying(&ctx->refs)) || + atomic_read(&req->task->io_uring->in_idle)) goto done; memset(&bpf_ctx.u, 0, sizeof(bpf_ctx.u)); @@ -10531,6 +10598,13 @@ static void io_bpf_run(struct io_kiocb *req, unsigned int issue_flags) } io_submit_state_end(&ctx->submit_state, ctx); ret = 0; + + if (bpf_ctx.u.wait_nr) { + ret = io_bpf_wait_cq_async(req, bpf_ctx.u.wait_nr, + bpf_ctx.u.wait_idx); + if (!ret) + return; + } done: __io_req_complete(req, issue_flags, ret, 0); } diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index d7b1713bcfb0..95c04af3afd4 100644 --- 
a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -405,6 +405,8 @@ struct io_uring_getevents_arg { struct io_uring_bpf_ctx { __u64 user_data; + __u32 wait_nr; + __u32 wait_idx; }; #endif
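
A minimal userspace sketch of the context access rules that io_bpf_is_valid_access() appears to end up with after patches 19/23 and 23/23, for anyone who wants to sanity-check which (offset, size) pairs a program may touch. Everything prefixed model_ or MODEL_ is hypothetical and exists only for this illustration: the struct is a local mirror of struct io_uring_bpf_ctx, and the size <= 0 guard is an addition of the sketch, not something the patch does.

```c
#include <assert.h>   /* assert() is handy for the checks in the note below */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct io_uring_bpf_ctx as it looks after patch 23/23. */
struct model_bpf_ctx {
	uint64_t user_data;
	uint32_t wait_nr;
	uint32_t wait_idx;
};

/* The model assumes the usual padding-free 16-byte layout. */
_Static_assert(sizeof(struct model_bpf_ctx) == 16, "unexpected ctx layout");

/* Stand-in for the kernel's sizeof_field() macro. */
#define MODEL_SIZEOF_FIELD(type, member) ((int)sizeof(((type *)0)->member))

/*
 * Userspace copy of the offset/size rules: the access must start inside
 * the context, be naturally aligned, and cover exactly one whole field.
 */
static bool model_is_valid_access(int off, int size)
{
	if (off < 0 || (size_t)off >= sizeof(struct model_bpf_ctx))
		return false;
	/* Assumption of this sketch: reject size <= 0 before the modulo. */
	if (size <= 0 || off % size != 0)
		return false;

	switch (off) {
	case offsetof(struct model_bpf_ctx, user_data):
		return size == MODEL_SIZEOF_FIELD(struct model_bpf_ctx, user_data);
	case offsetof(struct model_bpf_ctx, wait_nr):
		return size == MODEL_SIZEOF_FIELD(struct model_bpf_ctx, wait_nr);
	case offsetof(struct model_bpf_ctx, wait_idx):
		return size == MODEL_SIZEOF_FIELD(struct model_bpf_ctx, wait_idx);
	}
	return false;
}
```

Under these assumptions model_is_valid_access(0, 8) accepts a full user_data access, while model_is_valid_access(0, 4) and model_is_valid_access(12, 8) are rejected: only whole-field, naturally aligned accesses pass.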