From patchwork Wed Apr 19 22:48:02 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13217611
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/4] io_uring: remove sq/cq_off memset
Date: Wed, 19 Apr 2023 16:48:02 -0600
Message-Id: <20230419224805.693734-2-axboe@kernel.dk>
In-Reply-To: <20230419224805.693734-1-axboe@kernel.dk>
References: <20230419224805.693734-1-axboe@kernel.dk>
List-ID: io-uring@vger.kernel.org

We only have two reserved members we're not clearing; do so manually
instead. This is in preparation for using one of these members for a
new feature.
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 68684aabfbb7..7b4f3eb16a73 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3900,7 +3900,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
-	memset(&p->sq_off, 0, sizeof(p->sq_off));
 	p->sq_off.head = offsetof(struct io_rings, sq.head);
 	p->sq_off.tail = offsetof(struct io_rings, sq.tail);
 	p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
@@ -3908,8 +3907,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->sq_off.flags = offsetof(struct io_rings, sq_flags);
 	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
 	p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
+	p->sq_off.resv1 = 0;
+	p->sq_off.resv2 = 0;
 
-	memset(&p->cq_off, 0, sizeof(p->cq_off));
 	p->cq_off.head = offsetof(struct io_rings, cq.head);
 	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
 	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
@@ -3917,6 +3917,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
 	p->cq_off.cqes = offsetof(struct io_rings, cqes);
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+	p->cq_off.resv1 = 0;
+	p->cq_off.resv2 = 0;
 
 	p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
 			IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |

From patchwork Wed Apr 19 22:48:03 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13217612
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/4] io_uring: return error pointer from io_mem_alloc()
Date: Wed, 19 Apr 2023 16:48:03 -0600
Message-Id: <20230419224805.693734-3-axboe@kernel.dk>
In-Reply-To: <20230419224805.693734-1-axboe@kernel.dk>
References: <20230419224805.693734-1-axboe@kernel.dk>

In preparation for having more than one type of ring allocator, make
the existing one return a valid pointer or an error pointer rather than
just NULL.
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7b4f3eb16a73..13faa3115eb5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2719,8 +2719,12 @@ static void io_mem_free(void *ptr)
 static void *io_mem_alloc(size_t size)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+	void *ret;
 
-	return (void *) __get_free_pages(gfp, get_order(size));
+	ret = (void *) __get_free_pages(gfp, get_order(size));
+	if (ret)
+		return ret;
+	return ERR_PTR(-ENOMEM);
 }
 
 static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
@@ -3686,6 +3690,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 {
 	struct io_rings *rings;
 	size_t size, sq_array_offset;
+	void *ptr;
 
 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
@@ -3696,8 +3701,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 
 	rings = io_mem_alloc(size);
-	if (!rings)
-		return -ENOMEM;
+	if (IS_ERR(rings))
+		return PTR_ERR(rings);
 
 	ctx->rings = rings;
 	ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
@@ -3716,13 +3721,14 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 	}
 
-	ctx->sq_sqes = io_mem_alloc(size);
-	if (!ctx->sq_sqes) {
+	ptr = io_mem_alloc(size);
+	if (IS_ERR(ptr)) {
 		io_mem_free(ctx->rings);
 		ctx->rings = NULL;
-		return -ENOMEM;
+		return PTR_ERR(ptr);
 	}
 
+	ctx->sq_sqes = io_mem_alloc(size);
 	return 0;
 }

From patchwork Wed Apr 19 22:48:04 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13217613
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/4] io_uring: add ring freeing helper
Date: Wed, 19 Apr 2023 16:48:04 -0600
Message-Id: <20230419224805.693734-4-axboe@kernel.dk>
In-Reply-To: <20230419224805.693734-1-axboe@kernel.dk>
References: <20230419224805.693734-1-axboe@kernel.dk>

We free the rings and sqes separately; move that into a helper that
does both the freeing and the clearing of the memory.
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 13faa3115eb5..cf570b0f82ec 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2716,6 +2716,14 @@ static void io_mem_free(void *ptr)
 	free_compound_page(page);
 }
 
+static void io_rings_free(struct io_ring_ctx *ctx)
+{
+	io_mem_free(ctx->rings);
+	io_mem_free(ctx->sq_sqes);
+	ctx->rings = NULL;
+	ctx->sq_sqes = NULL;
+}
+
 static void *io_mem_alloc(size_t size)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
@@ -2880,8 +2888,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		mmdrop(ctx->mm_account);
 		ctx->mm_account = NULL;
 	}
-	io_mem_free(ctx->rings);
-	io_mem_free(ctx->sq_sqes);
+	io_rings_free(ctx);
 
 	percpu_ref_exit(&ctx->refs);
 	free_uid(ctx->user);
@@ -3716,15 +3723,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	else
 		size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
 	if (size == SIZE_MAX) {
-		io_mem_free(ctx->rings);
-		ctx->rings = NULL;
+		io_rings_free(ctx);
 		return -EOVERFLOW;
 	}
 
 	ptr = io_mem_alloc(size);
 	if (IS_ERR(ptr)) {
-		io_mem_free(ctx->rings);
-		ctx->rings = NULL;
+		io_rings_free(ctx);
 		return PTR_ERR(ptr);
 	}

From patchwork Wed Apr 19 22:48:05 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13217614
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes
Date: Wed, 19 Apr 2023 16:48:05 -0600
Message-Id: <20230419224805.693734-5-axboe@kernel.dk>
In-Reply-To: <20230419224805.693734-1-axboe@kernel.dk>
References: <20230419224805.693734-1-axboe@kernel.dk>

Currently io_uring applications must call mmap(2) twice to map the
rings themselves, and the sqes array. This works fine, but it does not
support using huge pages to back the rings/sqes. Provide a way for the
application to pass in pre-allocated memory for the rings/sqes, which
can then suitably be allocated from shmfs or via mmap to get huge page
support. Particularly for larger rings, this reduces the TLBs needed.

If an application wishes to take advantage of that, it must pre-allocate
the memory needed for the sq/cq ring, and the sqes. The former must be
passed in via the io_uring_params->cq_off.user_addr field, while the
latter is passed in via the io_uring_params->sq_off.user_addr field.
Then it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags
field, and io_uring will then map the existing memory into the kernel
for shared use. The application must not call mmap(2) to map rings as
it otherwise would have; that will now fail with -EINVAL if this setup
flag was used.
The pages used for the rings and sqes must be contiguous. The intent
here is clearly that huge pages should be used, otherwise the normal
setup procedure works fine as-is. The application may use one huge page
for both the rings and sqes.

Outside of those initialization changes, everything works like it did
before.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  10 ++++
 include/uapi/linux/io_uring.h  |   9 ++-
 io_uring/io_uring.c            | 102 +++++++++++++++++++++++++++++----
 3 files changed, 109 insertions(+), 12 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c54f3fb7ab1a..3489fa223586 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -211,6 +211,16 @@ struct io_ring_ctx {
 		unsigned int		compat: 1;
 
 		enum task_work_notify_mode	notify_method;
+
+		/*
+		 * If IORING_SETUP_NO_MMAP is used, then the below holds
+		 * the gup'ed pages for the two rings, and the sqes.
+		 */
+		unsigned short		n_ring_pages;
+		unsigned short		n_sqe_pages;
+		struct page		**ring_pages;
+		struct page		**sqe_pages;
+
 		struct io_rings		*rings;
 		struct task_struct	*submitter_task;
 		struct percpu_ref	refs;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ea903a677ce9..5499f9728f9d 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -179,6 +179,11 @@ enum {
  */
 #define IORING_SETUP_NO_OFFLOAD	(1U << 14)
 
+/*
+ * Application provides the memory for the rings
+ */
+#define IORING_SETUP_NO_MMAP	(1U << 15)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -412,7 +417,7 @@ struct io_sqring_offsets {
 	__u32 dropped;
 	__u32 array;
 	__u32 resv1;
-	__u64 resv2;
+	__u64 user_addr;
 };
 
 /*
@@ -431,7 +436,7 @@ struct io_cqring_offsets {
 	__u32 cqes;
 	__u32 flags;
 	__u32 resv1;
-	__u64 resv2;
+	__u64 user_addr;
 };
 
 /*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf570b0f82ec..d6694bd92453 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2716,12 +2716,80 @@ static void io_mem_free(void *ptr)
 	free_compound_page(page);
 }
 
+static void io_pages_free(struct page ***pages, int npages)
+{
+	struct page **page_array;
+	int i;
+
+	if (!pages)
+		return;
+	page_array = *pages;
+	for (i = 0; i < npages; i++)
+		unpin_user_page(page_array[i]);
+	kvfree(page_array);
+	*pages = NULL;
+}
+
+static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
+			    unsigned long uaddr, size_t size)
+{
+	struct page **page_array;
+	unsigned int nr_pages;
+	int ret;
+
+	*npages = 0;
+
+	if (uaddr & (PAGE_SIZE - 1) || !size)
+		return ERR_PTR(-EINVAL);
+
+	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (nr_pages > USHRT_MAX)
+		return ERR_PTR(-EINVAL);
+	page_array = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!page_array)
+		return ERR_PTR(-ENOMEM);
+
+	ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+					page_array);
+	if (ret != nr_pages) {
+err:
+		io_pages_free(&page_array, ret > 0 ? ret : 0);
+		return ret < 0 ? ERR_PTR(ret) : ERR_PTR(-EFAULT);
+	}
+	/* pages must be contig */
+	ret--;
+	if (page_array[0] + ret != page_array[ret])
+		goto err;
+	*pages = page_array;
+	*npages = nr_pages;
+	return page_to_virt(page_array[0]);
+}
+
+static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+			  size_t size)
+{
+	return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr,
+				size);
+}
+
+static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+			 size_t size)
+{
+	return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr,
+				size);
+}
+
 static void io_rings_free(struct io_ring_ctx *ctx)
 {
-	io_mem_free(ctx->rings);
-	io_mem_free(ctx->sq_sqes);
-	ctx->rings = NULL;
-	ctx->sq_sqes = NULL;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
+		io_mem_free(ctx->rings);
+		io_mem_free(ctx->sq_sqes);
+		ctx->rings = NULL;
+		ctx->sq_sqes = NULL;
+	} else {
+		io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
+		io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
+	}
 }
 
 static void *io_mem_alloc(size_t size)
@@ -3366,6 +3434,10 @@ static void *io_uring_validate_mmap_request(struct file *file,
 	struct page *page;
 	void *ptr;
 
+	/* Don't allow mmap if the ring was setup without it */
+	if (ctx->flags & IORING_SETUP_NO_MMAP)
+		return ERR_PTR(-EINVAL);
+
 	switch (offset & IORING_OFF_MMAP_MASK) {
 	case IORING_OFF_SQ_RING:
 	case IORING_OFF_CQ_RING:
@@ -3707,7 +3779,11 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;
 
-	rings = io_mem_alloc(size);
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		rings = io_mem_alloc(size);
+	else
+		rings = io_rings_map(ctx, p->cq_off.user_addr, size);
+
 	if (IS_ERR(rings))
 		return PTR_ERR(rings);
 
@@ -3727,13 +3803,17 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 	}
 
-	ptr = io_mem_alloc(size);
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		ptr = io_mem_alloc(size);
+	else
+		ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
+
 	if (IS_ERR(ptr)) {
 		io_rings_free(ctx);
 		return PTR_ERR(ptr);
 	}
 
-	ctx->sq_sqes = io_mem_alloc(size);
+	ctx->sq_sqes = ptr;
 	return 0;
 }
 
@@ -3919,7 +3999,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
 	p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
 	p->sq_off.resv1 = 0;
-	p->sq_off.resv2 = 0;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		p->sq_off.user_addr = 0;
 
 	p->cq_off.head = offsetof(struct io_rings, cq.head);
 	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
@@ -3929,7 +4010,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.cqes = offsetof(struct io_rings, cqes);
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
 	p->cq_off.resv1 = 0;
-	p->cq_off.resv2 = 0;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		p->cq_off.user_addr = 0;
 
 	p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
 			IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
@@ -3996,7 +4078,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
 			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
-			IORING_SETUP_NO_OFFLOAD))
+			IORING_SETUP_NO_OFFLOAD | IORING_SETUP_NO_MMAP))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);