From patchwork Thu Nov 30 19:45:47 2023
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe, stable@vger.kernel.org, Jann Horn
Subject: [PATCH 1/8] io_uring: don't allow discontig pages for IORING_SETUP_NO_MMAP
Date: Thu, 30 Nov 2023 12:45:47 -0700
Message-ID: <20231130194633.649319-2-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

io_sqes_map() is used rather than io_mem_alloc() if the application passes
in memory for mapping, rather than having the kernel allocate it and then
mmap(2) the ranges.
This then calls __io_uaddr_map() to perform the page mapping and pinning,
which checks if we end up with the same pages, if more than one page is
mapped. But this check is incorrect: it only checks whether the first and
last pages are the same, where it really should be checking whether the
mapped pages are contiguous. This allows mapping a single normal page, or
a huge page range.

Down the line we can add support for remapping pages to be virtually
contiguous, which is really all that io_uring cares about.

Cc: stable@vger.kernel.org
Fixes: 03d89a2de25b ("io_uring: support for user allocated memory for rings/sqes")
Reported-by: Jann Horn
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ed254076c723..b45abfd75415 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2697,6 +2697,7 @@ static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 {
 	struct page **page_array;
 	unsigned int nr_pages;
+	void *page_addr;
 	int ret, i;
 
 	*npages = 0;
@@ -2718,27 +2719,29 @@ static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 		io_pages_free(&page_array, ret > 0 ? ret : 0);
 		return ret < 0 ? ERR_PTR(ret) : ERR_PTR(-EFAULT);
 	}
-	/*
-	 * Should be a single page. If the ring is small enough that we can
-	 * use a normal page, that is fine. If we need multiple pages, then
-	 * userspace should use a huge page. That's the only way to guarantee
-	 * that we get contigious memory, outside of just being lucky or
-	 * (currently) having low memory fragmentation.
-	 */
-	if (page_array[0] != page_array[ret - 1])
-		goto err;
-	/*
-	 * Can't support mapping user allocated ring memory on 32-bit archs
-	 * where it could potentially reside in highmem. Just fail those with
-	 * -EINVAL, just like we did on kernels that didn't support this
-	 * feature.
-	 */
+	page_addr = page_address(page_array[0]);
 	for (i = 0; i < nr_pages; i++) {
-		if (PageHighMem(page_array[i])) {
-			ret = -EINVAL;
+		ret = -EINVAL;
+
+		/*
+		 * Can't support mapping user allocated ring memory on 32-bit
+		 * archs where it could potentially reside in highmem. Just
+		 * fail those with -EINVAL, just like we did on kernels that
+		 * didn't support this feature.
+		 */
+		if (PageHighMem(page_array[i]))
 			goto err;
-		}
+
+		/*
+		 * No support for discontig pages for now, should either be a
+		 * single normal page, or a huge page. Later on we can add
+		 * support for remapping discontig pages, for now we will
+		 * just fail them with EINVAL.
+		 */
+		if (page_address(page_array[i]) != page_addr)
+			goto err;
+		page_addr += PAGE_SIZE;
 	}
 
 	*pages = page_array;

From patchwork Thu Nov 30 19:45:48 2023
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe, stable@vger.kernel.org
Subject: [PATCH 2/8] io_uring: don't guard IORING_OFF_PBUF_RING with SETUP_NO_MMAP
Date: Thu, 30 Nov 2023 12:45:48 -0700
Message-ID: <20231130194633.649319-3-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

This flag only applies to the SQ and CQ rings; it's perfectly valid to use
a mmap approach for the provided ring buffers. Move the check into where
it belongs.
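To illustrate the case this fixes from the application side: even when the
SQ/CQ rings were set up with IORING_SETUP_NO_MMAP, a provided buffer ring
registered with IOU_PBUF_RING_MMAP is still allocated by the kernel and
mapped via mmap(2). A minimal userspace sketch (the bgid, entry count and
error handling are assumptions for the example, not part of the patch):

#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static struct io_uring_buf_ring *map_pbuf_ring(int ring_fd, unsigned short bgid,
					       unsigned int entries)
{
	struct io_uring_buf_reg reg;
	size_t size = entries * sizeof(struct io_uring_buf);
	unsigned long long off;
	void *ptr;

	/* ask the kernel to allocate the ring memory for this buffer group */
	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = entries;
	reg.bgid = bgid;
	reg.flags = IOU_PBUF_RING_MMAP;
	if (syscall(__NR_io_uring_register, ring_fd, IORING_REGISTER_PBUF_RING,
		    &reg, 1) < 0)
		return NULL;

	/* the mmap offset encodes the buffer group ID */
	off = IORING_OFF_PBUF_RING |
	      ((unsigned long long)bgid << IORING_OFF_PBUF_SHIFT);
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_POPULATE, ring_fd, off);
	return ptr == MAP_FAILED ? NULL : ptr;
}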
Cc: stable@vger.kernel.org Fixes: 03d89a2de25b ("io_uring: support for user allocated memory for rings/sqes") Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index b45abfd75415..52e4b14ad8aa 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3478,16 +3478,18 @@ static void *io_uring_validate_mmap_request(struct file *file, struct page *page; void *ptr; - /* Don't allow mmap if the ring was setup without it */ - if (ctx->flags & IORING_SETUP_NO_MMAP) - return ERR_PTR(-EINVAL); - switch (offset & IORING_OFF_MMAP_MASK) { case IORING_OFF_SQ_RING: case IORING_OFF_CQ_RING: + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); ptr = ctx->rings; break; case IORING_OFF_SQES: + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); ptr = ctx->sq_sqes; break; case IORING_OFF_PBUF_RING: { From patchwork Thu Nov 30 19:45:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474966 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="Otj0HMzD" Received: from mail-io1-xd2a.google.com (mail-io1-xd2a.google.com [IPv6:2607:f8b0:4864:20::d2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A49DD6C for ; Thu, 30 Nov 2023 11:46:44 -0800 (PST) Received: by mail-io1-xd2a.google.com with SMTP id ca18e2360f4ac-7b38ff8a517so16096839f.1 for ; Thu, 30 Nov 2023 11:46:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373603; x=1701978403; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6Eqt0miAnRu0iNO8NoAJMZVKKakVekPyJk+CuHmgmXM=; b=Otj0HMzD59nkOcFkalnjDuX8kiNjrlO4Oax1+AKJjUTNosJ5M8IAeX6ov3CrfwoXVc z0V5hiMDEBk0jaJ2BuNFiqLK6cAY8TejdxFpeR1H2MGcWoCwvagBo768W8ncnPre14jX sGA6PQFTOm55WOrmJesr4giw2WjOJnRP+WsFo9A1e45On4lDqOCELWapkjKG4qmUG91R V8jEJNPNAdQRopnAuu5UkE+p2H1k7X8c9GEYjWL7+1U3mNWjXNcHZKGOpI1/+PSaD8KH 2I6/Zm96sk8/5L6eKOlAzEGsUdQkZiubW2c+Ef3Ef7KlPwLoJ4crQCncq1fZQvU0hFp8 H1GQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701373603; x=1701978403; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6Eqt0miAnRu0iNO8NoAJMZVKKakVekPyJk+CuHmgmXM=; b=ALqVaXBBXggo/J9uCYJRBEaThDFxz3WdZx5CAxxJ4JLDzIOxp6dfLyxK5NWKTA+fKn aGeyJU2MYKCoxlR+jLrgQ1gAonI55l8INbOsbQOh4OHpJvHeokQpkgohJX9NH8YZ/ChP G2ZZnL+jOaitvF1F/LEQhxcnOUJBxCePlVJyrqNI3WppE3I7/3bg5i9YHdG2SwgYOsEw VbeWNxcvNESn4GzKuQbLDnCKbEz88xBjfxJfDDvhfpbxa2bkGxNZiMCwNvE7lyGiDQge Uh/w++g5dGtWl64KQqQto0eJ+0KqiDYxn45Tu8el7f3HN1QHioqIVgRprMHtLSmsoaUx s2yA== X-Gm-Message-State: AOJu0YzEiVLnrmFYUa1FaTYuV3IAV1y0ylNsYFjZO4vK1bPyAvs+mlAu txRbUae4H1zVzuPYbHvG0ZQiuLEKh5EqYBEO/pFjuw== X-Google-Smtp-Source: AGHT+IGUTY4qbf+aOjTRd8OzjvlLIKHbF75D0ffWNDx+6r0irNKJVFSOO6uShhcFGcrikpV+VEe0gw== X-Received: by 2002:a05:6602:2bd2:b0:7b3:e4a6:45d4 with SMTP id s18-20020a0566022bd200b007b3e4a645d4mr7544673iov.0.1701373603556; Thu, 30 Nov 2023 11:46:43 -0800 (PST) Received: 
from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id a18-20020a029f92000000b004667167d8cdsm461179jam.116.2023.11.30.11.46.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 11:46:42 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 3/8] io_uring: enable io_mem_alloc/free to be used in other parts Date: Thu, 30 Nov 2023 12:45:49 -0700 Message-ID: <20231130194633.649319-4-axboe@kernel.dk> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk> References: <20231130194633.649319-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In preparation for using these helpers, make them non-static and add them to our internal header. Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 4 ++-- io_uring/io_uring.h | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 52e4b14ad8aa..e40b11438210 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2666,7 +2666,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; } -static void io_mem_free(void *ptr) +void io_mem_free(void *ptr) { if (!ptr) return; @@ -2778,7 +2778,7 @@ static void io_rings_free(struct io_ring_ctx *ctx) } } -static void *io_mem_alloc(size_t size) +void *io_mem_alloc(size_t size) { gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; void *ret; diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index dc6d779b452b..ed84f2737b3a 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -86,6 +86,9 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); +void *io_mem_alloc(size_t size); +void io_mem_free(void *ptr); + #if defined(CONFIG_PROVE_LOCKING) static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) { From patchwork Thu Nov 30 19:45:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474967 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="YpP0Tl5p" Received: from mail-il1-x12f.google.com (mail-il1-x12f.google.com [IPv6:2607:f8b0:4864:20::12f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B99DD5C for ; Thu, 30 Nov 2023 11:46:46 -0800 (PST) Received: by mail-il1-x12f.google.com with SMTP id e9e14a558f8ab-35c683417f1so1093855ab.0 for ; Thu, 30 Nov 2023 11:46:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373605; x=1701978405; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hxeP5jSRRX1HG4Q7VF5uTV5vDunsnLEvK5wo5+KXwMg=; b=YpP0Tl5pAtrtu6Ltuw5syOoZYFCOax/w/1It8VueVa15f3z5QsuRpnhXCvQP3vNGTa ZShsVmDUJe+C984NeZ20hSCiK1o4gc/3s2XgkyIINZbgbTeCpgksgEXBMek1y++vi5Pn akH2z5Lj9SJtVZM0F8U0n6HznlKT2Bo/eFW7OZsA0/8JhkQCJzYLo41nwluijOnw1Fso sHIttV9kEnBz0FPjB238RZ98/+qzsoYjyZTxjnpETY0u9t/p0Q+IMgrtL12SRSx0nvJY T0qilMKcQ1LGvvDyIhzcDJOyuRFrk/YaEnkdOZc4t04kRek4boEFY+gLlqMtUi4GFO/I p/Ow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe, stable@vger.kernel.org, Jann Horn
Subject: [PATCH 4/8] io_uring/kbuf: defer release of mapped buffer rings
Date: Thu, 30 Nov 2023 12:45:50 -0700
Message-ID: <20231130194633.649319-5-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

If a provided buffer ring is set up with IOU_PBUF_RING_MMAP, then the
kernel allocates the memory for it and the application is expected to
mmap(2) this memory. However, io_uring uses remap_pfn_range() for this
operation, so we cannot rely on normal munmap/release to free them for
us. Stash an io_buf_free entry away for each of these, if any, and
provide a helper to free them post ->release().
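As an aside, the deferred-free pattern this introduces can be reduced to a
small sketch (illustrative only; the struct and function names here are
made up for the example, the real ones in the patch below are io_buf_free,
ctx->io_buf_list and io_kbuf_mmap_list_free()):

struct deferred_buf {
	struct hlist_node list;
	void *mem;
};

/* Stash a kernel-allocated, application-mmap'ed range for later freeing */
static int stash_for_deferred_free(struct hlist_head *head, void *mem)
{
	struct deferred_buf *db;

	db = kmalloc(sizeof(*db), GFP_KERNEL_ACCOUNT);
	if (!db)
		return -ENOMEM;
	db->mem = mem;
	hlist_add_head(&db->list, head);
	return 0;
}

/* Called at or after ->release(), when no mmap of the memory can remain */
static void deferred_free_all(struct hlist_head *head)
{
	struct deferred_buf *db;
	struct hlist_node *tmp;

	hlist_for_each_entry_safe(db, tmp, head, list) {
		hlist_del(&db->list);
		io_mem_free(db->mem);
		kfree(db);
	}
}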
Cc: stable@vger.kernel.org Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring") Reported-by: Jann Horn Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 3 +++ io_uring/io_uring.c | 2 ++ io_uring/kbuf.c | 44 ++++++++++++++++++++++++++++++---- io_uring/kbuf.h | 2 ++ 4 files changed, 46 insertions(+), 5 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d3009d56af0b..805bb635cdf5 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -340,6 +340,9 @@ struct io_ring_ctx { struct list_head io_buffers_cache; + /* deferred free list, protected by ->uring_lock */ + struct hlist_head io_buf_list; + /* Keep this last, we don't need it for the fast path */ struct wait_queue_head poll_wq; struct io_restriction restrictions; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index e40b11438210..3a216f0744dd 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -325,6 +325,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->sqd_list); INIT_LIST_HEAD(&ctx->cq_overflow_list); INIT_LIST_HEAD(&ctx->io_buffers_cache); + INIT_HLIST_HEAD(&ctx->io_buf_list); io_alloc_cache_init(&ctx->rsrc_node_cache, IO_NODE_ALLOC_CACHE_MAX, sizeof(struct io_rsrc_node)); io_alloc_cache_init(&ctx->apoll_cache, IO_ALLOC_CACHE_MAX, @@ -2950,6 +2951,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) ctx->mm_account = NULL; } io_rings_free(ctx); + io_kbuf_mmap_list_free(ctx); percpu_ref_exit(&ctx->refs); free_uid(ctx->user); diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index a1e4239c7d75..85e680fc74ce 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -33,6 +33,11 @@ struct io_provide_buf { __u16 bid; }; +struct io_buf_free { + struct hlist_node list; + void *mem; +}; + static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx, unsigned int bgid) { @@ -223,7 +228,10 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, if (bl->is_mapped) { i = bl->buf_ring->tail - bl->head; if (bl->is_mmap) { - folio_put(virt_to_folio(bl->buf_ring)); + /* + * io_kbuf_list_free() will free the page(s) at + * ->release() time. 
+ */ bl->buf_ring = NULL; bl->is_mmap = 0; } else if (bl->buf_nr_pages) { @@ -531,18 +539,28 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg, return -EINVAL; } -static int io_alloc_pbuf_ring(struct io_uring_buf_reg *reg, +static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx, + struct io_uring_buf_reg *reg, struct io_buffer_list *bl) { - gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; + struct io_buf_free *ibf; size_t ring_size; void *ptr; ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring); - ptr = (void *) __get_free_pages(gfp, get_order(ring_size)); + ptr = io_mem_alloc(ring_size); if (!ptr) return -ENOMEM; + /* Allocate and store deferred free entry */ + ibf = kmalloc(sizeof(*ibf), GFP_KERNEL_ACCOUNT); + if (!ibf) { + io_mem_free(ptr); + return -ENOMEM; + } + ibf->mem = ptr; + hlist_add_head(&ibf->list, &ctx->io_buf_list); + bl->buf_ring = ptr; bl->is_mapped = 1; bl->is_mmap = 1; @@ -599,7 +617,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) if (!(reg.flags & IOU_PBUF_RING_MMAP)) ret = io_pin_pbuf_ring(®, bl); else - ret = io_alloc_pbuf_ring(®, bl); + ret = io_alloc_pbuf_ring(ctx, ®, bl); if (!ret) { bl->nr_entries = reg.ring_entries; @@ -649,3 +667,19 @@ void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid) return bl->buf_ring; } + +/* + * Called at or after ->release(), free the mmap'ed buffers that we used + * for memory mapped provided buffer rings. + */ +void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx) +{ + struct io_buf_free *ibf; + struct hlist_node *tmp; + + hlist_for_each_entry_safe(ibf, tmp, &ctx->io_buf_list, list) { + hlist_del(&ibf->list); + io_mem_free(ibf->mem); + kfree(ibf); + } +} diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index f2d615236b2c..6c7646e6057c 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -51,6 +51,8 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags); int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg); int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg); +void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx); + unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags); bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags); From patchwork Thu Nov 30 19:45:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474968 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="fNGC8Ocj" Received: from mail-io1-xd2a.google.com (mail-io1-xd2a.google.com [IPv6:2607:f8b0:4864:20::d2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B34FFD67 for ; Thu, 30 Nov 2023 11:46:47 -0800 (PST) Received: by mail-io1-xd2a.google.com with SMTP id ca18e2360f4ac-7b05e65e784so6244839f.1 for ; Thu, 30 Nov 2023 11:46:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373606; x=1701978406; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=jcoagagE8ChRc7FE6rp+15OsfYRm5luXE8rawYKeohg=; b=fNGC8OcjD73GlceajgsEdO/tl6dRWns8+jDzZB4sWui840VOgxcoXijHrsNk93bOjy xmzNshdvVNwYKYR3jLS2qyhljV7YBHyxlLRejC29QEc6vmX+XvtyQ+ub/IwftgkEJQMw 9c5vxEnTTd3gee5t8jM/pDyGjJ5xAqqcdIH7nYZOIhWjbrWOPOqDmX9ncBw7eLlDn1xJ 
cw/+Nyzxp4zBjfFmDUCp/lMlhUelZe3Uevsia6pGeNOCw1ZpyAvMV+cQTzYiNxzUcabB TPg3s/YhZLxWy7ND9vStUuR/PyyQm1BANJyRWxKQRok8K3XCYe9VDNaLib1sj8qQhJsG 22AA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701373606; x=1701978406; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jcoagagE8ChRc7FE6rp+15OsfYRm5luXE8rawYKeohg=; b=qqEAyfXxEhhsqsuvAJxYimu8XZDfCHHAQ+/CiXO0TfkTMnsXJCmRndjA+74EXNnhSq kfTvZLCQDb2rmgCxq25Y1aZfGDbf/bg01JWR4YQ/LdquKmKvmpiERyWA9x+FsPU3enAf VqnBdh9qynj0qMZXJr7pr7L5FgTzvja2vtHl66H4uL0izeV4KsR3hcjM7HYgWPTkIid2 PsSsCdhYTXalf63tJ8jSku6GkBASZDu3LMrXBuEfwczKfEo71R2ew1ywjJHxfGR9UgN+ TFLd0WYtJA58FNT7WJCYVhnOT7jYES2marEDSNWDzgZpC3mw1k5wj/f/5fkN3BWOeOun nxNQ== X-Gm-Message-State: AOJu0YzG5UKDz/yS9ln4o3eGfDAdDMyRlIHrJ+IsMlw8YC57DSMtfUS/ KSgEIX6MoFGlwEv+fT2Y9XcwFAtoLlXS14b2LN2lSg== X-Google-Smtp-Source: AGHT+IHYo1wZxnO/I9f6KwWUcCPY7rAlVcUGoLFml+lSwBlFcbgW8tz5td2B5c0UNqcVhkjj8BlnDw== X-Received: by 2002:a5e:9512:0:b0:7b0:75a7:6606 with SMTP id r18-20020a5e9512000000b007b075a76606mr22604669ioj.0.1701373606356; Thu, 30 Nov 2023 11:46:46 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id a18-20020a029f92000000b004667167d8cdsm461179jam.116.2023.11.30.11.46.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 11:46:45 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 5/8] io_uring/kbuf: recycle freed mapped buffer ring entries Date: Thu, 30 Nov 2023 12:45:51 -0700 Message-ID: <20231130194633.649319-6-axboe@kernel.dk> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk> References: <20231130194633.649319-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Right now we stash any potentially mmap'ed provided ring buffer range for freeing at release time, regardless of when they get unregistered. Since we're keeping track of these ranges anyway, keep track of their registration state as well, and use that to recycle ranges when appropriate rather than always allocate new ones. The lookup is a basic scan of entries, checking for the best matching free entry. Fixes: c392cbecd8ec ("io_uring/kbuf: defer release of mapped buffer rings") Signed-off-by: Jens Axboe --- io_uring/kbuf.c | 77 ++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 66 insertions(+), 11 deletions(-) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 85e680fc74ce..325ca7f8b0a0 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -36,6 +36,8 @@ struct io_provide_buf { struct io_buf_free { struct hlist_node list; void *mem; + size_t size; + int inuse; }; static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx, @@ -216,6 +218,24 @@ static __cold int io_init_bl_list(struct io_ring_ctx *ctx) return 0; } +/* + * Mark the given mapped range as free for reuse + */ +static void io_kbuf_mark_free(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + struct io_buf_free *ibf; + + hlist_for_each_entry(ibf, &ctx->io_buf_list, list) { + if (bl->buf_ring == ibf->mem) { + ibf->inuse = 0; + return; + } + } + + /* can't happen... 
*/ + WARN_ON_ONCE(1); +} + static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer_list *bl, unsigned nbufs) { @@ -232,6 +252,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, * io_kbuf_list_free() will free the page(s) at * ->release() time. */ + io_kbuf_mark_free(ctx, bl); bl->buf_ring = NULL; bl->is_mmap = 0; } else if (bl->buf_nr_pages) { @@ -539,6 +560,34 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg, return -EINVAL; } +/* + * See if we have a suitable region that we can reuse, rather than allocate + * both a new io_buf_free and mem region again. We leave it on the list as + * even a reused entry will need freeing at ring release. + */ +static struct io_buf_free *io_lookup_buf_free_entry(struct io_ring_ctx *ctx, + size_t ring_size) +{ + struct io_buf_free *ibf, *best = NULL; + size_t best_dist; + + hlist_for_each_entry(ibf, &ctx->io_buf_list, list) { + size_t dist; + + if (ibf->inuse || ibf->size < ring_size) + continue; + dist = ibf->size - ring_size; + if (!best || dist < best_dist) { + best = ibf; + if (!dist) + break; + best_dist = dist; + } + } + + return best; +} + static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx, struct io_uring_buf_reg *reg, struct io_buffer_list *bl) @@ -548,20 +597,26 @@ static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx, void *ptr; ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring); - ptr = io_mem_alloc(ring_size); - if (!ptr) - return -ENOMEM; - /* Allocate and store deferred free entry */ - ibf = kmalloc(sizeof(*ibf), GFP_KERNEL_ACCOUNT); + /* Reuse existing entry, if we can */ + ibf = io_lookup_buf_free_entry(ctx, ring_size); if (!ibf) { - io_mem_free(ptr); - return -ENOMEM; - } - ibf->mem = ptr; - hlist_add_head(&ibf->list, &ctx->io_buf_list); + ptr = io_mem_alloc(ring_size); + if (!ptr) + return -ENOMEM; - bl->buf_ring = ptr; + /* Allocate and store deferred free entry */ + ibf = kmalloc(sizeof(*ibf), GFP_KERNEL_ACCOUNT); + if (!ibf) { + io_mem_free(ptr); + return -ENOMEM; + } + ibf->mem = ptr; + ibf->size = ring_size; + hlist_add_head(&ibf->list, &ctx->io_buf_list); + } + ibf->inuse = 1; + bl->buf_ring = ibf->mem; bl->is_mapped = 1; bl->is_mmap = 1; return 0; From patchwork Thu Nov 30 19:45:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474969 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="KN3L5ncl" Received: from mail-io1-xd35.google.com (mail-io1-xd35.google.com [IPv6:2607:f8b0:4864:20::d35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1EDCDD5C for ; Thu, 30 Nov 2023 11:46:49 -0800 (PST) Received: by mail-io1-xd35.google.com with SMTP id ca18e2360f4ac-7b393fd9419so6310539f.0 for ; Thu, 30 Nov 2023 11:46:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373608; x=1701978408; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=/3P7Om/JPlWfZ4/jS1Ma8gFMYggIm+5/nIXeclWkiYw=; b=KN3L5nclLvp6Afv5zNa3Ou2h7F3msVEGKghm+vwSp04bChflyf3QYthQo1b8nsCPNq FiroSX1qYdVW2W+6FdJihjOO83J/OmylsqhWwmNrO7iUcjlR8A8CztpmrZ5pA2SeJkex m5GpVZhW0ZC+WhsgHswOoQcV7ugfw3Qz7K+zioyp5QEUzmPO+BvKOO5EqELkpKh7NdU8 TgfN4gizlQVTOejANMDhY1xz4OqkPheU7iv9EAm7VG+ZbcXV1Vjtlb2LRiSPEWkDtSr9 
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 6/8] io_uring/kbuf: prune deferred locked cache when tearing down
Date: Thu, 30 Nov 2023 12:45:52 -0700
Message-ID: <20231130194633.649319-7-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

We used to just use our page list for final teardown, which would ensure
that we got all the buffers, even the ones that were not on the normal
cached list. But since moving to slab for the io_buffers, we now only
prune that list, not the deferred locked list that we have. This can cause
a leak of memory, if the workload ends up using the intermediate locked
list. Fix this by always pruning both lists when tearing down.
Fixes: b3a4dbc89d40 ("io_uring/kbuf: Use slab for struct io_buffer objects") Signed-off-by: Jens Axboe --- io_uring/kbuf.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 325ca7f8b0a0..39d15a27eb92 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -306,6 +306,14 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) kfree(bl); } + /* + * Move deferred locked entries to cache before pruning + */ + spin_lock(&ctx->completion_lock); + if (!list_empty(&ctx->io_buffers_comp)) + list_splice_init(&ctx->io_buffers_comp, &ctx->io_buffers_cache); + spin_unlock(&ctx->completion_lock); + list_for_each_safe(item, tmp, &ctx->io_buffers_cache) { buf = list_entry(item, struct io_buffer, list); kmem_cache_free(io_buf_cachep, buf); From patchwork Thu Nov 30 19:45:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474970 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="vAMIYsl8" Received: from mail-il1-x135.google.com (mail-il1-x135.google.com [IPv6:2607:f8b0:4864:20::135]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0780D67 for ; Thu, 30 Nov 2023 11:46:50 -0800 (PST) Received: by mail-il1-x135.google.com with SMTP id e9e14a558f8ab-35cfd975a53so1082485ab.1 for ; Thu, 30 Nov 2023 11:46:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373609; x=1701978409; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=h2HWi279rh64gXCrH8f0I7iMcUE2uT/adU0g3wTQ5Ao=; b=vAMIYsl8PY1UMjiDH2QCgnklTbIZDsI5fINv7UrbS+66CUCjEIjHRrbx274E4JzURZ 1KTjBZ/Wtwigb80WSyms8NvChNatDoWF/X+k7C92UO9TlaTO8xe/+e87LpISSUvwBofv TAfQffBN9yne/sdbO8Kj9I5FwaECDb5w1LugWGqb63qpvdMoBMMRJzHSlHXEoV/TH8YI +BXc9JGyupte/lGrrBlL1Q1fkAaspQ5HZUFuDFGu2IZc+nSrLH0nHdVTQPJUH5QS+PRm fKX+GJIA0BXSd70UjMnzDiA++/gaVM+lvfqhyB1M3jZ7HQ9FhlBM4gl/UdZ4qfcsFF5J G9dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701373609; x=1701978409; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=h2HWi279rh64gXCrH8f0I7iMcUE2uT/adU0g3wTQ5Ao=; b=nleAfnFZWKs5pHO9rQ3H4j7VGRF9vOQNam4WdhLaFskrQAxfGYCGJZx8HPMZDpj76c eo9i+jxJqWWPKRRuKlbS9+MG54tjlwXEeUoeSOHHcILsCnt3udptDA9Lze3nHxnuN6k+ pTkxnzbpX0ESJSXSLkjxCsfHUlEzOwF4rUF/grs2uEeRlXHrByHEL6CwiYt/3EU4UTCO 3m9KEufDoHPXBv2Pl3jtM9N9FjPsZuJDrGKz7abrY+f7MV2uThkCh/I6L2JHrqocOd5r P+A281fnV/92i7y0QKBSOlfrbdp5mKkLoQC+w67BC9oUHAqExGbAsDORkpTzxc2LiZju kxQg== X-Gm-Message-State: AOJu0Yx7c54maywQkZsciY74ygvXWBgSMXtw+UFhaxLj7MpSN8aIiNJs 2syuq8gP+VSxrk4/gNPtnJqi0ALgXYDH2NE6+dU0DA== X-Google-Smtp-Source: AGHT+IFt1rr2No8HuHyy980pqzaVmvIIQ1typwkiS5GFSscUFuIxnKkRNCW4o7sR79g/HFhnu22t2Q== X-Received: by 2002:a5d:9c18:0:b0:79f:a8c2:290d with SMTP id 24-20020a5d9c18000000b0079fa8c2290dmr23173995ioe.0.1701373609558; Thu, 30 Nov 2023 11:46:49 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id a18-20020a029f92000000b004667167d8cdsm461179jam.116.2023.11.30.11.46.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 11:46:48 -0800 (PST) From: Jens Axboe To: 
io-uring@vger.kernel.org
Cc: Jens Axboe, stable@vger.kernel.org
Subject: [PATCH 7/8] io_uring: free io_buffer_list entries via RCU
Date: Thu, 30 Nov 2023 12:45:53 -0700
Message-ID: <20231130194633.649319-8-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

mmap_lock nests under uring_lock out of necessity, as we may be doing user
copies with uring_lock held. However, for mmap of provided buffer rings,
we attempt to grab uring_lock with mmap_lock already held from do_mmap().
This makes lockdep, rightfully, complain:

WARNING: possible circular locking dependency detected
6.7.0-rc1-00009-gff3337ebaf94-dirty #4438 Not tainted
------------------------------------------------------
buf-ring.t/442 is trying to acquire lock:
ffff00020e1480a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_uring_validate_mmap_request.isra.0+0x4c/0x140

but task is already holding lock:
ffff0000dc226190 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x124/0x264

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       __might_fault+0x90/0xbc
       io_register_pbuf_ring+0x94/0x488
       __arm64_sys_io_uring_register+0x8dc/0x1318
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

-> #0 (&ctx->uring_lock){+.+.}-{3:3}:
       __lock_acquire+0x19a0/0x2d14
       lock_acquire+0x2e0/0x44c
       __mutex_lock+0x118/0x564
       mutex_lock_nested+0x20/0x28
       io_uring_validate_mmap_request.isra.0+0x4c/0x140
       io_uring_mmu_get_unmapped_area+0x3c/0x98
       get_unmapped_area+0xa4/0x158
       do_mmap+0xec/0x5b4
       vm_mmap_pgoff+0x158/0x264
       ksys_mmap_pgoff+0x1d4/0x254
       __arm64_sys_mmap+0x80/0x9c
       invoke_syscall+0x5c/0x17c
       el0_svc_common.constprop.0+0x108/0x130
       do_el0_svc+0x2c/0x38
       el0_svc+0x4c/0x94
       el0t_64_sync_handler+0x118/0x124
       el0t_64_sync+0x168/0x16c

From that mmap(2) path, we really just need to ensure that the buffer list
doesn't go away from underneath us. For the lower indexed entries, they
never go away until the ring is freed and we can always sanely reference
those as long as the caller has a file reference. For the higher indexed
ones in our xarray, we just need to ensure that the buffer list remains
valid while we return the address of it.

Free the higher indexed io_buffer_list entries via RCU. With that we can
avoid needing ->uring_lock inside mmap(2), and simply hold the RCU read
lock around the buffer list lookup and address check.

To ensure that the arrayed lookup either returns a valid fully formulated
entry via RCU lookup, add an 'is_ready' flag that we access with store and
release memory ordering. This isn't needed for the xarray lookups, but
doesn't hurt either. Since this isn't a fast path, retain it across both
types. Similarly, for the allocated array inside the ctx, ensure we use
the proper load/acquire as setup could in theory be running in parallel
with mmap.

While in there, add a few lockdep checks for documentation purposes.
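To make the resulting locking rule concrete, a condensed sketch of what the
mmap side can now do (simplified from the patch below; error handling and
the lower-indexed array case are omitted):

	/* mmap path: no ->uring_lock, just an RCU read-side section */
	rcu_read_lock();
	bl = xa_load(&ctx->io_bl_xa, bgid);
	if (bl && smp_load_acquire(&bl->is_ready) && bl->is_mmap)
		ptr = bl->buf_ring;
	rcu_read_unlock();

	/* unregister/free side: unpublish first, free after a grace period */
	xa_erase(&ctx->io_bl_xa, bl->bgid);
	kfree_rcu(bl, rcu);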
Cc: stable@vger.kernel.org Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring") Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 4 +-- io_uring/kbuf.c | 64 ++++++++++++++++++++++++++++++++++++--------- io_uring/kbuf.h | 3 +++ 3 files changed, 56 insertions(+), 15 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3a216f0744dd..05f933dddfde 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3498,9 +3498,9 @@ static void *io_uring_validate_mmap_request(struct file *file, unsigned int bgid; bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; - mutex_lock(&ctx->uring_lock); + rcu_read_lock(); ptr = io_pbuf_get_address(ctx, bgid); - mutex_unlock(&ctx->uring_lock); + rcu_read_unlock(); if (!ptr) return ERR_PTR(-EINVAL); break; diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 39d15a27eb92..268788305b61 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -40,19 +40,35 @@ struct io_buf_free { int inuse; }; +static struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, + struct io_buffer_list *bl, + unsigned int bgid) +{ + if (bl && bgid < BGID_ARRAY) + return &bl[bgid]; + + return xa_load(&ctx->io_bl_xa, bgid); +} + static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx, unsigned int bgid) { - if (ctx->io_bl && bgid < BGID_ARRAY) - return &ctx->io_bl[bgid]; + lockdep_assert_held(&ctx->uring_lock); - return xa_load(&ctx->io_bl_xa, bgid); + return __io_buffer_get_list(ctx, ctx->io_bl, bgid); } static int io_buffer_add_list(struct io_ring_ctx *ctx, struct io_buffer_list *bl, unsigned int bgid) { + /* + * Store buffer group ID and finally mark the list as visible. + * The normal lookup doesn't care about the visibility as we're + * always under the ->uring_lock, but the RCU lookup from mmap does. + */ bl->bgid = bgid; + smp_store_release(&bl->is_ready, 1); + if (bgid < BGID_ARRAY) return 0; @@ -203,18 +219,19 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len, static __cold int io_init_bl_list(struct io_ring_ctx *ctx) { + struct io_buffer_list *bl; int i; - ctx->io_bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), - GFP_KERNEL); - if (!ctx->io_bl) + bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), GFP_KERNEL); + if (!bl) return -ENOMEM; for (i = 0; i < BGID_ARRAY; i++) { - INIT_LIST_HEAD(&ctx->io_bl[i].buf_list); - ctx->io_bl[i].bgid = i; + INIT_LIST_HEAD(&bl[i].buf_list); + bl[i].bgid = i; } + smp_store_release(&ctx->io_bl, bl); return 0; } @@ -303,7 +320,7 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) xa_for_each(&ctx->io_bl_xa, index, bl) { xa_erase(&ctx->io_bl_xa, bl->bgid); __io_remove_buffers(ctx, bl, -1U); - kfree(bl); + kfree_rcu(bl, rcu); } /* @@ -497,7 +514,16 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags) INIT_LIST_HEAD(&bl->buf_list); ret = io_buffer_add_list(ctx, bl, p->bgid); if (ret) { - kfree(bl); + /* + * Doesn't need rcu free as it was never visible, but + * let's keep it consistent throughout. Also can't + * be a lower indexed array group, as adding one + * where lookup failed cannot happen. 
+ */ + if (p->bgid >= BGID_ARRAY) + kfree_rcu(bl, rcu); + else + WARN_ON_ONCE(1); goto err; } } @@ -636,6 +662,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) struct io_buffer_list *bl, *free_bl = NULL; int ret; + lockdep_assert_held(&ctx->uring_lock); + if (copy_from_user(®, arg, sizeof(reg))) return -EFAULT; @@ -690,7 +718,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) return 0; } - kfree(free_bl); + kfree_rcu(free_bl, rcu); return ret; } @@ -699,6 +727,8 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) struct io_uring_buf_reg reg; struct io_buffer_list *bl; + lockdep_assert_held(&ctx->uring_lock); + if (copy_from_user(®, arg, sizeof(reg))) return -EFAULT; if (reg.resv[0] || reg.resv[1] || reg.resv[2]) @@ -715,7 +745,7 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) __io_remove_buffers(ctx, bl, -1U); if (bl->bgid >= BGID_ARRAY) { xa_erase(&ctx->io_bl_xa, bl->bgid); - kfree(bl); + kfree_rcu(bl, rcu); } return 0; } @@ -724,7 +754,15 @@ void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid) { struct io_buffer_list *bl; - bl = io_buffer_get_list(ctx, bgid); + bl = __io_buffer_get_list(ctx, smp_load_acquire(&ctx->io_bl), bgid); + + /* + * Ensure the list is fully setup. Only strictly needed for RCU lookup + * via mmap, and in that case only for the array indexed groups. For + * the xarray lookups, it's either visible and ready, or not at all. + */ + if (!smp_load_acquire(&bl->is_ready)) + return NULL; if (!bl || !bl->is_mmap) return NULL; diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 6c7646e6057c..9be5960817ea 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -15,6 +15,7 @@ struct io_buffer_list { struct page **buf_pages; struct io_uring_buf_ring *buf_ring; }; + struct rcu_head rcu; }; __u16 bgid; @@ -28,6 +29,8 @@ struct io_buffer_list { __u8 is_mapped; /* ring mapped provided buffers, but mmap'ed by application */ __u8 is_mmap; + /* bl is visible from an RCU point of view for lookup */ + __u8 is_ready; }; struct io_buffer { From patchwork Thu Nov 30 19:45:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13474971 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="Rgv5DEgE" Received: from mail-io1-xd2c.google.com (mail-io1-xd2c.google.com [IPv6:2607:f8b0:4864:20::d2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFD0910D9 for ; Thu, 30 Nov 2023 11:46:51 -0800 (PST) Received: by mail-io1-xd2c.google.com with SMTP id ca18e2360f4ac-7b37846373eso16356439f.0 for ; Thu, 30 Nov 2023 11:46:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1701373611; x=1701978411; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WUfGQN4wOIiBxZTOLoqEzFe4YZPkK5hvpNxoZoJGul0=; b=Rgv5DEgE4cirFBIq2I8ngokFEAUA8IRoOmXMc+QODkzBdLEY7gXw8oQ19zHmm1XiQr Q8+Vj0bA/zrrdm4V8rZpddKck36SlUlT0VOH1i3Mz1ACOJOSzaGb9fj9pJcSn6Jazfb5 SFMVV9c+vef+jrd4QbgOyyiwUTi5omPofmXMGRP/Fpkgxu6oGRXy8UdF6X5r1DsjF6zG Z3+pav+Omszr9OoU56Jh12qK8dE5UJqaAGqAiCvq+B+Am1GVjLaHyuHtyLtEd0ubJ0rD fsjjI7pTZWsjOpACarCP+8prPkPJvp2HS15FrkV0Hrry3n6gk26WRNmRFS0IjrEDZs9p DPJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe, Jann Horn
Subject: [PATCH 8/8] io_uring: use fget/fput consistently
Date: Thu, 30 Nov 2023 12:45:54 -0700
Message-ID: <20231130194633.649319-9-axboe@kernel.dk>
In-Reply-To: <20231130194633.649319-1-axboe@kernel.dk>
References: <20231130194633.649319-1-axboe@kernel.dk>

Normally within a syscall it's fine to use fdget/fdput for grabbing a file
from the file table, and it's fine within io_uring as well. We do that via
io_uring_enter(2), io_uring_register(2), and then also for cancel which is
invoked from the latter. io_uring cannot close its own file descriptors as
that is explicitly rejected, and for the cancel side of things, the file
itself is just used as a lookup cookie.

However, it is more prudent to ensure that full references are always
grabbed. For anything threaded, either explicitly in the application
itself or through use of the io-wq worker threads, this is what happens
anyway. Generalize it and use fget/fput throughout. Also see the below
link for more details.
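For background, the distinction the commit message relies on, in its
smallest form (an illustrative sketch, not taken from the patch):

	/*
	 * fdget(): may borrow the file without bumping its refcount when the
	 * current task's file table isn't shared; the struct file is only
	 * guaranteed to stay valid until the matching fdput(), within this
	 * task and syscall.
	 */
	struct fd f = fdget(fd);
	if (!f.file)
		return -EBADF;
	/* ... short, same-syscall use ... */
	fdput(f);

	/*
	 * fget(): always takes a full reference, so the file stays valid even
	 * if another thread closes the fd, or the use is handed off to a
	 * worker thread, at the cost of the refcount atomics.
	 */
	struct file *file = fget(fd);
	if (!file)
		return -EBADF;
	/* ... may span blocking or offloaded work ... */
	fput(file);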
Link: https://lore.kernel.org/io-uring/CAG48ez1htVSO3TqmrF8QcX2WFuYTRM-VZ_N10i-VZgbtg=NNqw@mail.gmail.com/ Suggested-by: Jann Horn Signed-off-by: Jens Axboe --- io_uring/cancel.c | 11 ++++++----- io_uring/io_uring.c | 36 ++++++++++++++++++------------------ 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 3c19cccb1aec..8a8b07dfc444 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -273,7 +273,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) }; ktime_t timeout = KTIME_MAX; struct io_uring_sync_cancel_reg sc; - struct fd f = { }; + struct file *file = NULL; DEFINE_WAIT(wait); int ret, i; @@ -295,10 +295,10 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) /* we can grab a normal file descriptor upfront */ if ((cd.flags & IORING_ASYNC_CANCEL_FD) && !(cd.flags & IORING_ASYNC_CANCEL_FD_FIXED)) { - f = fdget(sc.fd); - if (!f.file) + file = fget(sc.fd); + if (!file) return -EBADF; - cd.file = f.file; + cd.file = file; } ret = __io_sync_cancel(current->io_uring, &cd, sc.fd); @@ -348,6 +348,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) if (ret == -ENOENT || ret > 0) ret = 0; out: - fdput(f); + if (file) + fput(file); return ret; } diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 05f933dddfde..aba5657d287e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3652,7 +3652,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, size_t, argsz) { struct io_ring_ctx *ctx; - struct fd f; + struct file *file; long ret; if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP | @@ -3670,20 +3670,19 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX)) return -EINVAL; fd = array_index_nospec(fd, IO_RINGFD_REG_MAX); - f.file = tctx->registered_rings[fd]; - f.flags = 0; - if (unlikely(!f.file)) + file = tctx->registered_rings[fd]; + if (unlikely(!file)) return -EBADF; } else { - f = fdget(fd); - if (unlikely(!f.file)) + file = fget(fd); + if (unlikely(!file)) return -EBADF; ret = -EOPNOTSUPP; - if (unlikely(!io_is_uring_fops(f.file))) + if (unlikely(!io_is_uring_fops(file))) goto out; } - ctx = f.file->private_data; + ctx = file->private_data; ret = -EBADFD; if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED)) goto out; @@ -3777,7 +3776,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, } } out: - fdput(f); + if (!(flags & IORING_ENTER_REGISTERED_RING)) + fput(file); return ret; } @@ -4618,7 +4618,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, { struct io_ring_ctx *ctx; long ret = -EBADF; - struct fd f; + struct file *file; bool use_registered_ring; use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING); @@ -4637,27 +4637,27 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX)) return -EINVAL; fd = array_index_nospec(fd, IO_RINGFD_REG_MAX); - f.file = tctx->registered_rings[fd]; - f.flags = 0; - if (unlikely(!f.file)) + file = tctx->registered_rings[fd]; + if (unlikely(!file)) return -EBADF; } else { - f = fdget(fd); - if (unlikely(!f.file)) + file = fget(fd); + if (unlikely(!file)) return -EBADF; ret = -EOPNOTSUPP; - if (!io_is_uring_fops(f.file)) + if (!io_is_uring_fops(file)) goto out_fput; } - ctx = f.file->private_data; + ctx = file->private_data; mutex_lock(&ctx->uring_lock); ret = __io_uring_register(ctx, opcode, arg, nr_args); 
mutex_unlock(&ctx->uring_lock); trace_io_uring_register(ctx, opcode, ctx->nr_user_files, ctx->nr_user_bufs, ret); out_fput: - fdput(f); + if (!use_registered_ring) + fput(file); return ret; }