From patchwork Thu Mar 28 23:31:28 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609737
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 01/11] mm: add nommu variant of vm_insert_pages()
Date: Thu, 28 Mar 2024 17:31:28 -0600
Message-ID: <20240328233443.797828-2-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

An identical one exists for vm_insert_page(); add one for
vm_insert_pages() to avoid needing to check for CONFIG_MMU in code
using it.

Acked-by: Johannes Weiner
Signed-off-by: Jens Axboe
---
 mm/nommu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/nommu.c b/mm/nommu.c
index 5ec8f44e7ce9..a34a0e376611 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -355,6 +355,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
+			struct page **pages, unsigned long *num)
+{
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_pages);
+
 int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
 		unsigned long num)
 {
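[Editor's note: with the nommu stub in place, callers can use vm_insert_pages()
unconditionally and just handle the error at runtime. A minimal sketch of such a
call site; the surrounding driver context (example_mmap, example_ctx) is
hypothetical, only the vm_insert_pages() signature comes from the patch:

	/* Illustrative only: map a driver-held page array into userspace. */
	static int example_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct example_ctx *ctx = file->private_data;	/* hypothetical */
		unsigned long nr = ctx->nr_pages;

		/* No #ifdef CONFIG_MMU needed: the nommu stub returns -EINVAL. */
		return vm_insert_pages(vma, vma->vm_start, ctx->pages, &nr);
	}
]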
From patchwork Thu Mar 28 23:31:29 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609738

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 02/11] io_uring: get rid of remap_pfn_range() for mapping rings/sqes
Date: Thu, 28 Mar 2024 17:31:29 -0600
Message-ID: <20240328233443.797828-3-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

Rather than use remap_pfn_range() for this and manually free later,
switch to using vm_insert_pages() and have it Just Work.

If possible, allocate a single compound page that covers the range that
is needed. If that works, then we can just use page_address() on that
page. If we fail to get a compound page, allocate single pages and use
vmap() to map them into the kernel virtual address space.

This just covers the rings/sqes; the other remaining mmap user of
remap_pfn_range() will be converted separately. Once that is done, we
can kill the old alloc/free code.
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 136 +++++++++++++++++++++++++++++++++++++++++---
 io_uring/io_uring.h |   2 +
 2 files changed, 130 insertions(+), 8 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 104899522bc5..982545ca23f9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2594,6 +2594,33 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }
 
+static void io_pages_unmap(void *ptr, struct page ***pages,
+			   unsigned short *npages)
+{
+	bool do_vunmap = false;
+
+	if (*npages) {
+		struct page **to_free = *pages;
+		int i;
+
+		/*
+		 * Only did vmap for the non-compound multiple page case.
+		 * For the compound page, we just need to put the head.
+		 */
+		if (PageCompound(to_free[0]))
+			*npages = 1;
+		else if (*npages > 1)
+			do_vunmap = true;
+		for (i = 0; i < *npages; i++)
+			put_page(to_free[i]);
+	}
+	if (do_vunmap)
+		vunmap(ptr);
+	kvfree(*pages);
+	*pages = NULL;
+	*npages = 0;
+}
+
 void io_mem_free(void *ptr)
 {
 	if (!ptr)
@@ -2694,8 +2721,8 @@ static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
 static void io_rings_free(struct io_ring_ctx *ctx)
 {
 	if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
-		io_mem_free(ctx->rings);
-		io_mem_free(ctx->sq_sqes);
+		io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages);
+		io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages);
 	} else {
 		io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
 		ctx->n_ring_pages = 0;
@@ -2707,6 +2734,80 @@ static void io_rings_free(struct io_ring_ctx *ctx)
 	ctx->sq_sqes = NULL;
 }
 
+static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
+				   size_t size, gfp_t gfp)
+{
+	struct page *page;
+	int i, order;
+
+	order = get_order(size);
+	if (order > MAX_PAGE_ORDER)
+		return NULL;
+	else if (order)
+		gfp |= __GFP_COMP;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return NULL;
+
+	for (i = 0; i < nr_pages; i++)
+		pages[i] = page + i;
+
+	return page_address(page);
+}
+
+static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
+				 gfp_t gfp)
+{
+	void *ret;
+	int i;
+
+	for (i = 0; i < nr_pages; i++) {
+		pages[i] = alloc_page(gfp);
+		if (!pages[i])
+			goto err;
+	}
+
+	ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (ret)
+		return ret;
+err:
+	while (i--)
+		put_page(pages[i]);
+	return ERR_PTR(-ENOMEM);
+}
+
+static void *io_pages_map(struct page ***out_pages, unsigned short *npages,
+			  size_t size)
+{
+	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
+	struct page **pages;
+	int nr_pages;
+	void *ret;
+
+	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	ret = io_mem_alloc_compound(pages, nr_pages, size, gfp);
+	if (ret)
+		goto done;
+
+	ret = io_mem_alloc_single(pages, nr_pages, size, gfp);
+	if (ret) {
+done:
+		*out_pages = pages;
+		*npages = nr_pages;
+		return ret;
+	}
+
+	kvfree(pages);
+	*out_pages = NULL;
+	*npages = 0;
+	return ERR_PTR(-ENOMEM);
+}
+
 void *io_mem_alloc(size_t size)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
@@ -3294,14 +3395,12 @@ static void *io_uring_validate_mmap_request(struct file *file,
 		/* Don't allow mmap if the ring was setup without it */
 		if (ctx->flags & IORING_SETUP_NO_MMAP)
 			return ERR_PTR(-EINVAL);
-		ptr = ctx->rings;
-		break;
+		return ctx->rings;
 	case IORING_OFF_SQES:
 		/* Don't allow mmap if the ring was setup without it */
 		if (ctx->flags & IORING_SETUP_NO_MMAP)
 			return ERR_PTR(-EINVAL);
-		ptr = ctx->sq_sqes;
-		break;
+		return ctx->sq_sqes;
 	case IORING_OFF_PBUF_RING: {
 		unsigned int bgid;
 
@@ -3324,11 +3423,22 @@ static void *io_uring_validate_mmap_request(struct file *file,
 	return ptr;
 }
 
+int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
+			struct page **pages, int npages)
+{
+	unsigned long nr_pages = npages;
+
+	vm_flags_set(vma, VM_DONTEXPAND);
+	return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
+}
+
 #ifdef CONFIG_MMU
 
 static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
+	struct io_ring_ctx *ctx = file->private_data;
 	size_t sz = vma->vm_end - vma->vm_start;
+	long offset = vma->vm_pgoff << PAGE_SHIFT;
 	unsigned long pfn;
 	void *ptr;
 
@@ -3336,6 +3446,16 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+	switch (offset & IORING_OFF_MMAP_MASK) {
+	case IORING_OFF_SQ_RING:
+	case IORING_OFF_CQ_RING:
+		return io_uring_mmap_pages(ctx, vma, ctx->ring_pages,
+					   ctx->n_ring_pages);
+	case IORING_OFF_SQES:
+		return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages,
+					   ctx->n_sqe_pages);
+	}
+
 	pfn = virt_to_phys(ptr) >> PAGE_SHIFT;
 	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
 }
@@ -3625,7 +3745,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 
 	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
-		rings = io_mem_alloc(size);
+		rings = io_pages_map(&ctx->ring_pages, &ctx->n_ring_pages, size);
 	else
 		rings = io_rings_map(ctx, p->cq_off.user_addr, size);
 
@@ -3650,7 +3770,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	}
 
 	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
-		ptr = io_mem_alloc(size);
+		ptr = io_pages_map(&ctx->sqe_pages, &ctx->n_sqe_pages, size);
 	else
 		ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index dbd9a2b870eb..75230d914007 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -70,6 +70,8 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
 
 struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
+int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
+			struct page **pages, int npages);
 
 struct file *io_file_get_normal(struct io_kiocb *req, int fd);
 struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
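[Editor's note: the userspace contract is unchanged by this conversion; the
rings are still mapped with mmap() against the io_uring fd at the fixed
IORING_OFF_* offsets from the uapi header. A minimal, illustrative userspace
sketch, with error handling elided and the size computation assuming the
conventional liburing layout:

	#include <linux/io_uring.h>	/* IORING_OFF_*, struct io_uring_params */
	#include <sys/mman.h>

	/* Illustrative: map the SQ ring of an already-created io_uring. */
	static void *map_sq_ring(int ring_fd, struct io_uring_params *p)
	{
		size_t sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);

		/* Same call as before this series; only kernel internals changed. */
		return mmap(NULL, sz, PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQ_RING);
	}
]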
From patchwork Thu Mar 28 23:31:30 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609739

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 03/11] io_uring: use vmap() for ring mapping
Date: Thu, 28 Mar 2024 17:31:30 -0600
Message-ID: <20240328233443.797828-4-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>
This is the last holdout that does odd page checking; convert it to
vmap, just like what is done for the non-mmap path.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 38 +++++++++-----------------------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 982545ca23f9..4c6eeb299e5d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -63,7 +63,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -2649,7 +2648,7 @@ static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 	struct page **page_array;
 	unsigned int nr_pages;
 	void *page_addr;
-	int ret, i, pinned;
+	int ret, pinned;
 
 	*npages = 0;
 
@@ -2671,34 +2670,13 @@ static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 		goto free_pages;
 	}
 
-	page_addr = page_address(page_array[0]);
-	for (i = 0; i < nr_pages; i++) {
-		ret = -EINVAL;
-
-		/*
-		 * Can't support mapping user allocated ring memory on 32-bit
-		 * archs where it could potentially reside in highmem. Just
-		 * fail those with -EINVAL, just like we did on kernels that
-		 * didn't support this feature.
-		 */
-		if (PageHighMem(page_array[i]))
-			goto free_pages;
-
-		/*
-		 * No support for discontig pages for now, should either be a
-		 * single normal page, or a huge page. Later on we can add
-		 * support for remapping discontig pages, for now we will
-		 * just fail them with EINVAL.
-		 */
-		if (page_address(page_array[i]) != page_addr)
-			goto free_pages;
-		page_addr += PAGE_SIZE;
+	page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (page_addr) {
+		*pages = page_array;
+		*npages = nr_pages;
+		return page_addr;
 	}
-
-	*pages = page_array;
-	*npages = nr_pages;
-	return page_to_virt(page_array[0]);
-
+	ret = -ENOMEM;
 free_pages:
 	io_pages_free(&page_array, pinned > 0 ? pinned : 0);
 	return ERR_PTR(ret);
@@ -2728,6 +2706,8 @@ static void io_rings_free(struct io_ring_ctx *ctx)
 		ctx->n_ring_pages = 0;
 		io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
 		ctx->n_sqe_pages = 0;
+		vunmap(ctx->rings);
+		vunmap(ctx->sq_sqes);
 	}
 
 	ctx->rings = NULL;
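[Editor's note: for readers unfamiliar with the API, vmap() takes an array of
(not necessarily physically contiguous) struct page pointers and maps them into
one virtually contiguous kernel range; vunmap() tears down only the mapping,
which is why page references still have to be dropped separately, as the
teardown paths above do. A minimal sketch of the pairing, illustrative and not
taken from the patch:

	#include <linux/vmalloc.h>
	#include <linux/mm.h>

	/* Map nr physically scattered pages contiguously in kernel VA. */
	static void *map_scattered(struct page **pages, unsigned int nr)
	{
		return vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	}

	static void unmap_scattered(void *addr, struct page **pages, unsigned int nr)
	{
		unsigned int i;

		vunmap(addr);			/* drops the mapping only */
		for (i = 0; i < nr; i++)
			put_page(pages[i]);	/* page refs dropped separately */
	}
]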
From patchwork Thu Mar 28 23:31:31 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609740

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 04/11] io_uring: unify io_pin_pages()
Date: Thu, 28 Mar 2024 17:31:31 -0600
Message-ID: <20240328233443.797828-5-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

Move it into io_uring.c where it belongs, and use it in there as well
rather than have two implementations of this.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 61 +++++++++++++++++++++++++++++++--------------
 io_uring/rsrc.c     | 36 --------------------------
 2 files changed, 42 insertions(+), 55 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4c6eeb299e5d..3aac7fbee499 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2642,33 +2642,57 @@ static void io_pages_free(struct page ***pages, int npages)
 	*pages = NULL;
 }
 
+struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
+{
+	unsigned long start, end, nr_pages;
+	struct page **pages;
+	int ret;
+
+	end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	start = uaddr >> PAGE_SHIFT;
+	nr_pages = end - start;
+	if (WARN_ON_ONCE(!nr_pages))
+		return ERR_PTR(-EINVAL);
+
+	pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+				  pages);
+	/* success, mapped all pages */
+	if (ret == nr_pages) {
+		*npages = nr_pages;
+		return pages;
+	}
+
+	/* partial map, or didn't map anything */
+	if (ret >= 0) {
+		/* if we did partial map, release any pages we did get */
+		if (ret)
+			unpin_user_pages(pages, ret);
+		ret = -EFAULT;
+	}
+	kvfree(pages);
+	return ERR_PTR(ret);
+}
+
 static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 			    unsigned long uaddr, size_t size)
 {
 	struct page **page_array;
 	unsigned int nr_pages;
 	void *page_addr;
-	int ret, pinned;
 
 	*npages = 0;
 
 	if (uaddr & (PAGE_SIZE - 1) || !size)
 		return ERR_PTR(-EINVAL);
 
-	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	if (nr_pages > USHRT_MAX)
-		return ERR_PTR(-EINVAL);
-	page_array = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
-	if (!page_array)
-		return ERR_PTR(-ENOMEM);
-
-	pinned = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
-				     page_array);
-	if (pinned != nr_pages) {
-		ret = (pinned < 0) ? pinned : -EFAULT;
-		goto free_pages;
-	}
+	nr_pages = 0;
+	page_array = io_pin_pages(uaddr, size, &nr_pages);
+	if (IS_ERR(page_array))
+		return page_array;
 
 	page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
 	if (page_addr) {
@@ -2676,10 +2700,9 @@ static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
 		*npages = nr_pages;
 		return page_addr;
 	}
-	ret = -ENOMEM;
-free_pages:
-	io_pages_free(&page_array, pinned > 0 ? pinned : 0);
-	return ERR_PTR(ret);
+
+	io_pages_free(&page_array, nr_pages);
+	return ERR_PTR(-ENOMEM);
 }
 
 static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 7b8a056f98ed..8a34181c97ab 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -870,42 +870,6 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 	return ret;
 }
 
-struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
-{
-	unsigned long start, end, nr_pages;
-	struct page **pages = NULL;
-	int ret;
-
-	end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	start = ubuf >> PAGE_SHIFT;
-	nr_pages = end - start;
-	WARN_ON(!nr_pages);
-
-	pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
-	if (!pages)
-		return ERR_PTR(-ENOMEM);
-
-	mmap_read_lock(current->mm);
-	ret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM, pages);
-	mmap_read_unlock(current->mm);
-
-	/* success, mapped all pages */
-	if (ret == nr_pages) {
-		*npages = nr_pages;
-		return pages;
-	}
-
-	/* partial map, or didn't map anything */
-	if (ret >= 0) {
-		/* if we did partial map, release any pages we did get */
-		if (ret)
-			unpin_user_pages(pages, ret);
-		ret = -EFAULT;
-	}
-	kvfree(pages);
-	return ERR_PTR(ret);
-}
-
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage)
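[Editor's note: the unified helper encodes the pin_user_pages_fast() contract:
it may pin fewer pages than requested, and a partial pin must be released
before failing. A caller-side sketch with illustrative names; only the
io_pin_pages() signature comes from the patch:

	/* Illustrative: pin a user buffer for long-term access. */
	static int example_pin(unsigned long uaddr, unsigned long len)
	{
		struct page **pages;
		int npages = 0;

		pages = io_pin_pages(uaddr, len, &npages);
		if (IS_ERR(pages))
			return PTR_ERR(pages);	/* -EFAULT on partial pin, -ENOMEM, ... */

		/* ... use pages[0..npages-1] ... */

		unpin_user_pages(pages, npages);
		kvfree(pages);
		return 0;
	}
]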
From patchwork Thu Mar 28 23:31:32 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609741

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 05/11] io_uring/kbuf: get rid of lower BGID lists
Date: Thu, 28 Mar 2024 17:31:32 -0600
Message-ID: <20240328233443.797828-6-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

Just rely on the xarray for any kind of bgid. This simplifies things,
and the special-cased lower BGID array really doesn't bring us much,
if anything.
Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |  1 -
 io_uring/io_uring.c            |  2 -
 io_uring/kbuf.c                | 70 ++++------------------------------
 3 files changed, 8 insertions(+), 65 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index b191710bec4f..8c64c303dee8 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -295,7 +295,6 @@ struct io_ring_ctx {
 
 	struct io_submit_state		submit_state;
 
-	struct io_buffer_list		*io_bl;
 	struct xarray			io_bl_xa;
 
 	struct io_hash_table		cancel_table_locked;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3aac7fbee499..17db5b2aa4b5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -353,7 +353,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	io_futex_cache_free(ctx);
 	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->cancel_table_locked.hbs);
-	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
 	return NULL;
@@ -2928,7 +2927,6 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_napi_free(ctx);
 	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->cancel_table_locked.hbs);
-	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
 }
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 693c26da4ee1..8bf0121f00af 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -17,8 +17,6 @@
 
 #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf))
 
-#define BGID_ARRAY	64
-
 /* BIDs are addressed by a 16-bit field in a CQE */
 #define MAX_BIDS_PER_BGID (1 << 16)
 
@@ -40,13 +38,9 @@ struct io_buf_free {
 	int				inuse;
 };
 
-static struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx,
-						   struct io_buffer_list *bl,
-						   unsigned int bgid)
+static inline struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx,
+							  unsigned int bgid)
 {
-	if (bl && bgid < BGID_ARRAY)
-		return &bl[bgid];
-
 	return xa_load(&ctx->io_bl_xa, bgid);
 }
 
@@ -55,7 +49,7 @@ static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
 {
 	lockdep_assert_held(&ctx->uring_lock);
 
-	return __io_buffer_get_list(ctx, ctx->io_bl, bgid);
+	return __io_buffer_get_list(ctx, bgid);
 }
 
 static int io_buffer_add_list(struct io_ring_ctx *ctx,
@@ -68,10 +62,6 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
 	 */
 	bl->bgid = bgid;
 	smp_store_release(&bl->is_ready, 1);
-
-	if (bgid < BGID_ARRAY)
-		return 0;
-
 	return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
 }
 
@@ -208,24 +198,6 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
 	return ret;
 }
 
-static __cold int io_init_bl_list(struct io_ring_ctx *ctx)
-{
-	struct io_buffer_list *bl;
-	int i;
-
-	bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), GFP_KERNEL);
-	if (!bl)
-		return -ENOMEM;
-
-	for (i = 0; i < BGID_ARRAY; i++) {
-		INIT_LIST_HEAD(&bl[i].buf_list);
-		bl[i].bgid = i;
-	}
-
-	smp_store_release(&ctx->io_bl, bl);
-	return 0;
-}
-
 /*
  * Mark the given mapped range as free for reuse
  */
@@ -300,13 +272,6 @@ void io_destroy_buffers(struct io_ring_ctx *ctx)
 	struct list_head *item, *tmp;
 	struct io_buffer *buf;
 	unsigned long index;
-	int i;
-
-	for (i = 0; i < BGID_ARRAY; i++) {
-		if (!ctx->io_bl)
-			break;
-		__io_remove_buffers(ctx, &ctx->io_bl[i], -1U);
-	}
 
 	xa_for_each(&ctx->io_bl_xa, index, bl) {
 		xa_erase(&ctx->io_bl_xa, bl->bgid);
@@ -489,12 +454,6 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 
 	io_ring_submit_lock(ctx, issue_flags);
 
-	if (unlikely(p->bgid < BGID_ARRAY && !ctx->io_bl)) {
-		ret = io_init_bl_list(ctx);
-		if (ret)
-			goto err;
-	}
-
 	bl = io_buffer_get_list(ctx, p->bgid);
 	if (unlikely(!bl)) {
 		bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT);
@@ -507,14 +466,9 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 		if (ret) {
 			/*
 			 * Doesn't need rcu free as it was never visible, but
-			 * let's keep it consistent throughout. Also can't
-			 * be a lower indexed array group, as adding one
-			 * where lookup failed cannot happen.
+			 * let's keep it consistent throughout.
 			 */
-			if (p->bgid >= BGID_ARRAY)
-				kfree_rcu(bl, rcu);
-			else
-				WARN_ON_ONCE(1);
+			kfree_rcu(bl, rcu);
 			goto err;
 		}
 	}
@@ -679,12 +633,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	if (reg.ring_entries >= 65536)
 		return -EINVAL;
 
-	if (unlikely(reg.bgid < BGID_ARRAY && !ctx->io_bl)) {
-		int ret = io_init_bl_list(ctx);
-		if (ret)
-			return ret;
-	}
-
 	bl = io_buffer_get_list(ctx, reg.bgid);
 	if (bl) {
 		/* if mapped buffer ring OR classic exists, don't allow */
@@ -734,10 +682,8 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 		return -EINVAL;
 
 	__io_remove_buffers(ctx, bl, -1U);
-	if (bl->bgid >= BGID_ARRAY) {
-		xa_erase(&ctx->io_bl_xa, bl->bgid);
-		kfree_rcu(bl, rcu);
-	}
+	xa_erase(&ctx->io_bl_xa, bl->bgid);
+	kfree_rcu(bl, rcu);
 	return 0;
 }
 
@@ -771,7 +717,7 @@ void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid)
 {
 	struct io_buffer_list *bl;
 
-	bl = __io_buffer_get_list(ctx, smp_load_acquire(&ctx->io_bl), bgid);
+	bl = __io_buffer_get_list(ctx, bgid);
 
 	if (!bl || !bl->is_mmap)
 		return NULL;
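[Editor's note: the xarray API being leaned on here handles sparse indices
natively, which is what makes the fixed lower-BGID array unnecessary. A minimal
sketch of the lookup/store/erase pattern, simplified from the code above with
illustrative names:

	#include <linux/xarray.h>

	static DEFINE_XARRAY(example_xa);	/* illustrative standalone xarray */

	static int example_store(unsigned int bgid, void *entry)
	{
		/* xa_store() returns the old entry, or an xa_err()-encoded pointer */
		return xa_err(xa_store(&example_xa, bgid, entry, GFP_KERNEL));
	}

	static void *example_lookup(unsigned int bgid)
	{
		return xa_load(&example_xa, bgid);	/* NULL if not present */
	}

	static void example_remove(unsigned int bgid)
	{
		xa_erase(&example_xa, bgid);
	}
]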
From patchwork Thu Mar 28 23:31:33 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609742

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 06/11] io_uring/kbuf: get rid of bl->is_ready
Date: Thu, 28 Mar 2024 17:31:33 -0600
Message-ID: <20240328233443.797828-7-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

Now that the xarray is being used exclusively for the buffer_list
lookup, this check is no longer needed. Get rid of it and the is_ready
member.

Signed-off-by: Jens Axboe
---
 io_uring/kbuf.c | 8 --------
 io_uring/kbuf.h | 2 --
 2 files changed, 10 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 8bf0121f00af..011280d873e7 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -61,7 +61,6 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
 	 * always under the ->uring_lock, but the RCU lookup from mmap does.
 	 */
 	bl->bgid = bgid;
-	smp_store_release(&bl->is_ready, 1);
 	return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
 }
 
@@ -721,13 +720,6 @@ void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid)
 	if (!bl || !bl->is_mmap)
 		return NULL;
 
-	/*
-	 * Ensure the list is fully setup. Only strictly needed for RCU lookup
-	 * via mmap, and in that case only for the array indexed groups. For
-	 * the xarray lookups, it's either visible and ready, or not at all.
-	 */
-	if (!smp_load_acquire(&bl->is_ready))
-		return NULL;
-
 	return bl->buf_ring;
 }
 
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 1c7b654ee726..fdbb10449513 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -29,8 +29,6 @@ struct io_buffer_list {
 	__u8 is_buf_ring;
 	/* ring mapped provided buffers, but mmap'ed by application */
 	__u8 is_mmap;
-	/* bl is visible from an RCU point of view for lookup */
-	__u8 is_ready;
 };
 
 struct io_buffer {
From patchwork Thu Mar 28 23:31:34 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13609743

From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: hannes@cmpxchg.org, Jens Axboe
Subject: [PATCH 07/11] io_uring/kbuf: vmap pinned buffer ring
Date: Thu, 28 Mar 2024 17:31:34 -0600
Message-ID: <20240328233443.797828-8-axboe@kernel.dk>
In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk>
References: <20240328233443.797828-1-axboe@kernel.dk>

This avoids needing to care about HIGHMEM, and it makes the buffer
indexing easier as both ring provided buffer methods are now virtually
mapped in a contiguous fashion.

Signed-off-by: Jens Axboe
---
 io_uring/kbuf.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 011280d873e7..72c15dde34d3 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -145,15 +146,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 		req->flags |= REQ_F_BL_EMPTY;
 
 	head &= bl->mask;
-	/* mmaped buffers are always contig */
-	if (bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) {
-		buf = &br->bufs[head];
-	} else {
-		int off = head & (IO_BUFFER_LIST_BUF_PER_PAGE - 1);
-		int index = head / IO_BUFFER_LIST_BUF_PER_PAGE;
-		buf = page_address(bl->buf_pages[index]);
-		buf += off;
-	}
+	buf = &br->bufs[head];
 	if (*len == 0 || *len > buf->len)
 		*len = buf->len;
 	req->flags |= REQ_F_BUFFER_RING;
@@ -240,6 +233,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
 		for (j = 0; j < bl->buf_nr_pages; j++)
 			unpin_user_page(bl->buf_pages[j]);
 		kvfree(bl->buf_pages);
+		vunmap(bl->buf_ring);
 		bl->buf_pages = NULL;
 		bl->buf_nr_pages = 0;
 	}
@@ -490,9 +484,9 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
 static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
 			    struct io_buffer_list *bl)
 {
-	struct io_uring_buf_ring *br;
+	struct io_uring_buf_ring *br = NULL;
+	int nr_pages, ret, i;
 	struct page **pages;
-	int i, nr_pages;
 
 	pages = io_pin_pages(reg->ring_addr,
 			     flex_array_size(br, bufs, reg->ring_entries),
@@ -500,18 +494,12 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
-	/*
-	 * Apparently some 32-bit boxes (ARM) will return highmem pages,
-	 * which then need to be mapped. We could support that, but it'd
-	 * complicate the code and slowdown the common cases quite a bit.
-	 * So just error out, returning -EINVAL just like we did on kernels
-	 * that didn't support mapped buffer rings.
-	 */
-	for (i = 0; i < nr_pages; i++)
-		if (PageHighMem(pages[i]))
-			goto error_unpin;
+	br = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
+	if (!br) {
+		ret = -ENOMEM;
+		goto error_unpin;
+	}
 
-	br = page_address(pages[0]);
 #ifdef SHM_COLOUR
 	/*
 	 * On platforms that have specific aliasing requirements, SHM_COLOUR
@@ -522,8 +510,10 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
 	 * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
 	 * this transparently.
 	 */
-	if ((reg->ring_addr | (unsigned long) br) & (SHM_COLOUR - 1))
+	if ((reg->ring_addr | (unsigned long) br) & (SHM_COLOUR - 1)) {
+		ret = -EINVAL;
 		goto error_unpin;
+	}
 #endif
 	bl->buf_pages = pages;
 	bl->buf_nr_pages = nr_pages;
@@ -535,7 +525,8 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
 	for (i = 0; i < nr_pages; i++)
 		unpin_user_page(pages[i]);
 	kvfree(pages);
-	return -EINVAL;
+	vunmap(br);
+	return ret;
 }
b=teIZ5hNLpjWL4f2YYNHiFcN1bHOtAivqdcs11h8WICOgDuAm08cZlHHBv2EQH1xF4A 48/6RwBNjoOqLOq/tjBZntr/sd5aQruAFLQbZMX4bU9qikozIxkKfmmifiDySxjoQ4mb pg03reP5RUTG/aCLsEx9tpvwRfETzgVFNTVXBDFhSfHAdbKbEWfq/gK971wWvOXilAN5 KUTmD4bWoSV+V+mfA/q1lShTEFLKfffzvzexqqqarQJKVEema9SzVfvbg8sp68hoKJ3h 3hS/yA4rGMm8LVMXFaapxcC1jlTuNogWSOsyIerH9qF3qbgB45b9TDm7B9A8ZV6Qery8 2xug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711668902; x=1712273702; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dRbXBnmBssLCefthvGjqIke1i8J8DEDsrgn5OvhdPnw=; b=OXu/MZi1MF4SBan18DcVOyxL4kQDrvc1y8j+pK44DyCbs7qHQQh8oBbA+Q832ve+VN 8EDoaua4T8K7aDfc6QRjL7Qk/Ye1aOT2fUaVWYrI+9oi+5peGCy6WMXhL3pcjmRLyxfK XB1mEbGo7AL7TAWenL0SihkOSFRTfPWiJqIpimbPrGrpwr+dyf+1tEjhXbIuZfxhFrAG d47N/3XsQBSAjglqhAHjgDxkiUvhpX/bt9i5JHSGwsOGKu+EVmvDdA8llnjnKyYSGt5N z2DYLvOLWfpKoxFixoi2el9I32jnYt4VJrtgXGCJK+wKsXzHUTFHjgfsKufBbVEdQqAB PRtg== X-Gm-Message-State: AOJu0Yxp+5+zPm2R72TCnvpradGxXIqj+NSpx5iSxdakj5fWY1nBJZhA 1LrOfCr6jZliesUyYeRkk83PR5wZlhya6BKYnW2dYr3CSShOy41fAug4xS6qTfsCG3bwjibSLPu x X-Google-Smtp-Source: AGHT+IGnkir1bN5136ABMvARK0KWrUS1v8cnv/2ciEs2ZgD62XmYdr831W5IH0VHpvUGUB6Jfiipeg== X-Received: by 2002:a17:902:c401:b0:1dd:e128:16b1 with SMTP id k1-20020a170902c40100b001dde12816b1mr933529plk.6.1711668902540; Thu, 28 Mar 2024 16:35:02 -0700 (PDT) Received: from localhost.localdomain ([50.234.116.5]) by smtp.gmail.com with ESMTPSA id i6-20020a170902c94600b001e0b3c9fe60sm2216981pla.46.2024.03.28.16.35.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Mar 2024 16:35:00 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org Cc: hannes@cmpxchg.org, Jens Axboe Subject: [PATCH 08/11] io_uring/kbuf: protect io_buffer_list teardown with a reference Date: Thu, 28 Mar 2024 17:31:35 -0600 Message-ID: <20240328233443.797828-9-axboe@kernel.dk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk> References: <20240328233443.797828-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 No functional changes in this patch, just in preparation for being able to keep the buffer list alive outside of the ctx->uring_lock. Signed-off-by: Jens Axboe --- io_uring/kbuf.c | 15 +++++++++++---- io_uring/kbuf.h | 2 ++ 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 72c15dde34d3..206f4d352e15 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -62,6 +62,7 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx, * always under the ->uring_lock, but the RCU lookup from mmap does. 
*/ bl->bgid = bgid; + atomic_set(&bl->refs, 1); return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); } @@ -259,6 +260,14 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, return i; } +static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + if (atomic_dec_and_test(&bl->refs)) { + __io_remove_buffers(ctx, bl, -1U); + kfree_rcu(bl, rcu); + } +} + void io_destroy_buffers(struct io_ring_ctx *ctx) { struct io_buffer_list *bl; @@ -268,8 +277,7 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) xa_for_each(&ctx->io_bl_xa, index, bl) { xa_erase(&ctx->io_bl_xa, bl->bgid); - __io_remove_buffers(ctx, bl, -1U); - kfree_rcu(bl, rcu); + io_put_bl(ctx, bl); } /* @@ -671,9 +679,8 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) if (!bl->is_buf_ring) return -EINVAL; - __io_remove_buffers(ctx, bl, -1U); xa_erase(&ctx->io_bl_xa, bl->bgid); - kfree_rcu(bl, rcu); + io_put_bl(ctx, bl); return 0; } diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index fdbb10449513..8b868a1744e2 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -25,6 +25,8 @@ struct io_buffer_list { __u16 head; __u16 mask; + atomic_t refs; + /* ring mapped provided buffers */ __u8 is_buf_ring; /* ring mapped provided buffers, but mmap'ed by application */ From patchwork Thu Mar 28 23:31:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13609745 Received: from mail-pj1-f45.google.com (mail-pj1-f45.google.com [209.85.216.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EDAE3225A8 for ; Thu, 28 Mar 2024 23:35:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.45 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668907; cv=none; b=CTeZmaFpoOxkcf1UYF2L4fYrMDQFyuVykb61QqOJJmfNhErmThcjwYOFulBmSeKU3ipIHjk/qiALGUVt49yYEc272LKSGJxA/SMzFM4Tin/RJeSMy9KCtbblmXGzJe6KigxmRmUu6wm14k0WimBVqz2ywgwCrQ+fHObAjL/Zxm8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668907; c=relaxed/simple; bh=4dhUHqRgwt8/Q+/qtau1Kl1V75DE5a2rKDjWkfi3+TQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CVacOTFDdP/o6u47/DahllIaQEkS6In+pQFXBOlLQ1Sfd/LRrZCI0vTEU9P7eVPHCtYTHxsM33Tq+d3qNnl6FYSj+RwZ9QLCkN2L0hyAYeJ/EXY/aqvbEvAYZj1jU2xAcVesww8MtWXzOfSplZqLqp+8yOEXo5VaCbWLG4zWhtQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=byRImGTk; arc=none smtp.client-ip=209.85.216.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="byRImGTk" Received: by mail-pj1-f45.google.com with SMTP id 98e67ed59e1d1-29df844539bso409624a91.1 for ; Thu, 28 Mar 2024 16:35:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1711668905; x=1712273705; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JOjDCEW/vDs3KlXARcdqkBEDx9X1E0V3AAxabhX8V4A=; b=byRImGTk8v/pPa/1DwIixH7loEUNb+t5wZMCOBgYhnWbGyUum1Ul3YqWO8tN7gc0zn Saax21V2n4flBe9MEXbaW8uVedbect2YjuUHMJ6g2iT7JsdMB2yxQWMO/fDKcPy08c2B L4NWemtBERnJiQABryH72BlvLk8Wye7Kat12DzpVSUIPhTE9aoC7WMHaTg6973yYbERF TiKVorLmWsFTSx6stMVM//X9GATZcB4GKiKzdlxgPHNmktShxynvlYIZDw9CcHEJcBtG Gwlr3KmKOBaB5HxsZbu0hocQXKyhZEezQ2aPAMGWEgfjAvruXgjLXwEfpTnWAxyrsVb4 C49A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711668905; x=1712273705; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JOjDCEW/vDs3KlXARcdqkBEDx9X1E0V3AAxabhX8V4A=; b=bHLP/4LcwOkpiWG904Wy3WXkfcD5cyshQT2o8iHzK23Vbx4DIbtNx64G3uMbme+Wi3 L4OZdxlBA7Jq9kdx2tpwEvmodzdCUyNGuq3kpNOAPZOytZ5mNXp6MjEMRIi+3dkxkwEI iHGxcMG5cCucvxAwbVhXvFijTj37pcq1gneWi05KtGo5iYkk6c1S6Y9XX7lpGZGqRfrZ OmoCd1VVpPc3MPCOfii3ROEEDWQlGyqxkkcc6K2snb0qZ4nOft0WxyExs57/40g9sjdD u5aNTxho6lFKXsQz4ujhNpnC9Qvf2DxGtR4+wVlNqPQGeSMc2gZNd07brtRRMuF+10wY BCmg== X-Gm-Message-State: AOJu0Yyge81UohBP2L83ORMZ1k+p/BghhQgIAC4U3Z+pCq3Fccy6slby FI2Qu7uBkVhmoh03e8CxS5fEia0HmuMRSJjlafvtDcDWBGrpypVGTsZUNhsxvzHGvzq3IE99hqe W X-Google-Smtp-Source: AGHT+IE7WhcLGr8+L7g8tCfz8G/abD1k6X/CtFIlhuXoiPrnO/BInBQjBV0wYY/KyAsEsotfY0T8QQ== X-Received: by 2002:a17:902:760b:b0:1dc:df03:ad86 with SMTP id k11-20020a170902760b00b001dcdf03ad86mr943507pll.2.1711668904615; Thu, 28 Mar 2024 16:35:04 -0700 (PDT) Received: from localhost.localdomain ([50.234.116.5]) by smtp.gmail.com with ESMTPSA id i6-20020a170902c94600b001e0b3c9fe60sm2216981pla.46.2024.03.28.16.35.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Mar 2024 16:35:03 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org Cc: hannes@cmpxchg.org, Jens Axboe Subject: [PATCH 09/11] io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring Date: Thu, 28 Mar 2024 17:31:36 -0600 Message-ID: <20240328233443.797828-10-axboe@kernel.dk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk> References: <20240328233443.797828-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_pages() and have it Just Work. This requires a bit of effort on the mmap lookup side, as the ctx uring_lock isn't held, which otherwise protects buffer_lists from being torn down; and it's not safe to grab it from mmap context, as that would introduce an ABBA deadlock between the mmap lock and the ctx uring_lock. Instead, look up the buffer_list under RCU, as the list is RCU freed already. Use the existing reference count to determine whether it's possible to safely grab a reference to it (i.e., if it's not already zero), and drop that reference when done with the mapping. If the mmap reference is the last one, the buffer_list and the associated memory can go away, since the vma insertion has references to the inserted pages at that point.
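To make the lookup rule concrete, it boils down to the following pattern; a minimal sketch assuming the field names used by this patch (ctx->io_bl_xa, bl->refs), not a verbatim copy of the kernel code:

	struct io_buffer_list *bl;
	bool got_ref = false;

	rcu_read_lock();
	/* the list is RCU freed, so the load is safe without uring_lock */
	bl = xa_load(&ctx->io_bl_xa, bgid);
	if (bl)
		/* refs already at zero means teardown started; don't revive */
		got_ref = atomic_inc_not_zero(&bl->refs);
	rcu_read_unlock();

	if (!got_ref)
		return ERR_PTR(-EINVAL);
	/* ... insert the pages into the vma, then io_put_bl(ctx, bl) ... */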
Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 3 - io_uring/io_uring.c | 69 +++++-------- io_uring/io_uring.h | 6 +- io_uring/kbuf.c | 171 +++++++++++---------------------- io_uring/kbuf.h | 7 +- 5 files changed, 85 insertions(+), 171 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 8c64c303dee8..aeb4639785b5 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -372,9 +372,6 @@ struct io_ring_ctx { struct list_head io_buffers_cache; - /* deferred free list, protected by ->uring_lock */ - struct hlist_head io_buf_list; - /* Keep this last, we don't need it for the fast path */ struct wait_queue_head poll_wq; struct io_restriction restrictions; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 17db5b2aa4b5..83f63630365a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -302,7 +302,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->sqd_list); INIT_LIST_HEAD(&ctx->cq_overflow_list); INIT_LIST_HEAD(&ctx->io_buffers_cache); - INIT_HLIST_HEAD(&ctx->io_buf_list); ret = io_alloc_cache_init(&ctx->rsrc_node_cache, IO_NODE_ALLOC_CACHE_MAX, sizeof(struct io_rsrc_node)); ret |= io_alloc_cache_init(&ctx->apoll_cache, IO_POLL_ALLOC_CACHE_MAX, @@ -2592,12 +2591,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; } -static void io_pages_unmap(void *ptr, struct page ***pages, - unsigned short *npages) +void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, + bool put_pages) { bool do_vunmap = false; - if (*npages) { + if (put_pages && *npages) { struct page **to_free = *pages; int i; @@ -2619,14 +2618,6 @@ static void io_pages_unmap(void *ptr, struct page ***pages, *npages = 0; } -void io_mem_free(void *ptr) -{ - if (!ptr) - return; - - folio_put(virt_to_folio(ptr)); -} - static void io_pages_free(struct page ***pages, int npages) { struct page **page_array = *pages; @@ -2721,8 +2712,10 @@ static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr, static void io_rings_free(struct io_ring_ctx *ctx) { if (!(ctx->flags & IORING_SETUP_NO_MMAP)) { - io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages); - io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages); + io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages, + true); + io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages, + true); } else { io_pages_free(&ctx->ring_pages, ctx->n_ring_pages); ctx->n_ring_pages = 0; @@ -2779,8 +2772,8 @@ static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size, return ERR_PTR(-ENOMEM); } -static void *io_pages_map(struct page ***out_pages, unsigned short *npages, - size_t size) +void *io_pages_map(struct page ***out_pages, unsigned short *npages, + size_t size) { gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN; struct page **pages; @@ -2810,17 +2803,6 @@ static void *io_pages_map(struct page ***out_pages, unsigned short *npages, return ERR_PTR(-ENOMEM); } -void *io_mem_alloc(size_t size) -{ - gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; - void *ret; - - ret = (void *) __get_free_pages(gfp, get_order(size)); - if (ret) - return ret; - return ERR_PTR(-ENOMEM); -} - static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, unsigned int cq_entries, size_t *sq_offset) { @@ -2917,7 +2899,6 @@ static __cold void 
io_ring_ctx_free(struct io_ring_ctx *ctx) ctx->mm_account = NULL; } io_rings_free(ctx); - io_kbuf_mmap_list_free(ctx); percpu_ref_exit(&ctx->refs); free_uid(ctx->user); @@ -3387,10 +3368,8 @@ static void *io_uring_validate_mmap_request(struct file *file, { struct io_ring_ctx *ctx = file->private_data; loff_t offset = pgoff << PAGE_SHIFT; - struct page *page; - void *ptr; - switch (offset & IORING_OFF_MMAP_MASK) { + switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) { case IORING_OFF_SQ_RING: case IORING_OFF_CQ_RING: /* Don't allow mmap if the ring was setup without it */ @@ -3403,25 +3382,21 @@ static void *io_uring_validate_mmap_request(struct file *file, return ERR_PTR(-EINVAL); return ctx->sq_sqes; case IORING_OFF_PBUF_RING: { + struct io_buffer_list *bl; unsigned int bgid; + void *ret; bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; - rcu_read_lock(); - ptr = io_pbuf_get_address(ctx, bgid); - rcu_read_unlock(); - if (!ptr) - return ERR_PTR(-EINVAL); - break; + bl = io_pbuf_get_bl(ctx, bgid); + if (IS_ERR(bl)) + return bl; + ret = bl->buf_ring; + io_put_bl(ctx, bl); + return ret; } - default: - return ERR_PTR(-EINVAL); } - page = virt_to_head_page(ptr); - if (sz > page_size(page)) - return ERR_PTR(-EINVAL); - - return ptr; + return ERR_PTR(-EINVAL); } int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, @@ -3440,7 +3415,6 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) struct io_ring_ctx *ctx = file->private_data; size_t sz = vma->vm_end - vma->vm_start; long offset = vma->vm_pgoff << PAGE_SHIFT; - unsigned long pfn; void *ptr; ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz); @@ -3455,10 +3429,11 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) case IORING_OFF_SQES: return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages, ctx->n_sqe_pages); + case IORING_OFF_PBUF_RING: + return io_pbuf_mmap(file, vma); } - pfn = virt_to_phys(ptr) >> PAGE_SHIFT; - return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot); + return -EINVAL; } static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 75230d914007..dec996a1c789 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -109,8 +109,10 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); -void *io_mem_alloc(size_t size); -void io_mem_free(void *ptr); +void *io_pages_map(struct page ***out_pages, unsigned short *npages, + size_t size); +void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, + bool put_pages); enum { IO_EVENTFD_OP_SIGNAL_BIT, diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 206f4d352e15..99b349930a1a 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -32,25 +32,12 @@ struct io_provide_buf { __u16 bid; }; -struct io_buf_free { - struct hlist_node list; - void *mem; - size_t size; - int inuse; -}; - -static inline struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, - unsigned int bgid) -{ - return xa_load(&ctx->io_bl_xa, bgid); -} - static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx, unsigned int bgid) { lockdep_assert_held(&ctx->uring_lock); - return __io_buffer_get_list(ctx, bgid); + return xa_load(&ctx->io_bl_xa, bgid); } static int io_buffer_add_list(struct io_ring_ctx *ctx, @@ -191,24 +178,6 @@ void __user *io_buffer_select(struct io_kiocb *req, 
size_t *len, return ret; } -/* - * Mark the given mapped range as free for reuse - */ -static void io_kbuf_mark_free(struct io_ring_ctx *ctx, struct io_buffer_list *bl) -{ - struct io_buf_free *ibf; - - hlist_for_each_entry(ibf, &ctx->io_buf_list, list) { - if (bl->buf_ring == ibf->mem) { - ibf->inuse = 0; - return; - } - } - - /* can't happen... */ - WARN_ON_ONCE(1); -} - static int __io_remove_buffers(struct io_ring_ctx *ctx, struct io_buffer_list *bl, unsigned nbufs) { @@ -220,23 +189,18 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, if (bl->is_buf_ring) { i = bl->buf_ring->tail - bl->head; - if (bl->is_mmap) { - /* - * io_kbuf_list_free() will free the page(s) at - * ->release() time. - */ - io_kbuf_mark_free(ctx, bl); - bl->buf_ring = NULL; - bl->is_mmap = 0; - } else if (bl->buf_nr_pages) { + if (bl->buf_nr_pages) { int j; - for (j = 0; j < bl->buf_nr_pages; j++) - unpin_user_page(bl->buf_pages[j]); - kvfree(bl->buf_pages); - vunmap(bl->buf_ring); - bl->buf_pages = NULL; - bl->buf_nr_pages = 0; + for (j = 0; j < bl->buf_nr_pages; j++) { + if (bl->is_mmap) + put_page(bl->buf_pages[j]); + else + unpin_user_page(bl->buf_pages[j]); + } + io_pages_unmap(bl->buf_ring, &bl->buf_pages, + &bl->buf_nr_pages, false); + bl->is_mmap = 0; } /* make sure it's seen as empty */ INIT_LIST_HEAD(&bl->buf_list); @@ -260,7 +224,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx, return i; } -static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) { if (atomic_dec_and_test(&bl->refs)) { __io_remove_buffers(ctx, bl, -1U); @@ -537,63 +501,18 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg, return ret; } -/* - * See if we have a suitable region that we can reuse, rather than allocate - * both a new io_buf_free and mem region again. We leave it on the list as - * even a reused entry will need freeing at ring release. 
- */ -static struct io_buf_free *io_lookup_buf_free_entry(struct io_ring_ctx *ctx, - size_t ring_size) -{ - struct io_buf_free *ibf, *best = NULL; - size_t best_dist; - - hlist_for_each_entry(ibf, &ctx->io_buf_list, list) { - size_t dist; - - if (ibf->inuse || ibf->size < ring_size) - continue; - dist = ibf->size - ring_size; - if (!best || dist < best_dist) { - best = ibf; - if (!dist) - break; - best_dist = dist; - } - } - - return best; -} - static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx, struct io_uring_buf_reg *reg, struct io_buffer_list *bl) { - struct io_buf_free *ibf; size_t ring_size; - void *ptr; ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring); - /* Reuse existing entry, if we can */ - ibf = io_lookup_buf_free_entry(ctx, ring_size); - if (!ibf) { - ptr = io_mem_alloc(ring_size); - if (IS_ERR(ptr)) - return PTR_ERR(ptr); - - /* Allocate and store deferred free entry */ - ibf = kmalloc(sizeof(*ibf), GFP_KERNEL_ACCOUNT); - if (!ibf) { - io_mem_free(ptr); - return -ENOMEM; - } - ibf->mem = ptr; - ibf->size = ring_size; - hlist_add_head(&ibf->list, &ctx->io_buf_list); - } - ibf->inuse = 1; - bl->buf_ring = ibf->mem; + bl->buf_ring = io_pages_map(&bl->buf_pages, &bl->buf_nr_pages, ring_size); + if (IS_ERR(bl->buf_ring)) { + bl->buf_ring = NULL; + return -ENOMEM; + } + bl->is_buf_ring = 1; bl->is_mmap = 1; return 0; @@ -710,30 +629,50 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg) return 0; } -void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid) +struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, + unsigned long bgid) { struct io_buffer_list *bl; + int ret = 0; - bl = __io_buffer_get_list(ctx, bgid); - - if (!bl || !bl->is_mmap) - return NULL; - - return bl->buf_ring; + /* + * We have to be a bit careful here - we're inside mmap and cannot + * grab the uring_lock. This means the buffer_list could be + * simultaneously going away, if someone is trying to be sneaky. + * Look it up under rcu so we know it's not going away, and attempt + * to grab a reference to it. If the ref is already zero, then fail + * the mapping. If successful, we'll drop the reference at the end. + * This may then safely free the buffer_list (and drop the pages) at + * that point; vm_insert_pages() would've already grabbed the + * necessary vma references. + */ + rcu_read_lock(); + bl = xa_load(&ctx->io_bl_xa, bgid); + /* must be a mmap'able buffer ring and have pages */ + if (bl && bl->is_mmap && bl->buf_nr_pages) + ret = atomic_inc_not_zero(&bl->refs); + rcu_read_unlock(); + + if (!ret) + return ERR_PTR(-EINVAL); + + return bl; } -/* - * Called at or after ->release(), free the mmap'ed buffers that we used - * for memory mapped provided buffer rings.
- */ -void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx) +int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma) { - struct io_buf_free *ibf; - struct hlist_node *tmp; + struct io_ring_ctx *ctx = file->private_data; + loff_t pgoff = vma->vm_pgoff << PAGE_SHIFT; + struct io_buffer_list *bl; + int bgid, ret; - hlist_for_each_entry_safe(ibf, tmp, &ctx->io_buf_list, list) { - hlist_del(&ibf->list); - io_mem_free(ibf->mem); - kfree(ibf); - } + bgid = (pgoff & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; + + bl = io_pbuf_get_bl(ctx, bgid); + if (IS_ERR(bl)) + return PTR_ERR(bl); + + ret = io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages); + io_put_bl(ctx, bl); + return ret; } diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index 8b868a1744e2..53c141d9a8b2 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -55,13 +55,14 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg); int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg); int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg); -void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx); - void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags); bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags); -void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid); +void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl); +struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, + unsigned long bgid); +int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma); static inline bool io_kbuf_recycle_ring(struct io_kiocb *req) { From patchwork Thu Mar 28 23:31:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13609746 Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BAAD1311AC for ; Thu, 28 Mar 2024 23:35:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668909; cv=none; b=Bj2yYrk4es4NeZ6BwLed/dFQFv03H/j6VvvLxjL5ixWH8tDSqNEVlOxmM3aT0tx2txBa3HK9LH90TUaZ5jn/r1PO2Z6SjsHoOTn6g6BdXrlPnGm3iLBiGMIYYD++tKL6vr11y1EIDBVlEOg1KKGB5ll9xwOGMYTilnGrWqkPQs0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668909; c=relaxed/simple; bh=zMV6299kW5W5BslYzGX1FoPZfGMJUyscqS/rq0NYjuQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iAgo0n/R9Ko4NlQ7HG/TWSgfr4YZM0G4UHITeuRSqZ3x3Y26cvO/mOKbmE11wbf4Bja8NUV5vf48KIp9rFlA/EdgdUa7OotTM6adi7E/Zvw1o8wfqmCflJHzk0z+nV4c/MHYJelvakklEMBM0kchlhqrJ5ZJAhuouJ0nWdrjSpo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=oljRFRg6; arc=none smtp.client-ip=209.85.210.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com 
header.b="oljRFRg6" Received: by mail-pf1-f181.google.com with SMTP id d2e1a72fcca58-6ea895eaaadso292263b3a.1 for ; Thu, 28 Mar 2024 16:35:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1711668907; x=1712273707; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NkC1wgxtZXdI83W5YRayAah5RZJ0rineL5BY43A7LG0=; b=oljRFRg6jCXuCQ/cODwP3i6CYX5rfA3eA0TR6+RXoz0/XI8C2OM7RscFBa908gPK/3 qLY7NyIFPsZkPvZaDyF6y7TmONbKYqen78RNFytyV+QZ8chvyHGmBUit7Ec8IAWuaUX8 El38zLtVqg4C0WJ5kaSjpVA8DVcpq+ul5JTsW+UlltbkiwbmzYoJlXPHx8ygexPADajP elq8xpVBN5jjgNXjkqAhw+vHY9tbS7pCF2bAirEvWul7EtFHakHXW8JaJtKTR2Q+IJRl UB8wCtxKe7+AUv829ZFN5y+kGC+FBpYWV0QTEkWVC6KU8Qchr24NnD4TcIjfMXw7L/Oa 7N/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711668907; x=1712273707; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NkC1wgxtZXdI83W5YRayAah5RZJ0rineL5BY43A7LG0=; b=Ft2ds+CEGJnoAbyIIzskyNORr+OqkQr3HVd7BMy3FAylMYuu42dZu/jNI8zhZGpRzK uWB7p4fnSvytFtCbGYejtioUr/VIVYDQMSv3MctY3YCWhc8fSbvC2GJSCREWrL+Z6Udp UB5Icb5PCcQZnsIkJTaAOcFkS5jhylUdtcrUoHKQjeCfteOKk4SVOZvCXb5fI9mStugA UTOIm4Ppco2gV+VnT43TZb49l9V0cr9hhBjAY9MY0DgfoMEvFlUaHkPcP2WV+N51IoO2 UgGS8Jupk0H+sdzSXEHWTisC4FknOsLNtHg9ecTQg5qgoXq7RAmtlgUKaM8kiAmdRgDl 8uSg== X-Gm-Message-State: AOJu0YwXcAXmFtZGnY5irS29sn5yeVgg+FsWzd39HA6sA34Z08HmRbD6 0Qh07n37TCM/sKLPy6Y/VhEwRg0RnZvTnMrs16h3h8sLHjViNd71LJezD48b9hrVjIyHbb/AMlL 4 X-Google-Smtp-Source: AGHT+IHV7qFGvLhFDkfJInmeimD1Xk7KxxufcP/2kLUppZVO+LZtyHJaJ5TzDRwZ3bKaijCKPTJSQA== X-Received: by 2002:a05:6a21:a593:b0:1a3:c61f:c2d5 with SMTP id gd19-20020a056a21a59300b001a3c61fc2d5mr594044pzc.6.1711668906764; Thu, 28 Mar 2024 16:35:06 -0700 (PDT) Received: from localhost.localdomain ([50.234.116.5]) by smtp.gmail.com with ESMTPSA id i6-20020a170902c94600b001e0b3c9fe60sm2216981pla.46.2024.03.28.16.35.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Mar 2024 16:35:05 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org Cc: hannes@cmpxchg.org, Jens Axboe Subject: [PATCH 10/11] io_uring: use unpin_user_pages() where appropriate Date: Thu, 28 Mar 2024 17:31:37 -0600 Message-ID: <20240328233443.797828-11-axboe@kernel.dk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk> References: <20240328233443.797828-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 There are a few cases of open-rolled loops around unpin_user_page(), use the generic helper instead. 
Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 4 +--- io_uring/kbuf.c | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 83f63630365a..00b98e80f8ca 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2621,13 +2621,11 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, static void io_pages_free(struct page ***pages, int npages) { struct page **page_array = *pages; - int i; if (!page_array) return; - for (i = 0; i < npages; i++) - unpin_user_page(page_array[i]); + unpin_user_pages(page_array, npages); kvfree(page_array); *pages = NULL; } diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 99b349930a1a..3ba576ccb1d9 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -457,8 +457,8 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg, struct io_buffer_list *bl) { struct io_uring_buf_ring *br = NULL; - int nr_pages, ret, i; struct page **pages; + int nr_pages, ret; pages = io_pin_pages(reg->ring_addr, flex_array_size(br, bufs, reg->ring_entries), @@ -494,8 +494,7 @@ static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg, bl->is_mmap = 0; return 0; error_unpin: - for (i = 0; i < nr_pages; i++) - unpin_user_page(pages[i]); + unpin_user_pages(pages, nr_pages); kvfree(pages); vunmap(br); return ret; From patchwork Thu Mar 28 23:31:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13609747 Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91B5E225A8 for ; Thu, 28 Mar 2024 23:35:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668913; cv=none; b=R8VWC5t6ohmKF97D6BSkiWhiN76DwginrucvhILaEJy19UIIF8l5kCepCAjy5i0RyBgLFLxJKEoLSvGrEb31UM/8X07k2yYlBfud9mgVWT3geyLeY8xuCAadsztDBpspvgSOPqTzKq7uaVLV3DGLynI6PBvGfCwuQuqYyngVvus= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711668913; c=relaxed/simple; bh=O+kS4JDASOdO5CqxGTE/lFx80kvhI11pQnz/ewuMMTg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=K/IBWhgYLsef2L7Gxu1sNCTJVBUiDwIetv0GVjIn9fA6wbXkVLYbl2hJRaYCaRiuC+ptjRtavtHn3EX7hqWwVIo6RjdXmfJX+B0q7Z3J2/qVNagn4iA1uAUkrjaHFnMpb1+Pjajf7HzuiwiSiLwlnG6g4SThcYvDr7KQoIsHjv4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=WcnfDlJo; arc=none smtp.client-ip=209.85.216.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="WcnfDlJo" Received: by mail-pj1-f54.google.com with SMTP id 98e67ed59e1d1-29df844539bso409635a91.1 for ; Thu, 28 Mar 2024 16:35:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1711668909; x=1712273709; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0qg3i6J4NScNr25m5MiWnkmUy5W5kQx3QdqVHYNf+yM=; b=WcnfDlJoRbM1q+QvS/A54e9HAzspyxAR4che/c6cSMvM29KKPEW6Q+wXVxcaKpf7Sy yeF/lfxdt0G7VE+zAxo3zG6JBDfw4p0QzITYsG1GTaxBkVPd2hNGrWuW1maBkR4Eh9JY cBCe8C6Jz/c2zD5Nl4aMT1gwX36KYpPUN5xB+lZES2euk00A/WOnf0kTlojQZBKIYcuE B/43UtlbpF0wqRfMbFm4g3WTK0lTKUeOTslKe+e/Zvy0Tj3Vy/aSwJy0xZu6GyaJ4m2l heA7/p75wo9Xs5ByATgzbBhldY/v/HKv0Z/z5ZFYMcVt/WM4Ppd9xlk69zFoCyCyoV/m 3vmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711668909; x=1712273709; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0qg3i6J4NScNr25m5MiWnkmUy5W5kQx3QdqVHYNf+yM=; b=IH+OS3GzCoFPR+0++/8ZziwhuKdz3CrclIjqBWSQdvq6uDeHXKg6fP+Bt3gRI3GYG1 8yFugICz8H5YfaJm3OTw0bix0e90jpkgXa/d09lC++wgcSSfQf1EYQy/53UcMefbJ9nZ KXy3kD3qiZ+n6QKcgNtTUUCfKtIGR172gz+DYH7cHqkWUPPhsysg0Wto4ErwFrQxRQ3C FbxaZBdRaW7zjyIgIf1EhnvofeAkh0/WZGN+i+gI/CRc3X6LVaR0+dGqbvopG7n765X3 5eBCBq0D+fc1drQj0uC8b/jGnWgw1Q+8LZYyATi0kgCnLmzLH6gP8yWMhBg8xXvajjRB PZkA== X-Gm-Message-State: AOJu0YxC5nYkr8S7bJbLkBa+aWYaa7uxcXQuNFKEvSjwF6ah3qsQmVRi af5pGiqW2hYbzJue6eZ7wNzZgzNszxyUNd0UC5kdt94TJ8ijR7Zrn5rEileXL6/Lt7HF2cuLrX5 2 X-Google-Smtp-Source: AGHT+IHN8UgscSzWsLEG4Nvkd7sBZneZ8BoeCk4IYSS0kOTkknQZOWxXAp8abv5DaViRVSXpKY6YWg== X-Received: by 2002:a17:902:aa83:b0:1dd:de68:46cf with SMTP id d3-20020a170902aa8300b001ddde6846cfmr895713plr.6.1711668908651; Thu, 28 Mar 2024 16:35:08 -0700 (PDT) Received: from localhost.localdomain ([50.234.116.5]) by smtp.gmail.com with ESMTPSA id i6-20020a170902c94600b001e0b3c9fe60sm2216981pla.46.2024.03.28.16.35.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Mar 2024 16:35:07 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org Cc: hannes@cmpxchg.org, Jens Axboe Subject: [PATCH 11/11] io_uring: move mapping/allocation helpers to a separate file Date: Thu, 28 Mar 2024 17:31:38 -0600 Message-ID: <20240328233443.797828-12-axboe@kernel.dk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240328233443.797828-1-axboe@kernel.dk> References: <20240328233443.797828-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Move the related code from io_uring.c into memmap.c. No functional changes in this patch, just cleaning it up a bit now that the full transition is done. 
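Although this patch is purely a code move, the API it consolidates is worth a quick illustration; a hedged usage sketch of the io_pages_map()/io_pages_unmap() pairing now exported from memmap.h ('size' is a stand-in for whatever the caller needs, and error handling is abbreviated):

	struct page **pages;
	unsigned short npages;
	void *ring;

	/* allocate backing pages and return a virtually contiguous mapping */
	ring = io_pages_map(&pages, &npages, size);
	if (IS_ERR(ring))
		return PTR_ERR(ring);

	/* ... expose 'pages' to userspace via io_uring_mmap_pages() ... */

	/* teardown; put_pages == true also drops the page references */
	io_pages_unmap(ring, &pages, &npages, true);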
Signed-off-by: Jens Axboe --- io_uring/Makefile | 3 +- io_uring/io_uring.c | 324 +----------------------------------------- io_uring/io_uring.h | 9 -- io_uring/kbuf.c | 1 + io_uring/memmap.c | 333 ++++++++++++++++++++++++++++++++++++++++++++ io_uring/memmap.h | 25 ++++ io_uring/rsrc.c | 1 + 7 files changed, 364 insertions(+), 332 deletions(-) create mode 100644 io_uring/memmap.c create mode 100644 io_uring/memmap.h diff --git a/io_uring/Makefile b/io_uring/Makefile index bd7c692a6a7c..fc1b23c524e8 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -8,7 +8,8 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \ xattr.o nop.o fs.o splice.o sync.o \ msg_ring.o advise.o openclose.o \ epoll.o statx.o timeout.o fdinfo.o \ - cancel.o waitid.o register.o truncate.o + cancel.o waitid.o register.o \ + truncate.o memmap.o obj-$(CONFIG_IO_WQ) += io-wq.o obj-$(CONFIG_FUTEX) += futex.o obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 00b98e80f8ca..fddaefb9cbff 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -95,6 +95,7 @@ #include "futex.h" #include "napi.h" #include "uring_cmd.h" +#include "memmap.h" #include "timeout.h" #include "poll.h" @@ -2591,108 +2592,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; } -void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, - bool put_pages) -{ - bool do_vunmap = false; - - if (put_pages && *npages) { - struct page **to_free = *pages; - int i; - - /* - * Only did vmap for the non-compound multiple page case. - * For the compound page, we just need to put the head. - */ - if (PageCompound(to_free[0])) - *npages = 1; - else if (*npages > 1) - do_vunmap = true; - for (i = 0; i < *npages; i++) - put_page(to_free[i]); - } - if (do_vunmap) - vunmap(ptr); - kvfree(*pages); - *pages = NULL; - *npages = 0; -} - -static void io_pages_free(struct page ***pages, int npages) -{ - struct page **page_array = *pages; - - if (!page_array) - return; - - unpin_user_pages(page_array, npages); - kvfree(page_array); - *pages = NULL; -} - -struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages) -{ - unsigned long start, end, nr_pages; - struct page **pages; - int ret; - - end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; - start = uaddr >> PAGE_SHIFT; - nr_pages = end - start; - if (WARN_ON_ONCE(!nr_pages)) - return ERR_PTR(-EINVAL); - - pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL); - if (!pages) - return ERR_PTR(-ENOMEM); - - ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM, - pages); - /* success, mapped all pages */ - if (ret == nr_pages) { - *npages = nr_pages; - return pages; - } - - /* partial map, or didn't map anything */ - if (ret >= 0) { - /* if we did partial map, release any pages we did get */ - if (ret) - unpin_user_pages(pages, ret); - ret = -EFAULT; - } - kvfree(pages); - return ERR_PTR(ret); -} - -static void *__io_uaddr_map(struct page ***pages, unsigned short *npages, - unsigned long uaddr, size_t size) -{ - struct page **page_array; - unsigned int nr_pages; - void *page_addr; - - *npages = 0; - - if (uaddr & (PAGE_SIZE - 1) || !size) - return ERR_PTR(-EINVAL); - - nr_pages = 0; - page_array = io_pin_pages(uaddr, size, &nr_pages); - if (IS_ERR(page_array)) - return page_array; - - page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL); - if (page_addr) { - *pages = page_array; - *npages = 
nr_pages; - return page_addr; - } - - io_pages_free(&page_array, nr_pages); - return ERR_PTR(-ENOMEM); -} - static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr, size_t size) { @@ -2727,80 +2626,6 @@ static void io_rings_free(struct io_ring_ctx *ctx) ctx->sq_sqes = NULL; } -static void *io_mem_alloc_compound(struct page **pages, int nr_pages, - size_t size, gfp_t gfp) -{ - struct page *page; - int i, order; - - order = get_order(size); - if (order > MAX_PAGE_ORDER) - return NULL; - else if (order) - gfp |= __GFP_COMP; - - page = alloc_pages(gfp, order); - if (!page) - return NULL; - - for (i = 0; i < nr_pages; i++) - pages[i] = page + i; - - return page_address(page); -} - -static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size, - gfp_t gfp) -{ - void *ret; - int i; - - for (i = 0; i < nr_pages; i++) { - pages[i] = alloc_page(gfp); - if (!pages[i]) - goto err; - } - - ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); - if (ret) - return ret; -err: - while (i--) - put_page(pages[i]); - return ERR_PTR(-ENOMEM); -} - -void *io_pages_map(struct page ***out_pages, unsigned short *npages, - size_t size) -{ - gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN; - struct page **pages; - int nr_pages; - void *ret; - - nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; - pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp); - if (!pages) - return ERR_PTR(-ENOMEM); - - ret = io_mem_alloc_compound(pages, nr_pages, size, gfp); - if (ret) - goto done; - - ret = io_mem_alloc_single(pages, nr_pages, size, gfp); - if (ret) { -done: - *out_pages = pages; - *npages = nr_pages; - return ret; - } - - kvfree(pages); - *out_pages = NULL; - *npages = 0; - return ERR_PTR(-ENOMEM); -} - static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, unsigned int cq_entries, size_t *sq_offset) { @@ -3361,149 +3186,6 @@ void __io_uring_cancel(bool cancel_all) io_uring_cancel_generic(cancel_all, NULL); } -static void *io_uring_validate_mmap_request(struct file *file, - loff_t pgoff, size_t sz) -{ - struct io_ring_ctx *ctx = file->private_data; - loff_t offset = pgoff << PAGE_SHIFT; - - switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) { - case IORING_OFF_SQ_RING: - case IORING_OFF_CQ_RING: - /* Don't allow mmap if the ring was setup without it */ - if (ctx->flags & IORING_SETUP_NO_MMAP) - return ERR_PTR(-EINVAL); - return ctx->rings; - case IORING_OFF_SQES: - /* Don't allow mmap if the ring was setup without it */ - if (ctx->flags & IORING_SETUP_NO_MMAP) - return ERR_PTR(-EINVAL); - return ctx->sq_sqes; - case IORING_OFF_PBUF_RING: { - struct io_buffer_list *bl; - unsigned int bgid; - void *ret; - - bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; - bl = io_pbuf_get_bl(ctx, bgid); - if (IS_ERR(bl)) - return bl; - ret = bl->buf_ring; - io_put_bl(ctx, bl); - return ret; - } - } - - return ERR_PTR(-EINVAL); -} - -int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, - struct page **pages, int npages) -{ - unsigned long nr_pages = npages; - - vm_flags_set(vma, VM_DONTEXPAND); - return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); -} - -#ifdef CONFIG_MMU - -static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) -{ - struct io_ring_ctx *ctx = file->private_data; - size_t sz = vma->vm_end - vma->vm_start; - long offset = vma->vm_pgoff << PAGE_SHIFT; - void *ptr; - - ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz); - if (IS_ERR(ptr)) - return PTR_ERR(ptr); - - 
switch (offset & IORING_OFF_MMAP_MASK) { - case IORING_OFF_SQ_RING: - case IORING_OFF_CQ_RING: - return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, - ctx->n_ring_pages); - case IORING_OFF_SQES: - return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages, - ctx->n_sqe_pages); - case IORING_OFF_PBUF_RING: - return io_pbuf_mmap(file, vma); - } - - return -EINVAL; -} - -static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp, - unsigned long addr, unsigned long len, - unsigned long pgoff, unsigned long flags) -{ - void *ptr; - - /* - * Do not allow to map to user-provided address to avoid breaking the - * aliasing rules. Userspace is not able to guess the offset address of - * kernel kmalloc()ed memory area. - */ - if (addr) - return -EINVAL; - - ptr = io_uring_validate_mmap_request(filp, pgoff, len); - if (IS_ERR(ptr)) - return -ENOMEM; - - /* - * Some architectures have strong cache aliasing requirements. - * For such architectures we need a coherent mapping which aliases - * kernel memory *and* userspace memory. To achieve that: - * - use a NULL file pointer to reference physical memory, and - * - use the kernel virtual address of the shared io_uring context - * (instead of the userspace-provided address, which has to be 0UL - * anyway). - * - use the same pgoff which the get_unmapped_area() uses to - * calculate the page colouring. - * For architectures without such aliasing requirements, the - * architecture will return any suitable mapping because addr is 0. - */ - filp = NULL; - flags |= MAP_SHARED; - pgoff = 0; /* has been translated to ptr above */ -#ifdef SHM_COLOUR - addr = (uintptr_t) ptr; - pgoff = addr >> PAGE_SHIFT; -#else - addr = 0UL; -#endif - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags); -} - -#else /* !CONFIG_MMU */ - -static int io_uring_mmap(struct file *file, struct vm_area_struct *vma) -{ - return is_nommu_shared_mapping(vma->vm_flags) ? 
0 : -EINVAL; -} - -static unsigned int io_uring_nommu_mmap_capabilities(struct file *file) -{ - return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE; -} - -static unsigned long io_uring_nommu_get_unmapped_area(struct file *file, - unsigned long addr, unsigned long len, - unsigned long pgoff, unsigned long flags) -{ - void *ptr; - - ptr = io_uring_validate_mmap_request(file, pgoff, len); - if (IS_ERR(ptr)) - return PTR_ERR(ptr); - - return (unsigned long) ptr; -} - -#endif /* !CONFIG_MMU */ - static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz) { if (flags & IORING_ENTER_EXT_ARG) { @@ -3686,11 +3368,9 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, static const struct file_operations io_uring_fops = { .release = io_uring_release, .mmap = io_uring_mmap, + .get_unmapped_area = io_uring_get_unmapped_area, #ifndef CONFIG_MMU - .get_unmapped_area = io_uring_nommu_get_unmapped_area, .mmap_capabilities = io_uring_nommu_mmap_capabilities, -#else - .get_unmapped_area = io_uring_mmu_get_unmapped_area, #endif .poll = io_uring_poll, #ifdef CONFIG_PROC_FS diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index dec996a1c789..1eb65324792a 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -69,10 +69,6 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags); void __io_commit_cqring_flush(struct io_ring_ctx *ctx); -struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); -int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, - struct page **pages, int npages); - struct file *io_file_get_normal(struct io_kiocb *req, int fd); struct file *io_file_get_fixed(struct io_kiocb *req, int fd, unsigned issue_flags); @@ -109,11 +105,6 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); -void *io_pages_map(struct page ***out_pages, unsigned short *npages, - size_t size); -void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, - bool put_pages); - enum { IO_EVENTFD_OP_SIGNAL_BIT, IO_EVENTFD_OP_FREE_BIT, diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 3ba576ccb1d9..96dd8d05c754 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -15,6 +15,7 @@ #include "io_uring.h" #include "opdef.h" #include "kbuf.h" +#include "memmap.h" #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf)) diff --git a/io_uring/memmap.c b/io_uring/memmap.c new file mode 100644 index 000000000000..acf5e8ca6b28 --- /dev/null +++ b/io_uring/memmap.c @@ -0,0 +1,333 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "memmap.h" +#include "kbuf.h" + +static void *io_mem_alloc_compound(struct page **pages, int nr_pages, + size_t size, gfp_t gfp) +{ + struct page *page; + int i, order; + + order = get_order(size); + if (order > MAX_PAGE_ORDER) + return NULL; + else if (order) + gfp |= __GFP_COMP; + + page = alloc_pages(gfp, order); + if (!page) + return NULL; + + for (i = 0; i < nr_pages; i++) + pages[i] = page + i; + + return page_address(page); +} + +static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size, + gfp_t gfp) +{ + void *ret; + int i; + + for (i = 0; i < nr_pages; i++) { + pages[i] = alloc_page(gfp); + if (!pages[i]) + goto err; + } + + ret = vmap(pages, nr_pages, VM_MAP, 
PAGE_KERNEL); + if (ret) + return ret; +err: + while (i--) + put_page(pages[i]); + return ERR_PTR(-ENOMEM); +} + +void *io_pages_map(struct page ***out_pages, unsigned short *npages, + size_t size) +{ + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN; + struct page **pages; + int nr_pages; + void *ret; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp); + if (!pages) + return ERR_PTR(-ENOMEM); + + ret = io_mem_alloc_compound(pages, nr_pages, size, gfp); + if (ret) + goto done; + + ret = io_mem_alloc_single(pages, nr_pages, size, gfp); + if (ret) { +done: + *out_pages = pages; + *npages = nr_pages; + return ret; + } + + kvfree(pages); + *out_pages = NULL; + *npages = 0; + return ERR_PTR(-ENOMEM); +} + +void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, + bool put_pages) +{ + bool do_vunmap = false; + + if (put_pages && *npages) { + struct page **to_free = *pages; + int i; + + /* + * Only did vmap for the non-compound multiple page case. + * For the compound page, we just need to put the head. + */ + if (PageCompound(to_free[0])) + *npages = 1; + else if (*npages > 1) + do_vunmap = true; + for (i = 0; i < *npages; i++) + put_page(to_free[i]); + } + if (do_vunmap) + vunmap(ptr); + kvfree(*pages); + *pages = NULL; + *npages = 0; +} + +void io_pages_free(struct page ***pages, int npages) +{ + struct page **page_array = *pages; + + if (!page_array) + return; + + unpin_user_pages(page_array, npages); + kvfree(page_array); + *pages = NULL; +} + +struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages) +{ + unsigned long start, end, nr_pages; + struct page **pages; + int ret; + + end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = uaddr >> PAGE_SHIFT; + nr_pages = end - start; + if (WARN_ON_ONCE(!nr_pages)) + return ERR_PTR(-EINVAL); + + pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return ERR_PTR(-ENOMEM); + + ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM, + pages); + /* success, mapped all pages */ + if (ret == nr_pages) { + *npages = nr_pages; + return pages; + } + + /* partial map, or didn't map anything */ + if (ret >= 0) { + /* if we did partial map, release any pages we did get */ + if (ret) + unpin_user_pages(pages, ret); + ret = -EFAULT; + } + kvfree(pages); + return ERR_PTR(ret); +} + +void *__io_uaddr_map(struct page ***pages, unsigned short *npages, + unsigned long uaddr, size_t size) +{ + struct page **page_array; + unsigned int nr_pages; + void *page_addr; + + *npages = 0; + + if (uaddr & (PAGE_SIZE - 1) || !size) + return ERR_PTR(-EINVAL); + + nr_pages = 0; + page_array = io_pin_pages(uaddr, size, &nr_pages); + if (IS_ERR(page_array)) + return page_array; + + page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL); + if (page_addr) { + *pages = page_array; + *npages = nr_pages; + return page_addr; + } + + io_pages_free(&page_array, nr_pages); + return ERR_PTR(-ENOMEM); +} + +static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff, + size_t sz) +{ + struct io_ring_ctx *ctx = file->private_data; + loff_t offset = pgoff << PAGE_SHIFT; + + switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) { + case IORING_OFF_SQ_RING: + case IORING_OFF_CQ_RING: + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); + return ctx->rings; + case IORING_OFF_SQES: + /* Don't allow mmap if the ring was setup without it 
*/ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); + return ctx->sq_sqes; + case IORING_OFF_PBUF_RING: { + struct io_buffer_list *bl; + unsigned int bgid; + void *ret; + + bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; + bl = io_pbuf_get_bl(ctx, bgid); + if (IS_ERR(bl)) + return bl; + ret = bl->buf_ring; + io_put_bl(ctx, bl); + return ret; + } + } + + return ERR_PTR(-EINVAL); +} + +int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, + struct page **pages, int npages) +{ + unsigned long nr_pages = npages; + + vm_flags_set(vma, VM_DONTEXPAND); + return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); +} + +#ifdef CONFIG_MMU + +__cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct io_ring_ctx *ctx = file->private_data; + size_t sz = vma->vm_end - vma->vm_start; + long offset = vma->vm_pgoff << PAGE_SHIFT; + void *ptr; + + ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + switch (offset & IORING_OFF_MMAP_MASK) { + case IORING_OFF_SQ_RING: + case IORING_OFF_CQ_RING: + return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, + ctx->n_ring_pages); + case IORING_OFF_SQES: + return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages, + ctx->n_sqe_pages); + case IORING_OFF_PBUF_RING: + return io_pbuf_mmap(file, vma); + } + + return -EINVAL; +} + +unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + void *ptr; + + /* + * Do not allow to map to user-provided address to avoid breaking the + * aliasing rules. Userspace is not able to guess the offset address of + * kernel kmalloc()ed memory area. + */ + if (addr) + return -EINVAL; + + ptr = io_uring_validate_mmap_request(filp, pgoff, len); + if (IS_ERR(ptr)) + return -ENOMEM; + + /* + * Some architectures have strong cache aliasing requirements. + * For such architectures we need a coherent mapping which aliases + * kernel memory *and* userspace memory. To achieve that: + * - use a NULL file pointer to reference physical memory, and + * - use the kernel virtual address of the shared io_uring context + * (instead of the userspace-provided address, which has to be 0UL + * anyway). + * - use the same pgoff which the get_unmapped_area() uses to + * calculate the page colouring. + * For architectures without such aliasing requirements, the + * architecture will return any suitable mapping because addr is 0. + */ + filp = NULL; + flags |= MAP_SHARED; + pgoff = 0; /* has been translated to ptr above */ +#ifdef SHM_COLOUR + addr = (uintptr_t) ptr; + pgoff = addr >> PAGE_SHIFT; +#else + addr = 0UL; +#endif + return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags); +} + +#else /* !CONFIG_MMU */ + +int io_uring_mmap(struct file *file, struct vm_area_struct *vma) +{ + return is_nommu_shared_mapping(vma->vm_flags) ? 
0 : -EINVAL; +} + +unsigned int io_uring_nommu_mmap_capabilities(struct file *file) +{ + return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE; +} + +unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + void *ptr; + + ptr = io_uring_validate_mmap_request(file, pgoff, len); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + return (unsigned long) ptr; +} + +#endif /* !CONFIG_MMU */ diff --git a/io_uring/memmap.h b/io_uring/memmap.h new file mode 100644 index 000000000000..5cec5b7ac49a --- /dev/null +++ b/io_uring/memmap.h @@ -0,0 +1,25 @@ +#ifndef IO_URING_MEMMAP_H +#define IO_URING_MEMMAP_H + +struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages); +void io_pages_free(struct page ***pages, int npages); +int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, + struct page **pages, int npages); + +void *io_pages_map(struct page ***out_pages, unsigned short *npages, + size_t size); +void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, + bool put_pages); + +void *__io_uaddr_map(struct page ***pages, unsigned short *npages, + unsigned long uaddr, size_t size); + +#ifndef CONFIG_MMU +unsigned int io_uring_nommu_mmap_capabilities(struct file *file); +#endif +unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags); +int io_uring_mmap(struct file *file, struct vm_area_struct *vma); + +#endif diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 8a34181c97ab..65417c9553b1 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -16,6 +16,7 @@ #include "alloc_cache.h" #include "openclose.h" #include "rsrc.h" +#include "memmap.h" struct io_rsrc_update { struct file *file;
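For context on the userspace side these patches serve, a hedged sketch of registering and mapping a kernel-allocated provided buffer ring; the buffer group id and entry count are arbitrary example values, ring_fd is assumed to be an already set up io_uring fd, and liburing's io_uring_setup_buf_ring() wraps the same steps:

	#include <linux/io_uring.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	struct io_uring_buf_reg reg = {
		.ring_entries = 32,		/* must be a power of two */
		.bgid = 7,			/* example buffer group id */
		.flags = IOU_PBUF_RING_MMAP,	/* kernel allocates the memory */
	};

	if (syscall(__NR_io_uring_register, ring_fd,
		    IORING_REGISTER_PBUF_RING, &reg, 1) < 0)
		return -1;

	/* the mmap offset encodes the bgid, matching the decode in memmap.c */
	off_t off = IORING_OFF_PBUF_RING |
		    ((off_t)reg.bgid << IORING_OFF_PBUF_SHIFT);
	struct io_uring_buf_ring *br =
		mmap(NULL, reg.ring_entries * sizeof(struct io_uring_buf),
		     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
		     ring_fd, off);
	if (br == MAP_FAILED)
		return -1;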