From patchwork Thu Nov 7 20:20:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13867063
Date: Thu, 7 Nov 2024 13:20:28 -0700
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
X-Mailer: git-send-email 2.47.0.277.g8800431eea-goog
Message-ID: <20241107202033.2721681-2-yuzhao@google.com>
Subject: [PATCH v2 1/6] mm/hugetlb_vmemmap: batch-update PTEs
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
 Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Yu Zhao

Convert vmemmap_remap_walk->remap_pte to ->remap_pte_range so that
vmemmap remap walks can batch-update PTEs.

The goal of this conversion is to allow architectures to implement
their own optimizations if possible, e.g., only to stop remote CPUs
once for each batch when updating vmemmap on arm64. It is not intended
to change the remap workflow nor should it by itself have any side
effects on performance.

Signed-off-by: Yu Zhao
---
 mm/hugetlb_vmemmap.c | 163 ++++++++++++++++++++++++-------------------
 1 file changed, 91 insertions(+), 72 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 57b7f591eee8..46befab48d41 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -22,7 +22,7 @@
 /**
  * struct vmemmap_remap_walk - walk vmemmap page table
  *
- * @remap_pte:		called for each lowest-level entry (PTE).
+ * @remap_pte_range:	called on a range of PTEs.
  * @nr_walked:		the number of walked pte.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
@@ -32,8 +32,8 @@
  *			operations.
  */
 struct vmemmap_remap_walk {
-	void			(*remap_pte)(pte_t *pte, unsigned long addr,
-					     struct vmemmap_remap_walk *walk);
+	void			(*remap_pte_range)(pte_t *pte, unsigned long start,
+					unsigned long end, struct vmemmap_remap_walk *walk);
 	unsigned long		nr_walked;
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
@@ -101,10 +101,6 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 	struct page *head;
 	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
 
-	/* Only splitting, not remapping the vmemmap pages. */
-	if (!vmemmap_walk->remap_pte)
-		walk->action = ACTION_CONTINUE;
-
 	spin_lock(&init_mm.page_table_lock);
 	head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
 	/*
@@ -129,33 +125,36 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 		ret = -ENOTSUPP;
 	}
 	spin_unlock(&init_mm.page_table_lock);
-	if (!head || ret)
+	if (ret)
 		return ret;
 
-	return vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
-}
+	if (head) {
+		ret = vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
+		if (ret)
+			return ret;
+	}
 
-static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
+	if (vmemmap_walk->remap_pte_range) {
+		pte_t *pte = pte_offset_kernel(pmd, addr);
 
-	/*
-	 * The reuse_page is found 'first' in page table walking before
-	 * starting remapping.
-	 */
-	if (!vmemmap_walk->reuse_page)
-		vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
-	else
-		vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
-	vmemmap_walk->nr_walked++;
+		vmemmap_walk->nr_walked += (next - addr) / PAGE_SIZE;
+		/*
+		 * The reuse_page is found 'first' in page table walking before
+		 * starting remapping.
+		 */
+		if (!vmemmap_walk->reuse_page) {
+			vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
+			pte++;
+			addr += PAGE_SIZE;
+		}
+		vmemmap_walk->remap_pte_range(pte, addr, next, vmemmap_walk);
+	}
 
 	return 0;
 }
 
 static const struct mm_walk_ops vmemmap_remap_ops = {
 	.pmd_entry	= vmemmap_pmd_entry,
-	.pte_entry	= vmemmap_pte_entry,
 };
 
 static int vmemmap_remap_range(unsigned long start, unsigned long end,
@@ -172,7 +171,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 	if (ret)
 		return ret;
 
-	if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
+	if (walk->remap_pte_range && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_kernel_range(start, end);
 
 	return 0;
@@ -204,33 +203,45 @@ static void free_vmemmap_page_list(struct list_head *list)
 		free_vmemmap_page(page);
 }
 
-static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
-			      struct vmemmap_remap_walk *walk)
+static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				    struct vmemmap_remap_walk *walk)
 {
-	/*
-	 * Remap the tail pages as read-only to catch illegal write operation
-	 * to the tail pages.
-	 */
-	pgprot_t pgprot = PAGE_KERNEL_RO;
-	struct page *page = pte_page(ptep_get(pte));
-	pte_t entry;
-
-	/* Remapping the head page requires r/w */
-	if (unlikely(addr == walk->reuse_addr)) {
-		pgprot = PAGE_KERNEL;
-		list_del(&walk->reuse_page->lru);
+	int i;
+	struct page *page;
+	int nr_pages = (end - start) / PAGE_SIZE;
 
+	for (i = 0; i < nr_pages; i++) {
+		page = pte_page(ptep_get(pte + i));
+
+		list_add(&page->lru, walk->vmemmap_pages);
+	}
+
+	page = walk->reuse_page;
+
+	if (start == walk->reuse_addr) {
+		list_del(&page->lru);
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
 		/*
-		 * Makes sure that preceding stores to the page contents from
-		 * vmemmap_remap_free() become visible before the set_pte_at()
-		 * write.
+		 * Makes sure that preceding stores to the page contents become
+		 * visible before set_pte_at().
 		 */
 		smp_wmb();
 	}
 
-	entry = mk_pte(walk->reuse_page, pgprot);
-	list_add(&page->lru, walk->vmemmap_pages);
-	set_pte_at(&init_mm, addr, pte, entry);
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		/*
+		 * The head page must be mapped read-write; the tail pages are
+		 * mapped read-only to catch illegal modifications.
+		 */
+		if (!i && start == walk->reuse_addr)
+			val = mk_pte(page, PAGE_KERNEL);
+		else
+			val = mk_pte(page, PAGE_KERNEL_RO);
+
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /*
@@ -252,27 +263,39 @@ static inline void reset_struct_pages(struct page *start)
 	memcpy(start, from, sizeof(*from) * NR_RESET_STRUCT_PAGE);
 }
 
-static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
-				struct vmemmap_remap_walk *walk)
+static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				      struct vmemmap_remap_walk *walk)
 {
-	pgprot_t pgprot = PAGE_KERNEL;
+	int i;
 	struct page *page;
-	void *to;
-
-	BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
+	int nr_pages = (end - start) / PAGE_SIZE;
 
 	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
-	list_del(&page->lru);
-	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
-	reset_struct_pages(to);
+
+	for (i = 0; i < nr_pages; i++) {
+		BUG_ON(pte_page(ptep_get(pte + i)) != walk->reuse_page);
+
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
+		reset_struct_pages(page_to_virt(page));
+
+		page = list_next_entry(page, lru);
+	}
 
 	/*
 	 * Makes sure that preceding stores to the page contents become visible
-	 * before the set_pte_at() write.
+	 * before set_pte_at().
 	 */
 	smp_wmb();
-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+		list_del(&page->lru);
+
+		val = mk_pte(page, PAGE_KERNEL);
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /**
@@ -290,7 +313,6 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
 			       unsigned long reuse)
 {
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= NULL,
 		.flags		= VMEMMAP_SPLIT_NO_TLB_FLUSH,
 	};
 
@@ -322,10 +344,10 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 {
 	int ret;
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_remap_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_remap_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= vmemmap_pages,
+		.flags			= flags,
 	};
 	int nid = page_to_nid((struct page *)reuse);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
@@ -340,8 +362,6 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	walk.reuse_page = alloc_pages_node(nid, gfp_mask, 0);
 	if (walk.reuse_page) {
-		copy_page(page_to_virt(walk.reuse_page),
-			  (void *)walk.reuse_addr);
 		list_add(&walk.reuse_page->lru, vmemmap_pages);
 		memmap_pages_add(1);
 	}
@@ -371,10 +391,9 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 		 * They will be restored in the following call.
 		 */
 		walk = (struct vmemmap_remap_walk) {
-			.remap_pte	= vmemmap_restore_pte,
-			.reuse_addr	= reuse,
-			.vmemmap_pages	= vmemmap_pages,
-			.flags		= 0,
+			.remap_pte_range	= vmemmap_restore_pte_range,
+			.reuse_addr		= reuse,
+			.vmemmap_pages		= vmemmap_pages,
 		};
 
 		vmemmap_remap_range(reuse, end, &walk);
@@ -425,10 +444,10 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_restore_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= &vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_restore_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= &vmemmap_pages,
+		.flags			= flags,
 	};
 
 	/* See the comment in the vmemmap_remap_free(). */
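
Note for readers: the commit message motivates the conversion by letting an architecture pay a costly step once per batch instead of once per PTE. The userspace sketch below only models that callback shape. It is an illustration under assumptions: expensive_sync(), update_one(), update_range() and NR_ENTRIES are hypothetical names, not kernel APIs, and the counter merely stands in for something like stopping remote CPUs. The patch above does not add such an optimization by itself; it only makes the range-based callback available.

```c
#include <stddef.h>

/* Hypothetical model: number of entries covered by one walk. */
#define NR_ENTRIES 8

/* Counts how often the (imaginary) costly step runs. */
static int nr_syncs;

/* Stand-in for a costly per-call step; not a kernel function. */
static void expensive_sync(void)
{
	nr_syncs++;
}

/* Per-entry shape: the costly step is paid for every entry. */
static void update_one(int *entry, int val)
{
	expensive_sync();
	*entry = val;
}

/* Per-range shape: the costly step is paid once for the whole range. */
static void update_range(int *entry, size_t nr, int val)
{
	size_t i;

	expensive_sync();
	for (i = 0; i < nr; i++)
		entry[i] = val;
}

/* Returns the number of syncs when updating entry by entry. */
int syncs_per_entry(void)
{
	int table[NR_ENTRIES] = { 0 };
	size_t i;

	nr_syncs = 0;
	for (i = 0; i < NR_ENTRIES; i++)
		update_one(&table[i], 1);
	return nr_syncs;
}

/* Returns the number of syncs when updating the range in one call. */
int syncs_per_range(void)
{
	int table[NR_ENTRIES] = { 0 };

	nr_syncs = 0;
	update_range(table, NR_ENTRIES, 1);
	return nr_syncs;
}
```

With NR_ENTRIES set to 8, syncs_per_entry() returns 8 and syncs_per_range() returns 1, which is the kind of saving a range-based callback permits an architecture to implement.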