From patchwork Thu Nov 7 20:20:28 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867071
Date: Thu, 7 Nov 2024 13:20:28 -0700
Message-ID: <20241107202033.2721681-2-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 1/6] mm/hugetlb_vmemmap: batch-update PTEs
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

Convert vmemmap_remap_walk->remap_pte to ->remap_pte_range so that
vmemmap remap walks can batch-update PTEs.

The goal of this conversion is to allow architectures to implement
their own optimizations where possible, e.g., stopping remote CPUs
only once per batch when updating the vmemmap on arm64. The conversion
is not intended to change the remap workflow, nor should it by itself
have any side effects on performance.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/hugetlb_vmemmap.c | 163 ++++++++++++++++++++++++-------------------
 1 file changed, 91 insertions(+), 72 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 57b7f591eee8..46befab48d41 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -22,7 +22,7 @@
 /**
  * struct vmemmap_remap_walk - walk vmemmap page table
  *
- * @remap_pte:		called for each lowest-level entry (PTE).
+ * @remap_pte_range:	called on a range of PTEs.
  * @nr_walked:		the number of walked pte.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
@@ -32,8 +32,8 @@
  *		operations.
  */
 struct vmemmap_remap_walk {
-	void			(*remap_pte)(pte_t *pte, unsigned long addr,
-					     struct vmemmap_remap_walk *walk);
+	void			(*remap_pte_range)(pte_t *pte, unsigned long start,
+					unsigned long end, struct vmemmap_remap_walk *walk);
 	unsigned long		nr_walked;
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
@@ -101,10 +101,6 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 	struct page *head;
 	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
 
-	/* Only splitting, not remapping the vmemmap pages. */
-	if (!vmemmap_walk->remap_pte)
-		walk->action = ACTION_CONTINUE;
-
 	spin_lock(&init_mm.page_table_lock);
 	head = pmd_leaf(*pmd) ? pmd_page(*pmd) : NULL;
 	/*
@@ -129,33 +125,36 @@ static int vmemmap_pmd_entry(pmd_t *pmd, unsigned long addr,
 		ret = -ENOTSUPP;
 	}
 	spin_unlock(&init_mm.page_table_lock);
-	if (!head || ret)
+	if (ret)
 		return ret;
 
-	return vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
-}
+	if (head) {
+		ret = vmemmap_split_pmd(pmd, head, addr & PMD_MASK, vmemmap_walk);
+		if (ret)
+			return ret;
+	}
 
-static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
-			     unsigned long next, struct mm_walk *walk)
-{
-	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
+	if (vmemmap_walk->remap_pte_range) {
+		pte_t *pte = pte_offset_kernel(pmd, addr);
 
-	/*
-	 * The reuse_page is found 'first' in page table walking before
-	 * starting remapping.
-	 */
-	if (!vmemmap_walk->reuse_page)
-		vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
-	else
-		vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
-	vmemmap_walk->nr_walked++;
+		vmemmap_walk->nr_walked += (next - addr) / PAGE_SIZE;
+		/*
+		 * The reuse_page is found 'first' in page table walking before
+		 * starting remapping.
+		 */
+		if (!vmemmap_walk->reuse_page) {
+			vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
+			pte++;
+			addr += PAGE_SIZE;
+		}
+		vmemmap_walk->remap_pte_range(pte, addr, next, vmemmap_walk);
+	}
 
 	return 0;
 }
 
 static const struct mm_walk_ops vmemmap_remap_ops = {
 	.pmd_entry	= vmemmap_pmd_entry,
-	.pte_entry	= vmemmap_pte_entry,
 };
 
 static int vmemmap_remap_range(unsigned long start, unsigned long end,
@@ -172,7 +171,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 	if (ret)
 		return ret;
 
-	if (walk->remap_pte && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
+	if (walk->remap_pte_range && !(walk->flags & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_kernel_range(start, end);
 
 	return 0;
@@ -204,33 +203,45 @@ static void free_vmemmap_page_list(struct list_head *list)
 		free_vmemmap_page(page);
 }
 
-static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
-			      struct vmemmap_remap_walk *walk)
+static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				    struct vmemmap_remap_walk *walk)
 {
-	/*
-	 * Remap the tail pages as read-only to catch illegal write operation
-	 * to the tail pages.
-	 */
-	pgprot_t pgprot = PAGE_KERNEL_RO;
-	struct page *page = pte_page(ptep_get(pte));
-	pte_t entry;
-
-	/* Remapping the head page requires r/w */
-	if (unlikely(addr == walk->reuse_addr)) {
-		pgprot = PAGE_KERNEL;
-		list_del(&walk->reuse_page->lru);
+	int i;
+	struct page *page;
+	int nr_pages = (end - start) / PAGE_SIZE;
+
+	for (i = 0; i < nr_pages; i++) {
+		page = pte_page(ptep_get(pte + i));
+
+		list_add(&page->lru, walk->vmemmap_pages);
+	}
+
+	page = walk->reuse_page;
+
+	if (start == walk->reuse_addr) {
+		list_del(&page->lru);
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
 		/*
-		 * Makes sure that preceding stores to the page contents from
-		 * vmemmap_remap_free() become visible before the set_pte_at()
-		 * write.
+		 * Makes sure that preceding stores to the page contents become
+		 * visible before set_pte_at().
 		 */
 		smp_wmb();
 	}
 
-	entry = mk_pte(walk->reuse_page, pgprot);
-	list_add(&page->lru, walk->vmemmap_pages);
-	set_pte_at(&init_mm, addr, pte, entry);
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		/*
+		 * The head page must be mapped read-write; the tail pages are
+		 * mapped read-only to catch illegal modifications.
+		 */
+		if (!i && start == walk->reuse_addr)
+			val = mk_pte(page, PAGE_KERNEL);
+		else
+			val = mk_pte(page, PAGE_KERNEL_RO);
+
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /*
@@ -252,27 +263,39 @@ static inline void reset_struct_pages(struct page *start)
 	memcpy(start, from, sizeof(*from) * NR_RESET_STRUCT_PAGE);
 }
 
-static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
-				struct vmemmap_remap_walk *walk)
+static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned long end,
+				      struct vmemmap_remap_walk *walk)
 {
-	pgprot_t pgprot = PAGE_KERNEL;
+	int i;
 	struct page *page;
-	void *to;
-
-	BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
+	int nr_pages = (end - start) / PAGE_SIZE;
 
 	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
-	list_del(&page->lru);
-	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
-	reset_struct_pages(to);
+
+	for (i = 0; i < nr_pages; i++) {
+		BUG_ON(pte_page(ptep_get(pte + i)) != walk->reuse_page);
+
+		copy_page(page_to_virt(page), (void *)walk->reuse_addr);
+		reset_struct_pages(page_to_virt(page));
+
+		page = list_next_entry(page, lru);
+	}
 
 	/*
 	 * Makes sure that preceding stores to the page contents become visible
-	 * before the set_pte_at() write.
+	 * before set_pte_at().
 	 */
 	smp_wmb();
-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+
+	for (i = 0; i < nr_pages; i++) {
+		pte_t val;
+
+		page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+		list_del(&page->lru);
+
+		val = mk_pte(page, PAGE_KERNEL);
+		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
+	}
 }
 
 /**
@@ -290,7 +313,6 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
 			       unsigned long reuse)
 {
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= NULL,
 		.flags		= VMEMMAP_SPLIT_NO_TLB_FLUSH,
 	};
 
@@ -322,10 +344,10 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 {
 	int ret;
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_remap_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_remap_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= vmemmap_pages,
+		.flags			= flags,
 	};
 	int nid = page_to_nid((struct page *)reuse);
 	gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
@@ -340,8 +362,6 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	walk.reuse_page = alloc_pages_node(nid, gfp_mask, 0);
 	if (walk.reuse_page) {
-		copy_page(page_to_virt(walk.reuse_page),
-			  (void *)walk.reuse_addr);
 		list_add(&walk.reuse_page->lru, vmemmap_pages);
 		memmap_pages_add(1);
 	}
@@ -371,10 +391,9 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 		 * They will be restored in the following call.
 		 */
 		walk = (struct vmemmap_remap_walk) {
-			.remap_pte	= vmemmap_restore_pte,
-			.reuse_addr	= reuse,
-			.vmemmap_pages	= vmemmap_pages,
-			.flags		= 0,
+			.remap_pte_range	= vmemmap_restore_pte_range,
+			.reuse_addr		= reuse,
+			.vmemmap_pages		= vmemmap_pages,
 		};
 
 		vmemmap_remap_range(reuse, end, &walk);
@@ -425,10 +444,10 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
-		.remap_pte	= vmemmap_restore_pte,
-		.reuse_addr	= reuse,
-		.vmemmap_pages	= &vmemmap_pages,
-		.flags		= flags,
+		.remap_pte_range	= vmemmap_restore_pte_range,
+		.reuse_addr		= reuse,
+		.vmemmap_pages		= &vmemmap_pages,
+		.flags			= flags,
 	};
 
 	/* See the comment in the vmemmap_remap_free(). */
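To illustrate the new contract, here is a hypothetical ->remap_pte_range()
implementation (a sketch for illustration only, not part of the patch): the
walk now hands each callback one contiguous batch of PTEs per PMD table, so
an architecture needs to stop remote CPUs at most once per batch rather
than once per PTE:

  static void example_remap_pte_range(pte_t *pte, unsigned long start,
                                      unsigned long end,
                                      struct vmemmap_remap_walk *walk)
  {
          unsigned long addr;

          /* One batch covers [start, end); update every PTE in one pass. */
          for (addr = start; addr != end; addr += PAGE_SIZE, pte++) {
                  pte_t entry = mk_pte(walk->reuse_page, PAGE_KERNEL_RO);

                  set_pte_at(&init_mm, addr, pte, entry);
          }
  }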

From patchwork Thu Nov 7 20:20:29 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867072
Date: Thu, 7 Nov 2024 13:20:29 -0700
Message-ID: <20241107202033.2721681-3-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 2/6] mm/hugetlb_vmemmap: add arch-independent helpers
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

Add architecture-independent helpers to allow individual architectures
to work around their own limitations when updating the vmemmap.
Specifically, the current remap workflow requires break-before-make
(BBM) on arm64. By overriding the default helpers later in this series,
arm64 will be able to support the current HVO implementation.
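As a sketch of the override mechanism (illustrative only; patch 5 adds the
real arm64 overrides): an architecture header defines a macro with the same
name as the helper, which suppresses the no-op default in the common code:

  /* In an architecture header (hypothetical example): */
  #define vmemmap_update_lock vmemmap_update_lock
  static inline void vmemmap_update_lock(void)
  {
          /* e.g., exclude concurrency that the vmemmap update cannot tolerate */
  }

  /* In common code, the default is compiled in only when not overridden: */
  #ifndef vmemmap_update_lock
  static void vmemmap_update_lock(void)
  {
  }
  #endif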
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/mm_types.h |  7 +++
 mm/hugetlb_vmemmap.c     | 99 ++++++++++++++++++++++++++++++++++------
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..0f3ae6e173f6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1499,4 +1499,11 @@ enum {
 	/* See also internal only FOLL flags in mm/internal.h */
 };
 
+/* Skip the TLB flush when we split the PMD */
+#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
+/* Skip the TLB flush when we remap the PTE */
+#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
+/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
+#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 46befab48d41..e50a196399f5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -38,16 +38,56 @@ struct vmemmap_remap_walk {
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
 	struct list_head	*vmemmap_pages;
-
-/* Skip the TLB flush when we split the PMD */
-#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
-/* Skip the TLB flush when we remap the PTE */
-#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
-/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
-#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
 	unsigned long		flags;
 };
 
+#ifndef VMEMMAP_ARCH_TLB_FLUSH_FLAGS
+#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS	0
+#endif
+
+#ifndef vmemmap_update_supported
+static bool vmemmap_update_supported(void)
+{
+	return true;
+}
+#endif
+
+#ifndef vmemmap_update_lock
+static void vmemmap_update_lock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_unlock
+static void vmemmap_update_unlock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_start
+static void vmemmap_update_pte_range_start(pte_t *pte, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_end
+static void vmemmap_update_pte_range_end(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_start
+static void vmemmap_update_pmd_range_start(pmd_t *pmd, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_end
+static void vmemmap_update_pmd_range_end(void)
+{
+}
+#endif
+
 static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 			     struct vmemmap_remap_walk *walk)
 {
@@ -83,7 +123,9 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 
 		/* Make pte visible before pmd. See comment in pmd_install(). */
 		smp_wmb();
+		vmemmap_update_pmd_range_start(pmd, start, start + PMD_SIZE);
 		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		vmemmap_update_pmd_range_end();
 		if (!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH))
 			flush_tlb_kernel_range(start, start + PMD_SIZE);
 	} else {
@@ -164,10 +206,12 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 
 	VM_BUG_ON(!PAGE_ALIGNED(start | end));
 
+	vmemmap_update_lock();
 	mmap_read_lock(&init_mm);
 	ret = walk_page_range_novma(&init_mm, start, end, &vmemmap_remap_ops,
 				    NULL, walk);
 	mmap_read_unlock(&init_mm);
+	vmemmap_update_unlock();
 	if (ret)
 		return ret;
 
@@ -228,6 +272,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned lo
 		smp_wmb();
 	}
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -242,6 +288,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned lo
 
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /*
@@ -287,6 +335,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned
 	 */
 	smp_wmb();
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -296,6 +346,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned
 		val = mk_pte(page, PAGE_KERNEL);
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /**
@@ -513,7 +565,8 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
  */
 int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio)
 {
-	return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU);
+	return __hugetlb_vmemmap_restore_folio(h, folio,
+			VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 }
 
 /**
@@ -553,7 +606,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 			list_move(&folio->lru, non_hvo_folios);
 	}
 
-	if (restored)
+	if (restored && !(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_all();
 	if (!ret)
 		ret = restored;
@@ -641,7 +694,8 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
 {
 	LIST_HEAD(vmemmap_pages);
 
-	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHRONIZE_RCU);
+	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
+			VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
@@ -683,7 +737,8 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 			break;
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_SPLIT_NO_TLB_FLUSH))
+		flush_tlb_all();
 
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret;
@@ -701,24 +756,35 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 		 * allowing more vmemmap remaps to occur.
 		 */
 		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
-			flush_tlb_all();
+			if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+				flush_tlb_all();
 			free_vmemmap_page_list(&vmemmap_pages);
 			INIT_LIST_HEAD(&vmemmap_pages);
 			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
 		}
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+		flush_tlb_all();
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
+static int hugetlb_vmemmap_sysctl(const struct ctl_table *ctl, int write,
+				  void *buffer, size_t *lenp, loff_t *ppos)
+{
+	if (!vmemmap_update_supported())
+		return -ENODEV;
+
+	return proc_dobool(ctl, write, buffer, lenp, ppos);
+}
+
 static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 	{
 		.procname	= "hugetlb_optimize_vmemmap",
 		.data		= &vmemmap_optimize_enabled,
 		.maxlen		= sizeof(vmemmap_optimize_enabled),
 		.mode		= 0644,
-		.proc_handler	= proc_dobool,
+		.proc_handler	= hugetlb_vmemmap_sysctl,
 	},
 };
 
@@ -729,6 +795,11 @@ static int __init hugetlb_vmemmap_init(void)
 	/* HUGETLB_VMEMMAP_RESERVE_SIZE should cover all used struct pages */
 	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
 
+	if (READ_ONCE(vmemmap_optimize_enabled) && !vmemmap_update_supported()) {
+		pr_warn("HugeTLB: disabling HVO due to missing support.\n");
+		WRITE_ONCE(vmemmap_optimize_enabled, false);
+	}
+
 	for_each_hstate(h) {
 		if (hugetlb_vmemmap_optimizable(h)) {
 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);

From patchwork Thu Nov 7 20:20:30 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867073
Date: Thu, 7 Nov 2024 13:20:30 -0700
Message-ID: <20241107202033.2721681-4-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 3/6] irqchip/gic-v3: support SGI broadcast
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

GICv3 and later support SGI broadcast, i.e., the mode that routes
interrupts to all PEs in the system except the local CPU. Using this
mode avoids looping through all the remote CPUs when broadcasting
SGIs, which matters especially on systems with 200+ CPUs.
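As a worked illustration of the encoding (a sketch based on the GICv3 SGI
register layout; the ICC_SGI1R_* constants are the driver's existing
definitions): with the IRM bit (bit 40) of ICC_SGI1R_EL1 set, the SGI is
routed to all PEs except the writer, so only the SGI ID needs to be filled
in and no affinity fields are required:

  /* Hypothetical helper showing the broadcast encoding. */
  static u64 sgi1r_broadcast_val(unsigned int irq)
  {
          return BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) |  /* IRM = 1 */
                 ((u64)irq << ICC_SGI1R_SGI_ID_SHIFT);      /* SGI 0-15 */
  }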
The performance improvement can be measured with the rest of this
series, booted with "hugetlb_free_vmemmap=on irqchip.gicv3_pseudo_nmi=1":

  cd /sys/kernel/mm/hugepages/
  echo 600 >hugepages-1048576kB/nr_hugepages
  echo 2048kB >hugepages-1048576kB/demote_size
  perf record -g time echo 600 >hugepages-1048576kB/demote

With 80 CPUs:

           gic_ipi_send_mask()   bash sys time
  Before:               38.14%       0m10.513s
  After:                 0.20%        0m5.132s

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 drivers/irqchip/irq-gic-v3.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index ce87205e3e82..7ebe870e4608 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1322,6 +1322,7 @@ static void gic_cpu_init(void)
 
 #define MPIDR_TO_SGI_RS(mpidr)	(MPIDR_RS(mpidr) << ICC_SGI1R_RS_SHIFT)
 #define MPIDR_TO_SGI_CLUSTER_ID(mpidr)	((mpidr) & ~0xFUL)
+#define MPIDR_TO_SGI_TARGET_LIST(mpidr) (1 << ((mpidr) & 0xf))
 
 /*
  * gic_starting_cpu() is called after the last point where cpuhp is allowed
@@ -1356,7 +1357,7 @@ static u16 gic_compute_target_list(int *base_cpu, const struct cpumask *mask,
 	mpidr = gic_cpu_to_affinity(cpu);
 
 	while (cpu < nr_cpu_ids) {
-		tlist |= 1 << (mpidr & 0xf);
+		tlist |= MPIDR_TO_SGI_TARGET_LIST(mpidr);
 
 		next_cpu = cpumask_next(cpu, mask);
 		if (next_cpu >= nr_cpu_ids)
@@ -1394,9 +1395,20 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
 	gic_write_sgi1r(val);
 }
 
+static void gic_broadcast_sgi(unsigned int irq)
+{
+	u64 val;
+
+	val = BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) | (irq << ICC_SGI1R_SGI_ID_SHIFT);
+
+	pr_devel("CPU %d: broadcasting SGI %u\n", smp_processor_id(), irq);
+	gic_write_sgi1r(val);
+}
+
 static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 {
-	int cpu;
+	int cpu = smp_processor_id();
+	bool self = cpumask_test_cpu(cpu, mask);
 
 	if (WARN_ON(d->hwirq >= 16))
 		return;
@@ -1407,6 +1419,19 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 	 */
 	dsb(ishst);
 
+	if (cpumask_weight(mask) + !self == num_online_cpus()) {
+		/* Broadcast to all but self */
+		gic_broadcast_sgi(d->hwirq);
+		if (self) {
+			unsigned long mpidr = gic_cpu_to_affinity(cpu);
+
+			/* Send to self */
+			gic_send_sgi(MPIDR_TO_SGI_CLUSTER_ID(mpidr),
+				     MPIDR_TO_SGI_TARGET_LIST(mpidr), d->hwirq);
+		}
+		goto done;
+	}
+
 	for_each_cpu(cpu, mask) {
 		u64 cluster_id = MPIDR_TO_SGI_CLUSTER_ID(gic_cpu_to_affinity(cpu));
 		u16 tlist;
@@ -1414,7 +1439,7 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
 		tlist = gic_compute_target_list(&cpu, mask, cluster_id);
 		gic_send_sgi(cluster_id, tlist, d->hwirq);
 	}
-
+done:
 	/* Force the above writes to ICC_SGI1R_EL1 to be executed */
 	isb();
 }

From patchwork Thu Nov 7 20:20:31 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867074
Date: Thu, 7 Nov 2024 13:20:31 -0700
Message-ID: <20241107202033.2721681-5-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 4/6] arm64: broadcast IPIs to pause remote CPUs
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

Broadcast pseudo-NMI IPIs to pause remote CPUs for a short period of
time, and then reliably resume them when the local CPU exits critical
sections that preclude the execution of remote CPUs. A typical example
of such critical sections is BBM on kernel PTEs.

HugeTLB Vmemmap Optimization (HVO) on arm64 was disabled by commit
060a2c92d1b6 ("arm64: mm: hugetlb: Disable
HUGETLB_PAGE_OPTIMIZE_VMEMMAP") due to the following reason:

  This is deemed UNPREDICTABLE by the Arm architecture without a
  break-before-make sequence (make the PTE invalid, TLBI, write the
  new valid PTE). However, such sequence is not possible since the
  vmemmap may be concurrently accessed by the kernel.

Supporting BBM on kernel PTEs is one of the approaches that can make
HVO safe on arm64.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/include/asm/smp.h |  3 ++
 arch/arm64/kernel/smp.c      | 85 +++++++++++++++++++++++++++++++++---
 2 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 2510eec026f7..cffb0cfed961 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
 extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
+void pause_remote_cpus(void);
+void resume_remote_cpus(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 3b3f6b56e733..54e9f6374aa3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -85,7 +85,12 @@ static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
-static bool crash_stop;
+enum {
+	SEND_STOP,
+	CRASH_STOP,
+};
+
+static unsigned long stop_in_progress;
 
 static void ipi_setup(int cpu);
 
@@ -917,6 +922,72 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs
 #endif
 }
 
+static DEFINE_RAW_SPINLOCK(cpu_pause_lock);
+static bool __cacheline_aligned_in_smp cpu_paused;
+static atomic_t __cacheline_aligned_in_smp nr_cpus_paused;
+
+static void pause_local_cpu(void)
+{
+	atomic_inc(&nr_cpus_paused);
+
+	while (READ_ONCE(cpu_paused))
+		cpu_relax();
+
+	atomic_dec(&nr_cpus_paused);
+
+	/*
+	 * The caller of resume_remote_cpus() should make sure that clearing
+	 * cpu_paused is ordered after other changes that can have any impact
+	 * on this CPU. The isb() below makes sure this CPU doesn't
+	 * speculatively execute the next instruction before it sees all those
+	 * changes.
+	 */
+	isb();
+}
+
+void pause_remote_cpus(void)
+{
+	cpumask_t cpus_to_pause;
+	int nr_cpus_to_pause = num_online_cpus() - 1;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	if (!nr_cpus_to_pause)
+		return;
+
+	cpumask_copy(&cpus_to_pause, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_pause);
+
+	raw_spin_lock(&cpu_pause_lock);
+
+	WARN_ON_ONCE(cpu_paused);
+	WARN_ON_ONCE(atomic_read(&nr_cpus_paused));
+
+	cpu_paused = true;
+
+	smp_cross_call(&cpus_to_pause, IPI_CPU_STOP_NMI);
+
+	while (atomic_read(&nr_cpus_paused) != nr_cpus_to_pause)
+		cpu_relax();
+
+	raw_spin_unlock(&cpu_pause_lock);
+}
+
+void resume_remote_cpus(void)
+{
+	if (!cpu_paused)
+		return;
+
+	raw_spin_lock(&cpu_pause_lock);
+
+	WRITE_ONCE(cpu_paused, false);
+
+	while (atomic_read(&nr_cpus_paused))
+		cpu_relax();
+
+	raw_spin_unlock(&cpu_pause_lock);
+}
+
 static void arm64_backtrace_ipi(cpumask_t *mask)
 {
 	__ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
@@ -970,7 +1041,9 @@ static void do_handle_IPI(int ipinr)
 
 	case IPI_CPU_STOP:
 	case IPI_CPU_STOP_NMI:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
+		if (!test_bit(SEND_STOP, &stop_in_progress)) {
+			pause_local_cpu();
+		} else if (test_bit(CRASH_STOP, &stop_in_progress)) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
 			unreachable();
 		} else {
@@ -1142,7 +1215,6 @@ static inline unsigned int num_other_online_cpus(void)
 
 void smp_send_stop(void)
 {
-	static unsigned long stop_in_progress;
 	cpumask_t mask;
 	unsigned long timeout;
 
@@ -1154,7 +1226,7 @@ void smp_send_stop(void)
 		goto skip_ipi;
 
 	/* Only proceed if this is the first CPU to reach this code */
-	if (test_and_set_bit(0, &stop_in_progress))
+	if (test_and_set_bit(SEND_STOP, &stop_in_progress))
 		return;
 
 	/*
@@ -1230,12 +1302,11 @@ void crash_smp_send_stop(void)
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
 	 *
-	 * We use this same boolean to tell whether the IPI we send was a
+	 * We use the CRASH_STOP bit to tell whether the IPI we send was a
 	 * stop or a "crash stop".
 	 */
-	if (crash_stop)
+	if (test_and_set_bit(CRASH_STOP, &stop_in_progress))
 		return;
-	crash_stop = 1;
 
 	smp_send_stop();
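A minimal caller sketch (illustrative only; patch 5 in this series is the
real user), mirroring the requirements asserted in pause_remote_cpus():
CPU hotplug must be read-locked and preemption disabled, here by disabling
IRQs:

  static void update_kernel_ptes_with_bbm(void)
  {
          cpus_read_lock();
          local_irq_disable();

          pause_remote_cpus();

          /* ... break-before-make updates of kernel PTEs ... */

          resume_remote_cpus();

          local_irq_enable();
          cpus_read_unlock();
  }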

From patchwork Thu Nov 7 20:20:32 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867075
Date: Thu, 7 Nov 2024 13:20:32 -0700
Message-ID: <20241107202033.2721681-6-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 5/6] arm64: pause remote CPUs to update vmemmap
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

Pause remote CPUs so that the local CPU can follow the proper BBM
sequence to safely update the vmemmap mapping `struct page` areas.

While updating the vmemmap, it is guaranteed that neither the local CPU
nor the remote ones will access the `struct page` area being updated,
and therefore they should not trigger kernel page faults.
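Spelled out, the break-before-make order that the helpers below follow for
a PTE range (a hypothetical sketch; in the actual patch, steps 1 and 2 live
in vmemmap_update_pte_range_start() and step 3 is the set_pte_at() loop in
the common code from patch 1):

  static void bbm_remap_range(pte_t *pte, unsigned long start, unsigned long end)
  {
          unsigned long addr;

          /* 1. Break: invalidate the old PTEs while remote CPUs are paused. */
          for (addr = start; addr != end; addr += PAGE_SIZE, pte++)
                  pte_clear(&init_mm, addr, pte);

          /* 2. TLBI: remove any stale translations for the range. */
          flush_tlb_kernel_range(start, end);

          /* 3. Make: the caller now writes the new valid PTEs. */
  }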
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/include/asm/pgalloc.h | 69 ++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 8ff5f2a2579e..f50f79f57c1e 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -12,6 +12,7 @@
 #include <asm/pgtable-hwdef.h>
 #include <asm/processor.h>
 #include <asm/tlbflush.h>
+#include <asm/smp.h>
 
 #define __HAVE_ARCH_PGD_FREE
 #define __HAVE_ARCH_PUD_FREE
@@ -137,4 +138,72 @@ pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
 	__pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE | PMD_TABLE_PXN);
 }
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+
+#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS (VMEMMAP_SPLIT_NO_TLB_FLUSH | VMEMMAP_REMAP_NO_TLB_FLUSH)
+
+#define vmemmap_update_supported vmemmap_update_supported
+static inline bool vmemmap_update_supported(void)
+{
+	return system_uses_irq_prio_masking();
+}
+
+#define vmemmap_update_lock vmemmap_update_lock
+static inline void vmemmap_update_lock(void)
+{
+	cpus_read_lock();
+}
+
+#define vmemmap_update_unlock vmemmap_update_unlock
+static inline void vmemmap_update_unlock(void)
+{
+	cpus_read_unlock();
+}
+
+#define vmemmap_update_pte_range_start vmemmap_update_pte_range_start
+static inline void vmemmap_update_pte_range_start(pte_t *pte,
+						  unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+
+	local_irq_disable();
+	pause_remote_cpus();
+
+	for (addr = start; addr != end; addr += PAGE_SIZE, pte++)
+		pte_clear(&init_mm, addr, pte);
+
+	flush_tlb_kernel_range(start, end);
+}
+
+#define vmemmap_update_pte_range_end vmemmap_update_pte_range_end
+static inline void vmemmap_update_pte_range_end(void)
+{
+	resume_remote_cpus();
+	local_irq_enable();
+}
+
+#define vmemmap_update_pmd_range_start vmemmap_update_pmd_range_start
+static inline void vmemmap_update_pmd_range_start(pmd_t *pmd,
+						  unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+
+	local_irq_disable();
+	pause_remote_cpus();
+
+	for (addr = start; addr != end; addr += PMD_SIZE, pmd++)
+		pmd_clear(pmd);
+
+	flush_tlb_kernel_range(start, end);
+}
+
+#define vmemmap_update_pmd_range_end vmemmap_update_pmd_range_end
+static inline void vmemmap_update_pmd_range_end(void)
+{
+	resume_remote_cpus();
+	local_irq_enable();
+}
+
+#endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */
+
 #endif

From patchwork Thu Nov 7 20:20:33 2024
X-Patchwork-Submitter: Yu Zhao <yuzhao@google.com>
X-Patchwork-Id: 13867086
Date: Thu, 7 Nov 2024 13:20:33 -0700
Message-ID: <20241107202033.2721681-7-yuzhao@google.com>
In-Reply-To: <20241107202033.2721681-1-yuzhao@google.com>
References: <20241107202033.2721681-1-yuzhao@google.com>
Subject: [PATCH v2 6/6] arm64: select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
    Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Yu Zhao <yuzhao@google.com>

To use HVO, make sure the kernel is booted with pseudo-NMIs enabled by
"irqchip.gicv3_pseudo_nmi=1", as well as "hugetlb_free_vmemmap=on"
unless HVO is enabled by default. Note that HVO checks for the
pseudo-NMI capability and is disabled at runtime if pseudo-NMIs turn
out to be unsupported.

Successfully enabling HVO should show the following:

  # dmesg | grep NMI
  GICv3: Pseudo-NMIs enabled using ...

  # sysctl vm.hugetlb_optimize_vmemmap
  vm.hugetlb_optimize_vmemmap = 1

For comparison purposes, the whole series was measured against this
patch only, to show the overhead from pausing remote CPUs:

  HugeTLB operations            This patch only   The whole series   Change
  Alloc 600 1GB                        0m3.526s           0m3.649s      +4%
  Free 600 1GB                         0m0.880s           0m0.917s      +4%
  Demote 600 1GB to 307200 2MB         0m1.575s           0m3.640s    +231%
  Free 307200 2MB                      0m0.946s           0m2.921s    +309%

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..e93745f819d9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -109,6 +109,7 @@ config ARM64
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
+	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_EXECMEM_LATE if EXECMEM
 	select ARCH_WANTS_NO_INSTR