From patchwork Mon Oct 21 04:22:14 2024
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13843548
Date: Sun, 20 Oct 2024 22:22:14 -0600
In-Reply-To: <20241021042218.746659-1-yuzhao@google.com>
References: <20241021042218.746659-1-yuzhao@google.com>
Message-ID: <20241021042218.746659-3-yuzhao@google.com>
Subject: [PATCH v1 2/6] mm/hugetlb_vmemmap: add arch-independent helpers
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Yu Zhao

Add architecture-independent helpers to allow individual architectures to
work around their own limitations when updating vmemmap. Specifically, the
current remap workflow requires break-before-make (BBM) on arm64. By
overriding the default helpers later in this series, arm64 will be able to
support the current HVO implementation.
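To make the override mechanism concrete: each default helper added below is
wrapped in #ifndef, so an architecture can supply its own definition in a
header visible to mm/hugetlb_vmemmap.c and define the matching macro name.
The sketch below is illustrative only; it is not part of this patch and not
the arm64 code added later in the series, and the header placement, the
arch_vmemmap_lock spinlock, and the chosen flag combination are assumptions.

/* Hypothetical arch override, for illustration only. */
#include <linux/mm_types.h>	/* VMEMMAP_*_NO_TLB_FLUSH */
#include <linux/spinlock.h>

/*
 * Skip the generic split/remap TLB flushes; this imaginary architecture
 * flushes from the vmemmap_update_*() hooks it overrides instead.
 */
#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS	\
	(VMEMMAP_SPLIT_NO_TLB_FLUSH | VMEMMAP_REMAP_NO_TLB_FLUSH)

extern spinlock_t arch_vmemmap_lock;	/* imaginary, defined in arch code */

/* Serialize vmemmap updates against whatever else the architecture does. */
#define vmemmap_update_lock vmemmap_update_lock
static inline void vmemmap_update_lock(void)
{
	spin_lock(&arch_vmemmap_lock);
}

#define vmemmap_update_unlock vmemmap_update_unlock
static inline void vmemmap_update_unlock(void)
{
	spin_unlock(&arch_vmemmap_lock);
}

/* Report whether this machine can update a live vmemmap mapping at all. */
#define vmemmap_update_supported vmemmap_update_supported
static inline bool vmemmap_update_supported(void)
{
	return true;	/* e.g. false on CPUs that cannot tolerate the scheme */
}

When vmemmap_update_supported() returns false, the new sysctl handler below
fails accesses to hugetlb_optimize_vmemmap with -ENODEV, and
hugetlb_vmemmap_init() force-disables HVO at boot.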
Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/mm_types.h |  7 +++
 mm/hugetlb_vmemmap.c     | 99 ++++++++++++++++++++++++++++++++++------
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..0f3ae6e173f6 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1499,4 +1499,11 @@ enum {
 	/* See also internal only FOLL flags in mm/internal.h */
 };
 
+/* Skip the TLB flush when we split the PMD */
+#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
+/* Skip the TLB flush when we remap the PTE */
+#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
+/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
+#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
+
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 46befab48d41..e50a196399f5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -38,16 +38,56 @@ struct vmemmap_remap_walk {
 	struct page		*reuse_page;
 	unsigned long		reuse_addr;
 	struct list_head	*vmemmap_pages;
-
-/* Skip the TLB flush when we split the PMD */
-#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
-/* Skip the TLB flush when we remap the PTE */
-#define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
-/* synchronize_rcu() to avoid writes from page_ref_add_unless() */
-#define VMEMMAP_SYNCHRONIZE_RCU		BIT(2)
 	unsigned long		flags;
 };
 
+#ifndef VMEMMAP_ARCH_TLB_FLUSH_FLAGS
+#define VMEMMAP_ARCH_TLB_FLUSH_FLAGS	0
+#endif
+
+#ifndef vmemmap_update_supported
+static bool vmemmap_update_supported(void)
+{
+	return true;
+}
+#endif
+
+#ifndef vmemmap_update_lock
+static void vmemmap_update_lock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_unlock
+static void vmemmap_update_unlock(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_start
+static void vmemmap_update_pte_range_start(pte_t *pte, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pte_range_end
+static void vmemmap_update_pte_range_end(void)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_start
+static void vmemmap_update_pmd_range_start(pmd_t *pmd, unsigned long start, unsigned long end)
+{
+}
+#endif
+
+#ifndef vmemmap_update_pmd_range_end
+static void vmemmap_update_pmd_range_end(void)
+{
+}
+#endif
+
 static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 			     struct vmemmap_remap_walk *walk)
 {
@@ -83,7 +123,9 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 
 		/* Make pte visible before pmd. See comment in pmd_install(). */
 		smp_wmb();
+		vmemmap_update_pmd_range_start(pmd, start, start + PMD_SIZE);
 		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		vmemmap_update_pmd_range_end();
 		if (!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH))
 			flush_tlb_kernel_range(start, start + PMD_SIZE);
 	} else {
@@ -164,10 +206,12 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
 
 	VM_BUG_ON(!PAGE_ALIGNED(start | end));
 
+	vmemmap_update_lock();
 	mmap_read_lock(&init_mm);
 	ret = walk_page_range_novma(&init_mm, start, end, &vmemmap_remap_ops,
 				    NULL, walk);
 	mmap_read_unlock(&init_mm);
+	vmemmap_update_unlock();
 	if (ret)
 		return ret;
 
@@ -228,6 +272,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned lo
 		smp_wmb();
 	}
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -242,6 +288,8 @@ static void vmemmap_remap_pte_range(pte_t *pte, unsigned long start, unsigned lo
 
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /*
@@ -287,6 +335,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned
 	 */
 	smp_wmb();
 
+	vmemmap_update_pte_range_start(pte, start, end);
+
 	for (i = 0; i < nr_pages; i++) {
 		pte_t val;
 
@@ -296,6 +346,8 @@ static void vmemmap_restore_pte_range(pte_t *pte, unsigned long start, unsigned
 		val = mk_pte(page, PAGE_KERNEL);
 		set_pte_at(&init_mm, start + PAGE_SIZE * i, pte + i, val);
 	}
+
+	vmemmap_update_pte_range_end();
 }
 
 /**
@@ -513,7 +565,8 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
  */
 int hugetlb_vmemmap_restore_folio(const struct hstate *h, struct folio *folio)
 {
-	return __hugetlb_vmemmap_restore_folio(h, folio, VMEMMAP_SYNCHRONIZE_RCU);
+	return __hugetlb_vmemmap_restore_folio(h, folio,
+					       VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 }
 
 /**
@@ -553,7 +606,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 		list_move(&folio->lru, non_hvo_folios);
 	}
 
-	if (restored)
+	if (restored && !(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
 		flush_tlb_all();
 	if (!ret)
 		ret = restored;
@@ -641,7 +694,8 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
 {
 	LIST_HEAD(vmemmap_pages);
 
-	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, VMEMMAP_SYNCHRONIZE_RCU);
+	__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages,
+					 VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_ARCH_TLB_FLUSH_FLAGS);
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
@@ -683,7 +737,8 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 			break;
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_SPLIT_NO_TLB_FLUSH))
+		flush_tlb_all();
 	list_for_each_entry(folio, folio_list, lru) {
 		int ret;
 
@@ -701,24 +756,35 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
 		 * allowing more vmemmap remaps to occur.
 		 */
		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
-			flush_tlb_all();
+			if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+				flush_tlb_all();
 			free_vmemmap_page_list(&vmemmap_pages);
 			INIT_LIST_HEAD(&vmemmap_pages);
 			__hugetlb_vmemmap_optimize_folio(h, folio, &vmemmap_pages, flags);
 		}
 	}
 
-	flush_tlb_all();
+	if (!(VMEMMAP_ARCH_TLB_FLUSH_FLAGS & VMEMMAP_REMAP_NO_TLB_FLUSH))
+		flush_tlb_all();
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
+static int hugetlb_vmemmap_sysctl(const struct ctl_table *ctl, int write,
+				  void *buffer, size_t *lenp, loff_t *ppos)
+{
+	if (!vmemmap_update_supported())
+		return -ENODEV;
+
+	return proc_dobool(ctl, write, buffer, lenp, ppos);
+}
+
 static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 	{
 		.procname	= "hugetlb_optimize_vmemmap",
 		.data		= &vmemmap_optimize_enabled,
 		.maxlen		= sizeof(vmemmap_optimize_enabled),
 		.mode		= 0644,
-		.proc_handler	= proc_dobool,
+		.proc_handler	= hugetlb_vmemmap_sysctl,
 	},
 };
 
@@ -729,6 +795,11 @@ static int __init hugetlb_vmemmap_init(void)
 	/* HUGETLB_VMEMMAP_RESERVE_SIZE should cover all used struct pages */
 	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
 
+	if (READ_ONCE(vmemmap_optimize_enabled) && !vmemmap_update_supported()) {
+		pr_warn("HugeTLB: disabling HVO due to missing support.\n");
+		WRITE_ONCE(vmemmap_optimize_enabled, false);
+	}
+
 	for_each_hstate(h) {
 		if (hugetlb_vmemmap_optimizable(h)) {
 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
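
For a sense of how an architecture that must honor break-before-make could
use the bracketing hooks, here is another purely hypothetical sketch (again
not the arm64 implementation from later in this series): the *_start() hook
tears down the old entries and flushes them from the TLB (the "break"), and
the generic code then writes the new entries with set_pte_at() (the "make").
A real implementation would also need to deal with concurrent accesses to the
struct pages that are transiently unmapped; the lock/unlock and supported()
hooks give an architecture the entry points to arrange that.

/* Hypothetical break-before-make bracketing, for illustration only. */
#define vmemmap_update_pte_range_start vmemmap_update_pte_range_start
static inline void vmemmap_update_pte_range_start(pte_t *pte, unsigned long start,
						  unsigned long end)
{
	unsigned long addr;

	/* "Break": invalidate the old entries, then flush stale translations. */
	for (addr = start; addr != end; addr += PAGE_SIZE, pte++)
		pte_clear(&init_mm, addr, pte);

	flush_tlb_kernel_range(start, end);
}

#define vmemmap_update_pte_range_end vmemmap_update_pte_range_end
static inline void vmemmap_update_pte_range_end(void)
{
	/* "Make" is done by the generic code via set_pte_at(); nothing to do. */
}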