From patchwork Thu Apr 4 14:33:05 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13617957
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
    David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, Itaru Kitayama
Subject: [PATCH v2 1/4] arm64: mm: Don't remap pgtables per-cont(pte|pmd) block
Date: Thu, 4 Apr 2024 15:33:05 +0100
Message-Id: <20240404143308.2224141-2-ryan.roberts@arm.com>
In-Reply-To: <20240404143308.2224141-1-ryan.roberts@arm.com>
References: <20240404143308.2224141-1-ryan.roberts@arm.com>

A large part of the kernel boot time is creating the kernel linear map
page tables. When rodata=full, all memory is mapped by pte. And when
there is lots of physical RAM, there are lots of pte tables to populate.
The primary cost associated with this is mapping and unmapping the pte
table memory in the fixmap; at unmap time, the TLB entry must be
invalidated and this is expensive.

Previously, each pmd and pte table was fixmapped/fixunmapped for each
cont(pte|pmd) block of mappings (16 entries with 4K granule). This means
we ended up issuing 32 TLBIs per (pmd|pte) table during the population
phase (a 4K table holds 512 entries, i.e. 32 such blocks).

Let's fix that, and fixmap/fixunmap each page once per population, for a
saving of 31 TLBIs per (pmd|pte) table. This gives a significant boot
speedup.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               |   VM, 16G   |   VM, 64G   |  VM, 256G   | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |  153   (0%) | 2227   (0%) | 8798   (0%) | 17442   (0%)
after          |   77 (-49%) |  431 (-81%) | 1727 (-80%) |  3796 (-78%)

Signed-off-by: Ryan Roberts
Tested-by: Itaru Kitayama
Tested-by: Eric Chanudet
Acked-by: Mark Rutland
---
 arch/arm64/mm/mmu.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 495b732d5af3..fd91b5bdb514 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -172,12 +172,9 @@ bool pgattr_change_is_safe(u64 old, u64 new)
 	return ((old ^ new) & ~mask) == 0;
 }
 
-static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
-		     phys_addr_t phys, pgprot_t prot)
+static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
+		       phys_addr_t phys, pgprot_t prot)
 {
-	pte_t *ptep;
-
-	ptep = pte_set_fixmap_offset(pmdp, addr);
 	do {
 		pte_t old_pte = __ptep_get(ptep);
 
@@ -193,7 +190,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		phys += PAGE_SIZE;
 	} while (ptep++, addr += PAGE_SIZE, addr != end);
 
-	pte_clear_fixmap();
+	return ptep;
 }
 
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
@@ -204,6 +201,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 {
 	unsigned long next;
 	pmd_t pmd = READ_ONCE(*pmdp);
+	pte_t *ptep;
 
 	BUG_ON(pmd_sect(pmd));
 	if (pmd_none(pmd)) {
@@ -219,6 +217,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	}
 	BUG_ON(pmd_bad(pmd));
 
+	ptep = pte_set_fixmap_offset(pmdp, addr);
 	do {
 		pgprot_t __prot = prot;
 
@@ -229,20 +228,20 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 		    (flags & NO_CONT_MAPPINGS) == 0)
 			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
 
-		init_pte(pmdp, addr, next, phys, __prot);
+		ptep = init_pte(ptep, addr, next, phys, __prot);
 
 		phys += next - addr;
 	} while (addr = next, addr != end);
+
+	pte_clear_fixmap();
 }
 
-static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
-		     phys_addr_t phys, pgprot_t prot,
-		     phys_addr_t (*pgtable_alloc)(int), int flags)
+static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
+		       phys_addr_t phys, pgprot_t prot,
+		       phys_addr_t (*pgtable_alloc)(int), int flags)
 {
 	unsigned long next;
-	pmd_t *pmdp;
 
-	pmdp = pmd_set_fixmap_offset(pudp, addr);
 	do {
 		pmd_t old_pmd = READ_ONCE(*pmdp);
 
@@ -269,7 +268,7 @@ static void init_pmd(pud_t *pudp, unsigned long addr, unsigned long end,
 		phys += next - addr;
 	} while (pmdp++, addr = next, addr != end);
 
-	pmd_clear_fixmap();
+	return pmdp;
 }
 
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
@@ -279,6 +278,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
+	pmd_t *pmdp;
 
 	/*
 	 * Check for initial section mappings in the pgd/pud.
@@ -297,6 +297,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 	}
 	BUG_ON(pud_bad(pud));
 
+	pmdp = pmd_set_fixmap_offset(pudp, addr);
 	do {
 		pgprot_t __prot = prot;
 
@@ -307,10 +308,13 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 		    (flags & NO_CONT_MAPPINGS) == 0)
 			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
 
-		init_pmd(pudp, addr, next, phys, __prot, pgtable_alloc, flags);
+		pmdp = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc,
+				flags);
 
 		phys += next - addr;
 	} while (addr = next, addr != end);
+
+	pmd_clear_fixmap();
 }
 
 static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
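In outline, this patch hoists the fixmap map/unmap out of the per-cont-block
helpers and into the per-table callers. A condensed before/after sketch of
the pte level (simplified pseudo-C distilled from the diff above, not the
literal kernel code; "for_each_cont_pte_block" is shorthand for the do/while
walk in alloc_init_cont_pte(), not a real macro; the pmd level changes the
same way):

  /* before: one fixmap map/unmap -- and hence one TLBI -- per 16-entry
   * cont-pte block, i.e. 32 per 512-entry pte table
   */
  for_each_cont_pte_block(addr, end) {
          ptep = pte_set_fixmap_offset(pmdp, addr);   /* map the table */
          /* ... write 16 ptes ... */
          pte_clear_fixmap();                         /* unmap + TLBI */
  }

  /* after: map once per table; init_pte() consumes and returns the
   * advanced ptep; a single unmap (one TLBI) covers the whole table
   */
  ptep = pte_set_fixmap_offset(pmdp, addr);
  for_each_cont_pte_block(addr, end)
          ptep = init_pte(ptep, addr, next, phys, __prot);
  pte_clear_fixmap();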
From patchwork Thu Apr 4 14:33:06 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13617956
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
    David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, Itaru Kitayama
Subject: [PATCH v2 2/4] arm64: mm: Batch dsb and isb when populating pgtables
Date: Thu, 4 Apr 2024 15:33:06 +0100
Message-Id: <20240404143308.2224141-3-ryan.roberts@arm.com>
In-Reply-To: <20240404143308.2224141-1-ryan.roberts@arm.com>
References: <20240404143308.2224141-1-ryan.roberts@arm.com>

After removing unnecessary TLBIs, the next bottleneck when creating the
page tables for the linear map is the DSB and ISB, which were previously
issued per-pte in __set_pte(). Since we are writing multiple ptes in a
given pte table, we can elide these barriers and insert them once we
have finished writing to the table.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               |   VM, 16G   |   VM, 64G   |  VM, 256G   | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   77   (0%) |  431   (0%) | 1727   (0%) |  3796   (0%)
after          |   13 (-84%) |  162 (-62%) |  655 (-62%) |  1656 (-56%)

Signed-off-by: Ryan Roberts
Tested-by: Itaru Kitayama
Tested-by: Eric Chanudet
---
 arch/arm64/include/asm/pgtable.h |  7 ++++++-
 arch/arm64/mm/mmu.c              | 13 ++++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..105a95a8845c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -271,9 +271,14 @@ static inline pte_t pte_mkdevmap(pte_t pte)
 	return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
 }
 
-static inline void __set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
+}
+
+static inline void __set_pte(pte_t *ptep, pte_t pte)
+{
+	__set_pte_nosync(ptep, pte);
 
 	/*
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fd91b5bdb514..dc86dceb0efe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -178,7 +178,11 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 	do {
 		pte_t old_pte = __ptep_get(ptep);
 
-		__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+		/*
+		 * Required barriers to make this visible to the table walker
+		 * are deferred to the end of alloc_init_cont_pte().
+		 */
+		__set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
 
 		/*
 		 * After the PTE entry has been populated once, we
@@ -234,6 +238,13 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 	} while (addr = next, addr != end);
 
 	pte_clear_fixmap();
+
+	/*
+	 * Ensure all previous pgtable writes are visible to the table walker.
+	 * See init_pte().
+	 */
+	dsb(ishst);
+	isb();
 }
 
 static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
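The effect of this patch is easiest to see as barrier counts: __set_pte()
used to issue a dsb/isb pair after every pte write, whereas
__set_pte_nosync() is a bare WRITE_ONCE(). A condensed sketch of the new
per-table sequence (names taken from the diff above; with a 4K granule a
pte table holds up to 512 entries):

  /* populate phase: no barriers per entry */
  __set_pte_nosync(ptep, pfn_pte(__phys_to_pfn(phys), prot));
  /* ... repeated for up to 512 ptes in this table ... */

  /* once per table, at the end of alloc_init_cont_pte(): */
  pte_clear_fixmap();
  dsb(ishst);     /* publish all prior pte writes to the table walker */
  isb();          /* ensure subsequent execution sees the new mappings */

So the cost drops from up to 512 dsb/isb pairs per pte table to one.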
From patchwork Thu Apr 4 14:33:07 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13617958
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
    David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, Itaru Kitayama
Subject: [PATCH v2 3/4] arm64: mm: Don't remap pgtables for allocate vs populate
Date: Thu, 4 Apr 2024 15:33:07 +0100
Message-Id: <20240404143308.2224141-4-ryan.roberts@arm.com>
In-Reply-To: <20240404143308.2224141-1-ryan.roberts@arm.com>
References: <20240404143308.2224141-1-ryan.roberts@arm.com>

During linear map pgtable creation, each pgtable is fixmapped /
fixunmapped twice; once during allocation to zero the memory, and again
during population to write the entries. This means each table has 2 TLB
invalidations issued against it. Let's fix this so that each table is
only fixmapped/fixunmapped once, halving the number of TLBIs, and
improving performance.

Achieve this by abstracting the pgtable allocate, map and unmap
operations out of the main pgtable population loop code and into a
`struct pgtable_ops` function pointer structure. This allows us to
formalize the semantics of "alloc" to mean "alloc and map", requiring an
"unmap" when finished. So "map" is only performed (and also matched by
"unmap") if the pgtable has already been allocated.

As a side effect of this refactoring, we no longer need to use the
fixmap at all once pages have been mapped in the linear map, because
their "map" operation can simply do a __va() translation. So with this
change, we are down to 1 TLBI per table when doing early pgtable
manipulations, and 0 TLBIs when doing late pgtable manipulations.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               |   VM, 16G   |   VM, 64G   |  VM, 256G   | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   13   (0%) |  162   (0%) |  655   (0%) |  1656   (0%)
after          |   11 (-15%) |  109 (-33%) |  449 (-31%) |  1257 (-24%)

Signed-off-by: Ryan Roberts
Tested-by: Itaru Kitayama
Tested-by: Eric Chanudet
---
 arch/arm64/include/asm/mmu.h     |   8 +
 arch/arm64/include/asm/pgtable.h |   2 +
 arch/arm64/kernel/cpufeature.c   |  10 +-
 arch/arm64/mm/mmu.c              | 308 ++++++++++++++++++++++---------
 4 files changed, 237 insertions(+), 91 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 65977c7783c5..ae44353010e8 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -109,6 +109,14 @@ static inline bool kaslr_requires_kpti(void)
 	return true;
 }
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+extern
+void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
+			     phys_addr_t size, pgprot_t prot,
+			     void *(*pgtable_alloc)(int, phys_addr_t *),
+			     int flags);
+#endif
+
 #define INIT_MM_CONTEXT(name)	\
 	.pgd = swapper_pg_dir,
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 105a95a8845c..92c9aed5e7af 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1010,6 +1010,8 @@ static inline p4d_t *p4d_offset_kimg(pgd_t *pgdp, u64 addr)
 
 static inline bool pgtable_l5_enabled(void) { return false; }
 
+#define p4d_index(addr)		(((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))
+
 /* Match p4d_offset folding in <asm/generic/pgtable-nop4d.h> */
 #define p4d_set_fixmap(addr)			NULL
 #define p4d_set_fixmap_offset(p4dp, addr)	((p4d_t *)p4dp)
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 56583677c1f2..9a70b1954706 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1866,17 +1866,13 @@ static bool has_lpa2(const struct arm64_cpu_capabilities *entry, int scope)
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 #define KPTI_NG_TEMP_VA		(-(1UL << PMD_SHIFT))
 
-extern
-void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
-			     phys_addr_t size, pgprot_t prot,
-			     phys_addr_t (*pgtable_alloc)(int), int flags);
-
 static phys_addr_t __initdata kpti_ng_temp_alloc;
 
-static phys_addr_t __init kpti_ng_pgd_alloc(int shift)
+static void *__init kpti_ng_pgd_alloc(int type, phys_addr_t *pa)
 {
 	kpti_ng_temp_alloc -= PAGE_SIZE;
-	return kpti_ng_temp_alloc;
+	*pa = kpti_ng_temp_alloc;
+	return __va(kpti_ng_temp_alloc);
 }
 
 static int __init __kpti_install_ng_mappings(void *__unused)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dc86dceb0efe..90bf822859b8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -41,9 +41,42 @@
 #include <asm/pgalloc.h>
 #include <asm/kfence.h>
 
+enum pgtable_type {
+	TYPE_P4D = 0,
+	TYPE_PUD = 1,
+	TYPE_PMD = 2,
+	TYPE_PTE = 3,
+};
+
+/**
+ * struct pgtable_ops - Ops to allocate and access pgtable memory. Calls
+ * must be serialized by the caller.
+ * @alloc:	Allocates 1 page of memory for use as pgtable `type` and maps it
+ *		into va space. Returned memory is zeroed. Puts physical address
+ *		of page in *pa, and returns virtual address of the mapping. User
+ *		must explicitly unmap() before doing another alloc() or map() of
+ *		the same `type`.
+ * @map:	Determines the physical address of the pgtable of `type` by
+ *		interpreting `parent` as the pgtable entry for the next level
+ *		up. Maps the page and returns virtual address of the pgtable
+ *		entry within the table that corresponds to `addr`. User must
+ *		explicitly unmap() before doing another alloc() or map() of the
+ *		same `type`.
+ * @unmap:	Unmap the currently mapped page of `type`, which will have been
+ *		mapped either as a result of a previous call to alloc() or
+ *		map(). The page's virtual address must be considered invalid
+ *		after this call returns.
+ */
+struct pgtable_ops {
+	void *(*alloc)(int type, phys_addr_t *pa);
+	void *(*map)(int type, void *parent, unsigned long addr);
+	void (*unmap)(int type);
+};
+
 #define NO_BLOCK_MAPPINGS	BIT(0)
 #define NO_CONT_MAPPINGS	BIT(1)
 #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
+#define NO_ALLOC		BIT(3)
 
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
@@ -106,34 +139,89 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static phys_addr_t __init early_pgtable_alloc(int shift)
+static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
 {
-	phys_addr_t phys;
-	void *ptr;
+	void *va;
 
-	phys = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
-					 MEMBLOCK_ALLOC_NOLEAKTRACE);
-	if (!phys)
+	*pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
+					MEMBLOCK_ALLOC_NOLEAKTRACE);
+	if (!*pa)
 		panic("Failed to allocate page table page\n");
 
-	/*
-	 * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
-	 * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
-	 * any level of table.
-	 */
-	ptr = pte_set_fixmap(phys);
+	switch (type) {
+	case TYPE_P4D:
+		va = p4d_set_fixmap(*pa);
+		break;
+	case TYPE_PUD:
+		va = pud_set_fixmap(*pa);
+		break;
+	case TYPE_PMD:
+		va = pmd_set_fixmap(*pa);
+		break;
+	case TYPE_PTE:
+		va = pte_set_fixmap(*pa);
+		break;
+	default:
+		BUG();
+	}
+	memset(va, 0, PAGE_SIZE);
 
-	memset(ptr, 0, PAGE_SIZE);
+	/* Ensure the zeroed page is visible to the page table walker */
+	dsb(ishst);
 
-	/*
-	 * Implicit barriers also ensure the zeroed page is visible to the page
-	 * table walker
-	 */
-	pte_clear_fixmap();
+	return va;
+}
+
+static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
+{
+	void *entry;
+
+	switch (type) {
+	case TYPE_P4D:
+		entry = p4d_set_fixmap_offset((pgd_t *)parent, addr);
+		break;
+	case TYPE_PUD:
+		entry = pud_set_fixmap_offset((p4d_t *)parent, addr);
+		break;
+	case TYPE_PMD:
+		entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
+		break;
+	case TYPE_PTE:
+		entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
+		break;
+	default:
+		BUG();
+	}
 
-	return phys;
+	return entry;
+}
+
+static void __init early_pgtable_unmap(int type)
+{
+	switch (type) {
+	case TYPE_P4D:
+		p4d_clear_fixmap();
+		break;
+	case TYPE_PUD:
+		pud_clear_fixmap();
+		break;
+	case TYPE_PMD:
+		pmd_clear_fixmap();
+		break;
+	case TYPE_PTE:
+		pte_clear_fixmap();
+		break;
+	default:
+		BUG();
+	}
 }
 
+static struct pgtable_ops early_pgtable_ops __initdata = {
+	.alloc = early_pgtable_alloc,
+	.map = early_pgtable_map,
+	.unmap = early_pgtable_unmap,
+};
+
 bool pgattr_change_is_safe(u64 old, u64 new)
 {
 	/*
@@ -200,7 +288,7 @@ static pte_t *init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
 
 static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(int),
+				struct pgtable_ops *ops,
 				int flags)
@@ -214,14 +302,15 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 		if (flags & NO_EXEC_MAPPINGS)
 			pmdval |= PMD_TABLE_PXN;
-		BUG_ON(!pgtable_alloc);
-		pte_phys = pgtable_alloc(PAGE_SHIFT);
+		BUG_ON(flags & NO_ALLOC);
+		ptep = ops->alloc(TYPE_PTE, &pte_phys);
+		ptep += pte_index(addr);
 		__pmd_populate(pmdp, pte_phys, pmdval);
-		pmd = READ_ONCE(*pmdp);
+	} else {
+		BUG_ON(pmd_bad(pmd));
+		ptep = ops->map(TYPE_PTE, pmdp, addr);
 	}
-	BUG_ON(pmd_bad(pmd));
 
-	ptep = pte_set_fixmap_offset(pmdp, addr);
 	do {
 		pgprot_t __prot = prot;
 
@@ -237,7 +326,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 		phys += next - addr;
 	} while (addr = next, addr != end);
 
-	pte_clear_fixmap();
+	ops->unmap(TYPE_PTE);
 
 	/*
 	 * Ensure all previous pgtable writes are visible to the table walker.
@@ -249,7 +338,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		       phys_addr_t phys, pgprot_t prot,
-		       phys_addr_t (*pgtable_alloc)(int), int flags)
+		       struct pgtable_ops *ops, int flags)
 {
 	unsigned long next;
 
@@ -271,7 +360,7 @@ static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 					      READ_ONCE(pmd_val(*pmdp))));
 		} else {
 			alloc_init_cont_pte(pmdp, addr, next, phys, prot,
-					    pgtable_alloc, flags);
+					    ops, flags);
 
 			BUG_ON(pmd_val(old_pmd) != 0 &&
 			       pmd_val(old_pmd) != READ_ONCE(pmd_val(*pmdp)));
@@ -285,7 +374,7 @@ static pmd_t *init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 				unsigned long end, phys_addr_t phys,
 				pgprot_t prot,
-				phys_addr_t (*pgtable_alloc)(int), int flags)
+				struct pgtable_ops *ops, int flags)
 {
 	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
@@ -301,14 +390,15 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 		if (flags & NO_EXEC_MAPPINGS)
 			pudval |= PUD_TABLE_PXN;
-		BUG_ON(!pgtable_alloc);
-		pmd_phys = pgtable_alloc(PMD_SHIFT);
+		BUG_ON(flags & NO_ALLOC);
+		pmdp = ops->alloc(TYPE_PMD, &pmd_phys);
+		pmdp += pmd_index(addr);
 		__pud_populate(pudp, pmd_phys, pudval);
-		pud = READ_ONCE(*pudp);
+	} else {
+		BUG_ON(pud_bad(pud));
+		pmdp = ops->map(TYPE_PMD, pudp, addr);
 	}
-	BUG_ON(pud_bad(pud));
 
-	pmdp = pmd_set_fixmap_offset(pudp, addr);
 	do {
 		pgprot_t __prot = prot;
 
@@ -319,18 +409,17 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 		    (flags & NO_CONT_MAPPINGS) == 0)
 			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
 
-		pmdp = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc,
-				flags);
+		pmdp = init_pmd(pmdp, addr, next, phys, __prot, ops, flags);
 
 		phys += next - addr;
 	} while (addr = next, addr != end);
 
-	pmd_clear_fixmap();
+	ops->unmap(TYPE_PMD);
 }
 
 static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(int),
+			   struct pgtable_ops *ops,
 			   int flags)
 {
 	unsigned long next;
@@ -343,14 +432,15 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 		if (flags & NO_EXEC_MAPPINGS)
 			p4dval |= P4D_TABLE_PXN;
-		BUG_ON(!pgtable_alloc);
-		pud_phys = pgtable_alloc(PUD_SHIFT);
+		BUG_ON(flags & NO_ALLOC);
+		pudp = ops->alloc(TYPE_PUD, &pud_phys);
+		pudp += pud_index(addr);
 		__p4d_populate(p4dp, pud_phys, p4dval);
-		p4d = READ_ONCE(*p4dp);
+	} else {
+		BUG_ON(p4d_bad(p4d));
+		pudp = ops->map(TYPE_PUD, p4dp, addr);
 	}
-	BUG_ON(p4d_bad(p4d));
 
-	pudp = pud_set_fixmap_offset(p4dp, addr);
 	do {
 		pud_t old_pud = READ_ONCE(*pudp);
 
@@ -372,7 +462,7 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 					      READ_ONCE(pud_val(*pudp))));
 		} else {
 			alloc_init_cont_pmd(pudp, addr, next, phys, prot,
-					    pgtable_alloc, flags);
+					    ops, flags);
 
 			BUG_ON(pud_val(old_pud) != 0 &&
 			       pud_val(old_pud) != READ_ONCE(pud_val(*pudp)));
@@ -380,12 +470,12 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 		phys += next - addr;
 	} while (pudp++, addr = next, addr != end);
 
-	pud_clear_fixmap();
+	ops->unmap(TYPE_PUD);
 }
 
 static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 			   phys_addr_t phys, pgprot_t prot,
-			   phys_addr_t (*pgtable_alloc)(int),
+			   struct pgtable_ops *ops,
 			   int flags)
 {
 	unsigned long next;
@@ -398,21 +488,21 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 		if (flags & NO_EXEC_MAPPINGS)
 			pgdval |= PGD_TABLE_PXN;
-		BUG_ON(!pgtable_alloc);
-		p4d_phys = pgtable_alloc(P4D_SHIFT);
+		BUG_ON(flags & NO_ALLOC);
+		p4dp = ops->alloc(TYPE_P4D, &p4d_phys);
+		p4dp += p4d_index(addr);
 		__pgd_populate(pgdp, p4d_phys, pgdval);
-		pgd = READ_ONCE(*pgdp);
+	} else {
+		BUG_ON(pgd_bad(pgd));
+		p4dp = ops->map(TYPE_P4D, pgdp, addr);
 	}
-	BUG_ON(pgd_bad(pgd));
 
-	p4dp = p4d_set_fixmap_offset(pgdp, addr);
 	do {
 		p4d_t old_p4d = READ_ONCE(*p4dp);
 
 		next = p4d_addr_end(addr, end);
 
-		alloc_init_pud(p4dp, addr, next, phys, prot,
-			       pgtable_alloc, flags);
+		alloc_init_pud(p4dp, addr, next, phys, prot, ops, flags);
 
 		BUG_ON(p4d_val(old_p4d) != 0 &&
 		       p4d_val(old_p4d) != READ_ONCE(p4d_val(*p4dp)));
@@ -420,13 +510,13 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
 		phys += next - addr;
 	} while (p4dp++, addr = next, addr != end);
 
-	p4d_clear_fixmap();
+	ops->unmap(TYPE_P4D);
 }
 
 static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 					unsigned long virt, phys_addr_t size,
 					pgprot_t prot,
-					phys_addr_t (*pgtable_alloc)(int),
+					struct pgtable_ops *ops,
 					int flags)
 {
 	unsigned long addr, end, next;
@@ -445,8 +535,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
-			       flags);
+		alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
 		phys += next - addr;
 	} while (pgdp++, addr = next, addr != end);
 }
@@ -454,36 +543,31 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 				 unsigned long virt, phys_addr_t size,
 				 pgprot_t prot,
-				 phys_addr_t (*pgtable_alloc)(int),
+				 struct pgtable_ops *ops,
 				 int flags)
 {
 	mutex_lock(&fixmap_lock);
 	__create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
-				    pgtable_alloc, flags);
+				    ops, flags);
 	mutex_unlock(&fixmap_lock);
 }
 
-#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-extern __alias(__create_pgd_mapping_locked)
-void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
-			     phys_addr_t size, pgprot_t prot,
-			     phys_addr_t (*pgtable_alloc)(int), int flags);
-#endif
-
-static phys_addr_t __pgd_pgtable_alloc(int shift)
+static void *__pgd_pgtable_alloc(int type, phys_addr_t *pa)
 {
-	void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
-	BUG_ON(!ptr);
+	void *va = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
+
+	BUG_ON(!va);
 
 	/* Ensure the zeroed page is visible to the page table walker */
 	dsb(ishst);
-	return __pa(ptr);
+	*pa = __pa(va);
+	return va;
 }
 
-static phys_addr_t pgd_pgtable_alloc(int shift)
+static void *pgd_pgtable_alloc(int type, phys_addr_t *pa)
 {
-	phys_addr_t pa = __pgd_pgtable_alloc(shift);
-	struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
+	void *va = __pgd_pgtable_alloc(type, pa);
+	struct ptdesc *ptdesc = page_ptdesc(phys_to_page(*pa));
 
 	/*
 	 * Call proper page table ctor in case later we need to
@@ -493,13 +577,69 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
 	 * We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
 	 * folded, and if so pagetable_pte_ctor() becomes nop.
 	 */
-	if (shift == PAGE_SHIFT)
+	if (type == TYPE_PTE)
 		BUG_ON(!pagetable_pte_ctor(ptdesc));
-	else if (shift == PMD_SHIFT)
+	else if (type == TYPE_PMD)
 		BUG_ON(!pagetable_pmd_ctor(ptdesc));
 
-	return pa;
+	return va;
+}
+
+static void *pgd_pgtable_map(int type, void *parent, unsigned long addr)
+{
+	void *entry;
+
+	switch (type) {
+	case TYPE_P4D:
+		entry = p4d_offset((pgd_t *)parent, addr);
+		break;
+	case TYPE_PUD:
+		entry = pud_offset((p4d_t *)parent, addr);
+		break;
+	case TYPE_PMD:
+		entry = pmd_offset((pud_t *)parent, addr);
+		break;
+	case TYPE_PTE:
+		entry = pte_offset_kernel((pmd_t *)parent, addr);
+		break;
+	default:
+		BUG();
+	}
+
+	return entry;
+}
+
+static void pgd_pgtable_unmap(int type)
+{
+}
+
+static struct pgtable_ops pgd_pgtable_ops = {
+	.alloc = pgd_pgtable_alloc,
+	.map = pgd_pgtable_map,
+	.unmap = pgd_pgtable_unmap,
+};
+
+static struct pgtable_ops __pgd_pgtable_ops = {
+	.alloc = __pgd_pgtable_alloc,
+	.map = pgd_pgtable_map,
+	.unmap = pgd_pgtable_unmap,
+};
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
+			     phys_addr_t size, pgprot_t prot,
+			     void *(*pgtable_alloc)(int, phys_addr_t *),
+			     int flags)
+{
+	struct pgtable_ops ops = {
+		.alloc = pgtable_alloc,
+		.map = pgd_pgtable_map,
+		.unmap = pgd_pgtable_unmap,
+	};
+
+	__create_pgd_mapping_locked(pgdir, phys, virt, size, prot, &ops, flags);
 }
+#endif
 
 /*
  * This function can only be used to modify existing table entries,
@@ -514,8 +654,8 @@ void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
-			     NO_CONT_MAPPINGS);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
+			     &early_pgtable_ops, NO_CONT_MAPPINGS | NO_ALLOC);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -530,7 +670,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
-			     pgd_pgtable_alloc, flags);
+			     &pgd_pgtable_ops, flags);
 }
 
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -542,8 +682,8 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
 		return;
 	}
 
-	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
-			     NO_CONT_MAPPINGS);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
+			     &pgd_pgtable_ops, NO_CONT_MAPPINGS | NO_ALLOC);
 
 	/* flush the TLBs after updating live kernel mappings */
 	flush_tlb_kernel_range(virt, virt + size);
@@ -553,7 +693,7 @@ static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start,
 				  phys_addr_t end, pgprot_t prot, int flags)
 {
 	__create_pgd_mapping(pgdp, start, __phys_to_virt(start), end - start,
-			     prot, early_pgtable_alloc, flags);
+			     prot, &early_pgtable_ops, flags);
 }
 
 void __init mark_linear_text_alias_ro(void)
@@ -744,7 +884,7 @@ static int __init map_entry_trampoline(void)
 	memset(tramp_pg_dir, 0, PGD_SIZE);
 	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
 			     entry_tramp_text_size(), prot,
-			     __pgd_pgtable_alloc, NO_BLOCK_MAPPINGS);
+			     &__pgd_pgtable_ops, NO_BLOCK_MAPPINGS);
 
 	/* Map both the text and data into the kernel page table */
 	for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++)
@@ -1346,7 +1486,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
 	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-			     size, params->pgprot, __pgd_pgtable_alloc,
+			     size, params->pgprot, &__pgd_pgtable_ops,
 			     flags);
 
 	memblock_clear_nomap(start, size);
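The TLBI accounting claimed in this commit message falls directly out of
the two `struct pgtable_ops` implementations added above; condensed for
reference (the comments are a summary of the diff, not part of the patch):

  /* early boot: table pages are not yet in the linear map, so map/unmap
   * must go via the fixmap -- one unmap (and hence one TLBI) per table
   */
  static struct pgtable_ops early_pgtable_ops __initdata = {
          .alloc = early_pgtable_alloc,   /* memblock alloc + fixmap + zero */
          .map   = early_pgtable_map,     /* *_set_fixmap_offset() */
          .unmap = early_pgtable_unmap,   /* *_clear_fixmap() -> TLBI */
  };

  /* late: table pages already have linear map addresses, so map is a
   * plain offset walk (p4d/pud/pmd/pte_offset*()) and unmap is a no-op
   * -- zero TLBIs
   */
  static struct pgtable_ops pgd_pgtable_ops = {
          .alloc = pgd_pgtable_alloc,     /* __get_free_page() + ctor */
          .map   = pgd_pgtable_map,
          .unmap = pgd_pgtable_unmap,     /* empty */
  };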
From patchwork Thu Apr 4 14:33:08 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13617959
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel,
    David Hildenbrand, Donald Dutile, Eric Chanudet
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, Itaru Kitayama
Subject: [PATCH v2 4/4] arm64: mm: Lazily clear pte table mappings from fixmap
Date: Thu, 4 Apr 2024 15:33:08 +0100
Message-Id: <20240404143308.2224141-5-ryan.roberts@arm.com>
In-Reply-To: <20240404143308.2224141-1-ryan.roberts@arm.com>
References: <20240404143308.2224141-1-ryan.roberts@arm.com>

With the pgtable operations abstracted into `struct pgtable_ops`, the
early pgtable alloc, map and unmap operations are nicely centralized. So
let's enhance the implementation to speed up the clearing of pte table
mappings in the fixmap.

Extend FIX_MAP so that we now have 16 slots in the fixmap dedicated for
pte tables. At alloc/map time, we select the next slot in the series and
map it. If we are at the end and no more slots are available, clear down
all of the slots and start at the beginning again. Batching the clear
like this means we can issue TLBIs more efficiently.

Due to the batching, there may still be some slots mapped at the end, so
address this by adding an optional cleanup() function to `struct
pgtable_ops` to handle this for us.

Execution time of map_mem(), which creates the kernel linear map page
tables, was measured on different machines with different RAM configs:

               | Apple M2 VM | Ampere Altra| Ampere Altra| Ampere Altra
               |   VM, 16G   |   VM, 64G   |  VM, 256G   | Metal, 512G
---------------|-------------|-------------|-------------|-------------
               |   ms    (%) |   ms    (%) |   ms    (%) |    ms    (%)
---------------|-------------|-------------|-------------|-------------
before         |   11   (0%) |  109   (0%) |  449   (0%) |  1257   (0%)
after          |    6 (-46%) |   61 (-44%) |  257 (-43%) |   838 (-33%)

Signed-off-by: Ryan Roberts
Tested-by: Itaru Kitayama
Tested-by: Eric Chanudet
---
 arch/arm64/include/asm/fixmap.h  |  5 +++-
 arch/arm64/include/asm/pgtable.h |  4 ---
 arch/arm64/mm/fixmap.c           | 11 ++++++++
 arch/arm64/mm/mmu.c              | 44 +++++++++++++++++++++++++++++---
 4 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 87e307804b99..91fcd7c5c513 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -84,7 +84,9 @@ enum fixed_addresses {
 	 * Used for kernel page table creation, so unmapped memory may be used
 	 * for tables.
 	 */
-	FIX_PTE,
+#define NR_PTE_SLOTS 16
+	FIX_PTE_END,
+	FIX_PTE_BEGIN = FIX_PTE_END + NR_PTE_SLOTS - 1,
 	FIX_PMD,
 	FIX_PUD,
 	FIX_P4D,
@@ -108,6 +110,7 @@ void __init early_fixmap_init(void);
 #define __late_clear_fixmap(idx) __set_fixmap((idx), 0, FIXMAP_PAGE_CLEAR)
 
 extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);
+void __init clear_fixmap_nosync(enum fixed_addresses idx);
 
 #include <asm-generic/fixmap.h>
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 92c9aed5e7af..4c7114d49697 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -691,10 +691,6 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 /* Find an entry in the third-level page table. */
 #define pte_offset_phys(dir,addr)	(pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
 
-#define pte_set_fixmap(addr)		((pte_t *)set_fixmap_offset(FIX_PTE, addr))
-#define pte_set_fixmap_offset(pmd, addr)	pte_set_fixmap(pte_offset_phys(pmd, addr))
-#define pte_clear_fixmap()		clear_fixmap(FIX_PTE)
-
 #define pmd_page(pmd)			phys_to_page(__pmd_to_phys(pmd))
 
 /* use ONLY for statically allocated translation tables */
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index de1e09d986ad..0cb09bedeeec 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -131,6 +131,17 @@ void __set_fixmap(enum fixed_addresses idx,
 	}
 }
 
+void __init clear_fixmap_nosync(enum fixed_addresses idx)
+{
+	unsigned long addr = __fix_to_virt(idx);
+	pte_t *ptep;
+
+	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
+
+	ptep = fixmap_pte(addr);
+	__pte_clear(&init_mm, addr, ptep);
+}
+
 void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 {
 	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 90bf822859b8..2e3b594aa23c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -66,11 +66,14 @@ enum pgtable_type {
  *		mapped either as a result of a previous call to alloc() or
  *		map(). The page's virtual address must be considered invalid
  *		after this call returns.
+ * @cleanup:	(Optional) Called at the end of a set of operations to cleanup
+ *		any lazy state.
  */
 struct pgtable_ops {
 	void *(*alloc)(int type, phys_addr_t *pa);
 	void *(*map)(int type, void *parent, unsigned long addr);
 	void (*unmap)(int type);
+	void (*cleanup)(void);
 };
 
 #define NO_BLOCK_MAPPINGS	BIT(0)
@@ -139,9 +142,33 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
+static int pte_slot_next __initdata = FIX_PTE_BEGIN;
+
+static void __init clear_pte_fixmap_slots(void)
+{
+	unsigned long start = __fix_to_virt(FIX_PTE_BEGIN);
+	unsigned long end = __fix_to_virt(pte_slot_next);
+	int i;
+
+	for (i = FIX_PTE_BEGIN; i > pte_slot_next; i--)
+		clear_fixmap_nosync(i);
+
+	flush_tlb_kernel_range(start, end);
+	pte_slot_next = FIX_PTE_BEGIN;
+}
+
+static int __init pte_fixmap_slot(void)
+{
+	if (pte_slot_next < FIX_PTE_END)
+		clear_pte_fixmap_slots();
+
+	return pte_slot_next--;
+}
+
 static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
 {
 	void *va;
+	int slot;
 
 	*pa = memblock_phys_alloc_range(PAGE_SIZE, PAGE_SIZE, 0,
 					MEMBLOCK_ALLOC_NOLEAKTRACE);
@@ -159,7 +186,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
 		va = pmd_set_fixmap(*pa);
 		break;
 	case TYPE_PTE:
-		va = pte_set_fixmap(*pa);
+		slot = pte_fixmap_slot();
+		set_fixmap(slot, *pa);
+		va = (pte_t *)__fix_to_virt(slot);
 		break;
 	default:
 		BUG();
@@ -174,7 +203,9 @@ static void *__init early_pgtable_alloc(int type, phys_addr_t *pa)
 
 static void *__init early_pgtable_map(int type, void *parent, unsigned long addr)
 {
+	phys_addr_t pa;
 	void *entry;
+	int slot;
 
 	switch (type) {
 	case TYPE_P4D:
@@ -187,7 +218,10 @@ static void *__init early_pgtable_map(int type, void *parent, unsigned long addr
 		entry = pmd_set_fixmap_offset((pud_t *)parent, addr);
 		break;
 	case TYPE_PTE:
-		entry = pte_set_fixmap_offset((pmd_t *)parent, addr);
+		slot = pte_fixmap_slot();
+		pa = pte_offset_phys((pmd_t *)parent, addr);
+		set_fixmap(slot, pa);
+		entry = (pte_t *)(__fix_to_virt(slot) + (pa & (PAGE_SIZE - 1)));
 		break;
 	default:
 		BUG();
@@ -209,7 +243,7 @@ static void __init early_pgtable_unmap(int type)
 		pmd_clear_fixmap();
 		break;
 	case TYPE_PTE:
-		pte_clear_fixmap();
+		// Unmap lazily: see clear_pte_fixmap_slots().
 		break;
 	default:
 		BUG();
@@ -220,6 +254,7 @@ static struct pgtable_ops early_pgtable_ops __initdata = {
 	.alloc = early_pgtable_alloc,
 	.map = early_pgtable_map,
 	.unmap = early_pgtable_unmap,
+	.cleanup = clear_pte_fixmap_slots,
 };
 
 bool pgattr_change_is_safe(u64 old, u64 new)
@@ -538,6 +573,9 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
 		alloc_init_p4d(pgdp, addr, next, phys, prot, ops, flags);
 		phys += next - addr;
 	} while (pgdp++, addr = next, addr != end);
+
+	if (ops->cleanup)
+		ops->cleanup();
 }
 
 static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
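The recycling scheme in this patch amounts to a small descending ring of
fixmap slots: pte tables consume one slot each, and TLB maintenance only
happens when the ring is exhausted (or at cleanup() time), via a single
ranged flush covering all the used slots. Condensed from the diff above,
with explanatory comments added:

  static int pte_slot_next __initdata = FIX_PTE_BEGIN;

  static int __init pte_fixmap_slot(void)
  {
          if (pte_slot_next < FIX_PTE_END)    /* all 16 slots in use */
                  clear_pte_fixmap_slots();   /* __pte_clear() each slot,
                                                 one flush_tlb_kernel_range(),
                                                 then reset the ring */
          return pte_slot_next--;             /* higher fixmap indices sit at
                                                 lower VAs, so count down */
  }

So the amortised cost falls from one TLBI per pte table to one ranged
flush per NR_PTE_SLOTS (16) tables, which matches the final speedup in
the table above.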