From patchwork Wed Jul 11 16:15:18 2018
Subject: Re: [PATCH v3 5/5] arm64/mm: Move {idmap_pg_dir, swapper_pg_dir} to .rodata section
From: James Morse
To: Jun Yao
Cc: catalin.marinas@arm.com, will.deacon@arm.com, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, suzuki.poulose@arm.com
Date: Wed, 11 Jul 2018 17:15:18 +0100
In-Reply-To: <20180702111659.25570-6-yaojun8558363@gmail.com>
References: <20180702111659.25570-1-yaojun8558363@gmail.com>
 <20180702111659.25570-6-yaojun8558363@gmail.com>

Hi Jun,

On 02/07/18 12:16, Jun Yao wrote:
> Move {idmap_pg_dir, swapper_pg_dir} to .rodata section and
> populate swapper_pg_dir by fixmap.

(any chance you could split the fixmap bits into a separate patch so that the
rodata move comes last? This will make review and bisecting any problems
easier.)

> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 2e05bcd944c8..a0ce7d0f81c5 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -29,6 +29,23 @@
>  #define PGALLOC_GFP	(GFP_KERNEL | __GFP_ZERO)
>  #define PGD_SIZE	(PTRS_PER_PGD * sizeof(pgd_t))
>
> +static inline int in_swapper_dir(void *addr)
> +{
> +	if ((unsigned long)addr >= (unsigned long)swapper_pg_dir &&
> +	    (unsigned long)addr < (unsigned long)swapper_pg_end) {
> +		return 1;
> +	}
> +	return 0;
> +}

Making this a bool and returning the condition feels more natural. We know
swapper_pg_dir and swapper_pg_end are both page-aligned; if we tell the
compiler, it can generate better code. Something like:
| return ((long)addr & PAGE_MASK) == ((long)swapper_pg_dir & PAGE_MASK);

(if you agree it's the same)
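Spelled out as the bool helper, that would be roughly the below (untested
sketch; it assumes swapper_pg_dir is exactly one page, which is what the
PAGE_MASK comparison relies on):

| static inline bool in_swapper_dir(void *addr)
| {
| 	/* assumes swapper_pg_dir is page aligned and exactly one page */
| 	return ((long)addr & PAGE_MASK) == ((long)swapper_pg_dir & PAGE_MASK);
| }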
> +static inline void *swapper_mirror_addr(void *start, void *addr)
> +{
> +	unsigned long offset;
> +
> +	offset = (unsigned long)addr - (unsigned long)swapper_pg_dir;
> +	return start + offset;
> +}

I think you've re-invented __set_fixmap_offset, ...

>  #if CONFIG_PGTABLE_LEVELS > 2
>
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> @@ -49,6 +66,17 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
>
>  static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
>  {
> +#ifdef __PAGETABLE_PUD_FOLDED
> +	if ((mm == &init_mm) && in_swapper_dir(pudp)) {
> +		pud_t *pud;
> +		pud = pud_set_fixmap(__pa_symbol(swapper_pg_dir));
> +		pud = (pud_t *)swapper_mirror_addr(pud, pudp);

This is calculating the corresponding pudp in the fixmap mapping of swapper.
If you give pud_set_fixmap() the physical address of the original pudp, it
will calculate this for you. I think this could fold down to something like:
| pud_t *fixmap_pudp = pud_set_fixmap(__kimg_to_phys((long)pudp));

(please check I've dug through all that properly!)
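For illustration, the whole helper might then collapse to something like this
(untested sketch, ignoring the __PAGETABLE_PUD_FOLDED guard; 'fixmap_pudp' is
just a placeholder name):

| static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmdp)
| {
| 	if (mm == &init_mm && in_swapper_dir(pudp)) {
| 		/* write through the fixmap alias of swapper_pg_dir */
| 		pud_t *fixmap_pudp = pud_set_fixmap(__kimg_to_phys((long)pudp));
|
| 		__pud_populate(fixmap_pudp, __pa(pmdp), PMD_TYPE_TABLE);
| 		pud_clear_fixmap();
| 		return;
| 	}
|
| 	__pud_populate(pudp, __pa(pmdp), PMD_TYPE_TABLE);
| }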
On a wider issue: these p?d_populate() helpers only put table entries down;
they are used by vmalloc(). Unfortunately ioremap() will try to add block
mappings if it can, and those don't use this helper. (bits and pieces for
testing this in [0]). Systems with nvdimms probably do this all the time...

We could add the same in_swapper_dir() tests and fixmap operations to
p?d_set_huge(), but it may be worth pushing them down to set_p?d(), which is
where all these paths join up. This would also cover pgd_clear(), which is
called via p?d_none_or_clear_bad().

I think we should do something about concurrent calls. Today that ~should
work, provided they aren't modifying the same locations, but with these
changes both CPUs would try to clear_fixmap()/tlbi the range from underneath
each other.

The p?d_alloc() helpers take the mm lock around the p?d_populate() calls, but
it doesn't look like there is any other lock for the ioremap work calling
p?d_set_huge(), so we will need one of our own, especially for
p?d_none_or_clear_bad().
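To make that concrete, a rough sketch of what a locked set_pgd() path could
look like (untested; 'swapper_pgdir_lock' and 'set_swapper_pgd' are just
placeholder names, and the same shape would apply to set_pud()/set_pmd()):

| static DEFINE_SPINLOCK(swapper_pgdir_lock);
|
| static void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
| {
| 	pgd_t *fixmap_pgdp;
|
| 	spin_lock(&swapper_pgdir_lock);
| 	fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
| 	WRITE_ONCE(*fixmap_pgdp, pgd);
| 	/* pgd_clear_fixmap()'s tlbi/dsb also publishes the new entry */
| 	pgd_clear_fixmap();
| 	spin_unlock(&swapper_pgdir_lock);
| }
|
| static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
| {
| 	if (in_swapper_dir(pgdp)) {
| 		set_swapper_pgd(pgdp, pgd);
| 		return;
| 	}
|
| 	WRITE_ONCE(*pgdp, pgd);
| 	dsb(ishst);
| }

(the locked helper would probably want to live in mmu.c rather than the
header, to avoid dragging spinlock.h into pgtable.h)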
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 3b408f21fe2e..b479d1b997c2 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -475,6 +475,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>   */
>  #define mk_pte(page,prot)	pfn_pte(page_to_pfn(page),prot)
>
> +#define pmd_set_fixmap(addr)	((pmd_t *)set_fixmap_offset(FIX_PMD, addr))
> +#define pmd_clear_fixmap()	clear_fixmap(FIX_PMD)
> +

Ooer, we shouldn't need to move these around. If they're wrong, it should be
the subject of a separate patch, but it's more likely we're using the wrong
ones...

> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index d69e11ad92e3..beff018bf0f9 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -223,21 +223,25 @@ SECTIONS
>  	BSS_SECTION(0, 0, 0)
>
>  	. = ALIGN(PAGE_SIZE);
> -	idmap_pg_dir = .;
> -	. += IDMAP_DIR_SIZE;
> +
> +	.rodata : {

Now I'm confused. I thought this named section was output by RO_DATA() further
up, just before __init_begin, which mark_rodata_ro() uses as its end marker. I
thought named sections had to be unique. Are we expecting the linker to realise
we want it to move this stuff up there, and not move the read-only contents
down here, outside the markers? (maybe the linker always takes the first
definition?)

Can we move these to sit near INIT_DIR/INIT_PG_TABLES as a macro, which we
then put after NOTES? (something like KERNEL_PG_TABLES?)


Thanks!

James

[0] abuse kdump to give us a 1G naturally aligned block of memory that is
missing from the linear map. We can memremap() this, and get a top-level block
entry on !4-level systems. Do this late enough. Add 'crashkernel=1G' to your
cmdline.

----------------------%<----------------------
 			    section_size, PAGE_KERNEL_RO);
 
 	debug_checkwx();
+
+#ifdef CONFIG_KEXEC_CORE
+	hack__remap_kdump_region();
+#endif
 }
 
 static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
----------------------%<----------------------

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 325cfb3b858a..081d4e216067 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -104,7 +105,7 @@ static void __init reserve_crashkernel(void)
 	if (crash_base == 0) {
 		/* Current arm64 boot protocol requires 2MB alignment */
 		crash_base = memblock_find_in_range(0, ARCH_LOW_ADDRESS_LIMIT,
-						    crash_size, SZ_2M);
+						    crash_size, SZ_1G);
 		if (crash_base == 0) {
 			pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
 				crash_size);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 955b40efcc0c..f8cca5adb0c8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -485,12 +485,22 @@ static void __init map_mem(pgd_t *pgdp)
 		__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
 			       PAGE_KERNEL,
 			       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
-		memblock_clear_nomap(crashk_res.start,
-				     resource_size(&crashk_res));
 	}
 #endif
 }
 
+#ifdef CONFIG_KEXEC_CORE
+static void hack__remap_kdump_region(void)
+{
+	size_t size = crashk_res.end - crashk_res.start;
+
+	if (!crashk_res.end)
+		return;
+
+	memremap((unsigned long)crashk_res.start, size, MEMREMAP_WB);
+}
+#endif
+
 void mark_rodata_ro(void)
 {
 	unsigned long section_size;
@@ -504,6 +514,10 @@ void mark_rodata_ro(void)