Message ID | 20210111194017.22696-2-rppt@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | mm: fix initialization of struct page for holes in memory layout |
On Mon, Jan 11, 2021 at 09:40:16PM +0200, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> for the kernel it was not listed in e820 tables as memory. As the result,
> pfn 0 was never recognised by the generic memory management and it is not a
> part of neither node 0 nor ZONE_DMA.

So, since it never was added to memblock.memory structs, it was not
initialized by init_unavailable_mem, right?
On Wed, Jan 13, 2021 at 09:56:49AM +0100, Oscar Salvador wrote:
> On Mon, Jan 11, 2021 at 09:40:16PM +0200, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> > for the kernel it was not listed in e820 tables as memory. As the result,
> > pfn 0 was never recognised by the generic memory management and it is not a
> > part of neither node 0 nor ZONE_DMA.
>
> So, since it never was added to memblock.memory structs, it was not
> initialized by init_unavailable_mem, right?

Actually it was initialized by init_unavailable_mem() and got zone=0 and
node=0, but the DMA zone started from pfn 1, so pfn 0 was never a part of
ZONE_DMA.

> --
> Oscar Salvador
> SUSE L3
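The mismatch described in this reply can be illustrated with a small user-space model. This is only a sketch: `struct zone_model`, `model_zone_spans_pfn()`, and the 4095/4096-page spans are hypothetical stand-ins chosen for illustration, not the kernel's definitions. It shows that a page labelled as belonging to a zone that starts at pfn 1 fails the span check for pfn 0, which is the condition the quoted VM_BUG_ON_PAGE() tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct zone; only the pfn span matters here. */
struct zone_model {
	unsigned long zone_start_pfn;	/* first pfn covered by the zone */
	unsigned long spanned_pages;	/* number of pfns covered, holes included */
};

/* Same idea as zone_spans_pfn(): is pfn inside [start, start + spanned)? */
bool model_zone_spans_pfn(const struct zone_model *zone, unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

/*
 * Illustrative values only (assumption): a DMA zone that ends at pfn 4096.
 * Before the patch the zone starts at pfn 1 because pfn 0 is not RAM in
 * e820; after the patch pfn 0 is RAM, so the zone can start at pfn 0.
 */
const struct zone_model dma_before = { .zone_start_pfn = 1, .spanned_pages = 4095 };
const struct zone_model dma_after  = { .zone_start_pfn = 0, .spanned_pages = 4096 };

/* Small self-check of the scenario discussed in the thread. */
void zone_model_self_check(void)
{
	/* struct page for pfn 0 says zone=0, yet the zone does not span pfn 0 */
	assert(!model_zone_spans_pfn(&dma_before, 0));
	/* with pfn 0 kept as RAM the zone spans it and the check passes */
	assert(model_zone_spans_pfn(&dma_after, 0));
}
```

Calling `zone_model_self_check()` from a test program exercises both cases; the "before" case is the one where the VM_BUG_ON_PAGE() condition would be true.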
On 11.01.21 20:40, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> for the kernel it was not listed in e820 tables as memory. As the result,
> pfn 0 was never recognised by the generic memory management and it is not a
> part of neither node 0 nor ZONE_DMA.
>
> If set_pfnblock_flags_mask() would be ever called for the pageblock
> corresponding to the first 2Mbytes of memory, having pfn 0 outside of
> ZONE_DMA would trigger
>
> 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
>
> Along with reserving the first 4Kb in e820 tables, several first pages are
> reserved with memblock in several places during setup_arch(). These
> reservations are enough to ensure the kernel does not touch the BIOS area
> and it is not necessary to remove E820_TYPE_RAM for pfn 0.
>
> Remove the update of e820 table that changes the type of pfn 0 and move the
> comment describing why it was done to trim_low_memory_range() that reserves
> the beginning of the memory.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/kernel/setup.c | 20 +++++++++-----------
>  1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 740f3bdb3f61..3412c4595efd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -660,17 +660,6 @@ static void __init trim_platform_memory_ranges(void)
>
>  static void __init trim_bios_range(void)
>  {
> -	/*
> -	 * A special case is the first 4Kb of memory;
> -	 * This is a BIOS owned area, not kernel ram, but generally
> -	 * not listed as such in the E820 table.
> -	 *
> -	 * This typically reserves additional memory (64KiB by default)
> -	 * since some BIOSes are known to corrupt low memory. See the
> -	 * Kconfig help text for X86_RESERVE_LOW.
> -	 */
> -	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
> -
>  	/*
>  	 * special case: Some BIOSes report the PC BIOS
>  	 * area (640Kb -> 1Mb) as RAM even though it is not.
> @@ -728,6 +717,15 @@ early_param("reservelow", parse_reservelow);
>
>  static void __init trim_low_memory_range(void)
>  {
> +	/*
> +	 * A special case is the first 4Kb of memory;
> +	 * This is a BIOS owned area, not kernel ram, but generally
> +	 * not listed as such in the E820 table.
> +	 *
> +	 * This typically reserves additional memory (64KiB by default)
> +	 * since some BIOSes are known to corrupt low memory. See the
> +	 * Kconfig help text for X86_RESERVE_LOW.
> +	 */
>  	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
>  }
>
>

The only somewhat-confusing thing is that in-between
e820__memblock_setup() and trim_low_memory_range(), we already have
memblock allocations. So [0..4095] might look like ordinary memory until
we reserve it later on.

E.g., reserve_real_mode() does a

	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
	...
	memblock_reserve(mem, size);
	set_real_mode_mem(mem);

which looks kind of suspicious to me. Most probably I am missing
something, just wanted to point that out. We might want to do such
trimming/adjustments before any kind of allocations.
On Wed, Jan 13, 2021 at 01:56:45PM +0100, David Hildenbrand wrote:
> On 11.01.21 20:40, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> > for the kernel it was not listed in e820 tables as memory. As the result,
> > pfn 0 was never recognised by the generic memory management and it is not a
> > part of neither node 0 nor ZONE_DMA.
> >
> > If set_pfnblock_flags_mask() would be ever called for the pageblock
> > corresponding to the first 2Mbytes of memory, having pfn 0 outside of
> > ZONE_DMA would trigger
> >
> > 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> >
> > Along with reserving the first 4Kb in e820 tables, several first pages are
> > reserved with memblock in several places during setup_arch(). These
> > reservations are enough to ensure the kernel does not touch the BIOS area
> > and it is not necessary to remove E820_TYPE_RAM for pfn 0.
> >
> > Remove the update of e820 table that changes the type of pfn 0 and move the
> > comment describing why it was done to trim_low_memory_range() that reserves
> > the beginning of the memory.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/x86/kernel/setup.c | 20 +++++++++-----------
> >  1 file changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 740f3bdb3f61..3412c4595efd 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -660,17 +660,6 @@ static void __init trim_platform_memory_ranges(void)
> >
> >  static void __init trim_bios_range(void)
> >  {
> > -	/*
> > -	 * A special case is the first 4Kb of memory;
> > -	 * This is a BIOS owned area, not kernel ram, but generally
> > -	 * not listed as such in the E820 table.
> > -	 *
> > -	 * This typically reserves additional memory (64KiB by default)
> > -	 * since some BIOSes are known to corrupt low memory. See the
> > -	 * Kconfig help text for X86_RESERVE_LOW.
> > -	 */
> > -	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > -
> >  	/*
> >  	 * special case: Some BIOSes report the PC BIOS
> >  	 * area (640Kb -> 1Mb) as RAM even though it is not.
> > @@ -728,6 +717,15 @@ early_param("reservelow", parse_reservelow);
> >
> >  static void __init trim_low_memory_range(void)
> >  {
> > +	/*
> > +	 * A special case is the first 4Kb of memory;
> > +	 * This is a BIOS owned area, not kernel ram, but generally
> > +	 * not listed as such in the E820 table.
> > +	 *
> > +	 * This typically reserves additional memory (64KiB by default)
> > +	 * since some BIOSes are known to corrupt low memory. See the
> > +	 * Kconfig help text for X86_RESERVE_LOW.
> > +	 */
> >  	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> >  }
> >
> >
>
> The only somewhat-confusing thing is that in-between
> e820__memblock_setup() and trim_low_memory_range(), we already have
> memblock allocations. So [0..4095] might look like ordinary memory until
> we reserve it later on.
>
> E.g., reserve_real_mode() does a
>
> 	mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> 	...
> 	memblock_reserve(mem, size);
> 	set_real_mode_mem(mem);
>
> which looks kind of suspicious to me. Most probably I am missing
> something, just wanted to point that out. We might want to do such
> trimming/adjustments before any kind of allocations.

You are right and it looks suspicious, but the first page is reserved at
the very beginning of x86::setup_arch() and, moreover, memblock never
allocates it (look at memblock::memblock_find_in_range_node()).

As for the range 0x1000 <-> reserve_low, we are unlikely to allocate it in
the default top-down mode. The bottom-up mode was only allocating memory
above the kernel so this would also prevent allocation of the lowest
memory, at least until the recent changes for CMA allocation:

https://lore.kernel.org/lkml/20201217201214.3414100-1-guro@fb.com

That said, we'd better consolidate all the trim_some_memory() and move it
closer to the beginning of setup_arch(). I'm going to take a look at it in
the next few days.

> --
> Thanks,
>
> David / dhildenb
>
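The two points in this reply (the first page is never handed out, and top-down searches stay away from the lowest addresses) can be sketched with a toy range finder. This is a user-space model under stated assumptions, not memblock's code: `model_find_in_range()` is a hypothetical helper, it treats the whole requested window as one free region, it assumes a 4 KiB page and power-of-two alignment, and it does not model the "only above the kernel" restriction of the old bottom-up mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

typedef uint64_t phys_addr_t;

/*
 * Toy search over a single free window [start, end). Returns the chosen
 * base address, or 0 when nothing fits. The clamp of start to one page
 * models the "never allocate the first page" behaviour Mike refers to.
 */
phys_addr_t model_find_in_range(phys_addr_t start, phys_addr_t end,
				phys_addr_t size, phys_addr_t align,
				bool bottom_up)
{
	phys_addr_t cand;

	if (start < MODEL_PAGE_SIZE)
		start = MODEL_PAGE_SIZE;	/* skip physical page 0 */
	if (size == 0 || end <= start || end - start < size)
		return 0;

	if (bottom_up) {
		/* lowest aligned address at or above start */
		cand = (start + align - 1) & ~(align - 1);
		return (cand <= end && end - cand >= size) ? cand : 0;
	}

	/* top-down: highest aligned address that still fits below end */
	cand = (end - size) & ~(align - 1);
	return cand >= start ? cand : 0;
}

/* Self-check using the reserve_real_mode()-style window [0, 1 MiB). */
void find_model_self_check(void)
{
	/* top-down lands just below 1 MiB, far from the low pages */
	assert(model_find_in_range(0, 1ULL << 20, 0x5000, MODEL_PAGE_SIZE, false) == 0xFB000);
	/* bottom-up lands on the second page, never on page 0 */
	assert(model_find_in_range(0, 1ULL << 20, 0x5000, MODEL_PAGE_SIZE, true) == 0x1000);
}
```

In this model a bottom-up search of the same window returns 0x1000, which falls inside the range that trim_low_memory_range() reserves only later; that is the ordering hazard the thread proposes to remove by moving the trimming earlier in setup_arch().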
On Mon, Jan 11, 2021 at 09:40:16PM +0200, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> for the kernel it was not listed in e820 tables as memory. As the result,
> pfn 0 was never recognised by the generic memory management and it is not a
> part of neither node 0 nor ZONE_DMA.
>
> If set_pfnblock_flags_mask() would be ever called for the pageblock
> corresponding to the first 2Mbytes of memory, having pfn 0 outside of
> ZONE_DMA would trigger
>
> 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
>
> Along with reserving the first 4Kb in e820 tables, several first pages are
> reserved with memblock in several places during setup_arch(). These
> reservations are enough to ensure the kernel does not touch the BIOS area
> and it is not necessary to remove E820_TYPE_RAM for pfn 0.
>
> Remove the update of e820 table that changes the type of pfn 0 and move the
> comment describing why it was done to trim_low_memory_range() that reserves
> the beginning of the memory.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/kernel/setup.c | 20 +++++++++-----------
>  1 file changed, 9 insertions(+), 11 deletions(-)

FWIW,

Acked-by: Borislav Petkov <bp@suse.de>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 740f3bdb3f61..3412c4595efd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -660,17 +660,6 @@ static void __init trim_platform_memory_ranges(void)
 
 static void __init trim_bios_range(void)
 {
-	/*
-	 * A special case is the first 4Kb of memory;
-	 * This is a BIOS owned area, not kernel ram, but generally
-	 * not listed as such in the E820 table.
-	 *
-	 * This typically reserves additional memory (64KiB by default)
-	 * since some BIOSes are known to corrupt low memory. See the
-	 * Kconfig help text for X86_RESERVE_LOW.
-	 */
-	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
-
 	/*
 	 * special case: Some BIOSes report the PC BIOS
 	 * area (640Kb -> 1Mb) as RAM even though it is not.
@@ -728,6 +717,15 @@ early_param("reservelow", parse_reservelow);
 
 static void __init trim_low_memory_range(void)
 {
+	/*
+	 * A special case is the first 4Kb of memory;
+	 * This is a BIOS owned area, not kernel ram, but generally
+	 * not listed as such in the E820 table.
+	 *
+	 * This typically reserves additional memory (64KiB by default)
+	 * since some BIOSes are known to corrupt low memory. See the
+	 * Kconfig help text for X86_RESERVE_LOW.
+	 */
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
 
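For readers unfamiliar with the `ALIGN(reserve_low, PAGE_SIZE)` expression in trim_low_memory_range(), the arithmetic can be sketched in user space. This is a minimal model assuming a 4 KiB page size and power-of-two alignment; `MODEL_ALIGN`, `model_reserved_bytes()`, and `model_pfn_is_reserved_low()` are hypothetical names used only here. It shows that any non-zero reserve_low yields a reservation that starts at address 0 and therefore always covers pfn 0, which is why the e820 update is no longer needed to protect that page.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round x up to a multiple of a, where a is a power of two. */
#define MODEL_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Size in bytes of the reservation starting at physical address 0. */
unsigned long model_reserved_bytes(unsigned long reserve_low)
{
	return MODEL_ALIGN(reserve_low, MODEL_PAGE_SIZE);
}

/* Does the low reservation [0, aligned reserve_low) cover this pfn? */
bool model_pfn_is_reserved_low(unsigned long pfn, unsigned long reserve_low)
{
	return pfn < model_reserved_bytes(reserve_low) / MODEL_PAGE_SIZE;
}

/* Self-check with the 64KiB default mentioned in the moved comment. */
void reserve_low_self_check(void)
{
	assert(model_reserved_bytes(64UL << 10) == 65536UL);	/* already aligned */
	assert(model_pfn_is_reserved_low(0, 64UL << 10));	/* pfn 0 is covered */
	assert(!model_pfn_is_reserved_low(16, 64UL << 10));	/* first pfn past 64KiB */
}
```

With the 64KiB default the reservation spans pfns 0 through 15; a value that is not page-aligned is rounded up to the next page boundary.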