diff mbox series

[v2,3/3] arm: extend pfn_valid to take into accound freed memory map alignment

Message ID 20210519141436.11961-4-rppt@kernel.org (mailing list archive)
State New, archived
Headers show
Series memblock, arm: fixes for freeing of the memory map | expand

Commit Message

Mike Rapoport May 19, 2021, 2:14 p.m. UTC
From: Mike Rapoport <rppt@linux.ibm.com>

When unused memory map is freed the preserved part of the memory map is
extended to match pageblock boundaries because lots of core mm
functionality relies on homogeneity of the memory map within pageblock
boundaries.

Since pfn_valid() is used to check whether there is a valid memory map
entry for a PFN, make it return true also for PFNs that have memory map
entries even if there is no actual memory populated there.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm/mm/init.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Tony Lindgren June 28, 2021, 10:20 a.m. UTC | #1
Hi,

Looks like this patch causes a boot regression at least for Cortex-A15.
That's commit 990e6d0e1de8 ("arm: extend pfn_valid to take into accound
freed memory map alignment") in Linux next.

Most of the time I see the following on beagle-x15 right after init starts:

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU0: stopping
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7-next-20210625 #100
Hardware name: Generic DRA74X (Flattened Device Tree)
[<c0110c54>] (unwind_backtrace) from [<c010b408>] (show_stack+0x10/0x14)
[<c010b408>] (show_stack) from [<c09fae04>] (dump_stack_lvl+0x40/0x4c)
[<c09fae04>] (dump_stack_lvl) from [<c010e768>] (do_handle_IPI+0x2c8/0x334)
[<c010e768>] (do_handle_IPI) from [<c010e7e8>] (ipi_handler+0x14/0x20)
[<c010e7e8>] (ipi_handler) from [<c01a5f14>] (handle_percpu_devid_irq+0xa8/0x22c)
[<c01a5f14>] (handle_percpu_devid_irq) from [<c019fc78>] (handle_domain_irq+0x64/0xa4)
[<c019fc78>] (handle_domain_irq) from [<c05b9bdc>] (gic_handle_irq+0x88/0xb0)
[<c05b9bdc>] (gic_handle_irq) from [<c0100b6c>] (__irq_svc+0x6c/0x90)
Exception stack(0xc0f01f08 to 0xc0f01f50)
1f00:                   00000f38 00000f37 00000000 fe600000 c0ff90c0 00000000
1f20: c0f0520c c0f05260 00000000 c0f00000 00000000 c0e788f0 00000000 c0f01f58
1f40: c0126aa0 c0107dc4 60000013 ffffffff
[<c0100b6c>] (__irq_svc) from [<c0107dc4>] (arch_cpu_idle+0x1c/0x3c)
[<c0107dc4>] (arch_cpu_idle) from [<c0a098d8>] (default_idle_call+0x38/0xe0)
[<c0a098d8>] (default_idle_call) from [<c0172860>] (do_idle+0x214/0x2cc)
[<c0172860>] (do_idle) from [<c0172c0c>] (cpu_startup_entry+0x18/0x1c)
[<c0172c0c>] (cpu_startup_entry) from [<c0e00ef8>] (start_kernel+0x5cc/0x6c4)

Sometimes the system boots to console, but maybe only about 20% of the
time. Reverting 990e6d0e1de8 makes Linux next boot again for me.

Regards,

Tony

#regzb introduced: 990e6d0e1de8 ("arm: extend pfn_valid to take into accound freed memory map alignment")


* Mike Rapoport <rppt@kernel.org> [700101 02:00]:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> When unused memory map is freed the preserved part of the memory map is
> extended to match pageblock boundaries because lots of core mm
> functionality relies on homogeneity of the memory map within pageblock
> boundaries.
> 
> Since pfn_valid() is used to check whether there is a valid memory map
> entry for a PFN, make it return true also for PFNs that have memory map
> entries even if there is no actual memory populated there.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm/mm/init.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 9d4744a632c6..6162a070a410 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -125,11 +125,22 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>  int pfn_valid(unsigned long pfn)
>  {
>  	phys_addr_t addr = __pfn_to_phys(pfn);
> +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
>  
>  	if (__phys_to_pfn(addr) != pfn)
>  		return 0;
>  
> -	return memblock_is_map_memory(addr);
> +	/*
> +	 * If address less than pageblock_size bytes away from a present
> +	 * memory chunk there still will be a memory map entry for it
> +	 * because we round freed memory map to the pageblock boundaries.
> +	 */
> +	if (memblock_overlaps_region(&memblock.memory,
> +				     ALIGN_DOWN(addr, pageblock_size),
> +				     pageblock_size))
> +		return 1;
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL(pfn_valid);
>  #endif
> -- 
> 2.28.0
>
Mike Rapoport June 28, 2021, 1:16 p.m. UTC | #2
Hi Tony,

On Mon, Jun 28, 2021 at 01:20:11PM +0300, Tony Lindgren wrote:
> Hi,
> 
> Looks like this patch causes a boot regression at least for Cortex-A15.
> That's commit 990e6d0e1de8 ("arm: extend pfn_valid to take into accound
> freed memory map alignment") in Linux next.
> 
> Most of the time I see the following on beagle-x15 right after init starts:
> 
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> CPU0: stopping
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7-next-20210625 #100
> Hardware name: Generic DRA74X (Flattened Device Tree)
> [<c0110c54>] (unwind_backtrace) from [<c010b408>] (show_stack+0x10/0x14)
> [<c010b408>] (show_stack) from [<c09fae04>] (dump_stack_lvl+0x40/0x4c)
> [<c09fae04>] (dump_stack_lvl) from [<c010e768>] (do_handle_IPI+0x2c8/0x334)
> [<c010e768>] (do_handle_IPI) from [<c010e7e8>] (ipi_handler+0x14/0x20)
> [<c010e7e8>] (ipi_handler) from [<c01a5f14>] (handle_percpu_devid_irq+0xa8/0x22c)
> [<c01a5f14>] (handle_percpu_devid_irq) from [<c019fc78>] (handle_domain_irq+0x64/0xa4)
> [<c019fc78>] (handle_domain_irq) from [<c05b9bdc>] (gic_handle_irq+0x88/0xb0)
> [<c05b9bdc>] (gic_handle_irq) from [<c0100b6c>] (__irq_svc+0x6c/0x90)
> Exception stack(0xc0f01f08 to 0xc0f01f50)
> 1f00:                   00000f38 00000f37 00000000 fe600000 c0ff90c0 00000000
> 1f20: c0f0520c c0f05260 00000000 c0f00000 00000000 c0e788f0 00000000 c0f01f58
> 1f40: c0126aa0 c0107dc4 60000013 ffffffff
> [<c0100b6c>] (__irq_svc) from [<c0107dc4>] (arch_cpu_idle+0x1c/0x3c)
> [<c0107dc4>] (arch_cpu_idle) from [<c0a098d8>] (default_idle_call+0x38/0xe0)
> [<c0a098d8>] (default_idle_call) from [<c0172860>] (do_idle+0x214/0x2cc)
> [<c0172860>] (do_idle) from [<c0172c0c>] (cpu_startup_entry+0x18/0x1c)
> [<c0172c0c>] (cpu_startup_entry) from [<c0e00ef8>] (start_kernel+0x5cc/0x6c4)
> 
> Sometimes the system boots to console, but maybe only about 20% of the
> time. Reverting 990e6d0e1de8 makes Linux next boot again for me.

Can you please send log with 'memblock=debug' added to the command line?
 
> Regards,
> 
> Tony
> 
> #regzb introduced: 990e6d0e1de8 ("arm: extend pfn_valid to take into accound freed memory map alignment")
> 
> 
> * Mike Rapoport <rppt@kernel.org> [700101 02:00]:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > When unused memory map is freed the preserved part of the memory map is
> > extended to match pageblock boundaries because lots of core mm
> > functionality relies on homogeneity of the memory map within pageblock
> > boundaries.
> > 
> > Since pfn_valid() is used to check whether there is a valid memory map
> > entry for a PFN, make it return true also for PFNs that have memory map
> > entries even if there is no actual memory populated there.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm/mm/init.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 9d4744a632c6..6162a070a410 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -125,11 +125,22 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
> >  int pfn_valid(unsigned long pfn)
> >  {
> >  	phys_addr_t addr = __pfn_to_phys(pfn);
> > +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> >  
> >  	if (__phys_to_pfn(addr) != pfn)
> >  		return 0;
> >  
> > -	return memblock_is_map_memory(addr);
> > +	/*
> > +	 * If address less than pageblock_size bytes away from a present
> > +	 * memory chunk there still will be a memory map entry for it
> > +	 * because we round freed memory map to the pageblock boundaries.
> > +	 */
> > +	if (memblock_overlaps_region(&memblock.memory,
> > +				     ALIGN_DOWN(addr, pageblock_size),
> > +				     pageblock_size))
> > +		return 1;
> > +
> > +	return 0;
> >  }
> >  EXPORT_SYMBOL(pfn_valid);
> >  #endif
> > -- 
> > 2.28.0
> >
Tony Lindgren June 28, 2021, 3:26 p.m. UTC | #3
* Mike Rapoport <rppt@linux.ibm.com> [210628 14:07]:
> Can you please send log with 'memblock=debug' added to the command line?

Sure, log now available at:

http://muru.com/beagle-x15.txt

Regards,

Tony
Mike Rapoport June 29, 2021, 5:33 a.m. UTC | #4
On Mon, Jun 28, 2021 at 06:26:26PM +0300, Tony Lindgren wrote:
> * Mike Rapoport <rppt@linux.ibm.com> [210628 14:07]:
> > Can you please send log with 'memblock=debug' added to the command line?
> 
> Sure, log now available at:
> 
> http://muru.com/beagle-x15.txt

Hmm, no clues yet :(

Do you have CONFIG_DEBUG_VM, CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING enabled in your config?
If not, can you please enable them and see if any of VM_BUG_* triggers?

Do you use FLATMEM or SPARSEMEM in your config?

Let's try seeing what PFNs get false results from pfn_valid, maybe this
will give a better lead.

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6162a070a410..66985fc3e730 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -126,10 +126,16 @@ int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
 	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
+	bool overlaps = memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size);
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
+	if (memblock_is_map_memory(addr) != overlaps)
+		pr_info("%s(%pS): pfn: %lx: is_map: %d overlaps: %d\n", __func__, (void *)_RET_IP_, pfn, memblock_is_map_memory(addr), overlaps);
+
 	/*
 	 * If address less than pageblock_size bytes away from a present
 	 * memory chunk there still will be a memory map entry for it
Tony Lindgren June 29, 2021, 8:54 a.m. UTC | #5
Hi,

* Mike Rapoport <rppt@linux.ibm.com> [210629 05:33]:
> On Mon, Jun 28, 2021 at 06:26:26PM +0300, Tony Lindgren wrote:
> > * Mike Rapoport <rppt@linux.ibm.com> [210628 14:07]:
> > > Can you please send log with 'memblock=debug' added to the command line?
> > 
> > Sure, log now available at:
> > 
> > http://muru.com/beagle-x15.txt
> 
> Hmm, no clues yet :(
> 
> Do you have CONFIG_DEBUG_VM, CONFIG_DEBUG_VM_PGFLAGS and
> CONFIG_PAGE_POISONING enabled in your config?
> If not, can you please enable them and see if any of VM_BUG_* triggers?

OK enabled, and no errors or warnings are triggered.

> Do you use FLATMEM or SPARSEMEM in your config?

Looks like make omap2plus_defconfig enables FLATMEM:

$ grep -e SPARSEMEM -e FLATMEM .config
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y

> Let's try seeing what PFNs get false results from pfn_valid, maybe this
> will give a better lead.

With your patch below, system boots with lots of the following:

[   13.058654] Freeing unused kernel image (initmem) memory: 1024K
...
[   13.129211] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb6: is_map: 1 overlaps: 0
[   13.137481] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb7: is_map: 1 overlaps: 0
[   13.145751] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb8: is_map: 1 overlaps: 0
[   13.153991] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb9: is_map: 1 overlaps: 0
[   13.162200] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffba: is_map: 1 overlaps: 0
[   13.170440] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffbb: is_map: 1 overlaps: 0
[   13.178680] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffbc: is_map: 1 overlaps: 0
[   13.186920] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffbd: is_map: 1 overlaps: 0
[   13.195159] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffbe: is_map: 1 overlaps: 0
[   13.203399] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffbf: is_map: 1 overlaps: 0
[   13.211639] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fff00: is_map: 1 overlaps: 0
...

Then changing console loglevel to 0 boots system to login prompt. But I'm
seeing some init processes segfaulting during start-up.

Regards,

Tony


> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 6162a070a410..66985fc3e730 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -126,10 +126,16 @@ int pfn_valid(unsigned long pfn)
>  {
>  	phys_addr_t addr = __pfn_to_phys(pfn);
>  	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> +	bool overlaps = memblock_overlaps_region(&memblock.memory,
> +				     ALIGN_DOWN(addr, pageblock_size),
> +				     pageblock_size);
>  
>  	if (__phys_to_pfn(addr) != pfn)
>  		return 0;
>  
> +	if (memblock_is_map_memory(addr) != overlaps)
> +		pr_info("%s(%pS): pfn: %lx: is_map: %d overlaps: %d\n", __func__, (void *)_RET_IP_, pfn, memblock_is_map_memory(addr), overlaps);
> +
>  	/*
>  	 * If address less than pageblock_size bytes away from a present
>  	 * memory chunk there still will be a memory map entry for it
> 
> -- 
> Sincerely yours,
> Mike.
Mike Rapoport June 29, 2021, 10:50 a.m. UTC | #6
Hi,

On Tue, Jun 29, 2021 at 11:54:02AM +0300, Tony Lindgren wrote:
> Hi,
> 
> * Mike Rapoport <rppt@linux.ibm.com> [210629 05:33]:
> > On Mon, Jun 28, 2021 at 06:26:26PM +0300, Tony Lindgren wrote:
> > > * Mike Rapoport <rppt@linux.ibm.com> [210628 14:07]:
> > > > Can you please send log with 'memblock=debug' added to the command line?
> > > 
> > > Sure, log now available at:
> > > 
> > > http://muru.com/beagle-x15.txt
> > 
> > Hmm, no clues yet :(
> > 
> > Do you have CONFIG_DEBUG_VM, CONFIG_DEBUG_VM_PGFLAGS and
> > CONFIG_PAGE_POISONING enabled in your config?
> > If not, can you please enable them and see if any of VM_BUG_* triggers?
> 
> OK enabled, and no errors or warnings are triggered.
> 
> > Do you use FLATMEM or SPARSEMEM in your config?
> 
> Looks like make omap2plus_defconfig enables FLATMEM:
> 
> $ grep -e SPARSEMEM -e FLATMEM .config
> CONFIG_ARCH_FLATMEM_ENABLE=y
> CONFIG_ARCH_SPARSEMEM_ENABLE=y
> CONFIG_FLATMEM_MANUAL=y
> # CONFIG_SPARSEMEM_MANUAL is not set
> CONFIG_FLATMEM=y
> 
> > Let's try seeing what PFNs get false results from pfn_valid, maybe this
> > will give a better lead.
> 
> With your patch below, system boots with lots of the following:
> 
> [   13.058654] Freeing unused kernel image (initmem) memory: 1024K
> ...
> [   13.129211] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb6: is_map: 1 overlaps: 0
> [   13.137481] pfn_valid(__sync_icache_dcache+0x2c/0x138): pfn: fffb7: is_map: 1 overlaps: 0

...

> Then changing console loglevel to 0 boots system to login prompt. But I'm
> seeing some init processes segfaulting during start-up.

As it seems, the new version of pfn_valid() decides that last pages are not
valid because of the overflow in memblock_overlaps_region(). As the result,
__sync_icache_dcache() skips flushing these pages.

The patch below should fix this. I've left the prints for now, hopefully
they will not appear anymore. 

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6162a070a410..7ba22d23eca4 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -126,10 +126,16 @@ int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
 	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
+	bool overlaps = memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size - 1);
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
+	if (memblock_is_map_memory(addr) != overlaps)
+		pr_info("%s(%pS): pfn: %lx: is_map: %d overlaps: %d\n", __func__, (void *)_RET_IP_, pfn, memblock_is_map_memory(addr), overlaps);
+
 	/*
 	 * If address less than pageblock_size bytes away from a present
 	 * memory chunk there still will be a memory map entry for it
@@ -137,7 +143,7 @@ int pfn_valid(unsigned long pfn)
 	 */
 	if (memblock_overlaps_region(&memblock.memory,
 				     ALIGN_DOWN(addr, pageblock_size),
-				     pageblock_size))
+				     pageblock_size - 1))
 		return 1;
 
 	return 0;
Tony Lindgren June 29, 2021, 11:52 a.m. UTC | #7
* Mike Rapoport <rppt@linux.ibm.com> [210629 10:51]:
> As it seems, the new version of pfn_valid() decides that last pages are not
> valid because of the overflow in memblock_overlaps_region(). As the result,
> __sync_icache_dcache() skips flushing these pages.
> 
> The patch below should fix this. I've left the prints for now, hopefully
> they will not appear anymore. 

Yes this allows the system to boot for me :)

I'm still seeing these three prints though:

...
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (3994.41 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
pfn_valid(__pageblock_pfn_to_page+0x14/0xa8): pfn: afe00: is_map: 0 overlaps: 1
pfn_valid(__pageblock_pfn_to_page+0x28/0xa8): pfn: affff: is_map: 0 overlaps: 1
pfn_valid(__pageblock_pfn_to_page+0x38/0xa8): pfn: afe00: is_map: 0 overlaps: 1
devtmpfs: initialized
...

Regards,

Tony


> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 6162a070a410..7ba22d23eca4 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -126,10 +126,16 @@ int pfn_valid(unsigned long pfn)
>  {
>  	phys_addr_t addr = __pfn_to_phys(pfn);
>  	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> +	bool overlaps = memblock_overlaps_region(&memblock.memory,
> +				     ALIGN_DOWN(addr, pageblock_size),
> +				     pageblock_size - 1);
>  
>  	if (__phys_to_pfn(addr) != pfn)
>  		return 0;
>  
> +	if (memblock_is_map_memory(addr) != overlaps)
> +		pr_info("%s(%pS): pfn: %lx: is_map: %d overlaps: %d\n", __func__, (void *)_RET_IP_, pfn, memblock_is_map_memory(addr), overlaps);
> +
>  	/*
>  	 * If address less than pageblock_size bytes away from a present
>  	 * memory chunk there still will be a memory map entry for it
> @@ -137,7 +143,7 @@ int pfn_valid(unsigned long pfn)
>  	 */
>  	if (memblock_overlaps_region(&memblock.memory,
>  				     ALIGN_DOWN(addr, pageblock_size),
> -				     pageblock_size))
> +				     pageblock_size - 1))
>  		return 1;
>  
>  	return 0;
>  
> -- 
> Sincerely yours,
> Mike.
Mike Rapoport June 29, 2021, 12:50 p.m. UTC | #8
On Tue, Jun 29, 2021 at 02:52:39PM +0300, Tony Lindgren wrote:
> * Mike Rapoport <rppt@linux.ibm.com> [210629 10:51]:
> > As it seems, the new version of pfn_valid() decides that last pages are not
> > valid because of the overflow in memblock_overlaps_region(). As the result,
> > __sync_icache_dcache() skips flushing these pages.
> > 
> > The patch below should fix this. I've left the prints for now, hopefully
> > they will not appear anymore. 
> 
> Yes this allows the system to boot for me :)
> 
> I'm still seeing these three prints though:
> 
> ...
> smp: Brought up 1 node, 2 CPUs
> SMP: Total of 2 processors activated (3994.41 BogoMIPS).
> CPU: All CPU(s) started in SVC mode.
> pfn_valid(__pageblock_pfn_to_page+0x14/0xa8): pfn: afe00: is_map: 0 overlaps: 1
> pfn_valid(__pageblock_pfn_to_page+0x28/0xa8): pfn: affff: is_map: 0 overlaps: 1
> pfn_valid(__pageblock_pfn_to_page+0x38/0xa8): pfn: afe00: is_map: 0 overlaps: 1

These pfns do have memory map despite they are stolen in
arm_memblock_steal():

memblock_free: [0xaff00000-0xafffffff] arm_memblock_steal+0x50/0x70
memblock_remove: [0xaff00000-0xafffffff] arm_memblock_steal+0x5c/0x70
...
memblock_free: [0xafe00000-0xafefffff] arm_memblock_steal+0x50/0x70
memblock_remove: [0xafe00000-0xafefffff] arm_memblock_steal+0x5c/0x70

But the struct pages there are never initialized.

I'll resend v3 of the entire set with an addition patch to take care of
that as well.

> devtmpfs: initialized
> ...
> 
> Regards,
> 
> Tony
> 
> 
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 6162a070a410..7ba22d23eca4 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -126,10 +126,16 @@ int pfn_valid(unsigned long pfn)
> >  {
> >  	phys_addr_t addr = __pfn_to_phys(pfn);
> >  	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> > +	bool overlaps = memblock_overlaps_region(&memblock.memory,
> > +				     ALIGN_DOWN(addr, pageblock_size),
> > +				     pageblock_size - 1);
> >  
> >  	if (__phys_to_pfn(addr) != pfn)
> >  		return 0;
> >  
> > +	if (memblock_is_map_memory(addr) != overlaps)
> > +		pr_info("%s(%pS): pfn: %lx: is_map: %d overlaps: %d\n", __func__, (void *)_RET_IP_, pfn, memblock_is_map_memory(addr), overlaps);
> > +
> >  	/*
> >  	 * If address less than pageblock_size bytes away from a present
> >  	 * memory chunk there still will be a memory map entry for it
> > @@ -137,7 +143,7 @@ int pfn_valid(unsigned long pfn)
> >  	 */
> >  	if (memblock_overlaps_region(&memblock.memory,
> >  				     ALIGN_DOWN(addr, pageblock_size),
> > -				     pageblock_size))
> > +				     pageblock_size - 1))
> >  		return 1;
> >  
> >  	return 0;
> >  
> > -- 
> > Sincerely yours,
> > Mike.
Tony Lindgren June 29, 2021, 12:53 p.m. UTC | #9
* Mike Rapoport <rppt@linux.ibm.com> [210629 12:50]:
> On Tue, Jun 29, 2021 at 02:52:39PM +0300, Tony Lindgren wrote:
> > * Mike Rapoport <rppt@linux.ibm.com> [210629 10:51]:
> > > As it seems, the new version of pfn_valid() decides that last pages are not
> > > valid because of the overflow in memblock_overlaps_region(). As the result,
> > > __sync_icache_dcache() skips flushing these pages.
> > > 
> > > The patch below should fix this. I've left the prints for now, hopefully
> > > they will not appear anymore. 
> > 
> > Yes this allows the system to boot for me :)
> > 
> > I'm still seeing these three prints though:
> > 
> > ...
> > smp: Brought up 1 node, 2 CPUs
> > SMP: Total of 2 processors activated (3994.41 BogoMIPS).
> > CPU: All CPU(s) started in SVC mode.
> > pfn_valid(__pageblock_pfn_to_page+0x14/0xa8): pfn: afe00: is_map: 0 overlaps: 1
> > pfn_valid(__pageblock_pfn_to_page+0x28/0xa8): pfn: affff: is_map: 0 overlaps: 1
> > pfn_valid(__pageblock_pfn_to_page+0x38/0xa8): pfn: afe00: is_map: 0 overlaps: 1
> 
> These pfns do have memory map despite they are stolen in
> arm_memblock_steal():
> 
> memblock_free: [0xaff00000-0xafffffff] arm_memblock_steal+0x50/0x70
> memblock_remove: [0xaff00000-0xafffffff] arm_memblock_steal+0x5c/0x70
> ...
> memblock_free: [0xafe00000-0xafefffff] arm_memblock_steal+0x50/0x70
> memblock_remove: [0xafe00000-0xafefffff] arm_memblock_steal+0x5c/0x70
> 
> But the struct pages there are never initialized.
> 
> I'll resend v3 of the entire set with an addition patch to take care of
> that as well.

OK sounds good to me :)

Thanks,

Tony
Mike Rapoport June 30, 2021, 7:20 a.m. UTC | #10
On Tue, Jun 29, 2021 at 03:50:23PM +0300, Mike Rapoport wrote:
> On Tue, Jun 29, 2021 at 02:52:39PM +0300, Tony Lindgren wrote:
> > * Mike Rapoport <rppt@linux.ibm.com> [210629 10:51]:
> > > As it seems, the new version of pfn_valid() decides that last pages are not
> > > valid because of the overflow in memblock_overlaps_region(). As the result,
> > > __sync_icache_dcache() skips flushing these pages.
> > > 
> > > The patch below should fix this. I've left the prints for now, hopefully
> > > they will not appear anymore. 
> > 
> > Yes this allows the system to boot for me :)
> > 
> > I'm still seeing these three prints though:
> > 
> > ...
> > smp: Brought up 1 node, 2 CPUs
> > SMP: Total of 2 processors activated (3994.41 BogoMIPS).
> > CPU: All CPU(s) started in SVC mode.
> > pfn_valid(__pageblock_pfn_to_page+0x14/0xa8): pfn: afe00: is_map: 0 overlaps: 1
> > pfn_valid(__pageblock_pfn_to_page+0x28/0xa8): pfn: affff: is_map: 0 overlaps: 1
> > pfn_valid(__pageblock_pfn_to_page+0x38/0xa8): pfn: afe00: is_map: 0 overlaps: 1
> 
> These pfns do have memory map despite they are stolen in
> arm_memblock_steal():
> 
> memblock_free: [0xaff00000-0xafffffff] arm_memblock_steal+0x50/0x70
> memblock_remove: [0xaff00000-0xafffffff] arm_memblock_steal+0x5c/0x70
> ...
> memblock_free: [0xafe00000-0xafefffff] arm_memblock_steal+0x50/0x70
> memblock_remove: [0xafe00000-0xafefffff] arm_memblock_steal+0x5c/0x70
> 
> But the struct pages there are never initialized.

Actually, with FLATMEM these struct pages will be always set to 0 because
we don't do memory map poisoning with FLATMEM.

I could not find a case where zeroed struct page would cause real trouble,
so I'd say it is more theoretical issue and it can be addressed unrelated
to these changes.
diff mbox series

Patch

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 9d4744a632c6..6162a070a410 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -125,11 +125,22 @@  static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
+	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
-	return memblock_is_map_memory(addr);
+	/*
+	 * If address less than pageblock_size bytes away from a present
+	 * memory chunk there still will be a memory map entry for it
+	 * because we round freed memory map to the pageblock boundaries.
+	 */
+	if (memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size))
+		return 1;
+
+	return 0;
 }
 EXPORT_SYMBOL(pfn_valid);
 #endif