mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
diff mbox series

Message ID 20190225191710.48131-1-cai@lca.pw
State New
Headers show
Series
  • mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
Related show

Commit Message

Qian Cai Feb. 25, 2019, 7:17 p.m. UTC
When onlining memory pages, it calls kernel_unmap_linear_page(),
However, it does not call kernel_map_linear_page() while offlining
memory pages. As the result, it triggers a panic below while onlining on
ppc64le as it checks if the pages are mapped before unmapping,
Therefore, let it call kernel_map_linear_page() when setting all pages
as reserved.

kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
CPU: 2 PID: 4298 Comm: bash Not tainted 5.0.0-rc7+ #15
NIP:  c000000000062670 LR: c00000000006265c CTR: 0000000000000000
REGS: c0000005bf8a75b0 TRAP: 0700   Not tainted  (5.0.0-rc7+)
MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28422842  XER: 00000000
CFAR: c000000000804f44 IRQMASK: 1
GPR00: c00000000006265c c0000005bf8a7840 c000000001518200 c0000000013cbcc8
GPR04: 0000000000080004 0000000000000000 00000000ccc457e0 c0000005c4e341d8
GPR08: 0000000000000000 0000000000000001 c000000007f4f800 0000000000000001
GPR12: 0000000000002200 c000000007f4e100 0000000000000000 0000000139c29710
GPR16: 0000000139c29714 0000000139c29788 c0000000013cbcc8 0000000000000000
GPR20: 0000000000034000 c0000000016e05e8 0000000000000000 0000000000000001
GPR24: 0000000000bf50d9 800000000000018e 0000000000000000 c0000000016e04b8
GPR28: f000000000d00040 0000006420a2f217 f000000000d00000 00ea1b2170340000
NIP [c000000000062670] __kernel_map_pages+0x2e0/0x4f0
LR [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0
Call Trace:
[c0000005bf8a7840] [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0 (unreliable)
[c0000005bf8a78d0] [c00000000028c4a0] free_unref_page_prepare+0x2f0/0x4d0
[c0000005bf8a7930] [c000000000293144] free_unref_page+0x44/0x90
[c0000005bf8a7970] [c00000000037af24] __online_page_free+0x84/0x110
[c0000005bf8a79a0] [c00000000037b6e0] online_pages_range+0xc0/0x150
[c0000005bf8a7a00] [c00000000005aaa8] walk_system_ram_range+0xc8/0x120
[c0000005bf8a7a50] [c00000000037e710] online_pages+0x280/0x5a0
[c0000005bf8a7b40] [c0000000006419e4] memory_subsys_online+0x1b4/0x270
[c0000005bf8a7bb0] [c000000000616720] device_online+0xc0/0xf0
[c0000005bf8a7bf0] [c000000000642570] state_store+0xc0/0x180
[c0000005bf8a7c30] [c000000000610b2c] dev_attr_store+0x3c/0x60
[c0000005bf8a7c50] [c0000000004c0a50] sysfs_kf_write+0x70/0xb0
[c0000005bf8a7c90] [c0000000004bf40c] kernfs_fop_write+0x10c/0x250
[c0000005bf8a7ce0] [c0000000003e4b18] __vfs_write+0x48/0x240
[c0000005bf8a7d80] [c0000000003e4f68] vfs_write+0xd8/0x210
[c0000005bf8a7dd0] [c0000000003e52f0] ksys_write+0x70/0x120
[c0000005bf8a7e20] [c00000000000b000] system_call+0x5c/0x70
Instruction dump:
7fbd5278 7fbd4a78 3e42ffeb 7bbd0640 3a523ac8 7e439378 487a2881 60000000
e95505f0 7e6aa0ae 6a690080 7929c9c2 <0b090000> 7f4aa1ae 7e439378 487a28dd

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Souptick Joarder Feb. 26, 2019, 12:13 p.m. UTC | #1
On Tue, Feb 26, 2019 at 12:47 AM Qian Cai <cai@lca.pw> wrote:
>
> When onlining memory pages, it calls kernel_unmap_linear_page(),
> However, it does not call kernel_map_linear_page() while offlining
> memory pages. As the result, it triggers a panic below while onlining on
> ppc64le as it checks if the pages are mapped before unmapping,
> Therefore, let it call kernel_map_linear_page() when setting all pages
> as reserved.
>
> kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
> CPU: 2 PID: 4298 Comm: bash Not tainted 5.0.0-rc7+ #15
> NIP:  c000000000062670 LR: c00000000006265c CTR: 0000000000000000
> REGS: c0000005bf8a75b0 TRAP: 0700   Not tainted  (5.0.0-rc7+)
> MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28422842  XER: 00000000
> CFAR: c000000000804f44 IRQMASK: 1
> GPR00: c00000000006265c c0000005bf8a7840 c000000001518200 c0000000013cbcc8
> GPR04: 0000000000080004 0000000000000000 00000000ccc457e0 c0000005c4e341d8
> GPR08: 0000000000000000 0000000000000001 c000000007f4f800 0000000000000001
> GPR12: 0000000000002200 c000000007f4e100 0000000000000000 0000000139c29710
> GPR16: 0000000139c29714 0000000139c29788 c0000000013cbcc8 0000000000000000
> GPR20: 0000000000034000 c0000000016e05e8 0000000000000000 0000000000000001
> GPR24: 0000000000bf50d9 800000000000018e 0000000000000000 c0000000016e04b8
> GPR28: f000000000d00040 0000006420a2f217 f000000000d00000 00ea1b2170340000
> NIP [c000000000062670] __kernel_map_pages+0x2e0/0x4f0
> LR [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0
> Call Trace:
> [c0000005bf8a7840] [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0 (unreliable)
> [c0000005bf8a78d0] [c00000000028c4a0] free_unref_page_prepare+0x2f0/0x4d0
> [c0000005bf8a7930] [c000000000293144] free_unref_page+0x44/0x90
> [c0000005bf8a7970] [c00000000037af24] __online_page_free+0x84/0x110
> [c0000005bf8a79a0] [c00000000037b6e0] online_pages_range+0xc0/0x150
> [c0000005bf8a7a00] [c00000000005aaa8] walk_system_ram_range+0xc8/0x120
> [c0000005bf8a7a50] [c00000000037e710] online_pages+0x280/0x5a0
> [c0000005bf8a7b40] [c0000000006419e4] memory_subsys_online+0x1b4/0x270
> [c0000005bf8a7bb0] [c000000000616720] device_online+0xc0/0xf0
> [c0000005bf8a7bf0] [c000000000642570] state_store+0xc0/0x180
> [c0000005bf8a7c30] [c000000000610b2c] dev_attr_store+0x3c/0x60
> [c0000005bf8a7c50] [c0000000004c0a50] sysfs_kf_write+0x70/0xb0
> [c0000005bf8a7c90] [c0000000004bf40c] kernfs_fop_write+0x10c/0x250
> [c0000005bf8a7ce0] [c0000000003e4b18] __vfs_write+0x48/0x240
> [c0000005bf8a7d80] [c0000000003e4f68] vfs_write+0xd8/0x210
> [c0000005bf8a7dd0] [c0000000003e52f0] ksys_write+0x70/0x120
> [c0000005bf8a7e20] [c00000000000b000] system_call+0x5c/0x70
> Instruction dump:
> 7fbd5278 7fbd4a78 3e42ffeb 7bbd0640 3a523ac8 7e439378 487a2881 60000000
> e95505f0 7e6aa0ae 6a690080 7929c9c2 <0b090000> 7f4aa1ae 7e439378 487a28dd
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  mm/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 10d0f2ed9f69..025fc93d1518 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8349,6 +8349,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>                 for (i = 0; i < (1 << order); i++)
>                         SetPageReserved((page+i));
>                 pfn += (1 << order);
> +               kernel_map_pages(page, 1 << order, 1);

Doubt , Not sure, but does this change will have any impact on
drivers/base/memory.c#L249
memory_block_action() ->  offline_pages() ?

>         }
>         spin_unlock_irqrestore(&zone->lock, flags);
>  }
> --
> 2.17.2 (Apple Git-113)
>
Michal Hocko Feb. 26, 2019, 12:35 p.m. UTC | #2
On Mon 25-02-19 14:17:10, Qian Cai wrote:
> When onlining memory pages, it calls kernel_unmap_linear_page(),
> However, it does not call kernel_map_linear_page() while offlining
> memory pages. As the result, it triggers a panic below while onlining on
> ppc64le as it checks if the pages are mapped before unmapping,
> Therefore, let it call kernel_map_linear_page() when setting all pages
> as reserved.

This really begs for much more explanation. All the pages should be
unmapped as they get freed AFAIR. So why do we need a special handing
here when this path only offlines free pages?
Qian Cai Feb. 26, 2019, 2:07 p.m. UTC | #3
On 2/26/19 7:13 AM, Souptick Joarder wrote:
> On Tue, Feb 26, 2019 at 12:47 AM Qian Cai <cai@lca.pw> wrote:
>>
>> When onlining memory pages, it calls kernel_unmap_linear_page(),
>> However, it does not call kernel_map_linear_page() while offlining
>> memory pages. As the result, it triggers a panic below while onlining on
>> ppc64le as it checks if the pages are mapped before unmapping,
>> Therefore, let it call kernel_map_linear_page() when setting all pages
>> as reserved.
>>
>> kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> LE SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
>> CPU: 2 PID: 4298 Comm: bash Not tainted 5.0.0-rc7+ #15
>> NIP:  c000000000062670 LR: c00000000006265c CTR: 0000000000000000
>> REGS: c0000005bf8a75b0 TRAP: 0700   Not tainted  (5.0.0-rc7+)
>> MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28422842  XER: 00000000
>> CFAR: c000000000804f44 IRQMASK: 1
>> GPR00: c00000000006265c c0000005bf8a7840 c000000001518200 c0000000013cbcc8
>> GPR04: 0000000000080004 0000000000000000 00000000ccc457e0 c0000005c4e341d8
>> GPR08: 0000000000000000 0000000000000001 c000000007f4f800 0000000000000001
>> GPR12: 0000000000002200 c000000007f4e100 0000000000000000 0000000139c29710
>> GPR16: 0000000139c29714 0000000139c29788 c0000000013cbcc8 0000000000000000
>> GPR20: 0000000000034000 c0000000016e05e8 0000000000000000 0000000000000001
>> GPR24: 0000000000bf50d9 800000000000018e 0000000000000000 c0000000016e04b8
>> GPR28: f000000000d00040 0000006420a2f217 f000000000d00000 00ea1b2170340000
>> NIP [c000000000062670] __kernel_map_pages+0x2e0/0x4f0
>> LR [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0
>> Call Trace:
>> [c0000005bf8a7840] [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0 (unreliable)
>> [c0000005bf8a78d0] [c00000000028c4a0] free_unref_page_prepare+0x2f0/0x4d0
>> [c0000005bf8a7930] [c000000000293144] free_unref_page+0x44/0x90
>> [c0000005bf8a7970] [c00000000037af24] __online_page_free+0x84/0x110
>> [c0000005bf8a79a0] [c00000000037b6e0] online_pages_range+0xc0/0x150
>> [c0000005bf8a7a00] [c00000000005aaa8] walk_system_ram_range+0xc8/0x120
>> [c0000005bf8a7a50] [c00000000037e710] online_pages+0x280/0x5a0
>> [c0000005bf8a7b40] [c0000000006419e4] memory_subsys_online+0x1b4/0x270
>> [c0000005bf8a7bb0] [c000000000616720] device_online+0xc0/0xf0
>> [c0000005bf8a7bf0] [c000000000642570] state_store+0xc0/0x180
>> [c0000005bf8a7c30] [c000000000610b2c] dev_attr_store+0x3c/0x60
>> [c0000005bf8a7c50] [c0000000004c0a50] sysfs_kf_write+0x70/0xb0
>> [c0000005bf8a7c90] [c0000000004bf40c] kernfs_fop_write+0x10c/0x250
>> [c0000005bf8a7ce0] [c0000000003e4b18] __vfs_write+0x48/0x240
>> [c0000005bf8a7d80] [c0000000003e4f68] vfs_write+0xd8/0x210
>> [c0000005bf8a7dd0] [c0000000003e52f0] ksys_write+0x70/0x120
>> [c0000005bf8a7e20] [c00000000000b000] system_call+0x5c/0x70
>> Instruction dump:
>> 7fbd5278 7fbd4a78 3e42ffeb 7bbd0640 3a523ac8 7e439378 487a2881 60000000
>> e95505f0 7e6aa0ae 6a690080 7929c9c2 <0b090000> 7f4aa1ae 7e439378 487a28dd
>>
>> Signed-off-by: Qian Cai <cai@lca.pw>
>> ---
>>  mm/page_alloc.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 10d0f2ed9f69..025fc93d1518 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8349,6 +8349,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>>                 for (i = 0; i < (1 << order); i++)
>>                         SetPageReserved((page+i));
>>                 pfn += (1 << order);
>> +               kernel_map_pages(page, 1 << order, 1);
> 
> Doubt , Not sure, but does this change will have any impact on
> drivers/base/memory.c#L249
> memory_block_action() ->  offline_pages() ?

Yes, it does.
Qian Cai Feb. 26, 2019, 2:16 p.m. UTC | #4
On 2/26/19 7:35 AM, Michal Hocko wrote:
> On Mon 25-02-19 14:17:10, Qian Cai wrote:
>> When onlining memory pages, it calls kernel_unmap_linear_page(),
>> However, it does not call kernel_map_linear_page() while offlining
>> memory pages. As the result, it triggers a panic below while onlining on
>> ppc64le as it checks if the pages are mapped before unmapping,
>> Therefore, let it call kernel_map_linear_page() when setting all pages
>> as reserved.
> 
> This really begs for much more explanation. All the pages should be
> unmapped as they get freed AFAIR. So why do we need a special handing
> here when this path only offlines free pages?
> 

It sounds like this is exact the point to explain the imbalance. When offlining,
every page has already been unmapped and marked reserved. When onlining, it
tries to free those reserved pages via __online_page_free(). Since those pages
are order 0, it goes free_unref_page() which in-turn call
kernel_unmap_linear_page() again without been mapped first.
Michal Hocko Feb. 26, 2019, 2:23 p.m. UTC | #5
On Tue 26-02-19 09:16:30, Qian Cai wrote:
> 
> 
> On 2/26/19 7:35 AM, Michal Hocko wrote:
> > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> >> When onlining memory pages, it calls kernel_unmap_linear_page(),
> >> However, it does not call kernel_map_linear_page() while offlining
> >> memory pages. As the result, it triggers a panic below while onlining on
> >> ppc64le as it checks if the pages are mapped before unmapping,
> >> Therefore, let it call kernel_map_linear_page() when setting all pages
> >> as reserved.
> > 
> > This really begs for much more explanation. All the pages should be
> > unmapped as they get freed AFAIR. So why do we need a special handing
> > here when this path only offlines free pages?
> > 
> 
> It sounds like this is exact the point to explain the imbalance. When offlining,
> every page has already been unmapped and marked reserved. When onlining, it
> tries to free those reserved pages via __online_page_free(). Since those pages
> are order 0, it goes free_unref_page() which in-turn call
> kernel_unmap_linear_page() again without been mapped first.

How is this any different from an initial page being freed to the
allocator during the boot?
Qian Cai Feb. 26, 2019, 5:53 p.m. UTC | #6
On Tue, 2019-02-26 at 15:23 +0100, Michal Hocko wrote:
> On Tue 26-02-19 09:16:30, Qian Cai wrote:
> > 
> > 
> > On 2/26/19 7:35 AM, Michal Hocko wrote:
> > > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> > > > When onlining memory pages, it calls kernel_unmap_linear_page(),
> > > > However, it does not call kernel_map_linear_page() while offlining
> > > > memory pages. As the result, it triggers a panic below while onlining on
> > > > ppc64le as it checks if the pages are mapped before unmapping,
> > > > Therefore, let it call kernel_map_linear_page() when setting all pages
> > > > as reserved.
> > > 
> > > This really begs for much more explanation. All the pages should be
> > > unmapped as they get freed AFAIR. So why do we need a special handing
> > > here when this path only offlines free pages?
> > > 
> > 
> > It sounds like this is exact the point to explain the imbalance. When
> > offlining,
> > every page has already been unmapped and marked reserved. When onlining, it
> > tries to free those reserved pages via __online_page_free(). Since those
> > pages
> > are order 0, it goes free_unref_page() which in-turn call
> > kernel_unmap_linear_page() again without been mapped first.
> 
> How is this any different from an initial page being freed to the
> allocator during the boot?
> 

As least for IBM POWER8, it does this during the boot,

early_setup
  early_init_mmu
    harsh__early_init_mmu
      htab_initialize [1]
        htab_bolt_mapping [2]

where it effectively map all memblock regions just like
kernel_map_linear_page(), so later mem_init() -> memblock_free_all() will unmap
them just fine.

[1]
for_each_memblock(memory, reg) {
	base = (unsigned long)__va(reg->base);
	size = reg->size;

	DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
		base, size, prot);

	BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
		prot, mmu_linear_psize, mmu_kernel_ssize));
	}

[2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
Michal Hocko Feb. 26, 2019, 6:16 p.m. UTC | #7
On Tue 26-02-19 12:53:05, Qian Cai wrote:
> On Tue, 2019-02-26 at 15:23 +0100, Michal Hocko wrote:
> > On Tue 26-02-19 09:16:30, Qian Cai wrote:
> > > 
> > > 
> > > On 2/26/19 7:35 AM, Michal Hocko wrote:
> > > > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> > > > > When onlining memory pages, it calls kernel_unmap_linear_page(),
> > > > > However, it does not call kernel_map_linear_page() while offlining
> > > > > memory pages. As the result, it triggers a panic below while onlining on
> > > > > ppc64le as it checks if the pages are mapped before unmapping,
> > > > > Therefore, let it call kernel_map_linear_page() when setting all pages
> > > > > as reserved.
> > > > 
> > > > This really begs for much more explanation. All the pages should be
> > > > unmapped as they get freed AFAIR. So why do we need a special handing
> > > > here when this path only offlines free pages?
> > > > 
> > > 
> > > It sounds like this is exact the point to explain the imbalance. When
> > > offlining,
> > > every page has already been unmapped and marked reserved. When onlining, it
> > > tries to free those reserved pages via __online_page_free(). Since those
> > > pages
> > > are order 0, it goes free_unref_page() which in-turn call
> > > kernel_unmap_linear_page() again without been mapped first.
> > 
> > How is this any different from an initial page being freed to the
> > allocator during the boot?
> > 
> 
> As least for IBM POWER8, it does this during the boot,
> 
> early_setup
>   early_init_mmu
>     harsh__early_init_mmu
>       htab_initialize [1]
>         htab_bolt_mapping [2]
> 
> where it effectively map all memblock regions just like
> kernel_map_linear_page(), so later mem_init() -> memblock_free_all() will unmap
> them just fine.
> 
> [1]
> for_each_memblock(memory, reg) {
> 	base = (unsigned long)__va(reg->base);
> 	size = reg->size;
> 
> 	DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
> 		base, size, prot);
> 
> 	BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
> 		prot, mmu_linear_psize, mmu_kernel_ssize));
> 	}
> 
> [2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;

Thanks for the clarification. I would have expected that there is a
generic path to do kernel_map_pages from an appropriate place. I am also
wondering whether blowing up is actually the right thing to do. Is the
ppc specific code correct? Isn't your patch simply working around a
bogus condition?
Michal Hocko Feb. 26, 2019, 6:20 p.m. UTC | #8
On Tue 26-02-19 19:16:48, Michal Hocko wrote:
> On Tue 26-02-19 12:53:05, Qian Cai wrote:
> > On Tue, 2019-02-26 at 15:23 +0100, Michal Hocko wrote:
> > > On Tue 26-02-19 09:16:30, Qian Cai wrote:
> > > > 
> > > > 
> > > > On 2/26/19 7:35 AM, Michal Hocko wrote:
> > > > > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> > > > > > When onlining memory pages, it calls kernel_unmap_linear_page(),
> > > > > > However, it does not call kernel_map_linear_page() while offlining
> > > > > > memory pages. As the result, it triggers a panic below while onlining on
> > > > > > ppc64le as it checks if the pages are mapped before unmapping,
> > > > > > Therefore, let it call kernel_map_linear_page() when setting all pages
> > > > > > as reserved.
> > > > > 
> > > > > This really begs for much more explanation. All the pages should be
> > > > > unmapped as they get freed AFAIR. So why do we need a special handing
> > > > > here when this path only offlines free pages?
> > > > > 
> > > > 
> > > > It sounds like this is exact the point to explain the imbalance. When
> > > > offlining,
> > > > every page has already been unmapped and marked reserved. When onlining, it
> > > > tries to free those reserved pages via __online_page_free(). Since those
> > > > pages
> > > > are order 0, it goes free_unref_page() which in-turn call
> > > > kernel_unmap_linear_page() again without been mapped first.
> > > 
> > > How is this any different from an initial page being freed to the
> > > allocator during the boot?
> > > 
> > 
> > As least for IBM POWER8, it does this during the boot,
> > 
> > early_setup
> >   early_init_mmu
> >     harsh__early_init_mmu
> >       htab_initialize [1]
> >         htab_bolt_mapping [2]
> > 
> > where it effectively map all memblock regions just like
> > kernel_map_linear_page(), so later mem_init() -> memblock_free_all() will unmap
> > them just fine.
> > 
> > [1]
> > for_each_memblock(memory, reg) {
> > 	base = (unsigned long)__va(reg->base);
> > 	size = reg->size;
> > 
> > 	DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
> > 		base, size, prot);
> > 
> > 	BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
> > 		prot, mmu_linear_psize, mmu_kernel_ssize));
> > 	}
> > 
> > [2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
> 
> Thanks for the clarification. I would have expected that there is a
> generic path to do kernel_map_pages from an appropriate place. I am also
> wondering whether blowing up is actually the right thing to do. Is the
> ppc specific code correct? Isn't your patch simply working around a
> bogus condition?

Btw. what happens if the offlined pfn range is removed completely? Is
the range still mapped? What kind of consequences does this have?
Also when does this tweak happens on a completely new hotplugged memory
range?
Qian Cai Feb. 26, 2019, 7:19 p.m. UTC | #9
On Tue, 2019-02-26 at 19:20 +0100, Michal Hocko wrote:
> Btw. what happens if the offlined pfn range is removed completely? Is
> the range still mapped? What kind of consequences does this have?

Well, the pages are still marked as reserved as well, so it is up to the
physically memory hotplug handler to free kernel direct mapping pagetable,
virtual memory mapping pages, and virtual memory mapping pagetable as by design,
although I have no way to test it.

> Also when does this tweak happens on a completely new hotplugged memory
> range?

I suppose it will call online_pages() which in-turn call
kernel_unmap_linear_page() which may or may not have the same issue, but I have
no way to test that path.
Michal Hocko Feb. 26, 2019, 7:40 p.m. UTC | #10
On Tue 26-02-19 14:19:42, Qian Cai wrote:
> On Tue, 2019-02-26 at 19:20 +0100, Michal Hocko wrote:
> > Btw. what happens if the offlined pfn range is removed completely? Is
> > the range still mapped? What kind of consequences does this have?
> 
> Well, the pages are still marked as reserved as well, so it is up to the
> physically memory hotplug handler to free kernel direct mapping pagetable,
> virtual memory mapping pages, and virtual memory mapping pagetable as by design,
> although I have no way to test it.
> 
> > Also when does this tweak happens on a completely new hotplugged memory
> > range?
> 
> I suppose it will call online_pages() which in-turn call
> kernel_unmap_linear_page() which may or may not have the same issue, but I have
> no way to test that path.

It seems you have missed the point of my question. It simply doesn't
make much sense to have offline memory mapped. That memory is not
accessible in general. So mapping it at the offline time is dubious at
best. Also you do not get through the offlining phase on a newly
hotplugged (and not yet onlined) memory. So the patch doesn't look
correct to me and it all smells like the bug you are seeing is a wrong
reporting.

I might be wrong here of course, because I didn't really get to study
the code very deeply. But then the changelog needs to elaborate much
more than, it bugs on so we make it to not bug.
Qian Cai Feb. 26, 2019, 8:10 p.m. UTC | #11
On Tue, 2019-02-26 at 20:40 +0100, Michal Hocko wrote:
> It seems you have missed the point of my question. It simply doesn't
> make much sense to have offline memory mapped. That memory is not
> accessible in general. So mapping it at the offline time is dubious at
> best. 

Well, kernel_map_pages() is like other debug features which could look
"unusual".

> Also you do not get through the offlining phase on a newly
> hotplugged (and not yet onlined) memory. So the patch doesn't look
> correct to me and it all smells like the bug you are seeing is a wrong
> reporting.
> 

That (physical memory hotadd) is a special case like during the boot. The patch
is strictly to deal with offline/online memory, i.e., logical/soft memory
hotplug.
Michal Hocko Feb. 26, 2019, 8:40 p.m. UTC | #12
On Tue 26-02-19 15:10:39, Qian Cai wrote:
> On Tue, 2019-02-26 at 20:40 +0100, Michal Hocko wrote:
> > It seems you have missed the point of my question. It simply doesn't
> > make much sense to have offline memory mapped. That memory is not
> > accessible in general. So mapping it at the offline time is dubious at
> > best. 
> 
> Well, kernel_map_pages() is like other debug features which could look
> "unusual".
> 
> > Also you do not get through the offlining phase on a newly
> > hotplugged (and not yet onlined) memory. So the patch doesn't look
> > correct to me and it all smells like the bug you are seeing is a wrong
> > reporting.
> > 
> 
> That (physical memory hotadd) is a special case like during the boot. The patch
> is strictly to deal with offline/online memory, i.e., logical/soft memory
> hotplug.

And it doesn't handle it properly AFAICS. You want to get an exception
when accessing an offline memory, don't you? Offline, free or not present
memory is basically the same case - nobody should be touching that
memory.

Patch
diff mbox series

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 10d0f2ed9f69..025fc93d1518 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8349,6 +8349,7 @@  __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		for (i = 0; i < (1 << order); i++)
 			SetPageReserved((page+i));
 		pfn += (1 << order);
+		kernel_map_pages(page, 1 << order, 1);
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 }