diff mbox series

[7/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

Message ID 20200209104826.3385-8-bhe@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm/hotplug: Only use subsection in VMEMMAP case and fix hot add/remove failure in SPARSEMEM|!VMEMMAP case | expand

Commit Message

Baoquan He Feb. 9, 2020, 10:48 a.m. UTC
In section_deactivate(), pfn_to_page() doesn't work any more after
ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
It caused hot remove failure, the trace is:

kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
 __remove_pages+0x99/0xc0
 arch_remove_memory+0x23/0x4d
 try_remove_memory+0xc8/0x130
 ? walk_memory_blocks+0x72/0xa0
 __remove_memory+0xa/0x11
 acpi_memory_device_remove+0x72/0x100
 acpi_bus_trim+0x55/0x90
 acpi_device_hotplug+0x2eb/0x3d0
 acpi_hotplug_work_fn+0x1a/0x30
 process_one_work+0x1a7/0x370
 worker_thread+0x30/0x380
 ? flush_rcu_work+0x30/0x30
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40

Let's defer the ->section_mem_map resetting after depopulate_section_memmap()
to fix it.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/sparse.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Wei Yang Feb. 9, 2020, 11:52 p.m. UTC | #1
On Sun, Feb 09, 2020 at 06:48:26PM +0800, Baoquan He wrote:
>In section_deactivate(), pfn_to_page() doesn't work any more after
>ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
>It caused hot remove failure, the trace is:
>
>kernel BUG at mm/page_alloc.c:4806!
>invalid opcode: 0000 [#1] SMP PTI
>CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G        W         5.5.0-next-20200205+ #340
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>RIP: 0010:free_pages+0x85/0xa0
>Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
>Let's defer the ->section_mem_map resetting after depopulate_section_memmap()
>to fix it.
>
>Signed-off-by: Baoquan He <bhe@redhat.com>
>---
> mm/sparse.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/mm/sparse.c b/mm/sparse.c
>index 623755e88255..345d065ef6ce 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> 			ms->usage = NULL;
> 		}
> 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>-		ms->section_mem_map = (unsigned long)NULL;
> 	}
> 
> 	if (section_is_early && memmap)
> 		free_map_bootmem(memmap);
> 	else
> 		depopulate_section_memmap(pfn, nr_pages, altmap);

The crash happens in depopulate_section_memmap() when trying to get memmap by
pfn_to_page(). Can we pass memmap directly?

>+
>+	if(!rc)
>+		ms->section_mem_map = (unsigned long)NULL;
> }
> 
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>-- 
>2.17.2
Baoquan He Feb. 10, 2020, 3:41 a.m. UTC | #2
On 02/10/20 at 07:52am, Wei Yang wrote:
> >---
> > mm/sparse.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/mm/sparse.c b/mm/sparse.c
> >index 623755e88255..345d065ef6ce 100644
> >--- a/mm/sparse.c
> >+++ b/mm/sparse.c
> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > 			ms->usage = NULL;
> > 		}
> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> >-		ms->section_mem_map = (unsigned long)NULL;
> > 	}
> > 
> > 	if (section_is_early && memmap)
> > 		free_map_bootmem(memmap);
> > 	else
> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
> 
> The crash happens in depopulate_section_memmap() when trying to get memmap by
> pfn_to_page(). Can we pass memmap directly?

Yes, that's also a good idea. While it needs to add a parameter for
depopulate_section_memmap(), the parameter is useless for VMEMMAP
though, I personally prefer the current fix which is a little simpler.

Anyway, both is fine to me, I can update if you think passing memmap is
better.

> 
> >+
> >+	if(!rc)
> >+		ms->section_mem_map = (unsigned long)NULL;
> > }
> > 
> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
> >-- 
> >2.17.2
> 
> -- 
> Wei Yang
> Help you, Help me
>
Wei Yang Feb. 10, 2020, 6:08 a.m. UTC | #3
On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
>On 02/10/20 at 07:52am, Wei Yang wrote:
>> >---
>> > mm/sparse.c | 4 +++-
>> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/mm/sparse.c b/mm/sparse.c
>> >index 623755e88255..345d065ef6ce 100644
>> >--- a/mm/sparse.c
>> >+++ b/mm/sparse.c
>> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>> > 			ms->usage = NULL;
>> > 		}
>> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> >-		ms->section_mem_map = (unsigned long)NULL;
>> > 	}
>> > 
>> > 	if (section_is_early && memmap)
>> > 		free_map_bootmem(memmap);
>> > 	else
>> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
>> 
>> The crash happens in depopulate_section_memmap() when trying to get memmap by
>> pfn_to_page(). Can we pass memmap directly?
>
>Yes, that's also a good idea. While it needs to add a parameter for
>depopulate_section_memmap(), the parameter is useless for VMEMMAP
>though, I personally prefer the current fix which is a little simpler.
>

Not a new parameter, but replace pfn with memmap.

Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
start and finally pass to vmemmap_free().

>Anyway, both is fine to me, I can update if you think passing memmap is
>better.
>
>> 
>> >+
>> >+	if(!rc)
>> >+		ms->section_mem_map = (unsigned long)NULL;
>> > }
>> > 
>> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
>> >-- 
>> >2.17.2
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>>
Baoquan He Feb. 10, 2020, 7:54 a.m. UTC | #4
On 02/10/20 at 02:08pm, Wei Yang wrote:
> On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
> >On 02/10/20 at 07:52am, Wei Yang wrote:
> >> >---
> >> > mm/sparse.c | 4 +++-
> >> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >> >
> >> >diff --git a/mm/sparse.c b/mm/sparse.c
> >> >index 623755e88255..345d065ef6ce 100644
> >> >--- a/mm/sparse.c
> >> >+++ b/mm/sparse.c
> >> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> >> > 			ms->usage = NULL;
> >> > 		}
> >> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> >> >-		ms->section_mem_map = (unsigned long)NULL;
> >> > 	}
> >> > 
> >> > 	if (section_is_early && memmap)
> >> > 		free_map_bootmem(memmap);
> >> > 	else
> >> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
> >> 
> >> The crash happens in depopulate_section_memmap() when trying to get memmap by
> >> pfn_to_page(). Can we pass memmap directly?
> >
> >Yes, that's also a good idea. While it needs to add a parameter for
> >depopulate_section_memmap(), the parameter is useless for VMEMMAP
> >though, I personally prefer the current fix which is a little simpler.
> >
> 
> Not a new parameter, but replace pfn with memmap.
> 
> Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
> start and finally pass to vmemmap_free().

In section_deactivate(), per the code comments from Dan, we can know
that:

	/*
	 * section which only contains bootmem will be handled by
	 * free_map_bootmem(), including a complete section, or partial
	 * section which only has memory starting from the begining.
	 */
        if (section_is_early && memmap)
                free_map_bootmem(memmap);                                                                                                         
        else
	/* 
	 * section which contains region mixing bootmem with hot added
	 * sub-section region, only sub-section region, complete
	 * section. And in the mxied case, if hot remove the hot added
	 * sub-section aligned part, no memmap is got in the current
	 * code. So we still need pfn to calculate it for vmemmap case.
	 * To me, whenever we need, it looks better that we always use
	 * pfn to get its own memmap. 
	 */
                depopulate_section_memmap(pfn, nr_pages, altmap);

This is why I would like to keep the current logic as is,only one line
of code adjusting can fix the issue. Please let me know if I miss
anything.


> 
> >Anyway, both is fine to me, I can update if you think passing memmap is
> >better.
> >
> >> 
> >> >+
> >> >+	if(!rc)
> >> >+		ms->section_mem_map = (unsigned long)NULL;
> >> > }
> >> > 
> >> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
> >> >-- 
> >> >2.17.2
> >> 
> >> -- 
> >> Wei Yang
> >> Help you, Help me
> >> 
> 
> -- 
> Wei Yang
> Help you, Help me
>
Wei Yang Feb. 10, 2020, 11:05 p.m. UTC | #5
On Mon, Feb 10, 2020 at 03:54:06PM +0800, Baoquan He wrote:
>On 02/10/20 at 02:08pm, Wei Yang wrote:
>> On Mon, Feb 10, 2020 at 11:41:05AM +0800, Baoquan He wrote:
>> >On 02/10/20 at 07:52am, Wei Yang wrote:
>> >> >---
>> >> > mm/sparse.c | 4 +++-
>> >> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >> >
>> >> >diff --git a/mm/sparse.c b/mm/sparse.c
>> >> >index 623755e88255..345d065ef6ce 100644
>> >> >--- a/mm/sparse.c
>> >> >+++ b/mm/sparse.c
>> >> >@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>> >> > 			ms->usage = NULL;
>> >> > 		}
>> >> > 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> >> >-		ms->section_mem_map = (unsigned long)NULL;
>> >> > 	}
>> >> > 
>> >> > 	if (section_is_early && memmap)
>> >> > 		free_map_bootmem(memmap);
>> >> > 	else
>> >> > 		depopulate_section_memmap(pfn, nr_pages, altmap);
>> >> 
>> >> The crash happens in depopulate_section_memmap() when trying to get memmap by
>> >> pfn_to_page(). Can we pass memmap directly?
>> >
>> >Yes, that's also a good idea. While it needs to add a parameter for
>> >depopulate_section_memmap(), the parameter is useless for VMEMMAP
>> >though, I personally prefer the current fix which is a little simpler.
>> >
>> 
>> Not a new parameter, but replace pfn with memmap.
>> 
>> Not sure why the parameter is useless for VMEMMAP? memmap will be assigned to
>> start and finally pass to vmemmap_free().
>
>In section_deactivate(), per the code comments from Dan, we can know
>that:
>
>	/*
>	 * section which only contains bootmem will be handled by
>	 * free_map_bootmem(), including a complete section, or partial
>	 * section which only has memory starting from the begining.
>	 */
>        if (section_is_early && memmap)
>                free_map_bootmem(memmap);                                                                                                         
>        else
>	/* 
>	 * section which contains region mixing bootmem with hot added
>	 * sub-section region, only sub-section region, complete
>	 * section. And in the mxied case, if hot remove the hot added
>	 * sub-section aligned part, no memmap is got in the current
>	 * code. So we still need pfn to calculate it for vmemmap case.
>	 * To me, whenever we need, it looks better that we always use
>	 * pfn to get its own memmap. 
>	 */
>                depopulate_section_memmap(pfn, nr_pages, altmap);
>
>This is why I would like to keep the current logic as is,only one line
>of code adjusting can fix the issue. Please let me know if I miss
>anything.
>

You are right. I missed this point.

>
>> 
>> >Anyway, both is fine to me, I can update if you think passing memmap is
>> >better.
>> >
>> >> 
>> >> >+
>> >> >+	if(!rc)
>> >> >+		ms->section_mem_map = (unsigned long)NULL;
>> >> > }
>> >> > 
>> >> > static struct page * __meminit section_activate(int nid, unsigned long pfn,
>> >> >-- 
>> >> >2.17.2
>> >> 
>> >> -- 
>> >> Wei Yang
>> >> Help you, Help me
>> >> 
>> 
>> -- 
>> Wei Yang
>> Help you, Help me
>>
diff mbox series

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index 623755e88255..345d065ef6ce 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -854,13 +854,15 @@  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-		ms->section_mem_map = (unsigned long)NULL;
 	}
 
 	if (section_is_early && memmap)
 		free_map_bootmem(memmap);
 	else
 		depopulate_section_memmap(pfn, nr_pages, altmap);
+
+	if(!rc)
+		ms->section_mem_map = (unsigned long)NULL;
 }
 
 static struct page * __meminit section_activate(int nid, unsigned long pfn,