diff mbox series

mm/sparse: reset section's mem_map when fully deactivated

Message ID 1579143668-27941-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm/sparse: reset section's mem_map when fully deactivated | expand

Commit Message

Pingfan Liu Jan. 16, 2020, 3:01 a.m. UTC
When fully deactivated, it is meaningless to keep the value of a section's
mem_map. And its mem_map will be reassigned during re-added.

Beside this, it breaks the user space tool "makedumpfile", which makes
assumption that a hot-removed section having mem_map as NULL.

The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
trigger a crash, and save vmcore by makedumpfile

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: kexec@lists.infradead.org
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Qian Cai Jan. 16, 2020, 3:18 a.m. UTC | #1
> On Jan 15, 2020, at 10:01 PM, Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> When fully deactivated, it is meaningless to keep the value of a section's
> mem_map. And its mem_map will be reassigned during re-added.
> 
> Beside this, it breaks the user space tool "makedumpfile", which makes
> assumption that a hot-removed section having mem_map as NULL.

Well, tools like makedumpfile and the crash utility always has to copy with
low kernel implementation details changes like this. Why is it different this time?

> 
> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> trigger a crash, and save vmcore by makedumpfile
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> To: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kexec@lists.infradead.org
> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> ---
> mm/sparse.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 3822ecb..fddac80 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -789,7 +789,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> 			ms->usage = NULL;
> 		}
> 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
> +		ms->section_mem_map = NULL;
> 	}
> 
> 	if (section_is_early && memmap)
> -- 
> 2.7.5
> 
>
Pingfan Liu Jan. 16, 2020, 3:34 a.m. UTC | #2
On Thu, Jan 16, 2020 at 11:18 AM Qian Cai <cai@lca.pw> wrote:
>
>
>
> > On Jan 15, 2020, at 10:01 PM, Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > When fully deactivated, it is meaningless to keep the value of a section's
> > mem_map. And its mem_map will be reassigned during re-added.
> >
> > Beside this, it breaks the user space tool "makedumpfile", which makes
> > assumption that a hot-removed section having mem_map as NULL.
>
> Well, tools like makedumpfile and the crash utility always has to copy with

                            ^^^ cope ?
> low kernel implementation details changes like this. Why is it different this time?
Yeah, there are two direction. But as the first argument, it is
meaningless to keep the value.

Thanks,
Pingfan
Michal Hocko Jan. 16, 2020, 7:50 a.m. UTC | #3
On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> When fully deactivated, it is meaningless to keep the value of a section's
> mem_map. And its mem_map will be reassigned during re-added.
> 
> Beside this, it breaks the user space tool "makedumpfile", which makes
> assumption that a hot-removed section having mem_map as NULL.

We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
sub-section hotplug"). Dan was this an intentional change?
 
> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> trigger a crash, and save vmcore by makedumpfile
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> To: linux-mm@kvack.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: kexec@lists.infradead.org
> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 3822ecb..fddac80 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -789,7 +789,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>  			ms->usage = NULL;
>  		}
>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> -		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
> +		ms->section_mem_map = NULL;
>  	}
>  
>  	if (section_is_early && memmap)
> -- 
> 2.7.5
David Hildenbrand Jan. 16, 2020, 8:06 a.m. UTC | #4
On 16.01.20 04:01, Pingfan Liu wrote:
> When fully deactivated, it is meaningless to keep the value of a section's
> mem_map. And its mem_map will be reassigned during re-added.
> 
> Beside this, it breaks the user space tool "makedumpfile", which makes
> assumption that a hot-removed section having mem_map as NULL.
> 
> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> trigger a crash, and save vmcore by makedumpfile

Are you using an up-to-date makedumfile and did kdump.service properly
get reloaded on the udev events? I remember that this works.

makedumpfile will not dump memory sections that a) are not marked
offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
iomem resource - after memory unplug.


The current code makes sure that sparse_decode_mem_map() will return NULL.
David Hildenbrand Jan. 16, 2020, 8:14 a.m. UTC | #5
On 16.01.20 09:06, David Hildenbrand wrote:
> On 16.01.20 04:01, Pingfan Liu wrote:
>> When fully deactivated, it is meaningless to keep the value of a section's
>> mem_map. And its mem_map will be reassigned during re-added.
>>
>> Beside this, it breaks the user space tool "makedumpfile", which makes
>> assumption that a hot-removed section having mem_map as NULL.
>>
>> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
>> trigger a crash, and save vmcore by makedumpfile
> 
> Are you using an up-to-date makedumfile and did kdump.service properly
> get reloaded on the udev events? I remember that this works.
> 
> makedumpfile will not dump memory sections that a) are not marked
> offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
> iomem resource - after memory unplug.
> 
> 
> The current code makes sure that sparse_decode_mem_map() will return NULL.
> 

... but it's only used at this very place. I think we should add a
Fixes: tag, although this might be fixed as well in makedumpfile (so
people are aware that patch broke something)

Reviewed-by: David Hildenbrand <david@redhat.com>
Baoquan He Jan. 16, 2020, 8:24 a.m. UTC | #6
On 01/16/20 at 09:14am, David Hildenbrand wrote:
> On 16.01.20 09:06, David Hildenbrand wrote:
> > On 16.01.20 04:01, Pingfan Liu wrote:
> >> When fully deactivated, it is meaningless to keep the value of a section's
> >> mem_map. And its mem_map will be reassigned during re-added.
> >>
> >> Beside this, it breaks the user space tool "makedumpfile", which makes
> >> assumption that a hot-removed section having mem_map as NULL.
> >>
> >> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> >> trigger a crash, and save vmcore by makedumpfile
> > 
> > Are you using an up-to-date makedumfile and did kdump.service properly
> > get reloaded on the udev events? I remember that this works.
> > 
> > makedumpfile will not dump memory sections that a) are not marked
> > offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
> > iomem resource - after memory unplug.

Makedumpfile seems to only check SECTION_MARKED_PRESENT. Then the NULL
memmap will fail vmcore dumping, I guess.

> > 
> > 
> > The current code makes sure that sparse_decode_mem_map() will return NULL.
> > 
> 
> ... but it's only used at this very place. I think we should add a
> Fixes: tag, although this might be fixed as well in makedumpfile (so
> people are aware that patch broke something)

Agree, it's worth fixing it too in makedumpfile side to enhance.
David Hildenbrand Jan. 16, 2020, 8:57 a.m. UTC | #7
On 16.01.20 09:24, Baoquan He wrote:
> On 01/16/20 at 09:14am, David Hildenbrand wrote:
>> On 16.01.20 09:06, David Hildenbrand wrote:
>>> On 16.01.20 04:01, Pingfan Liu wrote:
>>>> When fully deactivated, it is meaningless to keep the value of a section's
>>>> mem_map. And its mem_map will be reassigned during re-added.
>>>>
>>>> Beside this, it breaks the user space tool "makedumpfile", which makes
>>>> assumption that a hot-removed section having mem_map as NULL.
>>>>
>>>> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
>>>> trigger a crash, and save vmcore by makedumpfile
>>>
>>> Are you using an up-to-date makedumfile and did kdump.service properly
>>> get reloaded on the udev events? I remember that this works.
>>>
>>> makedumpfile will not dump memory sections that a) are not marked
>>> offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
>>> iomem resource - after memory unplug.
> 
> Makedumpfile seems to only check SECTION_MARKED_PRESENT. Then the NULL
> memmap will fail vmcore dumping, I guess.

But why should the section be marked SECTION_MARKED_PRESENT? After
unplugging a section, the flag will be cleared.

validate_mem_section() seems to iterate over all sections 0..num_section
- 1 and validates them.

section = nr_to_section(section_nr, mem_sec);
if (section == NOT_KV_ADDR) {
	mem_map = NOT_MEMMAP_ADDR:
} else {
	mem_map = section_mem_map_addr(section, &map_mask)
	if (!(map_mask & SECTION_MARKED_PRESENT)) {
		return FALSE;
	}
	if (mem_map == 0) {
		mem_map = NOT_MEMMAP_ADDR;
	} else {
	...

But here it is:

section_mem_map_addr():

map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
mask = SECTION_MAP_MASK;
*map_mask = map & ~mask;
if (map == 0x0)
	*map_mask |= SECTION_MARKED_PRESENT;

If the map is zero, the section is assumed to be present. The way the
map is calculated, it will be zero due to the changes in the kernel.
Pingfan Liu Jan. 17, 2020, 6:18 a.m. UTC | #8
On Thu, Jan 16, 2020 at 4:06 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.01.20 04:01, Pingfan Liu wrote:
> > When fully deactivated, it is meaningless to keep the value of a section's
> > mem_map. And its mem_map will be reassigned during re-added.
> >
> > Beside this, it breaks the user space tool "makedumpfile", which makes
> > assumption that a hot-removed section having mem_map as NULL.
> >
> > The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> > trigger a crash, and save vmcore by makedumpfile
>
> Are you using an up-to-date makedumfile and did kdump.service properly
> get reloaded on the udev events? I remember that this works.
I tried to get a machine and double-check it. The latest devel branch
of makedumpfile can not work.
>
> makedumpfile will not dump memory sections that a) are not marked
> offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
> iomem resource - after memory unplug.
I think the current implementation of makedumpfile should improve the
handling process.

And my primary argument for this patch is a pattern like alloc/free.
And when fully deactivated, it is meaningless to keep the section
start pfn info
>
>
> The current code makes sure that sparse_decode_mem_map() will return NULL.
Do you mean the code in makedumpfile?  If yes, it can. But
makedumpfile related code has some bug, and should be improved to
utilize this function.

Thanks,
Pingfan
>
> --
> Thanks,
>
> David / dhildenb
>
Pingfan Liu Jan. 17, 2020, 6:20 a.m. UTC | #9
On Thu, Jan 16, 2020 at 4:25 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 01/16/20 at 09:14am, David Hildenbrand wrote:
> > On 16.01.20 09:06, David Hildenbrand wrote:
> > > On 16.01.20 04:01, Pingfan Liu wrote:
> > >> When fully deactivated, it is meaningless to keep the value of a section's
> > >> mem_map. And its mem_map will be reassigned during re-added.
> > >>
> > >> Beside this, it breaks the user space tool "makedumpfile", which makes
> > >> assumption that a hot-removed section having mem_map as NULL.
> > >>
> > >> The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> > >> trigger a crash, and save vmcore by makedumpfile
> > >
> > > Are you using an up-to-date makedumfile and did kdump.service properly
> > > get reloaded on the udev events? I remember that this works.
> > >
> > > makedumpfile will not dump memory sections that a) are not marked
> > > offline (SECTION_IS_ONLINE) - after offlining b) are not part of an
> > > iomem resource - after memory unplug.
>
> Makedumpfile seems to only check SECTION_MARKED_PRESENT. Then the NULL
> memmap will fail vmcore dumping, I guess.
makedumpfile.c / validate_mem_section() has some trick, so it will not fail.
>
> > >
> > >
> > > The current code makes sure that sparse_decode_mem_map() will return NULL.
> > >
> >
> > ... but it's only used at this very place. I think we should add a
> > Fixes: tag, although this might be fixed as well in makedumpfile (so
> > people are aware that patch broke something)
>
> Agree, it's worth fixing it too in makedumpfile side to enhance.
>
Yeah, I also agree :)
Pingfan Liu Jan. 17, 2020, 6:22 a.m. UTC | #10
On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> > When fully deactivated, it is meaningless to keep the value of a section's
> > mem_map. And its mem_map will be reassigned during re-added.
> >
> > Beside this, it breaks the user space tool "makedumpfile", which makes
> > assumption that a hot-removed section having mem_map as NULL.
>
> We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
> sub-section hotplug"). Dan was this an intentional change?
I do not know the purpose of this. But the change just leave section
start pfn in fully deactivated section_mem_map, and not used any more.

Thanks,
Pingfan
Dan Williams Jan. 17, 2020, 7:14 a.m. UTC | #11
On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> > > When fully deactivated, it is meaningless to keep the value of a section's
> > > mem_map. And its mem_map will be reassigned during re-added.
> > >
> > > Beside this, it breaks the user space tool "makedumpfile", which makes
> > > assumption that a hot-removed section having mem_map as NULL.
> >
> > We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
> > sub-section hotplug"). Dan was this an intentional change?
> I do not know the purpose of this. But the change just leave section
> start pfn in fully deactivated section_mem_map, and not used any more.

Not intentional, IIRC at the time I had convinced myself that the
value would always be translated by sparse_decode_mem_map(), so I
thought it could be hiding NULL de-references.  I don't see any harm
in the patch.
Michal Hocko Jan. 17, 2020, 7:47 a.m. UTC | #12
On Thu 16-01-20 23:14:02, Dan Williams wrote:
> On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> > > > When fully deactivated, it is meaningless to keep the value of a section's
> > > > mem_map. And its mem_map will be reassigned during re-added.
> > > >
> > > > Beside this, it breaks the user space tool "makedumpfile", which makes
> > > > assumption that a hot-removed section having mem_map as NULL.
> > >
> > > We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
> > > sub-section hotplug"). Dan was this an intentional change?
> > I do not know the purpose of this. But the change just leave section
> > start pfn in fully deactivated section_mem_map, and not used any more.
> 
> Not intentional, IIRC at the time I had convinced myself that the
> value would always be translated by sparse_decode_mem_map(), so I
> thought it could be hiding NULL de-references.  I don't see any harm
> in the patch.

Thanks for the confirmation. It would be great to have this in the
changelog.
Pingfan Liu Jan. 17, 2020, 9:49 a.m. UTC | #13
On Fri, Jan 17, 2020 at 3:47 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 16-01-20 23:14:02, Dan Williams wrote:
> > On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu <kernelfans@gmail.com> wrote:
> > >
> > > On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
> > > > > When fully deactivated, it is meaningless to keep the value of a section's
> > > > > mem_map. And its mem_map will be reassigned during re-added.
> > > > >
> > > > > Beside this, it breaks the user space tool "makedumpfile", which makes
> > > > > assumption that a hot-removed section having mem_map as NULL.
> > > >
> > > > We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
> > > > sub-section hotplug"). Dan was this an intentional change?
> > > I do not know the purpose of this. But the change just leave section
> > > start pfn in fully deactivated section_mem_map, and not used any more.
> >
> > Not intentional, IIRC at the time I had convinced myself that the
> > value would always be translated by sparse_decode_mem_map(), so I
> > thought it could be hiding NULL de-references.  I don't see any harm
> > in the patch.
>
> Thanks for the confirmation. It would be great to have this in the
> changelog.
Should I post V2 with this commit log?

Thanks,
Pingfan
David Hildenbrand Jan. 17, 2020, 9:52 a.m. UTC | #14
On 17.01.20 10:49, Pingfan Liu wrote:
> On Fri, Jan 17, 2020 at 3:47 PM Michal Hocko <mhocko@suse.com> wrote:
>>
>> On Thu 16-01-20 23:14:02, Dan Williams wrote:
>>> On Thu, Jan 16, 2020 at 10:23 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>>>>
>>>> On Thu, Jan 16, 2020 at 3:50 PM Michal Hocko <mhocko@suse.com> wrote:
>>>>>
>>>>> On Thu 16-01-20 11:01:08, Pingfan Liu wrote:
>>>>>> When fully deactivated, it is meaningless to keep the value of a section's
>>>>>> mem_map. And its mem_map will be reassigned during re-added.
>>>>>>
>>>>>> Beside this, it breaks the user space tool "makedumpfile", which makes
>>>>>> assumption that a hot-removed section having mem_map as NULL.
>>>>>
>>>>> We used to do that before ba72b4c8cf60 ("mm/sparsemem: support
>>>>> sub-section hotplug"). Dan was this an intentional change?
>>>> I do not know the purpose of this. But the change just leave section
>>>> start pfn in fully deactivated section_mem_map, and not used any more.
>>>
>>> Not intentional, IIRC at the time I had convinced myself that the
>>> value would always be translated by sparse_decode_mem_map(), so I
>>> thought it could be hiding NULL de-references.  I don't see any harm
>>> in the patch.
>>
>> Thanks for the confirmation. It would be great to have this in the
>> changelog.
> Should I post V2 with this commit log?
> 
> Thanks,
> Pingfan
> 

Id' say yes, and maybe also add some details why this makes makedumpfile
bail out now, so people looking at this commit only know what's happening.
Pingfan Liu Jan. 20, 2020, 2:31 a.m. UTC | #15
On Fri, Jan 17, 2020 at 5:53 PM David Hildenbrand <david@redhat.com> wrote:
>
[...]
> >> Thanks for the confirmation. It would be great to have this in the
> >> changelog.
> > Should I post V2 with this commit log?
> >
> > Thanks,
> > Pingfan
> >
>
> Id' say yes, and maybe also add some details why this makes makedumpfile
> bail out now, so people looking at this commit only know what's happening.
OK, thanks for the suggestion.

I have sent another patch for makedumpfile and cc you.

Thanks,
Pingfan
diff mbox series

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index 3822ecb..fddac80 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -789,7 +789,7 @@  static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
 			ms->usage = NULL;
 		}
 		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
-		ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
+		ms->section_mem_map = NULL;
 	}
 
 	if (section_is_early && memmap)