diff mbox series

[v2] mm/sparsemem: get address to page struct instead of address to pfn

Message ID 20200210005048.10437-1-richardw.yang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [v2] mm/sparsemem: get address to page struct instead of address to pfn | expand

Commit Message

Wei Yang Feb. 10, 2020, 12:50 a.m. UTC
memmap should be the address to page struct instead of address to pfn.

As mentioned by David, if system memory and devmem sit within a
section, the mismatch address would lead kdump to dump unexpected
memory.

Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
valid to get the page struct address at this point.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: David Hildenbrand <david@redhat.com>
CC: Baoquan He <bhe@redhat.com>

---
v2:
  * adjust comment to mention the mismatch data would affect kdump

---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Hildenbrand Feb. 10, 2020, 9 a.m. UTC | #1
On 10.02.20 01:50, Wei Yang wrote:
> memmap should be the address to page struct instead of address to pfn.
> 

"mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections

We want to store the address of the memmap, not the address of the first
pfn.

E.g., we can have both (boot) system memory and devmem residing in a
single section. Once we hot-add the devmem part, the address stored in
ms->section_mem_map would be wrong, and kdump would not be able to
dump the right memory.
"

? See below

> As mentioned by David, if system memory and devmem sit within a
> section, the mismatch address would lead kdump to dump unexpected
> memory.
> 
> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
> valid to get the page struct address at this point.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> CC: Dan Williams <dan.j.williams@intel.com>
> CC: David Hildenbrand <david@redhat.com>
> CC: Baoquan He <bhe@redhat.com>
> 
> ---
> v2:
>   * adjust comment to mention the mismatch data would affect kdump
> 
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 586d85662978..4862ec2cfbc0 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>  
>  	/* Align memmap to section boundary in the subsection case */
>  	if (section_nr_to_pfn(section_nr) != start_pfn)
> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));

I think this whole code should be reworked.

Callee returns a pointer. Caller: Nah, I know it better.

Just nasty.


Can we do something like this instead:


diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 200aef686722..c5091feef29e 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -266,5 +266,5 @@ struct page * __meminit
__populate_section_memmap(unsigned long pfn,
        if (vmemmap_populate(start, end, nid, altmap))
                return NULL;

-       return pfn_to_page(pfn);
+       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index c184b69460b7..21902d7931e4 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
unsigned long nr_pages,
                depopulate_section_memmap(pfn, nr_pages, altmap);
 }

+/*
+ * Returns the memmap of the first pfn of the section (not of
+ * sub-sections).
+ */
 static struct page * __meminit section_activate(int nid, unsigned long pfn,
                unsigned long nr_pages, struct vmem_altmap *altmap)
 {
@@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
long start_pfn,
        set_section_nid(section_nr, nid);
        section_mark_present(ms);

-       /* Align memmap to section boundary in the subsection case */
-       if (section_nr_to_pfn(section_nr) != start_pfn)
-               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
        sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);

        return 0;


Untested, of course :)
David Hildenbrand Feb. 10, 2020, 9:13 a.m. UTC | #2
On 10.02.20 10:00, David Hildenbrand wrote:
> On 10.02.20 01:50, Wei Yang wrote:
>> memmap should be the address to page struct instead of address to pfn.
>>
> 
> "mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections
> 
> We want to store the address of the memmap, not the address of the first
> pfn.
> 
> E.g., we can have both (boot) system memory and devmem residing in a
> single section. Once we hot-add the devmem part, the address stored in
> ms->section_mem_map would be wrong, and kdump would not be able to
> dump the right memory.
> "
> 
> ? See below
> 
>> As mentioned by David, if system memory and devmem sit within a
>> section, the mismatch address would lead kdump to dump unexpected
>> memory.
>>
>> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
>> valid to get the page struct address at this point.
>>
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> CC: David Hildenbrand <david@redhat.com>
>> CC: Baoquan He <bhe@redhat.com>
>>
>> ---
>> v2:
>>   * adjust comment to mention the mismatch data would affect kdump
>>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 586d85662978..4862ec2cfbc0 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>  
>>  	/* Align memmap to section boundary in the subsection case */
>>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
> 
> I think this whole code should be reworked.
> 
> Callee returns a pointer. Caller: Nah, I know it better.
> 
> Just nasty.
> 
> 
> Can we do something like this instead:
> 
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 200aef686722..c5091feef29e 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -266,5 +266,5 @@ struct page * __meminit
> __populate_section_memmap(unsigned long pfn,
>         if (vmemmap_populate(start, end, nid, altmap))
>                 return NULL;
> 
> -       return pfn_to_page(pfn);
> +       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>  }
> diff --git a/mm/sparse.c b/mm/sparse.c
> index c184b69460b7..21902d7931e4 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
> unsigned long nr_pages,
>                 depopulate_section_memmap(pfn, nr_pages, altmap);
>  }
> 
> +/*
> + * Returns the memmap of the first pfn of the section (not of
> + * sub-sections).
> + */
>  static struct page * __meminit section_activate(int nid, unsigned long pfn,
>                 unsigned long nr_pages, struct vmem_altmap *altmap)
>  {
> @@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
> long start_pfn,
>         set_section_nid(section_nr, nid);
>         section_mark_present(ms);
> 
> -       /* Align memmap to section boundary in the subsection case */
> -       if (section_nr_to_pfn(section_nr) != start_pfn)
> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>         sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
> 
>         return 0;
> 
> 
> Untested, of course :)
> 

I think the following would be needed as well with Wei's fix:

@@ -876,15 +880,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
         * Poison uninitialized struct pages in order to catch invalid flags
         * combinations.
         */
-       page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+       page_init_poison(memmap + start_pfn - SECTION_ALIGN_DOWN(start_pfn),
+                        sizeof(struct page) * nr_pages);
 
        ms = __nr_to_section(section_nr);

Or we can simply move the poisoning behind sparse_init_one_section().
Wei Yang Feb. 10, 2020, 11:16 p.m. UTC | #3
On Mon, Feb 10, 2020 at 10:00:47AM +0100, David Hildenbrand wrote:
>On 10.02.20 01:50, Wei Yang wrote:
>> memmap should be the address to page struct instead of address to pfn.
>> 
>
>"mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections
>
>We want to store the address of the memmap, not the address of the first
>pfn.
>
>E.g., we can have both (boot) system memory and devmem residing in a
>single section. Once we hot-add the devmem part, the address stored in
>ms->section_mem_map would be wrong, and kdump would not be able to
>dump the right memory.
>"
>
>? See below
>
>> As mentioned by David, if system memory and devmem sit within a
>> section, the mismatch address would lead kdump to dump unexpected
>> memory.
>> 
>> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
>> valid to get the page struct address at this point.
>> 
>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> CC: Dan Williams <dan.j.williams@intel.com>
>> CC: David Hildenbrand <david@redhat.com>
>> CC: Baoquan He <bhe@redhat.com>
>> 
>> ---
>> v2:
>>   * adjust comment to mention the mismatch data would affect kdump
>> 
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 586d85662978..4862ec2cfbc0 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>  
>>  	/* Align memmap to section boundary in the subsection case */
>>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>
>I think this whole code should be reworked.
>
>Callee returns a pointer. Caller: Nah, I know it better.
>
>Just nasty.
>
>
>Can we do something like this instead:
>
>
>diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>index 200aef686722..c5091feef29e 100644
>--- a/mm/sparse-vmemmap.c
>+++ b/mm/sparse-vmemmap.c
>@@ -266,5 +266,5 @@ struct page * __meminit
>__populate_section_memmap(unsigned long pfn,
>        if (vmemmap_populate(start, end, nid, altmap))
>                return NULL;
>
>-       return pfn_to_page(pfn);
>+       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
> }
>diff --git a/mm/sparse.c b/mm/sparse.c
>index c184b69460b7..21902d7931e4 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
>unsigned long nr_pages,
>                depopulate_section_memmap(pfn, nr_pages, altmap);
> }
>
>+/*
>+ * Returns the memmap of the first pfn of the section (not of
>+ * sub-sections).
>+ */
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>                unsigned long nr_pages, struct vmem_altmap *altmap)
> {
>@@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
>long start_pfn,
>        set_section_nid(section_nr, nid);
>        section_mark_present(ms);
>
>-       /* Align memmap to section boundary in the subsection case */
>-       if (section_nr_to_pfn(section_nr) != start_pfn)
>-               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>        sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>
>        return 0;
>
>
>Untested, of course :)

I think you get some point. As you mentioned in the following reply, we need
to adjust poisoning after this change.

This looks like a trade off between two options. I don't have a strong
preference.

>
>-- 
>Thanks,
>
>David / dhildenb
David Hildenbrand Feb. 11, 2020, 2:01 p.m. UTC | #4
On 11.02.20 00:16, Wei Yang wrote:
> On Mon, Feb 10, 2020 at 10:00:47AM +0100, David Hildenbrand wrote:
>> On 10.02.20 01:50, Wei Yang wrote:
>>> memmap should be the address to page struct instead of address to pfn.
>>>
>>
>> "mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections
>>
>> We want to store the address of the memmap, not the address of the first
>> pfn.
>>
>> E.g., we can have both (boot) system memory and devmem residing in a
>> single section. Once we hot-add the devmem part, the address stored in
>> ms->section_mem_map would be wrong, and kdump would not be able to
>> dump the right memory.
>> "
>>
>> ? See below
>>
>>> As mentioned by David, if system memory and devmem sit within a
>>> section, the mismatch address would lead kdump to dump unexpected
>>> memory.
>>>
>>> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
>>> valid to get the page struct address at this point.
>>>
>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> CC: Dan Williams <dan.j.williams@intel.com>
>>> CC: David Hildenbrand <david@redhat.com>
>>> CC: Baoquan He <bhe@redhat.com>
>>>
>>> ---
>>> v2:
>>>   * adjust comment to mention the mismatch data would affect kdump
>>>
>>> ---
>>>  mm/sparse.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index 586d85662978..4862ec2cfbc0 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>  
>>>  	/* Align memmap to section boundary in the subsection case */
>>>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>
>> I think this whole code should be reworked.
>>
>> Callee returns a pointer. Caller: Nah, I know it better.
>>
>> Just nasty.
>>
>>
>> Can we do something like this instead:
>>
>>
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 200aef686722..c5091feef29e 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -266,5 +266,5 @@ struct page * __meminit
>> __populate_section_memmap(unsigned long pfn,
>>        if (vmemmap_populate(start, end, nid, altmap))
>>                return NULL;
>>
>> -       return pfn_to_page(pfn);
>> +       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>> }
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index c184b69460b7..21902d7931e4 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
>> unsigned long nr_pages,
>>                depopulate_section_memmap(pfn, nr_pages, altmap);
>> }
>>
>> +/*
>> + * Returns the memmap of the first pfn of the section (not of
>> + * sub-sections).
>> + */
>> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>>                unsigned long nr_pages, struct vmem_altmap *altmap)
>> {
>> @@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
>> long start_pfn,
>>        set_section_nid(section_nr, nid);
>>        section_mark_present(ms);
>>
>> -       /* Align memmap to section boundary in the subsection case */
>> -       if (section_nr_to_pfn(section_nr) != start_pfn)
>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>        sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>
>>        return 0;
>>
>>
>> Untested, of course :)
> 
> I think you get some point. As you mentioned in the following reply, we need
> to adjust poisoning after this change.

We can just poison after setting up the section (IOW, move it further down).

> 
> This looks like a trade off between two options. I don't have a strong
> preference.

I clearly prefer if *section*_activate() returns the memmap of the
section. This code is just confusing. But I can send a cleanup on top if
you want to keep it like that for now.
Wei Yang Feb. 12, 2020, 2:28 a.m. UTC | #5
On Tue, Feb 11, 2020 at 03:01:35PM +0100, David Hildenbrand wrote:
>On 11.02.20 00:16, Wei Yang wrote:
>> On Mon, Feb 10, 2020 at 10:00:47AM +0100, David Hildenbrand wrote:
>>> On 10.02.20 01:50, Wei Yang wrote:
>>>> memmap should be the address to page struct instead of address to pfn.
>>>>
>>>
>>> "mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections
>>>
>>> We want to store the address of the memmap, not the address of the first
>>> pfn.
>>>
>>> E.g., we can have both (boot) system memory and devmem residing in a
>>> single section. Once we hot-add the devmem part, the address stored in
>>> ms->section_mem_map would be wrong, and kdump would not be able to
>>> dump the right memory.
>>> "
>>>
>>> ? See below
>>>
>>>> As mentioned by David, if system memory and devmem sit within a
>>>> section, the mismatch address would lead kdump to dump unexpected
>>>> memory.
>>>>
>>>> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
>>>> valid to get the page struct address at this point.
>>>>
>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>> CC: David Hildenbrand <david@redhat.com>
>>>> CC: Baoquan He <bhe@redhat.com>
>>>>
>>>> ---
>>>> v2:
>>>>   * adjust comment to mention the mismatch data would affect kdump
>>>>
>>>> ---
>>>>  mm/sparse.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index 586d85662978..4862ec2cfbc0 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>  
>>>>  	/* Align memmap to section boundary in the subsection case */
>>>>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>>>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>
>>> I think this whole code should be reworked.
>>>
>>> Callee returns a pointer. Caller: Nah, I know it better.
>>>
>>> Just nasty.
>>>
>>>
>>> Can we do something like this instead:
>>>
>>>
>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>> index 200aef686722..c5091feef29e 100644
>>> --- a/mm/sparse-vmemmap.c
>>> +++ b/mm/sparse-vmemmap.c
>>> @@ -266,5 +266,5 @@ struct page * __meminit
>>> __populate_section_memmap(unsigned long pfn,
>>>        if (vmemmap_populate(start, end, nid, altmap))
>>>                return NULL;
>>>
>>> -       return pfn_to_page(pfn);
>>> +       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>>> }
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index c184b69460b7..21902d7931e4 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
>>> unsigned long nr_pages,
>>>                depopulate_section_memmap(pfn, nr_pages, altmap);
>>> }
>>>
>>> +/*
>>> + * Returns the memmap of the first pfn of the section (not of
>>> + * sub-sections).
>>> + */
>>> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>>>                unsigned long nr_pages, struct vmem_altmap *altmap)
>>> {
>>> @@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
>>> long start_pfn,
>>>        set_section_nid(section_nr, nid);
>>>        section_mark_present(ms);
>>>
>>> -       /* Align memmap to section boundary in the subsection case */
>>> -       if (section_nr_to_pfn(section_nr) != start_pfn)
>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>        sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>>
>>>        return 0;
>>>
>>>
>>> Untested, of course :)
>> 
>> I think you get some point. As you mentioned in the following reply, we need
>> to adjust poisoning after this change.
>
>We can just poison after setting up the section (IOW, move it further down).
>
>> 
>> This looks like a trade off between two options. I don't have a strong
>> preference.
>
>I clearly prefer if *section*_activate() returns the memmap of the
>section. This code is just confusing. But I can send a cleanup on top if
>you want to keep it like that for now.
>

Sure, a cleanup patch may help audience get more understanding about the
change.

>
>-- 
>Thanks,
>
>David / dhildenb
David Hildenbrand Feb. 12, 2020, 11:22 a.m. UTC | #6
On 12.02.20 03:28, Wei Yang wrote:
> On Tue, Feb 11, 2020 at 03:01:35PM +0100, David Hildenbrand wrote:
>> On 11.02.20 00:16, Wei Yang wrote:
>>> On Mon, Feb 10, 2020 at 10:00:47AM +0100, David Hildenbrand wrote:
>>>> On 10.02.20 01:50, Wei Yang wrote:
>>>>> memmap should be the address to page struct instead of address to pfn.
>>>>>
>>>>
>>>> "mm/sparsemem: fix wrong address in ms->section_mem_map with sub-sections
>>>>
>>>> We want to store the address of the memmap, not the address of the first
>>>> pfn.
>>>>
>>>> E.g., we can have both (boot) system memory and devmem residing in a
>>>> single section. Once we hot-add the devmem part, the address stored in
>>>> ms->section_mem_map would be wrong, and kdump would not be able to
>>>> dump the right memory.
>>>> "
>>>>
>>>> ? See below
>>>>
>>>>> As mentioned by David, if system memory and devmem sit within a
>>>>> section, the mismatch address would lead kdump to dump unexpected
>>>>> memory.
>>>>>
>>>>> Since sub-section only works for SPARSEMEM_VMEMMAP, pfn_to_page() is
>>>>> valid to get the page struct address at this point.
>>>>>
>>>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>>>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>>>> CC: Dan Williams <dan.j.williams@intel.com>
>>>>> CC: David Hildenbrand <david@redhat.com>
>>>>> CC: Baoquan He <bhe@redhat.com>
>>>>>
>>>>> ---
>>>>> v2:
>>>>>   * adjust comment to mention the mismatch data would affect kdump
>>>>>
>>>>> ---
>>>>>  mm/sparse.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>>> index 586d85662978..4862ec2cfbc0 100644
>>>>> --- a/mm/sparse.c
>>>>> +++ b/mm/sparse.c
>>>>> @@ -887,7 +887,7 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
>>>>>  
>>>>>  	/* Align memmap to section boundary in the subsection case */
>>>>>  	if (section_nr_to_pfn(section_nr) != start_pfn)
>>>>> -		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>>> +		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
>>>>
>>>> I think this whole code should be reworked.
>>>>
>>>> Callee returns a pointer. Caller: Nah, I know it better.
>>>>
>>>> Just nasty.
>>>>
>>>>
>>>> Can we do something like this instead:
>>>>
>>>>
>>>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>>>> index 200aef686722..c5091feef29e 100644
>>>> --- a/mm/sparse-vmemmap.c
>>>> +++ b/mm/sparse-vmemmap.c
>>>> @@ -266,5 +266,5 @@ struct page * __meminit
>>>> __populate_section_memmap(unsigned long pfn,
>>>>        if (vmemmap_populate(start, end, nid, altmap))
>>>>                return NULL;
>>>>
>>>> -       return pfn_to_page(pfn);
>>>> +       return pfn_to_page(SECTION_ALIGN_DOWN(pfn));
>>>> }
>>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>>> index c184b69460b7..21902d7931e4 100644
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -788,6 +788,10 @@ static void section_deactivate(unsigned long pfn,
>>>> unsigned long nr_pages,
>>>>                depopulate_section_memmap(pfn, nr_pages, altmap);
>>>> }
>>>>
>>>> +/*
>>>> + * Returns the memmap of the first pfn of the section (not of
>>>> + * sub-sections).
>>>> + */
>>>> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>>>>                unsigned long nr_pages, struct vmem_altmap *altmap)
>>>> {
>>>> @@ -882,9 +886,6 @@ int __meminit sparse_add_section(int nid, unsigned
>>>> long start_pfn,
>>>>        set_section_nid(section_nr, nid);
>>>>        section_mark_present(ms);
>>>>
>>>> -       /* Align memmap to section boundary in the subsection case */
>>>> -       if (section_nr_to_pfn(section_nr) != start_pfn)
>>>> -               memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
>>>>        sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
>>>>
>>>>        return 0;
>>>>
>>>>
>>>> Untested, of course :)
>>>
>>> I think you get some point. As you mentioned in the following reply, we need
>>> to adjust poisoning after this change.
>>
>> We can just poison after setting up the section (IOW, move it further down).
>>
>>>
>>> This looks like a trade off between two options. I don't have a strong
>>> preference.
>>
>> I clearly prefer if *section*_activate() returns the memmap of the
>> section. This code is just confusing. But I can send a cleanup on top if
>> you want to keep it like that for now.
>>
> 
> Sure, a cleanup patch may help audience get more understanding about the
> change.
> 

For this simple fix for now.

Acked-by: David Hildenbrand <david@redhat.com>

Will send a cleanup in case I don't forget :)
diff mbox series

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index 586d85662978..4862ec2cfbc0 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -887,7 +887,7 @@  int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
-		memmap = pfn_to_kaddr(section_nr_to_pfn(section_nr));
+		memmap = pfn_to_page(section_nr_to_pfn(section_nr));
 	sparse_init_one_section(ms, section_nr, memmap, ms->usage, 0);
 
 	return 0;