mbox series

[RFC,0/3] Do not touch pages in remove_memory path

Message ID 20180807133757.18352-1-osalvador@techadventures.net (mailing list archive)
Headers show
Series Do not touch pages in remove_memory path | expand

Message

Oscar Salvador Aug. 7, 2018, 1:37 p.m. UTC
From: Oscar Salvador <osalvador@suse.de>

This tries to fix [1], which was reported by David Hildenbrand, and also
does some cleanups/refactoring.

I am sending this as RFC to see if the direction I am going is right before
spending more time into it.
And also to gather feedback about hmm/zone_device stuff.
The code compiles and I tested it successfully with normal memory-hotplug operations.

Here we go:

With the following scenario:

1) We add memory
2) We do not online it
3) We remove the memory

an invalid access is being made to those pages.

The reason is that the pages are initialized in online_pages() path:

        /   online_pages
        |    move_pfn_range
ONLINE  |     move_pfn_range_to_zone
PAGES   |      ...
        |      memmap_init_zone

But depending on our policy about onlining the pages by default, we might not
online them right after having added the memory, and so, those pages might be
left unitialized.

This is a problem because we access those pages in arch_remove_memory:

...
if (altmap)
        page += vmem_altmap_offset(altmap);
        zone = page_zone(page);
...

So we are accessing unitialized data basically.


Currently, we need to have the zone from arch_remove_memory to all the way down
because

1) we call __remove_zone zo shrink spanned pages from pgdat/zone
2) we get the pgdat from the zone

Number 1 can be fixed by moving __remove_zone back to offline_pages(), where it should be.
This, besides fixing the bug, will make the code more consistent because all the reveserse
operations from online_pages() will be made in offline_pages().

Number 2 can be fixed by passing nid instead of zone.

The tricky part of all this is the hmm code and the zone_device stuff.

Fixing the calls to arch_remove_memory in the arch code is easy, but arch_remove_memory
is being used in:

kernel/memremap.c: devm_memremap_pages_release()
mm/hmm.c:          hmm_devmem_release()

I did my best to get my head around this, but my knowledge in that area is 0, so I am pretty sure
I did not get it right.

The thing is:

devm_memremap_pages(), which is the counterpart of devm_memremap_pages_release(),
calls arch_add_memory(), and then calls move_pfn_range_to_zone() (to ZONE_DEVICE).
So it does not go through online_pages().
So there I call shrink_pages() (it does pretty much as __remove_zone) before calling
to arch_remove_memory.
But as I said, I do now if that is right.

[1] https://patchwork.kernel.org/patch/10547445/

Oscar Salvador (3):
  mm/memory_hotplug: Add nid parameter to arch_remove_memory
  mm/memory_hotplug: Create __shrink_pages and move it to offline_pages
  mm/memory_hotplug: Refactor shrink_zone/pgdat_span

 arch/ia64/mm/init.c            |   6 +-
 arch/powerpc/mm/mem.c          |  13 +--
 arch/s390/mm/init.c            |   2 +-
 arch/sh/mm/init.c              |   6 +-
 arch/x86/mm/init_32.c          |   6 +-
 arch/x86/mm/init_64.c          |  10 +--
 include/linux/memory_hotplug.h |   8 +-
 kernel/memremap.c              |   9 +-
 mm/hmm.c                       |   6 +-
 mm/memory_hotplug.c            | 190 +++++++++++++++++++++--------------------
 mm/sparse.c                    |   4 +-
 11 files changed, 127 insertions(+), 133 deletions(-)

Comments

David Hildenbrand Aug. 7, 2018, 2:16 p.m. UTC | #1
On 07.08.2018 15:37, osalvador@techadventures.net wrote:
> From: Oscar Salvador <osalvador@suse.de>
> 
> This tries to fix [1], which was reported by David Hildenbrand, and also
> does some cleanups/refactoring.
> 
> I am sending this as RFC to see if the direction I am going is right before
> spending more time into it.
> And also to gather feedback about hmm/zone_device stuff.
> The code compiles and I tested it successfully with normal memory-hotplug operations.
>

Please coordinate next time with people already working on this,
otherwise you might end up wasting other people's time.

Anyhow, thanks for looking into this.
Oscar Salvador Aug. 7, 2018, 2:19 p.m. UTC | #2
On Tue, Aug 07, 2018 at 04:16:35PM +0200, David Hildenbrand wrote:
> On 07.08.2018 15:37, osalvador@techadventures.net wrote:
> > From: Oscar Salvador <osalvador@suse.de>
> > 
> > This tries to fix [1], which was reported by David Hildenbrand, and also
> > does some cleanups/refactoring.
> > 
> > I am sending this as RFC to see if the direction I am going is right before
> > spending more time into it.
> > And also to gather feedback about hmm/zone_device stuff.
> > The code compiles and I tested it successfully with normal memory-hotplug operations.
> >
> 
> Please coordinate next time with people already working on this,
> otherwise you might end up wasting other people's time.

Hi David,

Sorry, if you are already working on this, I step back immediately.
I will wait for your work.

thanks
David Hildenbrand Aug. 7, 2018, 2:20 p.m. UTC | #3
On 07.08.2018 16:19, Oscar Salvador wrote:
> On Tue, Aug 07, 2018 at 04:16:35PM +0200, David Hildenbrand wrote:
>> On 07.08.2018 15:37, osalvador@techadventures.net wrote:
>>> From: Oscar Salvador <osalvador@suse.de>
>>>
>>> This tries to fix [1], which was reported by David Hildenbrand, and also
>>> does some cleanups/refactoring.
>>>
>>> I am sending this as RFC to see if the direction I am going is right before
>>> spending more time into it.
>>> And also to gather feedback about hmm/zone_device stuff.
>>> The code compiles and I tested it successfully with normal memory-hotplug operations.
>>>
>>
>> Please coordinate next time with people already working on this,
>> otherwise you might end up wasting other people's time.
> 
> Hi David,
> 
> Sorry, if you are already working on this, I step back immediately.
> I will wait for your work.

No, please keep going, you are way ahead of me ;)

(I was got stuck at ZONE_DEVICE so far)

> 
> thanks
>
Oscar Salvador Aug. 7, 2018, 2:28 p.m. UTC | #4
On Tue, Aug 07, 2018 at 04:20:37PM +0200, David Hildenbrand wrote:
> On 07.08.2018 16:19, Oscar Salvador wrote:
> > On Tue, Aug 07, 2018 at 04:16:35PM +0200, David Hildenbrand wrote:
> >> On 07.08.2018 15:37, osalvador@techadventures.net wrote:
> >>> From: Oscar Salvador <osalvador@suse.de>
> >>>
> >>> This tries to fix [1], which was reported by David Hildenbrand, and also
> >>> does some cleanups/refactoring.
> >>>
> >>> I am sending this as RFC to see if the direction I am going is right before
> >>> spending more time into it.
> >>> And also to gather feedback about hmm/zone_device stuff.
> >>> The code compiles and I tested it successfully with normal memory-hotplug operations.
> >>>
> >>
> >> Please coordinate next time with people already working on this,
> >> otherwise you might end up wasting other people's time.
> > 
> > Hi David,
> > 
> > Sorry, if you are already working on this, I step back immediately.
> > I will wait for your work.
> 
> No, please keep going, you are way ahead of me ;)
> 
> (I was got stuck at ZONE_DEVICE so far)

It seems mine breaks ZONE_DEVICE for hmm at least, so.. not much better ^^.
So since you already got some work, let us not throw it away.

Thanks
David Hildenbrand Aug. 7, 2018, 2:41 p.m. UTC | #5
On 07.08.2018 16:28, Oscar Salvador wrote:
> On Tue, Aug 07, 2018 at 04:20:37PM +0200, David Hildenbrand wrote:
>> On 07.08.2018 16:19, Oscar Salvador wrote:
>>> On Tue, Aug 07, 2018 at 04:16:35PM +0200, David Hildenbrand wrote:
>>>> On 07.08.2018 15:37, osalvador@techadventures.net wrote:
>>>>> From: Oscar Salvador <osalvador@suse.de>
>>>>>
>>>>> This tries to fix [1], which was reported by David Hildenbrand, and also
>>>>> does some cleanups/refactoring.
>>>>>
>>>>> I am sending this as RFC to see if the direction I am going is right before
>>>>> spending more time into it.
>>>>> And also to gather feedback about hmm/zone_device stuff.
>>>>> The code compiles and I tested it successfully with normal memory-hotplug operations.
>>>>>
>>>>
>>>> Please coordinate next time with people already working on this,
>>>> otherwise you might end up wasting other people's time.
>>>
>>> Hi David,
>>>
>>> Sorry, if you are already working on this, I step back immediately.
>>> I will wait for your work.
>>
>> No, please keep going, you are way ahead of me ;)
>>
>> (I was got stuck at ZONE_DEVICE so far)
> 
> It seems mine breaks ZONE_DEVICE for hmm at least, so.. not much better ^^.
> So since you already got some work, let us not throw it away.

I am not close to an RFC (spent most time looking into the details -
still have plenty to learn in the MM area - and wondering on how to
handle ZONE_DEVICE). It might take some time for me to get something
clean up and running.

So let's continue with your series, I'll happily review it.

(I was just surprised by this series without a prior note as reply to
the patch where we discussed the solution for the problem)

> 
> Thanks
>
Oscar Salvador Aug. 7, 2018, 2:52 p.m. UTC | #6
On Tue, Aug 07, 2018 at 04:41:33PM +0200, David Hildenbrand wrote:
> I am not close to an RFC (spent most time looking into the details -
> still have plenty to learn in the MM area - and wondering on how to
> handle ZONE_DEVICE). It might take some time for me to get something
> clean up and running.

That is probably the way to go, details like ZONE_DEVICE what I thought
it was about to be easier.
This has been broken for a while, so a few more weeks(or more) will not hurt.
Also, I need to catch up with ZONE_DEVICE myself and I will be on vacation
for a few weeks, so that is it.

Feel free to re-use anything you find useful in these series(in case you find something).

Thanks
Pasha Tatashin Aug. 15, 2018, 2:05 p.m. UTC | #7
> This tries to fix [1], which was reported by David Hildenbrand, and also
> does some cleanups/refactoring.

Hi Oscar,

I would like to review this work. Are you in process of sending a new version? If so, I will wait for it.

Thank you,
Pavel

> 
> I am sending this as RFC to see if the direction I am going is right before
> spending more time into it.
> And also to gather feedback about hmm/zone_device stuff.
> The code compiles and I tested it successfully with normal memory-hotplug
> operations.
> 
> Here we go:
> 
> With the following scenario:
> 
> 1) We add memory
> 2) We do not online it
> 3) We remove the memory
> 
> an invalid access is being made to those pages.
> 
> The reason is that the pages are initialized in online_pages() path:
> 
>         /   online_pages
>         |    move_pfn_range
> ONLINE  |     move_pfn_range_to_zone
> PAGES   |      ...
>         |      memmap_init_zone
> 
> But depending on our policy about onlining the pages by default, we might not
> online them right after having added the memory, and so, those pages might
> be
> left unitialized.
> 
> This is a problem because we access those pages in arch_remove_memory:
> 
> ...
> if (altmap)
>         page += vmem_altmap_offset(altmap);
>         zone = page_zone(page);
> ...
> 
> So we are accessing unitialized data basically.
> 
> 
> Currently, we need to have the zone from arch_remove_memory to all the
> way down
> because
> 
> 1) we call __remove_zone zo shrink spanned pages from pgdat/zone
> 2) we get the pgdat from the zone
> 
> Number 1 can be fixed by moving __remove_zone back to offline_pages(),
> where it should be.
> This, besides fixing the bug, will make the code more consistent because all
> the reveserse
> operations from online_pages() will be made in offline_pages().
> 
> Number 2 can be fixed by passing nid instead of zone.
> 
> The tricky part of all this is the hmm code and the zone_device stuff.
> 
> Fixing the calls to arch_remove_memory in the arch code is easy, but
> arch_remove_memory
> is being used in:
> 
> kernel/memremap.c: devm_memremap_pages_release()
> mm/hmm.c:          hmm_devmem_release()
> 
> I did my best to get my head around this, but my knowledge in that area is 0,
> so I am pretty sure
> I did not get it right.
> 
> The thing is:
> 
> devm_memremap_pages(), which is the counterpart of
> devm_memremap_pages_release(),
> calls arch_add_memory(), and then calls move_pfn_range_to_zone() (to
> ZONE_DEVICE).
> So it does not go through online_pages().
> So there I call shrink_pages() (it does pretty much as __remove_zone) before
> calling
> to arch_remove_memory.
> But as I said, I do now if that is right.
> 
> [1] https://patchwork.kernel.org/patch/10547445/
> 
> Oscar Salvador (3):
>   mm/memory_hotplug: Add nid parameter to arch_remove_memory
>   mm/memory_hotplug: Create __shrink_pages and move it to offline_pages
>   mm/memory_hotplug: Refactor shrink_zone/pgdat_span
> 
>  arch/ia64/mm/init.c            |   6 +-
>  arch/powerpc/mm/mem.c          |  13 +--
>  arch/s390/mm/init.c            |   2 +-
>  arch/sh/mm/init.c              |   6 +-
>  arch/x86/mm/init_32.c          |   6 +-
>  arch/x86/mm/init_64.c          |  10 +--
>  include/linux/memory_hotplug.h |   8 +-
>  kernel/memremap.c              |   9 +-
>  mm/hmm.c                       |   6 +-
>  mm/memory_hotplug.c            | 190 +++++++++++++++++++++--------------------
>  mm/sparse.c                    |   4 +-
>  11 files changed, 127 insertions(+), 133 deletions(-)
> 
> --
> 2.13.6
Oscar Salvador Aug. 15, 2018, 2:32 p.m. UTC | #8
On Wed, Aug 15, 2018 at 02:05:35PM +0000, Pavel Tatashin wrote:
> > This tries to fix [1], which was reported by David Hildenbrand, and also
> > does some cleanups/refactoring.
> 
> Hi Oscar,
> 
> I would like to review this work. Are you in process of sending a new version? If so, I will wait for it.

Hi Pavel,

Yes, I plan to send a new version by Friday latest (although I hope tomorrow).

Thanks