[RFC,v2,0/2] Do not touch pages in remove_memory path

Message ID 20180817154127.28602-1-osalvador@techadventures.net (mailing list archive)

Message

Oscar Salvador Aug. 17, 2018, 3:41 p.m. UTC
From: Oscar Salvador <osalvador@suse.de>

This patchset moves all zone/page handling from the remove_memory
path back to the offline_pages stage.
This has been done for two reasons:

1) We can access stale pages if we remove memory that was never onlined [1]
2) Code consistency

Currently, when we online memory, online_pages() takes care to initialize
the pages and put the memory range into its corresponding zone.
So, zone/pgdat's spanned/present pages get resized.

But the opposite does not happen when we offline memory.
Only present_pages is decremented, and we wait to shrink zone/node's
spanned_pages until we remove that memory.
But as explained above, this is wrong.

So this patchset tries to cover this by moving this handling to the place 
it should be.
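
To make the asymmetry concrete, here is a minimal userspace sketch. It is a toy
model and not the kernel code: struct toy_zone and all toy_* helpers are
hypothetical names invented for illustration, and the span shrinking only
handles a range sitting at either edge of the zone, while the real code also
has to cope with holes.

```c
/* Toy stand-in for the zone counters discussed above (hypothetical type). */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
	unsigned long present_pages;
};

/* Onlining: grow the span to cover the range and count the pages as present. */
static void toy_online(struct toy_zone *z, unsigned long pfn, unsigned long nr)
{
	unsigned long end = z->start_pfn + z->spanned_pages;

	if (z->spanned_pages == 0) {
		z->start_pfn = pfn;
		end = pfn;
	}
	if (pfn < z->start_pfn)
		z->start_pfn = pfn;
	if (pfn + nr > end)
		end = pfn + nr;
	z->spanned_pages = end - z->start_pfn;
	z->present_pages += nr;
}

/* Simplified shrink: only trims a range located at an edge of the zone. */
static void toy_shrink_span(struct toy_zone *z, unsigned long pfn, unsigned long nr)
{
	unsigned long end = z->start_pfn + z->spanned_pages;

	if (nr > z->spanned_pages)
		return;
	if (pfn == z->start_pfn) {
		z->start_pfn += nr;
		z->spanned_pages -= nr;
	} else if (pfn + nr == end) {
		z->spanned_pages -= nr;
	}
}

/* Behaviour described as current: offlining only drops present_pages. */
static void toy_offline_current(struct toy_zone *z, unsigned long nr)
{
	z->present_pages -= nr;
}

/* Direction of this patchset: offlining also shrinks the span right away. */
static void toy_offline_proposed(struct toy_zone *z, unsigned long pfn, unsigned long nr)
{
	z->present_pages -= nr;
	toy_shrink_span(z, pfn, nr);
}
```

In this model, after toy_offline_current() the zone still spans the offlined
range until something else shrinks it, whereas toy_offline_proposed() leaves
both counters consistent as soon as the memory goes offline.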

The main difficulty I faced here was with regard to HMM/devm, as it handles
hot-add/remove of memory in its own particular way and, more importantly,
the resources as well.

I really scratched my head for ideas about how to handle this case, and
after some failed attempts I came up with the idea of checking
res->flags.

Memory resources that go through the "official" memory-hotplug channels
have the IORESOURCE_SYSTEM_RAM flag.
This flag is made of (IORESOURCE_MEM|IORESOURCE_SYSRAM).

HMM/devm, on the other hand, request and release the resources
through devm_request_mem_region/devm_release_mem_region, and 
these resources do not contain the IORESOURCE_SYSRAM flag.

So what I ended up doing is to check for IORESOURCE_SYSRAM
in release_mem_region_adjustable.
If we see that a resource does not have such a flag, we know that
we are dealing with a resource coming from HMM/devm, and so,
we do not need to do anything as HMM/devm will take care of that part.
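
A minimal userspace sketch of that flag test follows. It is not the actual
change to release_mem_region_adjustable: struct toy_resource and
toy_res_is_hotplug_ram() are hypothetical names, and the flag values are
copied here on the assumption that they match include/linux/ioport.h, only so
the example builds outside the kernel.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/ioport.h; redefined for a userspace build. */
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_SYSRAM	0x01000000UL
#define IORESOURCE_SYSTEM_RAM	(IORESOURCE_MEM | IORESOURCE_SYSRAM)

/* Hypothetical, trimmed-down stand-in for struct resource. */
struct toy_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/*
 * Hypothetical helper: true when the resource carries the SYSRAM modifier,
 * i.e. it came through the regular memory-hotplug path. Resources requested
 * via devm_request_mem_region() are expected to lack this bit.
 */
static bool toy_res_is_hotplug_ram(const struct toy_resource *res)
{
	return (res->flags & IORESOURCE_SYSRAM) != 0;
}
```

A caller would skip the zone/resource handling whenever the helper returns
false and leave that work to HMM/devm.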

I only compiled the code and have not tested it yet (I will do so next week),
but I am sending this RFCv2 mainly because I would like to get feedback
and see if the direction I took is the right one.

This time I left out [2] because I am working on it in a separate patch,
and it does not really belong to this patchset.

[1] https://patchwork.kernel.org/patch/10547445/ (Reported by David)
[2] https://patchwork.kernel.org/patch/10558723/

Oscar Salvador (2):
  mm/memory_hotplug: Add nid parameter to arch_remove_memory
  mm/memory_hotplug: Shrink spanned pages when offlining memory

 arch/ia64/mm/init.c            |   6 +-
 arch/powerpc/mm/mem.c          |  12 +---
 arch/s390/mm/init.c            |   2 +-
 arch/sh/mm/init.c              |   6 +-
 arch/x86/mm/init_32.c          |   6 +-
 arch/x86/mm/init_64.c          |  10 +--
 include/linux/memory_hotplug.h |  11 +++-
 kernel/memremap.c              |  16 ++---
 kernel/resource.c              |  16 +++++
 mm/hmm.c                       |  34 +++++-----
 mm/memory_hotplug.c            | 145 ++++++++++++++++++++++++++---------------
 mm/sparse.c                    |   4 +-
 12 files changed, 157 insertions(+), 111 deletions(-)

Comments

Oscar Salvador Aug. 28, 2018, 11:47 a.m. UTC | #1
On Fri, Aug 17, 2018 at 05:41:25PM +0200, Oscar Salvador wrote:
> From: Oscar Salvador <osalvador@suse.de>
[...]
> 
> The main difficulty I faced here was with regard to HMM/devm, as it handles
> hot-add/remove of memory in its own particular way and, more importantly,
> the resources as well.
> 
> I really scratched my head for ideas about how to handle this case, and
> after some failed attempts I came up with the idea of checking
> res->flags.
> 
> Memory resources that go through the "official" memory-hotplug channels
> have the IORESOURCE_SYSTEM_RAM flag.
> This flag is made of (IORESOURCE_MEM|IORESOURCE_SYSRAM).
> 
> HMM/devm, on the other hand, request and release the resources
> through devm_request_mem_region/devm_release_mem_region, and 
> these resources do not contain the IORESOURCE_SYSRAM flag.
> 
> So what I ended up doing is to check for IORESOURCE_SYSRAM
> in release_mem_region_adjustable.
> If we see that a resource does not have such a flag, we know that
> we are dealing with a resource coming from HMM/devm, and so,
> we do not need to do anything as HMM/devm will take care of that part.
> 

Jerome/Dan, now that the merge window is closed and before I send RFCv3, could you please check
this and see if anything about the devm/HMM handling is flagrantly wrong?

If you prefer, I can split v3 up even further.
Maybe this will ease the review.

Thanks

Jerome Glisse Aug. 29, 2018, 5:04 p.m. UTC | #2
On Tue, Aug 28, 2018 at 01:47:09PM +0200, Oscar Salvador wrote:
> On Fri, Aug 17, 2018 at 05:41:25PM +0200, Oscar Salvador wrote:
> > From: Oscar Salvador <osalvador@suse.de>
> [...]
> > 
> > The main difficulty I faced here was with regard to HMM/devm, as it handles
> > hot-add/remove of memory in its own particular way and, more importantly,
> > the resources as well.
> > 
> > I really scratched my head for ideas about how to handle this case, and
> > after some failed attempts I came up with the idea of checking
> > res->flags.
> > 
> > Memory resources that go through the "official" memory-hotplug channels
> > have the IORESOURCE_SYSTEM_RAM flag.
> > This flag is made of (IORESOURCE_MEM|IORESOURCE_SYSRAM).
> > 
> > HMM/devm, on the other hand, request and release the resources
> > through devm_request_mem_region/devm_release_mem_region, and 
> > these resources do not contain the IORESOURCE_SYSRAM flag.
> > 
> > So what I ended up doing is to check for IORESOURCE_SYSRAM
> > in release_mem_region_adjustable.
> > If we see that a resource does not have such a flag, we know that
> > we are dealing with a resource coming from HMM/devm, and so,
> > we do not need to do anything as HMM/devm will take care of that part.
> > 
> 
> Jerome/Dan, now that the merge window is closed and before I send RFCv3, could you please check
> this and see if anything about the devm/HMM handling is flagrantly wrong?
> 
> If you prefer, I can split v3 up even further.
> Maybe this will ease the review.
> 

This looks good to me; you can add Reviewed-by: Jérôme Glisse <jglisse@redhat.com>