[v2,0/7] mm/hotplug: Only use subsection map in VMEMMAP case

Message ID 20200220043316.19668-1-bhe@redhat.com (mailing list archive)

Message

Baoquan He Feb. 20, 2020, 4:33 a.m. UTC
Memory sub-section hotplug was added to fix the issue that nvdimm could
be mapped at a non-section-aligned starting address. A subsection map was
added to struct mem_section_usage to implement it. However, sub-section
hotplug is only supported in the VMEMMAP case, so there is no need to
operate on the subsection map in the SPARSEMEM|!VMEMMAP case. This
patchset changes the code to make the subsection map and the relevant
operations available only in the VMEMMAP case.

Also, since sub-section hotplug was added, the hot add/remove functionality
has been broken in the SPARSEMEM|!VMEMMAP case. Wei Yang and I each made
one patch to fix one of the failures. In this patchset, patch 1/7 from me
fixes the hot remove failure. Wei Yang's patch has already been merged by
Andrew.

Changelog:
v1->v2:
  Move the hot remove fix to the front so that people can backport
  it more easily. Suggested by David.

  Split the old patch which invalidates the sub-section map in the
  !VMEMMAP case into two patches, patch 4/7 and patch 6/7. This
  makes patch review easier. Suggested by David.

  Take Wei Yang's fix out to be posted separately, since it has already
  been reviewed and acked. Suggested by Andrew.

  Fix a code comment mistake in the current patch 2/7. Found by
  Wei Yang during review.

Baoquan He (7):
  mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
  mm/sparse.c: introduce new function fill_subsection_map()
  mm/sparse.c: introduce a new function clear_subsection_map()
  mm/sparse.c: only use subsection map in VMEMMAP case
  mm/sparse.c: add code comment about sub-section hotplug
  mm/sparse.c: move subsection_map related codes together
  mm/sparse.c: Use __get_free_pages() instead in
    populate_section_memmap()

 include/linux/mmzone.h |   2 +
 mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
 2 files changed, 127 insertions(+), 53 deletions(-)

Comments

Michal Hocko Feb. 20, 2020, 10:38 a.m. UTC | #1
On Thu 20-02-20 12:33:09, Baoquan He wrote:
> Memory sub-section hotplug was added to fix the issue that nvdimm could
> be mapped at non-section aligned starting address. A subsection map is
> added into struct mem_section_usage to implement it. However, sub-section
> is only supported in VMEMMAP case.

Why? Is there any fundamental reason or just a lack of implementation?
VMEMMAP should be really only an implementation detail unless I am
missing something subtle.

> Hence there's no need to operate
> subsection map in SPARSEMEM|!VMEMMAP case. In this patchset, change
> codes to make sub-section map and the relevant operation only available
> in VMEMMAP case.
> 
> And since sub-section hotplug added, the hot add/remove functionality
> have been broken in SPARSEMEM|!VMEMMAP case. Wei Yang and I, each of us
> make one patch to fix one of the failures. In this patchset, the patch
> 1/7 from me is used to fix the hot remove failure. Wei Yang's patch has
> been merged by Andrew.

Not sure I understand. Are there more issues to be fixed?
>  include/linux/mmzone.h |   2 +
>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
>  2 files changed, 127 insertions(+), 53 deletions(-)

Why do we need to add so much code to remove a functionality from one
memory model?
Baoquan He Feb. 21, 2020, 2:28 p.m. UTC | #2
On 02/20/20 at 11:38am, Michal Hocko wrote:
> On Thu 20-02-20 12:33:09, Baoquan He wrote:
> > Memory sub-section hotplug was added to fix the issue that nvdimm could
> > be mapped at non-section aligned starting address. A subsection map is
> > added into struct mem_section_usage to implement it. However, sub-section
> > is only supported in VMEMMAP case.
> 
> Why? Is there any fundamental reason or just a lack of implementation?
> VMEMMAP should be really only an implementation detail unless I am
> missing something subtle.

Thanks for checking.

VMEMMAP is one of two ways to convert a PFN to the corresponding
'struct page' in the SPARSE model. I refer to them as the VMEMMAP case and
the !VMEMMAP case because that is what we called them previously when
reviewing patches; I hope it doesn't cause confusion.
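
To make the two lookup styles concrete, here is a minimal user-space
sketch. It is not kernel code: every toy_* name and the tiny geometry are
assumptions made up for illustration, and the real classic SPARSE lookup
stores an encoded section_mem_map rather than a plain per-section pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy geometry, chosen only to keep the example small. */
#define TOY_PAGES_PER_SECTION 8UL
#define TOY_NR_SECTIONS       4UL

struct toy_page {
	unsigned long flags;
};

/* VMEMMAP style: one virtually contiguous array, indexed directly by PFN. */
static struct toy_page toy_vmemmap[TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION];

static struct toy_page *toy_vmemmap_pfn_to_page(unsigned long pfn)
{
	assert(pfn < TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION);
	return &toy_vmemmap[pfn];
}

/* Classic SPARSE style: every section points at its own memmap. */
struct toy_mem_section {
	struct toy_page *memmap;
};

static struct toy_page toy_section_storage[TOY_NR_SECTIONS][TOY_PAGES_PER_SECTION];
static struct toy_mem_section toy_sections[TOY_NR_SECTIONS];

/* Attach a separately "allocated" memmap to each section. */
static void toy_sparse_init(void)
{
	for (unsigned long i = 0; i < TOY_NR_SECTIONS; i++)
		toy_sections[i].memmap = toy_section_storage[i];
}

/* Find the section first, then index into that section's memmap. */
static struct toy_page *toy_sparse_pfn_to_page(unsigned long pfn)
{
	struct toy_mem_section *ms;

	assert(pfn < TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION);
	ms = &toy_sections[pfn / TOY_PAGES_PER_SECTION];
	assert(ms->memmap != NULL);
	return &ms->memmap[pfn % TOY_PAGES_PER_SECTION];
}
```

The point of the sketch is that in the first style a partial section only
needs part of one array to be backed, while in the second style each
section owns a whole memmap allocation.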

Currently, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. The
subsection_map was added to struct mem_section_usage to track which
sub-sections are present. VMEMMAP sets only the bits whose corresponding
sub-sections are present, while !VMEMMAP, namely classic SPARSE, always
fills the whole map.

As we know, VMEMMAP builds page tables to map a cluster of 'struct page'
into the corresponding area of 'vmemmap'. Subsection hotplug can be
supported naturally, w/o any change, by just mapping the region related to
the needed sub-sections on demand. For !VMEMMAP, the memmap is allocated
with alloc_pages() or vmalloc, so things are a little more complicated,
e.g. with a mixed section where boot memory occupies the starting area and
pmem is later hot added to the rear part.

As for why !VMEMMAP doesn't support sub-section hotplug, Dan said
it's more because the effort and maintenance burden outweigh the
benefit. And the current 64-bit ARCHes all enable
SPARSEMEM_VMEMMAP_ENABLE by default.

So no need to keep subsection_map and its handling in SPARSE|!VMEMMAP.
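
The difference in how the two cases fill the map can be sketched in user
space roughly as below. This is only an illustration under stated
assumptions: the geometry is the x86-64 one with 4 KiB pages (128 MiB
sections, 2 MiB sub-sections, so 64 bits per section), all toy_* names are
hypothetical, and returning -EEXIST on overlap is an assumed error
convention, not something taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed geometry: x86-64, 4 KiB pages, 128 MiB sections, 2 MiB sub-sections. */
#define PAGES_PER_SECTION       32768UL
#define PAGES_PER_SUBSECTION    512UL
#define SUBSECTIONS_PER_SECTION (PAGES_PER_SECTION / PAGES_PER_SUBSECTION)

/* Hypothetical stand-in for the subsection_map in struct mem_section_usage. */
struct toy_section_usage {
	uint64_t subsection_map;
};

/* Mask of every sub-section touched by [pfn, pfn + nr_pages) in one section. */
static uint64_t toy_subsection_mask(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long off = pfn % PAGES_PER_SECTION;
	unsigned long first, last, count;

	assert(nr_pages > 0 && off + nr_pages <= PAGES_PER_SECTION);
	first = off / PAGES_PER_SUBSECTION;
	last = (off + nr_pages - 1) / PAGES_PER_SUBSECTION;
	count = last - first + 1;
	/* A full section needs special handling: shifting by 64 is undefined. */
	return (count == SUBSECTIONS_PER_SECTION ? ~0ULL
						 : ((1ULL << count) - 1)) << first;
}

/* VMEMMAP case: set only the bits of the sub-sections being added. */
static int toy_fill_subsection_map(struct toy_section_usage *u,
				   unsigned long pfn, unsigned long nr_pages)
{
	uint64_t mask = toy_subsection_mask(pfn, nr_pages);

	if (u->subsection_map & mask)
		return -EEXIST;	/* assumed error code for an overlap */
	u->subsection_map |= mask;
	return 0;
}

/* !VMEMMAP case as described above: the whole map is always filled. */
static void toy_fill_whole_map(struct toy_section_usage *u)
{
	u->subsection_map = ~0ULL;
}
```

With only full-section hotplug possible in !VMEMMAP, the map carries no
information there, which is the argument for dropping its handling.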

> 
> > Hence there's no need to operate
> > subsection map in SPARSEMEM|!VMEMMAP case. In this patchset, change
> > codes to make sub-section map and the relevant operation only available
> > in VMEMMAP case.
> > 
> > And since sub-section hotplug added, the hot add/remove functionality
> > have been broken in SPARSEMEM|!VMEMMAP case. Wei Yang and I, each of us
> > make one patch to fix one of the failures. In this patchset, the patch
> > 1/7 from me is used to fix the hot remove failure. Wei Yang's patch has
> > been merged by Andrew.
> 
> Not sure I understand. Are there more issues to be fixed?

Only these two. Wei Yang first posted the patch to fix the hot add
failure in SPARSE|!VMEMMAP. When I reviewed and tested his patch, I found
that hot remove failed too. So patch 1/7 is to fix the hot remove failure
in !VMEMMAP. With these two patches, hot add/remove works well in !VMEMMAP.
Not sure if that's clear.

> >  include/linux/mmzone.h |   2 +
> >  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
> >  2 files changed, 127 insertions(+), 53 deletions(-)
> 
> Why do we need to add so much code to remove a functionality from one
> memory model?

Hmm, Dan also asked this before.

The additions mainly happen in patches 2, 3 and 4, including the two newly
added function definitions, the code comments above them, and the added
dummy functions for !VMEMMAP.

Thanks
Baoquan
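
For readers unfamiliar with the "dummy functions" pattern mentioned above,
a rough user-space sketch follows. TOY_VMEMMAP stands in for
CONFIG_SPARSEMEM_VMEMMAP, and the toy_* names, the mask-based signature and
the stub returning 0 are all assumptions for illustration, not the code
from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build with -DTOY_VMEMMAP to model the VMEMMAP configuration. */
#ifdef TOY_VMEMMAP
/* Real work: record the newly present sub-sections in the map. */
static int toy_fill_subsection_map(uint64_t *map, uint64_t mask)
{
	*map |= mask;
	return 0;
}
#else
/* Dummy for !VMEMMAP: nothing to track, so report success and do nothing. */
static inline int toy_fill_subsection_map(uint64_t *map, uint64_t mask)
{
	(void)map;
	(void)mask;
	return 0;
}
#endif

/* The caller stays free of #ifdefs; the stub absorbs the config difference. */
static int toy_section_activate(uint64_t *map, uint64_t mask)
{
	assert(map != NULL);
	return toy_fill_subsection_map(map, mask);
}
```

The stubs add lines to the diffstat but keep the callers identical for
both configurations.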
David Hildenbrand Feb. 25, 2020, 9:10 a.m. UTC | #3
>>>  include/linux/mmzone.h |   2 +
>>>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
>>>  2 files changed, 127 insertions(+), 53 deletions(-)
>>
>> Why do we need to add so much code to remove a functionality from one
>> memory model?
> 
> Hmm, Dan also asked this before.
> 
> The adding mainly happens in patch 2, 3, 4, including the two newly
> added function definitions, the code comments above them, and those added
> dummy functions for !VMEMMAP.

AFAIKS, it's mostly a bunch of newly added comments on top of functions.
E.g., the comment for fill_subsection_map() alone spans 12 LOC in total.
I do wonder if we have to be that verbose. We are barely that verbose on
MM code (and usually I don't see much benefit unless it's a function
with many users from many different places).
Michal Hocko Feb. 25, 2020, 10:02 a.m. UTC | #4
On Tue 25-02-20 10:10:45, David Hildenbrand wrote:
> >>>  include/linux/mmzone.h |   2 +
> >>>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
> >>>  2 files changed, 127 insertions(+), 53 deletions(-)
> >>
> >> Why do we need to add so much code to remove a functionality from one
> >> memory model?
> > 
> > Hmm, Dan also asked this before.
> > 
> > The adding mainly happens in patch 2, 3, 4, including the two newly
> > added function definitions, the code comments above them, and those added
> > dummy functions for !VMEMMAP.
> 
> AFAIKS, it's mostly a bunch of newly added comments on top of functions.
> E.g., the comment for fill_subsection_map() alone spans 12 LOC in total.
> I do wonder if we have to be that verbose. We are barely that verbose on
> MM code (and usually I don't see much benefit unless it's a function
> with many users from many different places).

I would tend to agree here. Not that I am against kernel doc
documentation but these are internal functions and the comment doesn't
really give any better insight IMHO. I would be much more inclined if
this was the general pattern in the respective file but it just stands
out.
Michal Hocko Feb. 25, 2020, 10:03 a.m. UTC | #5
On Fri 21-02-20 22:28:47, Baoquan He wrote:
> On 02/20/20 at 11:38am, Michal Hocko wrote:
> > On Thu 20-02-20 12:33:09, Baoquan He wrote:
> > > Memory sub-section hotplug was added to fix the issue that nvdimm could
> > > be mapped at non-section aligned starting address. A subsection map is
> > > added into struct mem_section_usage to implement it. However, sub-section
> > > is only supported in VMEMMAP case.
> > 
> > Why? Is there any fundamental reason or just a lack of implementation?
> > VMEMMAP should be really only an implementation detail unless I am
> > missing something subtle.
> 
> Thanks for checking.
> 
> VMEMMAP is one of two ways to convert a PFN to the corresponding
> 'struct page' in SPARSE model. I mentioned them as VMEMMAP case, or
> !VMEMMAP case because we called them like this previously when reviewed
> patches, hope it won't cause confusion.
> 
> Currently, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. The
> subsection_map is added to struct mem_section_usage to track which sub
> section is present, VMEMMAP fills those bits which corresponding
> sub-sections are present, while !VMEMMAP, namely classic SPARSE, fills
> the whole map always.
> 
> As we know, VMEMMAP builds page table to map a cluster of 'struct page'
> into the corresponding area of 'vmemmap'. Subsection hotplug can be
> supported naturally, w/o any change, just map needed region related to
> sub-sections on demand. For !VMEMMAP, it allocates memmap with
> alloc_pages() or vmalloc, thing is a little complicated, e.g the mixed
> section, boot memory occupies the starting area, later pmem hot added to
> the rear part.
> 
> About !VMEMMAP which doesn't support sub-section hotplug, Dan said
> it's more because the effort and maintenance burden outweighs the
> benefit. And the current 64 bit ARCHes all enable
> SPARSEMEM_VMEMMAP_ENABLE by default.

OK, if this is the primary argument then make sure to document it in the
changelog (cover letter).
Baoquan He Feb. 26, 2020, 3:42 a.m. UTC | #6
On 02/25/20 at 11:02am, Michal Hocko wrote:
> On Tue 25-02-20 10:10:45, David Hildenbrand wrote:
> > >>>  include/linux/mmzone.h |   2 +
> > >>>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
> > >>>  2 files changed, 127 insertions(+), 53 deletions(-)
> > >>
> > >> Why do we need to add so much code to remove a functionality from one
> > >> memory model?
> > > 
> > > Hmm, Dan also asked this before.
> > > 
> > > The adding mainly happens in patch 2, 3, 4, including the two newly
> > > added function definitions, the code comments above them, and those added
> > > dummy functions for !VMEMMAP.
> > 
> > AFAIKS, it's mostly a bunch of newly added comments on top of functions.
> > E.g., the comment for fill_subsection_map() alone spans 12 LOC in total.
> > I do wonder if we have to be that verbose. We are barely that verbose on
> > MM code (and usually I don't see much benefit unless it's a function
> > with many users from many different places).
> 
> I would tend to agree here. Not that I am against kernel doc
> documentation but these are internal functions and the comment doesn't
> really give any better insight IMHO. I would be much more inclined if
> this was the general pattern in the respective file but it just stands
> out.

I saw that there are internal functions which have code comments, e.g.
shrink_slab() in mm/vmscan.c, and not only in this one place; there are
several. I personally prefer to see a code comment for a function if
possible, since it can save time, e.g. people can skip the bitmap
operations when reading the code if they are not relevant. And here I
mainly wanted to note that different return values indicate different
behaviour when calling them.

Anyway, it's fine with me to remove them. The two functions are internal
and not so complicated. I will remove them since you both object.
However, I disagree with the claim that we should not add code comments
for internal functions.

Thanks
Baoquan
Baoquan He Feb. 26, 2020, 3:44 a.m. UTC | #7
On 02/25/20 at 11:03am, Michal Hocko wrote:
> On Fri 21-02-20 22:28:47, Baoquan He wrote:
> > On 02/20/20 at 11:38am, Michal Hocko wrote:
> > > On Thu 20-02-20 12:33:09, Baoquan He wrote:
> > > > Memory sub-section hotplug was added to fix the issue that nvdimm could
> > > > be mapped at non-section aligned starting address. A subsection map is
> > > > added into struct mem_section_usage to implement it. However, sub-section
> > > > is only supported in VMEMMAP case.
> > > 
> > > Why? Is there any fundamental reason or just a lack of implementation?
> > > VMEMMAP should be really only an implementation detail unless I am
> > > missing something subtle.
> > 
> > Thanks for checking.
> > 
> > VMEMMAP is one of two ways to convert a PFN to the corresponding
> > 'struct page' in SPARSE model. I mentioned them as VMEMMAP case, or
> > !VMEMMAP case because we called them like this previously when reviewed
> > patches, hope it won't cause confusion.
> > 
> > Currently, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. The
> > subsection_map is added to struct mem_section_usage to track which sub
> > section is present, VMEMMAP fills those bits which corresponding
> > sub-sections are present, while !VMEMMAP, namely classic SPARSE, fills
> > the whole map always.
> > 
> > As we know, VMEMMAP builds page table to map a cluster of 'struct page'
> > into the corresponding area of 'vmemmap'. Subsection hotplug can be
> > supported naturally, w/o any change, just map needed region related to
> > sub-sections on demand. For !VMEMMAP, it allocates memmap with
> > alloc_pages() or vmalloc, thing is a little complicated, e.g the mixed
> > section, boot memory occupies the starting area, later pmem hot added to
> > the rear part.
> > 
> > About !VMEMMAP which doesn't support sub-section hotplug, Dan said
> > it's more because the effort and maintenance burden outweighs the
> > benefit. And the current 64 bit ARCHes all enable
> > SPARSEMEM_VMEMMAP_ENABLE by default.
> 
> OK, if this is the primary argument then make sure to document it in the
> changelog (cover letter).

Will add it when repost.
Michal Hocko Feb. 26, 2020, 9:14 a.m. UTC | #8
On Wed 26-02-20 11:42:36, Baoquan He wrote:
> On 02/25/20 at 11:02am, Michal Hocko wrote:
> > On Tue 25-02-20 10:10:45, David Hildenbrand wrote:
> > > >>>  include/linux/mmzone.h |   2 +
> > > >>>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
> > > >>>  2 files changed, 127 insertions(+), 53 deletions(-)
> > > >>
> > > >> Why do we need to add so much code to remove a functionality from one
> > > >> memory model?
> > > > 
> > > > Hmm, Dan also asked this before.
> > > > 
> > > > The adding mainly happens in patch 2, 3, 4, including the two newly
> > > > added function definitions, the code comments above them, and those added
> > > > dummy functions for !VMEMMAP.
> > > 
> > > AFAIKS, it's mostly a bunch of newly added comments on top of functions.
> > > E.g., the comment for fill_subsection_map() alone spans 12 LOC in total.
> > > I do wonder if we have to be that verbose. We are barely that verbose on
> > > MM code (and usually I don't see much benefit unless it's a function
> > > with many users from many different places).
> > 
> > I would tend to agree here. Not that I am against kernel doc
> > documentation but these are internal functions and the comment doesn't
> > really give any better insight IMHO. I would be much more inclined if
> > this was the general pattern in the respective file but it just stands
> > out.
> 
> I saw there are internal functions which have code comments, e.g
> shrink_slab() in mm/vmscan.c, not only this one place, there are several
> places. I personally prefer to see code comment for function if
> possible, this can save time, e.g people can skip the bitmap operation
> when read code if not necessary. And here I mainly want to tell there
> are different returned value to note different behaviour when call them.

... yet nobody really cares about different return code. It is an error
that is propagated up the call chain and that's all.

I also like when there is a kernel doc comment that helps to understand
the intended usage, context the function can be called from, potential
side effects, locking requirements and other details people need to know
when calling functions. But have a look at 
/**
 * clear_subsection_map - Clear subsection map of one memory region
 *
 * @pfn - start pfn of the memory range
 * @nr_pages - number of pfns to add in the region
 *
 * This is only intended for hotplug, and clear the related subsection
 * map inside one section.
 *
 * Return:
 * * -EINVAL	- Section already deactived.
 * * 0		- Subsection map is emptied.
 * * 1		- Subsection map is not empty.
 */

the only useful information in here is that this is hotplug stuff, but
I would be completely lost about the intention without already knowing
what this whole subsection thing is about.
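
For reference, the three return values listed in the quoted comment can be
modelled in user space roughly as follows. This is a sketch, not the code
from the patch: the toy_* names and the mask argument are hypothetical, and
reading "section already deactivated" as "some requested bits are not set"
is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the subsection_map in struct mem_section_usage. */
struct toy_section_usage {
	uint64_t subsection_map;
};

/*
 * Clear the sub-sections selected by mask.
 * Returns -EINVAL if any of them is not currently present (assumed meaning
 * of "already deactivated"), 0 if the map ends up empty, 1 otherwise.
 */
static int toy_clear_subsection_map(struct toy_section_usage *u, uint64_t mask)
{
	assert(u != NULL);
	if ((u->subsection_map & mask) != mask)
		return -EINVAL;
	u->subsection_map &= ~mask;
	return u->subsection_map ? 1 : 0;
}
```

A caller would typically propagate the negative value as an error and use
the 0/1 distinction only to decide whether the whole section can be torn
down.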
Baoquan He Feb. 26, 2020, 12:30 p.m. UTC | #9
On 02/26/20 at 10:14am, Michal Hocko wrote:
> On Wed 26-02-20 11:42:36, Baoquan He wrote:
> > On 02/25/20 at 11:02am, Michal Hocko wrote:
> > > On Tue 25-02-20 10:10:45, David Hildenbrand wrote:
> > > > >>>  include/linux/mmzone.h |   2 +
> > > > >>>  mm/sparse.c            | 178 +++++++++++++++++++++++++++++------------
> > > > >>>  2 files changed, 127 insertions(+), 53 deletions(-)
> > > > >>
> > > > >> Why do we need to add so much code to remove a functionality from one
> > > > >> memory model?
> > > > > 
> > > > > Hmm, Dan also asked this before.
> > > > > 
> > > > > The adding mainly happens in patch 2, 3, 4, including the two newly
> > > > > added function definitions, the code comments above them, and those added
> > > > > dummy functions for !VMEMMAP.
> > > > 
> > > > AFAIKS, it's mostly a bunch of newly added comments on top of functions.
> > > > E.g., the comment for fill_subsection_map() alone spans 12 LOC in total.
> > > > I do wonder if we have to be that verbose. We are barely that verbose on
> > > > MM code (and usually I don't see much benefit unless it's a function
> > > > with many users from many different places).
> > > 
> > > I would tend to agree here. Not that I am against kernel doc
> > > documentation but these are internal functions and the comment doesn't
> > > really give any better insight IMHO. I would be much more inclined if
> > > this was the general pattern in the respective file but it just stands
> > > out.
> > 
> > I saw there are internal functions which have code comments, e.g
> > shrink_slab() in mm/vmscan.c, not only this one place, there are several
> > places. I personally prefer to see code comment for function if
> > possible, this can save time, e.g people can skip the bitmap operation
> > when read code if not necessary. And here I mainly want to tell there
> > are different returned value to note different behaviour when call them.
> 
> ... yet nobody really cares about different return code. It is an error
> that is propagated up the call chain and that's all.
> 
> I also like when there is a kernel doc comment that helps to understand
> the intended usage, context the function can be called from, potential
> side effects, locking requirements and other details people need to know

Fair enough. As I have said, I don't insist on adding kernel doc
comments for these two functions. Will remove them. Thanks for
reviewing.

> when calling functions. But have a look at 
> /**
>  * clear_subsection_map - Clear subsection map of one memory region
>  *
>  * @pfn - start pfn of the memory range
>  * @nr_pages - number of pfns to add in the region
>  *
>  * This is only intended for hotplug, and clear the related subsection
>  * map inside one section.
>  *
>  * Return:
>  * * -EINVAL	- Section already deactived.
>  * * 0		- Subsection map is emptied.
>  * * 1		- Subsection map is not empty.
>  */
> 
> the only useful information in here is that this is a hotplug stuff but
> I would be completely lost about the intention without already knowing
> what is this whole subsection about.
> 
> -- 
> Michal Hocko
> SUSE Labs
>