Message ID | 20180823130732.9489-4-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | trivial code refine for sparsemem | expand |
On Thu 23-08-18 21:07:32, Wei Yang wrote: > And it is known, __highest_present_section_nr is a more strict boundary > than NR_MEM_SECTIONS. > > This patch uses a __highest_present_section_nr to check a valid pfn. But why is this an improvement? Sure when you loop over all sections than __highest_present_section_nr makes a lot of sense. But all the updated function perform a trivial comparision. > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > --- > include/linux/mmzone.h | 4 ++-- > mm/sparse.c | 1 + > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 33086f86d1a7..5138efde11ae 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -1237,7 +1237,7 @@ extern int __highest_present_section_nr; > #ifndef CONFIG_HAVE_ARCH_PFN_VALID > static inline int pfn_valid(unsigned long pfn) > { > - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > + if (pfn_to_section_nr(pfn) > __highest_present_section_nr) > return 0; > return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); > } > @@ -1245,7 +1245,7 @@ static inline int pfn_valid(unsigned long pfn) > > static inline int pfn_present(unsigned long pfn) > { > - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > + if (pfn_to_section_nr(pfn) > __highest_present_section_nr) > return 0; > return present_section(__nr_to_section(pfn_to_section_nr(pfn))); > } > diff --git a/mm/sparse.c b/mm/sparse.c > index 90bab7f03757..a9c55c8da11f 100644 > --- a/mm/sparse.c > +++ b/mm/sparse.c > @@ -174,6 +174,7 @@ void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn, > * those loops early. > */ > int __highest_present_section_nr; > +EXPORT_SYMBOL(__highest_present_section_nr); > static void section_mark_present(struct mem_section *ms) > { > int section_nr = __section_nr(ms); > -- > 2.15.1 >
On Thu, Aug 23, 2018 at 03:25:26PM +0200, Michal Hocko wrote: > On Thu 23-08-18 21:07:32, Wei Yang wrote: > > And it is known, __highest_present_section_nr is a more strict boundary > > than NR_MEM_SECTIONS. > > > > This patch uses a __highest_present_section_nr to check a valid pfn. > > But why is this an improvement? Sure when you loop over all sections > than __highest_present_section_nr makes a lot of sense. But all the > updated function perform a trivial comparision. I think it makes some sense. NR_MEM_SECTIONS can be a big number, but we might not be using all sections, so __highest_present_section_nr ends up being a much lower value. I think that we want to compare the pfn's section_nr with our current limit of present sections. Sections over that do not really exist for us, so it is no use to look for them in __nr_to_section/valid_section. It might not be a big improvement, but I think that given the nature of pfn_valid/pfn_present, comparing to __highest_present_section_nr suits better.
On Thu 23-08-18 16:00:53, Oscar Salvador wrote: > On Thu, Aug 23, 2018 at 03:25:26PM +0200, Michal Hocko wrote: > > On Thu 23-08-18 21:07:32, Wei Yang wrote: > > > And it is known, __highest_present_section_nr is a more strict boundary > > > than NR_MEM_SECTIONS. > > > > > > This patch uses a __highest_present_section_nr to check a valid pfn. > > > > But why is this an improvement? Sure when you loop over all sections > > than __highest_present_section_nr makes a lot of sense. But all the > > updated function perform a trivial comparision. > > I think it makes some sense. > NR_MEM_SECTIONS can be a big number, but we might not be using > all sections, so __highest_present_section_nr ends up being a much lower > value. And how exactly does it help to check for the smaller vs. a larger number? Both are O(1) operations AFAICS. __highest_present_section_nr makes perfect sense when we iterate over all sections or similar operations where it smaller number of iterations really makes sense. I am not saying the patch is wrong but I just do not see this being an improvement. You have to export an internal symbol to achieve something that is hardly an optimization.
On Thu, Aug 23, 2018 at 09:17:29PM +0200, Michal Hocko wrote: > And how exactly does it help to check for the smaller vs. a larger number? > Both are O(1) operations AFAICS. __highest_present_section_nr makes > perfect sense when we iterate over all sections or similar operations > where it smaller number of iterations really makes sense. Sure, improvement/optimization was not really my point, a comparasion is a comparasion. The gain, if any, would be because we would catch non present sections sooner before calling to present_section(). In the case that __highest_present_section_nr differs from NR_MEM_SECTIONS, of course. I thought it would make more sense given the nature of the function itself. The only thing I did not like much was that we need to export the symbol, though. So, as you said, the price might be too hight for what we get.
On 08/23/2018 06:07 AM, Wei Yang wrote: > And it is known, __highest_present_section_nr is a more strict boundary > than NR_MEM_SECTIONS. What is the benefit of this patch? You're adding a more "strict" boundary, but you're also adding a potential cacheline miss and removing optimizations that the compiler can make with a constant vs. a variable. Providing absolute bounds limits that the compiler knows about can actually be pretty handy for it to optimize things Do you have *any* analysis to show that this has a benefit? What does it do to text size, for instance?
On Thu, Aug 23, 2018 at 05:15:39PM -0700, Dave Hansen wrote: >On 08/23/2018 06:07 AM, Wei Yang wrote: >> And it is known, __highest_present_section_nr is a more strict boundary >> than NR_MEM_SECTIONS. > >What is the benefit of this patch? > The original idea is simple: with a more strict boundary, it will has less computation. >You're adding a more "strict" boundary, but you're also adding a >potential cacheline miss and removing optimizations that the compiler >can make with a constant vs. a variable. Providing absolute bounds >limits that the compiler knows about can actually be pretty handy for it >to optimize things You are right. I haven't thought about the compiler optimization case. > >Do you have *any* analysis to show that this has a benefit? What does >it do to text size, for instance? I don't have more analysis about this. Originally, I thought a strict boundary will have a benefit. But as you mentioned, it loose the compiler optimization. BTW, I did some tests and found during a normal boot, all the section number are within NR_MEM_SECTIONS. I am wondering in which case, this check will be valid and return 0?
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 33086f86d1a7..5138efde11ae 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1237,7 +1237,7 @@ extern int __highest_present_section_nr; #ifndef CONFIG_HAVE_ARCH_PFN_VALID static inline int pfn_valid(unsigned long pfn) { - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) + if (pfn_to_section_nr(pfn) > __highest_present_section_nr) return 0; return valid_section(__nr_to_section(pfn_to_section_nr(pfn))); } @@ -1245,7 +1245,7 @@ static inline int pfn_valid(unsigned long pfn) static inline int pfn_present(unsigned long pfn) { - if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) + if (pfn_to_section_nr(pfn) > __highest_present_section_nr) return 0; return present_section(__nr_to_section(pfn_to_section_nr(pfn))); } diff --git a/mm/sparse.c b/mm/sparse.c index 90bab7f03757..a9c55c8da11f 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -174,6 +174,7 @@ void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn, * those loops early. */ int __highest_present_section_nr; +EXPORT_SYMBOL(__highest_present_section_nr); static void section_mark_present(struct mem_section *ms) { int section_nr = __section_nr(ms);
And it is known, __highest_present_section_nr is a more strict boundary than NR_MEM_SECTIONS. This patch uses a __highest_present_section_nr to check a valid pfn. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> --- include/linux/mmzone.h | 4 ++-- mm/sparse.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-)