diff mbox series

[3/3] mm/sparse: use __highest_present_section_nr as the boundary for pfn check

Message ID 20180823130732.9489-4-richard.weiyang@gmail.com (mailing list archive)
State New, archived
Headers show
Series trivial code refine for sparsemem | expand

Commit Message

Wei Yang Aug. 23, 2018, 1:07 p.m. UTC
And it is known, __highest_present_section_nr is a more strict boundary
than NR_MEM_SECTIONS.

This patch uses a __highest_present_section_nr to check a valid pfn.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 include/linux/mmzone.h | 4 ++--
 mm/sparse.c            | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

Comments

Michal Hocko Aug. 23, 2018, 1:25 p.m. UTC | #1
On Thu 23-08-18 21:07:32, Wei Yang wrote:
> And it is known, __highest_present_section_nr is a more strict boundary
> than NR_MEM_SECTIONS.
> 
> This patch uses a __highest_present_section_nr to check a valid pfn.

But why is this an improvement? Sure when you loop over all sections
than __highest_present_section_nr makes a lot of sense. But all the
updated function perform a trivial comparision.

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  include/linux/mmzone.h | 4 ++--
>  mm/sparse.c            | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 33086f86d1a7..5138efde11ae 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1237,7 +1237,7 @@ extern int __highest_present_section_nr;
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>  static inline int pfn_valid(unsigned long pfn)
>  {
> -	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> +	if (pfn_to_section_nr(pfn) > __highest_present_section_nr)
>  		return 0;
>  	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
>  }
> @@ -1245,7 +1245,7 @@ static inline int pfn_valid(unsigned long pfn)
>  
>  static inline int pfn_present(unsigned long pfn)
>  {
> -	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> +	if (pfn_to_section_nr(pfn) > __highest_present_section_nr)
>  		return 0;
>  	return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
>  }
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 90bab7f03757..a9c55c8da11f 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -174,6 +174,7 @@ void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
>   * those loops early.
>   */
>  int __highest_present_section_nr;
> +EXPORT_SYMBOL(__highest_present_section_nr);
>  static void section_mark_present(struct mem_section *ms)
>  {
>  	int section_nr = __section_nr(ms);
> -- 
> 2.15.1
>
Oscar Salvador Aug. 23, 2018, 2 p.m. UTC | #2
On Thu, Aug 23, 2018 at 03:25:26PM +0200, Michal Hocko wrote:
> On Thu 23-08-18 21:07:32, Wei Yang wrote:
> > And it is known, __highest_present_section_nr is a more strict boundary
> > than NR_MEM_SECTIONS.
> > 
> > This patch uses a __highest_present_section_nr to check a valid pfn.
> 
> But why is this an improvement? Sure when you loop over all sections
> than __highest_present_section_nr makes a lot of sense. But all the
> updated function perform a trivial comparision.

I think it makes some sense.
NR_MEM_SECTIONS can be a big number, but we might not be using
all sections, so __highest_present_section_nr ends up being a much lower
value.

I think that we want to compare the pfn's section_nr with our current limit
of present sections.
Sections over that do not really exist for us, so it is no use to look for
them in __nr_to_section/valid_section.

It might not be a big improvement, but I think that given the nature of
pfn_valid/pfn_present, comparing to __highest_present_section_nr suits better.
Michal Hocko Aug. 23, 2018, 7:17 p.m. UTC | #3
On Thu 23-08-18 16:00:53, Oscar Salvador wrote:
> On Thu, Aug 23, 2018 at 03:25:26PM +0200, Michal Hocko wrote:
> > On Thu 23-08-18 21:07:32, Wei Yang wrote:
> > > And it is known, __highest_present_section_nr is a more strict boundary
> > > than NR_MEM_SECTIONS.
> > > 
> > > This patch uses a __highest_present_section_nr to check a valid pfn.
> > 
> > But why is this an improvement? Sure when you loop over all sections
> > than __highest_present_section_nr makes a lot of sense. But all the
> > updated function perform a trivial comparision.
> 
> I think it makes some sense.
> NR_MEM_SECTIONS can be a big number, but we might not be using
> all sections, so __highest_present_section_nr ends up being a much lower
> value.

And how exactly does it help to check for the smaller vs. a larger number?
Both are O(1) operations AFAICS. __highest_present_section_nr makes
perfect sense when we iterate over all sections or similar operations
where it smaller number of iterations really makes sense.

I am not saying the patch is wrong but I just do not see this being an
improvement. You have to export an internal symbol to achieve something
that is hardly an optimization.
Oscar Salvador Aug. 23, 2018, 8:52 p.m. UTC | #4
On Thu, Aug 23, 2018 at 09:17:29PM +0200, Michal Hocko wrote:
> And how exactly does it help to check for the smaller vs. a larger number?
> Both are O(1) operations AFAICS. __highest_present_section_nr makes
> perfect sense when we iterate over all sections or similar operations
> where it smaller number of iterations really makes sense.

Sure, improvement/optimization was not really my point, a comparasion is
a comparasion.
The gain, if any, would be because we would catch
non present sections sooner before calling to present_section().
In the case that __highest_present_section_nr differs from
NR_MEM_SECTIONS, of course.

I thought it would make more sense given the nature of the function itself.

The only thing I did not like much was that we need to export the symbol, though.
So, as you said, the price might be too hight for what we get.
Dave Hansen Aug. 24, 2018, 12:15 a.m. UTC | #5
On 08/23/2018 06:07 AM, Wei Yang wrote:
> And it is known, __highest_present_section_nr is a more strict boundary
> than NR_MEM_SECTIONS.

What is the benefit of this patch?

You're adding a more "strict" boundary, but you're also adding a
potential cacheline miss and removing optimizations that the compiler
can make with a constant vs. a variable.  Providing absolute bounds
limits that the compiler knows about can actually be pretty handy for it
to optimize things

Do you have *any* analysis to show that this has a benefit?  What does
it do to text size, for instance?
Wei Yang Aug. 24, 2018, 6:11 p.m. UTC | #6
On Thu, Aug 23, 2018 at 05:15:39PM -0700, Dave Hansen wrote:
>On 08/23/2018 06:07 AM, Wei Yang wrote:
>> And it is known, __highest_present_section_nr is a more strict boundary
>> than NR_MEM_SECTIONS.
>
>What is the benefit of this patch?
>

The original idea is simple: with a more strict boundary, it will has less
computation.

>You're adding a more "strict" boundary, but you're also adding a
>potential cacheline miss and removing optimizations that the compiler
>can make with a constant vs. a variable.  Providing absolute bounds
>limits that the compiler knows about can actually be pretty handy for it
>to optimize things

You are right.

I haven't thought about the compiler optimization case.

>
>Do you have *any* analysis to show that this has a benefit?  What does
>it do to text size, for instance?

I don't have more analysis about this. Originally, I thought a strict boundary
will have a benefit. But as you mentioned, it loose the compiler optimization.

BTW, I did some tests and found during a normal boot, all the section number
are within NR_MEM_SECTIONS. I am wondering in which case, this check will be
valid and return 0?
diff mbox series

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 33086f86d1a7..5138efde11ae 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1237,7 +1237,7 @@  extern int __highest_present_section_nr;
 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
 static inline int pfn_valid(unsigned long pfn)
 {
-	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+	if (pfn_to_section_nr(pfn) > __highest_present_section_nr)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
@@ -1245,7 +1245,7 @@  static inline int pfn_valid(unsigned long pfn)
 
 static inline int pfn_present(unsigned long pfn)
 {
-	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+	if (pfn_to_section_nr(pfn) > __highest_present_section_nr)
 		return 0;
 	return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 90bab7f03757..a9c55c8da11f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -174,6 +174,7 @@  void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
  * those loops early.
  */
 int __highest_present_section_nr;
+EXPORT_SYMBOL(__highest_present_section_nr);
 static void section_mark_present(struct mem_section *ms)
 {
 	int section_nr = __section_nr(ms);