diff mbox series

[1/3] mm/sparse: add likely to mem_section[root] check in sparse_index_init()

Message ID 20180823130732.9489-2-richard.weiyang@gmail.com (mailing list archive)
State New, archived
Headers show
Series trivial code refine for sparsemem | expand

Commit Message

Wei Yang Aug. 23, 2018, 1:07 p.m. UTC
Each time mem_section[root] is NULL, SECTIONS_PER_ROOT mem_section entries
are allocated at once. This means the mem_section[root] check is false only
(1 / SECTIONS_PER_ROOT) of the time.

This patch adds likely() to the check to optimize this a little.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko Aug. 23, 2018, 1:13 p.m. UTC | #1
On Thu 23-08-18 21:07:30, Wei Yang wrote:
> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
> of the mem_section[root] check is false.
> 
> This patch adds likely to the if check to optimize this a little.

Could you evaluate how much this helps, if at all? Does this have any
impact on the initialization path at all?

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..90bab7f03757 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>  	struct mem_section *section;
>  
> -	if (mem_section[root])
> +	if (likely(mem_section[root]))
>  		return -EEXIST;
>  
>  	section = sparse_index_alloc(nid);
> -- 
> 2.15.1
>
Wei Yang Aug. 23, 2018, 10:57 p.m. UTC | #2
On Thu, Aug 23, 2018 at 03:13:39PM +0200, Michal Hocko wrote:
>On Thu 23-08-18 21:07:30, Wei Yang wrote:
>> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
>> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
>> of the mem_section[root] check is false.
>> 
>> This patch adds likely to the if check to optimize this a little.
>
>Could you evaluate how much does this help if any? Does this have any
>impact on the initialization path at all?

Let me test on my 4G machine with this patch :-)

Dave Hansen Aug. 24, 2018, 12:11 a.m. UTC | #3
On 08/23/2018 06:07 AM, Wei Yang wrote:
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>  	struct mem_section *section;
>  
> -	if (mem_section[root])
> +	if (likely(mem_section[root]))
>  		return -EEXIST;

We could add likely()/unlikely() to approximately a billion if()s around
the kernel if we felt like it.  We don't because it's messy and it
actually takes away choices from the compiler.

Please don't send patches like this unless you have some *actual*
analysis that shows the benefit of the patch.  Performance numbers are best.
Michal Hocko Aug. 24, 2018, 7:31 a.m. UTC | #4
On Thu 23-08-18 22:57:42, Wei Yang wrote:
> On Thu, Aug 23, 2018 at 03:13:39PM +0200, Michal Hocko wrote:
> >On Thu 23-08-18 21:07:30, Wei Yang wrote:
> >> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
> >> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
> >> of the mem_section[root] check is false.
> >> 
> >> This patch adds likely to the if check to optimize this a little.
> >
> >Could you evaluate how much does this help if any? Does this have any
> >impact on the initialization path at all?
> 
> Let me test on my 4G machine with this patch :-)

Well, this should have been done before posting the patch. In general,
though, you should have some convincing numbers in order to add new
likely/unlikely annotations. They are rarely helpful and they tend to
rot over time. Steven Rostedt ran some tests a few years back and found
that the vast majority of those annotations were simply wrong.
Wei Yang Aug. 24, 2018, 3:07 p.m. UTC | #5
On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>On 08/23/2018 06:07 AM, Wei Yang wrote:
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>  	struct mem_section *section;
>>  
>> -	if (mem_section[root])
>> +	if (likely(mem_section[root]))
>>  		return -EEXIST;
>
>We could add likely()/unlikely() to approximately a billion if()s around
>the kernel if we felt like it.  We don't because it's messy and it
>actually takes away choices from the compiler.
>
>Please don't send patches like this unless you have some *actual*
>analysis that shows the benefit of the patch.  Performance numbers are best.

Thanks all for your comments, Michal, Dave and Oscar.

Well, maybe I took it for granted, so let me put more words on this. To be
honest, my analysis may be only partially valid, so if the cost is higher
than the gain, please let me know.

Below is my analysis and test result for this patch.
------------------------------------------------------

During bootup, the call flow looks like this.

    sparse_memory_present_with_active_regions()
        memory_present()
            sparse_index_init()

sparse_memory_present_with_active_regions() iterates over pfn continuously
for the whole system RAM, which means sparse_index_init() iterates over
section_nr continuously. Usually, we don't expect many large holes, right?

Each time mem_section[root] is null, SECTIONS_PER_ROOT mem_section entries
are allocated. This means that, out of SECTIONS_PER_ROOT checks, only the
first check is false. So the probability of the check being false is
(1 / SECTIONS_PER_ROOT).

SECTIONS_PER_ROOT is defined as (PAGE_SIZE / sizeof (struct mem_section)).

On my x86_64 machine, PAGE_SIZE is 4KB and mem_section is 16B.

    SECTIONS_PER_ROOT = 4K / 16 = 256.
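The arithmetic above can be sketched as a couple of compile-time constants. This is a sketch using the x86_64 values quoted in the mail (4 KiB pages, 16-byte struct mem_section); the SKETCH_* names are illustrative stand-ins, not the kernel's own macros:

```c
/* Sketch of the arithmetic above, with the x86_64 values quoted in
 * the mail: 4 KiB pages and a 16-byte struct mem_section. */
#define SKETCH_PAGE_SIZE         4096UL
#define SKETCH_MEM_SECTION_SIZE  16UL
#define SKETCH_SECTIONS_PER_ROOT (SKETCH_PAGE_SIZE / SKETCH_MEM_SECTION_SIZE)

/* One check per section; only the first check in each root is false,
 * so the check is true (255 / 256) of the time here. */
static inline unsigned long sketch_sections_per_root(void)
{
	return SKETCH_SECTIONS_PER_ROOT;
}
```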

So the check on mem_section[root] has a (1 / 256) chance of being false and
a (255 / 256) chance of being true. In theory, this looks like a "likely"
to me.

In practice, when the system RAM is a multiple of
((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT), the "likely" chance is
(255 / 256); otherwise the chance is lower.

On my x86_64 machine, SECTION_SIZE_BITS is defined to 27.

    ((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT) = 32GB

          System RAM size       32G         16G        8G         4G
      Possibility          (255 / 256) (127 / 128) (63 / 64)  (31 / 32)
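The table above can be reproduced with two small helpers. This is a sketch assuming SECTION_SIZE_BITS == 27 (128 MiB sections) and 256 sections per root, the values quoted for the author's x86_64 machine; the function names are hypothetical:

```c
/* Number of 128 MiB sections covering ram_gb GiB of RAM
 * (SECTION_SIZE_BITS == 27 means 8 sections per GiB). */
unsigned long sketch_nr_sections(unsigned long ram_gb)
{
	return (ram_gb << 30) >> 27;
}

/* Number of roots allocated for that many sections, at 256 sections
 * per root, rounding up. One check is false per allocated root, so
 * the "likely" probability is (sections - roots) / sections, e.g.
 * 31/32 for 4 GiB and 255/256 for 32 GiB, matching the table. */
unsigned long sketch_nr_roots(unsigned long ram_gb)
{
	unsigned long sections = sketch_nr_sections(ram_gb);

	return (sections + 255) / 256;
}
```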

Generally, in my mind, if we iterate over pfn continuously and there are no
large holes, the check on mem_section[root] is likely to be true.

At last, here is the test result on my 4G virtual machine. I added printk
before and after sparse_memory_present_with_active_regions() and tested three
times with/without "likely".

                without      with
     Elapsed   0.000252     0.000250   -0.8%

The benefit seems too small on a 4G virtual machine, and the measurement may
not even be stable. I am not sure whether we would see a visible effect on a
32G machine.


Well, the above is all my analysis and test result. I did the optimization
based on my own experience and understanding. If this is not convincing, I
would be glad to hear your reasoning, so that I can learn more from your
experience.

Thanks all for your comments again :-)
Wei Yang Sept. 3, 2018, 10:27 p.m. UTC | #6
On Fri, Aug 24, 2018 at 11:07:17PM +0800, Wei Yang wrote:
>On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>>On 08/23/2018 06:07 AM, Wei Yang wrote:
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>>  	struct mem_section *section;
>>>  
>>> -	if (mem_section[root])
>>> +	if (likely(mem_section[root]))
>>>  		return -EEXIST;
>>
>>We could add likely()/unlikely() to approximately a billion if()s around
>>the kernel if we felt like it.  We don't because it's messy and it
>>actually takes away choices from the compiler.
>>
>>Please don't send patches like this unless you have some *actual*
>>analysis that shows the benefit of the patch.  Performance numbers are best.
>

Hi, 

Is my analysis reasonable? Or which part is not valid?

Wei Yang Sept. 9, 2018, 1:38 a.m. UTC | #7
On Mon, Sep 03, 2018 at 10:27:32PM +0000, Wei Yang wrote:
>On Fri, Aug 24, 2018 at 11:07:17PM +0800, Wei Yang wrote:
>>On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>>>On 08/23/2018 06:07 AM, Wei Yang wrote:
>>>> --- a/mm/sparse.c
>>>> +++ b/mm/sparse.c
>>>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>>>  	struct mem_section *section;
>>>>  
>>>> -	if (mem_section[root])
>>>> +	if (likely(mem_section[root]))
>>>>  		return -EEXIST;
>>>
>>>We could add likely()/unlikely() to approximately a billion if()s around
>>>the kernel if we felt like it.  We don't because it's messy and it
>>>actually takes away choices from the compiler.
>>>
>>>Please don't send patches like this unless you have some *actual*
>>>analysis that shows the benefit of the patch.  Performance numbers are best.
>>
>
>Hi, 
>
>Is my analysis reasonable? Or which part is not valid?
>

Would someone share some thoughts on my analysis?

Dave Hansen Sept. 10, 2018, 8:30 p.m. UTC | #8
On 09/08/2018 06:38 PM, owner-linux-mm@kvack.org wrote:
> 
> At last, here is the test result on my 4G virtual machine. I added printk
> before and after sparse_memory_present_with_active_regions() and tested three
> times with/without "likely".
> 
>                without      with
>     Elapsed   0.000252     0.000250   -0.8%
> 
> The benefit seems to be too small on a 4G virtual machine or even this is not
> stable. Not sure we can see some visible effect on a 32G machine.

I think it's highly unlikely you have found something significant here.
It's one system, in a VM and it's not being measured using a mechanism
that is suitable for benchmarking (the kernel dmesg timestamps).

Plus, if this is a really tight loop, the cpu's branch predictors will
be good at it.
Wei Yang Sept. 11, 2018, 3 p.m. UTC | #9
On Mon, Sep 10, 2018 at 01:30:11PM -0700, Dave Hansen wrote:
>On 09/08/2018 06:38 PM, owner-linux-mm@kvack.org wrote:
>> 
>> At last, here is the test result on my 4G virtual machine. I added printk
>> before and after sparse_memory_present_with_active_regions() and tested three
>> times with/without "likely".
>> 
>>                without      with
>>     Elapsed   0.000252     0.000250   -0.8%
>> 
>> The benefit seems to be too small on a 4G virtual machine or even this is not
>> stable. Not sure we can see some visible effect on a 32G machine.
>
>I think it's highly unlikely you have found something significant here.
>It's one system, in a VM and it's not being measured using a mechanism
>that is suitable for benchmarking (the kernel dmesg timestamps).
>
>Plus, if this is a really tight loop, the cpu's branch predictors will
>be good at it.

Hi, Dave

Thanks for your reply.

I think you are right. This path is not significant, and the cpu may well do
its job here.

Hmm... I would still like to hear your opinion on my analysis of this
situation: in which cases should we use likely/unlikely?

For example, in this case the probability is (255 / 256) if the system
has 32G RAM. Do we have a probability threshold for using
likely/unlikely? Or would we prefer not to use them at all and let the
compiler and cpu do their job?

Looking forward to your insights.

Patch

diff --git a/mm/sparse.c b/mm/sparse.c
index 10b07eea9a6e..90bab7f03757 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -78,7 +78,7 @@  static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
 
-	if (mem_section[root])
+	if (likely(mem_section[root]))
 		return -EEXIST;
 
 	section = sparse_index_alloc(nid);
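For context, the likely() annotation in the patch boils down to GCC's __builtin_expect(), which tells the compiler which branch to lay out as the fall-through path. Below is a minimal sketch of how the kernel (approximately) defines these hints in include/linux/compiler.h, together with a hypothetical stand-in for the patched check; sketch_index_init() and SKETCH_EEXIST are illustrative names, not the kernel's:

```c
/* Approximately how the kernel defines these hints
 * (include/linux/compiler.h). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define SKETCH_EEXIST 17 /* stand-in for <errno.h>'s EEXIST */

/* Hypothetical stand-in for the patched check in sparse_index_init():
 * returns -EEXIST when the root entry is already populated, and 0
 * when allocation would proceed. */
int sketch_index_init(void *root_entry)
{
	if (likely(root_entry))
		return -SKETCH_EEXIST;
	return 0; /* would call sparse_index_alloc() here */
}
```

The !!(x) double negation normalizes any non-zero pointer or value to 1, so the expectation argument compares cleanly.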