Message ID: 20180823130732.9489-2-richard.weiyang@gmail.com
State:      New, archived
Series:     trivial code refine for sparsemem
On Thu 23-08-18 21:07:30, Wei Yang wrote:
> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
> of the mem_section[root] check is false.
>
> This patch adds likely to the if check to optimize this a little.

Could you evaluate how much this helps, if at all? Does it have any impact
on the initialization path?

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  mm/sparse.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 10b07eea9a6e..90bab7f03757 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>  	struct mem_section *section;
>
> -	if (mem_section[root])
> +	if (likely(mem_section[root]))
>  		return -EEXIST;
>
>  	section = sparse_index_alloc(nid);
> --
> 2.15.1
On Thu, Aug 23, 2018 at 03:13:39PM +0200, Michal Hocko wrote:
>On Thu 23-08-18 21:07:30, Wei Yang wrote:
>> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
>> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
>> of the mem_section[root] check is false.
>>
>> This patch adds likely to the if check to optimize this a little.
>
>Could you evaluate how much this helps, if at all? Does it have any impact
>on the initialization path?

Let me test on my 4G machine with this patch :-)

>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> ---
>>  mm/sparse.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 10b07eea9a6e..90bab7f03757 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>  	struct mem_section *section;
>>
>> -	if (mem_section[root])
>> +	if (likely(mem_section[root]))
>>  		return -EEXIST;
>>
>>  	section = sparse_index_alloc(nid);
>> --
>> 2.15.1
>
>--
>Michal Hocko
>SUSE Labs
On 08/23/2018 06:07 AM, Wei Yang wrote:
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>  	struct mem_section *section;
>
> -	if (mem_section[root])
> +	if (likely(mem_section[root]))
>  		return -EEXIST;

We could add likely()/unlikely() to approximately a billion if()s around
the kernel if we felt like it. We don't, because it's messy and it
actually takes choices away from the compiler.

Please don't send patches like this unless you have some *actual* analysis
that shows the benefit of the patch. Performance numbers are best.
On Thu 23-08-18 22:57:42, Wei Yang wrote:
> On Thu, Aug 23, 2018 at 03:13:39PM +0200, Michal Hocko wrote:
> >On Thu 23-08-18 21:07:30, Wei Yang wrote:
> >> Each time SECTIONS_PER_ROOT number of mem_section is allocated when
> >> mem_section[root] is null. This means only (1 / SECTIONS_PER_ROOT) chance
> >> of the mem_section[root] check is false.
> >>
> >> This patch adds likely to the if check to optimize this a little.
> >
> >Could you evaluate how much this helps, if at all? Does it have any
> >impact on the initialization path?
>
> Let me test on my 4G machine with this patch :-)

Well, this should have been done before posting the patch. In general,
though, you should have some convincing numbers in order to add new
likely/unlikely annotations. They are rarely helpful and they tend to rot
over time. Steven Rostedt ran some tests a few years back and found that a
vast majority of those annotations were simply wrong.
On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>On 08/23/2018 06:07 AM, Wei Yang wrote:
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
>>  	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
>>  	struct mem_section *section;
>>
>> -	if (mem_section[root])
>> +	if (likely(mem_section[root]))
>>  		return -EEXIST;
>
>We could add likely()/unlikely() to approximately a billion if()s around
>the kernel if we felt like it. We don't because it's messy and it
>actually takes away choices from the compiler.
>
>Please don't send patches like this unless you have some *actual*
>analysis that shows the benefit of the patch. Performance numbers are best.

Thanks all for your comments, Michal, Dave and Oscar.

Well, maybe I took it for granted, so let me put more words on this. To be
honest, my analysis may be only partially valid, so if the cost is higher
than the gain, please let me know.

Below is my analysis and test result for this patch.
------------------------------------------------------

During bootup, the call flow looks like this:

    sparse_memory_present_with_active_regions()
        memory_present()
            sparse_index_init()

sparse_memory_present_with_active_regions() iterates over pfns continuously
for the whole of system RAM, which means sparse_index_init() sees
continuously increasing section_nr values. Usually we don't expect many
large holes, right?

Each time mem_section[root] is null, SECTIONS_PER_ROOT mem_sections are
allocated. This means that of SECTIONS_PER_ROOT consecutive checks, only
the first one is false, so the probability of the check being false is
(1 / SECTIONS_PER_ROOT).

SECTIONS_PER_ROOT is defined as (PAGE_SIZE / sizeof (struct mem_section)).

On my x86_64 machine, PAGE_SIZE is 4KB and struct mem_section is 16B, so:

    SECTIONS_PER_ROOT = 4K / 16 = 256

So the check on mem_section[root] has a (1 / 256) chance of being false and
a (255 / 256) chance of being true. In theory, this looks like a "likely"
to me.

In practice, when system RAM is a multiple of
((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT), the "likely" chance is
(255 / 256); otherwise the chance is lower.

On my x86_64 machine, SECTION_SIZE_BITS is defined as 27, so:

    ((1 << SECTION_SIZE_BITS) * SECTIONS_PER_ROOT) = 32GB

    System RAM size    32G          16G          8G         4G
    Probability        (255 / 256)  (127 / 128)  (63 / 64)  (31 / 32)

Generally, in my mind, if we iterate over pfns continuously and there are
no large holes, the check on mem_section[root] is likely to be true.

Finally, here is the test result on my 4G virtual machine. I added printk
before and after sparse_memory_present_with_active_regions() and tested
three times with and without "likely":

               without     with
    Elapsed    0.000252    0.000250    -0.8%

The benefit seems too small on a 4G virtual machine, and may not even be
stable. I am not sure whether we would see a visible effect on a 32G
machine.

That is all of my analysis and test results. I did the optimization based
on my own experience and understanding. If this does not qualify, I am
glad to hear your view, so that I can learn more from your experience.

Thanks all for your comments again :-)
On Fri, Aug 24, 2018 at 11:07:17PM +0800, Wei Yang wrote:
>On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>>We could add likely()/unlikely() to approximately a billion if()s around
>>the kernel if we felt like it. We don't because it's messy and it
>>actually takes away choices from the compiler.
>>
>>Please don't send patches like this unless you have some *actual*
>>analysis that shows the benefit of the patch. Performance numbers are best.

Hi,

Is my analysis reasonable? Or is some part of it not valid?

>[snip: full analysis and test result quoted from the previous message]
>
>--
>Wei Yang
>Help you, Help me
On Mon, Sep 03, 2018 at 10:27:32PM +0000, Wei Yang wrote:
>On Fri, Aug 24, 2018 at 11:07:17PM +0800, Wei Yang wrote:
>>On Thu, Aug 23, 2018 at 05:11:48PM -0700, Dave Hansen wrote:
>>>Please don't send patches like this unless you have some *actual*
>>>analysis that shows the benefit of the patch. Performance numbers are best.
>
>Hi,
>
>Is my analysis reasonable? Or which part is not valid?

Would someone share some thoughts on my analysis?

>[snip: full analysis and test result quoted from the earlier message]
>
>--
>Wei Yang
>Help you, Help me
On 09/08/2018 06:38 PM, owner-linux-mm@kvack.org wrote:
> At last, here is the test result on my 4G virtual machine. I added printk
> before and after sparse_memory_present_with_active_regions() and tested
> three times with/without "likely".
>
>                without     with
>     Elapsed    0.000252    0.000250    -0.8%
>
> The benefit seems to be too small on a 4G virtual machine or even this is
> not stable. Not sure we can see some visible effect on a 32G machine.

I think it's highly unlikely you have found something significant here.
It's one system, in a VM, and it's not being measured using a mechanism
that is suitable for benchmarking (the kernel dmesg timestamps).

Plus, if this is a really tight loop, the CPU's branch predictors will be
good at it.
On Mon, Sep 10, 2018 at 01:30:11PM -0700, Dave Hansen wrote:
>On 09/08/2018 06:38 PM, owner-linux-mm@kvack.org wrote:
>>
>> At last, here is the test result on my 4G virtual machine. I added printk
>> before and after sparse_memory_present_with_active_regions() and tested
>> three times with/without "likely".
>>
>>                without     with
>>     Elapsed    0.000252    0.000250    -0.8%
>>
>> The benefit seems to be too small on a 4G virtual machine or even this is
>> not stable. Not sure we can see some visible effect on a 32G machine.
>
>I think it's highly unlikely you have found something significant here.
>It's one system, in a VM, and it's not being measured using a mechanism
>that is suitable for benchmarking (the kernel dmesg timestamps).
>
>Plus, if this is a really tight loop, the CPU's branch predictors will be
>good at it.

Hi, Dave

Thanks for your reply. I think you are right: this part is not
significant, and the CPU may well do its job without the hint.

Hmm... I would still like to hear your opinion on my analysis of this
situation: in which cases should we use likely/unlikely? For example, in
this case the probability is (255 / 256) if the system has 32G of RAM. Is
there a probability threshold for using likely/unlikely, or should we
prefer not to use them at all and let the compiler and CPU do their jobs?

I look forward to your insights.
diff --git a/mm/sparse.c b/mm/sparse.c
index 10b07eea9a6e..90bab7f03757 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -78,7 +78,7 @@ static int __meminit sparse_index_init(unsigned long section_nr, int nid)
 	unsigned long root = SECTION_NR_TO_ROOT(section_nr);
 	struct mem_section *section;
 
-	if (mem_section[root])
+	if (likely(mem_section[root]))
 		return -EEXIST;
 
 	section = sparse_index_alloc(nid);
SECTIONS_PER_ROOT mem_section entries are allocated each time
mem_section[root] is found to be null. This means the mem_section[root]
check is false with only a (1 / SECTIONS_PER_ROOT) chance.

This patch adds likely() to the if check to optimize this a little.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/sparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)