diff mbox series

[v18,9/9] mm: hugetlb: optimize the code with the help of the compiler

Message ID 20210308102807.59745-10-songmuchun@bytedance.com (mailing list archive)
State New, archived
Headers show
Series Free some vmemmap pages of HugeTLB page | expand

Commit Message

Muchun Song March 8, 2021, 10:28 a.m. UTC
When the "struct page size" crosses page boundaries we cannot
make use of this feature. Let free_vmemmap_pages_per_hpage()
return zero if that is the case, most of the functions can be
optimized away.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Chen Huang <chenhuang5@huawei.com>
Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
---
 include/linux/hugetlb.h | 3 ++-
 mm/hugetlb_vmemmap.c    | 7 +++++++
 mm/hugetlb_vmemmap.h    | 6 ++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

Comments

Michal Hocko March 10, 2021, 3:41 p.m. UTC | #1
On Mon 08-03-21 18:28:07, Muchun Song wrote:
> When the "struct page size" crosses page boundaries we cannot
> make use of this feature. Let free_vmemmap_pages_per_hpage()
> return zero if that is the case, most of the functions can be
> optimized away.

I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?
Why do we need any runtime checks?

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Tested-by: Chen Huang <chenhuang5@huawei.com>
> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
> ---
>  include/linux/hugetlb.h | 3 ++-
>  mm/hugetlb_vmemmap.c    | 7 +++++++
>  mm/hugetlb_vmemmap.h    | 6 ++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index c70421e26189..333dd0479fc2 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -880,7 +880,8 @@ extern bool hugetlb_free_vmemmap_enabled;
>  
>  static inline bool is_hugetlb_free_vmemmap_enabled(void)
>  {
> -	return hugetlb_free_vmemmap_enabled;
> +	return hugetlb_free_vmemmap_enabled &&
> +	       is_power_of_2(sizeof(struct page));
>  }
>  #else
>  static inline bool is_hugetlb_free_vmemmap_enabled(void)
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 33e42678abe3..1ba1ef45c48c 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -265,6 +265,13 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
>  	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
>  		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
>  
> +	/*
> +	 * The compiler can help us to optimize this function to null
> +	 * when the size of the struct page is not power of 2.
> +	 */
> +	if (!is_power_of_2(sizeof(struct page)))
> +		return;
> +
>  	if (!hugetlb_free_vmemmap_enabled)
>  		return;
>  
> diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> index cb2bef8f9e73..29aaaf7b741e 100644
> --- a/mm/hugetlb_vmemmap.h
> +++ b/mm/hugetlb_vmemmap.h
> @@ -21,6 +21,12 @@ void hugetlb_vmemmap_init(struct hstate *h);
>   */
>  static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
>  {
> +	/*
> +	 * This check aims to let the compiler help us optimize the code as
> +	 * much as possible.
> +	 */
> +	if (!is_power_of_2(sizeof(struct page)))
> +		return 0;
>  	return h->nr_free_vmemmap_pages;
>  }
>  #else
> -- 
> 2.11.0
>
Muchun Song March 11, 2021, 7:33 a.m. UTC | #2
On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > When the "struct page size" crosses page boundaries we cannot
> > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > return zero if that is the case, most of the functions can be
> > optimized away.
>
> I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?

Right.

> Why do we need any runtime checks?

If the size of the struct page is not power of 2, compiler can think
is_hugetlb_free_vmemmap_enabled() always return false. So
the code snippet of this user can be optimized away.

E.g.

if (is_hugetlb_free_vmemmap_enabled())
        /* do something */

The compiler can drop "/* do something */" directly, because
it knows is_hugetlb_free_vmemmap_enabled() always returns
false.

Thanks.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> > Reviewed-by: Oscar Salvador <osalvador@suse.de>
> > Tested-by: Chen Huang <chenhuang5@huawei.com>
> > Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
> > ---
> >  include/linux/hugetlb.h | 3 ++-
> >  mm/hugetlb_vmemmap.c    | 7 +++++++
> >  mm/hugetlb_vmemmap.h    | 6 ++++++
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index c70421e26189..333dd0479fc2 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -880,7 +880,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> >
> >  static inline bool is_hugetlb_free_vmemmap_enabled(void)
> >  {
> > -     return hugetlb_free_vmemmap_enabled;
> > +     return hugetlb_free_vmemmap_enabled &&
> > +            is_power_of_2(sizeof(struct page));
> >  }
> >  #else
> >  static inline bool is_hugetlb_free_vmemmap_enabled(void)
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 33e42678abe3..1ba1ef45c48c 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -265,6 +265,13 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
> >       BUILD_BUG_ON(__NR_USED_SUBPAGE >=
> >                    RESERVE_VMEMMAP_SIZE / sizeof(struct page));
> >
> > +     /*
> > +      * The compiler can help us to optimize this function to null
> > +      * when the size of the struct page is not power of 2.
> > +      */
> > +     if (!is_power_of_2(sizeof(struct page)))
> > +             return;
> > +
> >       if (!hugetlb_free_vmemmap_enabled)
> >               return;
> >
> > diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> > index cb2bef8f9e73..29aaaf7b741e 100644
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -21,6 +21,12 @@ void hugetlb_vmemmap_init(struct hstate *h);
> >   */
> >  static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
> >  {
> > +     /*
> > +      * This check aims to let the compiler help us optimize the code as
> > +      * much as possible.
> > +      */
> > +     if (!is_power_of_2(sizeof(struct page)))
> > +             return 0;
> >       return h->nr_free_vmemmap_pages;
> >  }
> >  #else
> > --
> > 2.11.0
> >
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko March 11, 2021, 8:55 a.m. UTC | #3
On Thu 11-03-21 15:33:20, Muchun Song wrote:
> On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > > When the "struct page size" crosses page boundaries we cannot
> > > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > > return zero if that is the case, most of the functions can be
> > > optimized away.
> >
> > I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?
> 
> Right.
> 
> > Why do we need any runtime checks?
> 
> If the size of the struct page is not power of 2, compiler can think
> is_hugetlb_free_vmemmap_enabled() always return false. So
> the code snippet of this user can be optimized away.
> 
> E.g.
> 
> if (is_hugetlb_free_vmemmap_enabled())
>         /* do something */
> 
> The compiler can drop "/* do something */" directly, because
> it knows is_hugetlb_free_vmemmap_enabled() always returns
> false.

OK, so this is a micro-optimization to generate a better code?
Is this measurable to warrant more code?
Muchun Song March 11, 2021, 9:08 a.m. UTC | #4
On Thu, Mar 11, 2021 at 4:55 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 11-03-21 15:33:20, Muchun Song wrote:
> > On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > > > When the "struct page size" crosses page boundaries we cannot
> > > > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > > > return zero if that is the case, most of the functions can be
> > > > optimized away.
> > >
> > > I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?
> >
> > Right.
> >
> > > Why do we need any runtime checks?
> >
> > If the size of the struct page is not power of 2, compiler can think
> > is_hugetlb_free_vmemmap_enabled() always return false. So
> > the code snippet of this user can be optimized away.
> >
> > E.g.
> >
> > if (is_hugetlb_free_vmemmap_enabled())
> >         /* do something */
> >
> > The compiler can drop "/* do something */" directly, because
> > it knows is_hugetlb_free_vmemmap_enabled() always returns
> > false.
>
> OK, so this is a micro-optimization to generate a better code?

Right.

> Is this measurable to warrant more code?

I have disassembled the code to confirm this behavior.
I know this is not the hot path. But it actually can decrease
the code size.

Thanks.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko March 11, 2021, 9:39 a.m. UTC | #5
On Thu 11-03-21 17:08:34, Muchun Song wrote:
> On Thu, Mar 11, 2021 at 4:55 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 11-03-21 15:33:20, Muchun Song wrote:
> > > On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > > > > When the "struct page size" crosses page boundaries we cannot
> > > > > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > > > > return zero if that is the case, most of the functions can be
> > > > > optimized away.
> > > >
> > > > I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?
> > >
> > > Right.
> > >
> > > > Why do we need any runtime checks?
> > >
> > > If the size of the struct page is not power of 2, compiler can think
> > > is_hugetlb_free_vmemmap_enabled() always return false. So
> > > the code snippet of this user can be optimized away.
> > >
> > > E.g.
> > >
> > > if (is_hugetlb_free_vmemmap_enabled())
> > >         /* do something */
> > >
> > > The compiler can drop "/* do something */" directly, because
> > > it knows is_hugetlb_free_vmemmap_enabled() always returns
> > > false.
> >
> > OK, so this is a micro-optimization to generate a better code?
> 
> Right.
> 
> > Is this measurable to warrant more code?
> 
> I have disassembled the code to confirm this behavior.
> I know this is not the hot path. But it actually can decrease
> the code size.

struct page which is not power of 2 is not a common case. Are you sure
it makes sense to micro optimize for an outliar. If you really want to
microptimize then do that for a common case - the feature being
disabled - via static key.
Muchun Song March 11, 2021, 10 a.m. UTC | #6
On Thu, Mar 11, 2021 at 5:39 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 11-03-21 17:08:34, Muchun Song wrote:
> > On Thu, Mar 11, 2021 at 4:55 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Thu 11-03-21 15:33:20, Muchun Song wrote:
> > > > On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > > > > > When the "struct page size" crosses page boundaries we cannot
> > > > > > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > > > > > return zero if that is the case, most of the functions can be
> > > > > > optimized away.
> > > > >
> > > > > I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param already?
> > > >
> > > > Right.
> > > >
> > > > > Why do we need any runtime checks?
> > > >
> > > > If the size of the struct page is not power of 2, compiler can think
> > > > is_hugetlb_free_vmemmap_enabled() always return false. So
> > > > the code snippet of this user can be optimized away.
> > > >
> > > > E.g.
> > > >
> > > > if (is_hugetlb_free_vmemmap_enabled())
> > > >         /* do something */
> > > >
> > > > The compiler can drop "/* do something */" directly, because
> > > > it knows is_hugetlb_free_vmemmap_enabled() always returns
> > > > false.
> > >
> > > OK, so this is a micro-optimization to generate a better code?
> >
> > Right.
> >
> > > Is this measurable to warrant more code?
> >
> > I have disassembled the code to confirm this behavior.
> > I know this is not the hot path. But it actually can decrease
> > the code size.
>
> struct page which is not power of 2 is not a common case.

I know this is not a common case. But the check of
is_power_of_2(sizeof(struct page)) does not bring extra
runtime overhead. It just tells the compiler to optimize code
as much as possible.

> Are you sure
> it makes sense to micro optimize for an outliar. If you really want to
> microptimize then do that for a common case - the feature being
> disabled - via static key.

We cannot optimize the code size (vmlinux) even if we use a static
key when the size is not power of 2.

Sorry. I am confused why you disagree with this change.
It does not bring any disadvantages.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko March 11, 2021, 12:16 p.m. UTC | #7
On Thu 11-03-21 18:00:09, Muchun Song wrote:
[...]
> Sorry. I am confused why you disagree with this change.
> It does not bring any disadvantages.

Because it is adding a code which is not really necessary and which will
have to be maintained. Think of future changes which would need to grow
more of these. Hugetlb code paths shouldn't really think about size of
the struct page.
Muchun Song March 11, 2021, 1 p.m. UTC | #8
On Thu, Mar 11, 2021 at 8:16 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 11-03-21 18:00:09, Muchun Song wrote:
> [...]
> > Sorry. I am confused why you disagree with this change.
> > It does not bring any disadvantages.
>
> Because it is adding a code which is not really necessary and which will
> have to be maintained. Think of future changes which would need to grow
> more of these. Hugetlb code paths shouldn't really think about size of
> the struct page.

Got it. I will drop this patch.

> --
> Michal Hocko
> SUSE Labs
Oscar Salvador March 11, 2021, 1:45 p.m. UTC | #9
On Thu, Mar 11, 2021 at 01:16:37PM +0100, Michal Hocko wrote:
> On Thu 11-03-21 18:00:09, Muchun Song wrote:
> [...]
> > Sorry. I am confused why you disagree with this change.
> > It does not bring any disadvantages.
> 
> Because it is adding a code which is not really necessary and which will
> have to be maintained. Think of future changes which would need to grow
> more of these. Hugetlb code paths shouldn't really think about size of
> the struct page.

I have to confess that when I looked at the patch I found it nice in the way that
wipes out almost all clode dealing with vmemmap when sizeof(struct page) != power_of_2,
and I was convinced by the fact that only two places required the change.
So all in all it did not look like much churn, and not __that__ hard to maintain.

But I did not think in the case where this trick needs to be spread in more places
if the code changes over time.

So I agree that although it gets rid of a lot of code, it would seldomly pay off as
not many configuration out there are running on !power_of_2, and hugetlb is already
tricky enough.
diff mbox series

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c70421e26189..333dd0479fc2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -880,7 +880,8 @@  extern bool hugetlb_free_vmemmap_enabled;
 
 static inline bool is_hugetlb_free_vmemmap_enabled(void)
 {
-	return hugetlb_free_vmemmap_enabled;
+	return hugetlb_free_vmemmap_enabled &&
+	       is_power_of_2(sizeof(struct page));
 }
 #else
 static inline bool is_hugetlb_free_vmemmap_enabled(void)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 33e42678abe3..1ba1ef45c48c 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -265,6 +265,13 @@  void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
+	/*
+	 * The compiler can help us to optimize this function to null
+	 * when the size of the struct page is not power of 2.
+	 */
+	if (!is_power_of_2(sizeof(struct page)))
+		return;
+
 	if (!hugetlb_free_vmemmap_enabled)
 		return;
 
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index cb2bef8f9e73..29aaaf7b741e 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -21,6 +21,12 @@  void hugetlb_vmemmap_init(struct hstate *h);
  */
 static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
 {
+	/*
+	 * This check aims to let the compiler help us optimize the code as
+	 * much as possible.
+	 */
+	if (!is_power_of_2(sizeof(struct page)))
+		return 0;
 	return h->nr_free_vmemmap_pages;
 }
 #else