
[v2,1/9] mm/hugetlb: remove checking hstate_is_gigantic() in return_unused_surplus_pages()

Message ID 20220623235153.2623702-2-naoya.horiguchi@linux.dev (mailing list archive)
State New
Series mm, hwpoison: enable 1GB hugepage support (v2)

Commit Message

Naoya Horiguchi June 23, 2022, 11:51 p.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

I found a weird state of 1GB hugepage pool, caused by the following
procedure:

  - run a process reserving all free 1GB hugepages,
  - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
    /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
  - kill the reserving process.

Then all the hugepages are free *and* surplus at the same time.

  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
  3
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
  0
  $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
  3

This state is resolved by reserving and allocating the pages and then
freeing them again, so it does not seem to cause a serious problem.
But it is a little surprising (shrinking the pool suddenly fails).

This behavior is caused by the hstate_is_gigantic() check in
return_unused_surplus_pages(). The check was introduced long ago, in 2008,
by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
it seems to me that it is no longer necessary. Let's remove it.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/hugetlb.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Miaohe Lin June 24, 2022, 2:25 a.m. UTC | #1
On 2022/6/24 7:51, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> I found a weird state of 1GB hugepage pool, caused by the following
> procedure:
> 
>   - run a process reserving all free 1GB hugepages,
>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>   - kill the reserving process.
> 
> , then all the hugepages are free *and* surplus at the same time.
> 
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>   3
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>   0
>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>   3
> 
> This state is resolved by reserving and allocating the pages then
> freeing them again, so this seems not to result in serious problem.
> But it's a little surprizing (shrinking pool suddenly fails).
> 
> This behavior is caused by hstate_is_gigantic() check in
> return_unused_surplus_pages(). This was introduced so long ago in 2008
> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> it seems to me that this check is no longer unnecessary. Let's remove it.

s/unnecessary/necessary/

> 
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/hugetlb.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..c538278170a2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
>  	/* Uncommit the reservation */
>  	h->resv_huge_pages -= unused_resv_pages;
>  
> -	/* Cannot return gigantic pages currently */
> -	if (hstate_is_gigantic(h))
> -		goto out;
> -

IIUC it might be better to do the below check:
	/*
	 * Cannot return gigantic pages currently if runtime gigantic page
	 * allocation is not supported.
	 */
	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
		goto out;

But I might be missing something.

Thanks.

>  	/*
>  	 * Part (or even all) of the reservation could have been backed
>  	 * by pre-allocated pages. Only free surplus pages.
>
Muchun Song June 24, 2022, 8:03 a.m. UTC | #2
On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > I found a weird state of 1GB hugepage pool, caused by the following
> > procedure:
> > 
> >   - run a process reserving all free 1GB hugepages,
> >   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >   - kill the reserving process.
> > 
> > , then all the hugepages are free *and* surplus at the same time.
> > 
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >   3
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >   0
> >   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >   3
> > 
> > This state is resolved by reserving and allocating the pages then
> > freeing them again, so this seems not to result in serious problem.
> > But it's a little surprizing (shrinking pool suddenly fails).
> > 
> > This behavior is caused by hstate_is_gigantic() check in
> > return_unused_surplus_pages(). This was introduced so long ago in 2008
> > by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > it seems to me that this check is no longer unnecessary. Let's remove it.
> 
> s/unnecessary/necessary/
> 
> > 
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  mm/hugetlb.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a57e1be41401..c538278170a2 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> >  	/* Uncommit the reservation */
> >  	h->resv_huge_pages -= unused_resv_pages;
> >  
> > -	/* Cannot return gigantic pages currently */
> > -	if (hstate_is_gigantic(h))
> > -		goto out;
> > -
> 
> IIUC it might be better to do the below check:
> 	/*
> 	 * Cannot return gigantic pages currently if runtime gigantic page
> 	 * allocation is not supported.
> 	 */
> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> 		goto out;
>

The change looks good to me. However, the comment above is unnecessary
since gigantic_page_runtime_supported() is straightforward.

Thanks.
 
> But I might be miss something.
> 
> Thanks.
> 
> >  	/*
> >  	 * Part (or even all) of the reservation could have been backed
> >  	 * by pre-allocated pages. Only free surplus pages.
> > 
> 
>
Miaohe Lin June 24, 2022, 8:15 a.m. UTC | #3
On 2022/6/24 16:03, Muchun Song wrote:
> On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
>> On 2022/6/24 7:51, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> I found a weird state of 1GB hugepage pool, caused by the following
>>> procedure:
>>>
>>>   - run a process reserving all free 1GB hugepages,
>>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
>>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
>>>   - kill the reserving process.
>>>
>>> , then all the hugepages are free *and* surplus at the same time.
>>>
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>>   3
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
>>>   3
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
>>>   0
>>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
>>>   3
>>>
>>> This state is resolved by reserving and allocating the pages then
>>> freeing them again, so this seems not to result in serious problem.
>>> But it's a little surprizing (shrinking pool suddenly fails).
>>>
>>> This behavior is caused by hstate_is_gigantic() check in
>>> return_unused_surplus_pages(). This was introduced so long ago in 2008
>>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
>>> it seems to me that this check is no longer unnecessary. Let's remove it.
>>
>> s/unnecessary/necessary/
>>
>>>
>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>> ---
>>>  mm/hugetlb.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index a57e1be41401..c538278170a2 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
>>>  	/* Uncommit the reservation */
>>>  	h->resv_huge_pages -= unused_resv_pages;
>>>  
>>> -	/* Cannot return gigantic pages currently */
>>> -	if (hstate_is_gigantic(h))
>>> -		goto out;
>>> -
>>
>> IIUC it might be better to do the below check:
>> 	/*
>> 	 * Cannot return gigantic pages currently if runtime gigantic page
>> 	 * allocation is not supported.
>> 	 */
>> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>> 		goto out;
>>
> 
> The change looks good to me. However, the comments above is unnecessary
> since gigantic_page_runtime_supported() is straightforward.

Agree. The comments can be removed.

> 
> Thanks.

Thanks for reviewing.

>  
>> But I might be miss something.
>>
>> Thanks.
>>
>>>  	/*
>>>  	 * Part (or even all) of the reservation could have been backed
>>>  	 * by pre-allocated pages. Only free surplus pages.
>>>
>>
>>
> .
>
HORIGUCHI NAOYA(堀口 直也) June 24, 2022, 8:34 a.m. UTC | #4
On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> On 2022/6/24 16:03, Muchun Song wrote:
> > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>
> >>> I found a weird state of 1GB hugepage pool, caused by the following
> >>> procedure:
> >>>
> >>>   - run a process reserving all free 1GB hugepages,
> >>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> >>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> >>>   - kill the reserving process.
> >>>
> >>> , then all the hugepages are free *and* surplus at the same time.
> >>>
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >>>   3
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> >>>   3
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> >>>   0
> >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> >>>   3
> >>>
> >>> This state is resolved by reserving and allocating the pages then
> >>> freeing them again, so this seems not to result in serious problem.
> >>> But it's a little surprizing (shrinking pool suddenly fails).
> >>>
> >>> This behavior is caused by hstate_is_gigantic() check in
> >>> return_unused_surplus_pages(). This was introduced so long ago in 2008
> >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> >>> it seems to me that this check is no longer unnecessary. Let's remove it.
> >>
> >> s/unnecessary/necessary/

Thanks, I fixed it.

> >>
> >>>
> >>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>> ---
> >>>  mm/hugetlb.c | 4 ----
> >>>  1 file changed, 4 deletions(-)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index a57e1be41401..c538278170a2 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> >>>  	/* Uncommit the reservation */
> >>>  	h->resv_huge_pages -= unused_resv_pages;
> >>>  
> >>> -	/* Cannot return gigantic pages currently */
> >>> -	if (hstate_is_gigantic(h))
> >>> -		goto out;
> >>> -
> >>
> >> IIUC it might be better to do the below check:
> >> 	/*
> >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> >> 	 * allocation is not supported.
> >> 	 */
> >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> >> 		goto out;
> >>
> > 
> > The change looks good to me. However, the comments above is unnecessary
> > since gigantic_page_runtime_supported() is straightforward.
> 
> Agree. The comments can be removed.

Thank you, both. Adding !gigantic_page_runtime_supported() without the
comment makes sense to me.

Thanks,
Naoya Horiguchi
Mike Kravetz June 24, 2022, 7:11 p.m. UTC | #5
On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > On 2022/6/24 16:03, Muchun Song wrote:
> > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > >>>
> > >>> I found a weird state of 1GB hugepage pool, caused by the following
> > >>> procedure:
> > >>>
> > >>>   - run a process reserving all free 1GB hugepages,
> > >>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> > >>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> > >>>   - kill the reserving process.
> > >>>
> > >>> , then all the hugepages are free *and* surplus at the same time.
> > >>>
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > >>>   3
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> > >>>   3
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> > >>>   0
> > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> > >>>   3
> > >>>
> > >>> This state is resolved by reserving and allocating the pages then
> > >>> freeing them again, so this seems not to result in serious problem.
> > >>> But it's a little surprizing (shrinking pool suddenly fails).
> > >>>
> > >>> This behavior is caused by hstate_is_gigantic() check in
> > >>> return_unused_surplus_pages(). This was introduced so long ago in 2008
> > >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > >>> it seems to me that this check is no longer unnecessary. Let's remove it.

Thank you for finding this!!!

> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> > >>>  	/* Uncommit the reservation */
> > >>>  	h->resv_huge_pages -= unused_resv_pages;
> > >>>  
> > >>> -	/* Cannot return gigantic pages currently */
> > >>> -	if (hstate_is_gigantic(h))
> > >>> -		goto out;
> > >>> -
> > >>
> > >> IIUC it might be better to do the below check:
> > >> 	/*
> > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > >> 	 * allocation is not supported.
> > >> 	 */
> > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > >> 		goto out;
> > >>
> > > 
> > > The change looks good to me. However, the comments above is unnecessary
> > > since gigantic_page_runtime_supported() is straightforward.
> > 
> > Agree. The comments can be removed.
> 
> Thank you, both. Adding !gigantic_page_runtime_supported without comment
> makes sense to me.

The change above makes sense to me.  However, ...

If we make the change above, will we have the same strange situation described
in the commit message when !gigantic_page_runtime_supported() is true?

IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
pages can not be allocated or freed at run time.  They can only be
allocated at boot time.  So, there should NEVER be surplus gigantic
pages if !gigantic_page_runtime_supported().  To avoid this situation,
perhaps we should change set_max_huge_pages as follows (not tested)?
HORIGUCHI NAOYA(堀口 直也) June 27, 2022, 6:02 a.m. UTC | #6
On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
> On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > > On 2022/6/24 16:03, Muchun Song wrote:
> > > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > > >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > >>>
> > > >>> I found a weird state of 1GB hugepage pool, caused by the following
> > > >>> procedure:
> > > >>>
> > > >>>   - run a process reserving all free 1GB hugepages,
> > > >>>   - shrink free 1GB hugepage pool to zero (i.e. writing 0 to
> > > >>>     /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> > > >>>   - kill the reserving process.
> > > >>>
> > > >>> , then all the hugepages are free *and* surplus at the same time.
> > > >>>
> > > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > > >>>   3
> > > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> > > >>>   3
> > > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> > > >>>   0
> > > >>>   $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> > > >>>   3
> > > >>>
> > > >>> This state is resolved by reserving and allocating the pages then
> > > >>> freeing them again, so this seems not to result in serious problem.
> > > >>> But it's a little surprizing (shrinking pool suddenly fails).
> > > >>>
> > > >>> This behavior is caused by hstate_is_gigantic() check in
> > > >>> return_unused_surplus_pages(). This was introduced so long ago in 2008
> > > >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > > >>> it seems to me that this check is no longer unnecessary. Let's remove it.
> 
> Thank you for finding this!!!
> 
> > > >>> +++ b/mm/hugetlb.c
> > > >>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> > > >>>  	/* Uncommit the reservation */
> > > >>>  	h->resv_huge_pages -= unused_resv_pages;
> > > >>>  
> > > >>> -	/* Cannot return gigantic pages currently */
> > > >>> -	if (hstate_is_gigantic(h))
> > > >>> -		goto out;
> > > >>> -
> > > >>
> > > >> IIUC it might be better to do the below check:
> > > >> 	/*
> > > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > > >> 	 * allocation is not supported.
> > > >> 	 */
> > > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > > >> 		goto out;
> > > >>
> > > > 
> > > > The change looks good to me. However, the comments above is unnecessary
> > > > since gigantic_page_runtime_supported() is straightforward.
> > > 
> > > Agree. The comments can be removed.
> > 
> > Thank you, both. Adding !gigantic_page_runtime_supported without comment
> > makes sense to me.
> 
> The change above makes sense to me.  However, ...
> 
> If we make the change above, will we have the same strange situation described
> in the commit message when !gigantic_page_runtime_supported() is true?
> 
> IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
> pages can not be allocated or freed at run time.  They can only be
> allocated at boot time.  So, there should NEVER be surplus gigantic
> pages if !gigantic_page_runtime_supported().

I have the same understanding as the above.

>  To avoid this situation,
> perhaps we should change set_max_huge_pages as follows (not tested)?

The suggested diff looks clearer about what it does, so I'd like to take it
in the next version.  Then, what do we do about the "if (hstate_is_gigantic())"
check in return_unused_surplus_pages() in the original suggestion?  Should it
be kept as is, removed, or combined with !gigantic_page_runtime_supported()?
I guess that the new checks prevent calling return_unused_surplus_pages()
during pool shrinking, so the check no longer seems necessary.

Thanks,
Naoya Horiguchi

> 
> -- 
> Mike Kravetz
> 
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5eabb8009d8a..c78d6c20e6b0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3416,7 +3416,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  	 * the user tries to allocate gigantic pages but let the user free the
>  	 * boottime allocated gigantic pages.
>  	 */
> -	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> +	if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> +					!gigantic_page_runtime_supported())) {
>  		if (count > persistent_huge_pages(h)) {
>  			spin_unlock_irq(&hugetlb_lock);
>  			mutex_unlock(&h->resize_lock);
> @@ -3464,6 +3465,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  			goto out;
>  	}
>  
> +	/*
> +	 * We can not decrease gigantic pool size if runtime modification
> +	 * is not supported.
> +	 */
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> +		if (count < persistent_huge_pages(h)) {
> +			spin_unlock_irq(&hugetlb_lock);
> +			mutex_unlock(&h->resize_lock);
> +			NODEMASK_FREE(node_alloc_noretry);
> +			return -EINVAL;
> +		}
> +	}
> +
>  	/*
>  	 * Decrease the pool size
>  	 * First return free pages to the buddy allocator (being careful
Mike Kravetz June 27, 2022, 5:25 p.m. UTC | #7
On 06/27/22 06:02, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
> > On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > > > On 2022/6/24 16:03, Muchun Song wrote:
> > > > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > > > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > > > >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > > >>
> > > > >> IIUC it might be better to do the below check:
> > > > >> 	/*
> > > > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > > > >> 	 * allocation is not supported.
> > > > >> 	 */
> > > > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > > > >> 		goto out;
> > > > >>
> > > > > 
> > > > > The change looks good to me. However, the comments above is unnecessary
> > > > > since gigantic_page_runtime_supported() is straightforward.
> > > > 
> > > > Agree. The comments can be removed.
> > > 
> > > Thank you, both. Adding !gigantic_page_runtime_supported without comment
> > > makes sense to me.
> > 
> > The change above makes sense to me.  However, ...
> > 
> > If we make the change above, will we have the same strange situation described
> > in the commit message when !gigantic_page_runtime_supported() is true?
> > 
> > IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
> > pages can not be allocated or freed at run time.  They can only be
> > allocated at boot time.  So, there should NEVER be surplus gigantic
> > pages if !gigantic_page_runtime_supported().
> 
> I have the same understanding as the above.
> 
> >  To avoid this situation,
> > perhaps we should change set_max_huge_pages as follows (not tested)?
> 
> The suggested diff looks clearer about what it does, so I'd like to take it
> in the next version.  Then, what do we do on the "if (hstate_if_gigantic())"
> check in return_unused_surplus_pages in the original suggestion?  Should it
> be kept as is, or removed, or checked with !gigantic_page_runtime_supported()?
> I guess that the new checks prevent calling return_unused_surplus_pages()
> during pool shrinking, so the check seems not necessary any more.

My first thought was to keep the check in return_unused_surplus_pages() as it
is called in other code paths.  However, it SHOULD only try to free surplus
hugetlb pages.  With the modifications to set_max_huge_pages we will not
have any surplus gigantic pages if !gigantic_page_runtime_supported, so
the check can be removed.

Also note that we never try to dynamically allocate surplus gigantic pages.
This also is left over from the time when we could not easily allocate a
gigantic page at runtime.  It would not surprise me if someone found a use
case to ease this restriction in the future, especially if 1G THP support
is ever added.  If that happens, the check would become necessary again, and
I would guess it would be overlooked and not re-added.

Sorry for thinking out loud!!!  Although not strictly necessary, it 'might'
be a good idea to leave the check, because it would be overlooked if dynamic
allocation of gigantic surplus pages is ever added.  I do not have a strong
opinion.

P.S. This also reminds me that a similar check should be added to the
demote hugetlb code path.  It would be bad if !gigantic_page_runtime_supported
and we demoted a gigantic page into numerous non-gigantic pages.  I will
send a patch.
Miaohe Lin June 28, 2022, 3:01 a.m. UTC | #8
On 2022/6/28 1:25, Mike Kravetz wrote:
> On 06/27/22 06:02, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
>>> On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>> On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
>>>>> On 2022/6/24 16:03, Muchun Song wrote:
>>>>>> On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
>>>>>>> On 2022/6/24 7:51, Naoya Horiguchi wrote:
>>>>>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>>>>
>>>>>>> IIUC it might be better to do the below check:
>>>>>>> 	/*
>>>>>>> 	 * Cannot return gigantic pages currently if runtime gigantic page
>>>>>>> 	 * allocation is not supported.
>>>>>>> 	 */
>>>>>>> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>>>>>>> 		goto out;
>>>>>>>
>>>>>>
>>>>>> The change looks good to me. However, the comments above is unnecessary
>>>>>> since gigantic_page_runtime_supported() is straightforward.
>>>>>
>>>>> Agree. The comments can be removed.
>>>>
>>>> Thank you, both. Adding !gigantic_page_runtime_supported without comment
>>>> makes sense to me.
>>>
>>> The change above makes sense to me.  However, ...
>>>
>>> If we make the change above, will we have the same strange situation described
>>> in the commit message when !gigantic_page_runtime_supported() is true?
>>>
>>> IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
>>> pages can not be allocated or freed at run time.  They can only be
>>> allocated at boot time.  So, there should NEVER be surplus gigantic
>>> pages if !gigantic_page_runtime_supported().
>>
>> I have the same understanding as the above.
>>
>>>  To avoid this situation,
>>> perhaps we should change set_max_huge_pages as follows (not tested)?
>>
>> The suggested diff looks clearer about what it does, so I'd like to take it
>> in the next version.  Then, what do we do on the "if (hstate_if_gigantic())"
>> check in return_unused_surplus_pages in the original suggestion?  Should it
>> be kept as is, or removed, or checked with !gigantic_page_runtime_supported()?
>> I guess that the new checks prevent calling return_unused_surplus_pages()
>> during pool shrinking, so the check seems not necessary any more.
> 
> My first thought was to keep the check in return_unused_surplus_pages() as it
> is called in other code paths.  However, it SHOULD only try to free surplus
> hugetlb pages.  With the modifications to set_max_huge_pages we will not
> have any surplus gigantic pages if !gigantic_page_runtime_supported, so
> the check can be removed.
> 
> Also note that we never try to dynamically allocate surplus gigantic pages.
> This also is left over from the time when we could not easily allocate a
> gigantic page at runtime.  It would not surprise me if someone found a use
> case to ease this restriction in the future.  Especially so if 1G THP support
> is ever added.  If this happens, the check would be necessary and I would
> guess that it would not be added.
> 
> Sorry for thinking our loud!!!  Although not necessary, it 'might' be a good
> idea to leave the check because it would be overlooked if dynamic allocation
> of gigantic surplus pages is ever added.  I do not have a strong opinion.
> 
> P.S. This also reminds me that a similar check should be added to the
> demote hugetlb code path.  It would be bad if !gigantic_page_runtime_supported
> and we demoted a gigantic page into numerous non-gigantic pages.  I will
> send a patch.

Off-topic:
Some places check "if (hstate_is_gigantic(h))" while others check
"if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())".  Should
we make it explicit in some manner when gigantic_page_runtime_supported() is
needed, to make the code easier to follow?

Just a trivial suggestion. Thanks!

>
HORIGUCHI NAOYA(堀口 直也) June 28, 2022, 8:38 a.m. UTC | #9
On Mon, Jun 27, 2022 at 10:25:13AM -0700, Mike Kravetz wrote:
> On 06/27/22 06:02, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
> > > On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > > > > On 2022/6/24 16:03, Muchun Song wrote:
> > > > > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > > > > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > > > > >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > > > >>
> > > > > >> IIUC it might be better to do the below check:
> > > > > >> 	/*
> > > > > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > > > > >> 	 * allocation is not supported.
> > > > > >> 	 */
> > > > > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > > > > >> 		goto out;
> > > > > >>
> > > > > > 
> > > > > > The change looks good to me. However, the comments above is unnecessary
> > > > > > since gigantic_page_runtime_supported() is straightforward.
> > > > > 
> > > > > Agree. The comments can be removed.
> > > > 
> > > > Thank you, both. Adding !gigantic_page_runtime_supported without comment
> > > > makes sense to me.
> > > 
> > > The change above makes sense to me.  However, ...
> > > 
> > > If we make the change above, will we have the same strange situation described
> > > in the commit message when !gigantic_page_runtime_supported() is true?
> > > 
> > > IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
> > > pages can not be allocated or freed at run time.  They can only be
> > > allocated at boot time.  So, there should NEVER be surplus gigantic
> > > pages if !gigantic_page_runtime_supported().
> > 
> > I have the same understanding as the above.
> > 
> > >  To avoid this situation,
> > > perhaps we should change set_max_huge_pages as follows (not tested)?
> > 
> > The suggested diff looks clearer about what it does, so I'd like to take it
> > in the next version.  Then, what do we do on the "if (hstate_if_gigantic())"
> > check in return_unused_surplus_pages in the original suggestion?  Should it
> > be kept as is, or removed, or checked with !gigantic_page_runtime_supported()?
> > I guess that the new checks prevent calling return_unused_surplus_pages()
> > during pool shrinking, so the check seems not necessary any more.
> 
> My first thought was to keep the check in return_unused_surplus_pages() as it
> is called in other code paths.  However, it SHOULD only try to free surplus
> hugetlb pages.  With the modifications to set_max_huge_pages we will not
> have any surplus gigantic pages if !gigantic_page_runtime_supported, so
> the check can be removed.
> 
> Also note that we never try to dynamically allocate surplus gigantic pages.
> This also is left over from the time when we could not easily allocate a
> gigantic page at runtime.  It would not surprise me if someone found a use
> case to ease this restriction in the future.  Especially so if 1G THP support
> is ever added.  If this happens, the check would be necessary and I would
> guess that it would not be added.
> 
> Sorry for thinking our loud!!!  Although not necessary, it 'might' be a good
> idea to leave the check because it would be overlooked if dynamic allocation
> of gigantic surplus pages is ever added.  I do not have a strong opinion.

OK, so let's keep the check.

> 
> P.S. This also reminds me that a similar check should be added to the
> demote hugetlb code path.  It would be bad if !gigantic_page_runtime_supported
> and we demoted a gigantic page into numerous non-gigantic pages.  I will
> send a patch.

Sounds nice.

Thanks,
Naoya Horiguchi
HORIGUCHI NAOYA(堀口 直也) June 30, 2022, 2:27 a.m. UTC | #10
On Tue, Jun 28, 2022 at 08:38:08AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Jun 27, 2022 at 10:25:13AM -0700, Mike Kravetz wrote:
> > On 06/27/22 06:02, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
> > > > On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > > > On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > > > > > On 2022/6/24 16:03, Muchun Song wrote:
> > > > > > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > > > > > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > > > > > >>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > > > > >>
> > > > > > >> IIUC it might be better to do the below check:
> > > > > > >> 	/*
> > > > > > >> 	 * Cannot return gigantic pages currently if runtime gigantic page
> > > > > > >> 	 * allocation is not supported.
> > > > > > >> 	 */
> > > > > > >> 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > > > > > >> 		goto out;
> > > > > > >>
> > > > > > > 
> > > > > > > The change looks good to me. However, the comment above is unnecessary
> > > > > > > since gigantic_page_runtime_supported() is straightforward.
> > > > > > 
> > > > > > Agree. The comments can be removed.
> > > > > 
> > > > > Thank you, both. Adding !gigantic_page_runtime_supported without comment
> > > > > makes sense to me.
> > > > 
> > > > The change above makes sense to me.  However, ...
> > > > 
> > > > If we make the change above, will we have the same strange situation described
> > > > in the commit message when !gigantic_page_runtime_supported() is true?
> > > > 
> > > > IIUC, !gigantic_page_runtime_supported implies that gigantic hugetlb
> > > > pages can not be allocated or freed at run time.  They can only be
> > > > allocated at boot time.  So, there should NEVER be surplus gigantic
> > > > pages if !gigantic_page_runtime_supported().
> > > 
> > > I have the same understanding as the above.
> > > 
> > > >  To avoid this situation,
> > > > perhaps we should change set_max_huge_pages as follows (not tested)?
> > > 
> > > The suggested diff looks clearer about what it does, so I'd like to take it
> > > in the next version.  Then, what do we do about the "if (hstate_is_gigantic())"
> > > check in return_unused_surplus_pages in the original suggestion?  Should it
> > > be kept as is, or removed, or checked with !gigantic_page_runtime_supported()?
> > > I guess that the new checks prevent calling return_unused_surplus_pages()
> > > during pool shrinking, so the check seems not necessary any more.
> > 
> > My first thought was to keep the check in return_unused_surplus_pages() as it
> > is called in other code paths.  However, it SHOULD only try to free surplus
> > hugetlb pages.  With the modifications to set_max_huge_pages we will not
> > have any surplus gigantic pages if !gigantic_page_runtime_supported, so
> > the check can be removed.
> > 
> > Also note that we never try to dynamically allocate surplus gigantic pages.
> > This also is left over from the time when we could not easily allocate a
> > gigantic page at runtime.  It would not surprise me if someone found a use
> > case to ease this restriction in the future.  Especially so if 1G THP support
> > is ever added.  If this happens, the check would become necessary, and I would
> > guess that it would not be re-added.
> > 
> > Sorry for thinking out loud!!!  Although not necessary, it 'might' be a good
> > idea to leave the check because it would be overlooked if dynamic allocation
> > of gigantic surplus pages is ever added.  I do not have a strong opinion.
> 
> OK, so let's keep the check.

Sorry, I found that keeping the check doesn't fix the problem.
I'll update the check with !gigantic_page_runtime_supported().

- Naoya Horiguchi
Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a57e1be41401..c538278170a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2432,10 +2432,6 @@  static void return_unused_surplus_pages(struct hstate *h,
 	/* Uncommit the reservation */
 	h->resv_huge_pages -= unused_resv_pages;
 
-	/* Cannot return gigantic pages currently */
-	if (hstate_is_gigantic(h))
-		goto out;
-
 	/*
 	 * Part (or even all) of the reservation could have been backed
 	 * by pre-allocated pages. Only free surplus pages.