
[09/12] hugetlb_vmemmap: Optimistically set Optimized flag

Message ID 20230825190436.55045-10-mike.kravetz@oracle.com (mailing list archive)
State New
Series: Batch hugetlb vmemmap modification operations

Commit Message

Mike Kravetz Aug. 25, 2023, 7:04 p.m. UTC
At the beginning of hugetlb_vmemmap_optimize, optimistically set
the HPageVmemmapOptimized flag in the head page.  Clear the flag
if the operation fails.

No change in behavior.  However, this will become important in
subsequent patches where we batch operations and delay TLB flushing.  We
need to make sure the content of the old and new vmemmap pages is the same.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb_vmemmap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Muchun Song Aug. 30, 2023, 7:26 a.m. UTC | #1
On 2023/8/26 03:04, Mike Kravetz wrote:
> At the beginning of hugetlb_vmemmap_optimize, optimistically set
> the HPageVmemmapOptimized flag in the head page.  Clear the flag
> if the operation fails.
>
> No change in behavior.  However, this will become important in
> subsequent patches where we batch delay TLB flushing.  We need to
> make sure the content in the old and new vmemmap pages are the same.

Sorry, I didn't get the point here. Could you elaborate it?

>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   mm/hugetlb_vmemmap.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index e390170c0887..500a118915ff 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -566,7 +566,9 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>   	if (!vmemmap_should_optimize(h, head))
>   		return;
>   
> +	/* Optimistically assume success */
>   	static_branch_inc(&hugetlb_optimize_vmemmap_key);
> +	SetHPageVmemmapOptimized(head);
>   
>   	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
>   	vmemmap_reuse	= vmemmap_start;
> @@ -577,10 +579,10 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>   	 * to the page which @vmemmap_reuse is mapped to, then free the pages
>   	 * which the range [@vmemmap_start, @vmemmap_end] is mapped to.
>   	 */
> -	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages))
> +	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages)) {
>   		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> -	else
> -		SetHPageVmemmapOptimized(head);
> +		ClearHPageVmemmapOptimized(head);
> +	}
>   }
>   
>   /**
Mike Kravetz Aug. 30, 2023, 10:47 p.m. UTC | #2
On 08/30/23 15:26, Muchun Song wrote:
> 
> 
> On 2023/8/26 03:04, Mike Kravetz wrote:
> > At the beginning of hugetlb_vmemmap_optimize, optimistically set
> > the HPageVmemmapOptimized flag in the head page.  Clear the flag
> > if the operation fails.
> > 
> > No change in behavior.  However, this will become important in
> > subsequent patches where we batch delay TLB flushing.  We need to
> > make sure the content in the old and new vmemmap pages are the same.
> 
> Sorry, I didn't get the point here. Could you elaborate it?
> 

Sorry, this really could use a better explanation.

> > 
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> >   mm/hugetlb_vmemmap.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index e390170c0887..500a118915ff 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -566,7 +566,9 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
> >   	if (!vmemmap_should_optimize(h, head))
> >   		return;
> > +	/* Optimistically assume success */
> >   	static_branch_inc(&hugetlb_optimize_vmemmap_key);
> > +	SetHPageVmemmapOptimized(head);
> >   	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
> >   	vmemmap_reuse	= vmemmap_start;
> > @@ -577,10 +579,10 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
> >   	 * to the page which @vmemmap_reuse is mapped to, then free the pages
> >   	 * which the range [@vmemmap_start, @vmemmap_end] is mapped to.
> >   	 */
> > -	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages))
> > +	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages)) {
> >   		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> > -	else
> > -		SetHPageVmemmapOptimized(head);
> > +		ClearHPageVmemmapOptimized(head);
> > +	}

Consider the case where we have successfully remapped vmemmap AND
- we have replaced the page table page (pte page) containing the struct
  page of the hugetlb head page.  Joao's commit 11aad2631bf7
  'mm/hugetlb_vmemmap: remap head page to newly allocated page'.
- we have NOT flushed the TLB after remapping due to batching the
  operations before flush.

In this case, it is possible that the old head page is still in the TLB
and caches and SetHPageVmemmapOptimized(head) will actually set the flag
in the old pte page.  We then have an optimized hugetlb page without the
HPageVmemmapOptimized flag set.  When developing this series, we
experienced various BUGs as a result of this situation.

In the case of an error during optimization, we do a TLB flush so if
we need to clear the flag we will write to the correct pte page.
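
Very roughly, the batched flow this is preparing for looks like the
sketch below (hand-wavy pseudocode, not the actual series code; the
function names exist in the kernel, the flow around them is simplified):

        /* Optimistic set: the write goes to the OLD vmemmap page, and
         * vmemmap_remap_free() copies that page, so the NEW vmemmap page
         * ends up with the flag as well.
         */
        SetHPageVmemmapOptimized(head);

        /* head's pte page is replaced here, but the TLB flush is deferred */
        vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages);

        /* ... more hugetlb pages optimized in the same batch ... */

        /* one deferred TLB flush covers the whole batch */
        flush_tlb_kernel_range(...);

If the flag were instead set after the remap (as before this patch), that
write could still go through the stale TLB entry and land in the old,
replaced pte page, leaving the new vmemmap page without the flag.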

Hope that makes sense.

I will add an explanation like this to the commit message, and perhaps move
this patch closer to, or squash it with, the patch that batches operations
before flushing the TLB.
Muchun Song Aug. 31, 2023, 3:27 a.m. UTC | #3
> On Aug 31, 2023, at 06:47, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> On 08/30/23 15:26, Muchun Song wrote:
>> 
>> 
>> On 2023/8/26 03:04, Mike Kravetz wrote:
>>> At the beginning of hugetlb_vmemmap_optimize, optimistically set
>>> the HPageVmemmapOptimized flag in the head page.  Clear the flag
>>> if the operation fails.
>>> 
>>> No change in behavior.  However, this will become important in
>>> subsequent patches where we batch delay TLB flushing.  We need to
>>> make sure the content in the old and new vmemmap pages are the same.
>> 
>> Sorry, I didn't get the point here. Could you elaborate it?
>> 
> 
> Sorry, this really could use a better explanation.
> 
>>> 
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>  mm/hugetlb_vmemmap.c | 8 +++++---
>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index e390170c0887..500a118915ff 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -566,7 +566,9 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>>>   if (!vmemmap_should_optimize(h, head))
>>>   return;
>>> + /* Optimistically assume success */
>>>   static_branch_inc(&hugetlb_optimize_vmemmap_key);
>>> + SetHPageVmemmapOptimized(head);
>>>   vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
>>>   vmemmap_reuse = vmemmap_start;
>>> @@ -577,10 +579,10 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>>>    * to the page which @vmemmap_reuse is mapped to, then free the pages
>>>    * which the range [@vmemmap_start, @vmemmap_end] is mapped to.
>>>    */
>>> - if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages))
>>> + if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages)) {
>>>   static_branch_dec(&hugetlb_optimize_vmemmap_key);
>>> - else
>>> - SetHPageVmemmapOptimized(head);
>>> + ClearHPageVmemmapOptimized(head);
>>> + }
> 
> Consider the case where we have successfully remapped vmemmap AND
> - we have replaced the page table page (pte page) containing the struct
>  page of the hugetlb head page.  Joao's commit 11aad2631bf7
>  'mm/hugetlb_vmemmap: remap head page to newly allocated page'.
> - we have NOT flushed the TLB after remapping due to batching the
>  operations before flush.
> 
> In this case, it is possible that the old head page is still in the TLB
> and caches and SetHPageVmemmapOptimized(head) will actually set the flag
> in the old pte page.  We then have an optimized hugetlb page without the
> HPageVmemmapOptimized flag set.  When developing this series, we
> experienced various BUGs as a result of this situation.

Now, I got it. Thanks for your elaboration.

> 
> In the case of an error during optimization, we do a TLB flush so if
> we need to clear the flag we will write to the correct pte page.

Right.

> 
> Hope that makes sense.
> 
> I add an explanation like this to the commit message and perhaps put
> this closer to/or squash with the patch that batches operations before
> flushing TLB.

Yes. But I'd also like to add a big comment here explaining what is going on,
instead of the simple "Optimistically assume success". The one-liner makes
this read like a mere optimization rather than a mandatory prerequisite.
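
Something along these lines (the wording is only a suggestion, not taken
from the series) might work as that comment:

        /*
         * Set HPageVmemmapOptimized before remapping.  With batched
         * operations the TLB flush is deferred, so a write to the head
         * struct page after the remap could go through a stale TLB entry
         * and land in the old, replaced vmemmap page.  Setting the flag
         * first guarantees vmemmap_remap_free() copies it into the new
         * vmemmap page.  The error path flushes the TLB, so clearing the
         * flag there writes to the correct page.
         */
        static_branch_inc(&hugetlb_optimize_vmemmap_key);
        SetHPageVmemmapOptimized(head);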

Thanks.

> -- 
> Mike Kravetz
> 
>>>  }
>>>  /**

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index e390170c0887..500a118915ff 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -566,7 +566,9 @@  static void __hugetlb_vmemmap_optimize(const struct hstate *h,
 	if (!vmemmap_should_optimize(h, head))
 		return;
 
+	/* Optimistically assume success */
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);
+	SetHPageVmemmapOptimized(head);
 
 	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
 	vmemmap_reuse	= vmemmap_start;
@@ -577,10 +579,10 @@  static void __hugetlb_vmemmap_optimize(const struct hstate *h,
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_start, @vmemmap_end] is mapped to.
 	 */
-	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages))
+	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, bulk_pages)) {
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
-	else
-		SetHPageVmemmapOptimized(head);
+		ClearHPageVmemmapOptimized(head);
+	}
 }
 
 /**