mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

Message ID 20240521073446.23185-1-osalvador@suse.de
State New
Series mm/hugetlb: Move vmf_anon_prepare upfront in hugetlb_wp

Commit Message

Oscar Salvador May 21, 2024, 7:34 a.m. UTC
hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
means that we might need to call restore_reserve_on_error() upon error.
vmf_anon_prepare() releases the vma lock before returning, but
restore_reserve_on_error() expects the vma lock to be held by the caller.

Fix it by calling vmf_anon_prepare() before allocating the page.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
---
I did not hit this bug; I just spotted it because I was looking at hugetlb_wp
for some other reason, and I did not want to get creative trying to trigger it
just to get a backtrace.
My assumption is that we could trigger this if 1) this was a shared mapping,
so no anon_vma, and 2) we call into GUP code with FOLL_WRITE, which would cause
FLAG_UNSHARE to be passed, so we would end up in hugetlb_wp().

 mm/hugetlb.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
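
For illustration, here is a minimal sketch of the pre-fix ordering, condensed
by hand from the diff below. It is not the verbatim upstream code of
hugetlb_wp(); only the calls relevant to the bug are shown:

	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
	...
	ret = vmf_anon_prepare(vmf);	/* drops the vma lock on failure */
	if (unlikely(ret))
		goto out_release_all;
	...
out_release_all:
	/* cleanup for the already-allocated folio; expects the vma lock */
	restore_reserve_on_error(h, vma, vmf->address, new_folio);

With vmf_anon_prepare() moved before the allocation, its failure path no
longer has a freshly allocated folio whose reservation would need to be
restored without the lock.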

Comments

David Hildenbrand May 21, 2024, 9:56 a.m. UTC | #1
On 21.05.24 09:34, Oscar Salvador wrote:
> hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
> means that we might need to call restore_reserve_on_error() upon error.
> vmf_anon_prepare() releases the vma lock before returning, but
> restore_reserve_on_error() expects the vma lock to be held by the caller.
> 
> Fix it by calling vmf_anon_prepare() before allocating the page.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
> ---
> I did not hit this bug; I just spotted it because I was looking at hugetlb_wp
> for some other reason, and I did not want to get creative trying to trigger it
> just to get a backtrace.
> My assumption is that we could trigger this if 1) this was a shared mapping,
> so no anon_vma, and 2) we call into GUP code with FOLL_WRITE, which would cause
> FLAG_UNSHARE to be passed, so we would end up in hugetlb_wp().

FOLL_WRITE should never result in FLAG_UNSHARE.

> 
>   mm/hugetlb.c | 17 +++++++++--------
>   1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6be78e7d4f6e..eb0d8a45505e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
>   	 * be acquired again before returning to the caller, as expected.
>   	 */
>   	spin_unlock(vmf->ptl);
> +
> +	/*
> +	 * When the original hugepage is shared one, it does not have
> +	 * anon_vma prepared.
> +	 */
> +	ret = vmf_anon_prepare(vmf);
> +	if (unlikely(ret))
> +		goto out_release_old;
> +
>   	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
>   
>   	if (IS_ERR(new_folio)) {
> @@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
>   		goto out_release_old;
>   	}
>   
> -	/*
> -	 * When the original hugepage is shared one, it does not have
> -	 * anon_vma prepared.
> -	 */
> -	ret = vmf_anon_prepare(vmf);
> -	if (unlikely(ret))
> -		goto out_release_all;
> -
>   	if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) {
>   		ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h));
>   		goto out_release_all;

The joy of hugetlb reservation code.

LGTM
Oscar Salvador May 21, 2024, 10:23 a.m. UTC | #2
On Tue, May 21, 2024 at 11:56:54AM +0200, David Hildenbrand wrote:
> On 21.05.24 09:34, Oscar Salvador wrote:
> > hugetlb_wp calls vmf_anon_prepare() after having allocated a page, which
> > means that we might need to call restore_reserve_on_error() upon error.
> > vmf_anon_prepare() releases the vma lock before returning, but
> > restore_reserve_on_error() expects the vma lock to be held by the caller.
> > 
> > Fix it by calling vmf_anon_prepare() before allocating the page.
> > 
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Fixes: 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of anon_vma_prepare()")
> > ---
> > I did not hit this bug; I just spotted it because I was looking at hugetlb_wp
> > for some other reason, and I did not want to get creative trying to trigger it
> > just to get a backtrace.
> > My assumption is that we could trigger this if 1) this was a shared mapping,
> > so no anon_vma, and 2) we call into GUP code with FOLL_WRITE, which would cause
> > FLAG_UNSHARE to be passed, so we would end up in hugetlb_wp().
> 
> FOLL_WRITE should never result in FLAG_UNSHARE.

You are right.
It was quite early when I looked at this and I managed to confuse myself when
reading hugetlb_follow_page_mask().
 
> The joy of hugetlb reservation code.
> 
> LGTM

thanks David!
Oscar Salvador May 27, 2024, 8:53 a.m. UTC | #3
On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote:
> I did not hit this bug; I just spotted it because I was looking at hugetlb_wp
> for some other reason, and I did not want to get creative trying to trigger it
> just to get a backtrace.
> My assumption is that we could trigger this if 1) this was a shared mapping,
> so no anon_vma, and 2) we call into GUP code with FOLL_WRITE, which would cause
> FLAG_UNSHARE to be passed, so we would end up in hugetlb_wp().

So I checked this again and I have to confess I am a bit confused.

hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page().

hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE}
hugetlb_no_page()->hugetlb_wp() upon FAULT_FLAG_WRITE && !VM_SHARED

hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED
mappings do not have vma->anon_vma, while others do.

hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees
that the mapping is shared.
So the only other way to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE.
For that to happen, gup_must_unshare() must return true, which means the following
assumptions must hold.

- For Anonymous pages:
  1) !PageAnonExclusive
- For Filebacked pages:
  2) We do not have a vma
  3) It is a COW mapping

1) If gup_must_unshare() returns true for Anonymous pages because the page is not
   exclusive and must be unshared, hugetlb_wp() will already see
   vma->anon_vma being initialized because of the previous
   hugetlb_no_page()->vmf_anon_prepare().

2) I do not quite understand this case.
3) !VM_SHARED mappings already had their anon_vma initialized in
   hugetlb_no_page()->vmf_anon_prepare().

Probably I am missing some bits here, but I cannot see how we would even
need vmf_anon_prepare() in hugetlb_wp() in the first place.
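
For reference, the three cases can be condensed into pseudo-C. This is a
hypothetical helper written for this discussion, not the upstream
gup_must_unshare() (a static inline in include/linux/mm.h), which also
checks the FOLL_PIN/FOLL_WRITE flags first:

	/* Hypothetical condensation of the cases above; not upstream code. */
	static bool must_unshare_sketch(struct vm_area_struct *vma,
					struct page *page)
	{
		if (PageAnon(page))
			return !PageAnonExclusive(page);	/* case 1 */
		if (!vma)
			return true;				/* case 2 */
		return is_cow_mapping(vma->vm_flags);		/* case 3 */
	}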
David Hildenbrand May 27, 2024, 1:17 p.m. UTC | #4
Am 27.05.24 um 10:53 schrieb Oscar Salvador:
> On Tue, May 21, 2024 at 09:34:46AM +0200, Oscar Salvador wrote:
>> I did not hit this bug; I just spotted it because I was looking at hugetlb_wp
>> for some other reason, and I did not want to get creative trying to trigger it
>> just to get a backtrace.
>> My assumption is that we could trigger this if 1) this was a shared mapping,
>> so no anon_vma, and 2) we call into GUP code with FOLL_WRITE, which would cause
>> FLAG_UNSHARE to be passed, so we would end up in hugetlb_wp().
> 
> So I checked this again and I have to confess I am a bit confused.
> 
> hugetlb_wp() can be called from either hugetlb_fault() or hugetlb_no_page().
> 
> hugetlb_fault()->hugetlb_wp() upon FAULT_FLAG_{WRITE,UNSHARE}
> hugetlb_no_page()->hugetlb_wp() upon FAULT_FLAG_WRITE && !VM_SHARED
> 
> hugetlb_no_page()->vmf_anon_prepare() upon !VM_SHARED, which means that VM_SHARED
> mappings do not have vma->anon_vma, while others do.
> 
> hugetlb_wp() will call set_huge_ptep_writable() right away and return if it sees
> that the mapping is shared.
> So the only other way to end up in hugetlb_wp() is via FAULT_FLAG_UNSHARE.
> For that to happen, gup_must_unshare() must return true, which means the following
> assumptions must hold.
> 
> - For Anonymous pages:
>    1) !PageAnonExclusive
> - For Filebacked pages:
>    2) We do not have a vma
>    3) It is a COW mapping
> 
> 1) If gup_must_unshare() returns true for Anonymous pages because the page is not
>    exclusive and must be unshared, hugetlb_wp() will already see
>    vma->anon_vma being initialized because of the previous
>    hugetlb_no_page()->vmf_anon_prepare().
> 
> 2) I do not quite understand this case.

gup_must_unshare() without a VMA is only used for GUP-fast. Before triggering a
page fault we always fall back to GUP-slow first, where we have a VMA.

IMHO we better not make assumptions that hugetlb_wp() will always already have 
an anon VMA.
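
As a rough illustration of that fallback (a hypothetical condensation of
internal_get_user_pages_fast() in mm/gup.c; names and signatures are
simplified):

	/* GUP-fast walks the page tables locklessly: no VMA available */
	nr_pinned = lockless_pages_from_mm(start, end, gup_flags, pages);
	if (nr_pinned < nr_pages)
		/* retry the remainder via GUP-slow under the mmap lock,
		 * where VMAs (and thus the fault path) are available */
		ret = __gup_longterm_locked(mm, start + nr_pinned * PAGE_SIZE,
					    nr_pages - nr_pinned,
					    pages + nr_pinned, &locked,
					    gup_flags);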
Oscar Salvador May 27, 2024, 1:54 p.m. UTC | #5
On Mon, May 27, 2024 at 03:17:01PM +0200, David Hildenbrand wrote:
> gup_must_unshare() without a VMA is only used for GUP-fast. Before
> triggering a page fault we always fall back to GUP-slow first, where we have
> a VMA.

I see, thanks for explaining!

> IMHO we better not make assumptions that hugetlb_wp() will always already
> have an anon VMA.

Fine by me. I just wanted to get a clearer picture, because I could not see
how we could end up in such a situation, but better safe than sorry.

@Andrew: This fix is to avoid an issue like [1], but for hugetlb_wp().

[1] https://lore.kernel.org/linux-mm/000000000000daf1e10615e64dcb@google.com/T/

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6be78e7d4f6e..eb0d8a45505e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6005,6 +6005,15 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(vmf->ptl);
+
+	/*
+	 * When the original hugepage is shared one, it does not have
+	 * anon_vma prepared.
+	 */
+	ret = vmf_anon_prepare(vmf);
+	if (unlikely(ret))
+		goto out_release_old;
+
 	new_folio = alloc_hugetlb_folio(vma, vmf->address, outside_reserve);
 
 	if (IS_ERR(new_folio)) {
@@ -6058,14 +6067,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 		goto out_release_old;
 	}
 
-	/*
-	 * When the original hugepage is shared one, it does not have
-	 * anon_vma prepared.
-	 */
-	ret = vmf_anon_prepare(vmf);
-	if (unlikely(ret))
-		goto out_release_all;
-
 	if (copy_user_large_folio(new_folio, old_folio, vmf->real_address, vma)) {
 		ret = VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(hstate_index(h));
 		goto out_release_all;