diff mbox series

[hmm,3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock

Message ID 20200311183506.3997-4-jgg@ziepe.ca (mailing list archive)
State New, archived
Headers show
Series Various error case bug fixes for hmm_range_fault() | expand

Commit Message

Jason Gunthorpe March 11, 2020, 6:35 p.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

This eventually calls into handle_mm_fault() which is a sleeping function.
Release the lock first.

hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
need the lock.

Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

Ralph Campbell March 12, 2020, 1:31 a.m. UTC | #1
On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.
> 
> Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
> Cc: Steven Price <steven.price@arm.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9e8f68eb83287a..32dcbfd3908315 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   
>   	pud = READ_ONCE(*pudp);
>   	if (pud_none(pud)) {
> -		ret = hmm_vma_walk_hole(start, end, -1, walk);
> -		goto out_unlock;
> +		spin_unlock(ptl);
> +		return hmm_vma_walk_hole(start, end, -1, walk);
>   	}
>   
>   	if (pud_huge(pud) && pud_devmap(pud)) {
> @@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		bool fault, write_fault;
>   
>   		if (!pud_present(pud)) {
> -			ret = hmm_vma_walk_hole(start, end, -1, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole(start, end, -1, walk);
>   		}
>   
>   		i = (addr - range->start) >> PAGE_SHIFT;
> @@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
>   				     cpu_flags, &fault, &write_fault);
>   		if (fault || write_fault) {
> -			ret = hmm_vma_walk_hole_(addr, end, fault,
> -						 write_fault, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
> +						  walk);
>   		}
>   
>   		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>
Steven Price March 12, 2020, 8:54 a.m. UTC | #2
On 11/03/2020 18:35, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.
> 
> Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
> Cc: Steven Price <steven.price@arm.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Sorry about that, thanks for fixing.

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   mm/hmm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9e8f68eb83287a..32dcbfd3908315 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   
>   	pud = READ_ONCE(*pudp);
>   	if (pud_none(pud)) {
> -		ret = hmm_vma_walk_hole(start, end, -1, walk);
> -		goto out_unlock;
> +		spin_unlock(ptl);
> +		return hmm_vma_walk_hole(start, end, -1, walk);
>   	}
>   
>   	if (pud_huge(pud) && pud_devmap(pud)) {
> @@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		bool fault, write_fault;
>   
>   		if (!pud_present(pud)) {
> -			ret = hmm_vma_walk_hole(start, end, -1, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole(start, end, -1, walk);
>   		}
>   
>   		i = (addr - range->start) >> PAGE_SHIFT;
> @@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
>   				     cpu_flags, &fault, &write_fault);
>   		if (fault || write_fault) {
> -			ret = hmm_vma_walk_hole_(addr, end, fault,
> -						 write_fault, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
> +						  walk);
>   		}
>   
>   		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>
Christoph Hellwig March 16, 2020, 9:05 a.m. UTC | #3
On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.

So how did this manage to not be noticed before?

The fix looks fine assuming we want something backportable before
starting the cleanups:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jason Gunthorpe March 16, 2020, 12:56 p.m. UTC | #4
On Mon, Mar 16, 2020 at 10:05:03AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > This eventually calls into handle_mm_fault() which is a sleeping function.
> > Release the lock first.
> > 
> > hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> > need the lock.
> 
> So how did this manage to not be noticed before?

I'm still struggling a bit with how this PUD stuff works.. However
AFAICT:

1) The first hunk around pud_none() is actualy covering a race. In the
   non-race case the page walker will directly call
   hmm_vma_walk_hole() for pud_none so it would be very hard to hit
   this

2) I'm not 100% sure, but I think pud_present() == pud_none(), as
   there is no swap entry for PUD I don't know what a non-present
   but non-none entry is or how to create one. This is possibly dead
   code due to #1

3) To hit hmm_range_need_fault() requesting fault you would need
   a read-only huge PUD and a fault requesting write. I suspect
   creating a read only huge PUD entry is very rare - not something
   someone would deliberately construct.

4) To even trigger the PUD flow at all you need the 1G THP or the
   special 1G DAX pages which I strongly suspect people are not
   testing with.

> The fix looks fine assuming we want something backportable before
> starting the cleanups:

I found it easier to make the other cleanup patches make sense if they
didn't introduce all kinds of new logic too..

Thanks,
Jason
diff mbox series

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index 9e8f68eb83287a..32dcbfd3908315 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -473,8 +473,8 @@  static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 
 	pud = READ_ONCE(*pudp);
 	if (pud_none(pud)) {
-		ret = hmm_vma_walk_hole(start, end, -1, walk);
-		goto out_unlock;
+		spin_unlock(ptl);
+		return hmm_vma_walk_hole(start, end, -1, walk);
 	}
 
 	if (pud_huge(pud) && pud_devmap(pud)) {
@@ -483,8 +483,8 @@  static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		bool fault, write_fault;
 
 		if (!pud_present(pud)) {
-			ret = hmm_vma_walk_hole(start, end, -1, walk);
-			goto out_unlock;
+			spin_unlock(ptl);
+			return hmm_vma_walk_hole(start, end, -1, walk);
 		}
 
 		i = (addr - range->start) >> PAGE_SHIFT;
@@ -495,9 +495,9 @@  static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
 				     cpu_flags, &fault, &write_fault);
 		if (fault || write_fault) {
-			ret = hmm_vma_walk_hole_(addr, end, fault,
-						 write_fault, walk);
-			goto out_unlock;
+			spin_unlock(ptl);
+			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
+						  walk);
 		}
 
 		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);