
[hmm,8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling

Message ID 20200311183506.3997-9-jgg@ziepe.ca (mailing list archive)
State New, archived
Series Various error case bug fixes for hmm_range_fault()

Commit Message

Jason Gunthorpe March 11, 2020, 6:35 p.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

Currently if a special PTE is encountered hmm_range_fault() immediately
returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
uses).

EFAULT should only be returned after testing with hmm_pte_need_fault().

Also pte_devmap() and pte_special() are exclusive, and there is no need to
check IS_ENABLED, pte_special() is stubbed out to return false on
unsupported architectures.

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
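
For reference, the stub the commit message refers to is essentially the
following on architectures that do not select CONFIG_ARCH_HAS_PTE_SPECIAL
(a simplified sketch of the usual arch-side definition, not a verbatim
quote of any particular header):

/*
 * Simplified sketch: without CONFIG_ARCH_HAS_PTE_SPECIAL the architecture's
 * pgtable.h provides a pte_special() that always returns false, so the
 * pte_special(pte) test is always false there and the IS_ENABLED() guard
 * removed by this patch was redundant.
 */
static inline int pte_special(pte_t pte)
{
	return 0;
}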

Comments

Ralph Campbell March 12, 2020, 1:38 a.m. UTC | #1
On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.
> 
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 19 ++++++++++++-------
>   1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f61fddf2ef6505..ca33d086bdc190 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   			pte_unmap(ptep);
>   			return -EBUSY;
>   		}
> -	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> -		if (!is_zero_pfn(pte_pfn(pte))) {
> +	}
> +
> +	/*
> +	 * Since each architecture defines a struct page for the zero page, just
> +	 * fall through and treat it like a normal page.
> +	 */
> +	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
> +		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
> +				   &write_fault);
> +		if (fault || write_fault) {
>   			pte_unmap(ptep);
> -			*pfn = range->values[HMM_PFN_SPECIAL];
>   			return -EFAULT;
>   		}
> -		/*
> -		 * Since each architecture defines a struct page for the zero
> -		 * page, just fall through and treat it like a normal page.
> -		 */
> +		*pfn = range->values[HMM_PFN_SPECIAL];
> +		return 0;
>   	}
>   
>   	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
>
Christoph Hellwig March 16, 2020, 9:13 a.m. UTC | #2
On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
fault on special ptes that aren't the zero page as an error.
Jason Gunthorpe March 16, 2020, 12:10 p.m. UTC | #3
On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > Currently if a special PTE is encountered hmm_range_fault() immediately
> > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > uses).
> > 
> > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > 
> > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > check IS_ENABLED, pte_special() is stubbed out to return false on
> > unsupported architectures.
> 
> I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> fault on special ptes that aren't the zero page as an error.
 
I have another series that is doing that - this change is to make the
next series make sense and not introduce new control logic too.

Even when this is switched to ERROR it still needs to have the
hmm_range_fault() logic this patch introduces.

Thanks,
Jason
Christoph Hellwig March 16, 2020, 12:49 p.m. UTC | #4
On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > 
> > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > uses).
> > > 
> > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > > 
> > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > unsupported architectures.
> > 
> > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > fault on special ptes that aren't the zero page as an error.
>  
> I have another series that is doing that - this change is to make the
> next series make sense and not introduce new control logic too.

Ok.  I had some cleanups like this based off older trees, but if you are
active in this area I think I'll let you handle it.

> Even when this is switched to ERROR it still needs to have the
> hmm_range_fault() logic this patch introduces.

True.
Christoph Hellwig March 16, 2020, 12:51 p.m. UTC | #5
On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jason Gunthorpe March 16, 2020, 1:04 p.m. UTC | #6
On Mon, Mar 16, 2020 at 01:49:53PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> > On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > 
> > > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > > uses).
> > > > 
> > > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > > > 
> > > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > > unsupported architectures.
> > > 
> > > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > > fault on special ptes that aren't the zero page as an error.
> >  
> > I have another series that is doing that - this change is to make the
> > next series make sense and not introduce new control logic too.
> 
> Ok.  I had some cleanups like this based off older trees, but if you are
> active in this area I think I'll let you handle it.

You once said you wanted to lose the weird pfn flags scheme, so
before putting hmm_range_fault in ODP I planned to do that.

If you have your series someplace, send me a URL and I'll look at it.

Thanks,
Jason
Christoph Hellwig March 16, 2020, 1:12 p.m. UTC | #7
On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > Ok.  I had some cleanups like this based off older trees, but if you are
> > active in this area I think I'll let you handle it.
> 
> You once said you wanted to lose the weird pfn flags scheme, so
> before putting hmm_range_fault in ODP I planned to do that.
> 
> If you have your series someplace, send me a URL and I'll look at it.

I have a local branch I just started hacking on, but it is rather broken
based on various discussions we had.  But for a basic one I'd suggest
something like:

 - kill HMM_PFN_SPECIAL as it serves no purpose
 - split the ->pfns array into an input flags (optional) and an output
   pfn (mandatory) one, using new flags for the input side
 - replace the output flags/values indirection with a bunch of values
   encoded in the high bits of a u64, with the rest used for the pfn
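
A rough sketch of what the last two bullets could look like (all of the
names and the exact bit layout below are hypothetical, not an existing
kernel API):

#include <linux/types.h>

/*
 * Hypothetical sketch only: pfn in the low bits of a u64, state flags in
 * the high bits, replacing the current flags/values indirection tables.
 */
#define HMM_SKETCH_VALID	(1ULL << 63)
#define HMM_SKETCH_WRITE	(1ULL << 62)
#define HMM_SKETCH_ERROR	(1ULL << 61)
#define HMM_SKETCH_FLAGS	(0x7ULL << 61)

struct hmm_range_sketch {
	unsigned long start;
	unsigned long end;
	const u64 *req_flags;	/* optional per-page input request flags */
	u64 *pfns;		/* mandatory output: flags | pfn */
};

static inline unsigned long hmm_sketch_to_pfn(u64 entry)
{
	return (unsigned long)(entry & ~HMM_SKETCH_FLAGS);
}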
Christoph Hellwig March 17, 2020, 12:32 p.m. UTC | #8
On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > Ok.  I had some cleanups like this based off older trees, but if you are
> > > active in this area I think I'll let you handle it.
> > 
> > You once said you wanted to lose the weird pfn flags scheme, so
> > before putting hmm_range_fault in ODP I planned to do that.
> > 
> > If you have your series someplace, send me a URL and I'll look at it.
> 
> I have a local branch I just started hacking on, but it is rather broken
> based on various discussions we had.  But for a basic one I'd suggest
> something like:
> 
>  - kill HMM_PFN_SPECIAL as it serves no purpose
>  - split the ->pfns array into an input flags (optional) and an output
>    pfn (mandatory) one, using new flags for the input side
>  - replace the output flags/values indirection with a bunch of values
>    encoded in the high bits of a u64, with the rest used for the pfn

Thinking out loud a bit more:

 - do we really need HMM_PFN_ERROR, or is just a return value from
   hmm_range_fault enough?
 - because if it is we don't need output flags at all, and the output
   array could be struct pages, which would make for a much easier
   to use API
Jason Gunthorpe March 17, 2020, 12:53 p.m. UTC | #9
On Tue, Mar 17, 2020 at 01:32:10PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > > Ok.  I had some cleanups like this based off older trees, but if you are
> > > > active in this area I think I'll let you handle it.
> > > 
> > > You once said you wanted to lose the weird pfn flags scheme, so
> > > before putting hmm_range_fault in ODP I planned to do that.
> > > 
> > > If you have your series someplace, send me a URL and I'll look at it.
> > 
> > I have a local branch I just started hacking on, but it is rather broken
> > based on various discussions we had.  But for a basic one I'd suggest
> > something like:
> > 
> >  - kill HMM_PFN_SPECIAL as it serves no purpose
> >  - split the ->pfns array into an input flags (optional) and an output
> >    pfn (mandatory) one, using new flags for the input side
> >  - replace the output flags/values indirection with a bunch of values
> >    encoded in the high bits of a u64, with the rest used for the pfn
> 
> Thinking out loud a bit more:
> 
>  - do we really need HMM_PFN_ERROR, or is just a return value from
>    hmm_range_fault enough?

I'm not totally clear on this. The only use for ERROR is to signal to a
non-faulting hmm_range_fault (ie snapshot) that the page should generate a
device fault (ie device SIGSEGV).

We can also handle ERROR by having the device take the fault to CPU,
then fail during a faulting hmm_range_fault and then dropping the
ERROR indication toward the device.

If we already know the page cannot be faulted when we read it, it feels
natural to return that too.

I have a patch, which now needs rebasing, that removes the PFN_ERROR
from the faulting cases. So only non-faulting hmm_range_fault users
will see it; faulting users will always see an errno return.

>  - because if it is we don't need output flags at all, and the output
>    array could be struct pages, which would make for a much easier
>    to use API

valid and write are required for the non-faulting case, so I don't
think the flags can go away.

Being able to do a non-faulting hmm_range_fault is going to be a big
performance win for ODP, this is my interest here.

Jason
Christoph Hellwig March 17, 2020, 1:06 p.m. UTC | #10
On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > Thinking out loud a bit more:
> > 
> >  - do we really need HMM_PFN_ERROR, or is just a return value from
> >    hmm_range_fault enough?
> 
> I'm not totally clear on this. The only use for ERROR is to signal to a
> non-faulting hmm_range_fault (ie snapshot) that the page should generate a
> device fault (ie device SIGSEGV).
> 
> We can also handle ERROR by having the device take the fault to CPU,
> then fail during a faulting hmm_range_fault and then dropping the
> ERROR indication toward the device.
> 
> If we already know the page cannot be faulted when we read it, it feels
> natural to return that too.

True.  Of course we could just stick an ERR_PTR into the page array
in this case.

> >  - because if it is we don't need output flags at all, and the output
> >    array could be struct pages, which would make for a much easier
> >    to use API
> 
> valid and write are required for the non-faulting case, so I don't
> think the flags can go away.

Do we have any cases where these aren't simply kept from the input array?
I couldn't find any, but I've not understood some of the subtle details
before.
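
To make the ERR_PTR idea above concrete, here is a minimal sketch of how a
caller could consume a hypothetical struct page * output array (the array
itself does not exist today; only ERR_PTR()/IS_ERR()/PTR_ERR() from
<linux/err.h> are real kernel interfaces):

#include <linux/err.h>
#include <linux/mm.h>

/*
 * Illustrative only: if hmm_range_fault() filled an array of struct page *,
 * per-page errors could be carried in-band as ERR_PTR() values instead of a
 * separate output flag word.
 */
static int consume_pages(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (IS_ERR(pages[i]))
			return PTR_ERR(pages[i]);	/* e.g. -EFAULT */
		/* ... map pages[i] into the device here ... */
	}
	return 0;
}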
Jason Gunthorpe March 17, 2020, 1:25 p.m. UTC | #11
On Tue, Mar 17, 2020 at 02:06:08PM +0100, Christoph Hellwig wrote:
> On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > > Thinking out loud a bit more:
> > > 
> > >  - do we really need HMM_PFN_ERROR, or is just a return value from
> > >    hmm_range_fault enough?
> > 
> > I'm not totally clear on this. The only use for ERROR is to signal to a
> > non-faulting hmm_range_fault (ie snapshot) that the page should generate a
> > device fault (ie device SIGSEGV).
> > 
> > We can also handle ERROR by having the device take the fault to CPU,
> > then fail during a faulting hmm_range_fault and then dropping the
> > ERROR indication toward the device.
> > 
> > If we already know the page cannot be faulted when we read it, it feels
> > natural to return that too.
> 
> True.  Of course we could just stick an ERR_PTR into the page array
> in this case.

If we make the output here struct page *[] then there is also no
reason for hmm_range_fault to exist; get_user_pages already works like
that.

I see nearly the entire point of hmm_range_fault as being able to
return the flags.

> > >  - because if it is we don't need output flags at all, and the output
> > >    array could be struct pages, which would make for a much easier
> > >    to use API
> > 
> > valid and write are required for the non-faulting case, so I don't
> > think the flags can go away.
> 
> Do we have any cases where these aren't simply kept from the input array?
> I couldn't find any, but I've not understood some of the subtle details
> before.

Not today, I think

All this stuff becomes interesting when we have a non-faulting caller
of hmm_range_fault(). Then the input request flags will all be 0 and
the output flags will indicate the current state of the page table.

Jason

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index f61fddf2ef6505..ca33d086bdc190 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -335,16 +335,21 @@  static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			pte_unmap(ptep);
 			return -EBUSY;
 		}
-	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
-		if (!is_zero_pfn(pte_pfn(pte))) {
+	}
+
+	/*
+	 * Since each architecture defines a struct page for the zero page, just
+	 * fall through and treat it like a normal page.
+	 */
+	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+				   &write_fault);
+		if (fault || write_fault) {
 			pte_unmap(ptep);
-			*pfn = range->values[HMM_PFN_SPECIAL];
 			return -EFAULT;
 		}
-		/*
-		 * Since each architecture defines a struct page for the zero
-		 * page, just fall through and treat it like a normal page.
-		 */
+		*pfn = range->values[HMM_PFN_SPECIAL];
+		return 0;
 	}
 
 	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
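
For readers following why the -EFAULT above is now conditional:
hmm_pte_need_fault() compares what the caller requested against what the
CPU page table currently provides. A simplified, standalone restatement of
that decision follows; the names are illustrative, not the kernel's exact
code:

#include <stdbool.h>
#include <stdint.h>

#define REQ_VALID (1u << 0)	/* caller wants a usable mapping */
#define REQ_WRITE (1u << 1)	/* caller wants write access */

/*
 * Illustrative stand-in for the decision: report a fault only when the
 * caller asked for something the CPU PTE cannot currently provide.
 */
static void need_fault(uint32_t req_flags, uint32_t cpu_flags,
		       bool *fault, bool *write_fault)
{
	*fault = *write_fault = false;

	if (!(req_flags & REQ_VALID))		/* nothing was requested */
		return;

	*fault = !(cpu_flags & REQ_VALID);	/* not usable at all */
	if ((req_flags & REQ_WRITE) && !(cpu_flags & REQ_WRITE)) {
		*write_fault = true;
		*fault = true;
	}
}

With cpu_flags of 0, as in the special-PTE case in the patch, this reports
a fault exactly when the caller actually asked for the page, which is why
-EFAULT is now returned only in that case.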