Message ID: 20200311183506.3997-9-jgg@ziepe.ca (mailing list archive)
State: New, archived
Series: Various error case bug fixes for hmm_range_fault()
On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
>
> EFAULT should only be returned after testing with hmm_pte_need_fault().
>
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.
>
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>  mm/hmm.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f61fddf2ef6505..ca33d086bdc190 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  			pte_unmap(ptep);
>  			return -EBUSY;
>  		}
> -	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> -		if (!is_zero_pfn(pte_pfn(pte))) {
> +	}
> +
> +	/*
> +	 * Since each architecture defines a struct page for the zero page, just
> +	 * fall through and treat it like a normal page.
> +	 */
> +	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
> +		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
> +				   &write_fault);
> +		if (fault || write_fault) {
>  			pte_unmap(ptep);
> -			*pfn = range->values[HMM_PFN_SPECIAL];
>  			return -EFAULT;
>  		}
> -		/*
> -		 * Since each architecture defines a struct page for the zero
> -		 * page, just fall through and treat it like a normal page.
> -		 */
> +		*pfn = range->values[HMM_PFN_SPECIAL];
> +		return 0;
>  	}
>
>  	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
>
> EFAULT should only be returned after testing with hmm_pte_need_fault().
>
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
fault on special ptes that aren't the zero page as an error.
On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> >
> > Currently if a special PTE is encountered hmm_range_fault() immediately
> > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > uses).
> >
> > EFAULT should only be returned after testing with hmm_pte_need_fault().
> >
> > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > check IS_ENABLED, pte_special() is stubbed out to return false on
> > unsupported architectures.
>
> I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> fault on special ptes that aren't the zero page as an error.

I have another series that is doing that - this change is to make the
next series make sense and not introduce new control logic too.

Even when this is switched to ERROR it still needs to have the
hmm_range_fault() logic this patch introduces.

Thanks,
Jason
On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > >
> > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > uses).
> > >
> > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > >
> > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > unsupported architectures.
> >
> > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > fault on special ptes that aren't the zero page as an error.
>
> I have another series that is doing that - this change is to make the
> next series make sense and not introduce new control logic too.

Ok. I had some cleanups like this based off older trees, but if you are
active in this area I think I'll let you handle it.

> Even when this is switched to ERROR it still needs to have the
> hmm_range_fault() logic this patch introduces.

True.
On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
>
> EFAULT should only be returned after testing with hmm_pte_need_fault().
>
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
On Mon, Mar 16, 2020 at 01:49:53PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> > On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > >
> > > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > > uses).
> > > >
> > > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > > >
> > > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > > unsupported architectures.
> > >
> > > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > > fault on special ptes that aren't the zero page as an error.
> >
> > I have another series that is doing that - this change is to make the
> > next series make sense and not introduce new control logic too.
>
> Ok. I had some cleanups like this based off older trees, but if you are
> active in this area I think I'll let you handle it.

You once said you wanted to lose the weird pfn flags scheme, so before
putting hmm_range_fault in ODP I planned to do that.

If you have your series someplace, send me a URL and I'll look at it.

Thanks,
Jason
On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > Ok. I had some cleanups like this based off older trees, but if you are
> > active in this area I think I'll let you handle it.
>
> You once said you wanted to lose the weird pfn flags scheme, so before
> putting hmm_range_fault in ODP I planned to do that.
>
> If you have your series someplace, send me a URL and I'll look at it.

I have a local branch I just started hacking on, but it is rather broken
based on various discussions we had. But for a basic one I'd suggest
something like:

 - kill HMM_PFN_SPECIAL as it serves no purpose
 - split the ->pfns array into an input flags (optional) and an output
   pfn (mandatory) one, using new flags for the input side
 - replace the output flags/values indirection with a bunch of values
   encoded in the high bits of a u64, with the rest used for the pfn
On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > Ok. I had some cleanups like this based off older trees, but if you are
> > > active in this area I think I'll let you handle it.
> >
> > You once said you wanted to lose the weird pfn flags scheme, so before
> > putting hmm_range_fault in ODP I planned to do that.
> >
> > If you have your series someplace, send me a URL and I'll look at it.
>
> I have a local branch I just started hacking on, but it is rather broken
> based on various discussions we had. But for a basic one I'd suggest
> something like:
>
>  - kill HMM_PFN_SPECIAL as it serves no purpose
>  - split the ->pfns array into an input flags (optional) and an output
>    pfn (mandatory) one, using new flags for the input side
>  - replace the output flags/values indirection with a bunch of values
>    encoded in the high bits of a u64, with the rest used for the pfn

Thinking out loud a bit more:

 - do we really need HMM_PFN_ERROR, or is just a return value from
   hmm_range_fault enough?
 - because if it is we don't need output flags at all, and the output
   array could be struct pages, which would make for a much easier
   to use API
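For a concrete picture of the "flags encoded in the high bits of a u64" idea, a packing scheme along those lines could look like the sketch below. The macro names and bit positions are hypothetical illustrations, not the hmm.h interface under discussion:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: output flags live in the top bits of each u64
 * array slot, the pfn occupies the rest. Names are illustrative only.
 */
#define SK_PFN_VALID (1ULL << 63)
#define SK_PFN_WRITE (1ULL << 62)
#define SK_PFN_ERROR (1ULL << 61)
#define SK_PFN_FLAGS (SK_PFN_VALID | SK_PFN_WRITE | SK_PFN_ERROR)

/* Pack a pfn and its output flags into one u64 slot. */
static inline uint64_t sk_pfn_pack(uint64_t pfn, uint64_t flags)
{
	return (pfn & ~SK_PFN_FLAGS) | flags;
}

/* Recover the bare pfn by masking off the flag bits. */
static inline uint64_t sk_pfn_unpack(uint64_t slot)
{
	return slot & ~SK_PFN_FLAGS;
}
```

This removes the range->flags/range->values indirection entirely: a caller tests bits directly instead of comparing against per-driver flag tables.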
On Tue, Mar 17, 2020 at 01:32:10PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > > Ok. I had some cleanups like this based off older trees, but if you are
> > > > active in this area I think I'll let you handle it.
> > >
> > > You once said you wanted to lose the weird pfn flags scheme, so before
> > > putting hmm_range_fault in ODP I planned to do that.
> > >
> > > If you have your series someplace, send me a URL and I'll look at it.
> >
> > I have a local branch I just started hacking on, but it is rather broken
> > based on various discussions we had. But for a basic one I'd suggest
> > something like:
> >
> >  - kill HMM_PFN_SPECIAL as it serves no purpose
> >  - split the ->pfns array into an input flags (optional) and an output
> >    pfn (mandatory) one, using new flags for the input side
> >  - replace the output flags/values indirection with a bunch of values
> >    encoded in the high bits of a u64, with the rest used for the pfn
>
> Thinking out loud a bit more:
>
> - do we really need HMM_PFN_ERROR, or is just a return value from
>   hmm_range_fault enough?

I'm not totally clear on this. The only use for ERROR is to signal to a
non-faulting hmm_range_fault (ie snapshot) that the page should generate
a device fault (ie device SIGSEGV).

We can also handle ERROR by having the device take the fault to CPU,
then fail during a faulting hmm_range_fault, and then dropping the ERROR
indication toward the device.

If we already know the page cannot be faulted when we read it, it feels
natural to return that too.

I have a patch, that now needs rebasing, which removes the PFN_ERROR
from the faulting cases. So only non-faulting hmm_range_fault users will
see it; faulting users will always see an errno return.

> - because if it is we don't need output flags at all, and the output
>   array could be struct pages, which would make for a much easier
>   to use API

valid and write are required for the non-faulting case, so I don't think
the flags can go away.

Being able to do a non-faulting hmm_range_fault is going to be a big
performance win for ODP; this is my interest here.

Jason
On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > Thinking out loud a bit more:
> >
> > - do we really need HMM_PFN_ERROR, or is just a return value from
> >   hmm_range_fault enough?
>
> I'm not totally clear on this. The only use for ERROR is to signal to a
> non-faulting hmm_range_fault (ie snapshot) that the page should generate
> a device fault (ie device SIGSEGV).
>
> We can also handle ERROR by having the device take the fault to CPU,
> then fail during a faulting hmm_range_fault, and then dropping the ERROR
> indication toward the device.
>
> If we already know the page cannot be faulted when we read it, it feels
> natural to return that too.

True. Of course we could just stick an ERR_PTR into the page array
in this case.

> > - because if it is we don't need output flags at all, and the output
> >   array could be struct pages, which would make for a much easier
> >   to use API
>
> valid and write are required for the non-faulting case, so I don't think
> the flags can go away.

Do we have any cases where these aren't simply kept from the input array?
I couldn't find any, but I've not understood some of the subtle details
before.
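The ERR_PTR trick mentioned here is the stock kernel pattern from include/linux/err.h: a small negative errno is cast into the pointer value itself, so an error can travel in a struct page * slot. A minimal userspace rendition of those helpers, for illustration only:

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error values occupy the topmost MAX_ERRNO addresses, which no
 * valid allocation can ever use. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

A snapshot walker could then drop ERR_PTR(-EFAULT) into an output slot for a page that cannot be faulted, instead of carrying a separate HMM_PFN_ERROR flag.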
On Tue, Mar 17, 2020 at 02:06:08PM +0100, Christoph Hellwig wrote:
> On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > > Thinking out loud a bit more:
> > >
> > > - do we really need HMM_PFN_ERROR, or is just a return value from
> > >   hmm_range_fault enough?
> >
> > I'm not totally clear on this. The only use for ERROR is to signal to a
> > non-faulting hmm_range_fault (ie snapshot) that the page should generate
> > a device fault (ie device SIGSEGV).
> >
> > We can also handle ERROR by having the device take the fault to CPU,
> > then fail during a faulting hmm_range_fault, and then dropping the ERROR
> > indication toward the device.
> >
> > If we already know the page cannot be faulted when we read it, it feels
> > natural to return that too.
>
> True. Of course we could just stick an ERR_PTR into the page array
> in this case.

If we make the output here struct page *[] then there is also no reason
for hmm_range_fault to exist; get_user_pages already works like that. I
see nearly the entire point of hmm_range_fault as being able to return
the flags.

> > > - because if it is we don't need output flags at all, and the output
> > >   array could be struct pages, which would make for a much easier
> > >   to use API
> >
> > valid and write are required for the non-faulting case, so I don't think
> > the flags can go away.
>
> Do we have any cases where these aren't simply kept from the input array?
> I couldn't find any, but I've not understood some of the subtle details
> before.

Not today, I think.

All this stuff becomes interesting when we have a non-faulting caller of
hmm_range_fault(). Then the input request flags will all be 0 and the
output flags will indicate the current state of the page table.

Jason
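The non-faulting mode described above can be sketched roughly as follows: with every input request flag zero the walker never faults anything in, it only mirrors what the page table currently holds into the output flags. All names here are hypothetical, not the hmm_range_fault() interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output flags reporting current page-table state. */
#define OUT_VALID (1u << 0) /* a page is mapped at this index */
#define OUT_WRITE (1u << 1) /* the mapping is writable */

/* Toy stand-in for a PTE, just enough state for the sketch. */
struct toy_pte {
	int present;
	int writable;
};

/*
 * Snapshot-mode walk: with all input request flags zero we never
 * fault, we only report the state we find, pte by pte.
 */
static void toy_snapshot(const struct toy_pte *ptes, size_t n,
			 uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = 0;
		if (ptes[i].present) {
			out[i] |= OUT_VALID;
			if (ptes[i].writable)
				out[i] |= OUT_WRITE;
		}
	}
}
```

A device driver would consume such a snapshot to build mirror page tables, and only escalate to a faulting call for the slots that came back without OUT_VALID.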
diff --git a/mm/hmm.c b/mm/hmm.c
index f61fddf2ef6505..ca33d086bdc190 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			pte_unmap(ptep);
 			return -EBUSY;
 		}
-	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
-		if (!is_zero_pfn(pte_pfn(pte))) {
+	}
+
+	/*
+	 * Since each architecture defines a struct page for the zero page, just
+	 * fall through and treat it like a normal page.
+	 */
+	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+				   &write_fault);
+		if (fault || write_fault) {
 			pte_unmap(ptep);
-			*pfn = range->values[HMM_PFN_SPECIAL];
 			return -EFAULT;
 		}
-		/*
-		 * Since each architecture defines a struct page for the zero
-		 * page, just fall through and treat it like a normal page.
-		 */
+		*pfn = range->values[HMM_PFN_SPECIAL];
+		return 0;
 	}

 	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;