Message ID | 148910376733.27406.9048213402296101821.stgit@djiang5-desk3.ch.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 9, 2017 at 3:56 PM, Dave Jiang <dave.jiang@intel.com> wrote: > Jeff Moyer reports that: > " > With a device dax alignment of 4KB or 2MB, I get sigbus when running the > attached fio job file for the current kernel (4.11.0-rc1+). If I > specify an alignment of 1GB, it works. > > I turned on debug output, and saw that it was failing in the huge fault > code. > > [ 4614.138357] dax dax1.0: dax_open > [ 4614.154838] dax dax1.0: dax_mmap > [ 4614.171898] dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - 0x7f0ce0800000) > [ 4614.211720] dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf600) failed > [ 4614.568911] dax dax1.0: dax_release > > fio config for reproduce: > [global] > ioengine=dev-dax > direct=0 > filename=/dev/dax0.0 > bs=2m > > [write] > rw=write > > [read] > stonewall > rw=read > " > > It looks like the code does not fallback at all when handling faults. The > fault handlers are missing some boundary checking for the VMA. Also, we need > to fall back if the alignment requested is less than the P-entry size. > > Reported-by: Jeff Moyer <jmoyer@redhat.com> > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/dax/dax.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c > index 174690a..5c7998e 100644 > --- a/drivers/dax/dax.c > +++ b/drivers/dax/dax.c > @@ -480,12 +480,21 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) > return VM_FAULT_SIGBUS; > } > > + /* fall back if we are outside of the VMA */ > + if (pmd_addr < vmf->vma->vm_start) > + return VM_FAULT_FALLBACK; > + if ((pmd_addr + PMD_SIZE) > vmf->vma->vm_end) > + return VM_FAULT_FALLBACK; I think these are already handled by check_vma(), and we only want to fallback if dax_region->align < PMD_SIZE, otherwise we're violating device-dax's guaranteed fault granularity. > + > + if (dax_region->align < PMD_SIZE) > + return VM_FAULT_FALLBACK; > + > pgoff = linear_page_index(vmf->vma, pmd_addr); > phys = pgoff_to_phys(dax_dev, pgoff, PMD_SIZE); > if (phys == -1) { > dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, > pgoff); > - return VM_FAULT_SIGBUS; > + return VM_FAULT_FALLBACK; If we get past all the checks above then this is a failure. > } > > pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); > @@ -519,12 +528,21 @@ static int __dax_dev_pud_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) > return VM_FAULT_SIGBUS; > } > > + /* fall back if we are outside of the VMA */ > + if (pud_addr < vmf->vma->vm_start) > + return VM_FAULT_FALLBACK; > + if ((pud_addr + PUD_SIZE) > vmf->vma->vm_end) > + return VM_FAULT_FALLBACK; > + > + if (dax_region->align < PUD_SIZE) > + return VM_FAULT_FALLBACK; > + > pgoff = linear_page_index(vmf->vma, pud_addr); > phys = pgoff_to_phys(dax_dev, pgoff, PUD_SIZE); > if (phys == -1) { > dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, > pgoff); > - return VM_FAULT_SIGBUS; > + return VM_FAULT_FALLBACK; Same comments as above. The dax_region->align check should catch all the valid fallback cases and check_vma() should clean up everything else. > } > > pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); >
On 03/09/2017 05:03 PM, Dan Williams wrote: > On Thu, Mar 9, 2017 at 3:56 PM, Dave Jiang <dave.jiang@intel.com> wrote: >> Jeff Moyer reports that: >> " >> With a device dax alignment of 4KB or 2MB, I get sigbus when running the >> attached fio job file for the current kernel (4.11.0-rc1+). If I >> specify an alignment of 1GB, it works. >> >> I turned on debug output, and saw that it was failing in the huge fault >> code. >> >> [ 4614.138357] dax dax1.0: dax_open >> [ 4614.154838] dax dax1.0: dax_mmap >> [ 4614.171898] dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - 0x7f0ce0800000) >> [ 4614.211720] dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf600) failed >> [ 4614.568911] dax dax1.0: dax_release >> >> fio config for reproduce: >> [global] >> ioengine=dev-dax >> direct=0 >> filename=/dev/dax0.0 >> bs=2m >> >> [write] >> rw=write >> >> [read] >> stonewall >> rw=read >> " >> >> It looks like the code does not fallback at all when handling faults. The >> fault handlers are missing some boundary checking for the VMA. Also, we need >> to fall back if the alignment requested is less than the P-entry size. >> >> Reported-by: Jeff Moyer <jmoyer@redhat.com> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> >> --- >> drivers/dax/dax.c | 22 ++++++++++++++++++++-- >> 1 file changed, 20 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c >> index 174690a..5c7998e 100644 >> --- a/drivers/dax/dax.c >> +++ b/drivers/dax/dax.c >> @@ -480,12 +480,21 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) >> return VM_FAULT_SIGBUS; >> } >> >> + /* fall back if we are outside of the VMA */ >> + if (pmd_addr < vmf->vma->vm_start) >> + return VM_FAULT_FALLBACK; >> + if ((pmd_addr + PMD_SIZE) > vmf->vma->vm_end) >> + return VM_FAULT_FALLBACK; > > I think these are already handled by check_vma(), and we only want to > fallback if dax_region->align < PMD_SIZE, otherwise we're violating > device-dax's guaranteed fault granularity. I was seeing pud_addr < vmf->vma->vm_start in the pud handler. It doesn't look like check_vma() checks the boundaries. Also, it wouldn't know the pud_addr or pmd_addr. Or are you saying it doesn't matter? > >> + >> + if (dax_region->align < PMD_SIZE) >> + return VM_FAULT_FALLBACK; >> + >> pgoff = linear_page_index(vmf->vma, pmd_addr); >> phys = pgoff_to_phys(dax_dev, pgoff, PMD_SIZE); >> if (phys == -1) { >> dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, >> pgoff); >> - return VM_FAULT_SIGBUS; >> + return VM_FAULT_FALLBACK; > > If we get past all the checks above then this is a failure. > >> } >> >> pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); >> @@ -519,12 +528,21 @@ static int __dax_dev_pud_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) >> return VM_FAULT_SIGBUS; >> } >> >> + /* fall back if we are outside of the VMA */ >> + if (pud_addr < vmf->vma->vm_start) >> + return VM_FAULT_FALLBACK; >> + if ((pud_addr + PUD_SIZE) > vmf->vma->vm_end) >> + return VM_FAULT_FALLBACK; >> + >> + if (dax_region->align < PUD_SIZE) >> + return VM_FAULT_FALLBACK; >> + >> pgoff = linear_page_index(vmf->vma, pud_addr); >> phys = pgoff_to_phys(dax_dev, pgoff, PUD_SIZE); >> if (phys == -1) { >> dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, >> pgoff); >> - return VM_FAULT_SIGBUS; >> + return VM_FAULT_FALLBACK; > > Same comments as above. The dax_region->align check should catch all > the valid fallback cases and check_vma() should clean up everything > else. > >> } >> >> pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); >>
On Thu, Mar 9, 2017 at 4:11 PM, Dave Jiang <dave.jiang@intel.com> wrote: > > > On 03/09/2017 05:03 PM, Dan Williams wrote: >> On Thu, Mar 9, 2017 at 3:56 PM, Dave Jiang <dave.jiang@intel.com> wrote: >>> Jeff Moyer reports that: >>> " >>> With a device dax alignment of 4KB or 2MB, I get sigbus when running the >>> attached fio job file for the current kernel (4.11.0-rc1+). If I >>> specify an alignment of 1GB, it works. >>> >>> I turned on debug output, and saw that it was failing in the huge fault >>> code. >>> >>> [ 4614.138357] dax dax1.0: dax_open >>> [ 4614.154838] dax dax1.0: dax_mmap >>> [ 4614.171898] dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - 0x7f0ce0800000) >>> [ 4614.211720] dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf600) failed >>> [ 4614.568911] dax dax1.0: dax_release >>> >>> fio config for reproduce: >>> [global] >>> ioengine=dev-dax >>> direct=0 >>> filename=/dev/dax0.0 >>> bs=2m >>> >>> [write] >>> rw=write >>> >>> [read] >>> stonewall >>> rw=read >>> " >>> >>> It looks like the code does not fallback at all when handling faults. The >>> fault handlers are missing some boundary checking for the VMA. Also, we need >>> to fall back if the alignment requested is less than the P-entry size. >>> >>> Reported-by: Jeff Moyer <jmoyer@redhat.com> >>> Signed-off-by: Dave Jiang <dave.jiang@intel.com> >>> --- >>> drivers/dax/dax.c | 22 ++++++++++++++++++++-- >>> 1 file changed, 20 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c >>> index 174690a..5c7998e 100644 >>> --- a/drivers/dax/dax.c >>> +++ b/drivers/dax/dax.c >>> @@ -480,12 +480,21 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) >>> return VM_FAULT_SIGBUS; >>> } >>> >>> + /* fall back if we are outside of the VMA */ >>> + if (pmd_addr < vmf->vma->vm_start) >>> + return VM_FAULT_FALLBACK; >>> + if ((pmd_addr + PMD_SIZE) > vmf->vma->vm_end) >>> + return VM_FAULT_FALLBACK; >> >> I think these are already handled by check_vma(), and we only want to >> fallback if dax_region->align < PMD_SIZE, otherwise we're violating >> device-dax's guaranteed fault granularity. > > I was seeing pud_addr < vmf->vma->vm_start in the pud handler. It > doesn't look like check_vma() checks the boundaries. Sorry, yes, we do need to do the vma boundary checks in __dax_dev_{pmd,pud}_fault() since check_vma() does not know the fault size. We just can't fallback blindly. > Also, it wouldn't > know the pud_addr or pmd_addr. Or are you saying it doesn't matter? If the fault does not align to the vma we need to fall back, but only if the current fault size is > dax_region->align and the fallback fault size is >= dax_region->align.
diff --git a/drivers/dax/dax.c b/drivers/dax/dax.c index 174690a..5c7998e 100644 --- a/drivers/dax/dax.c +++ b/drivers/dax/dax.c @@ -480,12 +480,21 @@ static int __dax_dev_pmd_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) return VM_FAULT_SIGBUS; } + /* fall back if we are outside of the VMA */ + if (pmd_addr < vmf->vma->vm_start) + return VM_FAULT_FALLBACK; + if ((pmd_addr + PMD_SIZE) > vmf->vma->vm_end) + return VM_FAULT_FALLBACK; + + if (dax_region->align < PMD_SIZE) + return VM_FAULT_FALLBACK; + pgoff = linear_page_index(vmf->vma, pmd_addr); phys = pgoff_to_phys(dax_dev, pgoff, PMD_SIZE); if (phys == -1) { dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, pgoff); - return VM_FAULT_SIGBUS; + return VM_FAULT_FALLBACK; } pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); @@ -519,12 +528,21 @@ static int __dax_dev_pud_fault(struct dax_dev *dax_dev, struct vm_fault *vmf) return VM_FAULT_SIGBUS; } + /* fall back if we are outside of the VMA */ + if (pud_addr < vmf->vma->vm_start) + return VM_FAULT_FALLBACK; + if ((pud_addr + PUD_SIZE) > vmf->vma->vm_end) + return VM_FAULT_FALLBACK; + + if (dax_region->align < PUD_SIZE) + return VM_FAULT_FALLBACK; + pgoff = linear_page_index(vmf->vma, pud_addr); phys = pgoff_to_phys(dax_dev, pgoff, PUD_SIZE); if (phys == -1) { dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__, pgoff); - return VM_FAULT_SIGBUS; + return VM_FAULT_FALLBACK; } pfn = phys_to_pfn_t(phys, dax_region->pfn_flags);
Jeff Moyer reports that: " With a device dax alignment of 4KB or 2MB, I get sigbus when running the attached fio job file for the current kernel (4.11.0-rc1+). If I specify an alignment of 1GB, it works. I turned on debug output, and saw that it was failing in the huge fault code. [ 4614.138357] dax dax1.0: dax_open [ 4614.154838] dax dax1.0: dax_mmap [ 4614.171898] dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 - 0x7f0ce0800000) [ 4614.211720] dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf600) failed [ 4614.568911] dax dax1.0: dax_release fio config for reproduce: [global] ioengine=dev-dax direct=0 filename=/dev/dax0.0 bs=2m [write] rw=write [read] stonewall rw=read " It looks like the code does not fallback at all when handling faults. The fault handlers are missing some boundary checking for the VMA. Also, we need to fall back if the alignment requested is less than the P-entry size. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/dax/dax.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)