Message ID | 20230324142620.2344140-1-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] mm/hugetlb: Fix uffd wr-protection for CoW optimization path | expand |
On 3/24/23 7:26 PM, Peter Xu wrote: > This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be > writable even with uffd-wp bit set. It only happens with hugetlb private > mappings, when someone firstly wr-protects a missing pte (which will > install a pte marker), then a write to the same page without any prior > access to the page. > > Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before > reaching hugetlb_wp() to avoid taking more locks that userfault won't need. > However there's one CoW optimization path that can trigger hugetlb_wp() > inside hugetlb_no_page(), which will bypass the trap. > > This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit > is detected. The new path will only trigger in the CoW optimization path > because generic hugetlb_fault() (e.g. when a present pte was wr-protected) > will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE > won't be affected and can still be resolved, IOW only skip CoW not CoR. > > This patch will be needed for v5.19+ hence copy stable. > > Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > Cc: linux-stable <stable@vger.kernel.org> > Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") > Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > --- > > Notes: > > v2 is not on the list but in an attachment in the reply; this v3 is mostly > to make sure it's not the same as the patch used to be attached. Sorry > Andrew, we need to drop the queued one as I rewrote the commit message. > > Muhammad, I didn't attach your T-b because of the slight functional change. > Please feel free to re-attach if it still works for you (which I believe > should). Thank you for the fix. > > thanks, > --- > mm/hugetlb.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8bfd07f4c143..a58b3739ed4b 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > struct folio *pagecache_folio, spinlock_t *ptl) > { > const bool unshare = flags & FAULT_FLAG_UNSHARE; > - pte_t pte; > + pte_t pte = huge_ptep_get(ptep); > struct hstate *h = hstate_vma(vma); > struct page *old_page; > struct folio *new_folio; > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long haddr = address & huge_page_mask(h); > struct mmu_notifier_range range; > > + /* > + * Never handle CoW for uffd-wp protected pages. It should be only > + * handled when the uffd-wp protection is removed. > + * > + * Note that only the CoW optimization path (in hugetlb_no_page()) > + * can trigger this, because hugetlb_fault() will always resolve > + * uffd-wp bit first. > + */ > + if (!unshare && huge_pte_uffd_wp(pte)) > + return 0; > + > /* > * hugetlb does not support FOLL_FORCE-style write faults that keep the > * PTE mapped R/O such as maybe_mkwrite() would do. > @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > return 0; > } > > - pte = huge_ptep_get(ptep); > old_page = pte_page(pte); > > delayacct_wpcopy_start();
On 03/24/23 10:26, Peter Xu wrote: > This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be > writable even with uffd-wp bit set. It only happens with hugetlb private > mappings, when someone firstly wr-protects a missing pte (which will > install a pte marker), then a write to the same page without any prior > access to the page. > > Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before > reaching hugetlb_wp() to avoid taking more locks that userfault won't need. > However there's one CoW optimization path that can trigger hugetlb_wp() > inside hugetlb_no_page(), which will bypass the trap. > > This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit > is detected. The new path will only trigger in the CoW optimization path > because generic hugetlb_fault() (e.g. when a present pte was wr-protected) > will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE > won't be affected and can still be resolved, IOW only skip CoW not CoR. > > This patch will be needed for v5.19+ hence copy stable. > > Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > Cc: linux-stable <stable@vger.kernel.org> > Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > > Notes: > > v2 is not on the list but in an attachment in the reply; this v3 is mostly > to make sure it's not the same as the patch used to be attached. Sorry > Andrew, we need to drop the queued one as I rewrote the commit message. My appologies! I saw the code path missed in v2 and assumed you did not think it applied. So, I said nothing. My bad! > Muhammad, I didn't attach your T-b because of the slight functional change. > Please feel free to re-attach if it still works for you (which I believe > should). > > thanks, > --- > mm/hugetlb.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 8bfd07f4c143..a58b3739ed4b 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > struct folio *pagecache_folio, spinlock_t *ptl) > { > const bool unshare = flags & FAULT_FLAG_UNSHARE; > - pte_t pte; > + pte_t pte = huge_ptep_get(ptep); > struct hstate *h = hstate_vma(vma); > struct page *old_page; > struct folio *new_folio; > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long haddr = address & huge_page_mask(h); > struct mmu_notifier_range range; > > + /* > + * Never handle CoW for uffd-wp protected pages. It should be only > + * handled when the uffd-wp protection is removed. > + * > + * Note that only the CoW optimization path (in hugetlb_no_page()) > + * can trigger this, because hugetlb_fault() will always resolve > + * uffd-wp bit first. > + */ > + if (!unshare && huge_pte_uffd_wp(pte)) > + return 0; This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2 > + > /* > * hugetlb does not support FOLL_FORCE-style write faults that keep the > * PTE mapped R/O such as maybe_mkwrite() would do. > @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > return 0; > } > > - pte = huge_ptep_get(ptep); > old_page = pte_page(pte); > > delayacct_wpcopy_start(); > -- > 2.39.1 >
On 24.03.23 23:27, Mike Kravetz wrote: > On 03/24/23 10:26, Peter Xu wrote: >> This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be >> writable even with uffd-wp bit set. It only happens with hugetlb private >> mappings, when someone firstly wr-protects a missing pte (which will >> install a pte marker), then a write to the same page without any prior >> access to the page. >> >> Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before >> reaching hugetlb_wp() to avoid taking more locks that userfault won't need. >> However there's one CoW optimization path that can trigger hugetlb_wp() >> inside hugetlb_no_page(), which will bypass the trap. >> >> This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit >> is detected. The new path will only trigger in the CoW optimization path >> because generic hugetlb_fault() (e.g. when a present pte was wr-protected) >> will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE >> won't be affected and can still be resolved, IOW only skip CoW not CoR. >> >> This patch will be needed for v5.19+ hence copy stable. >> >> Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> Cc: linux-stable <stable@vger.kernel.org> >> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") >> Signed-off-by: Peter Xu <peterx@redhat.com> >> --- >> >> Notes: >> >> v2 is not on the list but in an attachment in the reply; this v3 is mostly >> to make sure it's not the same as the patch used to be attached. Sorry >> Andrew, we need to drop the queued one as I rewrote the commit message. > > My appologies! I saw the code path missed in v2 and assumed you did not > think it applied. So, I said nothing. My bad! > >> Muhammad, I didn't attach your T-b because of the slight functional change. >> Please feel free to re-attach if it still works for you (which I believe >> should). >> >> thanks, >> --- >> mm/hugetlb.c | 14 ++++++++++++-- >> 1 file changed, 12 insertions(+), 2 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 8bfd07f4c143..a58b3739ed4b 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, >> struct folio *pagecache_folio, spinlock_t *ptl) >> { >> const bool unshare = flags & FAULT_FLAG_UNSHARE; >> - pte_t pte; >> + pte_t pte = huge_ptep_get(ptep); >> struct hstate *h = hstate_vma(vma); >> struct page *old_page; >> struct folio *new_folio; >> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, >> unsigned long haddr = address & huge_page_mask(h); >> struct mmu_notifier_range range; >> >> + /* >> + * Never handle CoW for uffd-wp protected pages. It should be only >> + * handled when the uffd-wp protection is removed. >> + * >> + * Note that only the CoW optimization path (in hugetlb_no_page()) >> + * can trigger this, because hugetlb_fault() will always resolve >> + * uffd-wp bit first. >> + */ >> + if (!unshare && huge_pte_uffd_wp(pte)) >> + return 0; > > This looks correct. However, since the previous version looked correct I must > ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems > we would need to possibly propogate that uffd_wp to the new pte as in v2 We can. A reproducer would share an anon hugetlb page because parent and child. In the parent, we would uffd-wp that page. We could trigger unsharing by R/O-pinning that page.
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote: > > > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > > > unsigned long haddr = address & huge_page_mask(h); > > > struct mmu_notifier_range range; > > > + /* > > > + * Never handle CoW for uffd-wp protected pages. It should be only > > > + * handled when the uffd-wp protection is removed. > > > + * > > > + * Note that only the CoW optimization path (in hugetlb_no_page()) > > > + * can trigger this, because hugetlb_fault() will always resolve > > > + * uffd-wp bit first. > > > + */ > > > + if (!unshare && huge_pte_uffd_wp(pte)) > > > + return 0; > > > > This looks correct. However, since the previous version looked correct I must > > ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems > > we would need to possibly propogate that uffd_wp to the new pte as in v2 Good point, thanks for spotting! > > We can. A reproducer would share an anon hugetlb page because parent and > child. In the parent, we would uffd-wp that page. We could trigger unsharing > by R/O-pinning that page. Right. This seems to be a separate bug.. It should be triggered in totally different context and much harder due to rare use of RO pins, meanwhile used with userfault-wp. If both of you agree, I can prepare a separate patch for this bug, and I'll better prepare a reproducer/selftest with it.
On 03/26/23 10:46, Peter Xu wrote: > On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote: > > > > @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, > > > > unsigned long haddr = address & huge_page_mask(h); > > > > struct mmu_notifier_range range; > > > > + /* > > > > + * Never handle CoW for uffd-wp protected pages. It should be only > > > > + * handled when the uffd-wp protection is removed. > > > > + * > > > > + * Note that only the CoW optimization path (in hugetlb_no_page()) > > > > + * can trigger this, because hugetlb_fault() will always resolve > > > > + * uffd-wp bit first. > > > > + */ > > > > + if (!unshare && huge_pte_uffd_wp(pte)) > > > > + return 0; > > > > > > This looks correct. However, since the previous version looked correct I must > > > ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems > > > we would need to possibly propogate that uffd_wp to the new pte as in v2 > > Good point, thanks for spotting! > > > > > We can. A reproducer would share an anon hugetlb page because parent and > > child. In the parent, we would uffd-wp that page. We could trigger unsharing > > by R/O-pinning that page. > > Right. This seems to be a separate bug.. It should be triggered in > totally different context and much harder due to rare use of RO pins, > meanwhile used with userfault-wp. > > If both of you agree, I can prepare a separate patch for this bug, and I'll > better prepare a reproducer/selftest with it. > I am OK with separate patches, and agree that the R/O pinning case is less likely to happen. Since this patch addresses the issue found by Muhammad, Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
On 27.03.23 20:34, Mike Kravetz wrote: > On 03/26/23 10:46, Peter Xu wrote: >> On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote: >>>>> @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, >>>>> unsigned long haddr = address & huge_page_mask(h); >>>>> struct mmu_notifier_range range; >>>>> + /* >>>>> + * Never handle CoW for uffd-wp protected pages. It should be only >>>>> + * handled when the uffd-wp protection is removed. >>>>> + * >>>>> + * Note that only the CoW optimization path (in hugetlb_no_page()) >>>>> + * can trigger this, because hugetlb_fault() will always resolve >>>>> + * uffd-wp bit first. >>>>> + */ >>>>> + if (!unshare && huge_pte_uffd_wp(pte)) >>>>> + return 0; >>>> >>>> This looks correct. However, since the previous version looked correct I must >>>> ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems >>>> we would need to possibly propogate that uffd_wp to the new pte as in v2 >> >> Good point, thanks for spotting! >> >>> >>> We can. A reproducer would share an anon hugetlb page because parent and >>> child. In the parent, we would uffd-wp that page. We could trigger unsharing >>> by R/O-pinning that page. >> >> Right. This seems to be a separate bug.. It should be triggered in >> totally different context and much harder due to rare use of RO pins, >> meanwhile used with userfault-wp. >> >> If both of you agree, I can prepare a separate patch for this bug, and I'll >> better prepare a reproducer/selftest with it. >> > > I am OK with separate patches, and agree that the R/O pinning case is less > likely to happen. Yes, the combination should be rather rare and we can fix that separately. Ideally, we'd try to mimic the same uffd code flow in hugetlb cow/unshare handling that we use in memory.c > > Since this patch addresses the issue found by Muhammad, > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Hopefully we didn't forget about yet another case :D Acked-by: David Hildenbrand <david@redhat.com>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8bfd07f4c143..a58b3739ed4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE; - pte_t pte; + pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio; @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range; + /* + * Never handle CoW for uffd-wp protected pages. It should be only + * handled when the uffd-wp protection is removed. + * + * Note that only the CoW optimization path (in hugetlb_no_page()) + * can trigger this, because hugetlb_fault() will always resolve + * uffd-wp bit first. + */ + if (!unshare && huge_pte_uffd_wp(pte)) + return 0; + /* * hugetlb does not support FOLL_FORCE-style write faults that keep the * PTE mapped R/O such as maybe_mkwrite() would do. @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, return 0; } - pte = huge_ptep_get(ptep); old_page = pte_page(pte); delayacct_wpcopy_start();
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page. Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap. This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR. This patch will be needed for v5.19+ hence copy stable. Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: linux-stable <stable@vger.kernel.org> Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu <peterx@redhat.com> --- Notes: v2 is not on the list but in an attachment in the reply; this v3 is mostly to make sure it's not the same as the patch used to be attached. Sorry Andrew, we need to drop the queued one as I rewrote the commit message. Muhammad, I didn't attach your T-b because of the slight functional change. Please feel free to re-attach if it still works for you (which I believe should). thanks, --- mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)