Message ID | 20240812-uffd-thp-flip-fix-v1-1-4fc1db7ccdd0@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | userfaultfd: fix races around pmd_trans_huge() check | expand |
Hi Jann, On 2024/8/13 00:42, Jann Horn wrote: > The following race can occur: > > mfill_atomic other thread > ============ ============ > <zap PMD> > pmdp_get_lockless() [reads none pmd] > <bail if trans_huge> > <if none:> > <pagefault creates transhuge zeropage> > __pte_alloc [no-op] > <zap PMD> > <bail if pmd_trans_huge(*dst_pmd)> > BUG_ON(pmd_none(*dst_pmd)) > > I have experimentally verified this in a kernel with extra mdelay() calls; > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > pte_offset_map[_lock]() to fail"), this can't lead to anything worse than > a BUG_ON(), since the page table access helpers are actually designed to > deal with page tables concurrently disappearing; but on older kernels > (<=6.4), I think we could probably theoretically race past the two BUG_ON() > checks and end up treating a hugepage as a page table. > > Cc: stable@vger.kernel.org > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > Signed-off-by: Jann Horn <jannh@google.com> > --- > mm/userfaultfd.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index e54e5c8907fa..ec3750467aa5 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -801,7 +801,8 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, > break; > } > /* If an huge pmd materialized from under us fail */ > - if (unlikely(pmd_trans_huge(*dst_pmd))) { > + dst_pmdval = pmdp_get_lockless(dst_pmd); > + if (unlikely(pmd_none(dst_pmdval) || pmd_trans_huge(dst_pmdval))) { Before commit 0d940a9b270b, should we also check for is_pmd_migration_entry(), pmd_devmap() and pmd_bad() here? Thanks, Qi > err = -EFAULT; > break; > } >
On 12.08.24 18:42, Jann Horn wrote: > The following race can occur: > > mfill_atomic other thread > ============ ============ > <zap PMD> > pmdp_get_lockless() [reads none pmd] > <bail if trans_huge> > <if none:> > <pagefault creates transhuge zeropage> > __pte_alloc [no-op] > <zap PMD> > <bail if pmd_trans_huge(*dst_pmd)> > BUG_ON(pmd_none(*dst_pmd)) > > I have experimentally verified this in a kernel with extra mdelay() calls; > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > pte_offset_map[_lock]() to fail"), this can't lead to anything worse than > a BUG_ON(), since the page table access helpers are actually designed to > deal with page tables concurrently disappearing; but on older kernels > (<=6.4), I think we could probably theoretically race past the two BUG_ON() > checks and end up treating a hugepage as a page table. > > Cc: stable@vger.kernel.org > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > Signed-off-by: Jann Horn <jannh@google.com> > --- Acked-by: David Hildenbrand <david@redhat.com>
On Tue, Aug 13, 2024 at 8:19 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote: > > Hi Jann, > > On 2024/8/13 00:42, Jann Horn wrote: > > The following race can occur: > > > > mfill_atomic other thread > > ============ ============ > > <zap PMD> > > pmdp_get_lockless() [reads none pmd] > > <bail if trans_huge> > > <if none:> > > <pagefault creates transhuge zeropage> > > __pte_alloc [no-op] > > <zap PMD> > > <bail if pmd_trans_huge(*dst_pmd)> > > BUG_ON(pmd_none(*dst_pmd)) > > > > I have experimentally verified this in a kernel with extra mdelay() calls; > > the BUG_ON(pmd_none(*dst_pmd)) triggers. > > > > On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow > > pte_offset_map[_lock]() to fail"), this can't lead to anything worse than > > a BUG_ON(), since the page table access helpers are actually designed to > > deal with page tables concurrently disappearing; but on older kernels > > (<=6.4), I think we could probably theoretically race past the two BUG_ON() > > checks and end up treating a hugepage as a page table. > > > > Cc: stable@vger.kernel.org > > Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") > > Signed-off-by: Jann Horn <jannh@google.com> > > --- > > mm/userfaultfd.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index e54e5c8907fa..ec3750467aa5 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -801,7 +801,8 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, > > break; > > } > > /* If an huge pmd materialized from under us fail */ > > - if (unlikely(pmd_trans_huge(*dst_pmd))) { > > + dst_pmdval = pmdp_get_lockless(dst_pmd); > > + if (unlikely(pmd_none(dst_pmdval) || pmd_trans_huge(dst_pmdval))) { > > Before commit 0d940a9b270b, should we also check for > is_pmd_migration_entry(), pmd_devmap() and pmd_bad() here? Oooh. I think you're right that this check is insufficient, thanks for spotting that. I think I should probably change the check to something like this? if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) || pmd_devmap(dst_pmdval) || pmd_bad(dst_pmdval))) { !pmd_present() implies !is_pmd_migration_entry(). And the pmd_bad() at the end shouldn't be necessary if everything is working right, I'm just tacking it on to be safe. I'll send a v2 with this change soon. (Alternatively, pmd_leaf() might be useful here, but then we'd have to figure an alternate way of doing this for the backport.)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e54e5c8907fa..ec3750467aa5 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -801,7 +801,8 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx, break; } /* If an huge pmd materialized from under us fail */ - if (unlikely(pmd_trans_huge(*dst_pmd))) { + dst_pmdval = pmdp_get_lockless(dst_pmd); + if (unlikely(pmd_none(dst_pmdval) || pmd_trans_huge(dst_pmdval))) { err = -EFAULT; break; }
The following race can occur: mfill_atomic other thread ============ ============ <zap PMD> pmdp_get_lockless() [reads none pmd] <bail if trans_huge> <if none:> <pagefault creates transhuge zeropage> __pte_alloc [no-op] <zap PMD> <bail if pmd_trans_huge(*dst_pmd)> BUG_ON(pmd_none(*dst_pmd)) I have experimentally verified this in a kernel with extra mdelay() calls; the BUG_ON(pmd_none(*dst_pmd)) triggers. On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail"), this can't lead to anything worse than a BUG_ON(), since the page table access helpers are actually designed to deal with page tables concurrently disappearing; but on older kernels (<=6.4), I think we could probably theoretically race past the two BUG_ON() checks and end up treating a hugepage as a page table. Cc: stable@vger.kernel.org Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation") Signed-off-by: Jann Horn <jannh@google.com> --- mm/userfaultfd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)