
[v4,2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled

Message ID 20210218231202.15426-1-peterx@redhat.com (mailing list archive)
State New, archived
Series hugetlb: Disable huge pmd unshare for uffd-wp

Commit Message

Peter Xu Feb. 18, 2021, 11:12 p.m. UTC
Huge pmd sharing can cause problems for userfaultfd.  Userfaultfd runs its
logic based on special bits in page table entries, while huge pmd sharing
can potentially share page table entries across different address ranges.
That can cause issues in either of these cases (see the sketch after this
list):

  - When sharing huge pmd page tables for an uffd write-protected range, the
    newly mapped huge pmd range will also be write-protected unexpectedly,
    or,

  - When we try to write protect a huge pmd shared range, we first do
    huge_pmd_unshare() in hugetlb_change_protection(); however, that also
    means the UFFDIO_WRITEPROTECT could be silently skipped for the shared
    region, which could lead to data loss.
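
For context, the affected path looks like the following minimal userspace
sketch (illustrative only, not part of this patch: it assumes the hugetlb
uffd-wp support this series builds toward, and error handling is elided):

/*
 * Put a shared hugetlb mapping under uffd-wp write protection.  Real pmd
 * sharing additionally needs a large, PUD-aligned shared range; a single
 * huge page is used here only to keep the sketch short.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	size_t len = 2UL << 20;		/* one 2M huge page */
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	int memfd = memfd_create("huge", MFD_CLOEXEC | MFD_HUGETLB);

	ftruncate(memfd, len);
	/* MAP_SHARED: huge pmd sharing only applies to shared mappings. */
	char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_SHARED, memfd, 0);

	struct uffdio_api api = { .api = UFFD_API };
	ioctl(uffd, UFFDIO_API, &api);

	/* Register the range in write-protect mode... */
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	/*
	 * ...and write-protect it.  If the pmd under this range were
	 * shared with another mapping, the wp bits could leak into that
	 * mapping (case 1), or huge_pmd_unshare() could make this ioctl
	 * silently skip the shared region (case 2).
	 */
	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)addr, .len = len },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	return 0;
}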

While at it, a few other things are done as well:

  - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
    that's definitely something that arch code would like to use too.

  - ARM64 currently checks CONFIG_ARCH_WANT_HUGE_PMD_SHARE directly when
    trying to share huge pmds.  Switch to the want_pmd_share() helper.

Also, move vma_shareable() from huge_pmd_share() into want_pmd_share().

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 +--
 include/linux/hugetlb.h       |  2 ++
 include/linux/userfaultfd_k.h |  9 +++++++++
 mm/hugetlb.c                  | 20 ++++++++++++++------
 4 files changed, 26 insertions(+), 8 deletions(-)

Comments

Naresh Kamboju March 10, 2021, 7:48 a.m. UTC | #1
Hi Peter,

On Fri, 19 Feb 2021 at 04:43, Peter Xu <peterx@redhat.com> wrote:
>
> [...]
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 6e3bcffe2837..58987a98e179 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
>                  */
>                 ptep = pte_alloc_map(mm, pmdp, addr);
>         } else if (sz == PMD_SIZE) {
> -               if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
> -                   pud_none(READ_ONCE(*pudp)))
> +               if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))

While building the Linux next 20210310 tag for the arm64 architecture with

  - CONFIG_ARM64_64K_PAGES=y

enabled, the build failed with the errors / warnings below:

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=arm64
CROSS_COMPILE=aarch64-linux-gnu- 'CC=sccache aarch64-linux-gnu-gcc'
'HOSTCC=sccache gcc'
aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-gnu-ld: arch/arm64/mm/hugetlbpage.o: in function `huge_pte_alloc':
hugetlbpage.c:(.text+0x7d8): undefined reference to `want_pmd_share'

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Steps to reproduce:
----------------------------
# TuxMake is a command line tool and Python library that provides
# portable and repeatable Linux kernel builds across a variety of
# architectures, toolchains, kernel configurations, and make targets.
#
# TuxMake supports the concept of runtimes.
# See https://docs.tuxmake.org/runtimes/; for that to work, you need
# podman or docker installed on your system.
#
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.


tuxmake --runtime podman --target-arch arm64 --toolchain gcc-9
--kconfig defconfig --kconfig-add
https://builds.tuxbuild.com/1pYCSoc1oGtPWlPgLAJxbHx07kL/config

Build link,
https://builds.tuxbuild.com/1pYCSoc1oGtPWlPgLAJxbHx07kL/
Peter Xu March 10, 2021, 4:57 p.m. UTC | #2
On Wed, Mar 10, 2021 at 01:18:42PM +0530, Naresh Kamboju wrote:
> Hi Peter,

Hi, Naresh,

> 
> On Fri, 19 Feb 2021 at 04:43, Peter Xu <peterx@redhat.com> wrote:
> >
> > [...]
> 
> While building the Linux next 20210310 tag for the arm64 architecture with
> 
>   - CONFIG_ARM64_64K_PAGES=y
> 
> enabled, the build failed with the errors / warnings below:
> 
> make --silent --keep-going --jobs=8
> O=/home/tuxbuild/.cache/tuxmake/builds/1/tmp ARCH=arm64
> CROSS_COMPILE=aarch64-linux-gnu- 'CC=sccache aarch64-linux-gnu-gcc'
> 'HOSTCC=sccache gcc'
> aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
> aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
> aarch64-linux-gnu-ld: arch/arm64/mm/hugetlbpage.o: in function `huge_pte_alloc':
> hugetlbpage.c:(.text+0x7d8): undefined reference to `want_pmd_share'
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Sorry for the issue & thanks for the report.  Would you please check whether
the patch attached could fix the issue?
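
For reference: the attachment is not rendered in the archive.  The root
cause of the failure is that want_pmd_share() is defined inside the
CONFIG_ARCH_WANT_HUGE_PMD_SHARE section of mm/hugetlb.c, while both the
declaration in linux/hugetlb.h and the new arm64 call site are
unconditional; with CONFIG_ARM64_64K_PAGES=y, arm64 does not enable
CONFIG_ARCH_WANT_HUGE_PMD_SHARE, so the symbol is never built.  A sketch
of one possible fix (an assumption, not necessarily the attached patch) is
to define the helper unconditionally and resolve the configs inside it:

bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
{
#ifdef CONFIG_USERFAULTFD
	/* uffd-wp bits live in individual ptes, so never share those. */
	if (uffd_disable_huge_pmd_share(vma))
		return false;
#endif
#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
	return vma_shareable(vma, addr);
#else
	/* This arch (or page size) cannot share huge pmds at all. */
	return false;
#endif
}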
Naresh Kamboju March 10, 2021, 6:09 p.m. UTC | #3
On Wed, 10 Mar 2021 at 22:27, Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Mar 10, 2021 at 01:18:42PM +0530, Naresh Kamboju wrote:
> > Hi Peter,
>
> Hi, Naresh,
>
> >
> > [...]
>
> Sorry for the issue & thanks for the report.  Would you please check whether
> the patch attached could fix the issue?

The attached patch was build tested, and the build passes for arm64,
including the 64k pages config:

 CONFIG_ARM64_64K_PAGES=y

- Naresh

Patch

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6e3bcffe2837..58987a98e179 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -284,8 +284,7 @@  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		ptep = pte_alloc_map(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
-		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-		    pud_none(READ_ONCE(*pudp)))
+		if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a6113fa6d21d..bc86f2f516e7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -950,4 +950,6 @@  static inline __init void hugetlb_cma_check(void)
 }
 #endif
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..c63ccdae3eab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -52,6 +52,15 @@  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
+/*
+ * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
+ * protect information is per pgtable entry.
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07bb9bdc3282..8e8e2f3dfe06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5292,6 +5292,18 @@  static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 	return false;
 }
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return false;
+#endif
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+	return vma_shareable(vma, addr);
+}
+
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible
@@ -5346,9 +5358,6 @@  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (!vma_shareable(vma, addr))
-		return (pte_t *)pmd_alloc(mm, pud, addr);
-
 	i_mmap_assert_locked(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
@@ -5412,7 +5421,7 @@  int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share()	(1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -5430,7 +5439,6 @@  void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share()	(0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5452,7 +5460,7 @@  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			pte = (pte_t *)pud;
 		} else {
 			BUG_ON(sz != PMD_SIZE);
-			if (want_pmd_share() && pud_none(*pud))
+			if (want_pmd_share(vma, addr) && pud_none(*pud))
 				pte = huge_pmd_share(mm, vma, addr, pud);
 			else
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);