userfaultfd: fix a race between writeprotect and exit_mmap()

Message ID 20210921200247.25749-1-namit@vmware.com (mailing list archive)
State New
Series userfaultfd: fix a race between writeprotect and exit_mmap()

Commit Message

Nadav Amit Sept. 21, 2021, 8:02 p.m. UTC
From: Nadav Amit <namit@vmware.com>

A race is possible when a process exits, its VMAs are removed
by exit_mmap() and at the same time userfaultfd_writeprotect() is
called.

The race was detected by KASAN on a development kernel, but it appears
to be possible on vanilla kernels as well.

Use mmget_not_zero() to prevent the race as done in other userfaultfd
operations.

Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 fs/userfaultfd.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Li Wang Sept. 22, 2021, 9:06 a.m. UTC | #1
Hi,

I confirmed this patch (applied on 5.14) gets rid of the below userfaultfd
test failure.

# ./userfaultfd anon 16 2
nr_pages: 4096, nr_pages_per_cpu: 256
bounces: 1, mode: rnd read, userfaults: 313 missing
(51+34+37+26+41+28+15+20+16+12+13+7+10+2+0+1) 995 wp
(121+79+96+53+90+104+48+61+56+82+56+41+49+26+11+22)
bounces: 0, mode: read, userfaults: 64 missing
(15+8+10+6+5+2+4+3+3+1+4+0+0+2+0+1) 2157 wp
(223+274+189+141+116+132+203+153+143+126+110+114+101+66+42+24)
testing uffd-wp with pagemap (pgsize=4096): done
testing uffd-wp with pagemap (pgsize=2097152): done
testing UFFDIO_ZEROPAGE: done.
testing signal delivery: done.
testing events (fork, remap, remove): ERROR: nr 3933 memory corruption 0 1
 (errno=0, line=963)
ERROR: faulting process failed (errno=0, line=1117)


On Wed, Sep 22, 2021 at 11:34 AM Nadav Amit <nadav.amit@gmail.com> wrote:

> From: Nadav Amit <namit@vmware.com>
>
> A race is possible when a process exits, its VMAs are removed
> by exit_mmap() and at the same time userfaultfd_writeprotect() is
> called.
>
> The race was detected by KASAN on a development kernel, but it appears
> to be possible on vanilla kernels as well.
>
> Use mmget_not_zero() to prevent the race as done in other userfaultfd
> operations.
>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
> Signed-off-by: Nadav Amit <namit@vmware.com>
>

Tested-by: Li Wang <liwang@redhat.com>



> ---
>  fs/userfaultfd.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 003f0d31743e..22bf14ab2d16 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1827,9 +1827,15 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
>         if (mode_wp && mode_dontwake)
>                 return -EINVAL;
>
> -       ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
> -                                 uffdio_wp.range.len, mode_wp,
> -                                 &ctx->mmap_changing);
> +       if (mmget_not_zero(ctx->mm)) {
> +               ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
> +                                         uffdio_wp.range.len, mode_wp,
> +                                         &ctx->mmap_changing);
> +               mmput(ctx->mm);
> +       } else {
> +               return -ESRCH;
> +       }
> +
>         if (ret)
>                 return ret;
>
> --
> 2.25.1
>
>
>

Peter Xu Sept. 22, 2021, 2:30 p.m. UTC | #2
On Tue, Sep 21, 2021 at 01:02:47PM -0700, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> A race is possible when a process exits, its VMAs are removed
> by exit_mmap() and at the same time userfaultfd_writeprotect() is
> called.
> 
> The race was detected by KASAN on a development kernel, but it appears
> to be possible on vanilla kernels as well.
> 
> Use mmget_not_zero() to prevent the race as done in other userfaultfd
> operations.
> 
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
> Signed-off-by: Nadav Amit <namit@vmware.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks!

Peter Xu Sept. 24, 2021, 12:16 a.m. UTC | #3
On Wed, Sep 22, 2021 at 05:06:53PM +0800, Li Wang wrote:
> Hi,

Li,

> 
> I confirmed this patch (applied on 5.14) gets rid of the below userfaultfd
> test failure.
> 
> # ./userfaultfd anon 16 2
> nr_pages: 4096, nr_pages_per_cpu: 256
> bounces: 1, mode: rnd read, userfaults: 313 missing
> (51+34+37+26+41+28+15+20+16+12+13+7+10+2+0+1) 995 wp
> (121+79+96+53+90+104+48+61+56+82+56+41+49+26+11+22)
> bounces: 0, mode: read, userfaults: 64 missing
> (15+8+10+6+5+2+4+3+3+1+4+0+0+2+0+1) 2157 wp
> (223+274+189+141+116+132+203+153+143+126+110+114+101+66+42+24)
> testing uffd-wp with pagemap (pgsize=4096): done
> testing uffd-wp with pagemap (pgsize=2097152): done
> testing UFFDIO_ZEROPAGE: done.
> testing signal delivery: done.
> testing events (fork, remap, remove): ERROR: nr 3933 memory corruption 0 1
>  (errno=0, line=963)
> ERROR: faulting process failed (errno=0, line=1117)

Just to keep a record within this thread - my understanding is that the above
issue is separate from the one Nadav has fixed.  The other fix could be:

  https://lore.kernel.org/lkml/20210923232512.210092-1-peterx@redhat.com/

When verifying with Nadav's patch, please check whether you have THP enabled
globally:

  # echo always > /sys/kernel/mm/transparent_hugepage/enabled
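To see which mode is currently active without changing it, the same sysfs file can be read back: the kernel marks the active value with square brackets, e.g. "always [madvise] never". A minimal C sketch of that check follows; the helper names thp_active_mode() and thp_read_active_mode() are made up for illustration and are not part of any kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode out of a line such as
 * "always [madvise] never". Returns 0 on success, -1 if no
 * bracketed token is found or it does not fit in 'out'.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t len;

    assert(line && out);
    open = strchr(line, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return -1;

    len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the sysfs file and extract the active mode; -1 if unreadable. */
static int thp_read_active_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return thp_active_mode(line, out, outlen);
}
```

If thp_read_active_mode() yields "always", THP is enabled globally as requested above; reading the file does not need root, while writing to it does.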

Thanks,

Patch

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 003f0d31743e..22bf14ab2d16 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1827,9 +1827,15 @@  static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	if (mode_wp && mode_dontwake)
 		return -EINVAL;
 
-	ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
-				  uffdio_wp.range.len, mode_wp,
-				  &ctx->mmap_changing);
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
+					  uffdio_wp.range.len, mode_wp,
+					  &ctx->mmap_changing);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
 	if (ret)
 		return ret;
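
The mmget_not_zero()/mmput() pair in the hunk above follows a general idiom: take a reference only if the count is still nonzero, so that teardown cannot begin while the operation runs. The sketch below is a userspace analogy using C11 atomics, not kernel code. struct obj, obj_get_not_zero(), obj_put() and obj_operate() are hypothetical stand-ins for mm_struct (its mm_users count), mmget_not_zero(), mmput() and the patched ioctl path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm_struct: only the user reference count. */
struct obj {
    atomic_int users;   /* plays the role of mm->mm_users */
    bool torn_down;     /* set once the last user reference is dropped */
};

/* Take a reference only if the count is still nonzero. */
static bool obj_get_not_zero(struct obj *o)
{
    int old = atomic_load(&o->users);

    while (old != 0) {
        /* On failure 'old' is reloaded, so the zero check is repeated. */
        if (atomic_compare_exchange_weak(&o->users, &old, old + 1))
            return true;
    }
    return false;       /* count already hit zero: teardown has started */
}

/* Drop a reference; the last put performs the teardown. */
static void obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->users, 1);

    assert(old > 0);
    if (old == 1)
        o->torn_down = true;   /* the kernel would run exit_mmap() here */
}

/* Hypothetical operation shaped like the patched code path. */
static int obj_operate(struct obj *o)
{
    if (!obj_get_not_zero(o))
        return -1;             /* the patch returns -ESRCH in this case */

    /* ... safe to inspect the object: teardown cannot start now ... */

    obj_put(o);
    return 0;
}
```

While at least one reference is held, obj_operate() succeeds and leaves the count unchanged. Once the final obj_put() has run, obj_get_not_zero() fails and the operation bails out instead of touching state that is being torn down, which is the behaviour the patch gives userfaultfd_writeprotect().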