diff mbox series

[1/2] memfd: Fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings

Message ID 20191107195355.80608-1-joel@joelfernandes.org (mailing list archive)
State New
Headers show
Series [1/2] memfd: Fix COW issue on MAP_PRIVATE and F_SEAL_FUTURE_WRITE mappings | expand

Commit Message

Joel Fernandes Nov. 7, 2019, 7:53 p.m. UTC
From: Nicolas Geoffray <ngeoffray@google.com>

F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
A private mapping created after the memfd file that gets sealed with
F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
children and parent share the same memory, even though the mapping is
private.

The reason for this is due to the code below:

static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct shmem_inode_info *info = SHMEM_I(file_inode(file));

        if (info->seals & F_SEAL_FUTURE_WRITE) {
                /*
                 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
                 * "future write" seal active.
                 */
                if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
                        return -EPERM;

                /*
                 * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
                 * read-only mapping, take care to not allow mprotect to revert
                 * protections.
                 */
                vma->vm_flags &= ~(VM_MAYWRITE);
        }
        ...
}

And for the mm to know if a mapping is copy-on-write:
static inline bool is_cow_mapping(vm_flags_t flags)
{
        return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

The patch fixes the issue by making the mprotect revert protection
happen only for shared mappings. For private mappings, using mprotect
will have no effect on the seal behavior.

Cc: kernel-team@android.com
Signed-off-by: Nicolas Geoffray <ngeoffray@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

---
Google bug: 143833776

 mm/shmem.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Andrew Morton Nov. 8, 2019, 1 a.m. UTC | #1
On Thu,  7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> A private mapping created after the memfd file that gets sealed with
> F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> children and parent share the same memory, even though the mapping is
> private.

That sounds fairly serious.  Should this be backported into -stable kernels?
Joel Fernandes Nov. 8, 2019, 2:06 a.m. UTC | #2
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> On Thu,  7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> 
> > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > A private mapping created after the memfd file that gets sealed with
> > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > children and parent share the same memory, even though the mapping is
> > private.
> 
> That sounds fairly serious.  Should this be backported into -stable kernels?

Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
unless we are Ok with stable automatically picking it up (I believe the
stable folks "auto select" fixes which should detect this is a fix since I
have said it is a fix in the subject line).

thanks,

 - Joel
Andrew Morton Nov. 8, 2019, 3:25 a.m. UTC | #3
On Thu, 7 Nov 2019 21:06:14 -0500 Joel Fernandes <joel@joelfernandes.org> wrote:

> On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > On Thu,  7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > 
> > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > A private mapping created after the memfd file that gets sealed with
> > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > children and parent share the same memory, even though the mapping is
> > > private.
> > 
> > That sounds fairly serious.  Should this be backported into -stable kernels?
> 
> Yes, it should be.

I added

Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Cc: <stable@vger.kernel.org>

> The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> unless we are Ok with stable automatically picking it up (I believe the
> stable folks "auto select" fixes which should detect this is a fix since I
> have said it is a fix in the subject line).

The Cc:stable tag should trigger the appropriate actions, assisted by
the Fixes:.  I doubt if "fix" in the Subject has much effect.
Christoph Hellwig Nov. 8, 2019, 6:33 a.m. UTC | #4
> -		 * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
> -		 * read-only mapping, take care to not allow mprotect to revert
> -		 * protections.
> +		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
> +		 * MAP_SHARED and read-only, take care to not allow mprotect to
> +		 * revert protections on such mappings. Do this only for shared
> +		 * mappings. For private mappings, don't need to mask VM_MAYWRITE

This adds an > 80 char line.
Greg KH Nov. 8, 2019, 6:37 a.m. UTC | #5
On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
> On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > On Thu,  7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > 
> > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > A private mapping created after the memfd file that gets sealed with
> > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > children and parent share the same memory, even though the mapping is
> > > private.
> > 
> > That sounds fairly serious.  Should this be backported into -stable kernels?
> 
> Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> unless we are Ok with stable automatically picking it up (I believe the
> stable folks "auto select" fixes which should detect this is a fix since I
> have said it is a fix in the subject line).

Never rely on "auto select" to pick up a patch for stable if you already
know it should go to stable.  Just mark it as such, or tell stable@vger
after the fact.

thanks,

greg k-h
Joel Fernandes Nov. 8, 2019, 3:34 p.m. UTC | #6
On Fri, Nov 08, 2019 at 07:37:15AM +0100, Greg KH wrote:
> On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
> > On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
> > > On Thu,  7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:
> > > 
> > > > F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE:
> > > > A private mapping created after the memfd file that gets sealed with
> > > > F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning
> > > > children and parent share the same memory, even though the mapping is
> > > > private.
> > > 
> > > That sounds fairly serious.  Should this be backported into -stable kernels?
> > 
> > Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so
> > v5.3.x stable kernels would need a backport. I can submit a backport tomorrow
> > unless we are Ok with stable automatically picking it up (I believe the
> > stable folks "auto select" fixes which should detect this is a fix since I
> > have said it is a fix in the subject line).
> 
> Never rely on "auto select" to pick up a patch for stable if you already
> know it should go to stable.  Just mark it as such, or tell stable@vger
> after the fact.

Sure, agreed.

Thanks Andrew for adding the tags!

thanks,

 - Joel
Joel Fernandes Nov. 8, 2019, 3:35 p.m. UTC | #7
On Thu, Nov 07, 2019 at 10:33:08PM -0800, Christoph Hellwig wrote:
> > -		 * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
> > -		 * read-only mapping, take care to not allow mprotect to revert
> > -		 * protections.
> > +		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
> > +		 * MAP_SHARED and read-only, take care to not allow mprotect to
> > +		 * revert protections on such mappings. Do this only for shared
> > +		 * mappings. For private mappings, don't need to mask VM_MAYWRITE
> 
> This adds an > 80 char line.

Oh, true. Sorry. Andrew I hate to ask you but since you took the patch
already, could you just the comment for the character limit in the one
you applied?

thanks,

 - Joel
diff mbox series

Patch

diff --git a/mm/shmem.c b/mm/shmem.c
index 447fd575587c..6ac5e867ef13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2214,11 +2214,14 @@  static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 			return -EPERM;
 
 		/*
-		 * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
-		 * read-only mapping, take care to not allow mprotect to revert
-		 * protections.
+		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+		 * MAP_SHARED and read-only, take care to not allow mprotect to
+		 * revert protections on such mappings. Do this only for shared
+		 * mappings. For private mappings, don't need to mask VM_MAYWRITE
+		 * as we still want them to be COW-writable.
 		 */
-		vma->vm_flags &= ~(VM_MAYWRITE);
+		if (vma->vm_flags & VM_SHARED)
+			vma->vm_flags &= ~(VM_MAYWRITE);
 	}
 
 	file_accessed(file);