[RFC,0/3] permit write-sealed memfd read-only shared mappings

Message ID cover.1680560277.git.lstoakes@gmail.com (mailing list archive)

Message

Lorenzo Stoakes April 3, 2023, 10:28 p.m. UTC
This patch series is in two parts:

1. Currently there are a number of places in the kernel where we assume
   VM_SHARED implies that a mapping is writable. Let's be slightly less
   strict and relax this restriction in the case that VM_MAYWRITE is not
   set.

   This should have no noticeable impact, as the lack of VM_MAYWRITE implies
   that the mapping cannot be made writable via mprotect() or any other
   means.

2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
   The latter already clears the VM_MAYWRITE flag for a sealed read-only
   mapping; we simply extend this to F_SEAL_WRITE too.

   For this to have effect, we must also invoke call_mmap() before
   mapping_map_writable().
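
To make the two parts concrete, here is a rough userspace model. It is not
code from the series: the helper names and the simplified mmap flow are
assumptions for illustration, and only the flag values follow
include/linux/mm.h. It shows why clearing VM_MAYWRITE in the file's mmap hook
only helps if the hook runs before the writable-mapping check.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in include/linux/mm.h. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Old assumption: any shared mapping is treated as writable. */
static bool shared_assumed_writable(unsigned long vm_flags)
{
	return (vm_flags & VM_SHARED) != 0;
}

/*
 * Relaxed check (hypothetical helper name): a shared mapping only counts
 * as writable if it could also become writable later.
 */
static bool shared_may_write(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
	       (VM_SHARED | VM_MAYWRITE);
}

/*
 * Simplified model of the seal check done in the file's mmap hook:
 * reject shared writable mappings of a write-sealed file, and clear
 * VM_MAYWRITE on shared read-only ones.
 */
static int model_seal_check(bool write_sealed, unsigned long *vm_flags)
{
	if (!write_sealed || !(*vm_flags & VM_SHARED))
		return 0;
	if (*vm_flags & VM_WRITE)
		return -EPERM;
	*vm_flags &= ~VM_MAYWRITE;
	return 0;
}

/*
 * Simplified model of mmap() on a file. hook_first selects whether the
 * mmap hook runs before (proposed) or after (current) the check that
 * refuses writable mappings of a write-sealed file.
 */
static int model_mmap(bool write_sealed, unsigned long vm_flags,
		      bool hook_first)
{
	int err;

	if (hook_first) {
		err = model_seal_check(write_sealed, &vm_flags);
		if (err)
			return err;
	}

	/* Stands in for the writable-mapping check on a sealed file. */
	if (write_sealed && shared_may_write(vm_flags))
		return -EPERM;

	if (!hook_first) {
		err = model_seal_check(write_sealed, &vm_flags);
		if (err)
			return err;
	}

	return 0;
}
```

In this model a read-only shared mapping of a write-sealed file carries
VM_SHARED | VM_MAYWRITE on entry. With hook_first == false it is refused
because VM_MAYWRITE is still set when the writable-mapping check runs; with
hook_first == true the hook clears VM_MAYWRITE first and the mapping is
allowed. A mapping that also has VM_WRITE set is refused in both orders.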

As this is quite a fundamental change to the assumptions around VM_SHARED,
and since it causes a visible change to userland (in permitting read-only
shared mappings on F_SEAL_WRITE mappings), I am putting this forward as an
RFC to see if there is anything terribly wrong with it.

I suspect even if the patch series as a whole is unpalatable, there are
probably things we can salvage from it in any case.

Thanks to Andy Lutomirski who inspired the series!

Lorenzo Stoakes (3):
  mm: drop the assumption that VM_SHARED always implies writable
  mm: update seal_check_[future_]write() to include F_SEAL_WRITE as well
  mm: perform the mapping_map_writable() check after call_mmap()

 fs/hugetlbfs/inode.c |  2 +-
 include/linux/fs.h   |  4 ++--
 include/linux/mm.h   | 24 ++++++++++++++++++------
 kernel/fork.c        |  2 +-
 mm/filemap.c         |  2 +-
 mm/madvise.c         |  2 +-
 mm/mmap.c            | 22 +++++++++++-----------
 mm/shmem.c           |  2 +-
 8 files changed, 36 insertions(+), 24 deletions(-)

--
2.40.0

Comments

Jan Kara April 21, 2023, 9:01 a.m. UTC | #1
Hi!

On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> This patch series is in two parts:-
> 
> 1. Currently there are a number of places in the kernel where we assume
>    VM_SHARED implies that a mapping is writable. Let's be slightly less
>    strict and relax this restriction in the case that VM_MAYWRITE is not
>    set.
> 
>    This should have no noticeable impact as the lack of VM_MAYWRITE implies
>    that the mapping can not be made writable via mprotect() or any other
>    means.
> 
> 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
>    The latter already clears the VM_MAYWRITE flag for a sealed read-only
>    mapping, we simply extend this to F_SEAL_WRITE too.
> 
>    For this to have effect, we must also invoke call_mmap() before
>    mapping_map_writable().
> 
> As this is quite a fundamental change on the assumptions around VM_SHARED
> and since this causes a visible change to userland (in permitting read-only
> shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> to see if there is anything terribly wrong with it.

So what I'm missing in this series is the motivation. Is it that you need
to map F_SEAL_WRITE read-only? Why?

								Honza
Lorenzo Stoakes April 21, 2023, 9:23 p.m. UTC | #2
On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> Hi!
>
> On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > This patch series is in two parts:-
> >
> > 1. Currently there are a number of places in the kernel where we assume
> >    VM_SHARED implies that a mapping is writable. Let's be slightly less
> >    strict and relax this restriction in the case that VM_MAYWRITE is not
> >    set.
> >
> >    This should have no noticeable impact as the lack of VM_MAYWRITE implies
> >    that the mapping can not be made writable via mprotect() or any other
> >    means.
> >
> > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> >    The latter already clears the VM_MAYWRITE flag for a sealed read-only
> >    mapping, we simply extend this to F_SEAL_WRITE too.
> >
> >    For this to have effect, we must also invoke call_mmap() before
> >    mapping_map_writable().
> >
> > As this is quite a fundamental change on the assumptions around VM_SHARED
> > and since this causes a visible change to userland (in permitting read-only
> > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > to see if there is anything terribly wrong with it.
>
> So what I miss in this series is what the motivation is. Is it that you need
> to map F_SEAL_WRITE read-only? Why?
>

This originated from the discussion in [1], which refers to the bug
reported in [2]. Essentially the user is write-sealing a memfd then trying
to mmap it read-only, but receives an -EPERM error.

F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.

The fcntl() man page states:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

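The man page only says that shared, writable mappings fail, which implies
that a read-only shared mapping should be permitted. So the kernel does not
behave as the documentation states.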

I took the user-supplied repro and slightly modified it, enclosed
below. After this patch series, this code works correctly.

I think there's definitely a case for the VM_MAYWRITE part of this patch
series even if the memfd bits are not considered useful, as we do seem to
make the implicit assumption that MAP_SHARED == writable even if
!VM_MAYWRITE, which seems odd.

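Reproducer:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Create a sealable memfd and give it some content. */
	int fd = memfd_create("test", MFD_ALLOW_SEALING);
	if (fd == -1) {
		perror("memfd_create");
		return EXIT_FAILURE;
	}

	if (write(fd, "test", 4) == -1) {
		perror("write");
		return EXIT_FAILURE;
	}

	/* Write-seal the memfd. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == -1) {
		perror("fcntl");
		return EXIT_FAILURE;
	}

	/* Read-only shared mapping: fails with EPERM without this series. */
	void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
	if (ret == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	return EXIT_SUCCESS;
}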
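
For comparing the two seals side by side, the reproducer above could be
turned into a small probe along the following lines. This is only a sketch:
probe_sealed_mmap() is a hypothetical helper, not something from the bug
report or the series, and the fallback definition of F_SEAL_FUTURE_WRITE is
there only for older libc headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010	/* value from the Linux uapi headers */
#endif

/*
 * Hypothetical helper: create a memfd, apply the given seal, then try a
 * MAP_SHARED mapping with the given protection. Returns 0 if the mapping
 * succeeded, otherwise the errno of the first call that failed.
 */
static int probe_sealed_mmap(int seal, int prot)
{
	int err = 0;
	int fd = memfd_create("probe", MFD_ALLOW_SEALING);

	if (fd == -1)
		return errno;

	ssize_t n = write(fd, "test", 4);
	if (n != 4) {
		err = (n == -1) ? errno : EIO;
	} else if (fcntl(fd, F_ADD_SEALS, seal) == -1) {
		err = errno;
	} else {
		void *p = mmap(NULL, 4, prot, MAP_SHARED, fd, 0);

		if (p == MAP_FAILED)
			err = errno;
		else
			munmap(p, 4);
	}

	close(fd);
	return err;
}
```

Calling probe_sealed_mmap(F_SEAL_WRITE, PROT_READ) corresponds to the
reproducer, and probe_sealed_mmap(F_SEAL_FUTURE_WRITE, PROT_READ) to the
case that is already permitted. A shared mapping requested with
PROT_READ | PROT_WRITE is expected to fail with EPERM for either seal, as
the man page describes.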

[1]:https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238

> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Jan Kara April 24, 2023, 12:19 p.m. UTC | #3
On Fri 21-04-23 22:23:12, Lorenzo Stoakes wrote:
> On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> > Hi!
> >
> > On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > > This patch series is in two parts:-
> > >
> > > 1. Currently there are a number of places in the kernel where we assume
> > >    VM_SHARED implies that a mapping is writable. Let's be slightly less
> > >    strict and relax this restriction in the case that VM_MAYWRITE is not
> > >    set.
> > >
> > >    This should have no noticeable impact as the lack of VM_MAYWRITE implies
> > >    that the mapping can not be made writable via mprotect() or any other
> > >    means.
> > >
> > > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> > >    The latter already clears the VM_MAYWRITE flag for a sealed read-only
> > >    mapping, we simply extend this to F_SEAL_WRITE too.
> > >
> > >    For this to have effect, we must also invoke call_mmap() before
> > >    mapping_map_writable().
> > >
> > > As this is quite a fundamental change on the assumptions around VM_SHARED
> > > and since this causes a visible change to userland (in permitting read-only
> > > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > > to see if there is anything terribly wrong with it.
> >
> > So what I miss in this series is what the motivation is. Is it that you need
> > to map F_SEAL_WRITE read-only? Why?
> >
> 
> This originated from the discussion in [1], which refers to the bug
> reported in [2]. Essentially the user is write-sealing a memfd then trying
> to mmap it read-only, but receives an -EPERM error.
> 
> F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.
> 
> The fcntl() man page states:
> 
>     Furthermore, trying to create new shared, writable memory-mappings via
>     mmap(2) will also fail with EPERM.
> 
> So the kernel does not behave as the documentation states.
> 
> I took the user-supplied repro and slightly modified it, enclosed
> below. After this patch series, this code works correctly.
> 
> I think there's definitely a case for the VM_MAYWRITE part of this patch
> series even if the memfd bits are not considered useful, as we do seem to
> make the implicit assumption that MAP_SHARED == writable even if
> !VM_MAYWRITE which seems odd.

Thanks for the explanation! Could you please include this information in
the cover letter (perhaps in the form of a short note and a reference to
the mailing list) for future reference? Thanks!

								Honza
Lorenzo Stoakes April 24, 2023, 12:23 p.m. UTC | #4
On Mon, Apr 24, 2023 at 02:19:36PM +0200, Jan Kara wrote:
> On Fri 21-04-23 22:23:12, Lorenzo Stoakes wrote:
> > On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> > > Hi!
> > >
> > > On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> > > > This patch series is in two parts:-
> > > >
> > > > 1. Currently there are a number of places in the kernel where we assume
> > > >    VM_SHARED implies that a mapping is writable. Let's be slightly less
> > > >    strict and relax this restriction in the case that VM_MAYWRITE is not
> > > >    set.
> > > >
> > > >    This should have no noticeable impact as the lack of VM_MAYWRITE implies
> > > >    that the mapping can not be made writable via mprotect() or any other
> > > >    means.
> > > >
> > > > 2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap().
> > > >    The latter already clears the VM_MAYWRITE flag for a sealed read-only
> > > >    mapping, we simply extend this to F_SEAL_WRITE too.
> > > >
> > > >    For this to have effect, we must also invoke call_mmap() before
> > > >    mapping_map_writable().
> > > >
> > > > As this is quite a fundamental change on the assumptions around VM_SHARED
> > > > and since this causes a visible change to userland (in permitting read-only
> > > > shared mappings on F_SEAL_WRITE mappings), I am putting forward as an RFC
> > > > to see if there is anything terribly wrong with it.
> > >
> > > So what I miss in this series is what the motivation is. Is it that you need
> > > to map F_SEAL_WRITE read-only? Why?
> > >
> >
> > This originated from the discussion in [1], which refers to the bug
> > reported in [2]. Essentially the user is write-sealing a memfd then trying
> > to mmap it read-only, but receives an -EPERM error.
> >
> > F_SEAL_FUTURE_WRITE _does_ explicitly permit this but F_SEAL_WRITE does not.
> >
> > The fcntl() man page states:
> >
> >     Furthermore, trying to create new shared, writable memory-mappings via
> >     mmap(2) will also fail with EPERM.
> >
> > So the kernel does not behave as the documentation states.
> >
> > I took the user-supplied repro and slightly modified it, enclosed
> > below. After this patch series, this code works correctly.
> >
> > I think there's definitely a case for the VM_MAYWRITE part of this patch
> > series even if the memfd bits are not considered useful, as we do seem to
> > make the implicit assumption that MAP_SHARED == writable even if
> > !VM_MAYWRITE which seems odd.
>
> Thanks for the explanation! Could you please include this information in
> the cover letter (perhaps in a form of a short note and reference to the
> mailing list) for future reference? Thanks!
>
> 								Honza
>

Sure, apologies for not being clear about that :)

I may respin this as a non-RFC (with an updated description, of course), as
it has received very little attention as an RFC and I don't think it's so
insane or huge a concept as to warrant remaining one.

> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR