Message ID | 20210816194840.42769-1-david@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | Remove in-tree usage of MAP_DENYWRITE | expand |
Am 16.08.21 um 21:48 schrieb David Hildenbrand: > This series removes all in-tree usage of MAP_DENYWRITE from the kernel > and removes VM_DENYWRITE. We stopped supporting MAP_DENYWRITE for > user space applications a while ago because of the chance for DoS. > The last renaming user is binfmt binary loading during exec and > legacy library loading via uselib(). > > With this change, MAP_DENYWRITE is effectively ignored throughout the > kernel. Although the net change is small (well, we actually add code and > comments), I think the cleanup in mmap() is quite nice. > > There are some (minor) user-visible changes with this series: > 1. We no longer deny write access to shared libaries loaded via legacy > uselib(); this behavior matches modern user space e.g., via dlopen(). > 2. We no longer deny write access to the elf interpreter after exec > completed, treating it just like shared libraries (which it often is). > 3. We always deny write access to the file linked via /proc/pid/exe: > sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the file > cannot be denied, and write access to the file will remain denied > until the link is effectivel gone (exec, termination, > sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file. > > There is a related problem [2] with overlayfs, that should at least partly > be tackled by this series. I don't quite understand the interaction of > overlayfs and deny_write_access()/allow_write_access() at exec time: > > If we end up denying write access to the wrong file and not to the > realfile, that would be fundamentally broken. We would have to reroute > our deny_write_access()/ allow_write_access() calls for the exec file to > the realfile -- but I leave figuring out the details to overlayfs guys, as > that would be a related but different issue. > > There was a lengthy discussion in [3] whether to remove deny_write_access() > completely; however, if we decide to go that way, it would ideally be done > on top, because it could be that some applications even rely on the current > behavior. > > v1 -> v2: > - "kernel/fork: factor out replacing the current MM exe_file" > -- Call the function "replace_mm_exe_file()" instead > -- Add some doc, similar to set_mm_exe_file() > -- Update patch subject/description > - "kernel/fork: always deny write access to current MM exe_file" > -- Introduce dup_mm_exe_file() > -- Make set_mm_exe_file() return an error to make the code easier to > grasp. > -- Improve comments > - Added ACKs > - Mention "sys_prctl(PR_SET_MM_MAP/EXE_FILE)" everywhere instead of > only "sys_prctl(PR_SET_MM_EXE_FILE)". > > RFC -> v1: > - "binfmt: remove in-tree usage of MAP_DENYWRITE" > -- Add a note that this should fix part of a problem with overlayfs This is unfortunately way beyond my understanding of the fs layer to actually review this. But from the ten mile high view it looks like a really nice cleanup to me and makes my live in the DRM subsystem much easier. Feel free to add a Acked-by: Christian König <christian.koenig@amd.com> to the series. Thanks, Christian. > > [1] https://lore.kernel.org/r/20210423131640.20080-1-david@redhat.com/ > [2] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/ > [3] https://lkml.kernel.org/r/20210812084348.6521-1-david@redhat.com > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Borislav Petkov <bp@alien8.de> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Arnaldo Carvalho de Melo <acme@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> > Cc: Jiri Olsa <jolsa@redhat.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> > Cc: Kees Cook <keescook@chromium.org> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Cc: Greg Ungerer <gerg@linux-m68k.org> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> > Cc: Chinwen Chang <chinwen.chang@mediatek.com> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> > Cc: Huang Ying <ying.huang@intel.com> > Cc: Jann Horn <jannh@google.com> > Cc: Feng Tang <feng.tang@intel.com> > Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Shawn Anastasio <shawn@anastas.io> > Cc: Steven Price <steven.price@arm.com> > Cc: Nicholas Piggin <npiggin@gmail.com> > Cc: Christian Brauner <christian.brauner@ubuntu.com> > Cc: Jens Axboe <axboe@kernel.dk> > Cc: Gabriel Krisman Bertazi <krisman@collabora.com> > Cc: Peter Xu <peterx@redhat.com> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Shakeel Butt <shakeelb@google.com> > Cc: Marco Elver <elver@google.com> > Cc: Daniel Jordan <daniel.m.jordan@oracle.com> > Cc: Nicolas Viennot <Nicolas.Viennot@twosigma.com> > Cc: Thomas Cedeno <thomascedeno@google.com> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Miklos Szeredi <miklos@szeredi.hu> > Cc: Chengguang Xu <cgxu519@mykernel.net> > Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com> > Cc: Florian Weimer <fweimer@redhat.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: David Laight <David.Laight@ACULAB.COM> > Cc: linux-unionfs@vger.kernel.org > Cc: linux-api@vger.kernel.org > Cc: x86@kernel.org > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-mm@kvack.org > > David Hildenbrand (7): > binfmt: don't use MAP_DENYWRITE when loading shared libraries via > uselib() > kernel/fork: factor out replacing the current MM exe_file > kernel/fork: always deny write access to current MM exe_file > binfmt: remove in-tree usage of MAP_DENYWRITE > mm: remove VM_DENYWRITE > mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff() > fs: update documentation of get_write_access() and friends > > arch/x86/ia32/ia32_aout.c | 8 ++- > fs/binfmt_aout.c | 7 ++- > fs/binfmt_elf.c | 6 +-- > fs/binfmt_elf_fdpic.c | 2 +- > fs/exec.c | 4 +- > fs/proc/task_mmu.c | 1 - > include/linux/fs.h | 19 ++++--- > include/linux/mm.h | 4 +- > include/linux/mman.h | 4 +- > include/trace/events/mmflags.h | 1 - > kernel/events/core.c | 2 - > kernel/fork.c | 95 ++++++++++++++++++++++++++++++---- > kernel/sys.c | 33 +----------- > lib/test_printf.c | 5 +- > mm/mmap.c | 29 ++--------- > mm/nommu.c | 2 - > 16 files changed, 119 insertions(+), 103 deletions(-) > > > base-commit: 7c60610d476766e128cc4284bb6349732cbd6606
On 16.08.21 21:48, David Hildenbrand wrote: > This series removes all in-tree usage of MAP_DENYWRITE from the kernel > and removes VM_DENYWRITE. We stopped supporting MAP_DENYWRITE for > user space applications a while ago because of the chance for DoS. > The last renaming user is binfmt binary loading during exec and > legacy library loading via uselib(). > So, how do we want to continue with this? Pick it up for v5.15? Have it in -next for a while and eventually pick it up for v5.16? I think the "remove ETXTBSY completely" and "remove the sanity mapping check" thingies should be done on top.
On Fri, Sep 3, 2021 at 2:45 AM David Hildenbrand <david@redhat.com> wrote: > > So, how do we want to continue with this? Pick it up for v5.15? Have it > in -next for a while and eventually pick it up for v5.16? I'm ok with the series. If you have a git tree, do the normal pull request, and we can do it for 5.15 and see if anybody notices. As you say, any final removal of ETXTBSY should be a separate and later patch on top. Linus