diff mbox series

[v2,2/7] kernel/fork: factor out replacing the current MM exe_file

Message ID 20210816194840.42769-3-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series Remove in-tree usage of MAP_DENYWRITE | expand

Commit Message

David Hildenbrand Aug. 16, 2021, 7:48 p.m. UTC
Let's factor the main logic out into replace_mm_exe_file(), such that
all mm->exe_file logic is contained in kernel/fork.c.

While at it, perform some simple cleanups that are possible now that
we're simplifying the individual functions.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mm.h |  1 +
 kernel/fork.c      | 44 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sys.c       | 33 +--------------------------------
 3 files changed, 43 insertions(+), 35 deletions(-)

Comments

Linus Torvalds Aug. 19, 2021, 8:51 p.m. UTC | #1
So I like this series.

However, logically, I think this part in replace_mm_exe_file() no
longer makes sense:

On Mon, Aug 16, 2021 at 12:50 PM David Hildenbrand <david@redhat.com> wrote:
>
> +       /* Forbid mm->exe_file change if old file still mapped. */
> +       old_exe_file = get_mm_exe_file(mm);
> +       if (old_exe_file) {
> +               mmap_read_lock(mm);
> +               for (vma = mm->mmap; vma && !ret; vma = vma->vm_next) {
> +                       if (!vma->vm_file)
> +                               continue;
> +                       if (path_equal(&vma->vm_file->f_path,
> +                                      &old_exe_file->f_path))
> +                               ret = -EBUSY;
> +               }
> +               mmap_read_unlock(mm);
> +               fput(old_exe_file);
> +               if (ret)
> +                       return ret;
> +       }

and should just be removed.

NOTE! I think it makes sense within the context of this patch (where
you just move code around), but that it should then be removed in the
next patch that does that "always deny write access to current MM
exe_file" thing.

I just quoted it in the context of this patch, since the next patch
doesn't actually show this code any more.

In the *old* model - where the ETXTBUSY was about the mmap() of the
file - the above tests make sense.

But in the new model, walking the mappings just doesn't seem to be a
sensible operation any more. The mappings simply aren't what ETXTBUSY
is about in the new world order, and so doing that mapping walk seems
nonsensical.

Hmm?

                 Linus
David Hildenbrand Aug. 20, 2021, 8:46 a.m. UTC | #2
On 19.08.21 22:51, Linus Torvalds wrote:
> So I like this series.
> 
> However, logically, I think this part in replace_mm_exe_file() no
> longer makes sense:
> 
> On Mon, Aug 16, 2021 at 12:50 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> +       /* Forbid mm->exe_file change if old file still mapped. */
>> +       old_exe_file = get_mm_exe_file(mm);
>> +       if (old_exe_file) {
>> +               mmap_read_lock(mm);
>> +               for (vma = mm->mmap; vma && !ret; vma = vma->vm_next) {
>> +                       if (!vma->vm_file)
>> +                               continue;
>> +                       if (path_equal(&vma->vm_file->f_path,
>> +                                      &old_exe_file->f_path))
>> +                               ret = -EBUSY;
>> +               }
>> +               mmap_read_unlock(mm);
>> +               fput(old_exe_file);
>> +               if (ret)
>> +                       return ret;
>> +       }
> 
> and should just be removed.
> 
> NOTE! I think it makes sense within the context of this patch (where
> you just move code around), but that it should then be removed in the
> next patch that does that "always deny write access to current MM
> exe_file" thing.
> 
> I just quoted it in the context of this patch, since the next patch
> doesn't actually show this code any more.
> 
> In the *old* model - where the ETXTBUSY was about the mmap() of the
> file - the above tests make sense.
> 
> But in the new model, walking the mappings just doesn't seem to be a
> sensible operation any more. The mappings simply aren't what ETXTBUSY
> is about in the new world order, and so doing that mapping walk seems
> nonsensical.
> 
> Hmm?

I think this is somewhat another kind of "stop user space trying
to do stupid things" thingy, not necessarily glued to ETXTBUSY:
don't allow replacing exe_file if that very file is still mapped
and consequently eventually still in use by the application.

I don't think it necessarily has many things to do with ETXTBUSY:
we only check if there is a VMA mapping that file, not that it's
a VM_DENYWRITE mapping.

That code originates from

commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
Date:   Wed Jul 11 14:02:11 2012 -0700

     c/r: prctl: less paranoid prctl_set_mm_exe_file()

     "no other files mapped" requirement from my previous patch (c/r: prctl:
     update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal) is too
     paranoid, it forbids operation even if there mapped one shared-anon vma.
     
     Let's check that current mm->exe_file already unmapped, in this case
     exe_file symlink already outdated and its changing is reasonable.


The statement "exe_file symlink already outdated and its
changing is reasonable" somewhat makes sense.


Long story short, I think this check somehow makes a bit of sense, but
we wouldn't lose too much if we drop it -- just another sanity check.

Your call :)
Eric W. Biederman Aug. 20, 2021, 2:36 p.m. UTC | #3
David Hildenbrand <david@redhat.com> writes:

> On 19.08.21 22:51, Linus Torvalds wrote:
>> So I like this series.
>>
>> However, logically, I think this part in replace_mm_exe_file() no
>> longer makes sense:
>>
>> On Mon, Aug 16, 2021 at 12:50 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> +       /* Forbid mm->exe_file change if old file still mapped. */
>>> +       old_exe_file = get_mm_exe_file(mm);
>>> +       if (old_exe_file) {
>>> +               mmap_read_lock(mm);
>>> +               for (vma = mm->mmap; vma && !ret; vma = vma->vm_next) {
>>> +                       if (!vma->vm_file)
>>> +                               continue;
>>> +                       if (path_equal(&vma->vm_file->f_path,
>>> +                                      &old_exe_file->f_path))
>>> +                               ret = -EBUSY;
>>> +               }
>>> +               mmap_read_unlock(mm);
>>> +               fput(old_exe_file);
>>> +               if (ret)
>>> +                       return ret;
>>> +       }
>>
>> and should just be removed.
>>
>> NOTE! I think it makes sense within the context of this patch (where
>> you just move code around), but that it should then be removed in the
>> next patch that does that "always deny write access to current MM
>> exe_file" thing.
>>
>> I just quoted it in the context of this patch, since the next patch
>> doesn't actually show this code any more.
>>
>> In the *old* model - where the ETXTBUSY was about the mmap() of the
>> file - the above tests make sense.
>>
>> But in the new model, walking the mappings just doesn't seem to be a
>> sensible operation any more. The mappings simply aren't what ETXTBUSY
>> is about in the new world order, and so doing that mapping walk seems
>> nonsensical.
>>
>> Hmm?
>
> I think this is somewhat another kind of "stop user space trying
> to do stupid things" thingy, not necessarily glued to ETXTBUSY:
> don't allow replacing exe_file if that very file is still mapped
> and consequently eventually still in use by the application.
>
> I don't think it necessarily has many things to do with ETXTBUSY:
> we only check if there is a VMA mapping that file, not that it's
> a VM_DENYWRITE mapping.
>
> That code originates from
>
> commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
> Author: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Date:   Wed Jul 11 14:02:11 2012 -0700
>
>     c/r: prctl: less paranoid prctl_set_mm_exe_file()
>
>     "no other files mapped" requirement from my previous patch (c/r: prctl:
>     update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal) is too
>     paranoid, it forbids operation even if there mapped one shared-anon vma.
>       Let's check that current mm->exe_file already unmapped, in this case
>     exe_file symlink already outdated and its changing is reasonable.
>
>
> The statement "exe_file symlink already outdated and its
> changing is reasonable" somewhat makes sense.
>
>
> Long story short, I think this check somehow makes a bit of sense, but
> we wouldn't lose too much if we drop it -- just another sanity check.
>
> Your call :)

There has been quite a bit of conversation of the years about how bad is
it to allow changing /proc/self/exe as some userspace depends on it.

I think this check is there to keep from changing /proc/self/exe
arbitrarily.

Maybe it is all completely silly and we should not care about the code
that thinks /proc/self/exe is a reliable measure of anything, but short
of that I think we should either keep the code or put in some careful
thought as to which restrictions make sense when changing
/proc/self/exe.

Eric
Linus Torvalds Aug. 22, 2021, 5:58 p.m. UTC | #4
On Fri, Aug 20, 2021 at 7:36 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I think this check is there to keep from changing /proc/self/exe
> arbitrarily.

Well, you pretty much can already. You just have to jump through a few hoops.

> Maybe it is all completely silly and we should not care about the code
> that thinks /proc/self/exe is a reliable measure of anything, but short
> of that I think we should either keep the code or put in some careful
> thought as to which restrictions make sense when changing
> /proc/self/exe.

I think the important ones are already there: checking that it is (a)
an executable and (b) that we have execute permission to it.

I also think the code is actually racy - while we are checking "did
the old mm_exe file have any mappings", there's nothing that keeps
another thread from changing the exe file to another one that _does_
have mappings, and then we'll happily replace it with yet another file
because we checked the old one, not the new one it was replaced by in
the meantime.

Of course, that "race" doesn't really matter - exactly because this
isn't about security, it's just a random "let's test that immaterial
thing, and we don't actually care about corner cases".

So I'm not saying that race needs to be fixed - I'm just pointing it
out as an example of how nonsensical the test really is. It's not
fundamental to anything, it's just a random "let's test this odd
condition".

That said, I don't care _that_ much. I'm happy with David's series, I
just think that once we don't do this at a mmap level any more, the
"go look for mappings" code makes little sense.

So we can leave it, and remove it later if people agree.

                  Linus
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..48c6fa9ab792 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2581,6 +2581,7 @@  extern int mm_take_all_locks(struct mm_struct *mm);
 extern void mm_drop_all_locks(struct mm_struct *mm);
 
 extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
 extern struct file *get_task_exe_file(struct task_struct *task);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index bc94b2cc5995..eedce5c77041 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1148,9 +1148,7 @@  void mmput_async(struct mm_struct *mm)
  *
  * Main users are mmput() and sys_execve(). Callers prevent concurrent
  * invocations: in mmput() nobody alive left, in execve task is single
- * threaded. sys_prctl(PR_SET_MM_MAP/EXE_FILE) also needs to set the
- * mm->exe_file, but does so without using set_mm_exe_file() in order
- * to avoid the need for any locks.
+ * threaded.
  */
 void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 {
@@ -1170,6 +1168,46 @@  void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
 		fput(old_exe_file);
 }
 
+/**
+ * replace_mm_exe_file - replace a reference to the mm's executable file
+ *
+ * This changes mm's executable file (shown as symlink /proc/[pid]/exe),
+ * dealing with concurrent invocation and without grabbing the mmap lock in
+ * write mode.
+ *
+ * Main user is sys_prctl(PR_SET_MM_MAP/EXE_FILE).
+ */
+int replace_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
+{
+	struct vm_area_struct *vma;
+	struct file *old_exe_file;
+	int ret = 0;
+
+	/* Forbid mm->exe_file change if old file still mapped. */
+	old_exe_file = get_mm_exe_file(mm);
+	if (old_exe_file) {
+		mmap_read_lock(mm);
+		for (vma = mm->mmap; vma && !ret; vma = vma->vm_next) {
+			if (!vma->vm_file)
+				continue;
+			if (path_equal(&vma->vm_file->f_path,
+				       &old_exe_file->f_path))
+				ret = -EBUSY;
+		}
+		mmap_read_unlock(mm);
+		fput(old_exe_file);
+		if (ret)
+			return ret;
+	}
+
+	/* set the new file, lockless */
+	get_file(new_exe_file);
+	old_exe_file = xchg(&mm->exe_file, new_exe_file);
+	if (old_exe_file)
+		fput(old_exe_file);
+	return 0;
+}
+
 /**
  * get_mm_exe_file - acquire a reference to the mm's executable file
  *
diff --git a/kernel/sys.c b/kernel/sys.c
index ef1a78f5d71c..30c12e54585a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1846,7 +1846,6 @@  SYSCALL_DEFINE1(umask, int, mask)
 static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 {
 	struct fd exe;
-	struct file *old_exe, *exe_file;
 	struct inode *inode;
 	int err;
 
@@ -1869,40 +1868,10 @@  static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
 	if (err)
 		goto exit;
 
-	/*
-	 * Forbid mm->exe_file change if old file still mapped.
-	 */
-	exe_file = get_mm_exe_file(mm);
-	err = -EBUSY;
-	if (exe_file) {
-		struct vm_area_struct *vma;
-
-		mmap_read_lock(mm);
-		for (vma = mm->mmap; vma; vma = vma->vm_next) {
-			if (!vma->vm_file)
-				continue;
-			if (path_equal(&vma->vm_file->f_path,
-				       &exe_file->f_path))
-				goto exit_err;
-		}
-
-		mmap_read_unlock(mm);
-		fput(exe_file);
-	}
-
-	err = 0;
-	/* set the new file, lockless */
-	get_file(exe.file);
-	old_exe = xchg(&mm->exe_file, exe.file);
-	if (old_exe)
-		fput(old_exe);
+	err = replace_mm_exe_file(mm, exe.file);
 exit:
 	fdput(exe);
 	return err;
-exit_err:
-	mmap_read_unlock(mm);
-	fput(exe_file);
-	goto exit;
 }
 
 /*