[v3,10/16] exec: Remove do_execve_file
diff mbox series

Message ID 20200702164140.4468-10-ebiederm@xmission.com
State New
Headers show
Series
  • Make the user mode driver code a better citizen
Related show

Commit Message

Eric W. Biederman July 2, 2020, 4:41 p.m. UTC
Now that the last callser has been removed remove this code from exec.

For anyone thinking of resurrecing do_execve_file please note that
the code was buggy in several fundamental ways.

- It did not ensure the file it was passed was read-only and that
  deny_write_access had been called on it.  Which subtlely breaks
  invaniants in exec.

- The caller of do_execve_file was expected to hold and put a
  reference to the file, but an extra reference for use by exec was
  not taken so that when exec put it's reference to the file an
  underflow occured on the file reference count.

- The point of the interface was so that a pathname did not need to
  exist.  Which breaks pathname based LSMs.

Tetsuo Handa originally reported these issues[1].  While it was clear
that deny_write_access was missing the fundamental incompatibility
with the passed in O_RDWR filehandle was not immediately recognized.

All of these issues were fixed by modifying the usermode driver code
to have a path, so it did not need this hack.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c               | 38 +++++++++-----------------------------
 include/linux/binfmts.h |  1 -
 2 files changed, 9 insertions(+), 30 deletions(-)

Comments

Luis Chamberlain July 8, 2020, 6:35 a.m. UTC | #1
On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote:
> Now that the last callser has been removed remove this code from exec.
> 
> For anyone thinking of resurrecing do_execve_file please note that
> the code was buggy in several fundamental ways.
> 
> - It did not ensure the file it was passed was read-only and that
>   deny_write_access had been called on it.  Which subtlely breaks
>   invaniants in exec.
> 
> - The caller of do_execve_file was expected to hold and put a
>   reference to the file, but an extra reference for use by exec was
>   not taken so that when exec put it's reference to the file an
>   underflow occured on the file reference count.

Maybe its my growing love with testing, but I'm going to have to partly
blame here that we added a new API without any respective testing.
Granted, I recall this this patch set could have used more wider review
and a bit more patience... but just mentioning this so we try to avoid
new api-without-testing with more reason in the future.

But more importantly, *how* could we have caught this? Or how can we
catch this sort of stuff better in the future?

> - The point of the interface was so that a pathname did not need to
>   exist.  Which breaks pathname based LSMs.

Perhaps so but this fails to do justice of the LSM consideration done
for the patch which added this during patch review [0], and I
particularly recall I called out LSM folks to bring their ray guns out at
this patch. It didn't get much attention.

Let me recap a few points I think your commit log should somehow
consider. You do as you please.

Users of shmem_kernel_file_setup() spawned out of the desire to                                                                                                      
*avoid* LSMs since it didn't make sense in their case as their inodes                                                                                                      
are never exposed to userspace. Such is the case for ipc/shm.c and                                                                                                         
security/keys/big_key.c. Refer to commit c7277090927a5 ("security: shmem:                                                                                                  
implement kernel private shmem inodes") and then commit e1832f2923ec9                                                                                                      
("ipc: use private shmem or hugetlbfs inodes for shm segments").

And the umh module approach was doing:

 a) mapping data already extracted by the kernel somehow from
    a file somehow, presumably from /lib/modules/ path somewhere, but
    again this is not visible to umc.c, as it just gets called with:
                                                                                                                                                                          
     fork_usermode_blob(void *data, size_t len, struct umh_info *info)
                                                                                                                                                                          
 b) Creating the respective tmpfs file with shmem_kernel_file_setup()
 c) Populating the file created and stuffing it with our data passed
 d) Calling do_execve_file() on it.

So, although I was hoping LSM folks would chime in for things I may have
missed during my patch review, my recollection from the patch thread was
that this becuase of a) it in theory could skip out on dealing with LSMs.

[0] https://lkml.kernel.org/r/20180509022526.hertzfpvy7apz6ny@ast-mbp               

  Luis
Luis Chamberlain July 8, 2020, 12:41 p.m. UTC | #2
On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote:
> On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote:
> > Now that the last callser has been removed remove this code from exec.
> > 
> > For anyone thinking of resurrecing do_execve_file please note that
> > the code was buggy in several fundamental ways.
> > 
> > - It did not ensure the file it was passed was read-only and that
> >   deny_write_access had been called on it.  Which subtlely breaks
> >   invaniants in exec.
> > 
> > - The caller of do_execve_file was expected to hold and put a
> >   reference to the file, but an extra reference for use by exec was
> >   not taken so that when exec put it's reference to the file an
> >   underflow occured on the file reference count.
> 
> Maybe its my growing love with testing, but I'm going to have to partly
> blame here that we added a new API without any respective testing.
> Granted, I recall this this patch set could have used more wider review
> and a bit more patience... but just mentioning this so we try to avoid
> new api-without-testing with more reason in the future.
> 
> But more importantly, *how* could we have caught this? Or how can we
> catch this sort of stuff better in the future?

Of all the issues you pointed out with do_execve_file(), since upon
review the assumption *by design* was that LSMs/etc would pick up issues
with the file *prior* to processing, I think that this file reference
count issue comes to my attention as the more serious issue which I
wish we could address *first* before this crusade.

So I have to ask, has anyone *really tried* to give a crack at fixing
this refcount issue in a smaller way first? Alexei?

I'm not opposed to the removal of do_execve_file(), however if there
is a reproducible crash / issue with the existing user, this sledge
hammer seems a bit overkill for older kernels.

  Luis
Eric W. Biederman July 8, 2020, 1:08 p.m. UTC | #3
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote:
>> On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote:
>> > Now that the last callser has been removed remove this code from exec.
>> > 
>> > For anyone thinking of resurrecing do_execve_file please note that
>> > the code was buggy in several fundamental ways.
>> > 
>> > - It did not ensure the file it was passed was read-only and that
>> >   deny_write_access had been called on it.  Which subtlely breaks
>> >   invaniants in exec.
>> > 
>> > - The caller of do_execve_file was expected to hold and put a
>> >   reference to the file, but an extra reference for use by exec was
>> >   not taken so that when exec put it's reference to the file an
>> >   underflow occured on the file reference count.
>> 
>> Maybe its my growing love with testing, but I'm going to have to partly
>> blame here that we added a new API without any respective testing.
>> Granted, I recall this this patch set could have used more wider review
>> and a bit more patience... but just mentioning this so we try to avoid
>> new api-without-testing with more reason in the future.
>> 
>> But more importantly, *how* could we have caught this? Or how can we
>> catch this sort of stuff better in the future?
>
> Of all the issues you pointed out with do_execve_file(), since upon
> review the assumption *by design* was that LSMs/etc would pick up issues
> with the file *prior* to processing, I think that this file reference
> count issue comes to my attention as the more serious issue which I
> wish we could address *first* before this crusade.
>
> So I have to ask, has anyone *really tried* to give a crack at fixing
> this refcount issue in a smaller way first? Alexei?
>
> I'm not opposed to the removal of do_execve_file(), however if there
> is a reproducible crash / issue with the existing user, this sledge
> hammer seems a bit overkill for older kernels.

It does not matter for older kernels because there is exactly one user.
That one user is just a place holder keeping the code alive until a real
user comes along.

For older kernels the solution is to just mark the bpfilter code broken
in Kconfig and refuse to compile it.  That is the trivial backportable
fix if anyone wants one.

Eric
Luis Chamberlain July 8, 2020, 1:32 p.m. UTC | #4
On Wed, Jul 08, 2020 at 08:08:09AM -0500, Eric W. Biederman wrote:
> Luis Chamberlain <mcgrof@kernel.org> writes:
> 
> > On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote:
> >> On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote:
> >> > Now that the last callser has been removed remove this code from exec.
> >> > 
> >> > For anyone thinking of resurrecing do_execve_file please note that
> >> > the code was buggy in several fundamental ways.
> >> > 
> >> > - It did not ensure the file it was passed was read-only and that
> >> >   deny_write_access had been called on it.  Which subtlely breaks
> >> >   invaniants in exec.
> >> > 
> >> > - The caller of do_execve_file was expected to hold and put a
> >> >   reference to the file, but an extra reference for use by exec was
> >> >   not taken so that when exec put it's reference to the file an
> >> >   underflow occured on the file reference count.
> >> 
> >> Maybe its my growing love with testing, but I'm going to have to partly
> >> blame here that we added a new API without any respective testing.
> >> Granted, I recall this this patch set could have used more wider review
> >> and a bit more patience... but just mentioning this so we try to avoid
> >> new api-without-testing with more reason in the future.
> >> 
> >> But more importantly, *how* could we have caught this? Or how can we
> >> catch this sort of stuff better in the future?
> >
> > Of all the issues you pointed out with do_execve_file(), since upon
> > review the assumption *by design* was that LSMs/etc would pick up issues
> > with the file *prior* to processing, I think that this file reference
> > count issue comes to my attention as the more serious issue which I
> > wish we could address *first* before this crusade.
> >
> > So I have to ask, has anyone *really tried* to give a crack at fixing
> > this refcount issue in a smaller way first? Alexei?
> >
> > I'm not opposed to the removal of do_execve_file(), however if there
> > is a reproducible crash / issue with the existing user, this sledge
> > hammer seems a bit overkill for older kernels.
> 
> It does not matter for older kernels because there is exactly one user.
> That one user is just a place holder keeping the code alive until a real
> user comes along.
> 
> For older kernels the solution is to just mark the bpfilter code broken
> in Kconfig and refuse to compile it.  That is the trivial backportable
> fix if anyone wants one.

This seals the deal for me, thanks! Carry on, but hey, please add
yourself to MAINTAINERS too :)
 
  Luis
Pavel Machek July 12, 2020, 9:02 p.m. UTC | #5
On Thu 2020-07-02 11:41:34, Eric W. Biederman wrote:
> Now that the last callser has been removed remove this code from exec.

Typo "caller".

> For anyone thinking of resurrecing do_execve_file please note that

resurrecting?

> the code was buggy in several fundamental ways.
> 
> - It did not ensure the file it was passed was read-only and that
>   deny_write_access had been called on it.  Which subtlely breaks
>   invaniants in exec.

subtly, invariants?

									Pavel

Patch
diff mbox series

diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a70327..23dfbb820626 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1818,13 +1818,14 @@  static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int __do_execve_file(int fd, struct filename *filename,
-			    struct user_arg_ptr argv,
-			    struct user_arg_ptr envp,
-			    int flags, struct file *file)
+static int do_execveat_common(int fd, struct filename *filename,
+			      struct user_arg_ptr argv,
+			      struct user_arg_ptr envp,
+			      int flags)
 {
 	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
+	struct file *file;
 	struct files_struct *displaced;
 	int retval;
 
@@ -1863,8 +1864,7 @@  static int __do_execve_file(int fd, struct filename *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	if (!file)
-		file = do_open_execat(fd, filename, flags);
+	file = do_open_execat(fd, filename, flags);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1872,9 +1872,7 @@  static int __do_execve_file(int fd, struct filename *filename,
 	sched_exec();
 
 	bprm->file = file;
-	if (!filename) {
-		bprm->filename = "none";
-	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
+	if (fd == AT_FDCWD || filename->name[0] == '/') {
 		bprm->filename = filename->name;
 	} else {
 		if (filename->name[0] == '\0')
@@ -1935,8 +1933,7 @@  static int __do_execve_file(int fd, struct filename *filename,
 	task_numa_free(current, false);
 	free_bprm(bprm);
 	kfree(pathbuf);
-	if (filename)
-		putname(filename);
+	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
 	return retval;
@@ -1967,27 +1964,10 @@  static int __do_execve_file(int fd, struct filename *filename,
 	if (displaced)
 		reset_files_struct(displaced);
 out_ret:
-	if (filename)
-		putname(filename);
+	putname(filename);
 	return retval;
 }
 
-static int do_execveat_common(int fd, struct filename *filename,
-			      struct user_arg_ptr argv,
-			      struct user_arg_ptr envp,
-			      int flags)
-{
-	return __do_execve_file(fd, filename, argv, envp, flags, NULL);
-}
-
-int do_execve_file(struct file *file, void *__argv, void *__envp)
-{
-	struct user_arg_ptr argv = { .ptr.native = __argv };
-	struct user_arg_ptr envp = { .ptr.native = __envp };
-
-	return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file);
-}
-
 int do_execve(struct filename *filename,
 	const char __user *const __user *__argv,
 	const char __user *const __user *__envp)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 4a20b7517dd0..7c27d7b57871 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -141,6 +141,5 @@  extern int do_execveat(int, struct filename *,
 		       const char __user * const __user *,
 		       const char __user * const __user *,
 		       int);
-int do_execve_file(struct file *file, void *__argv, void *__envp);
 
 #endif /* _LINUX_BINFMTS_H */