mbox series

[00/14] Make the user mode driver code a better citizen

Message ID 87pn9mgfc2.fsf_-_@x220.int.ebiederm.org (mailing list archive)
Headers show
Series Make the user mode driver code a better citizen | expand

Message

Eric W. Biederman June 26, 2020, 12:51 p.m. UTC
Asking for people to fix their bugs in this user mode driver code has
been remarkably unproductive.  So here are my bug fixes.

I have tested them by booting with the code compiled in and
by killing "bpfilter_umh" and running iptables -vnL to restart
the userspace driver.

I have split the changes into small enough pieces so they should be
easily readable and testable.  

The changes lean into the preexisting interfaces in the kernel and
remove special cases for user mode driver code in favor of solutions
that don't need special cases.  This results in smaller code with
fewer bugs.

At a practical level this removes the maintenance burden of the
user mode drivers from the user mode helper code and from exec as
the special cases are removed.

Similarly the LSM interaction bugs are fixed by not having unnecessary
special cases for user mode drivers.

Please let me know if you see any bugs.  Once the code review is
finished I plan to take this through my tree.

Eric W. Biederman (14):
      umh: Capture the pid in umh_pipe_setup
      umh: Move setting PF_UMH into umh_pipe_setup
      umh: Rename the user mode driver helpers for clarity
      umh: Remove call_usermodehelper_setup_file.
      umh: Separate the user mode driver and the user mode helper support
      umd: For clarity rename umh_info umd_info
      umd: Rename umd_info.cmdline umd_info.driver_name
      umd: Transform fork_usermode_blob into fork_usermode_driver
      umh: Stop calling do_execve_file
      exec: Remove do_execve_file
      bpfilter: Move bpfilter_umh back into init data
      umd: Track user space drivers with struct pid
      bpfilter: Take advantage of the facilities of struct pid
      umd: Remove exit_umh

 fs/exec.c                        |  38 ++------
 include/linux/binfmts.h          |   1 -
 include/linux/bpfilter.h         |   7 +-
 include/linux/sched.h            |   9 --
 include/linux/umd.h              |  18 ++++
 include/linux/umh.h              |  15 ----
 kernel/Makefile                  |   1 +
 kernel/exit.c                    |   1 -
 kernel/umd.c                     | 183 +++++++++++++++++++++++++++++++++++++++
 kernel/umh.c                     | 171 +-----------------------------------
 net/bpfilter/bpfilter_kern.c     |  38 ++++----
 net/bpfilter/bpfilter_umh_blob.S |   2 +-
 net/ipv4/bpfilter/sockopt.c      |  20 +++--
 13 files changed, 249 insertions(+), 255 deletions(-)

Eric

Comments

Eric W. Biederman June 26, 2020, 1:48 p.m. UTC | #1
Adding Luis Chamberlain as he maintains the user mode helper code.

Just so everyone who is relevant is at least aware of what is going on.

ebiederm@xmission.com (Eric W. Biederman) writes:

> Asking for people to fix their bugs in this user mode driver code has
> been remarkably unproductive.  So here are my bug fixes.
>
> I have tested them by booting with the code compiled in and
> by killing "bpfilter_umh" and running iptables -vnL to restart
> the userspace driver.
>
> I have split the changes into small enough pieces so they should be
> easily readable and testable.  
>
> The changes lean into the preexisting interfaces in the kernel and
> remove special cases for user mode driver code in favor of solutions
> that don't need special cases.  This results in smaller code with
> fewer bugs.
>
> At a practical level this removes the maintenance burden of the
> user mode drivers from the user mode helper code and from exec as
> the special cases are removed.
>
> Similarly the LSM interaction bugs are fixed by not having unnecessary
> special cases for user mode drivers.
>
> Please let me know if you see any bugs.  Once the code review is
> finished I plan to take this through my tree.
>
> Eric W. Biederman (14):
>       umh: Capture the pid in umh_pipe_setup
>       umh: Move setting PF_UMH into umh_pipe_setup
>       umh: Rename the user mode driver helpers for clarity
>       umh: Remove call_usermodehelper_setup_file.
>       umh: Separate the user mode driver and the user mode helper support
>       umd: For clarity rename umh_info umd_info
>       umd: Rename umd_info.cmdline umd_info.driver_name
>       umd: Transform fork_usermode_blob into fork_usermode_driver
>       umh: Stop calling do_execve_file
>       exec: Remove do_execve_file
>       bpfilter: Move bpfilter_umh back into init data
>       umd: Track user space drivers with struct pid
>       bpfilter: Take advantage of the facilities of struct pid
>       umd: Remove exit_umh
>
>  fs/exec.c                        |  38 ++------
>  include/linux/binfmts.h          |   1 -
>  include/linux/bpfilter.h         |   7 +-
>  include/linux/sched.h            |   9 --
>  include/linux/umd.h              |  18 ++++
>  include/linux/umh.h              |  15 ----
>  kernel/Makefile                  |   1 +
>  kernel/exit.c                    |   1 -
>  kernel/umd.c                     | 183 +++++++++++++++++++++++++++++++++++++++
>  kernel/umh.c                     | 171 +-----------------------------------
>  net/bpfilter/bpfilter_kern.c     |  38 ++++----
>  net/bpfilter/bpfilter_umh_blob.S |   2 +-
>  net/ipv4/bpfilter/sockopt.c      |  20 +++--
>  13 files changed, 249 insertions(+), 255 deletions(-)
>
> Eric
Greg KH June 26, 2020, 2:10 p.m. UTC | #2
On Fri, Jun 26, 2020 at 07:51:41AM -0500, Eric W. Biederman wrote:
> 
> Asking for people to fix their bugs in this user mode driver code has
> been remarkably unproductive.  So here are my bug fixes.
> 
> I have tested them by booting with the code compiled in and
> by killing "bpfilter_umh" and running iptables -vnL to restart
> the userspace driver.
> 
> I have split the changes into small enough pieces so they should be
> easily readable and testable.  
> 
> The changes lean into the preexisting interfaces in the kernel and
> remove special cases for user mode driver code in favor of solutions
> that don't need special cases.  This results in smaller code with
> fewer bugs.
> 
> At a practical level this removes the maintenance burden of the
> user mode drivers from the user mode helper code and from exec as
> the special cases are removed.
> 
> Similarly the LSM interaction bugs are fixed by not having unnecessary
> special cases for user mode drivers.
> 
> Please let me know if you see any bugs.  Once the code review is
> finished I plan to take this through my tree.
> 
> Eric W. Biederman (14):
>       umh: Capture the pid in umh_pipe_setup
>       umh: Move setting PF_UMH into umh_pipe_setup
>       umh: Rename the user mode driver helpers for clarity
>       umh: Remove call_usermodehelper_setup_file.
>       umh: Separate the user mode driver and the user mode helper support
>       umd: For clarity rename umh_info umd_info
>       umd: Rename umd_info.cmdline umd_info.driver_name
>       umd: Transform fork_usermode_blob into fork_usermode_driver
>       umh: Stop calling do_execve_file
>       exec: Remove do_execve_file
>       bpfilter: Move bpfilter_umh back into init data
>       umd: Track user space drivers with struct pid
>       bpfilter: Take advantage of the facilities of struct pid
>       umd: Remove exit_umh

After a quick read, all looks sane to me, nice cleanups!

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexei Starovoitov June 26, 2020, 4:40 p.m. UTC | #3
On Fri, Jun 26, 2020 at 07:51:41AM -0500, Eric W. Biederman wrote:
> 
> Asking for people to fix their bugs in this user mode driver code has
> been remarkably unproductive.  So here are my bug fixes.
> 
> I have tested them by booting with the code compiled in and
> by killing "bpfilter_umh" and running iptables -vnL to restart
> the userspace driver.
> 
> I have split the changes into small enough pieces so they should be
> easily readable and testable.  
> 
> The changes lean into the preexisting interfaces in the kernel and
> remove special cases for user mode driver code in favor of solutions
> that don't need special cases.  This results in smaller code with
> fewer bugs.
> 
> At a practical level this removes the maintenance burden of the
> user mode drivers from the user mode helper code and from exec as
> the special cases are removed.
> 
> Similarly the LSM interaction bugs are fixed by not having unnecessary
> special cases for user mode drivers.
> 
> Please let me know if you see any bugs.  Once the code review is
> finished I plan to take this through my tree.

I did a quick look and I like the cleanup. Thanks! The end result looks good.
The only problem that you keep breaking the build between patches,
so series will not be bisectable.
blob_to_mnt is a great idea. Much better than embedding fs you advocated earlier.

I'm swamped with other stuff today and will test the set Sunday/Monday
with other patches that I'm working on.
I'm not sure why you want to rename the interface. Seems pointless. But fine.

As far as routing trees. Do you mind I'll take it via bpf-next ?
As I said countless times we're working on bpf_iter using fork_blob.
If you take this set via your tree we would need to wait the whole kernel release.
Which is 8+ weeks before we can use the interface (due to renaming and overall changes).
I'd really like to avoid this huge delay.
Unless you can land it into 5.8-rc2 or rc3.
Eric W. Biederman June 26, 2020, 5:17 p.m. UTC | #4
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Jun 26, 2020 at 07:51:41AM -0500, Eric W. Biederman wrote:
>> 
>> Asking for people to fix their bugs in this user mode driver code has
>> been remarkably unproductive.  So here are my bug fixes.
>> 
>> I have tested them by booting with the code compiled in and
>> by killing "bpfilter_umh" and running iptables -vnL to restart
>> the userspace driver.
>> 
>> I have split the changes into small enough pieces so they should be
>> easily readable and testable.  
>> 
>> The changes lean into the preexisting interfaces in the kernel and
>> remove special cases for user mode driver code in favor of solutions
>> that don't need special cases.  This results in smaller code with
>> fewer bugs.
>> 
>> At a practical level this removes the maintenance burden of the
>> user mode drivers from the user mode helper code and from exec as
>> the special cases are removed.
>> 
>> Similarly the LSM interaction bugs are fixed by not having unnecessary
>> special cases for user mode drivers.
>> 
>> Please let me know if you see any bugs.  Once the code review is
>> finished I plan to take this through my tree.
>
> I did a quick look and I like the cleanup. Thanks!

Good then we have a path forward.

> The end result looks good.
> The only problem that you keep breaking the build between patches,
> so series will not be bisectable.

Keep breaking?  There is an issue with patch 5/14 where the build breaks
when bpfilter is not enabled.  Do you see any others? I know I tested
each patch individually.  But I was only testing with CONFIG_BPFILTER
enabled so I missed one.

So there should not be things that break
but things slip through occassionally.

I will resend this shortly with the fix and any others that I can find.

> blob_to_mnt is a great idea. Much better than embedding fs you advocated earlier.

I was lazy and not overdesigning but I still suspect the blob will
benefit from becoming a cpio in the future.

> I'm swamped with other stuff today and will test the set Sunday/Monday
> with other patches that I'm working on.
> I'm not sure why you want to rename the interface. Seems
> pointless. But fine.

For maintainability I think the code very much benefits from a clear
separation between the user mode driver code from the user mode helper
code.

> As far as routing trees. Do you mind I'll take it via bpf-next ?
> As I said countless times we're working on bpf_iter using fork_blob.
> If you take this set via your tree we would need to wait the whole kernel release.
> Which is 8+ weeks before we can use the interface (due to renaming and overall changes).
> I'd really like to avoid this huge delay.
> Unless you can land it into 5.8-rc2 or rc3.

I also want to build upon this code.

How about when the review is done I post a frozen branch based on
v5.8-rc1 that you can merge into the bpf-next tree, and I can merge into
my branch.  That way we both can build upon this code.  That is the way
conflicts like this are usually handled.

Further I will leave any further enhancements to the user mode driver
infrastructure that people have suggested to you.

I will probably replace do_execve with a kernel_execve that doesn't need
set_fs() to copy the command line argument.  I haven't seen Christoph
Hellwig address that yet, and it looks pretty straight foward at this
point.


Eric
Alexei Starovoitov June 26, 2020, 6:22 p.m. UTC | #5
On Fri, Jun 26, 2020 at 12:17:40PM -0500, Eric W. Biederman wrote:
> 
> > I'm swamped with other stuff today and will test the set Sunday/Monday
> > with other patches that I'm working on.
> > I'm not sure why you want to rename the interface. Seems
> > pointless. But fine.
> 
> For maintainability I think the code very much benefits from a clear
> separation between the user mode driver code from the user mode helper
> code.

you mean different name gives that separation? makes sense.

> > As far as routing trees. Do you mind I'll take it via bpf-next ?
> > As I said countless times we're working on bpf_iter using fork_blob.
> > If you take this set via your tree we would need to wait the whole kernel release.
> > Which is 8+ weeks before we can use the interface (due to renaming and overall changes).
> > I'd really like to avoid this huge delay.
> > Unless you can land it into 5.8-rc2 or rc3.
> 
> I also want to build upon this code.
> 
> How about when the review is done I post a frozen branch based on
> v5.8-rc1 that you can merge into the bpf-next tree, and I can merge into
> my branch.  That way we both can build upon this code.  That is the way
> conflicts like this are usually handled.

sure. that works too.

> Further I will leave any further enhancements to the user mode driver
> infrastructure that people have suggested to you.

ok
Tetsuo Handa June 27, 2020, 11:38 a.m. UTC | #6
On 2020/06/26 21:51, Eric W. Biederman wrote:
> Please let me know if you see any bugs.  Once the code review is
> finished I plan to take this through my tree.

This series needs some sanity checks.

diff --git a/kernel/umd.c b/kernel/umd.c
index de2f542191e5..f3e0227a3012 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -47,15 +47,18 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
 
 /**
  * umd_load_blob - Remember a blob of bytes for fork_usermode_driver
- * @info: information about usermode driver
- * @data: a blob of bytes that can be executed as a file
- * @len:  The lentgh of the blob
+ * @info: information about usermode driver (shouldn't be NULL)
+ * @data: a blob of bytes that can be executed as a file (shouldn't be NULL)
+ * @len:  The lentgh of the blob (shouldn't be 0)
  *
  */
 int umd_load_blob(struct umd_info *info, const void *data, size_t len)
 {
 	struct vfsmount *mnt;
 
+	if (!info || !info->driver_name || !data || !len)
+		return -EINVAL;
+
 	if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt))
 		return -EBUSY;
 
@@ -158,6 +161,9 @@ int fork_usermode_driver(struct umd_info *info)
 	char **argv = NULL;
 	int err;
 
+	if (!info || !info->driver_name)
+		return -EINVAL;
+
 	if (WARN_ON_ONCE(info->tgid))
 		return -EBUSY;
 
But loading

----- test.c -----
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/umd.h>

static int __init test_init(void)
{
	const char blob[464] = {
		"\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x02\x00\x3e\x00\x01\x00\x00\x00\x80\x00\x40\x00\x00\x00\x00\x00"
		"\x40\x00\x00\x00\x00\x00\x00\x00\xd0\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x40\x00\x38\x00\x01\x00\x40\x00\x04\x00\x03\x00"
		"\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x40\x00\x00\x00\x00\x00\x00\x00\x40\x00\x00\x00\x00\x00"
		"\xb4\x00\x00\x00\x00\x00\x00\x00\xb4\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x20\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\xbe\xa8\x00\x40\x00"
		"\x00\x00\x00\x00\xba\x0c\x00\x00\x00\x0f\x05\xb8\xe7\x00\x00\x00"
		"\xbf\x00\x00\x00\x00\x0f\x05\x00\x48\x65\x6c\x6c\x6f\x20\x77\x6f"
		"\x72\x6c\x64\x0a\x00\x2e\x73\x68\x73\x74\x72\x74\x61\x62\x00\x2e"
		"\x74\x65\x78\x74\x00\x2e\x72\x6f\x64\x61\x74\x61\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x0b\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00"
		"\x80\x00\x40\x00\x00\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x00"
		"\x27\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x11\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00"
		"\xa8\x00\x40\x00\x00\x00\x00\x00\xa8\x00\x00\x00\x00\x00\x00\x00"
		"\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x01\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00\xb4\x00\x00\x00\x00\x00\x00\x00"
		"\x19\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
	};
	struct umd_info *info = kzalloc(sizeof(*info), GFP_KERNEL);
	
	if (!info)
		return -ENOMEM;
	info->driver_name = kstrdup("my test driver", GFP_KERNEL);
	printk("umd_load_blob()=%d\n", umd_load_blob(info, blob, 464));
	//printk("fork_usermode_driver()=%d\n", fork_usermode_driver(info));
	return -EINVAL;
}

module_init(test_init);
MODULE_LICENSE("GPL");
----- test.c -----

causes

   BUG_ON(!(task->flags & PF_KTHREAD));

in __fput_sync(). Do we want to forbid umd_load_blob() from process context (e.g.
upon module initialization time) ?

Also, since umd_load_blob() uses info->driver_name as filename, info->driver_name has to
satisfy strchr(info->driver_name, '/') == NULL && strlen(info->driver_name) <= NAME_MAX
in order to avoid -ENOENT failure. On the other hand, since fork_usermode_driver() uses
info->driver_name as argv[], info->driver_name has to use ' ' within this constraint.
This might be inconvenient...
Eric W. Biederman June 27, 2020, 12:59 p.m. UTC | #7
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:

> On 2020/06/26 21:51, Eric W. Biederman wrote:
>> Please let me know if you see any bugs.  Once the code review is
>> finished I plan to take this through my tree.
>

[sniped example code]
> causes
>
>    BUG_ON(!(task->flags & PF_KTHREAD));
>
> in __fput_sync(). Do we want to forbid umd_load_blob() from process context (e.g.
> upon module initialization time) ?

Interesting.  I had not realized that fput_sync would not work from
module context.

Forcing the fput to finish is absolutely necessary.  Otherwise the file
will still be open for write and deny_write_access in execve will fail.

Can you try replacing the __fput_sync with:
	fput(file);
        flush_delayed_fput();
        task_work_run();


Given that there is a big requirement for the code to run before init
I don't necessarily think it is a problem __fput_sync is a problem.
But it also seems silly to forbid modules if we can easily fix
the code.

Eric
Tetsuo Handa June 27, 2020, 1:57 p.m. UTC | #8
On 2020/06/27 21:59, Eric W. Biederman wrote:
> Can you try replacing the __fput_sync with:
> 	fput(file);
>         flush_delayed_fput();
>         task_work_run();

With below change, TOMOYO can obtain pathname like "tmpfs:/my\040test\040driver".

Please avoid WARN_ON() if printk() is sufficient (for friendliness to panic_on_warn=1 environments).
For argv[], I guess that fork_usermode_driver() should receive argv[] as argument rather than
trying to split info->driver_name, for somebody might want to pass meaningful argv[] (and
TOMOYO wants to use meaningful argv[] as a hint for identifying the intent).

diff --git a/kernel/umd.c b/kernel/umd.c
index de2f542191e5..ae6e85283f13 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -7,6 +7,7 @@
 #include <linux/mount.h>
 #include <linux/fs_struct.h>
 #include <linux/umd.h>
+#include <linux/task_work.h>
 
 static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name)
 {
@@ -25,7 +26,7 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
 	if (IS_ERR(mnt))
 		return mnt;
 
-	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700);
+	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY | O_EXCL, 0700);
 	if (IS_ERR(file)) {
 		mntput(mnt);
 		return ERR_CAST(file);
@@ -41,23 +42,33 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
 		return ERR_PTR(err);
 	}
 
-	__fput_sync(file);
+	if (current->flags & PF_KTHREAD) {
+		__fput_sync(file);
+	} else {
+		fput(file);
+		flush_delayed_fput();
+		task_work_run();
+	}
 	return mnt;
 }
 
 /**
  * umd_load_blob - Remember a blob of bytes for fork_usermode_driver
- * @info: information about usermode driver
- * @data: a blob of bytes that can be executed as a file
- * @len:  The lentgh of the blob
+ * @info: information about usermode driver (shouldn't be NULL)
+ * @data: a blob of bytes that can be executed as a file (shouldn't be NULL)
+ * @len:  The lentgh of the blob (shouldn't be 0)
  *
  */
 int umd_load_blob(struct umd_info *info, const void *data, size_t len)
 {
 	struct vfsmount *mnt;
 
-	if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt))
+	if (!info || !info->driver_name || !data || !len)
+		return -EINVAL;
+	if (info->wd.dentry || info->wd.mnt) {
+		pr_info("%s already loaded.\n", info->driver_name);
 		return -EBUSY;
+	}
 
 	mnt = blob_to_mnt(data, len, info->driver_name);
 	if (IS_ERR(mnt))
@@ -71,14 +82,14 @@ EXPORT_SYMBOL_GPL(umd_load_blob);
 
 /**
  * umd_unload_blob - Disassociate @info from a previously loaded blob
- * @info: information about usermode driver
+ * @info: information about usermode driver (shouldn't be NULL)
  *
  */
 int umd_unload_blob(struct umd_info *info)
 {
-	if (WARN_ON_ONCE(!info->wd.mnt ||
-			 !info->wd.dentry ||
-			 info->wd.mnt->mnt_root != info->wd.dentry))
+	if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt)
+		return -EINVAL;
+	if (WARN_ON_ONCE(info->wd.mnt->mnt_root != info->wd.dentry))
 		return -EINVAL;
 
 	kern_unmount(info->wd.mnt);
@@ -158,8 +169,14 @@ int fork_usermode_driver(struct umd_info *info)
 	char **argv = NULL;
 	int err;
 
-	if (WARN_ON_ONCE(info->tgid))
+	if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt)
+		return -EINVAL;
+	if (WARN_ON_ONCE(info->wd.mnt->mnt_root != info->wd.dentry))
+		return -EINVAL;
+	if (info->tgid) {
+		pr_info("%s already running.\n", info->driver_name);
 		return -EBUSY;
+	}
 
 	err = -ENOMEM;
 	argv = argv_split(GFP_KERNEL, info->driver_name, NULL);
Alexei Starovoitov June 28, 2020, 7:44 p.m. UTC | #9
On Sat, Jun 27, 2020 at 10:57:10PM +0900, Tetsuo Handa wrote:
> On 2020/06/27 21:59, Eric W. Biederman wrote:
> > Can you try replacing the __fput_sync with:
> > 	fput(file);
> >         flush_delayed_fput();
> >         task_work_run();
> 
> With below change, TOMOYO can obtain pathname like "tmpfs:/my\040test\040driver".
> 
> Please avoid WARN_ON() if printk() is sufficient (for friendliness to panic_on_warn=1 environments).
> For argv[], I guess that fork_usermode_driver() should receive argv[] as argument rather than
> trying to split info->driver_name, for somebody might want to pass meaningful argv[] (and
> TOMOYO wants to use meaningful argv[] as a hint for identifying the intent).
> 
> diff --git a/kernel/umd.c b/kernel/umd.c
> index de2f542191e5..ae6e85283f13 100644
> --- a/kernel/umd.c
> +++ b/kernel/umd.c
> @@ -7,6 +7,7 @@
>  #include <linux/mount.h>
>  #include <linux/fs_struct.h>
>  #include <linux/umd.h>
> +#include <linux/task_work.h>
>  
>  static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name)
>  {
> @@ -25,7 +26,7 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
>  	if (IS_ERR(mnt))
>  		return mnt;
>  
> -	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700);
> +	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY | O_EXCL, 0700);
>  	if (IS_ERR(file)) {
>  		mntput(mnt);
>  		return ERR_CAST(file);
> @@ -41,23 +42,33 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na
>  		return ERR_PTR(err);
>  	}
>  
> -	__fput_sync(file);
> +	if (current->flags & PF_KTHREAD) {
> +		__fput_sync(file);
> +	} else {
> +		fput(file);
> +		flush_delayed_fput();
> +		task_work_run();
> +	}

Thanks. This makes sense to me.

>  	return mnt;
>  }
>  
>  /**
>   * umd_load_blob - Remember a blob of bytes for fork_usermode_driver
> - * @info: information about usermode driver
> - * @data: a blob of bytes that can be executed as a file
> - * @len:  The lentgh of the blob
> + * @info: information about usermode driver (shouldn't be NULL)
> + * @data: a blob of bytes that can be executed as a file (shouldn't be NULL)
> + * @len:  The lentgh of the blob (shouldn't be 0)
>   *
>   */
>  int umd_load_blob(struct umd_info *info, const void *data, size_t len)
>  {
>  	struct vfsmount *mnt;
>  
> -	if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt))
> +	if (!info || !info->driver_name || !data || !len)
> +		return -EINVAL;
> +	if (info->wd.dentry || info->wd.mnt) {
> +		pr_info("%s already loaded.\n", info->driver_name);
>  		return -EBUSY;
> +	}

But all the defensive programming kinda goes against general kernel style.
I wouldn't do it. Especially pr_info() ?!
Though I don't feel strongly about it.

I would like to generalize elf_header_check() a bit and call it
before doing blob_to_mnt() to make sure that all blobs are elf files only.
Supporting '#!/bin/bash' or other things as blobs seems wrong to me.
Tetsuo Handa June 29, 2020, 2:20 a.m. UTC | #10
On 2020/06/29 4:44, Alexei Starovoitov wrote:
> But all the defensive programming kinda goes against general kernel style.
> I wouldn't do it. Especially pr_info() ?!
> Though I don't feel strongly about it.

Honestly speaking, caller should check for errors and print appropriate
messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
went wrong (maybe memory corruption). But other conditions are not fatal.
That is, I consider even pr_info() here should be unnecessary.

> 
> I would like to generalize elf_header_check() a bit and call it
> before doing blob_to_mnt() to make sure that all blobs are elf files only.
> Supporting '#!/bin/bash' or other things as blobs seems wrong to me.

Why? There is no point with forbidding "#!", for users can use a wrapper
ELF binary which contains instructions including glibc's execv()/system()
functions even if "#!" cannot be used.

What is more important is what protection/isolation properties processes started
via fork_usermode_driver() should hold, for ELF binary can contain arbitrary
instructions, these processes run as daemons (reading request from stdin and
writing response to stdout) but hidden from "/usr/bin/pstree -p 1" (because
they are forked from kthreadd kernel thread).
Eric W. Biederman June 29, 2020, 8:19 p.m. UTC | #11
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:

> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>> But all the defensive programming kinda goes against general kernel style.
>> I wouldn't do it. Especially pr_info() ?!
>> Though I don't feel strongly about it.
>
> Honestly speaking, caller should check for errors and print appropriate
> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
> went wrong (maybe memory corruption). But other conditions are not fatal.
> That is, I consider even pr_info() here should be unnecessary.

They were all should never happen cases.  Which is why my patches do:
if (WARN_ON_ONCE(...))

That let's the caller know the messed up very clearly while still
providing a change to continue.

If they were clearly corruption no ones kernel should ever continue
BUG_ON would be appropriate.

>> I would like to generalize elf_header_check() a bit and call it
>> before doing blob_to_mnt() to make sure that all blobs are elf files only.
>> Supporting '#!/bin/bash' or other things as blobs seems wrong to me.

I vote for not worry about things that have never happened, and are
obviously incorrect.

The only points of checks like that is to catch cases where other
developers misunderstand the interface.  When you get to something like
sysfs with lots and lots of users where it is hard to audit there
is real value in sanity checks.  In something like this with very few
users. Just making the code clear should be enough for people not to do
ridiculous things.


In any case Tetsuo I will leave futher sanity checks for you and Alexei
to work out.  It is beyond the scope of my patchset, and they are easy
enough to add as follow on patches.

Eric
Tetsuo Handa June 30, 2020, 6:28 a.m. UTC | #12
On 2020/06/30 5:19, Eric W. Biederman wrote:
> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> 
>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>> But all the defensive programming kinda goes against general kernel style.
>>> I wouldn't do it. Especially pr_info() ?!
>>> Though I don't feel strongly about it.
>>
>> Honestly speaking, caller should check for errors and print appropriate
>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>> went wrong (maybe memory corruption). But other conditions are not fatal.
>> That is, I consider even pr_info() here should be unnecessary.
> 
> They were all should never happen cases.  Which is why my patches do:
> if (WARN_ON_ONCE(...))

No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
This bug was unfortunately not found by syzkaller because this path is
not easily reachable via syscall interface.

> 
> That let's the caller know the messed up very clearly while still
> providing a change to continue.
> 
> If they were clearly corruption no ones kernel should ever continue
> BUG_ON would be appropriate.

Please use BUG_ON() (to only corruption case) like I suggested in my updated diff.
Eric W. Biederman June 30, 2020, 12:32 p.m. UTC | #13
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:

> On 2020/06/30 5:19, Eric W. Biederman wrote:
>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>> 
>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>>> But all the defensive programming kinda goes against general kernel style.
>>>> I wouldn't do it. Especially pr_info() ?!
>>>> Though I don't feel strongly about it.
>>>
>>> Honestly speaking, caller should check for errors and print appropriate
>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>>> went wrong (maybe memory corruption). But other conditions are not fatal.
>>> That is, I consider even pr_info() here should be unnecessary.
>> 
>> They were all should never happen cases.  Which is why my patches do:
>> if (WARN_ON_ONCE(...))
>
> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> This bug was unfortunately not found by syzkaller because this path is
> not easily reachable via syscall interface.

Absolutely yes.  These are cases that should never happen.
They should never be reachable by userspace.

It is absolutely a bug if these are hit by userspace.

Now if fuzzers want horrible cases to be even more horrible and change a
nice friendly warn into a panic that is their problem.  The issue being
do they capture the information the rest of us need to fix.

Eric
Alexei Starovoitov June 30, 2020, 4:48 p.m. UTC | #14
On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
> On 2020/06/30 5:19, Eric W. Biederman wrote:
> > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> > 
> >> On 2020/06/29 4:44, Alexei Starovoitov wrote:
> >>> But all the defensive programming kinda goes against general kernel style.
> >>> I wouldn't do it. Especially pr_info() ?!
> >>> Though I don't feel strongly about it.
> >>
> >> Honestly speaking, caller should check for errors and print appropriate
> >> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
> >> went wrong (maybe memory corruption). But other conditions are not fatal.
> >> That is, I consider even pr_info() here should be unnecessary.
> > 
> > They were all should never happen cases.  Which is why my patches do:
> > if (WARN_ON_ONCE(...))
> 
> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.

I don't believe that's true.
Please show fuzzing stack trace to prove your point.
Tetsuo Handa June 30, 2020, 9:54 p.m. UTC | #15
On 2020/07/01 1:48, Alexei Starovoitov wrote:
> On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
>> On 2020/06/30 5:19, Eric W. Biederman wrote:
>>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
>>>
>>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
>>>>> But all the defensive programming kinda goes against general kernel style.
>>>>> I wouldn't do it. Especially pr_info() ?!
>>>>> Though I don't feel strongly about it.
>>>>
>>>> Honestly speaking, caller should check for errors and print appropriate
>>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
>>>> went wrong (maybe memory corruption). But other conditions are not fatal.
>>>> That is, I consider even pr_info() here should be unnecessary.
>>>
>>> They were all should never happen cases.  Which is why my patches do:
>>> if (WARN_ON_ONCE(...))
>>
>> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> 
> I don't believe that's true.
> Please show fuzzing stack trace to prove your point.
> 

Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)
Alexei Starovoitov June 30, 2020, 9:57 p.m. UTC | #16
On Tue, Jun 30, 2020 at 2:55 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2020/07/01 1:48, Alexei Starovoitov wrote:
> > On Tue, Jun 30, 2020 at 03:28:49PM +0900, Tetsuo Handa wrote:
> >> On 2020/06/30 5:19, Eric W. Biederman wrote:
> >>> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes:
> >>>
> >>>> On 2020/06/29 4:44, Alexei Starovoitov wrote:
> >>>>> But all the defensive programming kinda goes against general kernel style.
> >>>>> I wouldn't do it. Especially pr_info() ?!
> >>>>> Though I don't feel strongly about it.
> >>>>
> >>>> Honestly speaking, caller should check for errors and print appropriate
> >>>> messages. info->wd.mnt->mnt_root != info->wd.dentry indicates that something
> >>>> went wrong (maybe memory corruption). But other conditions are not fatal.
> >>>> That is, I consider even pr_info() here should be unnecessary.
> >>>
> >>> They were all should never happen cases.  Which is why my patches do:
> >>> if (WARN_ON_ONCE(...))
> >>
> >> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
> >
> > I don't believe that's true.
> > Please show fuzzing stack trace to prove your point.
> >
>
> Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)

Is it a joke? Do you understand how syzbot works?
If so, please explain how it can invoke umd_* interface.
Tetsuo Handa June 30, 2020, 10:58 p.m. UTC | #17
On 2020/07/01 6:57, Alexei Starovoitov wrote:
>>>>> They were all should never happen cases.  Which is why my patches do:
>>>>> if (WARN_ON_ONCE(...))
>>>>
>>>> No. Fuzz testing (which uses panic_on_warn=1) will trivially hit them.
>>>
>>> I don't believe that's true.
>>> Please show fuzzing stack trace to prove your point.
>>>
>>
>> Please find links containing "WARNING" from https://syzkaller.appspot.com/upstream . ;-)
> 
> Is it a joke? Do you understand how syzbot works?
> If so, please explain how it can invoke umd_* interface.
> 

Currently syzkaller can't invoke umd_* interface because this interface is used by only
bpfilter_umh module. But I can imagine that someone starts using this interface in a way
syzkaller can somehow invoke. Thus, how can it be a joke? I don't understand your question.