diff mbox series

[rdma-core] verbs: Allow all commands to be invoked by ioctl

Message ID 20181203205827.GA25410@ziepe.ca (mailing list archive)
State Accepted
Headers show
Series [rdma-core] verbs: Allow all commands to be invoked by ioctl | expand

Commit Message

Jason Gunthorpe Dec. 3, 2018, 8:58 p.m. UTC
Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
write() commands can be immediately converted to execute over ioctl.

If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
and write will not be called. Otherwise the 'both' mode will auto
detect the kernel support and use it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++-
 libibverbs/device.c       | 27 ++++++++++++++++++++++++
 libibverbs/ibverbs.h      |  1 +
 3 files changed, 71 insertions(+), 1 deletion(-)

This needs the matching kernel patch:

https://patchwork.kernel.org/patch/10706127/

a kernel headers update and the series at:

https://github.com/linux-rdma/rdma-core/pull/434

To apply.

Comments

Doug Ledford Dec. 3, 2018, 10:07 p.m. UTC | #1
On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote:
> Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
> write() commands can be immediately converted to execute over ioctl.
> 
> If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
> and write will not be called. Otherwise the 'both' mode will auto
> detect the kernel support and use it.
> 
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++-
>  libibverbs/device.c       | 27 ++++++++++++++++++++++++
>  libibverbs/ibverbs.h      |  1 +
>  3 files changed, 71 insertions(+), 1 deletion(-)
> 
> This needs the matching kernel patch:
> 
> https://patchwork.kernel.org/patch/10706127/
> 
> a kernel headers update and the series at:
> 
> https://github.com/linux-rdma/rdma-core/pull/434
> 
> To apply.

And long term, this is what we need to resolve the security issue/no
fork problem...

We leave the write() interface enabled in the kernel for backward
compatibility for the foreseeable future (this is the whole "no
regressions" requirement from Linus...we were required to regress in
order to resolve the security issue, but other than that, we should
require users to update rdma-core to keep working systems working, so
the existing write() interface needs to stay online).

We enable this ioctl-can-wrap-writes feature in rdma-core and in the
kernel

New builds of rdma-core are switched to be ioctl only by preference with
a fallback to write if necessary, or are you thinking build it with
-DIOCTL_MODE and force it to require newer kernels?

Regardless, we keep the draconian check in place on the write syscall
entry point, so if anyone tries to do any funny stuff with forking an
application that's already opened a uverbs device, and manages to trick
an application to writing to the uverbs device, it fails.  But we don't
turn the write() interface off, we just leave that draconian check
forever.

Going forward, the only way to fork then use open device work is if you
use the ioctl method exclusively.  This is enforced in the kernel simply
by leaving the draconian check on the write syscall entry point and the
new ioctl method you just posted bypasses the write syscall entry point
and directly looks up the method handler for all of the write commands
and then calls the handler directly after setting up the right cmd/resp
udata structs as part of the attr bundle.

In the end, this is all guaranteed safe because the rest of the core
kernel doesn't play setfs(KERN_DS); games with ioctl commands, where as
they might do so on the write() syscall to redirect things under certain
circumstances.

Have I got all of the long term intentions for this straight while I'm
going over it all?
Jason Gunthorpe Dec. 3, 2018, 10:20 p.m. UTC | #2
On Mon, Dec 03, 2018 at 05:07:53PM -0500, Doug Ledford wrote:
> On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote:
> > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
> > write() commands can be immediately converted to execute over ioctl.
> > 
> > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
> > and write will not be called. Otherwise the 'both' mode will auto
> > detect the kernel support and use it.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> >  libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++-
> >  libibverbs/device.c       | 27 ++++++++++++++++++++++++
> >  libibverbs/ibverbs.h      |  1 +
> >  3 files changed, 71 insertions(+), 1 deletion(-)
> > 
> > This needs the matching kernel patch:
> > 
> > https://patchwork.kernel.org/patch/10706127/
> > 
> > a kernel headers update and the series at:
> > 
> > https://github.com/linux-rdma/rdma-core/pull/434
> > 
> > To apply.
> 
> And long term, this is what we need to resolve the security issue/no
> fork problem...

Yes

> We leave the write() interface enabled in the kernel for backward
> compatibility for the foreseeable future (this is the whole "no
> regressions" requirement from Linus...we were required to regress in
> order to resolve the security issue, but other than that, we should
> require users to update rdma-core to keep working systems working, so
> the existing write() interface needs to stay online).

Yes

> We enable this ioctl-can-wrap-writes feature in rdma-core and in the
> kernel

Yes, but this patch just provides the code, the default compile option
is left as write() only.

All the provider teams should test with ioctl() only mode and confirm
there aren't any crazy things missed. I was imagining we'd give
another month or two for that, the moven the rdma-core default to both
(ie prefer ioctl if available)

At some point testing of the write() side will diminish, but now that
everything is the same flow it shouldn't be too hard to keep working.

> New builds of rdma-core are switched to be ioctl only by preference with
> a fallback to write if necessary, or are you thinking build it with
> -DIOCTL_MODE and force it to require newer kernels?

I image the ioctl only mode as an option distros could choose to
enable if they feel confident they don't want to support write() only
kernels. It speeds up the call path a bit and removes some code from
rdma-core, so it is worth doing if you can.

> Regardless, we keep the draconian check in place on the write syscall
> entry point, so if anyone tries to do any funny stuff with forking an
> application that's already opened a uverbs device, and manages to trick
> an application to writing to the uverbs device, it fails.  But we don't
> turn the write() interface off, we just leave that draconian check
> forever.

yes, basically. In two years we add a pr_once saying the userspace
should be udpated or something

> Going forward, the only way to fork then use open device work is if you
> use the ioctl method exclusively.  This is enforced in the kernel simply
> by leaving the draconian check on the write syscall entry point and the
> new ioctl method you just posted bypasses the write syscall entry point
> and directly looks up the method handler for all of the write commands
> and then calls the handler directly after setting up the right cmd/resp
> udata structs as part of the attr bundle.

Yes

> In the end, this is all guaranteed safe because the rest of the core
> kernel doesn't play setfs(KERN_DS); games with ioctl commands, where as
> they might do so on the write() syscall to redirect things under certain
> circumstances.

Yes, ioctl is allowed to chase pointers, write() and read() are not..
 
> Have I got all of the long term intentions for this straight while I'm
> going over it all?

Yep

Jason
Leon Romanovsky Dec. 4, 2018, 8:55 a.m. UTC | #3
On Mon, Dec 03, 2018 at 03:20:40PM -0700, Jason Gunthorpe wrote:
> On Mon, Dec 03, 2018 at 05:07:53PM -0500, Doug Ledford wrote:
> > On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote:
> > > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
> > > write() commands can be immediately converted to execute over ioctl.
> > >
> > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
> > > and write will not be called. Otherwise the 'both' mode will auto
> > > detect the kernel support and use it.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> > >  libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++-
> > >  libibverbs/device.c       | 27 ++++++++++++++++++++++++
> > >  libibverbs/ibverbs.h      |  1 +
> > >  3 files changed, 71 insertions(+), 1 deletion(-)
> > >
> > > This needs the matching kernel patch:
> > >
> > > https://patchwork.kernel.org/patch/10706127/
> > >
> > > a kernel headers update and the series at:
> > >
> > > https://github.com/linux-rdma/rdma-core/pull/434
> > >
> > > To apply.
> >
> > And long term, this is what we need to resolve the security issue/no
> > fork problem...
>
> Yes
>
> > We leave the write() interface enabled in the kernel for backward
> > compatibility for the foreseeable future (this is the whole "no
> > regressions" requirement from Linus...we were required to regress in
> > order to resolve the security issue, but other than that, we should
> > require users to update rdma-core to keep working systems working, so
> > the existing write() interface needs to stay online).
>
> Yes
>
> > We enable this ioctl-can-wrap-writes feature in rdma-core and in the
> > kernel
>
> Yes, but this patch just provides the code, the default compile option
> is left as write() only.
>
> All the provider teams should test with ioctl() only mode and confirm
> there aren't any crazy things missed. I was imagining we'd give
> another month or two for that, the moven the rdma-core default to both
> (ie prefer ioctl if available)
>
> At some point testing of the write() side will diminish, but now that
> everything is the same flow it shouldn't be too hard to keep working.

I'm in favour to set ioctl as default right now, so those "providers
teams" will go an test this path right now and not in some unknown
future.

Thanks
Doug Ledford Dec. 18, 2018, 7:41 p.m. UTC | #4
On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote:
> Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
> write() commands can be immediately converted to execute over ioctl.
> 
> If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
> and write will not be called. Otherwise the 'both' mode will auto
> detect the kernel support and use it.
> 
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

I've pulled this patch, the official uABI headers update commit, and a
commit to change the default to both for the ioctl mode into a PR that
is currently undergoing Travis CI.  If it passes, I'll merge it and we
will officially have a ready to test ioctl mode API that can be utilized
to get around the fork() issues created by the security fix to the
write() path support.
Doug Ledford Dec. 18, 2018, 7:57 p.m. UTC | #5
On Tue, 2018-12-18 at 14:41 -0500, Doug Ledford wrote:
> On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote:
> > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing
> > write() commands can be immediately converted to execute over ioctl.
> > 
> > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported,
> > and write will not be called. Otherwise the 'both' mode will auto
> > detect the kernel support and use it.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> 
> I've pulled this patch, the official uABI headers update commit, and a
> commit to change the default to both for the ioctl mode into a PR that
> is currently undergoing Travis CI.  If it passes, I'll merge it and we
> will officially have a ready to test ioctl mode API that can be utilized
> to get around the fork() issues created by the security fix to the
> write() path support.
> 

Travis checks passed, merged.
diff mbox series

Patch

diff --git a/libibverbs/cmd_fallback.c b/libibverbs/cmd_fallback.c
index 0e50503e003484..38193a186a8ccb 100644
--- a/libibverbs/cmd_fallback.c
+++ b/libibverbs/cmd_fallback.c
@@ -206,11 +206,43 @@  void *_write_get_resp_ex(struct ibv_command_buffer *link,
 	return resp_start;
 }
 
+static int ioctl_write(struct ibv_context *ctx, unsigned int write_method,
+		       const void *req, size_t core_req_size, size_t req_size,
+		       void *resp, size_t core_resp_size, size_t resp_size)
+{
+	DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_DEVICE,
+			       UVERBS_METHOD_INVOKE_WRITE, 5);
+
+	fill_attr_const_in(cmdb, UVERBS_ATTR_WRITE_CMD, write_method);
+
+	if (core_req_size)
+		fill_attr_in(cmdb, UVERBS_ATTR_CORE_IN, req, core_req_size);
+	if (core_resp_size)
+		fill_attr_out(cmdb, UVERBS_ATTR_CORE_OUT, resp, core_resp_size);
+
+	if (req_size - core_req_size)
+		fill_attr_in(cmdb, UVERBS_ATTR_UHW_IN, req + core_req_size,
+			     req_size - core_req_size);
+	if (resp_size - core_resp_size)
+		fill_attr_out(cmdb, UVERBS_ATTR_UHW_OUT, resp + core_resp_size,
+			     resp_size - core_resp_size);
+
+	return execute_ioctl(ctx, cmdb);
+}
+
 int _execute_cmd_write(struct ibv_context *ctx, unsigned int write_method,
 		       struct ib_uverbs_cmd_hdr *req, size_t core_req_size,
 		       size_t req_size, void *resp, size_t core_resp_size,
 		       size_t resp_size)
 {
+	struct verbs_ex_private *priv = get_priv(ctx);
+
+	if (!VERBS_WRITE_ONLY && (VERBS_IOCTL_ONLY || priv->use_ioctl_write))
+		return ioctl_write(ctx, write_method, req + 1,
+				   core_req_size - sizeof(*req),
+				   req_size - sizeof(*req), resp,
+				   core_resp_size, resp_size);
+
 	req->command = write_method;
 	req->in_words = __check_divide(req_size, 4);
 	req->out_words = __check_divide(resp_size, 4);
@@ -232,6 +264,15 @@  int _execute_cmd_write_ex(struct ibv_context *ctx, unsigned int write_method,
 		       size_t req_size, void *resp, size_t core_resp_size,
 		       size_t resp_size)
 {
+	struct verbs_ex_private *priv = get_priv(ctx);
+
+	if (!VERBS_WRITE_ONLY && (VERBS_IOCTL_ONLY || priv->use_ioctl_write))
+		return ioctl_write(
+			ctx, IB_USER_VERBS_CMD_FLAG_EXTENDED | write_method,
+			req + 1, core_req_size - sizeof(*req),
+			req_size - sizeof(*req), resp, core_resp_size,
+			resp_size);
+
 	req->hdr.command = IB_USER_VERBS_CMD_FLAG_EXTENDED | write_method;
 	req->hdr.in_words =
 		__check_divide(core_req_size - sizeof(struct ex_hdr), 8);
@@ -245,7 +286,8 @@  int _execute_cmd_write_ex(struct ibv_context *ctx, unsigned int write_method,
 
 	/*
 	 * Users assumes the stack buffer is zeroed before passing to the
-	 * kernel for writing.
+	 * kernel for writing. New kernels with the ioctl path do this
+	 * automatically for us.
 	 */
 	if (resp)
 		memset(resp, 0, resp_size);
diff --git a/libibverbs/device.c b/libibverbs/device.c
index e6e23e718d9fca..1fed29b17ec31d 100644
--- a/libibverbs/device.c
+++ b/libibverbs/device.c
@@ -43,6 +43,7 @@ 
 #include <alloca.h>
 #include <errno.h>
 
+#include <rdma/ib_user_ioctl_cmds.h>
 #include <util/symver.h>
 #include "ibverbs.h"
 
@@ -193,6 +194,31 @@  __lib_ibv_create_cq_ex(struct ibv_context *context,
 	return cq;
 }
 
+static bool has_ioctl_write(struct ibv_context *ctx)
+{
+	int rc;
+	DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_DEVICE,
+			       UVERBS_METHOD_INVOKE_WRITE, 1);
+
+	if (VERBS_IOCTL_ONLY)
+		return true;
+	if (VERBS_WRITE_ONLY)
+		return false;
+
+	/*
+	 * This command should return ENOSPC since the request length is too
+	 * small.
+	 */
+	fill_attr_const_in(cmdb, UVERBS_ATTR_WRITE_CMD,
+			   IB_USER_VERBS_CMD_QUERY_DEVICE);
+	rc = execute_ioctl(ctx, cmdb);
+	if (rc == EPROTONOSUPPORT)
+		return false;
+	if (rc == ENOTTY)
+		return false;
+	return true;
+}
+
 /*
  * Ownership of cmd_fd is transferred into this function, and it will either
  * be released during the matching call to verbs_uninit_contxt or during the
@@ -240,6 +266,7 @@  int verbs_init_context(struct verbs_context *context_ex,
 
 	context_ex->priv->driver_id = driver_id;
 	verbs_set_ops(context_ex, &verbs_dummy_ops);
+	context_ex->priv->use_ioctl_write = has_ioctl_write(context);
 
 	return 0;
 }
diff --git a/libibverbs/ibverbs.h b/libibverbs/ibverbs.h
index 24843b49fdb3c7..36e29a6b14d87b 100644
--- a/libibverbs/ibverbs.h
+++ b/libibverbs/ibverbs.h
@@ -70,6 +70,7 @@  void load_drivers(void);
 struct verbs_ex_private {
 	BITMAP_DECLARE(unsupported_ioctls, VERBS_OPS_NUM);
 	uint32_t driver_id;
+	bool use_ioctl_write;
 	struct verbs_context_ops ops;
 };