Message ID | 20181203205827.GA25410@ziepe.ca (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [rdma-core] verbs: Allow all commands to be invoked by ioctl | expand |
On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote: > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing > write() commands can be immediately converted to execute over ioctl. > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, > and write will not be called. Otherwise the 'both' mode will auto > detect the kernel support and use it. > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> > --- > libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++- > libibverbs/device.c | 27 ++++++++++++++++++++++++ > libibverbs/ibverbs.h | 1 + > 3 files changed, 71 insertions(+), 1 deletion(-) > > This needs the matching kernel patch: > > https://patchwork.kernel.org/patch/10706127/ > > a kernel headers update and the series at: > > https://github.com/linux-rdma/rdma-core/pull/434 > > To apply. And long term, this is what we need to resolve the security issue/no fork problem... We leave the write() interface enabled in the kernel for backward compatibility for the foreseeable future (this is the whole "no regressions" requirement from Linus...we were required to regress in order to resolve the security issue, but other than that, we should require users to update rdma-core to keep working systems working, so the existing write() interface needs to stay online). We enable this ioctl-can-wrap-writes feature in rdma-core and in the kernel New builds of rdma-core are switched to be ioctl only by preference with a fallback to write if necessary, or are you thinking build it with -DIOCTL_MODE and force it to require newer kernels? Regardless, we keep the draconian check in place on the write syscall entry point, so if anyone tries to do any funny stuff with forking an application that's already opened a uverbs device, and manages to trick an application to writing to the uverbs device, it fails. But we don't turn the write() interface off, we just leave that draconian check forever. Going forward, the only way to fork then use open device work is if you use the ioctl method exclusively. This is enforced in the kernel simply by leaving the draconian check on the write syscall entry point and the new ioctl method you just posted bypasses the write syscall entry point and directly looks up the method handler for all of the write commands and then calls the handler directly after setting up the right cmd/resp udata structs as part of the attr bundle. In the end, this is all guaranteed safe because the rest of the core kernel doesn't play setfs(KERN_DS); games with ioctl commands, where as they might do so on the write() syscall to redirect things under certain circumstances. Have I got all of the long term intentions for this straight while I'm going over it all?
On Mon, Dec 03, 2018 at 05:07:53PM -0500, Doug Ledford wrote: > On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote: > > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing > > write() commands can be immediately converted to execute over ioctl. > > > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, > > and write will not be called. Otherwise the 'both' mode will auto > > detect the kernel support and use it. > > > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> > > libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++- > > libibverbs/device.c | 27 ++++++++++++++++++++++++ > > libibverbs/ibverbs.h | 1 + > > 3 files changed, 71 insertions(+), 1 deletion(-) > > > > This needs the matching kernel patch: > > > > https://patchwork.kernel.org/patch/10706127/ > > > > a kernel headers update and the series at: > > > > https://github.com/linux-rdma/rdma-core/pull/434 > > > > To apply. > > And long term, this is what we need to resolve the security issue/no > fork problem... Yes > We leave the write() interface enabled in the kernel for backward > compatibility for the foreseeable future (this is the whole "no > regressions" requirement from Linus...we were required to regress in > order to resolve the security issue, but other than that, we should > require users to update rdma-core to keep working systems working, so > the existing write() interface needs to stay online). Yes > We enable this ioctl-can-wrap-writes feature in rdma-core and in the > kernel Yes, but this patch just provides the code, the default compile option is left as write() only. All the provider teams should test with ioctl() only mode and confirm there aren't any crazy things missed. I was imagining we'd give another month or two for that, the moven the rdma-core default to both (ie prefer ioctl if available) At some point testing of the write() side will diminish, but now that everything is the same flow it shouldn't be too hard to keep working. > New builds of rdma-core are switched to be ioctl only by preference with > a fallback to write if necessary, or are you thinking build it with > -DIOCTL_MODE and force it to require newer kernels? I image the ioctl only mode as an option distros could choose to enable if they feel confident they don't want to support write() only kernels. It speeds up the call path a bit and removes some code from rdma-core, so it is worth doing if you can. > Regardless, we keep the draconian check in place on the write syscall > entry point, so if anyone tries to do any funny stuff with forking an > application that's already opened a uverbs device, and manages to trick > an application to writing to the uverbs device, it fails. But we don't > turn the write() interface off, we just leave that draconian check > forever. yes, basically. In two years we add a pr_once saying the userspace should be udpated or something > Going forward, the only way to fork then use open device work is if you > use the ioctl method exclusively. This is enforced in the kernel simply > by leaving the draconian check on the write syscall entry point and the > new ioctl method you just posted bypasses the write syscall entry point > and directly looks up the method handler for all of the write commands > and then calls the handler directly after setting up the right cmd/resp > udata structs as part of the attr bundle. Yes > In the end, this is all guaranteed safe because the rest of the core > kernel doesn't play setfs(KERN_DS); games with ioctl commands, where as > they might do so on the write() syscall to redirect things under certain > circumstances. Yes, ioctl is allowed to chase pointers, write() and read() are not.. > Have I got all of the long term intentions for this straight while I'm > going over it all? Yep Jason
On Mon, Dec 03, 2018 at 03:20:40PM -0700, Jason Gunthorpe wrote: > On Mon, Dec 03, 2018 at 05:07:53PM -0500, Doug Ledford wrote: > > On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote: > > > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing > > > write() commands can be immediately converted to execute over ioctl. > > > > > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, > > > and write will not be called. Otherwise the 'both' mode will auto > > > detect the kernel support and use it. > > > > > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> > > > libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++- > > > libibverbs/device.c | 27 ++++++++++++++++++++++++ > > > libibverbs/ibverbs.h | 1 + > > > 3 files changed, 71 insertions(+), 1 deletion(-) > > > > > > This needs the matching kernel patch: > > > > > > https://patchwork.kernel.org/patch/10706127/ > > > > > > a kernel headers update and the series at: > > > > > > https://github.com/linux-rdma/rdma-core/pull/434 > > > > > > To apply. > > > > And long term, this is what we need to resolve the security issue/no > > fork problem... > > Yes > > > We leave the write() interface enabled in the kernel for backward > > compatibility for the foreseeable future (this is the whole "no > > regressions" requirement from Linus...we were required to regress in > > order to resolve the security issue, but other than that, we should > > require users to update rdma-core to keep working systems working, so > > the existing write() interface needs to stay online). > > Yes > > > We enable this ioctl-can-wrap-writes feature in rdma-core and in the > > kernel > > Yes, but this patch just provides the code, the default compile option > is left as write() only. > > All the provider teams should test with ioctl() only mode and confirm > there aren't any crazy things missed. I was imagining we'd give > another month or two for that, the moven the rdma-core default to both > (ie prefer ioctl if available) > > At some point testing of the write() side will diminish, but now that > everything is the same flow it shouldn't be too hard to keep working. I'm in favour to set ioctl as default right now, so those "providers teams" will go an test this path right now and not in some unknown future. Thanks
On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote: > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing > write() commands can be immediately converted to execute over ioctl. > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, > and write will not be called. Otherwise the 'both' mode will auto > detect the kernel support and use it. > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> I've pulled this patch, the official uABI headers update commit, and a commit to change the default to both for the ioctl mode into a PR that is currently undergoing Travis CI. If it passes, I'll merge it and we will officially have a ready to test ioctl mode API that can be utilized to get around the fork() issues created by the security fix to the write() path support.
On Tue, 2018-12-18 at 14:41 -0500, Doug Ledford wrote: > On Mon, 2018-12-03 at 20:58 +0000, Jason Gunthorpe wrote: > > Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing > > write() commands can be immediately converted to execute over ioctl. > > > > If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, > > and write will not be called. Otherwise the 'both' mode will auto > > detect the kernel support and use it. > > > > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> > > I've pulled this patch, the official uABI headers update commit, and a > commit to change the default to both for the ioctl mode into a PR that > is currently undergoing Travis CI. If it passes, I'll merge it and we > will officially have a ready to test ioctl mode API that can be utilized > to get around the fork() issues created by the security fix to the > write() path support. > Travis checks passed, merged.
diff --git a/libibverbs/cmd_fallback.c b/libibverbs/cmd_fallback.c index 0e50503e003484..38193a186a8ccb 100644 --- a/libibverbs/cmd_fallback.c +++ b/libibverbs/cmd_fallback.c @@ -206,11 +206,43 @@ void *_write_get_resp_ex(struct ibv_command_buffer *link, return resp_start; } +static int ioctl_write(struct ibv_context *ctx, unsigned int write_method, + const void *req, size_t core_req_size, size_t req_size, + void *resp, size_t core_resp_size, size_t resp_size) +{ + DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_DEVICE, + UVERBS_METHOD_INVOKE_WRITE, 5); + + fill_attr_const_in(cmdb, UVERBS_ATTR_WRITE_CMD, write_method); + + if (core_req_size) + fill_attr_in(cmdb, UVERBS_ATTR_CORE_IN, req, core_req_size); + if (core_resp_size) + fill_attr_out(cmdb, UVERBS_ATTR_CORE_OUT, resp, core_resp_size); + + if (req_size - core_req_size) + fill_attr_in(cmdb, UVERBS_ATTR_UHW_IN, req + core_req_size, + req_size - core_req_size); + if (resp_size - core_resp_size) + fill_attr_out(cmdb, UVERBS_ATTR_UHW_OUT, resp + core_resp_size, + resp_size - core_resp_size); + + return execute_ioctl(ctx, cmdb); +} + int _execute_cmd_write(struct ibv_context *ctx, unsigned int write_method, struct ib_uverbs_cmd_hdr *req, size_t core_req_size, size_t req_size, void *resp, size_t core_resp_size, size_t resp_size) { + struct verbs_ex_private *priv = get_priv(ctx); + + if (!VERBS_WRITE_ONLY && (VERBS_IOCTL_ONLY || priv->use_ioctl_write)) + return ioctl_write(ctx, write_method, req + 1, + core_req_size - sizeof(*req), + req_size - sizeof(*req), resp, + core_resp_size, resp_size); + req->command = write_method; req->in_words = __check_divide(req_size, 4); req->out_words = __check_divide(resp_size, 4); @@ -232,6 +264,15 @@ int _execute_cmd_write_ex(struct ibv_context *ctx, unsigned int write_method, size_t req_size, void *resp, size_t core_resp_size, size_t resp_size) { + struct verbs_ex_private *priv = get_priv(ctx); + + if (!VERBS_WRITE_ONLY && (VERBS_IOCTL_ONLY || priv->use_ioctl_write)) + return ioctl_write( + ctx, IB_USER_VERBS_CMD_FLAG_EXTENDED | write_method, + req + 1, core_req_size - sizeof(*req), + req_size - sizeof(*req), resp, core_resp_size, + resp_size); + req->hdr.command = IB_USER_VERBS_CMD_FLAG_EXTENDED | write_method; req->hdr.in_words = __check_divide(core_req_size - sizeof(struct ex_hdr), 8); @@ -245,7 +286,8 @@ int _execute_cmd_write_ex(struct ibv_context *ctx, unsigned int write_method, /* * Users assumes the stack buffer is zeroed before passing to the - * kernel for writing. + * kernel for writing. New kernels with the ioctl path do this + * automatically for us. */ if (resp) memset(resp, 0, resp_size); diff --git a/libibverbs/device.c b/libibverbs/device.c index e6e23e718d9fca..1fed29b17ec31d 100644 --- a/libibverbs/device.c +++ b/libibverbs/device.c @@ -43,6 +43,7 @@ #include <alloca.h> #include <errno.h> +#include <rdma/ib_user_ioctl_cmds.h> #include <util/symver.h> #include "ibverbs.h" @@ -193,6 +194,31 @@ __lib_ibv_create_cq_ex(struct ibv_context *context, return cq; } +static bool has_ioctl_write(struct ibv_context *ctx) +{ + int rc; + DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_DEVICE, + UVERBS_METHOD_INVOKE_WRITE, 1); + + if (VERBS_IOCTL_ONLY) + return true; + if (VERBS_WRITE_ONLY) + return false; + + /* + * This command should return ENOSPC since the request length is too + * small. + */ + fill_attr_const_in(cmdb, UVERBS_ATTR_WRITE_CMD, + IB_USER_VERBS_CMD_QUERY_DEVICE); + rc = execute_ioctl(ctx, cmdb); + if (rc == EPROTONOSUPPORT) + return false; + if (rc == ENOTTY) + return false; + return true; +} + /* * Ownership of cmd_fd is transferred into this function, and it will either * be released during the matching call to verbs_uninit_contxt or during the @@ -240,6 +266,7 @@ int verbs_init_context(struct verbs_context *context_ex, context_ex->priv->driver_id = driver_id; verbs_set_ops(context_ex, &verbs_dummy_ops); + context_ex->priv->use_ioctl_write = has_ioctl_write(context); return 0; } diff --git a/libibverbs/ibverbs.h b/libibverbs/ibverbs.h index 24843b49fdb3c7..36e29a6b14d87b 100644 --- a/libibverbs/ibverbs.h +++ b/libibverbs/ibverbs.h @@ -70,6 +70,7 @@ void load_drivers(void); struct verbs_ex_private { BITMAP_DECLARE(unsupported_ioctls, VERBS_OPS_NUM); uint32_t driver_id; + bool use_ioctl_write; struct verbs_context_ops ops; };
Using the new UVERBS_METHOD_INVOKE_WRITE interface all of the existing write() commands can be immediately converted to execute over ioctl. If -DIOCTL_MODE=ioctl is set then only the IOCTL interface is supported, and write will not be called. Otherwise the 'both' mode will auto detect the kernel support and use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> --- libibverbs/cmd_fallback.c | 44 ++++++++++++++++++++++++++++++++++++++- libibverbs/device.c | 27 ++++++++++++++++++++++++ libibverbs/ibverbs.h | 1 + 3 files changed, 71 insertions(+), 1 deletion(-) This needs the matching kernel patch: https://patchwork.kernel.org/patch/10706127/ a kernel headers update and the series at: https://github.com/linux-rdma/rdma-core/pull/434 To apply.