From patchwork Thu Jun 29 14:41:57 2017
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 9817209
From: Yishai Hadas <yishaih@mellanox.com>
To: linux-rdma@vger.kernel.org
Cc: dledford@redhat.com, yishaih@mellanox.com, maorg@mellanox.com, majd@mellanox.com
Subject: [PATCH rdma-core] mlx4: Cleanup resources upon device fatal
Date: Thu, 29 Jun 2017 17:41:57 +0300
Message-Id: <1498747317-2170-1-git-send-email-yishaih@mellanox.com>
X-Mailer: git-send-email 1.8.2.3

Clean up driver resources when a destroy command (e.g. destroy_qp/srq/cq)
fails because of a device fatal error.

Currently, when a device fatal error occurs, the kernel uverbs layer
returns EIO to indicate that the kernel driver's low-level resources
have already been cleaned up. However, destroy commands such as
destroy_qp/cq/srq may then leak the user-space driver memory that would
have been freed on success.

The verbs layer (cmd.c) cannot simply treat this case as success,
because the application may still be handling the reported events in
another thread, and the 'events_reported' data is not part of the
command response. If the application takes control of the events before
destroying its resources (e.g. by using a single thread and
acknowledging events before the destroy calls), the driver memory can
be safely freed.

Therefore, let applications that follow this usage model set an
environment variable (MLX4_DEVICE_FATAL_CLEANUP) to indicate it, so
that memory is not leaked on a device fatal error.
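As a rough sketch of the opt-in mechanism described above: the application sets MLX4_DEVICE_FATAL_CLEANUP before opening the device, and the provider treats any value other than "0" as enabling cleanup. `parse_fatal_cleanup()` below is a hypothetical stand-in for the parsing the patch performs; it is not part of the patch itself.

```c
/* Hypothetical sketch of the MLX4_DEVICE_FATAL_CLEANUP opt-in.
 * An application would setenv() the variable before ibv_open_device();
 * the provider reads it once at context-initialization time. */
#include <stdlib.h>
#include <string.h>

static int parse_fatal_cleanup(void)
{
	const char *env_value = getenv("MLX4_DEVICE_FATAL_CLEANUP");

	if (!env_value)
		return 0;	/* unset: keep the old behaviour (no cleanup) */

	/* Any value other than "0" enables cleanup on device fatal. */
	return strcmp(env_value, "0") ? 1 : 0;
}
```

Reading the variable once at init keeps the destroy paths free of repeated `getenv()` calls.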
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
---
Pull request was sent: https://github.com/linux-rdma/rdma-core/pull/158

 providers/mlx4/mlx4.c  | 12 ++++++++++++
 providers/mlx4/mlx4.h  |  6 ++++++
 providers/mlx4/srq.c   |  2 +-
 providers/mlx4/verbs.c | 19 ++++++++++---------
 4 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/providers/mlx4/mlx4.c b/providers/mlx4/mlx4.c
index b798b37..9ba80d4 100644
--- a/providers/mlx4/mlx4.c
+++ b/providers/mlx4/mlx4.c
@@ -43,6 +43,8 @@
 #include "mlx4.h"
 #include "mlx4-abi.h"
 
+int mlx4_cleanup_upon_device_fatal = 0;
+
 #ifndef PCI_VENDOR_ID_MELLANOX
 #define PCI_VENDOR_ID_MELLANOX 0x15b3
 #endif
@@ -118,6 +120,15 @@ static struct ibv_context_ops mlx4_ctx_ops = {
 	.detach_mcast = ibv_cmd_detach_mcast
 };
 
+static void mlx4_read_env(void)
+{
+	char *env_value;
+
+	env_value = getenv("MLX4_DEVICE_FATAL_CLEANUP");
+	if (env_value)
+		mlx4_cleanup_upon_device_fatal = (strcmp(env_value, "0")) ? 1 : 0;
+}
+
 static int mlx4_map_internal_clock(struct mlx4_device *mdev,
 				   struct ibv_context *ibv_ctx)
 {
@@ -159,6 +170,7 @@ static int mlx4_init_context(struct verbs_device *v_device,
 	context = to_mctx(ibv_ctx);
 	ibv_ctx->cmd_fd = cmd_fd;
 
+	mlx4_read_env();
 	if (dev->abi_version <= MLX4_UVERBS_NO_DEV_CAPS_ABI_VERSION) {
 		if (ibv_cmd_get_context(ibv_ctx, &cmd, sizeof cmd,
 					&resp_v3.ibv_resp, sizeof resp_v3))
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index b4f6e86..4637c10 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -350,6 +350,12 @@ static inline void mlx4_update_cons_index(struct mlx4_cq *cq)
 	*cq->set_ci_db = htobe32(cq->cons_index & 0xffffff);
 }
 
+extern int mlx4_cleanup_upon_device_fatal;
+static inline int cleanup_on_fatal(int ret)
+{
+	return (ret == EIO && mlx4_cleanup_upon_device_fatal);
+}
+
 int mlx4_alloc_buf(struct mlx4_buf *buf, size_t size, int page_size);
 void mlx4_free_buf(struct mlx4_buf *buf);
 
diff --git a/providers/mlx4/srq.c b/providers/mlx4/srq.c
index f30cc2e..adcf0fa 100644
--- a/providers/mlx4/srq.c
+++ b/providers/mlx4/srq.c
@@ -308,7 +308,7 @@ int mlx4_destroy_xrc_srq(struct ibv_srq *srq)
 	pthread_spin_unlock(&mcq->lock);
 
 	ret = ibv_cmd_destroy_srq(srq);
-	if (ret) {
+	if (ret && !cleanup_on_fatal(ret)) {
 		pthread_spin_lock(&mcq->lock);
 		mlx4_store_xsrq(&mctx->xsrq_table, msrq->verbs_srq.srq_num, msrq);
 		pthread_spin_unlock(&mcq->lock);
diff --git a/providers/mlx4/verbs.c b/providers/mlx4/verbs.c
index 80efd9a..5770430 100644
--- a/providers/mlx4/verbs.c
+++ b/providers/mlx4/verbs.c
@@ -218,7 +218,7 @@ int mlx4_free_pd(struct ibv_pd *pd)
 	int ret;
 
 	ret = ibv_cmd_dealloc_pd(pd);
-	if (ret)
+	if (ret && !cleanup_on_fatal(ret))
 		return ret;
 
 	free(to_mpd(pd));
@@ -255,10 +255,11 @@ int mlx4_close_xrcd(struct ibv_xrcd *ib_xrcd)
 	int ret;
 
 	ret = ibv_cmd_close_xrcd(xrcd);
-	if (!ret)
-		free(xrcd);
+	if (ret && !cleanup_on_fatal(ret))
+		return ret;
 
-	return ret;
+	free(xrcd);
+	return 0;
 }
 
 struct ibv_mr *mlx4_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
@@ -307,7 +308,7 @@ int mlx4_dereg_mr(struct ibv_mr *mr)
 	int ret;
 
 	ret = ibv_cmd_dereg_mr(mr);
-	if (ret)
+	if (ret && !cleanup_on_fatal(ret))
 		return ret;
 
 	free(mr);
@@ -342,7 +343,7 @@ int mlx4_dealloc_mw(struct ibv_mw *mw)
 	struct ibv_dealloc_mw cmd;
 
 	ret = ibv_cmd_dealloc_mw(mw, &cmd, sizeof(cmd));
-	if (ret)
+	if (ret && !cleanup_on_fatal(ret))
 		return ret;
 
 	free(mw);
@@ -629,7 +630,7 @@ int mlx4_destroy_cq(struct ibv_cq *cq)
 	int ret;
 
 	ret = ibv_cmd_destroy_cq(cq);
-	if (ret)
+	if (ret && !cleanup_on_fatal(ret))
 		return ret;
 
 	mlx4_free_db(to_mctx(cq->context), MLX4_DB_TYPE_CQ, to_mcq(cq)->set_ci_db);
@@ -733,7 +734,7 @@ int mlx4_destroy_srq(struct ibv_srq *srq)
 		return mlx4_destroy_xrc_srq(srq);
 
 	ret = ibv_cmd_destroy_srq(srq);
-	if (ret)
+	if (ret && !cleanup_on_fatal(ret))
 		return ret;
 
 	mlx4_free_db(to_mctx(srq->context), MLX4_DB_TYPE_RQ, to_msrq(srq)->db);
@@ -1090,7 +1091,7 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
 	pthread_mutex_lock(&to_mctx(ibqp->context)->qp_table_mutex);
 	ret = ibv_cmd_destroy_qp(ibqp);
-	if (ret) {
+	if (ret && !cleanup_on_fatal(ret)) {
 		pthread_mutex_unlock(&to_mctx(ibqp->context)->qp_table_mutex);
 		return ret;
 	}
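The destroy-path pattern the diff applies everywhere can be sketched in isolation as follows. `cleanup_on_fatal()` and `mlx4_cleanup_upon_device_fatal` mirror the patch; `destroy_resource()` and its `cmd_ret` parameter are illustrative stand-ins for an `ibv_cmd_destroy_*` call and its return value, not real rdma-core functions.

```c
/* Sketch of the "free user memory on opted-in device fatal" pattern. */
#include <errno.h>
#include <stdlib.h>

int mlx4_cleanup_upon_device_fatal = 0;

static int cleanup_on_fatal(int ret)
{
	/* EIO from the kernel means the low-level resources are already
	 * gone; reclaim user-space memory only if the app opted in. */
	return (ret == EIO && mlx4_cleanup_upon_device_fatal);
}

/* Stand-in for a destroy verb: cmd_ret plays the role of the value an
 * ibv_cmd_destroy_* call would return. */
static int destroy_resource(int cmd_ret, void *user_mem)
{
	if (cmd_ret && !cleanup_on_fatal(cmd_ret))
		return cmd_ret;	/* genuine failure: keep the resource */

	free(user_mem);		/* success, or opted-in fatal: reclaim */
	return 0;
}
```

Without the `cleanup_on_fatal()` escape, an EIO from a dead device would bail out before `free()`, leaking `user_mem` for every destroyed QP/CQ/SRQ.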