From patchwork Tue Aug 29 17:34:43 2017
X-Patchwork-Submitter: Roland Dreier
X-Patchwork-Id: 9927817
From: Roland Dreier
To: Doug Ledford, Sean Hefty
Cc: linux-rdma@vger.kernel.org
Subject: [PATCH 1/2] IB/cm: Fix sleeping in atomic when RoCE is used
Date: Tue, 29 Aug 2017 10:34:43 -0700
Message-Id: <20170829173444.4289-1-roland@kernel.org>
X-Mailer: git-send-email 2.14.1
X-Mailing-List: linux-rdma@vger.kernel.org

From: Roland Dreier

A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Signed-off-by: Roland Dreier
Reviewed-by: Sean Hefty
---
 drivers/infiniband/core/cm.c | 63 +++++++++++++++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 2b4d613a3474..f814c5035c74 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -373,11 +373,19 @@ static int cm_alloc_msg(struct cm_id_private *cm_id_priv,
 	return ret;
 }
 
-static int cm_alloc_response_msg(struct cm_port *port,
-				 struct ib_mad_recv_wc *mad_recv_wc,
-				 struct ib_mad_send_buf **msg)
+static struct ib_mad_send_buf *cm_alloc_response_msg_no_ah(struct cm_port *port,
+							   struct ib_mad_recv_wc *mad_recv_wc)
+{
+	return ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
+				  0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
+				  GFP_ATOMIC,
+				  IB_MGMT_BASE_VERSION);
+}
+
+static int cm_create_response_msg_ah(struct cm_port *port,
+				     struct ib_mad_recv_wc *mad_recv_wc,
+				     struct ib_mad_send_buf *msg)
 {
-	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
 
 	ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
@@ -385,27 +393,40 @@ static int cm_alloc_response_msg(struct cm_port *port,
 	if (IS_ERR(ah))
 		return PTR_ERR(ah);
 
-	m = ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
-			       0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
-			       GFP_ATOMIC,
-			       IB_MGMT_BASE_VERSION);
-	if (IS_ERR(m)) {
-		rdma_destroy_ah(ah);
-		return PTR_ERR(m);
-	}
-	m->ah = ah;
-	*msg = m;
+	msg->ah = ah;
 	return 0;
 }
 
 static void cm_free_msg(struct ib_mad_send_buf *msg)
 {
-	rdma_destroy_ah(msg->ah);
+	if (msg->ah)
+		rdma_destroy_ah(msg->ah);
 	if (msg->context[0])
 		cm_deref_id(msg->context[0]);
 	ib_free_send_mad(msg);
 }
 
+static int cm_alloc_response_msg(struct cm_port *port,
+				 struct ib_mad_recv_wc *mad_recv_wc,
+				 struct ib_mad_send_buf **msg)
+{
+	struct ib_mad_send_buf *m;
+	int ret;
+
+	m = cm_alloc_response_msg_no_ah(port, mad_recv_wc);
+	if (IS_ERR(m))
+		return PTR_ERR(m);
+
+	ret = cm_create_response_msg_ah(port, mad_recv_wc, m);
+	if (ret) {
+		cm_free_msg(m);
+		return ret;
+	}
+
+	*msg = m;
+	return 0;
+}
+
 static void * cm_copy_private_data(const void *private_data,
 				   u8 private_data_len)
 {
@@ -2424,7 +2445,8 @@ static int cm_dreq_handler(struct cm_work *work)
 	case IB_CM_TIMEWAIT:
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_DREQ_COUNTER]);
-		if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
+		msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
+		if (IS_ERR(msg))
 			goto unlock;
 
 		cm_format_drep((struct cm_drep_msg *) msg->mad, cm_id_priv,
@@ -2432,7 +2454,8 @@ static int cm_dreq_handler(struct cm_work *work)
 			       cm_id_priv->private_data_len);
 		spin_unlock_irq(&cm_id_priv->lock);
 
-		if (ib_post_send_mad(msg, NULL))
+		if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
+		    ib_post_send_mad(msg, NULL))
 			cm_free_msg(msg);
 		goto deref;
 	case IB_CM_DREQ_RCVD:
@@ -2980,7 +3003,8 @@ static int cm_lap_handler(struct cm_work *work)
 	case IB_CM_MRA_LAP_SENT:
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_LAP_COUNTER]);
-		if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
+		msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
+		if (IS_ERR(msg))
 			goto unlock;
 
 		cm_format_mra((struct cm_mra_msg *) msg->mad, cm_id_priv,
@@ -2990,7 +3014,8 @@ static int cm_lap_handler(struct cm_work *work)
 			      cm_id_priv->private_data_len);
 		spin_unlock_irq(&cm_id_priv->lock);
 
-		if (ib_post_send_mad(msg, NULL))
+		if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
+		    ib_post_send_mad(msg, NULL))
 			cm_free_msg(msg);
 		goto deref;
 	case IB_CM_LAP_RCVD:
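
For readers who want the shape of the fix without the CM details, here is a
minimal userspace sketch of the same two-phase pattern: do the non-blocking
allocation and the formatting while the lock is held, drop the lock, and only
then run the step that may sleep. Everything in it is an assumption made for
illustration: struct conn, struct response, conn_init(), handle_duplicate() and
the other helpers are hypothetical names, a pthread mutex stands in for
spin_lock_irq(), and the hand-tracked lock_held flag is only a crude,
single-threaded stand-in for the kernel's might_sleep() check. It is not the
kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct response {
	void *ah;	/* plays the role of msg->ah; NULL until phase two */
	int payload;	/* plays the role of the formatted reply */
};

struct conn {
	pthread_mutex_t lock;
	bool lock_held;	/* tracked by hand; only valid single-threaded */
	int state;
};

void conn_init(struct conn *c, int state)
{
	pthread_mutex_init(&c->lock, NULL);
	c->lock_held = false;
	c->state = state;
}

static void conn_lock(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->lock_held = true;
}

static void conn_unlock(struct conn *c)
{
	c->lock_held = false;
	pthread_mutex_unlock(&c->lock);
}

/* Phase one: plain allocation, treated here as safe under the lock. */
static struct response *response_alloc_no_ah(void)
{
	return calloc(1, sizeof(struct response));
}

/* Phase two: treated as a step that may block, so the lock must be dropped. */
static int response_create_ah(struct conn *c, struct response *r)
{
	assert(!c->lock_held);	/* crude analogue of might_sleep() */
	r->ah = malloc(16);
	return r->ah ? 0 : -1;
}

static void response_free(struct response *r)
{
	free(r->ah);	/* free(NULL) is fine; the patch instead adds an
			 * explicit NULL check before rdma_destroy_ah() */
	free(r);
}

int handle_duplicate(struct conn *c)
{
	struct response *r;
	int ret;

	conn_lock(c);
	r = response_alloc_no_ah();
	if (!r) {
		conn_unlock(c);
		return -1;
	}
	r->payload = c->state;	/* format the reply from locked state */
	conn_unlock(c);

	/* Only now, with the lock dropped, run the step that may block. */
	ret = response_create_ah(c, r);
	response_free(r);	/* the sketch frees instead of sending */
	return ret;
}
```

Calling conn_init(&c, 7) followed by handle_duplicate(&c) walks through both
phases. Moving the response_create_ah() call above conn_unlock() would trip the
assertion, which is the userspace analogue of the sleeping-in-atomic problem
this patch fixes.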