From patchwork Wed Nov 16 08:39:15 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 9431035 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id D440560471 for ; Wed, 16 Nov 2016 08:39:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C223728E93 for ; Wed, 16 Nov 2016 08:39:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B741E28E95; Wed, 16 Nov 2016 08:39:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 593DE28E93 for ; Wed, 16 Nov 2016 08:39:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753140AbcKPIjc (ORCPT ); Wed, 16 Nov 2016 03:39:32 -0500 Received: from mail.kernel.org ([198.145.29.136]:46370 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753000AbcKPIjc (ORCPT ); Wed, 16 Nov 2016 03:39:32 -0500 Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 4C4A3203C2; Wed, 16 Nov 2016 08:39:30 +0000 (UTC) Received: from localhost (unknown [213.57.247.249]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id AB806203B8; Wed, 16 Nov 2016 08:39:28 +0000 (UTC) From: Leon Romanovsky To: dledford@redhat.com Cc: linux-rdma@vger.kernel.org, Yonatan Cohen Subject: [PATCH rdma-rc 2/5] IB/rxe: Fix handling of erroneous WR Date: Wed, 16 Nov 2016 10:39:15 +0200 Message-Id: <1479285558-19627-3-git-send-email-leon@kernel.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1479285558-19627-1-git-send-email-leon@kernel.org> References: <1479285558-19627-1-git-send-email-leon@kernel.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Yonatan Cohen To correctly handle a erroneous WR this fix does the following 1. Make sure the bad WQE causes a user completion event. 2. Call rxe_completer to handle the erred WQE. Before the fix, when rxe_requester found a bad WQE, it changed its status to IB_WC_LOC_PROT_ERR and exit with 0 for non RC QPs. If this was the 1st WQE then there would be no ACK to invoke the completer and this bad WQE would be stuck in the QP's send-q. On top of that the requester exiting with 0 caused rxe_do_task to endlessly invoke rxe_requester, resulting in a soft-lockup attached below. In case the WQE was not the 1st and rxe_completer did get a chance to handle the bad WQE, it did not cause a complete event since the WQE's IB_SEND_SIGNALED flag was not set. Setting WQE status to IB_SEND_SIGNALED is subject to IBA spec version 1.2.1, section 10.7.3.1 Signaled Completions. NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [] ? rxe_pool_get_index+0x35/0xb0 [rdma_rxe] [] lookup_mem+0x3c/0xc0 [rdma_rxe] [] copy_data+0x1c4/0x230 [rdma_rxe] [] rxe_requester+0x9d0/0x1100 [rdma_rxe] [] ? kfree_skbmem+0x5a/0x60 [] rxe_do_task+0x89/0xf0 [rdma_rxe] [] rxe_run_task+0x12/0x30 [rdma_rxe] [] rxe_post_send+0x41a/0x550 [rdma_rxe] [] ? __kmalloc+0x182/0x200 [] ? down_read+0x12/0x40 [] ib_uverbs_post_send+0x532/0x540 [ib_uverbs] [] ? tcp_sendmsg+0x402/0xb80 [] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs] [] ? inet_recvmsg+0x7e/0xb0 [] ? sock_recvmsg+0x3d/0x50 [] __vfs_write+0x37/0x140 [] vfs_write+0xb2/0x1b0 [] SyS_write+0x55/0xc0 [] entry_SYSCALL_64_fastpath+0x1a/0xa Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen Reviewed-by: Moni Shoua Signed-off-by: Leon Romanovsky --- drivers/infiniband/sw/rxe/rxe_req.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 832846b..22bd963 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -696,7 +696,8 @@ int rxe_requester(void *arg) qp->req.wqe_index); wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; - goto complete; + __rxe_do_task(&qp->comp.task); + return 0; } payload = mtu; } @@ -745,13 +746,17 @@ int rxe_requester(void *arg) wqe->status = IB_WC_LOC_PROT_ERR; wqe->state = wqe_state_error; -complete: - if (qp_type(qp) != IB_QPT_RC) { - while (rxe_completer(qp) == 0) - ; - } - - return 0; + /* + * IBA Spec. Section 10.7.3.1 SIGNALED COMPLETIONS + * ---------8<---------8<------------- + * ...Note that if a completion error occurs, a Work Completion + * will always be generated, even if the signaling + * indicator requests an Unsignaled Completion. + * ---------8<---------8<------------- + */ + wqe->wr.send_flags |= IB_SEND_SIGNALED; + __rxe_do_task(&qp->comp.task); + return -EAGAIN; exit: return -EAGAIN;