From patchwork Tue Nov 1 16:44:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9407653
Subject: Re: nvmet_rdma crash - DISCONNECT event with NULL queue
To: Steve Wise, 'Christoph Hellwig'
References: <01b401d23458$af277210$0d765630$@opengridcomputing.com>
 <6f42d056-284d-00fc-2b98-189f54957980@grimberg.me>
 <01cc01d2345b$d445acd0$7cd10670$@opengridcomputing.com>
 <4cc25277-429a-4ab9-470c-b3af1428ce93@grimberg.me>
 <01d101d2345e$2f054390$8d0fcab0$@opengridcomputing.com>
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
From: Sagi Grimberg
Date: Tue, 1 Nov 2016 18:44:21 +0200
In-Reply-To: <01d101d2345e$2f054390$8d0fcab0$@opengridcomputing.com>
X-Mailing-List: linux-rdma@vger.kernel.org

>> pphh, somehow managed to miss it...
>>
>> So we have a case where we can call rdma_destroy_qp and
>> then rdma_destroy_id but still get events on the cm_id...
>> Not very nice...
>>
>> So I think that the patch from Bart a few weeks ago was correct:
>>
>
> Not quite. It just guards against a null queue for TIMEWAIT_EXIT, which is only
> generated by the IB_CM.

Yes, this is why we need ADDR_CHANGE and DISCONNECTED too
"(and include all the relevant cases around it)"

For the other events we never get to the LIVE state, and we don't have
other error flows that would trigger the queue teardown sequence.

---
nvmet-rdma: Fix possible NULL deref when handling rdma cm events

When we initiate the queue teardown sequence we call rdma_destroy_qp,
which clears cm_id->qp, and afterwards we call rdma_destroy_id. We
might see an rdma_cm event in between, with cm_id->qp already cleared,
so watch out for that and silently ignore the event because it means
that the queue teardown sequence is in progress.

Signed-off-by: Bart Van Assche
Signed-off-by: Sagi Grimberg
---
 drivers/nvme/target/rdma.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b4d648536c3e..240888efd920 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1351,7 +1351,13 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
 	case RDMA_CM_EVENT_ADDR_CHANGE:
 	case RDMA_CM_EVENT_DISCONNECTED:
 	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
-		nvmet_rdma_queue_disconnect(queue);
+		/*
+		 * We might end up here when we already freed the qp
+		 * which means queue release sequence is in progress,
+		 * so don't get in the way...
+		 */
+		if (queue)
+			nvmet_rdma_queue_disconnect(queue);
 		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		ret = nvmet_rdma_device_removal(cm_id, queue);
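
For anyone who wants to poke at the race outside the kernel, here is a
rough userspace sketch of the intended behaviour. It assumes the handler
finds the queue through cm_id->qp, so a cleared qp leaves it with a NULL
queue. All the sketch_* types and functions are made up for illustration
and are not the real rdma_cm or nvmet-rdma definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct sketch_queue {
	int disconnects;	/* number of times disconnect ran on this queue */
};

struct sketch_qp {
	void *qp_context;	/* points back at the owning queue */
};

struct sketch_cm_id {
	struct sketch_qp *qp;	/* NULL once the QP has been destroyed */
};

enum sketch_event {
	SKETCH_EVENT_ADDR_CHANGE,
	SKETCH_EVENT_DISCONNECTED,
	SKETCH_EVENT_TIMEWAIT_EXIT,
};

static void sketch_queue_disconnect(struct sketch_queue *queue)
{
	queue->disconnects++;
}

/*
 * Returns 1 if the event triggered a disconnect, 0 if it was ignored.
 * Assumption: the queue is looked up through cm_id->qp, so an event that
 * arrives after the QP was destroyed sees queue == NULL.
 */
static int sketch_cm_handler(struct sketch_cm_id *cm_id,
			     enum sketch_event event)
{
	struct sketch_queue *queue = NULL;

	if (cm_id->qp)
		queue = cm_id->qp->qp_context;

	switch (event) {
	case SKETCH_EVENT_ADDR_CHANGE:
	case SKETCH_EVENT_DISCONNECTED:
	case SKETCH_EVENT_TIMEWAIT_EXIT:
		/* Teardown already in progress: silently ignore the event. */
		if (!queue)
			return 0;
		sketch_queue_disconnect(queue);
		return 1;
	}
	return 0;
}
```

With a live qp the handler disconnects the queue once; after the qp
pointer is cleared, the same events are dropped without touching the
queue, which is the behaviour the commit message describes.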