From patchwork Tue Nov 1 16:34:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sagi Grimberg
X-Patchwork-Id: 9407625
Subject: Re: nvmet_rdma crash - DISCONNECT event with NULL queue
To: Steve Wise, 'Christoph Hellwig'
References: <01b401d23458$af277210$0d765630$@opengridcomputing.com>
 <6f42d056-284d-00fc-2b98-189f54957980@grimberg.me>
 <01cc01d2345b$d445acd0$7cd10670$@opengridcomputing.com>
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org
From: Sagi Grimberg
Message-ID: <4cc25277-429a-4ab9-470c-b3af1428ce93@grimberg.me>
Date: Tue, 1 Nov 2016 18:34:23 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.3.0
In-Reply-To: <01cc01d2345b$d445acd0$7cd10670$@opengridcomputing.com>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

>>> I just hit an nvmf target NULL pointer deref BUG after a few hours of
>>> keep-alive timeout testing. It appears that nvmet_rdma_cm_handler() was
>>> called with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable
>>> queue is left as NULL. But then nvmet_rdma_queue_disconnect() is called
>>> with queue == NULL which causes the crash.
>>
>> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
>> get to allocate a queue-pair (e.g. calling rdma_create_qp).
>> The teardown paths does not nullify cm_id->qp...
>
> rdma_destroy_qp() nulls out cm_id->qp.

pphh, somehow managed to miss it...

So we have a case where we can call rdma_destroy_qp and then
rdma_destroy_id but still get events on the cm_id... Not very nice...

So I think that the patch from Bart a few weeks ago was correct:

---
 drivers/nvme/target/rdma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

 		break;
---

In case this fixes the issue (as expected) I'll queue it up with
a change log and a code comment on why we need to do this (and
include all the relevant cases around it)...

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index d1aea17..a61e47f 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		break;
 	case RDMA_CM_EVENT_ADDR_CHANGE:
 	case RDMA_CM_EVENT_DISCONNECTED:
-	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
 		nvmet_rdma_queue_disconnect(queue);
 		break;
+	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
+		if (queue)
+			nvmet_rdma_queue_disconnect(queue);
+		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		ret = nvmet_rdma_device_removal(cm_id, queue);
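
For anyone following the thread who wants to see the ordering problem outside the kernel, here is a rough userspace sketch of the sequence discussed above: the QP is destroyed (which clears cm_id->qp), and a TIMEWAIT_EXIT event still arrives on the cm_id afterwards. Everything named mock_* or MOCK_* is a hypothetical stand-in made up for this illustration, not the real rdma_cm or nvmet definitions, and the handler only imitates the shape of the switch in the diff.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; NOT the real definitions. */
enum mock_event {
	MOCK_EVENT_DISCONNECTED,
	MOCK_EVENT_TIMEWAIT_EXIT,
};

struct mock_queue {
	int disconnects;		/* how many times disconnect ran */
};

struct mock_qp {
	struct mock_queue *qp_context;	/* back-pointer to the owning queue */
};

struct mock_cm_id {
	struct mock_qp *qp;		/* NULL once the QP has been destroyed */
};

static void mock_queue_disconnect(struct mock_queue *queue)
{
	queue->disconnects++;		/* would crash if queue were NULL */
}

/* Models the behaviour noted in the thread: destroying the QP clears cm_id->qp. */
static void mock_destroy_qp(struct mock_cm_id *cm_id)
{
	cm_id->qp = NULL;
}

/* Returns 1 if a disconnect was issued for this event, 0 if it was skipped. */
static int mock_cm_handler(struct mock_cm_id *cm_id, enum mock_event event)
{
	struct mock_queue *queue = NULL;

	if (cm_id->qp)
		queue = cm_id->qp->qp_context;

	switch (event) {
	case MOCK_EVENT_DISCONNECTED:
		/* Left unguarded, as in the diff; assumes the QP still exists. */
		mock_queue_disconnect(queue);
		return 1;
	case MOCK_EVENT_TIMEWAIT_EXIT:
		/* May arrive after the QP is gone, so queue can be NULL here. */
		if (queue) {
			mock_queue_disconnect(queue);
			return 1;
		}
		return 0;
	}
	return 0;
}
```

Calling mock_cm_handler() with MOCK_EVENT_TIMEWAIT_EXIT before mock_destroy_qp() issues the disconnect; calling it again afterwards returns 0 and leaves the queue untouched, which is the case the NULL check is meant to cover.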