From patchwork Wed Jan 3 21:25:49 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bart Van Assche
X-Patchwork-Id: 10143409
From: Bart Van Assche
To: "monis@mellanox.com"
CC: "linux-rdma@vger.kernel.org"
Subject: Re: Linux kernel v4.15-rc4 and rdma_rxe
Date: Wed, 3 Jan 2018 21:25:49 +0000
Message-ID: <1515014747.2582.46.camel@wdc.com>
References: <1513732236.2535.25.camel@wdc.com> <1513983674.2579.27.camel@wdc.com> <1514940799.14857.20.camel@wdc.com>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

On Wed, 2018-01-03 at 07:13 +0200, Moni Shoua wrote:
> > Does this perhaps mean that the rxe_qp structure can be freed while rxe_do_task()
> > is in progress? Please note that the ib_srpt driver only destroys a QP
> > (srpt_destroy_ch_ib() call in srpt_release_channel_work()) after all SCSI command
> > processing has finished (transport_deregister_session()).
>
> If I understand right you say that the system is hung when trying to
> take a lock in rxe_do_taks() (line 89). Is that right?
> Anyway, It's possible that you hit a bug related to destroying a QP.

Hello Moni,

The issues I had reported may be unrelated. BTW, this is what I saw appearing in the
system log a few minutes ago:

Jan 3 13:03:56 ubuntu-vm kernel: ib_srpt:srpt_close_ch: ib_srpt 192.168.122.76-18: queued zerolength write
Jan 3 13:03:56 ubuntu-vm kernel: rdma_rxe:rxe_completer: rdma_rxe: rxe_completer(): qp valid 1, state ERROR
[ ... ]
Jan 3 13:04:09 ubuntu-vm kernel: ib_srpt:srpt_disconnect_ch_sync: ib_srpt ch 192.168.122.76-18 state 3
[ ... ]
Jan 3 13:04:14 ubuntu-vm kernel: ib_srpt srpt_disconnect_ch_sync(192.168.122.76-18 state 3): still waiting ...

In other words, the ib_srpt driver had queued a zero-length write and changed the
QP state into ERROR but no completion was queued for that zero-length write. The
rdma_rxe log message was generated by the following code:

Bart.

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 6cdc40ed8a9f..f6c40edbddc6 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -550,6 +550,9 @@ int rxe_completer(void *arg)
 
 	if (!qp->valid || qp->req.state == QP_STATE_ERROR ||
 	    qp->req.state == QP_STATE_RESET) {
+		pr_debug("rxe_completer(): qp valid %d, state %s\n",
+			 qp->valid, qp->req.state == QP_STATE_ERROR ? "ERROR" :
+			 qp->req.state == QP_STATE_RESET ? "RESET" : "(?)");
 		rxe_drain_resp_pkts(qp, qp->valid &&
 				    qp->req.state == QP_STATE_ERROR);
 		goto exit;