From patchwork Fri Aug 25 22:57:09 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Gurtovoy
X-Patchwork-Id: 9923047
Subject: Re: kernel NULL pointer during reset_controller operation with IO on 4.11.0-rc7
To: Yi Zhang, Leon Romanovsky, Sagi Grimberg
References: <1413097100.14743757.1492668219336.JavaMail.zimbra@redhat.com>
 <97bb90ec-4337-62f7-f08d-a673975a5637@grimberg.me>
 <20170425180630.GU14088@mtr-leonro.local>
 <39bb8b67-4018-09bd-9d7d-a8f8534084a7@redhat.com>
From: Max Gurtovoy
Message-ID: <7ceef67d-4424-97d5-02f5-7569a1f5a20e@mellanox.com>
Date: Sat, 26 Aug 2017 01:57:09 +0300
In-Reply-To: <39bb8b67-4018-09bd-9d7d-a8f8534084a7@redhat.com>
X-Mailing-List: linux-rdma@vger.kernel.org

On 8/25/2017 3:10 PM, Yi Zhang wrote:
>
>
> On 08/24/2017 08:11 PM, Max Gurtovoy wrote:
>>
>>
>> On 4/25/2017 9:06 PM, Leon Romanovsky wrote:
>>> On Thu, Apr 20, 2017 at 07:21:29PM +0300, Sagi Grimberg wrote:
>>>>
>>>>> [1]
>>>>> [ 5968.515237] DMAR: DRHD: handling fault status reg 2
>>>>> [ 5968.519449] mlx5_2:dump_cqe:262:(pid 0): dump error cqe
>>>>> [ 5968.519450] 00000000 00000000 00000000 00000000
>>>>> [ 5968.519451] 00000000 00000000 00000000 00000000
>>>>> [ 5968.519451] 00000000 00000000 00000000 00000000
>>>>> [ 5968.519452] 00000000 02005104 00000316 a71710e3
>>>>
>>>> Max, Can you decode this for us?
>>>
>>> I'm not Max and maybe he will shed more light on it. I didn't find such
>>> error in our documentation.
>>
>>
>> Sorry for the late response.
>>
>> Yi Zhang,
>> Is it still repro ?
>>
> Hi Max
> The good news is the NULL pointer cannot be reproduced any more with
> 4.13.0-rc6.
>
> But I found bellow error on target and client side during the test.
> Client side:
> rdma-virt-03 login: [ 927.033550] print_req_error: I/O error, dev nvme0n1, sector 140477384
> [ 927.033577] print_req_error: I/O error, dev nvme0n1, sector 271251016
> [ 927.033579] Buffer I/O error on dev nvme0n1, logical block 33906377, lost async page write
> [ 927.033583] Buffer I/O error on dev nvme0n1, logical block 33906378, lost async page write
> [ 927.033584] Buffer I/O error on dev nvme0n1, logical block 33906379, lost async page write
> [ 927.033585] Buffer I/O error on dev nvme0n1, logical block 33906380, lost async page write
> [ 927.033586] Buffer I/O error on dev nvme0n1, logical block 33906381, lost async page write
> [ 927.033586] Buffer I/O error on dev nvme0n1, logical block 33906382, lost async page write
> [ 927.033587] Buffer I/O error on dev nvme0n1, logical block 33906383, lost async page write
> [ 927.033588] Buffer I/O error on dev nvme0n1, logical block 33906384, lost async page write
> [ 927.033591] print_req_error: I/O error, dev nvme0n1, sector 271299456
> [ 927.033592] Buffer I/O error on dev nvme0n1, logical block 33912432, lost async page write
> [ 927.033593] Buffer I/O error on dev nvme0n1, logical block 33912433, lost async page write
> [ 927.033600] print_req_error: I/O error, dev nvme0n1, sector 271299664
> [ 927.033606] print_req_error: I/O error, dev nvme0n1, sector 271300200
> [ 927.033610] print_req_error: I/O error, dev nvme0n1, sector 271198824
> [ 927.033617] print_req_error: I/O error, dev nvme0n1, sector 271201256
> [ 927.033621] print_req_error: I/O error, dev nvme0n1, sector 271251224
> [ 927.033624] print_req_error: I/O error, dev nvme0n1, sector 271251280
> [ 927.033632] print_req_error: I/O error, dev nvme0n1, sector 271251696
> [ 957.561764] print_req_error: 243 callbacks suppressed
> [ 957.567643] print_req_error: I/O error, dev nvme0n1, sector 140682256
> [ 957.575049] buffer_io_error: 1965 callbacks suppressed
> [ 957.581006] Buffer I/O error on dev nvme0n1, logical block 17585282, lost async page write
> [ 957.590477] Buffer I/O error on dev nvme0n1, logical block 17585283, lost async page write
> [ 957.599946] Buffer I/O error on dev nvme0n1, logical block 17585284, lost async page write
> [ 957.609406] Buffer I/O error on dev nvme0n1, logical block 17585285, lost async page write
> [ 957.618874] Buffer I/O error on dev nvme0n1, logical block 17585286, lost async page write
> [ 957.628345] print_req_error: I/O error, dev nvme0n1, sector 140692416
> [ 957.635788] Buffer I/O error on dev nvme0n1, logical block 17586552, lost async page write
> [ 957.645290] Buffer I/O error on dev nvme0n1, logical block 17586553, lost async page write
> [ 957.654790] Buffer I/O error on dev nvme0n1, logical block 17586554, lost async page write
> [ 957.664292] print_req_error: I/O error, dev nvme0n1, sector 140693744
> [ 957.671767] Buffer I/O error on dev nvme0n1, logical block 17586718, lost async page write
> [ 957.681299] Buffer I/O error on dev nvme0n1, logical block 17586719, lost async page write
> [ 957.690833] print_req_error: I/O error, dev nvme0n1, sector 140697416
> [ 957.698345] print_req_error: I/O error, dev nvme0n1, sector 140697664
> [ 957.705855] print_req_error: I/O error, dev nvme0n1, sector 140698576
> [ 957.713367] print_req_error: I/O error, dev nvme0n1, sector 140699656
> [ 957.720877] print_req_error: I/O error, dev nvme0n1, sector 140701768
> [ 957.728390] print_req_error: I/O error, dev nvme0n1, sector 140702728
> [ 957.735902] print_req_error: I/O error, dev nvme0n1, sector 140705304
> [ 957.744235] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.750308] nvme nvme0: nvme_rdma_post_send failed with error code -12
> [ 957.757941] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.764030] nvme nvme0: Queueing INV WR for rkey 0x1a1d9f failed (-12)
> [ 957.771687] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.777799] nvme nvme0: nvme_rdma_post_send failed with error code -12
> [ 957.785465] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.791587] nvme nvme0: Queueing INV WR for rkey 0x1a1da0 failed (-12)
> [ 957.799262] mlx5_2:mlx5_ib_post_send:3846:(pid 1254):
> [ 957.805391] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.805396] nvme nvme0: nvme_rdma_post_send failed with error code -12
> [ 957.819307] mlx5_2:mlx5_ib_post_send:3846:(pid 1254):
> [ 957.819318] nvme nvme0: nvme_rdma_post_send failed with error code -12
> [ 957.833260] mlx5_2:mlx5_ib_post_send:3846:(pid 1007):
> [ 957.833268] nvme nvme0: Queueing INV WR for rkey 0x1a1da1 failed (-12)
> [ 957.847263] nvme nvme0: Queueing INV WR for rkey 0x1a1fa1 failed (-12)
> [ 957.855006] mlx5_2:mlx5_ib_post_send:3846:(pid 1254):
> [ 957.861254] nvme nvme0: nvme_rdma_post_send failed with error code -12
> [ 957.869004] mlx5_2:mlx5_ib_post_send:3846:(pid 1254):
> [ 957.875192] nvme nvme0: Queueing INV WR for rkey 0x1a1da2 failed (-12)
> [ 987.962014] print_req_error: 244 callbacks suppressed
> [ 987.968150] print_req_error: I/O error, dev nvme0n1, sector 140819704
> [ 987.975829] buffer_io_error: 1826 callbacks suppressed
> [ 987.982058] Buffer I/O error on dev nvme0n1, logical block 17602463, lost async page write
> [ 987.991803] Buffer I/O error on dev nvme0n1, logical block 17602464, lost async page write
> [ 988.001547] Buffer I/O error on dev nvme0n1, logical block 17602465, lost async page write

I couldn't reproduce it, but for some reason you got an overflow in the
QP send queue (error code -12 is -ENOMEM). It seems something might be
wrong with the calculation (probably the signaling calculation).

Please supply more details:
1. Link layer?
2. HCA type + FW versions on target/host sides?
3. B2B connection?

Try this one as a first step:

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 82fcb07..1437306 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,6 +88,7 @@ struct nvme_rdma_queue {
 	struct nvme_rdma_qe	*rsp_ring;
 	atomic_t		sig_count;
 	int			queue_size;
+	int			limit_mask;
 	size_t			cmnd_capsule_len;
 	struct nvme_rdma_ctrl	*ctrl;
 	struct nvme_rdma_device	*device;
@@ -521,6 +522,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 
 	queue->queue_size = queue_size;
 	atomic_set(&queue->sig_count, 0);
+	queue->limit_mask = (min(32, 1 << ilog2((queue->queue_size + 1) / 2))) - 1;
 
 	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
 			RDMA_PS_TCP, IB_QPT_RC);
@@ -1009,9 +1011,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
  */
 static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
 {
-	int limit = 1 << ilog2((queue->queue_size + 1) / 2);
-
-	return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
+	return (atomic_inc_return(&queue->sig_count) & (queue->limit_mask)) == 0;
 }
 
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
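
For anyone who wants to check the numbers, here is a minimal user-space sketch of the mask arithmetic in the patch. It only mirrors the two expressions from the diff. The helper names (ilog2_u, sig_mask_uncapped, sig_mask_capped, should_signal) are made up for illustration and are not kernel symbols, and a plain int counter stands in for the atomic sig_count.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's ilog2():
 * floor(log2(v)) for v > 0. */
static int ilog2_u(unsigned int v)
{
	int r = -1;

	assert(v > 0);
	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Mask as computed before the patch: the largest power of two that is
 * <= (queue_size + 1) / 2, minus one. */
static int sig_mask_uncapped(int queue_size)
{
	return (1 << ilog2_u((queue_size + 1) / 2)) - 1;
}

/* Mask as computed by the patch: same power of two, capped at 32. */
static int sig_mask_capped(int queue_size)
{
	int limit = 1 << ilog2_u((queue_size + 1) / 2);

	return (limit < 32 ? limit : 32) - 1;
}

/* Mirrors the check in nvme_rdma_queue_sig_limit(): increment the
 * counter and report 1 when it reaches a multiple of (mask + 1).
 * Assumption: a non-atomic int replaces atomic_inc_return() here. */
static int should_signal(int *sig_count, int mask)
{
	return (++(*sig_count) & mask) == 0;
}
```

With queue_size = 128 the old expression gives a mask of 63 (one signaled send per 64), while the capped one gives 31 (one per 32). For queue_size = 32 both give 15, so the cap only changes behavior for queues larger than 64 entries.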