From patchwork Tue May 2 23:28:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Gurtovoy
X-Patchwork-Id: 9708651
Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array
To: Laurence Oberman
CC: Leon Romanovsky, Bart Van Assche, Doug Ledford, Sagi Grimberg, Israel Rukshin
From: Max Gurtovoy
Message-ID: <9112e5a4-e7c0-7098-2aca-691661e427d6@mellanox.com>
In-Reply-To: <1477402175.2378198.1493219418826.JavaMail.zimbra@redhat.com>
Date: Wed, 3 May 2017 02:28:34 +0300
X-Mailing-List: linux-rdma@vger.kernel.org

On 4/26/2017 6:10 PM, Laurence Oberman
wrote: > > > ----- Original Message ----- >> From: "Laurence Oberman" >> To: "Max Gurtovoy" >> Cc: "Leon Romanovsky" , "Bart Van Assche" , "Doug Ledford" >> , "Sagi Grimberg" , "Israel Rukshin" , >> linux-rdma@vger.kernel.org >> Sent: Wednesday, April 26, 2017 9:50:25 AM >> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array >> >> >> >> ----- Original Message ----- >>> From: "Laurence Oberman" >>> To: "Max Gurtovoy" >>> Cc: "Leon Romanovsky" , "Bart Van Assche" >>> , "Doug Ledford" >>> , "Sagi Grimberg" , "Israel Rukshin" >>> , >>> linux-rdma@vger.kernel.org >>> Sent: Wednesday, April 26, 2017 9:28:37 AM >>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() >>> overflows the klms[] array >>> >>> >>> >>> ----- Original Message ----- >>>> From: "Max Gurtovoy" >>>> To: "Laurence Oberman" >>>> Cc: "Leon Romanovsky" , "Bart Van Assche" >>>> , "Doug Ledford" >>>> , "Sagi Grimberg" , "Israel >>>> Rukshin" >>>> , >>>> linux-rdma@vger.kernel.org >>>> Sent: Wednesday, April 26, 2017 8:25:30 AM >>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() >>>> overflows the klms[] array >>>> >>>> >>>> >>>> On 4/26/2017 3:18 PM, Laurence Oberman wrote: >>>>> >>>>> >>>>> ----- Original Message ----- >>>>>> From: "Laurence Oberman" >>>>>> To: "Max Gurtovoy" >>>>>> Cc: "Leon Romanovsky" , "Bart Van Assche" >>>>>> , "Doug Ledford" >>>>>> , "Sagi Grimberg" , "Israel >>>>>> Rukshin" , >>>>>> linux-rdma@vger.kernel.org >>>>>> Sent: Wednesday, April 26, 2017 7:47:37 AM >>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() >>>>>> overflows the klms[] array >>>>>> >>>>>> >>>>>> >>>>>> ----- Original Message ----- >>>>>>> From: "Max Gurtovoy" >>>>>>> To: "Laurence Oberman" , "Leon Romanovsky" >>>>>>> >>>>>>> Cc: "Bart Van Assche" , "Doug Ledford" >>>>>>> , "Sagi Grimberg" >>>>>>> , "Israel Rukshin" , >>>>>>> linux-rdma@vger.kernel.org >>>>>>> Sent: Wednesday, April 26, 2017 4:31:57 AM >>>>>>> 
Subject: Re: [PATCH, untested] mlx5: Avoid that mlx5_ib_sg_to_klms() >>>>>>> overflows the klms[] array >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 4/25/2017 11:37 PM, Laurence Oberman wrote: >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>>> From: "Leon Romanovsky" >>>>>>>>> To: "Bart Van Assche" >>>>>>>>> Cc: "Doug Ledford" , "Max Gurtovoy" >>>>>>>>> , "Sagi Grimberg" , >>>>>>>>> "Israel Rukshin" , "Laurence Oberman" >>>>>>>>> , linux-rdma@vger.kernel.org >>>>>>>>> Sent: Tuesday, April 25, 2017 1:58:49 PM >>>>>>>>> Subject: Re: [PATCH, untested] mlx5: Avoid that >>>>>>>>> mlx5_ib_sg_to_klms() >>>>>>>>> overflows the klms[] array >>>>>>>>> >>>>>>>>> On Mon, Apr 24, 2017 at 03:15:28PM -0700, Bart Van Assche wrote: >>>>>>>>>> ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger >>>>>>>>>> than what fits into a single MR. .map_mr_sg() must not attempt to >>>>>>>>>> map more SG-list elements than what fits into a single MR. >>>>>>>>>> Hence make sure that mlx5_ib_sg_to_klms() does not write outside >>>>>>>>>> the MR klms[] array. >>>>>>>>>> >>>>>>>>>> Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support") >>>>>>>>>> Signed-off-by: Bart Van Assche >>>>>>>>>> Reviewed-by: Max Gurtovoy >>>>>>>>>> Cc: Sagi Grimberg >>>>>>>>>> Cc: Leon Romanovsky >>>>>>>>>> Cc: Israel Rukshin >>>>>>>>>> Cc: >>>>>>>>>> --- >>>>>>>>>> drivers/infiniband/hw/mlx5/mr.c | 2 +- >>>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>> >>>>>>>>> Bart, >>>>>>>>> >>>>>>>>> Thanks a lot, it indeed looks right. >>>>>>>>> Acked-by: Leon Romanovsky >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hello Bart, Leon, Max and Israel. >>>>>>>> >>>>>>>> I cloned off Barts tree. >>>>>>>> >>>>>>>> git clone https://github.com/bvanassche/linux >>>>>>>> cd linux >>>>>>>> git checkout block-scsi-for-next >>>>>>>> >>>>>>>> I checked all patches were in for this test. 
>>>>>>>> >>>>>>>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS >>>>>>>> dfa5a2b mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] >>>>>>>> array >>>>>>>> f759c80 mlx5: Fix mlx5_ib_map_mr_sg mr lengt >>>>>>> >>>>>>> Hi, >>>>>>> copying Sagi's request from different thread: >>>>>>> >>>>>>> " >>>>>>> Can you please enable srp_add_one debug: >>>>>>> >>>>>>> echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control >>>>>>> >>>>>>> In addition apply the following: >>>>>>> -- >>>>>>> diff --git a/drivers/infiniband/hw/mlx5/mr.c >>>>>>> b/drivers/infiniband/hw/mlx5/mr.c >>>>>>> index d9c6c0ea750b..040fbc387e4f 100644 >>>>>>> --- a/drivers/infiniband/hw/mlx5/mr.c >>>>>>> +++ b/drivers/infiniband/hw/mlx5/mr.c >>>>>>> @@ -1403,6 +1403,8 @@ mlx5_alloc_priv_descs(struct ib_device *device, >>>>>>> int add_size; >>>>>>> int ret; >>>>>>> >>>>>>> + WARN_ON_ONCE(ndescs > >>>>>>> device->attr.max_fast_reg_page_list_len); >>>>>>> + >>>>>>> add_size = max_t(int, MLX5_UMR_ALIGN - >>>>>>> ARCH_KMALLOC_MINALIGN, >>>>>>> 0); >>>>>>> >>>>>>> mr->descs_alloc = kzalloc(size + add_size, GFP_KERNEL); >>>>>>> >>>>>>> " >>>>>>> >>>>>>> Max. >>>>>>> >>>>>>>> >>>>>>>> Built and tested the kernel. 
>>>>>>>> >>>>>>>> However this issue is not resolved :( >>>>>>>> >>>>>>>> [ 2707.931909] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817edca86b0 >>>>>>>> [ 2708.089806] mlx5_0:dump_cqe:262:(pid 20129): dump error cqe >>>>>>>> [ 2708.121342] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2708.147104] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2708.172633] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2708.198702] 00000000 0f007806 2500002a 14a527d0 >>>>>>>> [ 2732.434127] scsi host1: ib_srp: reconnect succeeded >>>>>>>> [ 2733.048023] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817ed0a9c30 >>>>>>>> >>>>>>>> [root@localhost ~]# [ 2746.413277] mlx5_0:dump_cqe:262:(pid 15877): >>>>>>>> dump >>>>>>>> error cqe >>>>>>>> [ 2746.443240] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2746.469323] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2746.495310] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2746.521407] 00000000 0f007806 25000032 003c7ad0 >>>>>>>> [ 2752.445899] scsi host1: ib_srp: reconnect succeeded >>>>>>>> [ 2752.481835] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817ed0a9cf0 >>>>>>>> [ 2763.267386] mlx5_0:dump_cqe:262:(pid 15877): dump error cqe >>>>>>>> [ 2763.297826] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2763.323352] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2763.348722] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2763.374681] 00000000 0f007806 2500003a 00084bd0 >>>>>>>> >>>>>>>> [root@localhost ~]# [ 2769.385203] fast_io_fail_tmo expired for SRP >>>>>>>> port-1:1 / host1. 
>>>>>>>> [ 2769.415956] scsi host1: ib_srp: reconnect succeeded >>>>>>>> [ 2769.450258] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817ed0a9cf0 >>>>>>>> [ 2780.064627] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe >>>>>>>> [ 2780.093520] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2780.120067] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2780.145575] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2780.171153] 00000000 0f007806 25000042 000833d0 >>>>>>>> [ 2785.923399] scsi host1: ib_srp: reconnect succeeded >>>>>>>> [ 2785.957504] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817ed0a9cf0 >>>>>>>> [ 2796.463426] mlx5_0:dump_cqe:262:(pid 18771): dump error cqe >>>>>>>> [ 2796.495257] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2796.521506] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2796.547640] 00000000 00000000 00000000 00000000 >>>>>>>> [ 2796.573120] 00000000 0f007806 2500004a 00083bd0 >>>>>>>> [ 2802.562578] scsi host1: ib_srp: reconnect succeeded >>>>>>>> [ 2802.596880] scsi host1: ib_srp: failed RECV status WR flushed (5) >>>>>>>> for >>>>>>>> CQE ffff8817ed0a9cf0 >>>>>>>> >>>>>>>> Regards >>>>>>>> Laurence >>>>>>>> >>>>>>> >>>>>> Doing this now >>>>>> Thanks >>>>>> Laurence >>>>> >>>>> Max >>>>> >>>>> The Patch is not correct. >>>>> >>>>> drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_alloc_priv_descs': >>>>> drivers/infiniband/hw/mlx5/mr.c:1406:30: error: 'struct ib_device' has >>>>> no >>>>> member named 'attr' >>>>> WARN_ON_ONCE(ndescs > device->attr.max_fast_reg_page_list_len); >>>>> ^ >>>>> ./include/asm-generic/bug.h:117:27: note: in definition of macro >>>>> 'WARN_ON_ONCE' >>>>> int __ret_warn_once = !!(condition); \ >>>>> >>>>> I think you meant to give me >>>>> >>>>> WARN_ON_ONCE(ndescs > ib_device_attr->attr.max_fast_reg_page_list_len); >>>>> >>>>> Can you confirm >>>> >>>> Hi Laurence, >>>> should be device->attrs.max_fast_reg_page_list_len. 
>>>> >>>> please check this one that might solve the issue (on top of everything): >>>> >>>> >>>> diff --git a/drivers/infiniband/hw/mlx5/mr.c >>>> b/drivers/infiniband/hw/mlx5/mr.c >>>> index b8f9382..063d116 100644 >>>> --- a/drivers/infiniband/hw/mlx5/mr.c >>>> +++ b/drivers/infiniband/hw/mlx5/mr.c >>>> @@ -1559,7 +1559,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, >>>> mr->max_descs = ndescs; >>>> } else if (mr_type == IB_MR_TYPE_SG_GAPS) { >>>> mr->access_mode = MLX5_MKC_ACCESS_MODE_KLMS; >>>> - >>>> + MLX5_SET(mkc, mkc, translations_octword_size, >>>> ALIGN(max_num_sg + 1, 4)); >>>> err = mlx5_alloc_priv_descs(pd->device, mr, >>>> ndescs, sizeof(struct >>>> mlx5_klm)); >>>> if (err) >>>> >>>> thanks, >>>> Max. >>>> >>>>> >>>>> Thanks >>>>> Laurence >>>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> >>> Hello Max >>> >>> I have the corrected WARN_ON_ONCE patch and the above patch as well as the >>> rest as it was from Barts tree. >>> >>> Still fails. >>> >>> For a baseline I can revert >>> a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS >>> >>> Then test again to make sure we are starting from a good place. 
>>> >>> Initiator log >>> >>> [ 280.481951] scsi host1: ib_srp: failed FAST REG status memory management >>> operation error (6) for CQE ffff8817d9a881b8 >>> [ 301.149106] scsi host1: ib_srp: reconnect succeeded >>> [ 301.280635] scsi host1: ib_srp: failed RECV status WR flushed (5) for >>> CQE >>> ffff8817ed32f2f0 >>> [ 334.596420] scsi host2: ib_srp: failed RECV status WR flushed (5) for >>> CQE >>> ffff8817c592c970 >>> [ 334.599689] mlx5_1:dump_cqe:262:(pid 20): dump error cqe >>> [ 334.599691] 00000000 00000000 00000000 00000000 >>> [ 334.599692] 00000000 00000000 00000000 00000000 >>> [ 334.599692] 00000000 00000000 00000000 00000000 >>> [ 334.599693] 00000000 0f007806 2500002d 067b48d0 >>> [ 334.599697] scsi host2: ib_srp: failed FAST REG status memory management >>> operation error (6) for CQE ffff8817c6e30078 >>> [ 336.117248] mlx5_0:dump_cqe:262:(pid 130): dump error cqe >>> [ 336.145840] 00000000 00000000 00000000 00000000 >>> [ 336.171830] 00000000 00000000 00000000 00000000 >>> [ 336.197688] 00000000 00000000 00000000 00000000 >>> [ 336.223720] 00000000 0f007806 25000032 005408d0 >>> [ 339.712706] fast_io_fail_tmo expired for SRP port-1:1 / host1. 
>>> [ 341.453634] scsi host1: ib_srp: reconnect succeeded >>> [ 341.481600] mlx5_0:dump_cqe:262:(pid 130): dump error cqe >>> [ 341.482145] scsi host1: ib_srp: failed RECV status WR flushed (5) for >>> CQE >>> ffff8817ecaf6970 >>> [ 341.559359] 00000000 00000000 00000000 00000000 >>> [ 341.585397] 00000000 00000000 00000000 00000000 >>> [ 341.610948] 00000000 00000000 00000000 00000000 >>> [ 341.637515] 00000000 0f007806 2500003d 000046d0 >>> [ 342.297598] sd 1:0:0:9: rejecting I/O to offline device >>> [ 342.297936] sd 1:0:0:9: [sdg] tag#28 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.297941] sd 1:0:0:9: [sdg] tag#28 CDB: Write(10) 2a 00 00 00 40 00 00 >>> 40 00 00 >>> [ 342.297943] blk_update_request: recoverable transport error, dev sdg, >>> sector 16384 >>> [ 342.297951] sd 1:0:0:20: [sdar] tag#5 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.297952] sd 1:0:0:20: [sdar] tag#15 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.297956] sd 1:0:0:20: [sdar] tag#5 CDB: Write(10) 2a 00 00 03 c0 00 >>> 00 >>> 40 00 00 >>> [ 342.297956] sd 1:0:0:20: [sdar] tag#15 CDB: Write(10) 2a 00 00 2c c0 00 >>> 00 >>> 40 00 00 >>> [ 342.297958] blk_update_request: recoverable transport error, dev sdar, >>> sector 245760 >>> [ 342.297959] blk_update_request: recoverable transport error, dev sdar, >>> sector 2932736 >>> [ 342.298119] device-mapper: multipath: Failing path 8:96. >>> [ 342.298266] sd 1:0:0:9: [sdg] tag#29 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298268] sd 1:0:0:9: [sdg] tag#29 CDB: Write(10) 2a 00 00 00 c0 00 00 >>> 40 00 00 >>> [ 342.298269] blk_update_request: recoverable transport error, dev sdg, >>> sector 49152 >>> [ 342.298300] device-mapper: multipath: Failing path 66:176. 
>>> [ 342.298486] sd 1:0:0:20: [sdar] tag#16 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298488] sd 1:0:0:20: [sdar] tag#6 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298489] sd 1:0:0:20: [sdar] tag#16 CDB: Write(10) 2a 00 00 2d 40 00 >>> 00 >>> 40 00 00 >>> [ 342.298490] sd 1:0:0:20: [sdar] tag#6 CDB: Write(10) 2a 00 00 04 40 00 >>> 00 >>> 40 00 00 >>> [ 342.298491] blk_update_request: recoverable transport error, dev sdar, >>> sector 2965504 >>> [ 342.298492] blk_update_request: recoverable transport error, dev sdar, >>> sector 278528 >>> [ 342.298582] sd 1:0:0:9: [sdg] tag#30 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298584] sd 1:0:0:9: [sdg] tag#30 CDB: Write(10) 2a 00 00 01 40 00 00 >>> 40 00 00 >>> [ 342.298585] blk_update_request: recoverable transport error, dev sdg, >>> sector 81920 >>> [ 342.298889] sd 1:0:0:9: [sdg] tag#31 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298890] sd 1:0:0:9: [sdg] tag#31 CDB: Write(10) 2a 00 00 01 c0 00 00 >>> 40 00 00 >>> [ 342.298891] blk_update_request: recoverable transport error, dev sdg, >>> sector 114688 >>> [ 342.298981] sd 1:0:0:20: [sdar] tag#7 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.298983] sd 1:0:0:20: [sdar] tag#7 CDB: Write(10) 2a 00 00 04 c0 00 >>> 00 >>> 40 00 00 >>> [ 342.298985] blk_update_request: recoverable transport error, dev sdar, >>> sector 311296 >>> [ 342.299004] sd 1:0:0:20: [sdar] tag#17 FAILED Result: >>> hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK >>> [ 342.299007] sd 1:0:0:20: [sdar] tag#17 CDB: Write(10) 2a 00 00 34 c0 00 >>> 00 >>> 40 00 00 >>> [ 342.299009] blk_update_request: recoverable transport error, dev sdar, >>> sector 3457024 >>> [ 342.356353] device-mapper: multipath: Failing path 8:64. >>> [ 342.356489] device-mapper: multipath: Failing path 8:128. 
>>> [ 342.356628] device-mapper: multipath: Failing path 8:160. >>> [ 342.356699] device-mapper: multipath: Failing path 8:176. >>> [ 342.356767] device-mapper: multipath: Failing path 8:240. >>> [ 342.356834] device-mapper: multipath: Failing path 8:208. >>> [ 342.356900] device-mapper: multipath: Failing path 65:16. >>> [ 342.356967] device-mapper: multipath: Failing path 65:64. >>> [ 342.357035] device-mapper: multipath: Failing path 65:96. >>> [ 342.357103] device-mapper: multipath: Failing path 65:128. >>> [ 342.357169] device-mapper: multipath: Failing path 65:176. >>> [ 342.357237] device-mapper: multipath: Failing path 65:208. >>> [ 342.357303] device-mapper: multipath: Failing path 65:224. >>> [ 342.357371] device-mapper: multipath: Failing path 66:0. >>> [ 342.357454] device-mapper: multipath: Failing path 66:32. >>> [ 342.357521] device-mapper: multipath: Failing path 66:48. >>> [ 342.357647] device-mapper: multipath: Failing path 66:80. >>> [ 342.357714] device-mapper: multipath: Failing path 66:112. >>> [ 342.357781] device-mapper: multipath: Failing path 66:144. >>> [ 342.357936] device-mapper: multipath: Failing path 66:208. >>> [ 342.358019] device-mapper: multipath: Failing path 66:240. >>> [ 342.358115] device-mapper: multipath: Failing path 67:16. >>> [ 342.358183] device-mapper: multipath: Failing path 67:48. >>> [ 342.358264] device-mapper: multipath: Failing path 67:80. >>> [ 342.358359] device-mapper: multipath: Failing path 67:128. >>> [ 342.358442] device-mapper: multipath: Failing path 67:160. >>> [ 342.358594] device-mapper: multipath: Failing path 67:224. >>> [ 342.358671] device-mapper: multipath: Failing path 67:208. 
>>> [ 350.157728] scsi host2: ib_srp: reconnect succeeded
>>> [ 350.189605] mlx5_1:dump_cqe:262:(pid 4756): dump error cqe
>>> [ 350.193180] mlx5_1:dump_cqe:262:(pid 1275): dump error cqe
>>> [ 350.193182] 00000000 00000000 00000000 00000000
>>> [ 350.193182] 00000000 00000000 00000000 00000000
>>> [ 350.193183] 00000000 00000000 00000000 00000000
>>> [ 350.193183] 00000000 0f007806 25000035 04f569d0
>>> [ 350.193187] scsi host2: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff8817c6e30078
>>> [ 350.412637] 00000000 00000000 00000000 00000000
>>> [ 350.436431] 00000000 00000000 00000000 00000000
>>> [ 350.461871] 00000000 00000000 00000000 00000000
>>> [ 350.487549] 00000000 0f007806 25000032 000843d0
>>>
>>> Target Log
>>>
>>> These events happened after the first failures on the initiator
>>>
>>> [ 1111.029847] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-49.
>>> [ 1111.078815] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-48.
>>> [ 1111.127420] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-47.
>>> [ 1111.175801] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-46.
>>> [ 1111.223725] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-45.
>>> [ 1111.271957] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-44.
>>> [ 1111.319494] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-43.
>>> [ 1111.365795] ib_srpt Received CM TimeWait exit for ch 0x4f6e72000390fe7c7cfe900300726ed3-42.
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> Max
>>
>> These are the parameters all my tests run with.
>> Same as always.
>>
>> [root@localhost modprobe.d]# cat ib_srp.conf
>> options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048
>>
>> I don't set prefer_fr so it defaults to Y
>>
>> [root@localhost parameters]# cat prefer_fr
>> Y
>>
>> I have no settings for mlx5_core, all defaults.
>>
>> Thanks
>> Laurence
>>
>
> Max,
>
> Reverting a83e404 IB/srp: Reenable IB_MR_TYPE_SG_GAPS on the same source tree with all else applied, I am stable.
> So clearly we still have issues with IB_MR_TYPE_SG_GAPS.
>
> Thanks
> Laurence
>

Hi Laurence,

I would like to see the prints that Sagi asked for in the srp_add_one function
(echo "func srp_add_one +p" > /sys/kernel/debug/dynamic_debug/control)
and also the prints from srp_create_target
(echo "func srp_create_target +p" > /sys/kernel/debug/dynamic_debug/control).

Another patch that can help is the pr_info() one below (it goes in
srp_create_target(), just before these lines):

	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
	INIT_WORK(&target->remove_work, srp_remove_work);
	spin_lock_init(&target->lock);

Please also add the SG_GAPS Reenable commit and let's repro it again.

BTW, how many channels are open? Can you load the ib_srp module with the
ch_count param changed from 4 to #num_cpus, and let's see when we get to
repro it again.

thanks,
Max.
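
For reference, the two requests above (enable the debug prints, then reload with a larger ch_count) can be wrapped in a small script. This is only a sketch: the helper names enable_func_debug and srp_options_line and the DDEBUG_CTL override are made up for illustration and appear nowhere in the thread, and the cmd_sg_entries/indirect_sg_entries values are simply copied from the ib_srp.conf quoted above.

```shell
#!/bin/sh
# Sketch only: DDEBUG_CTL and both helper names are illustrative assumptions.
# DDEBUG_CTL can be pointed at a scratch file for a dry run.
DDEBUG_CTL="${DDEBUG_CTL:-/sys/kernel/debug/dynamic_debug/control}"

# Enable pr_debug() output for every call site inside one kernel function.
enable_func_debug() {
    echo "func $1 +p" > "$DDEBUG_CTL"
}

# Print a modprobe.d line that keeps the settings reported in this thread
# and adds ch_count. $1 is the channel count; it defaults to the number of
# online CPUs as reported by nproc.
srp_options_line() {
    count="${1:-$(nproc)}"
    echo "options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048 ch_count=$count"
}

# Typical use (as root, with debugfs mounted and ib_srp loaded):
#   enable_func_debug srp_add_one
#   enable_func_debug srp_create_target
#   srp_options_line > /etc/modprobe.d/ib_srp.conf   # then reload ib_srp
```

Note that queries written to the control file only affect modules that are loaded at that moment, so after reloading ib_srp the two enable_func_debug calls need to be repeated. The pr_info() added by the patch prints regardless of the dynamic debug settings.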
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cee4626..53a67fd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3387,6 +3387,10 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	pr_info("sg_tablesize %u mr_pool_size %u mr_per_cmd %u indirect_size %u max_iu_len %u max_sectors %u\n",
+		target->sg_tablesize, target->mr_pool_size, target->mr_per_cmd, target->indirect_size,
+		target->max_iu_len, target->scsi_host->max_sectors);
+