From patchwork Fri May 12 15:55:52 2017
X-Patchwork-Submitter: "Dupuis, Chad"
X-Patchwork-Id: 9724431
Date: Fri, 12 May 2017 11:55:52 -0400 (EDT)
From: Chad Dupuis
To: James Bottomley
cc: "Martin K. Petersen", Sebastian Andrzej Siewior, linux-scsi@vger.kernel.org, Chris Leech, Chad Dupuis, rt@linutronix.de, Lee Duncan, QLogic-Storage-Upstream@qlogic.com, Andrew Morton, Johannes Thumshirn, Christoph Hellwig
Subject: Re: [REEEEPOST] bnx2i + bnx2fc: convert to generic workqueue (#3)
In-Reply-To: <1494343100.2688.34.camel@linux.vnet.ibm.com>
References: <20170410171254.30367-1-bigeasy@linutronix.de> <20170504174427.6hebbnqwfgems6dg@linutronix.de> <1494343100.2688.34.camel@linux.vnet.ibm.com>
X-Mailing-List: linux-scsi@vger.kernel.org

On Tue, 9 May 2017, 11:18am, James Bottomley wrote:

> On Tue, 2017-05-09 at 10:17 -0400, Chad Dupuis wrote:
> > On Mon, 8 May 2017, 10:04pm, Martin K. Petersen wrote:
> >
> > >
> > > Sebastian,
> > >
> > > > Martin, do you see any chance to get this merged? Chad replied to the
> > > > list that he is going to test it on 2017-04-10, didn't respond to the
> > > > ping 10 days later. The series stalled last time in the same way.
> > >
> > > I am very reluctant to merge something when a driver has an active
> > > maintainer and that person has not acked the change.
> > >
> > > That said, Chad: You have been sitting on this for quite a while. Please
> > > make it a priority. In exchange for veto rights you do have to provide
> > > timely feedback on anything that touches your driver.
> > >
> > > Thanks!
> > >
> >
> > We did do some testing and hit a calltrace during device discovery:
> >
> > [ 1332.551799] INFO: task scsi_eh_15:1970 blocked for more than 120 seconds.
> > [ 1332.551804] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 1332.551807] scsi_eh_15 D ffff880823488c14 0 1970 2 0x00000080
> > [ 1332.551813] ffff881053a17cb0 0000000000000046 ffff88084d693ec0 ffff881053a17fd8
> > [ 1332.551817] ffff881053a17fd8 ffff881053a17fd8 ffff88084d693ec0 ffff880823488d48
> > [ 1332.551821] ffff880823488d50 7fffffffffffffff ffff88084d693ec0 ffff880823488c14
> > [ 1332.551825] Call Trace:
> > [ 1332.551838] [] schedule+0x29/0x70
> > [ 1332.551844] [] schedule_timeout+0x239/0x2d0
> > [ 1332.551850] [] ? console_unlock+0x208/0x400
> > [ 1332.551855] [] ? vprintk_emit+0x3c4/0x510
> > [ 1332.551861] [] ? lock_timer_base.isra.33+0x2b/0x50
> > [ 1332.551866] [] wait_for_completion+0x116/0x170
> > [ 1332.551874] [] ? wake_up_state+0x20/0x20
> > [ 1332.551885] [] bnx2fc_abts_cleanup+0x3d/0x62 [bnx2fc]
> > [ 1332.551892] [] bnx2fc_eh_abort+0x470/0x580 [bnx2fc]
> > [ 1332.551900] [] scsi_error_handler+0x59f/0x8b0
> > [ 1332.551904] [] ? scsi_eh_get_sense+0x250/0x250
> > [ 1332.551911] [] kthread+0xcf/0xe0
> > [ 1332.551916] [] ? kthread_create_on_node+0x140/0x140
> > [ 1332.551923] [] ret_from_fork+0x58/0x90
> > [ 1332.551928] [] ? kthread_create_on_node+0x140/0x140
>
> Reporting this when you found it would have been helpful ...
>
> That said, it does look like a genuine hang in the workqueues, so it
> rather invalidates the current patch set.
>
> > To be honest, I'm reluctant to merge these patches on bnx2fc as the
> > I/O path on this driver has been stable for quite some time and given
> > that it's an older driver I'm not looking to make changes there.
>
> OK, so find a way to achieve both sets of goals because there's a limit
> to how long we allow "stable" drivers to hold up infrastructure changes
> within the kernel. The main goal of the current patch set is to remove
> the cpu hotplug calls from the drivers because they want to remove them
> from the kernel.
> This is rather complex because you're using per cpu
> work queues so you currently have to manage starting and stopping them
> as the CPUs come up or go down ... getting rid of that for standard
> kernel infrastructure will make the driver easier to keep in
> maintenance mode for longer.
>
> James
>

Ok, I believe I've found the issue here. The machine that the test was
performed on had many more possible CPUs than active CPUs. We calculate
which CPU to queue the work item on in bnx2fc_process_new_cqes() like this:

	unsigned int cpu = wqe % num_possible_cpus();

Since not all CPUs are active, we were trying to schedule work on
non-active CPUs, which meant that the upper layers were never notified of
the completion. With the change below, the issue is fixed.

Sebastian, can you add this change to your patch set?

diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index c2288d6..6f08e43 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -1042,7 +1042,12 @@ static int bnx2fc_process_new_cqes(struct bnx2fc_rport *tgt)
 			/* Pending work request completion */
 			struct bnx2fc_work *work = NULL;
 			struct bnx2fc_percpu_s *fps = NULL;
-			unsigned int cpu = wqe % num_possible_cpus();
+			unsigned int cpu = wqe % num_active_cpus();
+
+			/* Sanity check cpu to make sure it's online */
+			if (!cpu_active(cpu))
+				/* Default to CPU 0 */
+				cpu = 0;
 
 			work = bnx2fc_alloc_work(tgt, wqe);
 			if (work) {
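
For illustration only, here is a minimal userspace sketch in C of the
CPU-selection problem described above. It is not driver code: struct
cpu_state, count_active(), pick_cpu_old() and pick_cpu_new() are
hypothetical names standing in for the kernel's CPU masks,
num_active_cpus(), and the two variants of the selection logic. Like the
patch, it assumes that CPU 0 is always active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's possible/active CPU masks. */
struct cpu_state {
	const bool *active;		/* indexed by CPU id; true = active */
	unsigned int n_possible;	/* number of possible CPUs */
};

/* Userspace analogue of num_active_cpus(): count the active CPUs. */
static unsigned int count_active(const struct cpu_state *s)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < s->n_possible; i++)
		if (s->active[i])
			n++;
	return n;
}

/* Old selection: can return a CPU that is possible but not active. */
static unsigned int pick_cpu_old(const struct cpu_state *s, unsigned int wqe)
{
	return wqe % s->n_possible;
}

/*
 * Selection modelled on the proposed change: take the modulo over the
 * active count, then fall back to CPU 0 if that id is not active
 * (possible when the active CPU ids are not contiguous).
 */
static unsigned int pick_cpu_new(const struct cpu_state *s, unsigned int wqe)
{
	unsigned int n_active = count_active(s);

	assert(n_active > 0);	/* assumption: at least one CPU is active */

	unsigned int cpu = wqe % n_active;

	if (!s->active[cpu])
		cpu = 0;	/* assumption: CPU 0 is active */
	return cpu;
}
```

With 8 possible CPUs of which only 0, 1 and 3 are active, pick_cpu_old()
maps wqe 13 to CPU 5, which is not active, while pick_cpu_new() maps it
to CPU 1. For wqe 14 the modulo yields CPU 2, which is inactive, so the
fallback selects CPU 0.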