From patchwork Tue Aug 2 16:19:31 2016
X-Patchwork-Submitter: Gabriel Krisman Bertazi
X-Patchwork-Id: 9258633
From: Gabriel Krisman Bertazi
To: Jens Axboe
Cc: Gabriel Krisman Bertazi, Brian King, linux-block@vger.kernel.org,
 linux-scsi@vger.kernel.org
Subject: [RFC PATCH] blk-mq: Prevent round-robin from scheduling dead cpus
Date: Tue, 2 Aug 2016 13:19:31 -0300
X-Mailer: git-send-email 2.7.4
Message-Id: <1470154771-22774-1-git-send-email-krisman@linux.vnet.ibm.com>
X-Mailing-List: linux-scsi@vger.kernel.org

Hi,

I'm not sure I got the cause of this one completely right.  Still, it
does look like the correct fix and a good improvement overall, so I'm
making it an RFC for now to gather some feedback.

Let me hear your thoughts.

-- >8 --

When notifying blk-mq about CPU removals while running IO, we risk
racing the hctx->cpumask update with blk_mq_hctx_next_cpu, and end up
scheduling a dead cpu to execute hctx->run_{,delayed_}work.  As a
result, kblockd_schedule_delayed_work_on() may schedule another cpu
outside of hctx->cpumask, which triggers the following warning at
__blk_mq_run_hw_queue:

WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));

This patch makes the issue much less likely to happen, as it makes
blk_mq_hctx_next_cpu aware of dead cpus and triggers the round-robin
code regardless of the remaining batch processing time.  Thus, in case
we offline a cpu in the middle of its batch processing time, we no
longer waste time scheduling it here, and just move on to the next cpu
in the mask.

The warning may still be triggered, though, since this is not the only
case that may cause the queue to schedule on a dead cpu.  But this
fixes the common case, which is the remaining batch processing time of
a cpu that has suddenly gone offline.

Signed-off-by: Gabriel Krisman Bertazi
Cc: Brian King
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c27bb37..a2cb64c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -858,7 +858,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	if (hctx->queue->nr_hw_queues == 1)
 		return WORK_CPU_UNBOUND;
 
-	if (--hctx->next_cpu_batch <= 0) {
+	if (--hctx->next_cpu_batch <= 0 ||
+	    !cpumask_test_cpu(hctx->next_cpu, cpu_online_mask)) {
 		int cpu = hctx->next_cpu, next_cpu;
 
 		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
@@ -868,7 +869,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
 
-		return cpu;
+		return (cpumask_test_cpu(cpu, cpu_online_mask)) ?
+			cpu : blk_mq_hctx_next_cpu(hctx);
 	}
 
 	return hctx->next_cpu;
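
For anyone who wants to poke at the selection logic outside the kernel,
here is a rough userspace sketch of the round-robin with the online
check added.  It is not kernel code: struct toy_hctx, mask_test(),
mask_next_wrap(), toy_next_cpu(), NR_CPUS and CPU_WORK_BATCH are all
hypothetical stand-ins, the cpumasks are plain bitmasks, and the retry
is written as a bounded loop instead of the recursion used in the
patch, so that the sketch returns -1 when no mapped cpu is online.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS        8	/* hypothetical machine size */
#define CPU_WORK_BATCH 8	/* stands in for BLK_MQ_CPU_WORK_BATCH */

/* Hypothetical userspace stand-in for struct blk_mq_hw_ctx. */
struct toy_hctx {
	unsigned int cpumask;	/* bit n set: cpu n is mapped to this queue */
	int next_cpu;		/* cpu the next work item is scheduled on */
	int next_cpu_batch;	/* remaining work items before rotating */
};

static bool mask_test(unsigned int mask, int cpu)
{
	return (mask >> cpu) & 1u;
}

/* Next set bit after 'cpu', wrapping around; -1 if the mask is empty. */
static int mask_next_wrap(unsigned int mask, int cpu)
{
	for (int i = 1; i <= NR_CPUS; i++) {
		int c = (cpu + i) % NR_CPUS;

		if (mask_test(mask, c))
			return c;
	}
	return -1;
}

/*
 * Pick the cpu for the next work item.  The rotation is triggered when
 * the batch is used up or when next_cpu is offline, and an offline cpu
 * is never returned.  Returns -1 if no mapped cpu is online.
 */
static int toy_next_cpu(struct toy_hctx *h, unsigned int online)
{
	assert(h->cpumask != 0);

	for (int tries = 0; tries <= NR_CPUS; tries++) {
		if (--h->next_cpu_batch <= 0 ||
		    !mask_test(online, h->next_cpu)) {
			int cpu = h->next_cpu;

			h->next_cpu = mask_next_wrap(h->cpumask, cpu);
			h->next_cpu_batch = CPU_WORK_BATCH;

			if (mask_test(online, cpu))
				return cpu;
			continue;	/* cpu is dead, try the next one */
		}
		return h->next_cpu;
	}
	return -1;
}
```

With cpus 0-3 mapped, next_cpu = 1 and batch time still left, taking
cpu 1 offline makes toy_next_cpu() skip it and return cpu 2, instead of
handing out cpu 1 until the batch runs out.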