From patchwork Tue Nov 22 17:54:28 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gabriel Krisman Bertazi X-Patchwork-Id: 9441807 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 89101605EE for ; Tue, 22 Nov 2016 17:58:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7FF28285BD for ; Tue, 22 Nov 2016 17:58:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 74D7A285BF; Tue, 22 Nov 2016 17:58:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 01739285BD for ; Tue, 22 Nov 2016 17:58:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932553AbcKVR5w (ORCPT ); Tue, 22 Nov 2016 12:57:52 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:41937 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755658AbcKVR5u (ORCPT ); Tue, 22 Nov 2016 12:57:50 -0500 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uAMHu9kS003487 for ; Tue, 22 Nov 2016 12:56:20 -0500 Received: from e24smtp02.br.ibm.com (e24smtp02.br.ibm.com [32.104.18.86]) by mx0a-001b2d01.pphosted.com with ESMTP id 26vn31m9hs-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 22 Nov 2016 12:56:18 -0500 Received: from localhost by e24smtp02.br.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 22 Nov 2016 15:54:48 -0200 Received: from d24dlp01.br.ibm.com (9.18.248.204) by e24smtp02.br.ibm.com (10.172.0.142) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 22 Nov 2016 15:54:45 -0200 Received: from d24relay01.br.ibm.com (d24relay01.br.ibm.com [9.8.31.16]) by d24dlp01.br.ibm.com (Postfix) with ESMTP id EED593520068; Tue, 22 Nov 2016 12:54:15 -0500 (EST) Received: from d24av02.br.ibm.com (d24av02.br.ibm.com [9.8.31.93]) by d24relay01.br.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id uAMHsjT85226590; Tue, 22 Nov 2016 15:54:45 -0200 Received: from d24av02.br.ibm.com (localhost [127.0.0.1]) by d24av02.br.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id uAMHsiJ2005908; Tue, 22 Nov 2016 15:54:44 -0200 Received: from localhost ([9.85.148.25]) by d24av02.br.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id uAMHshlD005895; Tue, 22 Nov 2016 15:54:44 -0200 From: Gabriel Krisman Bertazi To: Jens Axboe Cc: Gabriel Krisman Bertazi , Brian King , Douglas Miller , linux-block@vger.kernel.org, linux-scsi@vger.kernel.org Subject: [PATCH v2 1/2] blk-mq: Fix failed allocation path when mapping queues Date: Tue, 22 Nov 2016 15:54:28 -0200 X-Mailer: git-send-email 2.7.4 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16112217-0020-0000-0000-000002675382 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16112217-0021-0000-0000-0000307D54C6 Message-Id: <1479837269-5742-1-git-send-email-krisman@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2016-11-22_10:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=2 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1611220313 Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In blk_mq_map_swqueue, there is a memory optimization that frees the tags of a queue that has gone unmapped. Later, if that hctx is remapped after another topology change, the tags need to be reallocated. If this allocation fails, a simple WARN_ON triggers, but the block layer ends up with an active hctx without any corresponding set of tags. Then, any income IO to that hctx can trigger an Oops. I can reproduce it consistently by running IO, flipping CPUs on and off and eventually injecting a memory allocation failure in that path. In the fix below, if the system experiences a failed allocation of any hctx's tags, we remap all the ctxs of that queue to the hctx_0, which should always keep it's tags. There is a minor performance hit, since our mapping just got worse after the error path, but this is the simplest solution to handle this error path. The performance hit will disappear after another successful remap. I considered dropping the memory optimization all together, but it seemed a bad trade-off to handle this very specific error case. My one concern about this patch is if remapping an arbitrary queue to hctx_0 could result in outstanding requests getting submitted to the wrong hctx. I couldn't observe this happening during tests, but I'm not entirely sure it'll never happen. I believe the queue will be empty if we are trying to allocate tags for it, unless it was using another alive hctx queue and for some reason got reassigned to this new hctx. is this possible at all? This should apply cleanly on top of Jen's for-next branch. The Oops is the one below: SP (3fff935ce4d0) is in userspace 1:mon> e cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110] pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180 lr: c000000000575328: __bt_get+0x48/0xd0 sp: c000000fe99eb390 msr: 900000010280b033 dar: 28 dsisr: 40000000 current = 0xc000000fe9966800 paca = 0xc000000007e80300 softe: 0 irq_happened: 0x01 pid = 11035, comm = aio-stress Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016 1:mon> s [c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0 [c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0 [c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100 [c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220 [c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0 [c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540 [c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230 [c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200 [c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0 [c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0 [c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0 [c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180 [c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0 [c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540 [c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130 [c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430 [c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450 [c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730 [c000000fe99ebe30] c000000000009404 system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi Cc: Brian King Cc: Douglas Miller Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org --- block/blk-mq.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index a6af66ae01e5..8cc58f23b2c2 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1870,7 +1870,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q, static void blk_mq_map_swqueue(struct request_queue *q, const struct cpumask *online_mask) { - unsigned int i; + unsigned int i, hctx_idx; struct blk_mq_hw_ctx *hctx; struct blk_mq_ctx *ctx; struct blk_mq_tag_set *set = q->tag_set; @@ -1893,6 +1893,15 @@ static void blk_mq_map_swqueue(struct request_queue *q, if (!cpumask_test_cpu(i, online_mask)) continue; + hctx_idx = q->mq_map[i]; + /* unmapped hw queue can be remapped after CPU topo changed */ + if (!set->tags[hctx_idx]) { + set->tags[hctx_idx] = blk_mq_init_rq_map(set, hctx_idx); + + if (!set->tags[hctx_idx]) + q->mq_map[i] = 0; + } + ctx = per_cpu_ptr(q->queue_ctx, i); hctx = blk_mq_map_queue(q, i); @@ -1909,7 +1918,10 @@ static void blk_mq_map_swqueue(struct request_queue *q, * disable it and free the request entries. */ if (!hctx->nr_ctx) { - if (set->tags[i]) { + /* Never unmap queue 0. We need it as a + * fallback in case of a new remap fails + * allocation. */ + if (i && set->tags[i]) { blk_mq_free_rq_map(set, set->tags[i], i); set->tags[i] = NULL; } @@ -1917,11 +1929,8 @@ static void blk_mq_map_swqueue(struct request_queue *q, continue; } - /* unmapped hw queue can be remapped after CPU topo changed */ - if (!set->tags[i]) - set->tags[i] = blk_mq_init_rq_map(set, i); hctx->tags = set->tags[i]; - WARN_ON(!hctx->tags); + BUG_ON(!hctx->tags); /* * Set the map size to the number of mapped software queues.