From patchwork Mon Jan 2 14:33:02 2017
X-Patchwork-Submitter: Jack Wang
X-Patchwork-Id: 9493729
From: Jack Wang
Date: Mon, 2 Jan 2017 15:33:02 +0100
Subject: Re: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
To: Lars Ellenberg
Cc: Michael Wang, Jens Axboe, linux-block@vger.kernel.org,
 "Martin K. Petersen", Mike Snitzer, Peter Zijlstra, Jiri Kosina,
 Ming Lei, "Kirill A. Shutemov", NeilBrown, linux-kernel@vger.kernel.org,
 linux-raid, Takashi Iwai, "linux-bcache@vger.kernel.org", Zheng Liu,
 Kent Overstreet, Keith Busch, device-mapper development, Shaohua Li,
 Ingo Molnar, Alasdair Kergon, Roland Kammerer
List-ID: linux-block@vger.kernel.org

2016-12-23 12:45 GMT+01:00 Lars Ellenberg:
> On Fri, Dec 23, 2016 at 09:49:53AM +0100, Michael Wang wrote:
>> Dear Maintainers,
>>
>> I'd like to ask about the status of this patch, since we hit the
>> issue too during our testing on md raid1.
>>
>> The split remainder bio_A was queued ahead, followed by bio_B for
>> the lower device. At that moment the raid started freezing: the
>> loop takes out bio_A first and delivers it, which hangs since the
>> raid is freezing, while the freeze never ends because it is waiting
>> for bio_B to finish, and bio_B is still on the queue, waiting for
>> bio_A to finish...
>>
>> We're looking for a good solution and found that this patch has
>> already progressed a lot, but we can't find it in linux-next,
>> so we'd like to ask: is this fix still planned for upstream?
>
> I don't see why not; I'd even like to have it in older kernels,
> but I did not have the time and energy to push it.
>
> Thanks for the bump.
>
> Lars

Hi folks,

As Michael mentioned, we hit the bug this patch is trying to fix.
Neil suggested another way to fix it, which I have attached below.
I personally prefer Neil's version, as it is a smaller and more
straightforward change. Could you share your comments, so we can
get one fix into mainline?

Thanks,
Jinpu

From 69a4829a55503e496ce9c730d2c8e3dd8a08874a Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()

When we call wait_barrier, we might have some bios waiting in
current->bio_list, which prevents the array_freeze call from completing.
Those can only be internal READs, which have already passed the
wait_barrier call (thus incrementing nr_pending), but were still not
submitted to the lower level, due to the generic_make_request logic
that avoids recursive calls. In such a case, we have a deadlock:

- array_frozen is already set to 1, so wait_barrier unconditionally
  waits, so
- internal READ bios will not be submitted, and thus freeze_array
  will never complete.

To fix this, modify generic_make_request to sort bio_list_on_stack so
that bios for the lowest level are handled first, then those for
higher levels, and finally those for the same level.

Sent to the linux-raid mailing list:
https://marc.info/?l=linux-raid&m=148232453107685&w=2

Suggested-by: NeilBrown
Signed-off-by: Jack Wang
---
 block/blk-core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9e3ac56..47ef373 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
+			struct bio_list lower, same, hold;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			bio_list_init(&hold);
+			bio_list_merge(&hold, &bio_list_on_stack);
+			bio_list_init(&bio_list_on_stack);
 
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init(&lower);
+			bio_list_init(&same);
+			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+				if (q == bdev_get_queue(bio->bi_bdev))
+					bio_list_add(&same, bio);
+				else
+					bio_list_add(&lower, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(&bio_list_on_stack, &lower);
+			bio_list_merge(&bio_list_on_stack, &same);
+			bio_list_merge(&bio_list_on_stack, &hold);
 
 			bio = bio_list_pop(current->bio_list);
 		} else {
-- 
2.7.4