From patchwork Fri Jan 6 23:01:07 2017
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 9502463
X-Patchwork-Delegate: snitzer@redhat.com
From: NeilBrown
To: Mike Snitzer, Mikulas Patocka
Cc: Peter Zijlstra, Ming Lei, Zheng Liu, Keith Busch,
    device-mapper development, Jack Wang, Alasdair Kergon,
    Roland Kammerer, Michael Wang, Takashi Iwai, Ingo Molnar,
    Shaohua Li, Kent Overstreet, linux-block@vger.kernel.org,
    "linux-bcache@vger.kernel.org", Jens Axboe, linux-raid,
    "Martin K. Petersen", Jiri Kosina, linux-kernel@vger.kernel.org,
    Lars Ellenberg, "Kirill A. Shutemov"
Date: Sat, 07 Jan 2017 10:01:07 +1100
Subject: Re: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
In-Reply-To: <20170106195216.GA15583@redhat.com>
References: <1467990243-3531-1-git-send-email-lars.ellenberg@linbit.com>
 <1467990243-3531-2-git-send-email-lars.ellenberg@linbit.com>
 <20160711141042.GY13335@soda.linbit>
 <76d9bf14-d848-4405-8358-3771c0a93d39@profitbricks.com>
 <20161223114553.GP4138@soda.linbit>
 <878tqrmmqx.fsf@notabene.neil.brown.name>
 <20170104185046.GA982@redhat.com>
 <20170106195216.GA15583@redhat.com>
Message-ID: <877f67lrnw.fsf@notabene.neil.brown.name>
User-Agent: Notmuch/0.22.1 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-suse-linux-gnu)
List-Id: device-mapper development

On Sat, Jan 07 2017, Mike Snitzer wrote:

> On Fri, Jan 06 2017 at 12:34pm -0500,
> Mikulas Patocka wrote:
>
>>
>>
>> On Fri, 6 Jan 2017, Mikulas Patocka wrote:
>>
>> >
>> >
>> > On Wed, 4 Jan 2017, Mike Snitzer wrote:
>> >
>> > > On Wed, Jan 04 2017 at 12:12am -0500,
>> > > NeilBrown wrote:
>> > >
>> > > > > Suggested-by: NeilBrown
>> > > > > Signed-off-by: Jack Wang
>> > > > > ---
>> > > > >  block/blk-core.c | 20 ++++++++++++++++++++
>> > > > >  1 file changed, 20 insertions(+)
>> > > > >
>> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
>> > > > > index 9e3ac56..47ef373 100644
>> > > > > --- a/block/blk-core.c
>> > > > > +++ b/block/blk-core.c
>> > > > > @@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
>> > > > >  		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> > > > >
>> > > > >  		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
>> > > > > +			struct bio_list lower, same, hold;
>> > > > > +
>> > > > > +			/* Create a fresh bio_list for all subordinate requests */
>> > > > > +			bio_list_init(&hold);
>> > > > > +			bio_list_merge(&hold, &bio_list_on_stack);
>> > > > > +			bio_list_init(&bio_list_on_stack);
>> > > > >
>> > > > >  			ret = q->make_request_fn(q, bio);
>> > > > >
>> > > > >  			blk_queue_exit(q);
>> > > > > +			/* sort new bios into those for a lower level
>> > > > > +			 * and those for the same level
>> > > > > +			 */
>> > > > > +			bio_list_init(&lower);
>> > > > > +			bio_list_init(&same);
>> > > > > +			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
>> > > > > +				if (q == bdev_get_queue(bio->bi_bdev))
>> > > > > +					bio_list_add(&same, bio);
>> > > > > +				else
>> > > > > +					bio_list_add(&lower, bio);
>> > > > > +			/* now assemble so we handle the lowest level first */
>> > > > > +			bio_list_merge(&bio_list_on_stack, &lower);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &same);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &hold);
>> > > > >
>> > > > >  			bio = bio_list_pop(current->bio_list);
>> > > > >  		} else {
>> > > > > --
>> > > > > 2.7.4
>> > >
>> > > Mikulas, would you be willing to try the below patch with the
>> > > dm-snapshot deadlock scenario and report back on whether it fixes that?
>> > >
>> > > Patch below looks to be the same as here:
>> > > https://marc.info/?l=linux-raid&m=148232453107685&q=p3
>> > >
>> > > Neil and/or others if that isn't the patch that should be tested please
>> > > provide a pointer to the latest.
>> > >
>> > > Thanks,
>> > > Mike
>> >
>> > The bad news is that this doesn't fix the snapshot deadlock.
>> >
>> > I created a test program for the snapshot deadlock bug (it was originally
>> > created years ago to test for a different bug, so it contains some cruft).
>> > You also need to insert "if (ci->sector_count) msleep(100);" at the end of
>> > __split_and_process_non_flush to make the kernel sleep when splitting the
>> > bio.
>> >
>> > And with the above patch, the snapshot deadlock bug still happens.
>
> That is really unfortunate.  Would be useful to dig in and understand
> why.  Because ordering of the IO in generic_make_request() really should
> take care of it.

I *think* you might be able to resolve this by changing
__split_and_process_bio() to only ever perform a single split.  No
looping.

i.e. if the bio is too big to handle directly, then split off the front
to a new bio, which you bio_chain to the original.  The original then has
bio_advance() called to step over the front, then generic_make_request()
so it is queued.

Then the code proceeds to __clone_and_map_data_bio() on the front that
got split off.  When that completes it *doesn't* loop round, but returns
into generic_make_request(), which does the looping and makes sure to
handle the lowest-level bio next.

Something vaguely like this, though I haven't tested it, and the change
(if it works) should probably be more fully integrated into the
surrounding code.  You probably don't want to use "fs_bio_set" either -
a target-local pool would be best.

NeilBrown

---
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..06ee0960e415 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 
 	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
 
+	if (len < ci->sector_count) {
+		struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		ci->sector_count = len;
+	}
+
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
 		return r;
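Purely as an illustrative, untested sketch of the "target-local pool" suggestion above (assuming the 4.9-era bioset_create()/bioset_free()/bio_split() interfaces): the dm_split_bs variable and the helper names below are invented for this example and are not existing dm code; in a real change the pool would live in struct mapped_device rather than at file scope.

#include <linux/bio.h>
#include <linux/errno.h>

/* Invented name, for illustration only: a private pool for front-splits
 * instead of borrowing fs_bio_set.
 */
static struct bio_set *dm_split_bs;

static int dm_split_bioset_init(void)
{
	/* front_pad of 0: the split bios carry no per-bio private data */
	dm_split_bs = bioset_create(BIO_POOL_SIZE, 0);
	return dm_split_bs ? 0 : -ENOMEM;
}

static void dm_split_bioset_exit(void)
{
	if (dm_split_bs)
		bioset_free(dm_split_bs);
}

/*
 * __split_and_process_non_flush() would then split from the private
 * pool instead of fs_bio_set:
 *
 *	struct bio *split = bio_split(bio, len, GFP_NOIO, dm_split_bs);
 */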