From patchwork Sun May 22 04:26:29 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Johnston X-Patchwork-Id: 9130707 X-Patchwork-Delegate: snitzer@redhat.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 918926048B for ; Sun, 22 May 2016 04:31:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3576B28135 for ; Sun, 22 May 2016 04:31:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2A08128191; Sun, 22 May 2016 04:31:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from mx6-phx2.redhat.com (mx6-phx2.redhat.com [209.132.183.39]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5FE4D28135 for ; Sun, 22 May 2016 04:31:16 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by mx6-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u4M4Qsdr014457; Sun, 22 May 2016 00:26:54 -0400 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id u4M4QqB8011401 for ; Sun, 22 May 2016 00:26:52 -0400 Received: from mx1.redhat.com (ext-mx04.extmail.prod.ext.phx2.redhat.com [10.5.110.28]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u4M4Qqfq015739 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 22 May 2016 00:26:52 -0400 Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24F878553D; Sun, 22 May 2016 04:26:51 +0000 (UTC) Received: from JTJLaptop (c-68-59-201-157.hsd1.tn.comcast.net [68.59.201.157]) by mx.zohomail.com with SMTPS id 1463891192162993.8842008862897; Sat, 21 May 2016 21:26:32 -0700 (PDT) From: "James Johnston" To: "'Eric Wheeler'" References: <044401d1a958$ea7ef4e0$bf7cdea0$@codenest.com> <5739F07A.5090608@buttersideup.com> <02b101d1b265$2bc46fb0$834d4f10$@codenest.com> In-Reply-To: Date: Sun, 22 May 2016 04:26:29 -0000 Message-ID: <034e01d1b3e2$1de2f910$59a8eb30$@codenest.com> MIME-Version: 1.0 Thread-Index: AQJljQO2/LdKh7ULWmq6UtY/arGUzgF2yKFxAloiMGwB1GuXNwHPiFr0nmD90PA= Content-Language: en-us X-Greylist: Sender IP whitelisted by DNSRBL, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Sun, 22 May 2016 04:26:51 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Sun, 22 May 2016 04:26:51 +0000 (UTC) for IP:'74.201.84.163' DOMAIN:'sender163-mail.zoho.com' HELO:'sender163-mail.zoho.com' FROM:'johnstonj.public@codenest.com' RCPT:'' X-RedHat-Spam-Score: 0.389 (BAYES_50, DCC_REPUT_00_12, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2) 74.201.84.163 sender163-mail.zoho.com 74.201.84.163 sender163-mail.zoho.com X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Scanned-By: MIMEDefang 2.75 on 10.5.110.28 X-MIME-Autoconverted: from quoted-printable to 8bit by lists01.pubmisc.prod.ext.phx2.redhat.com id u4M4QqB8011401 X-loop: dm-devel@redhat.com Cc: "'Mike Snitzer'" , dm-crypt@saout.de, dm-devel@redhat.com, "'Neil Brown'" , linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org, "'Mikulas Patocka'" , "'Kent Overstreet'" , "'Alasdair Kergon'" , "'Tim Small'" Subject: Re: [dm-devel] bcache gets stuck flushing writeback cache when used in combination with LUKS/dm-crypt and non-default bucket size X-BeenThere: dm-devel@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk List-Id: device-mapper development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com X-Virus-Scanned: ClamAV using ClamSMTP > On Fri, 20 May 2016, James Johnston wrote: > > > > On Mon, 16 May 2016, Tim Small wrote: > > > > > > > On 08/05/16 19:39, James Johnston wrote: > > > > > I've run into a problem where the bcache writeback cache can't be flushed to > > > > > disk when the backing device is a LUKS / dm-crypt device and the cache set has > > > > > a non-default bucket size. Basically, only a few megabytes will be flushed to > > > > > disk, and then it gets stuck. Stuck means that the bcache writeback task > > > > > thrashes the disk by constantly reading hundreds of MB/second from the cache set > > > > > in an infinite loop, while not actually progressing (dirty_data never decreases > > > > > beyond a certain point). > > > > > > > > > [...] > > > > > > > > > The situation is basically unrecoverable as far as I can tell: if you attempt > > > > > to detach the cache set then the cache set disk gets thrashed extra-hard > > > > > forever, and it's impossible to actually get the cache set detached. The only > > > > > solution seems to be to back up the data and destroy the volume... > > > > > > > > You can boot an older kernel to flush the device without destroying it > > > > (I'm guessing that's because older kernels split down the big requests > > > > which are failing on the 4.4 kernel). Once flushed you could put the > > > > cache into writethrough mode, or use a smaller bucket size. > > > > > > Indeed, can someone test 4.1.y and see if the problem persists with a 2M > > > bucket size? (If someone has already tested 4.1, then appologies as I've > > > not yet seen that report.) > > > > > > If 4.1 works, then I think a bisect is in order. Such a bisect would at > > > least highlight the problem and might indicate a (hopefully trivial) fix. > > > > To help narrow this down, I tested the following generic pre-compiled mainline kernels > > on Ubuntu 15.10: > > > > * WORKS: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3.6-wily/ > > * DOES NOT WORK: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc1+cod1-wily/ > > > > I also tried the default & latest distribution-provided 4.2 kernel. It worked. > > This one also worked: > > > > * WORKS: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2.8-wily/ > > > > So it seems to me that it is a regression from 4.3.6 kernel to any 4.4 kernel. That > > should help save time with bisection... > > Below is the patchlist for md and block that might help with a place to > start. Are there any other places in the Linux tree where we should watch > for changes? > > I'm wondering if it might be in dm-4.4-changes since this is dm-crypt > related, but it could be ac322de which was quite large. > > James or Tim, > > Can you try building ac322de? If that produces the problem, then there > are only 3 more to try (unless this was actually a problem in 4.3 which > was fixed in 4.3.y, but hopefully that isn't so). > > ccf21b6 is probably the next to test to rule out neil's big md patch, > which Linus abreviated in the commit log so it must be quite long. OTOH, > if dm-4.4-changes works, then I'm not sure what commit might produce the > problem because the rest are not obviously relevant to the issue that are > more recent. So I decided to go ahead and bisect it today. Looks like the bad commit is this one. The commit prior flushed the bcache writeback cache without incident; this one does not and I guess caused this bcache regression. (FWIW ac322de came up during bisection, and tested good.) johnstonj@kernel-build:~/linux$ git bisect bad dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7 is the first bad commit commit dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7 Author: Mikulas Patocka Date: Wed Oct 21 16:34:20 2015 -0400 dm: eliminate unused "bioset" process for each bio-based DM device Commit 54efd50bfd873e2dbf784e0b21a8027ba4299a3e ("block: make generic_make_request handle arbitrarily sized bios") makes it possible for block devices to process large bios. In doing so that commit allocates a new queue->bio_split bioset for each block device, this bioset is used for allocating bios when the driver needs to split large bios. Each bioset allocates a workqueue process, thus the above commit increases the number of processes allocated per block device. DM doesn't need the queue->bio_split bioset, thus we can deallocate it. This reduces the number of allocated processes per bio-based DM device from 3 to 2. Also remove the call to blk_queue_split(), it is not needed because DM does its own splitting. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer The patch for this commit is very brief; reproduced here: Here is the bisect log: johnstonj@kernel-build:~/linux$ git bisect log git bisect start # good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3 git bisect good 6a13feb9c82803e2b815eca72fa7a9f5561d7861 # bad: [8005c49d9aea74d382f474ce11afbbc7d7130bec] Linux 4.4-rc1 git bisect bad 8005c49d9aea74d382f474ce11afbbc7d7130bec # bad: [118c216e16c5ccb028cd03a0dcd56d17a07ff8d7] Merge tag 'staging-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging git bisect bad 118c216e16c5ccb028cd03a0dcd56d17a07ff8d7 # good: [e627078a0cbdc0c391efeb5a2c4eb287328fd633] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux git bisect good e627078a0cbdc0c391efeb5a2c4eb287328fd633 # good: [c17c6da659571a115c7b4983da6c6ac464317c34] staging: wilc1000: rename pfScanResult of struct scan_attr git bisect good c17c6da659571a115c7b4983da6c6ac464317c34 # good: [7bdb7d554e0e433b92b63f3472523cc3067f8ab4] Staging: rtl8192u: ieee80211: corrected indent git bisect good 7bdb7d554e0e433b92b63f3472523cc3067f8ab4 # good: [ac322de6bf5416cb145b58599297b8be73cd86ac] Merge tag 'md/4.4' of git://neil.brown.name/md git bisect good ac322de6bf5416cb145b58599297b8be73cd86ac # good: [a4d8e93c3182a54d8d21a4d1cec6538ae1be9e16] Merge tag 'usb-for-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next git bisect good a4d8e93c3182a54d8d21a4d1cec6538ae1be9e16 # good: [4f56f3fdca43c9a18339b6e0c3b1aa2f57f6d0b0] serial: 8250: Tolerate clock variance for max baud rate git bisect good 4f56f3fdca43c9a18339b6e0c3b1aa2f57f6d0b0 # good: [e052c6d15c61cc4caff2f06cbca72b183da9f15e] tty: Use unbound workqueue for all input workers git bisect good e052c6d15c61cc4caff2f06cbca72b183da9f15e # good: [b9ca0c948c921e960006aaf319a29c004917cdf6] uwb: neh: Use setup_timer git bisect good b9ca0c948c921e960006aaf319a29c004917cdf6 # bad: [aad9ae4550755edc020b5c511a8b54f0104b2f47] dm switch: simplify conditional in alloc_region_table() git bisect bad aad9ae4550755edc020b5c511a8b54f0104b2f47 # good: [a3d939ae7b5f82688a6d3450f95286eaea338328] dm: convert ffs to __ffs git bisect good a3d939ae7b5f82688a6d3450f95286eaea338328 # bad: [00272c854ee17b804ce81ef706f611dac17f4f89] dm linear: remove redundant target name from error messages git bisect bad 00272c854ee17b804ce81ef706f611dac17f4f89 # bad: [4c7da06f5a780bbf44ebd7547789e48536d0a823] dm persistent data: eliminate unnecessary return values git bisect bad 4c7da06f5a780bbf44ebd7547789e48536d0a823 # bad: [dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7] dm: eliminate unused "bioset" process for each bio-based DM device git bisect bad dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7 # first bad commit: [dbba42d8a9ebddcc1c1412e8457f79f3cb6ef6e7] dm: eliminate unused "bioset" process for each bio-based DM device Commands used for testing: # Make cache set make-bcache --bucket 2M -C /dev/sdb # Set up backing device crypto cryptsetup luksFormat /dev/sdc cryptsetup open --type luks /dev/sdc backCrypt # Make backing device & enable writeback make-bcache -B /dev/mapper/backCrypt bcache-super-show /dev/sdb | grep cset.uuid | cut -f 3 > /sys/block/bcache0/bcache/attach echo writeback > /sys/block/bcache0/bcache/cache_mode # KILL SEQUENCE cd /sys/block/bcache0/bcache echo 0 > sequential_cutoff # Verify that the cache is attached (i.e. does not say "no cache") cat state dd if=/dev/urandom of=/dev/bcache0 bs=1M count=250 cat dirty_data cat state # Next line causes severe disk thrashing and failure to flush writeback cache # on bad commits. echo 1 > detach cat dirty_data cat state Hope this provides some insight into the problem... James --- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 9555843..64b50b7 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1763,8 +1763,6 @@ static void dm_make_request(struct request_queue *q, struct bio *bio) map = dm_get_live_table(md, &srcu_idx); - blk_queue_split(q, &bio, q->bio_split); - generic_start_io_acct(rw, bio_sectors(bio), &dm_disk(md)->part0); /* if we're suspended, we have to queue this io for later */ @@ -2792,6 +2790,12 @@ int dm_setup_md_queue(struct mapped_device *md) case DM_TYPE_BIO_BASED: dm_init_old_md_queue(md); blk_queue_make_request(md->queue, dm_make_request); + /* + * DM handles splitting bios as needed. Free the bio_split bioset + * since it won't be used (saves 1 process per bio-based DM device). + */ + bioset_free(md->queue->bio_split); + md->queue->bio_split = NULL; break; }