From patchwork Mon Aug 30 09:58:14 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 143431
Resent-From: Mike Snitzer
Resent-Date: Mon, 30 Aug 2010 15:24:29 -0400
Resent-Message-ID: <20100830192429.GE9195@redhat.com>
Resent-To: dm-devel@redhat.com
From: Tejun Heo
To: jaxboe@fusionio.com, k-ueda@ct.jp.nec.com, snitzer@redhat.com,
	j-nomura@ce.jp.nec.com, jamie@shareable.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-raid@vger.kernel.org, hch@lst.de
Cc: Tejun Heo
Date: Mon, 30 Aug 2010 11:58:14 +0200
Message-Id: <1283162296-13650-4-git-send-email-tj@kernel.org>
In-Reply-To: <1283162296-13650-1-git-send-email-tj@kernel.org>
References: <1283162296-13650-1-git-send-email-tj@kernel.org>
Subject: [dm-devel] [PATCH 3/5] dm: relax ordering of bio-based flush
	implementation

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 32e6622..e67c519 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -110,7 +110,6 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
 #define DMF_FREEING 3
 #define DMF_DELETING 4
 #define DMF_NOFLUSH_SUSPENDING 5
-#define DMF_QUEUE_IO_TO_THREAD 6

 /*
  * Work processed by per-device workqueue.
@@ -144,11 +143,6 @@ struct mapped_device {
 	spinlock_t deferred_lock;

 	/*
-	 * An error from the flush request currently being processed.
-	 */
-	int flush_error;
-
-	/*
 	 * Protect barrier_error from concurrent endio processing
 	 * in request-based dm.
 	 */
@@ -529,16 +523,10 @@ static void end_io_acct(struct dm_io *io)
  */
 static void queue_io(struct mapped_device *md, struct bio *bio)
 {
-	down_write(&md->io_lock);
-
 	spin_lock_irq(&md->deferred_lock);
 	bio_list_add(&md->deferred, bio);
 	spin_unlock_irq(&md->deferred_lock);
-
-	if (!test_and_set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags))
-		queue_work(md->wq, &md->work);
-
-	up_write(&md->io_lock);
+	queue_work(md->wq, &md->work);
 }

 /*
@@ -626,11 +614,9 @@ static void dec_pending(struct dm_io *io, int error)
 			 * Target requested pushing back the I/O.
 			 */
 			spin_lock_irqsave(&md->deferred_lock, flags);
-			if (__noflush_suspending(md)) {
-				if (!(io->bio->bi_rw & REQ_FLUSH))
-					bio_list_add_head(&md->deferred,
-							  io->bio);
-			} else
+			if (__noflush_suspending(md))
+				bio_list_add_head(&md->deferred, io->bio);
+			else
 				/* noflush suspend was interrupted. */
 				io->error = -EIO;
 			spin_unlock_irqrestore(&md->deferred_lock, flags);
@@ -638,26 +624,22 @@ static void dec_pending(struct dm_io *io, int error)

 		io_error = io->error;
 		bio = io->bio;
+		end_io_acct(io);
+		free_io(md, io);
+
+		if (io_error == DM_ENDIO_REQUEUE)
+			return;

-		if (bio->bi_rw & REQ_FLUSH) {
+		if (!(bio->bi_rw & REQ_FLUSH) || !bio->bi_size) {
+			trace_block_bio_complete(md->queue, bio);
+			bio_endio(bio, io_error);
+		} else {
 			/*
-			 * There can be just one flush request so we use
-			 * a per-device variable for error reporting.
-			 * Note that you can't touch the bio after end_io_acct
+			 * Preflush done for flush with data, reissue
+			 * without REQ_FLUSH.
 			 */
-			if (!md->flush_error)
-				md->flush_error = io_error;
-			end_io_acct(io);
-			free_io(md, io);
-		} else {
-			end_io_acct(io);
-			free_io(md, io);
-
-			if (io_error != DM_ENDIO_REQUEUE) {
-				trace_block_bio_complete(md->queue, bio);
-
-				bio_endio(bio, io_error);
-			}
+			bio->bi_rw &= ~REQ_FLUSH;
+			queue_io(md, bio);
 		}
 	}
 }
@@ -1369,21 +1351,17 @@ static int __clone_and_map(struct clone_info *ci)
  */
 static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
 {
+	bool is_flush = bio->bi_rw & REQ_FLUSH;
 	struct clone_info ci;
 	int error = 0;

 	ci.map = dm_get_live_table(md);
 	if (unlikely(!ci.map)) {
-		if (!(bio->bi_rw & REQ_FLUSH))
-			bio_io_error(bio);
-		else
-			if (!md->flush_error)
-				md->flush_error = -EIO;
+		bio_io_error(bio);
 		return;
 	}

 	ci.md = md;
-	ci.bio = bio;
 	ci.io = alloc_io(md);
 	ci.io->error = 0;
 	atomic_set(&ci.io->io_count, 1);
@@ -1391,18 +1369,19 @@ static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
 	ci.io->md = md;
 	spin_lock_init(&ci.io->endio_lock);
 	ci.sector = bio->bi_sector;
-	if (!(bio->bi_rw & REQ_FLUSH))
+	ci.idx = bio->bi_idx;
+
+	if (!is_flush) {
+		ci.bio = bio;
 		ci.sector_count = bio_sectors(bio);
-	else {
-		/* all FLUSH bio's reaching here should be empty */
-		WARN_ON_ONCE(bio_has_data(bio));
+	} else {
+		ci.bio = &ci.md->flush_bio;
 		ci.sector_count = 1;
 	}
-	ci.idx = bio->bi_idx;

 	start_io_acct(ci.io);
 	while (ci.sector_count && !error) {
-		if (!(bio->bi_rw & REQ_FLUSH))
+		if (!is_flush)
 			error = __clone_and_map(&ci);
 		else
 			error = __clone_and_map_flush(&ci);
@@ -1490,22 +1469,14 @@ static int _dm_request(struct request_queue *q, struct bio *bio)
 	part_stat_add(cpu, &dm_disk(md)->part0, sectors[rw], bio_sectors(bio));
 	part_stat_unlock();

-	/*
-	 * If we're suspended or the thread is processing flushes
-	 * we have to queue this io for later.
-	 */
-	if (unlikely(test_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags)) ||
-	    (bio->bi_rw & REQ_FLUSH)) {
+	/* if we're suspended, we have to queue this io for later */
+	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
 		up_read(&md->io_lock);

-		if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) &&
-		    bio_rw(bio) == READA) {
+		if (bio_rw(bio) != READA)
+			queue_io(md, bio);
+		else
 			bio_io_error(bio);
-			return 0;
-		}
-
-		queue_io(md, bio);
-
 		return 0;
 	}

@@ -2015,6 +1986,10 @@ static struct mapped_device *alloc_dev(int minor)
 	if (!md->bdev)
 		goto bad_bdev;

+	bio_init(&md->flush_bio);
+	md->flush_bio.bi_bdev = md->bdev;
+	md->flush_bio.bi_rw = WRITE_FLUSH;
+
 	/* Populate the mapping, nobody knows we exist yet */
 	spin_lock(&_minor_lock);
 	old_md = idr_replace(&_minor_idr, md, minor);
@@ -2407,37 +2382,6 @@ static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
 	return r;
 }

-static void process_flush(struct mapped_device *md, struct bio *bio)
-{
-	md->flush_error = 0;
-
-	/* handle REQ_FLUSH */
-	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
-
-	bio_init(&md->flush_bio);
-	md->flush_bio.bi_bdev = md->bdev;
-	md->flush_bio.bi_rw = WRITE_FLUSH;
-	__split_and_process_bio(md, &md->flush_bio);
-
-	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
-
-	/* if it's an empty flush or the preflush failed, we're done */
-	if (!bio_has_data(bio) || md->flush_error) {
-		if (md->flush_error != DM_ENDIO_REQUEUE)
-			bio_endio(bio, md->flush_error);
-		else {
-			spin_lock_irq(&md->deferred_lock);
-			bio_list_add_head(&md->deferred, bio);
-			spin_unlock_irq(&md->deferred_lock);
-		}
-		return;
-	}
-
-	/* issue data + REQ_FUA */
-	bio->bi_rw &= ~REQ_FLUSH;
-	__split_and_process_bio(md, bio);
-}
-
 /*
  * Process the deferred bios
  */
@@ -2447,33 +2391,27 @@ static void dm_wq_work(struct work_struct *work)
 						work);
 	struct bio *c;

-	down_write(&md->io_lock);
+	down_read(&md->io_lock);

 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
 		spin_lock_irq(&md->deferred_lock);
 		c = bio_list_pop(&md->deferred);
 		spin_unlock_irq(&md->deferred_lock);

-		if (!c) {
-			clear_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
+		if (!c)
 			break;
-		}

-		up_write(&md->io_lock);
+		up_read(&md->io_lock);

 		if (dm_request_based(md))
 			generic_make_request(c);
-		else {
-			if (c->bi_rw & REQ_FLUSH)
-				process_flush(md, c);
-			else
-				__split_and_process_bio(md, c);
-		}
+		else
+			__split_and_process_bio(md, c);

-		down_write(&md->io_lock);
+		down_read(&md->io_lock);
 	}

-	up_write(&md->io_lock);
+	up_read(&md->io_lock);
 }

 static void dm_queue_flush(struct mapped_device *md)
@@ -2672,17 +2610,12 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 	 *
 	 * To get all processes out of __split_and_process_bio in dm_request,
 	 * we take the write lock. To prevent any process from reentering
-	 * __split_and_process_bio from dm_request, we set
-	 * DMF_QUEUE_IO_TO_THREAD.
-	 *
-	 * To quiesce the thread (dm_wq_work), we set DMF_BLOCK_IO_FOR_SUSPEND
-	 * and call flush_workqueue(md->wq). flush_workqueue will wait until
-	 * dm_wq_work exits and DMF_BLOCK_IO_FOR_SUSPEND will prevent any
-	 * further calls to __split_and_process_bio from dm_wq_work.
+	 * __split_and_process_bio from dm_request and quiesce the thread
+	 * (dm_wq_work), we set DMF_BLOCK_IO_FOR_SUSPEND and call
+	 * flush_workqueue(md->wq).
 	 */
 	down_write(&md->io_lock);
 	set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
-	set_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags);
 	up_write(&md->io_lock);

 	/*