From patchwork Mon Aug 30 15:45:06 2010
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 143511
Message-ID: <4C7BD202.4040700@kernel.org>
Date: Mon, 30 Aug 2010 17:45:06 +0200
From: Tejun Heo
To: Mike Snitzer
References: <1283162296-13650-1-git-send-email-tj@kernel.org>
 <1283162296-13650-5-git-send-email-tj@kernel.org>
 <20100830132836.GB5283@redhat.com> <4C7BB932.1070405@kernel.org>
In-Reply-To: <4C7BB932.1070405@kernel.org>
Cc: k-ueda@ct.jp.nec.com, jaxboe@fusionio.com, jamie@shareable.org,
 linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, j-nomura@ce.jp.nec.com, hch@lst.de
Subject: [dm-devel] [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support
 for request-based dm
List-Id: device-mapper development

Index: block/drivers/md/dm.c
===================================================================
--- block.orig/drivers/md/dm.c
+++ block/drivers/md/dm.c
@@ -143,20 +143,9 @@ struct mapped_device {
 	spinlock_t deferred_lock;
 
 	/*
-	 * Protect barrier_error from concurrent endio processing
-	 * in request-based dm.
-	 */
-	spinlock_t barrier_error_lock;
-	int barrier_error;
-
-	/*
-	 * Processing queue (flush/barriers)
+	 * Processing queue (flush)
 	 */
 	struct workqueue_struct *wq;
-	struct work_struct barrier_work;
-
-	/* A pointer to the currently processing pre/post flush request */
-	struct request *flush_request;
 
 	/*
 	 * The current mapping.
@@ -732,23 +721,6 @@ static void end_clone_bio(struct bio *cl
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
-static void store_barrier_error(struct mapped_device *md, int error)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&md->barrier_error_lock, flags);
-	/*
-	 * Basically, the first error is taken, but:
-	 * -EOPNOTSUPP supersedes any I/O error.
-	 * Requeue request supersedes any I/O error but -EOPNOTSUPP.
-	 */
-	if (!md->barrier_error || error == -EOPNOTSUPP ||
-	    (md->barrier_error != -EOPNOTSUPP &&
-	     error == DM_ENDIO_REQUEUE))
-		md->barrier_error = error;
-	spin_unlock_irqrestore(&md->barrier_error_lock, flags);
-}
-
 /*
  * Don't touch any member of the md after calling this function because
  * the md may be freed in dm_put() at the end of this function.
@@ -786,13 +758,11 @@ static void free_rq_clone(struct request
 static void dm_end_request(struct request *clone, int error)
 {
 	int rw = rq_data_dir(clone);
-	int run_queue = 1;
-	bool is_barrier = clone->cmd_flags & REQ_HARDBARRIER;
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 
-	if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_barrier) {
+	if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
 		rq->errors = clone->errors;
 		rq->resid_len = clone->resid_len;
 
@@ -806,15 +776,8 @@ static void dm_end_reques
 	}
 
 	free_rq_clone(clone);
-
-	if (unlikely(is_barrier)) {
-		if (unlikely(error))
-			store_barrier_error(md, error);
-		run_queue = 0;
-	} else
-		blk_end_request_all(rq, error);
-
-	rq_completed(md, rw, run_queue);
+	blk_end_request_all(rq, error);
+	rq_completed(md, rw, true);
 }
 
 static void dm_unprep_request(struct request *rq)
@@ -839,16 +802,6 @@ void dm_requeue_unmapped_request(struct
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request.
-		 * Leave it to dm_end_request(), which handles this special
-		 * case.
-		 */
-		dm_end_request(clone, DM_ENDIO_REQUEUE);
-		return;
-	}
-
 	dm_unprep_request(rq);
 
 	spin_lock_irqsave(q->queue_lock, flags);
@@ -938,19 +891,6 @@ static void dm_complete_request(struct r
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request. So can't use
-		 * softirq_done with the original.
-		 * Pass the clone to dm_done() directly in this special case.
-		 * It is safe (even if clone->q->queue_lock is held here)
-		 * because there is no I/O dispatching during the completion
-		 * of barrier clone.
-		 */
-		dm_done(clone, error, true);
-		return;
-	}
-
 	tio->error = error;
 	rq->completion_data = clone;
 	blk_complete_request(rq);
@@ -967,17 +907,6 @@ void dm_kill_unmapped_request(struct req
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request.
-		 * Leave it to dm_end_request(), which handles this special
-		 * case.
-		 */
-		BUG_ON(error > 0);
-		dm_end_request(clone, error);
-		return;
-	}
-
 	rq->cmd_flags |= REQ_FAILED;
 	dm_complete_request(clone, error);
 }
@@ -1507,14 +1436,6 @@ static int dm_request(struct request_que
 	return _dm_request(q, bio);
 }
 
-static bool dm_rq_is_flush_request(struct request *rq)
-{
-	if (rq->cmd_flags & REQ_FLUSH)
-		return true;
-	else
-		return false;
-}
-
 void dm_dispatch_request(struct request *rq)
 {
 	int r;
@@ -1562,22 +1483,15 @@ static int setup_clone(struct request *c
 {
 	int r;
 
-	if (dm_rq_is_flush_request(rq)) {
-		blk_rq_init(NULL, clone);
-		clone->cmd_type = REQ_TYPE_FS;
-		clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
-	} else {
-		r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
-				      dm_rq_bio_constructor, tio);
-		if (r)
-			return r;
-
-		clone->cmd = rq->cmd;
-		clone->cmd_len = rq->cmd_len;
-		clone->sense = rq->sense;
-		clone->buffer = rq->buffer;
-	}
+	r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+			      dm_rq_bio_constructor, tio);
+	if (r)
+		return r;
+
+	clone->cmd = rq->cmd;
+	clone->cmd_len = rq->cmd_len;
+	clone->sense = rq->sense;
+	clone->buffer = rq->buffer;
 
 	clone->end_io = end_clone_request;
 	clone->end_io_data = tio;
@@ -1618,9 +1532,6 @@ static int dm_prep_fn(struct request_que
 	struct mapped_device *md = q->queuedata;
 	struct request *clone;
 
-	if (unlikely(dm_rq_is_flush_request(rq)))
-		return BLKPREP_OK;
-
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
@@ -1697,6 +1608,7 @@ static void dm_request_fn(struct request
 	struct dm_table *map = dm_get_live_table(md);
 	struct dm_target *ti;
 	struct request *rq, *clone;
+	sector_t pos;
 
 	/*
 	 * For suspend, check blk_queue_stopped() and increment
@@ -1709,15 +1621,14 @@ static void dm_request_fn(struct request
 		if (!rq)
 			goto plug_and_out;
 
-		if (unlikely(dm_rq_is_flush_request(rq))) {
-			BUG_ON(md->flush_request);
-			md->flush_request = rq;
-			blk_start_request(rq);
-			queue_work(md->wq, &md->barrier_work);
-			goto out;
-		}
+		/* always use block 0 to find the target for flushes for now */
+		pos = 0;
+		if (!(rq->cmd_flags & REQ_FLUSH))
+			pos = blk_rq_pos(rq);
+
+		ti = dm_table_find_target(map, pos);
+		BUG_ON(!dm_target_is_valid(ti));
 
-		ti = dm_table_find_target(map, blk_rq_pos(rq));
 		if (ti->type->busy && ti->type->busy(ti))
 			goto plug_and_out;
 
@@ -1888,7 +1799,6 @@ out:
 static const struct block_device_operations dm_blk_dops;
 
 static void dm_wq_work(struct work_struct *work);
-static void dm_rq_barrier_work(struct work_struct *work);
 
 static void dm_init_md_queue(struct mapped_device *md)
 {
@@ -1943,7 +1853,6 @@ static struct mapped_device *alloc_dev(i
 	mutex_init(&md->suspend_lock);
 	mutex_init(&md->type_lock);
 	spin_lock_init(&md->deferred_lock);
-	spin_lock_init(&md->barrier_error_lock);
 	rwlock_init(&md->map_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
@@ -1966,7 +1875,6 @@ static struct mapped_device *alloc_dev(i
 	atomic_set(&md->pending[1], 0);
 	init_waitqueue_head(&md->wait);
 	INIT_WORK(&md->work, dm_wq_work);
-	INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
 	init_waitqueue_head(&md->eventq);
 
 	md->disk->major = _major;
@@ -2220,8 +2128,6 @@ static int dm_init_request_based_queue(s
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
-	/* no flush support for request based dm yet */
-	blk_queue_flush(md->queue, 0);
 
 	elv_register_queue(md->queue);
 
@@ -2421,73 +2327,6 @@ static void dm_queue_flush(struct mapped
 	queue_work(md->wq, &md->work);
 }
 
-static void dm_rq_set_target_request_nr(struct request *clone, unsigned request_nr)
-{
-	struct dm_rq_target_io *tio = clone->end_io_data;
-
-	tio->info.target_request_nr = request_nr;
-}
-
-/* Issue barrier requests to targets and wait for their completion. */
-static int dm_rq_barrier(struct mapped_device *md)
-{
-	int i, j;
-	struct dm_table *map = dm_get_live_table(md);
-	unsigned num_targets = dm_table_get_num_targets(map);
-	struct dm_target *ti;
-	struct request *clone;
-
-	md->barrier_error = 0;
-
-	for (i = 0; i < num_targets; i++) {
-		ti = dm_table_get_target(map, i);
-		for (j = 0; j < ti->num_flush_requests; j++) {
-			clone = clone_rq(md->flush_request, md, GFP_NOIO);
-			dm_rq_set_target_request_nr(clone, j);
-			atomic_inc(&md->pending[rq_data_dir(clone)]);
-			map_request(ti, clone, md);
-		}
-	}
-
-	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
-	dm_table_put(map);
-
-	return md->barrier_error;
-}
-
-static void dm_rq_barrier_work(struct work_struct *work)
-{
-	int error;
-	struct mapped_device *md = container_of(work, struct mapped_device,
-						barrier_work);
-	struct request_queue *q = md->queue;
-	struct request *rq;
-	unsigned long flags;
-
-	/*
-	 * Hold the md reference here and leave it at the last part so that
-	 * the md can't be deleted by device opener when the barrier request
-	 * completes.
-	 */
-	dm_get(md);
-
-	error = dm_rq_barrier(md);
-
-	rq = md->flush_request;
-	md->flush_request = NULL;
-
-	if (error == DM_ENDIO_REQUEUE) {
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_requeue_request(q, rq);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-	} else
-		blk_end_request_all(rq, error);
-
-	blk_run_queue(q);
-
-	dm_put(md);
-}
-
 /*
  * Swap in a new table, returning the old one for the caller to destroy.
  */
@@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
 	up_write(&md->io_lock);
 
 	/*
-	 * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
-	 * can be kicked until md->queue is stopped. So stop md->queue before
-	 * flushing md->wq.
+	 * Stop md->queue before flushing md->wq in case request-based
+	 * dm defers requests to md->wq from md->queue.
	 */
 	if (dm_request_based(md))
 		stop_queue(md->queue);
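
A note on the hunk that drops blk_queue_flush(md->queue, 0): under the REQ_FLUSH/FUA scheme a driver advertises what its queue can do via blk_queue_flush(), and the block layer then sequences flush/FUA requests itself, only issuing what the queue claimed to support. The sketch below is illustrative only, not part of this patch (the actual dm call site is set up elsewhere in this series); the function name and the q parameter are placeholders:

	/*
	 * Sketch: a queue that can flush its write cache and honor
	 * forced-unit-access writes advertises both capabilities, so the
	 * block layer will pass REQ_FLUSH/REQ_FUA requests through to it.
	 */
	static void example_advertise_flush(struct request_queue *q)
	{
		blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
	}

This is also why the dm_request_fn() hunk above can route a flush by position: a flush carries no meaningful sector of its own, so block 0 is used for now to pick the target that services it.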