From patchwork Mon Aug 30 15:45:06 2010
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 143511
Message-ID: <4C7BD202.4040700@kernel.org>
Date: Mon, 30 Aug 2010 17:45:06 +0200
From: Tejun Heo
To: Mike Snitzer
References: <1283162296-13650-1-git-send-email-tj@kernel.org>
 <1283162296-13650-5-git-send-email-tj@kernel.org>
 <20100830132836.GB5283@redhat.com> <4C7BB932.1070405@kernel.org>
In-Reply-To: <4C7BB932.1070405@kernel.org>
Cc: k-ueda@ct.jp.nec.com, jaxboe@fusionio.com, jamie@shareable.org,
 linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, j-nomura@ce.jp.nec.com, hch@lst.de
Subject: [dm-devel] [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support
 for request-based dm
List-Id: device-mapper development

Index: block/drivers/md/dm.c
===================================================================
--- block.orig/drivers/md/dm.c
+++ block/drivers/md/dm.c
@@ -143,20 +143,9 @@ struct mapped_device {
 	spinlock_t deferred_lock;
 
 	/*
-	 * Protect barrier_error from concurrent endio processing
-	 * in request-based dm.
-	 */
-	spinlock_t barrier_error_lock;
-	int barrier_error;
-
-	/*
-	 * Processing queue (flush/barriers)
+	 * Processing queue (flush)
 	 */
 	struct workqueue_struct *wq;
-	struct work_struct barrier_work;
-
-	/* A pointer to the currently processing pre/post flush request */
-	struct request *flush_request;
 
 	/*
 	 * The current mapping.
@@ -732,23 +721,6 @@ static void end_clone_bio(struct bio *cl
 	blk_update_request(tio->orig, 0, nr_bytes);
 }
 
-static void store_barrier_error(struct mapped_device *md, int error)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&md->barrier_error_lock, flags);
-	/*
-	 * Basically, the first error is taken, but:
-	 * -EOPNOTSUPP supersedes any I/O error.
-	 * Requeue request supersedes any I/O error but -EOPNOTSUPP.
-	 */
-	if (!md->barrier_error || error == -EOPNOTSUPP ||
-	    (md->barrier_error != -EOPNOTSUPP &&
-	     error == DM_ENDIO_REQUEUE))
-		md->barrier_error = error;
-	spin_unlock_irqrestore(&md->barrier_error_lock, flags);
-}
-
 /*
  * Don't touch any member of the md after calling this function because
  * the md may be freed in dm_put() at the end of this function.
@@ -786,13 +758,11 @@ static void free_rq_clone(struct request
 static void dm_end_request(struct request *clone, int error)
 {
 	int rw = rq_data_dir(clone);
-	int run_queue = 1;
-	bool is_barrier = clone->cmd_flags & REQ_HARDBARRIER;
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 	struct request *rq = tio->orig;
 
-	if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_barrier) {
+	if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
 		rq->errors = clone->errors;
 		rq->resid_len = clone->resid_len;
 
@@ -806,15 +776,8 @@ static void dm_end_reques
 	}
 
 	free_rq_clone(clone);
-
-	if (unlikely(is_barrier)) {
-		if (unlikely(error))
-			store_barrier_error(md, error);
-		run_queue = 0;
-	} else
-		blk_end_request_all(rq, error);
-
-	rq_completed(md, rw, run_queue);
+	blk_end_request_all(rq, error);
+	rq_completed(md, rw, true);
 }
 
 static void dm_unprep_request(struct request *rq)
@@ -839,16 +802,6 @@ void dm_requeue_unmapped_request(struct
 	struct request_queue *q = rq->q;
 	unsigned long flags;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request.
-		 * Leave it to dm_end_request(), which handles this special
-		 * case.
-		 */
-		dm_end_request(clone, DM_ENDIO_REQUEUE);
-		return;
-	}
-
 	dm_unprep_request(rq);
 
 	spin_lock_irqsave(q->queue_lock, flags);
@@ -938,19 +891,6 @@ static void dm_complete_request(struct r
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request. So can't use
-		 * softirq_done with the original.
-		 * Pass the clone to dm_done() directly in this special case.
-		 * It is safe (even if clone->q->queue_lock is held here)
-		 * because there is no I/O dispatching during the completion
-		 * of barrier clone.
-		 */
-		dm_done(clone, error, true);
-		return;
-	}
-
 	tio->error = error;
 	rq->completion_data = clone;
 	blk_complete_request(rq);
@@ -967,17 +907,6 @@ void dm_kill_unmapped_request(struct req
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct request *rq = tio->orig;
 
-	if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
-		/*
-		 * Barrier clones share an original request.
-		 * Leave it to dm_end_request(), which handles this special
-		 * case.
-		 */
-		BUG_ON(error > 0);
-		dm_end_request(clone, error);
-		return;
-	}
-
 	rq->cmd_flags |= REQ_FAILED;
 	dm_complete_request(clone, error);
 }
@@ -1507,14 +1436,6 @@ static int dm_request(struct request_que
 	return _dm_request(q, bio);
 }
 
-static bool dm_rq_is_flush_request(struct request *rq)
-{
-	if (rq->cmd_flags & REQ_FLUSH)
-		return true;
-	else
-		return false;
-}
-
 void dm_dispatch_request(struct request *rq)
 {
 	int r;
@@ -1562,22 +1483,15 @@ static int setup_clone(struct request *c
 {
 	int r;
 
-	if (dm_rq_is_flush_request(rq)) {
-		blk_rq_init(NULL, clone);
-		clone->cmd_type = REQ_TYPE_FS;
-		clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
-	} else {
-		r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
-				      dm_rq_bio_constructor, tio);
-		if (r)
-			return r;
-
-		clone->cmd = rq->cmd;
-		clone->cmd_len = rq->cmd_len;
-		clone->sense = rq->sense;
-		clone->buffer = rq->buffer;
-	}
+	r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+			      dm_rq_bio_constructor, tio);
+	if (r)
+		return r;
+
+	clone->cmd = rq->cmd;
+	clone->cmd_len = rq->cmd_len;
+	clone->sense = rq->sense;
+	clone->buffer = rq->buffer;
 
 	clone->end_io = end_clone_request;
 	clone->end_io_data = tio;
@@ -1618,9 +1532,6 @@ static int dm_prep_fn(struct request_que
 	struct mapped_device *md = q->queuedata;
 	struct request *clone;
 
-	if (unlikely(dm_rq_is_flush_request(rq)))
-		return BLKPREP_OK;
-
 	if (unlikely(rq->special)) {
 		DMWARN("Already has something in rq->special.");
 		return BLKPREP_KILL;
@@ -1697,6 +1608,7 @@ static void dm_request_fn(struct request
 	struct dm_table *map = dm_get_live_table(md);
 	struct dm_target *ti;
 	struct request *rq, *clone;
+	sector_t pos;
 
 	/*
 	 * For suspend, check blk_queue_stopped() and increment
@@ -1709,15 +1621,14 @@ static void dm_request_fn(struct request
 		if (!rq)
 			goto plug_and_out;
 
-		if (unlikely(dm_rq_is_flush_request(rq))) {
-			BUG_ON(md->flush_request);
-			md->flush_request = rq;
-			blk_start_request(rq);
-			queue_work(md->wq, &md->barrier_work);
-			goto out;
-		}
+		/* always use block 0 to find the target for flushes for now */
+		pos = 0;
+		if (!(rq->cmd_flags & REQ_FLUSH))
+			pos = blk_rq_pos(rq);
+
+		ti = dm_table_find_target(map, pos);
+		BUG_ON(!dm_target_is_valid(ti));
 
-		ti = dm_table_find_target(map, blk_rq_pos(rq));
 		if (ti->type->busy && ti->type->busy(ti))
 			goto plug_and_out;
 
@@ -1888,7 +1799,6 @@ out:
 static const struct block_device_operations dm_blk_dops;
 
 static void dm_wq_work(struct work_struct *work);
-static void dm_rq_barrier_work(struct work_struct *work);
 
 static void dm_init_md_queue(struct mapped_device *md)
 {
@@ -1943,7 +1853,6 @@ static struct mapped_device *alloc_dev(i
 	mutex_init(&md->suspend_lock);
 	mutex_init(&md->type_lock);
 	spin_lock_init(&md->deferred_lock);
-	spin_lock_init(&md->barrier_error_lock);
 	rwlock_init(&md->map_lock);
 	atomic_set(&md->holders, 1);
 	atomic_set(&md->open_count, 0);
@@ -1966,7 +1875,6 @@ static struct mapped_device *alloc_dev(i
 	atomic_set(&md->pending[1], 0);
 	init_waitqueue_head(&md->wait);
 	INIT_WORK(&md->work, dm_wq_work);
-	INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
 	init_waitqueue_head(&md->eventq);
 
 	md->disk->major = _major;
@@ -2220,8 +2128,6 @@ static int dm_init_request_based_queue(s
 	blk_queue_softirq_done(md->queue, dm_softirq_done);
 	blk_queue_prep_rq(md->queue, dm_prep_fn);
 	blk_queue_lld_busy(md->queue, dm_lld_busy);
-	/* no flush support for request based dm yet */
-	blk_queue_flush(md->queue, 0);
 
 	elv_register_queue(md->queue);
 
@@ -2421,73 +2327,6 @@ static void dm_queue_flush(struct mapped
 	queue_work(md->wq, &md->work);
 }
 
-static void dm_rq_set_target_request_nr(struct request *clone, unsigned request_nr)
-{
-	struct dm_rq_target_io *tio = clone->end_io_data;
-
-	tio->info.target_request_nr = request_nr;
-}
-
-/* Issue barrier requests to targets and wait for their completion. */
-static int dm_rq_barrier(struct mapped_device *md)
-{
-	int i, j;
-	struct dm_table *map = dm_get_live_table(md);
-	unsigned num_targets = dm_table_get_num_targets(map);
-	struct dm_target *ti;
-	struct request *clone;
-
-	md->barrier_error = 0;
-
-	for (i = 0; i < num_targets; i++) {
-		ti = dm_table_get_target(map, i);
-		for (j = 0; j < ti->num_flush_requests; j++) {
-			clone = clone_rq(md->flush_request, md, GFP_NOIO);
-			dm_rq_set_target_request_nr(clone, j);
-			atomic_inc(&md->pending[rq_data_dir(clone)]);
-			map_request(ti, clone, md);
-		}
-	}
-
-	dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
-	dm_table_put(map);
-
-	return md->barrier_error;
-}
-
-static void dm_rq_barrier_work(struct work_struct *work)
-{
-	int error;
-	struct mapped_device *md = container_of(work, struct mapped_device,
-						barrier_work);
-	struct request_queue *q = md->queue;
-	struct request *rq;
-	unsigned long flags;
-
-	/*
-	 * Hold the md reference here and leave it at the last part so that
-	 * the md can't be deleted by device opener when the barrier request
-	 * completes.
-	 */
-	dm_get(md);
-
-	error = dm_rq_barrier(md);
-
-	rq = md->flush_request;
-	md->flush_request = NULL;
-
-	if (error == DM_ENDIO_REQUEUE) {
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_requeue_request(q, rq);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-	} else
-		blk_end_request_all(rq, error);
-
-	blk_run_queue(q);
-
-	dm_put(md);
-}
-
 /*
  * Swap in a new table, returning the old one for the caller to destroy.
  */
@@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
 	up_write(&md->io_lock);
 
 	/*
-	 * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
-	 * can be kicked until md->queue is stopped. So stop md->queue before
-	 * flushing md->wq.
+	 * Stop md->queue before flushing md->wq in case request-based
+	 * dm defers requests to md->wq from md->queue.
	 */
 	if (dm_request_based(md))
 		stop_queue(md->queue);
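
A note on the hunk that drops blk_queue_flush(md->queue, 0): under the REQ_FLUSH/FUA scheme a driver advertises what its queue can do via blk_queue_flush(), and the block layer then sequences flush/FUA requests itself, only issuing what the queue claimed to support. The sketch below is illustrative only, not part of this patch (the actual dm call site is set up elsewhere in this series); the function name and the q parameter are placeholders:

	/*
	 * Sketch: a queue that can flush its write cache and honor
	 * forced-unit-access writes advertises both capabilities, so the
	 * block layer will pass REQ_FLUSH/REQ_FUA requests through to it.
	 */
	static void example_advertise_flush(struct request_queue *q)
	{
		blk_queue_flush(q, REQ_FLUSH | REQ_FUA);
	}

This is also why the dm_request_fn() hunk above can route a flush by position: a flush carries no meaningful sector of its own, so block 0 is used for now to pick the target that services it.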