diff mbox

block: properly handle flush/fua requests in blk_insert_cloned_request

Message ID 20110809174347.GA13293@redhat.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Mike Snitzer Aug. 9, 2011, 5:43 p.m. UTC
On Tue, Aug 09 2011 at 12:13pm -0400,
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Tue, Aug 09, 2011 at 11:53:51AM -0400, Jeff Moyer wrote:
> > Tejun Heo <tj@kernel.org> writes:
> > > I'm a bit confused.  We still need ELEVATOR_INSERT_FLUSH fix for
> > > insertion paths, right?  Or is blk_insert_cloned_request() supposed to
> > > used only by request based dm which lives under the elevator?  If so,
> > > it would be great to make that explicit in the comment.  Maybe just
> > > renaming it to blk_insert_dm_cloned_request() would be better as it
> > > wouldn't be safe for other cases anyway.
> > 
> > request-based dm is the only caller at present.  I'm not a fan of
> > renaming the function, but I'm more than willing to comment it.
> 
> I'm still confused and don't think the patch is correct (you can't
> turn off REQ_FUA without decomposing it to data + post flush).
> 
> Going through flush machinery twice is okay and I think is the right
> thing to do.  At the upper queue, the request is decomposed to member
> requests.  After decomposition, it's either REQ_FLUSH w/o data or data
> request w/ or w/o REQ_FUA.  When the decomposed request reaches lower
> queue, the lower queue will then either short-circuit it, execute
> as-is or decompose data w/ REQ_FUA into data + REQ_FLUSH sequence.
> 
> AFAICS, the breakages are...
> 
> * ELEVATOR_INSERT_FLUSH not used properly from insert paths.
> 
> * Short circuit not kicking in for the dm requests. (the above and the
>   policy patch should solve this, right?)
> 
> * BUG(!rq->bio || ...) in blk_insert_flush().  I think we can lift
>   this restriction for empty REQ_FLUSH but also dm can just send down
>   requests with empty bio.

[cc'ing dm-devel]

All of these issues have come to light because DM was not setting
flush_flags based on the underlying device(s).  Now fixed in v3.1-rc1:
ed8b752 dm table: set flush capability based on underlying devices

Given that commit, and that request-based DM is beneath the elevator, it
seems any additional effort to have DM flushes re-enter the flush
machinary is unnecessary.

We expect:
1) flushes to have gone through the flush machinary
2) no FLUSH/FUA should be entering underlying queues if not supported

I think it best to just document the expectation that any FLUSH/FUA
request that enters blk_insert_cloned_request() will already match the
queue that the request is being sent to.  One way to document it is to
change Jeff's flag striping in to pure BUG_ON()s, e.g.:

---
 block/blk-core.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Tejun Heo Aug. 9, 2011, 5:51 p.m. UTC | #1
Hello,

On Tue, Aug 09, 2011 at 01:43:47PM -0400, Mike Snitzer wrote:
> [cc'ing dm-devel]
> 
> All of these issues have come to light because DM was not setting
> flush_flags based on the underlying device(s).  Now fixed in v3.1-rc1:
> ed8b752 dm table: set flush capability based on underlying devices
> 
> Given that commit, and that request-based DM is beneath the elevator, it
> seems any additional effort to have DM flushes re-enter the flush
> machinary is unnecessary.

Hmmm... what if the underlying devices have different featureset?  Use
the lowest common denominator?  The flush mechanism is designed to
deal with stacking by going through flush at each level.  Stacking
queue can simply export support for both REQ_FLUSH and FUA and the
lower lever queue will decompose them according to the capability
supported by the actual device.

IIUC, what's broken here is some insert functions aren't using
ELEVATOR_INSERT_FLUSH when needed and there are some issues due to the
special nature of the stacked requests which I think should be fixed.

Thanks.
Vivek Goyal Aug. 9, 2011, 5:52 p.m. UTC | #2
On Tue, Aug 09, 2011 at 01:43:47PM -0400, Mike Snitzer wrote:
> On Tue, Aug 09 2011 at 12:13pm -0400,
> Tejun Heo <tj@kernel.org> wrote:
> 
> > Hello,
> > 
> > On Tue, Aug 09, 2011 at 11:53:51AM -0400, Jeff Moyer wrote:
> > > Tejun Heo <tj@kernel.org> writes:
> > > > I'm a bit confused.  We still need ELEVATOR_INSERT_FLUSH fix for
> > > > insertion paths, right?  Or is blk_insert_cloned_request() supposed to
> > > > used only by request based dm which lives under the elevator?  If so,
> > > > it would be great to make that explicit in the comment.  Maybe just
> > > > renaming it to blk_insert_dm_cloned_request() would be better as it
> > > > wouldn't be safe for other cases anyway.
> > > 
> > > request-based dm is the only caller at present.  I'm not a fan of
> > > renaming the function, but I'm more than willing to comment it.
> > 
> > I'm still confused and don't think the patch is correct (you can't
> > turn off REQ_FUA without decomposing it to data + post flush).
> > 
> > Going through flush machinery twice is okay and I think is the right
> > thing to do.  At the upper queue, the request is decomposed to member
> > requests.  After decomposition, it's either REQ_FLUSH w/o data or data
> > request w/ or w/o REQ_FUA.  When the decomposed request reaches lower
> > queue, the lower queue will then either short-circuit it, execute
> > as-is or decompose data w/ REQ_FUA into data + REQ_FLUSH sequence.
> > 
> > AFAICS, the breakages are...
> > 
> > * ELEVATOR_INSERT_FLUSH not used properly from insert paths.
> > 
> > * Short circuit not kicking in for the dm requests. (the above and the
> >   policy patch should solve this, right?)
> > 
> > * BUG(!rq->bio || ...) in blk_insert_flush().  I think we can lift
> >   this restriction for empty REQ_FLUSH but also dm can just send down
> >   requests with empty bio.
> 
> [cc'ing dm-devel]
> 
> All of these issues have come to light because DM was not setting
> flush_flags based on the underlying device(s).  Now fixed in v3.1-rc1:
> ed8b752 dm table: set flush capability based on underlying devices
> 
> Given that commit, and that request-based DM is beneath the elevator, it
> seems any additional effort to have DM flushes re-enter the flush
> machinary is unnecessary.
> 
> We expect:
> 1) flushes to have gone through the flush machinary
> 2) no FLUSH/FUA should be entering underlying queues if not supported
> 
> I think it best to just document the expectation that any FLUSH/FUA
> request that enters blk_insert_cloned_request() will already match the
> queue that the request is being sent to.  One way to document it is to
> change Jeff's flag striping in to pure BUG_ON()s, e.g.:
> 
> ---
>  block/blk-core.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b627558..201bb27 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1710,6 +1710,14 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
>  	    should_fail_request(&rq->rq_disk->part0, blk_rq_bytes(rq)))
>  		return -EIO;
>  
> +	/*
> +	 * All FLUSH/FUA requests are expected to have gone through the
> +	 * flush machinary.  If a request's cmd_flags doesn't match the
> +	 * flush_flags of the underlying request_queue it is a bug.
> +	 */
> +	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
> +	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
> +

Actually this makes sense and is simple. :-) Is BUG_ON() too harsh, how
about WARN_ONCE() variants? To me system continues to work so warning 
is probably good enough.

Thanks
Vivek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Tejun Heo Aug. 9, 2011, 5:55 p.m. UTC | #3
On Tue, Aug 09, 2011 at 01:52:37PM -0400, Vivek Goyal wrote:
> > +	/*
> > +	 * All FLUSH/FUA requests are expected to have gone through the
> > +	 * flush machinary.  If a request's cmd_flags doesn't match the
> > +	 * flush_flags of the underlying request_queue it is a bug.
> > +	 */
> > +	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
> > +	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
> > +
> 
> Actually this makes sense and is simple. :-) Is BUG_ON() too harsh, how
> about WARN_ONCE() variants? To me system continues to work so warning 
> is probably good enough.

I don't really think this is the right thing to do.  It makes the
cloning a special case.  It looks easy now but puts it in its own
special box which usually ends up ugly in the long run.  Let's get the
thing fixed and let it play like others do.  There's nothing
fundamentally broken with the usual mechanism.

Thanks.
Mike Snitzer Aug. 9, 2011, 6:33 p.m. UTC | #4
On Tue, Aug 09 2011 at  1:51pm -0400,
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Tue, Aug 09, 2011 at 01:43:47PM -0400, Mike Snitzer wrote:
> > [cc'ing dm-devel]
> > 
> > All of these issues have come to light because DM was not setting
> > flush_flags based on the underlying device(s).  Now fixed in v3.1-rc1:
> > ed8b752 dm table: set flush capability based on underlying devices
> > 
> > Given that commit, and that request-based DM is beneath the elevator, it
> > seems any additional effort to have DM flushes re-enter the flush
> > machinary is unnecessary.
> 
> Hmmm... what if the underlying devices have different featureset?  Use
> the lowest common denominator?

Yes, for DM, if any device in the stack requires FLUSH/FUA then
the upper request_queue's flush_flags will be set to reflect that.

Bio-based DM _could_ end up issuing a flush to a device that doesn't
need the flush.  But once the bio emerges from the bottom of the
bio-based stack it'll get handed to the flush mechanism where it'll be
dropped.

Bio-based DM may stack ontop of request-based DM (think LVM ontop of
mpath device).  Request-based DM may _not_ stack on bio-based DM.

Request-based DM, the cause of all this discussion, is only used for the
multipath target and I'm not aware of any plans to provide more complex
stacking via request-based DM.  Doing so would require splitting a
request that spans target boundaries, etc.

So we're left with request-based DM only allowing a single target,
meaning: even if request-based DM were stacked a couple times the
flush_flags would still reflect the underlying physical device's queue.

> The flush mechanism is designed to
> deal with stacking by going through flush at each level.  Stacking
> queue can simply export support for both REQ_FLUSH and FUA and the
> lower lever queue will decompose them according to the capability
> supported by the actual device.
> 
> IIUC, what's broken here is some insert functions aren't using
> ELEVATOR_INSERT_FLUSH when needed and there are some issues due to the
> special nature of the stacked requests which I think should be fixed.

OK, conceptually pure, just seems the fixes are multiplying.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mike Snitzer Aug. 9, 2011, 6:55 p.m. UTC | #5
On Tue, Aug 09 2011 at  1:52pm -0400,
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Tue, Aug 09, 2011 at 01:43:47PM -0400, Mike Snitzer wrote:
> > On Tue, Aug 09 2011 at 12:13pm -0400,
> > Tejun Heo <tj@kernel.org> wrote:
> > 
> > > Hello,
> > > 
> > > On Tue, Aug 09, 2011 at 11:53:51AM -0400, Jeff Moyer wrote:
> > > > Tejun Heo <tj@kernel.org> writes:
> > > > > I'm a bit confused.  We still need ELEVATOR_INSERT_FLUSH fix for
> > > > > insertion paths, right?  Or is blk_insert_cloned_request() supposed to
> > > > > used only by request based dm which lives under the elevator?  If so,
> > > > > it would be great to make that explicit in the comment.  Maybe just
> > > > > renaming it to blk_insert_dm_cloned_request() would be better as it
> > > > > wouldn't be safe for other cases anyway.
> > > > 
> > > > request-based dm is the only caller at present.  I'm not a fan of
> > > > renaming the function, but I'm more than willing to comment it.
> > > 
> > > I'm still confused and don't think the patch is correct (you can't
> > > turn off REQ_FUA without decomposing it to data + post flush).
> > > 
> > > Going through flush machinery twice is okay and I think is the right
> > > thing to do.  At the upper queue, the request is decomposed to member
> > > requests.  After decomposition, it's either REQ_FLUSH w/o data or data
> > > request w/ or w/o REQ_FUA.  When the decomposed request reaches lower
> > > queue, the lower queue will then either short-circuit it, execute
> > > as-is or decompose data w/ REQ_FUA into data + REQ_FLUSH sequence.
> > > 
> > > AFAICS, the breakages are...
> > > 
> > > * ELEVATOR_INSERT_FLUSH not used properly from insert paths.
> > > 
> > > * Short circuit not kicking in for the dm requests. (the above and the
> > >   policy patch should solve this, right?)
> > > 
> > > * BUG(!rq->bio || ...) in blk_insert_flush().  I think we can lift
> > >   this restriction for empty REQ_FLUSH but also dm can just send down
> > >   requests with empty bio.
> > 
> > [cc'ing dm-devel]
> > 
> > All of these issues have come to light because DM was not setting
> > flush_flags based on the underlying device(s).  Now fixed in v3.1-rc1:
> > ed8b752 dm table: set flush capability based on underlying devices
> > 
> > Given that commit, and that request-based DM is beneath the elevator, it
> > seems any additional effort to have DM flushes re-enter the flush
> > machinary is unnecessary.
> > 
> > We expect:
> > 1) flushes to have gone through the flush machinary
> > 2) no FLUSH/FUA should be entering underlying queues if not supported
> > 
> > I think it best to just document the expectation that any FLUSH/FUA
> > request that enters blk_insert_cloned_request() will already match the
> > queue that the request is being sent to.  One way to document it is to
> > change Jeff's flag striping in to pure BUG_ON()s, e.g.:
> > 
> > ---
> >  block/blk-core.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index b627558..201bb27 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1710,6 +1710,14 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
> >  	    should_fail_request(&rq->rq_disk->part0, blk_rq_bytes(rq)))
> >  		return -EIO;
> >  
> > +	/*
> > +	 * All FLUSH/FUA requests are expected to have gone through the
> > +	 * flush machinary.  If a request's cmd_flags doesn't match the
> > +	 * flush_flags of the underlying request_queue it is a bug.
> > +	 */
> > +	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
> > +	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
> > +
> 
> Actually this makes sense and is simple. :-) Is BUG_ON() too harsh, how
> about WARN_ONCE() variants? To me system continues to work so warning 
> is probably good enough.

Sure, WARN_ONCE() is fine by me.

Seems Tejun wants a more involved fix though.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Vivek Goyal Aug. 9, 2011, 7:05 p.m. UTC | #6
On Tue, Aug 09, 2011 at 02:55:31PM -0400, Mike Snitzer wrote:

[..]
> > > +	/*
> > > +	 * All FLUSH/FUA requests are expected to have gone through the
> > > +	 * flush machinary.  If a request's cmd_flags doesn't match the
> > > +	 * flush_flags of the underlying request_queue it is a bug.
> > > +	 */
> > > +	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
> > > +	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
> > > +
> > 
> > Actually this makes sense and is simple. :-) Is BUG_ON() too harsh, how
> > about WARN_ONCE() variants? To me system continues to work so warning 
> > is probably good enough.
> 
> Sure, WARN_ONCE() is fine by me.
> 
> Seems Tejun wants a more involved fix though.

Fixing it properly doesn't hurt. Makes it more future proof. In fact I am
thinking what happens to blk_execute_rq() variants where one can prepare a
request and send it down. What if caller sets FLUSH/FUA flags there.

Thanks
Vivek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Jeff Moyer Aug. 10, 2011, 3:49 p.m. UTC | #7
Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Aug 09, 2011 at 02:55:31PM -0400, Mike Snitzer wrote:
>
> [..]
>> > > +	/*
>> > > +	 * All FLUSH/FUA requests are expected to have gone through the
>> > > +	 * flush machinary.  If a request's cmd_flags doesn't match the
>> > > +	 * flush_flags of the underlying request_queue it is a bug.
>> > > +	 */
>> > > +	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
>> > > +	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
>> > > +
>> > 
>> > Actually this makes sense and is simple. :-) Is BUG_ON() too harsh, how
>> > about WARN_ONCE() variants? To me system continues to work so warning 
>> > is probably good enough.
>> 
>> Sure, WARN_ONCE() is fine by me.
>> 
>> Seems Tejun wants a more involved fix though.
>
> Fixing it properly doesn't hurt. Makes it more future proof. In fact I am
> thinking what happens to blk_execute_rq() variants where one can prepare a
> request and send it down. What if caller sets FLUSH/FUA flags there.

Callers of blk_execute_rq are special.  Those aren't REQ_TYPE_FS
requests, and so the callers are responsible for doing their own
sequencing.

Cheers,
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index b627558..201bb27 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1710,6 +1710,14 @@  int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 	    should_fail_request(&rq->rq_disk->part0, blk_rq_bytes(rq)))
 		return -EIO;
 
+	/*
+	 * All FLUSH/FUA requests are expected to have gone through the
+	 * flush machinary.  If a request's cmd_flags doesn't match the
+	 * flush_flags of the underlying request_queue it is a bug.
+	 */
+	BUG_ON((rq->cmd_flags & REQ_FLUSH) && !(q->flush_flags & REQ_FLUSH));
+	BUG_ON((rq->cmd_flags & REQ_FUA) && !(q->flush_flags & REQ_FUA));
+
 	spin_lock_irqsave(q->queue_lock, flags);
 
 	/*