diff mbox series

[v2,1/4] blk-flush: flush_rq should inherit first_rq's cmd_flags

Message ID 20230725130102.3030032-2-chengming.zhou@linux.dev (mailing list archive)
State New, archived
Headers show
Series blk-flush: optimize non-postflush requests | expand

Commit Message

Chengming Zhou July 25, 2023, 1 p.m. UTC
From: Chengming Zhou <zhouchengming@bytedance.com>

The cmd_flags in blk_kick_flush() should inherit the original request's
cmd_flags, but the current code looks buggy to me:

flush_end_io()
  blk_flush_complete_seq() // requests on flush running list
    blk_kick_flush()

So the request passed to blk_flush_complete_seq() may will be ended
before blk_kick_flush().
On the other hand, flush_rq will inherit first_rq's tag, it should
use first_rq's cmd_flags too.

This patch is just preparation for the following patches, no bugfix
intended.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 block/blk-flush.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

Comments

Christoph Hellwig July 31, 2023, 6:09 a.m. UTC | #1
On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> The cmd_flags in blk_kick_flush() should inherit the original request's
> cmd_flags, but the current code looks buggy to me:

Should it?  I know the code is kinda trying to do it, but does it really
make sense?  Adding Hannes who originally added this inheritance and
discussing the details below:

>  	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);

Two cases here:

 1) REQ_FAILFAST_MASK:  I don't think this is actually set on flush request
    currently, and even if it was applying it to the flush that serves more
    than a single originating command seems wrong to me.
 2) REQ_DRV is only set by drivers that have seen a bio.  For dm this
    is used as REQ_DM_POLL_LIST which should never be set for a flush/fua
    request.  For nvme-mpath it is REQ_NVME_MPATH, which is set in the
    bio based driver and used for decision making in the I/O completion
    handler.  So I guess this one actually does need to get passed
    through.
Chengming Zhou July 31, 2023, 2:02 p.m. UTC | #2
On 2023/7/31 14:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
> 
> Should it?  I know the code is kinda trying to do it, but does it really
> make sense?  Adding Hannes who originally added this inheritance and
> discussing the details below:

I'm not sure, actually I don't get what the current code is doing...
Hope Hannes could provide some details.

blk_flush_complete_seq(rq) -> blk_kick_flush(rq->cmd_flags)

flush_rq will use the cmd_flags of request which just complete a sequence,
there are three cases:

1. blk_insert_flush(rq):	rq is pending, wait for flush
2. flush_end_io(flush_rq):	rq flush seq done
3. mq_flush_data_end_io(rq):	rq data seq done

Only in the 1st case, the rq is the pending request that wait for flush_rq.
In the 2nd and 3rd cases, the rq has nothing to do with the next flush_rq?

So it's more reasonable for flush_rq to use its pending first_rq's cmd_flags?

> 
>>  	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
>> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
>> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
>> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
> 
> Two cases here:
> 
>  1) REQ_FAILFAST_MASK:  I don't think this is actually set on flush request
>     currently, and even if it was applying it to the flush that serves more
>     than a single originating command seems wrong to me.
>  2) REQ_DRV is only set by drivers that have seen a bio.  For dm this
>     is used as REQ_DM_POLL_LIST which should never be set for a flush/fua
>     request.  For nvme-mpath it is REQ_NVME_MPATH, which is set in the
>     bio based driver and used for decision making in the I/O completion
>     handler.  So I guess this one actually does need to get passed
>     through.
> 

The commit 84fca1b0c461 ("block: pass failfast and driver-specific flags to
flush requests") says:
If flush requests are being sent to the device we need to inherit the
failfast and driver-specific flags, too, otherwise I/O will fail.

1) REQ_FAILFAST_MASK: agree, shouldn't set to the flush_rq I think?
2) REQ_DRV: I don't get why this flag not set would cause I/O fail?

Thanks!
Christoph Hellwig July 31, 2023, 2:09 p.m. UTC | #3
On Mon, Jul 31, 2023 at 10:02:39PM +0800, Chengming Zhou wrote:
> The commit 84fca1b0c461 ("block: pass failfast and driver-specific flags to
> flush requests") says:
> If flush requests are being sent to the device we need to inherit the
> failfast and driver-specific flags, too, otherwise I/O will fail.
> 
> 1) REQ_FAILFAST_MASK: agree, shouldn't set to the flush_rq I think?
> 2) REQ_DRV: I don't get why this flag not set would cause I/O fail?

I don't think it would fail I/O by it's own, but it will cause the
nvme driver to not do the correct handling of an error when a flush
is set to multipath setups.
Hannes Reinecke July 31, 2023, 4:28 p.m. UTC | #4
On 7/31/23 08:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
> 
> Should it?  I know the code is kinda trying to do it, but does it really
> make sense?  Adding Hannes who originally added this inheritance and
> discussing the details below:
> 
Yeah, it does.
The flush machinery is sending flushes before and/or after the original 
request (preflush/postflush). For blocked transports (ie during FC RSCN 
handling) the transport will error out commands depending on the 
FAILFAST setting. If FAILFAST is set the SCSI layer gets an 
STS_TRANSPORT error (causing the I/O to be retried), but STS_ERROR if 
not set (causing I/O to failed).

So if the FAILFAST setting is _not_ aligned between flush_rq and the 
original we'll get an error on the flush rq and a retry on the original 
rq, causing the entire command to fail.

I guess we need to align them.

Cheers,

Hannes
Christoph Hellwig Aug. 1, 2023, 11:04 a.m. UTC | #5
On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
> The flush machinery is sending flushes before and/or after the original 
> request (preflush/postflush). For blocked transports (ie during FC RSCN 
> handling) the transport will error out commands depending on the FAILFAST 
> setting. If FAILFAST is set the SCSI layer gets an STS_TRANSPORT error 
> (causing the I/O to be retried), but STS_ERROR if not set (causing I/O to 
> failed).
>
> So if the FAILFAST setting is _not_ aligned between flush_rq and the 
> original we'll get an error on the flush rq and a retry on the original rq, 
> causing the entire command to fail.
>
> I guess we need to align them.

But you can't, because multiple pre/postflushes are coalesced into a
single outstanding flush request.  They can and will not match quite
commonly.
Christoph Hellwig Aug. 1, 2023, 11:06 a.m. UTC | #6
On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
> > The flush machinery is sending flushes before and/or after the original 
> > request (preflush/postflush). For blocked transports (ie during FC RSCN 
> > handling) the transport will error out commands depending on the FAILFAST 
> > setting. If FAILFAST is set the SCSI layer gets an STS_TRANSPORT error 
> > (causing the I/O to be retried), but STS_ERROR if not set (causing I/O to 
> > failed).
> >
> > So if the FAILFAST setting is _not_ aligned between flush_rq and the 
> > original we'll get an error on the flush rq and a retry on the original rq, 
> > causing the entire command to fail.
> >
> > I guess we need to align them.
> 
> But you can't, because multiple pre/postflushes are coalesced into a
> single outstanding flush request.  They can and will not match quite
> commonly.

And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
never even see the flush state machine, as that is run in dm-mpath
which then inserts the fully built flush request into the lower request
queue.  At least for request based multipath, bio could hit it.
Chengming Zhou Aug. 3, 2023, 3:35 p.m. UTC | #7
On 2023/8/1 19:06, Christoph Hellwig wrote:
> On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
>>> The flush machinery is sending flushes before and/or after the original 
>>> request (preflush/postflush). For blocked transports (ie during FC RSCN 
>>> handling) the transport will error out commands depending on the FAILFAST 
>>> setting. If FAILFAST is set the SCSI layer gets an STS_TRANSPORT error 
>>> (causing the I/O to be retried), but STS_ERROR if not set (causing I/O to 
>>> failed).
>>>
>>> So if the FAILFAST setting is _not_ aligned between flush_rq and the 
>>> original we'll get an error on the flush rq and a retry on the original rq, 
>>> causing the entire command to fail.
>>>
>>> I guess we need to align them.
>>
>> But you can't, because multiple pre/postflushes are coalesced into a
>> single outstanding flush request.  They can and will not match quite
>> commonly.
> 
> And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
> never even see the flush state machine, as that is run in dm-mpath
> which then inserts the fully built flush request into the lower request
> queue.  At least for request based multipath, bio could hit it.

Yes, multiple pre/postflushes are coalesced into a single flush request.
So we can't figure out which request to use.

From the above explanation, can we just drop this inherit logic? It seems
strange or wrong here.

Thanks.
diff mbox series

Patch

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..fc25228f7bb1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -92,7 +92,7 @@  enum {
 };
 
 static void blk_kick_flush(struct request_queue *q,
-			   struct blk_flush_queue *fq, blk_opf_t flags);
+			   struct blk_flush_queue *fq);
 
 static inline struct blk_flush_queue *
 blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
@@ -166,11 +166,9 @@  static void blk_flush_complete_seq(struct request *rq,
 {
 	struct request_queue *q = rq->q;
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
-	blk_opf_t cmd_flags;
 
 	BUG_ON(rq->flush.seq & seq);
 	rq->flush.seq |= seq;
-	cmd_flags = rq->cmd_flags;
 
 	if (likely(!error))
 		seq = blk_flush_cur_seq(rq);
@@ -210,7 +208,7 @@  static void blk_flush_complete_seq(struct request *rq,
 		BUG();
 	}
 
-	blk_kick_flush(q, fq, cmd_flags);
+	blk_kick_flush(q, fq);
 }
 
 static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
@@ -277,7 +275,6 @@  bool is_flush_rq(struct request *rq)
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
  * @fq: flush queue
- * @flags: cmd_flags of the original request
  *
  * Flush related states of @q have changed, consider issuing flush request.
  * Please read the comment at the top of this file for more info.
@@ -286,8 +283,7 @@  bool is_flush_rq(struct request *rq)
  * spin_lock_irq(fq->mq_flush_lock)
  *
  */
-static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
-			   blk_opf_t flags)
+static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 {
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
@@ -336,7 +332,8 @@  static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 		flush_rq->internal_tag = first_rq->internal_tag;
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
-	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
+	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
+			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
 	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
 	flush_rq->end_io = flush_end_io;
 	/*