Message ID: 20230725130102.3030032-2-chengming.zhou@linux.dev
State:      New, archived
Series:     blk-flush: optimize non-postflush requests
On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> The cmd_flags in blk_kick_flush() should inherit the original request's
> cmd_flags, but the current code looks buggy to me:

Should it? I know the code is kinda trying to do it, but does it really
make sense? Adding Hannes who originally added this inheritance and
discussing the details below:

> 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);

Two cases here:

1) REQ_FAILFAST_MASK: I don't think this is actually set on flush
   requests currently, and even if it were, applying it to a flush that
   serves more than a single originating command seems wrong to me.
2) REQ_DRV is only set by drivers that have seen a bio. For dm this is
   used as REQ_DM_POLL_LIST, which should never be set for a flush/fua
   request. For nvme-mpath it is REQ_NVME_MPATH, which is set in the
   bio based driver and used for decision making in the I/O completion
   handler. So I guess this one actually does need to get passed
   through.
On 2023/7/31 14:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
>
> Should it? I know the code is kinda trying to do it, but does it really
> make sense? Adding Hannes who originally added this inheritance and
> discussing the details below:

I'm not sure; actually I don't get what the current code is doing...
I hope Hannes can provide some details.

blk_flush_complete_seq(rq)
    -> blk_kick_flush(rq->cmd_flags)

flush_rq uses the cmd_flags of the request which just completed a
sequence step. There are three callers:

1. blk_insert_flush(rq): rq is pending, waiting for flush
2. flush_end_io(flush_rq): rq flush seq done
3. mq_flush_data_end_io(rq): rq data seq done

Only in the 1st case is rq the pending request that waits for the
flush_rq. In the 2nd and 3rd cases, rq has nothing to do with the next
flush_rq? So it's more reasonable for the flush_rq to use its pending
first_rq's cmd_flags?

>> 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
>> -	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
>> +	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
>> +			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
>
> Two cases here:
>
> 1) REQ_FAILFAST_MASK: I don't think this is actually set on flush
>    requests currently, and even if it were, applying it to a flush that
>    serves more than a single originating command seems wrong to me.
> 2) REQ_DRV is only set by drivers that have seen a bio. For dm this is
>    used as REQ_DM_POLL_LIST, which should never be set for a flush/fua
>    request. For nvme-mpath it is REQ_NVME_MPATH, which is set in the
>    bio based driver and used for decision making in the I/O completion
>    handler. So I guess this one actually does need to get passed
>    through.
The commit 84fca1b0c461 ("block: pass failfast and driver-specific flags
to flush requests") says:

	If flush requests are being sent to the device we need to
	inherit the failfast and driver-specific flags, too, otherwise
	I/O will fail.

1) REQ_FAILFAST_MASK: agreed, it shouldn't be set on the flush_rq, I think?
2) REQ_DRV: I don't get why leaving this flag unset would cause I/O to fail?

Thanks!
On Mon, Jul 31, 2023 at 10:02:39PM +0800, Chengming Zhou wrote:
> The commit 84fca1b0c461 ("block: pass failfast and driver-specific flags
> to flush requests") says:
>
> 	If flush requests are being sent to the device we need to
> 	inherit the failfast and driver-specific flags, too, otherwise
> 	I/O will fail.
>
> 1) REQ_FAILFAST_MASK: agreed, it shouldn't be set on the flush_rq, I think?
> 2) REQ_DRV: I don't get why leaving this flag unset would cause I/O to fail?

I don't think it would fail I/O on its own, but it will cause the nvme
driver to not do the correct handling of an error when a flush is sent
to a multipath setup.
On 7/31/23 08:09, Christoph Hellwig wrote:
> On Tue, Jul 25, 2023 at 09:00:59PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The cmd_flags in blk_kick_flush() should inherit the original request's
>> cmd_flags, but the current code looks buggy to me:
>
> Should it? I know the code is kinda trying to do it, but does it really
> make sense? Adding Hannes who originally added this inheritance and
> discussing the details below:

Yeah, it does. The flush machinery is sending flushes before and/or
after the original request (preflush/postflush). For blocked transports
(ie during FC RSCN handling) the transport will error out commands
depending on the FAILFAST setting. If FAILFAST is set the SCSI layer
gets an STS_TRANSPORT error (causing the I/O to be retried), but
STS_ERROR if not set (causing the I/O to fail).

So if the FAILFAST setting is _not_ aligned between the flush_rq and the
original rq we'll get an error on the flush rq and a retry on the
original rq, causing the entire command to fail.

I guess we need to align them.

Cheers,

Hannes
On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
> The flush machinery is sending flushes before and/or after the original
> request (preflush/postflush). For blocked transports (ie during FC RSCN
> handling) the transport will error out commands depending on the
> FAILFAST setting. If FAILFAST is set the SCSI layer gets an
> STS_TRANSPORT error (causing the I/O to be retried), but STS_ERROR if
> not set (causing the I/O to fail).
>
> So if the FAILFAST setting is _not_ aligned between the flush_rq and
> the original rq we'll get an error on the flush rq and a retry on the
> original rq, causing the entire command to fail.
>
> I guess we need to align them.

But you can't, because multiple pre/postflushes are coalesced into a
single outstanding flush request. They can and will not match quite
commonly.
On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
> > The flush machinery is sending flushes before and/or after the
> > original request (preflush/postflush). For blocked transports (ie
> > during FC RSCN handling) the transport will error out commands
> > depending on the FAILFAST setting. If FAILFAST is set the SCSI layer
> > gets an STS_TRANSPORT error (causing the I/O to be retried), but
> > STS_ERROR if not set (causing the I/O to fail).
> >
> > So if the FAILFAST setting is _not_ aligned between the flush_rq and
> > the original rq we'll get an error on the flush rq and a retry on the
> > original rq, causing the entire command to fail.
> >
> > I guess we need to align them.
>
> But you can't, because multiple pre/postflushes are coalesced into a
> single outstanding flush request. They can and will not match quite
> commonly.

And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
never even see the flush state machine, as that is run in dm-mpath
which then inserts the fully built flush request into the lower request
queue. At least for request based multipath; bio based multipath could
hit it.
On 2023/8/1 19:06, Christoph Hellwig wrote:
> On Tue, Aug 01, 2023 at 01:04:32PM +0200, Christoph Hellwig wrote:
>> On Mon, Jul 31, 2023 at 06:28:01PM +0200, Hannes Reinecke wrote:
>>> The flush machinery is sending flushes before and/or after the
>>> original request (preflush/postflush). For blocked transports (ie
>>> during FC RSCN handling) the transport will error out commands
>>> depending on the FAILFAST setting. If FAILFAST is set the SCSI layer
>>> gets an STS_TRANSPORT error (causing the I/O to be retried), but
>>> STS_ERROR if not set (causing the I/O to fail).
>>>
>>> So if the FAILFAST setting is _not_ aligned between the flush_rq and
>>> the original rq we'll get an error on the flush rq and a retry on the
>>> original rq, causing the entire command to fail.
>>>
>>> I guess we need to align them.
>>
>> But you can't, because multiple pre/postflushes are coalesced into a
>> single outstanding flush request. They can and will not match quite
>> commonly.
>
> And if you mean the REQ_FAILFAST_TRANSPORT added by dm - this will
> never even see the flush state machine, as that is run in dm-mpath
> which then inserts the fully built flush request into the lower request
> queue. At least for request based multipath; bio based multipath could
> hit it.

Yes, multiple pre/postflushes are coalesced into a single flush request,
so we can't figure out which request's flags to use.

From the above explanation, can we just drop this inheritance logic?
It seems strange or wrong here.

Thanks.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..fc25228f7bb1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -92,7 +92,7 @@ enum {
 };
 
 static void blk_kick_flush(struct request_queue *q,
-			   struct blk_flush_queue *fq, blk_opf_t flags);
+			   struct blk_flush_queue *fq);
 
 static inline struct blk_flush_queue *
 blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
@@ -166,11 +166,9 @@ static void blk_flush_complete_seq(struct request *rq,
 {
 	struct request_queue *q = rq->q;
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
-	blk_opf_t cmd_flags;
 
 	BUG_ON(rq->flush.seq & seq);
 	rq->flush.seq |= seq;
-	cmd_flags = rq->cmd_flags;
 
 	if (likely(!error))
 		seq = blk_flush_cur_seq(rq);
@@ -210,7 +208,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		BUG();
 	}
 
-	blk_kick_flush(q, fq, cmd_flags);
+	blk_kick_flush(q, fq);
 }
 
 static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
@@ -277,7 +275,6 @@ bool is_flush_rq(struct request *rq)
  * blk_kick_flush - consider issuing flush request
  * @q: request_queue being kicked
  * @fq: flush queue
- * @flags: cmd_flags of the original request
  *
  * Flush related states of @q have changed, consider issuing flush request.
  * Please read the comment at the top of this file for more info.
@@ -286,8 +283,7 @@ bool is_flush_rq(struct request *rq)
  * spin_lock_irq(fq->mq_flush_lock)
  *
  */
-static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
-			   blk_opf_t flags)
+static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 {
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
@@ -336,7 +332,8 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	flush_rq->internal_tag = first_rq->internal_tag;
 
 	flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
-	flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
+	flush_rq->cmd_flags |= (first_rq->cmd_flags & REQ_DRV) |
+			       (first_rq->cmd_flags & REQ_FAILFAST_MASK);
 	flush_rq->rq_flags |= RQF_FLUSH_SEQ;
 	flush_rq->end_io = flush_end_io;
 	/*