Message ID | 20575f0a-656e-4bb3-9d82-dec6c7e3a35c@kernel.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | block: cleanup and fix batch completion adding conditions | expand |
CC+: Alan, On Feb 13, 2025 / 08:18, Jens Axboe wrote: > The conditions for whether or not a request is allowed adding to a > completion batch are a bit hard to read, and they also have a few > issues. One is that ioerror may indeed be a random value on passthrough, > and it's being checked unconditionally of whether or not the given > request is a passthrough request or not. > > Rewrite the conditions to be separate for easier reading, and only check > ioerror for non-passthrough requests. This fixes an issue with bio > unmapping on passthrough, where it fails getting added to a batch. This > both leads to suboptimal performance, and may trigger a potential > schedule-under-atomic condition for polled passthrough IO. > > Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()") > Signed-off-by: Jens Axboe <axboe@kernel.dk> I observed the blktests test case nvme/039 failure with v6.14-rc3 kernel. I bisected and found that this patch in the v6.14-rc3 is the trigger. The test run output is as follows: nvme/039 => nvme0n1 (test error logging) [failed] runtime 5.378s ... 5.354s --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900 +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2025-02-19 16:13:05.061387179 +0900 @@ -1,7 +1,3 @@ Running nvme/039 - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0 - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0 Test complete The test case does error injection. Test method requires reconsideration to adjust to this kernel change, probably. Help for fix will be appreciated.
On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote: > CC+: Alan, > > On Feb 13, 2025 / 08:18, Jens Axboe wrote: >> The conditions for whether or not a request is allowed adding to a >> completion batch are a bit hard to read, and they also have a few >> issues. One is that ioerror may indeed be a random value on passthrough, >> and it's being checked unconditionally of whether or not the given >> request is a passthrough request or not. >> >> Rewrite the conditions to be separate for easier reading, and only check >> ioerror for non-passthrough requests. This fixes an issue with bio >> unmapping on passthrough, where it fails getting added to a batch. This >> both leads to suboptimal performance, and may trigger a potential >> schedule-under-atomic condition for polled passthrough IO. >> >> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()") >> Signed-off-by: Jens Axboe <axboe@kernel.dk> > I observed the blktests test case nvme/039 failure with v6.14-rc3 kernel. I > bisected and found that this patch in the v6.14-rc3 is the trigger. > > The test run output is as follows: > > nvme/039 => nvme0n1 (test error logging) [failed] > runtime 5.378s ... 5.354s > --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900 > +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2025-02-19 16:13:05.061387179 +0900 > @@ -1,7 +1,3 @@ > Running nvme/039 > - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR > - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR > - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR > Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0 > - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0 > Test complete > > > The test case does error injection. Test method requires reconsideration to > adjust to this kernel change, probably. Help for fix will be appreciated. Not really an issue with the test, rather the error injector is broken. I'll investigate. Alan
On 2/19/25 11:17 AM, alan.adamson@oracle.com wrote: > > On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote: >> CC+: Alan, >> >> On Feb 13, 2025 / 08:18, Jens Axboe wrote: >>> The conditions for whether or not a request is allowed adding to a >>> completion batch are a bit hard to read, and they also have a few >>> issues. One is that ioerror may indeed be a random value on >>> passthrough, >>> and it's being checked unconditionally of whether or not the given >>> request is a passthrough request or not. >>> >>> Rewrite the conditions to be separate for easier reading, and only >>> check >>> ioerror for non-passthrough requests. This fixes an issue with bio >>> unmapping on passthrough, where it fails getting added to a batch. This >>> both leads to suboptimal performance, and may trigger a potential >>> schedule-under-atomic condition for polled passthrough IO. >>> >>> Fixes: f794f3351f26 ("block: add support for >>> blk_mq_end_request_batch()") >>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> I observed the blktests test case nvme/039 failure with v6.14-rc3 >> kernel. I >> bisected and found that this patch in the v6.14-rc3 is the trigger. >> >> The test run output is as follows: >> >> nvme/039 => nvme0n1 (test error logging) [failed] >> runtime 5.378s ... 5.354s >> --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900 >> +++ >> /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad >> 2025-02-19 16:13:05.061387179 +0900 >> @@ -1,7 +1,3 @@ >> Running nvme/039 >> - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / >> sc 0x81) DNR >> - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR >> - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR >> Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1 >> cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0 >> - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR >> cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0 >> Test complete >> >> >> The test case does error injection. Test method requires >> reconsideration to >> adjust to this kernel change, probably. Help for fix will be >> appreciated. > > Not really an issue with the test, rather the error injector is > broken. I'll investigate. The following change allows the test to pass. - The check of (ioerror < 0) should be (ioerror != 0) - passthroughs can also have ioerror set, so false needs to be returned. This needs to be resolved. diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index fa2a76cc2f73..b2bd2ac40441 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct request *req, * 1) No batch container * 2) Has scheduler data attached * 3) Not a passthrough request and end_io set - * 4) Not a passthrough request and an ioerror + * 4) An ioerror */ if (!iob) return false; if (req->rq_flags & RQF_SCHED_TAGS) return false; + if (ioerror) + return false; if (!blk_rq_is_passthrough(req)) { if (req->end_io) return false; - if (ioerror < 0) - return false; } Alan
On Feb 19, 2025 / 17:04, alan.adamson@oracle.com wrote: > > On 2/19/25 11:17 AM, alan.adamson@oracle.com wrote: > > > > On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote: > > > CC+: Alan, > > > > > > On Feb 13, 2025 / 08:18, Jens Axboe wrote: > > > > The conditions for whether or not a request is allowed adding to a > > > > completion batch are a bit hard to read, and they also have a few > > > > issues. One is that ioerror may indeed be a random value on > > > > passthrough, > > > > and it's being checked unconditionally of whether or not the given > > > > request is a passthrough request or not. > > > > > > > > Rewrite the conditions to be separate for easier reading, and > > > > only check > > > > ioerror for non-passthrough requests. This fixes an issue with bio > > > > unmapping on passthrough, where it fails getting added to a batch. This > > > > both leads to suboptimal performance, and may trigger a potential > > > > schedule-under-atomic condition for polled passthrough IO. > > > > > > > > Fixes: f794f3351f26 ("block: add support for > > > > blk_mq_end_request_batch()") > > > > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > > I observed the blktests test case nvme/039 failure with v6.14-rc3 > > > kernel. I > > > bisected and found that this patch in the v6.14-rc3 is the trigger. > > > > > > The test run output is as follows: > > > > > > nvme/039 => nvme0n1 (test error logging) [failed] > > > runtime 5.378s ... 5.354s > > > --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900 > > > +++ > > > /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad > > > 2025-02-19 16:13:05.061387179 +0900 > > > @@ -1,7 +1,3 @@ > > > Running nvme/039 > > > - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 > > > / sc 0x81) DNR > > > - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR > > > - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR > > > Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR > > > cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0 > > > - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR > > > cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0 > > > Test complete > > > > > > > > > The test case does error injection. Test method requires > > > reconsideration to > > > adjust to this kernel change, probably. Help for fix will be > > > appreciated. > > > > Not really an issue with the test, rather the error injector is broken. > > I'll investigate. Alan, thank you for looking into this. Then it sounds that a fix is needed in kernel side. > > The following change allows the test to pass. > > - The check of (ioerror < 0) should be (ioerror != 0) > > - passthroughs can also have ioerror set, so false needs to be returned. > This needs to be resolved. > > > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h > index fa2a76cc2f73..b2bd2ac40441 100644 > --- a/include/linux/blk-mq.h > +++ b/include/linux/blk-mq.h > @@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct request > *req, > * 1) No batch container > * 2) Has scheduler data attached > * 3) Not a passthrough request and end_io set > - * 4) Not a passthrough request and an ioerror > + * 4) An ioerror > */ > if (!iob) > return false; > if (req->rq_flags & RQF_SCHED_TAGS) > return false; > + if (ioerror) > + return false; > if (!blk_rq_is_passthrough(req)) { > if (req->end_io) > return false; > - if (ioerror < 0) > - return false; > } I think the modification above essentially reverts the trigger commit, so it means that we can not achieve what the commit aimed at. I took a look in the call chain. Before the commit, the call chain was like this for passthrough commands: nvme_handle_cqe() blk_mq_add_to_batch() ... false is returned, then call nvme_pci_complete_rq() nvme_pci_complete_rq() nvme_complete_rq() nvme_end_req() nvme_log_err_passthrough() ... log is printed __nvme_end_req() ... the nvme command ends After the commit, it looks like this: nvme_handle_cqe() blk_mq_add_to_batch() ... true is returned. Set nvme_pci_complete_batch() nvme_pci_complete_batch() nvme_complete_batch() nvme_complete_batch() nvme_complete_batch_req() __nvme_end_req() ... log is not printed but the command ends, then nvme/039 failed Assuming the change by the trigger commit is required, I guess adding nvme_log_err_passthrough() call in nvme_complete_batch_req() will restore the log print feature for the passthrough case. As a trial, I created a patch below [*]. I confirmed it avoids the nvme/039 failure. (As Alan noted, still I also find the check for "ioerror != 0" is required for non-passthrough case, to make the test case pass.) Jens, if you have comments, please share. [*] diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 818d4e49aab5..33030dd9b76a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -496,6 +496,9 @@ void nvme_complete_batch_req(struct request *req) { trace_nvme_complete_rq(req); nvme_cleanup_cmd(req); + if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET) && + blk_rq_is_passthrough(req))) + nvme_log_err_passthru(req); __nvme_end_req(req); } EXPORT_SYMBOL_GPL(nvme_complete_batch_req); diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index fa2a76cc2f73..b891ed113306 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -874,7 +874,7 @@ static inline bool blk_mq_add_to_batch(struct request *req, if (!blk_rq_is_passthrough(req)) { if (req->end_io) return false; - if (ioerror < 0) + if (ioerror) return false; }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index a0a9007cc1e3..cc1cb5de8fb6 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -861,12 +861,22 @@ static inline bool blk_mq_add_to_batch(struct request *req, void (*complete)(struct io_comp_batch *)) { /* - * blk_mq_end_request_batch() can't end request allocated from - * sched tags + * Check various conditions that exclude batch processing: + * 1) No batch container + * 2) Has scheduler data attached + * 3) Not a passthrough request and end_io set + * 4) Not a passthrough request and an ioerror */ - if (!iob || (req->rq_flags & RQF_SCHED_TAGS) || ioerror || - (req->end_io && !blk_rq_is_passthrough(req))) + if (!iob) return false; + if (req->rq_flags & RQF_SCHED_TAGS) + return false; + if (!blk_rq_is_passthrough(req)) { + if (req->end_io) + return false; + if (ioerror < 0) + return false; + } if (!iob->complete) iob->complete = complete;
The conditions for whether or not a request is allowed adding to a completion batch are a bit hard to read, and they also have a few issues. One is that ioerror may indeed be a random value on passthrough, and it's being checked unconditionally of whether or not the given request is a passthrough request or not. Rewrite the conditions to be separate for easier reading, and only check ioerror for non-passthrough requests. This fixes an issue with bio unmapping on passthrough, where it fails getting added to a batch. This both leads to suboptimal performance, and may trigger a potential schedule-under-atomic condition for polled passthrough IO. Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()") Signed-off-by: Jens Axboe <axboe@kernel.dk> ---