Message ID | 20180129203300.50595-1-snitzer@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote: > + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART > + * bit is set, run queue after 10ms to avoid IO stalls > + * that could otherwise occur if the queue is idle. > */ > - if (!blk_mq_sched_needs_restart(hctx) || > + needs_restart = blk_mq_sched_needs_restart(hctx); > + if (!needs_restart || > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > blk_mq_run_hw_queue(hctx, true); > + else if (needs_restart && (ret == BLK_STS_RESOURCE)) > + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); > } The above code only calls blk_mq_delay_run_hw_queue() if the following condition is met: !(!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) && (needs_restart && (ret == BLK_STS_RESOURCE)). That boolean expression can be simplified into the following: needs_restart && ret == BLK_STS_RESOURCE && !(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)). From blk-mq-sched.h: static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx) { return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); } In other words, the above code will not call blk_mq_delay_run_hw_queue() if BLK_MQ_S_SCHED_RESTART is cleared after it got set and before the above code is reached. The BLK_MQ_S_SCHED_RESTART bit gets cleared if a request completes concurrently with the above code. From blk-mq-sched.c: static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) { if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) return false; if (hctx->flags & BLK_MQ_F_TAG_SHARED) { struct request_queue *q = hctx->queue; if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) atomic_dec(&q->shared_hctx_restart); } else clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); return blk_mq_run_hw_queue(hctx, true); } The above blk_mq_run_hw_queue() call may finish either before or after blk_mq_dispatch_rq_list() examines the BLK_MQ_S_SCHED_RESTART flag. That's why I wrote in previous e-mails that this patch and also the previous versions of this patch change a systematic call of blk_mq_delay_run_hw_queue() into a call that may or may not happen, depending on whether or not a request completes concurrently with request queueing. Sorry but I think that means that the above change combined with changing BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and why I expect that this will result in sporadic queue stalls. As you know sporadic queue stalls are annoying and hard to debug. Plenty of e-mails have already been exchanged about versions one to four of this patch but so far nobody has even tried to contradict the above. Bart.
On Mon, Jan 29 2018 at 4:22pm -0500, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote: > > + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART > > + * bit is set, run queue after 10ms to avoid IO stalls > > + * that could otherwise occur if the queue is idle. > > */ > > - if (!blk_mq_sched_needs_restart(hctx) || > > + needs_restart = blk_mq_sched_needs_restart(hctx); > > + if (!needs_restart || > > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > > blk_mq_run_hw_queue(hctx, true); > > + else if (needs_restart && (ret == BLK_STS_RESOURCE)) > > + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); > > } > > The above code only calls blk_mq_delay_run_hw_queue() if the following condition > is met: !(!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > && (needs_restart && (ret == BLK_STS_RESOURCE)). That boolean expression can be > simplified into the following: needs_restart && ret == BLK_STS_RESOURCE && > !(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)). From blk-mq-sched.h: > > static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx) > { > return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > } > > In other words, the above code will not call blk_mq_delay_run_hw_queue() if > BLK_MQ_S_SCHED_RESTART is cleared after it got set and before the above code is > reached. The BLK_MQ_S_SCHED_RESTART bit gets cleared if a request completes > concurrently with the above code. From blk-mq-sched.c: > > static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) > { > if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > return false; > > if (hctx->flags & BLK_MQ_F_TAG_SHARED) { > struct request_queue *q = hctx->queue; > > if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > atomic_dec(&q->shared_hctx_restart); > } else > clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > > return blk_mq_run_hw_queue(hctx, true); > } > > The above blk_mq_run_hw_queue() call may finish either before or after > blk_mq_dispatch_rq_list() examines the BLK_MQ_S_SCHED_RESTART flag. But regardless of which wins the race, the queue will have been run. Which is all we care about right? > That's why I wrote in previous e-mails that this patch and also the previous > versions of this patch change a systematic call of blk_mq_delay_run_hw_queue() > into a call that may or may not happen, depending on whether or not a request > completes concurrently with request queueing. Sorry but I think that means > that the above change combined with changing BLK_STS_RESOURCE into > BLK_STS_DEV_RESOURCE is wrong and why I expect that this will result in > sporadic queue stalls. As you know sporadic queue stalls are annoying and hard > to debug. > > Plenty of e-mails have already been exchanged about versions one to four of this > patch but so far nobody has even tried to contradict the above. Sure, but we aren't mind-readers. We're all very open to the idea that you have noticed something we haven't. But until your very helpful mail I'm replying to, you hadn't been specific about the race that concerned you. Thanks for sharing. I'll defer to Jens and/or Ming on whether the race you've raised is of concern. As I said above, I'm inclined to think it doesn't matter who wins the race given the queue will be run again. And when it does it may get STS_RESOURCE again, and in that case it may then resort to blk_mq_delay_run_hw_queue() in blk_mq_dispatch_rq_list(). Mike
On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: > But regardless of which wins the race, the queue will have been run. > Which is all we care about right? Running the queue is not sufficient. With this patch applied it can happen that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more concurrent queue runs finish before sufficient device resources are available to execute the request and that blk_mq_delay_run_hw_queue() does not get called at all. If no other activity triggers a queue run, e.g. request completion, this will result in a queue stall. Bart.
On Mon, Jan 29 2018 at 4:51pm -0500, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: > > But regardless of which wins the race, the queue will have been run. > > Which is all we care about right? > > Running the queue is not sufficient. With this patch applied it can happen > that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more > concurrent queue runs finish before sufficient device resources are available > to execute the request and that blk_mq_delay_run_hw_queue() does not get > called at all. If no other activity triggers a queue run, e.g. request > completion, this will result in a queue stall. If BLK_STS_DEV_RESOURCE is returned then the driver doesn't need to rely on a future queue run. IIUC, that is the entire premise of BLK_STS_DEV_RESOURCE. If the driver had doubt about whether the resource were going to be available in the future it should return BLK_STS_RESOURCE. That may seem like putting a lot on a driver developer (to decide between the 2) but I'll again defer to Jens here. This was the approach he was advocating be pursued. In practice it is working. Would welcome you testing this V4. Thanks, Mike
On Mon, 2018-01-29 at 17:14 -0500, Mike Snitzer wrote: > On Mon, Jan 29 2018 at 4:51pm -0500, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: > > > But regardless of which wins the race, the queue will have been run. > > > Which is all we care about right? > > > > Running the queue is not sufficient. With this patch applied it can happen > > that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more > > concurrent queue runs finish before sufficient device resources are available > > to execute the request and that blk_mq_delay_run_hw_queue() does not get > > called at all. If no other activity triggers a queue run, e.g. request > > completion, this will result in a queue stall. > > If BLK_STS_DEV_RESOURCE is returned then the driver doesn't need to rely > on a future queue run. IIUC, that is the entire premise of > BLK_STS_DEV_RESOURCE. If the driver had doubt about whether the > resource were going to be available in the future it should return > BLK_STS_RESOURCE. > > That may seem like putting a lot on a driver developer (to decide > between the 2) but I'll again defer to Jens here. This was the approach > he was advocating be pursued. Hello Mike, There was a typo in my reply: I should have referred to BLK_STS_RESOURCE instead of BLK_STS_DEV_RESOURCE. The convention for BLK_STS_DEV_RESOURCE is namely that if .queue_rq() returns that status code that the block driver guarantees that the queue will be rerun. Bart.
On Tue, Jan 30, 2018 at 5:22 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-01-29 at 15:33 -0500, Mike Snitzer wrote: >> + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART >> + * bit is set, run queue after 10ms to avoid IO stalls >> + * that could otherwise occur if the queue is idle. >> */ >> - if (!blk_mq_sched_needs_restart(hctx) || >> + needs_restart = blk_mq_sched_needs_restart(hctx); >> + if (!needs_restart || >> (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) >> blk_mq_run_hw_queue(hctx, true); >> + else if (needs_restart && (ret == BLK_STS_RESOURCE)) >> + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); >> } > > The above code only calls blk_mq_delay_run_hw_queue() if the following condition > is met: !(!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > && (needs_restart && (ret == BLK_STS_RESOURCE)). That boolean expression can be > simplified into the following: needs_restart && ret == BLK_STS_RESOURCE && > !(no_tag && list_empty_careful(&hctx->dispatch_wait.entry)). From blk-mq-sched.h: No, the expression of (needs_restart && (ret == BLK_STS_RESOURCE)) clearly expresses what we want to do. When RESTART is working, and meantime BLK_STS_RESOURCE is returned from driver, we need to avoid IO hang by blk_mq_delay_run_hw_queue(). > > static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx) > { > return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > } > > In other words, the above code will not call blk_mq_delay_run_hw_queue() if > BLK_MQ_S_SCHED_RESTART is cleared after it got set and before the above code is > reached. The BLK_MQ_S_SCHED_RESTART bit gets cleared if a request completes > concurrently with the above code. From blk-mq-sched.c: That won't be an issue, given any time the SCHED_RESTART is cleared, the queue will be run again, so we needn't to worry about that. > > static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx) > { > if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > return false; > > if (hctx->flags & BLK_MQ_F_TAG_SHARED) { > struct request_queue *q = hctx->queue; > > if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) > atomic_dec(&q->shared_hctx_restart); > } else > clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); > > return blk_mq_run_hw_queue(hctx, true); > } > > The above blk_mq_run_hw_queue() call may finish either before or after If the above blk_mq_run_hw_queue() finishes before blk_mq_dispatch_rq_list() examines the flag, blk_mq_dispatch_rq_list() will see the flag cleared, and run queue again by the following code branch: if (!blk_mq_sched_needs_restart(hctx) || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); If blk_mq_run_hw_queue() finishes after blk_mq_dispatch_rq_list() examines the flag, this blk_mq_run_hw_queue() will see the new added request, and still everything is fine. Even blk_mq_delay_run_hw_queue() can be started too. > blk_mq_dispatch_rq_list() examines the BLK_MQ_S_SCHED_RESTART flag. > > That's why I wrote in previous e-mails that this patch and also the previous > versions of this patch change a systematic call of blk_mq_delay_run_hw_queue() > into a call that may or may not happen, depending on whether or not a request > completes concurrently with request queueing. Sorry but I think that means > that the above change combined with changing BLK_STS_RESOURCE into > BLK_STS_DEV_RESOURCE is wrong and why I expect that this will result in > sporadic queue stalls. As you know sporadic queue stalls are annoying and hard > to debug. Now there is debugfs, it isn't difficult to deal with that if reporter'd like to cowork. > > Plenty of e-mails have already been exchanged about versions one to four of this > patch but so far nobody has even tried to contradict the above. No, I don't see the issue, let's revisit the main change again: + needs_restart = blk_mq_sched_needs_restart(hctx); + if (!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); + else if (needs_restart && (ret == BLK_STS_RESOURCE)) + blk_mq_delay_run_hw_queue(hctx, 10); If SCHED_RESTART isn't set, queue is run immediately, otherwise when BLK_STS_RESOURCE is returned, we avoid IO hang by blk_mq_delay_run_hw_queue(hctx, XXX). And we don't cover BLK_STS_DEV_RESOURCE above because it is documented clearly that BLK_STS_DEV_RESOURCE is only returned by driver iff queue will be rerun in future if resource is available. Is there any hole in above code? Thanks, Ming Lei
On Tue, Jan 30, 2018 at 5:51 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: >> But regardless of which wins the race, the queue will have been run. >> Which is all we care about right? > > Running the queue is not sufficient. With this patch applied it can happen > that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more > concurrent queue runs finish before sufficient device resources are available > to execute the request and that blk_mq_delay_run_hw_queue() does not get > called at all. If no other activity triggers a queue run, e.g. request > completion, this will result in a queue stall. Please see document of BLK_STS_DEV_RESOURCE: + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device + * related resource is unavailable, but driver can guarantee that queue + * will be rerun in future once the resource is available (whereby + * dispatching requests). I have explained the SCSI's BLK_STS_DEV_RESOURCE conversion in another thread, let's know if you have further concern: https://marc.info/?l=linux-block&m=151727454815018&w=2
On Tue, Jan 30, 2018 at 6:14 AM, Mike Snitzer <snitzer@redhat.com> wrote: > On Mon, Jan 29 2018 at 4:51pm -0500, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > >> On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: >> > But regardless of which wins the race, the queue will have been run. >> > Which is all we care about right? >> >> Running the queue is not sufficient. With this patch applied it can happen >> that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more >> concurrent queue runs finish before sufficient device resources are available >> to execute the request and that blk_mq_delay_run_hw_queue() does not get >> called at all. If no other activity triggers a queue run, e.g. request >> completion, this will result in a queue stall. > > If BLK_STS_DEV_RESOURCE is returned then the driver doesn't need to rely > on a future queue run. IIUC, that is the entire premise of > BLK_STS_DEV_RESOURCE. If the driver had doubt about whether the > resource were going to be available in the future it should return > BLK_STS_RESOURCE. > > That may seem like putting a lot on a driver developer (to decide > between the 2) but I'll again defer to Jens here. This was the approach > he was advocating be pursued. Thinking of further, maybe you can add the following description in V5, and it should be much easier for driver developer to follow: When any resource allocation fails, if driver can make sure that there is any in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE to blk-mq, that is exactly what scsi_queue_rq() is doing. Follows the theory: 1) driver returns BLK_STS_DEV_RESOURCE if driver figures out there is any in-flight IO, in case of any resource allocation failure 2) If all these in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 3) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 2) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() since this request is added to hctx->dispatch already And there are two invariants when driver returns BLK_STS_DEV_RESOURCE iff there is any in-flight IOs: 1) SCHED_RESTART must be zero if no in-flight IOs 2) there has to be any IO in-flight if SCHED_RESTART is read as 1 Thanks, Ming
>>>>> "Bart" == Bart Van Assche <Bart.VanAssche@wdc.com> writes: Bart> On Mon, 2018-01-29 at 16:44 -0500, Mike Snitzer wrote: >> But regardless of which wins the race, the queue will have been run. >> Which is all we care about right? Bart> Running the queue is not sufficient. With this patch applied it can happen Bart> that the block driver returns BLK_STS_DEV_RESOURCE, that the two or more Bart> concurrent queue runs finish before sufficient device resources are available Bart> to execute the request and that blk_mq_delay_run_hw_queue() does not get Bart> called at all. If no other activity triggers a queue run, e.g. request Bart> completion, this will result in a queue stall. Doesn't this argue that you really want some sort of completions to be run in this case instead? Instead of busy looping or waiting for a set amount of time, just fire off a callback to run once you have the resources available, no? I can't provide any code examples, I don't know the code base nearly well enough. John
On Thu, 2018-02-01 at 19:26 -0500, John Stoffel wrote: > Doesn't this argue that you really want some sort of completions to be > run in this case instead? Instead of busy looping or waiting for a > set amount of time, just fire off a callback to run once you have the > resources available, no? Hello John, Rerunning the queue when the resource we ran out of is available again would be ideal. However, I'm not sure it is possible to implement this. If a driver performs memory allocation while executing a request and kmalloc() returns -ENOMEM then I don't know which mechanism should be used to trigger a queue rerun when sufficient memory is available again. Another example is the SCSI subsystem. If a the .queuecommand() implementation of a SCSI LLD returns SCSI_MLQUEUE_*_BUSY because a certain HBA condition occurred and the HBA does not trigger an interrupt when that condition is cleared, how to figure out whether or not a request can be executed other than by retrying, which means rerunning the queue? Thanks, Bart.
diff --git a/block/blk-core.c b/block/blk-core.c index cdae69be68e9..38279d4ae08b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -145,6 +145,7 @@ static const struct { [BLK_STS_MEDIUM] = { -ENODATA, "critical medium" }, [BLK_STS_PROTECTION] = { -EILSEQ, "protection" }, [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, + [BLK_STS_DEV_RESOURCE] = { -ENOMEM, "device resource" }, [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, /* device mapper special case, should not leak out: */ diff --git a/block/blk-mq.c b/block/blk-mq.c index 43e7449723e0..dd097ca5f1e9 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1160,6 +1160,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx, return true; } +#define BLK_MQ_QUEUE_DELAY 3 /* ms units */ + bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, bool got_budget) { @@ -1167,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, struct request *rq, *nxt; bool no_tag = false; int errors, queued; + blk_status_t ret = BLK_STS_OK; if (list_empty(list)) return false; @@ -1179,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, errors = queued = 0; do { struct blk_mq_queue_data bd; - blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); if (!blk_mq_get_driver_tag(rq, &hctx, false)) { @@ -1224,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, } ret = q->mq_ops->queue_rq(hctx, &bd); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { /* * If an I/O scheduler has been configured and we got a * driver tag for the next request already, free it @@ -1255,6 +1257,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * that is where we will continue on next queue run. */ if (!list_empty(list)) { + bool needs_restart; + spin_lock(&hctx->lock); list_splice_init(list, &hctx->dispatch); spin_unlock(&hctx->lock); @@ -1278,10 +1282,17 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * - Some but not all block drivers stop a queue before * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * and dm-rq. + * + * If driver returns BLK_STS_RESOURCE and SCHED_RESTART + * bit is set, run queue after 10ms to avoid IO stalls + * that could otherwise occur if the queue is idle. */ - if (!blk_mq_sched_needs_restart(hctx) || + needs_restart = blk_mq_sched_needs_restart(hctx); + if (!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); + else if (needs_restart && (ret == BLK_STS_RESOURCE)) + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_QUEUE_DELAY); } return (queued + errors) != 0; @@ -1762,6 +1773,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, *cookie = new_cookie; break; case BLK_STS_RESOURCE: + case BLK_STS_DEV_RESOURCE: __blk_mq_requeue_request(rq); break; default: @@ -1824,7 +1836,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, hctx_lock(hctx, &srcu_idx); ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); - if (ret == BLK_STS_RESOURCE) + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) blk_mq_sched_insert_request(rq, false, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 5b94e530570c..4bc25fc4e73c 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1230,7 +1230,7 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd) return BLK_STS_OK; } else /* requeue request */ - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; } } diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 68846897d213..79908e6ddbf2 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -276,7 +276,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, /* Out of mem doesn't actually happen, since we fall back * to direct descriptors */ if (err == -ENOMEM || err == -ENOSPC) - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; return BLK_STS_IOERR; } diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 891265acb10e..e126e4cac2ca 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -911,7 +911,7 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx, out_busy: blk_mq_stop_hw_queue(hctx); spin_unlock_irqrestore(&rinfo->ring_lock, flags); - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; } static void blkif_complete_rq(struct request *rq) diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index b7d175e94a02..348a0cb6963a 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -404,7 +404,7 @@ static blk_status_t dm_dispatch_clone_request(struct request *clone, struct requ clone->start_time = jiffies; r = blk_insert_cloned_request(clone->q, clone); - if (r != BLK_STS_OK && r != BLK_STS_RESOURCE) + if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != BLK_STS_DEV_RESOURCE) /* must complete clone in terms of original request */ dm_complete_request(rq, r); return r; @@ -496,7 +496,7 @@ static int map_request(struct dm_rq_target_io *tio) trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)), blk_rq_pos(rq)); ret = dm_dispatch_clone_request(clone, rq); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_rq_unprep_clone(clone); tio->ti->type->release_clone_rq(clone); tio->clone = NULL; @@ -769,7 +769,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx, /* Undo dm_start_request() before requeuing */ rq_end_stats(md, rq); rq_completed(md, rq_data_dir(rq), false); - blk_mq_delay_run_hw_queue(hctx, 100/*ms*/); return BLK_STS_RESOURCE; } diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index b76ba4629e02..54e679541ad6 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -35,8 +35,6 @@ enum nvme_fc_queue_flags { NVME_FC_Q_LIVE, }; -#define NVMEFC_QUEUE_DELAY 3 /* ms units */ - #define NVME_FC_DEFAULT_DEV_LOSS_TMO 60 /* seconds */ struct nvme_fc_queue { @@ -2231,7 +2229,7 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue, * the target device is present */ if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) - goto busy; + return BLK_STS_RESOURCE; if (!nvme_fc_ctrl_get(ctrl)) return BLK_STS_IOERR; @@ -2311,16 +2309,10 @@ nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue, ret != -EBUSY) return BLK_STS_IOERR; - goto busy; + return BLK_STS_RESOURCE; } return BLK_STS_OK; - -busy: - if (!(op->flags & FCOP_FLAGS_AEN) && queue->hctx) - blk_mq_delay_run_hw_queue(queue->hctx, NVMEFC_QUEUE_DELAY); - - return BLK_STS_RESOURCE; } static inline blk_status_t nvme_fc_is_ready(struct nvme_fc_queue *queue, diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index d9ca1dfab154..55be2550c555 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2030,9 +2030,9 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, case BLK_STS_OK: break; case BLK_STS_RESOURCE: - if (atomic_read(&sdev->device_busy) == 0 && - !scsi_device_blocked(sdev)) - blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); + if (atomic_read(&sdev->device_busy) || + scsi_device_blocked(sdev)) + ret = BLK_STS_DEV_RESOURCE; break; default: /* diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 2d973ac54b09..6a8ad60e7f09 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -39,6 +39,20 @@ typedef u8 __bitwise blk_status_t; #define BLK_STS_AGAIN ((__force blk_status_t)12) +/* + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device + * related resource is unavailable, but driver can guarantee that queue + * will be rerun in future once the resource is available (whereby + * dispatching requests). + * + * Difference with BLK_STS_RESOURCE: + * If driver isn't sure if the queue will be rerun once device resource + * is made available, please return BLK_STS_RESOURCE. For example: when + * memory allocation, DMA Mapping or other system resource allocation + * fails and IO can't be submitted to device. + */ +#define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13) + /** * blk_path_error - returns true if error may be path related * @error: status the request was completed with