Message ID | 20180123161613.14214-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
On Tue, Jan 23 2018 at 11:16am -0500, Ming Lei <ming.lei@redhat.com> wrote: > This status is returned from driver to block layer if device related > resource is run out of, but driver can guarantee that IO dispatch will > be triggered in future when the resource is available. > > This patch converts some drivers to use this return value. Meantime > if driver returns BLK_STS_RESOURCE and S_SCHED_RESTART is marked, run > queue after 10ms for avoiding IO hang. > > Suggested-by: Jens Axboe <axboe@kernel.dk> > Tested-by: Laurence Oberman <loberman@redhat.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Mike Snitzer <snitzer@redhat.com> > Cc: Laurence Oberman <loberman@redhat.com> > Cc: Bart Van Assche <bart.vanassche@sandisk.com> > Cc: Martin K. Petersen <martin.petersen@oracle.com> > Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Thanks Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, 2018-01-24 at 00:16 +0800, Ming Lei wrote: > @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, > * - Some but not all block drivers stop a queue before > * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq > * and dm-rq. > + * > + * If drivers return BLK_STS_RESOURCE and S_SCHED_RESTART > + * bit is set, run queue after 10ms for avoiding IO hang > + * because the queue may be idle and the RESTART mechanism > + * can't work any more. > */ > - if (!blk_mq_sched_needs_restart(hctx) || > + needs_restart = blk_mq_sched_needs_restart(hctx); > + if (!needs_restart || > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > blk_mq_run_hw_queue(hctx, true); > + else if (needs_restart && (ret == BLK_STS_RESOURCE)) > + blk_mq_delay_run_hw_queue(hctx, 10); > } My opinion about this patch is as follows: * Changing a blk_mq_delay_run_hw_queue() call followed by return BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes a guaranteed queue rerun into a queue rerun that may or may not happen depending on whether or not multiple queue runs happen simultaneously. * This change makes block drivers less readable because anyone who encounters BLK_STS_DEV_RESOURCE will have to look up its definition to figure out what it's meaning is. * We don't need the new status code BLK_STS_DEV_RESOURCE because a delayed queue run can be implemented easily with the existing block layer API. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote: > On Wed, 2018-01-24 at 00:16 +0800, Ming Lei wrote: > > @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, > > * - Some but not all block drivers stop a queue before > > * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq > > * and dm-rq. > > + * > > + * If drivers return BLK_STS_RESOURCE and S_SCHED_RESTART > > + * bit is set, run queue after 10ms for avoiding IO hang > > + * because the queue may be idle and the RESTART mechanism > > + * can't work any more. > > */ > > - if (!blk_mq_sched_needs_restart(hctx) || > > + needs_restart = blk_mq_sched_needs_restart(hctx); > > + if (!needs_restart || > > (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) > > blk_mq_run_hw_queue(hctx, true); > > + else if (needs_restart && (ret == BLK_STS_RESOURCE)) > > + blk_mq_delay_run_hw_queue(hctx, 10); > > } > > My opinion about this patch is as follows: > * Changing a blk_mq_delay_run_hw_queue() call followed by return > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > a guaranteed queue rerun into a queue rerun that may or may not happen > depending on whether or not multiple queue runs happen simultaneously. You may not understand the two: 1) it is always safe to return BLK_STS_RESOURCE, which will make sure to avoid IO hang by blk_mq_delay_run_hw_queue() or blk_mq_run_hw_queue(), and using which one depends on SCHED_RESTART. 2) if driver can make sure the queue will be rerun after some resource is available, either by itself or by blk-mq, it will return BLK_STS_DEV_RESOURCE So what is wrong with this way? > * This change makes block drivers less readable because anyone who encounters > BLK_STS_DEV_RESOURCE will have to look up its definition to figure out what > it's meaning is. It has been well-documented. BLK_STS_DEV_RESOURCE can be used very less, so it shouldn't be an issue. > * We don't need the new status code BLK_STS_DEV_RESOURCE because a delayed > queue run can be implemented easily with the existing block layer API. You mean to convert every STS_RESOURCE to call the API there, that way need lots of change, and with race in theory, since when the delay run queue is called in driver, the request isn't added to dispatch list.
On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote: > On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote: > > My opinion about this patch is as follows: > > * Changing a blk_mq_delay_run_hw_queue() call followed by return > > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > > a guaranteed queue rerun into a queue rerun that may or may not happen > > depending on whether or not multiple queue runs happen simultaneously. > > You may not understand the two: > > 1) it is always safe to return BLK_STS_RESOURCE, which will make sure to > avoid IO hang by blk_mq_delay_run_hw_queue() or blk_mq_run_hw_queue(), > and using which one depends on SCHED_RESTART. > > 2) if driver can make sure the queue will be rerun after some resource > is available, either by itself or by blk-mq, it will return BLK_STS_DEV_RESOURCE > > So what is wrong with this way? Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a race condition in code where there was no race condition. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jan 23, 2018 at 04:57:34PM +0000, Bart Van Assche wrote: > On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote: > > On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote: > > > My opinion about this patch is as follows: > > > * Changing a blk_mq_delay_run_hw_queue() call followed by return > > > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > > > a guaranteed queue rerun into a queue rerun that may or may not happen > > > depending on whether or not multiple queue runs happen simultaneously. > > > > You may not understand the two: > > > > 1) it is always safe to return BLK_STS_RESOURCE, which will make sure to > > avoid IO hang by blk_mq_delay_run_hw_queue() or blk_mq_run_hw_queue(), > > and using which one depends on SCHED_RESTART. > > > > 2) if driver can make sure the queue will be rerun after some resource > > is available, either by itself or by blk-mq, it will return BLK_STS_DEV_RESOURCE > > > > So what is wrong with this way? > > Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my > reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed > by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a > race condition in code where there was no race condition. OK, then no such race you worried about in this patch. Jens, could you take a look at this patch? Thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jan 23 2018 at 10:31pm -0500, Ming Lei <ming.lei@redhat.com> wrote: > On Tue, Jan 23, 2018 at 04:57:34PM +0000, Bart Van Assche wrote: > > On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote: > > > On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote: > > > > My opinion about this patch is as follows: > > > > * Changing a blk_mq_delay_run_hw_queue() call followed by return > > > > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > > > > a guaranteed queue rerun into a queue rerun that may or may not happen > > > > depending on whether or not multiple queue runs happen simultaneously. > > > > > > You may not understand the two: > > > > > > 1) it is always safe to return BLK_STS_RESOURCE, which will make sure to > > > avoid IO hang by blk_mq_delay_run_hw_queue() or blk_mq_run_hw_queue(), > > > and using which one depends on SCHED_RESTART. > > > > > > 2) if driver can make sure the queue will be rerun after some resource > > > is available, either by itself or by blk-mq, it will return BLK_STS_DEV_RESOURCE > > > > > > So what is wrong with this way? > > > > Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my > > reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed > > by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a > > race condition in code where there was no race condition. > > OK, then no such race you worried about in this patch. > > Jens, could you take a look at this patch? Hi Jens, Ming let me know that he successfully tested this V3 patch using both your test (fio to both mpath and underlying path) and Bart's (02-mq with can_queue in guest). Would be great if you'd review and verify this fix works for you too. Ideally we'd get a fix for this regression staged for 4.16 inclusion. This V3 patch seems like the best option we have at this point. Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote: > Ming let me know that he successfully tested this V3 patch using both > your test (fio to both mpath and underlying path) and Bart's (02-mq with > can_queue in guest). > > Would be great if you'd review and verify this fix works for you too. > > Ideally we'd get a fix for this regression staged for 4.16 inclusion. > This V3 patch seems like the best option we have at this point. Hello Mike, There are several issues with the patch at the start of this thread: - It is an unnecessary change of the block layer API. Queue stalls can already be addressed with the current block layer API, namely by inserting a blk_mq_delay_run_hw_queue() call before returning BLK_STS_RESOURCE. - The patch at the start of this thread complicates code further that is already too complicated, namely the blk-mq core. - The patch at the start of this thread introduces a regression in the SCSI core, namely a queue stall if a request completion occurs concurrently with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. As a kernel maintainer one of your responsibilities is to help keeping the quality of the kernel code high. So I think that you, as a kernel maintainer, should tell Ming to discard this patch instead of asking to merge it upstream given all the disadvantages of this patch. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, Jan 27, 2018 at 10:12:43PM +0000, Bart Van Assche wrote: > On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote: > > Ming let me know that he successfully tested this V3 patch using both > > your test (fio to both mpath and underlying path) and Bart's (02-mq with > > can_queue in guest). > > > > Would be great if you'd review and verify this fix works for you too. > > > > Ideally we'd get a fix for this regression staged for 4.16 inclusion. > > This V3 patch seems like the best option we have at this point. > > Hello Mike, > > There are several issues with the patch at the start of this thread: > - It is an unnecessary change of the block layer API. Queue stalls can > already be addressed with the current block layer API, namely by inserting > a blk_mq_delay_run_hw_queue() call before returning BLK_STS_RESOURCE. Again, both Jens and I concluded that it is a generic issue, which need generic solution. https://marc.info/?l=linux-kernel&m=151638176727612&w=2 Otherwise, it needs to change the handling on every BLK_STS_RESOURCE in drivers, do we really want to do that? Not mention, the request isn't added to dispatch list yet in .queue_rq(), strictly speaking, it is not correct to call blk_mq_delay_run_hw_queue() in .queue_rq(), so the current block layer API can't handle it well enough. > - The patch at the start of this thread complicates code further that is > already too complicated, namely the blk-mq core. That is just your opinion, I don't agree. > - The patch at the start of this thread introduces a regression in the > SCSI core, namely a queue stall if a request completion occurs concurrently > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. This patch only moves the blk_mq_delay_run_hw_queue() from scsi_queue_rq() to blk-mq, again, please explain it in detail how this patch V3 introduces this regression on SCSI. Actually this patch should fix a race on SCSI-MQ, because when scsi_queue_rq() call blk_mq_delay_run_hw_queue(), the request isn't in dispatch list yet, so in theory this request may not be visible when __blk_mq_run_hw_queue() is run. Don't expect the 3ms delay will cover that, it is absolutely fragile to depend on timing to deal with the race. Maybe it can be one LSF/MM topic proposal... thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, Jan 27 2018 at 5:12pm -0500, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote: > > Ming let me know that he successfully tested this V3 patch using both > > your test (fio to both mpath and underlying path) and Bart's (02-mq with > > can_queue in guest). > > > > Would be great if you'd review and verify this fix works for you too. > > > > Ideally we'd get a fix for this regression staged for 4.16 inclusion. > > This V3 patch seems like the best option we have at this point. > > Hello Mike, > > There are several issues with the patch at the start of this thread: > - It is an unnecessary change of the block layer API. Queue stalls can > already be addressed with the current block layer API, namely by inserting > a blk_mq_delay_run_hw_queue() call before returning BLK_STS_RESOURCE. > - The patch at the start of this thread complicates code further that is > already too complicated, namely the blk-mq core. The above says _nothing_ of substance. You talk so loudly against Ming's work that it has gotten to the point where nothing you say against Ming's work can be taken seriously. > - The patch at the start of this thread introduces a regression in the > SCSI core, namely a queue stall if a request completion occurs concurrently > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. And why is this supposed race unique to SCSI core? Fact is Ming dutifully implemented what Jens suggested. And he verified it to work. What have you done other than play the antagonist? > As a kernel maintainer one of your responsibilities is to help keeping the > quality of the kernel code high. So I think that you, as a kernel maintainer, > should tell Ming to discard this patch instead of > asking to merge it upstream > given all the disadvantages of this patch. Your contributions do _not_ make up for your inability to work well with others. Tiresome doesn't begin to describe these interactions. Life is too short to continue enduring your bullshit. But do let us know when you have something of substance to contribute (hint: code talks). -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote: > Your contributions do _not_ make up for your inability to work well with > others. Tiresome doesn't begin to describe these interactions. > > Life is too short to continue enduring your bullshit. > > But do let us know when you have something of substance to contribute > (hint: code talks). Sorry Mike but what you wrote above is not only very gross but it is also wrong. What I did in my e-mail is to identify technical problems in Ming's work. Additionally, it seems like you forgot that recently I helped Ming? My patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays" has been queued by Jens for kernel v4.16 and solves a problem that Ming has been working on for two months but that he was unable to come up with a proper fix for. Additionally, there is something that you have to explain: the patch "dm mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE" that you have queued in your tree is wrong and introduces a regression (https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.16&id=316a795ad388e0c3ca613454851a28079d917a92). I'm surprised that you have not yet reverted that patch? BTW, in case you would not yet have noticed my patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays" eliminates the delay you referred to in the description of that patch. In case the above would not yet have addressed the technical issue(s) you are facing, I would really appreciate it if you would stop using insults and if you could explain what technical problems you are facing. Isn't that what the Linux kernel is about, namely about collaboration between technical people from different organizations? Isn't that what made the Linux kernel great? Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, Jan 27 2018 at 7:54pm -0500, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote: > > Your contributions do _not_ make up for your inability to work well with > > others. Tiresome doesn't begin to describe these interactions. > > > > Life is too short to continue enduring your bullshit. > > > > But do let us know when you have something of substance to contribute > > (hint: code talks). > > Sorry Mike but what you wrote above is not only very gross but it is also > wrong. What I did in my e-mail is to identify technical problems in Ming's > work. Additionally, it seems like you forgot that recently I helped Ming? > My patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces > unintended delays" has been queued by Jens for kernel v4.16 and solves a > problem that Ming has been working on for two months but that he was > unable to come up with a proper fix for. Additionally, there is something > that you have to explain: the patch "dm mpath: don't call > blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE" that you have > queued in your tree is wrong and introduces a regression > (https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.16&id=316a795ad388e0c3ca613454851a28079d917a92). > I'm surprised that you have not yet reverted that patch? BTW, in case you > would not yet have noticed my patch "blk-mq: Avoid that > blk_mq_delay_run_hw_queue() introduces unintended delays" eliminates the > delay you referred to in the description of that patch. You cannot even be forthcoming about the technical merit of a change you authored (commit 6077c2d70) that I'm left to clean up in the face of performance bottlenecks it unwittingly introduced? If you were being honest: you'd grant that the random delay of 100ms is utterly baseless (not to mention that kicking the queue like you did is a complete hack). So that 100ms delay is what my dm-4.16 commit is talking about. Not the fact that blk-mq wasn't using kblockd_mod_delayed_work_on(). Commit 6077c2d70 was wrong and never should've been introduced! And you even said that reintroducing commit 6077c2d70 didn't fix the regression Ming's V3 fully corrects. So why would I revert eliminating it exactly? You aren't doing yourself any justice. We're all smart enough to see through your misdirection and labored attempt to cover your tracks. In the past you've been very helpful with highly reasoned and technical contributions. But you get unhinged more often than not when it comes to Ming's work. That has made you one of the most duplicitous engineers I've witnessed and had to deal with directly. It is like Dr Jeykll and Mr Hyde with you. So I'm done treating you with kid gloves. You are incapable of responding favorably to subtle social queues or even outright pleas for more productive behavior: https://marc.info/?l=linux-scsi&m=151011037229460&w=2 Otherwise I wouldn't be having to respond to you right now! > In case the above would not yet have addressed the technical issue(s) you > are facing, I would really appreciate it if you would stop using insults and > if you could explain what technical problems you are facing. Isn't that what > the Linux kernel is about, namely about collaboration between technical > people from different organizations? Isn't that what made the Linux kernel > great? Don't project onto me Bart. This isn't the first time you've been completely unprofessional and sadly it likely won't be the last. Treat others how you want to be treated. It really is that simple. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote: > You cannot even be forthcoming about the technical merit of a change you > authored (commit 6077c2d70) that I'm left to clean up in the face of > performance bottlenecks it unwittingly introduced? If you were being > honest: you'd grant that the random delay of 100ms is utterly baseless > (not to mention that kicking the queue like you did is a complete > hack). So that 100ms delay is what my dm-4.16 commit is talking about. There are multiple errors in the above: 1. I have already explained in detail why commit 6077c2d70 is (a) correct and (b) essential. See e.g. https://www.redhat.com/archives/dm-devel/2018-January/msg00168.html. 2. With patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays" applied, there is nothing to clean up anymore since that patch eliminates the queue delays that were triggered by blk_mq_delay_run_hw_queue(). 3. You know that I'm honest. Suggesting that I'm not is wrong. 4. I never claimed that 100ms is the optimal value for the queue rerunning delay. I have already explained to you that I copied that value from older dm-rq code. > Don't project onto me Bart. This isn't the first time you've been > completely unprofessional and sadly it likely won't be the last. The only person who is behaving unprofessionally in this e-mail thread is you. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, Jan 27 2018 at 10:00pm -0500, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote: > > You cannot even be forthcoming about the technical merit of a change you > > authored (commit 6077c2d70) that I'm left to clean up in the face of > > performance bottlenecks it unwittingly introduced? If you were being > > honest: you'd grant that the random delay of 100ms is utterly baseless > > (not to mention that kicking the queue like you did is a complete > > hack). So that 100ms delay is what my dm-4.16 commit is talking about. > > There are multiple errors in the above: > 1. I have already explained in detail why commit 6077c2d70 is (a) correct > and (b) essential. See e.g. https://www.redhat.com/archives/dm-devel/2018-January/msg00168.html. And you'd be wrong. Again. We've already established that commit 6077c2d70 is _not_ essential. Ming's V3, in conjunction with all the changes that already went into block and DM for 4.16, prove that. Yet here you go repeating yourself. You are clinging to a patch that absolutely had no business going in when it did. And even when presented with the reality that it was not the proper way to fix the issue you observed you keep going back to it like you cured cancer with a single line of code. > 2. With patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces > unintended delays" applied, there is nothing to clean up anymore since > that patch eliminates the queue delays that were triggered by > blk_mq_delay_run_hw_queue(). The issue Ming fixed is that your random queue kicks break RESTART on dm-mq mpath. > 3. You know that I'm honest. Suggesting that I'm not is wrong. No, I trully do _not_ know you're always honest. You've conflated so many details on this topic that it makes me have serious doubts. You're lashing out so much, in defense of your dm_mq_queue_rq delayed queue kick, that you're missing there is a genuine generic blk-mq problem that is getting fixed in Ming's V3. > 4. I never claimed that 100ms is the optimal value for the queue > rerunning delay. I have already explained to you that I copied that > value from older dm-rq code. And I pointed out to you that you most certainly did _not_ copy the value from elsewhere: https://www.redhat.com/archives/dm-devel/2018-January/msg00202.html The specific delay used isn't the issue; the need to kick the queue like this is. Ming's V3 avoids unnecessary kicking. > > Don't project onto me Bart. This isn't the first time you've been > > completely unprofessional and sadly it likely won't be the last. > > The only person who is behaving unprofessionally in this e-mail thread > is you. Bart, the number of emails exchanged on this topic has exhausted _everyone_. Emotions get tense when things don't evolve despite every oppotunity for progress to be realized. When people cling to solutions that are no longer relevant. There is a very real need for progress here. So either get out of the way or suggest a specific series of change(s) that you feel better than Ming's V3. With that, I'll also stop responding to your baiting now and forevermore (e.g. "maintainers should this.. and maintainers should not that" or "your employer is not investing adequately". Next time you find yourself typing drivel like that: spare us all and hit delete). -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sun, Jan 28, 2018 at 12:54:38AM +0000, Bart Van Assche wrote: > On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote: > > Your contributions do _not_ make up for your inability to work well with > > others. Tiresome doesn't begin to describe these interactions. > > > > Life is too short to continue enduring your bullshit. > > > > But do let us know when you have something of substance to contribute > > (hint: code talks). > > Sorry Mike but what you wrote above is not only very gross but it is also > wrong. What I did in my e-mail is to identify technical problems in Ming's Bart, You said my patch V3 introduces a race on SCSI, but you never explained it, I have asked you for at least two times and you never explain it, so could you please deal with this in a technical way? Thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, 2018-01-27 at 23:58 -0500, Mike Snitzer wrote: > On Sat, Jan 27 2018 at 10:00pm -0500, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote: > > > You cannot even be forthcoming about the technical merit of a change you > > > authored (commit 6077c2d70) that I'm left to clean up in the face of > > > performance bottlenecks it unwittingly introduced? If you were being > > > honest: you'd grant that the random delay of 100ms is utterly baseless > > > (not to mention that kicking the queue like you did is a complete > > > hack). So that 100ms delay is what my dm-4.16 commit is talking about. > > > > There are multiple errors in the above: > > 1. I have already explained in detail why commit 6077c2d70 is (a) correct > > and (b) essential. See e.g. https://www.redhat.com/archives/dm-devel/2018-January/msg00168.html. > > And you'd be wrong. Again. We've already established that commit > 6077c2d70 is _not_ essential. Ming's V3, in conjunction with all the > changes that already went into block and DM for 4.16, prove that. What I wrote was right when commit 6077c2d70 got introduced. My explanation shows that at that time was both correct and essential. Ming's v3 is in my opinion not relevant yet to this discussion because it introduces new bugs and so far no attempt has been made to address these bugs. > > 2. With patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces > > unintended delays" applied, there is nothing to clean up anymore since > > that patch eliminates the queue delays that were triggered by > > blk_mq_delay_run_hw_queue(). > > The issue Ming fixed is that your random queue kicks break RESTART on > dm-mq mpath. That comment is too short to be comprehensible for anyone. Can you elaborate this further? > > 3. You know that I'm honest. Suggesting that I'm not is wrong. > > No, I trully do _not_ know you're always honest. You've conflated so > many details on this topic that it makes me have serious doubts. You're > lashing out so much, in defense of your dm_mq_queue_rq delayed queue > kick, that you're missing there is a genuine generic blk-mq problem that > is getting fixed in Ming's V3. > > [ ... ] > > Bart, the number of emails exchanged on this topic has exhausted > _everyone_. Emotions get tense when things don't evolve despite every > oppotunity for progress to be realized. When people cling to solutions > that are no longer relevant. There is a very real need for progress > here. So either get out of the way or suggest a specific series of > change(s) that you feel better than Ming's V3. If you and Ming really care about the approach in the patch at the start of this e-mail thread, what are you waiting for to address the technical shortcomings that I pointed out? > With that, I'll also stop responding to your baiting now and forevermore > (e.g. "maintainers should this.. and maintainers should not that" or > "your employer is not investing adequately". Next time you find > yourself typing drivel like that: spare us all and hit delete). My opinion about what you wrote in this and the other e-mails you sent to me during the past months is as follows: 1. That I have always done my best to provide all the relevant technical information. 2. That my focus was on the technical aspects of the discussion. 3. That you did not try to reach a consensus but rather to dominate the discussion. 4. That the tone of your e-mails was very aggressive. 5. That you insulted me several times personally. I believe that your behavior is a violation of the kernel code of conflict and sufficient to file a complaint with the TAB. From Documentation/process/code-of-conflict.rst: "If however, anyone feels personally abused, threatened, or otherwise uncomfortable due to this process, that is not acceptable. If so, please contact the Linux Foundation's Technical Advisory Board [ ... ]" Additionally, since you are a U.S. Citizen, you should know that what you wrote in previous e-mails amounts to libel. A quote from a definition of libel: "to publish in print (including pictures), writing or broadcast through radio, television or film, an untruth about another which will do harm to that person or his/her reputation, by tending to bring the target into ridicule, hatred, scorn or contempt of others." You should be aware since I live in the U.S. that this makes it possible for me to sue you for defamation and to ask for a compensation. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote: > On Sat, Jan 27 2018 at 5:12pm -0500, > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > - The patch at the start of this thread introduces a regression in the > > SCSI core, namely a queue stall if a request completion occurs concurrently > > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. > > And why is this supposed race unique to SCSI core? That race is not unique to the SCSI core. It is possible that the patch at the start of this thread introduces the same race in other block drivers but I have not yet tried to verify that. > Fact is Ming dutifully implemented what Jens suggested. That's a misleading statement. Jens proposed to introduce a new status code (which he called "NO_DEV_RESOURCE") but did not specify any of the other aspects of Ming's patch. Jens definitely did not ask to introduce new race conditions. See also https://www.spinics.net/lists/kernel/msg2703154.html Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sun, 2018-01-28 at 16:57 +0000, Bart Van Assche wrote: > On Sat, 2018-01-27 at 23:58 -0500, Mike Snitzer wrote: > > On Sat, Jan 27 2018 at 10:00pm -0500, > > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > > > On Sat, 2018-01-27 at 21:03 -0500, Mike Snitzer wrote: > > > > You cannot even be forthcoming about the technical merit of a > > > > change you > > > > authored (commit 6077c2d70) that I'm left to clean up in the > > > > face of > > > > performance bottlenecks it unwittingly introduced? If you were > > > > being > > > > honest: you'd grant that the random delay of 100ms is utterly > > > > baseless > > > > (not to mention that kicking the queue like you did is a > > > > complete > > > > hack). So that 100ms delay is what my dm-4.16 commit is > > > > talking about. > > > > > > There are multiple errors in the above: > > > 1. I have already explained in detail why commit 6077c2d70 is (a) > > > correct > > > and (b) essential. See e.g. https://www.redhat.com/archives/dm > > > -devel/2018-January/msg00168.html. > > > > And you'd be wrong. Again. We've already established that commit > > 6077c2d70 is _not_ essential. Ming's V3, in conjunction with all > > the > > changes that already went into block and DM for 4.16, prove that. > > What I wrote was right when commit 6077c2d70 got introduced. My > explanation > shows that at that time was both correct and essential. Ming's v3 is > in my > opinion not relevant yet to this discussion because it introduces new > bugs > and so far no attempt has been made to address these bugs. > > > > 2. With patch "blk-mq: Avoid that blk_mq_delay_run_hw_queue() > > > introduces > > > unintended delays" applied, there is nothing to clean up > > > anymore since > > > that patch eliminates the queue delays that were triggered by > > > blk_mq_delay_run_hw_queue(). > > > > The issue Ming fixed is that your random queue kicks break RESTART > > on > > dm-mq mpath. > > That comment is too short to be comprehensible for anyone. Can you > elaborate > this further? > > > > 3. You know that I'm honest. Suggesting that I'm not is wrong. > > > > No, I trully do _not_ know you're always honest. You've conflated > > so > > many details on this topic that it makes me have serious > > doubts. You're > > lashing out so much, in defense of your dm_mq_queue_rq delayed > > queue > > kick, that you're missing there is a genuine generic blk-mq problem > > that > > is getting fixed in Ming's V3. > > > > [ ... ] > > > > Bart, the number of emails exchanged on this topic has exhausted > > _everyone_. Emotions get tense when things don't evolve despite > > every > > oppotunity for progress to be realized. When people cling to > > solutions > > that are no longer relevant. There is a very real need for > > progress > > here. So either get out of the way or suggest a specific series of > > change(s) that you feel better than Ming's V3. > > If you and Ming really care about the approach in the patch at the > start > of this e-mail thread, what are you waiting for to address the > technical > shortcomings that I pointed out? > > > With that, I'll also stop responding to your baiting now and > > forevermore > > (e.g. "maintainers should this.. and maintainers should not that" > > or > > "your employer is not investing adequately". Next time you find > > yourself typing drivel like that: spare us all and hit delete). > > My opinion about what you wrote in this and the other e-mails you > sent to > me during the past months is as follows: > 1. That I have always done my best to provide all the relevant > technical > information. > 2. That my focus was on the technical aspects of the discussion. > 3. That you did not try to reach a consensus but rather to dominate > the > discussion. > 4. That the tone of your e-mails was very aggressive. > 5. That you insulted me several times personally. > > I believe that your behavior is a violation of the kernel code of > conflict > and sufficient to file a complaint with the TAB. From > Documentation/process/code-of-conflict.rst: > > "If however, anyone feels personally abused, threatened, or otherwise > uncomfortable due to this process, that is not acceptable. If so, > please contact the Linux Foundation's Technical Advisory Board [ ... > ]" > > Additionally, since you are a U.S. Citizen, you should know that what > you > wrote in previous e-mails amounts to libel. A quote from a definition > of > libel: "to publish in print (including pictures), writing or > broadcast > through radio, television or film, an untruth about another which > will do > harm to that person or his/her reputation, by tending to bring the > target > into ridicule, hatred, scorn or contempt of others." You should be > aware > since I live in the U.S. that this makes it possible for me to sue > you for > defamation and to ask for a compensation. > > Bart. Folks I think we need to all take some time, take a breath and lets see how we can figure this out amicably. Everybody on this list contributes a huge amount of professional and personal time to improving the kernel. I for one have benefited a huge amount from assistance given to me in the past by Bart, Ming and Mike (and others) and I am not about to take sides here. While we cannot always be expected to agree we need to find a way to get over these types of disagreements. While Ming, Bart and Mike all have strong opinions this has to move forward so what is the best way for tie breaker here. Do the folks that know this code really well take a vote, majority wins for now and we have progress. Should this turn out to be an issue in the future then it gets addressed by reverts/changes. The changes have been tested but there is clearly some exposure or Bart would not be concerned the way he is about this. The amount of exposure should govern if this gets accepted or not. Being a very very low member on the this very talented list I for one can only contribute as far as testing is concerned and generic testing of this has been very well behaved so perhaps while we have exposure its acceptable risk. I hope I am not speaking out of turn. Sincerely Laurence -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sun, Jan 28, 2018 at 05:03:49PM +0000, Bart Van Assche wrote: > On Sat, 2018-01-27 at 19:23 -0500, Mike Snitzer wrote: > > On Sat, Jan 27 2018 at 5:12pm -0500, > > Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > - The patch at the start of this thread introduces a regression in the > > > SCSI core, namely a queue stall if a request completion occurs concurrently > > > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. > > > > And why is this supposed race unique to SCSI core? > > That race is not unique to the SCSI core. It is possible that the patch at > the start of this thread introduces the same race in other block drivers > but I have not yet tried to verify that. > > > Fact is Ming dutifully implemented what Jens suggested. > > That's a misleading statement. Jens proposed to introduce a new status code > (which he called "NO_DEV_RESOURCE") but did not specify any of the other > aspects of Ming's patch. Jens definitely did not ask to introduce new race > conditions. See also https://www.spinics.net/lists/kernel/msg2703154.html I don't see what is the difference between my patch and Jens's suggestion. My patch just introduces new code of "BLK_STS_DEV_RESOURCE" which is returned to blk-mq if driver can make sure that queue will be run after resource is available, and keep the current code of 'BLK_STS_RESOURCE', this way is simpler since most of the in-tree 'BLK_STS_RESOURCE' does need to run queue via blk_mq_delay_run_hw_queue(hctx, 10)? Right? You just mentioned my patch introduces race, but never explained what is the race? How is it triggered? When is it? Now it is the 3rd time for me to ask you explain the introduced race in my patch V3. Is this your technical way to review/comment patch? Thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Sun, 2018-01-28 at 07:41 +0800, Ming Lei wrote: > Not mention, the request isn't added to dispatch list yet in .queue_rq(), > strictly speaking, it is not correct to call blk_mq_delay_run_hw_queue() in > .queue_rq(), so the current block layer API can't handle it well enough. I disagree that it is not correct to call blk_mq_delay_run_hw_queue() from inside .queue_rq(). Additionally, I have already explained to you in previous e-mails why it's fine to call that function from inside .queue_rq(): - Nobody has ever observed the race you described because the time after which a queue is rerun by blk_mq_delay_run_hw_queue() is much larger than the time during which that race exists. - It is easy to fix this race inside the block layer, namely by using call_rcu() inside the blk_mq_delay_run_hw_queue() implementation to postpone the queue rerunning until after the request has been added back to the dispatch list. > > - The patch at the start of this thread introduces a regression in the > > SCSI core, namely a queue stall if a request completion occurs concurrently > > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. > > This patch only moves the blk_mq_delay_run_hw_queue() from scsi_queue_rq() > to blk-mq, again, please explain it in detail how this patch V3 introduces this > regression on SCSI. Please reread the following message: https://marc.info/?l=dm-devel&m=151672480107560. Thanks, Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Mon, Jan 29, 2018 at 04:48:31PM +0000, Bart Van Assche wrote: > On Sun, 2018-01-28 at 07:41 +0800, Ming Lei wrote: > > Not mention, the request isn't added to dispatch list yet in .queue_rq(), > > strictly speaking, it is not correct to call blk_mq_delay_run_hw_queue() in > > .queue_rq(), so the current block layer API can't handle it well enough. > > I disagree that it is not correct to call blk_mq_delay_run_hw_queue() from > inside .queue_rq(). Additionally, I have already explained to you in > previous e-mails why it's fine to call that function from inside .queue_rq(): > - Nobody has ever observed the race you described because the time after 'Nobody observed it' does not mean there isn't the race, since I can explain it clearly. > which a queue is rerun by blk_mq_delay_run_hw_queue() is much larger than > the time during which that race exists. Again it is fragile to depend on the delay(10ms, 3ms, or what ever) to make everything correct, why can't we make the way robust like the approach I took? You can not guarantee that the request handled in .queue_rq() is added to hctx->dispatch when you call blk_mq_delay_run_hw_queue(), because the operation of adding request to hctx->dispatch happens after returning from .queue_rq() to blk_mq_dispatch_rq_list(), which is done in the future against calling blk_mq_delay_run_hw_queue(). Right? Especially now kblockd_mod_delayed_work_on() is changed to use mod_delayed_work_on(), which may run the queue before the timer is expired from another context, then IO hang still can be triggered since the run queue may miss the request to be added to hctx->dispatch. > - It is easy to fix this race inside the block layer, namely by using > call_rcu() inside the blk_mq_delay_run_hw_queue() implementation to > postpone the queue rerunning until after the request has been added back to > the dispatch list. It is just easy to say, can you cook a patch and fix all drivers first? Then let's compare which patch is simpler and better. > > > > - The patch at the start of this thread introduces a regression in the > > > SCSI core, namely a queue stall if a request completion occurs concurrently > > > with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core. > > > > This patch only moves the blk_mq_delay_run_hw_queue() from scsi_queue_rq() > > to blk-mq, again, please explain it in detail how this patch V3 introduces this > > regression on SCSI. > > Please reread the following message: https://marc.info/?l=dm-devel&m=151672480107560. OK, the following message is copied from the above link: > My opinion about this patch is as follows: > * Changing a blk_mq_delay_run_hw_queue() call followed by return > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > a guaranteed queue rerun into a queue rerun that may or may not happen > depending on whether or not multiple queue runs happen simultaneously. > * This change makes block drivers less readable because anyone who encounters > BLK_STS_DEV_RESOURCE will have to look up its definition to figure out what > it's meaning is. > * We don't need the new status code BLK_STS_DEV_RESOURCE because a delayed > queue run can be implemented easily with the existing block layer API. Later, you admitted you understood the patch wrong, so follows your reply again from https://marc.info/?l=dm-devel&m=151672694508389&w=2 > On Wed, 2018-01-24 at 00:37 +0800, Ming Lei wrote: > > On Tue, Jan 23, 2018 at 04:24:20PM +0000, Bart Van Assche wrote: > > > My opinion about this patch is as follows: > > > * Changing a blk_mq_delay_run_hw_queue() call followed by return > > > BLK_STS_DEV_RESOURCE into return BLK_STS_RESOURCE is wrong because it changes > > > a guaranteed queue rerun into a queue rerun that may or may not happen > > > depending on whether or not multiple queue runs happen simultaneously. > > > > You may not understand the two: > > > > 1) it is always safe to return BLK_STS_RESOURCE, which will make sure to > > avoid IO hang by blk_mq_delay_run_hw_queue() or blk_mq_run_hw_queue(), > > and using which one depends on SCHED_RESTART. > > > > 2) if driver can make sure the queue will be rerun after some resource > > is available, either by itself or by blk-mq, it will return BLK_STS_DEV_RESOURCE > > > > So what is wrong with this way? > > Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my > reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed > by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a > race condition in code where there was no race condition. You still doesn't explain the race condition here, right? I can explain it again, but I am losing patience on you if you continue to refuse answer my question, just like you refused to provide debugfs log before when you reported issue. When turning BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE, it is driver's responsibility to make sure that the queue will be run after resource is available. That point is documented clearly. Especially on scsi_queue_rq(): - if (atomic_read(&sdev->device_busy) == 0 && - !scsi_device_blocked(sdev)) - blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); + if (atomic_read(&sdev->device_busy) || + scsi_device_blocked(sdev)) + ret = BLK_STS_DEV_RESOURCE; When either of two conditions become true, scsi knows that the queue will be restarted in future, and this patch just moves the blk_mq_delay_run_hw_queue() out of .queue_rq() into blk_mq_dispatch_rq_list() for avoiding the race mentioned above. I know there is race in scsi_queue_rq(), such as all in-flight request may be completed after atomic_read(&sdev->device_busy) in the following code is checked, but this race exists before and after my patch V3, and my patch V3 changes nothing about this race, so I don't know how/why you concluded that race conditions is introduced by my patch V3. - if (atomic_read(&sdev->device_busy) == 0 && - !scsi_device_blocked(sdev)) - blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
On Tue, 2018-01-30 at 09:07 +0800, Ming Lei wrote: > On Mon, Jan 29, 2018 at 04:48:31PM +0000, Bart Van Assche wrote: > > - It is easy to fix this race inside the block layer, namely by using > > call_rcu() inside the blk_mq_delay_run_hw_queue() implementation to > > postpone the queue rerunning until after the request has been added back to > > the dispatch list. > > It is just easy to say, can you cook a patch and fix all drivers first? Please reread what I wrote. I proposed to change the blk_mq_delay_run_hw_queue() IMPLEMENTATION such that the callers do not have to be modified. > [ ... ] Later, you admitted you understood the patch wrong. [ ... ] That's nonsense. I never wrote that. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jan 30, 2018 at 01:11:22AM +0000, Bart Van Assche wrote: > On Tue, 2018-01-30 at 09:07 +0800, Ming Lei wrote: > > On Mon, Jan 29, 2018 at 04:48:31PM +0000, Bart Van Assche wrote: > > > - It is easy to fix this race inside the block layer, namely by using > > > call_rcu() inside the blk_mq_delay_run_hw_queue() implementation to > > > postpone the queue rerunning until after the request has been added back to > > > the dispatch list. > > > > It is just easy to say, can you cook a patch and fix all drivers first? > > Please reread what I wrote. I proposed to change the blk_mq_delay_run_hw_queue() > IMPLEMENTATION such that the callers do not have to be modified. Please take a look at drivers, when BLK_STS_RESOURCE is returned, who will call blk_mq_delay_run_hw_queue() for drivers? > > > [ ... ] Later, you admitted you understood the patch wrong. [ ... ] > > That's nonsense. I never wrote that. Believe it or not, follows the link and your reply: https://marc.info/?l=dm-devel&m=151672694508389&w=2 > So what is wrong with this way? >Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my >reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed >by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a >race condition in code where there was no race condition. > >Bart.
On Tue, 2018-01-30 at 11:31 +0800, Ming Lei wrote: > Please take a look at drivers, when BLK_STS_RESOURCE is returned, who > will call blk_mq_delay_run_hw_queue() for drivers? As you know the SCSI and dm drivers in kernel v4.15 already call that function whenever necessary. > > > > > [ ... ] Later, you admitted you understood the patch wrong. [ ... ] > > > > That's nonsense. I never wrote that. > > Believe it or not, follows the link and your reply: > > https://marc.info/?l=dm-devel&m=151672694508389&w=2 > > > So what is wrong with this way? > > Sorry, I swapped BLK_STS_DEV_RESOURCE and BLK_STS_RESOURCE accidentally in my > > reply. What I meant is that changing a blk_mq_delay_run_hw_queue() call followed > > by return BLK_STS_RESOURCE into BLK_STS_DEV_RESOURCE is wrong and introduces a > > race condition in code where there was no race condition. That meant that I made a mistake (typo) while typing the reply and nothing else. Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Tue, Jan 30, 2018 at 03:37:38AM +0000, Bart Van Assche wrote: > On Tue, 2018-01-30 at 11:31 +0800, Ming Lei wrote: > > Please take a look at drivers, when BLK_STS_RESOURCE is returned, who > > will call blk_mq_delay_run_hw_queue() for drivers? > > As you know the SCSI and dm drivers in kernel v4.15 already call that function > whenever necessary. We have concluded that it is generic issue which need generic solution[1]. [1] https://marc.info/?l=linux-kernel&m=151638176727612&w=2 [ming@ming linux]$ git grep -n BLK_STS_RESOURCE ./drivers/ | wc -l 43 Almost every driver need this kind of change if BLK_STS_RESOURCE is returned from IO path. And the failure can be caused by kmalloc(GFP_ATOMIC), DMA Mapping, or what ever. Thanks, Ming -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/block/blk-core.c b/block/blk-core.c index a2005a485335..ac4789c5e329 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -145,6 +145,7 @@ static const struct { [BLK_STS_MEDIUM] = { -ENODATA, "critical medium" }, [BLK_STS_PROTECTION] = { -EILSEQ, "protection" }, [BLK_STS_RESOURCE] = { -ENOMEM, "kernel resource" }, + [BLK_STS_DEV_RESOURCE] = { -ENOMEM, "device resource" }, [BLK_STS_AGAIN] = { -EAGAIN, "nonblocking retry" }, /* device mapper special case, should not leak out: */ diff --git a/block/blk-mq.c b/block/blk-mq.c index 01f271d40825..55c52b9a0f30 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1169,6 +1169,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, struct request *rq, *nxt; bool no_tag = false; int errors, queued; + blk_status_t ret = BLK_STS_OK; if (list_empty(list)) return false; @@ -1181,7 +1182,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, errors = queued = 0; do { struct blk_mq_queue_data bd; - blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); if (!blk_mq_get_driver_tag(rq, &hctx, false)) { @@ -1226,7 +1226,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, } ret = q->mq_ops->queue_rq(hctx, &bd); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { /* * If an I/O scheduler has been configured and we got a * driver tag for the next request already, free it @@ -1257,6 +1257,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * that is where we will continue on next queue run. */ if (!list_empty(list)) { + bool needs_restart; + spin_lock(&hctx->lock); list_splice_init(list, &hctx->dispatch); spin_unlock(&hctx->lock); @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, * - Some but not all block drivers stop a queue before * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * and dm-rq. + * + * If drivers return BLK_STS_RESOURCE and S_SCHED_RESTART + * bit is set, run queue after 10ms for avoiding IO hang + * because the queue may be idle and the RESTART mechanism + * can't work any more. */ - if (!blk_mq_sched_needs_restart(hctx) || + needs_restart = blk_mq_sched_needs_restart(hctx); + if (!needs_restart || (no_tag && list_empty_careful(&hctx->dispatch_wait.entry))) blk_mq_run_hw_queue(hctx, true); + else if (needs_restart && (ret == BLK_STS_RESOURCE)) + blk_mq_delay_run_hw_queue(hctx, 10); } return (queued + errors) != 0; @@ -1764,6 +1774,7 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx, *cookie = new_cookie; break; case BLK_STS_RESOURCE: + case BLK_STS_DEV_RESOURCE: __blk_mq_requeue_request(rq); break; default: @@ -1826,7 +1837,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, hctx_lock(hctx, &srcu_idx); ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false); - if (ret == BLK_STS_RESOURCE) + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) blk_mq_sched_insert_request(rq, false, true, false); else if (ret != BLK_STS_OK) blk_mq_end_request(rq, ret); diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 6655893a3a7a..287a09611c0f 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1230,7 +1230,7 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd) return BLK_STS_OK; } else /* requeue request */ - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; } } diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 68846897d213..79908e6ddbf2 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -276,7 +276,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx, /* Out of mem doesn't actually happen, since we fall back * to direct descriptors */ if (err == -ENOMEM || err == -ENOSPC) - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; return BLK_STS_IOERR; } diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 891265acb10e..e126e4cac2ca 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -911,7 +911,7 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx, out_busy: blk_mq_stop_hw_queue(hctx); spin_unlock_irqrestore(&rinfo->ring_lock, flags); - return BLK_STS_RESOURCE; + return BLK_STS_DEV_RESOURCE; } static void blkif_complete_rq(struct request *rq) diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index d8519ddd7e1a..bf0b840645cc 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -408,7 +408,7 @@ static blk_status_t dm_dispatch_clone_request(struct request *clone, struct requ clone->start_time = jiffies; r = blk_insert_cloned_request(clone->q, clone); - if (r != BLK_STS_OK && r != BLK_STS_RESOURCE) + if (r != BLK_STS_OK && r != BLK_STS_RESOURCE && r != BLK_STS_DEV_RESOURCE) /* must complete clone in terms of original request */ dm_complete_request(rq, r); return r; @@ -500,7 +500,7 @@ static int map_request(struct dm_rq_target_io *tio) trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)), blk_rq_pos(rq)); ret = dm_dispatch_clone_request(clone, rq); - if (ret == BLK_STS_RESOURCE) { + if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) { blk_rq_unprep_clone(clone); tio->ti->type->release_clone_rq(clone); tio->clone = NULL; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index d9ca1dfab154..55be2550c555 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2030,9 +2030,9 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, case BLK_STS_OK: break; case BLK_STS_RESOURCE: - if (atomic_read(&sdev->device_busy) == 0 && - !scsi_device_blocked(sdev)) - blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); + if (atomic_read(&sdev->device_busy) || + scsi_device_blocked(sdev)) + ret = BLK_STS_DEV_RESOURCE; break; default: /* diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index c5d3db0d83f8..730a8574d2b7 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -39,6 +39,24 @@ typedef u8 __bitwise blk_status_t; #define BLK_STS_AGAIN ((__force blk_status_t)12) +/* + * BLK_STS_DEV_RESOURCE: Block layer and block driver specific status, + * which is usually returned from driver to block layer in IO path. + * + * BLK_STS_DEV_RESOURCE is returned from driver to block layer if device + * related resource is run out of, but driver can guarantee that queue + * will be rerun in future for dispatching the current request when this + * resource is available. + * + * Difference with BLK_STS_RESOURCE: + * If driver isn't sure if the queue can be run again after this kind of + * resource is available, please return BLK_STS_RESOURCE, for example, + * when memory allocation, DMA Mapping or other system resource allocation + * fail and IO can't be submitted to device. + * + */ +#define BLK_STS_DEV_RESOURCE ((__force blk_status_t)13) + /** * blk_path_error - returns true if error may be path related * @error: status the request was completed with