
dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure

Message ID 20170922013506.22510-1-ming.lei@redhat.com (mailing list archive)
State Rejected, archived
Delegated to: Mike Snitzer

Commit Message

Ming Lei Sept. 22, 2017, 1:35 a.m. UTC
blk-mq will rerun the queue via SCHED_RESTART after one request completes, so
there is no need to wait a random amount of time before requeueing; we should
trust blk-mq to do it.

More importantly, we need to return BLK_STS_RESOURCE to blk-mq so that
dequeuing from the I/O scheduler can be stopped, which improves I/O merging.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Sept. 22, 2017, 3:06 p.m. UTC | #1
On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> +		/*
> +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> +		 * we needn't to deal with it by DELAY_REQUEUE. More
> +		 * importantly, we have to return DM_MAPIO_REQUEUE
> +		 * so that blk-mq can get the queue busy feedback,
> +		 * otherwise I/O merge can be hurt.
> +		 */
> +		if (q->mq_ops)
> +			return DM_MAPIO_REQUEUE;
> +		else
> +			return DM_MAPIO_DELAY_REQUEUE;
>  	}

This patch is inferior to what I posted because this patch does not avoid
the delay if multiple LUNs are associated with the same SCSI host. Consider
e.g. the following configuration:
* A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
  and /dev/sdb.
* A dm-mpath instance has been created on top of /dev/sda.
If all tags are in use by requests queued to /dev/sdb, no dm requests are in
progress, and a request is submitted against the dm-mpath device, then the
blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
and the queue will be rerun after a delay.

My patch does not introduce a delay in this case.

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Ming Lei Sept. 22, 2017, 5:44 p.m. UTC | #2
On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > +		/*
> > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > +		 * so that blk-mq can get the queue busy feedback,
> > +		 * otherwise I/O merge can be hurt.
> > +		 */
> > +		if (q->mq_ops)
> > +			return DM_MAPIO_REQUEUE;
> > +		else
> > +			return DM_MAPIO_DELAY_REQUEUE;
> >  	}
> 
> This patch is inferior to what I posted because this patch does not avoid
> the delay if multiple LUNs are associated with the same SCSI host. Consider
> e.g. the following configuration:
> * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
>   and /dev/sdb.
> * A dm-mpath instance has been created on top of /dev/sda.
> If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> progress and a request is submitted against the dm-mpath device then the
> blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> and the queue will be rerun after a delay.
> 
> My patch does not introduce a delay in this case.

That delay may not matter, because SCHED_RESTART will rerun the queue as
soon as one request completes.

There is at least one issue with get_request(GFP_NOIO): it may cause an AIO
performance regression, or even make AIO effectively synchronous, because the
submission path can block. For example, suppose a user runs fio (libaio,
randread, single job, queue depth 64, device: a dm-mpath disk). If
get_request(GFP_NOIO) often blocks because of shared tags or tag exhaustion,
the actual queue depth will never reach 64, and may be just 1 in the worst
case. Once the actual queue depth drops that far, random I/O performance is
hurt badly.
Bart Van Assche Sept. 22, 2017, 5:54 p.m. UTC | #3
On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > +		/*
> > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > +		 * so that blk-mq can get the queue busy feedback,
> > > +		 * otherwise I/O merge can be hurt.
> > > +		 */
> > > +		if (q->mq_ops)
> > > +			return DM_MAPIO_REQUEUE;
> > > +		else
> > > +			return DM_MAPIO_DELAY_REQUEUE;
> > >  	}
> > 
> > This patch is inferior to what I posted because this patch does not avoid
> > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > e.g. the following configuration:
> > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> >   and /dev/sdb.
> > * A dm-mpath instance has been created on top of /dev/sda.
> > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > progress and a request is submitted against the dm-mpath device then the
> > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > and the queue will be rerun after a delay.
> > 
> > My patch does not introduce a delay in this case.
> 
> That delay may not matter because SCHED_RESTART will run queue just
> after one request is completed.

Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
for the dm queue. That's what I was trying to explain to you in my previous e-mail.

> There is at least one issue with get_request(GFP_NOIO): AIO
> performance regression may not be caused, or even AIO may not
> be possible. For example, user runs fio(libaio, randread, single
> job, queue depth: 64, device: dm-mpath disk), if get_request(GFP_NOIO)
> often blocks because of shared tags or out of tag, the actual queue
> depth won't reach 64 at all, and may be just 1 in the worst case.
> Once the actual queue depth is decreased much, random I/O performance
> should be hurt a lot.

That's why we need to modify scsi_lld_busy(). If scsi_lld_busy() is modified
as I proposed in a previous e-mail, it becomes very unlikely that no tag is
available when blk_get_request() is called. With that scsi_lld_busy()
modification, it is even possible that we won't need to modify the dm-mpath
driver at all.

Bart.

Ming Lei Sept. 25, 2017, 3:06 a.m. UTC | #4
On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > > +		/*
> > > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > > +		 * so that blk-mq can get the queue busy feedback,
> > > > +		 * otherwise I/O merge can be hurt.
> > > > +		 */
> > > > +		if (q->mq_ops)
> > > > +			return DM_MAPIO_REQUEUE;
> > > > +		else
> > > > +			return DM_MAPIO_DELAY_REQUEUE;
> > > >  	}
> > > 
> > > This patch is inferior to what I posted because this patch does not avoid
> > > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > > e.g. the following configuration:
> > > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> > >   and /dev/sdb.
> > > * A dm-mpath instance has been created on top of /dev/sda.
> > > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > > progress and a request is submitted against the dm-mpath device then the
> > > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > > and the queue will be rerun after a delay.
> > > 
> > > My patch does not introduce a delay in this case.
> > 
> > That delay may not matter because SCHED_RESTART will run queue just
> > after one request is completed.
> 
> Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
> for the dm queue. That's what I was trying to explain to you in my previous e-mail.

The patch I posted in this thread will set SCHED_RESTART for the dm queue.

> 
> > There is at least one issue with get_request(GFP_NOIO): AIO
> > performance regression may not be caused, or even AIO may not
> > be possible. For example, user runs fio(libaio, randread, single
> > job, queue depth: 64, device: dm-mpath disk), if get_request(GFP_NOIO)
> > often blocks because of shared tags or out of tag, the actual queue
> > depth won't reach 64 at all, and may be just 1 in the worst case.
> > Once the actual queue depth is decreased much, random I/O performance
> > should be hurt a lot.
> 
> That's why we need to modify scsi_lld_busy(). If scsi_lld_busy() will be
> modified as I proposed in a previous e-mail then it will become very
> unlikely that no tag is available when blk_get_request() is called. With that
> scsi_lld_busy() modification it is even possible that we don't need to modify
> the dm-mpath driver.

Then please post a complete solution; I'd like to take a look and test it.
Bart Van Assche Sept. 25, 2017, 3:23 p.m. UTC | #5
On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> > On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > > On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > > > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > > > +		/*
> > > > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > > > +		 * so that blk-mq can get the queue busy feedback,
> > > > > +		 * otherwise I/O merge can be hurt.
> > > > > +		 */
> > > > > +		if (q->mq_ops)
> > > > > +			return DM_MAPIO_REQUEUE;
> > > > > +		else
> > > > > +			return DM_MAPIO_DELAY_REQUEUE;
> > > > >  	}
> > > > 
> > > > This patch is inferior to what I posted because this patch does not avoid
> > > > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > > > e.g. the following configuration:
> > > > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> > > >   and /dev/sdb.
> > > > * A dm-mpath instance has been created on top of /dev/sda.
> > > > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > > > progress and a request is submitted against the dm-mpath device then the
> > > > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > > > and the queue will be rerun after a delay.
> > > > 
> > > > My patch does not introduce a delay in this case.
> > > 
> > > That delay may not matter because SCHED_RESTART will run queue just
> > > after one request is completed.
> > 
> > Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
> > for the dm queue. That's what I was trying to explain to you in my previous e-mail.
> 
> The patch I posted in this thread will set SCHED_RESTART for dm queue.

This is not how communication on an open source mailing list is supposed to
work. If you know that you are wrong, you are expected either to stay silent
or to admit it. And if you disagree with the detailed explanation I gave, you
are expected to explain in detail why you think it is wrong.

> > > There is at least one issue with get_request(GFP_NOIO): AIO
> > > performance regression may not be caused, or even AIO may not
> > > be possible. For example, user runs fio(libaio, randread, single
> > > job, queue depth: 64, device: dm-mpath disk), if get_request(GFP_NOIO)
> > > often blocks because of shared tags or out of tag, the actual queue
> > > depth won't reach 64 at all, and may be just 1 in the worst case.
> > > Once the actual queue depth is decreased much, random I/O performance
> > > should be hurt a lot.
> > 
> > That's why we need to modify scsi_lld_busy(). If scsi_lld_busy() will be
> > modified as I proposed in a previous e-mail then it will become very
> > unlikely that no tag is available when blk_get_request() is called. With that
> > scsi_lld_busy() modification it is even possible that we don't need to modify
> > the dm-mpath driver.
> 
> Then post out a whole solution, and I'd like to take a look and test.

I will do that as soon as I have the time to run some measurements. The rest of
this week I will be traveling.

Bart.

Ming Lei Sept. 25, 2017, 4:10 p.m. UTC | #6
On Mon, Sep 25, 2017 at 03:23:16PM +0000, Bart Van Assche wrote:
> On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> > On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> > > On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > > > On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > > > > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > > > > +		/*
> > > > > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > > > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > > > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > > > > +		 * so that blk-mq can get the queue busy feedback,
> > > > > > +		 * otherwise I/O merge can be hurt.
> > > > > > +		 */
> > > > > > +		if (q->mq_ops)
> > > > > > +			return DM_MAPIO_REQUEUE;
> > > > > > +		else
> > > > > > +			return DM_MAPIO_DELAY_REQUEUE;
> > > > > >  	}
> > > > > 
> > > > > This patch is inferior to what I posted because this patch does not avoid
> > > > > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > > > > e.g. the following configuration:
> > > > > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> > > > >   and /dev/sdb.
> > > > > * A dm-mpath instance has been created on top of /dev/sda.
> > > > > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > > > > progress and a request is submitted against the dm-mpath device then the
> > > > > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > > > > and the queue will be rerun after a delay.
> > > > > 
> > > > > My patch does not introduce a delay in this case.
> > > > 
> > > > That delay may not matter because SCHED_RESTART will run queue just
> > > > after one request is completed.
> > > 
> > > Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
> > > for the dm queue. That's what I was trying to explain to you in my previous e-mail.
> > 
> > The patch I posted in this thread will set SCHED_RESTART for dm queue.
> 
> This is not how communication on an open source mailing list is assumed to work.
> If you know that you are wrong you are assumed either to shut up or to admit it.

You only said 'This patch is inferior' and never explained why my patch is
wrong, so please go ahead and show me why this patch (the one posted in this
thread, also at the following link) is wrong.

	https://marc.info/?l=linux-block&m=150604412910113&w=2

I admit both are viable approaches to the issue, but I don't think my patch
is wrong. Your approach could be a very big change, because .queue_rq() would
block, and I also mentioned that it might cause an AIO regression.

I don't understand why you say there is a communication problem, since there
has not been much discussion in this thread yet.

Please see the whole discussion:

	https://marc.info/?t=150604442500001&r=1&w=2

> And if you disagree with the detailed explanation I gave you are assumed to
> explain in detail why you think it is wrong.
Mike Snitzer Sept. 25, 2017, 4:17 p.m. UTC | #7
On Mon, Sep 25 2017 at 12:10pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Mon, Sep 25, 2017 at 03:23:16PM +0000, Bart Van Assche wrote:
> > On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> > > On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> > > > On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > > > > On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > > > > > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > > > > > +		/*
> > > > > > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > > > > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > > > > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > > > > > +		 * so that blk-mq can get the queue busy feedback,
> > > > > > > +		 * otherwise I/O merge can be hurt.
> > > > > > > +		 */
> > > > > > > +		if (q->mq_ops)
> > > > > > > +			return DM_MAPIO_REQUEUE;
> > > > > > > +		else
> > > > > > > +			return DM_MAPIO_DELAY_REQUEUE;
> > > > > > >  	}
> > > > > > 
> > > > > > This patch is inferior to what I posted because this patch does not avoid
> > > > > > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > > > > > e.g. the following configuration:
> > > > > > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> > > > > >   and /dev/sdb.
> > > > > > * A dm-mpath instance has been created on top of /dev/sda.
> > > > > > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > > > > > progress and a request is submitted against the dm-mpath device then the
> > > > > > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > > > > > and the queue will be rerun after a delay.
> > > > > > 
> > > > > > My patch does not introduce a delay in this case.
> > > > > 
> > > > > That delay may not matter because SCHED_RESTART will run queue just
> > > > > after one request is completed.
> > > > 
> > > > Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
> > > > for the dm queue. That's what I was trying to explain to you in my previous e-mail.
> > > 
> > > The patch I posted in this thread will set SCHED_RESTART for dm queue.
> > 
> > This is not how communication on an open source mailing list is assumed to work.
> > If you know that you are wrong you are assumed either to shut up or to admit it.

Code speaks much better than unnecessarily caustic exchanges. Not sure why
Bart persists with that, but that's for him to sort out.
 
> You just mentioned 'This patch is inferior' and never explained my patch
> is wrong, so please go ahead and show me why this patch(the post in this
> thread, also the following link) is wrong.
> 
> 	https://marc.info/?l=linux-block&m=150604412910113&w=2
> 
> I admit both are two ways for the issue, but I don't think my patch
> is wrong. Your approach can be a very big change because .queue_rq()
> will block, and I also mentioned it might cause AIO regression.

I have no interest in changing DM multipath to block in .queue_rq(), so
please consider that approach a dead end.

Ming, just iterate on your revised patchset, test and post when you're
happy with it.

Thanks,
Mike

Ming Lei Sept. 26, 2017, 8:50 a.m. UTC | #8
On Mon, Sep 25, 2017 at 12:17:03PM -0400, Mike Snitzer wrote:
> On Mon, Sep 25 2017 at 12:10pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Mon, Sep 25, 2017 at 03:23:16PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> > > > On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> > > > > On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > > > > > On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > > > > > > On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > > > > > > > +		/*
> > > > > > > > +		 * blk-mq's SCHED_RESTART can cover this requeue, so
> > > > > > > > +		 * we needn't to deal with it by DELAY_REQUEUE. More
> > > > > > > > +		 * importantly, we have to return DM_MAPIO_REQUEUE
> > > > > > > > +		 * so that blk-mq can get the queue busy feedback,
> > > > > > > > +		 * otherwise I/O merge can be hurt.
> > > > > > > > +		 */
> > > > > > > > +		if (q->mq_ops)
> > > > > > > > +			return DM_MAPIO_REQUEUE;
> > > > > > > > +		else
> > > > > > > > +			return DM_MAPIO_DELAY_REQUEUE;
> > > > > > > >  	}
> > > > > > > 
> > > > > > > This patch is inferior to what I posted because this patch does not avoid
> > > > > > > the delay if multiple LUNs are associated with the same SCSI host. Consider
> > > > > > > e.g. the following configuration:
> > > > > > > * A single SCSI host with two SCSI LUNs associated to that host, e.g. /dev/sda
> > > > > > >   and /dev/sdb.
> > > > > > > * A dm-mpath instance has been created on top of /dev/sda.
> > > > > > > If all tags are in use by requests queued to /dev/sdb, no dm requests are in
> > > > > > > progress and a request is submitted against the dm-mpath device then the
> > > > > > > blk_get_request(q, GFP_ATOMIC) call will fail. The request will be requeued
> > > > > > > and the queue will be rerun after a delay.
> > > > > > > 
> > > > > > > My patch does not introduce a delay in this case.
> > > > > > 
> > > > > > That delay may not matter because SCHED_RESTART will run queue just
> > > > > > after one request is completed.
> > > > > 
> > > > > Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but not
> > > > > for the dm queue. That's what I was trying to explain to you in my previous e-mail.
> > > > 
> > > > The patch I posted in this thread will set SCHED_RESTART for dm queue.
> > > 
> > > This is not how communication on an open source mailing list is assumed to work.
> > > If you know that you are wrong you are assumed either to shut up or to admit it.
> 
> Code speaks much better than unnecessarily caustic exchanges.  Not sure
> why Bart persists with that.. but that's for him to sort out.
>  
> > You just mentioned 'This patch is inferior' and never explained my patch
> > is wrong, so please go ahead and show me why this patch(the post in this
> > thread, also the following link) is wrong.
> > 
> > 	https://marc.info/?l=linux-block&m=150604412910113&w=2
> > 
> > I admit both are two ways for the issue, but I don't think my patch
> > is wrong. Your approach can be a very big change because .queue_rq()
> > will block, and I also mentioned it might cause AIO regression.
> 
> I have no interest in changing DM multipath to block in .queue_rq()
> So please consider that approach a dead end.
> 
> Ming, just iterate on your revised patchset, test and post when you're
> happy with it.

Hi Mike,

If you don't plan to change dm-mpath to block in .queue_rq() now, please
take this patch first. I am sure this approach is correct, and it can even be
considered a fix.


Thanks,
Ming

Bart Van Assche Sept. 26, 2017, 1:55 p.m. UTC | #9
On Tue, 2017-09-26 at 16:50 +0800, Ming Lei wrote:
> If you don't consider to change mpath into block in .queue_rq() now,
> please take this patch first, and I am sure this way is correct, and
> even it can be thought as a fix.

As I explained earlier, it would be wrong to take this patch upstream
without first testing and measuring the alternative I proposed
(https://www.spinics.net/lists/dm-devel/msg32282.html). My alternative
handles SCSI hosts with multiple LUNs properly, but the patch at the start of
this e-mail thread does not.

Bart.


Patch

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..0902f7762306 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -504,8 +504,20 @@  static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 		if (queue_dying) {
 			atomic_inc(&m->pg_init_in_progress);
 			activate_or_offline_path(pgpath);
+			return DM_MAPIO_DELAY_REQUEUE;
 		}
-		return DM_MAPIO_DELAY_REQUEUE;
+
+		/*
+		 * blk-mq's SCHED_RESTART covers this requeue, so there
+		 * is no need to handle it with DELAY_REQUEUE. More
+		 * importantly, we have to return DM_MAPIO_REQUEUE so
+		 * that blk-mq gets the queue-busy feedback; otherwise
+		 * I/O merging can be hurt.
+		 */
+		if (q->mq_ops)
+			return DM_MAPIO_REQUEUE;
+		else
+			return DM_MAPIO_DELAY_REQUEUE;
 	}
 	clone->bio = clone->biotail = NULL;
 	clone->rq_disk = bdev->bd_disk;