Message ID | 20170922013506.22510-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | Rejected, archived |
Delegated to: | Mike Snitzer |
On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> +	/*
> +	 * blk-mq's SCHED_RESTART can cover this requeue, so
> +	 * we needn't to deal with it by DELAY_REQUEUE. More
> +	 * importantly, we have to return DM_MAPIO_REQUEUE
> +	 * so that blk-mq can get the queue busy feedback,
> +	 * otherwise I/O merge can be hurt.
> +	 */
> +	if (q->mq_ops)
> +		return DM_MAPIO_REQUEUE;
> +	else
> +		return DM_MAPIO_DELAY_REQUEUE;
> }

This patch is inferior to what I posted because this patch does not avoid
the delay if multiple LUNs are associated with the same SCSI host. Consider
e.g. the following configuration:
* A single SCSI host with two SCSI LUNs associated to that host, e.g.
  /dev/sda and /dev/sdb.
* A dm-mpath instance has been created on top of /dev/sda.
If all tags are in use by requests queued to /dev/sdb, no dm requests are
in progress and a request is submitted against the dm-mpath device, then
the blk_get_request(q, GFP_ATOMIC) call will fail. The request will be
requeued and the queue will be rerun after a delay.

My patch does not introduce a delay in this case.

Bart.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-22 at 09:35 +0800, Ming Lei wrote:
> > [...]
>
> This patch is inferior to what I posted because this patch does not avoid
> the delay if multiple LUNs are associated with the same SCSI host.
> Consider e.g. the following configuration:
> * A single SCSI host with two SCSI LUNs associated to that host, e.g.
>   /dev/sda and /dev/sdb.
> * A dm-mpath instance has been created on top of /dev/sda.
> If all tags are in use by requests queued to /dev/sdb, no dm requests are
> in progress and a request is submitted against the dm-mpath device, then
> the blk_get_request(q, GFP_ATOMIC) call will fail. The request will be
> requeued and the queue will be rerun after a delay.
>
> My patch does not introduce a delay in this case.

That delay may not matter because SCHED_RESTART will run the queue as soon
as one request completes.

There is at least one issue with get_request(GFP_NOIO): not only may it
cause an AIO performance regression, it may even make AIO impossible. For
example, if a user runs fio (libaio, randread, single job, queue depth 64)
against a dm-mpath disk and get_request(GFP_NOIO) often blocks because of
shared tags or tag exhaustion, the actual queue depth won't reach 64 at
all, and may be just 1 in the worst case. Once the actual queue depth drops
that much, random I/O performance is hurt a lot.
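The workload Ming describes could be reproduced with a fio job along these lines (a hypothetical sketch: /dev/mapper/mpatha stands in for the actual dm-mpath device node):

```ini
; Single-job libaio random-read test at iodepth 64 against a dm-mpath
; device. If request allocation blocks in the submission path, the
; achieved iodepth reported by fio will stay far below 64.
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=64
runtime=60
time_based

[mpath-randread]
filename=/dev/mapper/mpatha
```

The "IO depths" histogram in fio's output shows how often the full depth of 64 was actually sustained, which is where the regression Ming predicts would show up.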
On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> On Fri, Sep 22, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> > [...]
> > My patch does not introduce a delay in this case.
>
> That delay may not matter because SCHED_RESTART will run the queue as
> soon as one request completes.

Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb but
not for the dm queue. That's what I was trying to explain to you in my
previous e-mail.

> There is at least one issue with get_request(GFP_NOIO): not only may it
> cause an AIO performance regression, it may even make AIO impossible. For
> example, if a user runs fio (libaio, randread, single job, queue depth 64)
> against a dm-mpath disk and get_request(GFP_NOIO) often blocks because of
> shared tags or tag exhaustion, the actual queue depth won't reach 64 at
> all, and may be just 1 in the worst case. Once the actual queue depth
> drops that much, random I/O performance is hurt a lot.

That's why we need to modify scsi_lld_busy(). If scsi_lld_busy() is
modified as I proposed in a previous e-mail then it will become very
unlikely that no tag is available when blk_get_request() is called. With
that scsi_lld_busy() modification it is even possible that we don't need to
modify the dm-mpath driver.

Bart.
On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> On Sat, 2017-09-23 at 01:44 +0800, Ming Lei wrote:
> > [...]
> > That delay may not matter because SCHED_RESTART will run the queue as
> > soon as one request completes.
>
> Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb
> but not for the dm queue. That's what I was trying to explain to you in
> my previous e-mail.

The patch I posted in this thread will set SCHED_RESTART for the dm queue.

> > There is at least one issue with get_request(GFP_NOIO): [...]
>
> That's why we need to modify scsi_lld_busy(). If scsi_lld_busy() is
> modified as I proposed in a previous e-mail then it will become very
> unlikely that no tag is available when blk_get_request() is called. With
> that scsi_lld_busy() modification it is even possible that we don't need
> to modify the dm-mpath driver.

Then post the whole solution, and I'd like to take a look and test it.
On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> On Fri, Sep 22, 2017 at 05:54:48PM +0000, Bart Van Assche wrote:
> > [...]
> > Did you understand what I wrote? SCHED_RESTART will be set for /dev/sdb
> > but not for the dm queue. That's what I was trying to explain to you in
> > my previous e-mail.
>
> The patch I posted in this thread will set SCHED_RESTART for the dm queue.

This is not how communication on an open source mailing list is assumed to
work. If you know that you are wrong you are assumed either to shut up or
to admit it. And if you disagree with the detailed explanation I gave you
are assumed to explain in detail why you think it is wrong.

> > [...]
> > That's why we need to modify scsi_lld_busy(). [...]
>
> Then post the whole solution, and I'd like to take a look and test it.

I will do that as soon as I have the time to run some measurements. The
rest of this week I will be traveling.

Bart.
On Mon, Sep 25, 2017 at 03:23:16PM +0000, Bart Van Assche wrote:
> On Mon, 2017-09-25 at 11:06 +0800, Ming Lei wrote:
> > [...]
> > The patch I posted in this thread will set SCHED_RESTART for the dm
> > queue.
>
> This is not how communication on an open source mailing list is assumed
> to work. If you know that you are wrong you are assumed either to shut up
> or to admit it.

You just said 'This patch is inferior' and never explained why my patch is
wrong, so please go ahead and show me why this patch (the one posted in
this thread, also at the following link) is wrong:

https://marc.info/?l=linux-block&m=150604412910113&w=2

I agree these are two ways to address the issue, but I don't think my patch
is wrong. Your approach can be a very big change because .queue_rq() will
block, and I also mentioned it might cause an AIO regression.

I don't understand why you say there is a communication issue, since there
has not been much discussion in this thread yet. Please see the whole
discussion:

https://marc.info/?t=150604442500001&r=1&w=2

> And if you disagree with the detailed explanation I gave you are assumed
> to explain in detail why you think it is wrong.
On Mon, Sep 25 2017 at 12:10pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Mon, Sep 25, 2017 at 03:23:16PM +0000, Bart Van Assche wrote:
> > [...]
> > This is not how communication on an open source mailing list is assumed
> > to work. If you know that you are wrong you are assumed either to shut
> > up or to admit it.

Code speaks much better than unnecessarily caustic exchanges. Not sure why
Bart persists with that.. but that's for him to sort out.

> You just said 'This patch is inferior' and never explained why my patch
> is wrong, so please go ahead and show me why this patch (the one posted
> in this thread, also at the following link) is wrong:
>
> https://marc.info/?l=linux-block&m=150604412910113&w=2
>
> I agree these are two ways to address the issue, but I don't think my
> patch is wrong. Your approach can be a very big change because
> .queue_rq() will block, and I also mentioned it might cause an AIO
> regression.

I have no interest in changing DM multipath to block in .queue_rq(), so
please consider that approach a dead end.

Ming, just iterate on your revised patchset, test and post when you're
happy with it.

Thanks,
Mike
On Mon, Sep 25, 2017 at 12:17:03PM -0400, Mike Snitzer wrote:
> On Mon, Sep 25 2017 at 12:10pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> > [...]
>
> Code speaks much better than unnecessarily caustic exchanges. Not sure
> why Bart persists with that.. but that's for him to sort out.
>
> [...]
>
> I have no interest in changing DM multipath to block in .queue_rq(), so
> please consider that approach a dead end.
>
> Ming, just iterate on your revised patchset, test and post when you're
> happy with it.

Hi Mike,

If you are not considering changing mpath to block in .queue_rq() now,
please take this patch first. I am sure this approach is correct, and it
can even be considered a fix.

Thanks,
Ming
On Tue, 2017-09-26 at 16:50 +0800, Ming Lei wrote:
> If you are not considering changing mpath to block in .queue_rq() now,
> please take this patch first. I am sure this approach is correct, and it
> can even be considered a fix.

As I explained earlier, it would be wrong to take this patch upstream
without having tested and measured the alternative I had proposed first
(https://www.spinics.net/lists/dm-devel/msg32282.html). My alternative
handles SCSI hosts with multiple LUNs properly, but the patch at the start
of this e-mail thread does not.

Bart.
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 11f273d2f018..0902f7762306 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -504,8 +504,20 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 		if (queue_dying) {
 			atomic_inc(&m->pg_init_in_progress);
 			activate_or_offline_path(pgpath);
+			return DM_MAPIO_DELAY_REQUEUE;
 		}
-		return DM_MAPIO_DELAY_REQUEUE;
+
+		/*
+		 * blk-mq's SCHED_RESTART can cover this requeue, so
+		 * we needn't to deal with it by DELAY_REQUEUE. More
+		 * importantly, we have to return DM_MAPIO_REQUEUE
+		 * so that blk-mq can get the queue busy feedback,
+		 * otherwise I/O merge can be hurt.
+		 */
+		if (q->mq_ops)
+			return DM_MAPIO_REQUEUE;
+		else
+			return DM_MAPIO_DELAY_REQUEUE;
 	}
 	clone->bio = clone->biotail = NULL;
 	clone->rq_disk = bdev->bd_disk;
blk-mq will rerun the queue via RESTART after one request is completed, so
it is not necessary to wait a random amount of time before requeuing; we
should trust blk-mq to do it. More importantly, we need to return
BLK_STS_RESOURCE to blk-mq so that dequeuing from the I/O scheduler can be
stopped, which improves I/O merging.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-mpath.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)