[RFC,2/7] blk-mq: add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ
diff mbox series

Message ID 20190712024726.1227-3-ming.lei@redhat.com
State New
Headers show
Series
  • blk-mq: improvement on handling IO during CPU hotplug
Related show

Commit Message

Ming Lei July 12, 2019, 2:47 a.m. UTC
We will stop hw queue and wait for completion of in-flight requests
when one hctx is becoming dead in the following patch. This way may
cause dead-lock for some stacking blk-mq drivers, such as dm-rq and
loop.

Add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ and mark it for dm-rq and
loop, so we needn't to wait for completion of in-flight requests of
dm-rq & loop, then the potential dead-lock can be avoided.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c | 1 +
 drivers/block/loop.c   | 2 +-
 drivers/md/dm-rq.c     | 2 +-
 include/linux/blk-mq.h | 1 +
 4 files changed, 4 insertions(+), 2 deletions(-)

Comments

John Garry July 16, 2019, 2:53 a.m. UTC | #1
在 12/07/2019 10:47, Ming Lei 写道:
> We will stop hw queue and wait for completion of in-flight requests
> when one hctx is becoming dead in the following patch. This way may
> cause dead-lock for some stacking blk-mq drivers, such as dm-rq and
> loop.
> 
> Add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ and mark it for dm-rq and
> loop, so we needn't to wait for completion of in-flight requests of
> dm-rq & loop, then the potential dead-lock can be avoided.

Wouldn't it make more sense to have the flag name be like 
BLK_MQ_F_DONT_DRAIN_STOPPED_HCTX?

I did not think that blk-mq is specifically concerned with managed 
interrupts, but only their indirect effect.

> 
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Keith Busch <keith.busch@intel.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq-debugfs.c | 1 +
>   drivers/block/loop.c   | 2 +-
>   drivers/md/dm-rq.c     | 2 +-
>   include/linux/blk-mq.h | 1 +
>   4 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index af40a02c46ee..24fff8c90942 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -240,6 +240,7 @@ static const char *const hctx_flag_name[] = {
>   	HCTX_FLAG_NAME(TAG_SHARED),
>   	HCTX_FLAG_NAME(BLOCKING),
>   	HCTX_FLAG_NAME(NO_SCHED),
> +	HCTX_FLAG_NAME(NO_MANAGED_IRQ),
>   };
>   #undef HCTX_FLAG_NAME
>   
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 44c9985f352a..199d76e8bf46 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1986,7 +1986,7 @@ static int loop_add(struct loop_device **l, int i)
>   	lo->tag_set.queue_depth = 128;
>   	lo->tag_set.numa_node = NUMA_NO_NODE;
>   	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
> -	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> +	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;

nit: at this point in the series you're setting a flag which is never 
checked.

>   	lo->tag_set.driver_data = lo;
>   
>   	err = blk_mq_alloc_tag_set(&lo->tag_set);
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 5f7063f05ae0..f7fbef2d3cd7 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -546,7 +546,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
>   	md->tag_set->ops = &dm_mq_ops;
>   	md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
>   	md->tag_set->numa_node = md->numa_node_id;
> -	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> +	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
>   	md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
>   	md->tag_set->driver_data = md;
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 3a731c3c0762..911cdc6479dc 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -219,6 +219,7 @@ struct blk_mq_ops {
>   enum {
>   	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
>   	BLK_MQ_F_TAG_SHARED	= 1 << 1,
> +	BLK_MQ_F_NO_MANAGED_IRQ	= 1 << 2,
>   	BLK_MQ_F_BLOCKING	= 1 << 5,
>   	BLK_MQ_F_NO_SCHED	= 1 << 6,
>   	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
>
Ming Lei July 16, 2019, 6:08 a.m. UTC | #2
On Tue, Jul 16, 2019 at 10:53:00AM +0800, John Garry wrote:
> 在 12/07/2019 10:47, Ming Lei 写道:
> > We will stop hw queue and wait for completion of in-flight requests
> > when one hctx is becoming dead in the following patch. This way may
> > cause dead-lock for some stacking blk-mq drivers, such as dm-rq and
> > loop.
> > 
> > Add blk-mq flag of BLK_MQ_F_NO_MANAGED_IRQ and mark it for dm-rq and
> > loop, so we needn't to wait for completion of in-flight requests of
> > dm-rq & loop, then the potential dead-lock can be avoided.
> 
> Wouldn't it make more sense to have the flag name be like
> BLK_MQ_F_DONT_DRAIN_STOPPED_HCTX?
> 
> I did not think that blk-mq is specifically concerned with managed
> interrupts, but only their indirect effect.

I am fine with either one, however it is easier for drivers to
recognize if this flag should be set, given BLK_MQ_F_NO_MANAGED_IRQ
is self-explained.

Also on the other side, this patchset serves a generic blk-mq fix
for managed IRQ issue, so it is reasonable for all drivers which
don't use managed IRQ to set the flag.

> 
> > 
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: Hannes Reinecke <hare@suse.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Keith Busch <keith.busch@intel.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   block/blk-mq-debugfs.c | 1 +
> >   drivers/block/loop.c   | 2 +-
> >   drivers/md/dm-rq.c     | 2 +-
> >   include/linux/blk-mq.h | 1 +
> >   4 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index af40a02c46ee..24fff8c90942 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -240,6 +240,7 @@ static const char *const hctx_flag_name[] = {
> >   	HCTX_FLAG_NAME(TAG_SHARED),
> >   	HCTX_FLAG_NAME(BLOCKING),
> >   	HCTX_FLAG_NAME(NO_SCHED),
> > +	HCTX_FLAG_NAME(NO_MANAGED_IRQ),
> >   };
> >   #undef HCTX_FLAG_NAME
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 44c9985f352a..199d76e8bf46 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1986,7 +1986,7 @@ static int loop_add(struct loop_device **l, int i)
> >   	lo->tag_set.queue_depth = 128;
> >   	lo->tag_set.numa_node = NUMA_NO_NODE;
> >   	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
> > -	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> > +	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
> 
> nit: at this point in the series you're setting a flag which is never
> checked.

Yeah, I see, and this way is a bit easier for review purpose.

thanks,
Ming

Patch
diff mbox series

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index af40a02c46ee..24fff8c90942 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -240,6 +240,7 @@  static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
+	HCTX_FLAG_NAME(NO_MANAGED_IRQ),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 44c9985f352a..199d76e8bf46 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1986,7 +1986,7 @@  static int loop_add(struct loop_device **l, int i)
 	lo->tag_set.queue_depth = 128;
 	lo->tag_set.numa_node = NUMA_NO_NODE;
 	lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
 	lo->tag_set.driver_data = lo;
 
 	err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 5f7063f05ae0..f7fbef2d3cd7 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -546,7 +546,7 @@  int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
 	md->tag_set->ops = &dm_mq_ops;
 	md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
 	md->tag_set->numa_node = md->numa_node_id;
-	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+	md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_MANAGED_IRQ;
 	md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
 	md->tag_set->driver_data = md;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3a731c3c0762..911cdc6479dc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -219,6 +219,7 @@  struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_NO_MANAGED_IRQ	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,