[0/3] blk-mq/dm-rq: support BLK_MQ_F_BLOCKING for dm-rq

Message ID: 20211221141459.1368176-1-ming.lei@redhat.com

Message

Ming Lei Dec. 21, 2021, 2:14 p.m. UTC
Hello,

dm-rq may be built on a blk-mq device that sets BLK_MQ_F_BLOCKING, so
dm_mq_queue_rq() may sleep in the current context.

Fix the issue by allowing dm-rq to set BLK_MQ_F_BLOCKING when any
underlying queue is marked as BLK_MQ_F_BLOCKING.

The DM request queue is allocated before the tagset, which is a bit
unusual, so we need to pre-allocate the SRCU payload and then use the
queue flag QUEUE_FLAG_BLOCKING to protect dispatch.
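
To make the last point concrete, the dispatch-side protection follows
the usual blk-mq RCU/SRCU pattern. This is only a rough sketch: the
QUEUE_FLAG_BLOCKING flag is introduced by this series, q->srcu follows
the current block tree, and dispatch() stands in for the real dispatch
helpers, so the final patch may differ:

	/* sketch only: protect dispatch with RCU, or with SRCU when
	 * the queue flag says dispatch may sleep */
	if (!test_bit(QUEUE_FLAG_BLOCKING, &q->queue_flags)) {
		rcu_read_lock();
		ret = dispatch(q);		/* must not sleep */
		rcu_read_unlock();
	} else {
		int srcu_idx = srcu_read_lock(q->srcu);

		ret = dispatch(q);		/* may sleep */
		srcu_read_unlock(q->srcu, srcu_idx);
	}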


Ming Lei (3):
  block: split having srcu from queue blocking
  block: add blk_alloc_disk_srcu
  dm: mark dm queue as blocking if any underlying is blocking

 block/blk-core.c       |  2 +-
 block/blk-mq.c         |  6 +++---
 block/blk-mq.h         |  2 +-
 block/blk-sysfs.c      |  2 +-
 block/genhd.c          |  5 +++--
 drivers/md/dm-rq.c     |  5 ++++-
 drivers/md/dm-rq.h     |  3 ++-
 drivers/md/dm-table.c  | 14 ++++++++++++++
 drivers/md/dm.c        |  5 +++--
 drivers/md/dm.h        |  1 +
 include/linux/blkdev.h |  5 +++--
 include/linux/genhd.h  | 12 ++++++++----
 12 files changed, 44 insertions(+), 18 deletions(-)
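
For patch 3, the "any underlying queue is blocking" check presumably
walks the table's devices. A hypothetical sketch in dm-table's
iterate-devices style; device_is_blocking() and
dm_table_has_blocking_dev() are illustrative names, not necessarily
what the patch uses:

	static int device_is_blocking(struct dm_target *ti, struct dm_dev *dev,
				      sector_t start, sector_t len, void *data)
	{
		struct request_queue *q = bdev_get_queue(dev->bdev);

		/* only blk-mq queues have a tag_set that can carry
		 * BLK_MQ_F_BLOCKING */
		return queue_is_mq(q) && (q->tag_set->flags & BLK_MQ_F_BLOCKING);
	}

	bool dm_table_has_blocking_dev(struct dm_table *t)
	{
		return dm_table_any_dev_attr(t, device_is_blocking, NULL);
	}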

Comments

Christoph Hellwig Dec. 21, 2021, 4:21 p.m. UTC | #1
On Tue, Dec 21, 2021 at 10:14:56PM +0800, Ming Lei wrote:
> Hello,
> 
> dm-rq may be built on a blk-mq device that sets BLK_MQ_F_BLOCKING, so
> dm_mq_queue_rq() may sleep in the current context.
> 
> Fix the issue by allowing dm-rq to set BLK_MQ_F_BLOCKING when any
> underlying queue is marked as BLK_MQ_F_BLOCKING.
> 
> The DM request queue is allocated before the tagset, which is a bit
> unusual, so we need to pre-allocate the SRCU payload and then use the
> queue flag QUEUE_FLAG_BLOCKING to protect dispatch.

What is the benefit over just forcing bio-based dm-mpath for these
devices?

Ming Lei Dec. 23, 2021, 4:16 a.m. UTC | #2
On Tue, Dec 21, 2021 at 08:21:39AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 21, 2021 at 10:14:56PM +0800, Ming Lei wrote:
> > Hello,
> > 
> > dm-rq may be built on a blk-mq device that sets BLK_MQ_F_BLOCKING, so
> > dm_mq_queue_rq() may sleep in the current context.
> > 
> > Fix the issue by allowing dm-rq to set BLK_MQ_F_BLOCKING when any
> > underlying queue is marked as BLK_MQ_F_BLOCKING.
> > 
> > The DM request queue is allocated before the tagset, which is a bit
> > unusual, so we need to pre-allocate the SRCU payload and then use the
> > queue flag QUEUE_FLAG_BLOCKING to protect dispatch.
> 
> What is the benefit over just forcing bio-based dm-mpath for these
> devices?

At least the IO scheduler can't be used with bio-based dm-mpath. There
are likely other drawbacks to bio-based mpath as well, and request-based
mpath is often the default option; maybe Mike has more input on
bio-based vs. request-based dm-mpath.

Thanks,
Ming

Mike Snitzer Dec. 28, 2021, 9:30 p.m. UTC | #3
On Wed, Dec 22 2021 at 11:16P -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Tue, Dec 21, 2021 at 08:21:39AM -0800, Christoph Hellwig wrote:
> > On Tue, Dec 21, 2021 at 10:14:56PM +0800, Ming Lei wrote:
> > > Hello,
> > > 
> > > dm-rq may be built on a blk-mq device that sets BLK_MQ_F_BLOCKING, so
> > > dm_mq_queue_rq() may sleep in the current context.
> > > 
> > > Fix the issue by allowing dm-rq to set BLK_MQ_F_BLOCKING when any
> > > underlying queue is marked as BLK_MQ_F_BLOCKING.
> > > 
> > > The DM request queue is allocated before the tagset, which is a bit
> > > unusual, so we need to pre-allocate the SRCU payload and then use the
> > > queue flag QUEUE_FLAG_BLOCKING to protect dispatch.
> > 
> > What is the benefit over just forcing bio-based dm-mpath for these
> > devices?
> 
> At least the IO scheduler can't be used with bio-based dm-mpath. There
> are likely other drawbacks to bio-based mpath as well, and request-based
> mpath is often the default option; maybe Mike has more input on
> bio-based vs. request-based dm-mpath.

Yeah, people use request-based for IO scheduling and more capable path
selectors. Imposing bio-based would be a pretty jarring workaround for 
BLK_MQ_F_BLOCKING. request-based DM should properly support it.

I'm on vacation for the next week but will have a look at this
patchset once I'm back.

Thanks,
Mike

Mike Snitzer Jan. 10, 2022, 7:26 p.m. UTC | #4
On Tue, Dec 28 2021 at  4:30P -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Dec 22 2021 at 11:16P -0500,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > On Tue, Dec 21, 2021 at 08:21:39AM -0800, Christoph Hellwig wrote:
> > > On Tue, Dec 21, 2021 at 10:14:56PM +0800, Ming Lei wrote:
> > > > Hello,
> > > > 
> > > > dm-rq may be built on a blk-mq device that sets BLK_MQ_F_BLOCKING, so
> > > > dm_mq_queue_rq() may sleep in the current context.
> > > > 
> > > > Fix the issue by allowing dm-rq to set BLK_MQ_F_BLOCKING when any
> > > > underlying queue is marked as BLK_MQ_F_BLOCKING.
> > > > 
> > > > The DM request queue is allocated before the tagset, which is a bit
> > > > unusual, so we need to pre-allocate the SRCU payload and then use the
> > > > queue flag QUEUE_FLAG_BLOCKING to protect dispatch.
> > > 
> > > What is the benefit over just forcing bio-based dm-mpath for these
> > > devices?
> > 
> > At least the IO scheduler can't be used with bio-based dm-mpath. There
> > are likely other drawbacks to bio-based mpath as well, and request-based
> > mpath is often the default option; maybe Mike has more input on
> > bio-based vs. request-based dm-mpath.
> 
> Yeah, people use request-based for IO scheduling and more capable path
> selectors. Imposing bio-based would be a pretty jarring workaround for 
> BLK_MQ_F_BLOCKING. request-based DM should properly support it.
> 
> I'm on vacation for the next week but will have a look at this
> patchset once I'm back.

I replied last week and hoped Jens would pick this patchset up after
my Reviewed-by of patch 3/3.

Christoph wasn't back though, so best to kick this thread again.

Thoughts on this patchset?

Thanks,
Mike

Christoph Hellwig Jan. 11, 2022, 8:34 a.m. UTC | #5
On Tue, Dec 28, 2021 at 04:30:08PM -0500, Mike Snitzer wrote:
> Yeah, people use request-based for IO scheduling and more capable path
> selectors. Imposing bio-based would be a pretty jarring workaround for 
> BLK_MQ_F_BLOCKING. request-based DM should properly support it.

Given that nvme-tcp is the only blocking driver that has a multipath
driver, and that driver explicitly does not intend to support
dm-multipath, I'm absolutely against adding block layer cruft for this
particular use case.

SCSI even has this:

	/*
	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
	 * calling synchronize_rcu() once is enough.
	 */
	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
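
That is, SCSI relies on the quiesce path always taking the plain RCU
branch; roughly (a sketch of the current tree, not the exact code):

	void blk_mq_wait_quiesce_done(struct request_queue *q)
	{
		if (blk_queue_has_srcu(q))
			synchronize_srcu(q->srcu);
		else
			synchronize_rcu();	/* SCSI always ends up here */
	}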

Mike Snitzer Jan. 11, 2022, 4:15 p.m. UTC | #6
On Tue, Jan 11 2022 at  3:34P -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Dec 28, 2021 at 04:30:08PM -0500, Mike Snitzer wrote:
> > Yeah, people use request-based for IO scheduling and more capable path
> > selectors. Imposing bio-based would be a pretty jarring workaround for 
> > BLK_MQ_F_BLOCKING. request-based DM should properly support it.
> 
> Given that nvme-tcp is the only blocking driver that has a multipath
> driver, and that driver explicitly does not intend to support
> dm-multipath, I'm absolutely against adding block layer cruft for this
> particular use case.

This diffstat amounts to what you call "cruft":

 block/blk-core.c       |  2 +-
 block/blk-mq.c         |  6 +++---
 block/blk-mq.h         |  2 +-
 block/blk-sysfs.c      |  2 +-
 block/genhd.c          |  5 +++--
 drivers/md/dm-rq.c     |  5 ++++-
 drivers/md/dm-rq.h     |  3 ++-
 drivers/md/dm-table.c  | 14 ++++++++++++++
 drivers/md/dm.c        |  5 +++--
 drivers/md/dm.h        |  1 +
 include/linux/blkdev.h |  5 +++--
 include/linux/genhd.h  | 12 ++++++++----
 12 files changed, 44 insertions(+), 18 deletions(-)

> SCSI even has this:
> 
> 	/*
> 	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
> 	 * calling synchronize_rcu() once is enough.
> 	 */
> 	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
> 

Round and round we go... Pretty tired of this.

You are perfectly fine with incrementally compromising request-based
DM's ability to evolve as block core does.

Seriously, this patchset shouldn't warrant bickering:
https://patchwork.kernel.org/project/dm-devel/list/?series=598823

Jens, this incremental weakening of what it is that DM is allowed to
do is not something I can continue to work with (nor should Ming's or
others' contributions be rejected for such reasons).

This tribal war needs to stop.

Mike

Jeff Moyer Jan. 11, 2022, 6:23 p.m. UTC | #7
Christoph Hellwig <hch@infradead.org> writes:

> On Tue, Dec 28, 2021 at 04:30:08PM -0500, Mike Snitzer wrote:
>> Yeah, people use request-based for IO scheduling and more capable path
>> selectors. Imposing bio-based would be a pretty jarring workaround for 
>> BLK_MQ_F_BLOCKING. request-based DM should properly support it.
>
> Given that nvme-tcp is the only blocking driver that has a multipath
> driver, and that driver explicitly does not intend to support
> dm-multipath, I'm absolutely against adding block layer cruft for this
> particular use case.

Maybe I have bad taste, but the patches didn't look like cruft to me.
:)

I'm not sure why we'd prevent users from using dm-mpath on nvmeof.  I
think there's agreement that the nvme native multipath implementation is
the preferred way (that's the default in rhel9, even), but I don't think
that's a reason to nack this patch set.

Or have I missed your point entirely?

Thanks!
Jeff

Christoph Hellwig Jan. 17, 2022, 8:08 a.m. UTC | #8
On Tue, Jan 11, 2022 at 11:15:09AM -0500, Mike Snitzer wrote:
> Round and round we go... Pretty tired of this.

Same here.

> You are perfectly fine with incrementally compromising request-based
> DM's ability to evolve as block core does.

I would not word it that way, but I think we mean the same thing.  Yes,
I do not want to add more hooks for a complete oddball that has no
actual use case.  dm-rq does a good enough job for SCSI and has all
the infrastructure for it.  We should not add more cruft to extend it to
use cases for which there is no consensus and which are much better
covered by already existing code in the kernel.

Christoph Hellwig Jan. 17, 2022, 8:10 a.m. UTC | #9
On Tue, Jan 11, 2022 at 01:23:53PM -0500, Jeff Moyer wrote:
> Maybe I have bad taste, but the patches didn't look like cruft to me.
> :)

They do to me.  They extend the corner case of request-on-request
stacking, which is already a bit of a mess, by adding yet another
special case in the block layer.

> 
> I'm not sure why we'd prevent users from using dm-mpath on nvmeof.  I
> think there's agreement that the nvme native multipath implementation is
> the preferred way (that's the default in rhel9, even), but I don't think
> that's a reason to nack this patch set.
> 
> Or have I missed your point entirely?

No, you have not missed the point.  nvme-multipath has existed longer
than the nvme-tcp driver and is the only multipath implementation
supported for it upstream, for a good reason.  If Red Hat wants to do
their own weirdo setups they can patch their kernels, but please leave
the upstream code alone.

Mike Snitzer Jan. 19, 2022, 9:03 p.m. UTC | #10
On Mon, Jan 17 2022 at  3:10P -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Jan 11, 2022 at 01:23:53PM -0500, Jeff Moyer wrote:
> > Maybe I have bad taste, but the patches didn't look like cruft to me.
> > :)
> 
> They do to me.  They extend the corner case of request-on-request
> stacking, which is already a bit of a mess, by adding yet another
> special case in the block layer.

Ming's first patch:
https://patchwork.kernel.org/project/dm-devel/patch/20211221141459.1368176-2-ming.lei@redhat.com/
is pure cleanup for the mess that went in this merge cycle.

All that dm-rq context aside, Ming's 1st patch is the correct way to
clean up the block core flags/state (internal vs external, etc).

But the 2nd paragraph of that first patch's header should be moved to
Ming's 2nd patch because it explains why DM needs the new
blk_alloc_disk_srcu() interface, e.g.:
"But dm queue is allocated before tagset is allocated"
(so that rationale is best explained in the 2nd patch).

But really even Ming's 2nd patch:
https://patchwork.kernel.org/project/dm-devel/patch/20211221141459.1368176-3-ming.lei@redhat.com/
should _not_ need to be debated like this.

Fact is alloc_disk() has always mirrored blk_alloc_queue()'s args.  So
Ming's 2nd patch should be done anyway to expose meaningful control
over request_queue allocation.  If anything, the 2nd patch should just
add the 'alloc_srcu' arg to blk_alloc_disk() and change all but NVMe
callers to pass false.

Put another way: when the 'node_id' arg was added to blk_alloc_queue()
a new blk_alloc_disk_numa_node() wasn't added (despite most block
drivers still only using NUMA_NO_NODE).  This new 'alloc_srcu' flag is
seen to be more niche, but it really should be no different on an
interface symmetry and design level.
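
For illustration, the kind of symmetry I mean; this is hypothetical,
only the blk_alloc_disk_srcu() name and the SRCU pre-allocation come
from Ming's series:

	struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
	struct gendisk *__blk_alloc_disk(int node_id, bool alloc_srcu,
					 struct lock_class_key *lkclass);

	#define blk_alloc_disk(node_id)					\
	({								\
		static struct lock_class_key __key;			\
		__blk_alloc_disk(node_id, false, &__key);		\
	})
	#define blk_alloc_disk_srcu(node_id)				\
	({								\
		static struct lock_class_key __key;			\
		__blk_alloc_disk(node_id, true, &__key);		\
	})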

> > I'm not sure why we'd prevent users from using dm-mpath on nvmeof.  I
> > think there's agreement that the nvme native multipath implementation is
> > the preferred way (that's the default in rhel9, even), but I don't think
> > that's a reason to nack this patch set.
> > 
> > Or have I missed your point entirely?
> 
> No, you have not missed the point.  nvme-multipath has existed longer
> than the nvme-tcp driver and is the only multipath implementation
> supported for it upstream, for a good reason.  If Red Hat wants to do
> their own weirdo setups they can patch their kernels, but please leave
> the upstream code alone.

Patch 3 can be left out if you'd like to force your world view on
everyone... you've already "won", _pretty please_ stop being so
punitive by blocking reasonable change.  We really can get along if
we're all willing to be intellectually honest.

To restate: Ming's patches 1 and 2 really are not "cruft".  They
expose control over request_queue allocation that should be accessible
by both blk_alloc_queue() and blk_alloc_disk().

Mike
