[for-4.16,v2,1/5] block: establish request failover callback

Message ID 20171227032257.8182-2-snitzer@redhat.com
State New

Commit Message

Mike Snitzer Dec. 27, 2017, 3:22 a.m. UTC
Any request allocated from a request_queue that has this callback set
can be failed over during completion.

This callback is expected to use the blk_steal_bios() interface to
transfer a request's bios back to an upper-layer bio-based
request_queue.

This will be used by both NVMe multipath and DM multipath.  Without it
DM multipath cannot get access to NVMe-specific error handling that NVMe
core provides in nvme_complete_rq().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 include/linux/blkdev.h | 6 ++++++
 1 file changed, 6 insertions(+)
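
Editor's sketch of the mechanism the commit message describes. This is a userspace mock, not kernel code. The struct layouts are cut-down stand-ins, and mock_steal_bios(), upper_failover_rq(), mock_complete_rq(), and bio_list_count() are hypothetical names introduced only for illustration. The splice logic is modelled on what blk_steal_bios() is described as doing (moving a request's bios onto a caller-supplied bio list). It is also an assumption that the completion path invokes the callback on a path error, since this patch only adds the hook.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for kernel types; NOT the real definitions. */
struct bio {
	struct bio *bi_next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

struct request;
typedef void (failover_rq_fn)(struct request *);

struct request_queue {
	failover_rq_fn *failover_rq_fn;	/* mirrors the field added by the patch */
};

struct request {
	struct request_queue *q;
	struct bio *bio;	/* first bio of the request */
	struct bio *biotail;	/* last bio of the request */
	void *end_io_data;	/* assumption: points at the upper layer's requeue list */
};

/* Hypothetical model of blk_steal_bios(): splice rq's bios onto list. */
void mock_steal_bios(struct bio_list *list, struct request *rq)
{
	if (!rq->bio)
		return;
	if (list->tail)
		list->tail->bi_next = rq->bio;
	else
		list->head = rq->bio;
	list->tail = rq->biotail;
	rq->bio = NULL;
	rq->biotail = NULL;
}

/* Hypothetical upper-layer callback: take the bios back for resubmission. */
void upper_failover_rq(struct request *rq)
{
	struct bio_list *requeue = rq->end_io_data;

	assert(requeue != NULL);
	mock_steal_bios(requeue, rq);
}

/*
 * Hypothetical completion path: on a path error, hand the request to the
 * queue's failover callback if one is set. Returns 1 if failed over.
 */
int mock_complete_rq(struct request *rq, int path_error)
{
	if (path_error && rq->q->failover_rq_fn) {
		rq->q->failover_rq_fn(rq);
		return 1;
	}
	return 0;
}

/* Count the bios currently held on a list. */
size_t bio_list_count(const struct bio_list *list)
{
	size_t n = 0;

	for (const struct bio *b = list->head; b; b = b->bi_next)
		n++;
	return n;
}
```

In this model, a request carrying two bios whose queue has failover_rq_fn set ends up with rq->bio == NULL after mock_complete_rq(rq, 1), and both bios sit on the upper layer's list in their original order. A queue that leaves the callback NULL keeps its bios and completes as before.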

Comments

Christoph Hellwig Dec. 29, 2017, 10:10 a.m. UTC | #1
On Tue, Dec 26, 2017 at 10:22:53PM -0500, Mike Snitzer wrote:
> All requests allocated from a request_queue with this callback set can
> failover their requests during completion.
> 
> This callback is expected to use the blk_steal_bios() interface to
> transfer a request's bios back to an upper-layer bio-based
> request_queue.
> 
> This will be used by both NVMe multipath and DM multipath.  Without it
> DM multipath cannot get access to NVMe-specific error handling that NVMe
> core provides in nvme_complete_rq().

And the whole point is that it should not get any such access.

The reason why we did nvme multipathing differently is because the
design of dm-multipath inflicts so much pain on users that we absolutely
want to avoid it this time around.
Mike Snitzer Dec. 29, 2017, 8:19 p.m. UTC | #2
On Fri, Dec 29 2017 at  5:10am -0500,
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Dec 26, 2017 at 10:22:53PM -0500, Mike Snitzer wrote:
> > All requests allocated from a request_queue with this callback set can
> > failover their requests during completion.
> > 
> > This callback is expected to use the blk_steal_bios() interface to
> > transfer a request's bios back to an upper-layer bio-based
> > request_queue.
> > 
> > This will be used by both NVMe multipath and DM multipath.  Without it
> > DM multipath cannot get access to NVMe-specific error handling that NVMe
> > core provides in nvme_complete_rq().
> 
> And the whole point is that it should not get any such access.

No, the whole point is you hijacked multipathing for little to no gain.

> The reason why we did nvme multipathing differently is because the
> design of dm-multipath inflicts so much pain on users that we absolutely
> want to avoid it this time around.

Is that the royal "we"?

_You_ are the one subjecting users to pain.  There is no reason users
should need to have multiple management domains for multipathing unless
they opt-in.  Linux _is_ about choice, yet you're working overtime to
limit that choice.

You are blatantly ignoring/rejecting both Hannes [1] and me.  Your
attempt to impose _how_ NVMe multipathing must be done is unacceptable.

Hopefully Jens can see through your senseless position and will accept
patches 1 - 3 for 4.16.  They offer very minimal change that enables
users to decide which multipathing they'd prefer to use with NVMe.

Just wish you could stop with this petty bullshit and actually
collaborate with people.

I've shown how easy it is to enable NVMe multipathing in terms of DM
multipath (yet preserve your native NVMe multipathing).  Please stop
being so dogmatic.  Are you scared of being proven wrong about what the
market wants?

If you'd allow progress toward native NVMe and DM multipathing
coexisting we'd let the users decide what they prefer.  I don't need to
impose one way or the other, but I _do_ need to preserve DM multipath
compatibility given the extensive use of DM multipath in the enterprise
and increased tooling that builds upon it.

[1] http://lists.infradead.org/pipermail/linux-nvme/2017-October/013719.html
Christoph Hellwig Jan. 4, 2018, 10:28 a.m. UTC | #3
On Fri, Dec 29, 2017 at 03:19:04PM -0500, Mike Snitzer wrote:
> On Fri, Dec 29 2017 at  5:10am -0500,
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Tue, Dec 26, 2017 at 10:22:53PM -0500, Mike Snitzer wrote:
> > > All requests allocated from a request_queue with this callback set can
> > > failover their requests during completion.
> > > 
> > > This callback is expected to use the blk_steal_bios() interface to
> > > transfer a request's bios back to an upper-layer bio-based
> > > request_queue.
> > > 
> > > This will be used by both NVMe multipath and DM multipath.  Without it
> > > DM multipath cannot get access to NVMe-specific error handling that NVMe
> > > core provides in nvme_complete_rq().
> > 
> > And the whole point is that it should not get any such access.
> 
> No, the whole point is you hijacked multipathing for little to no gain.

That is your idea.  In the end there have been a lot of complaints about
dm-multipath, and there was a lot of discussion about how to do things better,
with a broad agreement on this approach.  Up to the point where Hannes
has started considering doing something similar for scsi.

And to be honest if this is the tone you'd like to set for technical
discussions I'm not really interested.  Please calm down and stick
to a technical discussion.
Mike Snitzer Jan. 4, 2018, 2:42 p.m. UTC | #4
On Thu, Jan 04 2018 at  5:28am -0500,
Christoph Hellwig <hch@lst.de> wrote:

> On Fri, Dec 29, 2017 at 03:19:04PM -0500, Mike Snitzer wrote:
> > On Fri, Dec 29 2017 at  5:10am -0500,
> > Christoph Hellwig <hch@lst.de> wrote:
> > 
> > > On Tue, Dec 26, 2017 at 10:22:53PM -0500, Mike Snitzer wrote:
> > > > All requests allocated from a request_queue with this callback set can
> > > > failover their requests during completion.
> > > > 
> > > > This callback is expected to use the blk_steal_bios() interface to
> > > > transfer a request's bios back to an upper-layer bio-based
> > > > request_queue.
> > > > 
> > > > This will be used by both NVMe multipath and DM multipath.  Without it
> > > > DM multipath cannot get access to NVMe-specific error handling that NVMe
> > > > core provides in nvme_complete_rq().
> > > 
> > > And the whole point is that it should not get any such access.
> > 
> > No, the whole point is you hijacked multipathing for little to no gain.
> 
> That is your idea.  In the end there have been a lot of complaints about
> dm-multipath, and there was a lot of discussion about how to do things better,
> with a broad agreement on this approach.  Up to the point where Hannes
> has started considering doing something similar for scsi.

All the "DM multipath" complaints I heard at LSF were fixable and pretty
superficial.  Some less so, but Hannes had a vision for addressing
various SCSI stuff (which really complicated DM multipath).

But I'd really rather not dwell on all the history of NVMe native
multipathing's evolution.  It isn't productive (other than to
acknowledge that there are far more efficient and productive ways to
coordinate such a change).

> And to be honest if this is the tone you'd like to set for technical
> discussions I'm not really interested.  Please calm down and stick
> to a technical discussion.

I think you'd probably agree that you've repeatedly derailed or avoided
technical discussion if it got into "DM multipath".  But again I'm not
looking to dwell on how dysfunctional this has been.  I really do
appreciate your technical expertise.  Sadly, I cannot say I feel you think
similarly of me.

I will say that I'm human, and as such I have limits on what I'm willing to
accept.  You leveraged your position to the point where it has started
to feel like you were lording it over me.  That is tough to accept.  It makes
my job _really_ feel like "work".  All I've ever been trying to do
(since accepting the reality of "NVMe native multipathing") is bridge
the gap from the old solution to new solution.  I'm not opposed to the
new solution, it just needs to mature without being the _only_ way to
provide the feature (NVMe multipathing).  Hopefully we can have productive
exchanges moving forward.

There are certainly some challenges associated with trying to allow a
kernel to support both NVMe native multipathing and DM multipathing.
E.g. would an NVMe device scan multipath blacklist be doable/acceptable?

I'd also like to understand if your vision for NVMe's ANA support will
model something like scsi_dh?  Meaning ANA is a capability that, when
attached, augments the behavior of the NVMe device but that it is
otherwise internal to the device and upper layers will get the benefit
of ANA handler being attached.  Also, curious to know if you see that as
needing to be tightly coupled to multipathing?  If so that is the next
interface point hurdle.

In the end I really think that DM multipath can help make NVMe native
multipath very robust (and vice-versa).

Mike

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8089ca17db9a..f45f5925e100 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -278,6 +278,7 @@  typedef int (lld_busy_fn) (struct request_queue *q);
 typedef int (bsg_job_fn) (struct bsg_job *);
 typedef int (init_rq_fn)(struct request_queue *, struct request *, gfp_t);
 typedef void (exit_rq_fn)(struct request_queue *, struct request *);
+typedef void (failover_rq_fn)(struct request *);
 
 enum blk_eh_timer_return {
 	BLK_EH_NOT_HANDLED,
@@ -423,6 +424,11 @@  struct request_queue {
 	exit_rq_fn		*exit_rq_fn;
 	/* Called from inside blk_get_request() */
 	void (*initialize_rq_fn)(struct request *rq);
+	/*
+	 * Callback to failover request's bios back to upper layer
+	 * bio-based request_queue using blk_steal_bios().
+	 */
+	failover_rq_fn		*failover_rq_fn;
 
 	const struct blk_mq_ops	*mq_ops;