
[1/2] Don't blacklist nvme

Message ID 1487107154-24883-1-git-send-email-keith.busch@intel.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Keith Busch Feb. 14, 2017, 9:19 p.m. UTC
These devices are multipath capable, and have been able to stack with
dm-mpath since kernel 4.2.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 libmultipath/blacklist.c   | 2 +-
 multipath/multipath.conf.5 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Feb. 14, 2017, 9:35 p.m. UTC | #1
On 02/14/2017 01:19 PM, Keith Busch wrote:
> These devices are multipath capable, and have been able to stack with
> dm-mpath since kernel 4.2.
> 
> -	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> +	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");

Have you checked whether dm-mpath works properly with nvme? Last time I
tried that dm-mpath crashed when it called scsi_dh_attach() because that
last function assumes that it is passed a SCSI device queue instead of
checking whether or not the queue passed to that function is a SCSI
device queue.

Bart.
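
For context, the helper that scsi_dh_attach() uses to look up the device is
get_sdev_from_queue() in drivers/scsi/scsi_dh.c. A simplified reconstruction
of it (a sketch, not the exact source) shows where the assumption lives:

static struct scsi_device *get_sdev_from_queue(struct request_queue *q)
{
	struct scsi_device *sdev;
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	/* Assumes the queue belongs to a SCSI device.  For an NVMe
	 * namespace, q->queuedata is driver-private data rather than a
	 * scsi_device, so the get_device() below walks into unrelated
	 * memory and crashes. */
	sdev = q->queuedata;
	if (!sdev || !get_device(&sdev->sdev_gendev))
		sdev = NULL;
	spin_unlock_irqrestore(q->queue_lock, flags);

	return sdev;
}
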
Keith Busch Feb. 14, 2017, 11 p.m. UTC | #2
On Tue, Feb 14, 2017 at 01:35:45PM -0800, Bart Van Assche wrote:
> On 02/14/2017 01:19 PM, Keith Busch wrote:
> > These devices are multipath capable, and have been able to stack with
> > dm-mpath since kernel 4.2.
> > 
> > -	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> > +	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
> 
> Have you checked whether dm-mpath works properly with nvme? Last time I
> tried that dm-mpath crashed when it called scsi_dh_attach() because that
> last function assumes that it is passed a SCSI device queue instead of
> checking whether or not the queue passed to that function is a SCSI
> device queue.

Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
and blissfully unaware of its existence! After enabling that option,
I see what you mean.

If we don't want to mess with the kernel, I can change the multipath-tools
to get around that by appending the following to NVMe hwentry in the
second patch in this series:

	.retain_hwhandler = RETAIN_HWHANDLER_OFF,

And the problem goes away.

I'm not sure yet what the other implications are of setting that (I will
look into it, but I'm out of time today).
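
For illustration, the kind of hwtable entry being referred to might look
roughly like the following (field names are modeled on libmultipath's
struct hwentry of that era, and the concrete vendor/product/checker values
are assumptions for the sketch, not the actual contents of patch 2/2):

	{
		/* NVMe devices (illustrative sketch, values assumed) */
		.vendor           = "NVME",
		.product          = ".*",
		.uid_attribute    = "ID_WWN",
		.checker_name     = DIRECTIO,
		.retain_hwhandler = RETAIN_HWHANDLER_OFF,
	},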

Christoph Hellwig Feb. 15, 2017, 2:56 p.m. UTC | #3
On Tue, Feb 14, 2017 at 04:19:13PM -0500, Keith Busch wrote:
> > These devices are multipath capable, and have been able to stack with
> dm-mpath since kernel 4.2.

Can we make this conditional on something?  I have native NVMe
multipathing almost ready for the next merge window which is a lot easier
to use and faster than dm.  So I don't want us to be locked into this
mode just before that.

Christoph Hellwig Feb. 15, 2017, 2:57 p.m. UTC | #4
On Tue, Feb 14, 2017 at 06:00:23PM -0500, Keith Busch wrote:
> Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
> and blissfully unaware of its existence! After enabling that option,
> I see what you mean.

Someone needs to fix that crash ASAP.  I thought Hannes had patches for
them a long time ago in the SuSE tree, but they never went anywhere.

Keith Busch Feb. 15, 2017, 5:24 p.m. UTC | #5
On Wed, Feb 15, 2017 at 06:57:21AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 14, 2017 at 06:00:23PM -0500, Keith Busch wrote:
> > Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
> > and blissfully unaware of its existence! After enabling that option,
> > I see what you mean.
> 
> Someone needs to fix that crash ASAP.  I thought Hannes had patches for
> them a long time ago in the SuSE tree, but they never went anywhere.

Adding Hannes.

I can work around this by setting the parameters from multipath-tools
specific to NVMe, but that seems wrong. Blindly calling SCSI-specific
code directly from dm-mpath isn't a good idea.

Mike Snitzer Feb. 16, 2017, 1:58 a.m. UTC | #6
On Wed, Feb 15 2017 at 12:24pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Wed, Feb 15, 2017 at 06:57:21AM -0800, Christoph Hellwig wrote:
> > On Tue, Feb 14, 2017 at 06:00:23PM -0500, Keith Busch wrote:
> > > Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
> > > and blissfully unaware of its existence! After enabling that option,
> > > I see what you mean.
> > 
> > Someone needs to fix that crash ASAP.  I thought Hannes had patches for
> > them a long time ago in the SuSE tree, but they never went anywhere.
> 
> Adding Hannes.
> 
> I can work around this by setting the parameters from multipath-tools
> specific to NVMe, but that seems wrong. Blindly calling SCSI-specific
> code directly from dm-mpath isn't a good idea.

Right, not a good idea.  Any SCSI specific code in DM multipath needs to
be made conditional.  It has been on TODO for a while (with an eye
toward NVMe support) but testbeds are hard to come by.

All patches welcome.  But reports are also welcome, like any crash you
hit.  Just send me email and I'll fix it up best I can.

This work is very timely, given that NVMe over Fabrics definitely has elevated
priority.

Thanks,
Mike

Mike Snitzer Feb. 16, 2017, 2:01 a.m. UTC | #7
On Tue, Feb 14 2017 at  6:00pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Tue, Feb 14, 2017 at 01:35:45PM -0800, Bart Van Assche wrote:
> > On 02/14/2017 01:19 PM, Keith Busch wrote:
> > > These devices are multipath capable, and have been able to stack with
> > > dm-mpath since kernel 4.2.
> > > 
> > > -	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> > > +	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
> > 
> > Have you checked whether dm-mpath works properly with nvme? Last time I
> > tried that dm-mpath crashed when it called scsi_dh_attach() because that
> > last function assumes that it is passed a SCSI device queue instead of
> > checking whether or not the queue passed to that function is a SCSI
> > device queue.
> 
> Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
> and blissfully unaware of its existence! After enabling that option,
> I see what you mean.
> 
> If we don't want to mess with the kernel, I can change the multipath-tools
> to get around that by appending the following to NVMe hwentry in the
> second patch in this series:
> 
> 	.retain_hwhandler = RETAIN_HWHANDLER_OFF,
> 
> And the problem goes away.

That gives me a clue, I'll take a look.

Mike Snitzer Feb. 16, 2017, 2:35 a.m. UTC | #8
On Wed, Feb 15 2017 at  9:01pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Feb 14 2017 at  6:00pm -0500,
> Keith Busch <keith.busch@intel.com> wrote:
> 
> > On Tue, Feb 14, 2017 at 01:35:45PM -0800, Bart Van Assche wrote:
> > > On 02/14/2017 01:19 PM, Keith Busch wrote:
> > > > These devices are multipath capable, and have been able to stack with
> > > > dm-mpath since kernel 4.2.
> > > > 
> > > > -	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> > > > +	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
> > > 
> > > Have you checked whether dm-mpath works properly with nvme? Last time I
> > > tried that dm-mpath crashed when it called scsi_dh_attach() because that
> > > last function assumes that it is passed a SCSI device queue instead of
> > > checking whether or not the queue passed to that function is a SCSI
> > > device queue.
> > 
> > Good point. I was unknowingly running with CONFIG_SCSI_DH disabled,
> > and blissfully unaware of its existence! After enabling that option,
> > I see what you mean.
> > 
> > If we don't want to mess with the kernel, I can change the multipath-tools
> > to get around that by appending the following to NVMe hwentry in the
> > second patch in this series:
> > 
> > 	.retain_hwhandler = RETAIN_HWHANDLER_OFF,
> > 
> > And the problem goes away.
> 
> That gives me a clue, I'll take a look.

Looks like drivers/scsi/scsi_dh.c:get_sdev_from_queue() needs to first
check if the request_queue is from a SCSI device (rather than assuming
it is).  Not quite sure how to check that though.

By pushing the check down into scsi_dh it implies DM multipath will be
calling into scsi_dh code for other transports (NVMe, etc).

So an alternative would be to have DM multipath guard SCSI specific code
with an appropriate check.  Again, not sure how that check (e.g. some
new is_scsi_request_queue) should be implemented.

Mike
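
One way such a check could look is a small helper that only returns a
scsi_device when the queue really belongs to the SCSI midlayer (a rough
sketch of the idea, not the exact upstream fix; since scsi_mq_ops and
scsi_request_fn are internal to scsi_lib.c, a helper like this would have
to live there rather than in dm-mpath or scsi_dh):

struct scsi_device *scsi_device_from_queue(struct request_queue *q)
{
	struct scsi_device *sdev = NULL;

	/* Only trust q->queuedata if the queue's ops identify it as a
	 * SCSI midlayer queue; any other transport yields NULL. */
	if (q->mq_ops) {
		if (q->mq_ops == &scsi_mq_ops)
			sdev = q->queuedata;
	} else if (q->request_fn == scsi_request_fn) {
		sdev = q->queuedata;
	}

	return sdev;
}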

Mike Snitzer Feb. 16, 2017, 2:53 a.m. UTC | #9
On Wed, Feb 15 2017 at  9:56am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Feb 14, 2017 at 04:19:13PM -0500, Keith Busch wrote:
> > These devices are multipath capable, and have been able to stack with
> > dm-mpath since kernel 4.2.
> 
> Can we make this conditional on something?  I have native NVMe
> multipathing almost ready for the next merge window which is a lot easier
> to use and faster than dm.  So I don't want us to be locked into this
> mode just before that.

You've avoided discussing this on any level (and I guess you aren't
going to LSF/MM?).  Yet you're expecting to just drop it into the tree
without a care in the world about the implications.

Nobody has interest in Linux multipathing becoming fragmented.

If every transport implemented their own multipathing the end-user would
be amazingly screwed trying to keep track of all the
quirks/configuration/management of each.

Not saying multipath-tools is great, nor that DM multipath is god's
gift.  But substantiating _why_ you need this "native NVMe
multipathing" would go a really long way to justifying your effort.

For starters, how about you show just how much better than DM multipath
this native NVMe multipathing performs?  NOTE: it'd imply you put effort
into making DM multipath work with NVMe.. if you've sat on that code too
that'd be amazingly unfortunate/frustrating.

Bart Van Assche Feb. 16, 2017, 5 a.m. UTC | #10
On 02/15/17 18:53, Mike Snitzer wrote:
> Nobody has interest in Linux multipathing becoming fragmented.
>
> If every transport implemented their own multipathing the end-user would
> be amazingly screwed trying to keep track of all the
> quirks/configuration/management of each.
>
> Not saying multipath-tools is great, nor that DM multipath is god's
> gift.  But substantiating _why_ you need this "native NVMe
> multipathing" would go a really long way to justifying your effort.
>
> For starters, how about you show just how much better than DM multipath
> this native NVMe multipathing performs?  NOTE: it'd imply you put effort
> into making DM multipath work with NVMe.. if you've sat on that code too
> that'd be amazingly unfortunate/frustrating.

Another question is what your attitude is towards dm-mpath changes? Last 
time I posted a series of patches that significantly clean up and
improve readability of the dm-mpath code you refused to take these upstream.

Bart.

Mike Snitzer Feb. 16, 2017, 12:37 p.m. UTC | #11
On Thu, Feb 16 2017 at 12:00am -0500,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 02/15/17 18:53, Mike Snitzer wrote:
> >Nobody has interest in Linux multipathing becoming fragmented.
> >
> >If every transport implemented their own multipathing the end-user would
> >be amazingly screwed trying to keep track of all the
> >quirks/configuration/management of each.
> >
> >Not saying multipath-tools is great, nor that DM multipath is god's
> >gift.  But substantiating _why_ you need this "native NVMe
> >multipathing" would go a really long way to justifying your effort.
> >
> >For starters, how about you show just how much better than DM multipath
> >this native NVMe multipathing performs?  NOTE: it'd imply you put effort
> >into making DM multipath work with NVMe.. if you've sat on that code too
> >that'd be amazingly unfortunate/frustrating.
> 
> Another question is what your attitude is towards dm-mpath changes?
> Last time I posted a series of patches that significantly clean up
> and improve readability of the dm-mpath code you refused to take these upstream.

Weird.  I did push back on those changes initially (just felt like
churn) but I ultimately did take them:

$ git log --oneline --author=bart drivers/md/dm-mpath.c
6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
4813577 dm mpath: change return type of pg_init_all_paths() from int to void
9f4c3f8 dm: convert wait loops to use autoremove_wake_function()

Did I miss any?

But to be 100% clear, I'm very appreciative of any DM mpath (and
request-based DM core) changes.  I'll review them with a critical eye
but if they hold up they get included.

Mike

Christoph Hellwig Feb. 16, 2017, 2:26 p.m. UTC | #12
On Wed, Feb 15, 2017 at 09:53:57PM -0500, Mike Snitzer wrote:
> going to LSF/MM?).  Yet you're expecting to just drop it into the tree
> without a care in the world about the implications.

I am planning to post it for comments, and then plan to "drop it in the
tree" exactly because I think of the implications.

Keith did that.  But once we already do the discovery of the path
relations in the transport (e.g scsi_dh) we can just move the path
selectors (for which I'm reusing the DM code anyway btw) and the
bouncing of I/O to the block layer and cut out the middle man.

The main reason is that things will just work (TM) instead of having
to carry around additional userspace to configure an unneeded
additional device layer that just causes confusion.  Beyond the
scsi_dh equivalent there is basically no new code in nvme, just a little
new code in the block layer, and a move of the path selectors from dm
to the block layer.  I would not call this fragmentation.

Anyway, there is very little point in having an abstract discussion
here, I'll try to get the code ready ASAP, although until the end of
next week I'm really pressed with other deadlines.

Mike Snitzer Feb. 16, 2017, 3:13 p.m. UTC | #13
On Thu, Feb 16 2017 at  9:26am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Feb 15, 2017 at 09:53:57PM -0500, Mike Snitzer wrote:
> > going to LSF/MM?).  Yet you're expecting to just drop it into the tree
> > without a care in the world about the implications.
> 
> I am planning to post it for comments, and then plan to "drop it in the
> tree" exactly because I think of the implications.
> 
> Keith did that

Not following what you're saying Keith did.  Please feel free to
clarify.

But we definitely need to devise a way for NVMe to inform DM multipath
(and vice-versa): hands off this device.  Awkward details to work
through to be sure...

> But once we already do the discovery of the path
> relations in the transport (e.g scsi_dh) we can just move the path
> selectors (for which I'm reusing the DM code anyway btw) and the
> bouncing of I/O to the block layer and cut out the middle man.

The middle man is useful if it can support all transports.  If it only
supports some then yeah the utility is certainly reduced.

> The main reason is that things will just work (TM) instead of having
> to carry around additional userspace to configure an unneeded
> additional device layer that just causes confusion.  Beyond the
> scsi_dh equivalent there is basically no new code in nvme, 

I'm going to look at removing any scsi_dh code from DM multipath
(someone already proposed removing the 'retain_attached_hw_handler'
feature).  Not much point having anything in DM multipath now that scsi
discovery has the ability to auto-attach the right scsi_dh via scsi_dh's
.match hook.  As a side-effect it will fix Keith's scsi_dh crash (when
operating on NVMe request_queue).

My hope is that your NVMe equivalent for scsi_dh will "just work" (TM)
like scsi_dh auto-attach does.  There isn't a finished ALUA equivalent
standard for NVMe so I'd imagine at this point you have a single device
handler for NVMe to do error translation?

Anyway, the scsi_dh equivalent for NVMe is welcomed news!

> just a little new code in the block layer, and a move of the path
> selectors from dm to the block layer.  I would not call this
> fragmentation.

I'm fine with the path selectors getting moved out; maybe it'll
encourage new path selectors to be developed.

But there will need to be some userspace interface stood up to support
your native NVMe multipathing (you may not think it needed, but I think in
time there will be a need to configure _something_).  That is the
fragmentation I'm referring to.

> Anyway, there is very little point in having an abstract discussion
> here, I'll try to get the code ready ASAP, although until the end of
> next week I'm really pressed with other deadlines.

OK.

FYI, I never wanted to have an abstract discussion.  We need a real nuts
and bolts discussion.  Happy to have it play out on the lists. 

I'm not violently opposed to your native NVMe multipathing -- especially
from a reference implementation point of view.  I think that in practice
it'll keep DM multipath honest -- help drive scalability improvements, etc.
If over time the native NVMe multipathing _is_ the preferred multipathing
solution for NVMe, then so be it.  It'll be on its merits.. as it should be.

But I'm sure you're well aware that I and Red Hat and our partners have
a vested interest in providing a single multipath stack that "just
works" for all appropriate storage.

Bart Van Assche Feb. 16, 2017, 5:37 p.m. UTC | #14
On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> Maybe I'm not seeing the bigger picture. Is there some part to multipath
> that the kernel is not in a better position to handle?

Does this mean that the code to parse /etc/multipath.conf will be moved into
the kernel? Or will we lose the ability to configure the policies that
/etc/multipath.conf allows to configure?

Bart.

Keith Busch Feb. 16, 2017, 5:38 p.m. UTC | #15
On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> On Thu, Feb 16 2017 at  9:26am -0500,
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > just a little new code in the block layer, and a move of the path
> > selectors from dm to the block layer.  I would not call this
> > fragmentation.
> 
> I'm fine with the path selectors getting moved out; maybe it'll
> encourage new path selectors to be developed.
> 
> But there will need to be some userspace interface stood up to support
> your native NVMe multipathing (you may not think it needed, but I think in
> time there will be a need to configure _something_).  That is the
> fragmentation I'm referring to.

I'm not sure what Christoph's proposal looks like, but I have to agree
that multipath support directly in the kernel without requiring user
space to set up the mpath block device is easier for everyone. The only
NVMe specific part, though, just needs to be how it reports unique
identifiers to the multipath layer.

Maybe I'm not seeing the bigger picture. Is there some part to multipath
that the kernel is not in a better position to handle?

Sagi Grimberg Feb. 16, 2017, 6:05 p.m. UTC | #16
> I'm fine with the path selectors getting moved out; maybe it'll
> encourage new path selectors to be developed.
>
> But there will need to be some userspace interface stood up to support
> your native NVMe multipathing (you may not think it needed, but I think in
> time there will be a need to configure _something_).  That is the
> fragmentation I'm referring to.

I guess one config option that we'd need is multibus vs. failover
which are used per use-case.

I'd have to say that having something much simpler than multipath-tools
does sound appealing.

Keith Busch Feb. 16, 2017, 6:07 p.m. UTC | #17
On Thu, Feb 16, 2017 at 05:37:41PM +0000, Bart Van Assche wrote:
> On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> > Maybe I'm not seeing the bigger picture. Is there some part to multipath
> > that the kernel is not in a better position to handle?
> 
> Does this mean that the code to parse /etc/multipath.conf will be moved into
> the kernel? Or will we lose the ability to configure the policies that
> /etc/multipath.conf allows to configure?

No, I'm just considering the settings for a device that won't work
at all if multipath.conf is wrong. For example, the uuid attributes,
path priority, or path checker. These can't be considered configurable
policies if all but one of them are invalid for a specific device type.

It shouldn't even be an option to let a user select TUR path checker
for NVMe, and the only checkers multipath-tools provide that even make
sense for NVMe are deprecated.

Mike Snitzer Feb. 16, 2017, 6:21 p.m. UTC | #18
On Thu, Feb 16 2017 at  1:07pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> On Thu, Feb 16, 2017 at 05:37:41PM +0000, Bart Van Assche wrote:
> > On Thu, 2017-02-16 at 12:38 -0500, Keith Busch wrote:
> > > Maybe I'm not seeing the bigger picture. Is there some part to multipath
> > > that the kernel is not in a better position to handle?
> > 
> > Does this mean that the code to parse /etc/multipath.conf will be moved into
> > the kernel? Or will we lose the ability to configure the policies that
> > /etc/multipath.conf allows to configure?
> 
> No, I'm just considering the settings for a device that won't work
> at all if multipath.conf is wrong. For example, the uuid attributes,
> path priority, or path checker. These can't be considered configurable
> policies if all but one of them are invalid for a specific device type.
> 
> It shouldn't even be an option to let a user select TUR path checker
> for NVMe, and the only checkers multipath-tools provide that even make
> sense for NVMe are deprecated.

Then undeprecate them.  Decisions like marking a path checker deprecated
were _not_ made with NVMe in mind.  They must predate NVMe.

multipath-tools has tables that specify all the defaults for a given
target backend.  NVMe will just be yet another.  Yes some user _could_
shoot themselves in the foot by overriding the proper configuration but
since when are we motivated by _not_ giving users the power to hang
themselves?

As for configurability (choosing between N valid configs/settings): At
some point the user will want one behaviour vs another.  Thinking
otherwise is just naive.  Think error timeouts, etc.  Any multipath
kernel implementation (which dm-multipath is BTW) will eventually find
itself at a crossroads where the underlying fabric could be tweaked in
different ways.  Thinking you can just hardcode these attributes and
settings is foolish.
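
To make the configurability point concrete, multipath-tools already lets a
user override per-device-type defaults from multipath.conf, along these
lines (the "NVME" vendor string and the specific values below are
illustrative assumptions, not recommended settings):

devices {
	device {
		vendor        "NVME"
		product       ".*"
		path_checker  directio
		no_path_retry queue
	}
}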

Bart Van Assche Feb. 16, 2017, 7:46 p.m. UTC | #19
On Thu, 2017-02-16 at 07:37 -0500, Mike Snitzer wrote:
> Weird.  I did push back on those changes initially (just felt like
> churn) but I ultimately did take them:
> 
> $ git log --oneline --author=bart drivers/md/dm-mpath.c
> 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
> 4813577 dm mpath: change return type of pg_init_all_paths() from int to void
> 9f4c3f8 dm: convert wait loops to use autoremove_wake_function()
> 
> Did I miss any?

Hello Mike,

Thank you for having accepted these patches. However, I was not referring
to these patches but to the eight patches available at
https://github.com/bvanassche/linux/tree/dm-mpath

Bart.

Mike Snitzer Feb. 16, 2017, 8:23 p.m. UTC | #20
On Thu, Feb 16 2017 at  2:46pm -0500,
Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:

> On Thu, 2017-02-16 at 07:37 -0500, Mike Snitzer wrote:
> > Weird.  I did push back on those changes initially (just felt like
> > churn) but I ultimately did take them:
> > 
> > $ git log --oneline --author=bart drivers/md/dm-mpath.c
> > 6599c84 dm mpath: do not modify *__clone if blk_mq_alloc_request() fails
> > 4813577 dm mpath: change return type of pg_init_all_paths() from int to void
> > 9f4c3f8 dm: convert wait loops to use autoremove_wake_function()
> > 
> > Did I miss any?
> 
> Hello Mike,
> 
> Thank you for having accepted these patches. However, I was not referring
> to these patches but to the eight patches available at
> https://github.com/bvanassche/linux/tree/dm-mpath

Some of these look familiar (the dm-mpath micro-optimize one made me
cringe.. potential for regression, etc.  But careful review should ease
those concerns).  But yeah, these certainly slipped through the cracks.
I'll review these for 4.12 (too late for 4.11, sorry).

Please rebase these ontop of linux-block.git's for-4.11/rq-refactor
because Christoph significantly reworked the request-based DM code (all
request allocation is done in DM multipath now), see:
http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.11/rq-refactor&id=eb8db831be80692bf4bda3dfc55001daf64ec299

Or you can wait to rebase on v4.11-rc1 in ~2 weeks.

Thanks

Keith Busch Feb. 16, 2017, 8:40 p.m. UTC | #21
On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> Then undeprecate them.  Decisions like marking a path checker deprecated
> were _not_ made with NVMe in mind.  They must predate NVMe.
> 
> multipath-tools has tables that specify all the defaults for a given
> target backend.  NVMe will just be yet another.  Yes some user _could_
> shoot themselves in the foot by overriding the proper configuration but
> since when are we motivated by _not_ giving users the power to hang
> themselves?
> 
> As for configurability (choosing between N valid configs/settings): At
> some point the user will want one behaviour vs another.  Thinking
> otherwise is just naive.  Think error timeouts, etc.  Any multipath
> kernel implementation (which dm-multipath is BTW) will eventually find
> itself at a crossroads where the underlying fabric could be tweaked in
> different ways.  Thinking you can just hardcode these attributes and
> settings is foolish.

Roger that, and I absolutely want to see this work with the existing
framework.

I just think it'd be easier for everyone if multipath were more like
the generic block layer, in that devices are surfaced with configurable
policies without userspace telling it which to use. The kernel knowing
safe defaults for a particular device is probably the more common case,
and userspace can still tune them as needed. Of course, I accept you're
in a better position to know if this is folly.

Bart Van Assche Feb. 16, 2017, 8:58 p.m. UTC | #22
On Thu, 2017-02-16 at 15:23 -0500, Mike Snitzer wrote:
> Or you can wait to rebase on v4.11-rc1 in ~2 weeks.

Hello Mike,

I will wait until v4.11-rc1 has been released.

Thanks,

Bart.

Christoph Hellwig Feb. 17, 2017, 9:04 a.m. UTC | #23
On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> multipath-tools has tables that specify all the defaults for a given
> target backend.  NVMe will just be yet another.

No, if we get things right it won't.  ALUA already got rid of most
of the parameter people would have to set under normal conditions,
and I plan to make sure the NVMe equivalent will do it for all
parameters.  I am active in the NVMe working group and will do my
best to get there.  There are a few other folks here that are more or
less active there as well (Keith, Martin, Jens for example), so I
think we have a chance.

That being said, Keith is right that we'll always have odd setups
where we need to override things, and we will have to provide tunables
for that.  It's no different from any other kernel subsystem in that.

Christoph Hellwig Feb. 17, 2017, 9:05 a.m. UTC | #24
On Thu, Feb 16, 2017 at 08:05:36PM +0200, Sagi Grimberg wrote:
> I guess one config option that we'd need is multibus vs. failover
> which are used per use-case.

Which fundamentally is a property of the target first, and it should
tell us that.  There might be the occasional need for an override,
but certainly not locally maintained tables of targets and their
preferences.

Christoph Hellwig Feb. 17, 2017, 9:33 a.m. UTC | #25
On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> Not following what you're saying Keith did.  Please feel free to
> clarify.

Keith demonstrated what it takes to support NVMe with dm.  He also
gave a couple of presentations on it, in addition to various patches on
the list.

> The middle man is useful if it can support all transports.  If it only
> supports some then yeah the utility is certainly reduced.

Again let's look at what multipathing involves:

 - discovery of multiple paths for a device, and path preferences:
   Storage protocol specific

 - handling of path state and grouping changes:
   Storage protocol specific

 - handling of path up / down events:
   Storage protocol / transport specific if provided

 - keep alive / path checking:
   Storage protocol specific with possible generic fallback

 - path selection:
   Generic, although building heavily on protocol / transport specific
   information

So most of the hard work is transport specific anyway.  And I fully
agree that generic code should be, well, generic.  And with generic
I mean right in the block layer instead of involving a layered block
driver that relies on lots of low-level driver information and setup
from user space.
	
> I'm going to look at removing any scsi_dh code from DM multipath
> (someone already proposed removing the 'retain_attached_hw_handler'
> feature).  Not much point having anything in DM multipath now that scsi
> discovery has the ability to auto-attach the right scsi_dh via scsi_dh's
> .match hook.

Great.

> As a side-effect it will fix Keith's scsi_dh crash (when
> operating on NVMe request_queue).

I think we'll need to have a quick fix for that ASAP, though.

> My hope is that your NVMe equivalent for scsi_dh will "just work" (TM)
> like scsi_dh auto-attach does.  There isn't a finished ALUA equivalent
> standard for NVMe so I'd imagine at this point you have a single device
> handler for NVMe to do error translation?

Yes, error translation for the block layer, but most importantly
discovery of multiple paths to the same namespace.

Mike Snitzer Feb. 17, 2017, 2:32 p.m. UTC | #26
On Fri, Feb 17 2017 at  4:33am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Feb 16, 2017 at 10:13:37AM -0500, Mike Snitzer wrote:
> > Not following what you're saying Keith did.  Please feel free to
> > clarify.
> 
> Keith demonstrated what it takes to support NVMe with dm.  He also
> gave a couple of presentations on it, in addition to various patches on
> the list.

Well I know very well that Keith was an early consumer of DM multipath
for NVMe (back when Intel was looking at doing dual port NVMe drives).
Keith's interest in NVMe support in multipath is what motivated all the
blk-mq request-based DM multipath advances.

But I'll follow up with Keith to see if he can sweep through whatever
patches he has and make sure I'm aware of them.

> > The middle man is useful if it can support all transports.  If it only
> > supports some then yeah the utility is certainly reduced.
> 
> Again let's look at what multipathing involves:
> 
>  - discovery of multiple paths for a device, and path preferences:
>    Storage protocol specific
> 
>  - handling of path state and grouping changes:
>    Storage protocol specific
> 
>  - handling of path up / down events:
>    Storage protocol / transport specific if provided
> 
>  - keep alive / path checking:
>    Storage protocol specific with possible generic fallback
> 
>  - path selection:
>    Generic, although building heavily on protocol / transport specific
>    information
> 
> So most of the hard work is transport specific anyway.  And I fully
> agree that generic code should be, well, generic.  And with generic
> I mean right in the block layer instead of involving a layered block
> driver that relies on lots of low-level driver information and setup
> from user space.

"Lots of low-level driver information?"  You mean like SCSI's pg_init
and such?

> > I'm going to look at removing any scsi_dh code from DM multipath
> > (someone already proposed removing the 'retain_attached_hw_handler'
> > feature).  Not much point having anything in DM multipath now that scsi
> > discovery has the ability to auto-attach the right scsi_dh via scsi_dh's
> > .match hook.
> 
> Great.
> 
> > As a side-effect it will fix Keith's scsi_dh crash (when
> > operating on NVMe request_queue).
> 
> I think we'll need to have a quick fix for that ASAP, though.

Hannes' latest patch looked good to me.  Should take care of it.  I
assume mkp will pick it up for the 4.11 merge window and mark for
stable?

Mike Snitzer Feb. 17, 2017, 2:37 p.m. UTC | #27
On Fri, Feb 17 2017 at  4:05am -0500,
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Feb 16, 2017 at 08:05:36PM +0200, Sagi Grimberg wrote:
> > I guess one config option that we'd need is multibus vs. failover
> > which are used per use-case.
> 
> Which fundamentally is a property of the target first, and it should
> tell us that.  There might be the occasional need for an override,
> but certainly not locally maintained tables of targets and their
> preferences.

Think you're being idealistic to believe that once NVMe is on the roadmap,
and an engineering priority, of every player in the industry, your
more-or-less pristine implementation isn't going to get stretched to its
limits and ultimately need a fair amount of vendor-specific tweaks.

But hopefully I'm wrong.  I just know what has happened with SCSI/FC/IB
multipathing vendor hardware: the jockeying that all these vendors do
ultimately results in the need for quirks, because they raced to market
and ultimately missed something, or invented their own ways forward to
allow NVMe to dovetail into their preexisting legacy "enterprise"
offerings.

Mike Snitzer Feb. 17, 2017, 2:43 p.m. UTC | #28
On Fri, Feb 17 2017 at  4:04am -0500,
hch@infradead.org <hch@infradead.org> wrote:

> On Thu, Feb 16, 2017 at 01:21:29PM -0500, Mike Snitzer wrote:
> > multipath-tools has tables that specify all the defaults for a given
> > target backend.  NVMe will just be yet another.
> 
> No, if we get things right it won't.  ALUA already got rid of most
> of the parameters people would have to set under normal conditions,
> and I plan to make sure the NVMe equivalent will do it for all
> parameters.  I am active in the NVMe working group and will do my
> best to get there.  There's a few others folks here that are more or
> less active there as well (Keith, Martin, Jens for example), so I
> think we have a chance.
> 
> That being said, Keith is right that we'll always have odd setups
> where we need to override things, and we will have to provide tunables
> for that.  It's no different from any other kernel subsystem in that.

Before ALUA fixed all that vendor specific fragmentation there was the
even worse fragmentation where different vendors pushed multipathing
into their FC drivers.  James correctly pushed them toward a generic
solution (and DM multipath was born).  If every transport implements its
own multipathing then it'll be a more generic, yet very similar,
fragmentation problem.

But if your native NVMe multipathing really is factored such that the
actual IO fast path is implemented in block core, and transport specific
hooks are called out to as needed, then you've simply reimplemented DM
multipath in block core.

Pretty weird place to invest _so_ much energy before you've fully
explored how unworkable DM multipath support for NVMe is.  But I
digress.

Mike Snitzer Feb. 20, 2017, 2:14 p.m. UTC | #29
On Tue, Feb 14 2017 at  4:19pm -0500,
Keith Busch <keith.busch@intel.com> wrote:

> These devices are multipath capable, and have been able to stack with
> dm-mpath since kernel 4.2.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  libmultipath/blacklist.c   | 2 +-
>  multipath/multipath.conf.5 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
> index 36af282..ee396e2 100644
> --- a/libmultipath/blacklist.c
> +++ b/libmultipath/blacklist.c
> @@ -169,7 +169,7 @@ setup_default_blist (struct config * conf)
>  	char * str;
>  	int i;
>  
> -	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> +	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
>  	if (!str)
>  		return 1;
>  	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
> index 7de8bc7..d6c6c52 100644
> --- a/multipath/multipath.conf.5
> +++ b/multipath/multipath.conf.5
> @@ -943,7 +943,7 @@ The following keywords are recognized:
>  Regular expression of the device nodes to be excluded.
>  .RS
>  .TP
> -The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
> +The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
>  .RE
>  .TP
>  .B wwid

Christophe,

Please take this.  The original commit 5c412e47e589 ("dm-multipath:
blacklist NVMe devices") was completely unjustified.  Not sure how in
2014 this person had any basis for pushing that change.

Thanks,
Mike

Benjamin Marzinski Feb. 20, 2017, 6:17 p.m. UTC | #30
On Wed, Feb 15, 2017 at 06:56:17AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 14, 2017 at 04:19:13PM -0500, Keith Busch wrote:
> > These devices are multipath capable, and have been able to stack with
> > dm-mpath since kernel 4.2.
> 
> Can we make this conditional on something?  I have native NVMe
> multipathing almost ready for the next merge window which is a lot easier
> to use and faster than dm.  So I don't want us to be locked into this
> mode just before that.

It seems to me that it already is conditional. Keith just made
multipath-tools default to using NVMe devices. If you don't want it to,
you can add them back to the blacklist.

blacklist {
	devnode "^nvme[0-9]"
}

Just like you could turn it on before by overriding the blacklist
(except that it wouldn't work without the device handler fix and the
rest of Keith's code to set up NVMe devices correctly, which I trust is
unobjectionable).

-Ben

Christophe Varoqui Feb. 27, 2017, 5:37 a.m. UTC | #31
Applied.

On Tue, Feb 14, 2017 at 10:19 PM, Keith Busch <keith.busch@intel.com> wrote:

> These devices are multipath capable, and have been able to stack with
> dm-mpath since kernel 4.2.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  libmultipath/blacklist.c   | 2 +-
>  multipath/multipath.conf.5 | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
> index 36af282..ee396e2 100644
> --- a/libmultipath/blacklist.c
> +++ b/libmultipath/blacklist.c
> @@ -169,7 +169,7 @@ setup_default_blist (struct config * conf)
>         char * str;
>         int i;
>
> -       str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
> +       str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
>         if (!str)
>                 return 1;
>         if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
> diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
> index 7de8bc7..d6c6c52 100644
> --- a/multipath/multipath.conf.5
> +++ b/multipath/multipath.conf.5
> @@ -943,7 +943,7 @@ The following keywords are recognized:
>  Regular expression of the device nodes to be excluded.
>  .RS
>  .TP
> -The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
> +The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
>  .RE
>  .TP
>  .B wwid
> --
> 2.5.5
>
>

Patch

diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
index 36af282..ee396e2 100644
--- a/libmultipath/blacklist.c
+++ b/libmultipath/blacklist.c
@@ -169,7 +169,7 @@  setup_default_blist (struct config * conf)
 	char * str;
 	int i;
 
-	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]");
+	str = STRDUP("^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]");
 	if (!str)
 		return 1;
 	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 7de8bc7..d6c6c52 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -943,7 +943,7 @@  The following keywords are recognized:
 Regular expression of the device nodes to be excluded.
 .RS
 .TP
-The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk|nvme)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
+The default is: \fB^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]\fR and \fB^(td|hd|vd)[a-z]\fR
 .RE
 .TP
 .B wwid