
NVMeoF multi-path setup

Message ID 1467323858.15863.3.camel@ssi (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui

Commit Message

Ming Lin June 30, 2016, 9:57 p.m. UTC
On Thu, 2016-06-30 at 14:08 -0700, Ming Lin wrote:
> Hi Mike,
> 
> I'm trying to test NVMeoF multi-path.
> 
> root@host:~# lsmod |grep dm_multipath
> dm_multipath           24576  0
> root@host:~# ps aux |grep multipath
> root     13183  0.0  0.1 238452  4972 ?        SLl  13:41   0:00 /sbin/multipathd
> 
> I have nvme0 and nvme1 that are 2 paths to the same NVMe subsystem.
> 
> root@host:/sys/class/nvme# grep . nvme*/address
> nvme0/address:traddr=192.168.3.2,trsvcid=1023
> nvme1/address:traddr=192.168.2.2,trsvcid=1023
> 
> root@host:/sys/class/nvme# grep . nvme*/subsysnqn
> nvme0/subsysnqn:nqn.testiqn
> nvme1/subsysnqn:nqn.testiqn
> 
> root@host:~# /lib/udev/scsi_id --export --whitelisted -d /dev/nvme1n1
> ID_SCSI=1
> ID_VENDOR=NVMe
> ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
> ID_MODEL=Linux
> ID_MODEL_ENC=Linux
> ID_REVISION=0-rc
> ID_TYPE=disk
> ID_SERIAL=SNVMe_Linux
> ID_SERIAL_SHORT=
> ID_SCSI_SERIAL=1122334455667788
> 
> root@host:~# /lib/udev/scsi_id --export --whitelisted -d /dev/nvme0n1
> ID_SCSI=1
> ID_VENDOR=NVMe
> ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
> ID_MODEL=Linux
> ID_MODEL_ENC=Linux
> ID_REVISION=0-rc
> ID_TYPE=disk
> ID_SERIAL=SNVMe_Linux
> ID_SERIAL_SHORT=
> ID_SCSI_SERIAL=1122334455667788
> 
> But it seems multipathd didn't recognize these two devices.
> 
> What else am I missing?
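The sysfs output above is what identifies nvme0 and nvme1 as two paths to one subsystem: the subsysnqn values match while the transport addresses differ. A minimal sketch of that comparison, assuming the attribute contents have already been read into strings (sysfs values normally end in a newline); the helper names are hypothetical and not part of nvme-cli or multipath-tools.

```c
#include <stddef.h>
#include <string.h>

/* Length of s without trailing newline/space (sysfs values end in "\n"). */
size_t trimmed_len(const char *s)
{
	size_t n = strlen(s);

	while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == ' '))
		n--;
	return n;
}

/* Compare two attribute values, ignoring trailing whitespace. */
int attr_equal(const char *a, const char *b)
{
	size_t na = trimmed_len(a), nb = trimmed_len(b);

	return na == nb && strncmp(a, b, na) == 0;
}

/*
 * Hypothetical check: two controllers look like alternate paths to the
 * same subsystem when subsysnqn matches but the address differs.
 */
int is_alternate_path(const char *nqn_a, const char *addr_a,
		      const char *nqn_b, const char *addr_b)
{
	return attr_equal(nqn_a, nqn_b) && !attr_equal(addr_a, addr_b);
}
```

In practice the strings would come from /sys/class/nvme/<ctrl>/subsysnqn and /sys/class/nvme/<ctrl>/address, as in the grep output above.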

There are two problems:

1. there is no "/block/" in the path

/sys/devices/virtual/nvme-fabrics/block/nvme0/nvme0n1

2. nvme was blacklisted.

I added the quick hack below to make it work.

root@host:~# cat /proc/partitions

 259        0  937692504 nvme0n1
 252        0  937692504 dm-0
 259        1  937692504 nvme1n1
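With the hack applied, /proc/partitions shows dm-0 reporting the same size (937692504 blocks of 1 KiB) as each NVMe namespace. A minimal sketch of parsing one such row with sscanf, assuming the usual "major minor #blocks name" column layout; parse_partition_line() and blocks_to_sectors() are hypothetical helpers. The sector conversion is included because device-mapper tables are expressed in 512-byte sectors.

```c
#include <stdio.h>

/* One row of /proc/partitions: major, minor, size in 1 KiB blocks, name. */
struct partition_row {
	unsigned int major;
	unsigned int minor;
	unsigned long long blocks;
	char name[32];
};

/* Returns 1 if line was parsed into *row, 0 otherwise (header or blank). */
int parse_partition_line(const char *line, struct partition_row *row)
{
	return sscanf(line, "%u %u %llu %31s", &row->major, &row->minor,
		      &row->blocks, row->name) == 4;
}

/* Convert 1 KiB blocks to 512-byte sectors. */
unsigned long long blocks_to_sectors(unsigned long long blocks)
{
	return blocks * 2;
}
```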



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Ming Lin June 30, 2016, 10:19 p.m. UTC | #1
On Thu, Jun 30, 2016 at 2:57 PM, Ming Lin <mlin@kernel.org> wrote:
>
> There are two problems:
>
> 1. there is no "/block/" in the path
>
> /sys/devices/virtual/nvme-fabrics/block/nvme0/nvme0n1

Typo; the path is:
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1
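The first problem can be illustrated with a simplified uevent filter. This is a hypothetical sketch, not the actual multipathd uev_discard() body: it assumes the filter keeps only events whose devpath (relative to /sys, as in uevent DEVPATH values) contains a "/block/<disk>" component, and drops partitions. Under that assumption the fabrics namespace path above is discarded before multipathd looks at it, which is consistent with the device appearing once the check was compiled out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a uevent filter such as uev_discard().
 * Assumption: keep only "/block/<disk>" devpaths, drop partitions.
 * Returns 1 to discard the event, 0 to keep it.
 */
int sketch_uev_discard(const char *devpath)
{
	char disk[11], part[11];
	const char *tmp = strstr(devpath, "/block/");

	if (tmp == NULL)
		return 1;	/* no "/block/" component: discarded */
	if (sscanf(tmp, "/block/%10s", disk) != 1)
		return 1;	/* nothing after "/block/" */
	if (sscanf(tmp, "/block/%10[^/]/%10s", disk, part) == 2)
		return 1;	/* "/block/<disk>/<part>": a partition */
	return 0;
}
```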

>
> 2. nvme was blacklisted.

Mike Snitzer June 30, 2016, 10:52 p.m. UTC | #2
On Thu, Jun 30 2016 at  5:57pm -0400,
Ming Lin <mlin@kernel.org> wrote:

> There are two problems:
> 
> 1. there is no "/block/" in the path
> 
> /sys/devices/virtual/nvme-fabrics/block/nvme0/nvme0n1

You clarified that it is:
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1

Do you have CONFIG_BLK_DEV_NVME_SCSI enabled?

AFAIK, hch had Intel disable that by default in the hopes of avoiding
people having dm-multipath "just work" with NVMeoF.  (Makes me wonder
what other unpleasant unilateral decisions were made because some
non-existent NVMe specific multipath capabilities would be forthcoming
but I digress).

My understanding is that enabling CONFIG_BLK_DEV_NVME_SCSI will cause
NVMe to respond favorably to standard SCSI VPD inquiries.

And _yes_, Red Hat will be enabling it so users have options!

Also, just so you're aware, I've staged bio-based dm-multipath support
for the 4.8 merge window.  Please see either the 'for-next' or 'dm-4.8'
branch in linux-dm.git:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

I'd welcome you testing if bio-based dm-multipath performs better for
you than blk-mq request-based dm-multipath.  Both modes (using the 4.8
staged code) can be easily selected on a per DM multipath device table
by adding either: queue_mode=bio or queue_mode=mq

(made possible with this commit:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=e83068a5faafb8ca65d3b58bd1e1e3959ce1ddce
)

> 2. nvme was blacklisted.
> 
> I added below quick hack to just make it work.
> 
> root@host:~# cat /proc/partitions
> 
>  259        0  937692504 nvme0n1
>  252        0  937692504 dm-0
>  259        1  937692504 nvme1n1
> 
> diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
> index 2400eda..a143383 100644
> --- a/libmultipath/blacklist.c
> +++ b/libmultipath/blacklist.c
> @@ -190,9 +190,11 @@ setup_default_blist (struct config * conf)
>  	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
>  		return 1;
>  
> +#if 0
>  	str = STRDUP("^nvme.*");
>  	if (!str)
>  		return 1;
> +#endif
>  	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
>  		return 1;

That's weird; I'm not sure why that'd be the case. Maybe because NVMeoF
hasn't been worked through to "just work" with multipath-tools
yet... Ben? Hannes?
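The default blacklist entry the hack disables is the pattern "^nvme.*", which matches every NVMe namespace name. A minimal sketch with POSIX regcomp()/regexec(), assuming devnode blacklist entries are matched as extended regular expressions against the kernel device name; devnode_matches() is a hypothetical helper, not a multipath-tools function.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if devnode matches the extended regex pattern, 0 if it
 * does not, and -1 if the pattern fails to compile.
 */
int devnode_matches(const char *pattern, const char *devnode)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, devnode, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Under this assumption both nvme0n1 and nvme1n1 are caught by the default entry, while dm-0 is not.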

> diff --git a/multipathd/main.c b/multipathd/main.c
> index c0ca571..1364070 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1012,6 +1012,7 @@ uxsock_trigger (char * str, char ** reply, int * len, void * trigger_data)
>  static int
>  uev_discard(char * devpath)
>  {
> +#if 0
>  	char *tmp;
>  	char a[11], b[11];
>  
> @@ -1028,6 +1029,7 @@ uev_discard(char * devpath)
>  		condlog(4, "discard event on %s", devpath);
>  		return 1;
>  	}
> +#endif
>  	return 0;
>  }

Why did you have to comment out this discard code?

Mike Snitzer June 30, 2016, 10:57 p.m. UTC | #3
On Thu, Jun 30 2016 at  6:52pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> Also, just so you're aware, I've staged bio-based dm-multipath support
> for the 4.8 merge window.  Please see either the 'for-next' or 'dm-4.8'
> branch in linux-dm.git:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
> 
> I'd welcome you testing if bio-based dm-multipath performs better for
> you than blk-mq request-based dm-multipath.  Both modes (using the 4.8
> staged code) can be easily selected on a per DM multipath device table
> by adding either: queue_mode=bio or queue_mode=mq
> 
> (made possible with this commit:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=e83068a5faafb8ca65d3b58bd1e1e3959ce1ddce
> )

Sorry, no "=" should be used. You need either:
"queue_mode bio" or "queue_mode mq"

These are added to the features section of the "multipath" ctr input.
AFAIK, once the above commit lands upstream Ben will be adding some
multipath-tools code to make configuring queue_mode easy (but I think
multipath.conf may allow you to extend the features passed on a
per-device basis already.. but I'd have to look).
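As a concrete illustration of the correction: "queue_mode" and its value are two words in the feature list, so the feature count is 2. A minimal sketch that formats such a table line, assuming a single round-robin path group with no selector arguments, a repeat_count of 1 per path, the 259:0 and 259:1 device numbers from the /proc/partitions output earlier in the thread, and a length of 937692504 * 2 = 1875385008 sectors. build_mpath_table() is hypothetical, and the layout should be checked against the kernel's dm-multipath documentation before use.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assumed layout:
 *   <start> <len> multipath <#features> queue_mode <mode>
 *   <#hw_handler args> <#groups> <initial group>
 *   <selector> <#selector args> <#paths> <#path args>
 *   <dev> <repeat_count> ...
 * Returns 0 on success, -1 on a bad mode or truncation.
 */
int build_mpath_table(char *buf, size_t len, unsigned long long sectors,
		      const char *mode, const char *dev_a, const char *dev_b)
{
	int n;

	/* Accept only the two modes mentioned in this thread. */
	if (strcmp(mode, "bio") != 0 && strcmp(mode, "mq") != 0)
		return -1;
	n = snprintf(buf, len,
		     "0 %llu multipath 2 queue_mode %s 0 1 1 "
		     "round-robin 0 2 1 %s 1 %s 1",
		     sectors, mode, dev_a, dev_b);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting line could then be passed to `dmsetup create <name> --table "<line>"`, with `<name>` a placeholder.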

Keith Busch June 30, 2016, 11:14 p.m. UTC | #4
On Thu, Jun 30, 2016 at 06:52:07PM -0400, Mike Snitzer wrote:
> AFAIK, hch had Intel disable that by default in the hopes of avoiding
> people having dm-multipath "just work" with NVMeoF.  (Makes me wonder
> what other unpleasant unilateral decisions were made because some
> non-existent NVMe specific multipath capabilities would be forthcoming
> but I digress).

For the record, Intel was okay with making SCSI a separate config option,
but I was pretty clear about our wish to let it default to 'Y', which
didn't happen. :)

To be fair, NVMe's SCSI translation is a bit of a kludge, and we have
better ways to get device identification now. Specifically, the block
device provides an 'ATTR{wwid}' attribute for all NVMe namespaces in
existing kernel releases.
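Keith's suggestion can be sketched by reading the wwid attribute directly instead of relying on SCSI translation. This assumes the attribute is exposed as /sys/block/<namespace>/wwid (the sysfs file behind the udev ATTR{wwid} he mentions) and that two namespaces reached through different controllers report the same value; read_sysfs_attr() and same_wwid() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysfs attribute into buf, stripping the trailing
 * newline. Returns 0 on success, -1 on error.
 */
int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if both files hold the same wwid, 0 if not, -1 on error. */
int same_wwid(const char *path_a, const char *path_b)
{
	char a[256], b[256];

	if (read_sysfs_attr(path_a, a, sizeof a) != 0 ||
	    read_sysfs_attr(path_b, b, sizeof b) != 0)
		return -1;
	return strcmp(a, b) == 0;
}
```

For the setup in this thread, the call would be `same_wwid("/sys/block/nvme0n1/wwid", "/sys/block/nvme1n1/wwid")`.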

Sagi Grimberg July 13, 2016, 10:19 a.m. UTC | #5
On 01/07/16 01:52, Mike Snitzer wrote:
> On Thu, Jun 30 2016 at  5:57pm -0400,
> Ming Lin <mlin@kernel.org> wrote:
>
>> There are two problems:
>>
>> 1. there is no "/block/" in the path
>>
>> /sys/devices/virtual/nvme-fabrics/block/nvme0/nvme0n1
>
> You clarified that it is:
> /sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1
>
> Do you have CONFIG_BLK_DEV_NVME_SCSI enabled?

Indeed, for dm-multipath we need CONFIG_BLK_DEV_NVME_SCSI on.

Another thing I noticed was that for nvme we need to manually
set the timeout value because nvme devices don't expose a
device/timeout sysfs file. This causes dm-multipath to fall back to
a 200-second default (not a huge problem because we
have keep-alive in fabrics too).
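The timeout behaviour described here amounts to a fallback rule. A minimal sketch, assuming the timeout is read from a device/timeout-style file when one exists (as /sys/block/sdX/device/timeout does for SCSI disks) and otherwise taken from a default, 200 seconds in Sagi's report; path_timeout_secs() is hypothetical and not the multipath-tools code. Setting the value manually, presumably through multipath.conf's checker_timeout option, would take the place of the fallback.

```c
#include <stdio.h>

/*
 * Return the path timeout in seconds read from timeout_path, or
 * fallback when the file is missing, unparsable, or zero.
 */
unsigned int path_timeout_secs(const char *timeout_path, unsigned int fallback)
{
	FILE *f = fopen(timeout_path, "r");
	unsigned int secs;

	if (f == NULL)
		return fallback;	/* no device/timeout attribute */
	if (fscanf(f, "%u", &secs) != 1 || secs == 0)
		secs = fallback;	/* unparsable or zero value */
	fclose(f);
	return secs;
}
```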


Patch

diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
index 2400eda..a143383 100644
--- a/libmultipath/blacklist.c
+++ b/libmultipath/blacklist.c
@@ -190,9 +190,11 @@  setup_default_blist (struct config * conf)
 	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
 		return 1;
 
+#if 0
 	str = STRDUP("^nvme.*");
 	if (!str)
 		return 1;
+#endif
 	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
 		return 1;
 
diff --git a/multipathd/main.c b/multipathd/main.c
index c0ca571..1364070 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1012,6 +1012,7 @@  uxsock_trigger (char * str, char ** reply, int * len, void * trigger_data)
 static int
 uev_discard(char * devpath)
 {
+#if 0
 	char *tmp;
 	char a[11], b[11];
 
@@ -1028,6 +1029,7 @@  uev_discard(char * devpath)
 		condlog(4, "discard event on %s", devpath);
 		return 1;
 	}
+#endif
 	return 0;
 }