[RFC] scsi: target: detect XCOPY NAA descriptor conflicts
diff mbox series

Message ID 20200813002142.13820-1-ddiss@suse.de
State New
Headers show
Series
  • [RFC] scsi: target: detect XCOPY NAA descriptor conflicts
Related show

Commit Message

David Disseldorp Aug. 13, 2020, 12:21 a.m. UTC
LIO's XCOPY implementation currently only accepts IEEE NAA 0x83 type
device descriptors for copy source and destination IDs. These IDs are
automatically generated by spc_parse_naa_6h_vendor_specific() using
*only* hexadecimal characters present in the user-configured
vpd_unit_serial string, and advertised in the Device ID Page INQUIRY
response.

spc_parse_naa_6h_vendor_specific() mapping can quite easily result in
two devices with differing vpd_unit_serial strings sharing the same NAA
ID. E.g.
LUN0
-> backstore device=/dev/sda, vpd_unit_serial=unitserialfirst
LUN1
-> backstore device=/dev/sdb, vpd_unit_serial=unitserialforth

In this case, both LUNs would advertise an NAA ID of:
0x01405eaf0000000000000000...
Where 0x01405 corresponds to the OpenFabrics IEEE Company ID and 0xeaf
are hex characters taken from vpd_unit_serial.

With the above example, an initiator wishing to copy data from LUN0 to
LUN1 may issue an XCOPY request with a copy source and copy dest set
to 0x01405eaf... and observe that (despite XCOPY success), no data has
moved from LUN0 to LUN1. Instead LIO has processed the request using
LUN0 as source and destination.

This change sees LIO fail XCOPY requests if the copy source or
destination correspond to a non-unique NAA identifier.

Signed-off-by: David Disseldorp <ddiss@suse.de>
---
 drivers/target/target_core_xcopy.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

Comments

David Disseldorp Aug. 13, 2020, 11:08 a.m. UTC | #1
FWIW, tcmu-runner appears to use the same vpd_unit_serial -> NAA WWN
mapping logic. I'll provide a patch for it too, once we decide on the
best way to address this issue.

Cheers, David
David Disseldorp Aug. 24, 2020, 4:56 p.m. UTC | #2
Ping - some feeback here would be much appreciated?

Cheers, David
Mike Christie Sept. 2, 2020, 3:17 a.m. UTC | #3
> On Aug 12, 2020, at 7:21 PM, David Disseldorp <ddiss@suse.de> wrote:
> 
> LIO's XCOPY implementation currently only accepts IEEE NAA 0x83 type
> device descriptors for copy source and destination IDs. These IDs are
> automatically generated by spc_parse_naa_6h_vendor_specific() using
> *only* hexadecimal characters present in the user-configured
> vpd_unit_serial string, and advertised in the Device ID Page INQUIRY
> response.
> 
> spc_parse_naa_6h_vendor_specific() mapping can quite easily result in
> two devices with differing vpd_unit_serial strings sharing the same NAA
> ID. E.g.
> LUN0
> -> backstore device=/dev/sda, vpd_unit_serial=unitserialfirst
> LUN1
> -> backstore device=/dev/sdb, vpd_unit_serial=unitserialforth
> 
> In this case, both LUNs would advertise an NAA ID of:
> 0x01405eaf0000000000000000...
> Where 0x01405 corresponds to the OpenFabrics IEEE Company ID and 0xeaf
> are hex characters taken from vpd_unit_serial.
> 
> With the above example, an initiator wishing to copy data from LUN0 to
> LUN1 may issue an XCOPY request with a copy source and copy dest set
> to 0x01405eaf... and observe that (despite XCOPY success), no data has
> moved from LUN0 to LUN1. Instead LIO has processed the request using
> LUN0 as source and destination.
> 
> This change sees LIO fail XCOPY requests if the copy source or
> destination correspond to a non-unique NAA identifier.
> 
> Signed-off-by: David Disseldorp <ddiss@suse.de>
> ---
> drivers/target/target_core_xcopy.c | 23 +++++++++++++++++------
> 1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
> index 44e15d7fb2f0..3ce5da4b3e81 100644
> --- a/drivers/target/target_core_xcopy.c
> +++ b/drivers/target/target_core_xcopy.c
> @@ -68,8 +68,14 @@ static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
> 	if (rc != 0)
> 		return 0;
> 
> -	info->found_dev = se_dev;
> 	pr_debug("XCOPY 0xe4: located se_dev: %p\n", se_dev);
> +	if (info->found_dev) {
> +		pr_warn("XCOPY 0xe4 descriptor conflict for se_dev %p and %p\n",
> +			info->found_dev, se_dev);
> +		target_undepend_item(&info->found_dev->dev_group.cg_item);
> +		return -ENOTUNIQ;
> +	}
> +	info->found_dev = se_dev;

Was it valid to copy to/from the same LUN? You would copy from/to different src/destinations on that LUN. Would your patch break that?
David Disseldorp Sept. 2, 2020, 10:23 p.m. UTC | #4
Hi Mike,

On Tue, 1 Sep 2020 22:17:51 -0500, Michael Christie wrote:

> > --- a/drivers/target/target_core_xcopy.c
> > +++ b/drivers/target/target_core_xcopy.c
> > @@ -68,8 +68,14 @@ static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
> > 	if (rc != 0)
> > 		return 0;
> > 
> > -	info->found_dev = se_dev;
> > 	pr_debug("XCOPY 0xe4: located se_dev: %p\n", se_dev);
> > +	if (info->found_dev) {
> > +		pr_warn("XCOPY 0xe4 descriptor conflict for se_dev %p and %p\n",
> > +			info->found_dev, se_dev);
> > +		target_undepend_item(&info->found_dev->dev_group.cg_item);
> > +		return -ENOTUNIQ;
> > +	}
> > +	info->found_dev = se_dev;  
> 
> Was it valid to copy to/from the same LUN? You would copy from/to different src/destinations on that LUN. Would your patch break that?

XCOPY allows for copies to occur on the same LUN or between separate
src/destinations. The intention of this patch is that regardless of the
source or destination, if the NAA WWN could refer to multiple LUNs on
the same target (via target_for_each_device()) then the XCOPY should
fail and force the initiator to fallback to initiator driver copy.

Cheers, David
Mike Christie Sept. 3, 2020, 3:36 p.m. UTC | #5
> On Sep 2, 2020, at 5:23 PM, David Disseldorp <ddiss@suse.de> wrote:
> 
> Hi Mike,
> 
> On Tue, 1 Sep 2020 22:17:51 -0500, Michael Christie wrote:
> 
>>> --- a/drivers/target/target_core_xcopy.c
>>> +++ b/drivers/target/target_core_xcopy.c
>>> @@ -68,8 +68,14 @@ static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
>>> 	if (rc != 0)
>>> 		return 0;
>>> 
>>> -	info->found_dev = se_dev;
>>> 	pr_debug("XCOPY 0xe4: located se_dev: %p\n", se_dev);
>>> +	if (info->found_dev) {
>>> +		pr_warn("XCOPY 0xe4 descriptor conflict for se_dev %p and %p\n",
>>> +			info->found_dev, se_dev);
>>> +		target_undepend_item(&info->found_dev->dev_group.cg_item);
>>> +		return -ENOTUNIQ;
>>> +	}
>>> +	info->found_dev = se_dev;  
>> 
>> Was it valid to copy to/from the same LUN? You would copy from/to different src/destinations on that LUN. Would your patch break that?
> 
> XCOPY allows for copies to occur on the same LUN or between separate
> src/destinations. The intention of this patch is that regardless of the
> source or destination, if the NAA WWN could refer to multiple LUNs on
> the same target (via target_for_each_device()) then the XCOPY should
> fail and force the initiator to fallback to initiator driver copy.

So is the answer to my question a maybe but it probably will never happen?

If the user has multiple backend devices with the same serial, then your patch would now return error right? Is the reason that this patch is a RFC to try and figure out if that case is valid or ever happens? If so, the only way I could see that happening on purpose is if someone was trying to bypass a device issue.

For example, I create 2 tcmu devices. They both point to the same real device. Then export dev1 through target port1 and dev2 through target port2. Each tcmu device would then have it’s own data/cmd ring and locking, so you do not hit those perf issues. I do this for perf testing. I don’t think that type of thing is common or ever done, so I think the patch would be ok if that is a concern and it’s better than possible data corruption.

Code wise it looks ok to me.
David Disseldorp Sept. 3, 2020, 8:54 p.m. UTC | #6
On Thu, 3 Sep 2020 10:36:58 -0500, Michael Christie wrote:

> > On Sep 2, 2020, at 5:23 PM, David Disseldorp <ddiss@suse.de> wrote:
> > 
> > Hi Mike,
> > 
> > On Tue, 1 Sep 2020 22:17:51 -0500, Michael Christie wrote:
> >   
> >>> --- a/drivers/target/target_core_xcopy.c
> >>> +++ b/drivers/target/target_core_xcopy.c
> >>> @@ -68,8 +68,14 @@ static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
> >>> 	if (rc != 0)
> >>> 		return 0;
> >>> 
> >>> -	info->found_dev = se_dev;
> >>> 	pr_debug("XCOPY 0xe4: located se_dev: %p\n", se_dev);
> >>> +	if (info->found_dev) {
> >>> +		pr_warn("XCOPY 0xe4 descriptor conflict for se_dev %p and %p\n",
> >>> +			info->found_dev, se_dev);
> >>> +		target_undepend_item(&info->found_dev->dev_group.cg_item);
> >>> +		return -ENOTUNIQ;
> >>> +	}
> >>> +	info->found_dev = se_dev;    
> >> 
> >> Was it valid to copy to/from the same LUN? You would copy from/to different src/destinations on that LUN. Would your patch break that?  
> > 
> > XCOPY allows for copies to occur on the same LUN or between separate
> > src/destinations. The intention of this patch is that regardless of the
> > source or destination, if the NAA WWN could refer to multiple LUNs on
> > the same target (via target_for_each_device()) then the XCOPY should
> > fail and force the initiator to fallback to initiator driver copy.  
> 
> So is the answer to my question a maybe but it probably will never happen?

A src=dest XCOPY? I think it's just as likely as a cross device XCOPY.
The UUID collision error is probably unlikely to be triggered because:
- XCOPY is a pretty exotic SCSI command mostly used by ESXi
- Users may already provide a vpd_unit_serial with enough unique
  hexadecimal characters(?)
- The initiator could detect the NAA WWN collision itself by comparing
  the INQUIRY(dev-id)->NAA between other LUNs on the target, and thereby
  detect+avoid sending XCOPY requests with ambiguous src/dest WWNs

> If the user has multiple backend devices with the same serial, then your patch would now return error right?

Yes, XCOPY will now fail if the src or dest NAA WWN matches more than
one se_dev. Keep in mind that the NAA WWN is derived from only the
hexadecimal characters of the vpd_unit_serial (see
spc_parse_naa_6h_vendor_specific), so collision might be more likely.

> Is the reason that this patch is a RFC to try and figure out if that case is valid or ever happens? If so, the only way I could see that happening on purpose is if someone was trying to bypass a device issue.

Sorry, I should have mode this more clear in the patch itself. The
reasons for RFC are:
- there might be a better approach. I considered detecting NAA WWN
  collisions when the vpd_unit_serial is set via configfs. Throwing a
  configfs error might break existing setups, rather than just
  triggering the XCOPY error (allowing for subsequent READ/WRITE
  fallback).
- I've tested this with libiscsi's iscsi-dd (-x) and test suite, but not
  against the ESXi initiator yet.

> For example, I create 2 tcmu devices. They both point to the same real device. Then export dev1 through target port1 and dev2 through target port2. Each tcmu device would then have it’s own data/cmd ring and locking, so you do not hit those perf issues. I do this for perf testing. I don’t think that type of thing is common or ever done, so I think the patch would be ok if that is a concern and it’s better than possible data corruption.
> 
> Code wise it looks ok to me.

Thanks a lot for the feedback, Mike.

Cheers, David

Patch
diff mbox series

diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 44e15d7fb2f0..3ce5da4b3e81 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -68,8 +68,14 @@  static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
 	if (rc != 0)
 		return 0;
 
-	info->found_dev = se_dev;
 	pr_debug("XCOPY 0xe4: located se_dev: %p\n", se_dev);
+	if (info->found_dev) {
+		pr_warn("XCOPY 0xe4 descriptor conflict for se_dev %p and %p\n",
+			info->found_dev, se_dev);
+		target_undepend_item(&info->found_dev->dev_group.cg_item);
+		return -ENOTUNIQ;
+	}
+	info->found_dev = se_dev;
 
 	rc = target_depend_item(&se_dev->dev_group.cg_item);
 	if (rc != 0) {
@@ -80,7 +86,8 @@  static int target_xcopy_locate_se_dev_e4_iter(struct se_device *se_dev,
 
 	pr_debug("Called configfs_depend_item for se_dev: %p se_dev->se_dev_group: %p\n",
 		 se_dev, &se_dev->dev_group);
-	return 1;
+	/* continue iteration to check for conflicts */
+	return 0;
 }
 
 static int target_xcopy_locate_se_dev_e4(const unsigned char *dev_wwn,
@@ -93,13 +100,14 @@  static int target_xcopy_locate_se_dev_e4(const unsigned char *dev_wwn,
 	info.dev_wwn = dev_wwn;
 
 	ret = target_for_each_device(target_xcopy_locate_se_dev_e4_iter, &info);
-	if (ret == 1) {
-		*found_dev = info.found_dev;
-		return 0;
-	} else {
+	if (ret < 0) {
+		return ret;
+	} else if (!info.found_dev) {
 		pr_debug_ratelimited("Unable to locate 0xe4 descriptor for EXTENDED_COPY\n");
 		return -EINVAL;
 	}
+	*found_dev = info.found_dev;
+	return 0;
 }
 
 static int target_xcopy_parse_tiddesc_e4(struct se_cmd *se_cmd, struct xcopy_op *xop,
@@ -264,6 +272,9 @@  static int target_xcopy_parse_target_descriptors(struct se_cmd *se_cmd,
 	 * is not located on this node, return COPY_ABORTED with ASQ/ASQC
 	 * 0x0d/0x02 - COPY_TARGET_DEVICE_NOT_REACHABLE to request the
 	 * initiator to fall back to normal copy method.
+	 * Fall back will also be requested if a IEEE NAA 0x83 descriptor
+	 * is not unique across all devices, which can occur due to suboptimal
+	 * vpd_unit_serial mapping in spc_parse_naa_6h_vendor_specific().
 	 */
 	if (rc < 0) {
 		*sense_ret = TCM_COPY_TARGET_DEVICE_NOT_REACHABLE;