diff mbox

Fwd: usb: uas: device reset most the time while enumeration- usb3.0

Message ID 848b9bcb5da146ae8953148ba3d5254a@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Tushar Nimkar May 17, 2018, 6:59 a.m. UTC
Hi All,

On 2018-04-24 14:42, Oliver Neukum wrote:
> Am Montag, den 23.04.2018, 18:35 +0530 schrieb Tushar Nimkar:
>> hi,
>> 
>> On 2018-04-23 18:20, Oliver Neukum wrote:
>> > Am Donnerstag, den 19.04.2018, 20:18 +0530 schrieb Tushar Nimkar:
> 
>> > No. But for testing we don't need to. Can you confirm that
>> > the problem is triggered by specific commands?
>> 
>> Observed with REPORT_LUN or READ_CAP16 or INQUIRY command.
> 
> Well, that is not the same thing. It is possible that these commands
> generally fail, but x86_64 has a working error handling but ARM does
> not.
> 
>> Are you planning to test it with dwc3?
> 
> No, I am sorry. I would need to acquire some very specific hardware.
> 
> 	Regards
> 		Oliver
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Please have look on the proposed fix for the slow enumeration and reset 
issue
for UAS devices(after SD_TIMEOUT).
Fix for :
...
[  123.672358] usb 2-1: new SuperSpeed USB device number 2 using 
xhci-hcd
[  123.711940] scsi host0: uas
[  123.867378] scsi 0:0:0:0: Direct-Access     Samsung  Portable SSD T3  
0    PQ: 0 ANSI: 6
[  155.132319] sd 0:0:0:0: tag#1 uas_eh_abort_handler 0 uas-tag 2 
inflight: CMD IN
[  155.132359] sd 0:0:0:0: tag#1 CDB: opcode=0x9e, sa=0x10 9e 10 00 00 
00 00 00 00 00 00 00 00 00 20 00 00
[  155.138868] scsi host0: uas_eh_device_reset_handler start
...

If changes seems fine will push this patch to upstream : )

Issue is mostly occurring while we issue the READ_CAPACITY_16 or 
REPORT_LUNS or INQ commands.
I have confirm this with USB analyzer too. But issue is not in those 
commands(they are supported)
Also which mean there is no problem in scanning lun with REPORT_LUN or 
older sequential scan,
which I was suspecting the issue cause.
Those commands issued from different layers say: sd.. uas.. scsi..
so making them to go one after other. Once REPORT_LUN done go with 
READ_CAPACITY_16.
This is only for the UAS devices. I believe no disturb to BOT behavior.

so far tested with two UAS SSDs  dwc3 as host controller with 3.00a as 
version
	1) StoreJet TS256GESD400K
	2) Samsung  Portable SSD T3

Draft Patch :

--
  drivers/scsi/sd.c              | 8 ++++++++
  drivers/usb/storage/scsiglue.c | 1 +
  drivers/usb/storage/uas.c      | 6 ++++++
  include/scsi/scsi_host.h       | 6 ++++++
  4 files changed, 21 insertions(+)

Comments

Oliver Neukum May 17, 2018, 1:34 p.m. UTC | #1
Am Donnerstag, den 17.05.2018, 12:29 +0530 schrieb Tushar Nimkar:
> Those commands issued from different layers say: sd.. uas.. scsi..
> so making them to go one after other. Once REPORT_LUN done go with 
> READ_CAPACITY_16.
> This is only for the UAS devices. I believe no disturb to BOT behavior.

Hi,

this is good news.

1. We cannot slow down all UAS devices because a few are broken. This
would need to be selective.
2. What is insufficient about "shost->async_scan" for your approach?

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tushar Nimkar May 24, 2018, 2:43 p.m. UTC | #2
Oliver,
On 2018-05-17 19:04, Oliver Neukum wrote:
> Am Donnerstag, den 17.05.2018, 12:29 +0530 schrieb Tushar Nimkar:
>> Those commands issued from different layers say: sd.. uas.. scsi..
>> so making them to go one after other. Once REPORT_LUN done go with
>> READ_CAPACITY_16.
>> This is only for the UAS devices. I believe no disturb to BOT 
>> behavior.
> 
> Hi,
> 
> this is good news.
> 
> 1. We cannot slow down all UAS devices because a few are broken. This
> would need to be selective.
> 2. What is insufficient about "shost->async_scan" for your approach?
Unfortunately for our build CONFIG_SCSI_SCAN_ASYNC wan not set.
And enabling it solves our issue of enumeration.
But as per bellow commit

"If you have built SCSI as modules, enabling this option can
be a problem as the devices may not have been found by the time your
system expects them to have been."

commit 21db1882f79a1ad5977cae6766376a63f60ec414
Author: Matthew Wilcox <matthew@wil.cx>
Date:   Wed Nov 22 13:24:52 2006 -0700

     [SCSI] Add Kconfig option for asynchronous SCSI scanning

     Without this patch, the user has to add a kernel command line 
parameter
     to get asynchronous SCSI scanning.  Now they can select the default 
at
     compile time and still override it at boot time if they need to.

     Signed-off-by: Matthew Wilcox <matthew@wil.cx>
     Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>


We have built SCSI as module will it cause any problem to enable
CONFIG_SCSI_SCAN_ASYNC ?


> 
> 	Regards
> 		Oliver
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum May 28, 2018, 12:42 p.m. UTC | #3
Am Donnerstag, den 24.05.2018, 20:13 +0530 schrieb Tushar Nimkar:
> We have built SCSI as module will it cause any problem to enable
> CONFIG_SCSI_SCAN_ASYNC ?

No, that should work. But the failure is bizzare. You say
synchronous scanning fails, but async scan works?

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tushar Nimkar May 29, 2018, 9:02 a.m. UTC | #4
Hi,

On 2018-05-28 18:12, Oliver Neukum wrote:
> Am Donnerstag, den 24.05.2018, 20:13 +0530 schrieb Tushar Nimkar:
>> We have built SCSI as module will it cause any problem to enable
>> CONFIG_SCSI_SCAN_ASYNC ?
> 
> No, that should work. But the failure is bizzare. You say
yes there is no problem! I have run some test over night.
> synchronous scanning fails, but async scan works?
yes!
Can you check for other host too? I have only dwc3 (3.0a)
That's why my patch(shared already) was working as it was synchronous 
scanning.
> 
> 	Regards
> 		Oliver
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Neukum June 4, 2018, 2:27 p.m. UTC | #5
Am Dienstag, den 29.05.2018, 14:32 +0530 schrieb Tushar Nimkar:
> Hi,
> 
> On 2018-05-28 18:12, Oliver Neukum wrote:
> > Am Donnerstag, den 24.05.2018, 20:13 +0530 schrieb Tushar Nimkar:
> > > We have built SCSI as module will it cause any problem to enable
> > > CONFIG_SCSI_SCAN_ASYNC ?
> > 
> > No, that should work. But the failure is bizzare. You say
> 
> yes there is no problem! I have run some test over night.
> > synchronous scanning fails, but async scan works?
> 
> yes!

Odd. Extremely odd. Does this show on all architectures?
I am asking because that is the crucial question.

I see two possibilities

1. the sync probing code has a bug that shows only on some
architectures
2. architectures were a coincidence - the drive is broken

We absolutely need to know what is happening.
I am afraid this will have to be tested on another architecture.

	Regards
		Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tushar Nimkar June 5, 2018, 12:48 p.m. UTC | #6
Hi,

On 2018-06-04 19:57, Oliver Neukum wrote:
> Am Dienstag, den 29.05.2018, 14:32 +0530 schrieb Tushar Nimkar:
>> Hi,
>> 
>> On 2018-05-28 18:12, Oliver Neukum wrote:
>> > Am Donnerstag, den 24.05.2018, 20:13 +0530 schrieb Tushar Nimkar:
>> > > We have built SCSI as module will it cause any problem to enable
>> > > CONFIG_SCSI_SCAN_ASYNC ?
>> >
>> > No, that should work. But the failure is bizzare. You say
>> 
>> yes there is no problem! I have run some test over night.
>> > synchronous scanning fails, but async scan works?
>> 
>> yes!
> 
> Odd. Extremely odd. Does this show on all architectures?
I have tried with different SOC version with same architecture(ARM)and
Synopsys host controller.
> I am asking because that is the crucial question.
> 
> I see two possibilities
> 
> 1. the sync probing code has a bug that shows only on some
> architectures
> 2. architectures were a coincidence - the drive is broken
> 
> We absolutely need to know what is happening.
Please let me know if I could contribute on this issue.
I would always like to work on it :)

> I am afraid this will have to be tested on another architecture.
> 
> 	Regards
> 		Oliver
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 78430ef..38c23ad 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2979,6 +2979,14 @@  static void sd_probe_async(void *data, 
async_cookie_t cookie)
  	struct device *dev;

  	sdp = sdkp->device;
+
+	/*
+	 *  Continue async probe once host scanning is finished
+	 *  this is only for uas devices
+	 */
+	if (sdp->host->hostt->is_uas)
+		wait_for_completion_interruptible_timeout(&sdp->host->hscan_done,
+						msecs_to_jiffies(500));
  	gd = sdkp->disk;
  	index = sdkp->index;
  	dev = &sdp->sdev_gendev;
diff --git a/drivers/usb/storage/scsiglue.c 
b/drivers/usb/storage/scsiglue.c
index dba5136..fbd397a 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -594,6 +594,7 @@  void usb_stor_host_template_init(struct 
scsi_host_template *sht,
  	sht->name = name;
  	sht->proc_name = name;
  	sht->module = owner;
+	sht->is_uas = false;    /* BOT device */
  }
  EXPORT_SYMBOL_GPL(usb_stor_host_template_init);

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index f952635..ead73d0 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -44,6 +44,7 @@  struct uas_dev_info {
  	unsigned use_streams:1;
  	unsigned shutdown:1;
  	struct scsi_cmnd *cmnd[MAX_CMNDS];
+	struct Scsi_Host *shost;
  	spinlock_t lock;
  	struct work_struct work;
  };
@@ -828,6 +829,7 @@  static struct scsi_host_template uas_host_template = 
{
  	.this_id = -1,
  	.sg_tablesize = SG_NONE,
  	.skip_settle_delay = 1,
+	.is_uas = true,
  };

  #define UNUSUAL_DEV(id_vendor, id_product, bcdDeviceMin, bcdDeviceMax, 
\
@@ -934,11 +936,13 @@  static int uas_probe(struct usb_interface *intf, 
const struct usb_device_id *id)
  	devinfo->resetting = 0;
  	devinfo->shutdown = 0;
  	devinfo->flags = dev_flags;
+	devinfo->shost = shost;
  	init_usb_anchor(&devinfo->cmd_urbs);
  	init_usb_anchor(&devinfo->sense_urbs);
  	init_usb_anchor(&devinfo->data_urbs);
  	spin_lock_init(&devinfo->lock);
  	INIT_WORK(&devinfo->work, uas_do_work);
+	init_completion(&shost->hscan_done);

  	result = uas_configure_endpoints(devinfo);
  	if (result)
@@ -956,6 +960,8 @@  static int uas_probe(struct usb_interface *intf, 
const struct usb_device_id *id)
  		goto free_streams;

  	scsi_scan_host(shost);
+	/* once scsi_host_scan done, let others to do their stuffs */
+	complete(&shost->hscan_done);
  	return result;

  free_streams:
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index fcfa3d7..b7d27a8 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -498,6 +498,9 @@  struct scsi_host_template {

  	/* temporary flag to disable blk-mq I/O path */
  	bool disable_blk_mq;
+
+	/* let the host know  the device is BOT or UAS */
+	bool is_uas;
  };

  /*
@@ -564,6 +567,9 @@  struct Scsi_Host {
  	struct scsi_host_template *hostt;
  	struct scsi_transport_template *transportt;

+	/* tell other's that host scan is done */
+	struct completion	hscan_done;
+
  	/*
  	 * Area to keep a shared tag map (if needed, will be
  	 * NULL if not).