diff mbox

CPU lock-ups with 4.12.0+ kernels related to usb_storage

Message ID Pine.LNX.4.44L0.1707131258190.1571-100000@iolanthe.rowland.org (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Alan Stern July 13, 2017, 5 p.m. UTC
On Wed, 12 Jul 2017, Christoph Hellwig wrote:

> On Wed, Jul 12, 2017 at 12:10:02PM -0400, Alan Stern wrote:
> > This is pretty conclusive.  The problem comes about because
> > usb_stor_control_thread() calls scsi_mq_done() while holding
> > shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that
> > same lock.
> > 
> > I don't know why this didn't show up in earlier kernels.  I guess some
> > element of the call chain listed above must be new in 4.12.
> > 
> > Christoph, what's the best way to fix this?  Should usb-storage release
> > the host lock before issuing the ->scsi_done callback?  If so, does
> > that change need to be applied to any kernels before 4.12?
> 
> 4.12 switched to blk-mq by default, and while the old code used
> a softirq for completions, which is always a difference context
> the blk-mq code might execute in the same context it's called in.
> 
> So yes, for that we'd need to drop host_lock.  But I wonder how
> many more of these are lingering somewhere and if we can find
> another workaround.

All right.  In the meantime, changing usb-storage won't hurt.
Arthur, can you test the patch below?

Alan Stern

Comments

Arthur Marsh July 13, 2017, 6:48 p.m. UTC | #1
Alan Stern wrote on 14/07/17 02:30:

> All right.  In the meantime, changing usb-storage won't hurt.
> Arthur, can you test the patch below?
> 
> Alan Stern
> 
> 
> 
> Index: usb-4.x/drivers/usb/storage/usb.c
> ===================================================================
> --- usb-4.x.orig/drivers/usb/storage/usb.c
> +++ usb-4.x/drivers/usb/storage/usb.c
> @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void
>   {
>   	struct us_data *us = (struct us_data *)__us;
>   	struct Scsi_Host *host = us_to_host(us);
> +	struct scsi_cmnd *srb;
>   
>   	for (;;) {
>   		usb_stor_dbg(us, "*** thread sleeping\n");
> @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void
>   		scsi_lock(host);
>   
>   		/* When we are called with no command pending, we're done */
> +		srb = us->srb;
>   		if (us->srb == NULL) {
>   			scsi_unlock(host);
>   			mutex_unlock(&us->dev_mutex);
> @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void
>   		/* lock access to the state */
>   		scsi_lock(host);
>   
> -		/* indicate that the command is done */
> -		if (us->srb->result != DID_ABORT << 16) {
> -			usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
> -				     us->srb->result);
> -			us->srb->scsi_done(us->srb);
> -		} else {
> +		/* was the command aborted? */
> +		if (us->srb->result == DID_ABORT << 16) {
>   SkipForAbort:
>   			usb_stor_dbg(us, "scsi command aborted\n");
> +			srb = NULL;	/* Don't call srb->scsi_done() */
>   		}
>   
>   		/*
> @@ -429,6 +428,13 @@ SkipForAbort:
>   
>   		/* unlock the device pointers */
>   		mutex_unlock(&us->dev_mutex);
> +
> +		/* now that the locks are released, notify the SCSI core */
> +		if (srb) {
> +			usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
> +					srb->result);
> +			srb->scsi_done(srb);
> +		}
>   	} /* for (;;) */
>   
>   	/* Wait until we are told to stop */
> 
> 

Thanks for the patch!

I have applied it and am running the resulting kernel.

As I didn't have a reproducible way to trigger the problem on demand, 
I'll just have to see that there isn't a lock-up that looks related over 
the next several days.

Arthur.
Arthur Marsh July 17, 2017, 8:15 p.m. UTC | #2
Arthur Marsh wrote on 14/07/17 04:18:
> 
> 
> Alan Stern wrote on 14/07/17 02:30:
> 
>> All right.  In the meantime, changing usb-storage won't hurt.
>> Arthur, can you test the patch below?
>>
>> Alan Stern
>>
>>
>>
>> Index: usb-4.x/drivers/usb/storage/usb.c
>> ===================================================================
>> --- usb-4.x.orig/drivers/usb/storage/usb.c
>> +++ usb-4.x/drivers/usb/storage/usb.c
>> @@ -315,6 +315,7 @@ static int usb_stor_control_thread(void
>>   {
>>       struct us_data *us = (struct us_data *)__us;
>>       struct Scsi_Host *host = us_to_host(us);
>> +    struct scsi_cmnd *srb;
>>       for (;;) {
>>           usb_stor_dbg(us, "*** thread sleeping\n");
>> @@ -330,6 +331,7 @@ static int usb_stor_control_thread(void
>>           scsi_lock(host);
>>           /* When we are called with no command pending, we're done */
>> +        srb = us->srb;
>>           if (us->srb == NULL) {
>>               scsi_unlock(host);
>>               mutex_unlock(&us->dev_mutex);
>> @@ -398,14 +400,11 @@ static int usb_stor_control_thread(void
>>           /* lock access to the state */
>>           scsi_lock(host);
>> -        /* indicate that the command is done */
>> -        if (us->srb->result != DID_ABORT << 16) {
>> -            usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
>> -                     us->srb->result);
>> -            us->srb->scsi_done(us->srb);
>> -        } else {
>> +        /* was the command aborted? */
>> +        if (us->srb->result == DID_ABORT << 16) {
>>   SkipForAbort:
>>               usb_stor_dbg(us, "scsi command aborted\n");
>> +            srb = NULL;    /* Don't call srb->scsi_done() */
>>           }
>>           /*
>> @@ -429,6 +428,13 @@ SkipForAbort:
>>           /* unlock the device pointers */
>>           mutex_unlock(&us->dev_mutex);
>> +
>> +        /* now that the locks are released, notify the SCSI core */
>> +        if (srb) {
>> +            usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
>> +                    srb->result);
>> +            srb->scsi_done(srb);
>> +        }
>>       } /* for (;;) */
>>       /* Wait until we are told to stop */

Hi, just to confirm no further lock-ups occurred in the last 4 days with 
this patch applied.

Arthur.
Christoph Hellwig July 26, 2017, 3:47 p.m. UTC | #3
On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote:
> All right.  In the meantime, changing usb-storage won't hurt.
> Arthur, can you test the patch below?

Do you plan to send this fix to Greg for 4.13-rc?
Alan Stern July 26, 2017, 3:50 p.m. UTC | #4
On Wed, 26 Jul 2017, Christoph Hellwig wrote:

> On Thu, Jul 13, 2017 at 01:00:29PM -0400, Alan Stern wrote:
> > All right.  In the meantime, changing usb-storage won't hurt.
> > Arthur, can you test the patch below?
> 
> Do you plan to send this fix to Greg for 4.13-rc?

I just submitted it.

Alan Stern
diff mbox

Patch

Index: usb-4.x/drivers/usb/storage/usb.c
===================================================================
--- usb-4.x.orig/drivers/usb/storage/usb.c
+++ usb-4.x/drivers/usb/storage/usb.c
@@ -315,6 +315,7 @@  static int usb_stor_control_thread(void
 {
 	struct us_data *us = (struct us_data *)__us;
 	struct Scsi_Host *host = us_to_host(us);
+	struct scsi_cmnd *srb;
 
 	for (;;) {
 		usb_stor_dbg(us, "*** thread sleeping\n");
@@ -330,6 +331,7 @@  static int usb_stor_control_thread(void
 		scsi_lock(host);
 
 		/* When we are called with no command pending, we're done */
+		srb = us->srb;
 		if (us->srb == NULL) {
 			scsi_unlock(host);
 			mutex_unlock(&us->dev_mutex);
@@ -398,14 +400,11 @@  static int usb_stor_control_thread(void
 		/* lock access to the state */
 		scsi_lock(host);
 
-		/* indicate that the command is done */
-		if (us->srb->result != DID_ABORT << 16) {
-			usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
-				     us->srb->result);
-			us->srb->scsi_done(us->srb);
-		} else {
+		/* was the command aborted? */
+		if (us->srb->result == DID_ABORT << 16) {
 SkipForAbort:
 			usb_stor_dbg(us, "scsi command aborted\n");
+			srb = NULL;	/* Don't call srb->scsi_done() */
 		}
 
 		/*
@@ -429,6 +428,13 @@  SkipForAbort:
 
 		/* unlock the device pointers */
 		mutex_unlock(&us->dev_mutex);
+
+		/* now that the locks are released, notify the SCSI core */
+		if (srb) {
+			usb_stor_dbg(us, "scsi cmd done, result=0x%x\n",
+					srb->result);
+			srb->scsi_done(srb);
+		}
 	} /* for (;;) */
 
 	/* Wait until we are told to stop */