[BUG] hpsa: Controller lockup detected: 0x00150028

Message ID 555F46F4.2090002@redhat.com (mailing list archive)
State New, archived

Commit Message

Tomas Henzl May 22, 2015, 3:10 p.m. UTC
On 05/18/2015 06:11 PM, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 06:03:45PM +0200, Peter Zijlstra wrote:
>> On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
>>> On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
>>>> The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
>>>> Which version of controller firmware are you using?
>>>
>>> Smart Array P212 in Slot 1
>>>
>>>    Hardware Revision: C
>>>    Firmware Version: 6.60
>>
>> I've updated to 6.62 and it appears to be working now; or rather, it has
>> not locked up yet where I think it would've locked up by now earlier.
>>
>> I'll let it run for a few more hours before calling it fixed, I'll let
>> you know.
> 
> And right after sending this email it went...
> 
> [ 1119.052144] hpsa 0000:06:00.0: Controller lockup detected: 0x00150029
> 
> So sadly no dice.
> 
> Anything else I can do?

An older mptsas fix seems to handle a similar case:
2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
The equivalent change for hpsa might be the patch below.

Comments

Peter Zijlstra May 22, 2015, 4:40 p.m. UTC | #1
On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> >> I've updated to 6.62 and it appears to be working now; or rather, it has

I've since gotten 6.64 from HP to test, which does not seem to be public yet.

6.64 actually fixes the issue for me.

> An older issue for mptsas seems to handle a similar case
> 2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
> that might be for hpsa -

> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
>         if (sd != NULL)
>                 sdev->hostdata = sd;
>         spin_unlock_irqrestore(&h->devlock, flags);
> +
> +       blk_queue_dma_alignment(sdev->request_queue, 512 - 1);
>         return 0;
>  }

That does indeed seem _very_ similar; I'll have to defer to Mark Oelke
and/or Don Brace to say whether the above is a useful alternative, since
they now seem to know what the root cause was.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Handzik, Joe May 22, 2015, 4:48 p.m. UTC | #2
No, the problem here (iirc) actually dealt with buffers in the firmware.

Don or Mark, agree?

Joe

Wouter Depuydt Aug. 24, 2015, 9:43 a.m. UTC | #3
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz <at> infradead.org] 
> Sent: Friday, May 22, 2015 11:40 AM
> To: Tomas Henzl
> Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at> pmcs.com; linux-scsi <at> vger.kernel.org
> Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
> 
> On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> > >> I've updated to 6.62 and it appears to be working now; or rather, it has
> 
> I've since gotten 6.64 from HP to test; which does not seem public yet.
> 
> 6.64 actually fixes the issue for me.
> 

Hi everyone,

I've experienced a similar problem with a P441 controller in HBA mode.
Serial Number	PDNMH0ARH8P04A
Model	HP Smart Array P441 Controller
Firmware Version	2.52

This seems to be the latest firmware:
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7274903&swItemId=MTX_a476e21cd5e142608ff8d6aed5&swEnvOid=4176

I'm running smartd for monitoring, with daily short tests and weekly long tests.

Aug 23 13:57:11 smartd[3344]: Device: /dev/sdv [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 42
Aug 23 13:57:42 kernel: [349157.766608] hpsa 0000:05:00.0: Abort request on C6:B2:T18:L0
Aug 23 13:57:42 kernel: [349157.830545] hpsa 0000:05:00.0: Abort request on C6:B2:T19:L0
Aug 23 14:00:10 kernel: [349305.554986]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.555000]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:00:10 kernel: [349305.575005]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.575019]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.572350]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.572364]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.682988]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.683001]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:03:10 kernel: [349457.674593] hpsa 0000:05:00.0: Controller lockup detected: 0x00130001
Aug 23 14:03:38 kernel: [349485.657605] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:03:38 kernel: [349485.657736]  <<EOE>>  [<ffffffffc02a5384>] fail_all_cmds_on_list+0x74/0x6c0 [hpsa]
Aug 23 14:03:38 kernel: [349485.657765]  [<ffffffffc02aa9bb>] hpsa_monitor_ctlr_worker+0x40b/0x4e0 [hpsa]
Aug 23 14:03:43 kernel: [349518.024245] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:04:10 kernel: [349545.689592]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.689605]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:04:10 kernel: [349545.805674]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.805693]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:06:11 kernel: [349665.810469]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:06:11 kernel: [349665.810483]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]

W.
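
For anyone collecting these messages across several machines, the lockup
code can be pulled out of the kernel log line mechanically. What follows is
only a rough userspace sketch: parse_lockup_code() and lockup_upper_half()
are made-up helper names, not part of hpsa, and the input is assumed to be a
single unwrapped log line. Splitting off the upper 16 bits is an assumption
as well. It gives 0x15 for the 0x00150028 and 0x00150029 codes reported
earlier, which would line up with the "lockup 0x15" wording in HP's fix note
quoted in this thread, but nothing here confirms how the firmware encodes
the value.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "Controller lockup detected: 0x..." in one log
 * line and store the hexadecimal value in *code. Returns false when the
 * marker is absent or the number is malformed.
 */
static bool parse_lockup_code(const char *line, uint32_t *code)
{
    static const char marker[] = "Controller lockup detected: 0x";
    const char *p = strstr(line, marker);
    char *end;
    unsigned long v;

    if (p == NULL)
        return false;
    p += sizeof(marker) - 1;            /* skip past the "0x" prefix */
    if (!isxdigit((unsigned char)*p))
        return false;
    errno = 0;
    v = strtoul(p, &end, 16);
    if (end == p || errno != 0 || v > UINT32_MAX)
        return false;
    *code = (uint32_t)v;
    return true;
}

/*
 * ASSUMPTION: the upper 16 bits form a separate field. This is a guess
 * based on the codes seen in this thread, not documented behaviour.
 */
static uint32_t lockup_upper_half(uint32_t code)
{
    return code >> 16;
}
```

Feeding it the line from the first report,
"hpsa 0000:06:00.0: Controller lockup detected: 0x00150029", yields the code
0x00150029 and an upper half of 0x15; the 0x00130001 from my log yields 0x13.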


Wouter Depuydt Aug. 24, 2015, 10:02 a.m. UTC | #4
Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:

> Hi everyone,
> 
> I've experienced a similar problem with a P441 controller in HBA mode.
> Serial Number	PDNMH0ARH8P04A
> Model	HP Smart Array P441 Controller
> Firmware Version	2.52
> 

Other System info:

HP Proliant D380p Gen9
Ubuntu LTS 14.04
ii  linux-image-3.19.0-26-generic
ii  linux-image-extra-3.19.0-26-generic
ii  linux-image-generic-lts-vivid       

Don Brace Aug. 24, 2015, 2:11 p.m. UTC | #5
On 08/24/2015 05:02 AM, Wouter Depuydt wrote:
> Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:
>
>> Hi everyone,
>>
>> I've experienced a similar problem with a P441 controller in HBA mode.
>> Serial Number	PDNMH0ARH8P04A
>> Model	HP Smart Array P441 Controller
>> Firmware Version	2.52
>>
> Other System info:
>
> HP Proliant D380p Gen9
> Ubuntu LTS 14.04
> ii  linux-image-3.19.0-26-generic
> ii  linux-image-extra-3.19.0-26-generic
> ii  linux-image-generic-lts-vivid

I see this issue addressed on the firmware update page, under the "Fixes"
tab:
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=3984645&swItemId=MTX_55b304486f544f148de6c5cc6e&swEnvOid=4103#tab4


Problems Fixed:

     Running SMARTCTL (smartmontools) on HP Proliant G6/G7 (Px1x) Smart
Array controllers that have firmware version 5.70 to 6.62 installed with
SATA drives attached may result in system not responding or reboot. When
reboot occurred, a reboot 1719 POST error message with lockup 0x15
displayed.

Hope this helps you.
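
The version range in that note can be checked mechanically. Here is a rough
C sketch, assuming firmware versions are always written as major.minor with
a two-digit minor, as with 6.60, 6.62 and 6.64 in this thread. The helper
names are made up for illustration. The 5.70 to 6.62 bounds come from the
note above and apply only to the Px1x controllers it names, so a result of
"not affected" says nothing about other controller families.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: turn "major.minor" (e.g. "6.60") into
 * major * 100 + minor. Returns -1 for anything that is not exactly two
 * small decimal numbers separated by a dot.
 * ASSUMPTION: the minor part is always written with two digits.
 */
static int fw_version_to_int(const char *s)
{
    unsigned int major, minor;
    char trailing;

    /* A third conversion means there was junk after the minor number. */
    if (sscanf(s, "%u.%u%c", &major, &minor, &trailing) != 2)
        return -1;
    if (major > 99 || minor > 99)
        return -1;
    return (int)(major * 100 + minor);
}

/* True when the version falls in the 5.70..6.62 range named in the note. */
static bool fw_in_affected_range(const char *s)
{
    int v = fw_version_to_int(s);

    return v >= 570 && v <= 662;
}
```

With this, "6.60" and "6.62" are reported as inside the range, while "6.64"
falls outside it, which is consistent with what Peter saw on his P212.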


Patch

--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1067,6 +1067,8 @@  static int hpsa_slave_alloc(struct scsi_device *sdev)
        if (sd != NULL)
                sdev->hostdata = sd;
        spin_unlock_irqrestore(&h->devlock, flags);
+
+       blk_queue_dma_alignment(sdev->request_queue, 512 - 1);
        return 0;
 }
-tm
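
The second argument in the patch, 512 - 1, is an alignment mask rather than
a size. As far as I understand the block layer, a pass-through buffer whose
address or length has any of the mask bits set is treated as unaligned and
gets copied through a kernel buffer instead of being mapped directly. The
userspace sketch below only illustrates that mask arithmetic;
buffer_is_aligned() and SECTOR_ALIGN_MASK are hypothetical names and not
kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value the patch passes to blk_queue_dma_alignment(): 0x1ff. */
#define SECTOR_ALIGN_MASK (512u - 1u)

/*
 * Userspace illustration only: true when both the start address and the
 * length are multiples of (mask + 1). The mask must be one less than a
 * power of two for the bit test to be meaningful.
 */
static bool buffer_is_aligned(uintptr_t addr, size_t len, unsigned int mask)
{
    return ((addr | len) & mask) == 0;
}
```

For example, a 512-byte buffer at address 0x1000 passes the check, while the
same buffer at 0x1004, or a 100-byte buffer at 0x1000, does not.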