Message ID | 1590651115-9619-1-git-send-email-newtongao@tencent.com (mailing list archive) |
---|---|
State | Rejected |
Series | scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD |
On 2020/05/28 Thu 15:31, xiakaixu1987@gmail.com wrote:
>From: Xiaoming Gao <newtongao@tencent.com>
>
>when kernel crash, and kexec into kdump kernel, megaraid_sas will hung and
>print follow error logs
>
>24.1485901 sd 0:0:G:0: [sda 1 tag809 BRCfl Debug mfi stat 0x2(1, data len requested/conpleted 0X100
>0/0x0)]
>24.1867171 sd 0:0:G :9: [sda I tag861 BRCfl Debug mfft stat 0x2d, data len reques ted/conp1e Led 0X100
>0/0x0]
>24.2054191 sd 0:O:6:O: [sda 1 tag861 FAILED Result: hustbyte=DIDGK drioerbyte-DRIUCR SENSE]
>24.2549711 bik_update_ request ! 1/0 error , dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
>21.2752791 buffer_io_error 2 callbacks suppressed
>21.2752731 Duffer IO error an dev sda, logical block 117212064, async page read
>
>this bug is caused by commit '59db5a931bbe73f ("scsi: megaraid_sas: Handle sequence JBOD map failure at
> driver level
>")'
>and can be fixed by not set JOB when reset_devices on

I've recently run into this exact issue on an arm64 machine with an Avago 3408 controller. This patch fixed the issue. Thank you.

Tested-by: Kai Liu <kai.liu@suse.com>

Best regards,
Kai

>
>Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
>---
> drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>index b2ad965..24e7f1b 100644
>--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>@@ -3127,7 +3127,7 @@ static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
> << MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT;
>
> /* If FW supports PD sequence number */
>- if (instance->support_seqnum_jbod_fp) {
>+ if (!reset_devices && instance->support_seqnum_jbod_fp) {
> if (instance->use_seqnum_jbod_fp &&
> instance->pd_list[pd_index].driveType == TYPE_DISK) {
>
>--
>1.8.3.1
>
> when kernel crash, and kexec into kdump kernel, megaraid_sas will hung
> and print follow error logs
>
> 24.1485901 sd 0:0:G:0: [sda 1 tag809 BRCfl Debug mfi stat 0x2(1, data len requested/conpleted 0X100
> 0/0x0)]
> 24.1867171 sd 0:0:G :9: [sda I tag861 BRCfl Debug mfft stat 0x2d, data len reques ted/conp1e Led 0X100
> 0/0x0]
> 24.2054191 sd 0:O:6:O: [sda 1 tag861 FAILED Result: hustbyte=DIDGK drioerbyte-DRIUCR SENSE]
> 24.2549711 bik_update_ request ! 1/0 error , dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
> 21.2752791 buffer_io_error 2 callbacks suppressed
> 21.2752731 Duffer IO error an dev sda, logical block 117212064, async page read
>
> this bug is caused by commit '59db5a931bbe73f ("scsi: megaraid_sas:
> Handle sequence JBOD map failure at driver level ")' and can be fixed
> by not set JOB when reset_devices on

Broadcom: Please review.

Thanks!
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD
>
>
>> when kernel crash, and kexec into kdump kernel, megaraid_sas will hung
>> and print follow error logs
>>
>> 24.1485901 sd 0:0:G:0: [sda 1 tag809 BRCfl Debug mfi stat 0x2(1, data
>> len requested/conpleted 0X100 0/0x0)]
>> 24.1867171 sd 0:0:G :9: [sda I tag861 BRCfl Debug mfft stat 0x2d, data
>> len reques ted/conp1e Led 0X100 0/0x0]
>> 24.2054191 sd 0:O:6:O: [sda 1 tag861 FAILED Result: hustbyte=DIDGK
>> drioerbyte-DRIUCR SENSE]
>> 24.2549711 bik_update_ request ! 1/0 error , dev sda, sector 937782912
>> op 0x0:(READ) flags 0x0 phys_seg 1 prio class
>> 21.2752791 buffer_io_error 2 callbacks suppressed
>> 21.2752731 Duffer IO error an dev sda, logical block 117212064, async
>> page read
>>
>> this bug is caused by commit '59db5a931bbe73f ("scsi: megaraid_sas:
>> Handle sequence JBOD map failure at driver level ")' and can be fixed
>> by not set JOB when reset_devices on
>
>Broadcom: Please review.
>
>Thanks!
>
>--
>Martin K. Petersen Oracle Linux Engineering

We are working on it and will update you at the earliest.

Thanks,
Chandrakanth Patil
>Subject: RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD
>
>>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung
>>caused by JBOD
>>
>>
>>> when kernel crash, and kexec into kdump kernel, megaraid_sas will
>>> hung and print follow error logs
>>>
>>> 24.1485901 sd 0:0:G:0: [sda 1 tag809 BRCfl Debug mfi stat 0x2(1, data
>>> len requested/conpleted 0X100 0/0x0)]
>>> 24.1867171 sd 0:0:G :9: [sda I tag861 BRCfl Debug mfft stat 0x2d,
>>> data len reques ted/conp1e Led 0X100 0/0x0]
>>> 24.2054191 sd 0:O:6:O: [sda 1 tag861 FAILED Result: hustbyte=DIDGK
>>> drioerbyte-DRIUCR SENSE]
>>> 24.2549711 bik_update_ request ! 1/0 error , dev sda, sector
>>> 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
>>> 21.2752791 buffer_io_error 2 callbacks suppressed
>>> 21.2752731 Duffer IO error an dev sda, logical block 117212064, async
>>> page read
>>>
>>> this bug is caused by commit '59db5a931bbe73f ("scsi: megaraid_sas:
>>> Handle sequence JBOD map failure at driver level ")' and can be fixed
>>> by not set JOB when reset_devices on
>>
>>Broadcom: Please review.
>>
>>Thanks!
>>
>>--
>>Martin K. Petersen Oracle Linux Engineering
>
>We are working on it and will update you at the earliest.
>
>Thanks,
>Chandrakanth Patil

Hi Martin, Xiaoming Gao, Kai Liu,

It is a known firmware issue and has been fixed. Please update to the latest firmware available on the Broadcom support website.
Please let me know if you need any further information.

Thanks,
Chandrakanth Patil
On 2020/06/04 Thu 16:39, Chandrakanth Patil wrote:
>
>Hi Martin, Xiaoming Gao, Kai Liu,
>
>It is a known firmware issue and has been fixed. Please update to the
>latest firmware available in the Broadcom support website.
>Please let me know if you need any further information.

Hi Chandrakanth,

Could you let me know which megaraid based controllers are affected by this issue? All or some models or some generations?

Best regards,
Kai Liu
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused
>by JBOD
>
>On 2020/06/04 Thu 16:39, Chandrakanth Patil wrote:
>>
>>Hi Martin, Xiaoming Gao, Kai Liu,
>>
>>It is a known firmware issue and has been fixed. Please update to the
>>latest firmware available in the Broadcom support website.
>>Please let me know if you need any further information.
>
>Hi Chandrakanth,
>
>Could you let me know which megaraid based controllers are affected by this
>issue? All or
>some models or some generations?
>
>Best regards,
>Kai Liu

Hi Kai Liu,

Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are affected.

Thanks,
Chandrakanth Patil
On 2020/06/05 Fri 01:05, Chandrakanth Patil wrote:
>
>Hi Kai Liu,
>
>Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
>affected.

Hi Chandrakanth,

My card is not one of these but it's also problematic:

# lspci -nn|grep 3408
02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID Tri-Mode SAS3408 [1000:0017] (rev 01)

According to megaraid_sas.h it's Tomcat:

#define PCI_DEVICE_ID_LSI_TOMCAT 0x0017

According to product information on broadcom.com the card model is 9440-8i. So I tried to upgrade to the latest firmware version, 51.13.0-3223, but I got this error:

# ./storcli64 /c0 download file=9440-8i_nopad.rom
Download Completed.
Flashing image to adapter...
CLI Version = 007.1316.0000.0000 Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Failure
Description = image corrupted

I tried a few more versions from the Broadcom website; they all failed with the same "image corrupted" error.

Here is the controller information:

# ./storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.1316.0000.0000 Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Success
Description = None

Product Name = SAS3408
Serial Number = 033FAT10K8000236
SAS Address = 57c1cf15516f4000
PCI Address = 00:02:00:00
System Time = 06/05/2020 12:36:59
Mfg. Date = 00/00/00
Controller Time = 06/05/2020 04:36:58
FW Package Build = 50.6.3-0109
BIOS Version = 7.06.02.2_0x07060502
FW Version = 5.060.01-2262
Driver Name = megaraid_sas
Driver Version = 07.713.01.00-rc1
Vendor Id = 0x1000
Device Id = 0x17
SubVendor Id = 0x19E5
SubDevice Id = 0xD213
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 2
Device Number = 0
Function Number = 0
Domain ID = 0
Drive Groups = 3

Thanks,
Kai Liu
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused
>by JBOD
>
>On 2020/06/05 Fri 01:05, Chandrakanth Patil wrote:
>>
>>Hi Kai Liu,
>>
>>Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
>>affected.
>
>Hi Chandrakanth,
>
>My card is not one of these but it's also problematic:
>
># lspci -nn|grep 3408
>02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID Tri-Mode
>SAS3408
>[1000:0017] (rev 01)
>
>According to megaraid_sas.h it's Tomcat:
>
>#define PCI_DEVICE_ID_LSI_TOMCAT 0x0017
>
>According to product information on broadcom.com the card model is 9440-8i.
>So I tried to
>upgrade to the latest firmware version
>51.13.0-3223 but I got these error:
>
># ./storcli64 /c0 download file=9440-8i_nopad.rom Download Completed.
>Flashing image to adapter...
>CLI Version = 007.1316.0000.0000 Mar 12, 2020 Operating system = Linux
>5.3.18-
>0.g6748ac9-default Controller = 0 Status = Failure Description = image
>corrupted
>
>I tried few more versions from broadcom website, they all failed with the
>same "image
>corrupted" error.
>
>Here is the controller information:
>
># ./storcli64 /c0 show
>Generating detailed summary of the adapter, it may take a while to
>complete.
>
>CLI Version = 007.1316.0000.0000 Mar 12, 2020 Operating system = Linux
>5.3.18-
>0.g6748ac9-default Controller = 0 Status = Success Description = None
>
>Product Name = SAS3408
>Serial Number = 033FAT10K8000236
>SAS Address = 57c1cf15516f4000
>PCI Address = 00:02:00:00
>System Time = 06/05/2020 12:36:59
>Mfg. Date = 00/00/00
>Controller Time = 06/05/2020 04:36:58
>FW Package Build = 50.6.3-0109
>BIOS Version = 7.06.02.2_0x07060502
>FW Version = 5.060.01-2262
>Driver Name = megaraid_sas
>Driver Version = 07.713.01.00-rc1
>Vendor Id = 0x1000
>Device Id = 0x17
>SubVendor Id = 0x19E5
>SubDevice Id = 0xD213
>Host Interface = PCI-E
>Device Interface = SAS-12G
>Bus Number = 2
>Device Number = 0
>Function Number = 0
>Domain ID = 0
>Drive Groups = 3
>
>
>Thanks,
>Kai Liu

Hi Kai Liu,

Tomcat (Device ID: 0017) belongs to the Gen3.5 controllers (Ventura family of controllers), so this issue is applicable.
As this is OEM-specific firmware, please contact the Broadcom support team in order to get the correct firmware image.

-Chandrakanth Patil
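
For readers checking their own hardware, the device-ID lookup discussed in this exchange can be sketched in C. This is only an illustration: the helper name `megaraid_id_listed_affected` is hypothetical, the Tomcat ID comes from this thread, and the Invader and Ventura IDs are assumed from `megaraid_sas.h` and should be verified against the header. The list is deliberately incomplete and does not cover every Gen3 or Gen3.5 part.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Tomcat's ID is quoted in this thread. The other two values are
 * assumptions taken from megaraid_sas.h; verify them before relying
 * on this table.
 */
#define PCI_DEVICE_ID_LSI_INVADER	0x005d	/* Gen3, assumed value */
#define PCI_DEVICE_ID_LSI_VENTURA	0x0014	/* Gen3.5, assumed value */
#define PCI_DEVICE_ID_LSI_TOMCAT	0x0017	/* Gen3.5, from this thread */

/*
 * Hypothetical helper: returns true if device_id is one of the parts
 * listed above. A false result does not prove a controller is safe,
 * because the table is not exhaustive.
 */
static bool megaraid_id_listed_affected(uint16_t device_id)
{
	switch (device_id) {
	case PCI_DEVICE_ID_LSI_INVADER:
	case PCI_DEVICE_ID_LSI_VENTURA:
	case PCI_DEVICE_ID_LSI_TOMCAT:
		return true;
	default:
		return false;
	}
}
```

The second field of the `[1000:0017]` pair printed by `lspci -nn` is the value to pass in.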
On 2020/06/05 Fri 21:00, Chandrakanth Patil wrote:
>Hi Kai Liu,
>
>Tomcat (Device ID: 0017) belongs to Gen3.5 controllers (Ventura family of
>controllers). So this issue is applicable.
>As this is an OEM specific firmware, Please contact Broadcom support team in
>order get the correct firmware image.

Thanks for your help, Chandrakanth.

Best regards,
Kai
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index b2ad965..24e7f1b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -3127,7 +3127,7 @@ static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
 		<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT;
 
 	/* If FW supports PD sequence number */
-	if (instance->support_seqnum_jbod_fp) {
+	if (!reset_devices && instance->support_seqnum_jbod_fp) {
 		if (instance->use_seqnum_jbod_fp &&
 			instance->pd_list[pd_index].driveType == TYPE_DISK) {
 
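
The effect of this one-line change can be modelled in user space. The following is a minimal sketch, not driver code: `struct jbod_state` and `jbod_seqnum_path_enabled` are hypothetical names, and only the two flags visible in the hunk are modelled. In the kernel, `reset_devices` is a global set by the `reset_devices` boot parameter, which kdump setups commonly pass to the capture kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the per-controller state; only the two
 * flags that appear in the patch hunk are modelled here.
 */
struct jbod_state {
	bool support_seqnum_jbod_fp;	/* firmware supports PD sequence numbers */
	bool use_seqnum_jbod_fp;	/* driver currently uses the JBOD sequence map */
};

/*
 * Returns true when the sequence-number JBOD branch would be entered
 * with the patch applied. With reset_devices set (the kdump case), the
 * branch is skipped regardless of what the firmware advertises.
 */
static bool jbod_seqnum_path_enabled(const struct jbod_state *st,
				     bool reset_devices, bool is_disk)
{
	if (reset_devices || !st->support_seqnum_jbod_fp)
		return false;
	return st->use_seqnum_jbod_fp && is_disk;
}
```

Note that the patch was marked Rejected: per the thread, the underlying problem is a firmware issue, so the driver-side gate shown here is a workaround rather than the accepted fix.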