diff mbox

[3/5] megaraid_sas: do not crash on invalid completion

Message ID 1478857492-4581-4-git-send-email-hare@suse.de (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Hannes Reinecke Nov. 11, 2016, 9:44 a.m. UTC
Avoid a kernel oops when receiving an invalid command completion.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

Comments

Sumit Saxena Nov. 11, 2016, 11:51 a.m. UTC | #1
>-----Original Message-----
>From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>owner@vger.kernel.org] On Behalf Of Hannes Reinecke
>Sent: Friday, November 11, 2016 3:15 PM
>To: Martin K. Petersen
>Cc: Christoph Hellwig; James Bottomley; Sumit Saxena; linux-
>scsi@vger.kernel.org; Hannes Reinecke; Hannes Reinecke
>Subject: [PATCH 3/5] megaraid_sas: do not crash on invalid completion
>
>Avoid a kernel oops when receiving an invalid command completion.
scmd_local set to NULL(for cases MPI2_FUNCTION_SCSI_IO_REQUEST and
MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST) will be serious bug(either in driver
or firmware) which should be debugged
and driver should not really continue beyond that. This indicates that
driver internal frames are corrupted. If needed, whenever driver detects
it, it can mark the adapter as dead(stopping further activities).
If OS is installed behind megasas controller then after declaring adapter
dead, system reboot will be required. Kernel panic may give here more
information whenever this condition hits so we kept it like this.
If you are facing this issue, please share the details. I will work on
this.

Thanks,
Sumit

>
>Signed-off-by: Hannes Reinecke <hare@suse.com>
>---
> drivers/scsi/megaraid/megaraid_sas_fusion.c | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>index 38137de..eb3cb0f 100644
>--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>@@ -2298,13 +2298,15 @@ static void megasas_build_ld_nonrw_fusion(struct
>megasas_instance *instance,
> 			break;
> 		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
> 			/* Update load balancing info */
>-			device_id = MEGASAS_DEV_INDEX(scmd_local);
>-			lbinfo = &fusion->load_balance_info[device_id];
>-			if (cmd_fusion->scmd->SCp.Status &
>-			    MEGASAS_LOAD_BALANCE_FLAG) {
>-
>atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
>-				cmd_fusion->scmd->SCp.Status &=
>-					~MEGASAS_LOAD_BALANCE_FLAG;
>+			if (scmd_local) {
>+				device_id = MEGASAS_DEV_INDEX(scmd_local);
>+				lbinfo =
>&fusion->load_balance_info[device_id];
>+				if (cmd_fusion->scmd->SCp.Status &
>+				    MEGASAS_LOAD_BALANCE_FLAG) {
>+
>atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
>+					cmd_fusion->scmd->SCp.Status &=
>+
>~MEGASAS_LOAD_BALANCE_FLAG;
>+				}
> 			}
> 			if (reply_descript_type ==
> 			    MPI2_RPY_DESCRIPT_FLAGS_SCSI_IO_SUCCESS) { @@
>-2315,6 +2317,12 @@ static void megasas_build_ld_nonrw_fusion(struct
>megasas_instance *instance,
> 			/* Fall thru and complete IO */
> 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO
>Path */
> 			/* Map the FW Cmd Status */
>+			if (!scmd_local) {
>+				dev_err(&instance->pdev->dev,
>+					"cmd[%d:%d] already completed\n",
>+					MSIxIndex, smid);
>+				break;
>+			}
> 			map_cmd_status(cmd_fusion, status, extStatus);
> 			scsi_io_req->RaidContext.status = 0;
> 			scsi_io_req->RaidContext.exStatus = 0;
>--
>1.8.5.6
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of
>a message to majordomo@vger.kernel.org More majordomo info at
>http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Reinecke Nov. 11, 2016, 3:07 p.m. UTC | #2
On 11/11/2016 12:51 PM, Sumit Saxena wrote:
>> -----Original Message-----
>> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
>> owner@vger.kernel.org] On Behalf Of Hannes Reinecke
>> Sent: Friday, November 11, 2016 3:15 PM
>> To: Martin K. Petersen
>> Cc: Christoph Hellwig; James Bottomley; Sumit Saxena; linux-
>> scsi@vger.kernel.org; Hannes Reinecke; Hannes Reinecke
>> Subject: [PATCH 3/5] megaraid_sas: do not crash on invalid completion
>>
>> Avoid a kernel oops when receiving an invalid command completion.
> scmd_local set to NULL(for cases MPI2_FUNCTION_SCSI_IO_REQUEST and
> MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST) will be serious bug(either in driver
> or firmware) which should be debugged
> and driver should not really continue beyond that. This indicates that
> driver internal frames are corrupted. If needed, whenever driver detects
> it, it can mark the adapter as dead(stopping further activities).
> If OS is installed behind megasas controller then after declaring adapter
> dead, system reboot will be required. Kernel panic may give here more
> information whenever this condition hits so we kept it like this.
> If you are facing this issue, please share the details. I will work on
> this.
>

I have come across this problem when developing scsi-mq support. Due to 
the missing mmio barrier when writing to the inbound queue port the I/O 
submission became confused, resulting in already completed frames on the 
completion queue.
While I do agree this is a pretty serious problem, the driver should 
_not_ crash; after all, it just received a completion for an unknown 
command. No reason to take the kernel down.
I'd be in favour of resetting the HBA and taking it offline if required, 
but we really should not crash here.

Incidentally, we _will_ take the HBA offline even now, as these invalid 
command completions causes a scsi timeout for the original command, and 
as the HBA couldn't send completions for them the driver would 
eventually offline the card. So it worked as expected from my POV.

Cheers,

Hannes
diff mbox

Patch

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 38137de..eb3cb0f 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2298,13 +2298,15 @@  static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
 			break;
 		case MPI2_FUNCTION_SCSI_IO_REQUEST:  /*Fast Path IO.*/
 			/* Update load balancing info */
-			device_id = MEGASAS_DEV_INDEX(scmd_local);
-			lbinfo = &fusion->load_balance_info[device_id];
-			if (cmd_fusion->scmd->SCp.Status &
-			    MEGASAS_LOAD_BALANCE_FLAG) {
-				atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
-				cmd_fusion->scmd->SCp.Status &=
-					~MEGASAS_LOAD_BALANCE_FLAG;
+			if (scmd_local) {
+				device_id = MEGASAS_DEV_INDEX(scmd_local);
+				lbinfo = &fusion->load_balance_info[device_id];
+				if (cmd_fusion->scmd->SCp.Status &
+				    MEGASAS_LOAD_BALANCE_FLAG) {
+					atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
+					cmd_fusion->scmd->SCp.Status &=
+						~MEGASAS_LOAD_BALANCE_FLAG;
+				}
 			}
 			if (reply_descript_type ==
 			    MPI2_RPY_DESCRIPT_FLAGS_SCSI_IO_SUCCESS) {
@@ -2315,6 +2317,12 @@  static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
 			/* Fall thru and complete IO */
 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST: /* LD-IO Path */
 			/* Map the FW Cmd Status */
+			if (!scmd_local) {
+				dev_err(&instance->pdev->dev,
+					"cmd[%d:%d] already completed\n",
+					MSIxIndex, smid);
+				break;
+			}
 			map_cmd_status(cmd_fusion, status, extStatus);
 			scsi_io_req->RaidContext.status = 0;
 			scsi_io_req->RaidContext.exStatus = 0;