diff mbox

[v4,4/4] scsi: ufs: inject errors to verify error handling

Message ID 1425308203-20695-5-git-send-email-gbroner@codeaurora.org (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Gilad Broner March 2, 2015, 2:56 p.m. UTC
From: Sujit Reddy Thumma <sthumma@codeaurora.org>

Use fault-injection framework to simulate error conditions
in the controller and verify error handling mechanisms
implemented in UFS host controller driver.

This is used only during development and hence
guarded by CONFIG_UFS_FAULT_INJECTION debug config option.

Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
---
 drivers/scsi/ufs/ufs-debugfs.c | 140 +++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufs-debugfs.h |   4 ++
 drivers/scsi/ufs/ufshcd.c      |   2 +
 drivers/scsi/ufs/ufshcd.h      |   5 ++
 lib/Kconfig.debug              |  14 +++++
 5 files changed, 165 insertions(+)

Comments

Akinobu Mita March 4, 2015, 1:50 p.m. UTC | #1
2015-03-02 23:56 GMT+09:00 Gilad Broner <gbroner@codeaurora.org>:
> From: Sujit Reddy Thumma <sthumma@codeaurora.org>
>
> Use fault-injection framework to simulate error conditions
> in the controller and verify error handling mechanisms
> implemented in UFS host controller driver.
>
> This is used only during development and hence
> guarded by CONFIG_UFS_FAULT_INJECTION debug config option.

This feature looks useful, but I have a couple of comments and
question.

> +static bool inject_cmd_hang_tr(struct ufs_hba *hba)
> +{
> +       int tag;
> +
> +       tag = find_first_bit(&hba->outstanding_reqs, hba->nutrs);
> +       if (tag == hba->nutrs)
> +               return 0;
> +
> +       __clear_bit(tag, &hba->outstanding_reqs);
> +       hba->lrb[tag].cmd = NULL;
> +       __clear_bit(tag, &hba->lrb_in_use);

hba->lrb_in_use is a bitmap set by test_and_set_bit_lock().  So
this should be cleared by clear_bit_unlock().

And as soon as the bit corresponds to this slot in hba->lrb_in_use is
cleared, this slot could be reused.  But if the request corresponds
to the slot is not completed yet, it could sacrifice the new request,
too.  So should we only inject into the commands which have been
completed like this?

        tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
        completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
        tag = find_first_bit(&completed_reqs, hba->nutrs);
        ...

Or clear the corresponding bit in REG_UTP_TASK_REQ_LIST_CLEAR, just
like what inject_fatal_err_tr() does?

> +
> +       /* command hang injected */
> +       return 1;
> +}
> +
> +static int inject_cmd_hang_tm(struct ufs_hba *hba)
> +{
> +       int tag;
> +
> +       tag = find_first_bit(&hba->outstanding_tasks, hba->nutmrs);
> +       if (tag == hba->nutmrs)
> +               return 0;
> +
> +       __clear_bit(tag, &hba->outstanding_tasks);
> +       __clear_bit(tag, &hba->tm_slots_in_use);
> +
> +       /* command hang injected */
> +       return 1;
> +}

The same goes for hba->tm_slots_in_use.
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gilad Broner March 10, 2015, 10:20 a.m. UTC | #2
>> +static bool inject_cmd_hang_tr(struct ufs_hba *hba)
>> +{
>> +       int tag;
>> +
>> +       tag = find_first_bit(&hba->outstanding_reqs, hba->nutrs);
>> +       if (tag == hba->nutrs)
>> +               return 0;
>> +
>> +       __clear_bit(tag, &hba->outstanding_reqs);
>> +       hba->lrb[tag].cmd = NULL;
>> +       __clear_bit(tag, &hba->lrb_in_use);
>
> hba->lrb_in_use is a bitmap set by test_and_set_bit_lock().  So
> this should be cleared by clear_bit_unlock().

You are correct. Thanks.

>
> And as soon as the bit corresponds to this slot in hba->lrb_in_use is
> cleared, this slot could be reused.  But if the request corresponds
> to the slot is not completed yet, it could sacrifice the new request,
> too.  So should we only inject into the commands which have been
> completed like this?

Please note that we only clear the bit in hba->lrb_in_use. scsi_done is
not called for this request. Therefore, the tag is not yet free in the
block layer and next calls for queuecommand will not pass down this tag to
be used in the UFS driver. So there is no danger of a new request being
sacrificed.

On a different note, we are debating internally on a few other changes so
until we consolidate those I will drop this patch with error injection.

Gilad.
Akinobu Mita March 10, 2015, 1:09 p.m. UTC | #3
2015-03-10 19:20 GMT+09:00 Gilad Broner <gbroner@codeaurora.org>:
>>> +static bool inject_cmd_hang_tr(struct ufs_hba *hba)
>>> +{
>>> +       int tag;
>>> +
>>> +       tag = find_first_bit(&hba->outstanding_reqs, hba->nutrs);
>>> +       if (tag == hba->nutrs)
>>> +               return 0;
>>> +
>>> +       __clear_bit(tag, &hba->outstanding_reqs);
>>> +       hba->lrb[tag].cmd = NULL;
>>> +       __clear_bit(tag, &hba->lrb_in_use);
>>
>> hba->lrb_in_use is a bitmap set by test_and_set_bit_lock().  So
>> this should be cleared by clear_bit_unlock().
>
> You are correct. Thanks.
>
>>
>> And as soon as the bit corresponds to this slot in hba->lrb_in_use is
>> cleared, this slot could be reused.  But if the request corresponds
>> to the slot is not completed yet, it could sacrifice the new request,
>> too.  So should we only inject into the commands which have been
>> completed like this?
>
> Please note that we only clear the bit in hba->lrb_in_use. scsi_done is
> not called for this request. Therefore, the tag is not yet free in the
> block layer and next calls for queuecommand will not pass down this tag to
> be used in the UFS driver. So there is no danger of a new request being
> sacrificed.

OK, I see there is no danger as far as the commands are comming
through queuecommand().  But what about the query requests?
PATCH 1/4 in this series has added ioctl interface for query
request which also acquires a tag in hba->lrb_in_use through
ufshcd_get_dev_cmd_tag().  Although it is very difficult to
happen, is it possible for new query requests to be clashed by
inject_cmd_hang_tr() in the same way I described earlier?
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/ufs/ufs-debugfs.c b/drivers/scsi/ufs/ufs-debugfs.c
index 27ab053..bac72d0 100644
--- a/drivers/scsi/ufs/ufs-debugfs.c
+++ b/drivers/scsi/ufs/ufs-debugfs.c
@@ -17,6 +17,7 @@ 
  *
  */
 
+#include <linux/random.h>
 #include "ufs-debugfs.h"
 #include "unipro.h"
 
@@ -41,6 +42,143 @@  struct desc_field_offset {
 	} while (0)
 #define DOORBELL_CLR_TOUT_US	(1000 * 1000) /* 1 sec */
 
+#ifdef CONFIG_UFS_FAULT_INJECTION
+
+#define INJECT_COMMAND_HANG (0x0)
+
+static DECLARE_FAULT_ATTR(fail_default_attr);
+static char *fail_request;
+module_param(fail_request, charp, 0);
+
+static bool inject_fatal_err_tr(struct ufs_hba *hba, u8 ocs_err)
+{
+	int tag;
+
+	tag = find_first_bit(&hba->outstanding_reqs, hba->nutrs);
+	if (tag == hba->nutrs)
+		return 0;
+
+	ufshcd_writel(hba, ~(1 << tag), REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+	(&hba->lrb[tag])->utr_descriptor_ptr->header.dword_2 =
+							cpu_to_be32(ocs_err);
+
+	/* fatal error injected */
+	return 1;
+}
+
+static bool inject_fatal_err_tm(struct ufs_hba *hba, u8 ocs_err)
+{
+	int tag;
+
+	tag = find_first_bit(&hba->outstanding_tasks, hba->nutmrs);
+	if (tag == hba->nutmrs)
+		return 0;
+
+	ufshcd_writel(hba, ~(1 << tag), REG_UTP_TASK_REQ_LIST_CLEAR);
+	(&hba->utmrdl_base_addr[tag])->header.dword_2 =
+						cpu_to_be32(ocs_err);
+
+	/* fatal error injected */
+	return 1;
+}
+
+static bool inject_cmd_hang_tr(struct ufs_hba *hba)
+{
+	int tag;
+
+	tag = find_first_bit(&hba->outstanding_reqs, hba->nutrs);
+	if (tag == hba->nutrs)
+		return 0;
+
+	__clear_bit(tag, &hba->outstanding_reqs);
+	hba->lrb[tag].cmd = NULL;
+	__clear_bit(tag, &hba->lrb_in_use);
+
+	/* command hang injected */
+	return 1;
+}
+
+static int inject_cmd_hang_tm(struct ufs_hba *hba)
+{
+	int tag;
+
+	tag = find_first_bit(&hba->outstanding_tasks, hba->nutmrs);
+	if (tag == hba->nutmrs)
+		return 0;
+
+	__clear_bit(tag, &hba->outstanding_tasks);
+	__clear_bit(tag, &hba->tm_slots_in_use);
+
+	/* command hang injected */
+	return 1;
+}
+
+void ufsdbg_fail_request(struct ufs_hba *hba, u32 *intr_status)
+{
+	u8 ocs_err;
+	static const u32 errors[] = {
+		CONTROLLER_FATAL_ERROR,
+		SYSTEM_BUS_FATAL_ERROR,
+		INJECT_COMMAND_HANG,
+	};
+
+	if (!should_fail(&hba->debugfs_files.fail_attr, 1))
+		goto out;
+
+	*intr_status = errors[prandom_u32() % ARRAY_SIZE(errors)];
+	dev_info(hba->dev, "%s: fault-inject error: 0x%x\n",
+			__func__, *intr_status);
+
+	switch (*intr_status) {
+	case CONTROLLER_FATAL_ERROR: /* fall through */
+		ocs_err = OCS_FATAL_ERROR;
+		goto set_ocs;
+	case SYSTEM_BUS_FATAL_ERROR:
+		ocs_err = OCS_INVALID_CMD_TABLE_ATTR;
+set_ocs:
+		if (!inject_fatal_err_tr(hba, ocs_err))
+			if (!inject_fatal_err_tm(hba, ocs_err))
+				*intr_status = 0;
+		break;
+	case INJECT_COMMAND_HANG:
+		if (!inject_cmd_hang_tr(hba))
+			inject_cmd_hang_tm(hba);
+		break;
+	default:
+		BUG();
+		/* some configurations ignore panics caused by BUG() */
+		break;
+	}
+out:
+	return;
+}
+
+static void ufsdbg_setup_fault_injection(struct ufs_hba *hba)
+{
+	hba->debugfs_files.fail_attr = fail_default_attr;
+
+	if (fail_request)
+		setup_fault_attr(&hba->debugfs_files.fail_attr, fail_request);
+
+	/* suppress dump stack everytime failure is injected */
+	hba->debugfs_files.fail_attr.verbose = 0;
+
+	if (IS_ERR(fault_create_debugfs_attr("inject_fault",
+					hba->debugfs_files.debugfs_root,
+					&hba->debugfs_files.fail_attr)))
+		dev_err(hba->dev, "%s: failed to create debugfs entry\n",
+				__func__);
+}
+#else
+void ufsdbg_fail_request(struct ufs_hba *hba, u32 *intr_status)
+{
+}
+
+static void ufsdbg_setup_fault_injection(struct ufs_hba *hba)
+{
+}
+#endif /* CONFIG_UFS_FAULT_INJECTION */
+
 #define BUFF_LINE_CAPACITY 16
 #define TAB_CHARS 8
 
@@ -885,6 +1023,8 @@  void ufsdbg_add_debugfs(struct ufs_hba *hba)
 		goto err;
 	}
 
+	ufsdbg_setup_fault_injection(hba);
+
 	return;
 
 err:
diff --git a/drivers/scsi/ufs/ufs-debugfs.h b/drivers/scsi/ufs/ufs-debugfs.h
index 7ed308d..54c68f4 100644
--- a/drivers/scsi/ufs/ufs-debugfs.h
+++ b/drivers/scsi/ufs/ufs-debugfs.h
@@ -26,6 +26,7 @@ 
 #ifdef CONFIG_DEBUG_FS
 void ufsdbg_add_debugfs(struct ufs_hba *hba);
 void ufsdbg_remove_debugfs(struct ufs_hba *hba);
+void ufsdbg_fail_request(struct ufs_hba *hba, u32 *intr_status);
 #else
 static inline void ufsdbg_add_debugfs(struct ufs_hba *hba)
 {
@@ -33,6 +34,9 @@  static inline void ufsdbg_add_debugfs(struct ufs_hba *hba)
 static inline void ufsdbg_remove_debugfs(struct ufs_hba *hba)
 {
 }
+static inline void ufsdbg_fail_request(struct ufs_hba *hba, u32 *intr_status)
+{
+}
 #endif
 
 #endif /* End of Header */
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3ae0b3f..19d7edd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4009,6 +4009,8 @@  static void ufshcd_tmc_handler(struct ufs_hba *hba)
  */
 static void ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 {
+	ufsdbg_fail_request(hba, &intr_status);
+
 	hba->errors = UFSHCD_ERROR_MASK & intr_status;
 	if (hba->errors)
 		ufshcd_check_errors(hba);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index d9eb2ca..b065295 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -64,6 +64,8 @@ 
 #include <scsi/scsi_dbg.h>
 #include <scsi/scsi_eh.h>
 
+#include <linux/fault-inject.h>
+
 #include "ufs.h"
 #include "ufshci.h"
 
@@ -283,6 +285,9 @@  struct debugfs_files {
 	struct dentry *dme_peer_read;
 	u32 dme_local_attr_id;
 	u32 dme_peer_attr_id;
+#ifdef CONFIG_UFS_FAULT_INJECTION
+	struct fault_attr fail_attr;
+#endif
 };
 
 /* tag stats statistics types */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5f2ce61..3fc79e7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1432,6 +1432,20 @@  config FAIL_MMC_REQUEST
 	  and to test how the mmc host driver handles retries from
 	  the block device.
 
+config UFS_FAULT_INJECTION
+	bool "Fault-injection capability for UFS IO"
+	select DEBUG_FS
+	depends on FAULT_INJECTION && SCSI_UFSHCD
+	help
+	 Provide fault-injection capability for UFS IO.
+	 This will make the UFS host controller driver to randomly
+	 abort ongoing commands in the host controller, update OCS
+	 field according to the injected fatal error and can also
+	 forcefully hang the command indefinitely till upper layer
+	 timeout occurs. This is useful to test error handling in
+	 the UFS contoller driver and test how the driver handles
+	 the retries from block/SCSI mid layer.
+
 config FAULT_INJECTION_DEBUG_FS
 	bool "Debugfs entries for fault-injection capabilities"
 	depends on FAULT_INJECTION && SYSFS && DEBUG_FS