diff mbox

[v6,37/37] cxlflash: Fix to avoid bypassing context cleanup

Message ID 1445458592-63806-1-git-send-email-mrochs@linux.vnet.ibm.com (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Matthew R. Ochs Oct. 21, 2015, 8:16 p.m. UTC
Contexts may be skipped over for cleanup in situations where contention
for the adapter's table-list mutex is experienced in the presence of a
signal during the execution of the release handler.

This can lead to two known issues:

 - A hang condition on remove as that path tries to wait for users to
   cleanup - something that will never complete should this scenario play
   out as the user has already cleaned up from their perspective.

 - An Oops in the unmap_mapping_range() call that is made as part of
   the user waiting mechanism that is invoked on remove when contexts
   are found to still exist.

The root cause of this issue can be found in get_context() and how the
table-list mutex is acquired. As this code path is shared by several
different access points within the driver, a decision was made during
the development cycle to acquire this mutex in this location using the
interruptible version of the mutex locking service. In almost all of
the use-cases and environmental scenarios this holds up, even when the
mutex is contended. However, for critical system threads (such as the
release handler), failing to acquire the mutex and bailing with the
intention of the user being able to try again later is unacceptable.

In such a scenario, the context _must_ be derived as it is on an
irreversible path to being freed. Without being able to derive the
context, the code mistakenly assumes that it has already been freed
and proceeds to free up the underlying CXL context resources. From
this point on, any usage of [the now stale] CXL context resources
will result in undefined behavior. This is root cause of the Oops
mentioned as the second known issue as the mapping passed to the
unmap_mapping_range() service is owned by the CXL context.

To fix this problem, acquisition of the table-list mutex within
get_context() is simply changed to use the uninterruptible version
of the mutex locking service. This is safe as the timing windows for
holding this mutex are short and also protected against blocking.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/superpipe.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Andrew Donnellan Oct. 22, 2015, 2:01 a.m. UTC | #1
On 22/10/15 07:16, Matthew R. Ochs wrote:
> Contexts may be skipped over for cleanup in situations where contention
> for the adapter's table-list mutex is experienced in the presence of a
> signal during the execution of the release handler.
>
> This can lead to two known issues:
>
>   - A hang condition on remove as that path tries to wait for users to
>     cleanup - something that will never complete should this scenario play
>     out as the user has already cleaned up from their perspective.
>
>   - An Oops in the unmap_mapping_range() call that is made as part of
>     the user waiting mechanism that is invoked on remove when contexts
>     are found to still exist.
>
> The root cause of this issue can be found in get_context() and how the
> table-list mutex is acquired. As this code path is shared by several
> different access points within the driver, a decision was made during
> the development cycle to acquire this mutex in this location using the
> interruptible version of the mutex locking service. In almost all of
> the use-cases and environmental scenarios this holds up, even when the
> mutex is contended. However, for critical system threads (such as the
> release handler), failing to acquire the mutex and bailing with the
> intention of the user being able to try again later is unacceptable.
>
> In such a scenario, the context _must_ be derived as it is on an
> irreversible path to being freed. Without being able to derive the
> context, the code mistakenly assumes that it has already been freed
> and proceeds to free up the underlying CXL context resources. From
> this point on, any usage of [the now stale] CXL context resources
> will result in undefined behavior. This is root cause of the Oops
> mentioned as the second known issue as the mapping passed to the
> unmap_mapping_range() service is owned by the CXL context.
>
> To fix this problem, acquisition of the table-list mutex within
> get_context() is simply changed to use the uninterruptible version
> of the mutex locking service. This is safe as the timing windows for
> holding this mutex are short and also protected against blocking.
>
> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Manoj Kumar Oct. 22, 2015, 6:05 p.m. UTC | #2
Acked-by: Manoj Kumar <manoj@linux.vnet.ibm.com>

On 10/21/2015 3:16 PM, Matthew R. Ochs wrote:
> Contexts may be skipped over for cleanup in situations where contention
> for the adapter's table-list mutex is experienced in the presence of a
> signal during the execution of the release handler.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index 8af7cdc..34b21a0 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -162,10 +162,7 @@  struct ctx_info *get_context(struct cxlflash_cfg *cfg, u64 rctxid,
 
 	if (likely(ctxid < MAX_CONTEXT)) {
 		while (true) {
-			rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
-			if (rc)
-				goto out;
-
+			mutex_lock(&cfg->ctx_tbl_list_mutex);
 			ctxi = cfg->ctx_tbl[ctxid];
 			if (ctxi)
 				if ((file && (ctxi->file != file)) ||