Message ID | 20210416020440.259271-6-michael.christie@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | libicsi and qedi TMF fixes | expand |
On 4/15/21 7:04 PM, Mike Christie wrote: > If we haven't done an unbind target call, we can race during conn > destruction where iscsi_conn_teardown wakes up the eh/abort thread and its > still accessing a task while iscsi_conn_teardown is freeing the conn. This > patch has us wait for all threads to drop their refs to outstanding tasks > during conn destruction. > > There is also an issue where we could be accessing the conn directly via > fields like conn->ehwait in the eh callbacks. The next patch will fix > those. > > Signed-off-by: Mike Christie <michael.christie@oracle.com> > --- > drivers/scsi/libiscsi.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index ce3898fdb10f..ce6d04035c64 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -3120,6 +3120,24 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size, > } > EXPORT_SYMBOL_GPL(iscsi_conn_setup); > > +static bool iscsi_session_has_tasks(struct iscsi_session *session) > +{ > + struct iscsi_task *task; > + int i; > + > + spin_lock_bh(&session->back_lock); > + for (i = 0; i < session->cmds_max; i++) { > + task = session->cmds[i]; > + > + if (task->sc) { > + spin_unlock_bh(&session->back_lock); > + return true; > + } > + } > + spin_unlock_bh(&session->back_lock); > + return false; > +} > + > /** > * iscsi_conn_teardown - teardown iscsi connection > * @cls_conn: iscsi class connection > @@ -3144,7 +3162,17 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn) > session->state = ISCSI_STATE_TERMINATE; > wake_up(&conn->ehwait); > } > + > spin_unlock_bh(&session->frwd_lock); > + mutex_unlock(&session->eh_mutex); > + /* > + * If the caller didn't do a target unbind we could be exiting a > + * scsi-ml entry point that had a task ref. Wait on them here. > + */ > + while (iscsi_session_has_tasks(session)) > + msleep(50); Is there a limit on the amount of time this might spin? > + > + mutex_lock(&session->eh_mutex); > > /* flush queued up work because we free the connection below */ > iscsi_suspend_tx(conn); >
On 4/22/21 10:02 AM, Lee Duncan wrote: > On 4/15/21 7:04 PM, Mike Christie wrote: >> If we haven't done an unbind target call, we can race during conn >> destruction where iscsi_conn_teardown wakes up the eh/abort thread and its >> still accessing a task while iscsi_conn_teardown is freeing the conn. This >> patch has us wait for all threads to drop their refs to outstanding tasks >> during conn destruction. >> >> There is also an issue where we could be accessing the conn directly via >> fields like conn->ehwait in the eh callbacks. The next patch will fix >> those. >> >> Signed-off-by: Mike Christie <michael.christie@oracle.com> >> --- >> drivers/scsi/libiscsi.c | 28 ++++++++++++++++++++++++++++ >> 1 file changed, 28 insertions(+) >> >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c >> index ce3898fdb10f..ce6d04035c64 100644 >> --- a/drivers/scsi/libiscsi.c >> +++ b/drivers/scsi/libiscsi.c >> @@ -3120,6 +3120,24 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size, >> } >> EXPORT_SYMBOL_GPL(iscsi_conn_setup); >> >> +static bool iscsi_session_has_tasks(struct iscsi_session *session) >> +{ >> + struct iscsi_task *task; >> + int i; >> + >> + spin_lock_bh(&session->back_lock); >> + for (i = 0; i < session->cmds_max; i++) { >> + task = session->cmds[i]; >> + >> + if (task->sc) { >> + spin_unlock_bh(&session->back_lock); >> + return true; >> + } >> + } >> + spin_unlock_bh(&session->back_lock); >> + return false; >> +} >> + >> /** >> * iscsi_conn_teardown - teardown iscsi connection >> * @cls_conn: iscsi class connection >> @@ -3144,7 +3162,17 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn) >> session->state = ISCSI_STATE_TERMINATE; >> wake_up(&conn->ehwait); >> } >> + >> spin_unlock_bh(&session->frwd_lock); >> + mutex_unlock(&session->eh_mutex); >> + /* >> + * If the caller didn't do a target unbind we could be exiting a >> + * scsi-ml entry point that had a task ref. Wait on them here. >> + */ >> + while (iscsi_session_has_tasks(session)) >> + msleep(50); > > Is there a limit on the amount of time this might spin? No. Are you asking because you think there should be one or because you just wanted to check? It's really like the patch description says and is a quick wait. If you want to add a limit, we can return and leak mem and maybe crash because we've left some objects running and setup. Maybe just get a warning when we remove the session. Or we can let the crash happen.
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index ce3898fdb10f..ce6d04035c64 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -3120,6 +3120,24 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size, } EXPORT_SYMBOL_GPL(iscsi_conn_setup); +static bool iscsi_session_has_tasks(struct iscsi_session *session) +{ + struct iscsi_task *task; + int i; + + spin_lock_bh(&session->back_lock); + for (i = 0; i < session->cmds_max; i++) { + task = session->cmds[i]; + + if (task->sc) { + spin_unlock_bh(&session->back_lock); + return true; + } + } + spin_unlock_bh(&session->back_lock); + return false; +} + /** * iscsi_conn_teardown - teardown iscsi connection * @cls_conn: iscsi class connection @@ -3144,7 +3162,17 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn) session->state = ISCSI_STATE_TERMINATE; wake_up(&conn->ehwait); } + spin_unlock_bh(&session->frwd_lock); + mutex_unlock(&session->eh_mutex); + /* + * If the caller didn't do a target unbind we could be exiting a + * scsi-ml entry point that had a task ref. Wait on them here. + */ + while (iscsi_session_has_tasks(session)) + msleep(50); + + mutex_lock(&session->eh_mutex); /* flush queued up work because we free the connection below */ iscsi_suspend_tx(conn);
If we haven't done an unbind target call, we can race during conn destruction where iscsi_conn_teardown wakes up the eh/abort thread and its still accessing a task while iscsi_conn_teardown is freeing the conn. This patch has us wait for all threads to drop their refs to outstanding tasks during conn destruction. There is also an issue where we could be accessing the conn directly via fields like conn->ehwait in the eh callbacks. The next patch will fix those. Signed-off-by: Mike Christie <michael.christie@oracle.com> --- drivers/scsi/libiscsi.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)