[1/3] target: iscsi: fix hang in the iSCSI login code

Message ID	20230310100423.1258256-2-mlombard@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <target-devel-owner@vger.kernel.org> From: Maurizio Lombardi <mlombard@redhat.com> To: martin.petersen@oracle.com Cc: michael.christie@oracle.com, serapheim.dimitro@delphix.com, target-devel@vger.kernel.org Subject: [PATCH 1/3] target: iscsi: fix hang in the iSCSI login code Date: Fri, 10 Mar 2023 11:04:21 +0100 Message-Id: <20230310100423.1258256-2-mlombard@redhat.com> In-Reply-To: <20230310100423.1258256-1-mlombard@redhat.com> References: <20230310100423.1258256-1-mlombard@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Fix possible hangs and race conditions during login \| expand [0/3] Fix possible hangs and race conditions during login [1/3] target: iscsi: fix hang in the iSCSI login code [2/3] target: iscsi: remove unused transport_timer [3/3] target: iscsi: prevent login threads from racing between each other

Maurizio Lombardi March 10, 2023, 10:04 a.m. UTC

If the initiator suddenly stops sending data during a login while
keeping the TCP connection open, the sk_data_ready callback won't
schedule the login_work and the latter
will never timeout and release the login semaphore.

All the other login operations will therefore get stuck waiting
for the semaphore to be released.

Add a timer to check if the initiator became unresponsive, this is how it
works:

  * The timer is started in iscsi_target_start_negotiation(), before
    clearing tre LOGIN_FLAGS_INITIAL_PDU flag.
  * If iscsi_target_do_login() returned a non-zero value, this means that
    the login operation either failed or didn't require additional PDUs;
    in this case the timer is immediately stopped.
  * If iscsi_target_do_login() returned zero then the control of
    the login operation is passed to login_work that will take over the
    responsibility of releasing the login semaphore and stopping the timer.
  * If login_work gets stuck, the login timer will expire and
    will force the login to fail (by sending a SIGINT to the login kthread
    and by closing the socket).

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/target/iscsi/iscsi_target_login.c |  1 +
 drivers/target/iscsi/iscsi_target_nego.c  | 56 ++++++++++-------------
 drivers/target/iscsi/iscsi_target_util.c  | 26 +++++++++++
 drivers/target/iscsi/iscsi_target_util.h  |  3 ++
 include/target/iscsi/iscsi_target_core.h  |  1 +
 5 files changed, 56 insertions(+), 31 deletions(-)

Mike Christie March 10, 2023, 7:53 p.m. UTC | #1

On 3/10/23 4:04 AM, Maurizio Lombardi wrote:
> If the initiator suddenly stops sending data during a login while
> keeping the TCP connection open, the sk_data_ready callback won't
> schedule the login_work and the latter
> will never timeout and release the login semaphore.
> 
> All the other login operations will therefore get stuck waiting
> for the semaphore to be released.
> 

You mean np_login_sem right? Do you know why we have to serialize access
to the np during login? Is it just a simple way to handle the internal
target variables for things like MC/s, reinstatement, etc? I saw the tsih
case, but am wondering how easy it is to remove.

If we need the sem why do we use the sk_callback/non-blocking type of
approach when we can only do 1 login at a time per tpg? We end up
creating 2 threads per connection after FFP, so it seems like we would
have enough resources for a temp login thread before FFP. Looking over
the commit history, we seem to hit a good number of bugs in this code.

It seems like we could simplify the login code by do a blocking type of
implementation where:

1. __iscsi_target_login_thread starts a workstruct that runs
iscsi_target_start_negotiation.  It would then run iscsi_target_do_login_rx
which just waits for a response. When we get one, it does iscsi_target_do_login
and if we need more PDUs loops. We have one timer for all this.

2. We can remove the np_login_sem. It would be replaced by a workstruct
in the np. __iscsi_target_login_thread would just flush the work to make sure
we are not running a login on the same np already.

3. We can remove all the sk_callback* related code for iscsit tcp since
recvmsg just return failure when the state changes.

4. It looks like cxgbit will work with some small changes because I think it
just kicks off iscsi_target_do_login_rx after sending a login PDU, then it
just waits in its iscsit_get_login_rx.

5. It looks like isert will also work, because it's isert_get_login_rx can
just wait on the login_req_comp/login_comp already.

Maurizio Lombardi March 13, 2023, 9:06 a.m. UTC | #2

pá 10. 3. 2023 v 20:53 odesílatel Mike Christie
<michael.christie@oracle.com> napsal:
>
> On 3/10/23 4:04 AM, Maurizio Lombardi wrote:
> > If the initiator suddenly stops sending data during a login while
> > keeping the TCP connection open, the sk_data_ready callback won't
> > schedule the login_work and the latter
> > will never timeout and release the login semaphore.
> >
> > All the other login operations will therefore get stuck waiting
> > for the semaphore to be released.
> >
>
> You mean np_login_sem right? Do you know why we have to serialize access
> to the np during login? Is it just a simple way to handle the internal
> target variables for things like MC/s, reinstatement, etc? I saw the tsih
> case, but am wondering how easy it is to remove.

Yes, I mean the np_login_sem, right now I don't know for sure how easy
is to remove it, but I agree that probably there is room for improvement here
and the login code can be simplified a lot.

>
> If we need the sem why do we use the sk_callback/non-blocking type of
> approach when we can only do 1 login at a time per tpg?

Yeah, I guess that the author just wanted the login thread to go back to
kernel_accept() as fast as possible to be able to serve the next connection.
But yes, if the new connection is against the same tpg, the login
thread will have
to wait for the semaphore anyway.


>
> It seems like we could simplify the login code by do a blocking type of
> implementation where:
>
> 1. __iscsi_target_login_thread starts a workstruct that runs
> iscsi_target_start_negotiation.  It would then run iscsi_target_do_login_rx
> which just waits for a response. When we get one, it does iscsi_target_do_login
> and if we need more PDUs loops. We have one timer for all this.
>
> 2. We can remove the np_login_sem. It would be replaced by a workstruct
> in the np. __iscsi_target_login_thread would just flush the work to make sure
> we are not running a login on the same np already.
>
> 3. We can remove all the sk_callback* related code for iscsit tcp since
> recvmsg just return failure when the state changes.
>
> 4. It looks like cxgbit will work with some small changes because I think it
> just kicks off iscsi_target_do_login_rx after sending a login PDU, then it
> just waits in its iscsit_get_login_rx.
>
> 5. It looks like isert will also work, because it's isert_get_login_rx can
> just wait on the login_req_comp/login_comp already.
>

Ok I think I could work on this new login code.
However, I would suggest to consider taking this patchset as a
stopgap, so the Delphix guys
can keep their customers happy and I can work on this
without feeling the pressure; also, this patchset is easier and safer
to backport to
stable kernels.

Maurizio

Mike Christie March 13, 2023, 11:52 p.m. UTC | #3

On 3/10/23 4:04 AM, Maurizio Lombardi wrote:
> If the initiator suddenly stops sending data during a login while
> keeping the TCP connection open, the sk_data_ready callback won't
> schedule the login_work and the latter
> will never timeout and release the login semaphore.
> 
> All the other login operations will therefore get stuck waiting
> for the semaphore to be released.
> 
> Add a timer to check if the initiator became unresponsive, this is how it
> works:
> 
>   * The timer is started in iscsi_target_start_negotiation(), before
>     clearing tre LOGIN_FLAGS_INITIAL_PDU flag.
>   * If iscsi_target_do_login() returned a non-zero value, this means that
>     the login operation either failed or didn't require additional PDUs;
>     in this case the timer is immediately stopped.
>   * If iscsi_target_do_login() returned zero then the control of
>     the login operation is passed to login_work that will take over the
>     responsibility of releasing the login semaphore and stopping the timer.
>   * If login_work gets stuck, the login timer will expire and
>     will force the login to fail (by sending a SIGINT to the login kthread
>     and by closing the socket).
> 
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  drivers/target/iscsi/iscsi_target_login.c |  1 +
>  drivers/target/iscsi/iscsi_target_nego.c  | 56 ++++++++++-------------
>  drivers/target/iscsi/iscsi_target_util.c  | 26 +++++++++++
>  drivers/target/iscsi/iscsi_target_util.h  |  3 ++
>  include/target/iscsi/iscsi_target_core.h  |  1 +
>  5 files changed, 56 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
> index 27e448c2d066..bb7d5a596266 100644
> --- a/drivers/target/iscsi/iscsi_target_login.c
> +++ b/drivers/target/iscsi/iscsi_target_login.c
> @@ -1127,6 +1127,7 @@ static struct iscsit_conn *iscsit_alloc_conn(struct iscsi_np *np)
>  	timer_setup(&conn->nopin_response_timer,
>  		    iscsit_handle_nopin_response_timeout, 0);
>  	timer_setup(&conn->nopin_timer, iscsit_handle_nopin_timeout, 0);
> +	timer_setup(&conn->login_timer, iscsit_login_timeout, 0);
>  
>  	if (iscsit_conn_set_transport(conn, np->np_transport) < 0)
>  		goto free_conn;
> diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
> index 24040c118e49..f901a7231c48 100644
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -472,12 +472,18 @@ static int iscsi_target_do_login(struct iscsit_conn *, struct iscsi_login *);
>  
>  static bool __iscsi_target_sk_check_close(struct sock *sk)
>  {
> -	if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
> -		pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
> +	switch (sk->sk_state) {
> +	case TCP_FIN_WAIT1:
> +	case TCP_FIN_WAIT2:
> +	case TCP_CLOSE_WAIT:
> +	case TCP_LAST_ACK:
> +	case TCP_CLOSE:
> +		pr_debug("__iscsi_target_sk_check_close: socket closing,"
>  			"returning TRUE\n");

Don't need to break up a string. We do it a lot in the lio code, but we've
been trying not to in new code.

>  		return true;
> +	default:
> +		return false;
>  	}
> -	return false;
>  }
>  
>  static bool iscsi_target_sk_check_close(struct iscsit_conn *conn)
> @@ -535,25 +541,6 @@ static void iscsi_target_login_drop(struct iscsit_conn *conn, struct iscsi_login
>  	iscsi_target_login_sess_out(conn, zero_tsih, true);
>  }
>  
> -struct conn_timeout {
> -	struct timer_list timer;
> -	struct iscsit_conn *conn;
> -};
> -
> -static void iscsi_target_login_timeout(struct timer_list *t)
> -{
> -	struct conn_timeout *timeout = from_timer(timeout, t, timer);
> -	struct iscsit_conn *conn = timeout->conn;
> -
> -	pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n");
> -
> -	if (conn->login_kworker) {
> -		pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n",
> -			 conn->login_kworker->comm, conn->login_kworker->pid);
> -		send_sig(SIGINT, conn->login_kworker, 1);
> -	}
> -}
> -
>  static void iscsi_target_do_login_rx(struct work_struct *work)
>  {
>  	struct iscsit_conn *conn = container_of(work,
> @@ -562,7 +549,6 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
>  	struct iscsi_np *np = login->np;
>  	struct iscsi_portal_group *tpg = conn->tpg;
>  	struct iscsi_tpg_np *tpg_np = conn->tpg_np;
> -	struct conn_timeout timeout;
>  	int rc, zero_tsih = login->zero_tsih;
>  	bool state;
>  
> @@ -597,17 +583,13 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
>  		goto err;
>  	}
>  
> -	conn->login_kworker = current;
>  	allow_signal(SIGINT);
> -
> -	timeout.conn = conn;
> -	timer_setup_on_stack(&timeout.timer, iscsi_target_login_timeout, 0);
> -	mod_timer(&timeout.timer, jiffies + TA_LOGIN_TIMEOUT * HZ);
> -	pr_debug("Starting login timer for %s/%d\n", current->comm, current->pid);
> +	iscsit_start_login_timer(conn);
> +	conn->login_kworker = current;>  
>  	rc = conn->conn_transport->iscsit_get_login_rx(conn, login);
> -	del_timer_sync(&timeout.timer);
> -	destroy_timer_on_stack(&timeout.timer);
> +
> +	iscsit_stop_login_timer(conn);
>  	flush_signals(current);
>  	conn->login_kworker = NULL;
>  
> @@ -646,6 +628,13 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
>  		if (iscsi_target_sk_check_and_clear(conn,
>  						    LOGIN_FLAGS_WRITE_ACTIVE))
>  			goto err;
> +
> +		/*
> +		 * Restart the login timer to prevent the
> +		 * login process from getting stuck if the initiator

I would fix up the formatting so the first line is longer.

> +		 * stops sending data.
> +		 */
> +		iscsit_start_login_timer(conn);
>  	} else if (rc == 1) {
>  		cancel_delayed_work(&conn->login_work);
>  		iscsi_target_nego_release(conn);
> @@ -657,6 +646,7 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
>  err:
>  	iscsi_target_restore_sock_callbacks(conn);
>  	cancel_delayed_work(&conn->login_work);
> +	iscsit_stop_login_timer(conn);>  	iscsi_target_login_drop(conn, login);
>  	iscsit_deaccess_np(np, tpg, tpg_np);
>  }
> @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
>  		set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
>  		write_unlock_bh(&sk->sk_callback_lock);
>  	}
> +
> +	iscsit_start_login_timer(conn);

At this time, we have the np->np_login_timer running right?

Don't we only need to start this new timer when we know there are
more PDUs and the connection is good (iscsi_target_do_login returns
0 and iscsi_target_sk_check_and_clear returns 0)?

I think you can just kill np timer and only use the login_timer for
both cases. So I mean set the thread to kill as the login one and start
this login_timer in __iscsi_target_login_thread where we used to call
iscsi_start_login_thread_timer. You would then mod the timer when we
transition from iscsi_target_start_negotiation to waiting for the next
PDU.

> +
>  	/*
>  	 * If iscsi_target_do_login returns zero to signal more PDU
>  	 * exchanges are required to complete the login, go ahead and
> @@ -1377,6 +1370,7 @@ int iscsi_target_start_negotiation(
>  	}
>  	if (ret != 0) {
>  		cancel_delayed_work_sync(&conn->login_work);
> +		iscsit_stop_login_timer(conn);
>  		iscsi_target_nego_release(conn);
>  	}
>  
> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> index 26dc8ed3045b..414e883c5a0d 100644
> --- a/drivers/target/iscsi/iscsi_target_util.c
> +++ b/drivers/target/iscsi/iscsi_target_util.c
> @@ -1040,6 +1040,32 @@ void iscsit_stop_nopin_timer(struct iscsit_conn *conn)
>  	spin_unlock_bh(&conn->nopin_timer_lock);
>  }
>  
> +void iscsit_login_timeout(struct timer_list *t)
> +{
> +	struct iscsit_conn *conn = from_timer(conn, t, login_timer);
> +
> +	pr_debug("Entering iscsi_target_login_timeout >>>>>>>>>>>>>>>>>>>\n");
> +
> +	if (conn->login_kworker) {
> +		pr_debug("Sending SIGINT to conn->login_kworker %s/%d\n",
> +			 conn->login_kworker->comm, conn->login_kworker->pid);
> +		send_sig(SIGINT, conn->login_kworker, 1);
> +	}
> +	kernel_sock_shutdown(conn->sock, SHUT_RDWR);

For isert and cxgbit we won't have conn->sock set so I think you need some
sort of callout for those drivers, or maybe set LOGIN_FLAGS_CLOSED and queue
the login_work. Maybe the latter will work for all drivers as well. You probably
need some extra locking and LOGIN_FLAGS checks to handle an issue similar to
below.

If we do need to do the kernel_sock_shutdown we might need to add some code
to not call it twice (I'm not sure it's fully supported). It looks like we
can race and do:

1. login_work has just been queued.
2. But the login timer fires. We run kernel_sock_shutdown and
iscsi_target_sk_state_change starts to run but has not yet taken
the sk_callback_lock because..
3. iscsit_get_login_rx already started and passed the
iscsi_target_sk_check_close check. It re-arms the timer and calls
iscsit_get_login_rx.
4. iscsi_target_sk_state_change takes the lock and sets LOGIN_FLAGS_CLOSED.
5. We timeout again from #3 since we called kernel_sock_shutdown before.

Maurizio Lombardi March 14, 2023, 11:09 a.m. UTC | #4

út 14. 3. 2023 v 0:52 odesílatel Mike Christie
<michael.christie@oracle.com> napsal:
>
> > +     case TCP_CLOSE:
> > +             pr_debug("__iscsi_target_sk_check_close: socket closing,"
> >                       "returning TRUE\n");
>
> Don't need to break up a string. We do it a lot in the lio code, but we've
> been trying not to in new code.
>
> > +             /*
> > +              * Restart the login timer to prevent the
> > +              * login process from getting stuck if the initiator
>
> I would fix up the formatting so the first line is longer.

Ok

> > @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
> >               set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
> >               write_unlock_bh(&sk->sk_callback_lock);
> >       }
> > +
> > +     iscsit_start_login_timer(conn);
>
> At this time, we have the np->np_login_timer running right?

Yes.

>
> Don't we only need to start this new timer when we know there are
> more PDUs and the connection is good (iscsi_target_do_login returns
> 0 and iscsi_target_sk_check_and_clear returns 0)?

The moment iscsi_target_sk_check_and_clear() clears the
LOGIN_FLAGS_INITIAL_PDU flag
and returns 0, the login worker may be already running.
If we start the timer after the call to
iscsi_target_sk_check_and_clear(), we could have a race condition:

1) login_work runs and reschedules itself non-stop because
LOGIN_FLAGS_INITIAL_PDU is set
2) login kthread calls  iscsi_target_sk_check_and_clear() and clears
LOGIN_FLAGS_INITIAL_PDU
3) login work runs and completes the login
4) login kthread starts the timer
5) No one stops the timer, it fires and kills the connection despite
the fact the login was successful.

I could however replace this code:

ret = iscsi_target_do_login(conn, login);
 if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
           ret = -1;

with the following, if you like it more:

ret = iscsi_target_do_login(conn, login);
if (!ret) {
      iscsit_start_login_timer(conn);
      if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) {
           iscsit_stop_login_timer(conn);
           ret = -1;
      }
}

>
> I think you can just kill np timer and only use the login_timer for
> both cases. So I mean set the thread to kill as the login one and start
> this login_timer in __iscsi_target_login_thread where we used to call
> iscsi_start_login_thread_timer. You would then mod the timer when we
> transition from iscsi_target_start_negotiation to waiting for the next
> PDU.
>

Yes, maybe, but I would need to find a way to detect if conn->login_kworker
is pointing to the login thread or to the login_work's thread, because
the np_login_timer is supposed to clear the ISCSI_TF_RUNNING flag.

maybe something like this:

if (conn->login_kworker == conn->tpg_np->tpg_np->np_thread) {
     spin_lock_bh(&np->np_thread_lock);
     if (!(np->np_login_timer_flags & ISCSI_TF_STOP))
           np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
     spin_unlock_bh(&np->np_thread_lock);
}

> For isert and cxgbit we won't have conn->sock set so I think you need some
> sort of callout for those drivers, or maybe set LOGIN_FLAGS_CLOSED and queue
> the login_work. Maybe the latter will work for all drivers as well. You probably
> need some extra locking and LOGIN_FLAGS checks to handle an issue similar to
> below.

Hmm, that would need to be tested, because LOGIN_FLAGS_CLOSED is supposed
to be set when the socket is already in the process of getting closed
(it's state is TCP_CLOSE_WAIT or TCP_FIN_WAIT* or whatever)
So If I set LOGIN_FLAGS_CLOSED in the timer and the socket is
TCP_ESTABLISHED it means that I am trying to
do the opposite, will the socket be properly closed
by isert/cxgbit in this case?

>
> If we do need to do the kernel_sock_shutdown we might need to add some code
> to not call it twice (I'm not sure it's fully supported). It looks like we
> can race and do:

Correct, thanks. In that case a check like this should be sufficient.

if (!test_bit( LOGIN_FLAGS_CLOSED))
     kernel_sock_shutdown(sock, SHUT_RDWR);


Maurizio

Mike Christie March 14, 2023, 9:23 p.m. UTC | #5

On 3/14/23 6:09 AM, Maurizio Lombardi wrote:
> út 14. 3. 2023 v 0:52 odesílatel Mike Christie
> <michael.christie@oracle.com> napsal:
>>
>>> +     case TCP_CLOSE:
>>> +             pr_debug("__iscsi_target_sk_check_close: socket closing,"
>>>                       "returning TRUE\n");
>>
>> Don't need to break up a string. We do it a lot in the lio code, but we've
>> been trying not to in new code.
>>
>>> +             /*
>>> +              * Restart the login timer to prevent the
>>> +              * login process from getting stuck if the initiator
>>
>> I would fix up the formatting so the first line is longer.
> 
> Ok
> 
>>> @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
>>>               set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
>>>               write_unlock_bh(&sk->sk_callback_lock);
>>>       }
>>> +
>>> +     iscsit_start_login_timer(conn);
>>
>> At this time, we have the np->np_login_timer running right?
> 
> Yes.
> 
>>
>> Don't we only need to start this new timer when we know there are
>> more PDUs and the connection is good (iscsi_target_do_login returns
>> 0 and iscsi_target_sk_check_and_clear returns 0)?
> 
> The moment iscsi_target_sk_check_and_clear() clears the
> LOGIN_FLAGS_INITIAL_PDU flag
> and returns 0, the login worker may be already running.
> If we start the timer after the call to
> iscsi_target_sk_check_and_clear(), we could have a race condition:
> 
> 1) login_work runs and reschedules itself non-stop because
> LOGIN_FLAGS_INITIAL_PDU is set
> 2) login kthread calls  iscsi_target_sk_check_and_clear() and clears
> LOGIN_FLAGS_INITIAL_PDU
> 3) login work runs and completes the login
> 4) login kthread starts the timer
> 5) No one stops the timer, it fires and kills the connection despite
> the fact the login was successful.
> 
> I could however replace this code:
>
> ret = iscsi_target_do_login(conn, login);
>  if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
>            ret = -1;
> 
> with the following, if you like it more:
> 
> ret = iscsi_target_do_login(conn, login);
> if (!ret) {
>       iscsit_start_login_timer(conn);
>       if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) {
>            iscsit_stop_login_timer(conn);
>            ret = -1;
>       }
> }

Ah yeah, I wasn't thinking specifically about this race when I wrote the
above comment. With the combined timer below, I was thinking this is handled
when you set/check the login_kworker thread.

> 
>>
>> I think you can just kill np timer and only use the login_timer for
>> both cases. So I mean set the thread to kill as the login one and start
>> this login_timer in __iscsi_target_login_thread where we used to call
>> iscsi_start_login_thread_timer. You would then mod the timer when we
>> transition from iscsi_target_start_negotiation to waiting for the next
>> PDU.
>>
> 
> Yes, maybe, but I would need to find a way to detect if conn->login_kworker
> is pointing to the login thread or to the login_work's thread, because
> the np_login_timer is supposed to clear the ISCSI_TF_RUNNING flag.
> 
> maybe something like this:
> 
> if (conn->login_kworker == conn->tpg_np->tpg_np->np_thread) {
>      spin_lock_bh(&np->np_thread_lock);
>      if (!(np->np_login_timer_flags & ISCSI_TF_STOP))
>            np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
>      spin_unlock_bh(&np->np_thread_lock);
> }

We don't need any of the np_login_timer_flags code if we are using your per
conn login_timer do we?

For the new timer:
- We are adding one per conn timer.
- We use that for both the initial pdu and later ones.
- The timeout function, sends a signal if there is a thread set or does whatever
we figure out below for the case where there is no thread (we don't do any
np_login_timer_flags stuff).
- We probably don't need to do both the signal and whatever we decide below.
Or, we need to check some of the LOGIN_FLAGS since for example we don't
need to queue the login_work and set LOGIN_FLAGS_CLOSED if LOGIN_FLAGS_INITIAL_PDU
is set.
- The iscsi_start_login_timer function handles setting the login_kworker thread.

So we do:
1. Replace iscsi_start_login_thread_timer/iscsi_stop_login_thread_timer with
iscsit_start_login_timer/iscsit_stop_login_timer

__iscsi_target_login_thread only calls iscsit_stop_login_timer on failure.
On success it will either timeout waiting for the next PDU or
iscsi_target_do_login_rx will called and reset the timer.

2. You also have a iscsit_mod_login_timer depending on how you want to handle
the race you described above.
3. iscsi_target_start_negotiation only mods the timer if iscsi_target_do_login/
iscsi_target_sk_check_and_clear is successful and if iscsi_target_do_login_rx
has not already run.

For the latter, in iscsi_start/mod_login_timer you could add a check like your
np_thread above where if the login_work has already reset login_kworker then
we just return. Or we can add a new LOGIN_FLAGS flag, or add a bool arg to
iscsi_start/mod_login_timer where if set to true and if the login_kworker thread
does not equal current, then you know iscsi_target_do_login_rx already took over
the timer and do nothing.

4. You call iscsit_start_login_timer/iscsit_stop_login_timer like you are in
iscsi_target_do_login_rx.

> 
>> For isert and cxgbit we won't have conn->sock set so I think you need some
>> sort of callout for those drivers, or maybe set LOGIN_FLAGS_CLOSED and queue
>> the login_work. Maybe the latter will work for all drivers as well. You probably
>> need some extra locking and LOGIN_FLAGS checks to handle an issue similar to
>> below.
> 
> Hmm, that would need to be tested, because LOGIN_FLAGS_CLOSED is supposed> to be set when the socket is already in the process of getting closed
> (it's state is TCP_CLOSE_WAIT or TCP_FIN_WAIT* or whatever)
> So If I set LOGIN_FLAGS_CLOSED in the timer and the socket is
> TCP_ESTABLISHED it means that I am trying to
> do the opposite, will the socket be properly closed
> by isert/cxgbit in this case?

I'm open to suggestions. I don't really care how we fix it, but it should
work for all the drivers.

If we go the login_work route, ignore LOGIN_FLAGS_CLOSED for a second. I'm
just talking about the code path we use for error handling in this type of
case. We can easily just add a new bit.

- For iscsit tcp, if we are reading in data in iscsit_get_login_rx, and the
timer fires and we get a signal, iscsit_get_login_rx will return a failure
and we do the goto err path. So in this case we, the connection is not closed
and we go through iscsi_target_login_drop, so we know it works.

isert/cxgbit doesn't use sockets. We don't hit any of the LOGIN_FLAGS or
sk_state checks for them. The tcp connection might or might not be closed
for both drivers when the timer fires.

- For cxgbit, we can be waiting in iscsit_get_login_rx. It seems to kick this off
after we send a login PDU. The connection might be open or closed, then we timeout
and get a signal and break from our wait in iscsit_get_login_rx. We then do the
goto err path like iscsit tcp.

- For isert, after the first PDU we queue login_work when we get a login PDU.
So, I don't think we do anything right now if after the first PDU nothing is
sent like in your bug. iscsi_target_login_sess_out works for the initial PDU
timing out though, so we know we can call it when the conn is open.

Note: The isert_disconnected_handler is sort of like tcp's sk_state_change callout,
but isert just calls to iscsit_cause_connection_reinstatement which doesn't do
anything (it doesn't set LOGIN_FLAGS and iscsi_target_nego.c doesn't have any
checks for what isert is doing).

My suggestion to set LOGIN_FLAGS_CLOSED and then to queue the login work would
work like the iscsit tcp case above. When the work runs, it sees the bit and we
go down the goto err path and run iscsi_target_login_drop.

- If your concern is that we would abuse LOGIN_FLAGS_CLOSED, then we can add a
new bit.
- I didn't write out everything. Like the ordering of cancel_delayed_work and
iscsit_stop_login_timer would need to be reversed. We probably can add some
more LOGIN_FLAGS checks like in iscsi_target_sk_state_change so we don't always
queue the login_work like is done when the ACTIVE bits are set (or we could
kill that and just always queue the login_work).

If your concern was that we have no idea about all the code paths then yeah
it needs testing.

Maurizio Lombardi March 21, 2023, 5:13 p.m. UTC | #6

út 14. 3. 2023 v 22:23 odesílatel Mike Christie
<michael.christie@oracle.com> napsal:
>
> On 3/14/23 6:09 AM, Maurizio Lombardi wrote:
> > út 14. 3. 2023 v 0:52 odesílatel Mike Christie
> > <michael.christie@oracle.com> napsal:
> >>
> >>> +     case TCP_CLOSE:
> >>> +             pr_debug("__iscsi_target_sk_check_close: socket closing,"
> >>>                       "returning TRUE\n");
> >>
> >> Don't need to break up a string. We do it a lot in the lio code, but we've
> >> been trying not to in new code.
> >>
> >>> +             /*
> >>> +              * Restart the login timer to prevent the
> >>> +              * login process from getting stuck if the initiator
> >>
> >> I would fix up the formatting so the first line is longer.
> >
> > Ok
> >
> >>> @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
> >>>               set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
> >>>               write_unlock_bh(&sk->sk_callback_lock);
> >>>       }
> >>> +
> >>> +     iscsit_start_login_timer(conn);
> >>
> >> At this time, we have the np->np_login_timer running right?
> >
> > Yes.
> >
> >>
> >> Don't we only need to start this new timer when we know there are
> >> more PDUs and the connection is good (iscsi_target_do_login returns
> >> 0 and iscsi_target_sk_check_and_clear returns 0)?
> >
> > The moment iscsi_target_sk_check_and_clear() clears the
> > LOGIN_FLAGS_INITIAL_PDU flag
> > and returns 0, the login worker may be already running.
> > If we start the timer after the call to
> > iscsi_target_sk_check_and_clear(), we could have a race condition:
> >
> > 1) login_work runs and reschedules itself non-stop because
> > LOGIN_FLAGS_INITIAL_PDU is set
> > 2) login kthread calls  iscsi_target_sk_check_and_clear() and clears
> > LOGIN_FLAGS_INITIAL_PDU
> > 3) login work runs and completes the login
> > 4) login kthread starts the timer
> > 5) No one stops the timer, it fires and kills the connection despite
> > the fact the login was successful.
> >
> > I could however replace this code:
> >
> > ret = iscsi_target_do_login(conn, login);
> >  if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
> >            ret = -1;
> >
> > with the following, if you like it more:
> >
> > ret = iscsi_target_do_login(conn, login);
> > if (!ret) {
> >       iscsit_start_login_timer(conn);
> >       if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) {
> >            iscsit_stop_login_timer(conn);
> >            ret = -1;
> >       }
> > }
>
> Ah yeah, I wasn't thinking specifically about this race when I wrote the
> above comment. With the combined timer below, I was thinking this is handled
> when you set/check the login_kworker thread.
>
> >
> >>
> >> I think you can just kill np timer and only use the login_timer for
> >> both cases. So I mean set the thread to kill as the login one and start
> >> this login_timer in __iscsi_target_login_thread where we used to call
> >> iscsi_start_login_thread_timer. You would then mod the timer when we
> >> transition from iscsi_target_start_negotiation to waiting for the next
> >> PDU.
> >>
> >
> > Yes, maybe, but I would need to find a way to detect if conn->login_kworker
> > is pointing to the login thread or to the login_work's thread, because
> > the np_login_timer is supposed to clear the ISCSI_TF_RUNNING flag.
> >
> > maybe something like this:
> >
> > if (conn->login_kworker == conn->tpg_np->tpg_np->np_thread) {
> >      spin_lock_bh(&np->np_thread_lock);
> >      if (!(np->np_login_timer_flags & ISCSI_TF_STOP))
> >            np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
> >      spin_unlock_bh(&np->np_thread_lock);
> > }
>
> We don't need any of the np_login_timer_flags code if we are using your per
> conn login_timer do we?

Ah yes you're right, I was just confused by all those
"ISCSI_TF_RUNNING/STOP" flags used all
over the place, then I realized that every timer has its own flags.

>
> For the new timer:
> - We are adding one per conn timer.
> - We use that for both the initial pdu and later ones.
> - The timeout function, sends a signal if there is a thread set or does whatever
> we figure out below for the case where there is no thread (we don't do any
> np_login_timer_flags stuff).
> - We probably don't need to do both the signal and whatever we decide below.
> Or, we need to check some of the LOGIN_FLAGS since for example we don't
> need to queue the login_work and set LOGIN_FLAGS_CLOSED if LOGIN_FLAGS_INITIAL_PDU
> is set.
> - The iscsi_start_login_timer function handles setting the login_kworker thread.

Ok I have a V2 almost ready, I'm testing it right now.

Maurizio

[1/3] target: iscsi: fix hang in the iSCSI login code

Commit Message

Comments

Patch