Message ID | 1495776751-4746-1-git-send-email-nab@linux-iscsi.org (mailing list archive)
---|---
State | Not Applicable, archived
Thanks for the patch.

On 05/26/2017 12:32 AM, Nicholas A. Bellinger wrote:
>
> -	state = iscsi_target_sk_state_check(sk);
> -	write_unlock_bh(&sk->sk_callback_lock);
> -
> -	pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
> +	orig_state_change(sk);
>
> -	if (!state) {
> -		pr_debug("iscsi_target_sk_state_change got failed state\n");
> -		schedule_delayed_work(&conn->login_cleanup_work, 0);

I think login_cleanup_work is no longer used so you can also remove it
and its code.

The patch fixes the crash for me. However, is there a possible
regression where if the initiator attempts new relogins we could run out
of memory? With the old code, we would free the login attempt's resources
at this time, but with the new code the initiator will send more login
attempts and so we just keep allocating more memory for each attempt
until we run out or the login is finally able to complete.
Hey MNC,

On Fri, 2017-05-26 at 22:14 -0500, Mike Christie wrote:
> Thanks for the patch.

Btw, after running DATERA's internal longevity and scale tests across
~20 racks on v4.1.y with this patch over the long weekend, there haven't
been any additional regressions.

> On 05/26/2017 12:32 AM, Nicholas A. Bellinger wrote:
> >
> > -	state = iscsi_target_sk_state_check(sk);
> > -	write_unlock_bh(&sk->sk_callback_lock);
> > -
> > -	pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
> > +	orig_state_change(sk);
> >
> > -	if (!state) {
> > -		pr_debug("iscsi_target_sk_state_change got failed state\n");
> > -		schedule_delayed_work(&conn->login_cleanup_work, 0);
>
> I think login_cleanup_work is no longer used so you can also remove it
> and its code.

Yep, since this needs to go to stable, I left that part out for now.

Will take care of that post -rc4.

> The patch fixes the crash for me. However, is there a possible
> regression where if the initiator attempts new relogins we could run out
> of memory? With the old code, we would free the login attempt's resources
> at this time, but with the new code the initiator will send more login
> attempts and so we just keep allocating more memory for each attempt
> until we run out or the login is finally able to complete.

AFAICT, no. For the two cases in question:

- Initial login request PDU processing done within iscsi_np kthread
  context in iscsi_target_start_negotiation(), and
- subsequent login request PDU processing done by delayed work-queue
  kthread context in iscsi_target_do_login_rx(),

this patch doesn't change how aggressively connection cleanup happens
for failed login attempts in the face of new connection login attempts,
for either case.
For the first case, when iscsi_np process context invokes
iscsi_target_start_negotiation() -> iscsi_target_do_login() ->
iscsi_check_for_session_reinstatement() to wait for backend I/O to
complete, it still blocks other new connections from being accepted on
the specific iscsi_np process context.

This patch doesn't change that behavior.

What it does change is that when the host closes the connection and
iscsi_target_sk_state_change() gets invoked, iscsi_np process context
waits for iscsi_check_for_session_reinstatement() to complete before
releasing the connection resources.

However, since iscsi_np process context is blocked, new connections won't
be accepted until the new connection forcing session reinstatement
finishes waiting for outstanding backend I/O to complete.

For the second case of subsequent, non-initial login request PDUs handled
within delayed work-queue process context, AFAICT this patch doesn't
change the original behavior either. Namely, when
iscsi_target_do_login_rx() is active and the host closes the connection,
causing iscsi_target_sk_state_change() to be invoked, it still checks for
LOGIN_FLAGS_READ_ACTIVE and doesn't queue shutdown to occur. As per the
original logic preceding this change, it continues to wait for
iscsi_target_do_login_rx() to complete in delayed work-queue context,
unless sock_recvmsg() returns a failure (which it should once TCP_CLOSE
occurs) or times out via iscsi_target_login_timeout(). Once the failure
is detected from iscsi_target_do_login_rx(), the remaining connection
resources are released from there.

That said, was there another case you had in mind..?
On 05/30/2017 11:58 PM, Nicholas A. Bellinger wrote:
> Hey MNC,
>
> On Fri, 2017-05-26 at 22:14 -0500, Mike Christie wrote:
>> Thanks for the patch.
>
> Btw, after running DATERA's internal longevity and scale tests across
> ~20 racks on v4.1.y with this patch over the long weekend, there haven't
> been any additional regressions.
>
>> On 05/26/2017 12:32 AM, Nicholas A. Bellinger wrote:
>>>
>>> -	state = iscsi_target_sk_state_check(sk);
>>> -	write_unlock_bh(&sk->sk_callback_lock);
>>> -
>>> -	pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
>>> +	orig_state_change(sk);
>>>
>>> -	if (!state) {
>>> -		pr_debug("iscsi_target_sk_state_change got failed state\n");
>>> -		schedule_delayed_work(&conn->login_cleanup_work, 0);
>>
>> I think login_cleanup_work is no longer used so you can also remove it
>> and its code.
>
> Yep, since this needs to go to stable, I left that part out for now.
>
> Will take care of that post -rc4.
>
>> The patch fixes the crash for me. However, is there a possible
>> regression where if the initiator attempts new relogins we could run out
>> of memory? With the old code, we would free the login attempt's resources
>> at this time, but with the new code the initiator will send more login
>> attempts and so we just keep allocating more memory for each attempt
>> until we run out or the login is finally able to complete.
>
> AFAICT, no. For the two cases in question:
>
> - Initial login request PDU processing done within iscsi_np kthread
> context in iscsi_target_start_negotiation(), and
> - subsequent login request PDU processing done by delayed work-queue
> kthread context in iscsi_target_do_login_rx()
>
> this patch doesn't change how aggressively connection cleanup happens
> for failed login attempts in the face of new connection login attempts
> for either case.
> For the first case when iscsi_np process context invokes
> iscsi_target_start_negotiation() -> iscsi_target_do_login() ->
> iscsi_check_for_session_reinstatement() to wait for backend I/O to
> complete, it still blocks other new connections from being accepted on
> the specific iscsi_np process context.
>
> This patch doesn't change this behavior.
>
> What it does change is when the host closes the connection and
> iscsi_target_sk_state_change() gets invoked, iscsi_np process context
> waits for iscsi_check_for_session_reinstatement() to complete before
> releasing the connection resources.
>
> However since iscsi_np process context is blocked, new connections won't
> be accepted until the new connection forcing session reinstatement
> finishes waiting for outstanding backend I/O to complete.

I was seeing this. My original mail asked about iSCSI login resources
incorrectly, but like you said we do not get that far. I get a giant
backlog (1 connection request per 5 seconds that we waited) of TCP-level
connection requests and drops. When the wait is done I get a flood of
"iSCSI Login negotiation failed" messages due to the target handling all
those now-stale requests/drops.

If we do not care about the memory use at the network level for this
case (it seems like a little, and reconnects are not aggressive), then
the patch works ok for me. I am guessing it gets nasty to handle, so
maybe not worth it to handle right now? I tried to do it in my patch,
which is why it got all crazy with the waits/wakeups :)

Thanks, and you can add a tested-by or reviewed-by from me.
On Wed, 2017-05-31 at 15:28 -0500, Mike Christie wrote:
> On 05/30/2017 11:58 PM, Nicholas A. Bellinger wrote:
> > Hey MNC,
> >
> > On Fri, 2017-05-26 at 22:14 -0500, Mike Christie wrote:
> >> Thanks for the patch.

<SNIP>

> >> The patch fixes the crash for me. However, is there a possible
> >> regression where if the initiator attempts new relogins we could run out
> >> of memory? With the old code, we would free the login attempt's resources
> >> at this time, but with the new code the initiator will send more login
> >> attempts and so we just keep allocating more memory for each attempt
> >> until we run out or the login is finally able to complete.
> >
> > AFAICT, no. For the two cases in question:
> >
> > - Initial login request PDU processing done within iscsi_np kthread
> > context in iscsi_target_start_negotiation(), and
> > - subsequent login request PDU processing done by delayed work-queue
> > kthread context in iscsi_target_do_login_rx()
> >
> > this patch doesn't change how aggressively connection cleanup happens
> > for failed login attempts in the face of new connection login attempts
> > for either case.
> >
> > For the first case when iscsi_np process context invokes
> > iscsi_target_start_negotiation() -> iscsi_target_do_login() ->
> > iscsi_check_for_session_reinstatement() to wait for backend I/O to
> > complete, it still blocks other new connections from being accepted on
> > the specific iscsi_np process context.
> >
> > This patch doesn't change this behavior.
> >
> > What it does change is when the host closes the connection and
> > iscsi_target_sk_state_change() gets invoked, iscsi_np process context
> > waits for iscsi_check_for_session_reinstatement() to complete before
> > releasing the connection resources.
> >
> > However since iscsi_np process context is blocked, new connections won't
> > be accepted until the new connection forcing session reinstatement
> > finishes waiting for outstanding backend I/O to complete.
> I was seeing this. My original mail asked about iscsi login resources
> incorrectly, but like you said we do not get that far. I get a giant
> backlog (1 connection request per 5 seconds that we waited) of tcp level
> connection requests and drops. When the wait is done I get a flood of
> "iSCSI Login negotiation failed" due to the target handling all those
> now stale requests/drops.

Ah, I see what you mean. The default TCP backlog of 256 can fill up when
a small host-side login timeout is used while iscsi_np is blocked waiting
for session reinstatement to complete.

> If we do not care about the memory use at the network level for this
> case (it seems like a little and reconnects are not aggressive), then
> patch works ok for me. I am guessing it gets nasty to handle, so maybe
> not worth it to handle right now?

Yeah, since it's an issue separate from the root cause here, getting this
merged first makes sense.

> I tried to do it in my patch which is why it got all crazy with the
> waits/wakeups :)

One option to consider is to immediately queue into delayed work-queue
context from iscsi_target_start_negotiation(), instead of doing the
iscsi_target_do_login() and session reinstatement from iscsi_np context.
Just taking a quick look, this seems like it would be a pretty
straight-forward change..

> Thanks, and you can add a tested-by or reviewed-by from me.

Great, thanks MNC. Will send out a PULL request for -rc4 shortly.
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index 7ccc9c1..6f88b31 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -493,14 +493,60 @@ static void iscsi_target_restore_sock_callbacks(struct iscsi_conn *conn)
 
 static int iscsi_target_do_login(struct iscsi_conn *, struct iscsi_login *);
 
-static bool iscsi_target_sk_state_check(struct sock *sk)
+static bool __iscsi_target_sk_check_close(struct sock *sk)
 {
 	if (sk->sk_state == TCP_CLOSE_WAIT || sk->sk_state == TCP_CLOSE) {
-		pr_debug("iscsi_target_sk_state_check: TCP_CLOSE_WAIT|TCP_CLOSE,"
+		pr_debug("__iscsi_target_sk_check_close: TCP_CLOSE_WAIT|TCP_CLOSE,"
 			"returning FALSE\n");
-		return false;
+		return true;
 	}
-	return true;
+	return false;
+}
+
+static bool iscsi_target_sk_check_close(struct iscsi_conn *conn)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		read_lock_bh(&sk->sk_callback_lock);
+		state = (__iscsi_target_sk_check_close(sk) ||
+			test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+		read_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
+}
+
+static bool iscsi_target_sk_check_flag(struct iscsi_conn *conn, unsigned int flag)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		read_lock_bh(&sk->sk_callback_lock);
+		state = test_bit(flag, &conn->login_flags);
+		read_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
+}
+
+static bool iscsi_target_sk_check_and_clear(struct iscsi_conn *conn, unsigned int flag)
+{
+	bool state = false;
+
+	if (conn->sock) {
+		struct sock *sk = conn->sock->sk;
+
+		write_lock_bh(&sk->sk_callback_lock);
+		state = (__iscsi_target_sk_check_close(sk) ||
+			test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags));
+		if (!state)
+			clear_bit(flag, &conn->login_flags);
+		write_unlock_bh(&sk->sk_callback_lock);
+	}
+	return state;
 }
 
 static void iscsi_target_login_drop(struct iscsi_conn *conn, struct iscsi_login *login)
@@ -540,6 +586,20 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
 
 	pr_debug("entering iscsi_target_do_login_rx, conn: %p, %s:%d\n",
 			conn, current->comm, current->pid);
+	/*
+	 * If iscsi_target_do_login_rx() has been invoked by ->sk_data_ready()
+	 * before initial PDU processing in iscsi_target_start_negotiation()
+	 * has completed, go ahead and retry until it's cleared.
+	 *
+	 * Otherwise if the TCP connection drops while this is occuring,
+	 * iscsi_target_start_negotiation() will detect the failure, call
+	 * cancel_delayed_work_sync(&conn->login_work), and cleanup the
+	 * remaining iscsi connection resources from iscsi_np process context.
+	 */
+	if (iscsi_target_sk_check_flag(conn, LOGIN_FLAGS_INITIAL_PDU)) {
+		schedule_delayed_work(&conn->login_work, msecs_to_jiffies(10));
+		return;
+	}
 
 	spin_lock(&tpg->tpg_state_lock);
 	state = (tpg->tpg_state == TPG_STATE_ACTIVE);
@@ -547,26 +607,12 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
 
 	if (!state) {
 		pr_debug("iscsi_target_do_login_rx: tpg_state != TPG_STATE_ACTIVE\n");
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
-		return;
+		goto err;
 	}
 
-	if (conn->sock) {
-		struct sock *sk = conn->sock->sk;
-
-		read_lock_bh(&sk->sk_callback_lock);
-		state = iscsi_target_sk_state_check(sk);
-		read_unlock_bh(&sk->sk_callback_lock);
-
-		if (!state) {
-			pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
-			iscsi_target_restore_sock_callbacks(conn);
-			iscsi_target_login_drop(conn, login);
-			iscsit_deaccess_np(np, tpg, tpg_np);
-			return;
-		}
+	if (iscsi_target_sk_check_close(conn)) {
+		pr_debug("iscsi_target_do_login_rx, TCP state CLOSE\n");
+		goto err;
 	}
 
 	conn->login_kworker = current;
@@ -584,34 +630,29 @@ static void iscsi_target_do_login_rx(struct work_struct *work)
 	flush_signals(current);
 	conn->login_kworker = NULL;
 
-	if (rc < 0) {
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
-		return;
-	}
+	if (rc < 0)
+		goto err;
 
 	pr_debug("iscsi_target_do_login_rx after rx_login_io, %p, %s:%d\n",
 			conn, current->comm, current->pid);
 
 	rc = iscsi_target_do_login(conn, login);
 	if (rc < 0) {
-		iscsi_target_restore_sock_callbacks(conn);
-		iscsi_target_login_drop(conn, login);
-		iscsit_deaccess_np(np, tpg, tpg_np);
+		goto err;
 	} else if (!rc) {
-		if (conn->sock) {
-			struct sock *sk = conn->sock->sk;
-
-			write_lock_bh(&sk->sk_callback_lock);
-			clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
-			write_unlock_bh(&sk->sk_callback_lock);
-		}
+		if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_READ_ACTIVE))
+			goto err;
 	} else if (rc == 1) {
 		iscsi_target_nego_release(conn);
 		iscsi_post_login_handler(np, conn, zero_tsih);
 		iscsit_deaccess_np(np, tpg, tpg_np);
 	}
+	return;
+
+err:
+	iscsi_target_restore_sock_callbacks(conn);
+	iscsi_target_login_drop(conn, login);
+	iscsit_deaccess_np(np, tpg, tpg_np);
 }
 
 static void iscsi_target_do_cleanup(struct work_struct *work)
@@ -659,31 +700,54 @@ static void iscsi_target_sk_state_change(struct sock *sk)
 		orig_state_change(sk);
 		return;
 	}
+	state = __iscsi_target_sk_check_close(sk);
+	pr_debug("__iscsi_target_sk_close_change: state: %d\n", state);
+
 	if (test_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
 		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1 sk_state_change"
			" conn: %p\n", conn);
+		if (state)
+			set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
 		write_unlock_bh(&sk->sk_callback_lock);
 		orig_state_change(sk);
 		return;
 	}
-	if (test_and_set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
+	if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
 		pr_debug("Got LOGIN_FLAGS_CLOSED=1 sk_state_change conn: %p\n",
			conn);
 		write_unlock_bh(&sk->sk_callback_lock);
 		orig_state_change(sk);
 		return;
 	}
+	/*
+	 * If the TCP connection has dropped, go ahead and set LOGIN_FLAGS_CLOSED,
+	 * but only queue conn->login_work -> iscsi_target_do_login_rx()
+	 * processing if LOGIN_FLAGS_INITIAL_PDU has already been cleared.
+	 *
+	 * When iscsi_target_do_login_rx() runs, iscsi_target_sk_check_close()
+	 * will detect the dropped TCP connection from delayed workqueue context.
+	 *
+	 * If LOGIN_FLAGS_INITIAL_PDU is still set, which means the initial
+	 * iscsi_target_start_negotiation() is running, iscsi_target_do_login()
+	 * via iscsi_target_sk_check_close() or iscsi_target_start_negotiation()
+	 * via iscsi_target_sk_check_and_clear() is responsible for detecting the
+	 * dropped TCP connection in iscsi_np process context, and cleaning up
+	 * the remaining iscsi connection resources.
+	 */
+	if (state) {
+		pr_debug("iscsi_target_sk_state_change got failed state\n");
+		set_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags);
+		state = test_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
+		write_unlock_bh(&sk->sk_callback_lock);
 
-	state = iscsi_target_sk_state_check(sk);
-	write_unlock_bh(&sk->sk_callback_lock);
-
-	pr_debug("iscsi_target_sk_state_change: state: %d\n", state);
+		orig_state_change(sk);
 
-	if (!state) {
-		pr_debug("iscsi_target_sk_state_change got failed state\n");
-		schedule_delayed_work(&conn->login_cleanup_work, 0);
+		if (!state)
+			schedule_delayed_work(&conn->login_work, 0);
 		return;
 	}
+	write_unlock_bh(&sk->sk_callback_lock);
+	orig_state_change(sk);
 }
@@ -946,6 +1010,15 @@ static int iscsi_target_do_login(struct iscsi_conn *conn, struct iscsi_login *lo
 			if (iscsi_target_handle_csg_one(conn, login) < 0)
 				return -1;
 			if (login_rsp->flags & ISCSI_FLAG_LOGIN_TRANSIT) {
+				/*
+				 * Check to make sure the TCP connection has not
+				 * dropped asynchronously while session reinstatement
+				 * was occuring in this kthread context, before
+				 * transitioning to full feature phase operation.
+				 */
+				if (iscsi_target_sk_check_close(conn))
+					return -1;
+
 				login->tsih = conn->sess->tsih;
 				login->login_complete = 1;
 				iscsi_target_restore_sock_callbacks(conn);
@@ -972,21 +1045,6 @@ static int iscsi_target_do_login(struct iscsi_conn *conn, struct iscsi_login *lo
 		break;
 	}
 
-	if (conn->sock) {
-		struct sock *sk = conn->sock->sk;
-		bool state;
-
-		read_lock_bh(&sk->sk_callback_lock);
-		state = iscsi_target_sk_state_check(sk);
-		read_unlock_bh(&sk->sk_callback_lock);
-
-		if (!state) {
-			pr_debug("iscsi_target_do_login() failed state for"
-				" conn: %p\n", conn);
-			return -1;
-		}
-	}
-
 	return 0;
 }
@@ -1255,10 +1313,22 @@ int iscsi_target_start_negotiation(
 
 		write_lock_bh(&sk->sk_callback_lock);
 		set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+		set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
-
+	/*
+	 * If iscsi_target_do_login returns zero to signal more PDU
+	 * exchanges are required to complete the login, go ahead and
+	 * clear LOGIN_FLAGS_INITIAL_PDU but only if the TCP connection
+	 * is still active.
+	 *
+	 * Otherwise if TCP connection dropped asynchronously, go ahead
+	 * and perform connection cleanup now.
+	 */
 	ret = iscsi_target_do_login(conn, login);
+	if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
+		ret = -1;
+
 	if (ret < 0) {
 		cancel_delayed_work_sync(&conn->login_work);
 		cancel_delayed_work_sync(&conn->login_cleanup_work);
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index 275581d..5f17fb7 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -557,6 +557,7 @@ struct iscsi_conn {
 #define LOGIN_FLAGS_READ_ACTIVE		1
 #define LOGIN_FLAGS_CLOSED		2
 #define LOGIN_FLAGS_READY		4
+#define LOGIN_FLAGS_INITIAL_PDU		8
 	unsigned long			login_flags;
 	struct delayed_work		login_work;
 	struct delayed_work		login_cleanup_work;