Message ID | 20180119133629.14812-1-fw@strlen.de (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote: > Mike Christie reports: > Starting in 4.14 iscsi logins will fail around 50% of the time. > > Problem appears to be that iscsi_target_sk_data_ready() callback may > return without doing anything in case it finds the login work queue > is still blocked in sock_recvmsg(). > > Nicholas Bellinger says: > It would indicate users providing their own ->sk_data_ready() callback > must be responsible for waking up a kthread context blocked on > sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is > received before the first sock_recvmsg(..., MSG_WAITALL) completes. > > So, do this and invoke the original data_ready() callback -- in > case of tcp sockets this takes care of waking the thread. > > Disclaimer: I do not understand why this problem did not show up before > tcp prequeue removal. > > Reported-by: Mike Christie <mchristi@redhat.com> > Bisected-by: Mike Christie <mchristi@redhat.com> > Tested-by: Mike Christie <mchristi@redhat.com> > Diagnosed-by: Nicholas Bellinger <nab@linux-iscsi.org> > Fixes: e7942d0633c4 ("tcp: remove prequeue support") > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > drivers/target/iscsi/iscsi_target_nego.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c > index b686e2ce9c0e..3723f8f419aa 100644 > --- a/drivers/target/iscsi/iscsi_target_nego.c > +++ b/drivers/target/iscsi/iscsi_target_nego.c > @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk) > if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) { > write_unlock_bh(&sk->sk_callback_lock); > pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn); > + if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready)) > + return; Is this WARN_ON() belonging to this fix ? At least make it WARN_ON_ONCE() or pr_err_once() > + conn->orig_data_ready(sk); > return; > } >
Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote: > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c > > index b686e2ce9c0e..3723f8f419aa 100644 > > --- a/drivers/target/iscsi/iscsi_target_nego.c > > +++ b/drivers/target/iscsi/iscsi_target_nego.c > > @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk) > > if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) { > > write_unlock_bh(&sk->sk_callback_lock); > > pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn); > > + if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready)) > > + return; > > Is this WARN_ON() belonging to this fix ? > At least make it WARN_ON_ONCE() or pr_err_once() Nicholas, I don't know this code at all so it would be good if you could give advice here (omit all together, WARN_ON_ONCE, ...). Thanks!
Hey Florian & Co, On Fri, 2018-01-19 at 18:26 +0100, Florian Westphal wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote: > > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c > > > index b686e2ce9c0e..3723f8f419aa 100644 > > > --- a/drivers/target/iscsi/iscsi_target_nego.c > > > +++ b/drivers/target/iscsi/iscsi_target_nego.c > > > @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk) > > > if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) { > > > write_unlock_bh(&sk->sk_callback_lock); > > > pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn); > > > + if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready)) > > > + return; > > > > Is this WARN_ON() belonging to this fix ? > > At least make it WARN_ON_ONCE() or pr_err_once() > > Nicholas, I don't know this code at all so it would be good if you could > give advice here (omit all together, WARN_ON_ONCE, ...). > This is regular behavior during multi PDU login sequences, and should not include a WARN_ON. So with MNC's Tested-by in place, applying to target-pending/for-next minus the WARN_ON, with a extra 4.14.y stable tag. Thanks again for taking a look at this. To your earlier point wrt net.ipv4.tcp_low_latency=1 on 4.13 code not triggering pre-queue logic. From groking the original patch to drop prequeue I agree this should really be the case, but am still at a loss how MNC is triggering on 4.14+ unless something else has changed to uncover this iscsi-target bug. Still curious to verify the root cause, but I haven't been able to reproduce this in VMs on small scale, and haven't had cycles to reproduce on HW yet. That said, since the bug appears to be masked on <= 4.13.y + tcp_low_latency=1, unless someone can reproduce this on earlier code with tcp_low_latency=0, I'll leave off the older stable tag for now.
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c index b686e2ce9c0e..3723f8f419aa 100644 --- a/drivers/target/iscsi/iscsi_target_nego.c +++ b/drivers/target/iscsi/iscsi_target_nego.c @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk) if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) { write_unlock_bh(&sk->sk_callback_lock); pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn); + if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready)) + return; + conn->orig_data_ready(sk); return; }