diff mbox series

[mptcp-next,2/2] mptcp: re-set push-pending bit on retransmit failure

Message ID 20210906060614.25217-3-fw@strlen.de (mailing list archive)
State Superseded, archived
Delegated to: Paolo Abeni
Headers show
Series Fix mptcp connection hangs after link failover | expand

Commit Message

Florian Westphal Sept. 6, 2021, 6:06 a.m. UTC
The retransmit head will be NULL in case there is no in-flight data
(meaning all data injected into network has been acked).

In that case the retransmit timer is stopped.

This is only correct if there is no more pending, not-yet-sent data.

If there is, the retransmit timer needs to set the PENDING bit again so
that mptcp tries to send the remaining (new) data once a subflow can accept
more data.

Also, mptcp_subflow_get_retrans() has to be called unconditionally.
This function checks for subflows that have become unresponsive and marks
them as stale, so in the case where the rtx queue is empty, subflows
will never be marked stale which prevents available backup subflows from
becoming eligible for transmit.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Comments

Paolo Abeni Sept. 6, 2021, 7:43 a.m. UTC | #1
On Mon, 2021-09-06 at 08:06 +0200, Florian Westphal wrote:
> The retransmit head will be NULL in case there is no in-flight data
> (meaning all data injected into network has been acked).
> 
> In that case the retransmit timer is stopped.
> 
> This is only correct if there is no more pending, not-yet-sent data.
> 
> If there is, the retransmit timer needs to set the PENDING bit again so
> that mptcp tries to send the remaining (new) data once a subflow can accept
> more data.
> 
> Also, mptcp_subflow_get_retrans() has to be called unconditionally.
> This function checks for subflows that have become unresponsive and marks
> them as stale, so in the case where the rtx queue is empty, subflows
> will never be marked stale which prevents available backup subflows from
> becoming eligible for transmit.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Looks great, but I'm struggling to follow the code in a couple of
points, see below...

> ---
>  net/mptcp/protocol.c | 28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index c0e0ee4cb24f..b88a9b61025b 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1105,7 +1105,8 @@ static void __mptcp_clean_una(struct sock *sk)
>  	if (cleaned && tcp_under_memory_pressure(sk))
>  		__mptcp_mem_reclaim_partial(sk);
>  
> -	if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
> +	if (snd_una == READ_ONCE(msk->snd_nxt) &&
> +	    snd_una == READ_ONCE(msk->write_seq)) {
>  		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
>  			mptcp_stop_timer(sk);

@Mat: I'm wild guessing the above could possibly address also
issues/230. Could you please give it a spin in the CI? How frequently
do you observe the mentioned issue in the CI?

> } else {
> @@ -1547,6 +1548,13 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
>  		msk->snd_nxt = snd_nxt_new;
>  }
>  
> +static void mptcp_check_and_set_pending(struct sock *sk)
> +{
> +	if (mptcp_send_head(sk) &&
> +	    !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
> +		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
> +}
> +
>  void __mptcp_push_pending(struct sock *sk, unsigned int flags)
>  {
>  	struct sock *prev_ssk = NULL, *ssk = NULL;
> @@ -1603,6 +1611,13 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
>  		mptcp_push_release(sk, ssk, &info);
>  
>  out:
> +	/* Set PENDING again in case we found an ssk
> +	 * that could not accept more data
> +	 */
> +	if (unlikely(copied == 0) &&
> +	    READ_ONCE(msk->snd_una) == msk->snd_nxt && ssk)
> +		mptcp_check_and_set_pending(sk);

@Florian: Why we need to check for ssk != NULL?
Do we need something similar for __mptcp_subflow_push_pending(), too? 
I'm wondering if this will cause a sort of 'spinning'/busy loop on
above condition inside mptcp_release_cb(). Could we just ensure that
the rtx timer is harmed instead? We likely have to wait a bit for the
subflow to become ready, I think.

Cheers,

Paolo
Florian Westphal Sept. 6, 2021, 7:55 a.m. UTC | #2
Paolo Abeni <pabeni@redhat.com> wrote:
> > +	/* Set PENDING again in case we found an ssk
> > +	 * that could not accept more data
> > +	 */
> > +	if (unlikely(copied == 0) &&
> > +	    READ_ONCE(msk->snd_una) == msk->snd_nxt && ssk)
> > +		mptcp_check_and_set_pending(sk);
> 
> @Florian: Why we need to check for ssk != NULL?

Because I don't think it makes sense to re-try ASAP if we could
not find a ssk, and instead defer until next firing of rtx timer.

> Do we need something similar for __mptcp_subflow_push_pending(), too?

I don't think so, its thats not invoked from the mptcp-release cb; also:

> I'm wondering if this will cause a sort of 'spinning'/busy loop on
> above condition inside mptcp_release_cb().

Hmm, right, that might indeed happen.
It should be sae to remove the above clause and unconditioally wait
for the reworked rtx timer to assert pending bit again.
Paolo Abeni Sept. 6, 2021, 8:38 a.m. UTC | #3
On Mon, 2021-09-06 at 09:55 +0200, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > > +	/* Set PENDING again in case we found an ssk
> > > +	 * that could not accept more data
> > > +	 */
> > > +	if (unlikely(copied == 0) &&
> > > +	    READ_ONCE(msk->snd_una) == msk->snd_nxt && ssk)
> > > +		mptcp_check_and_set_pending(sk);
> > 
> > @Florian: Why we need to check for ssk != NULL?
> 
> Because I don't think it makes sense to re-try ASAP if we could
> not find a ssk, and instead defer until next firing of rtx timer.

Ok
> 
> > Do we need something similar for __mptcp_subflow_push_pending(), too?
> 
> I don't think so, its thats not invoked from the mptcp-release cb; 

yup, right. To be clear, with "something similar" I also referred to
ensuring the rtx timer is armed.

> also:
> 
> > I'm wondering if this will cause a sort of 'spinning'/busy loop on
> > above condition inside mptcp_release_cb().
> 
> Hmm, right, that might indeed happen.
> It should be sae to remove the above clause and unconditioally wait
> for the reworked rtx timer to assert pending bit again.

+1 :) (assuming s/sae/safe/ :)

Thanks!

Paolo
Florian Westphal Sept. 6, 2021, 11:35 a.m. UTC | #4
Paolo Abeni <pabeni@redhat.com> wrote:
> On Mon, 2021-09-06 at 09:55 +0200, Florian Westphal wrote:
> > Because I don't think it makes sense to re-try ASAP if we could
> > not find a ssk, and instead defer until next firing of rtx timer.
> 
> Ok
> > 
> > > Do we need something similar for __mptcp_subflow_push_pending(), too?
> > 
> > I don't think so, its thats not invoked from the mptcp-release cb; 
> 
> yup, right. To be clear, with "something similar" I also referred to
> ensuring the rtx timer is armed.

__mptcp_subflow_push_pending() is only called when we already have
pending xmit, i.e. the timer is already running and with the change in
this patch the timer is guaranteed to re-arm itself unless there is
nothing to transmit anymore.

__mptcp_push_pending is different because that is called when new data
is passed in via sendmsg, i.e. timer might not be running (yet).

Makes sense to you?

> > > I'm wondering if this will cause a sort of 'spinning'/busy loop on
> > > above condition inside mptcp_release_cb().
> > 
> > Hmm, right, that might indeed happen.
> > It should be sae to remove the above clause and unconditioally wait
> > for the reworked rtx timer to assert pending bit again.
> 
> +1 :) (assuming s/sae/safe/ :)

Yep, safe.  Test has been running for a few hours now with no failures,
will send v2 shortly.
Paolo Abeni Sept. 6, 2021, 1:14 p.m. UTC | #5
On Mon, 2021-09-06 at 13:35 +0200, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > On Mon, 2021-09-06 at 09:55 +0200, Florian Westphal wrote:
> > > Because I don't think it makes sense to re-try ASAP if we could
> > > not find a ssk, and instead defer until next firing of rtx timer.
> > 
> > Ok
> > > > Do we need something similar for __mptcp_subflow_push_pending(), too?
> > > 
> > > I don't think so, its thats not invoked from the mptcp-release cb; 
> > 
> > yup, right. To be clear, with "something similar" I also referred to
> > ensuring the rtx timer is armed.
> 
> __mptcp_subflow_push_pending() is only called when we already have
> pending xmit, i.e. the timer is already running and with the change in
> this patch the timer is guaranteed to re-arm itself unless there is
> nothing to transmit anymore.
> 
> __mptcp_push_pending is different because that is called when new data
> is passed in via sendmsg, i.e. timer might not be running (yet).
> 
> Makes sense to you?

Yes, thanks for the explaination!

> > > > I'm wondering if this will cause a sort of 'spinning'/busy loop on
> > > > above condition inside mptcp_release_cb().
> > > 
> > > Hmm, right, that might indeed happen.
> > > It should be sae to remove the above clause and unconditioally wait
> > > for the reworked rtx timer to assert pending bit again.
> > 
> > +1 :) (assuming s/sae/safe/ :)
> 
> Yep, safe.  Test has been running for a few hours now with no failures,
> will send v2 shortly.

Excellent!

Cheeers,

Paolo
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c0e0ee4cb24f..b88a9b61025b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1105,7 +1105,8 @@  static void __mptcp_clean_una(struct sock *sk)
 	if (cleaned && tcp_under_memory_pressure(sk))
 		__mptcp_mem_reclaim_partial(sk);
 
-	if (snd_una == READ_ONCE(msk->snd_nxt) && !msk->recovery) {
+	if (snd_una == READ_ONCE(msk->snd_nxt) &&
+	    snd_una == READ_ONCE(msk->write_seq)) {
 		if (mptcp_timer_pending(sk) && !mptcp_data_fin_enabled(msk))
 			mptcp_stop_timer(sk);
 	} else {
@@ -1547,6 +1548,13 @@  static void mptcp_update_post_push(struct mptcp_sock *msk,
 		msk->snd_nxt = snd_nxt_new;
 }
 
+static void mptcp_check_and_set_pending(struct sock *sk)
+{
+	if (mptcp_send_head(sk) &&
+	    !test_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags))
+		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+}
+
 void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 {
 	struct sock *prev_ssk = NULL, *ssk = NULL;
@@ -1603,6 +1611,13 @@  void __mptcp_push_pending(struct sock *sk, unsigned int flags)
 		mptcp_push_release(sk, ssk, &info);
 
 out:
+	/* Set PENDING again in case we found an ssk
+	 * that could not accept more data
+	 */
+	if (unlikely(copied == 0) &&
+	    READ_ONCE(msk->snd_una) == msk->snd_nxt && ssk)
+		mptcp_check_and_set_pending(sk);
+
 	/* ensure the rtx timer is running */
 	if (!mptcp_timer_pending(sk))
 		mptcp_reset_timer(sk);
@@ -2414,6 +2429,9 @@  static void __mptcp_retrans(struct sock *sk)
 	int ret;
 
 	mptcp_clean_una_wakeup(sk);
+
+	/* first check ssk: need to kick "stale" logic */
+	ssk = mptcp_subflow_get_retrans(msk);
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag) {
 		if (mptcp_data_fin_enabled(msk)) {
@@ -2426,10 +2444,12 @@  static void __mptcp_retrans(struct sock *sk)
 			goto reset_timer;
 		}
 
-		return;
+		if (!mptcp_send_head(sk))
+			return;
+
+		goto reset_timer;
 	}
 
-	ssk = mptcp_subflow_get_retrans(msk);
 	if (!ssk)
 		goto reset_timer;
 
@@ -2456,6 +2476,8 @@  static void __mptcp_retrans(struct sock *sk)
 	release_sock(ssk);
 
 reset_timer:
+	mptcp_check_and_set_pending(sk);
+
 	if (!mptcp_timer_pending(sk))
 		mptcp_reset_timer(sk);
 }