diff mbox series

[v3,mptcp-next,1/3] mptcp: propagate fastclose error.

Message ID 5e26963328ae04aa375089a713ec40f5eb6adacd.1662051551.git.pabeni@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Matthieu Baerts
Headers show
Series [v3,mptcp-next,1/3] mptcp: propagate fastclose error. | expand

Commit Message

Paolo Abeni Sept. 1, 2022, 5:31 p.m. UTC
When an mptcp socket is closed due to an incoming FASTCLOSE
option, so specific sk_err is set and later syscall will
fail usually with EPIPE.

Align the current fastclose error handling with TCP reset,
properly setting the socket error according to the current
msk state and propagating such error.

Additionally sendmsg() is currently not handling properly
the sk_err, always returning EPIPE.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

Comments

Matthieu Baerts Sept. 7, 2022, 5:22 p.m. UTC | #1
Hi Paolo,

On 01/09/2022 19:31, Paolo Abeni wrote:
> When an mptcp socket is closed due to an incoming FASTCLOSE
> option, so specific sk_err is set and later syscall will
> fail usually with EPIPE.
> 
> Align the current fastclose error handling with TCP reset,
> properly setting the socket error according to the current
> msk state and propagating such error.

Thank you for the patch and sorry for the delay!

I have some questions here below:

> Additionally sendmsg() is currently not handling properly
> the sk_err, always returning EPIPE.
Could it be seen as a bug-fix and deserving a dedicated commit?

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index f4891db86217..b04f184695e4 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
>  		ret = sk_stream_wait_connect(sk, &timeo);
>  		if (ret)
> -			goto out;
> +			goto do_error;
>  	}
>  
> +	ret = -EPIPE;

Regarding this line above... (see below)

> +	if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)))
> +		goto do_error;
> +
>  	pfrag = sk_page_frag(sk);
>  
>  	while (msg_data_left(msg)) {

...Could we eventually have no data left and no going through the while
loop?

In this case "copied" will be 0 and we will return the new -EPIPE, maybe
not what we want, no?
Or should we change 'ret' value just before calling 'goto do_error' to
avoid that?

(...)

> @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
>  		unlock_sock_fast(tcp_sk, slow);
>  	}
>  
> +	/* Mirror the tcp_reset() error propagation */
> +	switch (sk->sk_state) {
> +	case TCP_SYN_SENT:
> +		sk->sk_err = ECONNREFUSED;
> +		break;
> +	case TCP_CLOSE_WAIT:
> +		sk->sk_err = EPIPE;
> +		break;
> +	case TCP_CLOSE:
> +		return;
> +	default:
> +		sk->sk_err = ECONNRESET;
> +	}

Should we eventually move this out of tcp_reset(), export it (inline
function?) and re-using it here to avoid the duplication (and eventually
being out-of-sync in case of changes on TCP side? even if I guess this
would not change but well)

> +
>  	inet_sk_state_store(sk, TCP_CLOSE);
>  	sk->sk_shutdown = SHUTDOWN_MASK;
>  	smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
>  	set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags);
>  
> -	mptcp_close_wake_up(sk);
> +	/* the calling mptcp_worker will properly destroy the socket */
> +	if (sock_flag(sk, SOCK_DEAD))
> +		return;
> +
> +	sk->sk_state_change(sk);
> +	sk_error_report(sk);
>  }
>  
>  static void __mptcp_retrans(struct sock *sk)

Cheers,
Matt
Paolo Abeni Sept. 8, 2022, 11:48 a.m. UTC | #2
On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 01/09/2022 19:31, Paolo Abeni wrote:
> > When an mptcp socket is closed due to an incoming FASTCLOSE
> > option, so specific sk_err is set and later syscall will
> > fail usually with EPIPE.
> > 
> > Align the current fastclose error handling with TCP reset,
> > properly setting the socket error according to the current
> > msk state and propagating such error.
> 
> Thank you for the patch and sorry for the delay!

Thank you for the review!
> 
> I have some questions here below:
> 
> > Additionally sendmsg() is currently not handling properly
> > the sk_err, always returning EPIPE.
> Could it be seen as a bug-fix and deserving a dedicated commit?

I personally see this as an improvement, as AFAIK there is no mandated
by RFC return error code for MPTCP.
> 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 35 insertions(+), 10 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index f4891db86217..b04f184695e4 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> >  	if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
> >  		ret = sk_stream_wait_connect(sk, &timeo);
> >  		if (ret)
> > -			goto out;
> > +			goto do_error;
> >  	}
> >  
> > +	ret = -EPIPE;
> 
> Regarding this line above... (see below)
> 
> > +	if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)))
> > +		goto do_error;
> > +
> >  	pfrag = sk_page_frag(sk);
> >  
> >  	while (msg_data_left(msg)) {
> 
> ...Could we eventually have no data left and no going through the while
> loop?

yes...

> In this case "copied" will be 0 and we will return the new -EPIPE, maybe
> not what we want, no?

no;) we now always return 'copied'. We set 'copied' to the correct
error code only in the error path, saving a conditional in the fast
path.

All gracefully stolen from plain TCP ;)

> Or should we change 'ret' value just before calling 'goto do_error' to
> avoid that?
> 
> (...)
> 
> > @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
> >  		unlock_sock_fast(tcp_sk, slow);
> >  	}
> >  
> > +	/* Mirror the tcp_reset() error propagation */
> > +	switch (sk->sk_state) {
> > +	case TCP_SYN_SENT:
> > +		sk->sk_err = ECONNREFUSED;
> > +		break;
> > +	case TCP_CLOSE_WAIT:
> > +		sk->sk_err = EPIPE;
> > +		break;
> > +	case TCP_CLOSE:
> > +		return;
> > +	default:
> > +		sk->sk_err = ECONNRESET;
> > +	}
> 
> Should we eventually move this out of tcp_reset(), export it (inline
> function?) and re-using it here to avoid the duplication (and eventually
> being out-of-sync in case of changes on TCP side? even if I guess this
> would not change but well)

That could be a possible follow-up I guess, but I would refrain from
touching TCP to address issues/295.


Thanks!

Paolo
Matthieu Baerts Sept. 8, 2022, 12:13 p.m. UTC | #3
Hi Paolo,

Thank you for your replies!

On 08/09/2022 13:48, Paolo Abeni wrote:
> On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote:
>> On 01/09/2022 19:31, Paolo Abeni wrote:

(...)

>>> Additionally sendmsg() is currently not handling properly
>>> the sk_err, always returning EPIPE.
>> Could it be seen as a bug-fix and deserving a dedicated commit?
> 
> I personally see this as an improvement, as AFAIK there is no mandated
> by RFC return error code for MPTCP.

Mmh yes I see, we can say that.

>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>>  net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++----------
>>>  1 file changed, 35 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index f4891db86217..b04f184695e4 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>>  	if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
>>>  		ret = sk_stream_wait_connect(sk, &timeo);
>>>  		if (ret)
>>> -			goto out;
>>> +			goto do_error;
>>>  	}
>>>  
>>> +	ret = -EPIPE;
>>
>> Regarding this line above... (see below)
>>
>>> +	if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)))
>>> +		goto do_error;
>>> +
>>>  	pfrag = sk_page_frag(sk);
>>>  
>>>  	while (msg_data_left(msg)) {
>>
>> ...Could we eventually have no data left and no going through the while
>> loop?
> 
> yes...
> 
>> In this case "copied" will be 0 and we will return the new -EPIPE, maybe
>> not what we want, no?
> 
> no;) we now always return 'copied'. We set 'copied' to the correct
> error code only in the error path, saving a conditional in the fast
> path.

My bad, for the end of the function, I was looking at code prior this
patch... Indeed, all good with the rest of the modifications :)

> All gracefully stolen from plain TCP ;)

:)

> 
>> Or should we change 'ret' value just before calling 'goto do_error' to
>> avoid that?
>>
>> (...)
>>
>>> @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
>>>  		unlock_sock_fast(tcp_sk, slow);
>>>  	}
>>>  
>>> +	/* Mirror the tcp_reset() error propagation */
>>> +	switch (sk->sk_state) {
>>> +	case TCP_SYN_SENT:
>>> +		sk->sk_err = ECONNREFUSED;
>>> +		break;
>>> +	case TCP_CLOSE_WAIT:
>>> +		sk->sk_err = EPIPE;
>>> +		break;
>>> +	case TCP_CLOSE:
>>> +		return;
>>> +	default:
>>> +		sk->sk_err = ECONNRESET;
>>> +	}
>>
>> Should we eventually move this out of tcp_reset(), export it (inline
>> function?) and re-using it here to avoid the duplication (and eventually
>> being out-of-sync in case of changes on TCP side? even if I guess this
>> would not change but well)
> 
> That could be a possible follow-up I guess, but I would refrain from
> touching TCP to address issues/295.

Sure, we can do that later as a clean-up.


Patch 1/3 and 2/3 from this v3 are then good to me:

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>

(I'm not going to apply them now as we need patch 3/3 and I had some
questions there + packetdrill adaptations)


Cheers,
Matt
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f4891db86217..b04f184695e4 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1685,9 +1685,13 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
 		ret = sk_stream_wait_connect(sk, &timeo);
 		if (ret)
-			goto out;
+			goto do_error;
 	}
 
+	ret = -EPIPE;
+	if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)))
+		goto do_error;
+
 	pfrag = sk_page_frag(sk);
 
 	while (msg_data_left(msg)) {
@@ -1696,11 +1700,6 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		bool dfrag_collapsed;
 		size_t psize, offset;
 
-		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) {
-			ret = -EPIPE;
-			goto out;
-		}
-
 		/* reuse tail pfrag, if possible, or carve a new one from the
 		 * page allocator
 		 */
@@ -1732,7 +1731,7 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		if (copy_page_from_iter(dfrag->page, offset, psize,
 					&msg->msg_iter) != psize) {
 			ret = -EFAULT;
-			goto out;
+			goto do_error;
 		}
 
 		/* data successfully copied into the write queue */
@@ -1764,7 +1763,7 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		__mptcp_push_pending(sk, msg->msg_flags);
 		ret = sk_stream_wait_memory(sk, &timeo);
 		if (ret)
-			goto out;
+			goto do_error;
 	}
 
 	if (copied)
@@ -1772,7 +1771,14 @@  static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 out:
 	release_sock(sk);
-	return copied ? : ret;
+	return copied;
+
+do_error:
+	if (copied)
+		goto out;
+
+	copied = sk_stream_error(sk, msg->msg_flags, ret);
+	goto out;
 }
 
 static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
@@ -2404,12 +2410,31 @@  static void mptcp_check_fastclose(struct mptcp_sock *msk)
 		unlock_sock_fast(tcp_sk, slow);
 	}
 
+	/* Mirror the tcp_reset() error propagation */
+	switch (sk->sk_state) {
+	case TCP_SYN_SENT:
+		sk->sk_err = ECONNREFUSED;
+		break;
+	case TCP_CLOSE_WAIT:
+		sk->sk_err = EPIPE;
+		break;
+	case TCP_CLOSE:
+		return;
+	default:
+		sk->sk_err = ECONNRESET;
+	}
+
 	inet_sk_state_store(sk, TCP_CLOSE);
 	sk->sk_shutdown = SHUTDOWN_MASK;
 	smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
 	set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags);
 
-	mptcp_close_wake_up(sk);
+	/* the calling mptcp_worker will properly destroy the socket */
+	if (sock_flag(sk, SOCK_DEAD))
+		return;
+
+	sk->sk_state_change(sk);
+	sk_error_report(sk);
 }
 
 static void __mptcp_retrans(struct sock *sk)