diff mbox series

[mptcp-net] mptcp: fallback if MPTCP opts are dropped after 1st data

Message ID 20241003-mptcp-gh-518-v1-1-e79e1f6434ad@kernel.org (mailing list archive)
State Accepted, archived
Commit 501099bf8c1aa433d9bbe7f7377cd3f4d55cfc0e
Delegated to: Matthieu Baerts
Headers show
Series [mptcp-net] mptcp: fallback if MPTCP opts are dropped after 1st data | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 8 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf__only_bpftest_all_ success Success! ✅

Commit Message

Matthieu Baerts (NGI0) Oct. 3, 2024, 6:43 p.m. UTC
As reported by Christoph [1], before this patch, an MPTCP connection was
wrongly reset when a host received a first data packet with MPTCP
options after the 3wHS, but got the next ones without.

According to the MPTCP v1 specs [2], a fallback should happen in this
case, because the host didn't receive a DATA_ACK from the other peer,
nor receive data for more than the initial window which implies a
DATA_ACK being received by the other peer.

The patch here re-uses the same logic as the one used in other places:
by looking at allow_infinite_fallback, which is disabled at the creation
of an additional subflow. It's not looking at the first DATA_ACK (or
implying one received from the other side) as suggested by the RFC, but
it is in continuation with what was already done, which is safer, and it
fixes the reported issue. The next step, looking at this first DATA_ACK,
is tracked in [4].

This patch has been validated using the following Packetdrill script:

   0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
  +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
  +0 bind(3, ..., ...) = 0
  +0 listen(3, 1) = 0

  // 3WHS is OK
  +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
  +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
  +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
  +0 accept(3, ..., ...) = 4

  // Data from the client with valid MPTCP options (no DATA_ACK: normal)
  +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
  // From here, the MPTCP options will be dropped by a middlebox
  +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>

  +0.1 read(4, ..., 500) = 500
  +0   write(4, ..., 100) = 100

  // The server replies with data, still thinking MPTCP is being used
  +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
  // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
  +0.1 <  . 501:501(0)   ack 101 win 2048

  +0.0 < P. 501:601(100) ack 101 win 2048
  // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
  +0.0 >  . 101:101(0)   ack 601

Note that this script requires Packetdrill with MPTCP support, see [3].

Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
Link: https://github.com/multipath-tcp/packetdrill [3]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/subflow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


---
base-commit: 8567c53ad205aa6e13f8d6c2739522907da3504a
change-id: 20241003-mptcp-gh-518-b91ffc4b1fcf

Best regards,

Comments

MPTCP CI Oct. 3, 2024, 7:50 p.m. UTC | #1
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/11167965968

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/fa29a15b09a8
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=895307


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Christoph Paasch Oct. 3, 2024, 8:07 p.m. UTC | #2
+Craig from Apple

Hello,

> On Oct 3, 2024, at 11:43 AM, Matthieu Baerts (NGI0) <matttbe@kernel.org> wrote:
> 
> As reported by Christoph [1], before this patch, an MPTCP connection was
> wrongly reset when a host received a first data packet with MPTCP
> options after the 3wHS, but got the next ones without.
> 
> According to the MPTCP v1 specs [2], a fallback should happen in this
> case, because the host didn't receive a DATA_ACK from the other peer,
> nor receive data for more than the initial window which implies a
> DATA_ACK being received by the other peer.
> 
> The patch here re-uses the same logic as the one used in other places:
> by looking at allow_infinite_fallback, which is disabled at the creation
> of an additional subflow. It's not looking at the first DATA_ACK (or
> implying one received from the other side) as suggested by the RFC, but
> it is in continuation with what was already done, which is safer, and it
> fixes the reported issue. The next step, looking at this first DATA_ACK,
> is tracked in [4].
> 
> This patch has been validated using the following Packetdrill script:
> 
>   0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
>  +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>  +0 bind(3, ..., ...) = 0
>  +0 listen(3, 1) = 0
> 
>  // 3WHS is OK
>  +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
>  +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
>  +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
>  +0 accept(3, ..., ...) = 4
> 
>  // Data from the client with valid MPTCP options (no DATA_ACK: normal)
>  +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
>  // From here, the MPTCP options will be dropped by a middlebox
>  +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
> 
>  +0.1 read(4, ..., 500) = 500
>  +0   write(4, ..., 100) = 100
> 
>  // The server replies with data, still thinking MPTCP is being used
>  +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
>  // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
>  +0.1 <  . 501:501(0)   ack 101 win 2048
> 
>  +0.0 < P. 501:601(100) ack 101 win 2048
>  // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
>  +0.0 >  . 101:101(0)   ack 601
> 
> Note that this script requires Packetdrill with MPTCP support, see [3].
> 
> Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
> Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
> Link: https://github.com/multipath-tcp/packetdrill [3]
> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Cc: Davide Caratti <dcaratti@redhat.com>
> Cc: Paolo Abeni <pabeni@redhat.com>

awesome! Is this ready to be tested or does this need to go through the CI,… first ?


Christoph

> ---
> net/mptcp/subflow.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index a1e28e1d8b4e39e14bc8f98164013d302d62595c..61482f8b425b5dde17a4f3272d04973ae5ec7849 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1282,7 +1282,7 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
> 	else if (READ_ONCE(msk->csum_enabled))
> 		return !subflow->valid_csum_seen;
> 	else
> -		return !READ_ONCE(subflow->fully_established);
> +		return READ_ONCE(msk->allow_infinite_fallback);
> }
> 
> static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
> 
> ---
> base-commit: 8567c53ad205aa6e13f8d6c2739522907da3504a
> change-id: 20241003-mptcp-gh-518-b91ffc4b1fcf
> 
> Best regards,
> -- 
> Matthieu Baerts (NGI0) <matttbe@kernel.org>
>
Matthieu Baerts (NGI0) Oct. 3, 2024, 9:29 p.m. UTC | #3
Hi Christoph,

3 Oct 2024 23:14:07 Christoph Paasch <cpaasch@apple.com>:

> +Craig from Apple
>
> Hello,
>
>> On Oct 3, 2024, at 11:43 AM, Matthieu Baerts (NGI0) <matttbe@kernel.org> wrote:
>>
>> As reported by Christoph [1], before this patch, an MPTCP connection was
>> wrongly reset when a host received a first data packet with MPTCP
>> options after the 3wHS, but got the next ones without.
>>
>> According to the MPTCP v1 specs [2], a fallback should happen in this
>> case, because the host didn't receive a DATA_ACK from the other peer,
>> nor receive data for more than the initial window which implies a
>> DATA_ACK being received by the other peer.
>>
>> The patch here re-uses the same logic as the one used in other places:
>> by looking at allow_infinite_fallback, which is disabled at the creation
>> of an additional subflow. It's not looking at the first DATA_ACK (or
>> implying one received from the other side) as suggested by the RFC, but
>> it is in continuation with what was already done, which is safer, and it
>> fixes the reported issue. The next step, looking at this first DATA_ACK,
>> is tracked in [4].
>>
>> This patch has been validated using the following Packetdrill script:
>>
>>   0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
>> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>> +0 bind(3, ..., ...) = 0
>> +0 listen(3, 1) = 0
>>
>> // 3WHS is OK
>> +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
>> +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
>> +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
>> +0 accept(3, ..., ...) = 4
>>
>> // Data from the client with valid MPTCP options (no DATA_ACK: normal)
>> +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
>> // From here, the MPTCP options will be dropped by a middlebox
>> +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
>>
>> +0.1 read(4, ..., 500) = 500
>> +0   write(4, ..., 100) = 100
>>
>> // The server replies with data, still thinking MPTCP is being used
>> +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
>> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
>> +0.1 <  . 501:501(0)   ack 101 win 2048
>>
>> +0.0 < P. 501:601(100) ack 101 win 2048
>> // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
>> +0.0 >  . 101:101(0)   ack 601
>>
>> Note that this script requires Packetdrill with MPTCP support, see [3].
>>
>> Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
>> Reported-by: Christoph Paasch <cpaasch@apple.com>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
>> Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
>> Link: https://github.com/multipath-tcp/packetdrill [3]
>> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> Cc: Davide Caratti <dcaratti@redhat.com>
>> Cc: Paolo Abeni <pabeni@redhat.com>
>
> awesome! Is this ready to be tested or does this need to go through the CI,… first ?

To me, it looks ready to be tested, and reviewed :)

It already went through the CI, everything was OK there:

https://github.com/multipath-tcp/mptcp_net-next/actions/runs/11167965968

Cheers,
Matt
Christoph Paasch Oct. 4, 2024, 4:57 p.m. UTC | #4
> On Oct 3, 2024, at 2:29 PM, Matthieu Baerts <matttbe@kernel.org> wrote:
> 
> Hi Christoph,
> 
> 3 Oct 2024 23:14:07 Christoph Paasch <cpaasch@apple.com>:
> 
>> +Craig from Apple
>> 
>> Hello,
>> 
>>> On Oct 3, 2024, at 11:43 AM, Matthieu Baerts (NGI0) <matttbe@kernel.org> wrote:
>>> 
>>> As reported by Christoph [1], before this patch, an MPTCP connection was
>>> wrongly reset when a host received a first data packet with MPTCP
>>> options after the 3wHS, but got the next ones without.
>>> 
>>> According to the MPTCP v1 specs [2], a fallback should happen in this
>>> case, because the host didn't receive a DATA_ACK from the other peer,
>>> nor receive data for more than the initial window which implies a
>>> DATA_ACK being received by the other peer.
>>> 
>>> The patch here re-uses the same logic as the one used in other places:
>>> by looking at allow_infinite_fallback, which is disabled at the creation
>>> of an additional subflow. It's not looking at the first DATA_ACK (or
>>> implying one received from the other side) as suggested by the RFC, but
>>> it is in continuation with what was already done, which is safer, and it
>>> fixes the reported issue. The next step, looking at this first DATA_ACK,
>>> is tracked in [4].
>>> 
>>> This patch has been validated using the following Packetdrill script:
>>> 
>>>   0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
>>> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>> +0 bind(3, ..., ...) = 0
>>> +0 listen(3, 1) = 0
>>> 
>>> // 3WHS is OK
>>> +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
>>> +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
>>> +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
>>> +0 accept(3, ..., ...) = 4
>>> 
>>> // Data from the client with valid MPTCP options (no DATA_ACK: normal)
>>> +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
>>> // From here, the MPTCP options will be dropped by a middlebox
>>> +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
>>> 
>>> +0.1 read(4, ..., 500) = 500
>>> +0   write(4, ..., 100) = 100
>>> 
>>> // The server replies with data, still thinking MPTCP is being used
>>> +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
>>> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
>>> +0.1 <  . 501:501(0)   ack 101 win 2048
>>> 
>>> +0.0 < P. 501:601(100) ack 101 win 2048
>>> // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
>>> +0.0 >  . 101:101(0)   ack 601
>>> 
>>> Note that this script requires Packetdrill with MPTCP support, see [3].
>>> 
>>> Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
>>> Reported-by: Christoph Paasch <cpaasch@apple.com>
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
>>> Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
>>> Link: https://github.com/multipath-tcp/packetdrill [3]
>>> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>> ---
>>> Cc: Davide Caratti <dcaratti@redhat.com>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>> 
>> awesome! Is this ready to be tested or does this need to go through the CI,… first ?
> 
> To me, it looks ready to be tested, and reviewed :)
> 
> It already went through the CI, everything was OK there:
> 
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/11167965968

Thanks! Will get the testing going internally. And I will get back on syzkaller one day 
Matthieu Baerts (NGI0) Oct. 7, 2024, 6:26 p.m. UTC | #5
Hello,

On 03/10/2024 20:43, Matthieu Baerts (NGI0) wrote:
> As reported by Christoph [1], before this patch, an MPTCP connection was
> wrongly reset when a host received a first data packet with MPTCP
> options after the 3wHS, but got the next ones without.

I would like to send this patch tomorrow to netdev: we already talked
about it at the last meeting. Any objections?

Can I add any Acked-by/Reviewed-by by chance? :)

Cheers,
Matt
Matthieu Baerts (NGI0) Oct. 7, 2024, 6:32 p.m. UTC | #6
On 07/10/2024 20:26, Matthieu Baerts wrote:
> On 03/10/2024 20:43, Matthieu Baerts (NGI0) wrote:
>> As reported by Christoph [1], before this patch, an MPTCP connection was
>> wrongly reset when a host received a first data packet with MPTCP
>> options after the 3wHS, but got the next ones without.
> 
> I would like to send this patch tomorrow to netdev: we already talked
> about it at the last meeting. Any objections?
> 
> Can I add any Acked-by/Reviewed-by by chance? :)

I just added the patch in our tree, I can add the tag(s) when sending
the patch tomorrow ;)

New patches for t/upstream-net and t/upstream:
- 501099bf8c1a: mptcp: fallback if MPTCP opts are dropped after 1st data
- Results: 400d24177d84..2a3f9c146e18 (export-net)
- 78ff2df2e86f: conflict in
t/mptcp-annotate-data-races-around-subflow-fully_established
- Results: 2ce52e0f26d3..e93738912d75 (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/df4f7ff6d778e59b9f7ad5dc1d659805dc86e1ab/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/6ce058e0f7a716c81c5cacf692f285abfbd87a39/checks

Cheers,
Matt
Paolo Abeni Oct. 8, 2024, 6:55 a.m. UTC | #7
On 10/3/24 20:43, Matthieu Baerts (NGI0) wrote:
> As reported by Christoph [1], before this patch, an MPTCP connection was
> wrongly reset when a host received a first data packet with MPTCP
> options after the 3wHS, but got the next ones without.
> 
> According to the MPTCP v1 specs [2], a fallback should happen in this
> case, because the host didn't receive a DATA_ACK from the other peer,
> nor receive data for more than the initial window which implies a
> DATA_ACK being received by the other peer.
> 
> The patch here re-uses the same logic as the one used in other places:
> by looking at allow_infinite_fallback, which is disabled at the creation
> of an additional subflow. It's not looking at the first DATA_ACK (or
> implying one received from the other side) as suggested by the RFC, but
> it is in continuation with what was already done, which is safer, and it
> fixes the reported issue. The next step, looking at this first DATA_ACK,
> is tracked in [4].
> 
> This patch has been validated using the following Packetdrill script:
> 
>     0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>    +0 bind(3, ..., ...) = 0
>    +0 listen(3, 1) = 0
> 
>    // 3WHS is OK
>    +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
>    +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
>    +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
>    +0 accept(3, ..., ...) = 4
> 
>    // Data from the client with valid MPTCP options (no DATA_ACK: normal)
>    +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
>    // From here, the MPTCP options will be dropped by a middlebox
>    +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
> 
>    +0.1 read(4, ..., 500) = 500
>    +0   write(4, ..., 100) = 100
> 
>    // The server replies with data, still thinking MPTCP is being used
>    +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
>    // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
>    +0.1 <  . 501:501(0)   ack 101 win 2048
> 
>    +0.0 < P. 501:601(100) ack 101 win 2048
>    // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
>    +0.0 >  . 101:101(0)   ack 601
> 
> Note that this script requires Packetdrill with MPTCP support, see [3].
> 
> Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
> Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
> Link: https://github.com/multipath-tcp/packetdrill [3]
> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Cc: Davide Caratti <dcaratti@redhat.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> ---
>   net/mptcp/subflow.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index a1e28e1d8b4e39e14bc8f98164013d302d62595c..61482f8b425b5dde17a4f3272d04973ae5ec7849 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1282,7 +1282,7 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
>   	else if (READ_ONCE(msk->csum_enabled))
>   		return !subflow->valid_csum_seen;
>   	else
> -		return !READ_ONCE(subflow->fully_established);
> +		return READ_ONCE(msk->allow_infinite_fallback);
>   }
>   
>   static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)

Reviewd-by: Paolo Abeni <pabeni@redhat.com>

Note that when checksum is enabled the condition in use is much more 
restrictive. That should be ok in practice, as hopefully no-one uses csum ;)

/P
Matthieu Baerts (NGI0) Oct. 8, 2024, 8:40 a.m. UTC | #8
Hi Paolo,

On 08/10/2024 08:55, Paolo Abeni wrote:
> On 10/3/24 20:43, Matthieu Baerts (NGI0) wrote:
>> As reported by Christoph [1], before this patch, an MPTCP connection was
>> wrongly reset when a host received a first data packet with MPTCP
>> options after the 3wHS, but got the next ones without.
>>
>> According to the MPTCP v1 specs [2], a fallback should happen in this
>> case, because the host didn't receive a DATA_ACK from the other peer,
>> nor receive data for more than the initial window which implies a
>> DATA_ACK being received by the other peer.
>>
>> The patch here re-uses the same logic as the one used in other places:
>> by looking at allow_infinite_fallback, which is disabled at the creation
>> of an additional subflow. It's not looking at the first DATA_ACK (or
>> implying one received from the other side) as suggested by the RFC, but
>> it is in continuation with what was already done, which is safer, and it
>> fixes the reported issue. The next step, looking at this first DATA_ACK,
>> is tracked in [4].
>>
>> This patch has been validated using the following Packetdrill script:
>>
>>     0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
>>    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>    +0 bind(3, ..., ...) = 0
>>    +0 listen(3, 1) = 0
>>
>>    // 3WHS is OK
>>    +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop,
>> wscale 6, mpcapable v1 flags[flag_h] nokey>
>>    +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop,
>> wscale 8, mpcapable v1 flags[flag_h] key[skey]>
>>    +0.1 <  . 1:1(0) ack 1 win
>> 2048                                              <mpcapable v1
>> flags[flag_h] key[ckey=2, skey]>
>>    +0 accept(3, ..., ...) = 4
>>
>>    // Data from the client with valid MPTCP options (no DATA_ACK: normal)
>>    +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h]
>> key[skey, ckey] mpcdatalen 500, nop, nop>
>>    // From here, the MPTCP options will be dropped by a middlebox
>>    +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
>>
>>    +0.1 read(4, ..., 500) = 500
>>    +0   write(4, ..., 100) = 100
>>
>>    // The server replies with data, still thinking MPTCP is being used
>>    +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1
>> dll=100 nocs, nop, nop>
>>    // But the client already did a fallback to TCP, because the two
>> previous packets have been received without MPTCP options
>>    +0.1 <  . 501:501(0)   ack 101 win 2048
>>
>>    +0.0 < P. 501:601(100) ack 101 win 2048
>>    // The server should fallback to TCP, not reset: it didn't get a
>> DATA_ACK, nor data for more than the initial window
>>    +0.0 >  . 101:101(0)   ack 601
>>
>> Note that this script requires Packetdrill with MPTCP support, see [3].
>>
>> Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on
>> mapping errors")
>> Reported-by: Christoph Paasch <cpaasch@apple.com>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
>> Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
>> Link: https://github.com/multipath-tcp/packetdrill [3]
>> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> Cc: Davide Caratti <dcaratti@redhat.com>
>> Cc: Paolo Abeni <pabeni@redhat.com>
>> ---
>>   net/mptcp/subflow.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>> index
>> a1e28e1d8b4e39e14bc8f98164013d302d62595c..61482f8b425b5dde17a4f3272d04973ae5ec7849 100644
>> --- a/net/mptcp/subflow.c
>> +++ b/net/mptcp/subflow.c
>> @@ -1282,7 +1282,7 @@ static bool subflow_can_fallback(struct
>> mptcp_subflow_context *subflow)
>>       else if (READ_ONCE(msk->csum_enabled))
>>           return !subflow->valid_csum_seen;
>>       else
>> -        return !READ_ONCE(subflow->fully_established);
>> +        return READ_ONCE(msk->allow_infinite_fallback);
>>   }
>>     static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock
>> *ssk)
> 
> Reviewd-by: Paolo Abeni <pabeni@redhat.com>

Thank you for the review!

> Note that when checksum is enabled the condition in use is much more
> restrictive. That should be ok in practice, as hopefully no-one uses
> csum ;)

Good point! Probably something to improve there.

Cheers,
Matt
diff mbox series

Patch

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a1e28e1d8b4e39e14bc8f98164013d302d62595c..61482f8b425b5dde17a4f3272d04973ae5ec7849 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1282,7 +1282,7 @@  static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
 	else if (READ_ONCE(msk->csum_enabled))
 		return !subflow->valid_csum_seen;
 	else
-		return !READ_ONCE(subflow->fully_established);
+		return READ_ONCE(msk->allow_infinite_fallback);
 }
 
 static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)