diff mbox series

[net,v2] tcp: reset tp->sacked_out when sack is enabled

Message ID 20221027114930.137735-1-luwei32@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] tcp: reset tp->sacked_out when sack is enabled | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lu Wei Oct. 27, 2022, 11:49 a.m. UTC
If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
of TCPOPT_SACK_PERM is called to enable sack after data is sent
and before data is acked, it will trigger a warning in function
tcp_verify_left_out() as follows:

============================================
WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
tcp_timeout_mark_lost+0x154/0x160
tcp_enter_loss+0x2b/0x290
tcp_retransmit_timer+0x50b/0x640
tcp_write_timer_handler+0x1c8/0x340
tcp_write_timer+0xe5/0x140
call_timer_fn+0x3a/0x1b0
__run_timers.part.0+0x1bf/0x2d0
run_timer_softirq+0x43/0xb0
__do_softirq+0xfd/0x373
__irq_exit_rcu+0xf6/0x140

This warning occurs in several steps:
Step1. If sack is not enabled, when server receives dup-ack,
       it calls tcp_add_reno_sack() to increase tp->sacked_out.

Step2. Setsockopt() is called to enable sack

Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
       to increase tp->lost_out but not clear tp->sacked_out because
       sack is enabled and tcp_is_reno() is false.

So tp->left_out is increased repeatly in Step1 and Step3 and it is
greater than tp->packets_out and trigger the warning. In function
tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
happen and the warning will not be triggered. As suggested by Denis
and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already
sent.

socket-tcp tests in CRIU has been tested as follows:
$ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
       --ignore-taint

socket-tcp* represent all socket-tcp tests in test/zdtm/static/.

Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
Signed-off-by: Lu Wei <luwei32@huawei.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lu Wei Oct. 27, 2022, 11:04 a.m. UTC | #1
please ignore this patch, I forgot to change the title

在 2022/10/27 7:49 PM, Lu Wei 写道:
> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
> of TCPOPT_SACK_PERM is called to enable sack after data is sent
> and before data is acked, it will trigger a warning in function
> tcp_verify_left_out() as follows:
>
> ============================================
> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
> tcp_timeout_mark_lost+0x154/0x160
> tcp_enter_loss+0x2b/0x290
> tcp_retransmit_timer+0x50b/0x640
> tcp_write_timer_handler+0x1c8/0x340
> tcp_write_timer+0xe5/0x140
> call_timer_fn+0x3a/0x1b0
> __run_timers.part.0+0x1bf/0x2d0
> run_timer_softirq+0x43/0xb0
> __do_softirq+0xfd/0x373
> __irq_exit_rcu+0xf6/0x140
>
> This warning occurs in several steps:
> Step1. If sack is not enabled, when server receives dup-ack,
>         it calls tcp_add_reno_sack() to increase tp->sacked_out.
>
> Step2. Setsockopt() is called to enable sack
>
> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
>         to increase tp->lost_out but not clear tp->sacked_out because
>         sack is enabled and tcp_is_reno() is false.
>
> So tp->left_out is increased repeatly in Step1 and Step3 and it is
> greater than tp->packets_out and trigger the warning. In function
> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
> happen and the warning will not be triggered. As suggested by Denis
> and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already
> sent.
>
> socket-tcp tests in CRIU has been tested as follows:
> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>         --ignore-taint
>
> socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
>
> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
> Signed-off-by: Lu Wei <luwei32@huawei.com>
> ---
>   net/ipv4/tcp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index ef14efa1fb70..ef876e70f7fe 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>   	case TCP_REPAIR_OPTIONS:
>   		if (!tp->repair)
>   			err = -EINVAL;
> -		else if (sk->sk_state == TCP_ESTABLISHED)
> +		else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out)
>   			err = tcp_repair_options_est(sk, optval, optlen);
>   		else
>   			err = -EPERM;
Eric Dumazet Oct. 27, 2022, 11:08 a.m. UTC | #2
On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote:
>
> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
> of TCPOPT_SACK_PERM is called to enable sack after data is sent
> and before data is acked, it will trigger a warning in function
> tcp_verify_left_out() as follows:
>
> ============================================
> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
> tcp_timeout_mark_lost+0x154/0x160
> tcp_enter_loss+0x2b/0x290
> tcp_retransmit_timer+0x50b/0x640
> tcp_write_timer_handler+0x1c8/0x340
> tcp_write_timer+0xe5/0x140
> call_timer_fn+0x3a/0x1b0
> __run_timers.part.0+0x1bf/0x2d0
> run_timer_softirq+0x43/0xb0
> __do_softirq+0xfd/0x373
> __irq_exit_rcu+0xf6/0x140
>
> This warning occurs in several steps:
> Step1. If sack is not enabled, when server receives dup-ack,
>        it calls tcp_add_reno_sack() to increase tp->sacked_out.
>
> Step2. Setsockopt() is called to enable sack
>
> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
>        to increase tp->lost_out but not clear tp->sacked_out because
>        sack is enabled and tcp_is_reno() is false.
>
> So tp->left_out is increased repeatly in Step1 and Step3 and it is
> greater than tp->packets_out and trigger the warning. In function
> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
> happen and the warning will not be triggered. As suggested by Denis
> and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already
> sent.
>
> socket-tcp tests in CRIU has been tested as follows:
> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>        --ignore-taint
>
> socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
>
> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
> Signed-off-by: Lu Wei <luwei32@huawei.com>
> ---
>  net/ipv4/tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index ef14efa1fb70..ef876e70f7fe 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>         case TCP_REPAIR_OPTIONS:
>                 if (!tp->repair)
>                         err = -EINVAL;
> -               else if (sk->sk_state == TCP_ESTABLISHED)
> +               else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out)

You keep focusing on packets_out  :/

What I said was : TCP_REPAIR_OPTIONS must be denied if any packets
have been sent (and possibly already ACK)

Looking at tp->packets_out alone is not sufficient.

>                         err = tcp_repair_options_est(sk, optval, optlen);
>                 else
>                         err = -EPERM;
> --
> 2.31.1
>
Eric Dumazet Oct. 27, 2022, 11:18 a.m. UTC | #3
On Thu, Oct 27, 2022 at 4:14 AM Pavel Emelyanov <ovzxemul@gmail.com> wrote:
>
>
>
> чт, 27 окт. 2022 г., 14:08 Eric Dumazet <edumazet@google.com>:
>>
>> On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote:
>> >
>> > If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
>> > of TCPOPT_SACK_PERM is called to enable sack after data is sent
>> > and before data is acked, it will trigger a warning in function
>> > tcp_verify_left_out() as follows:
>> >
>> > ============================================
>> > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
>> > tcp_timeout_mark_lost+0x154/0x160
>> > tcp_enter_loss+0x2b/0x290
>> > tcp_retransmit_timer+0x50b/0x640
>> > tcp_write_timer_handler+0x1c8/0x340
>> > tcp_write_timer+0xe5/0x140
>> > call_timer_fn+0x3a/0x1b0
>> > __run_timers.part.0+0x1bf/0x2d0
>> > run_timer_softirq+0x43/0xb0
>> > __do_softirq+0xfd/0x373
>> > __irq_exit_rcu+0xf6/0x140
>> >
>> > This warning occurs in several steps:
>> > Step1. If sack is not enabled, when server receives dup-ack,
>> >        it calls tcp_add_reno_sack() to increase tp->sacked_out.
>> >
>> > Step2. Setsockopt() is called to enable sack
>> >
>> > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
>> >        to increase tp->lost_out but not clear tp->sacked_out because
>> >        sack is enabled and tcp_is_reno() is false.
>> >
>> > So tp->left_out is increased repeatly in Step1 and Step3 and it is
>> > greater than tp->packets_out and trigger the warning. In function
>> > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
>> > happen and the warning will not be triggered. As suggested by Denis
>> > and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already
>> > sent.
>> >
>> > socket-tcp tests in CRIU has been tested as follows:
>> > $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>> >        --ignore-taint
>> >
>> > socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
>> >
>> > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
>> > Signed-off-by: Lu Wei <luwei32@huawei.com>
>> > ---
>> >  net/ipv4/tcp.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> > index ef14efa1fb70..ef876e70f7fe 100644
>> > --- a/net/ipv4/tcp.c
>> > +++ b/net/ipv4/tcp.c
>> > @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>> >         case TCP_REPAIR_OPTIONS:
>> >                 if (!tp->repair)
>> >                         err = -EINVAL;
>> > -               else if (sk->sk_state == TCP_ESTABLISHED)
>> > +               else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out)
>>
>> You keep focusing on packets_out  :/
>>
>> What I said was : TCP_REPAIR_OPTIONS must be denied if any packets
>> have been sent (and possibly already ACK)
>
>
>
> If repair mode is ON, why does socket send any packets? This shouldn't happen from my perspective.

Exactly.

TCP_REPAIR is easily abused by fuzzers.

We need to enforce sanity rules in the kernel, not assuming user space
is following the CRIU way of doing things.

>
> -- Pavel
>
>>
>> Looking at tp->packets_out alone is not sufficient.
>>
>> >                         err = tcp_repair_options_est(sk, optval, optlen);
>> >                 else
>> >                         err = -EPERM;
>> > --
>> > 2.31.1
>> >
Lu Wei Oct. 27, 2022, 1:32 p.m. UTC | #4
在 2022/10/27 7:18 PM, Eric Dumazet 写道:
> On Thu, Oct 27, 2022 at 4:14 AM Pavel Emelyanov <ovzxemul@gmail.com> wrote:
>>
>>
>> чт, 27 окт. 2022 г., 14:08 Eric Dumazet <edumazet@google.com>:
>>> On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote:
>>>> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
>>>> of TCPOPT_SACK_PERM is called to enable sack after data is sent
>>>> and before data is acked, it will trigger a warning in function
>>>> tcp_verify_left_out() as follows:
>>>>
>>>> ============================================
>>>> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
>>>> tcp_timeout_mark_lost+0x154/0x160
>>>> tcp_enter_loss+0x2b/0x290
>>>> tcp_retransmit_timer+0x50b/0x640
>>>> tcp_write_timer_handler+0x1c8/0x340
>>>> tcp_write_timer+0xe5/0x140
>>>> call_timer_fn+0x3a/0x1b0
>>>> __run_timers.part.0+0x1bf/0x2d0
>>>> run_timer_softirq+0x43/0xb0
>>>> __do_softirq+0xfd/0x373
>>>> __irq_exit_rcu+0xf6/0x140
>>>>
>>>> This warning occurs in several steps:
>>>> Step1. If sack is not enabled, when server receives dup-ack,
>>>>         it calls tcp_add_reno_sack() to increase tp->sacked_out.
>>>>
>>>> Step2. Setsockopt() is called to enable sack
>>>>
>>>> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost()
>>>>         to increase tp->lost_out but not clear tp->sacked_out because
>>>>         sack is enabled and tcp_is_reno() is false.
>>>>
>>>> So tp->left_out is increased repeatly in Step1 and Step3 and it is
>>>> greater than tp->packets_out and trigger the warning. In function
>>>> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not
>>>> happen and the warning will not be triggered. As suggested by Denis
>>>> and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already
>>>> sent.
>>>>
>>>> socket-tcp tests in CRIU has been tested as follows:
>>>> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>>>>         --ignore-taint
>>>>
>>>> socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
>>>>
>>>> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
>>>> Signed-off-by: Lu Wei <luwei32@huawei.com>
>>>> ---
>>>>   net/ipv4/tcp.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index ef14efa1fb70..ef876e70f7fe 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>>>>          case TCP_REPAIR_OPTIONS:
>>>>                  if (!tp->repair)
>>>>                          err = -EINVAL;
>>>> -               else if (sk->sk_state == TCP_ESTABLISHED)
>>>> +               else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out)
>>> You keep focusing on packets_out  :/
>>>
>>> What I said was : TCP_REPAIR_OPTIONS must be denied if any packets
>>> have been sent (and possibly already ACK)
>>
>>
>> If repair mode is ON, why does socket send any packets? This shouldn't happen from my perspective.

repair mode is entered after packets are sent. the process is:
1. client sends packets
2. client reveives dup-ack and increace sacked_out
3. client enter repair mode and set TCP_SACK_SEEN
4. retransmit_timer expires and enter_loss and increace lost_out
5. tcp_left_out(tp->sacked_out + tp->lost_out) is greater than packets_out and triggers the warning

> Exactly.
>
> TCP_REPAIR is easily abused by fuzzers.
>
> We need to enforce sanity rules in the kernel, not assuming user space
> is following the CRIU way of doing things.
yes, the warning is caught by a fuzzers,  please refer to the attachment.
>
>> -- Pavel
>>
>>> Looking at tp->packets_out alone is not sufficient.
>>>
>>>>                          err = tcp_repair_options_est(sk, optval, optlen);
>>>>                  else
>>>>                          err = -EPERM;
>>>> --
>>>> 2.31.1
>>>>
> .
diff mbox series

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ef14efa1fb70..ef876e70f7fe 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3647,7 +3647,7 @@  int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 	case TCP_REPAIR_OPTIONS:
 		if (!tp->repair)
 			err = -EINVAL;
-		else if (sk->sk_state == TCP_ESTABLISHED)
+		else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out)
 			err = tcp_repair_options_est(sk, optval, optlen);
 		else
 			err = -EPERM;