Message ID | 20221027114930.137735-1-luwei32@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] tcp: reset tp->sacked_out when sack is enabled | expand |
please ignore this patch, I forgot to change the title 在 2022/10/27 7:49 PM, Lu Wei 写道: > If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code > of TCPOPT_SACK_PERM is called to enable sack after data is sent > and before data is acked, it will trigger a warning in function > tcp_verify_left_out() as follows: > > ============================================ > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > tcp_timeout_mark_lost+0x154/0x160 > tcp_enter_loss+0x2b/0x290 > tcp_retransmit_timer+0x50b/0x640 > tcp_write_timer_handler+0x1c8/0x340 > tcp_write_timer+0xe5/0x140 > call_timer_fn+0x3a/0x1b0 > __run_timers.part.0+0x1bf/0x2d0 > run_timer_softirq+0x43/0xb0 > __do_softirq+0xfd/0x373 > __irq_exit_rcu+0xf6/0x140 > > This warning occurs in several steps: > Step1. If sack is not enabled, when server receives dup-ack, > it calls tcp_add_reno_sack() to increase tp->sacked_out. > > Step2. Setsockopt() is called to enable sack > > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() > to increase tp->lost_out but not clear tp->sacked_out because > sack is enabled and tcp_is_reno() is false. > > So tp->left_out is increased repeatly in Step1 and Step3 and it is > greater than tp->packets_out and trigger the warning. In function > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not > happen and the warning will not be triggered. As suggested by Denis > and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already > sent. > > socket-tcp tests in CRIU has been tested as follows: > $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ > --ignore-taint > > socket-tcp* represent all socket-tcp tests in test/zdtm/static/. > > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > Signed-off-by: Lu Wei <luwei32@huawei.com> > --- > net/ipv4/tcp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ef14efa1fb70..ef876e70f7fe 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, > case TCP_REPAIR_OPTIONS: > if (!tp->repair) > err = -EINVAL; > - else if (sk->sk_state == TCP_ESTABLISHED) > + else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out) > err = tcp_repair_options_est(sk, optval, optlen); > else > err = -EPERM;
On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote: > > If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code > of TCPOPT_SACK_PERM is called to enable sack after data is sent > and before data is acked, it will trigger a warning in function > tcp_verify_left_out() as follows: > > ============================================ > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 > tcp_timeout_mark_lost+0x154/0x160 > tcp_enter_loss+0x2b/0x290 > tcp_retransmit_timer+0x50b/0x640 > tcp_write_timer_handler+0x1c8/0x340 > tcp_write_timer+0xe5/0x140 > call_timer_fn+0x3a/0x1b0 > __run_timers.part.0+0x1bf/0x2d0 > run_timer_softirq+0x43/0xb0 > __do_softirq+0xfd/0x373 > __irq_exit_rcu+0xf6/0x140 > > This warning occurs in several steps: > Step1. If sack is not enabled, when server receives dup-ack, > it calls tcp_add_reno_sack() to increase tp->sacked_out. > > Step2. Setsockopt() is called to enable sack > > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() > to increase tp->lost_out but not clear tp->sacked_out because > sack is enabled and tcp_is_reno() is false. > > So tp->left_out is increased repeatly in Step1 and Step3 and it is > greater than tp->packets_out and trigger the warning. In function > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not > happen and the warning will not be triggered. As suggested by Denis > and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already > sent. > > socket-tcp tests in CRIU has been tested as follows: > $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ > --ignore-taint > > socket-tcp* represent all socket-tcp tests in test/zdtm/static/. > > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") > Signed-off-by: Lu Wei <luwei32@huawei.com> > --- > net/ipv4/tcp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ef14efa1fb70..ef876e70f7fe 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, > case TCP_REPAIR_OPTIONS: > if (!tp->repair) > err = -EINVAL; > - else if (sk->sk_state == TCP_ESTABLISHED) > + else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out) You keep focusing on packets_out :/ What I said was : TCP_REPAIR_OPTIONS must be denied if any packets have been sent (and possibly already ACK) Looking at tp->packets_out alone is not sufficient. > err = tcp_repair_options_est(sk, optval, optlen); > else > err = -EPERM; > -- > 2.31.1 >
On Thu, Oct 27, 2022 at 4:14 AM Pavel Emelyanov <ovzxemul@gmail.com> wrote: > > > > чт, 27 окт. 2022 г., 14:08 Eric Dumazet <edumazet@google.com>: >> >> On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote: >> > >> > If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code >> > of TCPOPT_SACK_PERM is called to enable sack after data is sent >> > and before data is acked, it will trigger a warning in function >> > tcp_verify_left_out() as follows: >> > >> > ============================================ >> > WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 >> > tcp_timeout_mark_lost+0x154/0x160 >> > tcp_enter_loss+0x2b/0x290 >> > tcp_retransmit_timer+0x50b/0x640 >> > tcp_write_timer_handler+0x1c8/0x340 >> > tcp_write_timer+0xe5/0x140 >> > call_timer_fn+0x3a/0x1b0 >> > __run_timers.part.0+0x1bf/0x2d0 >> > run_timer_softirq+0x43/0xb0 >> > __do_softirq+0xfd/0x373 >> > __irq_exit_rcu+0xf6/0x140 >> > >> > This warning occurs in several steps: >> > Step1. If sack is not enabled, when server receives dup-ack, >> > it calls tcp_add_reno_sack() to increase tp->sacked_out. >> > >> > Step2. Setsockopt() is called to enable sack >> > >> > Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() >> > to increase tp->lost_out but not clear tp->sacked_out because >> > sack is enabled and tcp_is_reno() is false. >> > >> > So tp->left_out is increased repeatly in Step1 and Step3 and it is >> > greater than tp->packets_out and trigger the warning. In function >> > tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not >> > happen and the warning will not be triggered. As suggested by Denis >> > and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already >> > sent. >> > >> > socket-tcp tests in CRIU has been tested as follows: >> > $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ >> > --ignore-taint >> > >> > socket-tcp* represent all socket-tcp tests in test/zdtm/static/. >> > >> > Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") >> > Signed-off-by: Lu Wei <luwei32@huawei.com> >> > --- >> > net/ipv4/tcp.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >> > index ef14efa1fb70..ef876e70f7fe 100644 >> > --- a/net/ipv4/tcp.c >> > +++ b/net/ipv4/tcp.c >> > @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, >> > case TCP_REPAIR_OPTIONS: >> > if (!tp->repair) >> > err = -EINVAL; >> > - else if (sk->sk_state == TCP_ESTABLISHED) >> > + else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out) >> >> You keep focusing on packets_out :/ >> >> What I said was : TCP_REPAIR_OPTIONS must be denied if any packets >> have been sent (and possibly already ACK) > > > > If repair mode is ON, why does socket send any packets? This shouldn't happen from my perspective. Exactly. TCP_REPAIR is easily abused by fuzzers. We need to enforce sanity rules in the kernel, not assuming user space is following the CRIU way of doing things. > > -- Pavel > >> >> Looking at tp->packets_out alone is not sufficient. >> >> > err = tcp_repair_options_est(sk, optval, optlen); >> > else >> > err = -EPERM; >> > -- >> > 2.31.1 >> >
在 2022/10/27 7:18 PM, Eric Dumazet 写道: > On Thu, Oct 27, 2022 at 4:14 AM Pavel Emelyanov <ovzxemul@gmail.com> wrote: >> >> >> чт, 27 окт. 2022 г., 14:08 Eric Dumazet <edumazet@google.com>: >>> On Thu, Oct 27, 2022 at 3:45 AM Lu Wei <luwei32@huawei.com> wrote: >>>> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code >>>> of TCPOPT_SACK_PERM is called to enable sack after data is sent >>>> and before data is acked, it will trigger a warning in function >>>> tcp_verify_left_out() as follows: >>>> >>>> ============================================ >>>> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 >>>> tcp_timeout_mark_lost+0x154/0x160 >>>> tcp_enter_loss+0x2b/0x290 >>>> tcp_retransmit_timer+0x50b/0x640 >>>> tcp_write_timer_handler+0x1c8/0x340 >>>> tcp_write_timer+0xe5/0x140 >>>> call_timer_fn+0x3a/0x1b0 >>>> __run_timers.part.0+0x1bf/0x2d0 >>>> run_timer_softirq+0x43/0xb0 >>>> __do_softirq+0xfd/0x373 >>>> __irq_exit_rcu+0xf6/0x140 >>>> >>>> This warning occurs in several steps: >>>> Step1. If sack is not enabled, when server receives dup-ack, >>>> it calls tcp_add_reno_sack() to increase tp->sacked_out. >>>> >>>> Step2. Setsockopt() is called to enable sack >>>> >>>> Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() >>>> to increase tp->lost_out but not clear tp->sacked_out because >>>> sack is enabled and tcp_is_reno() is false. >>>> >>>> So tp->left_out is increased repeatly in Step1 and Step3 and it is >>>> greater than tp->packets_out and trigger the warning. In function >>>> tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not >>>> happen and the warning will not be triggered. As suggested by Denis >>>> and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already >>>> sent. >>>> >>>> socket-tcp tests in CRIU has been tested as follows: >>>> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ >>>> --ignore-taint >>>> >>>> socket-tcp* represent all socket-tcp tests in test/zdtm/static/. >>>> >>>> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") >>>> Signed-off-by: Lu Wei <luwei32@huawei.com> >>>> --- >>>> net/ipv4/tcp.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c >>>> index ef14efa1fb70..ef876e70f7fe 100644 >>>> --- a/net/ipv4/tcp.c >>>> +++ b/net/ipv4/tcp.c >>>> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, >>>> case TCP_REPAIR_OPTIONS: >>>> if (!tp->repair) >>>> err = -EINVAL; >>>> - else if (sk->sk_state == TCP_ESTABLISHED) >>>> + else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out) >>> You keep focusing on packets_out :/ >>> >>> What I said was : TCP_REPAIR_OPTIONS must be denied if any packets >>> have been sent (and possibly already ACK) >> >> >> If repair mode is ON, why does socket send any packets? This shouldn't happen from my perspective. repair mode is entered after packets are sent. the process is: 1. client sends packets 2. client reveives dup-ack and increace sacked_out 3. client enter repair mode and set TCP_SACK_SEEN 4. retransmit_timer expires and enter_loss and increace lost_out 5. tcp_left_out(tp->sacked_out + tp->lost_out) is greater than packets_out and triggers the warning > Exactly. > > TCP_REPAIR is easily abused by fuzzers. > > We need to enforce sanity rules in the kernel, not assuming user space > is following the CRIU way of doing things. yes, the warning is caught by a fuzzers, please refer to the attachment. > >> -- Pavel >> >>> Looking at tp->packets_out alone is not sufficient. >>> >>>> err = tcp_repair_options_est(sk, optval, optlen); >>>> else >>>> err = -EPERM; >>>> -- >>>> 2.31.1 >>>> > .
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ef14efa1fb70..ef876e70f7fe 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, case TCP_REPAIR_OPTIONS: if (!tp->repair) err = -EINVAL; - else if (sk->sk_state == TCP_ESTABLISHED) + else if (sk->sk_state == TCP_ESTABLISHED && !tp->packets_out) err = tcp_repair_options_est(sk, optval, optlen); else err = -EPERM;
If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code of TCPOPT_SACK_PERM is called to enable sack after data is sent and before data is acked, it will trigger a warning in function tcp_verify_left_out() as follows: ============================================ WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132 tcp_timeout_mark_lost+0x154/0x160 tcp_enter_loss+0x2b/0x290 tcp_retransmit_timer+0x50b/0x640 tcp_write_timer_handler+0x1c8/0x340 tcp_write_timer+0xe5/0x140 call_timer_fn+0x3a/0x1b0 __run_timers.part.0+0x1bf/0x2d0 run_timer_softirq+0x43/0xb0 __do_softirq+0xfd/0x373 __irq_exit_rcu+0xf6/0x140 This warning occurs in several steps: Step1. If sack is not enabled, when server receives dup-ack, it calls tcp_add_reno_sack() to increase tp->sacked_out. Step2. Setsockopt() is called to enable sack Step3. The retransmit timer expires, it calls tcp_timeout_mark_lost() to increase tp->lost_out but not clear tp->sacked_out because sack is enabled and tcp_is_reno() is false. So tp->left_out is increased repeatly in Step1 and Step3 and it is greater than tp->packets_out and trigger the warning. In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if Step2 not happen and the warning will not be triggered. As suggested by Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was already sent. socket-tcp tests in CRIU has been tested as follows: $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp* --keep-going \ --ignore-taint socket-tcp* represent all socket-tcp tests in test/zdtm/static/. Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters") Signed-off-by: Lu Wei <luwei32@huawei.com> --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)