Message ID | 20240307090732.56708-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb | expand |
Jason Xing <kerneljasonxing@gmail.com> wrote: > client_ip:client_port <--> server_ip:b_port > > Then, some strange skbs from client or gateway, say, out-of-window > skbs are sent to the server_ip:a_port (not b_port) due to DNAT > clearing skb->_nfct value in nf_conntrack_in() function. Why? > Because the tcp_in_window() considers the incoming skb as an > invalid skb by returning NFCT_TCP_INVALID. So far everything is as intended. > I think, even we have set DNAT policy, it would be better if the > whole process/behaviour adheres to the original TCP behaviour. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/netfilter/nf_conntrack_proto_tcp.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > index ae493599a3ef..3f3e620f3969 100644 > --- a/net/netfilter/nf_conntrack_proto_tcp.c > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > @@ -1253,13 +1253,11 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > res = tcp_in_window(ct, dir, index, > skb, dataoff, th, state); > switch (res) { > - case NFCT_TCP_IGNORE: > - spin_unlock_bh(&ct->lock); > - return NF_ACCEPT; > case NFCT_TCP_INVALID: > nf_tcp_handle_invalid(ct, dir, index, skb, state); > + case NFCT_TCP_IGNORE: > spin_unlock_bh(&ct->lock); > - return -NF_ACCEPT; > + return NF_ACCEPT; This looks wrong. -NF_ACCEPT means 'pass packet, but its not part of the connection' (packet will match --ctstate INVALID check). This change disables most of the tcp_in_window() test, this will pretend everything is fine even though tcp_in_window says otherwise. You could: - drop invalid tcp packets in input hook - set nf_conntrack_tcp_be_liberal=1 both will avoid this 'rst' issue.
Hello Florian, On Thu, Mar 7, 2024 at 5:33 PM Florian Westphal <fw@strlen.de> wrote: > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > client_ip:client_port <--> server_ip:b_port > > > > Then, some strange skbs from client or gateway, say, out-of-window > > skbs are sent to the server_ip:a_port (not b_port) due to DNAT > > clearing skb->_nfct value in nf_conntrack_in() function. Why? > > Because the tcp_in_window() considers the incoming skb as an > > invalid skb by returning NFCT_TCP_INVALID. > > So far everything is as intended. > > > I think, even we have set DNAT policy, it would be better if the > > whole process/behaviour adheres to the original TCP behaviour. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/netfilter/nf_conntrack_proto_tcp.c | 6 ++---- > > 1 file changed, 2 insertions(+), 4 deletions(-) > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > index ae493599a3ef..3f3e620f3969 100644 > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1253,13 +1253,11 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > res = tcp_in_window(ct, dir, index, > > skb, dataoff, th, state); > > switch (res) { > > - case NFCT_TCP_IGNORE: > > - spin_unlock_bh(&ct->lock); > > - return NF_ACCEPT; > > case NFCT_TCP_INVALID: > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > + case NFCT_TCP_IGNORE: > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return NF_ACCEPT; > > This looks wrong. -NF_ACCEPT means 'pass packet, but its not part > of the connection' (packet will match --ctstate INVALID check). > > This change disables most of the tcp_in_window() test, this will > pretend everything is fine even though tcp_in_window says otherwise. Thanks for the information. It does make sense. What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl knob which you also pointed out. It also pretends to ignore those out-of-window skbs. > > You could: > - drop invalid tcp packets in input hook How about changing the return value only as below? Only two cases will be handled: diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index ae493599a3ef..c88ce4cd041e 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, case NFCT_TCP_INVALID: nf_tcp_handle_invalid(ct, dir, index, skb, state); spin_unlock_bh(&ct->lock); - return -NF_ACCEPT; + return -NF_DROP; case NFCT_TCP_ACCEPT: break; } Then it will be dropped naturally in nf_hook_slow(). > - set nf_conntrack_tcp_be_liberal=1 Sure, it can workaround this case, but I would like to refuse the out-of-window in netfilter or TCP layer as default instead of turning on this sysctl knob. If I understand wrong, please correct me. Thanks, Jason > > both will avoid this 'rst' issue.
Jason Xing <kerneljasonxing@gmail.com> wrote: > > This change disables most of the tcp_in_window() test, this will > > pretend everything is fine even though tcp_in_window says otherwise. > > Thanks for the information. It does make sense. > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > knob which you also pointed out. It also pretends to ignore those > out-of-window skbs. > > > > > You could: > > - drop invalid tcp packets in input hook > > How about changing the return value only as below? Only two cases will > be handled: > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > b/net/netfilter/nf_conntrack_proto_tcp.c > index ae493599a3ef..c88ce4cd041e 100644 > --- a/net/netfilter/nf_conntrack_proto_tcp.c > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > case NFCT_TCP_INVALID: > nf_tcp_handle_invalid(ct, dir, index, skb, state); > spin_unlock_bh(&ct->lock); > - return -NF_ACCEPT; > + return -NF_DROP; Lets not do this. conntrack should never drop packets and defer to ruleset whereever possible. > > - set nf_conntrack_tcp_be_liberal=1 > > Sure, it can workaround this case, but I would like to refuse the > out-of-window in netfilter or TCP layer as default instead of turning > on this sysctl knob. If I understand wrong, please correct me. Thats contradictory, you make a patch to always accept, then another patch to always drop such packets? You can get the drop behaviour via '-m conntrack --ctstate DROP' in prerouting or inut hooks. You can get the 'accept + do nat processing' via nf_conntrack_tcp_be_liberal=1.
On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote: > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > This change disables most of the tcp_in_window() test, this will > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > Thanks for the information. It does make sense. > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > knob which you also pointed out. It also pretends to ignore those > > out-of-window skbs. > > > > > > > > You could: > > > - drop invalid tcp packets in input hook > > > > How about changing the return value only as below? Only two cases will > > be handled: > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > b/net/netfilter/nf_conntrack_proto_tcp.c > > index ae493599a3ef..c88ce4cd041e 100644 > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > case NFCT_TCP_INVALID: > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return -NF_DROP; > > Lets not do this. conntrack should never drop packets and defer to ruleset > whereever possible. Hmm, sorry, it is against my understanding. If we cannot return -NF_DROP, why have we already added some 'return NF_DROP' in the nf_conntrack_handle_packet() function? And why does this test statement exist? nf_conntrack_in() -> nf_conntrack_handle_packet() -> if (ret <= 0) { if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); > > > > - set nf_conntrack_tcp_be_liberal=1 > > > > Sure, it can workaround this case, but I would like to refuse the > > out-of-window in netfilter or TCP layer as default instead of turning > > on this sysctl knob. If I understand wrong, please correct me. > > Thats contradictory, you make a patch to always accept, then another > patch to always drop such packets? My only purpose is not to let the TCP layer sending strange RST to the right flow. Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl knob seems odd to me though it can workaround :S I would like to prevent sending such an RST as default behaviour. I wonder why we have to send RST at last due to out-of-window skbs. It should not happen, right? As I said before, It can be set as default without relying on some sysctl knob. Forgive my superficial knowledge :( > > You can get the drop behaviour via '-m conntrack --ctstate DROP' in > prerouting or inut hooks. > > You can get the 'accept + do nat processing' via > nf_conntrack_tcp_be_liberal=1. Sure. Just turning on the sysctl knob can be helpful because I've tested it in production. After all, it roughly returns NFCT_TCP_ACCEPT in nf_tcp_log_invalid() without considering those various out-of-window cases. Thanks, Jason
Jason Xing <kerneljasonxing@gmail.com> wrote: > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote: > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > This change disables most of the tcp_in_window() test, this will > > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > > > Thanks for the information. It does make sense. > > > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > > knob which you also pointed out. It also pretends to ignore those > > > out-of-window skbs. > > > > > > > > > > > You could: > > > > - drop invalid tcp packets in input hook > > > > > > How about changing the return value only as below? Only two cases will > > > be handled: > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > > b/net/netfilter/nf_conntrack_proto_tcp.c > > > index ae493599a3ef..c88ce4cd041e 100644 > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > case NFCT_TCP_INVALID: > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > spin_unlock_bh(&ct->lock); > > > - return -NF_ACCEPT; > > > + return -NF_DROP; > > > > Lets not do this. conntrack should never drop packets and defer to ruleset > > whereever possible. > > Hmm, sorry, it is against my understanding. > > If we cannot return -NF_DROP, why have we already added some 'return > NF_DROP' in the nf_conntrack_handle_packet() function? And why does > this test statement exist? Sure we can drop. But we should only do it if there is no better alternative. > nf_conntrack_in() > -> nf_conntrack_handle_packet() > -> if (ret <= 0) { > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); AFAICS this only happens when we receive syn for an existing conntrack that is being removed already so we'd expect next syn to create a new connection. Feel free to send patches that replace drop with -accept where possible/where it makes sense, but I don't think the TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > My only purpose is not to let the TCP layer sending strange RST to the > right flow. AFAIU tcp layer is correct, no? Out of the blue packet to some listener socket? > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > knob seems odd to me though it can workaround :S I don't see a better alternative, other than -p tcp -m conntrack --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see such packets. > I would like to prevent sending such an RST as default behaviour. I don't see a way to make this work out of the box, without possible unwanted side effects. MAYBE we could drop IFF we check that the conntrack entry candidate that fails sequence validation has NAT translation applied to it, and thus the '-NF_ACCEPT' packet won't be translated. Not even compile tested: diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, case NFCT_TCP_IGNORE: spin_unlock_bh(&ct->lock); return NF_ACCEPT; - case NFCT_TCP_INVALID: + case NFCT_TCP_INVALID: { + verdict = -NF_ACCEPT; + if (ct->status & IPS_NAT_MASK) + res = NF_DROP; /* skb would miss nat transformation */ nf_tcp_handle_invalid(ct, dir, index, skb, state); spin_unlock_bh(&ct->lock); - return -NF_ACCEPT; + return verdict; + } case NFCT_TCP_ACCEPT: break; } But I don't really see the advantage compared to doing drop decision in iptables/nftables ruleset. I also have a hunch that someone will eventually complain about this change in behavior.
On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@strlen.de> wrote: > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote: > > > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > This change disables most of the tcp_in_window() test, this will > > > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > > > > > Thanks for the information. It does make sense. > > > > > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > > > knob which you also pointed out. It also pretends to ignore those > > > > out-of-window skbs. > > > > > > > > > > > > > > You could: > > > > > - drop invalid tcp packets in input hook > > > > > > > > How about changing the return value only as below? Only two cases will > > > > be handled: > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > > > b/net/netfilter/nf_conntrack_proto_tcp.c > > > > index ae493599a3ef..c88ce4cd041e 100644 > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > case NFCT_TCP_INVALID: > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > spin_unlock_bh(&ct->lock); > > > > - return -NF_ACCEPT; > > > > + return -NF_DROP; > > > > > > Lets not do this. conntrack should never drop packets and defer to ruleset > > > whereever possible. > > > > Hmm, sorry, it is against my understanding. > > > > If we cannot return -NF_DROP, why have we already added some 'return > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does > > this test statement exist? > > Sure we can drop. But we should only do it if there is no better > alternative. > > > nf_conntrack_in() > > -> nf_conntrack_handle_packet() > > -> if (ret <= 0) { > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); > > AFAICS this only happens when we receive syn for an existing conntrack > that is being removed already so we'd expect next syn to create a new Sorry, I've double-checked this part and found out there is no chance to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return 'NF_DROP' (see link [1]) instead. The if-else statements seem like dead code. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B > connection. Feel free to send patches that replace drop with -accept > where possible/where it makes sense, but I don't think the > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in nf_conntrack_dccp_packet()? There are three points where nf_conntrack_handle_packet() returns NF_DROP: 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you said, it's not necessary to change. 2) another two exist in nf_conntrack_dccp_packet() which should be the same as nf_conntrack_tcp_packet() handles. The patch goes like this: diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c index e2db1f4ec2df..ebc4f733bb2e 100644 --- a/net/netfilter/nf_conntrack_proto_dccp.c +++ b/net/netfilter/nf_conntrack_proto_dccp.c @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, struct sk_buff *skb, dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh); if (!dh) - return NF_DROP; + return -NF_ACCEPT; if (dccp_error(dh, skb, dataoff, state)) return -NF_ACCEPT; @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, struct sk_buff *skb, /* pull again, including possible 48 bit sequences and subtype header */ dh = dccp_header_pointer(skb, dataoff, dh, &_dh); if (!dh) - return NF_DROP; + return -NF_ACCEPT; type = dh->dccph_type; if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state)) > > > My only purpose is not to let the TCP layer sending strange RST to the > > right flow. > > AFAIU tcp layer is correct, no? Out of the blue packet to some listener > socket? Allow me to finish the full sentence: my only purpose is not to let the TCP layer send strange RST to the _established_ socket due to receiving strange out-of-window skbs. > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > knob seems odd to me though it can workaround :S > > I don't see a better alternative, other than -p tcp -m conntrack > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > such packets. > > > I would like to prevent sending such an RST as default behaviour. > > I don't see a way to make this work out of the box, without possible > unwanted side effects. > > MAYBE we could drop IFF we check that the conntrack entry candidate > that fails sequence validation has NAT translation applied to it, and > thus the '-NF_ACCEPT' packet won't be translated. > > Not even compile tested: > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > --- a/net/netfilter/nf_conntrack_proto_tcp.c > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > case NFCT_TCP_IGNORE: > spin_unlock_bh(&ct->lock); > return NF_ACCEPT; > - case NFCT_TCP_INVALID: > + case NFCT_TCP_INVALID: { > + verdict = -NF_ACCEPT; > + if (ct->status & IPS_NAT_MASK) > + res = NF_DROP; /* skb would miss nat transformation */ Above line, I guess, should be 'verdict = NF_DROP'? Then this skb would be dropped in nf_hook_slow() eventually and would not be passed to the TCP layer. > nf_tcp_handle_invalid(ct, dir, index, skb, state); > spin_unlock_bh(&ct->lock); > - return -NF_ACCEPT; > + return verdict; > + } > case NFCT_TCP_ACCEPT: > break; > } Great! I think your draft patch makes sense really, which takes NAT into consideration. > > But I don't really see the advantage compared to doing drop decision in > iptables/nftables ruleset. From our views, especially to kernel developers, you're right: we could easily turn on that knob or add a drop policy to prevent it happening. Actually I did this in production to prevent such a case. It surely works. But from the views of normal users and those who do not understand how it works in the kernel, it looks strange: people may ask why we get some unknown RSTs in flight? > > I also have a hunch that someone will eventually complain about this > change in behavior. Well, I still think the patch you suggested is proper and don't know why people could complain about it. Thanks for your patience :) Thanks, Jason
On Thu, 7 Mar 2024, Jason Xing wrote: > On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@strlen.de> wrote: > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote: > > > > > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > This change disables most of the tcp_in_window() test, this will > > > > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > > > > > > > Thanks for the information. It does make sense. > > > > > > > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > > > > knob which you also pointed out. It also pretends to ignore those > > > > > out-of-window skbs. > > > > > > > > > > > > > > > > > You could: > > > > > > - drop invalid tcp packets in input hook > > > > > > > > > > How about changing the return value only as below? Only two cases will > > > > > be handled: > > > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > index ae493599a3ef..c88ce4cd041e 100644 > > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > > case NFCT_TCP_INVALID: > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > > spin_unlock_bh(&ct->lock); > > > > > - return -NF_ACCEPT; > > > > > + return -NF_DROP; > > > > > > > > Lets not do this. conntrack should never drop packets and defer to ruleset > > > > whereever possible. > > > > > > Hmm, sorry, it is against my understanding. > > > > > > If we cannot return -NF_DROP, why have we already added some 'return > > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does > > > this test statement exist? > > > > Sure we can drop. But we should only do it if there is no better > > alternative. > > > > > nf_conntrack_in() > > > -> nf_conntrack_handle_packet() > > > -> if (ret <= 0) { > > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); > > > > AFAICS this only happens when we receive syn for an existing conntrack > > that is being removed already so we'd expect next syn to create a new > > Sorry, I've double-checked this part and found out there is no chance > to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return > 'NF_DROP' (see link [1]) instead. The if-else statements seem like > dead code. > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B > > > connection. Feel free to send patches that replace drop with -accept > > where possible/where it makes sense, but I don't think the > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in > nf_conntrack_dccp_packet()? > > There are three points where nf_conntrack_handle_packet() returns NF_DROP: > 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you > said, it's not necessary to change. > 2) another two exist in nf_conntrack_dccp_packet() which should be the > same as nf_conntrack_tcp_packet() handles. > > The patch goes like this: > diff --git a/net/netfilter/nf_conntrack_proto_dccp.c > b/net/netfilter/nf_conntrack_proto_dccp.c > index e2db1f4ec2df..ebc4f733bb2e 100644 > --- a/net/netfilter/nf_conntrack_proto_dccp.c > +++ b/net/netfilter/nf_conntrack_proto_dccp.c > @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > > dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > if (dccp_error(dh, skb, dataoff, state)) > return -NF_ACCEPT; > @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > /* pull again, including possible 48 bit sequences and subtype header */ > dh = dccp_header_pointer(skb, dataoff, dh, &_dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > type = dh->dccph_type; > if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state)) > > > > > > My only purpose is not to let the TCP layer sending strange RST to the > > > right flow. > > > > AFAIU tcp layer is correct, no? Out of the blue packet to some listener > > socket? > > Allow me to finish the full sentence: my only purpose is not to let > the TCP layer send strange RST to the _established_ socket due to > receiving strange out-of-window skbs. I don't understand why do you want to modify conntrack at all: conntrack itself does not send RST packets. And the TCP layer don't send RST packets to out of window ones either. The only possibility I see for such packets is an iptables/nftables rule which rejects packets classified as INVALID by conntrack. As Florian suggested, why don't you change that rule? The conntrack states are not fine-grained to express different TCP states which covered with INVALID. It was never a good idea to reject INVALID packets or let them through (leaking internal addresses). Best regards, Jozsef > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > knob seems odd to me though it can workaround :S > > > > I don't see a better alternative, other than -p tcp -m conntrack > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > such packets. > > > > > I would like to prevent sending such an RST as default behaviour. > > > > I don't see a way to make this work out of the box, without possible > > unwanted side effects. > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > that fails sequence validation has NAT translation applied to it, and > > thus the '-NF_ACCEPT' packet won't be translated. > > > > Not even compile tested: > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > case NFCT_TCP_IGNORE: > > spin_unlock_bh(&ct->lock); > > return NF_ACCEPT; > > - case NFCT_TCP_INVALID: > > + case NFCT_TCP_INVALID: { > > + verdict = -NF_ACCEPT; > > + if (ct->status & IPS_NAT_MASK) > > + res = NF_DROP; /* skb would miss nat transformation */ > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > would be dropped in nf_hook_slow() eventually and would not be passed > to the TCP layer. > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return verdict; > > + } > > case NFCT_TCP_ACCEPT: > > break; > > } > > Great! I think your draft patch makes sense really, which takes NAT > into consideration. > > > > > But I don't really see the advantage compared to doing drop decision in > > iptables/nftables ruleset. > > From our views, especially to kernel developers, you're right: we > could easily turn on that knob or add a drop policy to prevent it > happening. Actually I did this in production to prevent such a case. > It surely works. > > But from the views of normal users and those who do not understand how > it works in the kernel, it looks strange: people may ask why we get > some unknown RSTs in flight? > > > I also have a hunch that someone will eventually complain about this > > change in behavior. > > Well, I still think the patch you suggested is proper and don't know > why people could complain about it. > > Thanks for your patience :) > > Thanks, > Jason >
[...] > > Allow me to finish the full sentence: my only purpose is not to let > > the TCP layer send strange RST to the _established_ socket due to > > receiving strange out-of-window skbs. > > I don't understand why do you want to modify conntrack at all: conntrack > itself does not send RST packets. And the TCP layer don't send RST packets > to out of window ones either. Thanks for your reply. To normal TCP flow, you're right because the TCP layer doesn't send RST to out-of-window skbs. But the DNAT policy on the server should have converted the port of incoming skb from A_port to B_port as my description in this patch said. It actually doesn't happen. The conntrack clears the skb->_nfct value after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT would not convert the A_port to B_port. So the TCP layer is not able to look up the correct socket (client_ip, client_port, server_ip, B_port) because A_port doesn't match B_port, then an RST would be sent to the client. > > The only possibility I see for such packets is an iptables/nftables rule > which rejects packets classified as INVALID by conntrack. > > As Florian suggested, why don't you change that rule? As I said, just for the workaround method, only turning on that sysctl knob can solve the issue. Thanks, Jason > > The conntrack states are not fine-grained to express different TCP states > which covered with INVALID. It was never a good idea to reject INVALID > packets or let them through (leaking internal addresses). > > Best regards, > Jozsef > > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > > knob seems odd to me though it can workaround :S > > > > > > I don't see a better alternative, other than -p tcp -m conntrack > > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > > such packets. > > > > > > > I would like to prevent sending such an RST as default behaviour. > > > > > > I don't see a way to make this work out of the box, without possible > > > unwanted side effects. > > > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > > that fails sequence validation has NAT translation applied to it, and > > > thus the '-NF_ACCEPT' packet won't be translated. > > > > > > Not even compile tested: > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > case NFCT_TCP_IGNORE: > > > spin_unlock_bh(&ct->lock); > > > return NF_ACCEPT; > > > - case NFCT_TCP_INVALID: > > > + case NFCT_TCP_INVALID: { > > > + verdict = -NF_ACCEPT; > > > + if (ct->status & IPS_NAT_MASK) > > > + res = NF_DROP; /* skb would miss nat transformation */ > > > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > > would be dropped in nf_hook_slow() eventually and would not be passed > > to the TCP layer. > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > spin_unlock_bh(&ct->lock); > > > - return -NF_ACCEPT; > > > + return verdict; > > > + } > > > case NFCT_TCP_ACCEPT: > > > break; > > > } > > > > Great! I think your draft patch makes sense really, which takes NAT > > into consideration. > > > > > > > > But I don't really see the advantage compared to doing drop decision in > > > iptables/nftables ruleset. > > > > From our views, especially to kernel developers, you're right: we > > could easily turn on that knob or add a drop policy to prevent it > > happening. Actually I did this in production to prevent such a case. > > It surely works. > > > > But from the views of normal users and those who do not understand how > > it works in the kernel, it looks strange: people may ask why we get > > some unknown RSTs in flight? > > > > > I also have a hunch that someone will eventually complain about this > > > change in behavior. > > > > Well, I still think the patch you suggested is proper and don't know > > why people could complain about it. > > > > Thanks for your patience :) > > > > Thanks, > > Jason > > > > -- > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt > Address : Wigner Research Centre for Physics > H-1525 Budapest 114, POB. 49, Hungary
On Thu, 7 Mar 2024, Jason Xing wrote: > > > Allow me to finish the full sentence: my only purpose is not to let > > > the TCP layer send strange RST to the _established_ socket due to > > > receiving strange out-of-window skbs. > > > > I don't understand why do you want to modify conntrack at all: conntrack > > itself does not send RST packets. And the TCP layer don't send RST packets > > to out of window ones either. > > To normal TCP flow, you're right because the TCP layer doesn't send RST > to out-of-window skbs. > > But the DNAT policy on the server should have converted the port of > incoming skb from A_port to B_port as my description in this patch said. > > It actually doesn't happen. The conntrack clears the skb->_nfct value > after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT > would not convert the A_port to B_port. The packet is INVALID therefore it's not NATed. I don't think that could simply be changed in netfilter. > So the TCP layer is not able to look up the correct socket (client_ip, > client_port, server_ip, B_port) because A_port doesn't match B_port, > then an RST would be sent to the client. Yes, if you let the packet continue to traverse the stack. > > The only possibility I see for such packets is an iptables/nftables rule > > which rejects packets classified as INVALID by conntrack. > > > > As Florian suggested, why don't you change that rule? > > As I said, just for the workaround method, only turning on that sysctl > knob can solve the issue. Sorry, but why? If you drop the packet by a rule, then there's no need to use the sysctl setting and there's no unwanted RST packets. Best regards, Jozsef > > The conntrack states are not fine-grained to express different TCP states > > which covered with INVALID. It was never a good idea to reject INVALID > > packets or let them through (leaking internal addresses). > > > > Best regards, > > Jozsef > > > > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > > > knob seems odd to me though it can workaround :S > > > > > > > > I don't see a better alternative, other than -p tcp -m conntrack > > > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > > > such packets. > > > > > > > > > I would like to prevent sending such an RST as default behaviour. > > > > > > > > I don't see a way to make this work out of the box, without possible > > > > unwanted side effects. > > > > > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > > > that fails sequence validation has NAT translation applied to it, and > > > > thus the '-NF_ACCEPT' packet won't be translated. > > > > > > > > Not even compile tested: > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > case NFCT_TCP_IGNORE: > > > > spin_unlock_bh(&ct->lock); > > > > return NF_ACCEPT; > > > > - case NFCT_TCP_INVALID: > > > > + case NFCT_TCP_INVALID: { > > > > + verdict = -NF_ACCEPT; > > > > + if (ct->status & IPS_NAT_MASK) > > > > + res = NF_DROP; /* skb would miss nat transformation */ > > > > > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > > > would be dropped in nf_hook_slow() eventually and would not be passed > > > to the TCP layer. > > > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > spin_unlock_bh(&ct->lock); > > > > - return -NF_ACCEPT; > > > > + return verdict; > > > > + } > > > > case NFCT_TCP_ACCEPT: > > > > break; > > > > } > > > > > > Great! I think your draft patch makes sense really, which takes NAT > > > into consideration. > > > > > > > > > > > But I don't really see the advantage compared to doing drop decision in > > > > iptables/nftables ruleset. > > > > > > From our views, especially to kernel developers, you're right: we > > > could easily turn on that knob or add a drop policy to prevent it > > > happening. Actually I did this in production to prevent such a case. > > > It surely works. > > > > > > But from the views of normal users and those who do not understand how > > > it works in the kernel, it looks strange: people may ask why we get > > > some unknown RSTs in flight? > > > > > > > I also have a hunch that someone will eventually complain about this > > > > change in behavior. > > > > > > Well, I still think the patch you suggested is proper and don't know > > > why people could complain about it. > > > > > > Thanks for your patience :) > > > > > > Thanks, > > > Jason > > > > > > > -- > > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu > > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt > > Address : Wigner Research Centre for Physics > > H-1525 Budapest 114, POB. 49, Hungary >
On Fri, Mar 8, 2024 at 3:00 AM Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote: > > On Thu, 7 Mar 2024, Jason Xing wrote: > > > > > Allow me to finish the full sentence: my only purpose is not to let > > > > the TCP layer send strange RST to the _established_ socket due to > > > > receiving strange out-of-window skbs. > > > > > > I don't understand why do you want to modify conntrack at all: conntrack > > > itself does not send RST packets. And the TCP layer don't send RST packets > > > to out of window ones either. > > > > To normal TCP flow, you're right because the TCP layer doesn't send RST > > to out-of-window skbs. > > > > But the DNAT policy on the server should have converted the port of > > incoming skb from A_port to B_port as my description in this patch said. > > > > It actually doesn't happen. The conntrack clears the skb->_nfct value > > after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT > > would not convert the A_port to B_port. > > The packet is INVALID therefore it's not NATed. I don't think that could > simply be changed in netfilter. Well, what would you suggest ? :) > > > So the TCP layer is not able to look up the correct socket (client_ip, > > client_port, server_ip, B_port) because A_port doesn't match B_port, > > then an RST would be sent to the client. > > Yes, if you let the packet continue to traverse the stack. Yes! > > > > The only possibility I see for such packets is an iptables/nftables rule > > > which rejects packets classified as INVALID by conntrack. > > > > > > As Florian suggested, why don't you change that rule? > > > > As I said, just for the workaround method, only turning on that sysctl > > knob can solve the issue. > > Sorry, but why? If you drop the packet by a rule, then there's no need to > use the sysctl setting and there's no unwanted RST packets. Oh, I was trying to clarify that using some knob can work around which is not my original intention, but I don't seek a workaround method. I didn't use --cstate INVALID to drop those INVALID packets in production, which I feel should work. For this particular case, I feel it's not that good to ask users/customers to add more rules or turn on sysctl knob to prevent seeing such RSTs. Instead I thought we could naturally stop sending such RSTs as default without asking other people to change something. People shouldn't see the RSTs really. That's why I wrote this patch. I hope I'm right... :S Thanks, Jason > > Best regards, > Jozsef > > > > The conntrack states are not fine-grained to express different TCP states > > > which covered with INVALID. It was never a good idea to reject INVALID > > > packets or let them through (leaking internal addresses). > > > > > > Best regards, > > > Jozsef > > > > > > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > > > > knob seems odd to me though it can workaround :S > > > > > > > > > > I don't see a better alternative, other than -p tcp -m conntrack > > > > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > > > > such packets. > > > > > > > > > > > I would like to prevent sending such an RST as default behaviour. > > > > > > > > > > I don't see a way to make this work out of the box, without possible > > > > > unwanted side effects. > > > > > > > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > > > > that fails sequence validation has NAT translation applied to it, and > > > > > thus the '-NF_ACCEPT' packet won't be translated. > > > > > > > > > > Not even compile tested: > > > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > > case NFCT_TCP_IGNORE: > > > > > spin_unlock_bh(&ct->lock); > > > > > return NF_ACCEPT; > > > > > - case NFCT_TCP_INVALID: > > > > > + case NFCT_TCP_INVALID: { > > > > > + verdict = -NF_ACCEPT; > > > > > + if (ct->status & IPS_NAT_MASK) > > > > > + res = NF_DROP; /* skb would miss nat transformation */ > > > > > > > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > > > > would be dropped in nf_hook_slow() eventually and would not be passed > > > > to the TCP layer. > > > > > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > > spin_unlock_bh(&ct->lock); > > > > > - return -NF_ACCEPT; > > > > > + return verdict; > > > > > + } > > > > > case NFCT_TCP_ACCEPT: > > > > > break; > > > > > } > > > > > > > > Great! I think your draft patch makes sense really, which takes NAT > > > > into consideration. > > > > > > > > > > > > > > But I don't really see the advantage compared to doing drop decision in > > > > > iptables/nftables ruleset. > > > > > > > > From our views, especially to kernel developers, you're right: we > > > > could easily turn on that knob or add a drop policy to prevent it > > > > happening. Actually I did this in production to prevent such a case. > > > > It surely works. > > > > > > > > But from the views of normal users and those who do not understand how > > > > it works in the kernel, it looks strange: people may ask why we get > > > > some unknown RSTs in flight? > > > > > > > > > I also have a hunch that someone will eventually complain about this > > > > > change in behavior. > > > > > > > > Well, I still think the patch you suggested is proper and don't know > > > > why people could complain about it. > > > > > > > > Thanks for your patience :) > > > > > > > > Thanks, > > > > Jason > > > > > > > > > > -- > > > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu > > > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt > > > Address : Wigner Research Centre for Physics > > > H-1525 Budapest 114, POB. 49, Hungary > > > > -- > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt > Address : Wigner Research Centre for Physics > H-1525 Budapest 114, POB. 49, Hungary
On Thu, Mar 7, 2024 at 11:11 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@strlen.de> wrote: > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote: > > > > > > > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > This change disables most of the tcp_in_window() test, this will > > > > > > pretend everything is fine even though tcp_in_window says otherwise. > > > > > > > > > > Thanks for the information. It does make sense. > > > > > > > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl > > > > > knob which you also pointed out. It also pretends to ignore those > > > > > out-of-window skbs. > > > > > > > > > > > > > > > > > You could: > > > > > > - drop invalid tcp packets in input hook > > > > > > > > > > How about changing the return value only as below? Only two cases will > > > > > be handled: > > > > > > > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > index ae493599a3ef..c88ce4cd041e 100644 > > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > > > > case NFCT_TCP_INVALID: > > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > > > > spin_unlock_bh(&ct->lock); > > > > > - return -NF_ACCEPT; > > > > > + return -NF_DROP; > > > > > > > > Lets not do this. conntrack should never drop packets and defer to ruleset > > > > whereever possible. > > > > > > Hmm, sorry, it is against my understanding. > > > > > > If we cannot return -NF_DROP, why have we already added some 'return > > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does > > > this test statement exist? > > > > Sure we can drop. But we should only do it if there is no better > > alternative. > > > > > nf_conntrack_in() > > > -> nf_conntrack_handle_packet() > > > -> if (ret <= 0) { > > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop); > > > > AFAICS this only happens when we receive syn for an existing conntrack > > that is being removed already so we'd expect next syn to create a new > > Sorry, I've double-checked this part and found out there is no chance > to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return > 'NF_DROP' (see link [1]) instead. The if-else statements seem like > dead code. My bad. My mind went blank last night, maybe it's too late. Well, It's not dead code because '-NF_DROP' which is still zero makes me misunderstand :( I'll post a patch to change 'if (ret == -NF_DROP)' to 'if (ret == NF_DROP)' just for easy reading. Thanks, Jason > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B > > > connection. Feel free to send patches that replace drop with -accept > > where possible/where it makes sense, but I don't think the > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in > nf_conntrack_dccp_packet()? > > There are three points where nf_conntrack_handle_packet() returns NF_DROP: > 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you > said, it's not necessary to change. > 2) another two exist in nf_conntrack_dccp_packet() which should be the > same as nf_conntrack_tcp_packet() handles. > > The patch goes like this: > diff --git a/net/netfilter/nf_conntrack_proto_dccp.c > b/net/netfilter/nf_conntrack_proto_dccp.c > index e2db1f4ec2df..ebc4f733bb2e 100644 > --- a/net/netfilter/nf_conntrack_proto_dccp.c > +++ b/net/netfilter/nf_conntrack_proto_dccp.c > @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > > dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > if (dccp_error(dh, skb, dataoff, state)) > return -NF_ACCEPT; > @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, > struct sk_buff *skb, > /* pull again, including possible 48 bit sequences and subtype header */ > dh = dccp_header_pointer(skb, dataoff, dh, &_dh); > if (!dh) > - return NF_DROP; > + return -NF_ACCEPT; > > type = dh->dccph_type; > if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state)) > > > > > > My only purpose is not to let the TCP layer sending strange RST to the > > > right flow. > > > > AFAIU tcp layer is correct, no? Out of the blue packet to some listener > > socket? > > Allow me to finish the full sentence: my only purpose is not to let > the TCP layer send strange RST to the _established_ socket due to > receiving strange out-of-window skbs. > > > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl > > > knob seems odd to me though it can workaround :S > > > > I don't see a better alternative, other than -p tcp -m conntrack > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see > > such packets. > > > > > I would like to prevent sending such an RST as default behaviour. > > > > I don't see a way to make this work out of the box, without possible > > unwanted side effects. > > > > MAYBE we could drop IFF we check that the conntrack entry candidate > > that fails sequence validation has NAT translation applied to it, and > > thus the '-NF_ACCEPT' packet won't be translated. > > > > Not even compile tested: > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c > > --- a/net/netfilter/nf_conntrack_proto_tcp.c > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, > > case NFCT_TCP_IGNORE: > > spin_unlock_bh(&ct->lock); > > return NF_ACCEPT; > > - case NFCT_TCP_INVALID: > > + case NFCT_TCP_INVALID: { > > + verdict = -NF_ACCEPT; > > + if (ct->status & IPS_NAT_MASK) > > + res = NF_DROP; /* skb would miss nat transformation */ > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb > would be dropped in nf_hook_slow() eventually and would not be passed > to the TCP layer. > > > nf_tcp_handle_invalid(ct, dir, index, skb, state); > > spin_unlock_bh(&ct->lock); > > - return -NF_ACCEPT; > > + return verdict; > > + } > > case NFCT_TCP_ACCEPT: > > break; > > } > > Great! I think your draft patch makes sense really, which takes NAT > into consideration. > > > > > But I don't really see the advantage compared to doing drop decision in > > iptables/nftables ruleset. > > From our views, especially to kernel developers, you're right: we > could easily turn on that knob or add a drop policy to prevent it > happening. Actually I did this in production to prevent such a case. > It surely works. > > But from the views of normal users and those who do not understand how > it works in the kernel, it looks strange: people may ask why we get > some unknown RSTs in flight? > > > > > I also have a hunch that someone will eventually complain about this > > change in behavior. > > Well, I still think the patch you suggested is proper and don't know > why people could complain about it. > > Thanks for your patience :) > > Thanks, > Jason
Jason Xing <kerneljasonxing@gmail.com> wrote: > > connection. Feel free to send patches that replace drop with -accept > > where possible/where it makes sense, but I don't think the > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in > nf_conntrack_dccp_packet()? It would be more consistent with what tcp and sctp trackers are doing, but this should not matter in practice (the packet is malformed). > > + case NFCT_TCP_INVALID: { > > + verdict = -NF_ACCEPT; > > + if (ct->status & IPS_NAT_MASK) > > + res = NF_DROP; /* skb would miss nat transformation */ > > Above line, I guess, should be 'verdict = NF_DROP'? Yes. > Great! I think your draft patch makes sense really, which takes NAT > into consideration. You could submit this officially and we could give it a try and see if anyone complains down the road.
On Sat, Mar 9, 2024 at 6:47 AM Florian Westphal <fw@strlen.de> wrote: > > Jason Xing <kerneljasonxing@gmail.com> wrote: > > > connection. Feel free to send patches that replace drop with -accept > > > where possible/where it makes sense, but I don't think the > > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided. > > > > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in > > nf_conntrack_dccp_packet()? > > It would be more consistent with what tcp and sctp trackers are > doing, but this should not matter in practice (the packet is malformed). Okay, I will take some time to check the sctp part. BTW, just like one of previous emails said, I noticed there are two points in DCCP part which is not consistent with TCP part, so I submitted one simple patch [1] to do it. [1]: https://lore.kernel.org/all/20240308092915.9751-1-kerneljasonxing@gmail.com/ > > > > + case NFCT_TCP_INVALID: { > > > + verdict = -NF_ACCEPT; > > > + if (ct->status & IPS_NAT_MASK) > > > + res = NF_DROP; /* skb would miss nat transformation */ > > > > Above line, I guess, should be 'verdict = NF_DROP'? > > Yes. > > > Great! I think your draft patch makes sense really, which takes NAT > > into consideration. > > You could submit this officially and we could give it a try and see if > anyone complains down the road. Great :) Thanks, Jason
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index ae493599a3ef..3f3e620f3969 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -1253,13 +1253,11 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, res = tcp_in_window(ct, dir, index, skb, dataoff, th, state); switch (res) { - case NFCT_TCP_IGNORE: - spin_unlock_bh(&ct->lock); - return NF_ACCEPT; case NFCT_TCP_INVALID: nf_tcp_handle_invalid(ct, dir, index, skb, state); + case NFCT_TCP_IGNORE: spin_unlock_bh(&ct->lock); - return -NF_ACCEPT; + return NF_ACCEPT; case NFCT_TCP_ACCEPT: break; }