diff mbox series

[v2] xfrm: Fix oops in __xfrm_state_delete()

Message ID 20221102101848.ibvumaxg2jdvk52y@intra2net.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [v2] xfrm: Fix oops in __xfrm_state_delete() | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 4 maintainers not CCed: kuba@kernel.org pabeni@redhat.com edumazet@google.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Unknown commit id 'cbc3488685b20e7b2a98ad387a1a816aada569d8', maybe rebased or not pulled?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Thomas Jarosch Nov. 2, 2022, 10:18 a.m. UTC
Kernel 5.14 added a new "byseq" index to speed
up xfrm_state lookups by sequence number in commit
fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")

While the patch was thorough, the function pfkey_send_new_mapping()
in net/af_key.c also modifies x->km.seq and never added
the current xfrm_state to the "byseq" index.

This leads to the following kernel Ooops:
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    ..
    RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
    ..
    Call Trace:
    <TASK>
    xfrm_state_delete+0x1e/0x40
    xfrm_del_sa+0xb0/0x110 [xfrm_user]
    xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
    ? remove_entity_load_avg+0x8a/0xa0
    ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
    netlink_rcv_skb+0x51/0x100
    xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
    netlink_unicast+0x1a6/0x270
    netlink_sendmsg+0x22a/0x480
    __sys_sendto+0x1a6/0x1c0
    ? __audit_syscall_entry+0xd8/0x130
    ? __audit_syscall_exit+0x249/0x2b0
    __x64_sys_sendto+0x23/0x30
    do_syscall_64+0x3a/0x90
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

Exact location of the crash in __xfrm_state_delete():
    if (x->km.seq)
        hlist_del_rcu(&x->byseq);

The hlist_node "byseq" was never populated.

The bug only triggers if a new NAT traversal mapping (changed IP or port)
is detected in esp_input_done2() / esp6_input_done2(), which in turn
indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
with CONFIG_NET_KEY and "af_key" is active.

The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
Various implementations have been examined how they handle
the "sadb_msg_seq" header field:

- racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
- strongswan: does not care about sadb_msg_seq
- openswan: does not care about sadb_msg_seq

There is no standard how PF_KEYv2 sadb_msg_seq should be populated
for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
implementations either. Herbert Xu suggested we should just
use the current km.seq value as is. This fixes the root cause
of the oops since we no longer modify km.seq itself.

The update of "km.seq" looks like a copy'n'paste error
from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
number according to RFC 2367. It has been verified that code paths
involving pfkey_send_acquire() don't cause the same Oops.

PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

    commit cbc3488685b20e7b2a98ad387a1a816aada569d8
    Author:     Derek Atkins <derek@ihtfp.com>
    AuthorDate: Wed Apr 2 13:21:02 2003 -0800

        [IPSEC]: Implement UDP Encapsulation framework.

        In particular, implement ESPinUDP encapsulation for IPsec
        Nat Traversal.

A note on triggering the bug: I was not able to trigger it using VMs.
There is one VPN using a high latency link on our production VPN server
that triggered it like once a day though.

Link: https://github.com/strongswan/strongswan/issues/992
Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/

Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
Reported-by: Roth Mark <rothm@mail.com>
Reported-by: Zhihao Chen <chenzhihao@meizu.com>
Tested-by: Roth Mark <rothm@mail.com>
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
---
 net/key/af_key.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Antony Antony Nov. 3, 2022, 10:32 a.m. UTC | #1
On Wed, Nov 02, 2022 at 11:18:48 +0100, Thomas Jarosch wrote:
> Kernel 5.14 added a new "byseq" index to speed
> up xfrm_state lookups by sequence number in commit
> fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
> 
> While the patch was thorough, the function pfkey_send_new_mapping()
> in net/af_key.c also modifies x->km.seq and never added
> the current xfrm_state to the "byseq" index.
> 
> This leads to the following kernel Ooops:
>     BUG: kernel NULL pointer dereference, address: 0000000000000000
>     ..
>     RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
>     ..
>     Call Trace:
>     <TASK>
>     xfrm_state_delete+0x1e/0x40
>     xfrm_del_sa+0xb0/0x110 [xfrm_user]
>     xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
>     ? remove_entity_load_avg+0x8a/0xa0
>     ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
>     netlink_rcv_skb+0x51/0x100
>     xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
>     netlink_unicast+0x1a6/0x270
>     netlink_sendmsg+0x22a/0x480
>     __sys_sendto+0x1a6/0x1c0
>     ? __audit_syscall_entry+0xd8/0x130
>     ? __audit_syscall_exit+0x249/0x2b0
>     __x64_sys_sendto+0x23/0x30
>     do_syscall_64+0x3a/0x90
>     entry_SYSCALL_64_after_hwframe+0x61/0xcb
> 
> Exact location of the crash in __xfrm_state_delete():
>     if (x->km.seq)
>         hlist_del_rcu(&x->byseq);
> 
> The hlist_node "byseq" was never populated.
> 
> The bug only triggers if a new NAT traversal mapping (changed IP or port)
> is detected in esp_input_done2() / esp6_input_done2(), which in turn
> indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
> with CONFIG_NET_KEY and "af_key" is active.
> 
> The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
> Various implementations have been examined how they handle
> the "sadb_msg_seq" header field:
> 
> - racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
> - strongswan: does not care about sadb_msg_seq
> - openswan: does not care about sadb_msg_seq
> 
> There is no standard how PF_KEYv2 sadb_msg_seq should be populated
> for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
> implementations either. Herbert Xu suggested we should just
> use the current km.seq value as is. This fixes the root cause
> of the oops since we no longer modify km.seq itself.
> 
> The update of "km.seq" looks like a copy'n'paste error
> from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
> number according to RFC 2367. It has been verified that code paths
> involving pfkey_send_acquire() don't cause the same Oops.
> 
> PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
>     https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> 
>     commit cbc3488685b20e7b2a98ad387a1a816aada569d8
>     Author:     Derek Atkins <derek@ihtfp.com>
>     AuthorDate: Wed Apr 2 13:21:02 2003 -0800
> 
>         [IPSEC]: Implement UDP Encapsulation framework.
> 
>         In particular, implement ESPinUDP encapsulation for IPsec
>         Nat Traversal.
> 
> A note on triggering the bug: I was not able to trigger it using VMs.
> There is one VPN using a high latency link on our production VPN server
> that triggered it like once a day though.
> 
> Link: https://github.com/strongswan/strongswan/issues/992
> Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
> Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/
> 
> Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
> Reported-by: Roth Mark <rothm@mail.com>
> Reported-by: Zhihao Chen <chenzhihao@meizu.com>
> Tested-by: Roth Mark <rothm@mail.com> 
> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>

Acked-by: Antony Antony <antony.antony@secunet.com>

> ---
>  net/key/af_key.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index c85df5b958d2..a4c128bab377 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -3382,7 +3382,7 @@ static int pfkey_send_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
>  	hdr->sadb_msg_len = size / sizeof(uint64_t);
>  	hdr->sadb_msg_errno = 0;
>  	hdr->sadb_msg_reserved = 0;
> -	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
> +	hdr->sadb_msg_seq = x->km.seq;
>  	hdr->sadb_msg_pid = 0;
>  
>  	/* SA */
> -- 
> 2.37.3
Herbert Xu Nov. 4, 2022, 4:43 a.m. UTC | #2
On Wed, Nov 02, 2022 at 11:18:48AM +0100, Thomas Jarosch wrote:
> Kernel 5.14 added a new "byseq" index to speed
> up xfrm_state lookups by sequence number in commit
> fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
> 
> While the patch was thorough, the function pfkey_send_new_mapping()
> in net/af_key.c also modifies x->km.seq and never added
> the current xfrm_state to the "byseq" index.
> 
> This leads to the following kernel Ooops:
>     BUG: kernel NULL pointer dereference, address: 0000000000000000
>     ..
>     RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
>     ..
>     Call Trace:
>     <TASK>
>     xfrm_state_delete+0x1e/0x40
>     xfrm_del_sa+0xb0/0x110 [xfrm_user]
>     xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
>     ? remove_entity_load_avg+0x8a/0xa0
>     ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
>     netlink_rcv_skb+0x51/0x100
>     xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
>     netlink_unicast+0x1a6/0x270
>     netlink_sendmsg+0x22a/0x480
>     __sys_sendto+0x1a6/0x1c0
>     ? __audit_syscall_entry+0xd8/0x130
>     ? __audit_syscall_exit+0x249/0x2b0
>     __x64_sys_sendto+0x23/0x30
>     do_syscall_64+0x3a/0x90
>     entry_SYSCALL_64_after_hwframe+0x61/0xcb
> 
> Exact location of the crash in __xfrm_state_delete():
>     if (x->km.seq)
>         hlist_del_rcu(&x->byseq);
> 
> The hlist_node "byseq" was never populated.
> 
> The bug only triggers if a new NAT traversal mapping (changed IP or port)
> is detected in esp_input_done2() / esp6_input_done2(), which in turn
> indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
> with CONFIG_NET_KEY and "af_key" is active.
> 
> The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
> Various implementations have been examined how they handle
> the "sadb_msg_seq" header field:
> 
> - racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
> - strongswan: does not care about sadb_msg_seq
> - openswan: does not care about sadb_msg_seq
> 
> There is no standard how PF_KEYv2 sadb_msg_seq should be populated
> for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
> implementations either. Herbert Xu suggested we should just
> use the current km.seq value as is. This fixes the root cause
> of the oops since we no longer modify km.seq itself.
> 
> The update of "km.seq" looks like a copy'n'paste error
> from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
> number according to RFC 2367. It has been verified that code paths
> involving pfkey_send_acquire() don't cause the same Oops.
> 
> PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
>     https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> 
>     commit cbc3488685b20e7b2a98ad387a1a816aada569d8
>     Author:     Derek Atkins <derek@ihtfp.com>
>     AuthorDate: Wed Apr 2 13:21:02 2003 -0800
> 
>         [IPSEC]: Implement UDP Encapsulation framework.
> 
>         In particular, implement ESPinUDP encapsulation for IPsec
>         Nat Traversal.
> 
> A note on triggering the bug: I was not able to trigger it using VMs.
> There is one VPN using a high latency link on our production VPN server
> that triggered it like once a day though.
> 
> Link: https://github.com/strongswan/strongswan/issues/992
> Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
> Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/
> 
> Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
> Reported-by: Roth Mark <rothm@mail.com>
> Reported-by: Zhihao Chen <chenzhihao@meizu.com>
> Tested-by: Roth Mark <rothm@mail.com>
> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> ---
>  net/key/af_key.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Steffen Klassert Nov. 17, 2022, 6:32 a.m. UTC | #3
On Fri, Nov 04, 2022 at 12:43:53PM +0800, Herbert Xu wrote:
> On Wed, Nov 02, 2022 at 11:18:48AM +0100, Thomas Jarosch wrote:
...
> > 
> > Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
> > Reported-by: Roth Mark <rothm@mail.com>
> > Reported-by: Zhihao Chen <chenzhihao@meizu.com>
> > Tested-by: Roth Mark <rothm@mail.com>
> > Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> > ---
> >  net/key/af_key.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks everyone!
diff mbox series

Patch

diff --git a/net/key/af_key.c b/net/key/af_key.c
index c85df5b958d2..a4c128bab377 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3382,7 +3382,7 @@  static int pfkey_send_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 	hdr->sadb_msg_len = size / sizeof(uint64_t);
 	hdr->sadb_msg_errno = 0;
 	hdr->sadb_msg_reserved = 0;
-	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
+	hdr->sadb_msg_seq = x->km.seq;
 	hdr->sadb_msg_pid = 0;
 
 	/* SA */