diff mbox series

xfrm: Fix oops in __xfrm_state_delete()

Message ID 20221031152612.o3h44x3whath4iyp@intra2net.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series xfrm: Fix oops in __xfrm_state_delete() | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 4 maintainers not CCed: kuba@kernel.org pabeni@redhat.com edumazet@google.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Unknown commit id 'cbc3488685b20e7b2a98ad387a1a816aada569d8', maybe rebased or not pulled?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Thomas Jarosch Oct. 31, 2022, 3:26 p.m. UTC
Kernel 5.14 added a new "byseq" index to speed
up xfrm_state lookups by sequence number in commit
fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")

While the patch was thorough, the function pfkey_send_new_mapping()
in net/af_key.c also modifies x->km.seq and never added
the current xfrm_state to the "byseq" index.

This leads to the following kernel Ooops:
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    ..
    RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
    ..
    Call Trace:
    <TASK>
    xfrm_state_delete+0x1e/0x40
    xfrm_del_sa+0xb0/0x110 [xfrm_user]
    xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
    ? remove_entity_load_avg+0x8a/0xa0
    ? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
    netlink_rcv_skb+0x51/0x100
    xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
    netlink_unicast+0x1a6/0x270
    netlink_sendmsg+0x22a/0x480
    __sys_sendto+0x1a6/0x1c0
    ? __audit_syscall_entry+0xd8/0x130
    ? __audit_syscall_exit+0x249/0x2b0
    __x64_sys_sendto+0x23/0x30
    do_syscall_64+0x3a/0x90
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

Exact location of the crash in __xfrm_state_delete():
    if (x->km.seq)
        hlist_del_rcu(&x->byseq);

The hlist_node "byseq" was never populated.

The bug only triggers if a new NAT traversal mapping (changed IP or port)
is detected in esp_input_done2() / esp6_input_done2(), which in turn
indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled
with CONFIG_NET_KEY and "af_key" is active or loaded.

The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367.
Various implementations have been examined how they handle
the "sadb_msg_seq" header field:

- racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING
- strongswan: does not care about sadb_msg_seq
- openswan: does not care about sadb_msg_seq

Since there is no standard how PF_KEYv2 sadb_msg_seq should be populated
for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular
implementations and is actually the sequence number of the
PF_KEYv2 message itself, we can fix the root cause of the oops
and not modify "x->km.seq" in pfkey_send_new_mapping().

The update of "km.seq" looks like a copy'n'paste error
from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq
number according to RFC 2367. It has been verified that code paths
involving pfkey_send_acquire() don't cause the same Oops.

PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here:
    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

    commit cbc3488685b20e7b2a98ad387a1a816aada569d8
    Author:     Derek Atkins <derek@ihtfp.com>
    AuthorDate: Wed Apr 2 13:21:02 2003 -0800

        [IPSEC]: Implement UDP Encapsulation framework.

        In particular, implement ESPinUDP encapsulation for IPsec
        Nat Traversal.

A note on triggering the bug: I was not able to trigger it using VMs.
There is one VPN using a high latency link on our production VPN server
that triggered it like once a day though.

Link: https://github.com/strongswan/strongswan/issues/992
Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/
Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/

Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
Reported-by: Roth Mark <rothm@mail.com>
Reported-by: Zhihao Chen <chenzhihao@meizu.com>
Tested-by: Roth Mark <rothm@mail.com>
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
---
 net/key/af_key.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Herbert Xu Nov. 1, 2022, 4:39 a.m. UTC | #1
On Mon, Oct 31, 2022 at 04:26:12PM +0100, Thomas Jarosch wrote:
>
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index c85df5b958d2..65a9ede62d65 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -3382,7 +3382,7 @@ static int pfkey_send_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
>  	hdr->sadb_msg_len = size / sizeof(uint64_t);
>  	hdr->sadb_msg_errno = 0;
>  	hdr->sadb_msg_reserved = 0;
> -	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
> +	hdr->sadb_msg_seq = get_acqseq();

This looks broken.  x->km.seq is part of the state which you are
changing.  Shouldn't you do whatever xfrm_user does in the same
situation?

Thanks,
Antony Antony Nov. 1, 2022, 7:10 p.m. UTC | #2
On Tue, Nov 01, 2022 at 12:39:48 +0800, Herbert Xu wrote:
> On Mon, Oct 31, 2022 at 04:26:12PM +0100, Thomas Jarosch wrote:
> >
> > diff --git a/net/key/af_key.c b/net/key/af_key.c
> > index c85df5b958d2..65a9ede62d65 100644
> > --- a/net/key/af_key.c
> > +++ b/net/key/af_key.c
> > @@ -3382,7 +3382,7 @@ static int pfkey_send_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
> >  	hdr->sadb_msg_len = size / sizeof(uint64_t);
> >  	hdr->sadb_msg_errno = 0;
> >  	hdr->sadb_msg_reserved = 0;
> > -	hdr->sadb_msg_seq = x->km.seq = get_acqseq();

This line looks very odd.

> > +	hdr->sadb_msg_seq = get_acqseq();
> 
> This looks broken.  x->km.seq is part of the state which you are
> changing.  Shouldn't you do whatever xfrm_user does in the same
> situation?

xfrm_user sets msg_seq to zero in mapping change message. seq is only useful for
acquire message. I think setting to zero would be a better fix.

-	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
+	hdr->sadb_msg_seq = 0;

While increasing x->km.seq in every call to pfkey_send_new_mapping()
could be an issue, would it alone explan the crash?

Tobias would pfkey_send_new_mapping() called in a default setting?
Herbert Xu Nov. 2, 2022, 7:07 a.m. UTC | #3
On Tue, Nov 01, 2022 at 08:10:21PM +0100, Antony Antony wrote:
>
> xfrm_user sets msg_seq to zero in mapping change message. seq is only useful for

Oh I had misread the patch and thought this was send_acquire.

> acquire message. I think setting to zero would be a better fix.
> 
> -	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
> +	hdr->sadb_msg_seq = 0;
> 
> While increasing x->km.seq in every call to pfkey_send_new_mapping()
> could be an issue, would it alone explan the crash?

Probably, if you change the state without moving it to the right
hash slot then the xfrm state hash table will be inconsistent.

We should copy the xfrm_user behaviour which is to leave x->km.seq
alone.  So the patch should change the above line to

	hdr->sadb_msg_seq = x->km.seq;

Thanks,
Thomas Jarosch Nov. 2, 2022, 8:31 a.m. UTC | #4
Hi Herbert,

You wrote on Wed, Nov 02, 2022 at 03:07:57PM +0800:
> > xfrm_user sets msg_seq to zero in mapping change message. seq is only useful for
> 
> Oh I had misread the patch and thought this was send_acquire.

it's a complex bug after all ^^ it took many printks() to trace
the flow of the state corruption on the production system.

> > acquire message. I think setting to zero would be a better fix.
> > 
> > -	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
> > +	hdr->sadb_msg_seq = 0;
> > 
> > While increasing x->km.seq in every call to pfkey_send_new_mapping()
> > could be an issue, would it alone explan the crash?
> 
> Probably, if you change the state without moving it to the right
> hash slot then the xfrm state hash table will be inconsistent.

in the observed cases, km.seq is always zero before. So it was never added to
the byseq hash table in the first place, resulting in the NULL pointer Oops.

If km.seq would be non-zero before entering pfkey_send_new_mapping(),
then of course the xfrm_state would stay in the wrong hash table bucket.
The only other xfrm_states I've seen in my extensive tracing with a non-zero 
sequence number were ACQUIRE states and I'm not sure those will ever end up on 
the pfkey_send_new_mapping() code path. Either way, let's fix the root cause.

> We should copy the xfrm_user behaviour which is to leave x->km.seq
> alone.  So the patch should change the above line to
> 
> 	hdr->sadb_msg_seq = x->km.seq;

thanks for your feedback, I'll update the patch and send a v2.
I'll also put it in production tonight.

Cheers,
Thomas
diff mbox series

Patch

diff --git a/net/key/af_key.c b/net/key/af_key.c
index c85df5b958d2..65a9ede62d65 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3382,7 +3382,7 @@  static int pfkey_send_new_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 	hdr->sadb_msg_len = size / sizeof(uint64_t);
 	hdr->sadb_msg_errno = 0;
 	hdr->sadb_msg_reserved = 0;
-	hdr->sadb_msg_seq = x->km.seq = get_acqseq();
+	hdr->sadb_msg_seq = get_acqseq();
 	hdr->sadb_msg_pid = 0;
 
 	/* SA */