diff mbox series

[net-next] udp_tunnel: properly deal with xfrm gro encap.

Message ID 6001185ace17e7d7d2ed176c20aef2461b60c613.1742323321.git.pabeni@redhat.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net-next] udp_tunnel: properly deal with xfrm gro encap. | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-03-19--18-00 (tests: 896)

Commit Message

Paolo Abeni March 18, 2025, 6:47 p.m. UTC
The blamed commit below does not take in account that xfrm
can enable GRO over UDP encapsulation without going through
setup_udp_tunnel_sock().

At deletion time such socket will still go through
udp_tunnel_cleanup_gro(), and the failed GRO type lookup will
trigger the reported warning.

We can safely remove such warning, simply performing no action
on failed GRO type lookup at deletion time.

Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
Fixes: 311b36574ceac ("udp_tunnel: use static call for GRO hooks when possible")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp_offload.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Willem de Bruijn March 19, 2025, 2:35 p.m. UTC | #1
Paolo Abeni wrote:
> The blamed commit below does not take in account that xfrm
> can enable GRO over UDP encapsulation without going through
> setup_udp_tunnel_sock().
> 
> At deletion time such socket will still go through
> udp_tunnel_cleanup_gro(), and the failed GRO type lookup will
> trigger the reported warning.
> 
> We can safely remove such warning, simply performing no action
> on failed GRO type lookup at deletion time.
> 
> Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
> Fixes: 311b36574ceac ("udp_tunnel: use static call for GRO hooks when possible")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Because XFRM does not call udp_tunnel_update_gro_rcv when enabling its
UDP GRO offload, from set_xfrm_gro_udp_encap_rcv. But it does call it
when disabling the offload, as called for all udp sockest from
udp(v6)_destroy_sock. (Just to verify my understanding.)

Not calling udp_tunnel_update_gro_rcv on add will have the unintended
side effect of enabling the static call if one other tunnel is also
active, breaking UDP GRO for XFRM socket, right? Not a significant
consequence. But eventually XFRM would want to be counted, I suppose.

Reviewed-by: Willem de Bruijn <willemb@google.com>


> ---
>  net/ipv4/udp_offload.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 088aa8cb8ac0c..2e0b52ae665bc 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -110,14 +110,7 @@ void udp_tunnel_update_gro_rcv(struct sock *sk, bool add)
>  		cur = &udp_tunnel_gro_types[udp_tunnel_gro_type_nr++];
>  		refcount_set(&cur->count, 1);
>  		cur->gro_receive = up->gro_receive;
> -	} else {
> -		/*
> -		 * The stack cleanups only successfully added tunnel, the
> -		 * lookup on removal should never fail.
> -		 */
> -		if (WARN_ON_ONCE(!cur))
> -			goto out;
> -
> +	} else if (cur) {
>  		if (!refcount_dec_and_test(&cur->count))
>  			goto out;
>  
> -- 
> 2.48.1
>
Paolo Abeni March 19, 2025, 3:49 p.m. UTC | #2
On 3/19/25 3:35 PM, Willem de Bruijn wrote:
> Paolo Abeni wrote:
>> The blamed commit below does not take in account that xfrm
>> can enable GRO over UDP encapsulation without going through
>> setup_udp_tunnel_sock().
>>
>> At deletion time such socket will still go through
>> udp_tunnel_cleanup_gro(), and the failed GRO type lookup will
>> trigger the reported warning.
>>
>> We can safely remove such warning, simply performing no action
>> on failed GRO type lookup at deletion time.
>>
>> Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
>> Fixes: 311b36574ceac ("udp_tunnel: use static call for GRO hooks when possible")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> Because XFRM does not call udp_tunnel_update_gro_rcv when enabling its
> UDP GRO offload, from set_xfrm_gro_udp_encap_rcv. But it does call it
> when disabling the offload, as called for all udp sockest from
> udp(v6)_destroy_sock. (Just to verify my understanding.)

Exactly.

> Not calling udp_tunnel_update_gro_rcv on add will have the unintended
> side effect of enabling the static call if one other tunnel is also
> active, breaking UDP GRO for XFRM socket, right?

Ouch, right again. I think we can/should do better.

Given syzkaller has found another splat with no reproducer on the other
UDP GRO change of mine [1] and we are almost at merge window time, I'm
considering reverting entirely such changes and re-submit later
(hopefully fixed). WDYT?

Thanks,

Paolo

[1] https://syzkaller.appspot.com/bug?extid=1fb3291cc1beeb3c315a
I *think* moving:

	if (!up->tunnel_list.pprev)

from udp_tunnel_cleanup_gro() into udp_tunnel_update_gro_lookup(), under
the udp_tunnel_gro_lock spinlock should fix it, but without a repro it's
a bit risky,
Willem de Bruijn March 19, 2025, 5:44 p.m. UTC | #3
Paolo Abeni wrote:
> 
> 
> On 3/19/25 3:35 PM, Willem de Bruijn wrote:
> > Paolo Abeni wrote:
> >> The blamed commit below does not take in account that xfrm
> >> can enable GRO over UDP encapsulation without going through
> >> setup_udp_tunnel_sock().
> >>
> >> At deletion time such socket will still go through
> >> udp_tunnel_cleanup_gro(), and the failed GRO type lookup will
> >> trigger the reported warning.
> >>
> >> We can safely remove such warning, simply performing no action
> >> on failed GRO type lookup at deletion time.
> >>
> >> Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
> >> Fixes: 311b36574ceac ("udp_tunnel: use static call for GRO hooks when possible")
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > 
> > Because XFRM does not call udp_tunnel_update_gro_rcv when enabling its
> > UDP GRO offload, from set_xfrm_gro_udp_encap_rcv. But it does call it
> > when disabling the offload, as called for all udp sockest from
> > udp(v6)_destroy_sock. (Just to verify my understanding.)
> 
> Exactly.
> 
> > Not calling udp_tunnel_update_gro_rcv on add will have the unintended
> > side effect of enabling the static call if one other tunnel is also
> > active, breaking UDP GRO for XFRM socket, right?
> 
> Ouch, right again. I think we can/should do better.
> 
> Given syzkaller has found another splat with no reproducer on the other
> UDP GRO change of mine [1] and we are almost at merge window time, I'm
> considering reverting entirely such changes and re-submit later
> (hopefully fixed). WDYT?

Your call. I suspect that we can forward fix this. But yes, that is
always the riskier approach. And from a first quick look at the
report, the right fix is not immediately glaringly obvious indeed.

> Thanks,
> 
> Paolo
> 
> [1] https://syzkaller.appspot.com/bug?extid=1fb3291cc1beeb3c315a
> I *think* moving:
> 
> 	if (!up->tunnel_list.pprev)
> 
> from udp_tunnel_cleanup_gro() into udp_tunnel_update_gro_lookup(), under
> the udp_tunnel_gro_lock spinlock should fix it, but without a repro it's
> a bit risky,
>
Paolo Abeni March 19, 2025, 10:15 p.m. UTC | #4
On 3/19/25 6:44 PM, Willem de Bruijn wrote:
> Paolo Abeni wrote:
>> Given syzkaller has found another splat with no reproducer on the other
>> UDP GRO change of mine [1] and we are almost at merge window time, I'm
>> considering reverting entirely such changes and re-submit later
>> (hopefully fixed). WDYT?
> 
> Your call. I suspect that we can forward fix this. But yes, that is
> always the riskier approach. And from a first quick look at the
> report, the right fix is not immediately glaringly obvious indeed.

One problem to me is that I have my hands significantly full, since the
revert looks like the faster way out it looks the more appealing
candidate to me.

WRT the other issue, I think the problem is in udp_tunnel_cleanup_gro();
this check:

        if (!up->tunnel_list.pprev)
                return;

at sk deletion time is performed outside any lock. The current CPU could
see the old list value (empty) even if another core previously added the
sk into the UDP tunnel GRO list, thus skipping the removal from such
list. Later list operation will do UaF while touching the same list.

Moving the check under the udp_tunnel_gro_lock spinlock should solve the
issue.

Thanks,

Paolo
Willem de Bruijn March 20, 2025, 1:50 p.m. UTC | #5
Paolo Abeni wrote:
> On 3/19/25 6:44 PM, Willem de Bruijn wrote:
> > Paolo Abeni wrote:
> >> Given syzkaller has found another splat with no reproducer on the other
> >> UDP GRO change of mine [1] and we are almost at merge window time, I'm
> >> considering reverting entirely such changes and re-submit later
> >> (hopefully fixed). WDYT?
> > 
> > Your call. I suspect that we can forward fix this. But yes, that is
> > always the riskier approach. And from a first quick look at the
> > report, the right fix is not immediately glaringly obvious indeed.
> 
> One problem to me is that I have my hands significantly full, since the
> revert looks like the faster way out it looks the more appealing
> candidate to me.
> 
> WRT the other issue, I think the problem is in udp_tunnel_cleanup_gro();
> this check:
> 
>         if (!up->tunnel_list.pprev)
>                 return;
> 
> at sk deletion time is performed outside any lock. The current CPU could
> see the old list value (empty) even if another core previously added the
> sk into the UDP tunnel GRO list, thus skipping the removal from such
> list. Later list operation will do UaF while touching the same list.
> 
> Moving the check under the udp_tunnel_gro_lock spinlock should solve the
> issue.

FWIW, the full revert doesn't have to happen in the net-next timeframe.
Can always revert to that in -rcX. And try a forward fix first, after
the merge is over.
Sabrina Dubroca March 20, 2025, 2:44 p.m. UTC | #6
2025-03-19, 16:49:21 +0100, Paolo Abeni wrote:
> 
> 
> On 3/19/25 3:35 PM, Willem de Bruijn wrote:
> > Paolo Abeni wrote:
> >> The blamed commit below does not take in account that xfrm
> >> can enable GRO over UDP encapsulation without going through
> >> setup_udp_tunnel_sock().
> >>
> >> At deletion time such socket will still go through
> >> udp_tunnel_cleanup_gro(), and the failed GRO type lookup will
> >> trigger the reported warning.
> >>
> >> We can safely remove such warning, simply performing no action
> >> on failed GRO type lookup at deletion time.
> >>
> >> Reported-by: syzbot+8c469a2260132cd095c1@syzkaller.appspotmail.com
> >> Closes: https://syzkaller.appspot.com/bug?extid=8c469a2260132cd095c1
> >> Fixes: 311b36574ceac ("udp_tunnel: use static call for GRO hooks when possible")
> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > 
> > Because XFRM does not call udp_tunnel_update_gro_rcv when enabling its
> > UDP GRO offload, from set_xfrm_gro_udp_encap_rcv. But it does call it
> > when disabling the offload, as called for all udp sockest from
> > udp(v6)_destroy_sock. (Just to verify my understanding.)
> 
> Exactly.
> 
> > Not calling udp_tunnel_update_gro_rcv on add will have the unintended
> > side effect of enabling the static call if one other tunnel is also
> > active, breaking UDP GRO for XFRM socket, right?
> 
> Ouch, right again. I think we can/should do better.

We should be able to adapt xfrm to use setup_udp_tunnel_sock, but it's
not a simple conversion because GRO could be enabled separately from
the encap itself. I'm not sure there's much benefit except for a bit
more consistency when we enable the encap with GRO at once (but we'd
still have that odd set_xfrm_gro_udp_encap_rcv to enable GRO after
ESPINUDP has been set up). A few of the UDP encaps that precede
setup_udp_tunnel_sock have been converted, I don't know why ipsec was
left.
diff mbox series

Patch

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 088aa8cb8ac0c..2e0b52ae665bc 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -110,14 +110,7 @@  void udp_tunnel_update_gro_rcv(struct sock *sk, bool add)
 		cur = &udp_tunnel_gro_types[udp_tunnel_gro_type_nr++];
 		refcount_set(&cur->count, 1);
 		cur->gro_receive = up->gro_receive;
-	} else {
-		/*
-		 * The stack cleanups only successfully added tunnel, the
-		 * lookup on removal should never fail.
-		 */
-		if (WARN_ON_ONCE(!cur))
-			goto out;
-
+	} else if (cur) {
 		if (!refcount_dec_and_test(&cur->count))
 			goto out;