diff mbox series

[net,1/2] netfilter: br_netfilter: fix panic with metadata_dst skb

Message ID 20241001154400.22787-2-aroulin@nvidia.com (mailing list archive)
State Accepted
Commit f9ff7665cd128012868098bbd07e28993e314fdb
Delegated to: Netdev Maintainers
Headers show
Series netfilter: br_netfilter: fix panic with metadata_dst skb | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: coreteam@netfilter.org netfilter-devel@vger.kernel.org bridge@lists.linux.dev
netdev/build_clang success Errors and warnings before: 9 this patch: 9
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 12 this patch: 12
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Andy Roulin Oct. 1, 2024, 3:43 p.m. UTC
Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.

It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded

When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.

Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that needs to be fragmented: frames with higher MTU than the
VxLAN device end up calling br_nf_ip_fragment, which in turns call
ip_skb_dst_mtu.

The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
with valid dst->dev, thus the crash.

This case was never supported in the first place, so drop the packet
instead.

PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[  176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[  176.292101] Mem abort info:
[  176.292184]   ESR = 0x0000000096000004
[  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
[  176.292530]   SET = 0, FnV = 0
[  176.292709]   EA = 0, S1PTW = 0
[  176.292862]   FSC = 0x04: level 0 translation fault
[  176.293013] Data abort info:
[  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[  176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[  176.296314] Hardware name: linux,dummy-virt (DT)
[  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[  176.297636] sp : ffff800080003630
[  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[  176.300889] Call trace:
[  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[  176.301703]  nf_hook_slow+0x48/0x124
[  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
[  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
[  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[  176.303359]  nf_hook_slow+0x48/0x124
[  176.303803]  __br_forward+0xc4/0x194 [bridge]
[  176.304013]  br_flood+0xd4/0x168 [bridge]
[  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
[  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
[  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
[  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
[  176.306197]  __netif_receive_skb+0x18/0x64
[  176.306369]  process_backlog+0x80/0x124
[  176.306540]  __napi_poll+0x38/0x17c
[  176.306636]  net_rx_action+0x124/0x26c
[  176.306758]  __do_softirq+0x100/0x26c
[  176.307051]  ____do_softirq+0x10/0x1c
[  176.307162]  call_on_irq_stack+0x24/0x4c
[  176.307289]  do_softirq_own_stack+0x1c/0x2c
[  176.307396]  do_softirq+0x54/0x6c
[  176.307485]  __local_bh_enable_ip+0x8c/0x98
[  176.307637]  __dev_queue_xmit+0x22c/0xd28
[  176.307775]  neigh_resolve_output+0xf4/0x1a0
[  176.308018]  ip_finish_output2+0x1c8/0x628
[  176.308137]  ip_do_fragment+0x5b4/0x658
[  176.308279]  ip_fragment.constprop.0+0x48/0xec
[  176.308420]  __ip_finish_output+0xa4/0x254
[  176.308593]  ip_finish_output+0x34/0x130
[  176.308814]  ip_output+0x6c/0x108
[  176.308929]  ip_send_skb+0x50/0xf0
[  176.309095]  ip_push_pending_frames+0x30/0x54
[  176.309254]  raw_sendmsg+0x758/0xaec
[  176.309568]  inet_sendmsg+0x44/0x70
[  176.309667]  __sys_sendto+0x110/0x178
[  176.309758]  __arm64_sys_sendto+0x28/0x38
[  176.309918]  invoke_syscall+0x48/0x110
[  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
[  176.310353]  do_el0_svc+0x1c/0x28
[  176.310434]  el0_svc+0x34/0xb4
[  176.310551]  el0t_64_sync_handler+0x120/0x12c
[  176.310690]  el0t_64_sync+0x190/0x194
[  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[  176.315743] ---[ end trace 0000000000000000 ]---
[  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[  176.316564] PHYS_OFFSET: 0xffff97d780000000
[  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[  176.317210] Memory Limit: none
[  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---\

Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
---
 net/bridge/br_netfilter_hooks.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Nikolay Aleksandrov Oct. 3, 2024, 12:12 p.m. UTC | #1
On 10/1/24 18:43, Andy Roulin wrote:
> Fix a kernel panic in the br_netfilter module when sending untagged
> traffic via a VxLAN device.
> This happens during the check for fragmentation in br_nf_dev_queue_xmit.
> 
> It is dependent on:
> 1) the br_netfilter module being loaded;
> 2) net.bridge.bridge-nf-call-iptables set to 1;
> 3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
> 4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
> 
> When forwarding the untagged packet to the VxLAN bridge port, before
> the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
> changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
> of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
> 
> Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
> for frames that needs to be fragmented: frames with higher MTU than the
> VxLAN device end up calling br_nf_ip_fragment, which in turns call
> ip_skb_dst_mtu.
> 
> The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
> with valid dst->dev, thus the crash.
> 
> This case was never supported in the first place, so drop the packet
> instead.
> 
> PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
> [  176.291791] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000110
> [  176.292101] Mem abort info:
> [  176.292184]   ESR = 0x0000000096000004
> [  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  176.292530]   SET = 0, FnV = 0
> [  176.292709]   EA = 0, S1PTW = 0
> [  176.292862]   FSC = 0x04: level 0 translation fault
> [  176.293013] Data abort info:
> [  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
> [  176.294166] [0000000000000110] pgd=0000000000000000,
> p4d=0000000000000000
> [  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
> br_netfilter bridge stp llc ipv6 crct10dif_ce
> [  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
> 6.8.0-rc3-g5b3fbd61b9d1 #2
> [  176.296314] Hardware name: linux,dummy-virt (DT)
> [  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
> [  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
> [  176.297636] sp : ffff800080003630
> [  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
> ffff6828c49ad9f8
> [  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
> 00000000000003e8
> [  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
> ffff6828c3b16d28
> [  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
> 0000000000000014
> [  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
> 0000000095744632
> [  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
> ffffb7e137926a70
> [  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
> 0000000000000000
> [  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
> f20e0100bebafeca
> [  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
> 0000000000000000
> [  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
> ffff6828c7f918f0
> [  176.300889] Call trace:
> [  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
> [  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
> [  176.301703]  nf_hook_slow+0x48/0x124
> [  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
> [  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
> [  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
> [  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
> [  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
> [  176.303359]  nf_hook_slow+0x48/0x124
> [  176.303803]  __br_forward+0xc4/0x194 [bridge]
> [  176.304013]  br_flood+0xd4/0x168 [bridge]
> [  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
> [  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
> [  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
> [  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
> [  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
> [  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
> [  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
> [  176.306197]  __netif_receive_skb+0x18/0x64
> [  176.306369]  process_backlog+0x80/0x124
> [  176.306540]  __napi_poll+0x38/0x17c
> [  176.306636]  net_rx_action+0x124/0x26c
> [  176.306758]  __do_softirq+0x100/0x26c
> [  176.307051]  ____do_softirq+0x10/0x1c
> [  176.307162]  call_on_irq_stack+0x24/0x4c
> [  176.307289]  do_softirq_own_stack+0x1c/0x2c
> [  176.307396]  do_softirq+0x54/0x6c
> [  176.307485]  __local_bh_enable_ip+0x8c/0x98
> [  176.307637]  __dev_queue_xmit+0x22c/0xd28
> [  176.307775]  neigh_resolve_output+0xf4/0x1a0
> [  176.308018]  ip_finish_output2+0x1c8/0x628
> [  176.308137]  ip_do_fragment+0x5b4/0x658
> [  176.308279]  ip_fragment.constprop.0+0x48/0xec
> [  176.308420]  __ip_finish_output+0xa4/0x254
> [  176.308593]  ip_finish_output+0x34/0x130
> [  176.308814]  ip_output+0x6c/0x108
> [  176.308929]  ip_send_skb+0x50/0xf0
> [  176.309095]  ip_push_pending_frames+0x30/0x54
> [  176.309254]  raw_sendmsg+0x758/0xaec
> [  176.309568]  inet_sendmsg+0x44/0x70
> [  176.309667]  __sys_sendto+0x110/0x178
> [  176.309758]  __arm64_sys_sendto+0x28/0x38
> [  176.309918]  invoke_syscall+0x48/0x110
> [  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
> [  176.310353]  do_el0_svc+0x1c/0x28
> [  176.310434]  el0_svc+0x34/0xb4
> [  176.310551]  el0t_64_sync_handler+0x120/0x12c
> [  176.310690]  el0t_64_sync+0x190/0x194
> [  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
> [  176.315743] ---[ end trace 0000000000000000 ]---
> [  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
> interrupt
> [  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
> [  176.316564] PHYS_OFFSET: 0xffff97d780000000
> [  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
> [  176.317210] Memory Limit: none
> [  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
> Exception in interrupt ]---\
> 
> Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Andy Roulin <aroulin@nvidia.com>
> ---
>  net/bridge/br_netfilter_hooks.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index 0e8bc0ea6175..1d458e9da660 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
> @@ -33,6 +33,7 @@
>  #include <net/ip.h>
>  #include <net/ipv6.h>
>  #include <net/addrconf.h>
> +#include <net/dst_metadata.h>
>  #include <net/route.h>
>  #include <net/netfilter/br_netfilter.h>
>  #include <net/netns/generic.h>
> @@ -879,6 +880,10 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
>  		return br_dev_queue_push_xmit(net, sk, skb);
>  	}
>  
> +	/* Fragmentation on metadata/template dst is not supported */
> +	if (unlikely(!skb_valid_dst(skb)))> +		goto drop;
> +
>  	/* This is wrong! We should preserve the original fragment
>  	 * boundaries by preserving frag_list rather than refragmenting.
>  	 */

This helper's name is a bit misleading. :) But looking at it, it seems
only metadata dsts are not considered valid, so looks good to me.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
diff mbox series

Patch

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 0e8bc0ea6175..1d458e9da660 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -33,6 +33,7 @@ 
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#include <net/dst_metadata.h>
 #include <net/route.h>
 #include <net/netfilter/br_netfilter.h>
 #include <net/netns/generic.h>
@@ -879,6 +880,10 @@  static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 		return br_dev_queue_push_xmit(net, sk, skb);
 	}
 
+	/* Fragmentation on metadata/template dst is not supported */
+	if (unlikely(!skb_valid_dst(skb)))
+		goto drop;
+
 	/* This is wrong! We should preserve the original fragment
 	 * boundaries by preserving frag_list rather than refragmenting.
 	 */