diff mbox series

[net] bpf: devmap: provide rxq after redirect

Message ID 20240905-devel-koalo-fix-ingress-ifindex-v1-1-d12a0d74c29c@linutronix.de (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] bpf: devmap: provide rxq after redirect | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 18 of 18 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch warning WARNING: line length of 90 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-09-05--18-00 (tests: 722)

Commit Message

Florian Kauer Sept. 5, 2024, 1:58 p.m. UTC
rxq contains a pointer to the device from where
the redirect happened. Currently, the BPF program
that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
does not have it set.

This is particularly bad since accessing ingress_ifindex, e.g.

SEC("xdp")
int prog(struct xdp_md *pkt)
{
        return bpf_redirect_map(&dev_redirect_map, 0, 0);
}

SEC("xdp/devmap")
int prog_after_redirect(struct xdp_md *pkt)
{
        bpf_printk("ifindex %i", pkt->ingress_ifindex);
        return XDP_PASS;
}

depends on access to rxq, so a NULL pointer gets dereferenced:

<1>[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
<1>[  574.475188] #PF: supervisor read access in kernel mode
<1>[  574.475194] #PF: error_code(0x0000) - not-present page
<6>[  574.475199] PGD 0 P4D 0
<4>[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
<4>[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
<4>[  574.475231] Workqueue: mld mld_ifc_work
<4>[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
<4>[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 <48> 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
<4>[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
<4>[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
<4>[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
<4>[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
<4>[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
<4>[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
<4>[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
<4>[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
<4>[  574.475303] PKRU: 55555554
<4>[  574.475306] Call Trace:
<4>[  574.475313]  <IRQ>
<4>[  574.475318]  ? __die+0x23/0x70
<4>[  574.475329]  ? page_fault_oops+0x180/0x4c0
<4>[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
<4>[  574.475346]  ? kmem_cache_free+0x257/0x280
<4>[  574.475357]  ? exc_page_fault+0x67/0x150
<4>[  574.475368]  ? asm_exc_page_fault+0x26/0x30
<4>[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
<4>[  574.475386]  bq_xmit_all+0x158/0x420
<4>[  574.475397]  __dev_flush+0x30/0x90
<4>[  574.475407]  veth_poll+0x216/0x250 [veth]
<4>[  574.475421]  __napi_poll+0x28/0x1c0
<4>[  574.475430]  net_rx_action+0x32d/0x3a0
<4>[  574.475441]  handle_softirqs+0xcb/0x2c0
<4>[  574.475451]  do_softirq+0x40/0x60
<4>[  574.475458]  </IRQ>
<4>[  574.475461]  <TASK>
<4>[  574.475464]  __local_bh_enable_ip+0x66/0x70
<4>[  574.475471]  __dev_queue_xmit+0x268/0xe40
<4>[  574.475480]  ? selinux_ip_postroute+0x213/0x420
<4>[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
<4>[  574.475502]  ip6_finish_output2+0x2be/0x640
<4>[  574.475512]  ? nf_hook_slow+0x42/0xf0
<4>[  574.475521]  ip6_finish_output+0x194/0x300
<4>[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
<4>[  574.475538]  mld_sendpack+0x17c/0x240
<4>[  574.475548]  mld_ifc_work+0x192/0x410
<4>[  574.475557]  process_one_work+0x15d/0x380
<4>[  574.475566]  worker_thread+0x29d/0x3a0
<4>[  574.475573]  ? __pfx_worker_thread+0x10/0x10
<4>[  574.475580]  ? __pfx_worker_thread+0x10/0x10
<4>[  574.475587]  kthread+0xcd/0x100
<4>[  574.475597]  ? __pfx_kthread+0x10/0x10
<4>[  574.475606]  ret_from_fork+0x31/0x50
<4>[  574.475615]  ? __pfx_kthread+0x10/0x10
<4>[  574.475623]  ret_from_fork_asm+0x1a/0x30
<4>[  574.475635]  </TASK>
<4>[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
<4>[  574.475662] CR2: 0000000000000000
<4>[  574.475668] ---[ end trace 0000000000000000 ]---

Therefore, provide it to the program by setting rxq properly.

Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
---
 kernel/bpf/devmap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)


---
base-commit: 8e69c96df771ab469cec278edb47009351de4da6
change-id: 20240905-devel-koalo-fix-ingress-ifindex-b9293d471db6

Best regards,

Comments

Toke Høiland-Jørgensen Sept. 5, 2024, 3:15 p.m. UTC | #1
Florian Kauer <florian.kauer@linutronix.de> writes:

> rxq contains a pointer to the device from where
> the redirect happened. Currently, the BPF program
> that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
> does not have it set.
>
> This is particularly bad since accessing ingress_ifindex, e.g.
>
> SEC("xdp")
> int prog(struct xdp_md *pkt)
> {
>         return bpf_redirect_map(&dev_redirect_map, 0, 0);
> }
>
> SEC("xdp/devmap")
> int prog_after_redirect(struct xdp_md *pkt)
> {
>         bpf_printk("ifindex %i", pkt->ingress_ifindex);
>         return XDP_PASS;
> }
>
> depends on access to rxq, so a NULL pointer gets dereferenced:
>
> <1>[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
> <1>[  574.475188] #PF: supervisor read access in kernel mode
> <1>[  574.475194] #PF: error_code(0x0000) - not-present page
> <6>[  574.475199] PGD 0 P4D 0
> <4>[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> <4>[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
> <4>[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
> <4>[  574.475231] Workqueue: mld mld_ifc_work
> <4>[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
> <4>[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 <48> 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
> <4>[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
> <4>[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
> <4>[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
> <4>[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
> <4>[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
> <4>[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
> <4>[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
> <4>[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
> <4>[  574.475303] PKRU: 55555554
> <4>[  574.475306] Call Trace:
> <4>[  574.475313]  <IRQ>
> <4>[  574.475318]  ? __die+0x23/0x70
> <4>[  574.475329]  ? page_fault_oops+0x180/0x4c0
> <4>[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
> <4>[  574.475346]  ? kmem_cache_free+0x257/0x280
> <4>[  574.475357]  ? exc_page_fault+0x67/0x150
> <4>[  574.475368]  ? asm_exc_page_fault+0x26/0x30
> <4>[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
> <4>[  574.475386]  bq_xmit_all+0x158/0x420
> <4>[  574.475397]  __dev_flush+0x30/0x90
> <4>[  574.475407]  veth_poll+0x216/0x250 [veth]
> <4>[  574.475421]  __napi_poll+0x28/0x1c0
> <4>[  574.475430]  net_rx_action+0x32d/0x3a0
> <4>[  574.475441]  handle_softirqs+0xcb/0x2c0
> <4>[  574.475451]  do_softirq+0x40/0x60
> <4>[  574.475458]  </IRQ>
> <4>[  574.475461]  <TASK>
> <4>[  574.475464]  __local_bh_enable_ip+0x66/0x70
> <4>[  574.475471]  __dev_queue_xmit+0x268/0xe40
> <4>[  574.475480]  ? selinux_ip_postroute+0x213/0x420
> <4>[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
> <4>[  574.475502]  ip6_finish_output2+0x2be/0x640
> <4>[  574.475512]  ? nf_hook_slow+0x42/0xf0
> <4>[  574.475521]  ip6_finish_output+0x194/0x300
> <4>[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
> <4>[  574.475538]  mld_sendpack+0x17c/0x240
> <4>[  574.475548]  mld_ifc_work+0x192/0x410
> <4>[  574.475557]  process_one_work+0x15d/0x380
> <4>[  574.475566]  worker_thread+0x29d/0x3a0
> <4>[  574.475573]  ? __pfx_worker_thread+0x10/0x10
> <4>[  574.475580]  ? __pfx_worker_thread+0x10/0x10
> <4>[  574.475587]  kthread+0xcd/0x100
> <4>[  574.475597]  ? __pfx_kthread+0x10/0x10
> <4>[  574.475606]  ret_from_fork+0x31/0x50
> <4>[  574.475615]  ? __pfx_kthread+0x10/0x10
> <4>[  574.475623]  ret_from_fork_asm+0x1a/0x30
> <4>[  574.475635]  </TASK>
> <4>[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
> <4>[  574.475662] CR2: 0000000000000000
> <4>[  574.475668] ---[ end trace 0000000000000000 ]---

Yikes! I wonder how that has gone unnoticed this long. Could you please
add a selftest for this so it doesn't happen again?

> Therefore, provide it to the program by setting rxq properly.
>
> Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")

I think the fixes tag is wrong; the original commit was fine because the
xdp_buff was still the one initialised by the driver. So this should be:

Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")

Other than that, though, the patch LGTM:

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Florian Kauer Sept. 5, 2024, 3:22 p.m. UTC | #2
On 9/5/24 17:15, Toke Høiland-Jørgensen wrote:
> Florian Kauer <florian.kauer@linutronix.de> writes:
> 
>> rxq contains a pointer to the device from where
>> the redirect happened. Currently, the BPF program
>> that was executed after a redirect via BPF_MAP_TYPE_DEVMAP*
>> does not have it set.
>>
>> This is particularly bad since accessing ingress_ifindex, e.g.
>>
>> SEC("xdp")
>> int prog(struct xdp_md *pkt)
>> {
>>         return bpf_redirect_map(&dev_redirect_map, 0, 0);
>> }
>>
>> SEC("xdp/devmap")
>> int prog_after_redirect(struct xdp_md *pkt)
>> {
>>         bpf_printk("ifindex %i", pkt->ingress_ifindex);
>>         return XDP_PASS;
>> }
>>
>> depends on access to rxq, so a NULL pointer gets dereferenced:
>>
>> <1>[  574.475170] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> <1>[  574.475188] #PF: supervisor read access in kernel mode
>> <1>[  574.475194] #PF: error_code(0x0000) - not-present page
>> <6>[  574.475199] PGD 0 P4D 0
>> <4>[  574.475207] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>> <4>[  574.475217] CPU: 4 UID: 0 PID: 217 Comm: kworker/4:1 Not tainted 6.11.0-rc5-reduced-00859-g780801200300 #23
>> <4>[  574.475226] Hardware name: Intel(R) Client Systems NUC13ANHi7/NUC13ANBi7, BIOS ANRPL357.0026.2023.0314.1458 03/14/2023
>> <4>[  574.475231] Workqueue: mld mld_ifc_work
>> <4>[  574.475247] RIP: 0010:bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
>> <4>[  574.475257] Code: cc cc cc cc cc cc cc 80 00 00 00 cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 57 20 <48> 8b 52 00 8b 92 e0 00 00 00 48 bf f8 a6 d5 c4 5d a0 ff ff be 0b
>> <4>[  574.475263] RSP: 0018:ffffa62440280c98 EFLAGS: 00010206
>> <4>[  574.475269] RAX: ffffa62440280cd8 RBX: 0000000000000001 RCX: 0000000000000000
>> <4>[  574.475274] RDX: 0000000000000000 RSI: ffffa62440549048 RDI: ffffa62440280ce0
>> <4>[  574.475278] RBP: ffffa62440280c98 R08: 0000000000000002 R09: 0000000000000001
>> <4>[  574.475281] R10: ffffa05dc8b98000 R11: ffffa05f577fca40 R12: ffffa05dcab24000
>> <4>[  574.475285] R13: ffffa62440280ce0 R14: ffffa62440549048 R15: ffffa62440549000
>> <4>[  574.475289] FS:  0000000000000000(0000) GS:ffffa05f4f700000(0000) knlGS:0000000000000000
>> <4>[  574.475294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4>[  574.475298] CR2: 0000000000000000 CR3: 000000025522e000 CR4: 0000000000f50ef0
>> <4>[  574.475303] PKRU: 55555554
>> <4>[  574.475306] Call Trace:
>> <4>[  574.475313]  <IRQ>
>> <4>[  574.475318]  ? __die+0x23/0x70
>> <4>[  574.475329]  ? page_fault_oops+0x180/0x4c0
>> <4>[  574.475339]  ? skb_pp_cow_data+0x34c/0x490
>> <4>[  574.475346]  ? kmem_cache_free+0x257/0x280
>> <4>[  574.475357]  ? exc_page_fault+0x67/0x150
>> <4>[  574.475368]  ? asm_exc_page_fault+0x26/0x30
>> <4>[  574.475381]  ? bpf_prog_5e13354d9cf5018a_prog_after_redirect+0x17/0x3c
>> <4>[  574.475386]  bq_xmit_all+0x158/0x420
>> <4>[  574.475397]  __dev_flush+0x30/0x90
>> <4>[  574.475407]  veth_poll+0x216/0x250 [veth]
>> <4>[  574.475421]  __napi_poll+0x28/0x1c0
>> <4>[  574.475430]  net_rx_action+0x32d/0x3a0
>> <4>[  574.475441]  handle_softirqs+0xcb/0x2c0
>> <4>[  574.475451]  do_softirq+0x40/0x60
>> <4>[  574.475458]  </IRQ>
>> <4>[  574.475461]  <TASK>
>> <4>[  574.475464]  __local_bh_enable_ip+0x66/0x70
>> <4>[  574.475471]  __dev_queue_xmit+0x268/0xe40
>> <4>[  574.475480]  ? selinux_ip_postroute+0x213/0x420
>> <4>[  574.475491]  ? alloc_skb_with_frags+0x4a/0x1d0
>> <4>[  574.475502]  ip6_finish_output2+0x2be/0x640
>> <4>[  574.475512]  ? nf_hook_slow+0x42/0xf0
>> <4>[  574.475521]  ip6_finish_output+0x194/0x300
>> <4>[  574.475529]  ? __pfx_ip6_finish_output+0x10/0x10
>> <4>[  574.475538]  mld_sendpack+0x17c/0x240
>> <4>[  574.475548]  mld_ifc_work+0x192/0x410
>> <4>[  574.475557]  process_one_work+0x15d/0x380
>> <4>[  574.475566]  worker_thread+0x29d/0x3a0
>> <4>[  574.475573]  ? __pfx_worker_thread+0x10/0x10
>> <4>[  574.475580]  ? __pfx_worker_thread+0x10/0x10
>> <4>[  574.475587]  kthread+0xcd/0x100
>> <4>[  574.475597]  ? __pfx_kthread+0x10/0x10
>> <4>[  574.475606]  ret_from_fork+0x31/0x50
>> <4>[  574.475615]  ? __pfx_kthread+0x10/0x10
>> <4>[  574.475623]  ret_from_fork_asm+0x1a/0x30
>> <4>[  574.475635]  </TASK>
>> <4>[  574.475637] Modules linked in: veth br_netfilter bridge stp llc iwlmvm x86_pkg_temp_thermal iwlwifi efivarfs nvme nvme_core
>> <4>[  574.475662] CR2: 0000000000000000
>> <4>[  574.475668] ---[ end trace 0000000000000000 ]---
> 
> Yikes! I wonder how that has gone unnoticed this long. Could you please
> add a selftest for this so it doesn't happen again?

Yes, I will try to construct one.

> 
>> Therefore, provide it to the program by setting rxq properly.
>>
>> Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
> 
> I think the fixes tag is wrong; the original commit was fine because the
> xdp_buff was still the one initialised by the driver. So this should be:
> 
> Fixes: cb261b594b41 ("bpf: Run devmap xdp_prog on flush instead of bulk enqueue")

Oh, I agree. On first glance it looked like the old dev_map_run_prog
was also only setting txq, but didn't notice that it was just overwriting
in the original xdp_buff.

> 
> Other than that, though, the patch LGTM:
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 

Thanks,
Florian
diff mbox series

Patch

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 9e0e3b0a18e4..7878be18e9d2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -333,9 +333,11 @@  static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
 
 static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
 				struct xdp_frame **frames, int n,
-				struct net_device *dev)
+				struct net_device *tx_dev,
+				struct net_device *rx_dev)
 {
-	struct xdp_txq_info txq = { .dev = dev };
+	struct xdp_txq_info txq = { .dev = tx_dev };
+	struct xdp_rxq_info rxq = { .dev = rx_dev };
 	struct xdp_buff xdp;
 	int i, nframes = 0;
 
@@ -346,6 +348,7 @@  static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
 
 		xdp_convert_frame_to_buff(xdpf, &xdp);
 		xdp.txq = &txq;
+		xdp.rxq = &rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
@@ -360,7 +363,7 @@  static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
 			bpf_warn_invalid_xdp_action(NULL, xdp_prog, act);
 			fallthrough;
 		case XDP_ABORTED:
-			trace_xdp_exception(dev, xdp_prog, act);
+			trace_xdp_exception(tx_dev, xdp_prog, act);
 			fallthrough;
 		case XDP_DROP:
 			xdp_return_frame_rx_napi(xdpf);
@@ -388,7 +391,7 @@  static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	}
 
 	if (bq->xdp_prog) {
-		to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev);
+		to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev, bq->dev_rx);
 		if (!to_send)
 			goto out;
 	}