Message ID | 5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | NULL pointer deref inside xdp_do_flush due to bpf_net_ctx_get_all_used_flush_lists | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 2024-10-01 13:30:13 [+0200], Yury Vostrikov wrote: > Hi, Hi, … > I get the following backtrace instead of crash: > > [ 177.216427] ------------[ cut here ]------------ … > [ 177.216464] Call Trace: > [ 177.216464] <TASK> > [ 177.216474] efx_poll+0x178/0x380 [sfc_siena] > [ 177.216479] netpoll_poll_dev+0x118/0x1b0 > [ 177.216481] __netpoll_send_skb+0x1ae/0x240 > [ 177.216482] netpoll_send_udp+0x2e5/0x400 > [ 177.216484] write_msg+0xeb/0x100 [netconsole] > [ 177.216486] console_flush_all+0x261/0x440 > [ 177.216489] console_unlock+0x71/0xf0 > [ 177.216490] vprintk_emit+0x251/0x2b0 > [ 177.216491] _printk+0x48/0x50 … > I'm out of my depth figuring out why bpf_net_ctx_get() returns NULL. For now I'm simply running with NULL check enabled. netpoll_send_udp() Does not assign a context and invokes a NAPI poll. However with a budget of 0 to just clean the TX resources. Now, the SFC driver does not clean any RX packets but invokes xdp_do_flush() anyway which leads to the crash later on. Are the SFC maintainer against the following: diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index c9e17a8208a90..f3288e02c1bd8 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget) spent = efx_process_channel(channel, budget); - xdp_do_flush(); + if (spent) + xdp_do_flush(); if (spent < budget) { if (efx_channel_has_rx_queue(channel) && diff --git a/drivers/net/ethernet/sfc/siena/efx_channels.c b/drivers/net/ethernet/sfc/siena/efx_channels.c index a7346e965bfe7..2b8b7c69bd7ae 100644 --- a/drivers/net/ethernet/sfc/siena/efx_channels.c +++ b/drivers/net/ethernet/sfc/siena/efx_channels.c @@ -1285,7 +1285,8 @@ static int efx_poll(struct napi_struct *napi, int budget) spent = efx_process_channel(channel, budget); - xdp_do_flush(); + if (spent) + xdp_do_flush(); if (spent < budget) { if (efx_channel_has_rx_queue(channel) && This should fix the crash. As an alternative we could keep track of channel->n_rx_xdp_redirect before and after the efx_process_channel() invocation to avoid the flush if there is no XDP done. Sebastian
"Yury Vostrikov" <mon@unformed.ru> writes: > Hi, > > I stumbled upon a NULL pointer derefence inside BPF code. The triggering condition is > message from OOM killer + netconsole. The crash happens at > > u32 kern_flags = bpf_net_ctx->ri.kern_flags; > > line of bpf_net_ctx_get_all_used_flush_lists() function. bpf_net_ctx is NULL here. With trivial fix > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index 7d7578a8eac1..cba16bf307f7 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -844,6 +844,9 @@ static inline void bpf_net_ctx_get_all_used_flush_lists(struct list_head **lh_ma > struct list_head **lh_xsk) > { > struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); > + WARN_ON(bpf_net_ctx == NULL); > + if (bpf_net_ctx == NULL) > + return; > u32 kern_flags = bpf_net_ctx->ri.kern_flags; > struct list_head *lh; > > I get the following backtrace instead of crash: [...] > [ 177.216474] efx_poll+0x178/0x380 [sfc_siena] Looks like the sfc driver is missing the context setup stuff entirely... -Toke
On 01/10/2024 14:36, Sebastian Andrzej Siewior wrote: > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c > index c9e17a8208a90..f3288e02c1bd8 100644 > --- a/drivers/net/ethernet/sfc/efx_channels.c > +++ b/drivers/net/ethernet/sfc/efx_channels.c > @@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget) > > spent = efx_process_channel(channel, budget); > > - xdp_do_flush(); > + if (spent) > + xdp_do_flush(); > > if (spent < budget) { > if (efx_channel_has_rx_queue(channel) && Seems reasonable to me. Another alternative is to look at budget rather than spent, as that seems like the condition that drives whether we have a real XDP. -ed
On 2024-10-01 15:01:25 [+0100], Edward Cree wrote: > On 01/10/2024 14:36, Sebastian Andrzej Siewior wrote: > > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c > > index c9e17a8208a90..f3288e02c1bd8 100644 > > --- a/drivers/net/ethernet/sfc/efx_channels.c > > +++ b/drivers/net/ethernet/sfc/efx_channels.c > > @@ -1260,7 +1260,8 @@ static int efx_poll(struct napi_struct *napi, int budget) > > > > spent = efx_process_channel(channel, budget); > > > > - xdp_do_flush(); > > + if (spent) > > + xdp_do_flush(); > > > > if (spent < budget) { > > if (efx_channel_has_rx_queue(channel) && > > Seems reasonable to me. > Another alternative is to look at budget rather than spent, > as that seems like the condition that drives whether we > have a real XDP. If you prefer. What are chances that you had budget but cleaned nothing? > -ed Sebastian
diff --git a/include/linux/filter.h b/include/linux/filter.h index 7d7578a8eac1..cba16bf307f7 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -844,6 +844,9 @@ static inline void bpf_net_ctx_get_all_used_flush_lists(struct list_head **lh_ma struct list_head **lh_xsk) { struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + WARN_ON(bpf_net_ctx == NULL); + if (bpf_net_ctx == NULL) + return; u32 kern_flags = bpf_net_ctx->ri.kern_flags; struct list_head *lh;