Message ID | f6b3a268be868e9a528f2549392bf2bdf16e285d.1707848297.git.thinhtr@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | bnx2x: Fix error recovering in switch configuration |

On 2/13/2024 10:32 AM, Thinh Tran wrote:
> Fix race condition leading to system crash during EEH error handling
>
> During EEH error recovery, the bnx2x driver's transmit timeout logic
> could cause a race condition when handling reset tasks. The
> bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
> which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
> SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
> overlap with the EEH driver's attempt to reset the device using
> bnx2x_io_slot_reset(), which also frees SGEs. This race condition can
> result in system crashes due to accessing freed memory locations.
>
> [ 793.003930] EEH: Beginning: 'slot_reset'
> [ 793.003937] PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
> [ 793.003939] bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
> [ 793.004037] bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
> [ 793.008839] bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
> [ 793.122134] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
> [ 793.122143] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [ 793.122147] Faulting instruction address: 0xc0080000025065fc
> [ 793.122152] Oops: Kernel access of bad area, sig: 11 [#1]
> .....
> [ 793.122315] Call Trace:
> [ 793.122318] [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
> [ 793.122331] [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
> [ 793.122338] [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
> [ 793.122342] [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
> [ 793.122347] [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
> [ 793.122352] [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
> [ 793.122356] [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
>
> To solve this issue, we need to verify page pool allocations before
> freeing.
>
> Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
>
> Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
>
>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> index d8b1824c334d..0bc1367fd649 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
> @@ -1002,9 +1002,6 @@ static inline void bnx2x_set_fw_mac_addr(__le16 *fw_hi, __le16 *fw_mid,
>  static inline void bnx2x_free_rx_mem_pool(struct bnx2x *bp,
>  					  struct bnx2x_alloc_pool *pool)
>  {
> -	if (!pool->page)
> -		return;
> -
>  	put_page(pool->page);
>
>  	pool->page = NULL;
> @@ -1015,6 +1012,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
>  {
>  	int i;
>
> +	if (!fp->page_pool.page)
> +		return;
> +

Doesn't this still leave a race window where put_page was already called
but page hasn't yet been set NULL? I think you either need to assign
NULL first (and possibly WRITE_ONCE or a barrier depending on platform?)
or some other serialization mechanism to ensure only one thread runs here?

I guess the issue you're seeing is that bnx2x_free_rx_sge_range calls
bnx2x_free_rx_sge even if the page was already removed? Does that mean
you already have some other serialization ensuring that you can't have
both threads call put_page simultaneously?

>  	if (fp->mode == TPA_MODE_DISABLED)
>  		return;
>
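
For readers following the thread, the "assign NULL first" ordering raised in this review can be illustrated with a small user-space sketch. It is not bnx2x code: `struct demo_pool`, `demo_pool_init()`, `demo_pool_free()` and `demo_release_count` are hypothetical names, `free()` stands in for `put_page()`, and C11 `atomic_exchange()` is used as a user-space analogue of detaching the pointer before releasing it. Whether the driver needs anything like this depends on how its callers are serialized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool that owns one page; not the bnx2x struct. */
struct demo_pool {
	_Atomic(void *) page;
};

/* Counts how many times a page was actually released. */
static atomic_int demo_release_count;

static void demo_pool_init(struct demo_pool *pool, size_t size)
{
	void *page = malloc(size);

	assert(page != NULL); /* sketch only: no allocation-failure path */
	atomic_init(&pool->page, page);
}

/*
 * Detach the pointer first, then release it. The caller that swaps out
 * the non-NULL value owns the release; any later or concurrent caller
 * observes NULL and returns without touching the page.
 */
static void demo_pool_free(struct demo_pool *pool)
{
	void *page = atomic_exchange(&pool->page, NULL);

	if (!page)
		return;

	free(page); /* stand-in for put_page() */
	atomic_fetch_add(&demo_release_count, 1);
}
```

Calling `demo_pool_free()` twice on the same pool releases the page once; the second call is a no-op. This closes the window between the release and the NULL store, at the cost of an atomic operation on every free.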
Apologies for the delayed response. I did not receive this email and some others in my mailbox.

> Doesn't this still leave a race window where put_page was already called
> but page hasn't yet been set NULL? I think you either need to assign
> NULL first (and possibly WRITE_ONCE or a barrier depending on platform?)
> or some other serialization mechanism to ensure only one thread runs here?
>
> I guess the issue you're seeing is that bnx2x_free_rx_sge_range calls
> bnx2x_free_rx_sge even if the page was already removed? Does that mean

yes

> you already have some other serialization ensuring that you can't have
> both threads call put_page simultaneously?

The callers of bnx2x_free_rx_sge_range() are under rtnl_lock(), which
should handle the serialization.
The crash occurs in the bnx2x_free_rx_sge() function due to accessing a
NULL pointer.

static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
				     struct bnx2x_fastpath *fp, u16 index)
{
	struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
	struct page *page = sw_buf->page;
	struct eth_rx_sge *sge = &fp->rx_sge_ring[index];
	.....
	/* Since many fragments can share the same page, make sure to
	 * only unmap and free the page once.
	 */
	dma_unmap_page(&bp->pdev->dev, dma_unmap_addr(sw_buf, mapping),
		       SGE_PAGE_SIZE, DMA_FROM_DEVICE);

	put_page(page);
	...
}

This happens because sw_buf was set to NULL after the call to
dma_unmap_page(), called by the preceding thread.
The patch checks whether the page in the pool has already been freed;
if it has, there is nothing else to do.

Thinh Tran
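
The guard described in this reply can be modeled in user space as follows. This is a minimal sketch, not the driver's implementation: `struct demo_fp`, `demo_fp_init()`, `demo_free_entry()`, `demo_free_range()` and `DEMO_RING_SIZE` are hypothetical names, `free()` stands in for `put_page()`, and the callers are assumed to be serialized already (the role rtnl_lock() plays in the reply above), so no locking appears in the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_RING_SIZE 4

/* Hypothetical stand-in for per-queue state; not struct bnx2x_fastpath. */
struct demo_fp {
	void *pool_page;            /* models fp->page_pool.page */
	void *ring[DEMO_RING_SIZE]; /* models the per-entry pages */
	int visits;                 /* ring entries touched by the free path */
};

static void demo_fp_init(struct demo_fp *fp)
{
	int i;

	fp->pool_page = malloc(64);
	assert(fp->pool_page != NULL); /* sketch only: no failure path */
	for (i = 0; i < DEMO_RING_SIZE; i++) {
		fp->ring[i] = malloc(64);
		assert(fp->ring[i] != NULL);
	}
	fp->visits = 0;
}

/* Models the per-entry free: release the entry and clear its slot. */
static void demo_free_entry(struct demo_fp *fp, int index)
{
	fp->visits++;
	free(fp->ring[index]);
	fp->ring[index] = NULL;
}

/*
 * Models the patched range free. Callers are assumed to be serialized,
 * so a NULL pool page means an earlier caller already tore everything
 * down and there is nothing left to do.
 */
static void demo_free_range(struct demo_fp *fp)
{
	int i;

	if (!fp->pool_page)
		return;

	for (i = 0; i < DEMO_RING_SIZE; i++)
		demo_free_entry(fp, i);

	free(fp->pool_page);
	fp->pool_page = NULL;
}
```

With this guard, a second call to `demo_free_range()` returns before touching any ring entry. That is only safe because the two callers never run at the same time; without that serialization the NULL check alone would not be sufficient.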

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index d8b1824c334d..0bc1367fd649 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -1002,9 +1002,6 @@ static inline void bnx2x_set_fw_mac_addr(__le16 *fw_hi, __le16 *fw_mid,
 static inline void bnx2x_free_rx_mem_pool(struct bnx2x *bp,
 					  struct bnx2x_alloc_pool *pool)
 {
-	if (!pool->page)
-		return;
-
 	put_page(pool->page);
 
 	pool->page = NULL;
@@ -1015,6 +1012,9 @@ static inline void bnx2x_free_rx_sge_range(struct bnx2x *bp,
 {
 	int i;
 
+	if (!fp->page_pool.page)
+		return;
+
 	if (fp->mode == TPA_MODE_DISABLED)
 		return;

Fix race condition leading to system crash during EEH error handling

During EEH error recovery, the bnx2x driver's transmit timeout logic
could cause a race condition when handling reset tasks. The
bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
overlap with the EEH driver's attempt to reset the device using
bnx2x_io_slot_reset(), which also frees SGEs. This race condition can
result in system crashes due to accessing freed memory locations.

[ 793.003930] EEH: Beginning: 'slot_reset'
[ 793.003937] PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
[ 793.003939] bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
[ 793.004037] bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
[ 793.008839] bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
[ 793.122134] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
[ 793.122143] BUG: Kernel NULL pointer dereference on read at 0x00000000
[ 793.122147] Faulting instruction address: 0xc0080000025065fc
[ 793.122152] Oops: Kernel access of bad area, sig: 11 [#1]
.....
[ 793.122315] Call Trace:
[ 793.122318] [c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
[ 793.122331] [c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
[ 793.122338] [c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
[ 793.122342] [c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
[ 793.122347] [c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
[ 793.122352] [c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
[ 793.122356] [c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

To solve this issue, we need to verify page pool allocations before
freeing.

Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")

Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)