diff mbox series

[net-next] bnxt_en: fix kernel panic in queue api functions

Message ID 20240704074153.1508825-1-ap420073@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] bnxt_en: fix kernel panic in queue api functions | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 839 this patch: 839
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 846 this patch: 846
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 848 this patch: 848
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 27 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-04--15-00 (tests: 688)

Commit Message

Taehee Yoo July 4, 2024, 7:41 a.m. UTC
bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is
initialized while an interface is being up.
The rings are initialized as a number of channels.

The queue API functions access rx_ring without checking both null and
ring size.
So, if the queue API functions are called when interface status is down,
they access an uninitialized rx_ring array.
Also if the queue index parameter value is larger than a ring, it
would also access an uninitialized rx_ring.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34
 RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en]
 Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05
 RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246
 RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008
 RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000
 R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000
 R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000
 FS:  00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? __die+0x20/0x70
  ? page_fault_oops+0x15a/0x460
  ? __vmalloc_node_range_noprof+0x4f7/0x8e0
  ? exc_page_fault+0x6e/0x180
  ? asm_exc_page_fault+0x22/0x30
  ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
  ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
  netdev_rx_queue_restart+0xa9/0x1c0
  net_devmem_bind_dmabuf_to_queue+0xcb/0x100
  netdev_nl_bind_rx_doit+0x2f6/0x350
  genl_family_rcv_msg_doit+0xd9/0x130
  genl_rcv_msg+0x184/0x2b0
  ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10
  ? __pfx_genl_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x54/0x100
  genl_rcv+0x24/0x40
  netlink_unicast+0x243/0x370
  netlink_sendmsg+0x1bb/0x3e0

Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

The branch is not net because the commit 2d694c27d32e is not
yet merged into net branch.

devmem TCP causes this problem, but it is not yet merged.
So, to test this patch, please patch the current devmem TCP.
The /tools/testing/selftests/net/ncdevmem will immediately reproduce 
this problem.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Somnath Kotur July 4, 2024, 8:13 a.m. UTC | #1
On Thu, Jul 4, 2024 at 1:12 PM Taehee Yoo <ap420073@gmail.com> wrote:
>
> bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is
> initialized while an interface is being up.
> The rings are initialized as a number of channels.
>
> The queue API functions access rx_ring without checking both null and
> ring size.
> So, if the queue API functions are called when interface status is down,
> they access an uninitialized rx_ring array.
> Also if the queue index parameter value is larger than a ring, it
> would also access an uninitialized rx_ring.
You might want to rephrase this as  'if the queue index parameter is
greater than the no: of rings created'
>
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>  CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34
>  RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en]
>  Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05
>  RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246
>  RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008
>  RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000
>  R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000
>  R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000
>  FS:  00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   ? __die+0x20/0x70
>   ? page_fault_oops+0x15a/0x460
>   ? __vmalloc_node_range_noprof+0x4f7/0x8e0
>   ? exc_page_fault+0x6e/0x180
>   ? asm_exc_page_fault+0x22/0x30
>   ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
>   ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
>   netdev_rx_queue_restart+0xa9/0x1c0
>   net_devmem_bind_dmabuf_to_queue+0xcb/0x100
>   netdev_nl_bind_rx_doit+0x2f6/0x350
>   genl_family_rcv_msg_doit+0xd9/0x130
>   genl_rcv_msg+0x184/0x2b0
>   ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10
>   ? __pfx_genl_rcv_msg+0x10/0x10
>   netlink_rcv_skb+0x54/0x100
>   genl_rcv+0x24/0x40
>   netlink_unicast+0x243/0x370
>   netlink_sendmsg+0x1bb/0x3e0
>
> Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>
> The branch is not net because the commit 2d694c27d32e is not
> yet merged into net branch.
>
> devmem TCP causes this problem, but it is not yet merged.
> So, to test this patch, please patch the current devmem TCP.
> The /tools/testing/selftests/net/ncdevmem will immediately reproduce
> this problem.
>
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 6fc34ccb86e3..e270fb6b2866 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -15022,6 +15022,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
>         struct bnxt_ring_struct *ring;
>         int rc;
>
> +       if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +               return -EINVAL;
> +
>         rxr = &bp->rx_ring[idx];
>         clone = qmem;
>         memcpy(clone, rxr, sizeof(*rxr));
> @@ -15156,6 +15159,9 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
>         struct bnxt_cp_ring_info *cpr;
>         int rc;
>
> +       if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +               return -EINVAL;
> +
>         rxr = &bp->rx_ring[idx];
>         clone = qmem;
>
> @@ -15195,6 +15201,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>         struct bnxt *bp = netdev_priv(dev);
>         struct bnxt_rx_ring_info *rxr;
>
> +       if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +               return -EINVAL;
> +
>         rxr = &bp->rx_ring[idx];
>         napi_disable(&rxr->bnapi->napi);
>         bnxt_hwrm_rx_ring_free(bp, rxr, false);
> --
> 2.34.1
>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Jakub Kicinski July 4, 2024, 1:33 p.m. UTC | #2
On Thu,  4 Jul 2024 07:41:53 +0000 Taehee Yoo wrote:
> bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is
> initialized while an interface is being up.
> The rings are initialized as a number of channels.
> 
> The queue API functions access rx_ring without checking both null and
> ring size.
> So, if the queue API functions are called when interface status is down,
> they access an uninitialized rx_ring array.
> Also if the queue index parameter value is larger than a ring, it
> would also access an uninitialized rx_ring.

Shouldn't the core be checking against dev->real_num_rx_queues instead ?
Taehee Yoo July 4, 2024, 3:20 p.m. UTC | #3
On Thu, Jul 4, 2024 at 10:33 PM Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub,
Thanks a lot for review!


> On Thu, 4 Jul 2024 07:41:53 +0000 Taehee Yoo wrote:
> > bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is
> > initialized while an interface is being up.
> > The rings are initialized as a number of channels.
> >
> > The queue API functions access rx_ring without checking both null and
> > ring size.
> > So, if the queue API functions are called when interface status is down,
> > they access an uninitialized rx_ring array.
> > Also if the queue index parameter value is larger than a ring, it
> > would also access an uninitialized rx_ring.
>
> Shouldn't the core be checking against dev->real_num_rx_queues instead ?

Oh, I missed it.
I agree the core should check dev->real_num_rx_queues.
But the current devmem TCP code checks dev->num_rx_queues instead of
dev->real_num_rx_queues.
I tested the below change, and it works well.
So, I will comment on it in Mina's patch.

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 7afaf17801ef..da27778c2421 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -146,7 +146,7 @@ int net_devmem_bind_dmabuf_to_queue(struct
net_device *dev, u32 rxq_idx,
u32 xa_idx;
int err;

- if (rxq_idx >= dev->num_rx_queues)
+ if (rxq_idx >= dev->real_num_rx_queues)
return -ERANGE;

rxq = __netif_get_rx_queue(dev, rxq_idx);


Thanks a lot!
Taehee Yoo
David Wei July 5, 2024, 6:42 p.m. UTC | #4
On 2024-07-04 00:41, Taehee Yoo wrote:
> bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is
> initialized while an interface is being up.
> The rings are initialized as a number of channels.
> 
> The queue API functions access rx_ring without checking both null and
> ring size.
> So, if the queue API functions are called when interface status is down,
> they access an uninitialized rx_ring array.
> Also if the queue index parameter value is larger than a ring, it
> would also access an uninitialized rx_ring.

Hi Taehee, I deliberately didn't do a check as it is too defensive. The
caller of the queue API e.g. netdev_rx_queue_restart() should be the one
doing the check instead.

> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000000
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>  CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34
>  RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en]
>  Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05
>  RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246
>  RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008
>  RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000
>  R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000
>  R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000
>  FS:  00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0
>  PKRU: 55555554
>  Call Trace:
>   <TASK>
>   ? __die+0x20/0x70
>   ? page_fault_oops+0x15a/0x460
>   ? __vmalloc_node_range_noprof+0x4f7/0x8e0
>   ? exc_page_fault+0x6e/0x180
>   ? asm_exc_page_fault+0x22/0x30
>   ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
>   ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534]
>   netdev_rx_queue_restart+0xa9/0x1c0
>   net_devmem_bind_dmabuf_to_queue+0xcb/0x100
>   netdev_nl_bind_rx_doit+0x2f6/0x350
>   genl_family_rcv_msg_doit+0xd9/0x130
>   genl_rcv_msg+0x184/0x2b0
>   ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10
>   ? __pfx_genl_rcv_msg+0x10/0x10
>   netlink_rcv_skb+0x54/0x100
>   genl_rcv+0x24/0x40
>   netlink_unicast+0x243/0x370
>   netlink_sendmsg+0x1bb/0x3e0
> 
> Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
> 
> The branch is not net because the commit 2d694c27d32e is not
> yet merged into net branch.
> 
> devmem TCP causes this problem, but it is not yet merged.
> So, to test this patch, please patch the current devmem TCP.
> The /tools/testing/selftests/net/ncdevmem will immediately reproduce 
> this problem.
> 
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 6fc34ccb86e3..e270fb6b2866 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -15022,6 +15022,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
>  	struct bnxt_ring_struct *ring;
>  	int rc;
>  
> +	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +		return -EINVAL;
> +
>  	rxr = &bp->rx_ring[idx];
>  	clone = qmem;
>  	memcpy(clone, rxr, sizeof(*rxr));
> @@ -15156,6 +15159,9 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
>  	struct bnxt_cp_ring_info *cpr;
>  	int rc;
>  
> +	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +		return -EINVAL;
> +
>  	rxr = &bp->rx_ring[idx];
>  	clone = qmem;
>  
> @@ -15195,6 +15201,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	struct bnxt *bp = netdev_priv(dev);
>  	struct bnxt_rx_ring_info *rxr;
>  
> +	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
> +		return -EINVAL;
> +
>  	rxr = &bp->rx_ring[idx];
>  	napi_disable(&rxr->bnapi->napi);
>  	bnxt_hwrm_rx_ring_free(bp, rxr, false);
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6fc34ccb86e3..e270fb6b2866 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -15022,6 +15022,9 @@  static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 	struct bnxt_ring_struct *ring;
 	int rc;
 
+	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
+		return -EINVAL;
+
 	rxr = &bp->rx_ring[idx];
 	clone = qmem;
 	memcpy(clone, rxr, sizeof(*rxr));
@@ -15156,6 +15159,9 @@  static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 	struct bnxt_cp_ring_info *cpr;
 	int rc;
 
+	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
+		return -EINVAL;
+
 	rxr = &bp->rx_ring[idx];
 	clone = qmem;
 
@@ -15195,6 +15201,9 @@  static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_rx_ring_info *rxr;
 
+	if (!bp->rx_ring || idx >= bp->rx_nr_rings)
+		return -EINVAL;
+
 	rxr = &bp->rx_ring[idx];
 	napi_disable(&rxr->bnapi->napi);
 	bnxt_hwrm_rx_ring_free(bp, rxr, false);