diff mbox series

[net] net/mlx5e: add missing cpu_to_node to kvzalloc_node in mlx5e_open_xdpredirect_sq

Message ID 20250123000407.3464715-1-sdf@fomichev.me (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net] net/mlx5e: add missing cpu_to_node to kvzalloc_node in mlx5e_open_xdpredirect_sq | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 16 of 16 maintainers
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-01-23--12-00 (tests: 878)

Commit Message

Stanislav Fomichev Jan. 23, 2025, 12:04 a.m. UTC
kvzalloc_node is not doing a runtime check on the node argument
(__alloc_pages_node_noprof does have a VM_BUG_ON, but it expands to
nothing on !CONFIG_DEBUG_VM builds), so doing any ethtool/netlink
operation that calls mlx5e_open on a CPU that's larger that MAX_NUMNODES
triggers OOB access and panic (see the trace below).

Add missing cpu_to_node call to convert cpu id to node id.

[  165.427394] mlx5_core 0000:5c:00.0 beth1: Link up
[  166.479327] BUG: unable to handle page fault for address: 0000000800000010
[  166.494592] #PF: supervisor read access in kernel mode
[  166.505995] #PF: error_code(0x0000) - not-present page
...
[  166.816958] Call Trace:
[  166.822380]  <TASK>
[  166.827034]  ? __die_body+0x64/0xb0
[  166.834774]  ? page_fault_oops+0x2cd/0x3f0
[  166.843862]  ? exc_page_fault+0x63/0x130
[  166.852564]  ? asm_exc_page_fault+0x22/0x30
[  166.861843]  ? __kvmalloc_node_noprof+0x43/0xd0
[  166.871897]  ? get_partial_node+0x1c/0x320
[  166.880983]  ? deactivate_slab+0x269/0x2b0
[  166.890069]  ___slab_alloc+0x521/0xa90
[  166.898389]  ? __kvmalloc_node_noprof+0x43/0xd0
[  166.908442]  __kmalloc_node_noprof+0x216/0x3f0
[  166.918302]  ? __kvmalloc_node_noprof+0x43/0xd0
[  166.928354]  __kvmalloc_node_noprof+0x43/0xd0
[  166.938021]  mlx5e_open_channels+0x5e2/0xc00
[  166.947496]  mlx5e_open_locked+0x3e/0xf0
[  166.956201]  mlx5e_open+0x23/0x50
[  166.963551]  __dev_open+0x114/0x1c0
[  166.971292]  __dev_change_flags+0xa2/0x1b0
[  166.980378]  dev_change_flags+0x21/0x60
[  166.988887]  do_setlink+0x38d/0xf20
[  166.996628]  ? ep_poll_callback+0x1b9/0x240
[  167.005910]  ? __nla_validate_parse.llvm.10713395753544950386+0x80/0xd70
[  167.020782]  ? __wake_up_sync_key+0x52/0x80
[  167.030066]  ? __mutex_lock+0xff/0x550
[  167.038382]  ? security_capable+0x50/0x90
[  167.047279]  rtnl_setlink+0x1c9/0x210
[  167.055403]  ? ep_poll_callback+0x1b9/0x240
[  167.064684]  ? security_capable+0x50/0x90
[  167.073579]  rtnetlink_rcv_msg+0x2f9/0x310
[  167.082667]  ? rtnetlink_bind+0x30/0x30
[  167.091173]  netlink_rcv_skb+0xb1/0xe0
[  167.099492]  netlink_unicast+0x20f/0x2e0
[  167.108191]  netlink_sendmsg+0x389/0x420
[  167.116896]  __sys_sendto+0x158/0x1c0
[  167.125024]  __x64_sys_sendto+0x22/0x30
[  167.133534]  do_syscall_64+0x63/0x130
[  167.141657]  ? __irq_exit_rcu.llvm.17843942359718260576+0x52/0xd0
[  167.155181]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: bb135e40129d ("net/mlx5e: move XDP_REDIRECT sq to dynamic allocation")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Joe Damato Jan. 23, 2025, 12:46 a.m. UTC | #1
On Wed, Jan 22, 2025 at 04:04:07PM -0800, Stanislav Fomichev wrote:
> kvzalloc_node is not doing a runtime check on the node argument
> (__alloc_pages_node_noprof does have a VM_BUG_ON, but it expands to
> nothing on !CONFIG_DEBUG_VM builds), so doing any ethtool/netlink
> operation that calls mlx5e_open on a CPU that's larger that MAX_NUMNODES
> triggers OOB access and panic (see the trace below).
> 
> Add missing cpu_to_node call to convert cpu id to node id.
> 
> [  165.427394] mlx5_core 0000:5c:00.0 beth1: Link up
> [  166.479327] BUG: unable to handle page fault for address: 0000000800000010
> [  166.494592] #PF: supervisor read access in kernel mode
> [  166.505995] #PF: error_code(0x0000) - not-present page
> ...
> [  166.816958] Call Trace:
> [  166.822380]  <TASK>
> [  166.827034]  ? __die_body+0x64/0xb0
> [  166.834774]  ? page_fault_oops+0x2cd/0x3f0
> [  166.843862]  ? exc_page_fault+0x63/0x130
> [  166.852564]  ? asm_exc_page_fault+0x22/0x30
> [  166.861843]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.871897]  ? get_partial_node+0x1c/0x320
> [  166.880983]  ? deactivate_slab+0x269/0x2b0
> [  166.890069]  ___slab_alloc+0x521/0xa90
> [  166.898389]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.908442]  __kmalloc_node_noprof+0x216/0x3f0
> [  166.918302]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.928354]  __kvmalloc_node_noprof+0x43/0xd0
> [  166.938021]  mlx5e_open_channels+0x5e2/0xc00
> [  166.947496]  mlx5e_open_locked+0x3e/0xf0
> [  166.956201]  mlx5e_open+0x23/0x50
> [  166.963551]  __dev_open+0x114/0x1c0
> [  166.971292]  __dev_change_flags+0xa2/0x1b0
> [  166.980378]  dev_change_flags+0x21/0x60
> [  166.988887]  do_setlink+0x38d/0xf20
> [  166.996628]  ? ep_poll_callback+0x1b9/0x240
> [  167.005910]  ? __nla_validate_parse.llvm.10713395753544950386+0x80/0xd70
> [  167.020782]  ? __wake_up_sync_key+0x52/0x80
> [  167.030066]  ? __mutex_lock+0xff/0x550
> [  167.038382]  ? security_capable+0x50/0x90
> [  167.047279]  rtnl_setlink+0x1c9/0x210
> [  167.055403]  ? ep_poll_callback+0x1b9/0x240
> [  167.064684]  ? security_capable+0x50/0x90
> [  167.073579]  rtnetlink_rcv_msg+0x2f9/0x310
> [  167.082667]  ? rtnetlink_bind+0x30/0x30
> [  167.091173]  netlink_rcv_skb+0xb1/0xe0
> [  167.099492]  netlink_unicast+0x20f/0x2e0
> [  167.108191]  netlink_sendmsg+0x389/0x420
> [  167.116896]  __sys_sendto+0x158/0x1c0
> [  167.125024]  __x64_sys_sendto+0x22/0x30
> [  167.133534]  do_syscall_64+0x63/0x130
> [  167.141657]  ? __irq_exit_rcu.llvm.17843942359718260576+0x52/0xd0
> [  167.155181]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> Fixes: bb135e40129d ("net/mlx5e: move XDP_REDIRECT sq to dynamic allocation")
> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index bd41b75d246e..a814b63ed97e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -2087,7 +2087,7 @@ static struct mlx5e_xdpsq *mlx5e_open_xdpredirect_sq(struct mlx5e_channel *c,
>  	struct mlx5e_xdpsq *xdpsq;
>  	int err;
>  
> -	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, c->cpu);
> +	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, cpu_to_node(c->cpu));
>  	if (!xdpsq)
>  		return ERR_PTR(-ENOMEM);

Nice catch!

Just out of curiosity, I looked at the other calls to kvzalloc_node
in mlx5 and they seem fine, as far as I can tell.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Tariq Toukan Jan. 23, 2025, 6:18 a.m. UTC | #2
On 23/01/2025 2:04, Stanislav Fomichev wrote:
> kvzalloc_node is not doing a runtime check on the node argument
> (__alloc_pages_node_noprof does have a VM_BUG_ON, but it expands to
> nothing on !CONFIG_DEBUG_VM builds), so doing any ethtool/netlink
> operation that calls mlx5e_open on a CPU that's larger that MAX_NUMNODES
> triggers OOB access and panic (see the trace below).
> 
> Add missing cpu_to_node call to convert cpu id to node id.
> 
> [  165.427394] mlx5_core 0000:5c:00.0 beth1: Link up
> [  166.479327] BUG: unable to handle page fault for address: 0000000800000010
> [  166.494592] #PF: supervisor read access in kernel mode
> [  166.505995] #PF: error_code(0x0000) - not-present page
> ...
> [  166.816958] Call Trace:
> [  166.822380]  <TASK>
> [  166.827034]  ? __die_body+0x64/0xb0
> [  166.834774]  ? page_fault_oops+0x2cd/0x3f0
> [  166.843862]  ? exc_page_fault+0x63/0x130
> [  166.852564]  ? asm_exc_page_fault+0x22/0x30
> [  166.861843]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.871897]  ? get_partial_node+0x1c/0x320
> [  166.880983]  ? deactivate_slab+0x269/0x2b0
> [  166.890069]  ___slab_alloc+0x521/0xa90
> [  166.898389]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.908442]  __kmalloc_node_noprof+0x216/0x3f0
> [  166.918302]  ? __kvmalloc_node_noprof+0x43/0xd0
> [  166.928354]  __kvmalloc_node_noprof+0x43/0xd0
> [  166.938021]  mlx5e_open_channels+0x5e2/0xc00
> [  166.947496]  mlx5e_open_locked+0x3e/0xf0
> [  166.956201]  mlx5e_open+0x23/0x50
> [  166.963551]  __dev_open+0x114/0x1c0
> [  166.971292]  __dev_change_flags+0xa2/0x1b0
> [  166.980378]  dev_change_flags+0x21/0x60
> [  166.988887]  do_setlink+0x38d/0xf20
> [  166.996628]  ? ep_poll_callback+0x1b9/0x240
> [  167.005910]  ? __nla_validate_parse.llvm.10713395753544950386+0x80/0xd70
> [  167.020782]  ? __wake_up_sync_key+0x52/0x80
> [  167.030066]  ? __mutex_lock+0xff/0x550
> [  167.038382]  ? security_capable+0x50/0x90
> [  167.047279]  rtnl_setlink+0x1c9/0x210
> [  167.055403]  ? ep_poll_callback+0x1b9/0x240
> [  167.064684]  ? security_capable+0x50/0x90
> [  167.073579]  rtnetlink_rcv_msg+0x2f9/0x310
> [  167.082667]  ? rtnetlink_bind+0x30/0x30
> [  167.091173]  netlink_rcv_skb+0xb1/0xe0
> [  167.099492]  netlink_unicast+0x20f/0x2e0
> [  167.108191]  netlink_sendmsg+0x389/0x420
> [  167.116896]  __sys_sendto+0x158/0x1c0
> [  167.125024]  __x64_sys_sendto+0x22/0x30
> [  167.133534]  do_syscall_64+0x63/0x130
> [  167.141657]  ? __irq_exit_rcu.llvm.17843942359718260576+0x52/0xd0
> [  167.155181]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> Fixes: bb135e40129d ("net/mlx5e: move XDP_REDIRECT sq to dynamic allocation")
> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index bd41b75d246e..a814b63ed97e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -2087,7 +2087,7 @@ static struct mlx5e_xdpsq *mlx5e_open_xdpredirect_sq(struct mlx5e_channel *c,
>   	struct mlx5e_xdpsq *xdpsq;
>   	int err;
>   
> -	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, c->cpu);
> +	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, cpu_to_node(c->cpu));
>   	if (!xdpsq)
>   		return ERR_PTR(-ENOMEM);
>   

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Thanks for your patch.

Regards,
Tariq
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bd41b75d246e..a814b63ed97e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2087,7 +2087,7 @@  static struct mlx5e_xdpsq *mlx5e_open_xdpredirect_sq(struct mlx5e_channel *c,
 	struct mlx5e_xdpsq *xdpsq;
 	int err;
 
-	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, c->cpu);
+	xdpsq = kvzalloc_node(sizeof(*xdpsq), GFP_KERNEL, cpu_to_node(c->cpu));
 	if (!xdpsq)
 		return ERR_PTR(-ENOMEM);