diff mbox series

[v2,net] bnx2x: Fix multiple UBSAN array-index-out-of-bounds

Message ID 20240612154449.173663-1-ghadi.rahme@canonical.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2,net] bnx2x: Fix multiple UBSAN array-index-out-of-bounds | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 868 this patch: 868
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: mschmidt@redhat.com; 6 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com mschmidt@redhat.com manishc@marvell.com skalluru@marvell.com
netdev/build_clang success Errors and warnings before: 868 this patch: 868
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 872 this patch: 872
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 5 this patch: 5
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-13--12-00 (tests: 644)

Commit Message

Ghadi Elie Rahme June 12, 2024, 3:44 p.m. UTC
Fix UBSAN warnings that occur when using a system with 32 physical
cpu cores or more, or when the user defines a number of Ethernet
queues greater than or equal to FP_SB_MAX_E1x using the num_queues
module parameter.

The value of the maximum number of Ethernet queues should be limited
to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if
enabled to avoid out of bounds reads and writes.

Stack traces:

UBSAN: array-index-out-of-bounds in
       drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
index 20 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
	     #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
	       BIOS P89 10/21/2019
Call Trace:
 <TASK>
 dump_stack_lvl+0x76/0xa0
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xcb/0x110
 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x]
 bnx2x_stats_init+0x156/0x320 [bnx2x]
 bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
 bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
 bnx2x_open+0x16b/0x290 [bnx2x]
 __dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca
      64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00
      f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
</TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in
       drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11
index 28 is out of range for type 'stats_query_entry [19]'
CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
	     #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
	       BIOS P89 10/21/2019
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xa0
dump_stack+0x10/0x20
__ubsan_handle_out_of_bounds+0xcb/0x110
bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x]
bnx2x_stats_init+0x156/0x320 [bnx2x]
bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
bnx2x_open+0x16b/0x290 [bnx2x]
__dev_open+0x10e/0x1d0
RIP: 0033:0x736223927a0a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca
      64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00
      f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
 </TASK>
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in
       drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8
index 29 is out of range for type 'stats_query_entry [19]'
CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic
	     #202405052133
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
	       BIOS P89 10/21/2019
Workqueue: bnx2x bnx2x_sp_task [bnx2x]
Call Trace:
 <TASK>
 dump_stack_lvl+0x76/0xa0
 dump_stack+0x10/0x20
 __ubsan_handle_out_of_bounds+0xcb/0x110
 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x]
 bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x]
 ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x]
 bnx2x_stats_start+0x44/0x70 [bnx2x]
 bnx2x_stats_handle+0x149/0x350 [bnx2x]
 bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x]
 bnx2x_sp_task+0x491/0x5c0 [bnx2x]
 process_one_work+0x18d/0x3f0
 </TASK>
---[ end trace ]---

Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")
Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com>
Cc: stable@vger.kernel.org
---
Changes since v1:
 * Fix checkpatch complaints:
   - Wrapped commit message to comply with 75 character limit
   - Added space before ( in if condition

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Michael Ellerman June 13, 2024, 12:36 p.m. UTC | #1
Ghadi Elie Rahme <ghadi.rahme@canonical.com> writes:
> Fix UBSAN warnings that occur when using a system with 32 physical
> cpu cores or more, or when the user defines a number of Ethernet
> queues greater than or equal to FP_SB_MAX_E1x using the num_queues
> module parameter.
>
> The value of the maximum number of Ethernet queues should be limited
> to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if
> enabled to avoid out of bounds reads and writes.
>
> Stack traces:
>
> UBSAN: array-index-out-of-bounds in
>        drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11
> index 20 is out of range for type 'stats_query_entry [19]'
> CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
> 	     #202405052133
> Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
> 	       BIOS P89 10/21/2019
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x76/0xa0
>  dump_stack+0x10/0x20
>  __ubsan_handle_out_of_bounds+0xcb/0x110
>  bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x]
>  bnx2x_stats_init+0x156/0x320 [bnx2x]
>  bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
>  bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
>  bnx2x_open+0x16b/0x290 [bnx2x]
>  __dev_open+0x10e/0x1d0
> RIP: 0033:0x736223927a0a
> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca
>       64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00
>       f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
> RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a
> RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003
> RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0
> R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00
> </TASK>
> ---[ end trace ]---
> ------------[ cut here ]------------
> UBSAN: array-index-out-of-bounds in
>        drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11
> index 28 is out of range for type 'stats_query_entry [19]'
> CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic
> 	     #202405052133
> Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
> 	       BIOS P89 10/21/2019
> Call Trace:
> <TASK>
> dump_stack_lvl+0x76/0xa0
> dump_stack+0x10/0x20
> __ubsan_handle_out_of_bounds+0xcb/0x110
> bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x]
> bnx2x_stats_init+0x156/0x320 [bnx2x]
> bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x]
> bnx2x_nic_load+0x8e8/0x19e0 [bnx2x]
> bnx2x_open+0x16b/0x290 [bnx2x]
> __dev_open+0x10e/0x1d0
 
I also hit this one on powerpc:

  https://lore.kernel.org/all/87pltc4rs8.fsf@mail.lhotse/

And confirm that this patch fixes it there too.

cheers
Jakub Kicinski June 13, 2024, 2:48 p.m. UTC | #2
On Wed, 12 Jun 2024 18:44:49 +0300 Ghadi Elie Rahme wrote:
> Fix UBSAN warnings that occur when using a system with 32 physical
> cpu cores or more, or when the user defines a number of Ethernet
> queues greater than or equal to FP_SB_MAX_E1x using the num_queues
> module parameter.
> 
> The value of the maximum number of Ethernet queues should be limited
> to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if
> enabled to avoid out of bounds reads and writes.

You're just describing what the code does, not providing extra
context...

> Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")

Sure this is not more recent, netif_get_num_default_rss_queues()
used to always return 8.

> Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com>
> Cc: stable@vger.kernel.org

>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index a8e07e51418f..c895dd680cf8 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp)
>  	if (is_kdump_kernel())
>  		nq = 1;
>  
> -	nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
> +	int max_nq = FP_SB_MAX_E1x - 1;

please don't mix declarations and code

> +	if (NO_FCOE(bp))
> +		max_nq = FP_SB_MAX_E1x;

you really need to explain somewhere why you're hardcoding E1x
constants while at a glance the driver also supports E2.
Also why is BNX2X_MAX_QUEUES() higher than the number of queues?
Isn't that the bug?
Ghadi Elie Rahme June 20, 2024, 2:59 p.m. UTC | #3
On 13/06/2024 17:48, Jakub Kicinski wrote:
> On Wed, 12 Jun 2024 18:44:49 +0300 Ghadi Elie Rahme wrote:
>> Fix UBSAN warnings that occur when using a system with 32 physical
>> cpu cores or more, or when the user defines a number of Ethernet
>> queues greater than or equal to FP_SB_MAX_E1x using the num_queues
>> module parameter.
>>
>> The value of the maximum number of Ethernet queues should be limited
>> to FP_SB_MAX_E1x in case FCOE is disabled or to [FP_SB_MAX_E1x-1] if
>> enabled to avoid out of bounds reads and writes.
> You're just describing what the code does, not providing extra
> context...

Apologies for the lack of explanation.

Currently there is a read/write out of bounds that occurs on the array
"struct stats_query_entry query" present inside the "bnx2x_fw_stats_req"
struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h".
Looking at the definition of the "struct stats_query_entry query" array:

struct stats_query_entry query[FP_SB_MAX_E1x+
         BNX2X_FIRST_QUEUE_QUERY_IDX];

FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and
has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3
meaning the array has a total size of 19.
Since accesses to "struct stats_query_entry query" are offset-ted by
BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet
queues should not exceed FP_SB_MAX_E1x (16). However one of these queues
is reserved for FCOE and thus the number of Ethernet queues should be set
to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if
it is not.

This is also described in a comment in the source code in
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition
of FP_SB_MAX_E1x. Below is the part of this explanation that it important
for this patch

/*
  * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is
  * control by the number of fast-path status blocks supported by the
  * device (HW/FW). Each fast-path status block (FP-SB) aka non-default
  * status block represents an independent interrupts context that can
  * serve a regular L2 networking queue. However special L2 queues such
  * as the FCoE queue do not require a FP-SB and other components like
  * the CNIC may consume FP-SB reducing the number of possible L2 queues
  *
  * If the maximum number of FP-SB available is X then:
  * a. If CNIC is supported it consumes 1 FP-SB thus the max number of
  *    regular L2 queues is Y=X-1
  * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor)
  * c. If the FCoE L2 queue is supported the actual number of L2 queues
  *    is Y+1
  * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for
  *    slow-path interrupts) or Y+2 if CNIC is supported (one additional
  *    FP interrupt context for the CNIC).
  * e. The number of HW context (CID count) is always X or X+1 if FCoE
  *    L2 queue is supported. The cid for the FCoE L2 queue is always X.
  */

Looking at the commits when the E2 support was added, it was originally
using the E1x parameters [f2e0899f0f27 (bnx2x: Add 57712 support)]. Where
FP_SB_MAX_E2 was set to 16 the same as E1x. Since I do not have access to
the datasheets of these devices I had to guess based on the previous work
done on the driver what would be the safest way to fix this array overflow.
Thus I decided to go with how things were done before, which is to limit
the E2 to using the same number of queues as E1x. This patch accomplishes
that.

However I also had another solution which made more sense to me but I had
no way to tell if it would be safe. The other solution was to increase the
size of the stats_query_entry query array to be large enough to fit the
number of queues supported by E2. This would mean that the new definition
would look like the following:

struct stats_query_entry query[FP_SB_MAX_E2+
         BNX2X_FIRST_QUEUE_QUERY_IDX];

I have tested this approach and it worked fine so I am more comfortable now
changing the patch an sending in a v3 undoing the changes in v2 and simply
increasing the array size. I believe now that using FP_SB_MAX_E1x instead
of FP_SB_MAX_E2 to define the array size might have been an oversight when
updating the driver to take full advantage of the E2 after it was just
limiting itself to the capabilities of an E1x.

>
>> Fixes: 7d0445d66a76 ("bnx2x: clamp num_queues to prevent passing a negative value")
> Sure this is not more recent, netif_get_num_default_rss_queues()
> used to always return 8.
The value of the number of queues can be defined by the kernel or the
user, which is why I used the commit that I did for the Fixes tag
because it is the job of the clamp to make sure both these values are
in check. Setting the Fixes tag to when netif_get_num_default_rss_queues()
was changed ignores the fact that the user value can be out of bounds.
>> Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com>
>> Cc: stable@vger.kernel.org
>>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> index a8e07e51418f..c895dd680cf8 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> @@ -66,7 +66,12 @@ static int bnx2x_calc_num_queues(struct bnx2x *bp)
>>   	if (is_kdump_kernel())
>>   		nq = 1;
>>   
>> -	nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
>> +	int max_nq = FP_SB_MAX_E1x - 1;
> please don't mix declarations and code
>
>> +	if (NO_FCOE(bp))
>> +		max_nq = FP_SB_MAX_E1x;
> you really need to explain somewhere why you're hardcoding E1x
> constants while at a glance the driver also supports E2.
> Also why is BNX2X_MAX_QUEUES() higher than the number of queues?
> Isn't that the bug?
The reason I did not patch BNX2X_MAX_QUEUES() is because the macro is
working as expected by returning the actual number of queues that can be
handled by a NIC using an E2/E1x chip. It was the driver that was not able
to handle the maximum an E2 NIC can take.
Jakub Kicinski June 21, 2024, 8:38 p.m. UTC | #4
On Thu, 20 Jun 2024 17:59:42 +0300 Ghadi Rahme wrote:
> I have tested this approach and it worked fine so I am more comfortable now
> changing the patch an sending in a v3 undoing the changes in v2 and simply
> increasing the array size. I believe now that using FP_SB_MAX_E1x instead
> of FP_SB_MAX_E2 to define the array size might have been an oversight when
> updating the driver to take full advantage of the E2 after it was just
> limiting itself to the capabilities of an E1x.

Sounds good! Just to be clear, please do include the explanations you
provided here in the commit message, with the necessary edits. There's
no limit on the length of the commit message.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index a8e07e51418f..c895dd680cf8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -66,7 +66,12 @@  static int bnx2x_calc_num_queues(struct bnx2x *bp)
 	if (is_kdump_kernel())
 		nq = 1;
 
-	nq = clamp(nq, 1, BNX2X_MAX_QUEUES(bp));
+	int max_nq = FP_SB_MAX_E1x - 1;
+
+	if (NO_FCOE(bp))
+		max_nq = FP_SB_MAX_E1x;
+
+	nq = clamp(nq, 1, max_nq);
 	return nq;
 }