diff mbox series

[v3] linux/dim: Fix divide 0 in RDMA DIM.

Message ID 20220627140004.3099-1-thomas.liu@ucloud.cn (mailing list archive)
State Accepted
Commit 0fe3dbbefb74a8575f61d7801b08dbc50523d60d
Headers show
Series [v3] linux/dim: Fix divide 0 in RDMA DIM. | expand

Commit Message

Tao Liu June 27, 2022, 2 p.m. UTC
Fix a divide 0 error in rdma_dim_stats_compare() when prev->cpe_ratio == 0.

CallTrace:
  Hardware name: H3C R4900 G3/RS33M2C9S, BIOS 2.00.37P21 03/12/2020
  task: ffff880194b78000 task.stack: ffffc90006714000
  RIP: 0010:backport_rdma_dim+0x10e/0x240 [mlx_compat]
  RSP: 0018:ffff880c10e83ec0 EFLAGS: 00010202
  RAX: 0000000000002710 RBX: ffff88096cd7f780 RCX: 0000000000000064
  RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 000000001d7c6c09
  R13: ffff88096cd7f780 R14: ffff880b174fe800 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff880c10e80000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000a0965b00 CR3: 000000000200a003 CR4: 00000000007606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <IRQ>
   ib_poll_handler+0x43/0x80 [ib_core]
   irq_poll_softirq+0xae/0x110
   __do_softirq+0xd1/0x28c
   irq_exit+0xde/0xf0
   do_IRQ+0x54/0xe0
   common_interrupt+0x8f/0x8f
   </IRQ>
   ? cpuidle_enter_state+0xd9/0x2a0
   ? cpuidle_enter_state+0xc7/0x2a0
   ? do_idle+0x170/0x1d0
   ? cpu_startup_entry+0x6f/0x80
   ? start_secondary+0x1b9/0x210
   ? secondary_startup_64+0xa5/0xb0
  Code: 0f 87 e1 00 00 00 8b 4c 24 14 44 8b 43 14 89 c8 4d 63 c8 44 29 c0 99 31 d0 29 d0 31 d2 48 98 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <49> f7 f1 48 83 f8 0a 0f 86 c1 00 00 00 44 39 c1 7f 10 48 89 df
  RIP: backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: ffff880c10e83ec0

Fixes: f4915455dcf0 ("linux/dim: Implement RDMA adaptive moderation (DIM)")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
---
 include/linux/dim.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Leon Romanovsky June 28, 2022, 7:10 a.m. UTC | #1
On Mon, Jun 27, 2022 at 10:00:04PM +0800, Tao Liu wrote:
> Fix a divide 0 error in rdma_dim_stats_compare() when prev->cpe_ratio == 0.
> 
> CallTrace:
>   Hardware name: H3C R4900 G3/RS33M2C9S, BIOS 2.00.37P21 03/12/2020
>   task: ffff880194b78000 task.stack: ffffc90006714000
>   RIP: 0010:backport_rdma_dim+0x10e/0x240 [mlx_compat]
>   RSP: 0018:ffff880c10e83ec0 EFLAGS: 00010202
>   RAX: 0000000000002710 RBX: ffff88096cd7f780 RCX: 0000000000000064
>   RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001
>   RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
>   R10: 0000000000000000 R11: 0000000000000000 R12: 000000001d7c6c09
>   R13: ffff88096cd7f780 R14: ffff880b174fe800 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff880c10e80000(0000)
>   knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00000000a0965b00 CR3: 000000000200a003 CR4: 00000000007606e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   PKRU: 55555554
>   Call Trace:
>    <IRQ>
>    ib_poll_handler+0x43/0x80 [ib_core]
>    irq_poll_softirq+0xae/0x110
>    __do_softirq+0xd1/0x28c
>    irq_exit+0xde/0xf0
>    do_IRQ+0x54/0xe0
>    common_interrupt+0x8f/0x8f
>    </IRQ>
>    ? cpuidle_enter_state+0xd9/0x2a0
>    ? cpuidle_enter_state+0xc7/0x2a0
>    ? do_idle+0x170/0x1d0
>    ? cpu_startup_entry+0x6f/0x80
>    ? start_secondary+0x1b9/0x210
>    ? secondary_startup_64+0xa5/0xb0
>   Code: 0f 87 e1 00 00 00 8b 4c 24 14 44 8b 43 14 89 c8 4d 63 c8 44 29 c0 99 31 d0 29 d0 31 d2 48 98 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <49> f7 f1 48 83 f8 0a 0f 86 c1 00 00 00 44 39 c1 7f 10 48 89 df
>   RIP: backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: ffff880c10e83ec0
> 
> Fixes: f4915455dcf0 ("linux/dim: Implement RDMA adaptive moderation (DIM)")
> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> Acked-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  include/linux/dim.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Jason,

Can you please pick this patch to rdma-rc as this divide-by-zero
error is caused by RDMA app?

Thanks
Jason Gunthorpe June 28, 2022, 7:16 p.m. UTC | #2
On Mon, Jun 27, 2022 at 10:00:04PM +0800, Tao Liu wrote:
> Fix a divide 0 error in rdma_dim_stats_compare() when prev->cpe_ratio == 0.
> 
> CallTrace:
>   Hardware name: H3C R4900 G3/RS33M2C9S, BIOS 2.00.37P21 03/12/2020
>   task: ffff880194b78000 task.stack: ffffc90006714000
>   RIP: 0010:backport_rdma_dim+0x10e/0x240 [mlx_compat]
>   RSP: 0018:ffff880c10e83ec0 EFLAGS: 00010202
>   RAX: 0000000000002710 RBX: ffff88096cd7f780 RCX: 0000000000000064
>   RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001
>   RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
>   R10: 0000000000000000 R11: 0000000000000000 R12: 000000001d7c6c09
>   R13: ffff88096cd7f780 R14: ffff880b174fe800 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff880c10e80000(0000)
>   knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00000000a0965b00 CR3: 000000000200a003 CR4: 00000000007606e0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   PKRU: 55555554
>   Call Trace:
>    <IRQ>
>    ib_poll_handler+0x43/0x80 [ib_core]
>    irq_poll_softirq+0xae/0x110
>    __do_softirq+0xd1/0x28c
>    irq_exit+0xde/0xf0
>    do_IRQ+0x54/0xe0
>    common_interrupt+0x8f/0x8f
>    </IRQ>
>    ? cpuidle_enter_state+0xd9/0x2a0
>    ? cpuidle_enter_state+0xc7/0x2a0
>    ? do_idle+0x170/0x1d0
>    ? cpu_startup_entry+0x6f/0x80
>    ? start_secondary+0x1b9/0x210
>    ? secondary_startup_64+0xa5/0xb0
>   Code: 0f 87 e1 00 00 00 8b 4c 24 14 44 8b 43 14 89 c8 4d 63 c8 44 29 c0 99 31 d0 29 d0 31 d2 48 98 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <49> f7 f1 48 83 f8 0a 0f 86 c1 00 00 00 44 39 c1 7f 10 48 89 df
>   RIP: backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: ffff880c10e83ec0
> 
> Fixes: f4915455dcf0 ("linux/dim: Implement RDMA adaptive moderation (DIM)")
> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> Acked-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  include/linux/dim.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to for-rc

Thanks,
Jason
diff mbox series

Patch

diff --git a/include/linux/dim.h b/include/linux/dim.h
index b698266d0035..6c5733981563 100644
--- a/include/linux/dim.h
+++ b/include/linux/dim.h
@@ -21,7 +21,7 @@ 
  * We consider 10% difference as significant.
  */
 #define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100UL * abs((val) - (ref))) / (ref)) > 10)
+	((ref) && (((100UL * abs((val) - (ref))) / (ref)) > 10))
 
 /*
  * Calculate the gap between two values.