IB/mlx5: Fix decision to avoid using MAD_IFC command in ISSI > 0 mode

Message ID 1473232990-22766-1-git-send-email-dchang@suse.com (mailing list archive)
State Superseded

Commit Message

David Chang Sept. 7, 2016, 7:23 a.m. UTC
When using the MAD_IFC command, we should also consider avoiding it in
ISSI > 0 mode, otherwise most of the MAD_IFC command features
are deprecated and cannot be used.

Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
Reported-by: Sujith Pandel <sujith_pandel@dell.com>
Signed-off-by: David Chang <dchang@suse.com>
---
 drivers/infiniband/hw/mlx5/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Or Gerlitz Sept. 7, 2016, 7:40 a.m. UTC | #1
On Wed, Sep 7, 2016 at 10:23 AM, David Chang <dchang@suse.com> wrote:
> When using MAD_IFC command, we should also consider avoiding in
> ISSI > 0 mode, otherwise most of the MAD_IFC command features
> are deprecated and cannot be used.

Of course!!

Mark/Meny, didn't you step on this / address it as part of some
other counters work?

Doron, don't you see this as a repeated 100% failure, e.g. with Eth SR-IOV
VFs and/or RoCE devices?

Or.

>
> Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
> Reported-by: Sujith Pandel <sujith_pandel@dell.com>
> Signed-off-by: David Chang <dchang@suse.com>
> ---
>  drivers/infiniband/hw/mlx5/main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 1b4094baa2de..0796fb2b04f1 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -288,7 +288,8 @@ __be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev, u8 port_num,
>
>  static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
>  {
> -       return !MLX5_CAP_GEN(dev->mdev, ib_virt);
> +       return !dev->mdev->issi &&
> +               !MLX5_CAP_GEN(dev->mdev, ib_virt);
>  }
>
>  enum {
> --
> 2.6.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Sept. 8, 2016, 2:07 p.m. UTC | #2
On Wed, Sep 07, 2016 at 03:23:10PM +0800, David Chang wrote:
> When using MAD_IFC command, we should also consider avoiding in
> ISSI > 0 mode, otherwise most of the MAD_IFC command features
> are deprecated and cannot be used.
>
> Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
> Reported-by: Sujith Pandel <sujith_pandel@dell.com>
> Signed-off-by: David Chang <dchang@suse.com>

NAK.
It is wrong; the deprecation is removed from the programming manual.

Thanks
Or Gerlitz Sept. 8, 2016, 3:24 p.m. UTC | #3
On Thu, Sep 8, 2016 at 5:07 PM, Leon Romanovsky <leonro@mellanox.com> wrote:
> On Wed, Sep 07, 2016 at 03:23:10PM +0800, David Chang wrote:
>> When using MAD_IFC command, we should also consider avoiding in
>> ISSI > 0 mode, otherwise most of the MAD_IFC command features
>> are deprecated and cannot be used.
>>
>> Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
>> Reported-by: Sujith Pandel <sujith_pandel@dell.com>
>> Signed-off-by: David Chang <dchang@suse.com>
>
> NAK,
> It is wrong, the deprecation is removed from programming manual.

The RDMA programming manual is not open to the community, so there's no
point in commenting here on whether X is there or not.
David Chang Sept. 9, 2016, 2:55 a.m. UTC | #4
On Thu, Sep 08, 2016 at 05:07:52PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 07, 2016 at 03:23:10PM +0800, David Chang wrote:
> > When using MAD_IFC command, we should also consider avoiding in
> > ISSI > 0 mode, otherwise most of the MAD_IFC command features
> > are deprecated and cannot be used.
> >
> > Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
> > Reported-by: Sujith Pandel <sujith_pandel@dell.com>
> > Signed-off-by: David Chang <dchang@suse.com>
> 
> NAK,
> It is wrong, the deprecation is removed from programming manual.
> 

Without the patch, we got the following message.
[    8.456327] mlx5_core 0000:03:00.0: firmware version: 12.12.780
...
[   10.417421] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
[   10.419282] ------------[ cut here ]------------
[   10.419291] WARNING: CPU: 2 PID: 2517 at ../drivers/infiniband/core/cache.c:702 ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]()
[   10.419386] CPU: 2 PID: 2517 Comm: modprobe Tainted: G                 X 4.4.19-1-default #1
[   10.419387] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.1.7 06/16/2016
[   10.419389]  0000000000000000 ffffffff8130d740 0000000000000000 ffffffffa04e0300
[   10.419395]  ffffffff8107c121
[   10.419400]  ffff88017bfe0000 ffff88003712b9e0 ffff88045ad905c0
[   10.419401]  0000000000000001 fffffffffffffffc ffffffffa04d8a58 0000000000000000
[   10.419406] Call Trace:
[   10.419415]  [<ffffffff81019a59>] dump_trace+0x59/0x310
[   10.419419]  [<ffffffff81019dfa>] show_stack_log_lvl+0xea/0x170
[   10.419421]  [<ffffffff8101ab81>] show_stack+0x21/0x40
[   10.419426]  [<ffffffff8130d740>] dump_stack+0x5c/0x7c
[   10.419431]  [<ffffffff8107c121>] warn_slowpath_common+0x81/0xb0
[   10.419436]  [<ffffffffa04d8a58>] ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]
[   10.419449]  [<ffffffffa04da2dd>] add_netdev_ips+0x9d/0xa0 [ib_core]
[   10.419456]  [<ffffffffa04da45b>] enum_all_gids_of_dev_cb+0x7b/0xb0 [ib_core]
[   10.419461]  [<ffffffffa04d641d>] ib_enum_roce_netdev+0xdd/0x100 [ib_core]
[   10.419466]  [<ffffffffa04da5ed>] roce_rescan_device+0x1d/0x20 [ib_core]
[   10.419470]  [<ffffffffa04d8cdb>] ib_cache_setup_one+0x23b/0x3d0 [ib_core]
[   10.419475]  [<ffffffffa04d606b>] ib_register_device+0x2bb/0x4f0 [ib_core]
[   10.419483]  [<ffffffffa0618bbf>] mlx5_ib_add+0xaaf/0x12e0 [mlx5_ib]
[   10.419492]  [<ffffffffa08b76c1>] mlx5_add_device+0x41/0xa0 [mlx5_core]
[   10.419498]  [<ffffffffa08b7785>] mlx5_register_interface+0x65/0xa0 [mlx5_core]
[   10.419502]  [<ffffffffa0474030>] mlx5_ib_init+0x30/0x42 [mlx5_ib]
[   10.419506]  [<ffffffff81002138>] do_one_initcall+0xc8/0x1f0
[   10.419510]  [<ffffffff811827e8>] do_init_module+0x5a/0x1d7
[   10.419514]  [<ffffffff81103536>] load_module+0x1366/0x1c50
[   10.419518]  [<ffffffff81103fd0>] SYSC_finit_module+0x70/0xa0
[   10.419523]  [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[   10.420681] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
[   10.420682] Leftover inexact backtrace:
[   10.420684] ---[ end trace fc8ccb16c9d8e28a ]---
...

Thanks,
David Chang
Or Gerlitz Sept. 9, 2016, 10:39 a.m. UTC | #5
On Fri, Sep 9, 2016 at 5:55 AM, David Chang <dchang@suse.com> wrote:
> On Thu, Sep 08, 2016 at 05:07:52PM +0300, Leon Romanovsky wrote:
>> On Wed, Sep 07, 2016 at 03:23:10PM +0800, David Chang wrote:
>> > When using MAD_IFC command, we should also consider avoiding in
>> > ISSI > 0 mode, otherwise most of the MAD_IFC command features
>> > are deprecated and cannot be used.
>> >
>> > Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
>> > Reported-by: Sujith Pandel <sujith_pandel@dell.com>
>> > Signed-off-by: David Chang <dchang@suse.com>
>>
>> NAK, It is wrong, the deprecation is removed from programming manual.

What do you mean by "deprecation is removed"? Please clarify. Do you claim
that MAD_IFC is usable on an Ethernet port or when ISSI > 0?

Note that even if this is valid with the current firmware (and I don't
think that is the case), the driver you are maintaining (mlx5_ib)
needs to support previous GA firmware releases which are out there,
for which this unknown deprecation you are talking about doesn't hold.


> Without the patch, we got the following message.
> [    8.456327] mlx5_core 0000:03:00.0: firmware version: 12.12.780
> ...
> [   10.417421] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
> [   10.419282] ------------[ cut here ]------------
> [   10.419291] WARNING: CPU: 2 PID: 2517 at ../drivers/infiniband/core/cache.c:702 ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]()


And does this reproduce 100% of the time over Eth ports, or just sometimes?

David Chang Sept. 10, 2016, 6:10 a.m. UTC | #6
On Fri, Sep 09, 2016 at 01:39:18PM +0300, Or Gerlitz wrote:
> > Without the patch, we got the following message.
> > [    8.456327] mlx5_core 0000:03:00.0: firmware version: 12.12.780
> > ...
> > [   10.417421] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)
> > [   10.419282] ------------[ cut here ]------------
> > [   10.419291] WARNING: CPU: 2 PID: 2517 at ../drivers/infiniband/core/cache.c:702 ib_cache_gid_set_default_gid+0x2f8/0x340 [ib_core]()
> 
> 
> and this reproduces 100% over Eth ports or just sometimes?

Feedback from the customer:
It was consistently seen with FW 12.12.780.
It was never seen after updating the firmware to 12.14.1100 or higher.
They are currently on FW version 12.16.1020.

Thanks,
David Chang
Leon Romanovsky Sept. 10, 2016, 7:47 a.m. UTC | #7
On Fri, Sep 09, 2016 at 10:55:27AM +0800, David Chang wrote:
> On Thu, Sep 08, 2016 at 05:07:52PM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 07, 2016 at 03:23:10PM +0800, David Chang wrote:
> > > When using MAD_IFC command, we should also consider avoiding in
> > > ISSI > 0 mode, otherwise most of the MAD_IFC command features
> > > are deprecated and cannot be used.
> > >
> > > Fixes: d603c809ef91 ("IB/mlx5: Fix decision on using MAD_IFC")
> > > Reported-by: Sujith Pandel <sujith_pandel@dell.com>
> > > Signed-off-by: David Chang <dchang@suse.com>
> >
> > NAK,
> > It is wrong, the deprecation is removed from programming manual.
> >
>
> Without the patch, we got the following message.
> [    8.456327] mlx5_core 0000:03:00.0: firmware version: 12.12.780
> ...

This command is supported only for physical function (PF) drivers,
and only when the physical port is IB, regardless of ISSI.

When I return to the office (next week), I'll verify that we are
checking this requirement correctly.

Thanks for providing the dump and FW version to reproduce it.
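
As an illustration only, the two conditions stated above can be written
as a small userspace C model. The struct, enum, and function names below
are hypothetical and are not taken from the mlx5 driver; the sketch only
shows that, under this reading, ISSI plays no part in the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link-layer tag for the physical port. */
enum model_link_layer {
	MODEL_LINK_IB,
	MODEL_LINK_ETH,
};

/* Hypothetical stand-in for the state the check would need to look at. */
struct model_fn {
	bool is_pf;                  /* true for a physical function */
	enum model_link_layer link;  /* link layer of the physical port */
	unsigned int issi;           /* kept only to show that it is ignored */
};

/* MAD_IFC is usable only on a PF whose physical port is IB. */
static bool model_mad_ifc_supported(const struct model_fn *fn)
{
	assert(fn != NULL);
	return fn->is_pf && fn->link == MODEL_LINK_IB;
}
```

With this model, a PF on an IB port passes the check for any ISSI value,
while a VF or an Ethernet port never does.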



Patch

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1b4094baa2de..0796fb2b04f1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -288,7 +288,8 @@  __be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev, u8 port_num,
 
 static int mlx5_use_mad_ifc(struct mlx5_ib_dev *dev)
 {
-	return !MLX5_CAP_GEN(dev->mdev, ib_virt);
+	return !dev->mdev->issi &&
+		!MLX5_CAP_GEN(dev->mdev, ib_virt);
 }
 
 enum {
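
To make the behavioural change in the hunk above easier to see, here is a
userspace sketch that models the predicate before and after the patch.
The struct model_dev type and both function names are hypothetical; the
ib_virt field merely stands in for MLX5_CAP_GEN(dev->mdev, ib_virt) and
issi for dev->mdev->issi. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two values the predicate reads. */
struct model_dev {
	unsigned int issi;  /* models dev->mdev->issi; 0 is the base interface */
	bool ib_virt;       /* models MLX5_CAP_GEN(dev->mdev, ib_virt) */
};

/* Before the patch: only the ib_virt capability is consulted. */
static bool model_use_mad_ifc_before(const struct model_dev *dev)
{
	assert(dev != NULL);
	return !dev->ib_virt;
}

/* As proposed by the patch: additionally require ISSI == 0. */
static bool model_use_mad_ifc_after(const struct model_dev *dev)
{
	assert(dev != NULL);
	return !dev->issi && !dev->ib_virt;
}
```

The two versions differ only for a device with ISSI > 0 and ib_virt
unset: the original predicate selects MAD_IFC there, the patched one does
not.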