diff mbox

IB/mlx5: avoid binding a new mpi unit to the same devices repeatedly

Message ID 20180713235021.18141-1-qing.huang@oracle.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Qing Huang July 13, 2018, 11:50 p.m. UTC
When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."

So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
shouldn't repeatedly attempt to bind the new mpi data unit to every device
on the list until it finds an unbound device.

Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/infiniband/hw/mlx5/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Or Gerlitz July 14, 2018, 3:57 p.m. UTC | #1
On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang <qing.huang@oracle.com> wrote:
> When a CX5 device is configured in dual-port RoCE mode, after creating
> many VFs against port 1, creating the same number of VFs against port 2
> will flood kernel/syslog with something like
> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> affiliated."
>
> So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
> shouldn't repeatedly attempt to bind the new mpi data unit to every device
> on the list until it finds an unbound device.

Daniel,

What is mpi data unit?

Or.

>
> Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> ---
>  drivers/infiniband/hw/mlx5/main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index b3ba9a2..1ddd1d3 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
>
>         mutex_lock(&mlx5_ib_multiport_mutex);
>         list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
> -               if (dev->sys_image_guid == mpi->sys_image_guid)
> +               if (dev->sys_image_guid == mpi->sys_image_guid &&
> +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
>                         bound = mlx5_ib_bind_slave_port(dev, mpi);
>
>                 if (bound) {
> --
> 2.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky July 15, 2018, 6:04 a.m. UTC | #2
On Sat, Jul 14, 2018 at 06:57:01PM +0300, Or Gerlitz wrote:
> On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang <qing.huang@oracle.com> wrote:
> > When a CX5 device is configured in dual-port RoCE mode, after creating
> > many VFs against port 1, creating the same number of VFs against port 2
> > will flood kernel/syslog with something like
> > "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> > affiliated."
> >
> > So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
> > shouldn't repeatedly attempt to bind the new mpi data unit to every device
> > on the list until it finds an unbound device.
>
> Daniel,
>
> What is mpi data unit?

Or,

"mpi" is multi-port information needed for multi-port IB devices
resource sharing.

In simple words, "mpi" holds mapping information between
affiliated ports (master vs. slave) and IB device which it is connected.

Most probably, "mpi data unit" was meant "mpi struct" by Qing.

BTW, Daniel is OOO till 7-23.

Thanks

>
> Or.
>
> >
> > Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
> > Signed-off-by: Qing Huang <qing.huang@oracle.com>
> > ---
> >  drivers/infiniband/hw/mlx5/main.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > index b3ba9a2..1ddd1d3 100644
> > --- a/drivers/infiniband/hw/mlx5/main.c
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
> >
> >         mutex_lock(&mlx5_ib_multiport_mutex);
> >         list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
> > -               if (dev->sys_image_guid == mpi->sys_image_guid)
> > +               if (dev->sys_image_guid == mpi->sys_image_guid &&
> > +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
> >                         bound = mlx5_ib_bind_slave_port(dev, mpi);
> >
> >                 if (bound) {
> > --
> > 2.9.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Jurgens July 15, 2018, 7:48 p.m. UTC | #3
On 7/14/2018 10:57 AM, Or Gerlitz wrote:
> On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang <qing.huang@oracle.com> wrote:
>> When a CX5 device is configured in dual-port RoCE mode, after creating
>> many VFs against port 1, creating the same number of VFs against port 2
>> will flood kernel/syslog with something like
>> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
>> affiliated."
>>
>> So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
>> shouldn't repeatedly attempt to bind the new mpi data unit to every device
>> on the list until it finds an unbound device.
> Daniel,
>
> What is mpi data unit?
It's a structure to keep track affiliated port info in dual port RoCE mode, mpi meaning multi-port info. Parav can review this it my absence, otherwise I can take a closer look when I return to the office.
>
> Or.
>
>> Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> ---
>>  drivers/infiniband/hw/mlx5/main.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>> index b3ba9a2..1ddd1d3 100644
>> --- a/drivers/infiniband/hw/mlx5/main.c
>> +++ b/drivers/infiniband/hw/mlx5/main.c
>> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
>>
>>         mutex_lock(&mlx5_ib_multiport_mutex);
>>         list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
>> -               if (dev->sys_image_guid == mpi->sys_image_guid)
>> +               if (dev->sys_image_guid == mpi->sys_image_guid &&
>> +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
>>                         bound = mlx5_ib_bind_slave_port(dev, mpi);
>>
>>                 if (bound) {
>> --
>> 2.9.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qing Huang July 23, 2018, 3:36 p.m. UTC | #4
On 7/15/2018 12:48 PM, Daniel Jurgens wrote:
> On 7/14/2018 10:57 AM, Or Gerlitz wrote:
>> On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang <qing.huang@oracle.com> wrote:
>>> When a CX5 device is configured in dual-port RoCE mode, after creating
>>> many VFs against port 1, creating the same number of VFs against port 2
>>> will flood kernel/syslog with something like
>>> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
>>> affiliated."
>>>
>>> So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
>>> shouldn't repeatedly attempt to bind the new mpi data unit to every device
>>> on the list until it finds an unbound device.
>> Daniel,
>>
>> What is mpi data unit?
> It's a structure to keep track affiliated port info in dual port RoCE mode, mpi meaning multi-port info. Parav can review this it my absence, otherwise I can take a closer look when I return to the office.
Hi Daniel/Parav,

Have you got a chance to review this patch? Thanks!

>> Or.
>>
>>> Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>> ---
>>>   drivers/infiniband/hw/mlx5/main.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>>> index b3ba9a2..1ddd1d3 100644
>>> --- a/drivers/infiniband/hw/mlx5/main.c
>>> +++ b/drivers/infiniband/hw/mlx5/main.c
>>> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
>>>
>>>          mutex_lock(&mlx5_ib_multiport_mutex);
>>>          list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
>>> -               if (dev->sys_image_guid == mpi->sys_image_guid)
>>> +               if (dev->sys_image_guid == mpi->sys_image_guid &&
>>> +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
>>>                          bound = mlx5_ib_bind_slave_port(dev, mpi);
>>>
>>>                  if (bound) {
>>> --
>>> 2.9.3
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Parav Pandit July 23, 2018, 4:21 p.m. UTC | #5
Hi Qing,


> -----Original Message-----
> From: Qing Huang [mailto:qing.huang@oracle.com]
> Sent: Monday, July 23, 2018 10:36 AM
> To: Daniel Jurgens <danielj@mellanox.com>; Or Gerlitz
> <gerlitz.or@gmail.com>; Parav Pandit <parav@mellanox.com>
> Cc: Linux Kernel <linux-kernel@vger.kernel.org>; RDMA mailing list <linux-
> rdma@vger.kernel.org>; Jason Gunthorpe <jgg@ziepe.ca>; Doug Ledford
> <dledford@redhat.com>; Leon Romanovsky <leon@kernel.org>;
> gerald.gibson@oracle.com
> Subject: Re: [PATCH] IB/mlx5: avoid binding a new mpi unit to the same
> devices repeatedly
> 
> 
> 
> On 7/15/2018 12:48 PM, Daniel Jurgens wrote:
> > On 7/14/2018 10:57 AM, Or Gerlitz wrote:
> >> On Sat, Jul 14, 2018 at 2:50 AM, Qing Huang <qing.huang@oracle.com>
> wrote:
> >>> When a CX5 device is configured in dual-port RoCE mode, after
> >>> creating many VFs against port 1, creating the same number of VFs
> >>> against port 2 will flood kernel/syslog with something like
> >>> "mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
> >>> affiliated."
> >>>
> >>> So basically, when traversing mlx5_ib_dev_list,
> >>> mlx5_ib_add_slave_port() shouldn't repeatedly attempt to bind the
> >>> new mpi data unit to every device on the list until it finds an unbound
> device.
> >> Daniel,
> >>
> >> What is mpi data unit?
> > It's a structure to keep track affiliated port info in dual port RoCE mode,
> mpi meaning multi-port info. Parav can review this it my absence, otherwise I
> can take a closer look when I return to the office.
> Hi Daniel/Parav,
> 
> Have you got a chance to review this patch? Thanks!
Didn't have chance yet.
Will do this week.
Daniel Jurgens July 23, 2018, 6:11 p.m. UTC | #6
On 7/23/2018 10:36 AM, Qing Huang wrote:
>
> Hi Daniel/Parav,
>
> Have you got a chance to review this patch? Thanks!
Hi Qing, sorry for the delay, I just got back to the office today. I don't agree with the proposed fix, I provided an alternative suggestion below.
>
>>> Or.
>>>
>>>> Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
>>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>>> ---
>>>>   drivers/infiniband/hw/mlx5/main.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>>>> index b3ba9a2..1ddd1d3 100644
>>>> --- a/drivers/infiniband/hw/mlx5/main.c
>>>> +++ b/drivers/infiniband/hw/mlx5/main.c
>>>> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
>>>>
>>>>          mutex_lock(&mlx5_ib_multiport_mutex);
>>>>          list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
>>>> -               if (dev->sys_image_guid == mpi->sys_image_guid)
>>>> +               if (dev->sys_image_guid == mpi->sys_image_guid &&
>>>> +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
You shouldn't check the mpi field that without holding the lock in the mp structure. Prefer you change the print from a warning in mlx5_ib_bind_slave_port to a debug message.

>>>>                          bound = mlx5_ib_bind_slave_port(dev, mpi);
>>>>
>>>>                  if (bound) {
>>>> -- 
>>>> 2.9.3
>>>>
>>>> -- 
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qing Huang July 23, 2018, 6:21 p.m. UTC | #7
Hi Daniel,


On 7/23/2018 11:11 AM, Daniel Jurgens wrote:
>
> On 7/23/2018 10:36 AM, Qing Huang wrote:
>> Hi Daniel/Parav,
>>
>> Have you got a chance to review this patch? Thanks!
> Hi Qing, sorry for the delay, I just got back to the office today. I don't agree with the proposed fix, I provided an alternative suggestion below.
>>>> Or.
>>>>
>>>>> Reported-by: Gerald Gibson <gerald.gibson@oracle.com>
>>>>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>>>>> ---
>>>>>    drivers/infiniband/hw/mlx5/main.c | 3 ++-
>>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>>>>> index b3ba9a2..1ddd1d3 100644
>>>>> --- a/drivers/infiniband/hw/mlx5/main.c
>>>>> +++ b/drivers/infiniband/hw/mlx5/main.c
>>>>> @@ -6068,7 +6068,8 @@ static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
>>>>>
>>>>>           mutex_lock(&mlx5_ib_multiport_mutex);
>>>>>           list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
>>>>> -               if (dev->sys_image_guid == mpi->sys_image_guid)
>>>>> +               if (dev->sys_image_guid == mpi->sys_image_guid &&
>>>>> +                   !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
> You shouldn't check the mpi field that without holding the lock in the mp structure. Prefer you change the print from a warning in mlx5_ib_bind_slave_port to a debug message.
Thanks for the review. That works for us too. Will resend the patch.

Regards,
Qing

>
>>>>>                           bound = mlx5_ib_bind_slave_port(dev, mpi);
>>>>>
>>>>>                   if (bound) {
>>>>> -- 
>>>>> 2.9.3
>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b3ba9a2..1ddd1d3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -6068,7 +6068,8 @@  static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
 
 	mutex_lock(&mlx5_ib_multiport_mutex);
 	list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
-		if (dev->sys_image_guid == mpi->sys_image_guid)
+		if (dev->sys_image_guid == mpi->sys_image_guid &&
+		    !dev->port[mlx5_core_native_port_num(mdev) - 1].mp.mpi)
 			bound = mlx5_ib_bind_slave_port(dev, mpi);
 
 		if (bound) {