[for-next] RDMA/nldev: Add parent bdf to device information dump

Message ID 20201103132627.67642-1-galpress@amazon.com (mailing list archive)
State Rejected

Commit Message

Gal Pressman Nov. 3, 2020, 1:26 p.m. UTC
Add the ability to query the device's bdf through rdma tool netlink
command (in addition to the sysfs infra).

In case of virtual devices (rxe/siw), the netdev bdf will be shown.

Signed-off-by: Gal Pressman <galpress@amazon.com>
---
 drivers/infiniband/core/nldev.c  | 10 +++++++++-
 include/uapi/rdma/rdma_netlink.h |  5 +++++
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Parav Pandit Nov. 3, 2020, 1:31 p.m. UTC | #1
> From: Gal Pressman <galpress@amazon.com>
> Sent: Tuesday, November 3, 2020 6:56 PM
> 
> Add the ability to query the device's bdf through rdma tool netlink command
> (in addition to the sysfs infra).
> 
A new netlink attribute needs an example in the commit message showing the output of
$ rdma dev show
or
$ rdma link show
whichever is applicable.

> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> 
> Signed-off-by: Gal Pressman <galpress@amazon.com>
> ---
>  drivers/infiniband/core/nldev.c  | 10 +++++++++-
> include/uapi/rdma/rdma_netlink.h |  5 +++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 12d29d54a081..9704b1449c01 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -291,7 +291,15 @@ static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
>  	else if (rdma_protocol_usnic(device, port))
>  		ret = nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL,
>  				     "usnic");
> -	return ret;
> +	if (ret)
> +		return ret;
> +
> +	if (device->dev.parent)
> +		if (nla_put_string(msg, RDMA_NLDEV_PARENT_BDF,
Not everything is PCI; BDF is too PCI-specific.

So the attribute name should be RDMA_NLDEV_PARENT_DEV, with an additional one for PARENT_BUS.

> +				   dev_name(device->dev.parent)))
> +			return -EMSGSIZE;
> +
> +	return 0;
>  }
> 
>  static int fill_port_info(struct sk_buff *msg,
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index d2f5b8396243..7495104668eb 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -533,6 +533,11 @@ enum rdma_nldev_attr {
> 
>  	RDMA_NLDEV_ATTR_RES_RAW,	/* binary */
> 
> +	/*
> +	 * Parent device BDF (bus, device, function).
> +	 */
> +	RDMA_NLDEV_PARENT_BDF,			/* string */
> +
>  	/*
>  	 * Always the end
>  	 */
> --
> 2.29.1
Jason Gunthorpe Nov. 3, 2020, 1:45 p.m. UTC | #2
On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> Add the ability to query the device's bdf through rdma tool netlink
> command (in addition to the sysfs infra).
> 
> In case of virtual devices (rxe/siw), the netdev bdf will be shown.

Why? What is the use case?

Jason
Leon Romanovsky Nov. 3, 2020, 1:57 p.m. UTC | #3
On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> > Add the ability to query the device's bdf through rdma tool netlink
> > command (in addition to the sysfs infra).
> >
> > In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>
> Why? What is the use case?

Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?

Thanks

>
> Jason
Gal Pressman Nov. 3, 2020, 2:11 p.m. UTC | #4
On 03/11/2020 15:57, Leon Romanovsky wrote:
> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>> Add the ability to query the device's bdf through rdma tool netlink
>>> command (in addition to the sysfs infra).
>>>
>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>
>> Why? What is the use case?
> 
> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?

When taking system topology into consideration you need some way to pair the
ibdev and bdf, especially when working with multiple devices.
The netdev name doesn't exist on devices with no netdevs (IB, EFA).

Why rdma tool? Because it's more intuitive than sysfs.
Jason Gunthorpe Nov. 3, 2020, 2:22 p.m. UTC | #5
On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> On 03/11/2020 15:57, Leon Romanovsky wrote:
> > On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> >> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> >>> Add the ability to query the device's bdf through rdma tool netlink
> >>> command (in addition to the sysfs infra).
> >>>
> >>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> >>
> >> Why? What is the use case?
> > 
> > Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> 
> When taking system topology into consideration you need some way to pair the
> ibdev and bdf, especially when working with multiple devices.
> The netdev name doesn't exist on devices with no netdevs (IB, EFA).

You are supposed to use sysfs

/sys/class/infiniband/ibp0s9/device
 
Should always be the physical device

> Why rdma tool? Because it's more intuitive than sysfs.

But we generally don't put this information into netlink. BDF is just
the start, you need all the other topology information to make sense
of it, and all of that is already in sysfs only.

Jason
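
For reference, the sysfs lookup described above can be done from userspace in a few lines. The sketch below is an illustration only and is not part of the patch or of any existing tool: the helper names last_component() and ibdev_parent_name() are made up, and the class directory is passed in as a parameter (on a real system it would be /sys/class/infiniband) so the helper can be pointed at a test tree.

```c
#define _XOPEN_SOURCE 700 /* realpath() and PATH_MAX under strict -std modes */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the last '/'-separated component of path. */
const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper: resolve "<class_dir>/<ibdev>/device" and copy the
 * name of the device it points at into out.
 * Returns 0 on success, -1 if the link cannot be resolved or out is too small.
 */
int ibdev_parent_name(const char *class_dir, const char *ibdev,
                      char *out, size_t outlen)
{
    char link[PATH_MAX];
    char resolved[PATH_MAX];
    const char *name;
    int n;

    assert(class_dir && ibdev && out && outlen > 0);

    n = snprintf(link, sizeof(link), "%s/%s/device", class_dir, ibdev);
    if (n < 0 || (size_t)n >= sizeof(link))
        return -1;

    /* realpath() follows the "device" symlink to the parent device dir. */
    if (!realpath(link, resolved))
        return -1;

    name = last_component(resolved);
    if (strlen(name) >= outlen)
        return -1;
    strcpy(out, name);
    return 0;
}
```

Calling ibdev_parent_name("/sys/class/infiniband", "ibp0s9", buf, sizeof(buf)) would leave the parent device name in buf on success. What that name looks like depends on the bus the device sits on.
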
Gal Pressman Nov. 3, 2020, 3:45 p.m. UTC | #6
On 03/11/2020 16:22, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>> command (in addition to the sysfs infra).
>>>>>
>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>
>>>> Why? What is the use case?
>>>
>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>
>> When taking system topology into consideration you need some way to pair the
>> ibdev and bdf, especially when working with multiple devices.
>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> 
> You are supposed to use sysfs
> 
> /sys/class/infiniband/ibp0s9/device
> 
> Should always be the physical device
> 
>> Why rdma tool? Because it's more intuitive than sysfs.
> 
> But we generally don't put this information into netlink BDF is just
> the start, you need all the other topology information to make sense
> of it, and all that is in sysfs only already

As the commit message says, it's in addition to the device sysfs.

Many (if not most) of the existing rdma netlink commands are duplicates of some
sysfs entries, but show it in a more "modern" way.
I'm not convinced that bdf should be treated differently.

Similar to how you can see a netdev's bdf through 'ethtool -i' in addition to
sysfs, I think it's useful.
Jason Gunthorpe Nov. 5, 2020, 8 p.m. UTC | #7
On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
> On 03/11/2020 16:22, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> >> On 03/11/2020 15:57, Leon Romanovsky wrote:
> >>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> >>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> >>>>> Add the ability to query the device's bdf through rdma tool netlink
> >>>>> command (in addition to the sysfs infra).
> >>>>>
> >>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> >>>>
> >>>> Why? What is the use case?
> >>>
> >>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> >>
> >> When taking system topology into consideration you need some way to pair the
> >> ibdev and bdf, especially when working with multiple devices.
> >> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> > 
> > You are supposed to use sysfs
> > 
> > /sys/class/infiniband/ibp0s9/device
> > 
> > Should always be the physical device
> > 
> >> Why rdma tool? Because it's more intuitive than sysfs.
> > 
> > But we generally don't put this information into netlink BDF is just
> > the start, you need all the other topology information to make sense
> > of it, and all that is in sysfs only already
> 
> As the commit message says, it's in addition to the device sysfs.
> 
> Many (if not most) of the existing rdma netlink commands are duplicates of some
> sysfs entries, but show it in a more "modern" way.
> I'm not convinced that bdf should be treated differently.

Why did you call it BDF anyhow? It has nothing to do with PCI BDF
other than that it happens to be the BDF for PCI devices. Netdev called
this bus_info.

> Similarly to how you can see netdevs bdf through 'ethtool -i' in addition to
> sysfs, I think it's useful.

bus_info is incredibly old, it predates even the driver core, back to a
time when there really was no other way to get the information.

Jason
Gal Pressman Nov. 8, 2020, 1:03 p.m. UTC | #8
On 05/11/2020 22:00, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>>>> command (in addition to the sysfs infra).
>>>>>>>
>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>>>
>>>>>> Why? What is the use case?
>>>>>
>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>>>
>>>> When taking system topology into consideration you need some way to pair the
>>>> ibdev and bdf, especially when working with multiple devices.
>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
>>>
>>> You are supposed to use sysfs
>>>
>>> /sys/class/infiniband/ibp0s9/device
>>>
>>> Should always be the physical device
>>>
>>>> Why rdma tool? Because it's more intuitive than sysfs.
>>>
>>> But we generally don't put this information into netlink BDF is just
>>> the start, you need all the other topology information to make sense
>>> of it, and all that is in sysfs only already
>>
>> As the commit message says, it's in addition to the device sysfs.
>>
>> Many (if not most) of the existing rdma netlink commands are duplicates of some
>> sysfs entries, but show it in a more "modern" way.
>> I'm not convinced that bdf should be treated differently.
> 
> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
> other than it happens to be the PDF for PCI devices. Netdev called
> this bus_info

Are there non-PCI devices in the subsystem?
I can rename it to something more fitting; I will change it to bus_info unless
someone has a better idea.
Parav Pandit Nov. 8, 2020, 2:36 p.m. UTC | #9
> From: Gal Pressman <galpress@amazon.com>
> Sent: Sunday, November 8, 2020 6:34 PM
> 
> On 05/11/2020 22:00, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
> >> On 03/11/2020 16:22, Jason Gunthorpe wrote:
> >>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> >>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
> >>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> >>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> >>>>>>> Add the ability to query the device's bdf through rdma tool
> >>>>>>> netlink command (in addition to the sysfs infra).
> >>>>>>>
> >>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> >>>>>>
> >>>>>> Why? What is the use case?
> >>>>>
> >>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> >>>>
> >>>> When taking system topology into consideration you need some way to
> >>>> pair the ibdev and bdf, especially when working with multiple devices.
> >>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> >>>
> >>> You are supposed to use sysfs
> >>>
> >>> /sys/class/infiniband/ibp0s9/device
> >>>
> >>> Should always be the physical device
> >>>
> >>>> Why rdma tool? Because it's more intuitive than sysfs.
> >>>
> >>> But we generally don't put this information into netlink BDF is just
> >>> the start, you need all the other topology information to make sense
> >>> of it, and all that is in sysfs only already
> >>
> >> As the commit message says, it's in addition to the device sysfs.
> >>
> >> Many (if not most) of the existing rdma netlink commands are
> >> duplicates of some sysfs entries, but show it in a more "modern" way.
> >> I'm not convinced that bdf should be treated differently.
> >
> > Why did you call it BDF anyhow? it has nothing to do with PCI BDF
> > other than it happens to be the PDF for PCI devices. Netdev called
> > this bus_info
> 
> Are there non pci devices in the subsystem?
Yes. They are coming over the auxiliary bus; we are waiting for the bus and Leon's patchset [2] to be merged.

> I can rename to a more fitting name, will change to bus_info unless someone
> has a better idea.
Yes. I guess you missed the suggestion given in [1].
Basically, adding the bus name and the device name gives unique bus+device information.
This is generic, not specific to PCI.
RDMA_NLDEV_ATTR_PARENT_DEV_NAME, RDMA_NLDEV_ATTR_PARENT_DEV_BUS_NAME.

[1] https://lore.kernel.org/linux-rdma/cd3f2926-0491-8540-d6b1-534014190bae@amazon.com/T/#ma5f71e14abae23fb67a52ff06e74600ce1489e79
[2] https://lore.kernel.org/linux-rdma/DM6PR11MB28417902253469FC9ABB72F0DDEE0@DM6PR11MB2841.namprd11.prod.outlook.com/T/#m37d9d24903fff0e99e7fec59933d4fe6e6a5162b
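
To illustrate the bus name plus device name idea from userspace: in sysfs, a device directory carries a "subsystem" symlink that points at its bus, so the pair can be derived from two already-resolved paths. The sketch below is an assumption-laden illustration, not the proposed netlink attributes. struct parent_id, path_tail(), parent_id_from_paths() and parent_id_equal() are hypothetical names, and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical (bus, device) pair identifying a parent device. */
struct parent_id {
    char bus[32];
    char dev[64];
};

/* Last '/'-separated component of path. */
static const char *path_tail(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/*
 * Build the pair from two resolved paths: the device directory and the
 * target of its "subsystem" link (e.g. ".../bus/pci").
 * Returns 0 on success, -1 if a name does not fit.
 */
int parent_id_from_paths(const char *dev_path, const char *subsys_path,
                         struct parent_id *id)
{
    int n;

    assert(dev_path && subsys_path && id);

    n = snprintf(id->bus, sizeof(id->bus), "%s", path_tail(subsys_path));
    if (n < 0 || (size_t)n >= sizeof(id->bus))
        return -1;

    n = snprintf(id->dev, sizeof(id->dev), "%s", path_tail(dev_path));
    if (n < 0 || (size_t)n >= sizeof(id->dev))
        return -1;

    return 0;
}

/* Two parents match only if both the bus and the device name match. */
bool parent_id_equal(const struct parent_id *a, const struct parent_id *b)
{
    return strcmp(a->bus, b->bus) == 0 && strcmp(a->dev, b->dev) == 0;
}
```

Comparing on both fields avoids relying on device names being distinct across different buses, which the name alone does not promise.
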
Jason Gunthorpe Nov. 8, 2020, 11:49 p.m. UTC | #10
On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
> On 05/11/2020 22:00, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
> >> On 03/11/2020 16:22, Jason Gunthorpe wrote:
> >>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> >>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
> >>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> >>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> >>>>>>> Add the ability to query the device's bdf through rdma tool netlink
> >>>>>>> command (in addition to the sysfs infra).
> >>>>>>>
> >>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> >>>>>>
> >>>>>> Why? What is the use case?
> >>>>>
> >>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> >>>>
> >>>> When taking system topology into consideration you need some way to pair the
> >>>> ibdev and bdf, especially when working with multiple devices.
> >>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> >>>
> >>> You are supposed to use sysfs
> >>>
> >>> /sys/class/infiniband/ibp0s9/device
> >>>
> >>> Should always be the physical device
> >>>
> >>>> Why rdma tool? Because it's more intuitive than sysfs.
> >>>
> >>> But we generally don't put this information into netlink BDF is just
> >>> the start, you need all the other topology information to make sense
> >>> of it, and all that is in sysfs only already
> >>
> >> As the commit message says, it's in addition to the device sysfs.
> >>
> >> Many (if not most) of the existing rdma netlink commands are duplicates of some
> >> sysfs entries, but show it in a more "modern" way.
> >> I'm not convinced that bdf should be treated differently.
> > 
> > Why did you call it BDF anyhow? it has nothing to do with PCI BDF
> > other than it happens to be the PDF for PCI devices. Netdev called
> > this bus_info
> 
> Are there non pci devices in the subsystem?

Yes, HNS uses non-pci devices

> I can rename to a more fitting name, will change to bus_info unless
> someone has a better idea.

The thing is, it is still useless. You have to consult sysfs to
understand what bus it is scoped on to do anything further with
it. Can't just assume it is PCI.

Jason
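
To make the "can't just assume it is PCI" point concrete, here is a small sketch of a consumer that checks whether a device name has the PCI domain:bus:device.function shape before treating it as a BDF. It is an illustration under stated assumptions: struct pci_bdf and parse_pci_bdf() are made-up names, the range limits (8-bit bus, 5-bit device, 3-bit function) are the conventional PCI field widths, and the non-PCI names in the usage note are only examples of what such names may look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for a decoded PCI address. */
struct pci_bdf {
    unsigned int domain;
    unsigned int bus;
    unsigned int dev;
    unsigned int fn;
};

/*
 * Try to decode name as "domain:bus:dev.fn" (all fields hexadecimal).
 * Returns true and fills *out only if the whole string matches and the
 * fields are within the usual PCI ranges.
 */
bool parse_pci_bdf(const char *name, struct pci_bdf *out)
{
    unsigned int domain, bus, dev, fn;
    int used = 0;

    assert(name && out);

    if (sscanf(name, "%x:%x:%x.%x%n", &domain, &bus, &dev, &fn, &used) != 4)
        return false;
    if (name[used] != '\0')            /* trailing characters */
        return false;
    if (bus > 0xff || dev > 0x1f || fn > 7)
        return false;

    out->domain = domain;
    out->bus = bus;
    out->dev = dev;
    out->fn = fn;
    return true;
}
```

With this helper, "0000:00:09.0" decodes to device 9, function 0 on bus 0, while names such as "HISI00D1:00" or "mlx5_core.rdma.0" are rejected. Even a successful parse only says the string looks like a PCI address; the bus still has to be confirmed some other way, which is the objection raised above.
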
Leon Romanovsky Nov. 9, 2020, 5:09 a.m. UTC | #11
On Sun, Nov 08, 2020 at 07:49:35PM -0400, Jason Gunthorpe wrote:
> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
> > On 05/11/2020 22:00, Jason Gunthorpe wrote:
> > > On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
> > >> On 03/11/2020 16:22, Jason Gunthorpe wrote:
> > >>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> > >>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
> > >>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> > >>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> > >>>>>>> Add the ability to query the device's bdf through rdma tool netlink
> > >>>>>>> command (in addition to the sysfs infra).
> > >>>>>>>
> > >>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> > >>>>>>
> > >>>>>> Why? What is the use case?
> > >>>>>
> > >>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> > >>>>
> > >>>> When taking system topology into consideration you need some way to pair the
> > >>>> ibdev and bdf, especially when working with multiple devices.
> > >>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> > >>>
> > >>> You are supposed to use sysfs
> > >>>
> > >>> /sys/class/infiniband/ibp0s9/device
> > >>>
> > >>> Should always be the physical device
> > >>>
> > >>>> Why rdma tool? Because it's more intuitive than sysfs.
> > >>>
> > >>> But we generally don't put this information into netlink BDF is just
> > >>> the start, you need all the other topology information to make sense
> > >>> of it, and all that is in sysfs only already
> > >>
> > >> As the commit message says, it's in addition to the device sysfs.
> > >>
> > >> Many (if not most) of the existing rdma netlink commands are duplicates of some
> > >> sysfs entries, but show it in a more "modern" way.
> > >> I'm not convinced that bdf should be treated differently.
> > >
> > > Why did you call it BDF anyhow? it has nothing to do with PCI BDF
> > > other than it happens to be the PDF for PCI devices. Netdev called
> > > this bus_info
> >
> > Are there non pci devices in the subsystem?
>
> Yes, HNS uses non-pci devices
>
> > I can rename to a more fitting name, will change to bus_info unless
> > someone has a better idea.
>
> The thing is, is is still useless. You have to consult sysfs to
> understand what bus it is scoped on to do anything further with
> it. Can't just assume it is PCI.

Can anyone please remind me why we are doing this?
What problem do you solve here by adding new nldev attributes?

Thanks

>
> Jason
Gal Pressman Nov. 9, 2020, 9:03 a.m. UTC | #12
On 09/11/2020 7:09, Leon Romanovsky wrote:
> On Sun, Nov 08, 2020 at 07:49:35PM -0400, Jason Gunthorpe wrote:
>> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
>>> On 05/11/2020 22:00, Jason Gunthorpe wrote:
>>>> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
>>>>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
>>>>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>>>>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>>>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>>>>>>> command (in addition to the sysfs infra).
>>>>>>>>>>
>>>>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>>>>>>
>>>>>>>>> Why? What is the use case?
>>>>>>>>
>>>>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>>>>>>
>>>>>>> When taking system topology into consideration you need some way to pair the
>>>>>>> ibdev and bdf, especially when working with multiple devices.
>>>>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
>>>>>>
>>>>>> You are supposed to use sysfs
>>>>>>
>>>>>> /sys/class/infiniband/ibp0s9/device
>>>>>>
>>>>>> Should always be the physical device
>>>>>>
>>>>>>> Why rdma tool? Because it's more intuitive than sysfs.
>>>>>>
>>>>>> But we generally don't put this information into netlink BDF is just
>>>>>> the start, you need all the other topology information to make sense
>>>>>> of it, and all that is in sysfs only already
>>>>>
>>>>> As the commit message says, it's in addition to the device sysfs.
>>>>>
>>>>> Many (if not most) of the existing rdma netlink commands are duplicates of some
>>>>> sysfs entries, but show it in a more "modern" way.
>>>>> I'm not convinced that bdf should be treated differently.
>>>>
>>>> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
>>>> other than it happens to be the PDF for PCI devices. Netdev called
>>>> this bus_info
>>>
>>> Are there non pci devices in the subsystem?
>>
>> Yes, HNS uses non-pci devices
>>
>>> I can rename to a more fitting name, will change to bus_info unless
>>> someone has a better idea.
>>
>> The thing is, is is still useless. You have to consult sysfs to
>> understand what bus it is scoped on to do anything further with
>> it. Can't just assume it is PCI.
> 
> Can anyone please remind me why are we doing it?
> What problem do you solve here by adding new nldev attributes?

https://lore.kernel.org/linux-rdma/0825e1bf-f913-d2c1-ad3f-35ba3d6b75ef@amazon.com/
Gal Pressman Nov. 9, 2020, 9:03 a.m. UTC | #13
On 09/11/2020 1:49, Jason Gunthorpe wrote:
> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
>> On 05/11/2020 22:00, Jason Gunthorpe wrote:
>>> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
>>>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
>>>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>>>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>>>>>> command (in addition to the sysfs infra).
>>>>>>>>>
>>>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>>>>>
>>>>>>>> Why? What is the use case?
>>>>>>>
>>>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>>>>>
>>>>>> When taking system topology into consideration you need some way to pair the
>>>>>> ibdev and bdf, especially when working with multiple devices.
>>>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
>>>>>
>>>>> You are supposed to use sysfs
>>>>>
>>>>> /sys/class/infiniband/ibp0s9/device
>>>>>
>>>>> Should always be the physical device
>>>>>
>>>>>> Why rdma tool? Because it's more intuitive than sysfs.
>>>>>
>>>>> But we generally don't put this information into netlink BDF is just
>>>>> the start, you need all the other topology information to make sense
>>>>> of it, and all that is in sysfs only already
>>>>
>>>> As the commit message says, it's in addition to the device sysfs.
>>>>
>>>> Many (if not most) of the existing rdma netlink commands are duplicates of some
>>>> sysfs entries, but show it in a more "modern" way.
>>>> I'm not convinced that bdf should be treated differently.
>>>
>>> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
>>> other than it happens to be the PDF for PCI devices. Netdev called
>>> this bus_info
>>
>> Are there non pci devices in the subsystem?
> 
> Yes, HNS uses non-pci devices
> 
>> I can rename to a more fitting name, will change to bus_info unless
>> someone has a better idea.
> 
> The thing is, is is still useless. You have to consult sysfs to
> understand what bus it is scoped on to do anything further with
> it. Can't just assume it is PCI.

This can be solved with Parav's suggestion.
Leon Romanovsky Nov. 9, 2020, 11:55 a.m. UTC | #14
On Mon, Nov 09, 2020 at 11:03:25AM +0200, Gal Pressman wrote:
>
> On 09/11/2020 7:09, Leon Romanovsky wrote:
> > On Sun, Nov 08, 2020 at 07:49:35PM -0400, Jason Gunthorpe wrote:
> >> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
> >>> On 05/11/2020 22:00, Jason Gunthorpe wrote:
> >>>> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
> >>>>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
> >>>>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
> >>>>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
> >>>>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
> >>>>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
> >>>>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
> >>>>>>>>>> command (in addition to the sysfs infra).
> >>>>>>>>>>
> >>>>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
> >>>>>>>>>
> >>>>>>>>> Why? What is the use case?
> >>>>>>>>
> >>>>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
> >>>>>>>
> >>>>>>> When taking system topology into consideration you need some way to pair the
> >>>>>>> ibdev and bdf, especially when working with multiple devices.
> >>>>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
> >>>>>>
> >>>>>> You are supposed to use sysfs
> >>>>>>
> >>>>>> /sys/class/infiniband/ibp0s9/device
> >>>>>>
> >>>>>> Should always be the physical device
> >>>>>>
> >>>>>>> Why rdma tool? Because it's more intuitive than sysfs.
> >>>>>>
> >>>>>> But we generally don't put this information into netlink BDF is just
> >>>>>> the start, you need all the other topology information to make sense
> >>>>>> of it, and all that is in sysfs only already
> >>>>>
> >>>>> As the commit message says, it's in addition to the device sysfs.
> >>>>>
> >>>>> Many (if not most) of the existing rdma netlink commands are duplicates of some
> >>>>> sysfs entries, but show it in a more "modern" way.
> >>>>> I'm not convinced that bdf should be treated differently.
> >>>>
> >>>> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
> >>>> other than it happens to be the PDF for PCI devices. Netdev called
> >>>> this bus_info
> >>>
> >>> Are there non pci devices in the subsystem?
> >>
> >> Yes, HNS uses non-pci devices
> >>
> >>> I can rename to a more fitting name, will change to bus_info unless
> >>> someone has a better idea.
> >>
> >> The thing is, is is still useless. You have to consult sysfs to
> >> understand what bus it is scoped on to do anything further with
> >> it. Can't just assume it is PCI.
> >
> > Can anyone please remind me why are we doing it?
> > What problem do you solve here by adding new nldev attributes?
>
> https://lore.kernel.org/linux-rdma/0825e1bf-f913-d2c1-ad3f-35ba3d6b75ef@amazon.com/

Thanks, but IMHO it doesn't answer the question about the problem.
Gal Pressman Nov. 9, 2020, 12:27 p.m. UTC | #15
On 09/11/2020 13:55, Leon Romanovsky wrote:
> On Mon, Nov 09, 2020 at 11:03:25AM +0200, Gal Pressman wrote:
>>
>> On 09/11/2020 7:09, Leon Romanovsky wrote:
>>> On Sun, Nov 08, 2020 at 07:49:35PM -0400, Jason Gunthorpe wrote:
>>>> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
>>>>> On 05/11/2020 22:00, Jason Gunthorpe wrote:
>>>>>> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
>>>>>>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
>>>>>>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>>>>>>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>>>>>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>>>>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>>>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>>>>>>>>> command (in addition to the sysfs infra).
>>>>>>>>>>>>
>>>>>>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>>>>>>>>
>>>>>>>>>>> Why? What is the use case?
>>>>>>>>>>
>>>>>>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>>>>>>>>
>>>>>>>>> When taking system topology into consideration you need some way to pair the
>>>>>>>>> ibdev and bdf, especially when working with multiple devices.
>>>>>>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
>>>>>>>>
>>>>>>>> You are supposed to use sysfs
>>>>>>>>
>>>>>>>> /sys/class/infiniband/ibp0s9/device
>>>>>>>>
>>>>>>>> Should always be the physical device
>>>>>>>>
>>>>>>>>> Why rdma tool? Because it's more intuitive than sysfs.
>>>>>>>>
>>>>>>>> But we generally don't put this information into netlink BDF is just
>>>>>>>> the start, you need all the other topology information to make sense
>>>>>>>> of it, and all that is in sysfs only already
>>>>>>>
>>>>>>> As the commit message says, it's in addition to the device sysfs.
>>>>>>>
>>>>>>> Many (if not most) of the existing rdma netlink commands are duplicates of some
>>>>>>> sysfs entries, but show it in a more "modern" way.
>>>>>>> I'm not convinced that bdf should be treated differently.
>>>>>>
>>>>>> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
>>>>>> other than it happens to be the PDF for PCI devices. Netdev called
>>>>>> this bus_info
>>>>>
>>>>> Are there non pci devices in the subsystem?
>>>>
>>>> Yes, HNS uses non-pci devices
>>>>
>>>>> I can rename to a more fitting name, will change to bus_info unless
>>>>> someone has a better idea.
>>>>
>>>> The thing is, is is still useless. You have to consult sysfs to
>>>> understand what bus it is scoped on to do anything further with
>>>> it. Can't just assume it is PCI.
>>>
>>> Can anyone please remind me why are we doing it?
>>> What problem do you solve here by adding new nldev attributes?
>>
>> https://lore.kernel.org/linux-rdma/0825e1bf-f913-d2c1-ad3f-35ba3d6b75ef@amazon.com/
> 
> Thanks, but IMHO it doesn't answer on the question about the problem.

For example, in an instance with multiple NICs and GPUs, it's common to examine
the device topology and distances; device bdfs are needed for that.

Also, when analyzing dmesg logs the prints contain the ibdev name, which is not
always enough when trying to debug the corresponding physical device.
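
As a rough illustration of the topology use case, one crude proxy for how close two devices are is the number of leading path components their resolved sysfs device paths share. This is a sketch under assumptions, not how any particular tool measures distance: shared_components() is a hypothetical name, the paths in the usage note are invented, and a real tool would likely also look at NUMA nodes and link properties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count how many leading '/'-separated components two paths have in
 * common. A larger value suggests the devices hang off a deeper shared
 * point in the device tree.
 */
size_t shared_components(const char *a, const char *b)
{
    size_t count = 0;

    assert(a && b);

    for (;;) {
        size_t la, lb;

        /* Skip separators before the next component. */
        while (*a == '/')
            a++;
        while (*b == '/')
            b++;

        la = strcspn(a, "/");
        lb = strcspn(b, "/");

        /* Stop at the end of either path or at the first mismatch. */
        if (la == 0 || la != lb || strncmp(a, b, la) != 0)
            break;

        count++;
        a += la;
        b += lb;
    }

    return count;
}
```

For two invented paths under the same bridge, /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 and /sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0, the function returns 4. Against a device under a different root such as /sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0 it returns 2.
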
Leon Romanovsky Nov. 9, 2020, 12:32 p.m. UTC | #16
On Mon, Nov 09, 2020 at 02:27:16PM +0200, Gal Pressman wrote:
> For example, in an instance with multiple NICs and GPUs, it's common to examine
> the devices topology and distances, device bdfs are needed for that.
>
> Also, when analyzing dmesg logs the prints contain the ibdev name, which is not
> always enough when trying to debug the corresponding physical device.

Gal,

I'm asking which problem the new nldev attribute will solve, not why BDF is important. :)

Thanks
Gal Pressman Nov. 9, 2020, 12:47 p.m. UTC | #17
On 09/11/2020 14:32, Leon Romanovsky wrote:
> Gal,
> 
> I'm asking which problem the new nldev attribute will solve, not why BDF is important. :)

This patch follows the implementation of other fields in fill_dev_info() such as
port index, fw version, node guid, sys image guid, node type, dev protocol, etc,
which also exist in sysfs.

You added most of these new nldevs not long ago, so I find your question a bit
confusing. Can you please explain your concerns and why you think bdf is different?
Leon Romanovsky Nov. 9, 2020, 1:02 p.m. UTC | #18
On Mon, Nov 09, 2020 at 02:47:07PM +0200, Gal Pressman wrote:
> This patch follows the implementation of other fields in fill_dev_info() such as
> port index, fw version, node guid, sys image guid, node type, dev protocol, etc,
> which also exist in sysfs.
>
> You added most of these new nldevs not long ago, so I find your question a bit
> confusing. Can you please explain your concerns and why you think bdf is different?

Almost all of the fields you mentioned were needed to implement the rdma_rename
utility, which followed systemd's internal implementation, and/or were used in
rdma-core.

The FW version is clearly an exception to the above.

So I'm trying to understand the rationale behind BDF and how it will
work with the different bus variants that we will have. Like Parav said,
an IB device can be connected to the auxiliary bus (no BDF) and have a
parent with a BDF at the same time.

Thanks
Gal Pressman Nov. 9, 2020, 1:52 p.m. UTC | #19
On 09/11/2020 15:02, Leon Romanovsky wrote:
> On Mon, Nov 09, 2020 at 02:47:07PM +0200, Gal Pressman wrote:
>> On 09/11/2020 14:32, Leon Romanovsky wrote:
>>> On Mon, Nov 09, 2020 at 02:27:16PM +0200, Gal Pressman wrote:
>>>> On 09/11/2020 13:55, Leon Romanovsky wrote:
>>>>> On Mon, Nov 09, 2020 at 11:03:25AM +0200, Gal Pressman wrote:
>>>>>>
>>>>>> On 09/11/2020 7:09, Leon Romanovsky wrote:
>>>>>>> On Sun, Nov 08, 2020 at 07:49:35PM -0400, Jason Gunthorpe wrote:
>>>>>>>> On Sun, Nov 08, 2020 at 03:03:45PM +0200, Gal Pressman wrote:
>>>>>>>>> On 05/11/2020 22:00, Jason Gunthorpe wrote:
>>>>>>>>>> On Tue, Nov 03, 2020 at 05:45:26PM +0200, Gal Pressman wrote:
>>>>>>>>>>> On 03/11/2020 16:22, Jason Gunthorpe wrote:
>>>>>>>>>>>> On Tue, Nov 03, 2020 at 04:11:19PM +0200, Gal Pressman wrote:
>>>>>>>>>>>>> On 03/11/2020 15:57, Leon Romanovsky wrote:
>>>>>>>>>>>>>> On Tue, Nov 03, 2020 at 09:45:22AM -0400, Jason Gunthorpe wrote:
>>>>>>>>>>>>>>> On Tue, Nov 03, 2020 at 03:26:27PM +0200, Gal Pressman wrote:
>>>>>>>>>>>>>>>> Add the ability to query the device's bdf through rdma tool netlink
>>>>>>>>>>>>>>>> command (in addition to the sysfs infra).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In case of virtual devices (rxe/siw), the netdev bdf will be shown.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why? What is the use case?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right, and why isn't netdev (RDMA_NLDEV_ATTR_NDEV_NAME) enough?
>>>>>>>>>>>>>
>>>>>>>>>>>>> When taking system topology into consideration you need some way to pair the
>>>>>>>>>>>>> ibdev and bdf, especially when working with multiple devices.
>>>>>>>>>>>>> The netdev name doesn't exist on devices with no netdevs (IB, EFA).
>>>>>>>>>>>>
>>>>>>>>>>>> You are supposed to use sysfs
>>>>>>>>>>>>
>>>>>>>>>>>> /sys/class/infiniband/ibp0s9/device
>>>>>>>>>>>>
>>>>>>>>>>>> Should always be the physical device
>>>>>>>>>>>>
>>>>>>>>>>>>> Why rdma tool? Because it's more intuitive than sysfs.
>>>>>>>>>>>>
>>>>>>>>>>>> But we generally don't put this information into netlink BDF is just
>>>>>>>>>>>> the start, you need all the other topology information to make sense
>>>>>>>>>>>> of it, and all that is in sysfs only already
>>>>>>>>>>>
>>>>>>>>>>> As the commit message says, it's in addition to the device sysfs.
>>>>>>>>>>>
>>>>>>>>>>> Many (if not most) of the existing rdma netlink commands are duplicates of some
>>>>>>>>>>> sysfs entries, but show it in a more "modern" way.
>>>>>>>>>>> I'm not convinced that bdf should be treated differently.
>>>>>>>>>>
>>>>>>>>>> Why did you call it BDF anyhow? it has nothing to do with PCI BDF
>>>>>>>>>> other than it happens to be the PDF for PCI devices. Netdev called
>>>>>>>>>> this bus_info
>>>>>>>>>
>>>>>>>>> Are there non pci devices in the subsystem?
>>>>>>>>
>>>>>>>> Yes, HNS uses non-pci devices
>>>>>>>>
>>>>>>>>> I can rename to a more fitting name, will change to bus_info unless
>>>>>>>>> someone has a better idea.
>>>>>>>>
>>>>>>>> The thing is, is is still useless. You have to consult sysfs to
>>>>>>>> understand what bus it is scoped on to do anything further with
>>>>>>>> it. Can't just assume it is PCI.
>>>>>>>
>>>>>>> Can anyone please remind me why are we doing it?
>>>>>>> What problem do you solve here by adding new nldev attributes?
>>>>>>
>>>>>> https://lore.kernel.org/linux-rdma/0825e1bf-f913-d2c1-ad3f-35ba3d6b75ef@amazon.com/
>>>>>
>>>>> Thanks, but IMHO it doesn't answer on the question about the problem.
>>>>
>>>> For example, in an instance with multiple NICs and GPUs, it's common to examine
>>>> the devices topology and distances, device bdfs are needed for that.
>>>>
>>>> Also, when analyzing dmesg logs the prints contain the ibdev name, which is not
>>>> always enough when trying to debug the corresponding physical device.
>>>
>>> Gal,
>>>
>>> I'm asking which problem will solve new nldev and not why BDF is important. :)
>>
>> This patch follows the implementation of other fields in fill_dev_info() such as
>> port index, fw version, node guid, sys image guid, node type, dev protocol, etc,
>> which also exist in sysfs.
>>
>> You added most of these new nldevs not long ago, so I find your question a bit
>> confusing.. Can you please explain your concerns and why you think bdf is different?
> 
> Almost all of the fields you mentioned were needed to implement the rdma_rename
> utility, which followed systemd's internal implementation, and/or were used in
> rdma-core.
> 
> The FW version is clearly an exception to the above.
> 
> So I'm trying to understand the rationale behind BDF and how it will
> work with the different bus variants that we will have. Like Parav said,
> an IB device can be connected to the auxiliary bus (no BDF) and have a
> parent with a BDF at the same time.

We can review the different cases and make sure it works as expected (and what's
expected), but let's first agree on whether I should continue with this
work.
Leon Romanovsky Nov. 9, 2020, 2:12 p.m. UTC | #20
On Mon, Nov 09, 2020 at 03:52:29PM +0200, Gal Pressman wrote:
> We can review the different cases and make sure it works as expected (and what's
> expected), but let's first agree on whether I should continue with this
> work.

I don't know.

Thanks
Jason Gunthorpe Nov. 9, 2020, 5:57 p.m. UTC | #21
On Mon, Nov 09, 2020 at 11:03:47AM +0200, Gal Pressman wrote:

> > The thing is, it is still useless. You have to consult sysfs to
> > understand what bus it is scoped on to do anything further with
> > it. Can't just assume it is PCI.
> 
> This can be solved with Parav's suggestion.

Now you are adding more stuff.

What is wrong with reading sysfs? sysfs is where topology information
lives, why do we need to denormalize things?

Jason
Gal Pressman Nov. 10, 2020, 7:49 a.m. UTC | #22
On 09/11/2020 19:57, Jason Gunthorpe wrote:
> On Mon, Nov 09, 2020 at 11:03:47AM +0200, Gal Pressman wrote:
> 
>>> The thing is, it is still useless. You have to consult sysfs to
>>> understand what bus it is scoped on to do anything further with
>>> it. Can't just assume it is PCI.
>>
>> This can be solved with Parav's suggestion.
> 
> Now you are adding more stuff.
> 
> What is wrong with reading sysfs? sysfs is where topology information
> lives, why do we need to denormalize things?

And yet you have lspci so you don't have to dig through the sysfs files by hand
for that topology.
Please drop this patch.
Jason Gunthorpe Nov. 10, 2020, 1:41 p.m. UTC | #23
On Tue, Nov 10, 2020 at 09:49:11AM +0200, Gal Pressman wrote:
> On 09/11/2020 19:57, Jason Gunthorpe wrote:
> > On Mon, Nov 09, 2020 at 11:03:47AM +0200, Gal Pressman wrote:
> > 
> >>> The thing is, it is still useless. You have to consult sysfs to
> >>> understand what bus it is scoped on to do anything further with
> >>> it. Can't just assume it is PCI.
> >>
> >> This can be solved with Parav's suggestion.
> > 
> > Now you are adding more stuff.
> > 
> > What is wrong with reading sysfs? sysfs is where topology information
> > lives, why do we need to denormalize things?
> 
> And yet you have lspci so you don't have to dig through the sysfs files by hand
> for that topology.
> Please drop this patch.

If you want to add something to the rdma tool, it can read sysfs and display it.

Jason
Leon Romanovsky Nov. 10, 2020, 1:54 p.m. UTC | #24
On Tue, Nov 10, 2020 at 09:41:22AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2020 at 09:49:11AM +0200, Gal Pressman wrote:
> > On 09/11/2020 19:57, Jason Gunthorpe wrote:
> > > On Mon, Nov 09, 2020 at 11:03:47AM +0200, Gal Pressman wrote:
> > >
> > >>> The thing is, it is still useless. You have to consult sysfs to
> > >>> understand what bus it is scoped on to do anything further with
> > >>> it. Can't just assume it is PCI.
> > >>
> > >> This can be solved with Parav's suggestion.
> > >
> > > Now you are adding more stuff.
> > >
> > > What is wrong with reading sysfs? sysfs is where topology information
> > > lives, why do we need to denormalize things?
> >
> > And yet you have lspci so you don't have to dig through the sysfs files by hand
> > for that topology.
> > Please drop this patch.
>
> If you want to add something to the rdma tool, it can read sysfs and display it.

I tried it and it wasn't well received by the netdev community.

Thanks

>
> Jason

Patch

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 12d29d54a081..9704b1449c01 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -291,7 +291,15 @@  static int fill_dev_info(struct sk_buff *msg, struct ib_device *device)
 	else if (rdma_protocol_usnic(device, port))
 		ret = nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL,
 				     "usnic");
-	return ret;
+	if (ret)
+		return ret;
+
+	if (device->dev.parent)
+		if (nla_put_string(msg, RDMA_NLDEV_PARENT_BDF,
+				   dev_name(device->dev.parent)))
+			return -EMSGSIZE;
+
+	return 0;
 }
 
 static int fill_port_info(struct sk_buff *msg,
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index d2f5b8396243..7495104668eb 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -533,6 +533,11 @@  enum rdma_nldev_attr {
 
 	RDMA_NLDEV_ATTR_RES_RAW,	/* binary */
 
+	/*
+	 * Parent device BDF (bus, device, function).
+	 */
+	RDMA_NLDEV_PARENT_BDF,			/* string */
+
 	/*
 	 * Always the end
 	 */