diff mbox series

[v2,rdma-core] irdma: Add ice and irdma to kernel-boot rules

Message ID 20210823154816.2027-1-tatyana.e.nikolova@intel.com (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series [v2,rdma-core] irdma: Add ice and irdma to kernel-boot rules | expand

Commit Message

Nikolova, Tatyana E Aug. 23, 2021, 3:48 p.m. UTC
Add ice and irdma to kernel-boot rules so that
these devices are recognized as iWARP and RoCE capable.

Otherwise the port mapper service which is only relevant
for iWARP devices may not start automatically after boot.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
---
 kernel-boot/rdma-description.rules | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jason Gunthorpe Aug. 23, 2021, 4:11 p.m. UTC | #1
On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> Add ice and irdma to kernel-boot rules so that
> these devices are recognized as iWARP and RoCE capable.
> 
> Otherwise the port mapper service which is only relevant
> for iWARP devices may not start automatically after boot.
> 
> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
>  kernel-boot/rdma-description.rules | 2 ++
>  1 file changed, 2 insertions(+)

Given that ice is both iwarp and roce, is there some better way to
detect this? Doesn't the aux device encode it?

Jason
Nikolova, Tatyana E Sept. 2, 2021, 3:29 p.m. UTC | #2
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, August 23, 2021 11:11 AM
> To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> rules
> 
> On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> > Add ice and irdma to kernel-boot rules so that these devices are
> > recognized as iWARP and RoCE capable.
> >
> > Otherwise the port mapper service which is only relevant for iWARP
> > devices may not start automatically after boot.
> >
> > Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
> > kernel-boot/rdma-description.rules | 2 ++
> >  1 file changed, 2 insertions(+)
> 
> Given that ice is both iwarp and roce, is there some better way to detect
> this? Doesn't the aux device encode it?

Hi Jason,

We tried a few experiments without success. The auxiliary devices alias with our driver and not ice, so maybe this is the reason?

Here is an example of what we tried. 

udevadm info /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
E: DRIVER=irdma
E: MODALIAS=auxiliary:ice.roce
E: SUBSYSTEM=auxiliary

udevadm info /sys/bus/auxiliary/devices/ice.roce.0
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
E: DRIVER=irdma
E: MODALIAS=auxiliary:ice.roce
E: SUBSYSTEM=auxiliary

Given the udevadm output, we put the following line in the udev rdma-description.rules:

SUBSYSTEMS=="auxiliary", DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/*", ENV{ID_RDMA_ROCE}="1"

Thank you,
Tatyana

udevadm info
> > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> >
> > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> >
> > We put the following line in the udev rdma-description.rules:
> >
> > SUBSYSTEMS=="auxiliary",
> > DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/*"
> > ,
> > ENV{ID_RDMA_ROCE}="1"
Jason Gunthorpe Sept. 2, 2021, 3:40 p.m. UTC | #3
On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> > Given that ice is both iwarp and roce, is there some better way to detect
> > this? Doesn't the aux device encode it?
> 
> Hi Jason,
> 
> We tried a few experiments without success. The auxiliary devices
> alias with our driver and not ice, so maybe this is the reason?
> 
> Here is an example of what we tried. 
> 
> udevadm info /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DRIVER=irdma
> E: MODALIAS=auxiliary:ice.roce
> E: SUBSYSTEM=auxiliary
> 
> udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DRIVER=irdma
> E: MODALIAS=auxiliary:ice.roce
> E: SUBSYSTEM=auxiliary
> 
> Given the udevadm output, we put the following line in the udev rdma-description.rules:
> 
> SUBSYSTEMS=="auxiliary", DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/*", ENV{ID_RDMA_ROCE}="1"

What is the SUBSYSTEM=="infiniband" device like?

This seems like the right direction, you need to wrangle udev though..

Jason
Leon Romanovsky Sept. 2, 2021, 4:03 p.m. UTC | #4
On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> 
> 
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, August 23, 2021 11:11 AM
> > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> > rules
> > 
> > On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> > > Add ice and irdma to kernel-boot rules so that these devices are
> > > recognized as iWARP and RoCE capable.
> > >
> > > Otherwise the port mapper service which is only relevant for iWARP
> > > devices may not start automatically after boot.
> > >
> > > Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
> > > kernel-boot/rdma-description.rules | 2 ++
> > >  1 file changed, 2 insertions(+)
> > 
> > Given that ice is both iwarp and roce, is there some better way to detect
> > this? Doesn't the aux device encode it?
> 
> Hi Jason,
> 
> We tried a few experiments without success. The auxiliary devices alias with our driver and not ice, so maybe this is the reason?
> 
> Here is an example of what we tried. 
> 
> udevadm info /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DRIVER=irdma
> E: MODALIAS=auxiliary:ice.roce
> E: SUBSYSTEM=auxiliary
> 
> udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: DRIVER=irdma
> E: MODALIAS=auxiliary:ice.roce
> E: SUBSYSTEM=auxiliary

Everything will be much easier if you follow my initial review comment about auxiliary
bus naming when irdma driver was added.

The RoCE device should be:
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
E: MODALIAS=auxiliary:ice.roce

and the iWARP device needs to be:
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
E: MODALIAS=auxiliary:ice.iwarp

Thanks
Nikolova, Tatyana E Sept. 2, 2021, 4:13 p.m. UTC | #5
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, September 2, 2021 11:03 AM
> To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>; dledford@redhat.com; linux-
> rdma@vger.kernel.org
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> rules
> 
> On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Monday, August 23, 2021 11:11 AM
> > > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > > Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> > > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to
> > > kernel-boot rules
> > >
> > > On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> > > > Add ice and irdma to kernel-boot rules so that these devices are
> > > > recognized as iWARP and RoCE capable.
> > > >
> > > > Otherwise the port mapper service which is only relevant for iWARP
> > > > devices may not start automatically after boot.
> > > >
> > > > Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
> > > > kernel-boot/rdma-description.rules | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > >
> > > Given that ice is both iwarp and roce, is there some better way to
> > > detect this? Doesn't the aux device encode it?
> >
> > Hi Jason,
> >
> > We tried a few experiments without success. The auxiliary devices alias
> with our driver and not ice, so maybe this is the reason?
> >
> > Here is an example of what we tried.
> >
> > udevadm info
> > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> >
> > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> 
> Everything will be much easier if you follow my initial review comment about
> auxiliary bus naming when irdma driver was added.
> 
> The RoCE device should be:
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> E: MODALIAS=auxiliary:ice.roce
> 
> and the iWARP device needs to be:
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
> E: MODALIAS=auxiliary:ice.iwarp

Hi Leon,
 
This is what we have now. We just provided an example with RoCE.

> 
> Thanks
Leon Romanovsky Sept. 2, 2021, 11:23 p.m. UTC | #6
On Thu, Sep 02, 2021 at 04:13:44PM +0000, Nikolova, Tatyana E wrote:
> 
> 
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Thursday, September 2, 2021 11:03 AM
> > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>; dledford@redhat.com; linux-
> > rdma@vger.kernel.org
> > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> > rules
> > 
> > On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Monday, August 23, 2021 11:11 AM
> > > > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > > > Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> > > > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to
> > > > kernel-boot rules
> > > >
> > > > On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> > > > > Add ice and irdma to kernel-boot rules so that these devices are
> > > > > recognized as iWARP and RoCE capable.
> > > > >
> > > > > Otherwise the port mapper service which is only relevant for iWARP
> > > > > devices may not start automatically after boot.
> > > > >
> > > > > Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
> > > > > kernel-boot/rdma-description.rules | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > >
> > > > Given that ice is both iwarp and roce, is there some better way to
> > > > detect this? Doesn't the aux device encode it?
> > >
> > > Hi Jason,
> > >
> > > We tried a few experiments without success. The auxiliary devices alias
> > with our driver and not ice, so maybe this is the reason?
> > >
> > > Here is an example of what we tried.
> > >
> > > udevadm info
> > > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DRIVER=irdma
> > > E: MODALIAS=auxiliary:ice.roce
> > > E: SUBSYSTEM=auxiliary
> > >
> > > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DRIVER=irdma
> > > E: MODALIAS=auxiliary:ice.roce
> > > E: SUBSYSTEM=auxiliary
> > 
> > Everything will be much easier if you follow my initial review comment about
> > auxiliary bus naming when irdma driver was added.
> > 
> > The RoCE device should be:
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: MODALIAS=auxiliary:ice.roce
> > 
> > and the iWARP device needs to be:
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
> > E: MODALIAS=auxiliary:ice.iwarp
> 
> Hi Leon,
>  
> This is what we have now. We just provided an example with RoCE.

Great, so it seems like you almost there.

> 
> > 
> > Thanks
Nikolova, Tatyana E Sept. 20, 2021, 7:41 p.m. UTC | #7
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, September 2, 2021 10:40 AM
> To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> rules
> 
> On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> > > Given that ice is both iwarp and roce, is there some better way to
> > > detect this? Doesn't the aux device encode it?
> >
> > Hi Jason,
> >
> > We tried a few experiments without success. The auxiliary devices
> > alias with our driver and not ice, so maybe this is the reason?
> >
> > Here is an example of what we tried.
> >
> > udevadm info
> > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> >
> > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > E: DRIVER=irdma
> > E: MODALIAS=auxiliary:ice.roce
> > E: SUBSYSTEM=auxiliary
> >
> > Given the udevadm output, we put the following line in the udev rdma-
> description.rules:
> >
> > SUBSYSTEMS=="auxiliary",
> DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/*",
> ENV{ID_RDMA_ROCE}="1"
> 
> What is the SUBSYSTEM=="infiniband" device like?
> 
> This seems like the right direction, you need to wrangle udev though..
> 

Hi Jason,

After more research and given the udevadm output, we revised the irdma udev rule to make it work. Could you please review the patch bellow?

diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
index 48a7cede..09deb451 100644
--- a/kernel-boot/rdma-description.rules
+++ b/kernel-boot/rdma-description.rules
@@ -1,7 +1,7 @@
 # This is a version of net-description.rules for /sys/class/infiniband devices
 
 ACTION=="remove", GOTO="rdma_description_end"
-SUBSYSTEM!="infiniband", GOTO="rdma_description_end"
+SUBSYSTEM!="infiniband", GOTO="rdma_infiniband_end"
 
 # NOTE: DRIVERS searches up the sysfs path to find the driver that is bound to  # the PCI/etc device that the RDMA device is linked to. This is not the kernel @@ -40,4 +40,9 @@ DEVPATH=="*/infiniband/rxe*", ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"
 SUBSYSTEMS=="pci", ENV{ID_BUS}="pci", ENV{ID_VENDOR_ID}="$attr{vendor}", ENV{ID_MODEL_ID}="$attr{device}"
 SUBSYSTEMS=="pci", IMPORT{builtin}="hwdb --subsystem=pci"
 
+LABEL="rdma_infiniband_end"
+
+SUBSYSTEM!="auxiliary", GOTO="rdma_description_end"
+KERNEL=="ice.iwarp.?", ENV{ID_RDMA_IWARP}="1" 
+KERNEL=="ice.roce.?", ENV{ID_RDMA_ROCE}="1"
 LABEL="rdma_description_end"

Thank you,
Tatyana
Jason Gunthorpe Sept. 20, 2021, 11:23 p.m. UTC | #8
On Mon, Sep 20, 2021 at 07:41:21PM +0000, Nikolova, Tatyana E wrote:
> 
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, September 2, 2021 10:40 AM
> > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> > rules
> > 
> > On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> > > > Given that ice is both iwarp and roce, is there some better way to
> > > > detect this? Doesn't the aux device encode it?
> > >
> > > Hi Jason,
> > >
> > > We tried a few experiments without success. The auxiliary devices
> > > alias with our driver and not ice, so maybe this is the reason?
> > >
> > > Here is an example of what we tried.
> > >
> > > udevadm info
> > > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DRIVER=irdma
> > > E: MODALIAS=auxiliary:ice.roce
> > > E: SUBSYSTEM=auxiliary
> > >
> > > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > E: DRIVER=irdma
> > > E: MODALIAS=auxiliary:ice.roce
> > > E: SUBSYSTEM=auxiliary
> > >
> > > Given the udevadm output, we put the following line in the udev rdma-
> > description.rules:
> > >
> > > SUBSYSTEMS=="auxiliary",
> > DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/*",
> > ENV{ID_RDMA_ROCE}="1"
> > 
> > What is the SUBSYSTEM=="infiniband" device like?
> > 
> > This seems like the right direction, you need to wrangle udev though..
> > 
> 
> Hi Jason,
> 
> After more research and given the udevadm output, we revised the irdma udev rule to make it work. Could you please review the patch bellow?
> 
> diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
> index 48a7cede..09deb451 100644
> +++ b/kernel-boot/rdma-description.rules
> @@ -1,7 +1,7 @@
>  # This is a version of net-description.rules for /sys/class/infiniband devices
>  
>  ACTION=="remove", GOTO="rdma_description_end"
> -SUBSYSTEM!="infiniband", GOTO="rdma_description_end"
> +SUBSYSTEM!="infiniband", GOTO="rdma_infiniband_end"
>  
>  # NOTE: DRIVERS searches up the sysfs path to find the driver that is bound to  # the PCI/etc device that the RDMA device is linked to. This is not the kernel @@ -40,4 +40,9 @@ DEVPATH=="*/infiniband/rxe*", ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"
>  SUBSYSTEMS=="pci", ENV{ID_BUS}="pci", ENV{ID_VENDOR_ID}="$attr{vendor}", ENV{ID_MODEL_ID}="$attr{device}"
>  SUBSYSTEMS=="pci", IMPORT{builtin}="hwdb --subsystem=pci"
>  
> +LABEL="rdma_infiniband_end"
> +
> +SUBSYSTEM!="auxiliary", GOTO="rdma_description_end"
> +KERNEL=="ice.iwarp.?", ENV{ID_RDMA_IWARP}="1" 
> +KERNEL=="ice.roce.?", ENV{ID_RDMA_ROCE}="1"
>  LABEL="rdma_description_end"

This doesn't seem right, the ID_* must be applied to an infiniband
device or the other stuff doesn't that consumes this won't work right.

What does the udev debugging say about these ID tags?

The SUBSYSTEMS=="" is the right approach, as shown above for the other
metadata. If you are having trobule I'm wondering if there is some
kind of kernel problem creating the wrong sysfs?

Jason
Leon Romanovsky Oct. 10, 2021, 9:42 a.m. UTC | #9
On Mon, Aug 23, 2021 at 10:48:16AM -0500, Tatyana Nikolova wrote:
> Add ice and irdma to kernel-boot rules so that
> these devices are recognized as iWARP and RoCE capable.
> 
> Otherwise the port mapper service which is only relevant
> for iWARP devices may not start automatically after boot.
> 
> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
> ---
>  kernel-boot/rdma-description.rules | 2 ++
>  1 file changed, 2 insertions(+)

Tatyana,

Are you planning to resubmit it?

Thanks

> 
> diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
> index 48a7ced..f2f7b38 100644
> --- a/kernel-boot/rdma-description.rules
> +++ b/kernel-boot/rdma-description.rules
> @@ -24,11 +24,13 @@ DRIVERS=="hfi1", ENV{ID_RDMA_OPA}="1"
>  # Hardware that supports iWarp
>  DRIVERS=="cxgb4", ENV{ID_RDMA_IWARP}="1"
>  DRIVERS=="i40e", ENV{ID_RDMA_IWARP}="1"
> +DRIVERS=="ice", ENV{ID_RDMA_IWARP}="1"
>  
>  # Hardware that supports RoCE
>  DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
> +DRIVERS=="ice", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
> -- 
> 1.8.3.1
>
Nikolova, Tatyana E Oct. 14, 2021, 8:11 p.m. UTC | #10
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, September 20, 2021 6:24 PM
> To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot
> rules
> 
> On Mon, Sep 20, 2021 at 07:41:21PM +0000, Nikolova, Tatyana E wrote:
> >
> >
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, September 2, 2021 10:40 AM
> > > To: Nikolova, Tatyana E <tatyana.e.nikolova@intel.com>
> > > Cc: dledford@redhat.com; leon@kernel.org; linux-rdma@vger.kernel.org
> > > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to
> > > kernel-boot rules
> > >
> > > On Thu, Sep 02, 2021 at 03:29:43PM +0000, Nikolova, Tatyana E wrote:
> > > > > Given that ice is both iwarp and roce, is there some better way
> > > > > to detect this? Doesn't the aux device encode it?
> > > >
> > > > Hi Jason,
> > > >
> > > > We tried a few experiments without success. The auxiliary devices
> > > > alias with our driver and not ice, so maybe this is the reason?
> > > >
> > > > Here is an example of what we tried.
> > > >
> > > > udevadm info
> > > > /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > > E:
> > > > DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > > E: DRIVER=irdma
> > > > E: MODALIAS=auxiliary:ice.roce
> > > > E: SUBSYSTEM=auxiliary
> > > >
> > > > udevadm info /sys/bus/auxiliary/devices/ice.roce.0
> > > > P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > > E:
> > > > DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0
> > > > E: DRIVER=irdma
> > > > E: MODALIAS=auxiliary:ice.roce
> > > > E: SUBSYSTEM=auxiliary
> > > >
> > > > Given the udevadm output, we put the following line in the udev
> > > > rdma-
> > > description.rules:
> > > >
> > > > SUBSYSTEMS=="auxiliary",
> > > DEVPATH=="*/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/
> > > *",
> > > ENV{ID_RDMA_ROCE}="1"
> > >
> > > What is the SUBSYSTEM=="infiniband" device like?
> > >
> > > This seems like the right direction, you need to wrangle udev though..
> > >
> >
> > Hi Jason,
> >
> > After more research and given the udevadm output, we revised the irdma
> udev rule to make it work. Could you please review the patch bellow?
> >
> > diff --git a/kernel-boot/rdma-description.rules
> > b/kernel-boot/rdma-description.rules
> > index 48a7cede..09deb451 100644
> > +++ b/kernel-boot/rdma-description.rules
> > @@ -1,7 +1,7 @@
> >  # This is a version of net-description.rules for
> > /sys/class/infiniband devices
> >
> >  ACTION=="remove", GOTO="rdma_description_end"
> > -SUBSYSTEM!="infiniband", GOTO="rdma_description_end"
> > +SUBSYSTEM!="infiniband", GOTO="rdma_infiniband_end"
> >
> >  # NOTE: DRIVERS searches up the sysfs path to find the driver that is
> bound to  # the PCI/etc device that the RDMA device is linked to. This is not
> the kernel @@ -40,4 +40,9 @@ DEVPATH=="*/infiniband/rxe*",
> ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"
> >  SUBSYSTEMS=="pci", ENV{ID_BUS}="pci",
> ENV{ID_VENDOR_ID}="$attr{vendor}", ENV{ID_MODEL_ID}="$attr{device}"
> >  SUBSYSTEMS=="pci", IMPORT{builtin}="hwdb --subsystem=pci"
> >
> > +LABEL="rdma_infiniband_end"
> > +
> > +SUBSYSTEM!="auxiliary", GOTO="rdma_description_end"
> > +KERNEL=="ice.iwarp.?", ENV{ID_RDMA_IWARP}="1"
> > +KERNEL=="ice.roce.?", ENV{ID_RDMA_ROCE}="1"
> >  LABEL="rdma_description_end"
> 
> This doesn't seem right, the ID_* must be applied to an infiniband device or
> the other stuff doesn't that consumes this won't work right.

Hi Jason,

Based on the following output, it seems that some systemd services won't work. I just tested with the port mapper which worked.

udevadm info -q all  /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
E: DRIVER=irdma
E: ID_RDMA_IWARP=1
E: MODALIAS=auxiliary:ice.iwarp
E: SUBSYSTEM=auxiliary
E: SYSTEMD_WANTS=iwpmd.service
E: TAGS=:systemd:
E: USEC_INITIALIZED=33683420

The parent of the aux device (and our ib device) is 

'/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0':
    KERNELS=="0000:2f:00.0"
    SUBSYSTEMS=="pci"
    DRIVERS=="ice"
    ATTRS{ari_enabled}=="1"
    ATTRS{broken_parity_status}=="0"
    ATTRS{class}=="0x020000"
...

If we need to use the aux device name in the udev rules, then I am not aware how to get to the aux device through the infiniband or the pci subsystem.

> What does the udev debugging say about these ID tags?
> 
> The SUBSYSTEMS=="" is the right approach, as shown above for the other
> metadata. If you are having trobule I'm wondering if there is some kind of
> kernel problem creating the wrong sysfs?
> 

Previously I was using an RC1 kernel and seeing issues with sysfs. After switching to a GA kernel, it works better. 

udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0

  looking at device '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
    KERNEL=="rocep47s0f0"
    SUBSYSTEM=="infiniband"
    DRIVER==""
    ATTR{fw_ver}=="1.48"
    ATTR{node_desc}==""
    ATTR{node_guid}=="6a05:caff:fec1:c790"
    ATTR{node_type}=="1: CA"
    ATTR{sys_image_guid}=="6805:cac1:c790:0000"

  looking at parent device '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0':
    KERNELS=="0000:2f:00.0"
    SUBSYSTEMS=="pci"
    DRIVERS=="ice"
    ATTRS{ari_enabled}=="1"
    ATTRS{broken_parity_status}=="0"
    ATTRS{class}=="0x020000" 
    ...

So adding the following to rdma-description.rules seems to work. Is this acceptable?

diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
index 48a7ced..9a18b67 100644
--- a/kernel-boot/rdma-description.rules
+++ b/kernel-boot/rdma-description.rules
@@ -33,6 +33,8 @@ DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="vmw_pvrdma", ENV{ID_RDMA_ROCE}="1"
+KERNEL=="iw*", ENV{ID_RDMA_IWARP}="1"
+KERNEL=="roce*", ENV{ID_RDMA_ROCE}="1"
 DEVPATH=="*/infiniband/rxe*", ATTR{parent}=="*", ENV{ID_RDMA_ROCE}="1"

This script results in the following settings:

udevadm info -q all /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/iw-ifname
P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/iw-ifname
E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/iw-ifname
E: ID_BUS=pci
E: ID_MODEL_ID=0x1593
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_RDMA_IWARP=1
E: ID_VENDOR_FROM_DATABASE=Intel Corporation
E: ID_VENDOR_ID=0x8086
E: NAME=iw-ifname
E: SUBSYSTEM=infiniband
E: SYSTEMD_WANTS=rdma-ndd.service iwpmd.service rdma-hw.target rdma-load-modules@iwarp.service
E: TAGS=:systemd:
E: USEC_INITIALIZED=41070786

Thank you,
Tatyana
Jason Gunthorpe Oct. 14, 2021, 11:36 p.m. UTC | #11
On Thu, Oct 14, 2021 at 08:11:25PM +0000, Nikolova, Tatyana E wrote:
> Based on the following output, it seems that some systemd services
> won't work. I just tested with the port mapper which worked.
> 
> udevadm info -q all  /sys/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
> P: /devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
> E: DEVPATH=/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.iwarp.0
> E: DRIVER=irdma
> E: ID_RDMA_IWARP=1
> E: MODALIAS=auxiliary:ice.iwarp
> E: SUBSYSTEM=auxiliary
> E: SYSTEMD_WANTS=iwpmd.service
> E: TAGS=:systemd:
> E: USEC_INITIALIZED=33683420

Right, this is not an RDMA device, so it should not have ID_RDMA_* set
on it at all.

> If we need to use the aux device name in the udev rules, then I am
> not aware how to get to the aux device through the infiniband or the
> pci subsystem.
> 
> > What does the udev debugging say about these ID tags?
> > 
> > The SUBSYSTEMS=="" is the right approach, as shown above for the other
> > metadata. If you are having trobule I'm wondering if there is some kind of
> > kernel problem creating the wrong sysfs?
> >
> 
> Previously I was using an RC1 kernel and seeing issues with sysfs. After switching to a GA kernel, it works better. 
> 
> udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0
>
>   looking at device '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':

This looks like the problem. For any of this to work the infiniband
device needs to be parented to the aux device, not the PCI device.

mlx5 did not due this for backwards compat reasons, but this is a new
driver so it could do it properly.

Then you can use SUBSYSTEMS and so forth as I first suggested.

> diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
> index 48a7ced..9a18b67 100644
> +++ b/kernel-boot/rdma-description.rules
> @@ -33,6 +33,8 @@ DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"
>  DRIVERS=="vmw_pvrdma", ENV{ID_RDMA_ROCE}="1"
> +KERNEL=="iw*", ENV{ID_RDMA_IWARP}="1"
> +KERNEL=="roce*", ENV{ID_RDMA_ROCE}="1"

No, this is matching against the RDMA device name.

The other alternative is to replace this entire file with a C program
that is invoked by udev

The C program would use netlink to query the rdma device properties
and return the ID_xx strings for setting.

Jason
Nikolova, Tatyana E Nov. 2, 2022, 4:40 p.m. UTC | #12
Hi Jason,

I know it has been a while since we discussed this. Based on your feedback, we are proposing another solution for the irdma kernel-boot rules. Could you please review it?

> > udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0
> >
> >   looking at device
> '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
> 
> This looks like the problem. For any of this to work the infiniband 
> device needs to be parented to the aux device, not the PCI device.
> 
> mlx5 did not due this for backwards compat reasons, but this is a new 
> driver so it could do it properly.
> 
> Then you can use SUBSYSTEMS and so forth as I first suggested.

Here is a patch for irdma driver to set aux_dev as the ibdev parent:

diff --git a/drivers/infiniband/hw/irdma/i40iw_if.c b/drivers/infiniband/hw/irdma/i40iw_if.c
index 4053ead32416..e08fbfe148ea 100644
--- a/drivers/infiniband/hw/irdma/i40iw_if.c
+++ b/drivers/infiniband/hw/irdma/i40iw_if.c
@@ -90,6 +90,7 @@ static void i40iw_fill_device_info(struct irdma_device *iwdev, struct i40e_info
        iwdev->rcv_wnd = IRDMA_CM_DEFAULT_RCV_WND_SCALED;
        iwdev->rcv_wscale = IRDMA_CM_DEFAULT_RCV_WND_SCALE;
        iwdev->netdev = cdev_info->netdev;
+       iwdev->aux_dev = cdev_info->aux_dev;
        iwdev->vsi_num = 0;
 }
 
diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
index 514453777e07..835a087a3329 100644
--- a/drivers/infiniband/hw/irdma/main.c
+++ b/drivers/infiniband/hw/irdma/main.c
@@ -279,6 +279,7 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
        }
 
        irdma_fill_device_info(iwdev, pf, vsi);
+       iwdev->aux_dev = aux_dev;
        rf = iwdev->rf;
 
        err = irdma_ctrl_init_hw(rf);
diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
index 65e966ad3453..f2f86b882cef 100644
--- a/drivers/infiniband/hw/irdma/main.h
+++ b/drivers/infiniband/hw/irdma/main.h
@@ -329,6 +329,7 @@ struct irdma_device {
        struct ib_device ibdev;
        struct irdma_pci_f *rf;
        struct net_device *netdev;
+       struct auxiliary_device *aux_dev;
        struct workqueue_struct *cleanup_wq;
        struct irdma_sc_vsi vsi;
        struct irdma_cm_core cm_core;
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index a22afbb25bc5..f1304b6b58b3 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -4549,7 +4549,6 @@ static int irdma_init_iw_device(struct irdma_device *iwdev)
  */
 static int irdma_init_rdma_device(struct irdma_device *iwdev)  {
-       struct pci_dev *pcidev = iwdev->rf->pcidev;
        int ret;
 
        if (iwdev->roce_mode) {
@@ -4561,7 +4560,7 @@ static int irdma_init_rdma_device(struct irdma_device *iwdev)
        }
        iwdev->ibdev.phys_port_cnt = 1;
        iwdev->ibdev.num_comp_vectors = iwdev->rf->ceqs_count;
-       iwdev->ibdev.dev.parent = &pcidev->dev;
+       iwdev->ibdev.dev.parent = &iwdev->aux_dev->dev;
        ib_set_device_ops(&iwdev->ibdev, &irdma_dev_ops);
 
        return 0;

Here is the udev output after this change:
    
    looking at device
    '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/infiniband/rocep47s0f0':
        KERNEL=="rocep47s0f0"
        SUBSYSTEM=="infiniband"
        DRIVER==""
        ATTR{fw_ver}=="1.48"
        ATTR{node_desc}==""
        ATTR{node_guid}=="6a05:caff:fec1:c790"
        ATTR{node_type}=="1: CA"
        ATTR{sys_image_guid}=="6a05:caff:fec1:c790"
    
      looking at parent device
    '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0':
        KERNELS=="ice.roce.0"
        SUBSYSTEMS=="auxiliary"
        DRIVERS=="irdma"
    
      looking at parent device
    '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0':
        KERNELS=="0000:2f:00.0"
        SUBSYSTEMS=="pci"
        DRIVERS=="ice" 

Having the aux_dev as a parent of the irdma ibdev allows for udev rules to set ID_RDMA_IWARP or ID_RDMA_ROCE environment vars based on the auxiliary device name with the following patch:

diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
index 48a7ced..0b844bd 100644
--- a/kernel-boot/rdma-description.rules
+++ b/kernel-boot/rdma-description.rules
@@ -29,6 +29,8 @@ DRIVERS=="i40e", ENV{ID_RDMA_IWARP}="1"
 DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="irdma", KERNELS=="*iw*", ENV{ID_RDMA_IWARP}="1"
+DRIVERS=="irdma", KERNELS=="*roce*", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"

In addition to this patch, changes to rdma-core rdma_rename logic are necessary to handle ib devices with aux_dev parent and rename them by_onboard or by_pci the same way as it did before introducing the aux_dev parent.
Please review rdma-core PR at https://github.com/linux-rdma/rdma-core/pull/1248

Thank you,
Tatyana
Jason Gunthorpe Nov. 9, 2022, 1:45 p.m. UTC | #13
On Wed, Nov 02, 2022 at 04:40:20PM +0000, Nikolova, Tatyana E wrote:
> Hi Jason,
> 
> I know it has been a while since we discussed this. Based on your feedback, we are proposing another solution for the irdma kernel-boot rules. Could you please review it?
> 
> > > udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0
> > >
> > >   looking at device
> > '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
> > 
> > This looks like the problem. For any of this to work the infiniband 
> > device needs to be parented to the aux device, not the PCI device.
> > 
> > mlx5 did not due this for backwards compat reasons, but this is a new 
> > driver so it could do it properly.
> > 
> > Then you can use SUBSYSTEMS and so forth as I first suggested.
> 
> Here is a patch for irdma driver to set aux_dev as the ibdev parent:
> 
> diff --git a/drivers/infiniband/hw/irdma/i40iw_if.c b/drivers/infiniband/hw/irdma/i40iw_if.c
> index 4053ead32416..e08fbfe148ea 100644
> --- a/drivers/infiniband/hw/irdma/i40iw_if.c
> +++ b/drivers/infiniband/hw/irdma/i40iw_if.c
> @@ -90,6 +90,7 @@ static void i40iw_fill_device_info(struct irdma_device *iwdev, struct i40e_info
>         iwdev->rcv_wnd = IRDMA_CM_DEFAULT_RCV_WND_SCALED;
>         iwdev->rcv_wscale = IRDMA_CM_DEFAULT_RCV_WND_SCALE;
>         iwdev->netdev = cdev_info->netdev;
> +       iwdev->aux_dev = cdev_info->aux_dev;
>         iwdev->vsi_num = 0;
>  }
>  
> diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
> index 514453777e07..835a087a3329 100644
> --- a/drivers/infiniband/hw/irdma/main.c
> +++ b/drivers/infiniband/hw/irdma/main.c
> @@ -279,6 +279,7 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
>         }
>  
>         irdma_fill_device_info(iwdev, pf, vsi);
> +       iwdev->aux_dev = aux_dev;
>         rf = iwdev->rf;
>  
>         err = irdma_ctrl_init_hw(rf);
> diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
> index 65e966ad3453..f2f86b882cef 100644
> --- a/drivers/infiniband/hw/irdma/main.h
> +++ b/drivers/infiniband/hw/irdma/main.h
> @@ -329,6 +329,7 @@ struct irdma_device {
>         struct ib_device ibdev;
>         struct irdma_pci_f *rf;
>         struct net_device *netdev;
> +       struct auxiliary_device *aux_dev;
>         struct workqueue_struct *cleanup_wq;
>         struct irdma_sc_vsi vsi;
>         struct irdma_cm_core cm_core;
> diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
> index a22afbb25bc5..f1304b6b58b3 100644
> --- a/drivers/infiniband/hw/irdma/verbs.c
> +++ b/drivers/infiniband/hw/irdma/verbs.c
> @@ -4549,7 +4549,6 @@ static int irdma_init_iw_device(struct irdma_device *iwdev)
>   */
>  static int irdma_init_rdma_device(struct irdma_device *iwdev)  {
> -       struct pci_dev *pcidev = iwdev->rf->pcidev;
>         int ret;
>  
>         if (iwdev->roce_mode) {
> @@ -4561,7 +4560,7 @@ static int irdma_init_rdma_device(struct irdma_device *iwdev)
>         }
>         iwdev->ibdev.phys_port_cnt = 1;
>         iwdev->ibdev.num_comp_vectors = iwdev->rf->ceqs_count;
> -       iwdev->ibdev.dev.parent = &pcidev->dev;
> +       iwdev->ibdev.dev.parent = &iwdev->aux_dev->dev;
>         ib_set_device_ops(&iwdev->ibdev, &irdma_dev_ops);
>  
>         return 0;
> 
> Here is the udev output after this change:
>     
>     looking at device
>     '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0/infiniband/rocep47s0f0':
>         KERNEL=="rocep47s0f0"
>         SUBSYSTEM=="infiniband"
>         DRIVER==""
>         ATTR{fw_ver}=="1.48"
>         ATTR{node_desc}==""
>         ATTR{node_guid}=="6a05:caff:fec1:c790"
>         ATTR{node_type}=="1: CA"
>         ATTR{sys_image_guid}=="6a05:caff:fec1:c790"
>     
>       looking at parent device
>     '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/ice.roce.0':
>         KERNELS=="ice.roce.0"
>         SUBSYSTEMS=="auxiliary"
>         DRIVERS=="irdma"
>     
>       looking at parent device
>     '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0':
>         KERNELS=="0000:2f:00.0"
>         SUBSYSTEMS=="pci"
>         DRIVERS=="ice" 

This looks right to me

Jason
Shiraz Saleem Jan. 13, 2023, 11:57 p.m. UTC | #14
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot rules
> 
> On Wed, Nov 02, 2022 at 04:40:20PM +0000, Nikolova, Tatyana E wrote:
> > Hi Jason,
> >
> > I know it has been a while since we discussed this. Based on your feedback, we
> are proposing another solution for the irdma kernel-boot rules. Could you please
> review it?
> >
> > > > udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0
> > > >
> > > >   looking at device
> > > '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
> > >
> > > This looks like the problem. For any of this to work the infiniband
> > > device needs to be parented to the aux device, not the PCI device.
> > >
> > > mlx5 did not due this for backwards compat reasons, but this is a
> > > new driver so it could do it properly.
> > >

Hi Jason - This also impacts us in terms of backwards compatibility.

There are current applications/customers who are looking at "'/sys/class/infiniband/<ib_device>/device/"
for sysfs attributes like numa_node,  local_cpus etc. under the PCI device.

i.e. assuming the PCI device is ib device parent.

With parent change to auxiliary device, they won't be able find these attributes and potentially resort to default
configurations that are sub-optimal.

Shiraz
Jason Gunthorpe Jan. 14, 2023, 12:01 a.m. UTC | #15
On Fri, Jan 13, 2023 at 11:57:33PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot rules
> > 
> > On Wed, Nov 02, 2022 at 04:40:20PM +0000, Nikolova, Tatyana E wrote:
> > > Hi Jason,
> > >
> > > I know it has been a while since we discussed this. Based on your feedback, we
> > are proposing another solution for the irdma kernel-boot rules. Could you please
> > review it?
> > >
> > > > > udevadm info --attribute-walk /sys/class/infiniband/rocep47s0f0
> > > > >
> > > > >   looking at device
> > > > '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
> > > >
> > > > This looks like the problem. For any of this to work the infiniband
> > > > device needs to be parented to the aux device, not the PCI device.
> > > >
> > > > mlx5 did not due this for backwards compat reasons, but this is a
> > > > new driver so it could do it properly.
> > > >
> 
> Hi Jason - This also impacts us in terms of backwards compatibility.

This is a new driver, it by definition does not have backwards
compatability.

What you are talking about is drop in compatability against mlx5 which
is a very different thing.

> There are current applications/customers who are looking at
> "'/sys/class/infiniband/<ib_device>/device/" for sysfs attributes
> like numa_node, local_cpus etc. under the PCI device.

That is just wrong assumptions for how sysfs is structured, sadly.

> i.e. assuming the PCI device is ib device parent.
> 
> With parent change to auxiliary device, they won't be able find
> these attributes and potentially resort to default configurations
> that are sub-optimal.

Introducing new stuff often comes with an ecosystem upgrade as well..

Jason
Shiraz Saleem Jan. 17, 2023, 8:27 p.m. UTC | #16
> Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to kernel-boot rules
> 
> On Fri, Jan 13, 2023 at 11:57:33PM +0000, Saleem, Shiraz wrote:
> > > Subject: Re: [PATCH v2 rdma-core] irdma: Add ice and irdma to
> > > kernel-boot rules
> > >
> > > On Wed, Nov 02, 2022 at 04:40:20PM +0000, Nikolova, Tatyana E wrote:
> > > > Hi Jason,
> > > >
> > > > I know it has been a while since we discussed this. Based on your
> > > > feedback, we
> > > are proposing another solution for the irdma kernel-boot rules.
> > > Could you please review it?
> > > >
> > > > > > udevadm info --attribute-walk
> > > > > > /sys/class/infiniband/rocep47s0f0
> > > > > >
> > > > > >   looking at device
> > > > > '/devices/pci0000:2e/0000:2e:00.0/0000:2f:00.0/infiniband/rocep47s0f0':
> > > > >
> > > > > This looks like the problem. For any of this to work the
> > > > > infiniband device needs to be parented to the aux device, not the PCI
> device.
> > > > >
> > > > > mlx5 did not due this for backwards compat reasons, but this is
> > > > > a new driver so it could do it properly.
> > > > >
> >
> > Hi Jason - This also impacts us in terms of backwards compatibility.
> 
> This is a new driver, it by definition does not have backwards compatability.

irdma replaces i40iw (to support x722) which has been there for many years. irdma has been in the kernel since 5.14.
So we have similar concerns about compatibility.

If we are going with such a sysfs update for the ib_device, is it not better it is done across all vendor drivers using auxiliary bus similar to the restructuring for sysfs HW counters?

For consistency within the subsystem and it could enable a wider ecosystem adoption.

> 
> What you are talking about is drop in compatability against mlx5 which is a very
> different thing.
> 
> > There are current applications/customers who are looking at
> > "'/sys/class/infiniband/<ib_device>/device/" for sysfs attributes like
> > numa_node, local_cpus etc. under the PCI device.
> 
> That is just wrong assumptions for how sysfs is structured, sadly.
> 
> > i.e. assuming the PCI device is ib device parent.
> >
> > With parent change to auxiliary device, they won't be able find these
> > attributes and potentially resort to default configurations that are
> > sub-optimal.
> 
> Introducing new stuff often comes with an ecosystem upgrade as well..
> 
> Jason
Jason Gunthorpe Jan. 20, 2023, 6:02 p.m. UTC | #17
On Tue, Jan 17, 2023 at 08:27:49PM +0000, Saleem, Shiraz wrote:

> irdma replaces i40iw (to support x722) which has been there for many
> years. irdma has been in the kernel since 5.14.  So we have similar
> concerns about compatibility.

Well, I did caution you about wholesale replacing the driver...

> If we are going with such a sysfs update for the ib_device, is it
> not better it is done across all vendor drivers using auxiliary bus
> similar to the restructuring for sysfs HW counters?

I think if the ecosystem issues are fixed then we could talk about
this.

Jason
diff mbox series

Patch

diff --git a/kernel-boot/rdma-description.rules b/kernel-boot/rdma-description.rules
index 48a7ced..f2f7b38 100644
--- a/kernel-boot/rdma-description.rules
+++ b/kernel-boot/rdma-description.rules
@@ -24,11 +24,13 @@  DRIVERS=="hfi1", ENV{ID_RDMA_OPA}="1"
 # Hardware that supports iWarp
 DRIVERS=="cxgb4", ENV{ID_RDMA_IWARP}="1"
 DRIVERS=="i40e", ENV{ID_RDMA_IWARP}="1"
+DRIVERS=="ice", ENV{ID_RDMA_IWARP}="1"
 
 # Hardware that supports RoCE
 DRIVERS=="be2net", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="bnxt_en", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="hns", ENV{ID_RDMA_ROCE}="1"
+DRIVERS=="ice", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="mlx4_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="mlx5_core", ENV{ID_RDMA_ROCE}="1"
 DRIVERS=="qede", ENV{ID_RDMA_ROCE}="1"