
[RFC,v1,01/19] net/i40e: Add peer register/unregister to struct i40e_netdev_priv

Message ID 20190215171107.6464-2-shiraz.saleem@intel.com (mailing list archive)
State RFC
Series Add unified Intel Ethernet RDMA driver (irdma)

Commit Message

Shiraz Saleem Feb. 15, 2019, 5:10 p.m. UTC
Expose the register/unregister function pointers in struct
i40e_netdev_priv, which is accessible via the netdev_priv() interface
in the RDMA driver. On a netdev notification in the RDMA driver,
the appropriate LAN driver register/unregister functions are invoked
from the struct i40e_netdev_priv structure.

This is done so that a single RDMA driver can work with multiple LAN
drivers across multiple generations of Intel HW supporting RDMA.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h        |  1 +
 drivers/net/ethernet/intel/i40e/i40e_client.h | 10 ++++++++++
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  7 +++++++
 3 files changed, 18 insertions(+)
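
For reference, a minimal sketch of how the peer (RDMA) driver might consume
these callbacks. irdma_attach_lan() and its client argument are illustrative
only, not part of this patch; the cast relies on prov_callbacks being the
first member of struct i40e_netdev_priv, as in the i40e hunk below.

#include <linux/netdevice.h>

static int irdma_attach_lan(struct net_device *netdev,
			    struct i40e_client *client)
{
	struct idc_srv_provider *prov = netdev_priv(netdev);

	/* only Intel LAN netdevs exposing IDC carry this signature */
	if (prov->signature != IDC_SIGNATURE)
		return -ENODEV;

	/* hand our client over to the matching LAN driver (i40e or ice) */
	return prov->reg_peer_driver(client);
}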

Comments

Jason Gunthorpe Feb. 15, 2019, 5:22 p.m. UTC | #1
On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
> Expose the register/unregister function pointers in the struct
> i40e_netdev_priv which is accesible via the netdev_priv() interface
> in the RDMA driver. On a netdev notification in the RDMA driver,
> the appropriate LAN driver register/unregister functions are invoked
> from the struct i40e_netdev_priv structure,

Why? In later patches we get an entire device_add() based thing. Why do
you need two things?

The RDMA driver should bind to the thing that device_add created and
from there reliably get the netdev. It should not listen to netdev
notifiers for attachment.

It would be excellent if you could make this more general as pretty
much every single RDMA driver has some open coded (and often wrongly
locked) version of this attachment process.

This series is very big, so if you can see a way to make a general
attachment scheme based around device_add/etc it would be a great
pre-cursor series.

Jason
Shiraz Saleem Feb. 21, 2019, 2:19 a.m. UTC | #2
>Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to struct
>i40e_netdev_priv
>
>On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
>> Expose the register/unregister function pointers in the struct
>> i40e_netdev_priv which is accesible via the netdev_priv() interface in
>> the RDMA driver. On a netdev notification in the RDMA driver, the
>> appropriate LAN driver register/unregister functions are invoked from
>> the struct i40e_netdev_priv structure,
>
>Why? In later patches we get an entire device_add() based thing. Why do you
>need two things?
>
>The RDMA driver should bind to the thing that device_add created and from there
>reliably get the netdev. It should not listen to netdev notifiers for attachment.

In the new IDC mechanism between ice<->irdma, the LAN driver sets up the
device for us and attaches it to a software bus via a device_add() based mechanism.
However, the RDMA driver binds to the device only when the LAN 'register' function is
called in irdma. In order to have a unified RDMA driver, this register function is exposed
in the private netdev struct of each LAN driver, so we need a matching LAN driver
netdev (ice or i40e) in irdma to call the 'register' function in the first place.

Once the binding is done, the bus infrastructure triggers the probe() for the device
into the 'ice' driver, which in turn calls the irdma probe(), where we can also get
the netdev like you're referring to. But there is no software bus object or device the
RDMA driver can bind to without a matching LAN driver being loaded and calling its
register.

There is no ordering guarantee in which the irdma, i40e and ice modules load.
The netdev notifier is for the case where irdma loads before i40e or ice.
The irdma driver listens to the notifiers so that when a supporting LAN driver's netdev
appears (on load), it can register with it.
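
Roughly, that notifier path looks like this (sketch only; irdma_try_register()
and irdma_try_unregister() stand in for the signature check and 'register'
call described above):

static int irdma_netdev_event(struct notifier_block *nb,
			      unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);

	switch (event) {
	case NETDEV_REGISTER:
		/* a LAN netdev appeared; register with it if it is i40e/ice */
		irdma_try_register(netdev);
		break;
	case NETDEV_UNREGISTER:
		irdma_try_unregister(netdev);
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block irdma_netdev_nb = {
	.notifier_call = irdma_netdev_event,
};

with register_netdevice_notifier(&irdma_netdev_nb) called from irdma's
module init.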

Shiraz
Jason Gunthorpe Feb. 21, 2019, 7:35 p.m. UTC | #3
On Thu, Feb 21, 2019 at 02:19:33AM +0000, Saleem, Shiraz wrote:
> >Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to struct
> >i40e_netdev_priv
> >
> >On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
> >> Expose the register/unregister function pointers in the struct
> >> i40e_netdev_priv which is accesible via the netdev_priv() interface in
> >> the RDMA driver. On a netdev notification in the RDMA driver, the
> >> appropriate LAN driver register/unregister functions are invoked from
> >> the struct i40e_netdev_priv structure,
> >
> >Why? In later patches we get an entire device_add() based thing. Why do you
> >need two things?
> >
> >The RDMA driver should bind to the thing that device_add created and from there
> >reliably get the netdev. It should not listen to netdev notifiers for attachment.
> 
> In the new IDC mechanism between ice<->irdma, the LAN driver setups up the
> device for us and attaches it to a software bus via device_add() based mechanism.
> However, RDMA driver binds to the device only when the LAN 'register' function is
> called in irdma.

That doesn't make sense. The PCI driver should always create the
required struct device attachment point when attachment
becomes possible.

> There is no ordering guarantee in which irdma, i40e and ice modules load.
> The netdev notifier is for the case where the irdma loads before
> i40e or ice.

You are supposed to use the driver core to handle this ordering.

The pci driver creates the attachment points in the correct order,
when they are ready for use, and the driver core will automatically
attach registered device drivers to the attachment points, no matter
the module load order.

You will have a netdev and an rdma attachment point; it sounds like the
RDMA one is created once the netdev is happy.

Maybe what you are missing is a struct device_driver?
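
Very roughly, the shape I mean (all names invented, helper bodies elided):

static struct bus_type iidc_bus = {
	.name  = "iidc",
	.match = iidc_match,	/* e.g. match device name against driver name */
};

/* LAN/PCI driver side: plug in the attachment point once RDMA is usable */
static int i40e_plug_rdma_dev(struct device *pci_dev, int idx)
{
	struct device *adev = kzalloc(sizeof(*adev), GFP_KERNEL);

	if (!adev)
		return -ENOMEM;
	adev->bus = &iidc_bus;
	adev->parent = pci_dev;
	adev->release = iidc_rdma_release;	/* kfree()s adev */
	dev_set_name(adev, "iidc-rdma.%d", idx);
	return device_register(adev);
}

/* irdma side: the core calls irdma_probe() for every matching device,
 * no matter which module loaded first */
static struct device_driver irdma_driver = {
	.name  = "iidc-rdma",
	.bus   = &iidc_bus,
	.probe = irdma_probe,
};

with bus_register(&iidc_bus) done once by whoever owns the bus and
driver_register(&irdma_driver) from irdma's module init.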

Jason
Dave Ertman Feb. 22, 2019, 8:13 p.m. UTC | #4
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> Sent: Thursday, February 21, 2019 11:35 AM
> To: Saleem, Shiraz <shiraz.saleem@intel.com>
> Cc: dledford@redhat.com; davem@davemloft.net; linux-
> rdma@vger.kernel.org; netdev@vger.kernel.org; Ismail, Mustafa
> <mustafa.ismail@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> Patil, Kiran <kiran.patil@intel.com>; Ertman, David M
> <david.m.ertman@intel.com>
> Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to struct
> i40e_netdev_priv
> 
> On Thu, Feb 21, 2019 at 02:19:33AM +0000, Saleem, Shiraz wrote:
> > >Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to
> > >struct i40e_netdev_priv
> > >
> > >On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
> > >> Expose the register/unregister function pointers in the struct
> > >> i40e_netdev_priv which is accesible via the netdev_priv() interface
> > >> in the RDMA driver. On a netdev notification in the RDMA driver,
> > >> the appropriate LAN driver register/unregister functions are
> > >> invoked from the struct i40e_netdev_priv structure,
> > >
> > >Why? In later patches we get an entire device_add() based thing. Why
> > >do you need two things?
> > >
> > >The RDMA driver should bind to the thing that device_add created and
> > >from there reliably get the netdev. It should not listen to netdev notifiers for
> attachment.
> >
> > In the new IDC mechanism between ice<->irdma, the LAN driver setups up
> > the device for us and attaches it to a software bus via device_add() based
> mechanism.
> > However, RDMA driver binds to the device only when the LAN 'register'
> > function is called in irdma.
> 
> That doesn't make sense. The PCI driver should always create the required
> struct device attachment point when attachment is becomes possible.
> 
> > There is no ordering guarantee in which irdma, i40e and ice modules load.
> > The netdev notifier is for the case where the irdma loads before i40e
> > or ice.
> 
> You are supposed to use the driver core to handle this ordering.
> 
> The pci driver creates the attachment points in the correct order, when they
> are ready for use, and the driver core will automatically attach registered
> device drivers to the attachement points, no matter the module load loader.
> 
> You will have a netdev and a rdma attachment point, sounds like the RDMA one
> is created once the netdev is happy.
> 
> Maybe what you are missing is a struct device_driver?
> 
> Jason

I am assuming that the term PCI driver is being used to mean the PCI
subsystem in the kernel.  If this assumption is wrong, please disregard the next
paragraph, but the following points will still apply.

The bus that the irdma driver is registering with is a software (pseudo) bus
and not a hardware bus, and since this software bus is being defined by the LAN
driver, the bus does not exist until a LAN driver is loaded, up, and ready to receive
registration from the irdma peer.  The PCI driver does not have anything to do with this
bus, and has no ability to perform the described functions.  The irdma driver cannot
register with the software bus unless it registers with the LAN driver that controls the
bus.  The LAN driver's register function will call "driver_register(&drv->driver)" for the
registering irdma driver.

Since the irdma driver is a consolidated driver (supports both ice and i40e LAN
drivers), we cannot guarantee that a given LAN driver will load before the irdma
driver.  Even if we use module dependencies to make irdma depend on (ice ||
i40e), we have to consider the situation where a machine will have both an ice
supported LAN device and an i40e supported LAN device in it.  In this case, the
load order could be (e.g.) i40e -> irdma -> ice.  The irdma driver can check all
present netdevs when it loads to find the one that has the correct function
pointers in it, but it will have no way of knowing that a new software bus was
created by the second LAN driver to load.

This is why irdma is listening for netdev notifiers, so that whenever a new netdev
appears from a LAN driver loading after irdma, the irdma driver can evaluate
whether the new netdev was created by a LAN driver supported by the irdma driver.

If I have misunderstood your concerns, I apologize and look forward to clarification.

Thanks,
-Dave Ertman
Jason Gunthorpe Feb. 22, 2019, 8:23 p.m. UTC | #5
On Fri, Feb 22, 2019 at 08:13:58PM +0000, Ertman, David M wrote:
> > From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> > Sent: Thursday, February 21, 2019 11:35 AM
> > To: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Cc: dledford@redhat.com; davem@davemloft.net; linux-
> > rdma@vger.kernel.org; netdev@vger.kernel.org; Ismail, Mustafa
> > <mustafa.ismail@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> > Patil, Kiran <kiran.patil@intel.com>; Ertman, David M
> > <david.m.ertman@intel.com>
> > Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to struct
> > i40e_netdev_priv
> > 
> > On Thu, Feb 21, 2019 at 02:19:33AM +0000, Saleem, Shiraz wrote:
> > > >Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to
> > > >struct i40e_netdev_priv
> > > >
> > > >On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
> > > >> Expose the register/unregister function pointers in the struct
> > > >> i40e_netdev_priv which is accesible via the netdev_priv() interface
> > > >> in the RDMA driver. On a netdev notification in the RDMA driver,
> > > >> the appropriate LAN driver register/unregister functions are
> > > >> invoked from the struct i40e_netdev_priv structure,
> > > >
> > > >Why? In later patches we get an entire device_add() based thing. Why
> > > >do you need two things?
> > > >
> > > >The RDMA driver should bind to the thing that device_add created and
> > > >from there reliably get the netdev. It should not listen to netdev notifiers for
> > attachment.
> > >
> > > In the new IDC mechanism between ice<->irdma, the LAN driver setups up
> > > the device for us and attaches it to a software bus via device_add() based
> > mechanism.
> > > However, RDMA driver binds to the device only when the LAN 'register'
> > > function is called in irdma.
> > 
> > That doesn't make sense. The PCI driver should always create the required
> > struct device attachment point when attachment is becomes possible.
> > 
> > > There is no ordering guarantee in which irdma, i40e and ice modules load.
> > > The netdev notifier is for the case where the irdma loads before i40e
> > > or ice.
> > 
> > You are supposed to use the driver core to handle this ordering.
> > 
> > The pci driver creates the attachment points in the correct order, when they
> > are ready for use, and the driver core will automatically attach registered
> > device drivers to the attachement points, no matter the module load loader.
> > 
> > You will have a netdev and a rdma attachment point, sounds like the RDMA one
> > is created once the netdev is happy.
> > 
> > Maybe what you are missing is a struct device_driver?
> > 
> > Jason
> 
> I am assuming that the term PCI driver is being used to mean the PCI
> subsystem in the kernel.  If this assumption is wrong, please disregard the next
> paragraph, but the following points will still apply.

No, I mean the driver that has the struct pci_driver for the PCI
function. Maybe that is the LAN driver for this case.

> bus, and has no ability to perform the described functions.  The
> irdma driver cannot register with the software bus unless it
> registers with the LAN driver that controls the bus.  The LAN
> driver's register function will call "driver_register(&drv->driver)"
> for the registering irdma driver.

That isn't how to use the driver core.

> Since the irdma driver is a consolidated driver (supports both ice and i40e LAN
> drivers), we cannot guarantee that a given LAN driver will load before the irdma
> driver.  Even if we use module dependencies to make irdma depend on (ice ||
> i40e), we have to consider the situation where a machine will have both an ice
> supported LAN device and an i40e supported LAN device in it.  In this case, the
> load order could be (e.g.) i40e -> irdma -> ice.  The irdma driver can check all
> present netdevs when it loads to find the one that has the correct function
> pointers in it, but it will have no way of knowing that a new software bus was
> created by the second LAN driver to load.

This is why you use the driver core to manage driver binding.

> This is why irdma is listening for netdev notifiers, so that whenever a new netdev
> appears from a LAN driver loading after irdma, the irdma driver can evaluate
> whether the new netdev was created by a LAN driver supported by irdma driver.

Register a device driver to the driver core and wait for the driver
core to call that driver's probe method.

Jason
Kirsher, Jeffrey T March 13, 2019, 2:11 a.m. UTC | #6
On Fri, 2019-02-22 at 13:23 -0700, Jason Gunthorpe wrote:
> On Fri, Feb 22, 2019 at 08:13:58PM +0000, Ertman, David M wrote:
> > > From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> > > Sent: Thursday, February 21, 2019 11:35 AM
> > > To: Saleem, Shiraz <shiraz.saleem@intel.com>
> > > Cc: dledford@redhat.com; davem@davemloft.net; linux-
> > > rdma@vger.kernel.org; netdev@vger.kernel.org; Ismail, Mustafa
> > > <mustafa.ismail@intel.com>; Kirsher, Jeffrey T <
> > > jeffrey.t.kirsher@intel.com>;
> > > Patil, Kiran <kiran.patil@intel.com>; Ertman, David M
> > > <david.m.ertman@intel.com>
> > > Subject: Re: [RFC v1 01/19] net/i40e: Add peer register/unregister to
> > > struct
> > > i40e_netdev_priv
> > > 
> > > On Thu, Feb 21, 2019 at 02:19:33AM +0000, Saleem, Shiraz wrote:
> > > > > Subject: Re: [RFC v1 01/19] net/i40e: Add peer
> > > > > register/unregister to
> > > > > struct i40e_netdev_priv
> > > > > 
> > > > > On Fri, Feb 15, 2019 at 11:10:48AM -0600, Shiraz Saleem wrote:
> > > > > > Expose the register/unregister function pointers in the struct
> > > > > > i40e_netdev_priv which is accesible via the netdev_priv()
> > > > > > interface
> > > > > > in the RDMA driver. On a netdev notification in the RDMA
> > > > > > driver,
> > > > > > the appropriate LAN driver register/unregister functions are
> > > > > > invoked from the struct i40e_netdev_priv structure,
> > > > > 
> > > > > Why? In later patches we get an entire device_add() based thing.
> > > > > Why
> > > > > do you need two things?
> > > > > 
> > > > > The RDMA driver should bind to the thing that device_add created
> > > > > and
> > > > > from there reliably get the netdev. It should not listen to
> > > > > netdev notifiers for
> > > attachment.
> > > > In the new IDC mechanism between ice<->irdma, the LAN driver setups
> > > > up
> > > > the device for us and attaches it to a software bus via
> > > > device_add() based
> > > mechanism.
> > > > However, RDMA driver binds to the device only when the LAN
> > > > 'register'
> > > > function is called in irdma.
> > > 
> > > That doesn't make sense. The PCI driver should always create the
> > > required
> > > struct device attachment point when attachment is becomes possible.
> > > 
> > > > There is no ordering guarantee in which irdma, i40e and ice modules
> > > > load.
> > > > The netdev notifier is for the case where the irdma loads before
> > > > i40e
> > > > or ice.
> > > 
> > > You are supposed to use the driver core to handle this ordering.
> > > 
> > > The pci driver creates the attachment points in the correct order,
> > > when they
> > > are ready for use, and the driver core will automatically attach
> > > registered
> > > device drivers to the attachement points, no matter the module load
> > > loader.
> > > 
> > > You will have a netdev and a rdma attachment point, sounds like the
> > > RDMA one
> > > is created once the netdev is happy.
> > > 
> > > Maybe what you are missing is a struct device_driver?
> > > 
> > > Jason
> > 
> > I am assuming that the term PCI driver is being used to mean the PCI
> > subsystem in the kernel.  If this assumption is wrong, please disregard
> > the next
> > paragraph, but the following points will still apply.
> 
> No, I mean the driver that has the struct pci_driver for the PCI
> function. Maybe that is the LAN driver for this case.

Sorry for the delayed response, I was dealing with a death in the family.

I want to make sure we are on the same page, so please correct me if I am
wrong.  The Intel RDMA driver is not a stand-alone PCI function driver
because there is no separate PCI function for RDMA, so the RDMA driver does
not call pci_register_driver(); that is done by the LAN driver.  The
Intel RDMA driver still needs access to HW resources which are accessible
only through the LAN PF driver (e.g. i40e/ice) and does not have its own
netdev, so it uses the netdev exposed by the LAN PF driver.  Access to
the netdev is needed so that it can listen to RT netlink messages, among
other reasons.

We refer to a "software bus" because our hardware does not expose the hardware
bus, so our calls to bus_register/unregister are actually using a software
bus, which is exposed and managed by the LAN driver.

> > bus, and has no ability to perform the described functions.  The
> > irdma driver cannot register with the software bus unless it
> > registers with the LAN driver that controls the bus.  The LAN
> > driver's register function will call "driver_register(&drv->driver)"
> > for the registering irdma driver.
> 
> That isn't how to use the driver core.

Software buses are exposed and managed in other kernel drivers too, like the
Many Integrated Core (MIC) software driver and the virtio subsystem.

> > Since the irdma driver is a consolidated driver (supports both ice and
> > i40e LAN
> > drivers), we cannot guarantee that a given LAN driver will load before
> > the irdma
> > driver.  Even if we use module dependencies to make irdma depend on
> > (ice ||
> > i40e), we have to consider the situation where a machine will have both
> > an ice
> > supported LAN device and an i40e supported LAN device in it.  In this
> > case, the
> > load order could be (e.g.) i40e -> irdma -> ice.  The irdma driver can
> > check all
> > present netdevs when it loads to find the one that has the correct
> > function
> > pointers in it, but it will have no way of knowing that a new software
> > bus was
> > created by the second LAN driver to load.
> 
> This is why you use the driver core to manage driver binding.
> 
> > This is why irdma is listening for netdev notifiers, so that whenever a
> > new netdev
> > appears from a LAN driver loading after irdma, the irdma driver can
> > evaluate
> > whether the new netdev was created by a LAN driver supported by irdma
> > driver.
> 
> Register a device driver to the driver core and wait for the driver
> core to call that driver's probe method.

Yes, the LAN PF driver is the software component exposing and managing the
bus, so it is the one that will call probe/remove of the peer driver (RDMA
driver).  However, a netdev-notifier-based approach is still needed if the RDMA
driver is loaded before the LAN PF driver (i40e or ice).

> 
> Jason
Jason Gunthorpe March 13, 2019, 1:28 p.m. UTC | #7
On Tue, Mar 12, 2019 at 07:11:08PM -0700, Jeff Kirsher wrote:
> > > I am assuming that the term PCI driver is being used to mean the PCI
> > > subsystem in the kernel.  If this assumption is wrong, please disregard
> > > the next
> > > paragraph, but the following points will still apply.
> > 
> > No, I mean the driver that has the struct pci_driver for the PCI
> > function. Maybe that is the LAN driver for this case.
> 
> Sorry for the delayed response, I was dealing with a death in the family.

My condolences

> I want to make sure we are on the same page, so please correct me if I am
> wrong.  The Intel RDMA driver is not a stand-alone PCI function driver
> because there is no separate PCI function for RDMA, so the RDMA driver does
> not call pci_register_driver(), that is done with the LAN driver.  The
> Intel RDMA driver still needs to access to HW resources which is accessible
> only through the LAN PF driver (e.g. i40e/ice) and does not have its own
> netdev, so it uses the netdev exposed by the LAN PF driver.  The access to
> the netdev is needed so that it can listen to RT netlink messages and other
> reasons.

Whichever driver owns the pci_driver is the one that has to set up the
other driver core elements. Sounds like it is the net driver in your
design.
 
> We refer to "software bus" because out hardware doe not expose the hardware
> bus, so our APIs to bus_register/unregister are actually using a software
> bus, which is exposed and managed by the LAN driver.

You have a multi-function device, and we have the mfd subsystem to
help these cases. Probably this driver should use it.
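
Something like this, from the PCI driver's probe path (cell name made up):

#include <linux/mfd/core.h>

static int i40e_add_rdma_cell(struct pci_dev *pdev)
{
	static const struct mfd_cell rdma_cell = {
		.name = "i40e-rdma",	/* irdma's platform_driver matches this name */
	};

	return mfd_add_devices(&pdev->dev, PLATFORM_DEVID_AUTO,
			       &rdma_cell, 1, NULL, 0, NULL);
}

The MFD core instantiates a platform device for the cell, so the RDMA side
is just a platform_driver whose .driver.name matches.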

> > Register a device driver to the driver core and wait for the driver
> > core to call that driver's probe method.
> 
> Yes, the LAN PF driver is the software component exposing and managing the
> bus, so it is the one who will call probe/remove of the peer driver (RDMA
> driver).  Although netdev notifiers based approach is needed if the RDMA
> driver was loaded first before the LAN PF driver (i40e or ice) is loaded.

Why would notifiers be needed? Driver core handles all these ordering
things. If you have a device_driver with no device it waits until a
device gets plugged in to call probe.

Jason
Shiraz Saleem May 10, 2019, 1:31 p.m. UTC | #8
On Wed, Mar 13, 2019 at 07:28:41AM -0600, Jason Gunthorpe wrote:
> 
> > > Register a device driver to the driver core and wait for the driver
> > > core to call that driver's probe method.
> > 
> > Yes, the LAN PF driver is the software component exposing and managing the
> > bus, so it is the one who will call probe/remove of the peer driver (RDMA
> > driver).  Although netdev notifiers based approach is needed if the RDMA
> > driver was loaded first before the LAN PF driver (i40e or ice) is loaded.
> 
> Why would notifiers be needed? Driver core handles all these ordering
> things. If you have a device_driver with no device it waits until a
> device gets plugged in to call probe.
> 

Hi Jason - Your feedback here is much appreciated and we have revisited our design based on it.
The platform driver/device model is a good fit for us with the addition of RDMA capable devices
to the virtual platform bus. Here are the highlights of the design and how they address your concerns.

(1) The irdma driver registers itself as a platform driver with its own probe()/remove() routines
    (see the sketch below). It will support RDMA capable platform devices from different Intel HW generations.
(2) The Intel net driver will register RDMA capable devices on the platform bus.
(3) Exposing a virtual bus type in the netdev driver is redundant and thus removed.
    Additionally, it would require the bus object to be exported in order for irdma to register,
    which doesn't allow irdma to be unified.
(4) In the irdma probe(), we are able to reach each platform device's associated net-specific
    data, including the netdev.
(5) There are no ordering dependencies between the net driver and irdma since binding is managed
    by the driver core as you stated. Listening to netdev notifiers for attachment is no longer
    required and thus removed.
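
A rough sketch of (1), (2) and (4), with illustrative names only (irdma_remove()
and struct irdma_peer_info are placeholders; the real changes are in the POC
commits below):

/* (1) irdma registers as a platform driver; the core calls irdma_probe()
 * for every matching device, regardless of module load order (5) */
static int irdma_probe(struct platform_device *pdev)
{
	/* (4) net-specific data handed over by the LAN driver */
	struct irdma_peer_info *info = dev_get_platdata(&pdev->dev);
	struct net_device *netdev = info->netdev;

	/* ... bring up the RDMA device on top of this PF/netdev ... */
	return 0;
}

static struct platform_driver irdma_pdriver = {
	.probe  = irdma_probe,
	.remove = irdma_remove,
	.driver = { .name = "intel_rdma" },
};
module_platform_driver(irdma_pdriver);

/* (2) net driver side (ice/i40e): register an RDMA capable device;
 * the platform data is copied by the core */
static struct platform_device *
ice_add_rdma_pdev(struct pci_dev *pdev, struct irdma_peer_info *info)
{
	return platform_device_register_data(&pdev->dev, "intel_rdma",
					     PLATFORM_DEVID_AUTO,
					     info, sizeof(*info));
}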

We did a proof-of-concept of this revised design with 'irdma' and 'ice'.

The last 2 commits on github contain the specific changes to the 2 drivers to migrate to the new model.

https://github.com/shirazsaleem/linux-rdma/commits/poc-irdma-platform-driver
eba0979 ("RDMA/irdma: Register irdma as a platform driver")
32a7dea ("ice: Register RDMA peer devices to the virtual platform bus")

Thoughts?

Shiraz
Jason Gunthorpe May 10, 2019, 6:17 p.m. UTC | #9
On Fri, May 10, 2019 at 08:31:02AM -0500, Shiraz Saleem wrote:
> On Wed, Mar 13, 2019 at 07:28:41AM -0600, Jason Gunthorpe wrote:
> > 
> > > > Register a device driver to the driver core and wait for the driver
> > > > core to call that driver's probe method.
> > > 
> > > Yes, the LAN PF driver is the software component exposing and managing the
> > > bus, so it is the one who will call probe/remove of the peer driver (RDMA
> > > driver).  Although netdev notifiers based approach is needed if the RDMA
> > > driver was loaded first before the LAN PF driver (i40e or ice) is loaded.
> > 
> > Why would notifiers be needed? Driver core handles all these ordering
> > things. If you have a device_driver with no device it waits until a
> > device gets plugged in to call probe.
> > 
> 
> Hi Jason - Your feedback here is much appreciated and we have revisited our design based on it.
> The platform driver/device model is a good fit for us with the addition of RDMA capable devices
> to the virtual platform bus. Here are the highlights of design and how they address your concerns.
> 
> (1) irdma driver registers itself as a platform driver with its own probe()/remove() routines.
>     It will support RDMA capable platform devices from different Intel HW generations. 
> (2) The intel net driver will register RDMA capable devices on the platform bus.
> (3) Exposing a virtual bus type in the netdev driver is redundant and thus removed.
>     Additionally, it would require the bus object to be exported in order for irdma to register,
>     which doesnt allow irdma to be unified. 
> (4) In irdma bus probe(), we are able to reach each platform dev's associated net-specific
>     data including the netdev. 
> (5) There are no ordering dependencies between net-driver and irdma since it's managed by driver
>     core as you stated. Listening to netdev notifiers for attachment is no longer required and
>     thus removed.

This sounds about right, but you will want to run these details by
Greg KH. I think he will tell you to use the multi-function device
stuff, not a platform device.

Jason

Patch

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 8de9085..7bc6316 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -795,6 +795,7 @@  struct i40e_vsi {
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
+	struct idc_srv_provider prov_callbacks;
 	struct i40e_vsi *vsi;
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.h b/drivers/net/ethernet/intel/i40e/i40e_client.h
index 72994ba..70ddb76 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.h
@@ -44,6 +44,16 @@  enum i40e_client_instance_state {
 #define I40E_QUEUE_TYPE_PE_AEQ  0x80
 #define I40E_QUEUE_INVALID_IDX	0xFFFF
 
+#define IDC_SIGNATURE 0x494e54454c494443ULL	/* INTELIDC */
+struct idc_srv_provider {
+	u64 signature;
+	u16 maj_ver;
+	u16 min_ver;
+	u8 rsvd[4];
+	int (*reg_peer_driver)(struct i40e_client *client);
+	int (*unreg_peer_driver)(struct i40e_client *client);
+};
+
 struct i40e_qv_info {
 	u32 v_idx; /* msix_vector */
 	u16 ceq_idx;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f52e2c4..114ff0e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12243,6 +12243,13 @@  static int i40e_config_netdev(struct i40e_vsi *vsi)
 	np = netdev_priv(netdev);
 	np->vsi = vsi;
 
+	np->prov_callbacks.signature = IDC_SIGNATURE;
+	np->prov_callbacks.maj_ver = I40E_CLIENT_VERSION_MAJOR;
+	np->prov_callbacks.min_ver = I40E_CLIENT_VERSION_MINOR;
+	memset(np->prov_callbacks.rsvd, 0, sizeof(np->prov_callbacks.rsvd));
+	np->prov_callbacks.reg_peer_driver = i40e_register_client;
+	np->prov_callbacks.unreg_peer_driver = i40e_unregister_client;
+
 	hw_enc_features = NETIF_F_SG			|
 			  NETIF_F_IP_CSUM		|
 			  NETIF_F_IPV6_CSUM		|