| Message ID | 20210909151223.572918-1-david.m.ertman@intel.com (mailing list archive) |
|---|---|
| State | Accepted |
| Delegated to: | Netdev Maintainers |
| Series | [RESEND,net] ice: Correctly deal with PFs that do not support RDMA |
| Context | Check | Description |
|---|---|---|
| netdev/cover_letter | success | |
| netdev/fixes_present | success | |
| netdev/patch_count | success | |
| netdev/tree_selection | success | Clearly marked for net |
| netdev/subject_prefix | success | |
| netdev/cc_maintainers | success | CCed 8 of 8 maintainers |
| netdev/source_inline | success | Was 0 now: 0 |
| netdev/verify_signedoff | success | |
| netdev/module_param | success | Was 0 now: 0 |
| netdev/build_32bit | success | Errors and warnings before: 2 this patch: 0 |
| netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
| netdev/verify_fixes | success | |
| netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 25 lines checked |
| netdev/build_allmodconfig_warn | success | Errors and warnings before: 2 this patch: 0 |
| netdev/header_inline | success | |
On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> There are two cases where the current PF does not support RDMA
> functionality. The first is if the NVM loaded on the device is set
> to not support RDMA (common_caps.rdma is false). The second is if
> the kernel bonding driver has included the current PF in an active
> link aggregate.
>
> When the driver has determined that this PF does not support RDMA, then
> auxiliary devices should not be created on the auxiliary bus.

This part is wrong; auxiliary devices should always be created. In your
case there will be one eth device only, without an extra irdma device.

Your "bug" is that you mixed auxiliary bus devices with "regular" ones
and created the eth device not as an auxiliary one. This is why you are
calling auxiliary_device_init() for RDMA only and falling back to
non-auxiliary mode.

I hope that this is a simple mistake made while Intel folks rushed to
merge irdma, and not a deliberate decision to find a way to support
out-of-tree drivers.

As a reminder, the whole idea of the auxiliary bus is to have small,
independent vendor driver core logic that manages capabilities and,
based on that, creates/removes sub-devices (eth, rdma, vdpa, ...), so
the driver core can properly load/unload their respective drivers.

Thanks
> Subject: Re: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> support RDMA
>
> On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > There are two cases where the current PF does not support RDMA
> > functionality. [...]
> >
> > When the driver has determined that this PF does not support RDMA,
> > then auxiliary devices should not be created on the auxiliary bus.
>
> This part is wrong, auxiliary devices should always be created, in your
> case it will be one eth device only without extra irdma device.

It is worth considering having an eth aux device/driver, but is it a
hard-and-fast rule? In this case, the RDMA-capable PCI network device
spawns an auxiliary device for RDMA, and the core driver is a network
driver.

> Your "bug" is that you mixed auxiliary bus devices with "regular" ones
> and created eth device not as auxiliary one. This is why you are
> calling to auxiliary_device_init() for RDMA only and fallback to
> non-auxiliary mode.

It's a design choice on how you carve out function(s) off your PCI core
device to be managed by auxiliary driver(s), and not a bug.

Shiraz
> -----Original Message-----
> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Monday, September 13, 2021 8:50 AM
> To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> <david.m.ertman@intel.com>
> Cc: davem@davemloft.net; kuba@kernel.org; yongxin.liu@windriver.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; intel-wired-lan@lists.osuosl.org;
> linux-rdma@vger.kernel.org; jgg@ziepe.ca; Williams, Dan J
> <dan.j.williams@intel.com>; Singhai, Anjali <anjali.singhai@intel.com>;
> Parikh, Neerav <neerav.parikh@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: RE: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> support RDMA
>
> > On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > > There are two cases where the current PF does not support RDMA
> > > functionality. [...]
> >
> > This part is wrong, auxiliary devices should always be created, in
> > your case it will be one eth device only without extra irdma device.
>
> It is worth considering having an eth aux device/driver but is it a
> hard-and-fast rule? In this case, the RDMA-capable PCI network device
> spawns an auxiliary device for RDMA and the core driver is a network
> driver.
>
> > Your "bug" is that you mixed auxiliary bus devices with "regular"
> > ones and created eth device not as auxiliary one. This is why you
> > are calling to auxiliary_device_init() for RDMA only and fallback to
> > non-auxiliary mode.
>
> It's a design choice on how you carve out function(s) off your PCI
> core device to be managed by auxiliary driver(s) and not a bug.
>
> Shiraz

Also, regardless of whether netdev functionality is carved out into an
auxiliary device or not, this code would still be necessary.

We don't want to carve out an auxiliary device to support a
functionality that the base PCI device does not support. Not having the
RDMA auxiliary device for an auxiliary driver to bind to is how we
differentiate between devices that support RDMA and those that don't.

Thanks,
DaveE
On Mon, Sep 13, 2021 at 03:49:43PM +0000, Saleem, Shiraz wrote:
> > On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > > There are two cases where the current PF does not support RDMA
> > > functionality. [...]
> >
> > This part is wrong, auxiliary devices should always be created, in
> > your case it will be one eth device only without extra irdma device.
>
> It is worth considering having an eth aux device/driver but is it a
> hard-and-fast rule? In this case, the RDMA-capable PCI network device
> spawns an auxiliary device for RDMA and the core driver is a network
> driver.
>
> > Your "bug" is that you mixed auxiliary bus devices with "regular"
> > ones and created eth device not as auxiliary one. This is why you
> > are calling to auxiliary_device_init() for RDMA only and fallback to
> > non-auxiliary mode.
>
> It's a design choice on how you carve out function(s) off your PCI
> core device to be managed by auxiliary driver(s) and not a bug.

I'm not the one who is setting rules, just explaining what is wrong
with the current design and proposed solution.

The driver/core design expects three building blocks: logic that
enumerates (creates) devices, a bus that connects those devices
(loads/unloads drivers), and specific drivers for every such device.
Such separation allows a clean view from a locking perspective
(separated devices), a proper sysfs layout, and the same logic for the
user space tools.

In your case, you connected the ethernet driver to be the "enumerator"
and replaced (duplicated) the general driver/core logic — which decides
whether to load an auxiliary device driver — with your custom code.

Thanks
On Mon, Sep 13, 2021 at 04:07:28PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Sent: Monday, September 13, 2021 8:50 AM
> > [...]
> > Subject: RE: [PATCH RESEND net] ice: Correctly deal with PFs that do
> > not support RDMA
> >
> > [...]
> >
> > It's a design choice on how you carve out function(s) off your PCI
> > core device to be managed by auxiliary driver(s) and not a bug.
> >
> > Shiraz
>
> Also, regardless of whether netdev functionality is carved out into an
> auxiliary device or not, this code would still be necessary.

Right

> We don't want to carve out an auxiliary device to support a
> functionality that the base PCI device does not support. Not having
> the RDMA auxiliary device for an auxiliary driver to bind to is how we
> differentiate between devices that support RDMA and those that don't.

This is right too. My complaint is that you mixed the enumerator logic
with the eth driver and create the auxiliary bus device only if your
RDMA device exists. It is wrong.

Thanks

> Thanks,
> DaveE
On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> There are two cases where the current PF does not support RDMA
> functionality. The first is if the NVM loaded on the device is set
> to not support RDMA (common_caps.rdma is false). The second is if
> the kernel bonding driver has included the current PF in an active
> link aggregate.
>
> When the driver has determined that this PF does not support RDMA, then
> auxiliary devices should not be created on the auxiliary bus. Without
> a device on the auxiliary bus, even if the irdma driver is present, there
> will be no RDMA activity attempted on this PF.
>
> Currently, in the reset flow, an attempt to create auxiliary devices is
> performed without regard to the ability of the PF. There needs to be a
> check in ice_aux_plug_dev (as the central point that creates auxiliary
> devices) to see if the PF is in a state to support the functionality.
>
> When disabling and re-enabling RDMA due to the inclusion/removal of the PF
> in a link aggregate, we also need to set/clear the bit which controls
> auxiliary device creation so that a reset recovery in a link aggregate
> situation doesn't try to create auxiliary devices when it shouldn't.
>
> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
> Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h     | 2 ++
>  drivers/net/ethernet/intel/ice/ice_idc.c | 6 ++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index eadcb9958346..3c4f08d20414 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -695,6 +695,7 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
>  {
>  	if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
>  		set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
> +		set_bit(ICE_FLAG_AUX_ENA, pf->flags);
>  		ice_plug_aux_dev(pf);

I agree with Leon, there shouldn't be a flag for "aux ena". aux is
enabled when a device on the aux bus is required. It should all be
"rdma ena", which already seems to have a bit.

The only existing place that uses aux_ena immediately calls

	err = ice_init_rdma(pf);

So I'd just delete the whole thing and use rdma_ena.

Frankly it looks structured confusingly; the mlx implementation is
better, where this is one function that synchronizes the aux bus with
the current state of the driver - adding/removing as required.

Jason
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index eadcb9958346..3c4f08d20414 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -695,6 +695,7 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
 {
 	if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
 		set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+		set_bit(ICE_FLAG_AUX_ENA, pf->flags);
 		ice_plug_aux_dev(pf);
 	}
 }
@@ -707,5 +708,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
 	ice_unplug_aux_dev(pf);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+	clear_bit(ICE_FLAG_AUX_ENA, pf->flags);
 }
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 1f2afdf6cd48..adcc9a251595 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -271,6 +271,12 @@ int ice_plug_aux_dev(struct ice_pf *pf)
 	struct auxiliary_device *adev;
 	int ret;
 
+	/* if this PF doesn't support a technology that requires auxiliary
+	 * devices, then gracefully exit
+	 */
+	if (!ice_is_aux_ena(pf))
+		return 0;
+
 	iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
 	if (!iadev)
 		return -ENOMEM;