Message ID | 20221114125755.13659-14-michal.swiatkowski@linux.intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | resource management using devlink reload | expand |
Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote: >From: Michal Kubiak <michal.kubiak@intel.com> > >Implement devlink resource to control how many MSI-X vectors are >used for eth, VF and RDMA. Show misc MSI-X as read only. > >This is first approach to control the mix of resources managed >by ice driver. This commit registers number of available MSI-X >as devlink resource and also add specific resources for eth, vf and RDMA. > >Also, make those resources generic. > >$ devlink resource show pci/0000:31:00.0 > name msix size 1024 occ 172 unit entry dpipe_tables none So, 1024 is the total vector count available in your hw? > resources: > name msix_misc size 4 unit entry dpipe_tables none What's misc? Why you don't show occupancy for it? Yet, it seems to be accounted in the total (172) Also, drop the "msix_" prefix from all, you already have parent called "msix". > name msix_eth size 92 occ 92 unit entry size_min 1 size_max Why "size_min is not 0 here? > 128 size_gran 1 dpipe_tables none > name msix_vf size 128 occ 0 unit entry size_min 0 size_max > 1020 size_gran 1 dpipe_tables none > name msix_rdma size 76 occ 76 unit entry size_min 0 size_max Okay, this means that for eth and rdma, the vectors are fully used, no VF is instanciated? > 132 size_gran 1 dpipe_tables none > >example commands: >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16 >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512 >$ devlink dev reload pci/0000:31:00.0
On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote: > Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote: > >From: Michal Kubiak <michal.kubiak@intel.com> > > > >Implement devlink resource to control how many MSI-X vectors are > >used for eth, VF and RDMA. Show misc MSI-X as read only. > > > >This is first approach to control the mix of resources managed > >by ice driver. This commit registers number of available MSI-X > >as devlink resource and also add specific resources for eth, vf and RDMA. > > > >Also, make those resources generic. > > > >$ devlink resource show pci/0000:31:00.0 > > name msix size 1024 occ 172 unit entry dpipe_tables none > > > So, 1024 is the total vector count available in your hw? > For this particular device and physical function, yes. > > > resources: > > name msix_misc size 4 unit entry dpipe_tables none > > What's misc? Why you don't show occupancy for it? Yet, it seems to be > accounted in the total (172) > > Also, drop the "msix_" prefix from all, you already have parent called > "msix". misc interrupts are for miscellaneous purposes like communication with Firmware or other control plane interrupts (if any). > > > > name msix_eth size 92 occ 92 unit entry size_min 1 size_max > > Why "size_min is not 0 here? Thanks, actually 0 would mean disable the eth, default, netdev at all. It could be done, however not implemented in this patchset. But for cases when the default port is not needed at all, it seems like a good idea. > > > > 128 size_gran 1 dpipe_tables none > > name msix_vf size 128 occ 0 unit entry size_min 0 size_max > > 1020 size_gran 1 dpipe_tables none > > name msix_rdma size 76 occ 76 unit entry size_min 0 size_max > > Okay, this means that for eth and rdma, the vectors are fully used, no > VF is instanciated? Yes, in this driver implementation, both eth and rdma will most probably be always fully utilized, but the moment you change the size and execute `devlink reload` then they will reconfigure with new values. The VF allocation here is the maximum number of interrupt vectors that can be assigned to actually created VFs. If so, then occ shows how many are actually utilized by the VFs. > > > > > 132 size_gran 1 dpipe_tables none > > > >example commands: > >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16 > >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512 > >$ devlink dev reload pci/0000:31:00.0
On Mon, Nov 14, 2022 at 05:03:43PM +0100, Piotr Raczynski wrote: > On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote: > > Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote: > > >From: Michal Kubiak <michal.kubiak@intel.com> > > > > > >Implement devlink resource to control how many MSI-X vectors are > > >used for eth, VF and RDMA. Show misc MSI-X as read only. > > > > > >This is first approach to control the mix of resources managed > > >by ice driver. This commit registers number of available MSI-X > > >as devlink resource and also add specific resources for eth, vf and RDMA. > > > > > >Also, make those resources generic. > > > > > >$ devlink resource show pci/0000:31:00.0 > > > name msix size 1024 occ 172 unit entry dpipe_tables none > > > > > > So, 1024 is the total vector count available in your hw? > > > > For this particular device and physical function, yes. > > > > > > > resources: > > > name msix_misc size 4 unit entry dpipe_tables none > > > > What's misc? Why you don't show occupancy for it? Yet, it seems to be > > accounted in the total (172) > > > > Also, drop the "msix_" prefix from all, you already have parent called > > "msix". > > misc interrupts are for miscellaneous purposes like communication with > Firmware or other control plane interrupts (if any). > I will drop msix_ prefix. I didn't show the occupancy because it is the same all the time and user can't change it. But You are righ it is accounted, so I will add occupancy also here in next version. > > > > > > > name msix_eth size 92 occ 92 unit entry size_min 1 size_max > > > > Why "size_min is not 0 here? > > Thanks, actually 0 would mean disable the eth, default, netdev at all. > It could be done, however not implemented in this patchset. But for > cases when the default port is not needed at all, it seems like a good > idea. > I will try to do it in next version, thanks. > > > > > > > 128 size_gran 1 dpipe_tables none > > > name msix_vf size 128 occ 0 unit entry size_min 0 size_max > > > 1020 size_gran 1 dpipe_tables none > > > name msix_rdma size 76 occ 76 unit entry size_min 0 size_max > > > > Okay, this means that for eth and rdma, the vectors are fully used, no > > VF is instanciated? > > Yes, in this driver implementation, both eth and rdma will most probably > be always fully utilized, but the moment you change the size and execute > `devlink reload` then they will reconfigure with new values. > > The VF allocation here is the maximum number of interrupt vectors that > can be assigned to actually created VFs. If so, then occ shows how many > are actually utilized by the VFs. > > > > > > > > > > 132 size_gran 1 dpipe_tables none > > > > > >example commands: > > >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16 > > >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512 > > >$ devlink dev reload pci/0000:31:00.0
Mon, Nov 14, 2022 at 05:03:43PM CET, piotr.raczynski@intel.com wrote: >On Mon, Nov 14, 2022 at 04:28:38PM +0100, Jiri Pirko wrote: >> Mon, Nov 14, 2022 at 01:57:55PM CET, michal.swiatkowski@linux.intel.com wrote: >> >From: Michal Kubiak <michal.kubiak@intel.com> >> > >> >Implement devlink resource to control how many MSI-X vectors are >> >used for eth, VF and RDMA. Show misc MSI-X as read only. >> > >> >This is first approach to control the mix of resources managed >> >by ice driver. This commit registers number of available MSI-X >> >as devlink resource and also add specific resources for eth, vf and RDMA. >> > >> >Also, make those resources generic. >> > >> >$ devlink resource show pci/0000:31:00.0 >> > name msix size 1024 occ 172 unit entry dpipe_tables none >> >> >> So, 1024 is the total vector count available in your hw? >> > >For this particular device and physical function, yes. > > >> >> > resources: >> > name msix_misc size 4 unit entry dpipe_tables none >> >> What's misc? Why you don't show occupancy for it? Yet, it seems to be >> accounted in the total (172) >> >> Also, drop the "msix_" prefix from all, you already have parent called >> "msix". > >misc interrupts are for miscellaneous purposes like communication with >Firmware or other control plane interrupts (if any). > >> >> >> > name msix_eth size 92 occ 92 unit entry size_min 1 size_max >> >> Why "size_min is not 0 here? > >Thanks, actually 0 would mean disable the eth, default, netdev at all. >It could be done, however not implemented in this patchset. But for >cases when the default port is not needed at all, it seems like a good >idea. So the behaviour you describe is, if you set number of vectors to 0, the driver won't instantiate netdev ? That sounds quite odd. In that case, the minimal value should really be 1. The same goes to rdma. > >> >> >> > 128 size_gran 1 dpipe_tables none >> > name msix_vf size 128 occ 0 unit entry size_min 0 size_max >> > 1020 size_gran 1 dpipe_tables none >> > name msix_rdma size 76 occ 76 unit entry size_min 0 size_max >> >> Okay, this means that for eth and rdma, the vectors are fully used, no >> VF is instanciated? > >Yes, in this driver implementation, both eth and rdma will most probably >be always fully utilized, but the moment you change the size and execute >`devlink reload` then they will reconfigure with new values. > >The VF allocation here is the maximum number of interrupt vectors that >can be assigned to actually created VFs. If so, then occ shows how many >are actually utilized by the VFs. > >> >> >> >> > 132 size_gran 1 dpipe_tables none >> > >> >example commands: >> >$ devlink resource set pci/0000:31:00.0 path msix/msix_eth size 16 >> >$ devlink resource set pci/0000:31:00.0 path msix/msix_vf size 512 >> >$ devlink dev reload pci/0000:31:00.0
diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst index 3d5ae51e65a2..dcd3d124ebfb 100644 --- a/Documentation/networking/devlink/devlink-resource.rst +++ b/Documentation/networking/devlink/devlink-resource.rst @@ -36,6 +36,16 @@ device drivers and their description must be added to the following table: - Description * - ``physical_ports`` - A limited capacity of physical ports that the switch ASIC can support + * - ``msix`` + - Amount of MSIX interrupt vectors that the device can support + * - ``msix_misc`` + - Amount of MSIX interrupt vectors that can be used for miscellaneous purposes + * - ``msix_eth`` + - Amount of MSIX interrupt vectors that can be used by ethernet + * - ``msix_vf`` + - Amount of MSIX interrupt vectors that can be used by virtual functions + * - ``msix_rdma`` + - Amount of MSIX interrupt vectors that can be used by RDMA example usage ------------- diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c index 8f73c4008a56..00f01cf84812 100644 --- a/drivers/net/ethernet/intel/ice/ice_devlink.c +++ b/drivers/net/ethernet/intel/ice/ice_devlink.c @@ -496,6 +496,27 @@ ice_devlink_reload_empr_finish(struct ice_pf *pf, return 0; } +static void ice_devlink_read_resources_size(struct ice_pf *pf) +{ + struct devlink *devlink = priv_to_devlink(pf); + u64 size_new; + + devl_resource_size_get(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH, + &size_new); + pf->req_msix.eth = size_new; + + devl_resource_size_get(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF, + &size_new); + pf->msix.vf = size_new; + + devl_resource_size_get(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA, + &size_new); + pf->req_msix.rdma = size_new; +} + /** * ice_devlink_port_opt_speed_str - convert speed to a string * @speed: speed value @@ -761,6 +782,7 @@ ice_devlink_reload_up(struct devlink *devlink, switch (action) { case DEVLINK_RELOAD_ACTION_DRIVER_REINIT: *actions_performed = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT); + ice_devlink_read_resources_size(pf); return ice_load(pf); case DEVLINK_RELOAD_ACTION_FW_ACTIVATE: *actions_performed = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE); @@ -955,6 +977,144 @@ void ice_devlink_unregister(struct ice_pf *pf) devlink_unregister(priv_to_devlink(pf)); } +static u64 ice_devlink_res_msix_pf_occ_get(void *priv) +{ + struct ice_pf *pf = priv; + + return pf->msix.eth; +} + +static u64 ice_devlink_res_msix_vf_occ_get(void *priv) +{ + struct ice_pf *pf = priv; + + return pf->vfs.num_msix_per * ice_get_num_vfs(pf); +} + +static u64 ice_devlink_res_msix_rdma_occ_get(void *priv) +{ + struct ice_pf *pf = priv; + + return pf->msix.rdma; +} + +static u64 ice_devlink_res_msix_occ_get(void *priv) +{ + struct ice_pf *pf = priv; + + return ice_devlink_res_msix_pf_occ_get(priv) + + ice_devlink_res_msix_vf_occ_get(priv) + + ice_devlink_res_msix_rdma_occ_get(priv) + + pf->msix.misc; +} + +int ice_devlink_register_resources(struct ice_pf *pf) +{ + struct devlink *devlink = priv_to_devlink(pf); + struct devlink_resource_size_params params; + struct device *dev = ice_pf_to_dev(pf); + struct ice_msix *req = &pf->req_msix; + int err, max_pf_msix, max_rdma_msix; + const char *res_name; + + max_pf_msix = min_t(int, num_online_cpus(), ICE_MAX_LG_RSS_QS); + max_rdma_msix = max_pf_msix + ICE_RDMA_NUM_AEQ_MSIX; + + devlink_resource_size_params_init(¶ms, req->all, req->all, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX; + err = devlink_resource_register(devlink, res_name, req->all, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + DEVLINK_RESOURCE_ID_PARENT_TOP, + ¶ms); + if (err) + goto res_create_err; + + devlink_resource_size_params_init(¶ms, req->misc, req->misc, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_MISC; + err = devlink_resource_register(devlink, res_name, req->misc, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_MISC, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + ¶ms); + if (err) + goto res_create_err; + + devlink_resource_size_params_init(¶ms, ICE_MIN_LAN_TXRX_MSIX, + max_pf_msix, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_ETH; + err = devlink_resource_register(devlink, res_name, req->eth, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + ¶ms); + if (err) + goto res_create_err; + + devlink_resource_size_params_init(¶ms, 0, req->all - req->misc, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + + res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_VF; + err = devlink_resource_register(devlink, res_name, req->vf, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + ¶ms); + if (err) + goto res_create_err; + + devlink_resource_size_params_init(¶ms, ICE_MIN_RDMA_MSIX, + max_rdma_msix, 1, + DEVLINK_RESOURCE_UNIT_ENTRY); + + res_name = DEVLINK_RESOURCE_GENERIC_NAME_MSIX_RDMA; + err = devlink_resource_register(devlink, res_name, req->rdma, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + ¶ms); + if (err) + goto res_create_err; + + devlink_resource_occ_get_register(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX, + ice_devlink_res_msix_occ_get, pf); + + devlink_resource_occ_get_register(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH, + ice_devlink_res_msix_pf_occ_get, pf); + + devlink_resource_occ_get_register(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF, + ice_devlink_res_msix_vf_occ_get, pf); + + devlink_resource_occ_get_register(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA, + ice_devlink_res_msix_rdma_occ_get, + pf); + return 0; + +res_create_err: + dev_err(dev, "Failed to register devlink resource: %s error: %pe\n", + res_name, ERR_PTR(err)); + devlink_resources_unregister(devlink); + + return err; +} + +void ice_devlink_unregister_resources(struct ice_pf *pf) +{ + struct devlink *devlink = priv_to_devlink(pf); + + devlink_resource_occ_get_unregister(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX); + devlink_resource_occ_get_unregister(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH); + devlink_resource_occ_get_unregister(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF); + devlink_resource_occ_get_unregister(devlink, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA); + devlink_resources_unregister(devlink); +} + /** * ice_devlink_set_switch_id - Set unique switch id based on pci dsn * @pf: the PF to create a devlink port for diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h index fe006d9946f8..35233ea087fc 100644 --- a/drivers/net/ethernet/intel/ice/ice_devlink.h +++ b/drivers/net/ethernet/intel/ice/ice_devlink.h @@ -10,6 +10,8 @@ void ice_devlink_register(struct ice_pf *pf); void ice_devlink_unregister(struct ice_pf *pf); int ice_devlink_register_params(struct ice_pf *pf); void ice_devlink_unregister_params(struct ice_pf *pf); +int ice_devlink_register_resources(struct ice_pf *pf); +void ice_devlink_unregister_resources(struct ice_pf *pf); int ice_devlink_create_pf_port(struct ice_pf *pf); void ice_devlink_destroy_pf_port(struct ice_pf *pf); int ice_devlink_create_vf_port(struct ice_vf *vf); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index a4c283bc8da0..ba6f93e7e640 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4970,6 +4970,10 @@ static int ice_init_devlink(struct ice_pf *pf) if (err) return err; + err = ice_devlink_register_resources(pf); + if (err) + return err; + ice_devlink_init_regions(pf); ice_devlink_register(pf); @@ -4980,6 +4984,7 @@ static void ice_deinit_devlink(struct ice_pf *pf) { ice_devlink_unregister(pf); ice_devlink_destroy_regions(pf); + ice_devlink_unregister_resources(pf); ice_devlink_unregister_params(pf); } diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c index 3ba1408c56a9..d42df5983494 100644 --- a/drivers/net/ethernet/intel/ice/ice_sriov.c +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c @@ -518,8 +518,7 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs) return -ENOSPC; /* determine MSI-X resources per VF */ - msix_avail_for_sriov = pf->hw.func_caps.common_cap.num_msix_vectors - - pf->irq_tracker->num_entries; + msix_avail_for_sriov = pf->msix.vf; msix_avail_per_vf = msix_avail_for_sriov / num_vfs; if (msix_avail_per_vf >= ICE_NUM_VF_MSIX_MED) { num_msix_per_vf = ICE_NUM_VF_MSIX_MED; diff --git a/include/net/devlink.h b/include/net/devlink.h index 611a23a3deb2..a041c50f72c6 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -409,7 +409,21 @@ typedef u64 devlink_resource_occ_get_t(void *priv); #define DEVLINK_RESOURCE_ID_PARENT_TOP 0 +enum devlink_msix_resource_id { + /* generic resource for MSIX */ + DEVLINK_RESOURCE_GENERIC_ID_MSIX = 1, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_MISC, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_ETH, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_VF, + DEVLINK_RESOURCE_GENERIC_ID_MSIX_RDMA, +}; + #define DEVLINK_RESOURCE_GENERIC_NAME_PORTS "physical_ports" +#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX "msix" +#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_MISC "msix_misc" +#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_ETH "msix_eth" +#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_VF "msix_vf" +#define DEVLINK_RESOURCE_GENERIC_NAME_MSIX_RDMA "msix_rdma" #define __DEVLINK_PARAM_MAX_STRING_VALUE 32 enum devlink_param_type {