Message ID | 20210113234226.3638426-1-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next,1/1] ice: Improve MSI-X vector enablement fallback logic |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 4 maintainers not CCed: jeffrey.t.kirsher@intel.com anirudh.venkataramanan@intel.com jesse.brandeburg@intel.com intel-wired-lan@lists.osuosl.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns; WARNING: line length of 87 exceeds 80 columns; WARNING: line length of 89 exceeds 80 columns
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> From: Brett Creeley <brett.creeley@intel.com>
>
> The current MSI-X enablement logic either tries to enable best-case MSI-X vectors and if that fails we only support a bare-minimum set. This is not very flexible and the current logic is broken when actually allocating and reserving MSI-X in the driver. Fix this by improving the fall-back logic and fixing the allocation/reservation of MSI-X when the best-case MSI-X vectors are not received from the OS.
>
> The new fall-back logic is described below with each [#] being an attempt at enabling a certain number of MSI-X. If any of the steps succeed, then return the number of MSI-X enabled from ice_ena_msix_range(). If any of the attempts fail, then goto the next step.
>
> Attempt [0]: Enable the best-case scenario MSI-X vectors.
>
> Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced by a factor of 2 from the previous attempt (i.e. num_online_cpus() / 2).
>
> Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
>
> Attempt [3]: Enable the bare-minimum MSI-X vectors.
>
> Also, if the adjusted_base_msix ever hits the minimum required for LAN, then just set the needed MSI-X for that feature to the minimum (similar to attempt [3]).

I don't really get why you switch to this manual "exponential back-off" rather than continuing to use pci_enable_msix_range(), but fixing the capping to ICE_MIN_LAN_VECS.

> To fix the allocation/reservation of MSI-X, the PF VSI needs to take into account the pf->num_lan_msix available and only allocate up to that many MSI-X vectors. To do this, limit the number of Tx and Rx queues based on pf->num_lan_msix. This is done because we don't want more than 1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the number of MSI-X based on pf->num_lan_msix available.
>
> Also, prevent users from enabling more combined queues than there are MSI-X available via ethtool.

Right, this part sounds like a fix, and should be a separate patch against net, not net-next.

> Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On Thu, 2021-01-14 at 16:42 -0800, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 15:42:26 -0800 Tony Nguyen wrote:
> > From: Brett Creeley <brett.creeley@intel.com>
> >
> > The current MSI-X enablement logic either tries to enable best-case MSI-X vectors and if that fails we only support a bare-minimum set. This is not very flexible and the current logic is broken when actually allocating and reserving MSI-X in the driver. Fix this by improving the fall-back logic and fixing the allocation/reservation of MSI-X when the best-case MSI-X vectors are not received from the OS.
> >
> > The new fall-back logic is described below with each [#] being an attempt at enabling a certain number of MSI-X. If any of the steps succeed, then return the number of MSI-X enabled from ice_ena_msix_range(). If any of the attempts fail, then goto the next step.
> >
> > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> >
> > Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced by a factor of 2 from the previous attempt (i.e. num_online_cpus() / 2).
> >
> > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> >
> > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> >
> > Also, if the adjusted_base_msix ever hits the minimum required for LAN, then just set the needed MSI-X for that feature to the minimum (similar to attempt [3]).
>
> I don't really get why you switch to this manual "exponential back-off" rather than continuing to use pci_enable_msix_range(), but fixing the capping to ICE_MIN_LAN_VECS.

As per the current logic, if the driver does not get the number of MSI-X vectors it needs, it will immediately drop to "Do I have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.

This is a bit of an all-or-nothing type approach. There's a mid-ground that can allow more queues to be enabled (ex. driver asked for 300 vectors, but got 68 vectors, so enabled 64 data queues) and this patch implements the mid-ground logic.

This mid-ground logic can also be implemented based on the return value of pci_enable_msix_range() but IMHO the implementation in this patch using pci_enable_msix_exact is better because it's always only enabling/reserving as many MSI-X vectors as required, not more, not less.

> > To fix the allocation/reservation of MSI-X, the PF VSI needs to take into account the pf->num_lan_msix available and only allocate up to that many MSI-X vectors. To do this, limit the number of Tx and Rx queues based on pf->num_lan_msix. This is done because we don't want more than 1 Tx/Rx queue per interrupt due to performance concerns. Also, limit the number of MSI-X based on pf->num_lan_msix available.
> >
> > Also, prevent users from enabling more combined queues than there are MSI-X available via ethtool.
>
> Right, this part sounds like a fix, and should be a separate patch against net, not net-next.

ACK. We will do a different patch for this.

> > Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
> > Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> > Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > >
> > > Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced by a factor of 2 from the previous attempt (i.e. num_online_cpus() / 2).
> > >
> > > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > >
> > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > >
> > > Also, if the adjusted_base_msix ever hits the minimum required for LAN, then just set the needed MSI-X for that feature to the minimum (similar to attempt [3]).
> >
> > I don't really get why you switch to this manual "exponential back-off" rather than continuing to use pci_enable_msix_range(), but fixing the capping to ICE_MIN_LAN_VECS.
>
> As per the current logic, if the driver does not get the number of MSI-X vectors it needs, it will immediately drop to "Do I have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.
>
> This is a bit of an all-or-nothing type approach. There's a mid-ground that can allow more queues to be enabled (ex. driver asked for 300 vectors, but got 68 vectors, so enabled 64 data queues) and this patch implements the mid-ground logic.
>
> This mid-ground logic can also be implemented based on the return value of pci_enable_msix_range() but IMHO the implementation in this patch using pci_enable_msix_exact is better because it's always only enabling/reserving as many MSI-X vectors as required, not more, not less.

What do you mean by "required" in the last sentence? The driver requests num_online_cpus()-worth of IRQs, so it must work with any number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?
On Tue, 2021-01-19 at 16:41 -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 00:12:26 +0000 Venkataramanan, Anirudh wrote:
> > > > Attempt [0]: Enable the best-case scenario MSI-X vectors.
> > > >
> > > > Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced by a factor of 2 from the previous attempt (i.e. num_online_cpus() / 2).
> > > >
> > > > Attempt [2]: Same as attempt [1], except reduce by a factor of 4.
> > > >
> > > > Attempt [3]: Enable the bare-minimum MSI-X vectors.
> > > >
> > > > Also, if the adjusted_base_msix ever hits the minimum required for LAN, then just set the needed MSI-X for that feature to the minimum (similar to attempt [3]).
> > >
> > > I don't really get why you switch to this manual "exponential back-off" rather than continuing to use pci_enable_msix_range(), but fixing the capping to ICE_MIN_LAN_VECS.
> >
> > As per the current logic, if the driver does not get the number of MSI-X vectors it needs, it will immediately drop to "Do I have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.
> >
> > This is a bit of an all-or-nothing type approach. There's a mid-ground that can allow more queues to be enabled (ex. driver asked for 300 vectors, but got 68 vectors, so enabled 64 data queues) and this patch implements the mid-ground logic.
> >
> > This mid-ground logic can also be implemented based on the return value of pci_enable_msix_range() but IMHO the implementation in this patch using pci_enable_msix_exact is better because it's always only enabling/reserving as many MSI-X vectors as required, not more, not less.
>
> What do you mean by "required" in the last sentence?

.. as "required" in that particular iteration of the loop.

> The driver requests num_online_cpus()-worth of IRQs, so it must work with any number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?

Let me back up a bit here.

Ultimately, the issue we are trying to solve here is "what happens when the driver doesn't get as many MSI-X vectors as it needs, and how it's interpreted by the end user".

Let's say there are these two systems, each with 256 cores, but the response to pci_enable_msix_range() is different:

System 1: 256 cores, pci_enable_msix_range returns 75 vectors
System 2: 256 cores, pci_enable_msix_range returns 220 vectors

In this case, the number of queues the user would see enabled on each of these systems would be very different (73 on system 1 and 218 on system 2). This variability makes it difficult to define what the expected behavior should be, because it's not exactly obvious to the user how many free MSI-X vectors a given system has. Instead, if the driver reduced its demand for vectors in a well defined manner (num_cpus() / 1,2,4,8), the user visible difference between the two systems wouldn't be so drastic.

If this is plain wrong or if there's a preferred approach, I'd be happy to discuss further.

Ani
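For illustration only, here is a small standalone C program (not driver code) that applies the num_cpus()/1,2,4,8-style reduction with the ICE_MIN_LAN_MSIX floor to the two example systems above. The settle_lan_msix() helper and the simple "does the exact request fit" check are assumptions made for this sketch (the flow-director vector is ignored); with the figures from this mail, both systems settle on power-of-two LAN vector counts (64 and 128) rather than the arbitrary 73 and 218.

```c
/* Standalone sketch of the reduction scheme described above: try
 * num_cpus()/1, /2, /4, then fall back to the bare minimum, clamping the LAN
 * vector count at ICE_MIN_LAN_MSIX. The "available" numbers are only the
 * hypothetical figures from this mail (75 and 220 vectors on a 256-core box).
 */
#include <stdio.h>

#define ICE_MIN_LAN_MSIX 1
#define ICE_OICR_MSIX    1

/* LAN vector count the driver would settle on for a given amount of
 * available MSI-X, assuming an exact (all-or-nothing) request each attempt.
 */
static int settle_lan_msix(int num_cpus, int available)
{
	int divisor;

	for (divisor = 1; divisor <= 4; divisor *= 2) {
		int lan = num_cpus / divisor;

		if (lan < ICE_MIN_LAN_MSIX)
			lan = ICE_MIN_LAN_MSIX;
		if (lan + ICE_OICR_MSIX <= available)	/* exact request fits */
			return lan;
	}
	return ICE_MIN_LAN_MSIX;			/* bare-minimum attempt */
}

int main(void)
{
	/* two hypothetical 256-core systems with different MSI-X headroom */
	printf("system 1: %d LAN vectors\n", settle_lan_msix(256, 75));	/* 64 */
	printf("system 2: %d LAN vectors\n", settle_lan_msix(256, 128 + 92));	/* 128 */
	return 0;
}
```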
On Wed, 20 Jan 2021 02:13:36 +0000 Venkataramanan, Anirudh wrote:
> > > As per the current logic, if the driver does not get the number of MSI-X vectors it needs, it will immediately drop to "Do I have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.
> > >
> > > This is a bit of an all-or-nothing type approach. There's a mid-ground that can allow more queues to be enabled (ex. driver asked for 300 vectors, but got 68 vectors, so enabled 64 data queues) and this patch implements the mid-ground logic.
> > >
> > > This mid-ground logic can also be implemented based on the return value of pci_enable_msix_range() but IMHO the implementation in this patch using pci_enable_msix_exact is better because it's always only enabling/reserving as many MSI-X vectors as required, not more, not less.
> >
> > What do you mean by "required" in the last sentence?
>
> .. as "required" in that particular iteration of the loop.
>
> > The driver requests num_online_cpus()-worth of IRQs, so it must work with any number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?
>
> Let me back up a bit here.
>
> Ultimately, the issue we are trying to solve here is "what happens when the driver doesn't get as many MSI-X vectors as it needs, and how it's interpreted by the end user".
>
> Let's say there are these two systems, each with 256 cores, but the response to pci_enable_msix_range() is different:
>
> System 1: 256 cores, pci_enable_msix_range returns 75 vectors
> System 2: 256 cores, pci_enable_msix_range returns 220 vectors
>
> In this case, the number of queues the user would see enabled on each of these systems would be very different (73 on system 1 and 218 on system 2). This variability makes it difficult to define what the expected behavior should be, because it's not exactly obvious to the user how many free MSI-X vectors a given system has. Instead, if the driver reduced its demand for vectors in a well defined manner (num_cpus() / 1,2,4,8), the user visible difference between the two systems wouldn't be so drastic.
>
> If this is plain wrong or if there's a preferred approach, I'd be happy to discuss further.

Let's stick to the standard Linux way of handling IRQ exhaustion, and rely on pci_enable_msix_range() to pick the number. If the current behavior of pci_enable_msix_range() is not what users want, we can change it. Each driver creating its own heuristic is the worst of all choices, as most brownfield deployments will have a mix of NICs.
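For reference, a minimal sketch of the range-based flow being recommended here, modeled on the pre-patch ice_ena_msix_range() visible in the diff below. The function name ice_ena_msix_range_sketch() and the exact bookkeeping (clamping the LAN vector count to ICE_MIN_LAN_MSIX from the granted total) are illustrative assumptions, not the eventual follow-up patch: the key point is a single pci_enable_msix_range() call spanning [minimum, best case], with queues then sized from whatever the PCI core actually returns.

```c
/* Sketch (assumed names/details): build the best-case entry table, let
 * pci_enable_msix_range() return any count in [ICE_MIN_MSIX, v_budget],
 * and derive pf->num_lan_msix from the granted total.
 */
static int ice_ena_msix_range_sketch(struct ice_pf *pf)
{
	int v_budget, v_actual, i;

	/* best case: one LAN vector per CPU, plus OICR (and FDIR if enabled) */
	v_budget = num_online_cpus() + ICE_OICR_MSIX;
	if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
		v_budget += ICE_FDIR_MSIX;

	pf->msix_entries = devm_kcalloc(ice_pf_to_dev(pf), v_budget,
					sizeof(*pf->msix_entries), GFP_KERNEL);
	if (!pf->msix_entries)
		return -ENOMEM;

	for (i = 0; i < v_budget; i++)
		pf->msix_entries[i].entry = i;

	/* the PCI core picks any count in [ICE_MIN_MSIX, v_budget] */
	v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
					 ICE_MIN_MSIX, v_budget);
	if (v_actual < 0)
		return v_actual;

	/* whatever was granted beyond the non-LAN overhead drives the
	 * number of LAN vectors (and hence queues)
	 */
	pf->num_lan_msix = max_t(int, v_actual - (v_budget - num_online_cpus()),
				 ICE_MIN_LAN_MSIX);

	return v_actual;
}
```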
On Tue, 2021-01-19 at 18:29 -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 02:13:36 +0000 Venkataramanan, Anirudh wrote:
> > > > As per the current logic, if the driver does not get the number of MSI-X vectors it needs, it will immediately drop to "Do I have at least two (ICE_MIN_LAN_VECS) MSI-X vectors?". If yes, the driver will enable a single Tx/Rx traffic queue pair, bound to one of the two MSI-X vectors.
> > > >
> > > > This is a bit of an all-or-nothing type approach. There's a mid-ground that can allow more queues to be enabled (ex. driver asked for 300 vectors, but got 68 vectors, so enabled 64 data queues) and this patch implements the mid-ground logic.
> > > >
> > > > This mid-ground logic can also be implemented based on the return value of pci_enable_msix_range() but IMHO the implementation in this patch using pci_enable_msix_exact is better because it's always only enabling/reserving as many MSI-X vectors as required, not more, not less.
> > >
> > > What do you mean by "required" in the last sentence?
> >
> > .. as "required" in that particular iteration of the loop.
> >
> > > The driver requests num_online_cpus()-worth of IRQs, so it must work with any number of IRQs. Why is num_cpus() / 1,2,4,8 "required"?
> >
> > Let me back up a bit here.
> >
> > Ultimately, the issue we are trying to solve here is "what happens when the driver doesn't get as many MSI-X vectors as it needs, and how it's interpreted by the end user".
> >
> > Let's say there are these two systems, each with 256 cores, but the response to pci_enable_msix_range() is different:
> >
> > System 1: 256 cores, pci_enable_msix_range returns 75 vectors
> > System 2: 256 cores, pci_enable_msix_range returns 220 vectors
> >
> > In this case, the number of queues the user would see enabled on each of these systems would be very different (73 on system 1 and 218 on system 2). This variability makes it difficult to define what the expected behavior should be, because it's not exactly obvious to the user how many free MSI-X vectors a given system has. Instead, if the driver reduced its demand for vectors in a well defined manner (num_cpus() / 1,2,4,8), the user visible difference between the two systems wouldn't be so drastic.
> >
> > If this is plain wrong or if there's a preferred approach, I'd be happy to discuss further.
>
> Let's stick to the standard Linux way of handling IRQ exhaustion, and rely on pci_enable_msix_range() to pick the number. If the current behavior of pci_enable_msix_range() is not what users want, we can change it. Each driver creating its own heuristic is the worst of all choices, as most brownfield deployments will have a mix of NICs.

Okay, we will rework the fallback improvement logic. Just so you are aware, we will be posting a couple of bug-fix patches that fix issues around the current fallback logic.
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 56725356a17b..3cf6f2bdca41 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -68,8 +68,10 @@
 #define ICE_INT_NAME_STR_LEN	(IFNAMSIZ + 16)
 #define ICE_AQ_LEN		64
 #define ICE_MBXSQ_LEN		64
-#define ICE_MIN_MSIX		2
 #define ICE_FDIR_MSIX		1
+#define ICE_MIN_LAN_MSIX	1
+#define ICE_OICR_MSIX		1
+#define ICE_MIN_MSIX		(ICE_MIN_LAN_MSIX + ICE_OICR_MSIX)
 #define ICE_NO_VSI		0xffff
 #define ICE_VSI_MAP_CONTIG	0
 #define ICE_VSI_MAP_SCATTER	1
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 9e8e9531cd87..69c113a4de7e 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3258,8 +3258,8 @@ ice_set_rxfh(struct net_device *netdev, const u32 *indir, const u8 *key,
  */
 static int ice_get_max_txq(struct ice_pf *pf)
 {
-	return min_t(int, num_online_cpus(),
-		     pf->hw.func_caps.common_cap.num_txq);
+	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+		    (u16)pf->hw.func_caps.common_cap.num_txq);
 }
 
 /**
@@ -3268,8 +3268,8 @@ static int ice_get_max_txq(struct ice_pf *pf)
  */
 static int ice_get_max_rxq(struct ice_pf *pf)
 {
-	return min_t(int, num_online_cpus(),
-		     pf->hw.func_caps.common_cap.num_rxq);
+	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
+		    (u16)pf->hw.func_caps.common_cap.num_rxq);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 3df67486d42d..01da18ee17da 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -161,8 +161,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 
 	switch (vsi->type) {
 	case ICE_VSI_PF:
-		vsi->alloc_txq = min_t(int, ice_get_avail_txq_count(pf),
-				       num_online_cpus());
+		/* default to 1 Tx queue per MSI-X to not hurt our performance */
+		vsi->alloc_txq = min3(pf->num_lan_msix,
+				      ice_get_avail_txq_count(pf),
+				      (u16)num_online_cpus());
 		if (vsi->req_txq) {
 			vsi->alloc_txq = vsi->req_txq;
 			vsi->num_txq = vsi->req_txq;
@@ -174,8 +176,10 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 		if (!test_bit(ICE_FLAG_RSS_ENA, pf->flags)) {
 			vsi->alloc_rxq = 1;
 		} else {
-			vsi->alloc_rxq = min_t(int, ice_get_avail_rxq_count(pf),
-					       num_online_cpus());
+			/* default to 1 Rx queue per MSI-X to not hurt our performance */
+			vsi->alloc_rxq = min3(pf->num_lan_msix,
+					      ice_get_avail_rxq_count(pf),
+					      (u16)num_online_cpus());
 			if (vsi->req_rxq) {
 				vsi->alloc_rxq = vsi->req_rxq;
 				vsi->num_rxq = vsi->req_rxq;
@@ -184,7 +188,8 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi, u16 vf_id)
 
 		pf->num_lan_rx = vsi->alloc_rxq;
 
-		vsi->num_q_vectors = max_t(int, vsi->alloc_rxq, vsi->alloc_txq);
+		vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
+					   max_t(int, vsi->alloc_rxq, vsi->alloc_txq));
 		break;
 	case ICE_VSI_VF:
 		vf = &pf->vf[vsi->vf_id];
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6e251dfffc91..040e868a0637 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3367,95 +3367,105 @@ static int ice_init_pf(struct ice_pf *pf)
 	return 0;
 }
 
+static int ice_alloc_msix_entries(struct ice_pf *pf, u16 num_entries)
+{
+	u16 i;
+
+	pf->msix_entries = devm_kcalloc(ice_pf_to_dev(pf), num_entries,
+					sizeof(*pf->msix_entries), GFP_KERNEL);
+	if (!pf->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++)
+		pf->msix_entries[i].entry = i;
+
+	return 0;
+}
+
+static void ice_free_msix_entries(struct ice_pf *pf)
+{
+	devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
+	pf->msix_entries = NULL;
+}
+
 /**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
+ * ice_ena_msix_range - request a range of MSI-X vectors from the OS
  * @pf: board private structure
  *
- * compute the number of MSIX vectors required (v_budget) and request from
- * the OS. Return the number of vectors reserved or negative on failure
+ * The driver first tries to enable best-case scenario MSI-X vectors. If that
+ * doesn't succeed then a fall-back method is employed.
+ *
+ * The fall-back logic is described below with each [#] being an attempt at
+ * enabling a certain number of MSI-X. If any of the steps succeed, then return
+ * the number of MSI-X enabled from pci_ena_msix_exact(). If any of the
+ * attempts fail, then goto the next step.
+ *
+ * Attempt [0]: Enable the best-case scenario MSI-X vectors.
+ *
+ * Attempt [1]: Enable MSI-X vectors with the number of pf->num_lan_msix reduced
+ * by a factor of 2 from the previous attempts (i.e. num_online_cpus() / 2).
+ *
+ * Attempt [2]: Same as attempt [1], except reduce both by a factor of 4.
+ *
+ * Attempt [3]: Enable the bare-minimum MSI-X vectors.
+ *
+ * Also, if the adjusted_base_msix ever hits the mimimum required for LAN, then
+ * just set the needed MSI-X for that feature to the minimum (similar to
+ * attempt [3]).
  */
 static int ice_ena_msix_range(struct ice_pf *pf)
 {
+	int err = -ENOSPC, num_cpus, attempt, adjusted_msix_divisor = 1, needed;
 	struct device *dev = ice_pf_to_dev(pf);
-	int v_left, v_actual, v_budget = 0;
-	int needed, err, i;
-
-	v_left = pf->hw.func_caps.common_cap.num_msix_vectors;
-
-	/* reserve one vector for miscellaneous handler */
-	needed = 1;
-	if (v_left < needed)
-		goto no_hw_vecs_left_err;
-	v_budget += needed;
-	v_left -= needed;
-
-	/* reserve vectors for LAN traffic */
-	needed = min_t(int, num_online_cpus(), v_left);
-	if (v_left < needed)
-		goto no_hw_vecs_left_err;
-	pf->num_lan_msix = needed;
-	v_budget += needed;
-	v_left -= needed;
-
-	/* reserve one vector for flow director */
-	if (test_bit(ICE_FLAG_FD_ENA, pf->flags)) {
-		needed = ICE_FDIR_MSIX;
-		if (v_left < needed)
-			goto no_hw_vecs_left_err;
-		v_budget += needed;
-		v_left -= needed;
-	}
-
-	pf->msix_entries = devm_kcalloc(dev, v_budget,
-					sizeof(*pf->msix_entries), GFP_KERNEL);
-	if (!pf->msix_entries) {
-		err = -ENOMEM;
-		goto exit_err;
-	}
+	num_cpus = num_online_cpus();
 
-	for (i = 0; i < v_budget; i++)
-		pf->msix_entries[i].entry = i;
+#define ICE_MAX_ENABLE_MSIX_ATTEMPTS 3
+	/* make multiple passes at enabling MSI-X vectors in case there aren't
+	 * enough available for the best-case scenario
+	 */
+	for (attempt = 0; attempt <= ICE_MAX_ENABLE_MSIX_ATTEMPTS; attempt++) {
+		int adjusted_base_msix = num_cpus / adjusted_msix_divisor;
 
-	/* actually reserve the vectors */
-	v_actual = pci_enable_msix_range(pf->pdev, pf->msix_entries,
-					 ICE_MIN_MSIX, v_budget);
-
-	if (v_actual < 0) {
-		dev_err(dev, "unable to reserve MSI-X vectors\n");
-		err = v_actual;
-		goto msix_err;
-	}
-
-	if (v_actual < v_budget) {
-		dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
-			 v_budget, v_actual);
-/* 2 vectors each for LAN and RDMA (traffic + OICR), one for flow director */
-#define ICE_MIN_LAN_VECS 2
-#define ICE_MIN_RDMA_VECS 2
-#define ICE_MIN_VECS (ICE_MIN_LAN_VECS + ICE_MIN_RDMA_VECS + 1)
-
-		if (v_actual < ICE_MIN_LAN_VECS) {
-			/* error if we can't get minimum vectors */
-			pci_disable_msix(pf->pdev);
-			err = -ERANGE;
-			goto msix_err;
+		/* attempt to enable minimum MSI-X range */
+		if (attempt == ICE_MAX_ENABLE_MSIX_ATTEMPTS) {
+			needed = ICE_MIN_MSIX;
+			pf->num_lan_msix = ICE_MIN_LAN_MSIX;
 		} else {
-			pf->num_lan_msix = ICE_MIN_LAN_VECS;
+			if (adjusted_base_msix > ICE_MIN_LAN_MSIX)
+				pf->num_lan_msix = adjusted_base_msix;
+			else
+				pf->num_lan_msix = ICE_MIN_LAN_MSIX;
+
+			needed = pf->num_lan_msix + ICE_OICR_MSIX;
 		}
-	}
 
-	return v_actual;
+		if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
+			needed += ICE_FDIR_MSIX;
 
-msix_err:
-	devm_kfree(dev, pf->msix_entries);
-	goto exit_err;
+		err = ice_alloc_msix_entries(pf, needed);
+		if (err)
+			goto err_out;
+
+		dev_dbg(dev, "attempting to enable %d MSI-X vectors\n", needed);
+		err = pci_enable_msix_exact(pf->pdev, pf->msix_entries, needed);
+		if (err < 0) {
+			ice_free_msix_entries(pf);
+			dev_notice(dev, "Couldn't get %d MSI-X vectors due to OS, Platform, and/or PCI-function limitations. Reducing request and retrying.",
+				   needed);
+
+			adjusted_msix_divisor *= 2;
+		} else {
+			if (pf->num_lan_msix != num_cpus)
+				dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
					   pf->num_lan_msix);
+
+			return needed;
+		}
+	}
 
-no_hw_vecs_left_err:
-	dev_err(dev, "not enough device MSI-X vectors. requested = %d, available = %d\n",
-		needed, v_left);
-	err = -ERANGE;
-exit_err:
+err_out:
+	dev_err(dev, "failed to enable MSI-X vectors\n");
 	pf->num_lan_msix = 0;
 	return err;
 }
@@ -3467,8 +3477,7 @@ static int ice_ena_msix_range(struct ice_pf *pf)
 static void ice_dis_msix(struct ice_pf *pf)
 {
 	pci_disable_msix(pf->pdev);
-	devm_kfree(ice_pf_to_dev(pf), pf->msix_entries);
-	pf->msix_entries = NULL;
+	ice_free_msix_entries(pf);
 }
 
 /**