
[net-next,06/16] idpf: a use saner limit for default number of queues to allocate

Message ID 20250305162132.1106080-7-aleksander.lobakin@intel.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series idpf: add XDP support

Checks

Context Check Description
netdev/series_format fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 13 of 13 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 104 this patch: 104
netdev/source_inline success Was 0 now: 0

Commit Message

Alexander Lobakin March 5, 2025, 4:21 p.m. UTC
Currently, the maximum number of queues available for one vport is 16.
This is hardcoded, but then the function calculating the optimal number
of queues takes min(16, num_online_cpus()).
In order to be able to allocate more queues, which will then be used for
XDP, stop hardcoding 16 and rely on what the device gives us. Instead of
num_online_cpus(), which has been considered suboptimal since at least 2013,
use netif_get_num_default_rss_queues() to still leave free queues in the
pool.
nr_cpu_ids Tx queues are needed only for lockless XDP sending; the regular
stack doesn't benefit from that many anyway.
On a 128-thread Xeon, this now gives me 32 regular Tx queues and leaves
224 free for XDP (128 of which will handle XDP_TX, .ndo_xdp_xmit(), and
XSk xmit when enabled).

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_txrx.c     | 8 +-------
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 2 +-
 2 files changed, 2 insertions(+), 8 deletions(-)
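
For context, netif_get_num_default_rss_queues() does not simply return the
online CPU count: on recent kernels it counts physical cores (SMT siblings
are skipped) and halves the result once there are more than two of them.
Below is a simplified, paraphrased sketch of that logic, not a verbatim copy
of the net/core/dev.c helper (it also omits the kdump special case):

#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/kernel.h>	/* DIV_ROUND_UP() */
#include <linux/topology.h>

static unsigned int default_rss_queues_sketch(void)
{
	cpumask_var_t cpus;
	unsigned int cpu, phys_cores = 0;

	if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
		return 1;

	/* Count each physical core once by dropping its SMT siblings */
	cpumask_copy(cpus, cpu_online_mask);
	for_each_cpu(cpu, cpus) {
		phys_cores++;
		cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
	}
	free_cpumask_var(cpus);

	/* Default to half of the physical cores on larger machines */
	return phys_cores > 2 ? DIV_ROUND_UP(phys_cores, 2) : phys_cores;
}

On the 128-thread (64-core, SMT2) Xeon mentioned in the commit message this
yields 64 / 2 = 32, which is where the "32 regular Tx queues" figure comes
from, versus 128 stack Tx queues with the old num_online_cpus() default.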

Comments

Maciej Fijalkowski March 7, 2025, 10:32 a.m. UTC | #1
On Wed, Mar 05, 2025 at 05:21:22PM +0100, Alexander Lobakin wrote:
> Currently, the maximum number of queues available for one vport is 16.
> This is hardcoded, but then the function calculating the optimal number
> of queues takes min(16, num_online_cpus()).
> On order to be able to allocate more queues, which will be then used for

nit: s/On/In

> XDP, stop hardcoding 16 and rely on what the device gives us. Instead of
> num_online_cpus(), which is considered suboptimal since at least 2013,
> use netif_get_num_default_rss_queues() to still have free queues in the
> pool.

Should we update older drivers as well?

> nr_cpu_ids number of Tx queues are needed only for lockless XDP sending,
> the regular stack doesn't benefit from that anyhow.
> On a 128-thread Xeon, this now gives me 32 regular Tx queues and leaves
> 224 free for XDP (128 of which will handle XDP_TX, .ndo_xdp_xmit(), and
> XSk xmit when enabled).
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_txrx.c     | 8 +-------
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 2 +-
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index c15833928ea1..2f221c0abad8 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -1234,13 +1234,7 @@ int idpf_vport_calc_total_qs(struct idpf_adapter *adapter, u16 vport_idx,
>  		num_req_tx_qs = vport_config->user_config.num_req_tx_qs;
>  		num_req_rx_qs = vport_config->user_config.num_req_rx_qs;
>  	} else {
> -		int num_cpus;
> -
> -		/* Restrict num of queues to cpus online as a default
> -		 * configuration to give best performance. User can always
> -		 * override to a max number of queues via ethtool.
> -		 */
> -		num_cpus = num_online_cpus();
> +		u32 num_cpus = netif_get_num_default_rss_queues();
>  
>  		dflt_splitq_txq_grps = min_t(int, max_q->max_txq, num_cpus);
>  		dflt_singleq_txqs = min_t(int, max_q->max_txq, num_cpus);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 3d2413b8684f..135af3cc243f 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -937,7 +937,7 @@ int idpf_vport_alloc_max_qs(struct idpf_adapter *adapter,
>  	max_tx_q = le16_to_cpu(caps->max_tx_q) / default_vports;
>  	if (adapter->num_alloc_vports < default_vports) {
>  		max_q->max_rxq = min_t(u16, max_rx_q, IDPF_MAX_Q);
> -		max_q->max_txq = min_t(u16, max_tx_q, IDPF_MAX_Q);
> +		max_q->max_txq = min_t(u16, max_tx_q, IDPF_LARGE_MAX_Q);
>  	} else {
>  		max_q->max_rxq = IDPF_MIN_Q;
>  		max_q->max_txq = IDPF_MIN_Q;
> -- 
> 2.48.1
>
Alexander Lobakin March 12, 2025, 5:22 p.m. UTC | #2
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Fri, 7 Mar 2025 11:32:15 +0100

> On Wed, Mar 05, 2025 at 05:21:22PM +0100, Alexander Lobakin wrote:
>> Currently, the maximum number of queues available for one vport is 16.
>> This is hardcoded, but then the function calculating the optimal number
>> of queues takes min(16, num_online_cpus()).
>> On order to be able to allocate more queues, which will be then used for
> 
> nit: s/On/In

Also "use a saner limit", not "a use saner limit" in the subject =\

> 
>> XDP, stop hardcoding 16 and rely on what the device gives us. Instead of
>> num_online_cpus(), which is considered suboptimal since at least 2013,
>> use netif_get_num_default_rss_queues() to still have free queues in the
>> pool.
> 
> Should we update older drivers as well?

That would be good.

For idpf, this is particularly important since the current logic eats
128 Tx queues for skb traffic on my Xeon out of 256 available by default
(per vport). On a 256-thread system, it would eat the whole limit,
leaving nothing for XDP >_<. ice doesn't have a per-port limit IIRC.

> 
>> nr_cpu_ids number of Tx queues are needed only for lockless XDP sending,
>> the regular stack doesn't benefit from that anyhow.
>> On a 128-thread Xeon, this now gives me 32 regular Tx queues and leaves
>> 224 free for XDP (128 of which will handle XDP_TX, .ndo_xdp_xmit(), and
>> XSk xmit when enabled).

Thanks,
Olek
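
To put numbers on the budget discussed above (all figures are the ones
quoted in this thread; the 256 Tx queues is the per-vport default reported
by the device on that setup, not a universal constant):

  before, num_online_cpus() on a 128-thread Xeon:
      stack Tx queues:  128
      left for XDP:     256 - 128 = 128
  after, netif_get_num_default_rss_queues() (64 physical cores / 2):
      stack Tx queues:  32
      left for XDP:     256 - 32 = 224, of which nr_cpu_ids = 128 back
                        XDP_TX, .ndo_xdp_xmit() and XSk xmit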

Patch

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index c15833928ea1..2f221c0abad8 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1234,13 +1234,7 @@  int idpf_vport_calc_total_qs(struct idpf_adapter *adapter, u16 vport_idx,
 		num_req_tx_qs = vport_config->user_config.num_req_tx_qs;
 		num_req_rx_qs = vport_config->user_config.num_req_rx_qs;
 	} else {
-		int num_cpus;
-
-		/* Restrict num of queues to cpus online as a default
-		 * configuration to give best performance. User can always
-		 * override to a max number of queues via ethtool.
-		 */
-		num_cpus = num_online_cpus();
+		u32 num_cpus = netif_get_num_default_rss_queues();
 
 		dflt_splitq_txq_grps = min_t(int, max_q->max_txq, num_cpus);
 		dflt_singleq_txqs = min_t(int, max_q->max_txq, num_cpus);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 3d2413b8684f..135af3cc243f 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -937,7 +937,7 @@  int idpf_vport_alloc_max_qs(struct idpf_adapter *adapter,
 	max_tx_q = le16_to_cpu(caps->max_tx_q) / default_vports;
 	if (adapter->num_alloc_vports < default_vports) {
 		max_q->max_rxq = min_t(u16, max_rx_q, IDPF_MAX_Q);
-		max_q->max_txq = min_t(u16, max_tx_q, IDPF_MAX_Q);
+		max_q->max_txq = min_t(u16, max_tx_q, IDPF_LARGE_MAX_Q);
 	} else {
 		max_q->max_rxq = IDPF_MIN_Q;
 		max_q->max_txq = IDPF_MIN_Q;