diff mbox series

[net-next,5/7] bnxt_en: Change MSIX/NQs allocation policy

Message ID 20240409215431.41424-6-michael.chan@broadcom.com (mailing list archive)
State Accepted
Commit 2e4592dc9bee5ca4fd3da4c5451c7f97c76442bf
Delegated to: Netdev Maintainers
Headers show
Series bnxt_en: Updates for net-next | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 942 this patch: 942
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 953 this patch: 953
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 957 this patch: 957
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 222 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0

Commit Message

Michael Chan April 9, 2024, 9:54 p.m. UTC
From: Vikas Gupta <vikas.gupta@broadcom.com>

The existing scheme sets aside a number of MSIX/NQs for the RoCE
driver whether the RoCE driver is registered or not.  This scheme
is not flexible and limits the resources available for the L2 rings
if RoCE is never used.

Modify the scheme so that the RoCE MSIX/NQs can be used by the L2
driver if they are not used for RoCE.  The MSIX/NQs are now
represented by 3 fields.  bp->ulp_num_msix_want contains the
desired default value, edev->ulp_num_msix_vec contains the
available value (but not necessarily in use), and
ulp_tbl->msix_requested contains the actual value in use by RoCE.

The L2 driver can dip into edev->ulp_num_msix_vec if necessary.

We need to add rtnl_lock() back in bnxt_register_dev() and
bnxt_unregister_dev() to synchronize the MSIX usage between L2 and
RoCE.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 25 +++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 77 +++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  8 +-
 4 files changed, 92 insertions(+), 19 deletions(-)

Comments

Jacob Keller April 9, 2024, 11:40 p.m. UTC | #1
On 4/9/2024 2:54 PM, Michael Chan wrote:
> From: Vikas Gupta <vikas.gupta@broadcom.com>
> 
> The existing scheme sets aside a number of MSIX/NQs for the RoCE
> driver whether the RoCE driver is registered or not.  This scheme
> is not flexible and limits the resources available for the L2 rings
> if RoCE is never used.
> 
> Modify the scheme so that the RoCE MSIX/NQs can be used by the L2
> driver if they are not used for RoCE.  The MSIX/NQs are now
> represented by 3 fields.  bp->ulp_num_msix_want contains the
> desired default value, edev->ulp_num_msix_vec contains the
> available value (but not necessarily in use), and
> ulp_tbl->msix_requested contains the actual value in use by RoCE.
> 
> The L2 driver can dip into edev->ulp_num_msix_vec if necessary.
> 
> We need to add rtnl_lock() back in bnxt_register_dev() and
> bnxt_unregister_dev() to synchronize the MSIX usage between L2 and
> RoCE.
> 
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---

Whats the behavior if the L2 driver dips into this pool and then RoCE is
enabled later?

I guess RoCE would fail to get the resources it needs, but then system
administrator could re-configure the L2 device to use fewer resources?

Makes sense.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks,
Jake
Michael Chan April 9, 2024, 11:48 p.m. UTC | #2
On Tue, Apr 9, 2024 at 4:40 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
>
> On 4/9/2024 2:54 PM, Michael Chan wrote:
> > From: Vikas Gupta <vikas.gupta@broadcom.com>
> >
> > The existing scheme sets aside a number of MSIX/NQs for the RoCE
> > driver whether the RoCE driver is registered or not.  This scheme
> > is not flexible and limits the resources available for the L2 rings
> > if RoCE is never used.
> >
> > Modify the scheme so that the RoCE MSIX/NQs can be used by the L2
> > driver if they are not used for RoCE.  The MSIX/NQs are now
> > represented by 3 fields.  bp->ulp_num_msix_want contains the
> > desired default value, edev->ulp_num_msix_vec contains the
> > available value (but not necessarily in use), and
> > ulp_tbl->msix_requested contains the actual value in use by RoCE.
> >
> > The L2 driver can dip into edev->ulp_num_msix_vec if necessary.
> >
> > We need to add rtnl_lock() back in bnxt_register_dev() and
> > bnxt_unregister_dev() to synchronize the MSIX usage between L2 and
> > RoCE.
> >
> > Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> > Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> > ---
>
> Whats the behavior if the L2 driver dips into this pool and then RoCE is
> enabled later?

Thanks for the review.  If the user increases the L2 rings or enables
XDP which will cause the driver to allocate a new set of XDP TX rings,
it can now use the RoCE MSIX if needed and if they are not in use.

>
> I guess RoCE would fail to get the resources it needs, but then system
> administrator could re-configure the L2 device to use fewer resources?

If the above simply reduces the RoCE MSIX, the RoCE driver can still
operate with fewer MSIX.  If the above has reduced the MSIX below the
minimum required for RoCE, then RoCE will fail to initialize.  At that
point, the user can reduce the L2 rings and reload the RoCE driver.
Jacob Keller April 10, 2024, 7:28 p.m. UTC | #3
On 4/9/2024 4:48 PM, Michael Chan wrote:
> On Tue, Apr 9, 2024 at 4:40 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>>
>>
>> On 4/9/2024 2:54 PM, Michael Chan wrote:
>>> From: Vikas Gupta <vikas.gupta@broadcom.com>
>>>
>>> The existing scheme sets aside a number of MSIX/NQs for the RoCE
>>> driver whether the RoCE driver is registered or not.  This scheme
>>> is not flexible and limits the resources available for the L2 rings
>>> if RoCE is never used.
>>>
>>> Modify the scheme so that the RoCE MSIX/NQs can be used by the L2
>>> driver if they are not used for RoCE.  The MSIX/NQs are now
>>> represented by 3 fields.  bp->ulp_num_msix_want contains the
>>> desired default value, edev->ulp_num_msix_vec contains the
>>> available value (but not necessarily in use), and
>>> ulp_tbl->msix_requested contains the actual value in use by RoCE.
>>>
>>> The L2 driver can dip into edev->ulp_num_msix_vec if necessary.
>>>
>>> We need to add rtnl_lock() back in bnxt_register_dev() and
>>> bnxt_unregister_dev() to synchronize the MSIX usage between L2 and
>>> RoCE.
>>>
>>> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
>>> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
>>> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>>> ---
>>
>> Whats the behavior if the L2 driver dips into this pool and then RoCE is
>> enabled later?
> 
> Thanks for the review.  If the user increases the L2 rings or enables
> XDP which will cause the driver to allocate a new set of XDP TX rings,
> it can now use the RoCE MSIX if needed and if they are not in use.
> 
>>
>> I guess RoCE would fail to get the resources it needs, but then system
>> administrator could re-configure the L2 device to use fewer resources?
> 
> If the above simply reduces the RoCE MSIX, the RoCE driver can still
> operate with fewer MSIX.  If the above has reduced the MSIX below the
> minimum required for RoCE, then RoCE will fail to initialize.  At that
> point, the user can reduce the L2 rings and reload the RoCE driver.

Sensible behavior. Thanks for the detailed explanation

-Jake
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c24e83ba050..88cf8f47e071 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7549,6 +7549,20 @@  static int __bnxt_reserve_rings(struct bnxt *bp)
 	if (!netif_is_rxfh_configured(bp->dev))
 		bnxt_set_dflt_rss_indir_tbl(bp, NULL);
 
+	if (!bnxt_ulp_registered(bp->edev) && BNXT_NEW_RM(bp)) {
+		int resv_msix, resv_ctx, ulp_msix, ulp_ctxs;
+		struct bnxt_hw_resc *hw_resc;
+
+		hw_resc = &bp->hw_resc;
+		resv_msix = hw_resc->resv_irqs - bp->cp_nr_rings;
+		ulp_msix = bnxt_get_ulp_msix_num(bp);
+		ulp_msix = min_t(int, resv_msix, ulp_msix);
+		bnxt_set_ulp_msix_num(bp, ulp_msix);
+		resv_ctx = hw_resc->resv_stat_ctxs  - bp->cp_nr_rings;
+		ulp_ctxs = min(resv_ctx, bnxt_get_ulp_stat_ctxs(bp));
+		bnxt_set_ulp_stat_ctxs(bp, ulp_ctxs);
+	}
+
 	return rc;
 }
 
@@ -14982,6 +14996,7 @@  static void bnxt_trim_dflt_sh_rings(struct bnxt *bp)
 static int bnxt_set_dflt_rings(struct bnxt *bp, bool sh)
 {
 	int dflt_rings, max_rx_rings, max_tx_rings, rc;
+	int avail_msix;
 
 	if (!bnxt_can_reserve_rings(bp))
 		return 0;
@@ -15009,6 +15024,14 @@  static int bnxt_set_dflt_rings(struct bnxt *bp, bool sh)
 		bp->cp_nr_rings = bp->tx_nr_rings_per_tc + bp->rx_nr_rings;
 	bp->tx_nr_rings = bp->tx_nr_rings_per_tc;
 
+	avail_msix = bnxt_get_max_func_irqs(bp) - bp->cp_nr_rings;
+	if (avail_msix >= BNXT_MIN_ROCE_CP_RINGS) {
+		int ulp_num_msix = min(avail_msix, bp->ulp_num_msix_want);
+
+		bnxt_set_ulp_msix_num(bp, ulp_num_msix);
+		bnxt_set_dflt_ulp_stat_ctxs(bp);
+	}
+
 	rc = __bnxt_reserve_rings(bp);
 	if (rc && rc != -ENODEV)
 		netdev_warn(bp->dev, "Unable to reserve tx rings\n");
@@ -15358,6 +15381,7 @@  static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bnxt_set_rx_skb_mode(bp, false);
 	bnxt_set_tpa_flags(bp);
 	bnxt_set_ring_params(bp);
+	bnxt_rdma_aux_device_init(bp);
 	rc = bnxt_set_dflt_rings(bp, true);
 	if (rc) {
 		if (BNXT_VF(bp) && rc == -ENODEV) {
@@ -15411,7 +15435,6 @@  static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (BNXT_SUPPORTS_NTUPLE_VNIC(bp))
 		bnxt_init_multi_rss_ctx(bp);
 
-	bnxt_rdma_aux_device_init(bp);
 
 	rc = register_netdev(dev);
 	if (rc)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 0640fcb57ef8..ad57ef051798 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2303,6 +2303,7 @@  struct bnxt {
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
+	int			ulp_num_msix_want;
 	u8			mac_addr[ETH_ALEN];
 
 #ifdef CONFIG_BNXT_DCB
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index ae2b367b1226..de2cb1d4cd98 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -48,6 +48,46 @@  static void bnxt_fill_msix_vecs(struct bnxt *bp, struct bnxt_msix_entry *ent)
 	}
 }
 
+int bnxt_get_ulp_msix_num(struct bnxt *bp)
+{
+	if (bp->edev)
+		return bp->edev->ulp_num_msix_vec;
+	return 0;
+}
+
+void bnxt_set_ulp_msix_num(struct bnxt *bp, int num)
+{
+	if (bp->edev)
+		bp->edev->ulp_num_msix_vec = num;
+}
+
+int bnxt_get_ulp_stat_ctxs(struct bnxt *bp)
+{
+	if (bp->edev)
+		return bp->edev->ulp_num_ctxs;
+	return 0;
+}
+
+void bnxt_set_ulp_stat_ctxs(struct bnxt *bp, int num_ulp_ctx)
+{
+	if (bp->edev)
+		bp->edev->ulp_num_ctxs = num_ulp_ctx;
+}
+
+void bnxt_set_dflt_ulp_stat_ctxs(struct bnxt *bp)
+{
+	if (bp->edev) {
+		bp->edev->ulp_num_ctxs = BNXT_MIN_ROCE_STAT_CTXS;
+		/* Reserve one additional stat_ctx for PF0 (except
+		 * on 1-port NICs) as it also creates one stat_ctx
+		 * for PF1 in case of RoCE bonding.
+		 */
+		if (BNXT_PF(bp) && !bp->pf.port_id &&
+		    bp->port_count > 1)
+			bp->edev->ulp_num_ctxs++;
+	}
+}
+
 int bnxt_register_dev(struct bnxt_en_dev *edev,
 		      struct bnxt_ulp_ops *ulp_ops,
 		      void *handle)
@@ -56,11 +96,19 @@  int bnxt_register_dev(struct bnxt_en_dev *edev,
 	struct bnxt *bp = netdev_priv(dev);
 	unsigned int max_stat_ctxs;
 	struct bnxt_ulp *ulp;
+	int rc = 0;
 
+	rtnl_lock();
+	if (!bp->irq_tbl) {
+		rc = -ENODEV;
+		goto exit;
+	}
 	max_stat_ctxs = bnxt_get_max_func_stat_ctxs(bp);
 	if (max_stat_ctxs <= BNXT_MIN_ROCE_STAT_CTXS ||
-	    bp->cp_nr_rings == max_stat_ctxs)
-		return -ENOMEM;
+	    bp->cp_nr_rings == max_stat_ctxs) {
+		rc = -ENOMEM;
+		goto exit;
+	}
 
 	ulp = edev->ulp_tbl;
 	ulp->handle = handle;
@@ -69,9 +117,13 @@  int bnxt_register_dev(struct bnxt_en_dev *edev,
 	if (test_bit(BNXT_STATE_OPEN, &bp->state))
 		bnxt_hwrm_vnic_cfg(bp, &bp->vnic_info[BNXT_VNIC_DEFAULT]);
 
+	edev->ulp_tbl->msix_requested = bnxt_get_ulp_msix_num(bp);
+
 	bnxt_fill_msix_vecs(bp, bp->edev->msix_entries);
 	edev->flags |= BNXT_EN_FLAG_MSIX_REQUESTED;
-	return 0;
+exit:
+	rtnl_unlock();
+	return rc;
 }
 EXPORT_SYMBOL(bnxt_register_dev);
 
@@ -83,8 +135,10 @@  void bnxt_unregister_dev(struct bnxt_en_dev *edev)
 	int i = 0;
 
 	ulp = edev->ulp_tbl;
+	rtnl_lock();
 	if (ulp->msix_requested)
 		edev->flags &= ~BNXT_EN_FLAG_MSIX_REQUESTED;
+	edev->ulp_tbl->msix_requested = 0;
 
 	if (ulp->max_async_event_id)
 		bnxt_hwrm_func_drv_rgtr(bp, NULL, 0, true);
@@ -97,11 +151,12 @@  void bnxt_unregister_dev(struct bnxt_en_dev *edev)
 		msleep(100);
 		i++;
 	}
+	rtnl_unlock();
 	return;
 }
 EXPORT_SYMBOL(bnxt_unregister_dev);
 
-int bnxt_get_ulp_msix_num(struct bnxt *bp)
+static int bnxt_set_dflt_ulp_msix(struct bnxt *bp)
 {
 	u32 roce_msix = BNXT_VF(bp) ?
 			BNXT_MAX_VF_ROCE_MSIX : BNXT_MAX_ROCE_MSIX;
@@ -110,18 +165,6 @@  int bnxt_get_ulp_msix_num(struct bnxt *bp)
 		min_t(u32, roce_msix, num_online_cpus()) : 0);
 }
 
-int bnxt_get_ulp_stat_ctxs(struct bnxt *bp)
-{
-	if (bnxt_ulp_registered(bp->edev)) {
-		struct bnxt_en_dev *edev = bp->edev;
-
-		if (edev->ulp_tbl->msix_requested)
-			return BNXT_MIN_ROCE_STAT_CTXS;
-	}
-
-	return 0;
-}
-
 int bnxt_send_msg(struct bnxt_en_dev *edev,
 			 struct bnxt_fw_msg *fw_msg)
 {
@@ -336,7 +379,6 @@  static void bnxt_set_edev_info(struct bnxt_en_dev *edev, struct bnxt *bp)
 	edev->pf_port_id = bp->pf.port_id;
 	edev->en_state = bp->state;
 	edev->bar0 = bp->bar0;
-	edev->ulp_tbl->msix_requested = bnxt_get_ulp_msix_num(bp);
 }
 
 void bnxt_rdma_aux_device_add(struct bnxt *bp)
@@ -409,6 +451,7 @@  void bnxt_rdma_aux_device_init(struct bnxt *bp)
 	aux_priv->edev = edev;
 	bp->edev = edev;
 	bnxt_set_edev_info(edev, bp);
+	bp->ulp_num_msix_want = bnxt_set_dflt_ulp_msix(bp);
 
 	return;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index ae7266ddf167..04ce3328e66f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -85,17 +85,23 @@  struct bnxt_en_dev {
 							 * updated in resume.
 							 */
 	void __iomem                    *bar0;
+
+	u16				ulp_num_msix_vec;
+	u16				ulp_num_ctxs;
 };
 
 static inline bool bnxt_ulp_registered(struct bnxt_en_dev *edev)
 {
-	if (edev && edev->ulp_tbl)
+	if (edev && rcu_access_pointer(edev->ulp_tbl->ulp_ops))
 		return true;
 	return false;
 }
 
 int bnxt_get_ulp_msix_num(struct bnxt *bp);
+void bnxt_set_ulp_msix_num(struct bnxt *bp, int num);
 int bnxt_get_ulp_stat_ctxs(struct bnxt *bp);
+void bnxt_set_ulp_stat_ctxs(struct bnxt *bp, int num_ctxs);
+void bnxt_set_dflt_ulp_stat_ctxs(struct bnxt *bp);
 void bnxt_ulp_stop(struct bnxt *bp);
 void bnxt_ulp_start(struct bnxt *bp, int err);
 void bnxt_ulp_sriov_cfg(struct bnxt *bp, int num_vfs);