
[V4,11/12] bnxt_en: Add TPH support in BNXT driver

Message ID: 20240822204120.3634-12-wei.huang2@amd.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: PCIe TPH and cache direct injection support

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 19 this patch: 19
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 131 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0

Commit Message

Wei Huang Aug. 22, 2024, 8:41 p.m. UTC
From: Manoj Panicker <manoj.panicker2@amd.com>

Implement TPH support in Broadcom BNXT device driver. The driver uses
TPH functions to retrieve and configure the device's Steering Tags when
its interrupt affinity is being changed.

Co-developed-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 +++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
 2 files changed, 82 insertions(+)

Comments

Jakub Kicinski Aug. 26, 2024, 8:22 p.m. UTC | #1
On Thu, 22 Aug 2024 15:41:19 -0500 Wei Huang wrote:
> +		rtnl_lock();
> +		bnxt_close_nic(irq->bp, false, false);
> +		bnxt_open_nic(irq->bp, false, false);
> +		rtnl_unlock();

This code means that under memory pressure changing CPU affinity
can take the machine offline. The entire machine, even if container
orchestration is trying to just move a few IRQs in place for a new
container.

We can't let you do this, it will set a bad precedent. I can't think
of any modern driver with reconfiguration safety as bad as bnxt.
Technical debt coming due.
Andy Gospodarek Aug. 26, 2024, 8:56 p.m. UTC | #2
On Mon, Aug 26, 2024 at 01:22:13PM -0700, Jakub Kicinski wrote:
> On Thu, 22 Aug 2024 15:41:19 -0500 Wei Huang wrote:
> > +		rtnl_lock();
> > +		bnxt_close_nic(irq->bp, false, false);
> > +		bnxt_open_nic(irq->bp, false, false);
> > +		rtnl_unlock();
> 
> This code means that under memory pressure changing CPU affinity
> can take the machine offline. The entire machine, even if container
> orchestration is trying to just move a few IRQs in place for a new
> container.
> 
> We can't let you do this, it will set a bad precedent. I can't think
> of any modern driver with reconfiguration safety as bad as bnxt.
> Technical debt coming due.

Jakub,

We have not said this on the list, but we agree.  We plan to replace these
calls with calls to stop and start only that ring via netdev_rx_queue_restart
as soon as these calls all land in the same tree.  Since this set is
[presumably] coming through linux-pci we didn't think we could do that yet.

Thoughts?

-andy
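For concreteness, a minimal sketch of what the notifier could look like with a per-queue restart instead of a full close/open -- assuming netdev_rx_queue_restart() from <net/netdev_rx_queue.h> is available in the same tree, and that the driver records which RX queue each vector serves (irq->ring_nr below is a hypothetical field, not part of the posted patch):

	static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
					       const cpumask_t *mask)
	{
		struct bnxt_irq *irq;
		u16 tag;

		irq = container_of(notify, struct bnxt_irq, affinity_notify);
		cpumask_copy(irq->cpu_mask, mask);

		/* Look up the steering tag for the first CPU in the new mask */
		if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
					cpumask_first(irq->cpu_mask), &tag))
			return;

		if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
			return;

		/* Restart only the affected RX queue, not the whole NIC */
		rtnl_lock();
		if (netif_running(irq->bp->dev))
			netdev_rx_queue_restart(irq->bp->dev, irq->ring_nr);
		rtnl_unlock();
	}

Whether the bnxt queue restart implementation is complete enough for this path is exactly the concern raised below.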
Jakub Kicinski Aug. 26, 2024, 10:49 p.m. UTC | #3
On Mon, 26 Aug 2024 16:56:36 -0400 Andy Gospodarek wrote:
> We plan to replace these calls with calls to stop and start only that
> ring via netdev_rx_queue_restart as soon as these calls all land in
> the same tree.  Since this set is [presumably] coming through
> linux-pci we didn't think we could do that yet.
> 
> Thoughts?

The merge window is in 3 weeks or so, so this can wait.
I'm worried we'll find out later that the current queue reset
implementation in bnxt turns out to be insufficient. And we'll
be stuck with yet another close/open in this driver.

While we talk about affinity settings in bnxt -- it'd be great
if it maintained the mapping and XPS settings across reconfiguration 
(like queue count changes or attaching XDP).
Andy Gospodarek Aug. 27, 2024, 2:50 p.m. UTC | #4
On Mon, Aug 26, 2024 at 03:49:12PM -0700, Jakub Kicinski wrote:
> On Mon, 26 Aug 2024 16:56:36 -0400 Andy Gospodarek wrote:
> > We plan to replace these calls with calls to stop and start only that
> > ring via netdev_rx_queue_restart as soon as these calls all land in
> > the same tree.  Since this set is [presumably] coming through
> > linux-pci we didn't think we could do that yet.
> > 
> > Thoughts?
> 
> The merge window is in 3 weeks or so, so this can wait.

Are you asking for the patch for this feature to include the queue
stop/start instead of this?  I just checked linux-pci and it does have
bnxt_queue_stop/bnxt_queue_start.

> I'm worried we'll find out later that the current queue reset
> implementation in bnxt turns out to be insufficient. And we'll
> be stuck with yet another close/open in this driver.

The queue reset _has_ to work.  We will ensure that it does and fix any
problems found.  Note that these have been under test already internally
and fixes are/will be posted to the list as they are made.  Holding this
patch because an API that it uses might not work seems odd.
Jakub Kicinski Aug. 27, 2024, 7:05 p.m. UTC | #5
On Tue, 27 Aug 2024 10:50:51 -0400 Andy Gospodarek wrote:
> > The merge window is in 3 weeks or so, so this can wait.  
> 
> Are you asking for the patch for this feature to include the queue
> stop/start instead of this? 

Yes, indeed.

> I just checked linux-pci and it does have bnxt_queue_stop/bnxt_queue_start.
>
> > I'm worried we'll find out later that the current queue reset
> > implementation in bnxt turns out to be insufficient. And we'll
> > be stuck with yet another close/open in this driver.  
> 
> The queue reset _has_ to work.  We will ensure that it does and fix any
> problems found.  Note that these have been under test already internally
> and fixes are/will be posted to the list as they are made.  Holding this
> patch because an API that it uses might not work seems odd.

Not holding because API may not work, holding because (I thought) 
API isn't in place at all. If bnxt_queue_stop/bnxt_queue_start are in
linux-pci please rewrite the patch to use those and then all clear 
from my PoV.
Michael Chan Aug. 27, 2024, 7:20 p.m. UTC | #6
On Tue, Aug 27, 2024 at 12:05 PM Jakub Kicinski <kuba@kernel.org> wrote:

> Not holding because API may not work, holding because (I thought)
> API isn't in place at all. If bnxt_queue_stop/bnxt_queue_start are in
> linux-pci please rewrite the patch to use those and then all clear
> from my PoV.

To be clear, the API is available in the linux-pci tree but the recent
patch from David to check for proper FW support is only in net-next:

97cbf3d0accc ("bnxt_en: only set dev->queue_mgmt_ops if supported by FW")

So we'll need to add this check for TPH also once the 2 trees are merged.
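For illustration, once the two trees converge, the gating Michael describes could look roughly like this in bnxt_request_irq() (a sketch only; it assumes dev->queue_mgmt_ops is non-NULL exactly when the FW support check from 97cbf3d0accc passes):

	/* Enable TPH only if FW also supports the queue-management ops
	 * that the ST update path will rely on for per-queue restart.
	 */
	if (bp->dev->queue_mgmt_ops &&
	    (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC)) {
		rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
		if (rc)
			netdev_warn(bp->dev, "failed enabling TPH support\n");
	}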
Bjorn Helgaas Sept. 5, 2024, 3:06 p.m. UTC | #7
On Thu, Aug 22, 2024 at 03:41:19PM -0500, Wei Huang wrote:
> From: Manoj Panicker <manoj.panicker2@amd.com>
> 
> Implement TPH support in Broadcom BNXT device driver. The driver uses
> TPH functions to retrieve and configure the device's Steering Tags when
> its interrupt affinity is being changed.

> +	/* Register IRQ affinility notifier */

s/affinility/affinity/
Alejandro Lucero Palau Sept. 11, 2024, 3:37 p.m. UTC | #8
On 8/22/24 21:41, Wei Huang wrote:
> From: Manoj Panicker <manoj.panicker2@amd.com>
>
> Implement TPH support in Broadcom BNXT device driver. The driver uses
> TPH functions to retrieve and configure the device's Steering Tags when
> its interrupt affinity is being changed.
>
> Co-developed-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Wei Huang <wei.huang2@amd.com>
> Signed-off-by: Manoj Panicker <manoj.panicker2@amd.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> ---
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c | 78 +++++++++++++++++++++++
>   drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++
>   2 files changed, 82 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index ffa74c26ee53..5903cd36b54d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -55,6 +55,7 @@
>   #include <net/page_pool/helpers.h>
>   #include <linux/align.h>
>   #include <net/netdev_queues.h>
> +#include <linux/pci-tph.h>
>   
>   #include "bnxt_hsi.h"
>   #include "bnxt.h"
> @@ -10821,6 +10822,58 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
>   	return 0;
>   }
>   
> +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
> +				       const cpumask_t *mask)
> +{
> +	struct bnxt_irq *irq;
> +	u16 tag;
> +
> +	irq = container_of(notify, struct bnxt_irq, affinity_notify);
> +	cpumask_copy(irq->cpu_mask, mask);
> +
> +	if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
> +				cpumask_first(irq->cpu_mask), &tag))


I understand just one cpu from the mask has to be used, but I wonder if 
some check should be done to ensure the mask is not mad.

This is the control path and the related queue is going to be restarted, so 
maybe add a sanity check ensuring all the cpus in the mask are from the 
same CCX complex?

That would be an iteration checking that the tag is the same for all of 
them. If not, at least emit a warning stating the tag/CCX/cpu used.
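
For illustration, the suggested sanity check could be sketched as follows (a hypothetical helper built only on pcie_tph_get_cpu_st() as already used in this patch; whether the warning is desirable is discussed below):

	/* Check whether every CPU in the mask maps to the same steering tag */
	static bool bnxt_mask_has_single_st(struct bnxt_irq *irq,
					    const cpumask_t *mask)
	{
		u16 tag, first_tag = 0;
		bool first = true;
		unsigned int cpu;

		for_each_cpu(cpu, mask) {
			if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
						cpu, &tag))
				return false;
			if (first) {
				first_tag = tag;
				first = false;
			} else if (tag != first_tag) {
				netdev_warn(irq->bp->dev,
					    "cpu %u tag %#x differs from first tag %#x\n",
					    cpu, tag, first_tag);
				return false;
			}
		}
		return true;
	}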


Wei Huang Sept. 16, 2024, 6:55 p.m. UTC | #9
On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
> 
> On 8/22/24 21:41, Wei Huang wrote:
...
>> +static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
>> +				       const cpumask_t *mask)
>> +{
>> +	struct bnxt_irq *irq;
>> +	u16 tag;
>> +
>> +	irq = container_of(notify, struct bnxt_irq, affinity_notify);
>> +	cpumask_copy(irq->cpu_mask, mask);
>> +
>> +	if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
>> +				cpumask_first(irq->cpu_mask), &tag))
> 
> 
> I understand just one cpu from the mask has to be used, but I wonder if
> some check should be done to ensure the mask is not mad.
> 
> This is the control path and the related queue is going to be restarted, so
> maybe add a sanity check ensuring all the cpus in the mask are from the
> same CCX complex?

I don't think this is always true and we shouldn't warn when this 
happens. Only one ST can be supported, so the driver needs to make a 
good judgement on which ST to use. But no matter what, an ST is just a 
hint - it shouldn't cause any correctness issues in HW, even when it is 
not the optimal target CPU. So a warning is unnecessary.

Alejandro Lucero Palau Sept. 18, 2024, 5:31 p.m. UTC | #10
On 9/16/24 19:55, Wei Huang wrote:
>
>
> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>
>> On 8/22/24 21:41, Wei Huang wrote:
...
>>
>>
>> I understand just one cpu from the mask has to be used, but I wonder if
>> some check should be done to ensure the mask is not mad.
>>
>> This is the control path and the related queue is going to be restarted, so
>> maybe add a sanity check ensuring all the cpus in the mask are from the
>> same CCX complex?
>
> I don't think this is always true and we shouldn't warn when this
> happens. Only one ST can be supported, so the driver needs to make a
> good judgement on which ST to use. But no matter what, an ST is just
> a hint - it shouldn't cause any correctness issues in HW, even when
> it is not the optimal target CPU. So a warning is unnecessary.
>

1) You can use a "mad" mask to keep a specific interrupt from disturbing 
a specific execution on those cores not part of the mask. But I argue 
the ST hint should not be set then.


2) Someone, maybe an automatic script, could try to get the best 
performance possible, and a "mad" mask could preclude such an outcome 
inadvertently.


I agree a warning might not be a good idea because of 1), but I would say 
adding some traceability here could be interesting. A tracepoint or a new 
ST field recording the last hint set for that interrupt/queue.
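
For illustration, such a tracepoint could be sketched along these lines (hypothetical event and field names; it would live in a driver trace header, and the exact TRACE_EVENT plumbing varies by kernel version):

	TRACE_EVENT(bnxt_tph_set_st,
		TP_PROTO(unsigned int msix, unsigned int cpu, u16 tag),
		TP_ARGS(msix, cpu, tag),
		TP_STRUCT__entry(
			__field(unsigned int, msix)
			__field(unsigned int, cpu)
			__field(u16, tag)
		),
		TP_fast_assign(
			__entry->msix = msix;
			__entry->cpu = cpu;
			__entry->tag = tag;
		),
		TP_printk("msix %u cpu %u tag %#x",
			  __entry->msix, __entry->cpu, __entry->tag)
	);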


Wei Huang Sept. 19, 2024, 4:14 p.m. UTC | #11
On 9/18/24 12:31, Alejandro Lucero Palau wrote:
> 
> On 9/16/24 19:55, Wei Huang wrote:
>>
>>
>> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>>
...
>>>
>>> I understand just one cpu from the mask has to be used, but I wonder if
>>> some check should be done to ensure the mask is not mad.
>>>
>>> This is the control path and the related queue is going to be restarted, so
>>> maybe add a sanity check ensuring all the cpus in the mask are from the
>>> same CCX complex?
>>
>> I don't think this is always true and we shouldn't warn when this
>> happens. Only one ST can be supported, so the driver needs to make a
>> good judgement on which ST to use. But no matter what, an ST is just
>> a hint - it shouldn't cause any correctness issues in HW, even when
>> it is not the optimal target CPU. So a warning is unnecessary.
>>
> 
> 1) You can use a "mad" mask to keep a specific interrupt from disturbing
> a specific execution on those cores not part of the mask. But I argue
> the ST hint should not be set then.
> 
> 
> 2) Someone, maybe an automatic script, could try to get the best
> performance possible, and a "mad" mask could preclude such an outcome
> inadvertently.
> 

For this case, you can use the following command:

echo cpu_id > /proc/irq/nnn/smp_affinity_list

where nnn is the MSI IRQ number associated with the device. This forces the
IRQ to be associated with only one specific CPU.
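
For example (IRQ and CPU numbers purely illustrative):

echo 12 > /proc/irq/140/smp_affinity_list
cat /proc/irq/140/effective_affinity_list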

> 
> I agree a warning might not be a good idea because of 1), but I would say
> adding some traceability here could be interesting. A tracepoint or a new
> ST field recording the last hint set for that interrupt/queue.

We do have two pci_dbg() calls in tph.c. You can see the logs with the
proper kernel print level. The logs show the GET/SET ST values, along with
which PCIe device, which ST table, and which index.
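
For example, with dynamic debug enabled, those messages can be switched on at run time (assuming the series places the TPH code in drivers/pci/tph.c):

echo 'file drivers/pci/tph.c +p' > /sys/kernel/debug/dynamic_debug/control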


Patch

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ffa74c26ee53..5903cd36b54d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -55,6 +55,7 @@ 
 #include <net/page_pool/helpers.h>
 #include <linux/align.h>
 #include <net/netdev_queues.h>
+#include <linux/pci-tph.h>
 
 #include "bnxt_hsi.h"
 #include "bnxt.h"
@@ -10821,6 +10822,58 @@  int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
 	return 0;
 }
 
+static void __bnxt_irq_affinity_notify(struct irq_affinity_notify *notify,
+				       const cpumask_t *mask)
+{
+	struct bnxt_irq *irq;
+	u16 tag;
+
+	irq = container_of(notify, struct bnxt_irq, affinity_notify);
+	cpumask_copy(irq->cpu_mask, mask);
+
+	if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
+				cpumask_first(irq->cpu_mask), &tag))
+		return;
+
+	if (pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag))
+		return;
+
+	if (netif_running(irq->bp->dev)) {
+		rtnl_lock();
+		bnxt_close_nic(irq->bp, false, false);
+		bnxt_open_nic(irq->bp, false, false);
+		rtnl_unlock();
+	}
+}
+
+static void __bnxt_irq_affinity_release(struct kref __always_unused *ref)
+{
+}
+
+static void bnxt_release_irq_notifier(struct bnxt_irq *irq)
+{
+	irq_set_affinity_notifier(irq->vector, NULL);
+}
+
+static void bnxt_register_irq_notifier(struct bnxt *bp, struct bnxt_irq *irq)
+{
+	struct irq_affinity_notify *notify;
+
+	/* Nothing to do if TPH is not enabled */
+	if (!pcie_tph_enabled(bp->pdev))
+		return;
+
+	irq->bp = bp;
+
+	/* Register IRQ affinility notifier */
+	notify = &irq->affinity_notify;
+	notify->irq = irq->vector;
+	notify->notify = __bnxt_irq_affinity_notify;
+	notify->release = __bnxt_irq_affinity_release;
+
+	irq_set_affinity_notifier(irq->vector, notify);
+}
+
 static void bnxt_free_irq(struct bnxt *bp)
 {
 	struct bnxt_irq *irq;
@@ -10843,11 +10896,17 @@  static void bnxt_free_irq(struct bnxt *bp)
 				free_cpumask_var(irq->cpu_mask);
 				irq->have_cpumask = 0;
 			}
+
+			bnxt_release_irq_notifier(irq);
+
 			free_irq(irq->vector, bp->bnapi[i]);
 		}
 
 		irq->requested = 0;
 	}
+
+	/* Disable TPH support */
+	pcie_disable_tph(bp->pdev);
 }
 
 static int bnxt_request_irq(struct bnxt *bp)
@@ -10870,6 +10929,13 @@  static int bnxt_request_irq(struct bnxt *bp)
 	if (!(bp->flags & BNXT_FLAG_USING_MSIX))
 		flags = IRQF_SHARED;
 
+	/* Enable TPH support as part of IRQ request */
+	if (pcie_tph_modes(bp->pdev) & PCI_TPH_CAP_INT_VEC) {
+		rc = pcie_enable_tph(bp->pdev, PCI_TPH_CAP_INT_VEC);
+		if (rc)
+			netdev_warn(bp->dev, "failed enabling TPH support\n");
+	}
+
 	for (i = 0, j = 0; i < bp->cp_nr_rings; i++) {
 		int map_idx = bnxt_cp_num_to_irq_num(bp, i);
 		struct bnxt_irq *irq = &bp->irq_tbl[map_idx];
@@ -10893,8 +10959,10 @@  static int bnxt_request_irq(struct bnxt *bp)
 
 		if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) {
 			int numa_node = dev_to_node(&bp->pdev->dev);
+			u16 tag;
 
 			irq->have_cpumask = 1;
+			irq->msix_nr = map_idx;
 			cpumask_set_cpu(cpumask_local_spread(i, numa_node),
 					irq->cpu_mask);
 			rc = irq_set_affinity_hint(irq->vector, irq->cpu_mask);
@@ -10904,6 +10972,16 @@  static int bnxt_request_irq(struct bnxt *bp)
 					    irq->vector);
 				break;
 			}
+
+			bnxt_register_irq_notifier(bp, irq);
+
+			/* Init ST table entry */
+			if (pcie_tph_get_cpu_st(irq->bp->pdev, TPH_MEM_TYPE_VM,
+						cpumask_first(irq->cpu_mask),
+						&tag))
+				break;
+
+			pcie_tph_set_st_entry(irq->bp->pdev, irq->msix_nr, tag);
 		}
 	}
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6bbdc718c3a7..ae1abcc1bddf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1224,6 +1224,10 @@  struct bnxt_irq {
 	u8		have_cpumask:1;
 	char		name[IFNAMSIZ + 2];
 	cpumask_var_t	cpu_mask;
+
+	struct bnxt	*bp;
+	int		msix_nr;
+	struct irq_affinity_notify affinity_notify;
 };
 
 #define HWRM_RING_ALLOC_TX	0x1