diff mbox series

[net,v1] net: stmmac: Disable PCS Link and AN interrupt when PCS AN is disabled

Message ID 20241018222407.1139697-1-quic_abchauha@quicinc.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net,v1] net: stmmac: Disable PCS Link and AN interrupt when PCS AN is disabled | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: quic_snehshah@quicinc.com; 2 maintainers not CCed: andrew+netdev@lunn.ch quic_snehshah@quicinc.com
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-19--00-00 (tests: 777)

Commit Message

Abhishek Chauhan (ABC) Oct. 18, 2024, 10:24 p.m. UTC
Currently we disable PCS ANE when the link speed is 2.5Gbps.
mac_link_up callback internally calls the fix_mac_speed which internally
calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.

We observed that the CPU utilization is pretty high. That is because
we saw that the PCS interrupt status line for Link and AN always remain
asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
to also disable the PCS link status and AN complete in the interrupt
enable register.

Interrupt storm Issue:-
[   25.465754][    C2] stmmac_pcs: Link Down
[   25.469888][    C2] stmmac_pcs: Link Down
[   25.474030][    C2] stmmac_pcs: Link Down
[   25.478164][    C2] stmmac_pcs: Link Down
[   25.482305][    C2] stmmac_pcs: Link Down
[   25.486441][    C2] stmmac_pcs: Link Down
[   25.486635][    C4] watchdog0: pretimeout event
[   25.490585][    C2] stmmac_pcs: Link Down
[   25.499341][    C2] stmmac_pcs: Link Down
[   25.503484][    C2] stmmac_pcs: Link Down
[   25.507619][    C2] stmmac_pcs: Link Down
[   25.511760][    C2] stmmac_pcs: Link Down
[   25.515897][    C2] stmmac_pcs: Link Down
[   25.520038][    C2] stmmac_pcs: Link Down
[   25.524174][    C2] stmmac_pcs: Link Down
[   25.528316][    C2] stmmac_pcs: Link Down
[   25.532451][    C2] stmmac_pcs: Link Down
[   25.536591][    C2] stmmac_pcs: Link Down
[   25.540724][    C2] stmmac_pcs: Link Down
[   25.544866][    C2] stmmac_pcs: Link Down

Once we disabled PCS ANE and Link Status interrupt issue
disappears.

Fixes: a818bd12538c ("net: stmmac: dwmac-qcom-ethqos: Add support for 2.5G SGMII")
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Andrew Lunn Oct. 19, 2024, 2:45 a.m. UTC | #1
On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote:
> Currently we disable PCS ANE when the link speed is 2.5Gbps.
> mac_link_up callback internally calls the fix_mac_speed which internally
> calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.
> 
> We observed that the CPU utilization is pretty high. That is because
> we saw that the PCS interrupt status line for Link and AN always remain
> asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
> to also disable the PCS link status and AN complete in the interrupt
> enable register.
> 
> Interrupt storm Issue:-
> [   25.465754][    C2] stmmac_pcs: Link Down
> [   25.469888][    C2] stmmac_pcs: Link Down
> [   25.474030][    C2] stmmac_pcs: Link Down
> [   25.478164][    C2] stmmac_pcs: Link Down
> [   25.482305][    C2] stmmac_pcs: Link Down

I don't know this code, so i cannot really comment if not enabling the
interrupt is the correct fix or not. But generally an interrupt storm
like this is cause because you are not acknowledging the interrupt
correctly to clear its status. So rather than not enabling it, maybe
you should check what is the correct way to clear the interrupt once
it happens?

	Andrew
Russell King (Oracle) Oct. 21, 2024, 9:20 a.m. UTC | #2
On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote:
> On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote:
> > Currently we disable PCS ANE when the link speed is 2.5Gbps.
> > mac_link_up callback internally calls the fix_mac_speed which internally
> > calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.
> > 
> > We observed that the CPU utilization is pretty high. That is because
> > we saw that the PCS interrupt status line for Link and AN always remain
> > asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
> > to also disable the PCS link status and AN complete in the interrupt
> > enable register.
> > 
> > Interrupt storm Issue:-
> > [   25.465754][    C2] stmmac_pcs: Link Down
> > [   25.469888][    C2] stmmac_pcs: Link Down
> > [   25.474030][    C2] stmmac_pcs: Link Down
> > [   25.478164][    C2] stmmac_pcs: Link Down
> > [   25.482305][    C2] stmmac_pcs: Link Down
> 
> I don't know this code, so i cannot really comment if not enabling the
> interrupt is the correct fix or not. But generally an interrupt storm
> like this is cause because you are not acknowledging the interrupt
> correctly to clear its status. So rather than not enabling it, maybe
> you should check what is the correct way to clear the interrupt once
> it happens?

stmmac PCS support is total crap and shouldn't be used, or stmmac
should not be using phylink. It's one or the other. Blame Serge for
this mess.
Russell King (Oracle) Oct. 21, 2024, 9:32 a.m. UTC | #3
On Mon, Oct 21, 2024 at 10:20:24AM +0100, Russell King (Oracle) wrote:
> On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote:
> > On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote:
> > > Currently we disable PCS ANE when the link speed is 2.5Gbps.
> > > mac_link_up callback internally calls the fix_mac_speed which internally
> > > calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.
> > > 
> > > We observed that the CPU utilization is pretty high. That is because
> > > we saw that the PCS interrupt status line for Link and AN always remain
> > > asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
> > > to also disable the PCS link status and AN complete in the interrupt
> > > enable register.
> > > 
> > > Interrupt storm Issue:-
> > > [   25.465754][    C2] stmmac_pcs: Link Down
> > > [   25.469888][    C2] stmmac_pcs: Link Down
> > > [   25.474030][    C2] stmmac_pcs: Link Down
> > > [   25.478164][    C2] stmmac_pcs: Link Down
> > > [   25.482305][    C2] stmmac_pcs: Link Down
> > 
> > I don't know this code, so i cannot really comment if not enabling the
> > interrupt is the correct fix or not. But generally an interrupt storm
> > like this is cause because you are not acknowledging the interrupt
> > correctly to clear its status. So rather than not enabling it, maybe
> > you should check what is the correct way to clear the interrupt once
> > it happens?
> 
> stmmac PCS support is total crap and shouldn't be used, or stmmac
> should not be using phylink. It's one or the other. Blame Serge for
> this mess.

Seriously, we could've had this fixed had the patch set I was working
on that fixed stmmac's _bad_ _conversion_ to phylink progressed to the
point of being merged.

The whole stmmac PCS support is broken, bypassing phylink.

This series also contained bug fixes for stuff like this interrupt
storm after Serge tested it. However, Serge wanted to turn my series
into his maze of indirect function pointers approach that I disagreed
with, and he wouldn't change his mind on that, so I deleted the series.

As I keep saying - either stmmac uses phylink *properly* and gets its
PCS hacks sorted out, or it does not use phylink *at* *all*. It's one
or the other.

I am not going to patch stmmac for any future phylink changes, and if
it breaks, then I'll just say "oh that's a shame, not my problem."
Blame Serge for that. I've had it with the pile of crap that is
stmmac.
Abhishek Chauhan (ABC) Oct. 21, 2024, 3:09 p.m. UTC | #4
On 10/21/2024 2:32 AM, Russell King (Oracle) wrote:
> On Mon, Oct 21, 2024 at 10:20:24AM +0100, Russell King (Oracle) wrote:
>> On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote:
>>> On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote:
>>>> Currently we disable PCS ANE when the link speed is 2.5Gbps.
>>>> mac_link_up callback internally calls the fix_mac_speed which internally
>>>> calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps.
>>>>
>>>> We observed that the CPU utilization is pretty high. That is because
>>>> we saw that the PCS interrupt status line for Link and AN always remain
>>>> asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense
>>>> to also disable the PCS link status and AN complete in the interrupt
>>>> enable register.
>>>>
>>>> Interrupt storm Issue:-
>>>> [   25.465754][    C2] stmmac_pcs: Link Down
>>>> [   25.469888][    C2] stmmac_pcs: Link Down
>>>> [   25.474030][    C2] stmmac_pcs: Link Down
>>>> [   25.478164][    C2] stmmac_pcs: Link Down
>>>> [   25.482305][    C2] stmmac_pcs: Link Down
>>>
>>> I don't know this code, so i cannot really comment if not enabling the
>>> interrupt is the correct fix or not. But generally an interrupt storm
>>> like this is cause because you are not acknowledging the interrupt
>>> correctly to clear its status. So rather than not enabling it, maybe
>>> you should check what is the correct way to clear the interrupt once
>>> it happens?
>>
>> stmmac PCS support is total crap and shouldn't be used, or stmmac
>> should not be using phylink. It's one or the other. Blame Serge for
>> this mess.
> 
> Seriously, we could've had this fixed had the patch set I was working
> on that fixed stmmac's _bad_ _conversion_ to phylink progressed to the
> point of being merged.
> 
> The whole stmmac PCS support is broken, bypassing phylink.
> 
> This series also contained bug fixes for stuff like this interrupt
> storm after Serge tested it. However, Serge wanted to turn my series
> into his maze of indirect function pointers approach that I disagreed
> with, and he wouldn't change his mind on that, so I deleted the series.
> 
> As I keep saying - either stmmac uses phylink *properly* and gets its
> PCS hacks sorted out, or it does not use phylink *at* *all*. It's one
> or the other.
> 
> I am not going to patch stmmac for any future phylink changes, and if
> it breaks, then I'll just say "oh that's a shame, not my problem."
> Blame Serge for that. I've had it with the pile of crap that is
> stmmac.
> 
Thanks Andrew and Russell for you review comments. 

Adding Serge here. 

Lets take a step back and see how i can help here to make sure 
we can get things merged and the discussion proceeds. 

Serge please help if can here. Thanks!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index e65a65666cc1..db77d07af9fe 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -751,7 +751,16 @@  static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex,
 static void dwmac4_ctrl_ane(void __iomem *ioaddr, bool ane, bool srgmi_ral,
 			    bool loopback)
 {
+	u32 intr_mask = readl(ioaddr + GMAC_INT_EN);
+
 	dwmac_ctrl_ane(ioaddr, GMAC_PCS_BASE, ane, srgmi_ral, loopback);
+
+	if (!ane)
+		intr_mask &= ~(GMAC_INT_PCS_LINK | GMAC_INT_PCS_ANE);
+	else
+		intr_mask |= (GMAC_INT_PCS_LINK | GMAC_INT_PCS_ANE);
+
+	writel(intr_mask, ioaddr + GMAC_INT_EN);
 }
 
 static void dwmac4_get_adv_lp(void __iomem *ioaddr, struct rgmii_adv *adv)