diff mbox series

net: stmmac: Allow zero for [tr]x_fifo_size

Message ID 20250203093419.25804-1-steven.price@arm.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series net: stmmac: Allow zero for [tr]x_fifo_size

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 13 of 13 maintainers
netdev/build_clang              success  Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 36 this patch: 36
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-02-04--00-00 (tests: 886)

Commit Message

Steven Price Feb. 3, 2025, 9:34 a.m. UTC
Commit 8865d22656b4 ("net: stmmac: Specify hardware capability value
when FIFO size isn't specified") modified the behaviour to bail out if
the FIFO size and the hardware capability were both set to zero.
However, devices where has_gmac4 and has_xgmac are both false don't use
the FIFO size, and that commit breaks platforms for which these values
were zero.

Only warn and error out when (has_gmac4 || has_xgmac) where the values
are used and zero would cause problems, otherwise continue with the zero
values.

Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Russell King (Oracle) Feb. 3, 2025, 10:38 a.m. UTC | #1
On Mon, Feb 03, 2025 at 09:34:18AM +0000, Steven Price wrote:
> Commit 8865d22656b4 ("net: stmmac: Specify hardware capability value
> when FIFO size isn't specified") modified the behaviour to bail out if
> both the FIFO size and the hardware capability were both set to zero.
> However devices where has_gmac4 and has_xgmac are both false don't use
> the fifo size and that commit breaks platforms for which these values
> were zero.
> 
> Only warn and error out when (has_gmac4 || has_xgmac) where the values
> are used and zero would cause problems, otherwise continue with the zero
> values.
> 
> Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
> Tested-by: Xi Ruoyao <xry111@xry111.site>
> Signed-off-by: Steven Price <steven.price@arm.com>

I'm still of the opinion that the original patch set was wrong, and
I was thinking at the time that it should _not_ have been submitted
for the "net" tree (it wasn't fixing a bug afaics, and was a risky
change.)

Yes, we had multiple places where we have code like:

        int rxfifosz = priv->plat->rx_fifo_size;
        int txfifosz = priv->plat->tx_fifo_size;

        if (rxfifosz == 0)
                rxfifosz = priv->dma_cap.rx_fifo_size;
        if (txfifosz == 0)
                txfifosz = priv->dma_cap.tx_fifo_size;

        /* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
        if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
                rxfifosz /= rx_channels_count;
                txfifosz /= tx_channels_count;
        }

and this is passed to stmmac_dma_rx_mode() and stmmac_dma_tx_mode().

We also have it in the stmmac_change_mtu() path for the transmit side,
which ensures that the MTU value is not larger than the transmit FIFO
size (which is going to fail as it's always done before or after the
original patch set, and whether or not your patch is applied.)

Now, as for the stmmac_dma_[tr]x_mode(), these are method functions
calling into the DMA code. dwmac4, dwmac1000, dwxgmac2, dwmac100 and
sun8i implement methods for this.

Of these, dwmac4, dwxgmac2 makes use of the value passed into
stmmac_dma_[tr]x_mode() - both of which initialise dma.[tr]x_fifo_size.
dwmac1000, dwmac100 and sun8i do not make use of it.

So, going back to the original patch series, I still question the value
of the changes there - and with your patch, it makes their value even
less because the justification seemed to be to ensure that
priv->plat->[tr]x_fifo_size contained a sensible value. With your patch
we're going back to a situation where we allow these to effectively be
"unset" or zero.

I'll ask the question straight out - with your patch applied, what is
the value of the original four patch series that caused the breakage?
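
As a rough illustration of the fallback-and-split pattern quoted above,
the following standalone C sketch models it outside the kernel. The
struct fifo_inputs and the functions per_channel_fifo_size() and
per_channel_fifo_size_demo() are hypothetical names introduced here;
only the field roles (platform value, DMA capability value, channel
count, has_gmac4/has_xgmac) come from the quoted snippet, and the
channel count is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the values the quoted snippet reads. */
struct fifo_inputs {
	int plat_fifo_size;	/* models priv->plat->[tr]x_fifo_size */
	int cap_fifo_size;	/* models priv->dma_cap.[tr]x_fifo_size */
	int channels;		/* models [tr]x_channels_count, assumed >= 1 */
	bool has_gmac4;
	bool has_xgmac;
};

/* Fall back to the capability value, then split only on gmac4/xgmac. */
int per_channel_fifo_size(const struct fifo_inputs *in)
{
	int fifosz = in->plat_fifo_size;

	if (fifosz == 0)
		fifosz = in->cap_fifo_size;

	if (in->has_gmac4 || in->has_xgmac)
		fifosz /= in->channels;

	return fifosz;
}

/* Small self-check showing the cases discussed in the thread. */
void per_channel_fifo_size_demo(void)
{
	struct fifo_inputs gmac4 = { 0, 16384, 4, true, false };
	struct fifo_inputs gmac4_zero = { 0, 0, 4, true, false };
	struct fifo_inputs legacy = { 0, 0, 4, false, false };
	struct fifo_inputs legacy_set = { 8192, 0, 4, false, false };

	/* Capability value is used and shared between four channels. */
	assert(per_channel_fifo_size(&gmac4) == 4096);
	/* Both sources zero: the per-channel size is zero as well. */
	assert(per_channel_fifo_size(&gmac4_zero) == 0);
	/* Without gmac4/xgmac the value is passed through unsplit. */
	assert(per_channel_fifo_size(&legacy) == 0);
	assert(per_channel_fifo_size(&legacy_set) == 8192);
}
```

In this model a zero result only matters for the gmac4/xgmac case,
which matches the observation above that dwmac1000, dwmac100 and sun8i
do not use the value passed to them.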

Steven Price Feb. 3, 2025, 11:01 a.m. UTC | #2
[Moved Kunihiko to the To: line]

On 03/02/2025 10:38, Russell King (Oracle) wrote:
> On Mon, Feb 03, 2025 at 09:34:18AM +0000, Steven Price wrote:
>> Commit 8865d22656b4 ("net: stmmac: Specify hardware capability value
>> when FIFO size isn't specified") modified the behaviour to bail out if
>> both the FIFO size and the hardware capability were both set to zero.
>> However devices where has_gmac4 and has_xgmac are both false don't use
>> the fifo size and that commit breaks platforms for which these values
>> were zero.
>>
>> Only warn and error out when (has_gmac4 || has_xgmac) where the values
>> are used and zero would cause problems, otherwise continue with the zero
>> values.
>>
>> Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
>> Tested-by: Xi Ruoyao <xry111@xry111.site>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
> I'm still of the opinion that the original patch set was wrong, and
> I was thinking at the time that it should _not_ have been submitted
> for the "net" tree (it wasn't fixing a bug afaics, and was a risky
> change.)
> 
> Yes, we had multiple places where we have code like:
> 
>         int rxfifosz = priv->plat->rx_fifo_size;
>         int txfifosz = priv->plat->tx_fifo_size;
> 
>         if (rxfifosz == 0)
>                 rxfifosz = priv->dma_cap.rx_fifo_size;
>         if (txfifosz == 0)
>                 txfifosz = priv->dma_cap.tx_fifo_size;
> 
>         /* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
>         if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
>                 rxfifosz /= rx_channels_count;
>                 txfifosz /= tx_channels_count;
>         }
> 
> and this is passed to stmmac_dma_rx_mode() and stmmac_dma_tx_mode().
> 
> We also have it in the stmmac_change_mtu() path for the transmit side,
> which ensures that the MTU value is not larger than the transmit FIFO
> size (which is going to fail as it's always done before or after the
> original patch set, and whether or not your patch is applied.)
> 
> Now, as for the stmmac_dma_[tr]x_mode(), these are method functions
> calling into the DMA code. dwmac4, dwmac1000, dwxgmac2, dwmac100 and
> sun8i implement methods for this.
> 
> Of these, dwmac4, dwxgmac2 makes use of the value passed into
> stmmac_dma_[tr]x_mode() - both of which initialise dma.[tr]x_fifo_size.
> dwmac1000, dwmac100 and sun8i do not make use of it.
> 
> So, going back to the original patch series, I still question the value
> of the changes there - and with your patch, it makes their value even
> less because the justification seemed to be to ensure that
> priv->plat->[tr]x_fifo_size contained a sensible value. With your patch
> we're going back to a situation where we allow these to effectively be
> "unset" or zero.
> 
> I'll ask the question straight out - with your patch applied, what is
> the value of the original four patch series that caused the breakage?
> 

I've no opinion whether the original series "had value" - I'm just 
trying to fix the breakage that entailed. My first attempt at a patch 
was indeed a (partial) revert, but Andrew was keen to find a better 
solution[1].

I'd prefer we don't delay getting a fix merged arguing about the finer 
details on this. Obviously once a fix is merged the code can be
improved at leisure. If you want to propose a straight revert then
by all means send the patch and I'll post a Tested-By.

Steve

[1] https://lore.kernel.org/all/fc08926d-b9af-428f-8811-4bfe08acc5b7@lunn.ch/

Russell King (Oracle) Feb. 3, 2025, 11:16 a.m. UTC | #3
On Mon, Feb 03, 2025 at 11:01:28AM +0000, Steven Price wrote:
> [Moved Kunihiko to the To: line]
> 
> On 03/02/2025 10:38, Russell King (Oracle) wrote:
> > On Mon, Feb 03, 2025 at 09:34:18AM +0000, Steven Price wrote:
> >> Commit 8865d22656b4 ("net: stmmac: Specify hardware capability value
> >> when FIFO size isn't specified") modified the behaviour to bail out if
> >> both the FIFO size and the hardware capability were both set to zero.
> >> However devices where has_gmac4 and has_xgmac are both false don't use
> >> the fifo size and that commit breaks platforms for which these values
> >> were zero.
> >>
> >> Only warn and error out when (has_gmac4 || has_xgmac) where the values
> >> are used and zero would cause problems, otherwise continue with the zero
> >> values.
> >>
> >> Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
> >> Tested-by: Xi Ruoyao <xry111@xry111.site>
> >> Signed-off-by: Steven Price <steven.price@arm.com>
> > 
> > I'm still of the opinion that the original patch set was wrong, and
> > I was thinking at the time that it should _not_ have been submitted
> > for the "net" tree (it wasn't fixing a bug afaics, and was a risky
> > change.)
> > 
> > Yes, we had multiple places where we have code like:
> > 
> >         int rxfifosz = priv->plat->rx_fifo_size;
> >         int txfifosz = priv->plat->tx_fifo_size;
> > 
> >         if (rxfifosz == 0)
> >                 rxfifosz = priv->dma_cap.rx_fifo_size;
> >         if (txfifosz == 0)
> >                 txfifosz = priv->dma_cap.tx_fifo_size;
> > 
> >         /* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
> >         if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
> >                 rxfifosz /= rx_channels_count;
> >                 txfifosz /= tx_channels_count;
> >         }
> > 
> > and this is passed to stmmac_dma_rx_mode() and stmmac_dma_tx_mode().
> > 
> > We also have it in the stmmac_change_mtu() path for the transmit side,
> > which ensures that the MTU value is not larger than the transmit FIFO
> > size (which is going to fail as it's always done before or after the
> > original patch set, and whether or not your patch is applied.)
> > 
> > Now, as for the stmmac_dma_[tr]x_mode(), these are method functions
> > calling into the DMA code. dwmac4, dwmac1000, dwxgmac2, dwmac100 and
> > sun8i implement methods for this.
> > 
> > Of these, dwmac4, dwxgmac2 makes use of the value passed into
> > stmmac_dma_[tr]x_mode() - both of which initialise dma.[tr]x_fifo_size.
> > dwmac1000, dwmac100 and sun8i do not make use of it.
> > 
> > So, going back to the original patch series, I still question the value
> > of the changes there - and with your patch, it makes their value even
> > less because the justification seemed to be to ensure that
> > priv->plat->[tr]x_fifo_size contained a sensible value. With your patch
> > we're going back to a situation where we allow these to effectively be
> > "unset" or zero.
> > 
> > I'll ask the question straight out - with your patch applied, what is
> > the value of the original four patch series that caused the breakage?
> > 
> 
> I've no opinion whether the original series "had value" - I'm just 
> trying to fix the breakage that entailed. My first attempt at a patch 
> was indeed a (partial) revert, but Andrew was keen to find a better 
> solution[1].

There are two ways to fix the breakage - either revert the original
patches (which if they have little value now would be the sensible
approach IMHO) or try to fix them up, which may entail several patches
if further breakage is found.

Does the flow control test behave the same before and after the patch
series? Please can you test that?

See
drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c::stmmac_test_flowctrl()

Thanks.

Steven Price Feb. 3, 2025, 11:40 a.m. UTC | #4
On 03/02/2025 11:16, Russell King (Oracle) wrote:
> On Mon, Feb 03, 2025 at 11:01:28AM +0000, Steven Price wrote:
>> [Moved Kunihiko to the To: line]
>>
>> On 03/02/2025 10:38, Russell King (Oracle) wrote:
>>> On Mon, Feb 03, 2025 at 09:34:18AM +0000, Steven Price wrote:
>>>> Commit 8865d22656b4 ("net: stmmac: Specify hardware capability value
>>>> when FIFO size isn't specified") modified the behaviour to bail out if
>>>> both the FIFO size and the hardware capability were both set to zero.
>>>> However devices where has_gmac4 and has_xgmac are both false don't use
>>>> the fifo size and that commit breaks platforms for which these values
>>>> were zero.
>>>>
>>>> Only warn and error out when (has_gmac4 || has_xgmac) where the values
>>>> are used and zero would cause problems, otherwise continue with the zero
>>>> values.
>>>>
>>>> Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified")
>>>> Tested-by: Xi Ruoyao <xry111@xry111.site>
>>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>>
>>> I'm still of the opinion that the original patch set was wrong, and
>>> I was thinking at the time that it should _not_ have been submitted
>>> for the "net" tree (it wasn't fixing a bug afaics, and was a risky
>>> change.)
>>>
>>> Yes, we had multiple places where we have code like:
>>>
>>>         int rxfifosz = priv->plat->rx_fifo_size;
>>>         int txfifosz = priv->plat->tx_fifo_size;
>>>
>>>         if (rxfifosz == 0)
>>>                 rxfifosz = priv->dma_cap.rx_fifo_size;
>>>         if (txfifosz == 0)
>>>                 txfifosz = priv->dma_cap.tx_fifo_size;
>>>
>>>         /* Split up the shared Tx/Rx FIFO memory on DW QoS Eth and DW XGMAC */
>>>         if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
>>>                 rxfifosz /= rx_channels_count;
>>>                 txfifosz /= tx_channels_count;
>>>         }
>>>
>>> and this is passed to stmmac_dma_rx_mode() and stmmac_dma_tx_mode().
>>>
>>> We also have it in the stmmac_change_mtu() path for the transmit side,
>>> which ensures that the MTU value is not larger than the transmit FIFO
>>> size (which is going to fail as it's always done before or after the
>>> original patch set, and whether or not your patch is applied.)
>>>
>>> Now, as for the stmmac_dma_[tr]x_mode(), these are method functions
>>> calling into the DMA code. dwmac4, dwmac1000, dwxgmac2, dwmac100 and
>>> sun8i implement methods for this.
>>>
>>> Of these, dwmac4, dwxgmac2 makes use of the value passed into
>>> stmmac_dma_[tr]x_mode() - both of which initialise dma.[tr]x_fifo_size.
>>> dwmac1000, dwmac100 and sun8i do not make use of it.
>>>
>>> So, going back to the original patch series, I still question the value
>>> of the changes there - and with your patch, it makes their value even
>>> less because the justification seemed to be to ensure that
>>> priv->plat->[tr]x_fifo_size contained a sensible value. With your patch
>>> we're going back to a situation where we allow these to effectively be
>>> "unset" or zero.
>>>
>>> I'll ask the question straight out - with your patch applied, what is
>>> the value of the original four patch series that caused the breakage?
>>>
>>
>> I've no opinion whether the original series "had value" - I'm just 
>> trying to fix the breakage that entailed. My first attempt at a patch 
>> was indeed a (partial) revert, but Andrew was keen to find a better 
>> solution[1].
> 
> There are two ways to fix the breakage - either revert the original
> patches (which if they have little value now would be the sensible
> approach IMHO) or try to fix them up, which may entail several patches
> if further breakage is found.
> 
> Does the flow control test behave the same before and after the patch
> series? Please can you test that?

Yes I see the same results from "ethtool -t eth0" on v6.13 and after
applying this patch on v6.14-rc1. Although neither exactly look healthy:

The test result is FAIL
The test extra info:
 1. MAC Loopback               	 0
 2. PHY Loopback               	 -110
 3. MMC Counters               	 0
 4. EEE                        	 -95
 5. Hash Filter MC             	 0
 6. Perfect Filter UC          	 -110
 7. MC Filter                  	 -110
 8. UC Filter                  	 0
 9. Flow Control               	 -110
10. RSS                        	 -95
11. VLAN Filtering             	 -95
12. VLAN Filtering (perf)      	 -95
13. Double VLAN Filter         	 -95
14. Double VLAN Filter (perf)  	 -95
15. Flexible RX Parser         	 -95
16. SA Insertion (desc)        	 -95
17. SA Replacement (desc)      	 -95
18. SA Insertion (reg)         	 -95
19. SA Replacement (reg)       	 -95
20. VLAN TX Insertion          	 -95
21. SVLAN TX Insertion         	 -95
22. L3 DA Filtering            	 -95
23. L3 SA Filtering            	 -95
24. L4 DA TCP Filtering        	 -95
25. L4 SA TCP Filtering        	 -95
26. L4 DA UDP Filtering        	 -95
27. L4 SA UDP Filtering        	 -95
28. ARP Offload                	 -95
29. Jumbo Frame                	 1
30. Multichannel Jumbo         	 -95
31. Split Header               	 -95
32. TBS (ETF Scheduler)        	 -95

But I'll admit I've no idea what I'm doing here, so perhaps I don't have
a correct setup for running these tests?

Thanks,
Steve

Jakub Kicinski Feb. 3, 2025, 10:23 p.m. UTC | #5
On Mon, 3 Feb 2025 11:16:34 +0000 Russell King (Oracle) wrote:
> > I've no opinion whether the original series "had value" - I'm just 
> > trying to fix the breakage that entailed. My first attempt at a patch 
> > was indeed a (partial) revert, but Andrew was keen to find a better 
> > solution[1].  
> 
> There are two ways to fix the breakage - either revert the original
> patches (which if they have little value now would be the sensible
> approach IMHO)

+1, I also vote revert FWIW

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index d04543e5697b..821404beb629 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7222,7 +7222,7 @@  static int stmmac_hw_init(struct stmmac_priv *priv)
 	if (!priv->plat->rx_fifo_size) {
 		if (priv->dma_cap.rx_fifo_size) {
 			priv->plat->rx_fifo_size = priv->dma_cap.rx_fifo_size;
-		} else {
+		} else if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
 			dev_err(priv->device, "Can't specify Rx FIFO size\n");
 			return -ENODEV;
 		}
@@ -7236,7 +7236,7 @@  static int stmmac_hw_init(struct stmmac_priv *priv)
 	if (!priv->plat->tx_fifo_size) {
 		if (priv->dma_cap.tx_fifo_size) {
 			priv->plat->tx_fifo_size = priv->dma_cap.tx_fifo_size;
-		} else {
+		} else if (priv->plat->has_gmac4 || priv->plat->has_xgmac) {
 			dev_err(priv->device, "Can't specify Tx FIFO size\n");
 			return -ENODEV;
 		}
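
The branch changed by the hunks above can be exercised in isolation
with a small userspace model. This is only a sketch under stated
assumptions: struct fifo_cfg, resolve_fifo_size() and
resolve_fifo_size_demo() are hypothetical names, not kernel symbols,
and the dev_err() call is omitted. It mirrors the patched logic for a
single direction (Rx or Tx).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch touches. */
struct fifo_cfg {
	unsigned int plat_fifo_size;	/* models priv->plat->[tr]x_fifo_size */
	unsigned int cap_fifo_size;	/* models priv->dma_cap.[tr]x_fifo_size */
	bool has_gmac4;
	bool has_xgmac;
};

/*
 * Mirrors the patched branch: use the capability value when the
 * platform value is zero, and fail only on cores that use the size.
 */
int resolve_fifo_size(struct fifo_cfg *cfg)
{
	if (!cfg->plat_fifo_size) {
		if (cfg->cap_fifo_size)
			cfg->plat_fifo_size = cfg->cap_fifo_size;
		else if (cfg->has_gmac4 || cfg->has_xgmac)
			return -ENODEV;
	}

	return 0;
}

/* Small self-check covering the three outcomes of the branch. */
void resolve_fifo_size_demo(void)
{
	struct fifo_cfg legacy = { 0, 0, false, false };
	struct fifo_cfg gmac4 = { 0, 0, true, false };
	struct fifo_cfg capped = { 0, 4096, true, false };

	/* Neither gmac4 nor xgmac: zero is tolerated and left as zero. */
	assert(resolve_fifo_size(&legacy) == 0);
	assert(legacy.plat_fifo_size == 0);
	/* gmac4 with no size from either source: still an error. */
	assert(resolve_fifo_size(&gmac4) == -ENODEV);
	/* Capability value present: it is copied to the platform value. */
	assert(resolve_fifo_size(&capped) == 0);
	assert(capped.plat_fifo_size == 4096);
}
```

Before the patch, the first case in the demo would also have returned
-ENODEV, which is the breakage the commit message describes.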