mbox series

[RFC,0/6] net: stmmac: convert stmmac "pcs" to phylink

Message ID ZkDuJAx7atDXjf5m@shell.armlinux.org.uk (mailing list archive)
Headers show
Series net: stmmac: convert stmmac "pcs" to phylink | expand

Message

Russell King (Oracle) May 12, 2024, 4:28 p.m. UTC
Hi,

As I noted recently in a thread (and was ignored) stmmac sucks. (I
won't hide my distain for drivers that make my life as phylink
maintainer more difficult!)

One of the contract conditions for using phylink is that the driver
will _not_ mess with the netif carrier. stmmac developers/maintainers
clearly didn't read that, because stmmac messes with the netif
carrier, which destroys phylink's guarantee that it'll make certain
calls in a particular order (e.g. it won't call mac_link_up() twice
in a row without an intervening mac_link_down().) This is clearly
stated in the phylink documentation.

Thus, this patch set attempts to fix this. Why does it mess with the
netif carrier? It has its own independent PCS implementation that
completely bypasses phylink _while_ phylink is still being used.
This is not acceptable. Either the driver uses phylink, or it doesn't
use phylink. There is no half-way house about this. Therefore, this
driver needs to either be fixed, or needs to stop using phylink.

Since I was ignored when I brought this up, I've hacked together the
following patch set - and it is hacky at the moment. It's also broken
because of recentl changes involving dwmac-qcom-ethqos.c - but there
isn't sufficient information in the driver for me to fix this. The
driver appears to use SGMII at 2500Mbps, which simply does not exist.
What interface mode (and neg_mode) does phylink pass to pcs_config()
in each of the speeds that dwmac-qcom-ethqos.c is interested in.
Without this information, I can't do that conversion. So for the
purposes of this, I've just ignored dwmac-qcom-ethqos.c (which means
it will fail to build.)

The patch splitup is not ideal, but that's not what I'm interested in
here. What I want to hear is the results of testing - does this switch
of the RGMII/SGMII "pcs" stuff to a phylink_pcs work for this driver?

Please don't review the patches, but you are welcome to send fixes to
them. Once we know that the overall implementation works, then I'll
look at how best to split the patches. In the mean time, the present
form is more convenient for making changes and fixing things.

There is still more improvement that's needed here.

Thanks.

 drivers/net/ethernet/stmicro/stmmac/Makefile       |   2 +-
 drivers/net/ethernet/stmicro/stmmac/common.h       |  12 ++-
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   | 113 ++++++++++++---------
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 108 ++++++++++++--------
 .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c    |   6 --
 drivers/net/ethernet/stmicro/stmmac/hwif.h         |  27 ++---
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   | 111 +-------------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  19 ++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.c   |  57 +++++++++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.h   |  84 ++-------------
 10 files changed, 227 insertions(+), 312 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.c

Comments

Serge Semin May 13, 2024, 11:04 p.m. UTC | #1
Hi Russell

On Sun, May 12, 2024 at 05:28:20PM +0100, Russell King (Oracle) wrote:
> Hi,
> 

> As I noted recently in a thread (and was ignored)

Sorry about that. I didn't mean to ignore. Your message reached me
right when I caught a cold, which made me AFK for the rest of the
week.(

> As I noted recently in a thread (and was ignored) stmmac sucks.

Can't argue with that. There are much more aspects in what it sucks
than just the netif's. One glimpse at the plat_stmmacenet_data
structure causes million questions aka why, how come, what the hell...
I'll start submitting my cleanup patch sets after my another
networking work (DW XPCS wise) is finally done, re-submitted, reviewed
and merged in.

> (I
> won't hide my distain for drivers that make my life as phylink
> maintainer more difficult!)
> 
> One of the contract conditions for using phylink is that the driver
> will _not_ mess with the netif carrier. stmmac developers/maintainers
> clearly didn't read that, because stmmac messes with the netif
> carrier, which destroys phylink's guarantee that it'll make certain
> calls in a particular order (e.g. it won't call mac_link_up() twice
> in a row without an intervening mac_link_down().) This is clearly
> stated in the phylink documentation.
> 
> Thus, this patch set attempts to fix this. Why does it mess with the
> netif carrier? It has its own independent PCS implementation that
> completely bypasses phylink _while_ phylink is still being used.
> This is not acceptable. Either the driver uses phylink, or it doesn't
> use phylink. There is no half-way house about this. Therefore, this
> driver needs to either be fixed, or needs to stop using phylink.

Thanks for submitting the series, especially for making the RGMII
in-band status well-implemented in the driver. When I was studying the
STMMAC internals I was surprised that it wasn't actually utilized for
something useful. Furthermore at some point afterwards even the
RGSMIIIS IRQ turned to be disabled. So the RGMII-part of the code has
been completely unused after that. But even before that the RGMII
in-band status change IRQ was utilized just to print the link state
into the system log. 

> 
> Since I was ignored when I brought this up, I've hacked together the
> following patch set - and it is hacky at the moment. It's also broken
> because of recentl changes involving dwmac-qcom-ethqos.c - but there
> isn't sufficient information in the driver for me to fix this. The
> driver appears to use SGMII at 2500Mbps, which simply does not exist.
> What interface mode (and neg_mode) does phylink pass to pcs_config()
> in each of the speeds that dwmac-qcom-ethqos.c is interested in.
> Without this information, I can't do that conversion. So for the
> purposes of this, I've just ignored dwmac-qcom-ethqos.c (which means
> it will fail to build.)
> 
> The patch splitup is not ideal, but that's not what I'm interested in
> here. What I want to hear is the results of testing - does this switch
> of the RGMII/SGMII "pcs" stuff to a phylink_pcs work for this driver?
> 
> Please don't review the patches, but you are welcome to send fixes to
> them. Once we know that the overall implementation works, then I'll
> look at how best to split the patches. In the mean time, the present
> form is more convenient for making changes and fixing things.

I'll give your series a try later on this week on my DW GMAC with the
RGMII interface (alas I don't have an SGMII capable device, so can't
help with the AN-part testing). Today I've made a brief glance on it
and already noted a few places which may require a fix to make the
change working for RGMII (at least the RGSMIIIS IRQ must be got back
enabled). After making the patch set working for my device in what
form would you prefer me to submit the fixes? As incremental patches
in-reply to this thread?

-Serge(y)

> 
> There is still more improvement that's needed here.
> 
> Thanks.
> 
>  drivers/net/ethernet/stmicro/stmmac/Makefile       |   2 +-
>  drivers/net/ethernet/stmicro/stmmac/common.h       |  12 ++-
>  .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   | 113 ++++++++++++---------
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  | 108 ++++++++++++--------
>  .../net/ethernet/stmicro/stmmac/dwxgmac2_core.c    |   6 --
>  drivers/net/ethernet/stmicro/stmmac/hwif.h         |  27 ++---
>  .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   | 111 +-------------------
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  19 ++--
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.c   |  57 +++++++++++
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.h   |  84 ++-------------
>  10 files changed, 227 insertions(+), 312 deletions(-)
>  create mode 100644 drivers/net/ethernet/stmicro/stmmac/stmmac_pcs.c
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Russell King (Oracle) May 13, 2024, 11:21 p.m. UTC | #2
On Tue, May 14, 2024 at 02:04:00AM +0300, Serge Semin wrote:
> Hi Russell
> 
> I'll give your series a try later on this week on my DW GMAC with the
> RGMII interface (alas I don't have an SGMII capable device, so can't
> help with the AN-part testing).

Thanks!

> Today I've made a brief glance on it
> and already noted a few places which may require a fix to make the
> change working for RGMII (at least the RGSMIIIS IRQ must be got back
> enabled). After making the patch set working for my device in what
> form would you prefer me to submit the fixes? As incremental patches
> in-reply to this thread?

I think it depends on where the issues are.

If they are addressing issues that are also present in the existing
code, then it would make sense to get those patched as the driver
stands today, because backporting them to stable would be easier.

If they are for "new" issues, given that this patch series is more
or less experimental, I would prefer to roll them into these
patches. As mentioned, when it comes to submitting these patches,
the way I've split them wouldn't make much sense, but it does
make sense for where I am with it. Hence, I'll want to resplit
the series into something better for submission than it currently
is. If you want to reply to this thread, that is fine.

There's still a few netif_carrier_off()/netif_carrier_on()s left
in the driver even after this patch series, but I think they're in
more obscure paths, but I will also want to address those as well.
Serge Semin May 13, 2024, 11:53 p.m. UTC | #3
On Tue, May 14, 2024 at 12:21:42AM +0100, Russell King (Oracle) wrote:
> On Tue, May 14, 2024 at 02:04:00AM +0300, Serge Semin wrote:
> > Hi Russell
> > 
> > I'll give your series a try later on this week on my DW GMAC with the
> > RGMII interface (alas I don't have an SGMII capable device, so can't
> > help with the AN-part testing).
> 
> Thanks!
> 
> > Today I've made a brief glance on it
> > and already noted a few places which may require a fix to make the
> > change working for RGMII (at least the RGSMIIIS IRQ must be got back
> > enabled). After making the patch set working for my device in what
> > form would you prefer me to submit the fixes? As incremental patches
> > in-reply to this thread?
> 
> I think it depends on where the issues are.
> 

> If they are addressing issues that are also present in the existing
> code, then it would make sense to get those patched as the driver
> stands today, because backporting them to stable would be easier.
> 

Sure. If I get to find any problem with the existing code I'll submit
a fix as an independent patch.

> If they are for "new" issues, given that this patch series is more
> or less experimental, I would prefer to roll them into these
> patches. As mentioned, when it comes to submitting these patches,
> the way I've split them wouldn't make much sense, but it does
> make sense for where I am with it. Hence, I'll want to resplit
> the series into something better for submission than it currently
> is. If you want to reply to this thread, that is fine.

I was meaning the "new" issues. Ok then. I'll prepare the fixes as
incremental patches either on top of the entire series or on top of
the respective patches, so the altered parts could be spotted right
out of the patches body. Then I'll submit them in-reply to the
respective messages in this patch set. After that you'll be able to
squash them up into the commits in your repo and re-shuffle the
changes as would be more appropriate.

> 
> There's still a few netif_carrier_off()/netif_carrier_on()s left
> in the driver even after this patch series, but I think they're in
> more obscure paths, but I will also want to address those as well.

Yeah, I noticed that. I was going to discuss this matter after getting
the changes tested.

-Serge(y)

> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Serge Semin May 24, 2024, 11:54 p.m. UTC | #4
Hi Russell

On Tue, May 14, 2024 at 12:21:42AM +0100, Russell King (Oracle) wrote:
> On Tue, May 14, 2024 at 02:04:00AM +0300, Serge Semin wrote:
> > Hi Russell
> > 
> > I'll give your series a try later on this week on my DW GMAC with the
> > RGMII interface (alas I don't have an SGMII capable device, so can't
> > help with the AN-part testing).
> 
> Thanks!
> 
> > Today I've made a brief glance on it
> > and already noted a few places which may require a fix to make the
> > change working for RGMII (at least the RGSMIIIS IRQ must be got back
> > enabled). After making the patch set working for my device in what
> > form would you prefer me to submit the fixes? As incremental patches
> > in-reply to this thread?
> 
> I think it depends on where the issues are.
> 
> If they are addressing issues that are also present in the existing
> code, then it would make sense to get those patched as the driver
> stands today, because backporting them to stable would be easier.
> 
> If they are for "new" issues, given that this patch series is more
> or less experimental, I would prefer to roll them into these
> patches. As mentioned, when it comes to submitting these patches,
> the way I've split them wouldn't make much sense, but it does
> make sense for where I am with it. Hence, I'll want to resplit
> the series into something better for submission than it currently
> is. If you want to reply to this thread, that is fine.

I've just submitted the fixes for your series
https://lore.kernel.org/netdev/20240524210304.9164-1-fancer.lancer@gmail.com/
which make it working well on my hardware: DW GMAC v3.73 with RGMII
PHY interface connected to the Micrel KSZ9031RNX PHY. (For a lucky
coincident the PHY happen to support the link status sent in-band up
to the MAC.) So as long as the managed="in-band-status" property is
specified the PCS subsystem gets the link-status by means of the
pcs_get_state() callback. The status change is signaled by means of
the RGSMIIIS IRQ.  For that to work the Patch 2 of my series was
required (and of course Patch 1 which prevents the IRQs flood).

I'm sorry for submitting the series only today. First I had to dig
deeper into the way the RGMII In-band/PCS-block of the controller
works. Then I needed some time to study the STMMAC PCS-code and to
debug the problems fixed in my series. So I finished testing your
patchset on this Monday. Then I decided to spend sometime for making
the PCS implementation looking more optimized based on the knowledge I
gained during the debugging. But as it's normal for the STMMAC driver
(which sucks indeed) a few days wasn't enough for that, because due to
the driver being overwhelmed with duty hacks any more-or-less invasive
refactoring may lead to regressions here or there. So I stuck right at
the first step of getting the "snps,ps-speed" and the MAC2MAC mode
well implemented...(

Anyway here is the key points regarding the RGMII In-band and
PCS-interface implemented in the DW GMAC and DW QoS Eth
controllers/drivers:

1. Intermediate PCS for which the plat_stmmacenet_data::mac_interface
field and the "mac-mode" property was introduced isn't the case of the
PCS embedded into the DW GMAC and DW QoS Eth IP-cores by Synopsys.
That was some another PCS likely specific for the Altera SoC FPGA
platform (dwmac-socfpga.c).

2. RGMII: There is no any PCS-block utilized in case of the RGMII PHY
interface (that's why HWFEATURE.PCSSEL flag isn't set). The networking
controller provides a way to pick up the RGMII In-band status
delivered from the attached PHY. The in-band status is updated in the
GMAC_RGSMIIIS (DW GMAC) and in the GMAC_PHYIF_CTRLSTATUS (DW QoS Eth)
registers and signalled via the RGSMIIIS MAC IRQ.

3. SGMII: The interface implementation has a PCS-block so the
HWFEATURE.PCSSEL flag is set. But the auto-negotiation procedure
complies to the SGMII specification: no ability advertisement. SGMII
PHY sends the control information back to the MAC by means of the
tx_config_Reg[15:0] register. MAC either acknowledges the update or
not. The control information retrieved from the PHY is reflected in
the GMAC_RGSMIIIS (DW GMAC) and in the GMAC_PHYIF_CTRLSTATUS (DW QoS
Eth) registers. The only AN-related CSR available for the SGMII
interface are GMAC_AN_CTRL(x) and GMAC_AN_STATUS(x) since no
advertisement implied by the specification.

4. RGMII/SGMII/SMII: Note CSR-wise the tx_config_Reg[15:0] register
mapping is the same for all of these interfaces. It's available in the
GMAC_RGSMIIIS (DW GMAC) and in the GMAC_PHYIF_CTRLSTATUS (DW QoS Eth)
CSRs: in case of the DW GMAC it's GMAC_RGSMIIIS[0:15] bits, but in
case of DW QoS Eth it's GMAC_PHYIF_CTRLSTATUS[16:31]. (This info can
be useful to create a common dwmac_inband_pcs_get_state() method
implementation in the stmmac_pcs.c module.)

5. TBI/RTBI: It has a traditional auto-negotiation procedure fully
complying to the IEEE 802.3z C37 specification with the link abilities
advertisement. RBI/RTBI don't imply any in-band link status detection
so the GMAC_RGSMIIIS (DW GMAC) and in the GMAC_PHYIF_CTRLSTATUS (DW
QoS Eth) CSRs aren't available for these interfaces.

6. RGMII/SGMII/SMII: In order to have the Link Speed
(GMAC_CONTROL.{PS,FES}), Duplex mode (GMAC_CONTROL.DM) and Link
Up/Down bit (GMAC_CONTROL.LUD or GMAC_PHYIF_CTRLSTATUS.LUD)
transferred from MAC to the attached PHY or to another MAC via
tx_config_Reg[15:0], the GMAC_CONTROL.TC (DW GMAC) or
GMAC_PHYIF_CTRLSTATUS.TC (DW QoS Eth) flags must be set. Just a note
seeing the current PCS implementation doesn't do that even in case of
the fixed Port-select speed setup (when snps,ps-speed property is
specified).


Based on the info above I was going to extend your stmmac_pcs.c module
with the inband link status (retrieved via the tx_config_Reg[15:0])
parsing method; create more basic PCS-ops in the framework of the
dwmac1000_core.c and dwmac4_core.c modules, and the common
phylink_pcs_ops in the stmmac_main.c module using those basic PCS-ops.
But as I mentioned before I was stuck on the fixed Port-select speed
implementation. It's activated via the "snps,ps-speed" property. If
it's specified the AN_Control.SGMRAL flag will be set which makes the
SGMII interface working with a fixed speed pre-initialized in the
MAC_CONFIG.{PS,FES} fields. First of all I wasn't sure whether the
MAC2MAC functionality it's utilized for, can be applicable for
non-SGMII interface. Secondly the port speed fixing is performed
behind the phylink back. It's possible to have the speed setting being
updated later in framework of the mac_link_up() callback. So I have
some doubts that the current "snps,ps-speed"-based fixed port speed
implementation is fully correct.

That's the stage at which I decided to stop further researches and
sent my series of fixes to you.

-Serge(y)

> 
> There's still a few netif_carrier_off()/netif_carrier_on()s left
> in the driver even after this patch series, but I think they're in
> more obscure paths, but I will also want to address those as well.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!