[RFC,net-next,0/4] net: dsa: always use phylink

Message-ID: <YrWi5oBFn7vR15BH@shell.armlinux.org.uk>
Date: Fri, 24 Jun 2022 12:41:26 +0100
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Andrew Lunn <andrew@lunn.ch>, Heiner Kallweit <hkallweit1@gmail.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>,
    Alvin Šipraga <alsi@bang-olufsen.dk>,
    Claudiu Manoil <claudiu.manoil@nxp.com>,
    "David S. Miller" <davem@davemloft.net>,
    DENG Qingfang <dqfext@gmail.com>,
    Eric Dumazet <edumazet@google.com>,
    Florian Fainelli <f.fainelli@gmail.com>,
    George McCollister <george.mccollister@gmail.com>,
    Hauke Mehrtens <hauke@hauke-m.de>,
    Jakub Kicinski <kuba@kernel.org>,
    Kurt Kanzenbach <kurt@linutronix.de>,
    Landen Chao <Landen.Chao@mediatek.com>,
    Linus Walleij <linus.walleij@linaro.org>,
    linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org,
    Matthias Brugger <matthias.bgg@gmail.com>,
    netdev@vger.kernel.org,
    Paolo Abeni <pabeni@redhat.com>,
    Sean Wang <sean.wang@mediatek.com>,
    UNGLinuxDriver@microchip.com,
    Vivien Didelot <vivien.didelot@gmail.com>,
    Vladimir Oltean <olteanv@gmail.com>,
    Woojung Huh <woojung.huh@microchip.com>
Subject: [PATCH RFC net-next 0/4] net: dsa: always use phylink

Series: net: dsa: always use phylink
  [RFC,net-next,1/4] net: dsa: add support for retrieving the interface mode
  [RFC,net-next,2/4] net: dsa: mv88e6xxx: report the default interface mode for the port
  [RFC,net-next,3/4] net: phylink: add phylink_set_max_fixed_link()
  [RFC,net-next,4/4] net: dsa: always use phylink for CPU and DSA ports

Russell King (Oracle) June 24, 2022, 11:41 a.m. UTC

Hi,

Currently, the core DSA code conditionally uses phylink for CPU and DSA
ports depending on whether the firmware specifies a fixed-link or a PHY.
If either of these is specified, then phylink is used for these ports;
otherwise it is not, and we rely on the DSA drivers to "do the
right thing". This detail is not mentioned in the DT binding, but
Andrew has said that this behaviour has always been something that DSA
wants.

mv88e6xxx has had support for this for a long time with its "SPEED_MAX"
thing, which I recently reworked to make use of the mac_capabilities in
preparation for solving this more fully.

This series is an experiment to solve this properly, and it does this
in two steps.

The first step consists of the first two patches. Phylink needs to
know the PHY interface mode that is being used so it can (a) pass the
right mode into the MAC/PCS etc and (b) know the properties of the
link and therefore which speeds can be supported across it.

In order to achieve this, the DSA phylink_get_caps() method has an
extra argument added to it so that DSA drivers can report the
interface mode that they will be using for this port back to the core
DSA code, thereby allowing phylink to be initialised with the correct
interface mode.

Note that this can only be used for CPU and DSA ports as "user" ports
need a different behaviour - they rely on getting the interface mode
from phylib, which will only happen if phylink is initialised with
PHY_INTERFACE_MODE_NA. Unfortunately, changing this behaviour is likely
to cause widespread regressions.
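
To make the first step concrete, here is a minimal userspace sketch of a
capabilities callback that takes an extra out-parameter, and of a core-side
helper that keeps PHY_INTERFACE_MODE_NA for user ports. The types are
stand-ins, and the names example_get_caps(), example_port_interface() and
default_interface are hypothetical; this is not the code from the patches.

```c
/* Stand-ins for the kernel types; the enum values are illustrative only. */
typedef enum {
	PHY_INTERFACE_MODE_NA,
	PHY_INTERFACE_MODE_SGMII,
	PHY_INTERFACE_MODE_1000BASEX,
} phy_interface_t;

struct phylink_config {
	unsigned long supported_interfaces;	/* one bit per interface mode */
};

/* Hypothetical driver callback: besides filling in the config, it reports
 * through the extra argument the interface mode the port will be using.
 */
static void example_get_caps(int port, struct phylink_config *config,
			     phy_interface_t *default_interface)
{
	(void)port;
	config->supported_interfaces = 1UL << PHY_INTERFACE_MODE_1000BASEX;
	*default_interface = PHY_INTERFACE_MODE_1000BASEX;
}

/* Hypothetical core-side helper: CPU and DSA ports take the mode reported by
 * the driver, user ports stay at NA so the mode can come from phylib.
 */
static phy_interface_t example_port_interface(int port, int is_user_port,
					      struct phylink_config *config)
{
	phy_interface_t def = PHY_INTERFACE_MODE_NA;

	example_get_caps(port, config, &def);

	return is_user_port ? PHY_INTERFACE_MODE_NA : def;
}
```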

Obvious questions:
1. Should phylink_get_caps() be augmented in this way, or should it be
   a separate method?

2. DSA has traditionally used "interface mode for the maximum supported
   speed on this port" where the interface mode is programmable (via
   its internal port_max_speed_mode() method) but this is only present
   for a few of the sub-drivers. Is reporting the current interface
   mode correct where this method is not implemented?

The second step is to introduce a function that allows phylink to be
reconfigured after creation time to operate at max-speed fixed-link
mode for the PHY interface mode, also using the MAC capabilities to
determine the speed and duplex mode we should be using.

Obvious questions:
1. Should we be allowing half-duplex for this?
2. If we do allow half-duplex, should we prefer fastest speed over
   duplex setting, or should we prefer fastest full-duplex speed
   over any half-duplex?
3. How do we sanely switch DSA from its current behaviour to always
   using phylink for these ports without breakage - this is the
   difficult one, because it's not obvious which drivers have been
   coded to work around this quirk of the DSA implementation.
   For example, if we start forcing the link down before calling
   dsa_port_phylink_create(), and we then fail to set max-fixed-link,
   then the CPU/DSA port is going to fail, and we're going to have
   lots of regressions.
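
The two policies in question 2 can be compared with a small userspace
sketch. The capability bits, the table and both function names are made up
for illustration and do not match the kernel's definitions.

```c
#include <stddef.h>

enum { EX_DUPLEX_HALF, EX_DUPLEX_FULL };

/* Illustrative capability bits; the layout is an assumption. */
enum {
	EX_CAP_10HD   = 1 << 0,
	EX_CAP_10FD   = 1 << 1,
	EX_CAP_100HD  = 1 << 2,
	EX_CAP_100FD  = 1 << 3,
	EX_CAP_1000HD = 1 << 4,
	EX_CAP_1000FD = 1 << 5,
};

struct ex_speed {
	int speed;
	unsigned int fd_mask;
	unsigned int hd_mask;
};

/* Ordered from fastest to slowest. */
static const struct ex_speed ex_speeds[] = {
	{ 1000, EX_CAP_1000FD, EX_CAP_1000HD },
	{  100, EX_CAP_100FD,  EX_CAP_100HD  },
	{   10, EX_CAP_10FD,   EX_CAP_10HD   },
};

#define EX_NSPEEDS (sizeof(ex_speeds) / sizeof(ex_speeds[0]))

struct ex_link {
	int speed;
	int duplex;
};

/* Policy A: the fastest speed wins; duplex only breaks ties within a speed. */
static int ex_pick_fastest_speed(unsigned int caps, struct ex_link *out)
{
	for (size_t i = 0; i < EX_NSPEEDS; i++) {
		if (caps & ex_speeds[i].fd_mask) {
			out->speed = ex_speeds[i].speed;
			out->duplex = EX_DUPLEX_FULL;
			return 0;
		}
		if (caps & ex_speeds[i].hd_mask) {
			out->speed = ex_speeds[i].speed;
			out->duplex = EX_DUPLEX_HALF;
			return 0;
		}
	}
	return -1;
}

/* Policy B: any full-duplex speed beats every half-duplex speed. */
static int ex_pick_fastest_full_duplex(unsigned int caps, struct ex_link *out)
{
	for (size_t i = 0; i < EX_NSPEEDS; i++) {
		if (caps & ex_speeds[i].fd_mask) {
			out->speed = ex_speeds[i].speed;
			out->duplex = EX_DUPLEX_FULL;
			return 0;
		}
	}
	for (size_t i = 0; i < EX_NSPEEDS; i++) {
		if (caps & ex_speeds[i].hd_mask) {
			out->speed = ex_speeds[i].speed;
			out->duplex = EX_DUPLEX_HALF;
			return 0;
		}
	}
	return -1;
}
```

The two only differ when the fastest supported speed is half-duplex only:
with 1000-hd and 100-fd available, policy A gives 1000/half and policy B
gives 100/full.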

Please look at the patches and make suggestions on how we can proceed
to clean up this quirk of DSA.

 drivers/net/dsa/b53/b53_common.c       |  3 +-
 drivers/net/dsa/bcm_sf2.c              |  3 +-
 drivers/net/dsa/hirschmann/hellcreek.c |  3 +-
 drivers/net/dsa/lantiq_gswip.c         |  6 ++-
 drivers/net/dsa/microchip/ksz_common.c |  3 +-
 drivers/net/dsa/mt7530.c               |  3 +-
 drivers/net/dsa/mv88e6xxx/chip.c       | 53 +++++++---------------
 drivers/net/dsa/ocelot/felix.c         |  3 +-
 drivers/net/dsa/qca/ar9331.c           |  3 +-
 drivers/net/dsa/qca8k.c                |  3 +-
 drivers/net/dsa/realtek/rtl8365mb.c    |  3 +-
 drivers/net/dsa/sja1105/sja1105_main.c |  3 +-
 drivers/net/dsa/xrs700x/xrs700x.c      |  3 +-
 drivers/net/phy/phylink.c              | 83 ++++++++++++++++++++++++++++++++++
 include/linux/phylink.h                |  2 +
 include/net/dsa.h                      |  3 +-
 net/dsa/port.c                         | 42 ++++++++++-------
 17 files changed, 154 insertions(+), 68 deletions(-)

Russell King (Oracle) June 28, 2022, 9:16 p.m. UTC | #1

An alternative idea has been put forward by Marek on how to solve this
without involving changes to DSA drivers, but everyone would have to
fill in the supported_interfaces and mac_capabilities.

The suggestion is that DSA calls phylink_set_max_fixed_link(), which
looks at the above two fields, and finds an interface which gives the
maximum link speed if the interface mode has not been specified. In
other words, something like this for phylink_set_max_fixed_link():

        interface = pl->link_interface;
        if (interface != PHY_INTERFACE_MODE_NA) {
                /* Get the speed/duplex capabilities and reduce according to the
                 * specified interface mode.
                 */
                caps = pl->config->mac_capabilities;
                caps &= phylink_interface_to_caps(interface);
        } else {
                interfaces = pl->config->supported_interfaces;
                max_caps = 0;

                /* Find the supported interface mode which gives the maximum
                 * speed.
                 */
                for (intf = 0; intf < PHY_INTERFACE_MODE_MAX; intf++) {
                        if (test_bit(intf, interfaces)) {
                                caps = pl->config->mac_capabilities;
                                caps &= phylink_interface_to_caps(intf);
                                if (caps > max_caps) {
                                        max_caps = caps;
                                        interface = intf;
                                }
                        }
                }

                caps = max_caps;
        }

        caps &= ~(MAC_SYM_PAUSE | MAC_ASYM_PAUSE);

        /* If there are no capabilities, then we are not using this default. */
        if (!caps)
                return -EINVAL;

        /* Decode to fastest speed and duplex */
        duplex = DUPLEX_UNKNOWN;
        speed = SPEED_UNKNOWN;
        for (i = 0; i < ARRAY_SIZE(phylink_caps_speeds); i++) {
                if (caps & phylink_caps_speeds[i].fd_mask) {
                        duplex = DUPLEX_FULL;
                        speed = phylink_caps_speeds[i].speed;
                        break;
                } else if (caps & phylink_caps_speeds[i].hd_mask) {
                        duplex = DUPLEX_HALF;
                        speed = phylink_caps_speeds[i].speed;
                        break;
                }
        }

        /* If we didn't find anything, bail. */
        if (speed == SPEED_UNKNOWN)
                return -EINVAL;

        pl->link_interface = interface;
        pl->link_config.interface = interface;
        pl->link_config.speed = speed;
        pl->link_config.duplex = duplex;
        pl->link_config.link = 1;
        pl->cfg_link_an_mode = MLO_AN_FIXED;
        pl->cur_link_an_mode = MLO_AN_FIXED;

This would have the effect of selecting the first interface mode in
numerical order that gives us the fastest link speed.

I should point out that if a DSA port can be programmed in software to
support both SGMII and 1000baseX, this will end up selecting SGMII
irrespective of what the hardware was wire-strapped to and how it was
initially configured. Do we believe that would be acceptable?
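
The tie-break can be reproduced in a small userspace sketch. The enum, the
capability bits and the function names are stand-ins, with SGMII simply
assumed to sort before 1000base-X as described above. One deliberate
difference from the fragment: the sketch compares only the fastest speed
each interface offers, whereas comparing the raw masks would also let the
lower-speed bits influence the result.

```c
/* Illustrative interface list; the ordering is an assumption. */
enum ex_if { EX_IF_NA, EX_IF_SGMII, EX_IF_1000BASEX, EX_IF_MAX };

/* Illustrative speed capability bits. */
enum {
	EX_CAP_10   = 1 << 0,
	EX_CAP_100  = 1 << 1,
	EX_CAP_1000 = 1 << 2,
};

/* Speeds that can be carried over each interface mode in this sketch. */
static unsigned int ex_if_to_caps(int intf)
{
	switch (intf) {
	case EX_IF_SGMII:
		return EX_CAP_10 | EX_CAP_100 | EX_CAP_1000;
	case EX_IF_1000BASEX:
		return EX_CAP_1000;
	default:
		return 0;
	}
}

static int ex_fastest(unsigned int caps)
{
	if (caps & EX_CAP_1000)
		return 1000;
	if (caps & EX_CAP_100)
		return 100;
	if (caps & EX_CAP_10)
		return 10;
	return 0;
}

/* Walk the interfaces in numerical order and keep the first one that gives
 * the fastest speed. The strict '>' means a later interface with the same
 * speed never replaces an earlier one.
 */
static int ex_pick_interface(unsigned int supported, unsigned int mac_caps)
{
	int best = EX_IF_NA;
	int best_speed = 0;

	for (int intf = EX_IF_NA + 1; intf < EX_IF_MAX; intf++) {
		int speed;

		if (!(supported & (1u << intf)))
			continue;

		speed = ex_fastest(mac_caps & ex_if_to_caps(intf));
		if (speed > best_speed) {
			best_speed = speed;
			best = intf;
		}
	}

	return best;
}
```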

Some comments would be really useful on this.

Andrew Lunn June 29, 2022, 7:18 a.m. UTC | #2

> I should point out that if a DSA port can be programmed in software to
> support both SGMII and 1000baseX, this will end up selecting SGMII
> irrespective of what the hardware was wire-strapped to and how it was
> initially configured. Do we believe that would be acceptable?

I'm pretty sure the devel b board has 1000BaseX DSA links between its
two switches. Since both should end up SGMII that should be O.K.

Where we potentially have issues is 1000BaseX to the CPU. This is not
an issue for the Vybrid based boards, since they are fast Ethernet
only, but there are some boards with an IMX6 with 1G ethernet. I guess
they currently use 1000BaseX, and the CPU side of the link probably
has a fixed-link with phy-mode = 1000BaseX. So we might have an issue
there.

	Andrew

Marek Behún June 29, 2022, 9:27 a.m. UTC | #3

On Wed, 29 Jun 2022 09:18:10 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > I should point out that if a DSA port can be programmed in software to
> > support both SGMII and 1000baseX, this will end up selecting SGMII
> > irrespective of what the hardware was wire-strapped to and how it was
> > initially configured. Do we believe that would be acceptable?  
> 
> I'm pretty sure the devel b board has 1000BaseX DSA links between its
> two switches. Since both should end up SGMII that should be O.K.
> 
> Where we potentially have issues is 1000BaseX to the CPU. This is not
> an issue for the Vybrid based boards, since they are fast Ethernet
> only, but there are some boards with an IMX6 with 1G ethernet. I guess
> they currently use 1000BaseX, and the CPU side of the link probably
> has a fixed-link with phy-mode = 1000BaseX. So we might have an issue
> there.

If one side of the link (e.g. only the CPU eth interface) has 1000base-x
specified in device-tree explicitly, the code should keep it at
1000base-x for the DSA CPU port...

Marek

Russell King (Oracle) June 29, 2022, 9:34 a.m. UTC | #4

On Wed, Jun 29, 2022 at 11:27:50AM +0200, Marek Behún wrote:
> On Wed, 29 Jun 2022 09:18:10 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > > I should point out that if a DSA port can be programmed in software to
> > > support both SGMII and 1000baseX, this will end up selecting SGMII
> > > irrespective of what the hardware was wire-strapped to and how it was
> > > initially configured. Do we believe that would be acceptable?  
> > 
> > I'm pretty sure the devel b board has 1000BaseX DSA links between its
> > two switches. Since both should end up SGMII that should be O.K.
> > 
> > Where we potentially have issues is 1000BaseX to the CPU. This is not
> > an issue for the Vybrid based boards, since they are fast Ethernet
> > only, but there are some boards with an IMX6 with 1G ethernet. I guess
> > they currently use 1000BaseX, and the CPU side of the link probably
> > has a fixed-link with phy-mode = 1000BaseX. So we might have an issue
> > there.
> 
> If one side of the link (e.g. only the CPU eth interface) has 1000base-x
> specified in device-tree explicitly, the code should keep it at
> 1000base-x for the DSA CPU port...

So does that mean that, if we don't find a phy-mode property in the cpu
port node, we should chase the ethernet property and check there? This
seems to be adding functionality that wasn't there before.

Marek Behún June 29, 2022, 9:42 a.m. UTC | #5

On Wed, 29 Jun 2022 10:34:28 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, Jun 29, 2022 at 11:27:50AM +0200, Marek Behún wrote:
> > On Wed, 29 Jun 2022 09:18:10 +0200
> > Andrew Lunn <andrew@lunn.ch> wrote:
> >   
> > > > I should point out that if a DSA port can be programmed in software to
> > > > support both SGMII and 1000baseX, this will end up selecting SGMII
> > > > irrespective of what the hardware was wire-strapped to and how it was
> > > > initially configured. Do we believe that would be acceptable?    
> > > 
> > > I'm pretty sure the devel b board has 1000BaseX DSA links between its
> > > two switches. Since both should end up SGMII that should be O.K.
> > > 
> > > Where we potentially have issues is 1000BaseX to the CPU. This is not
> > > an issue for the Vybrid based boards, since they are fast Ethernet
> > > only, but there are some boards with an IMX6 with 1G ethernet. I guess
> > > they currently use 1000BaseX, and the CPU side of the link probably
> > > has a fixed-link with phy-mode = 1000BaseX. So we might have an issue
> > > there.  
> > 
> > If one side of the link (e.g. only the CPU eth interface) has 1000base-x
> > specified in device-tree explicitly, the code should keep it at
> > 1000base-x for the DSA CPU port...  
> 
> So does that mean that, if we don't find a phy-mode property in the cpu
> port node, we should chase the ethernet property and check there? This
> seems to be adding functionality that wasn't there before.

It wasn't there before, but it would make sense IMO.

1. if cpu port has explicit phy-mode, use that
2. otherwise look at the mode defined for peer
3. otherwise try to compute the best possible mode for both peers

Marek

Russell King (Oracle) June 29, 2022, 9:43 a.m. UTC | #6

On Wed, Jun 29, 2022 at 09:18:10AM +0200, Andrew Lunn wrote:
> > I should point out that if a DSA port can be programmed in software to
> > support both SGMII and 1000baseX, this will end up selecting SGMII
> > irrespective of what the hardware was wire-strapped to and how it was
> > initially configured. Do we believe that would be acceptable?
> 
> I'm pretty sure the devel b board has 1000BaseX DSA links between its
> two switches. Since both should end up SGMII that should be O.K.

Would such a port have a programmable C_Mode, and would it specify that
it supports both SGMII and 1000BaseX ? Without going through a lot of
boards and documentation for every switch, I can't say.

I don't think we can come to any conclusion on what the right way to
deal with this actually is - we don't have enough information about how
this is used across all the platforms we have. I think we can only try
something, get it merged into net-next, and wait to see whether anyone
complains.

When we have a CPU or DSA port without a fixed-link, phy or sfp specified,
I think we should:
(a) use the phy-mode property if present, otherwise,
(b,i) have the DSA driver return the interface mode that it wants to use
for max speed for CPU and DSA ports.
(b,ii) in the absence of the DSA driver returning a valid interface mode,
we use the supported_interfaces to find an interface which gives the
maximum speed (irrespective of duplex?) that falls within the
mac capabilities.
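
As a rough userspace sketch of that ordering (the struct, its fields and
ex_resolve_mode() are hypothetical, and each input is assumed to have been
worked out already by the corresponding step):

```c
enum ex_mode { EX_MODE_NA, EX_MODE_SGMII, EX_MODE_1000BASEX };

struct ex_port {
	enum ex_mode dt_phy_mode;	/* (a) phy-mode property, NA if absent */
	enum ex_mode driver_default;	/* (b,i) reported by the DSA driver */
	enum ex_mode best_supported;	/* (b,ii) from supported_interfaces */
};

/* Try each source in turn; return -1 when all of them fail. */
static int ex_resolve_mode(const struct ex_port *port, enum ex_mode *mode)
{
	if (port->dt_phy_mode != EX_MODE_NA) {
		*mode = port->dt_phy_mode;
		return 0;
	}

	if (port->driver_default != EX_MODE_NA) {
		*mode = port->driver_default;
		return 0;
	}

	if (port->best_supported != EX_MODE_NA) {
		*mode = port->best_supported;
		return 0;
	}

	return -1;
}
```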

If all those fail, then things will break, and we will have to wait for
people to report that breakage. Does this sound a sane approach, or
does anyone have any other suggestions how to solve this?

Marek Behún June 29, 2022, 10:10 a.m. UTC | #7

On Wed, 29 Jun 2022 10:43:23 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Wed, Jun 29, 2022 at 09:18:10AM +0200, Andrew Lunn wrote:
> > > I should point out that if a DSA port can be programmed in software to
> > > support both SGMII and 1000baseX, this will end up selecting SGMII
> > > irrespective of what the hardware was wire-strapped to and how it was
> > > initially configured. Do we believe that would be acceptable?  
> > 
> > I'm pretty sure the devel b board has 1000BaseX DSA links between its
> > two switches. Since both should end up SGMII that should be O.K.  
> 
> Would such a port have a programmable C_Mode, and would it specify that
> it supports both SGMII and 1000BaseX ? Without going through a lot of
> boards and documentation for every switch, I can't say.
> 
> I don't think we can come to any conclusion on what the right way to
> deal with this actually is - we don't have enough information about how
> this is used across all the platforms we have. I think we can only try
> something, get it merged into net-next, and wait to see whether anyone
> complains.
> 
> When we have a CPU or DSA port without a fixed-link, phy or sfp specified,
> I think we should:
> (a) use the phy-mode property if present, otherwise,
> (b,i) have the DSA driver return the interface mode that it wants to use
> for max speed for CPU and DSA ports.
> (b,ii) in the absence of the DSA driver returning a valid interface mode,
> we use the supported_interfaces to find an interface which gives the
> maximum speed (irrespective of duplex?) that falls within the
> mac capabilities.
> 
> If all those fail, then things will break, and we will have to wait for
> people to report that breakage. Does this sound a sane approach, or
> does anyone have any other suggestions how to solve this?

It is a sane approach. But in the future I think we should get rid of
(b,i): I always considered the max_speed_interface() method a temporary
solution, until the drivers report what a specific port supports and the
subsystem can then choose whichever mode it wants that is wired and
supported by hardware. Then we could also make it possible to change
the CPU interface mode via ethtool, which would be cool...

Marek

Russell King (Oracle) June 29, 2022, 12:41 p.m. UTC | #8

On Wed, Jun 29, 2022 at 12:10:20PM +0200, Marek Behún wrote:
> On Wed, 29 Jun 2022 10:43:23 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> > On Wed, Jun 29, 2022 at 09:18:10AM +0200, Andrew Lunn wrote:
> > > > I should point out that if a DSA port can be programmed in software to
> > > > support both SGMII and 1000baseX, this will end up selecting SGMII
> > > > irrespective of what the hardware was wire-strapped to and how it was
> > > > initially configured. Do we believe that would be acceptable?  
> > > 
> > > I'm pretty sure the devel b board has 1000BaseX DSA links between its
> > > two switches. Since both should end up SGMII that should be O.K.  
> > 
> > Would such a port have a programmable C_Mode, and would it specify that
> > it supports both SGMII and 1000BaseX ? Without going through a lot of
> > boards and documentation for every switch, I can't say.
> > 
> > I don't think we can come to any conclusion on what the right way to
> > deal with this actually is - we don't have enough information about how
> > this is used across all the platforms we have. I think we can only try
> > something, get it merged into net-next, and wait to see whether anyone
> > complains.
> > 
> > When we have a CPU or DSA port without a fixed-link, phy or sfp specified,
> > I think we should:
> > (a) use the phy-mode property if present, otherwise,
> > (b,i) have the DSA driver return the interface mode that it wants to use
> > for max speed for CPU and DSA ports.
> > (b,ii) in the absence of the DSA driver returning a valid interface mode,
> > we use the supported_interfaces to find an interface which gives the
> > maximum speed (irrespective of duplex?) that falls within the
> > mac capabilities.
> > 
> > If all those fail, then things will break, and we will have to wait for
> > people to report that breakage. Does this sound a sane approach, or
> > does anyone have any other suggestions how to solve this?
> 
> It is a sane approach. But in the future I think we should get rid of
> (b,i): I always considered the max_speed_interface() method a temporary
> solution, until the drivers report what a specific port support and the
> subsystem can then choose whichever mode it wants that is wired and
> supported by hardware. Then we could also make it possible to change
> the CPU interface mode via ethtool, which would be cool...

I can remotely test clearfog, which seems to do the right thing:

[    5.707839] mv88e6085 f1072004.mdio-mii:04: sif=21 if=21(1000base-x) cap=bd
[    5.715114] mv88e6085 f1072004.mdio-mii:04: configuring for fixed/1000base-x link mode

meaning that the supported interfaces (sif) mask only contains
1000base-x, phylink_create() was called with (if) 1000base-x, and the
capabilities (cap) indicate 1000-fd, 100-(h,f)d, and 10-(h,f)d.
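
For anyone wanting to decode the cap value by hand, here is a small
userspace sketch. It assumes the MAC_* bit layout from include/linux/phylink.h
around the time of this thread (MAC_SYM_PAUSE in bit 0, MAC_ASYM_PAUSE in
bit 1, then 10HD, 10FD, 100HD, 100FD, 1000HD and 1000FD in bits 2 to 7);
the table and ex_decode_caps() are made up for illustration. Under that
assumption bit 0 of 0xbd is symmetric pause, in addition to the speeds
listed above.

```c
#include <stddef.h>
#include <stdio.h>

/* Names for bits 0..7, under the bit layout assumed in the text above. */
static const char *const ex_cap_names[] = {
	"sym-pause", "asym-pause",
	"10hd", "10fd", "100hd", "100fd", "1000hd", "1000fd",
};

/* Write a space-separated list of the capability bits set in caps. */
static void ex_decode_caps(unsigned long caps, char *buf, size_t len)
{
	size_t used = 0;

	if (len)
		buf[0] = '\0';

	for (size_t i = 0; i < sizeof(ex_cap_names) / sizeof(ex_cap_names[0]); i++) {
		int n;

		if (!(caps & (1UL << i)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", ex_cap_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output would not fit, stop here */
		used += (size_t)n;
	}
}
```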

I don't think port 5 on the 88e6176 can support any other modes, so
this isn't a particularly good test. My ZII boards aren't powered up
so can't test those with the extra debugging print.

I'll cut a new RFC which includes the debug print so folk can try it
out.
