diff mbox series

[v2,net-next,4/4] net: dsa: validate that DT nodes of shared ports have the properties they need

Message ID 20220729132119.1191227-5-vladimir.oltean@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Validate OF nodes for DSA shared ports | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 3 maintainers not CCed: linux@armlinux.org.uk linux-mediatek@lists.infradead.org linux-arm-kernel@lists.infradead.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean July 29, 2022, 1:21 p.m. UTC
There is a desire coming from Russell King to make all DSA drivers
register unconditionally with phylink, to simplify the code paths:
https://lore.kernel.org/netdev/YtGPO5SkMZfN8b%2Fs@shell.armlinux.org.uk/

However this is not possible today without risking to break drivers
which rely on a different mechanism, that where ports are manually
brought up to the highest link speed during setup, and never touched by
phylink at runtime.

This happens because DSA was not always integrated with phylink, and
when the early drivers were converted from platform data to the new DSA
bindings, there was no information kept in the platform data structures
about port link speeds, so as a result, there was no information
translated into the first DT bindings.

https://lore.kernel.org/all/YtXFtTsf++AeDm1l@lunn.ch/

Today we have a workaround in place, introduced by commit a20f997010c4
("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed"),
where shared ports would be checked for the presence of phy-handle/
fixed-link/managed OF properties, and if missing, phylink registration
would be skipped.

We modify the logic of this workaround such as to stop the proliferation
of more port OF nodes with lacking information, to put an upper bound to
the number of switches for which a link management description must be
faked in order for phylink registration to become possible for them.

Today we have drivers introduced years after the phylink migration of
CPU/DSA ports, and yet we're still not completely sure whether all new
drivers use phylink, because this depends on dynamic information
(DT blob, which may very well not be upstream, because why would it).
Driver maintainers may even be unaware about the fact that omitting
fixed-link/phy-handle for CPU/DSA ports is legal, and even works with
some of the old pre-phylink drivers.

Add central validation in DSA for the OF properties required by phylink,
in an attempt to sanitize the environment for future driver writers, and
as much as possible for existing driver maintainers.

Technically no driver except sja1105 and felix (partially) validates
these properties, but perhaps due to subtle reasons, some of the
other existing drivers may not actually work properly with a port OF
node that lacks a complete description. There isn't any way to know
except by deleting the fixed-link (or phy-mode or both) on a CPU port
and trying.

We can't fully know what is the situation with downstream DT blobs,
but we can guess the overall trend by studying the DT blobs that were
submitted upstream. If there are upstream blobs that have lacking
descriptions, we take it as very likely that there are many more
downstream blobs that do so too. If all upstream blobs have complete
descriptions, we take that as a hint that the driver is a candidate for
strict validation (considering that most bindings are copy-pasted).
If there are no upstream DT blobs, we take the conservative route of
skipping validation, unless the driver maintainer instructs us
otherwise.

The driver situation is as follows:

ar9331
~~~~~~

    compatible strings:
    - qca,ar9331-switch

    1 occurrence in mainline device trees, part of SoC dtsi
    (arch/mips/boot/dts/qca/ar9331.dtsi), description is not problematic.

    Verdict: opt into validation.

b53
~~~

    compatible strings:
    - brcm,bcm5325
    - brcm,bcm53115
    - brcm,bcm53125
    - brcm,bcm53128
    - brcm,bcm5365
    - brcm,bcm5389
    - brcm,bcm5395
    - brcm,bcm5397
    - brcm,bcm5398

    - brcm,bcm53010-srab
    - brcm,bcm53011-srab
    - brcm,bcm53012-srab
    - brcm,bcm53018-srab
    - brcm,bcm53019-srab
    - brcm,bcm5301x-srab
    - brcm,bcm11360-srab
    - brcm,bcm58522-srab
    - brcm,bcm58525-srab
    - brcm,bcm58535-srab
    - brcm,bcm58622-srab
    - brcm,bcm58623-srab
    - brcm,bcm58625-srab
    - brcm,bcm88312-srab
    - brcm,cygnus-srab
    - brcm,nsp-srab
    - brcm,omega-srab

    - brcm,bcm3384-switch
    - brcm,bcm6328-switch
    - brcm,bcm6368-switch
    - brcm,bcm63xx-switch

    I've found at least these mainline DT blobs with problems:

    arch/arm/boot/dts/bcm47094-linksys-panamera.dts
    - lacks phy-mode
    arch/arm/boot/dts/bcm47189-tenda-ac9.dts
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts
    arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts
    arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts
    arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
    arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
    arch/arm/boot/dts/bcm953012er.dts
    arch/arm/boot/dts/bcm4708-netgear-r6250.dts
    arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi
    arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
    arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/bcm53016-meraki-mr32.dts
    - lacks phy-mode

    Verdict: opt all switches out of strict validation.

bcm_sf2
~~~~~~~

    compatible strings:
    - brcm,bcm4908-switch
    - brcm,bcm7445-switch-v4.0
    - brcm,bcm7278-switch-v4.0
    - brcm,bcm7278-switch-v4.8

    A single occurrence in mainline
    (arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi), part of a SoC
    dtsi, valid description. Florian Fainelli explains that most of the
    bcm_sf2 device trees lack a full description for the internal IMP
    ports.

    Verdict: opt the BCM4908 into strict validation, and opt out the
    rest. Note that even though BCM4908 has strict DT bindings, it still
    does not register with phylink on the IMP port due to it implementing
    ->adjust_link().

hellcreek
~~~~~~~~~

    compatible strings:
    - hirschmann,hellcreek-de1soc-r1

    No occurrence in mainline device trees. Kurt Kanzenbach confirms
    that the downstream device tree lacks phy-mode and fixed link, and
    needs work.

    Verdict: opt out of validation.

lan9303
~~~~~~~

    compatible strings:
    - smsc,lan9303-mdio
    - smsc,lan9303-i2c

    1 occurrence in mainline device trees:
    arch/arm/boot/dts/imx53-kp-hsc.dts
    - no phy-mode, no fixed-link

    Verdict: opt out of validation.

lantiq_gswip
~~~~~~~~~~~~

    compatible strings:
    - lantiq,xrx200-gswip
    - lantiq,xrx300-gswip
    - lantiq,xrx330-gswip

    No occurrences in mainline device trees. Martin Blumenstingl
    confirms that the downstream OpenWrt device trees lack a proper
    fixed-link and need work, and that the incomplete description can
    even be seen in the example from
    Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt.

    Verdict: opt out of validation.

microchip ksz
~~~~~~~~~~~~~

    compatible strings:
    - microchip,ksz8765
    - microchip,ksz8794
    - microchip,ksz8795
    - microchip,ksz8863
    - microchip,ksz8873
    - microchip,ksz9477
    - microchip,ksz9897
    - microchip,ksz9893
    - microchip,ksz9563
    - microchip,ksz8563
    - microchip,ksz9567
    - microchip,lan9370
    - microchip,lan9371
    - microchip,lan9372
    - microchip,lan9373
    - microchip,lan9374

    5 occurrences in mainline device trees, all descriptions are valid.
    But we had a snafu for the ksz8795 and ksz9477 drivers where the
    phy-mode property would be expected to be located directly under the
    'switch' node rather than under a port OF node. It was fixed by
    commit edecfa98f602 ("net: dsa: microchip: look for phy-mode in port
    nodes"). The driver still has compatibility with the old DT blobs.
    The lan937x support was added later than the above snafu was fixed,
    and even though it has support for the broken DT blobs by virtue of
    sharing a common probing function, I'll take it that its DT blobs
    are correct.

    Verdict: opt lan937x into validation, and the others out.

mt7530
~~~~~~

    compatible strings
    - mediatek,mt7621
    - mediatek,mt7530
    - mediatek,mt7531

    Multiple occurrences in mainline device trees, one is part of an SoC
    dtsi (arch/mips/boot/dts/ralink/mt7621.dtsi), all descriptions are
    fine.

    Verdict: opt into strict validation.

mv88e6060
~~~~~~~~~

    compatible string:
    - marvell,mv88e6060

    no occurrences in mainline, nobody knows anybody who uses it.

    Verdict: opt out of strict validation.

mv88e6xxx
~~~~~~~~~

    compatible strings:
    - marvell,mv88e6085
    - marvell,mv88e6190
    - marvell,mv88e6250

    Device trees that have incomplete descriptions of CPU or DSA ports:
    arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi
    - lacks phy-mode
    arch/arm64/boot/dts/marvell/cn9130-crb.dtsi
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts
    - lacks phy-mode
    arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts
    - lacks phy-mode
    arch/arm/boot/dts/vf610-zii-spb4.dts
    - lacks phy-mode
    arch/arm/boot/dts/vf610-zii-cfu1.dts
    - lacks phy-mode
    arch/arm/boot/dts/vf610-zii-dev-rev-c.dts
    - lacks phy-mode on CPU port, fixed-link on DSA ports
    arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
    - lacks phy-mode on CPU port
    arch/arm/boot/dts/armada-381-netgear-gs110emx.dts
    - lacks phy-mode
    arch/arm/boot/dts/vf610-zii-scu4-aib.dts
    - lacks fixed-link on xgmii DSA ports and/or in-band-status on
      2500base-x DSA ports, and phy-mode on CPU port
    arch/arm/boot/dts/imx6qdl-gw5904.dtsi
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts
    - lacks phy-mode
    arch/arm/boot/dts/kirkwood-dir665.dts
    - lacks phy-mode
    arch/arm/boot/dts/kirkwood-rd88f6281.dtsi
    - lacks phy-mode
    arch/arm/boot/dts/orion5x-netgear-wnr854t.dts
    - lacks phy-mode and fixed-link
    arch/arm/boot/dts/armada-388-clearfog.dts
    - lacks phy-mode
    arch/arm/boot/dts/armada-xp-linksys-mamba.dts
    - lacks phy-mode
    arch/arm/boot/dts/armada-385-linksys.dtsi
    - lacks phy-mode
    arch/arm/boot/dts/imx6q-b450v3.dts
    arch/arm/boot/dts/imx6q-b850v3.dts
    - has a phy-handle but not a phy-mode?
    arch/arm/boot/dts/armada-370-rd.dts
    - lacks phy-mode
    arch/arm/boot/dts/kirkwood-linksys-viper.dts
    - lacks phy-mode
    arch/arm/boot/dts/imx51-zii-rdu1.dts
    - lacks phy-mode
    arch/arm/boot/dts/imx51-zii-scu2-mezz.dts
    - lacks phy-mode
    arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi
    - lacks phy-mode
    arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts
    - lacks phy-mode and fixed-link

    Verdict: opt out of validation.

ocelot
~~~~~~

    compatible strings:
    - mscc,vsc9953-switch
    - felix (arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi) is a PCI
      device, has no compatible string

    2 occurrences in mainline, both are part of SoC dtsi and complete.

    Verdict: opt into strict validation.

qca8k
~~~~~

    compatible strings:
    - qca,qca8327
    - qca,qca8328
    - qca,qca8334
    - qca,qca8337

    5 occurrences in mainline device trees, none of the descriptions are
    problematic.

    Verdict: opt into validation.

realtek
~~~~~~~

    compatible strings:
    - realtek,rtl8366rb
    - realtek,rtl8365mb

    2 occurrences in mainline, both descriptions are fine, additionally
    rtl8365mb.c has a comment "The device tree firmware should also
    specify the link partner of the extension port - either via a
    fixed-link or other phy-handle."

    Verdict: opt into validation.

rzn1_a5psw
~~~~~~~~~~

    compatible strings:
    - renesas,rzn1-a5psw

    One single occurrence, part of SoC dtsi
    (arch/arm/boot/dts/r9a06g032.dtsi), description is fine.

    Verdict: opt into validation.

sja1105
~~~~~~~

    Driver already validates its port OF nodes in
    sja1105_parse_ports_node().

    Verdict: opt into validation.

vsc73xx
~~~~~~~

    compatible strings:
    - vitesse,vsc7385
    - vitesse,vsc7388
    - vitesse,vsc7395
    - vitesse,vsc7398

    2 occurrences in mainline device trees, both descriptions are fine.

    Verdict: opt into validation.

xrs700x
~~~~~~~

    compatible strings:
    - arrow,xrs7003e
    - arrow,xrs7003f
    - arrow,xrs7004e
    - arrow,xrs7004f

    no occurrences in mainline, we don't know.

    Verdict: opt out of strict validation.

Because there is a pattern where newly added switches reuse existing
drivers more often than introducing new ones, I've opted for deciding
who gets to skip validation based on an OF compatible match table in the
DSA core. The alternative would have been to add another boolean
property to struct dsa_switch, like configure_vlan_while_not_filtering.
But this avoids situations where sometimes driver maintainers obfuscate
what goes on by sharing a common probing function, and therefore
making new switches inherit old quirks.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Acked-by: Alvin Šipraga <alsi@bang-olufsen.dk> # realtek
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2:
- sort drivers alphabetically, and other rewordings in the commit
  message
- move validation inside dsa_shared_port_link_register_of(), where it is
  better placed considering the future work that might take place here
- perform validation for everyone, just don't enforce it for the
  switches listed, as suggested by Andrew Lunn

 net/dsa/port.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 170 insertions(+), 5 deletions(-)

Comments

Rob Herring July 29, 2022, 4:22 p.m. UTC | #1
On Fri, Jul 29, 2022 at 7:21 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> There is a desire coming from Russell King to make all DSA drivers
> register unconditionally with phylink, to simplify the code paths:
> https://lore.kernel.org/netdev/YtGPO5SkMZfN8b%2Fs@shell.armlinux.org.uk/
>
> However this is not possible today without risking to break drivers
> which rely on a different mechanism, that where ports are manually
> brought up to the highest link speed during setup, and never touched by
> phylink at runtime.
>
> This happens because DSA was not always integrated with phylink, and
> when the early drivers were converted from platform data to the new DSA
> bindings, there was no information kept in the platform data structures
> about port link speeds, so as a result, there was no information
> translated into the first DT bindings.
>
> https://lore.kernel.org/all/YtXFtTsf++AeDm1l@lunn.ch/
>
> Today we have a workaround in place, introduced by commit a20f997010c4
> ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed"),
> where shared ports would be checked for the presence of phy-handle/
> fixed-link/managed OF properties, and if missing, phylink registration
> would be skipped.
>
> We modify the logic of this workaround such as to stop the proliferation
> of more port OF nodes with lacking information, to put an upper bound to
> the number of switches for which a link management description must be
> faked in order for phylink registration to become possible for them.
>
> Today we have drivers introduced years after the phylink migration of
> CPU/DSA ports, and yet we're still not completely sure whether all new
> drivers use phylink, because this depends on dynamic information
> (DT blob, which may very well not be upstream, because why would it).
> Driver maintainers may even be unaware about the fact that omitting
> fixed-link/phy-handle for CPU/DSA ports is legal, and even works with
> some of the old pre-phylink drivers.
>
> Add central validation in DSA for the OF properties required by phylink,
> in an attempt to sanitize the environment for future driver writers, and
> as much as possible for existing driver maintainers.

It's not the kernel's job to validate the DT. If it was, it does a
horrible job. Is the schema providing this validation? If not, you
need to add it.

Rob
Vladimir Oltean July 29, 2022, 5:01 p.m. UTC | #2
On Fri, Jul 29, 2022 at 10:22:49AM -0600, Rob Herring wrote:
> It's not the kernel's job to validate the DT. If it was, it does a
> horrible job.

I'm surprised by you saying this.

The situation is as follows: phylink parses the fwnode it's given, and
errors out if it can't find everything it needs. See phylink_parse_mode()
and phylink_parse_fixedlink(). This is a matter of fact - if you start
parsing stuff, you'll eventually need to treat the case where what
you're searching for isn't there, or isn't realistic.

DSA is a common framework used by multiple drivers, and it wasn't always
integrated with phylink. The DT nodes of some ports will lack what
phylink needs, but these ports don't really need phylink to work, it's
optional, they work without it too. However if we begin the process of
registering them with phylink and we let phylink fail, this process is
irreversible and the ports don't work anymore.

So what DSA currently does (even before this patch set) is it
pre-validates that phylink has what it needs, and skips phylink if it
doesn't. It's only that it doesn't name it this way, and it doesn't
print anything.

Being a common framework, new drivers opt into this behavior willy-nilly.
I am adding a table of compatible strings of old drivers where the
behavior is retained. For new drivers, we fail them in DSA rather than
in phylink, this is true. Maybe this is what you disagree with?
We do this as a matter of practicality - we already need to predetermine
whether phylink has a chance of working, and if we find something missing,
we know it won't. Seems illogical to let phylink go through the same
parsing again.

As for the lousy job, I can't contradict you...

> Is the schema providing this validation? If not, you need to add it.

No, it's not. I can also look into providing a patch that statically
validates this. But I'm afraid, with all due respect, that not many
people take the YAML validator too seriously? With the volume of output
it even throws, it would be even hard to detect something new, you'd
need to know to search for it. Most of the DSA drivers aren't even
converted to YAML, and it is precisely the biggest offenders that
aren't. And even if the schema says a property is required but the code
begs to differ, and a future DT blob gets to enter production based on
undocumented behavior, who's right?
Marcin Wojtas July 29, 2022, 5:57 p.m. UTC | #3
pt., 29 lip 2022 o 15:21 Vladimir Oltean <vladimir.oltean@nxp.com> napisał(a):
>
> There is a desire coming from Russell King to make all DSA drivers
> register unconditionally with phylink, to simplify the code paths:
> https://lore.kernel.org/netdev/YtGPO5SkMZfN8b%2Fs@shell.armlinux.org.uk/
>
> However this is not possible today without risking to break drivers
> which rely on a different mechanism, that where ports are manually
> brought up to the highest link speed during setup, and never touched by
> phylink at runtime.
>
> This happens because DSA was not always integrated with phylink, and
> when the early drivers were converted from platform data to the new DSA
> bindings, there was no information kept in the platform data structures
> about port link speeds, so as a result, there was no information
> translated into the first DT bindings.
>
> https://lore.kernel.org/all/YtXFtTsf++AeDm1l@lunn.ch/
>
> Today we have a workaround in place, introduced by commit a20f997010c4
> ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed"),
> where shared ports would be checked for the presence of phy-handle/
> fixed-link/managed OF properties, and if missing, phylink registration
> would be skipped.
>
> We modify the logic of this workaround such as to stop the proliferation
> of more port OF nodes with lacking information, to put an upper bound to
> the number of switches for which a link management description must be
> faked in order for phylink registration to become possible for them.
>
> Today we have drivers introduced years after the phylink migration of
> CPU/DSA ports, and yet we're still not completely sure whether all new
> drivers use phylink, because this depends on dynamic information
> (DT blob, which may very well not be upstream, because why would it).
> Driver maintainers may even be unaware about the fact that omitting
> fixed-link/phy-handle for CPU/DSA ports is legal, and even works with
> some of the old pre-phylink drivers.
>
> Add central validation in DSA for the OF properties required by phylink,
> in an attempt to sanitize the environment for future driver writers, and
> as much as possible for existing driver maintainers.
>
> Technically no driver except sja1105 and felix (partially) validates
> these properties, but perhaps due to subtle reasons, some of the
> other existing drivers may not actually work properly with a port OF
> node that lacks a complete description. There isn't any way to know
> except by deleting the fixed-link (or phy-mode or both) on a CPU port
> and trying.
>
> We can't fully know what is the situation with downstream DT blobs,
> but we can guess the overall trend by studying the DT blobs that were
> submitted upstream. If there are upstream blobs that have lacking
> descriptions, we take it as very likely that there are many more
> downstream blobs that do so too. If all upstream blobs have complete
> descriptions, we take that as a hint that the driver is a candidate for
> strict validation (considering that most bindings are copy-pasted).
> If there are no upstream DT blobs, we take the conservative route of
> skipping validation, unless the driver maintainer instructs us
> otherwise.
>
> The driver situation is as follows:
>
> ar9331
> ~~~~~~
>
>     compatible strings:
>     - qca,ar9331-switch
>
>     1 occurrence in mainline device trees, part of SoC dtsi
>     (arch/mips/boot/dts/qca/ar9331.dtsi), description is not problematic.
>
>     Verdict: opt into validation.
>
> b53
> ~~~
>
>     compatible strings:
>     - brcm,bcm5325
>     - brcm,bcm53115
>     - brcm,bcm53125
>     - brcm,bcm53128
>     - brcm,bcm5365
>     - brcm,bcm5389
>     - brcm,bcm5395
>     - brcm,bcm5397
>     - brcm,bcm5398
>
>     - brcm,bcm53010-srab
>     - brcm,bcm53011-srab
>     - brcm,bcm53012-srab
>     - brcm,bcm53018-srab
>     - brcm,bcm53019-srab
>     - brcm,bcm5301x-srab
>     - brcm,bcm11360-srab
>     - brcm,bcm58522-srab
>     - brcm,bcm58525-srab
>     - brcm,bcm58535-srab
>     - brcm,bcm58622-srab
>     - brcm,bcm58623-srab
>     - brcm,bcm58625-srab
>     - brcm,bcm88312-srab
>     - brcm,cygnus-srab
>     - brcm,nsp-srab
>     - brcm,omega-srab
>
>     - brcm,bcm3384-switch
>     - brcm,bcm6328-switch
>     - brcm,bcm6368-switch
>     - brcm,bcm63xx-switch
>
>     I've found at least these mainline DT blobs with problems:
>
>     arch/arm/boot/dts/bcm47094-linksys-panamera.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/bcm47189-tenda-ac9.dts
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts
>     arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts
>     arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts
>     arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
>     arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
>     arch/arm/boot/dts/bcm953012er.dts
>     arch/arm/boot/dts/bcm4708-netgear-r6250.dts
>     arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi
>     arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
>     arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/bcm53016-meraki-mr32.dts
>     - lacks phy-mode
>
>     Verdict: opt all switches out of strict validation.
>
> bcm_sf2
> ~~~~~~~
>
>     compatible strings:
>     - brcm,bcm4908-switch
>     - brcm,bcm7445-switch-v4.0
>     - brcm,bcm7278-switch-v4.0
>     - brcm,bcm7278-switch-v4.8
>
>     A single occurrence in mainline
>     (arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi), part of a SoC
>     dtsi, valid description. Florian Fainelli explains that most of the
>     bcm_sf2 device trees lack a full description for the internal IMP
>     ports.
>
>     Verdict: opt the BCM4908 into strict validation, and opt out the
>     rest. Note that even though BCM4908 has strict DT bindings, it still
>     does not register with phylink on the IMP port due to it implementing
>     ->adjust_link().
>
> hellcreek
> ~~~~~~~~~
>
>     compatible strings:
>     - hirschmann,hellcreek-de1soc-r1
>
>     No occurrence in mainline device trees. Kurt Kanzenbach confirms
>     that the downstream device tree lacks phy-mode and fixed link, and
>     needs work.
>
>     Verdict: opt out of validation.
>
> lan9303
> ~~~~~~~
>
>     compatible strings:
>     - smsc,lan9303-mdio
>     - smsc,lan9303-i2c
>
>     1 occurrence in mainline device trees:
>     arch/arm/boot/dts/imx53-kp-hsc.dts
>     - no phy-mode, no fixed-link
>
>     Verdict: opt out of validation.
>
> lantiq_gswip
> ~~~~~~~~~~~~
>
>     compatible strings:
>     - lantiq,xrx200-gswip
>     - lantiq,xrx300-gswip
>     - lantiq,xrx330-gswip
>
>     No occurrences in mainline device trees. Martin Blumenstingl
>     confirms that the downstream OpenWrt device trees lack a proper
>     fixed-link and need work, and that the incomplete description can
>     even be seen in the example from
>     Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt.
>
>     Verdict: opt out of validation.
>
> microchip ksz
> ~~~~~~~~~~~~~
>
>     compatible strings:
>     - microchip,ksz8765
>     - microchip,ksz8794
>     - microchip,ksz8795
>     - microchip,ksz8863
>     - microchip,ksz8873
>     - microchip,ksz9477
>     - microchip,ksz9897
>     - microchip,ksz9893
>     - microchip,ksz9563
>     - microchip,ksz8563
>     - microchip,ksz9567
>     - microchip,lan9370
>     - microchip,lan9371
>     - microchip,lan9372
>     - microchip,lan9373
>     - microchip,lan9374
>
>     5 occurrences in mainline device trees, all descriptions are valid.
>     But we had a snafu for the ksz8795 and ksz9477 drivers where the
>     phy-mode property would be expected to be located directly under the
>     'switch' node rather than under a port OF node. It was fixed by
>     commit edecfa98f602 ("net: dsa: microchip: look for phy-mode in port
>     nodes"). The driver still has compatibility with the old DT blobs.
>     The lan937x support was added later than the above snafu was fixed,
>     and even though it has support for the broken DT blobs by virtue of
>     sharing a common probing function, I'll take it that its DT blobs
>     are correct.
>
>     Verdict: opt lan937x into validation, and the others out.
>
> mt7530
> ~~~~~~
>
>     compatible strings
>     - mediatek,mt7621
>     - mediatek,mt7530
>     - mediatek,mt7531
>
>     Multiple occurrences in mainline device trees, one is part of an SoC
>     dtsi (arch/mips/boot/dts/ralink/mt7621.dtsi), all descriptions are
>     fine.
>
>     Verdict: opt into strict validation.
>
> mv88e6060
> ~~~~~~~~~
>
>     compatible string:
>     - marvell,mv88e6060
>
>     no occurrences in mainline, nobody knows anybody who uses it.
>
>     Verdict: opt out of strict validation.
>
> mv88e6xxx
> ~~~~~~~~~
>
>     compatible strings:
>     - marvell,mv88e6085
>     - marvell,mv88e6190
>     - marvell,mv88e6250
>
>     Device trees that have incomplete descriptions of CPU or DSA ports:
>     arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi
>     - lacks phy-mode
>     arch/arm64/boot/dts/marvell/cn9130-crb.dtsi
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/vf610-zii-spb4.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/vf610-zii-cfu1.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/vf610-zii-dev-rev-c.dts
>     - lacks phy-mode on CPU port, fixed-link on DSA ports
>     arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
>     - lacks phy-mode on CPU port
>     arch/arm/boot/dts/armada-381-netgear-gs110emx.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/vf610-zii-scu4-aib.dts
>     - lacks fixed-link on xgmii DSA ports and/or in-band-status on
>       2500base-x DSA ports, and phy-mode on CPU port
>     arch/arm/boot/dts/imx6qdl-gw5904.dtsi
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/kirkwood-dir665.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/kirkwood-rd88f6281.dtsi
>     - lacks phy-mode
>     arch/arm/boot/dts/orion5x-netgear-wnr854t.dts
>     - lacks phy-mode and fixed-link
>     arch/arm/boot/dts/armada-388-clearfog.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/armada-xp-linksys-mamba.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/armada-385-linksys.dtsi
>     - lacks phy-mode
>     arch/arm/boot/dts/imx6q-b450v3.dts
>     arch/arm/boot/dts/imx6q-b850v3.dts
>     - has a phy-handle but not a phy-mode?
>     arch/arm/boot/dts/armada-370-rd.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/kirkwood-linksys-viper.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/imx51-zii-rdu1.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/imx51-zii-scu2-mezz.dts
>     - lacks phy-mode
>     arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi
>     - lacks phy-mode
>     arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts
>     - lacks phy-mode and fixed-link
>
>     Verdict: opt out of validation.
>
> ocelot
> ~~~~~~
>
>     compatible strings:
>     - mscc,vsc9953-switch
>     - felix (arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi) is a PCI
>       device, has no compatible string
>
>     2 occurrences in mainline, both are part of SoC dtsi and complete.
>
>     Verdict: opt into strict validation.
>
> qca8k
> ~~~~~
>
>     compatible strings:
>     - qca,qca8327
>     - qca,qca8328
>     - qca,qca8334
>     - qca,qca8337
>
>     5 occurrences in mainline device trees, none of the descriptions are
>     problematic.
>
>     Verdict: opt into validation.
>
> realtek
> ~~~~~~~
>
>     compatible strings:
>     - realtek,rtl8366rb
>     - realtek,rtl8365mb
>
>     2 occurrences in mainline, both descriptions are fine, additionally
>     rtl8365mb.c has a comment "The device tree firmware should also
>     specify the link partner of the extension port - either via a
>     fixed-link or other phy-handle."
>
>     Verdict: opt into validation.
>
> rzn1_a5psw
> ~~~~~~~~~~
>
>     compatible strings:
>     - renesas,rzn1-a5psw
>
>     One single occurrence, part of SoC dtsi
>     (arch/arm/boot/dts/r9a06g032.dtsi), description is fine.
>
>     Verdict: opt into validation.
>
> sja1105
> ~~~~~~~
>
>     Driver already validates its port OF nodes in
>     sja1105_parse_ports_node().
>
>     Verdict: opt into validation.
>
> vsc73xx
> ~~~~~~~
>
>     compatible strings:
>     - vitesse,vsc7385
>     - vitesse,vsc7388
>     - vitesse,vsc7395
>     - vitesse,vsc7398
>
>     2 occurrences in mainline device trees, both descriptions are fine.
>
>     Verdict: opt into validation.
>
> xrs700x
> ~~~~~~~
>
>     compatible strings:
>     - arrow,xrs7003e
>     - arrow,xrs7003f
>     - arrow,xrs7004e
>     - arrow,xrs7004f
>
>     no occurrences in mainline, we don't know.
>
>     Verdict: opt out of strict validation.
>
> Because there is a pattern where newly added switches reuse existing
> drivers more often than introducing new ones, I've opted for deciding
> who gets to skip validation based on an OF compatible match table in the
> DSA core. The alternative would have been to add another boolean
> property to struct dsa_switch, like configure_vlan_while_not_filtering.
> But this avoids situations where sometimes driver maintainers obfuscate
> what goes on by sharing a common probing function, and therefore
> making new switches inherit old quirks.
>
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Frank Rowand <frowand.list@gmail.com>
> Acked-by: Alvin Šipraga <alsi@bang-olufsen.dk> # realtek
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> v1->v2:
> - sort drivers alphabetically, and other rewordings in the commit
>   message
> - move validation inside dsa_shared_port_link_register_of(), where it is
>   better placed considering the future work that might take place here
> - perform validation for everyone, just don't enforce it for the
>   switches listed, as suggested by Andrew Lunn
>
>  net/dsa/port.c | 175 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 170 insertions(+), 5 deletions(-)
>
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index 4b6139bff217..c07a7c69d5e0 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -1650,22 +1650,187 @@ static int dsa_shared_port_phylink_register(struct dsa_port *dp)
>         return err;
>  }
>
> +/* During the initial DSA driver migration to OF, port nodes were sometimes
> + * added to device trees with no indication of how they should operate from a
> + * link management perspective (phy-handle, fixed-link, etc). Additionally, the
> + * phy-mode may be absent. The interpretation of these port OF nodes depends on
> + * their type.
> + *
> + * User ports with no phy-handle or fixed-link are expected to connect to an
> + * internal PHY located on the ds->slave_mii_bus at an MDIO address equal to
> + * the port number. This description is still actively supported.
> + *
> + * Shared (CPU and DSA) ports with no phy-handle or fixed-link are expected to
> + * operate at the maximum speed that their phy-mode is capable of. If the
> + * phy-mode is absent, they are expected to operate using the phy-mode
> + * supported by the port that gives the highest link speed. It is unspecified
> + * if the port should use flow control or not, half duplex or full duplex, or
> + * if the phy-mode is a SERDES link, whether in-band autoneg is expected to be
> + * enabled or not.
> + *
> + * In the latter case of shared ports, omitting the link management description
> + * from the firmware node is deprecated and strongly discouraged. DSA uses
> + * phylink, which rejects the firmware nodes of these ports for lacking
> + * required properties.
> + *
> + * For switches in this table, DSA will skip enforcing validation and will
> + * later omit registering a phylink instance for the shared ports, if they lack
> + * a fixed-link, a phy-handle, or a managed = "in-band-status" property.
> + * It becomes the responsibility of the driver to ensure that these ports
> + * operate at the maximum speed (whatever this means) and will interoperate
> + * with the DSA master or other cascade port, since phylink methods will not be
> + * invoked for them.
> + *
> + * If you are considering expanding this table for newly introduced switches,
> + * think again. It is OK to remove switches from this table if there aren't DT
> + * blobs in circulation which rely on defaulting the shared ports.
> + */
> +static const char * const dsa_switches_dont_enforce_validation[] = {
> +#if IS_ENABLED(CONFIG_NET_DSA_XRS700X)
> +       "arrow,xrs7003e",
> +       "arrow,xrs7003f",
> +       "arrow,xrs7004e",
> +       "arrow,xrs7004f",
> +#endif
> +#if IS_ENABLED(CONFIG_B53)
> +       "brcm,bcm5325",
> +       "brcm,bcm53115",
> +       "brcm,bcm53125",
> +       "brcm,bcm53128",
> +       "brcm,bcm5365",
> +       "brcm,bcm5389",
> +       "brcm,bcm5395",
> +       "brcm,bcm5397",
> +       "brcm,bcm5398",
> +       "brcm,bcm53010-srab",
> +       "brcm,bcm53011-srab",
> +       "brcm,bcm53012-srab",
> +       "brcm,bcm53018-srab",
> +       "brcm,bcm53019-srab",
> +       "brcm,bcm5301x-srab",
> +       "brcm,bcm11360-srab",
> +       "brcm,bcm58522-srab",
> +       "brcm,bcm58525-srab",
> +       "brcm,bcm58535-srab",
> +       "brcm,bcm58622-srab",
> +       "brcm,bcm58623-srab",
> +       "brcm,bcm58625-srab",
> +       "brcm,bcm88312-srab",
> +       "brcm,cygnus-srab",
> +       "brcm,nsp-srab",
> +       "brcm,omega-srab",
> +       "brcm,bcm3384-switch",
> +       "brcm,bcm6328-switch",
> +       "brcm,bcm6368-switch",
> +       "brcm,bcm63xx-switch",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_BCM_SF2)
> +       "brcm,bcm7445-switch-v4.0",
> +       "brcm,bcm7278-switch-v4.0",
> +       "brcm,bcm7278-switch-v4.8",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_HIRSCHMANN_HELLCREEK)
> +       "hirschmann,hellcreek-de1soc-r1",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_LANTIQ_GSWIP)
> +       "lantiq,xrx200-gswip",
> +       "lantiq,xrx300-gswip",
> +       "lantiq,xrx330-gswip",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_MV88E6060)
> +       "marvell,mv88e6060",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_MV88E6XXX)
> +       "marvell,mv88e6085",
> +       "marvell,mv88e6190",
> +       "marvell,mv88e6250",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ_COMMON)
> +       "microchip,ksz8765",
> +       "microchip,ksz8794",
> +       "microchip,ksz8795",
> +       "microchip,ksz8863",
> +       "microchip,ksz8873",
> +       "microchip,ksz9477",
> +       "microchip,ksz9897",
> +       "microchip,ksz9893",
> +       "microchip,ksz9563",
> +       "microchip,ksz8563",
> +       "microchip,ksz9567",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_SMSC_LAN9303_MDIO)
> +       "smsc,lan9303-mdio",
> +#endif
> +#if IS_ENABLED(CONFIG_NET_DSA_SMSC_LAN9303_I2C)
> +       "smsc,lan9303-i2c",
> +#endif
> +       NULL,
> +};
> +

I'm ok with enforcing the phylink usage and updating the binding
description, so the CPU / DSA ports have a proper full description of
the link. What I find problematic is including the drivers' related
ifdefs and compat strings in the subsystem's generic code. With this
change, if someone adds a new driver (or extends the existing ones),
they will have to add the string in the driver AND net/dsa...

How about the following scenario:
- Remove allow/blocklist from this patch and validate the description
always (no opt out). For an agreed timeframe (1 year? 2 LTS releases?)
it wouldn't cause the switch probe to fail, but instead of
dev_warn/dev_err, there should be a big fat WARN_ON(). Spoiled bootlog
will encourage users to update the device trees.
- After the deadline, the switch probe should start failing with
improper description and everyone will have to use phylink.
- Announce the binding change and start updating DT binding
description schema (adding the validation on that level too).
?

Best regards,
Marcin
Vladimir Oltean July 29, 2022, 6:34 p.m. UTC | #4
On Fri, Jul 29, 2022 at 07:57:50PM +0200, Marcin Wojtas wrote:
> I'm ok with enforcing the phylink usage and updating the binding
> description, so the CPU / DSA ports have a proper full description of
> the link. What I find problematic is including the drivers' related
> ifdefs and compat strings in the subsystem's generic code. With this
> change, if someone adds a new driver (or extends the existing ones),
> they will have to add the string in the driver AND net/dsa...

I chuckled when I read this. You must have missed:

 * If you are considering expanding this table for newly introduced switches,
 * think again. It is OK to remove switches from this table if there aren't DT
 * blobs in circulation which rely on defaulting the shared ports.

The #ifdef's are there such that the compatible array is smaller on a
kernel when those drivers are compiled out.

> How about the following scenario:
> - Remove allow/blocklist from this patch and validate the description
> always (no opt out).

We're validating the description always. We're opting a fixed number of
switches out of _enforcing_ it, number which will not increase.
That's why the people here are copied, to state if they're ok with being
in one camp or the other.

> For an agreed timeframe (1 year? 2 LTS releases?)
> it wouldn't cause the switch probe to fail, but instead of
> dev_warn/dev_err, there should be a big fat WARN_ON(). Spoiled bootlog
> will encourage users to update the device trees.

The intention is _not_ to fail probing for drivers with incomplete
bindings, neither now nor after 1 year or 2 LTS releases.

The intention is to not allow drivers which didn't have any such DT
blobs, or awareness of the feature, to gain any parasitic users.
The DSA core currently allows it. If planets align just the right way,
those ports might even work by accident, until they don't.

> - After the deadline, the switch probe should start failing with
> improper description and everyone will have to use phylink.

Not applicable after the explanation above, I think. At least, it's not
my goal to fail drivers. If individual maintainers want to do so,
they're free to do it from my side.

> - Announce the binding change and start updating DT binding
> description schema (adding the validation on that level too).
> ?

The announcement is here, what else are you thinking of?
Rob Herring July 29, 2022, 6:39 p.m. UTC | #5
On Fri, Jul 29, 2022 at 11:01 AM Vladimir Oltean
<vladimir.oltean@nxp.com> wrote:
>
> On Fri, Jul 29, 2022 at 10:22:49AM -0600, Rob Herring wrote:
> > It's not the kernel's job to validate the DT. If it was, it does a
> > horrible job.
>
> I'm surprised by you saying this.

I've said it many times...

> The situation is as follows: phylink parses the fwnode it's given, and
> errors out if it can't find everything it needs. See phylink_parse_mode()
> and phylink_parse_fixedlink(). This is a matter of fact - if you start
> parsing stuff, you'll eventually need to treat the case where what
> you're searching for isn't there, or isn't realistic.
>
> DSA is a common framework used by multiple drivers, and it wasn't always
> integrated with phylink. The DT nodes of some ports will lack what
> phylink needs, but these ports don't really need phylink to work, it's
> optional, they work without it too. However if we begin the process of
> registering them with phylink and we let phylink fail, this process is
> irreversible and the ports don't work anymore.
>
> So what DSA currently does (even before this patch set) is it
> pre-validates that phylink has what it needs, and skips phylink if it
> doesn't. It's only that it doesn't name it this way, and it doesn't
> print anything.
>
> Being a common framework, new drivers opt into this behavior willy-nilly.
> I am adding a table of compatible strings of old drivers where the
> behavior is retained. For new drivers, we fail them in DSA rather than
> in phylink, this is true. Maybe this is what you disagree with?

No, I haven't looked at it that closely.

> We do this as a matter of practicality - we already need to predetermine
> whether phylink has a chance of working, and if we find something missing,
> we know it won't. Seems illogical to let phylink go through the same
> parsing again.
>
> As for the lousy job, I can't contradict you...
>
> > Is the schema providing this validation? If not, you need to add it.
>
> No, it's not. I can also look into providing a patch that statically
> validates this. But I'm afraid, with all due respect, that not many
> people take the YAML validator too seriously? With the volume of output
> it even throws, it would be even hard to detect something new, you'd
> need to know to search for it. Most of the DSA drivers aren't even
> converted to YAML, and it is precisely the biggest offenders that
> aren't. And even if the schema says a property is required but the code
> begs to differ, and a future DT blob gets to enter production based on
> undocumented behavior, who's right?

All valid points. At least for the sea of warnings, you can limit
checking to only a subset of schemas you care about. Setting
'DT_SCHEMA_FILES=net/' will only check networking schemas for example.
Just need folks to care about those subsets.

I'm not saying don't put warnings in the kernel for this. Just don't
make it the only source of warnings. Given you are tightening the
requirements, it makes sense to have a warning. If it was something
entirely new, then I'd be more inclined to say only the schema should
check.

Rob
Marcin Wojtas July 29, 2022, 8:36 p.m. UTC | #6
pt., 29 lip 2022 o 20:34 Vladimir Oltean <vladimir.oltean@nxp.com> napisał(a):
>
> On Fri, Jul 29, 2022 at 07:57:50PM +0200, Marcin Wojtas wrote:
> > I'm ok with enforcing the phylink usage and updating the binding
> > description, so the CPU / DSA ports have a proper full description of
> > the link. What I find problematic is including the drivers' related
> > ifdefs and compat strings in the subsystem's generic code. With this
> > change, if someone adds a new driver (or extends the existing ones),
> > they will have to add the string in the driver AND net/dsa...
>
> I chuckled when I read this. You must have missed:
>
>  * If you are considering expanding this table for newly introduced switches,
>  * think again. It is OK to remove switches from this table if there aren't DT
>  * blobs in circulation which rely on defaulting the shared ports.
>
> The #ifdef's are there such that the compatible array is smaller on a
> kernel when those drivers are compiled out.
>
> > How about the following scenario:
> > - Remove allow/blocklist from this patch and validate the description
> > always (no opt out).
>
> We're validating the description always. We're opting a fixed number of
> switches out of _enforcing_ it, number which will not increase.
> That's why the people here are copied, to state if they're ok with being
> in one camp or the other.
>
> > For an agreed timeframe (1 year? 2 LTS releases?)
> > it wouldn't cause the switch probe to fail, but instead of
> > dev_warn/dev_err, there should be a big fat WARN_ON(). Spoiled bootlog
> > will encourage users to update the device trees.
>
> The intention is _not_ to fail probing for drivers with incomplete
> bindings, neither now nor after 1 year or 2 LTS releases.
>
> The intention is to not allow drivers which didn't have any such DT
> blobs, or awareness of the feature, to gain any parasitic users.
> The DSA core currently allows it. If planets align just the right way,
> those ports might even work by accident, until they don't.

What I propose is to enforce more strictly an update of DT description
with a specified timeline, abandoning 'camps' idea and driver-specific
contents in the generic code.

>
> > - After the deadline, the switch probe should start failing with
> > improper description and everyone will have to use phylink.
>
> Not applicable after the explanation above, I think. At least, it's not
> my goal to fail drivers. If individual maintainers want to do so,
> they're free to do it from my side.

Ok, that makes sense. The WARN_ON, however, can be annoying enough, so
to more efficiently enforce DT updates. If this change is eventually
scheduled for v5.21, there will be plenty of time for DT overhaul.

>
> > - Announce the binding change and start updating DT binding
> > description schema (adding the validation on that level too).
> > ?
>
> The announcement is here, what else are you thinking of?

Some open source projects have a nice practice of sending additional
'heads-up' information to the lists, that are related to the more
significant changes affecting a lot of users, modifying generic
behavior of the OS, implicating future need for a firmware update,
etc. For instance, v2 which is still under review on the edge of v5.19
and v5.20-rc1 could be easy to miss. In case of a more strict
approach, that could be helpful for raising awareness after the patch
lands.
Andrew Lunn July 29, 2022, 9:17 p.m. UTC | #7
> What I propose is to enforce more strictly an update of DT description
> with a specified timeline, abandoning 'camps' idea and driver-specific
> contents in the generic code.

Regressions are the problem. We are supposed to be backwards
compatible with older DT blobs. If we now say old DT blobs are
invalid, and refuse to probe, we cause a regression.

For some of the in kernel DT files using the mv88e6xxx i can make a
good guess at what the missing properties are. However, i'm bound to
guess wrong at some point, and cause a regression. So we could change
just those we can test. But at some point, the other blobs are going
to fail the enforces checks and cause a regression anyway.

And what about out of tree blobs? Probably OpenWRT have some. Do we
want to cause them to regress?

     Andrew
Florian Fainelli July 29, 2022, 9:24 p.m. UTC | #8
On 7/29/22 14:17, Andrew Lunn wrote:
>> What I propose is to enforce more strictly an update of DT description
>> with a specified timeline, abandoning 'camps' idea and driver-specific
>> contents in the generic code.
> 
> Regressions are the problem. We are supposed to be backwards
> compatible with older DT blobs. If we now say old DT blobs are
> invalid, and refuse to probe, we cause a regression.
> 
> For some of the in kernel DT files using the mv88e6xxx i can make a
> good guess at what the missing properties are. However, i'm bound to
> guess wrong at some point, and cause a regression. So we could change
> just those we can test. But at some point, the other blobs are going
> to fail the enforces checks and cause a regression anyway.
> 
> And what about out of tree blobs? Probably OpenWRT have some. Do we
> want to cause them to regress?

No, we do not want that, which is why Vladimir's approach IMHO is reasonable in that it acknowledges mistakes or shortcomings of the past into the present, and expects the future to be corrected and not repeat those same mistakes. The deprectiation window idea is all well and good in premise, however with such a large user base, I am not sure it is going to go very far unfortunately, nor that it will hinder our ability to have a more maintainable DSA framework TBH.

BTW, OpenWrt does not typically ship DT blobs that stay frozen, all of the kernel, DTBs, root filesystem, and sometimes a recent u-boot copy will be updated at the same time because very rarely do the existing boot loader satisfy modern requirements (PSCI, etc.).
Marcin Wojtas July 29, 2022, 9:33 p.m. UTC | #9
pt., 29 lip 2022 o 23:24 Florian Fainelli <f.fainelli@gmail.com> napisał(a):
>
> On 7/29/22 14:17, Andrew Lunn wrote:
> >> What I propose is to enforce more strictly an update of DT description
> >> with a specified timeline, abandoning 'camps' idea and driver-specific
> >> contents in the generic code.
> >
> > Regressions are the problem. We are supposed to be backwards
> > compatible with older DT blobs. If we now say old DT blobs are
> > invalid, and refuse to probe, we cause a regression.
> >
> > For some of the in kernel DT files using the mv88e6xxx i can make a
> > good guess at what the missing properties are. However, i'm bound to
> > guess wrong at some point, and cause a regression. So we could change
> > just those we can test. But at some point, the other blobs are going
> > to fail the enforces checks and cause a regression anyway.
> >
> > And what about out of tree blobs? Probably OpenWRT have some. Do we
> > want to cause them to regress?
>
> No, we do not want that, which is why Vladimir's approach IMHO is reasonable in that it acknowledges mistakes or shortcomings of the past into the present, and expects the future to be corrected and not repeat those same mistakes. The deprectiation window idea is all well and good in premise, however with such a large user base, I am not sure it is going to go very far unfortunately, nor that it will hinder our ability to have a more maintainable DSA framework TBH.
>
> BTW, OpenWrt does not typically ship DT blobs that stay frozen, all of the kernel, DTBs, root filesystem, and sometimes a recent u-boot copy will be updated at the same time because very rarely do the existing boot loader satisfy modern requirements (PSCI, etc.).

Initially, I thought that the idea is a probe failure (hence the camps
to prevent that) - but it was clarified later, it's not the case.

I totally agree and I am all against breaking the backward
compatibility (this is why I work on ACPI support that much :) ). The
question is whether for existing deployments with 'broken' DT
description we would be ok to introduce a dev_warn/WARN_ON message
after a kernel update. That would be a case if the check is performed
unconditionally - this way we can keep compat strings out of net/dsa.
What do you think?

Best regards,
Marcin
Florian Fainelli July 29, 2022, 9:44 p.m. UTC | #10
On 7/29/22 14:33, Marcin Wojtas wrote:
> pt., 29 lip 2022 o 23:24 Florian Fainelli <f.fainelli@gmail.com> napisał(a):
>>
>> On 7/29/22 14:17, Andrew Lunn wrote:
>>>> What I propose is to enforce more strictly an update of DT description
>>>> with a specified timeline, abandoning 'camps' idea and driver-specific
>>>> contents in the generic code.
>>>
>>> Regressions are the problem. We are supposed to be backwards
>>> compatible with older DT blobs. If we now say old DT blobs are
>>> invalid, and refuse to probe, we cause a regression.
>>>
>>> For some of the in kernel DT files using the mv88e6xxx i can make a
>>> good guess at what the missing properties are. However, i'm bound to
>>> guess wrong at some point, and cause a regression. So we could change
>>> just those we can test. But at some point, the other blobs are going
>>> to fail the enforces checks and cause a regression anyway.
>>>
>>> And what about out of tree blobs? Probably OpenWRT have some. Do we
>>> want to cause them to regress?
>>
>> No, we do not want that, which is why Vladimir's approach IMHO is reasonable in that it acknowledges mistakes or shortcomings of the past into the present, and expects the future to be corrected and not repeat those same mistakes. The deprectiation window idea is all well and good in premise, however with such a large user base, I am not sure it is going to go very far unfortunately, nor that it will hinder our ability to have a more maintainable DSA framework TBH.
>>
>> BTW, OpenWrt does not typically ship DT blobs that stay frozen, all of the kernel, DTBs, root filesystem, and sometimes a recent u-boot copy will be updated at the same time because very rarely do the existing boot loader satisfy modern requirements (PSCI, etc.).
> 
> Initially, I thought that the idea is a probe failure (hence the camps
> to prevent that) - but it was clarified later, it's not the case.
> 
> I totally agree and I am all against breaking the backward
> compatibility (this is why I work on ACPI support that much :) ). The
> question is whether for existing deployments with 'broken' DT
> description we would be ok to introduce a dev_warn/WARN_ON message
> after a kernel update. That would be a case if the check is performed
> unconditionally - this way we can keep compat strings out of net/dsa.
> What do you think?

A warning seems fine and appropriate to give just the right amount of nudge to get things fixed, I would not as far as a full backtrace WARN() because those will definitively upset people's CI in a wayt that seems a bit over reacting. Anyway I do have my share of DT blobs to update.
Vladimir Oltean July 30, 2022, 12:49 a.m. UTC | #11
On Fri, Jul 29, 2022 at 11:33:30PM +0200, Marcin Wojtas wrote:
> Initially, I thought that the idea is a probe failure (hence the camps
> to prevent that) - but it was clarified later, it's not the case.

The idea _is_ a probe failure, but not for whom you seem to believe
(not for the 'expected bad' drivers but for the 'expected good' ones).
The probe failure is for drivers using OF whose compatibles are outside
of dsa_switches_dont_enforce_validation, and all ACPI drivers (that's
going to be your part), effective immediately.

For example the sja1105 doesn't have any parasitic users, I'm absolutely
sure about that because it already has validation, although only coincidentally.
For the ocelot driver I'm also relatively certain practically speaking,
because we're talking about an embedded switch where the description is
in an SoC dtsi. But I can't totally exclude the possibility that somebody
won't be crazy enough to /delete-property/ the fixed-link from a board
device tree, or to redefine everything from scratch without including
the SoC dtsi, and then claim that hey, this was actually working before
this or that refactoring, due to some marginal condition which I never
designed for.

And that's the point of this patch. The DSA core is flawed because it
doesn't let phylink fail when it should. It tries to save the day by
making some odd decisions which make sense considering the old drivers,
but simply don't, when you consider what the code would have looked like,
were it not for the pre-phylink baggage.

So this change essentially lets happen for new drivers what should have
happened in a normal world where DSA would not interfere at all -
nondescriptive bindings: fail to probe, bye.

Rob is right, the DSA core shouldn't validate the OF node, it should
just let phylink fail if that's what it will, and go about its day.
Individual DSA drivers shouldn't have to validate their OF node either,
but they'd have to, if they wanted to circumvent DSA's logic. And why
put the burden on them?

> I totally agree and I am all against breaking the backward
> compatibility (this is why I work on ACPI support that much :) ). The
> question is whether for existing deployments with 'broken' DT
> description we would be ok to introduce a dev_warn/WARN_ON message
> after a kernel update.

Andrew suggested it. In v1, the array was for skipping DT validation
(this implies skipping printing warnings) for drivers where we knew it's
going to be violated anyway. But we thought, why not also tell users
that what's going on isn't quite ideal.

https://patchwork.kernel.org/project/netdevbpf/patch/20220723164635.1621911-1-vladimir.oltean@nxp.com/#24949229

This is a detail to me and not the purpose of the change. It would be
nothing short of a miracle if people would suddenly see the light and
update all DT blobs tomorrow. But no one here is planning based on that.
Quite far from it - Russell King and Marek Behún were (are?) working on
a patch set that actually pushes those incomplete DT blobs to the next
level, and fakes what the proper OF node should have looked like, based
on driver level knowledge.

> That would be a case if the check is performed unconditionally - this
> way we can keep compat strings out of net/dsa.  What do you think?

So you're saying keep the validation and the warnings, but don't fail
the probing of anyone? Why? Being strict about validation allows me as a
driver maintainer to be formally correct when I say that there are no
users that expect to not register with phylink, rather than just guess
or hope, or even waste time checking. And if I was someone new coming to
the DSA framework, I'd want to have that guarantee. I didn't design for
that possibility and I don't want to have it, just because I use the
same framework as someone who put workarounds in place so that it works
for him. We could argue about who of existing but otherwise newish
drivers should be part of dsa_switches_dont_enforce_validation, and
that's where the camps come in.

Being warm and fuzzy about this, and just print, doesn't really describe
anything except for a preference. After all, we're not abandoning the
broken DT blobs.

What's your concrete problem with compatible strings inside net/dsa anyway,
and do you have a competent alternative? Is it the problem that you'll
have to convert them to fwnode? You do realize that this array is not
planned to expand at all after the patch is merged, including not to
ACPI IDs, right?

I've explained in the commit message why:

| Because there is a pattern where newly added switches reuse existing
| drivers more often than introducing new ones, I've opted for deciding
| who gets to skip validation based on an OF compatible match table in the
| DSA core. The alternative would have been to add another boolean
| property to struct dsa_switch, like configure_vlan_while_not_filtering.
| But this avoids situations where sometimes driver maintainers obfuscate
| what goes on by sharing a common probing function, and therefore
| making new switches inherit old quirks.

The latter has actually happened no farther than a few weeks ago,
refactoring was done to the microchip ksz driver, and a newly added
switch gained support for an old one's quirks, by refactoring the code
to share common logic (which I was otherwise very happy about).
Now good luck with adding a compatible string to
dsa_switches_dont_enforce_validation without anyone noticing.
Vladimir Oltean July 30, 2022, 4:23 p.m. UTC | #12
Hi Rob,

On Fri, Jul 29, 2022 at 12:39:06PM -0600, Rob Herring wrote:
> All valid points. At least for the sea of warnings, you can limit
> checking to only a subset of schemas you care about. Setting
> 'DT_SCHEMA_FILES=net/' will only check networking schemas for example.
> Just need folks to care about those subsets.
> 
> I'm not saying don't put warnings in the kernel for this. Just don't
> make it the only source of warnings. Given you are tightening the
> requirements, it makes sense to have a warning. If it was something
> entirely new, then I'd be more inclined to say only the schema should
> check.

How does this look like? It says that if the 'ethernet' property exists
and is a phandle (i.e. this is a CPU port), or if the 'link' property
exists and is a phandle-array (i.e. this is a DSA port), then the
phylink-related properties are mandatory, in the combinations that they
may appear in.

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
index 09317e16cb5d..ed828cec90fd 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
@@ -76,6 +76,31 @@ properties:
 required:
   - reg
 
+if:
+  oneOf:
+    - properties:
+        ethernet:
+          items:
+            $ref: /schemas/types.yaml#/definitions/phandle
+    - properties:
+        link:
+          items:
+            $ref: /schemas/types.yaml#/definitions/phandle-array
+then:
+  allOf:
+  - required:
+    - phy-mode
+  - oneOf:
+    - required:
+      - fixed-link
+    - required:
+      - phy-handle
+    - required:
+      - managed
+    - required:
+      - phy-handle
+      - managed
+
 additionalProperties: true
 

I have deliberately made this part of dsa-port.yaml and not depend on
any compatible string. The reason is the code - we'll warn about missing
properties regardless of whether we have fallback logic for some older
switches. Essentially the fact that some switches have the fallback to
not use phylink will remain an undocumented easter egg as far as the
dt-schema is concerned.

What do you think?

Can I also get an ACK for this patch and patch 1 please?
Rob Herring Aug. 1, 2022, 2:02 p.m. UTC | #13
On Sat, Jul 30, 2022 at 10:24 AM Vladimir Oltean
<vladimir.oltean@nxp.com> wrote:
>
> Hi Rob,
>
> On Fri, Jul 29, 2022 at 12:39:06PM -0600, Rob Herring wrote:
> > All valid points. At least for the sea of warnings, you can limit
> > checking to only a subset of schemas you care about. Setting
> > 'DT_SCHEMA_FILES=net/' will only check networking schemas for example.
> > Just need folks to care about those subsets.
> >
> > I'm not saying don't put warnings in the kernel for this. Just don't
> > make it the only source of warnings. Given you are tightening the
> > requirements, it makes sense to have a warning. If it was something
> > entirely new, then I'd be more inclined to say only the schema should
> > check.
>
> How does this look like? It says that if the 'ethernet' property exists
> and is a phandle (i.e. this is a CPU port), or if the 'link' property
> exists and is a phandle-array (i.e. this is a DSA port), then the
> phylink-related properties are mandatory, in the combinations that they
> may appear in.
>
> diff --git a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
> index 09317e16cb5d..ed828cec90fd 100644
> --- a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
> +++ b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml
> @@ -76,6 +76,31 @@ properties:
>  required:
>    - reg
>
> +if:
> +  oneOf:
> +    - properties:
> +        ethernet:
> +          items:
> +            $ref: /schemas/types.yaml#/definitions/phandle
> +    - properties:
> +        link:
> +          items:
> +            $ref: /schemas/types.yaml#/definitions/phandle-array

'items' here is wrong. 'required' is needed because the property not
present will be true otherwise.

Checking the type is redundant given the top-level schema already does that.

> +then:
> +  allOf:
> +  - required:
> +    - phy-mode
> +  - oneOf:
> +    - required:
> +      - fixed-link
> +    - required:
> +      - phy-handle
> +    - required:
> +      - managed
> +    - required:
> +      - phy-handle
> +      - managed
> +
>  additionalProperties: true
>
>
> I have deliberately made this part of dsa-port.yaml and not depend on
> any compatible string. The reason is the code - we'll warn about missing
> properties regardless of whether we have fallback logic for some older
> switches. Essentially the fact that some switches have the fallback to
> not use phylink will remain an undocumented easter egg as far as the
> dt-schema is concerned.

dsa-port.yaml is only applied when some other schema references it
which is probably based on some compatible. If you want to apply this
under some other condition, then you need a custom 'select' schema to
define when.

Rob
Vladimir Oltean Aug. 1, 2022, 2:11 p.m. UTC | #14
On Mon, Aug 01, 2022 at 08:02:56AM -0600, Rob Herring wrote:
> > +if:
> > +  oneOf:
> > +    - properties:
> > +        ethernet:
> > +          items:
> > +            $ref: /schemas/types.yaml#/definitions/phandle
> > +    - properties:
> > +        link:
> > +          items:
> > +            $ref: /schemas/types.yaml#/definitions/phandle-array
> 
> 'items' here is wrong. 'required' is needed because the property not
> present will be true otherwise.
> 
> Checking the type is redundant given the top-level schema already does that.

Excuse me, but I don't understand. I verified this and it works as
expected - if the property is present, the 'items' evaluates to true,
otherwise to false.

Could you suggest an alternative way of checking whether the 'ethernet'
or 'link' properties are present, as part of a conditional?

> > I have deliberately made this part of dsa-port.yaml and not depend on
> > any compatible string. The reason is the code - we'll warn about missing
> > properties regardless of whether we have fallback logic for some older
> > switches. Essentially the fact that some switches have the fallback to
> > not use phylink will remain an undocumented easter egg as far as the
> > dt-schema is concerned.
> 
> dsa-port.yaml is only applied when some other schema references it
> which is probably based on some compatible. If you want to apply this
> under some other condition, then you need a custom 'select' schema to
> define when.

No, this is good, I think. I got a "build warning" from your bot which
found the DSA examples which had missing required properties, so I think
it works the way I indended it.
Rob Herring Aug. 1, 2022, 2:26 p.m. UTC | #15
On Mon, Aug 1, 2022 at 8:11 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Mon, Aug 01, 2022 at 08:02:56AM -0600, Rob Herring wrote:
> > > +if:
> > > +  oneOf:
> > > +    - properties:
> > > +        ethernet:
> > > +          items:
> > > +            $ref: /schemas/types.yaml#/definitions/phandle
> > > +    - properties:
> > > +        link:
> > > +          items:
> > > +            $ref: /schemas/types.yaml#/definitions/phandle-array
> >
> > 'items' here is wrong. 'required' is needed because the property not
> > present will be true otherwise.
> >
> > Checking the type is redundant given the top-level schema already does that.
>
> Excuse me, but I don't understand. I verified this and it works as
> expected - if the property is present, the 'items' evaluates to true,
> otherwise to false.

The main schema has:

link:
  $ref: /schemas/types.yaml#/definitions/phandle-array

Both with and without 'items' can't be correct.

> Could you suggest an alternative way of checking whether the 'ethernet'
> or 'link' properties are present, as part of a conditional?

if:
  oneOf:
    - required: [ ethernet ]
    - required: [ link ]

'required' here doesn't mean the property is required, but is simply
true if the property is present and false if not present.


> > > I have deliberately made this part of dsa-port.yaml and not depend on
> > > any compatible string. The reason is the code - we'll warn about missing
> > > properties regardless of whether we have fallback logic for some older
> > > switches. Essentially the fact that some switches have the fallback to
> > > not use phylink will remain an undocumented easter egg as far as the
> > > dt-schema is concerned.
> >
> > dsa-port.yaml is only applied when some other schema references it
> > which is probably based on some compatible. If you want to apply this
> > under some other condition, then you need a custom 'select' schema to
> > define when.
>
> No, this is good, I think. I got a "build warning" from your bot which
> found the DSA examples which had missing required properties, so I think
> it works the way I indended it.

Okay, I was thinking you wanted to check cases that didn't yet have a
schema for the specific device.

Rob
diff mbox series

Patch

diff --git a/net/dsa/port.c b/net/dsa/port.c
index 4b6139bff217..c07a7c69d5e0 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -1650,22 +1650,187 @@  static int dsa_shared_port_phylink_register(struct dsa_port *dp)
 	return err;
 }
 
+/* During the initial DSA driver migration to OF, port nodes were sometimes
+ * added to device trees with no indication of how they should operate from a
+ * link management perspective (phy-handle, fixed-link, etc). Additionally, the
+ * phy-mode may be absent. The interpretation of these port OF nodes depends on
+ * their type.
+ *
+ * User ports with no phy-handle or fixed-link are expected to connect to an
+ * internal PHY located on the ds->slave_mii_bus at an MDIO address equal to
+ * the port number. This description is still actively supported.
+ *
+ * Shared (CPU and DSA) ports with no phy-handle or fixed-link are expected to
+ * operate at the maximum speed that their phy-mode is capable of. If the
+ * phy-mode is absent, they are expected to operate using the phy-mode
+ * supported by the port that gives the highest link speed. It is unspecified
+ * if the port should use flow control or not, half duplex or full duplex, or
+ * if the phy-mode is a SERDES link, whether in-band autoneg is expected to be
+ * enabled or not.
+ *
+ * In the latter case of shared ports, omitting the link management description
+ * from the firmware node is deprecated and strongly discouraged. DSA uses
+ * phylink, which rejects the firmware nodes of these ports for lacking
+ * required properties.
+ *
+ * For switches in this table, DSA will skip enforcing validation and will
+ * later omit registering a phylink instance for the shared ports, if they lack
+ * a fixed-link, a phy-handle, or a managed = "in-band-status" property.
+ * It becomes the responsibility of the driver to ensure that these ports
+ * operate at the maximum speed (whatever this means) and will interoperate
+ * with the DSA master or other cascade port, since phylink methods will not be
+ * invoked for them.
+ *
+ * If you are considering expanding this table for newly introduced switches,
+ * think again. It is OK to remove switches from this table if there aren't DT
+ * blobs in circulation which rely on defaulting the shared ports.
+ */
+static const char * const dsa_switches_dont_enforce_validation[] = {
+#if IS_ENABLED(CONFIG_NET_DSA_XRS700X)
+	"arrow,xrs7003e",
+	"arrow,xrs7003f",
+	"arrow,xrs7004e",
+	"arrow,xrs7004f",
+#endif
+#if IS_ENABLED(CONFIG_B53)
+	"brcm,bcm5325",
+	"brcm,bcm53115",
+	"brcm,bcm53125",
+	"brcm,bcm53128",
+	"brcm,bcm5365",
+	"brcm,bcm5389",
+	"brcm,bcm5395",
+	"brcm,bcm5397",
+	"brcm,bcm5398",
+	"brcm,bcm53010-srab",
+	"brcm,bcm53011-srab",
+	"brcm,bcm53012-srab",
+	"brcm,bcm53018-srab",
+	"brcm,bcm53019-srab",
+	"brcm,bcm5301x-srab",
+	"brcm,bcm11360-srab",
+	"brcm,bcm58522-srab",
+	"brcm,bcm58525-srab",
+	"brcm,bcm58535-srab",
+	"brcm,bcm58622-srab",
+	"brcm,bcm58623-srab",
+	"brcm,bcm58625-srab",
+	"brcm,bcm88312-srab",
+	"brcm,cygnus-srab",
+	"brcm,nsp-srab",
+	"brcm,omega-srab",
+	"brcm,bcm3384-switch",
+	"brcm,bcm6328-switch",
+	"brcm,bcm6368-switch",
+	"brcm,bcm63xx-switch",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_BCM_SF2)
+	"brcm,bcm7445-switch-v4.0",
+	"brcm,bcm7278-switch-v4.0",
+	"brcm,bcm7278-switch-v4.8",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_HIRSCHMANN_HELLCREEK)
+	"hirschmann,hellcreek-de1soc-r1",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_LANTIQ_GSWIP)
+	"lantiq,xrx200-gswip",
+	"lantiq,xrx300-gswip",
+	"lantiq,xrx330-gswip",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_MV88E6060)
+	"marvell,mv88e6060",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_MV88E6XXX)
+	"marvell,mv88e6085",
+	"marvell,mv88e6190",
+	"marvell,mv88e6250",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_MICROCHIP_KSZ_COMMON)
+	"microchip,ksz8765",
+	"microchip,ksz8794",
+	"microchip,ksz8795",
+	"microchip,ksz8863",
+	"microchip,ksz8873",
+	"microchip,ksz9477",
+	"microchip,ksz9897",
+	"microchip,ksz9893",
+	"microchip,ksz9563",
+	"microchip,ksz8563",
+	"microchip,ksz9567",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_SMSC_LAN9303_MDIO)
+	"smsc,lan9303-mdio",
+#endif
+#if IS_ENABLED(CONFIG_NET_DSA_SMSC_LAN9303_I2C)
+	"smsc,lan9303-i2c",
+#endif
+	NULL,
+};
+
+static void dsa_shared_port_validate_of(struct dsa_port *dp,
+					bool *missing_phy_mode,
+					bool *missing_link_description)
+{
+	struct device_node *dn = dp->dn, *phy_np;
+	struct dsa_switch *ds = dp->ds;
+	phy_interface_t mode;
+
+	*missing_phy_mode = false;
+	*missing_link_description = false;
+
+	if (of_get_phy_mode(dn, &mode)) {
+		*missing_phy_mode = true;
+		dev_err(ds->dev,
+			"OF node %pOF of %s port %d lacks the required \"phy-mode\" property\n",
+			dn, dsa_port_is_cpu(dp) ? "CPU" : "DSA", dp->index);
+	}
+
+	/* Note: of_phy_is_fixed_link() also returns true for
+	 * managed = "in-band-status"
+	 */
+	if (of_phy_is_fixed_link(dn))
+		return;
+
+	phy_np = of_parse_phandle(dn, "phy-handle", 0);
+	if (phy_np) {
+		of_node_put(phy_np);
+		return;
+	}
+
+	*missing_link_description = true;
+
+	dev_err(ds->dev,
+		"OF node %pOF of %s port %d lacks the required \"phy-handle\", \"fixed-link\" or \"managed\" properties\n",
+		dn, dsa_port_is_cpu(dp) ? "CPU" : "DSA", dp->index);
+}
+
 int dsa_shared_port_link_register_of(struct dsa_port *dp)
 {
 	struct dsa_switch *ds = dp->ds;
-	struct device_node *phy_np;
+	bool missing_link_description;
+	bool missing_phy_mode;
 	int port = dp->index;
 
+	dsa_shared_port_validate_of(dp, &missing_phy_mode,
+				    &missing_link_description);
+
+	if ((missing_phy_mode || missing_link_description) &&
+	    !of_device_compatible_match(ds->dev->of_node,
+					dsa_switches_dont_enforce_validation))
+		return -EINVAL;
+
 	if (!ds->ops->adjust_link) {
-		phy_np = of_parse_phandle(dp->dn, "phy-handle", 0);
-		if (of_phy_is_fixed_link(dp->dn) || phy_np) {
+		if (missing_link_description) {
+			dev_warn(ds->dev,
+				 "Skipping phylink registration for %s port %d\n",
+				 dsa_port_is_cpu(dp) ? "CPU" : "DSA", dp->index);
+		} else {
 			if (ds->ops->phylink_mac_link_down)
 				ds->ops->phylink_mac_link_down(ds, port,
 					MLO_AN_FIXED, PHY_INTERFACE_MODE_NA);
-			of_node_put(phy_np);
+
 			return dsa_shared_port_phylink_register(dp);
 		}
-		of_node_put(phy_np);
 		return 0;
 	}