Message ID | 20220715215954.1449214-8-sean.anderson@seco.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Series | net: dpaa: Convert to phylink |
> drivers/net/phy/phy.c | 21 +++++++++++++++++++++
> include/linux/phy.h   | 38 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 59 insertions(+)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 8d3ee3a6495b..cf4a8b055a42 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -114,6 +114,27 @@ void phy_print_status(struct phy_device *phydev)
>  }
>  EXPORT_SYMBOL(phy_print_status);
>
> +/**
> + * phy_get_rate_adaptation - determine if rate adaptation is supported
> + * @phydev: The phy device to return rate adaptation for
> + * @iface: The interface mode to use
> + *
> + * This determines the type of rate adaptation (if any) that @phy supports
> + * using @iface. @iface may be %PHY_INTERFACE_MODE_NA to determine if any
> + * interface supports rate adaptation.
> + *
> + * Return: The type of rate adaptation @phy supports for @iface, or
> + *         %RATE_ADAPT_NONE.
> + */
> +enum rate_adaptation phy_get_rate_adaptation(struct phy_device *phydev,
> +					      phy_interface_t iface)
> +{
> +	if (phydev->drv->get_rate_adaptation)
> +		return phydev->drv->get_rate_adaptation(phydev, iface);

It is normal that any call into the driver is performed with the
phydev->lock held.

>  #define PHY_INIT_TIMEOUT	100000
>  #define PHY_FORCE_TIMEOUT	10
> @@ -570,6 +588,7 @@ struct macsec_ops;
>   * @lp_advertising: Current link partner advertised linkmodes
>   * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
>   * @autoneg: Flag autoneg being used
> + * @rate_adaptation: Current rate adaptation mode
>   * @link: Current link state
>   * @autoneg_complete: Flag auto negotiation of the link has completed
>   * @mdix: Current crossover
> @@ -637,6 +656,8 @@ struct phy_device {
>  	unsigned irq_suspended:1;
>  	unsigned irq_rerun:1;
>
> +	enum rate_adaptation rate_adaptation;

It is not clear what the locking is on this member. Is it only safe to
access it during the adjust_link callback, when it is guaranteed that
the phydev->lock is held, so the value is consistent? Or is the MAC
allowed to access this at other times?

	Andrew
On 7/16/22 3:39 PM, Andrew Lunn wrote:
>> drivers/net/phy/phy.c | 21 +++++++++++++++++++++
>> include/linux/phy.h   | 38 ++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 59 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index 8d3ee3a6495b..cf4a8b055a42 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -114,6 +114,27 @@ void phy_print_status(struct phy_device *phydev)
>>  }
>>  EXPORT_SYMBOL(phy_print_status);
>>
>> +/**
>> + * phy_get_rate_adaptation - determine if rate adaptation is supported
>> + * @phydev: The phy device to return rate adaptation for
>> + * @iface: The interface mode to use
>> + *
>> + * This determines the type of rate adaptation (if any) that @phy supports
>> + * using @iface. @iface may be %PHY_INTERFACE_MODE_NA to determine if any
>> + * interface supports rate adaptation.
>> + *
>> + * Return: The type of rate adaptation @phy supports for @iface, or
>> + *         %RATE_ADAPT_NONE.
>> + */
>> +enum rate_adaptation phy_get_rate_adaptation(struct phy_device *phydev,
>> +					      phy_interface_t iface)
>> +{
>> +	if (phydev->drv->get_rate_adaptation)
>> +		return phydev->drv->get_rate_adaptation(phydev, iface);
>
> It is normal that any call into the driver is performed with the
> phydev->lock held.

Ah, so like phy_ethtool_get_strings.

>>  #define PHY_INIT_TIMEOUT	100000
>>  #define PHY_FORCE_TIMEOUT	10
>> @@ -570,6 +588,7 @@ struct macsec_ops;
>>   * @lp_advertising: Current link partner advertised linkmodes
>>   * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
>>   * @autoneg: Flag autoneg being used
>> + * @rate_adaptation: Current rate adaptation mode
>>   * @link: Current link state
>>   * @autoneg_complete: Flag auto negotiation of the link has completed
>>   * @mdix: Current crossover
>> @@ -637,6 +656,8 @@ struct phy_device {
>>  	unsigned irq_suspended:1;
>>  	unsigned irq_rerun:1;
>>
>> +	enum rate_adaptation rate_adaptation;
>
> It is not clear what the locking is on this member. Is it only safe to
> access it during the adjust_link callback, when it is guaranteed that
> the phydev->lock is held, so the value is consistent? Or is the MAC
> allowed to access this at other times?

The former. My intention is that this has the same access as
link/interface/speed/duplex.

--Sean
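For illustration only, calling the driver with phydev->lock held, as discussed above, might look roughly like the sketch below. This is a userspace mock and not code from any posted revision of the series: pthread_mutex_t stands in for the kernel mutex, the struct definitions are cut-down stand-ins for the real ones in include/linux/phy.h, and demo_get_rate_adaptation() is a hypothetical driver callback.

```c
#include <pthread.h>
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel types (assumptions). */
enum rate_adaptation {
	RATE_ADAPT_NONE = 0,
	RATE_ADAPT_PAUSE,
	RATE_ADAPT_CRS,
	RATE_ADAPT_OPEN_LOOP,
};

typedef int phy_interface_t;	/* an enum in the kernel */

struct phy_device;

struct phy_driver {
	enum rate_adaptation (*get_rate_adaptation)(struct phy_device *phydev,
						    phy_interface_t iface);
};

struct phy_device {
	pthread_mutex_t lock;	/* stands in for the kernel's struct mutex */
	const struct phy_driver *drv;
};

/* Hypothetical driver callback: always reports pause-based adaptation. */
enum rate_adaptation demo_get_rate_adaptation(struct phy_device *phydev,
					      phy_interface_t iface)
{
	(void)phydev;
	(void)iface;
	return RATE_ADAPT_PAUSE;
}

/* Same logic as the patch, but the driver is called under phydev->lock. */
enum rate_adaptation phy_get_rate_adaptation(struct phy_device *phydev,
					     phy_interface_t iface)
{
	enum rate_adaptation ret = RATE_ADAPT_NONE;

	if (phydev->drv->get_rate_adaptation) {
		pthread_mutex_lock(&phydev->lock);
		ret = phydev->drv->get_rate_adaptation(phydev, iface);
		pthread_mutex_unlock(&phydev->lock);
	}

	return ret;
}
```

In the kernel the lock calls would presumably be mutex_lock(&phydev->lock) and mutex_unlock(&phydev->lock). Callers that already hold the lock (e.g. from adjust_link) would need a different entry point.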
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8d3ee3a6495b..cf4a8b055a42 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -114,6 +114,27 @@ void phy_print_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_print_status);
 
+/**
+ * phy_get_rate_adaptation - determine if rate adaptation is supported
+ * @phydev: The phy device to return rate adaptation for
+ * @iface: The interface mode to use
+ *
+ * This determines the type of rate adaptation (if any) that @phy supports
+ * using @iface. @iface may be %PHY_INTERFACE_MODE_NA to determine if any
+ * interface supports rate adaptation.
+ *
+ * Return: The type of rate adaptation @phy supports for @iface, or
+ *         %RATE_ADAPT_NONE.
+ */
+enum rate_adaptation phy_get_rate_adaptation(struct phy_device *phydev,
+					      phy_interface_t iface)
+{
+	if (phydev->drv->get_rate_adaptation)
+		return phydev->drv->get_rate_adaptation(phydev, iface);
+	return RATE_ADAPT_NONE;
+}
+EXPORT_SYMBOL_GPL(phy_get_rate_adaptation);
+
 /**
  * phy_config_interrupt - configure the PHY device for the requested interrupts
  * @phydev: the phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 81ce76c3e799..e983711f6c8b 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -276,6 +276,24 @@ static inline const char *phy_modes(phy_interface_t interface)
 	}
 }
 
+/**
+ * enum rate_adaptation - methods of rate adaptation
+ * @RATE_ADAPT_NONE: No rate adaptation performed.
+ * @RATE_ADAPT_PAUSE: The phy sends pause frames to throttle the MAC.
+ * @RATE_ADAPT_CRS: The phy asserts CRS to prevent the MAC from transmitting.
+ * @RATE_ADAPT_OPEN_LOOP: The MAC is programmed with a sufficiently-large IPG.
+ *
+ * These are used to throttle the rate of data on the phy interface when the
+ * native speed of the interface is higher than the link speed. These should
+ * not be used for phy interfaces which natively support multiple speeds (e.g.
+ * MII or SGMII).
+ */
+enum rate_adaptation {
+	RATE_ADAPT_NONE = 0,
+	RATE_ADAPT_PAUSE,
+	RATE_ADAPT_CRS,
+	RATE_ADAPT_OPEN_LOOP,
+};
 
 #define PHY_INIT_TIMEOUT	100000
 #define PHY_FORCE_TIMEOUT	10
@@ -570,6 +588,7 @@ struct macsec_ops;
  * @lp_advertising: Current link partner advertised linkmodes
  * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
  * @autoneg: Flag autoneg being used
+ * @rate_adaptation: Current rate adaptation mode
  * @link: Current link state
  * @autoneg_complete: Flag auto negotiation of the link has completed
  * @mdix: Current crossover
@@ -637,6 +656,8 @@ struct phy_device {
 	unsigned irq_suspended:1;
 	unsigned irq_rerun:1;
 
+	enum rate_adaptation rate_adaptation;
+
 	enum phy_state state;
 
 	u32 dev_flags;
@@ -801,6 +822,21 @@ struct phy_driver {
 	 */
 	int (*get_features)(struct phy_device *phydev);
 
+	/**
+	 * @get_rate_adaptation: Get the supported type of rate adaptation for a
+	 * particular phy interface. This is used by phy consumers to determine
+	 * whether to advertise lower-speed modes for that interface. It is
+	 * assumed that if a rate adaptation mode is supported on an interface,
+	 * then that interface's rate can be adapted to all slower link speeds
+	 * supported by the phy. If iface is %PHY_INTERFACE_MODE_NA, and the phy
+	 * supports any kind of rate adaptation for any interface, then it must
+	 * return that rate adaptation mode (preferring %RATE_ADAPT_PAUSE, to
+	 * %RATE_ADAPT_CRS). If the interface is not supported, this should
+	 * return %RATE_ADAPT_NONE.
+	 */
+	enum rate_adaptation (*get_rate_adaptation)(struct phy_device *phydev,
+						    phy_interface_t iface);
+
 	/* PHY Power Management */
 	/** @suspend: Suspend the hardware, saving state if needed */
 	int (*suspend)(struct phy_device *phydev);
@@ -1681,6 +1717,8 @@ int phy_disable_interrupts(struct phy_device *phydev);
 void phy_request_interrupt(struct phy_device *phydev);
 void phy_free_interrupt(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
+enum rate_adaptation phy_get_rate_adaptation(struct phy_device *phydev,
+					      phy_interface_t iface);
 void phy_set_max_speed(struct phy_device *phydev, u32 max_speed);
 void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode);
 void phy_advertise_supported(struct phy_device *phydev);
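As a rough illustration of the @get_rate_adaptation contract documented above, a driver callback might look something like the following. This is a sketch under assumptions, not a real driver: the phy is imagined to do pause-based adaptation on 10GBASE-R only, the function name example_get_rate_adaptation() is hypothetical, and the types are trimmed userspace stand-ins for the kernel definitions.

```c
#include <stddef.h>

/* Trimmed stand-ins for the kernel definitions (assumptions). */
typedef enum {
	PHY_INTERFACE_MODE_NA,
	PHY_INTERFACE_MODE_SGMII,
	PHY_INTERFACE_MODE_10GBASER,
} phy_interface_t;

enum rate_adaptation {
	RATE_ADAPT_NONE = 0,
	RATE_ADAPT_PAUSE,
	RATE_ADAPT_CRS,
	RATE_ADAPT_OPEN_LOOP,
};

struct phy_device;	/* opaque in this sketch */

/*
 * Hypothetical callback for a phy that adapts 10GBASE-R to slower link
 * speeds using pause frames, and does no adaptation on other interfaces.
 */
enum rate_adaptation example_get_rate_adaptation(struct phy_device *phydev,
						 phy_interface_t iface)
{
	(void)phydev;

	switch (iface) {
	case PHY_INTERFACE_MODE_NA:
		/* "Any interface": report the one mode this phy supports. */
	case PHY_INTERFACE_MODE_10GBASER:
		return RATE_ADAPT_PAUSE;
	default:
		/* SGMII natively supports multiple speeds; nothing to adapt. */
		return RATE_ADAPT_NONE;
	}
}
```

A real driver would presumably also check whether rate adaptation is actually enabled in the phy's configuration before reporting it.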
This adds general support for rate adaptation to the phy subsystem. The
general idea is that the phy interface runs at one speed, and the MAC
throttles the rate at which it sends packets to the link speed. There's
a good overview of several techniques for achieving this at [1]. This
patch adds support for three: pause-frame based (such as in Aquantia
phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and
open-loop-based (such as in 10GBASE-W).

This patch makes a few assumptions and a few non-assumptions about the
types of rate adaptation available. First, it assumes that different
phys may use different forms of rate adaptation. Second, it assumes that
phys can use rate adaptation for any of their supported link speeds
(e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to
10BASE-T). Third, it does not assume that all interface modes will use
the same form of rate adaptation. Fourth, it does not assume that all
phy devices will support rate adaptation (even if some do). Relaxing or
strengthening these (non-)assumptions could result in a different API.
For example, if all interface modes were assumed to use the same form of
rate adaptation, then a bitmask of interface modes supporting rate
adaptation would suffice.

We need support for rate adaptation for two reasons. First, the phy
consumer needs to know if the phy will perform rate adaptation in order
to program the correct advertising. An unaware consumer will only
program support for link modes at the phy interface mode's native speed.
This will cause autonegotiation to fail if the link partner only
advertises support for lower speed link modes. Second, to reduce packet
loss it may be desirable to throttle packet throughput.

In past discussions [2-4], this behavior has been controversial. It is
the opinion of several developers that it is the responsibility of the
system integrator or end user to set the link settings appropriately for
rate adaptation. In particular, it was argued that it is difficult to
determine whether a particular phy has rate adaptation enabled, and it
is simpler to keep such determinations out of the kernel. Another
criticism is that packet loss may happen anyway, such as if a faster
link is used with a switch or repeater that does not support pause
frames.

I believe that our current approach is limiting, especially when
considering that rate adaptation (in two forms) has made it into IEEE
standards. In general, when we have appropriate information we should
set sensible defaults. To consider a contrasting example, we enable
pause frames by default for switches which autonegotiate for them. When
it's the phy itself generating these frames, we don't even have to
autonegotiate to know that we should enable pause frames.

Our current approach also encourages workarounds, such as commit
73a21fa817f0 ("dpaa_eth: support all modes with rate adapting PHYs").
These workarounds are fine for phylib drivers, but phylink drivers
cannot use this approach (since there is no direct access to the phy).
Note that even when we determine (e.g.) the pause settings based on
whether rate adaptation is enabled, they can still be overridden by
userspace (using ethtool). It might be prudent to allow disabling of
rate adaptation generally in ethtool as well.

[1] https://www.ieee802.org/3/efm/baseline/marris_1_0302.pdf
[2] https://lore.kernel.org/netdev/1579701573-6609-1-git-send-email-madalin.bucur@oss.nxp.com/
[3] https://lore.kernel.org/netdev/1580137671-22081-1-git-send-email-madalin.bucur@oss.nxp.com/
[4] https://lore.kernel.org/netdev/20200116181933.32765-1-olteanv@gmail.com/

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
---

Changes in v3:
- New

 drivers/net/phy/phy.c | 21 +++++++++++++++++++++
 include/linux/phy.h   | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
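
To make the two reasons given in the commit message concrete, a phy consumer could derive its defaults from the reported mode along the lines of the sketch below. Both helper names are hypothetical and nothing like them appears in this patch. The enum is copied from the patch so the sketch stands alone.

```c
#include <stdbool.h>

/* Copied from the patch so this sketch compiles on its own. */
enum rate_adaptation {
	RATE_ADAPT_NONE = 0,
	RATE_ADAPT_PAUSE,
	RATE_ADAPT_CRS,
	RATE_ADAPT_OPEN_LOOP,
};

/*
 * Hypothetical helper: may the consumer advertise link modes slower than
 * the phy interface's native speed? Per the commit message, only if the
 * phy performs some form of rate adaptation on that interface.
 */
bool example_may_advertise_slower(enum rate_adaptation ra)
{
	return ra != RATE_ADAPT_NONE;
}

/*
 * Hypothetical helper: should the MAC honor received pause frames by
 * default? Yes if the phy itself generates them for rate adaptation, or
 * if pause was autonegotiated with the link partner. Userspace could
 * still override the result with ethtool.
 */
bool example_default_rx_pause(enum rate_adaptation ra, bool autoneg_pause)
{
	return ra == RATE_ADAPT_PAUSE || autoneg_pause;
}
```

The point of the second helper is the one made above: when the phy generates the pause frames, no autonegotiation result is needed to know that the MAC should act on them.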