diff mbox series

[net-next,v1,7/7] net: usb: lan78xx: Enable EEE support with phylink integration

Message ID 20250108121341.2689130-8-o.rempel@pengutronix.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Convert LAN78xx to PHYLINK | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: linux-usb@vger.kernel.org
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Oleksij Rempel Jan. 8, 2025, 12:13 p.m. UTC
Refactor Energy-Efficient Ethernet (EEE) support in the LAN78xx driver
to integrate with phylink. This includes the following changes:

- Use phylink_ethtool_get_eee and phylink_ethtool_set_eee to manage
  EEE settings, aligning with the phylink API.
- Add a new tx_lpi_timer variable to manage the TX LPI (Low Power Idle)
  request delay. Default it to 50 microseconds based on LAN7800 documentation
  recommendations.
- Update EEE configuration during link_up and ensure proper disabling
  during `link_down` to allow reconfiguration.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/net/usb/lan78xx.c | 81 ++++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 35 deletions(-)

Comments

Russell King (Oracle) Jan. 8, 2025, 12:47 p.m. UTC | #1
On Wed, Jan 08, 2025 at 01:13:41PM +0100, Oleksij Rempel wrote:
> Refactor Energy-Efficient Ethernet (EEE) support in the LAN78xx driver
> to integrate with phylink. This includes the following changes:
> 
> - Use phylink_ethtool_get_eee and phylink_ethtool_set_eee to manage
>   EEE settings, aligning with the phylink API.
> - Add a new tx_lpi_timer variable to manage the TX LPI (Low Power Idle)
>   request delay. Default it to 50 microseconds based on LAN7800 documentation
>   recommendations.

phylib maintains tx_lpi_timer for you. Please use that instead.

In any case, I've been submitting phylink EEE support which will help
driver authors get this correct, but I think it needs more feedback.
Please can you look at my patch set previously posted which is now
a bit out of date, review, and think about how this driver can make
use of it.

In particular, I'd like ideas on what phylink should be doing with
tx_lpi_timer values that are out of range of the hardware. Should it
limit the value itself? Should set_eee() error out? I asked these
questions in the cover message but I don't think *anyone* reads
cover messages anymore - as evidenced by my recent patch series that
made reference to "make it sew" and Singer sewing machines. No one
noticed. So I think patch series cover messages are now useless on
netdev.

Thanks.
Oleksij Rempel Jan. 8, 2025, 2:23 p.m. UTC | #2
On Wed, Jan 08, 2025 at 12:47:37PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 08, 2025 at 01:13:41PM +0100, Oleksij Rempel wrote:
> > Refactor Energy-Efficient Ethernet (EEE) support in the LAN78xx driver
> > to integrate with phylink. This includes the following changes:
> > 
> > - Use phylink_ethtool_get_eee and phylink_ethtool_set_eee to manage
> >   EEE settings, aligning with the phylink API.
> > - Add a new tx_lpi_timer variable to manage the TX LPI (Low Power Idle)
> >   request delay. Default it to 50 microseconds based on LAN7800 documentation
> >   recommendations.
> 
> phylib maintains tx_lpi_timer for you. Please use that instead.

Just using this variable directly phydev->eee_cfg.tx_lpi_timer ?

> In any case, I've been submitting phylink EEE support which will help
> driver authors get this correct, but I think it needs more feedback.
> Please can you look at my patch set previously posted which is now
> a bit out of date, review, and think about how this driver can make
> use of it.

Ack, will do. It looks like your port of lan743x to the new API
looks exactly like what I need for this driver too.

> In particular, I'd like ideas on what phylink should be doing with
> tx_lpi_timer values that are out of range of the hardware. Should it
> limit the value itself?

Yes, otherwise every MAC driver will need to do it in the
ethtool_set_eee() function.

The other question is, should we allow absolute maximum values, or sane
maximum? At some point will come the question, why the EEE is even
enabled?

The same is about minimal value, too low value will cause strong speed
degradation. Should we allow set insane minimum, but use sane default
value?

> Should set_eee() error out?

Yes, please.

> I asked these  questions in the cover message but I don't think *anyone*
> reads cover messages anymore - as evidenced by my recent patch series that
> made reference to "make it sew" and Singer sewing machines. No one
> noticed. So I think patch series cover messages are now useless on
> netdev.

lol :)
Russell King (Oracle) Jan. 8, 2025, 3:15 p.m. UTC | #3
On Wed, Jan 08, 2025 at 03:23:37PM +0100, Oleksij Rempel wrote:
> On Wed, Jan 08, 2025 at 12:47:37PM +0000, Russell King (Oracle) wrote:
> > On Wed, Jan 08, 2025 at 01:13:41PM +0100, Oleksij Rempel wrote:
> > > Refactor Energy-Efficient Ethernet (EEE) support in the LAN78xx driver
> > > to integrate with phylink. This includes the following changes:
> > > 
> > > - Use phylink_ethtool_get_eee and phylink_ethtool_set_eee to manage
> > >   EEE settings, aligning with the phylink API.
> > > - Add a new tx_lpi_timer variable to manage the TX LPI (Low Power Idle)
> > >   request delay. Default it to 50 microseconds based on LAN7800 documentation
> > >   recommendations.
> > 
> > phylib maintains tx_lpi_timer for you. Please use that instead.
> 
> Just using this variable directly phydev->eee_cfg.tx_lpi_timer ?

Yes. We're already accessing phydev->enable_tx_lpi directly, and we
have no helpers for this. Maybe it's more a question for Andrew,
whether he wishes to see phylib helpers for this state rather than
directly dereferencing phydev.

> > In any case, I've been submitting phylink EEE support which will help
> > driver authors get this correct, but I think it needs more feedback.
> > Please can you look at my patch set previously posted which is now
> > a bit out of date, review, and think about how this driver can make
> > use of it.
> 
> Ack, will do. It looks like your port of lan743x to the new API
> looks exactly like what I need for this driver too.
> 
> > In particular, I'd like ideas on what phylink should be doing with
> > tx_lpi_timer values that are out of range of the hardware. Should it
> > limit the value itself?
> 
> Yes, otherwise every MAC driver will need to do it in the
> ethtool_set_eee() function.

I've had several solutions, and my latest patch set actually has a
mixture of them in there (which is why I'm eager to try and find a way
forward on this, so I can fix the patch set):

1. the original idea to address this in Marvell platforms was to limit
   the LPI timer to the maximum representable value in the hardware,
   which would be 255us. This ignores that the hardware uses a 1us
   tick rate for the timer at 1G speeds, and 10us for 100M speeds.
   (So it limits it to 260us, even though the hardware can do 2550us
   at 100M speed). This limit was applied by clamping the value passed
   in from userspace without erroring out.

2. another solution was added the mac_validate_tx_lpi() method, and
   implementations added _in addition_ to the above, with the idea
   of erroring out for values > 255us on Marvell hardware.

3. another idea was to have mac_enable_tx_lpi() error out if it wasn't
   possible to allow e.g. falling back to a software timer (see stmmac
   comments below.) Another reason for erroring out applies to Marvell
   hardware, where PP2 hardware supports LPI on the GMAC but not the
   XGMAC - so it only works at speeds at or below 2.5G. However, that
   can be handled via the lpi_capabilities, so I don't think needs to
   be a concern.

> The other question is, should we allow absolute maximum values, or sane
> maximum? At some point will come the question, why the EEE is even
> enabled?

As referenced above, stmmac uses the hardware timer for LPI timeouts up
to and including 1048575us (STMMAC_ET_MAX). Beyond that, it uses a
normal kernel timer which is:

- disabled (and EEE mode reset) when we have a packet to transmit, or
  EEE is disabled
- is re-armed when cleaning up from packet transmission (although
  it looks like we attempt to immediately enter LPI mode, and would
  only wait for the timer if there are more packets to queue... maybe
  this is a bug in stmmac's implementation?) or when EEE mode is first
  enabled with a LPI timer longer than the above value.

So, should phylink have the capability to switch to a software LPI timer
implementation when the LPI timeout value exceeds what the hardware
supports? To put it another way, should the stmmac solution to this be
made generic?

Note that stmmac has this software timer implementation because not
only for the reason I've given above, but also because cores other than
GMAC4 that support LPI do not have support for the hardware timer.

> The same is about minimal value, too low value will cause strong speed
> degradation. Should we allow set insane minimum, but use sane default
> value?

We currently allow zero, and the behaviour of that depends on the
hardware. For example, in the last couple of days, it's been reported
that stmmac will never enter LPI with a value of zero.

Note that phylib defaults to zero, so imposing a minimum would cause
a read-modify-write of the EEE settings without setting the timer to
fail.

> > Should set_eee() error out?
> 
> Yes, please.

If we are to convert stmmac, then we need to consider what it's doing
(as per the above) and whether that should be generic - and if it isn't
what we want in generic code, then how do we allow drivers to do this if
they wish.
Oleksij Rempel Jan. 9, 2025, 5:13 p.m. UTC | #4
On Wed, Jan 08, 2025 at 03:15:52PM +0000, Russell King (Oracle) wrote:
> On Wed, Jan 08, 2025 at 03:23:37PM +0100, Oleksij Rempel wrote:
> > Yes, otherwise every MAC driver will need to do it in the
> > ethtool_set_eee() function.
> 
> I've had several solutions, and my latest patch set actually has a
> mixture of them in there (which is why I'm eager to try and find a way
> forward on this, so I can fix the patch set):
> 
> 1. the original idea to address this in Marvell platforms was to limit
>    the LPI timer to the maximum representable value in the hardware,
>    which would be 255us. This ignores that the hardware uses a 1us
>    tick rate for the timer at 1G speeds, and 10us for 100M speeds.
>    (So it limits it to 260us, even though the hardware can do 2550us
>    at 100M speed). This limit was applied by clamping the value passed
>    in from userspace without erroring out.
> 
> 2. another solution was added the mac_validate_tx_lpi() method, and
>    implementations added _in addition_ to the above, with the idea
>    of erroring out for values > 255us on Marvell hardware.
> 
> 3. another idea was to have mac_enable_tx_lpi() error out if it wasn't
>    possible to allow e.g. falling back to a software timer (see stmmac
>    comments below.) Another reason for erroring out applies to Marvell
>    hardware, where PP2 hardware supports LPI on the GMAC but not the
>    XGMAC - so it only works at speeds at or below 2.5G. However, that
>    can be handled via the lpi_capabilities, so I don't think needs to
>    be a concern.
> 
> > The other question is, should we allow absolute maximum values, or sane
> > maximum? At some point will come the question, why the EEE is even
> > enabled?
> 
> As referenced above, stmmac uses the hardware timer for LPI timeouts up
> to and including 1048575us (STMMAC_ET_MAX). Beyond that, it uses a
> normal kernel timer which is:
> 
> - disabled (and EEE mode reset) when we have a packet to transmit, or
>   EEE is disabled
> - is re-armed when cleaning up from packet transmission (although
>   it looks like we attempt to immediately enter LPI mode, and would
>   only wait for the timer if there are more packets to queue... maybe
>   this is a bug in stmmac's implementation?) or when EEE mode is first
>   enabled with a LPI timer longer than the above value.
> 
> So, should phylink have the capability to switch to a software LPI timer
> implementation when the LPI timeout value exceeds what the hardware
> supports?

No, i'll list my arguments later down.

> To put it another way, should the stmmac solution to this be
> made generic?

May be partially?

> Note that stmmac has this software timer implementation because not
> only for the reason I've given above, but also because cores other than
> GMAC4 that support LPI do not have support for the hardware timer.

There seems to be a samsung ethernet driver which implements software
based timer too.

> > The same is about minimal value, too low value will cause strong speed
> > degradation. Should we allow set insane minimum, but use sane default
> > value?
> 
> We currently allow zero, and the behaviour of that depends on the
> hardware. For example, in the last couple of days, it's been reported
> that stmmac will never enter LPI with a value of zero.
> 
> Note that phylib defaults to zero, so imposing a minimum would cause
> a read-modify-write of the EEE settings without setting the timer to
> fail.
>
> > > Should set_eee() error out?
> > 
> > Yes, please.
> 
> If we are to convert stmmac, then we need to consider what it's doing
> (as per the above) and whether that should be generic - and if it isn't
> what we want in generic code, then how do we allow drivers to do this if
> they wish.

I'll try to approach this from a user perspective. Currently, we have a single
`lpi_timer` interface for all link modes. If I start using it, I'm trying to
address a specific issue, but in most cases, I have no idea what link mode will
be active at any given time. To my knowledge, there are no user space tools
that allow users to configure different timer values for different link speeds.

So, what problems am I really trying to solve by adjusting this timer? I can
imagine the following:

1. Noticeable Speed Degradation:
 
   This happens when the timer is configured to a value smaller than the time
needed to put the hardware to sleep and wake it up again. For interfaces
supporting multiple link speeds with EEE, the most plausible configuration to
avoid degradation would be to set the timer to the maximum sleep-wake time
across all supported EEE link speeds.

2. Other Use Cases: 
 
   Most other scenarios involve trying to work around specific constraints or
optimizing for particular use cases:

   - Maximizing Power Savings: Setting the timer to the smallest possible
value to achieve aggressive power-saving. Why would a user do this? It seems
niche but might apply in low-traffic environments.

   - Reducing Latency for Periodic Traffic: For example, in audio
streaming, frames might be sent every X milliseconds. In this case, the timer
could be set slightly higher than X to allow the interface to enter LPI mode
between frames. As soon as the audio stops and no other traffic is present, the
interface transitions to LPI mode entirely. If the hardware supports timers
with values ≥ X, no additional complexity is needed. However, if the hardware
timer is not supported or the supported range is lower than X, a
software-assisted timer would be required. This might introduce additional
latency, and users should be made aware of this potential impact.
In my expectation HW timer based latency can be different to software based
timer.

From my current user perspective, I would expect the following behavior from
the existing `lpi_timer` interface:

1. Subtle Disabling of LPI Should Be Prevented: 
 
   If setting the `lpi_timer` to 0 effectively disables LPI, then this value
should not be allowed. The interface should ensure that LPI remains functional
unless explicitly turned off by the user.

2. Maximum Timer Value Should Align with Timer Implementation: 
 
   The maximum value of the `lpi_timer` should correspond to the timer
implementation in use:

   - No software and hardware timer should be mixed, otherwise it would
     affect latency behavior depending on the timer value.

   - If a hardware timer is supported but has a lower maximum range than
     required, the interface should support either:

     - Only the hardware timer within its valid range.
     - A fallback to only a software timer (if feasible for the system).  

   However, for hardware like switches, software-based LPI implementations
are not feasible.

3. Sensible Maximum Timer Values: 
 
   Setting the timer to excessively high values (e.g., one or two seconds or
more) makes the behavior unpredictable. Such configurations seem more like a
"time bomb" or a workaround for another issue that should be addressed
differently. For example:

   - If the use case requires such long timer values, it may make more sense to
disable `tx_lpi` entirely and manage power savings differently.

4. Errors for Unsupported Configurations: 
 
   If a configuration variation is not supported - whether due to hardware
constraints, a mismatch with the current link mode, or a similar limitation - the
user should receive a clear error message. EEE is already challenging to debug,
and silent failures or corner-case issues (e.g., a speed downshift from 1000
Mbps to 100 Mbps causing configuration to fail) would significantly degrade the
user experience.
   Some HW or drivers (for example dsa/microchip/ driver) do no support
LPI timer configuration. In this case the error should be returned too.

5. Separate Handling of LPI Activation:

   Some MACs support both automatic LPI activation (based on egress queue and
EEE/LPI activation bits) and forced activation for testing or software based
timers. Some PHYs, such as the Atheros AR8035, appear sensitive to premature
LPI activation, particularly during the transition from autonegotiation to an
active link. To address this:

   - The MAC driver should expose controls for managing automatic versus forced
     LPI activation where applicable. This will be needed for common software
     based timer implementation.

   - The PHYLINK API should provide separate control mechanisms for LPI
     activation and link state transitions. (done)

6. Consideration for Link-Independent Modes: 
 
   Certain EEE-related configurations can be applied without a PHY, while
others are entirely dependent on the PHY being present. The system should
differentiate between these cases and handle them as follows:

   - EEE On/Off: 
 
     Enabling or disabling EEE at the MAC level should be allowed without a
PHY. This can be treated as a user preference - "I prefer EEE to be on if
supported." If a PHY becomes available later and supports EEE, this preference
can then take effect.

   - LPI On/Off: 
 
     Similar to EEE on/off, enabling or disabling Low Power Idle (LPI) can be
managed at the MAC level independently of the PHY. This setting reflects the
MAC's ability to enter LPI mode. In SmartEEE or similar modes, this could
potentially involve PHY-specific behavior, but the basic LPI on/off setting
remains primarily MAC-specific.

   - LPI Timer:  

     The LPI timer is implementation-specific to the MAC driver and does not
inherently depend on the PHY. Yes, it depends at least on the link speed,
but this can't be addresses with existing interface.

   - EEE Advertisement:  

     Advertising EEE capabilities is entirely dependent on the PHY. Without a
PHY, these settings cannot be determined or validated, as the PHY defines the
supported capabilities. Any attempt to configure EEE advertisement without an
attached PHY should fail immediately with an appropriate error, such as:  "EEE
advertisement configuration not applied: no PHY available to validate
capabilities."
Russell King (Oracle) Jan. 9, 2025, 5:27 p.m. UTC | #5
On Thu, Jan 09, 2025 at 06:13:10PM +0100, Oleksij Rempel wrote:
> On Wed, Jan 08, 2025 at 03:15:52PM +0000, Russell King (Oracle) wrote:
> > On Wed, Jan 08, 2025 at 03:23:37PM +0100, Oleksij Rempel wrote:
> > > Yes, otherwise every MAC driver will need to do it in the
> > > ethtool_set_eee() function.
> > 
> > I've had several solutions, and my latest patch set actually has a
> > mixture of them in there (which is why I'm eager to try and find a way
> > forward on this, so I can fix the patch set):
> > 
> > 1. the original idea to address this in Marvell platforms was to limit
> >    the LPI timer to the maximum representable value in the hardware,
> >    which would be 255us. This ignores that the hardware uses a 1us
> >    tick rate for the timer at 1G speeds, and 10us for 100M speeds.
> >    (So it limits it to 260us, even though the hardware can do 2550us
> >    at 100M speed). This limit was applied by clamping the value passed
> >    in from userspace without erroring out.
> > 
> > 2. another solution was added the mac_validate_tx_lpi() method, and
> >    implementations added _in addition_ to the above, with the idea
> >    of erroring out for values > 255us on Marvell hardware.
> > 
> > 3. another idea was to have mac_enable_tx_lpi() error out if it wasn't
> >    possible to allow e.g. falling back to a software timer (see stmmac
> >    comments below.) Another reason for erroring out applies to Marvell
> >    hardware, where PP2 hardware supports LPI on the GMAC but not the
> >    XGMAC - so it only works at speeds at or below 2.5G. However, that
> >    can be handled via the lpi_capabilities, so I don't think needs to
> >    be a concern.
> > 
> > > The other question is, should we allow absolute maximum values, or sane
> > > maximum? At some point will come the question, why the EEE is even
> > > enabled?
> > 
> > As referenced above, stmmac uses the hardware timer for LPI timeouts up
> > to and including 1048575us (STMMAC_ET_MAX). Beyond that, it uses a
> > normal kernel timer which is:
> > 
> > - disabled (and EEE mode reset) when we have a packet to transmit, or
> >   EEE is disabled
> > - is re-armed when cleaning up from packet transmission (although
> >   it looks like we attempt to immediately enter LPI mode, and would
> >   only wait for the timer if there are more packets to queue... maybe
> >   this is a bug in stmmac's implementation?) or when EEE mode is first
> >   enabled with a LPI timer longer than the above value.
> > 
> > So, should phylink have the capability to switch to a software LPI timer
> > implementation when the LPI timeout value exceeds what the hardware
> > supports?
> 
> No, i'll list my arguments later down.
> 
> > To put it another way, should the stmmac solution to this be
> > made generic?
> 
> May be partially?
> 
> > Note that stmmac has this software timer implementation because not
> > only for the reason I've given above, but also because cores other than
> > GMAC4 that support LPI do not have support for the hardware timer.
> 
> There seems to be a samsung ethernet driver which implements software
> based timer too.
> 
> > > The same is about minimal value, too low value will cause strong speed
> > > degradation. Should we allow set insane minimum, but use sane default
> > > value?
> > 
> > We currently allow zero, and the behaviour of that depends on the
> > hardware. For example, in the last couple of days, it's been reported
> > that stmmac will never enter LPI with a value of zero.
> > 
> > Note that phylib defaults to zero, so imposing a minimum would cause
> > a read-modify-write of the EEE settings without setting the timer to
> > fail.
> >
> > > > Should set_eee() error out?
> > > 
> > > Yes, please.
> > 
> > If we are to convert stmmac, then we need to consider what it's doing
> > (as per the above) and whether that should be generic - and if it isn't
> > what we want in generic code, then how do we allow drivers to do this if
> > they wish.
> 
> I'll try to approach this from a user perspective. Currently, we have a single
> `lpi_timer` interface for all link modes. If I start using it, I'm trying to
> address a specific issue, but in most cases, I have no idea what link mode will
> be active at any given time. To my knowledge, there are no user space tools
> that allow users to configure different timer values for different link speeds.
> 
> So, what problems am I really trying to solve by adjusting this timer? I can
> imagine the following:
> 
> 1. Noticeable Speed Degradation:
>  
>    This happens when the timer is configured to a value smaller than the time
> needed to put the hardware to sleep and wake it up again. For interfaces
> supporting multiple link speeds with EEE, the most plausible configuration to
> avoid degradation would be to set the timer to the maximum sleep-wake time
> across all supported EEE link speeds.
> 
> 2. Other Use Cases: 
>  
>    Most other scenarios involve trying to work around specific constraints or
> optimizing for particular use cases:
> 
>    - Maximizing Power Savings: Setting the timer to the smallest possible
> value to achieve aggressive power-saving. Why would a user do this? It seems
> niche but might apply in low-traffic environments.
> 
>    - Reducing Latency for Periodic Traffic: For example, in audio
> streaming, frames might be sent every X milliseconds. In this case, the timer
> could be set slightly higher than X to allow the interface to enter LPI mode
> between frames. As soon as the audio stops and no other traffic is present, the
> interface transitions to LPI mode entirely. If the hardware supports timers
> with values ≥ X, no additional complexity is needed. However, if the hardware
> timer is not supported or the supported range is lower than X, a
> software-assisted timer would be required. This might introduce additional
> latency, and users should be made aware of this potential impact.
> In my expectation HW timer based latency can be different to software based
> timer.
> 
> From my current user perspective, I would expect the following behavior from
> the existing `lpi_timer` interface:
> 
> 1. Subtle Disabling of LPI Should Be Prevented: 
>  
>    If setting the `lpi_timer` to 0 effectively disables LPI, then this value
> should not be allowed. The interface should ensure that LPI remains functional
> unless explicitly turned off by the user.
> 
> 2. Maximum Timer Value Should Align with Timer Implementation: 
>  
>    The maximum value of the `lpi_timer` should correspond to the timer
> implementation in use:
> 
>    - No software and hardware timer should be mixed, otherwise it would
>      affect latency behavior depending on the timer value.
> 
>    - If a hardware timer is supported but has a lower maximum range than
>      required, the interface should support either:
> 
>      - Only the hardware timer within its valid range.
>      - A fallback to only a software timer (if feasible for the system).  
> 
>    However, for hardware like switches, software-based LPI implementations
> are not feasible.
> 
> 3. Sensible Maximum Timer Values: 
>  
>    Setting the timer to excessively high values (e.g., one or two seconds or
> more) makes the behavior unpredictable. Such configurations seem more like a
> "time bomb" or a workaround for another issue that should be addressed
> differently. For example:
> 
>    - If the use case requires such long timer values, it may make more sense to
> disable `tx_lpi` entirely and manage power savings differently.
> 
> 4. Errors for Unsupported Configurations: 
>  
>    If a configuration variation is not supported - whether due to hardware
> constraints, a mismatch with the current link mode, or a similar limitation - the
> user should receive a clear error message. EEE is already challenging to debug,
> and silent failures or corner-case issues (e.g., a speed downshift from 1000
> Mbps to 100 Mbps causing configuration to fail) would significantly degrade the
> user experience.
>    Some HW or drivers (for example dsa/microchip/ driver) do no support
> LPI timer configuration. In this case the error should be returned too.
> 
> 5. Separate Handling of LPI Activation:
> 
>    Some MACs support both automatic LPI activation (based on egress queue and
> EEE/LPI activation bits) and forced activation for testing or software based
> timers. Some PHYs, such as the Atheros AR8035, appear sensitive to premature
> LPI activation, particularly during the transition from autonegotiation to an
> active link. To address this:
> 
>    - The MAC driver should expose controls for managing automatic versus forced
>      LPI activation where applicable. This will be needed for common software
>      based timer implementation.
> 
>    - The PHYLINK API should provide separate control mechanisms for LPI
>      activation and link state transitions. (done)
> 
> 6. Consideration for Link-Independent Modes: 
>  
>    Certain EEE-related configurations can be applied without a PHY, while
> others are entirely dependent on the PHY being present. The system should
> differentiate between these cases and handle them as follows:
> 
>    - EEE On/Off: 
>  
>      Enabling or disabling EEE at the MAC level should be allowed without a
> PHY. This can be treated as a user preference - "I prefer EEE to be on if
> supported." If a PHY becomes available later and supports EEE, this preference
> can then take effect.
> 
>    - LPI On/Off: 
>  
>      Similar to EEE on/off, enabling or disabling Low Power Idle (LPI) can be
> managed at the MAC level independently of the PHY. This setting reflects the
> MAC's ability to enter LPI mode. In SmartEEE or similar modes, this could
> potentially involve PHY-specific behavior, but the basic LPI on/off setting
> remains primarily MAC-specific.
> 
>    - LPI Timer:  
> 
>      The LPI timer is implementation-specific to the MAC driver and does not
> inherently depend on the PHY. Yes, it depends at least on the link speed,
> but this can't be addresses with existing interface.
> 
>    - EEE Advertisement:  
> 
>      Advertising EEE capabilities is entirely dependent on the PHY. Without a
> PHY, these settings cannot be determined or validated, as the PHY defines the
> supported capabilities. Any attempt to configure EEE advertisement without an
> attached PHY should fail immediately with an appropriate error, such as:  "EEE
> advertisement configuration not applied: no PHY available to validate
> capabilities."

Sorry, at this point, I give up with phylink managed EEE. What you
detail above is way too much for me to get involved with, and goes
well beyond simply:

1) Fixing the cockup with the phylib-managed EEE that has caused *user*
   *regressions* that we need to resolve.

2) Providing core functionality so that newer implementations can have
   a consistency of behaviour.

I have *no* interest in doing a total rewrite of kernel EEE
functionality - that goes well beyond my aims here.

So I'm afraid that I really lost interest in reading your email, sorry.
Oleksij Rempel Jan. 9, 2025, 5:39 p.m. UTC | #6
On Thu, Jan 09, 2025 at 05:27:11PM +0000, Russell King (Oracle) wrote:
> On Thu, Jan 09, 2025 at 06:13:10PM +0100, Oleksij Rempel wrote:
> > On Wed, Jan 08, 2025 at 03:15:52PM +0000, Russell King (Oracle) wrote:
> > > On Wed, Jan 08, 2025 at 03:23:37PM +0100, Oleksij Rempel wrote:
> > > > Yes, otherwise every MAC driver will need to do it in the
> > > > ethtool_set_eee() function.
> > > 
> > > I've had several solutions, and my latest patch set actually has a
> > > mixture of them in there (which is why I'm eager to try and find a way
> > > forward on this, so I can fix the patch set):
> > > 
> > > 1. the original idea to address this in Marvell platforms was to limit
> > >    the LPI timer to the maximum representable value in the hardware,
> > >    which would be 255us. This ignores that the hardware uses a 1us
> > >    tick rate for the timer at 1G speeds, and 10us for 100M speeds.
> > >    (So it limits it to 260us, even though the hardware can do 2550us
> > >    at 100M speed). This limit was applied by clamping the value passed
> > >    in from userspace without erroring out.
> > > 
> > > 2. another solution was added the mac_validate_tx_lpi() method, and
> > >    implementations added _in addition_ to the above, with the idea
> > >    of erroring out for values > 255us on Marvell hardware.
> > > 
> > > 3. another idea was to have mac_enable_tx_lpi() error out if it wasn't
> > >    possible to allow e.g. falling back to a software timer (see stmmac
> > >    comments below.) Another reason for erroring out applies to Marvell
> > >    hardware, where PP2 hardware supports LPI on the GMAC but not the
> > >    XGMAC - so it only works at speeds at or below 2.5G. However, that
> > >    can be handled via the lpi_capabilities, so I don't think needs to
> > >    be a concern.
> > > 
> > > > The other question is, should we allow absolute maximum values, or sane
> > > > maximum? At some point will come the question, why the EEE is even
> > > > enabled?
> > > 
> > > As referenced above, stmmac uses the hardware timer for LPI timeouts up
> > > to and including 1048575us (STMMAC_ET_MAX). Beyond that, it uses a
> > > normal kernel timer which is:
> > > 
> > > - disabled (and EEE mode reset) when we have a packet to transmit, or
> > >   EEE is disabled
> > > - is re-armed when cleaning up from packet transmission (although
> > >   it looks like we attempt to immediately enter LPI mode, and would
> > >   only wait for the timer if there are more packets to queue... maybe
> > >   this is a bug in stmmac's implementation?) or when EEE mode is first
> > >   enabled with a LPI timer longer than the above value.
> > > 
> > > So, should phylink have the capability to switch to a software LPI timer
> > > implementation when the LPI timeout value exceeds what the hardware
> > > supports?
> > 
> > No, i'll list my arguments later down.
> > 
> > > To put it another way, should the stmmac solution to this be
> > > made generic?
> > 
> > May be partially?
> > 
> > > Note that stmmac has this software timer implementation because not
> > > only for the reason I've given above, but also because cores other than
> > > GMAC4 that support LPI do not have support for the hardware timer.
> > 
> > There seems to be a samsung ethernet driver which implements software
> > based timer too.
> > 
> > > > The same is about minimal value, too low value will cause strong speed
> > > > degradation. Should we allow set insane minimum, but use sane default
> > > > value?
> > > 
> > > We currently allow zero, and the behaviour of that depends on the
> > > hardware. For example, in the last couple of days, it's been reported
> > > that stmmac will never enter LPI with a value of zero.
> > > 
> > > Note that phylib defaults to zero, so imposing a minimum would cause
> > > a read-modify-write of the EEE settings without setting the timer to
> > > fail.
> > >
> > > > > Should set_eee() error out?
> > > > 
> > > > Yes, please.
> > > 
> > > If we are to convert stmmac, then we need to consider what it's doing
> > > (as per the above) and whether that should be generic - and if it isn't
> > > what we want in generic code, then how do we allow drivers to do this if
> > > they wish.
> > 
> >    - EEE Advertisement:  
> > 
> >      Advertising EEE capabilities is entirely dependent on the PHY. Without a
> > PHY, these settings cannot be determined or validated, as the PHY defines the
> > supported capabilities. Any attempt to configure EEE advertisement without an
> > attached PHY should fail immediately with an appropriate error, such as:  "EEE
> > advertisement configuration not applied: no PHY available to validate
> > capabilities."
> 
> Sorry, at this point, I give up with phylink managed EEE. What you
> detail above is way too much for me to get involved with, and goes
> well beyond simply:
> 
> 1) Fixing the cockup with the phylib-managed EEE that has caused *user*
>    *regressions* that we need to resolve.
> 
> 2) Providing core functionality so that newer implementations can have
>    a consistency of behaviour.
> 
> I have *no* interest in doing a total rewrite of kernel EEE
> functionality - that goes well beyond my aims here.
>
> So I'm afraid that I really lost interest in reading your email, sorry.
 
Sorry for killing your motivation. I can feel your pain...
Russell King (Oracle) Jan. 9, 2025, 6:10 p.m. UTC | #7
On Thu, Jan 09, 2025 at 06:39:45PM +0100, Oleksij Rempel wrote:
> On Thu, Jan 09, 2025 at 05:27:11PM +0000, Russell King (Oracle) wrote:
> > On Thu, Jan 09, 2025 at 06:13:10PM +0100, Oleksij Rempel wrote:
> > > On Wed, Jan 08, 2025 at 03:15:52PM +0000, Russell King (Oracle) wrote:
> > > > On Wed, Jan 08, 2025 at 03:23:37PM +0100, Oleksij Rempel wrote:
> > > > > Yes, otherwise every MAC driver will need to do it in the
> > > > > ethtool_set_eee() function.
> > > > 
> > > > I've had several solutions, and my latest patch set actually has a
> > > > mixture of them in there (which is why I'm eager to try and find a way
> > > > forward on this, so I can fix the patch set):
> > > > 
> > > > 1. the original idea to address this in Marvell platforms was to limit
> > > >    the LPI timer to the maximum representable value in the hardware,
> > > >    which would be 255us. This ignores that the hardware uses a 1us
> > > >    tick rate for the timer at 1G speeds, and 10us for 100M speeds.
> > > >    (So it limits it to 260us, even though the hardware can do 2550us
> > > >    at 100M speed). This limit was applied by clamping the value passed
> > > >    in from userspace without erroring out.
> > > > 
> > > > 2. another solution was added the mac_validate_tx_lpi() method, and
> > > >    implementations added _in addition_ to the above, with the idea
> > > >    of erroring out for values > 255us on Marvell hardware.
> > > > 
> > > > 3. another idea was to have mac_enable_tx_lpi() error out if it wasn't
> > > >    possible to allow e.g. falling back to a software timer (see stmmac
> > > >    comments below.) Another reason for erroring out applies to Marvell
> > > >    hardware, where PP2 hardware supports LPI on the GMAC but not the
> > > >    XGMAC - so it only works at speeds at or below 2.5G. However, that
> > > >    can be handled via the lpi_capabilities, so I don't think needs to
> > > >    be a concern.
> > > > 
> > > > > The other question is, should we allow absolute maximum values, or sane
> > > > > maximum? At some point will come the question, why the EEE is even
> > > > > enabled?
> > > > 
> > > > As referenced above, stmmac uses the hardware timer for LPI timeouts up
> > > > to and including 1048575us (STMMAC_ET_MAX). Beyond that, it uses a
> > > > normal kernel timer which is:
> > > > 
> > > > - disabled (and EEE mode reset) when we have a packet to transmit, or
> > > >   EEE is disabled
> > > > - is re-armed when cleaning up from packet transmission (although
> > > >   it looks like we attempt to immediately enter LPI mode, and would
> > > >   only wait for the timer if there are more packets to queue... maybe
> > > >   this is a bug in stmmac's implementation?) or when EEE mode is first
> > > >   enabled with a LPI timer longer than the above value.
> > > > 
> > > > So, should phylink have the capability to switch to a software LPI timer
> > > > implementation when the LPI timeout value exceeds what the hardware
> > > > supports?
> > > 
> > > No, i'll list my arguments later down.
> > > 
> > > > To put it another way, should the stmmac solution to this be
> > > > made generic?
> > > 
> > > May be partially?
> > > 
> > > > Note that stmmac has this software timer implementation because not
> > > > only for the reason I've given above, but also because cores other than
> > > > GMAC4 that support LPI do not have support for the hardware timer.
> > > 
> > > There seems to be a samsung ethernet driver which implements software
> > > based timer too.
> > > 
> > > > > The same is about minimal value, too low value will cause strong speed
> > > > > degradation. Should we allow set insane minimum, but use sane default
> > > > > value?
> > > > 
> > > > We currently allow zero, and the behaviour of that depends on the
> > > > hardware. For example, in the last couple of days, it's been reported
> > > > that stmmac will never enter LPI with a value of zero.
> > > > 
> > > > Note that phylib defaults to zero, so imposing a minimum would cause
> > > > a read-modify-write of the EEE settings without setting the timer to
> > > > fail.
> > > >
> > > > > > Should set_eee() error out?
> > > > > 
> > > > > Yes, please.
> > > > 
> > > > If we are to convert stmmac, then we need to consider what it's doing
> > > > (as per the above) and whether that should be generic - and if it isn't
> > > > what we want in generic code, then how do we allow drivers to do this if
> > > > they wish.
> > > 
> > >    - EEE Advertisement:  
> > > 
> > >      Advertising EEE capabilities is entirely dependent on the PHY. Without a
> > > PHY, these settings cannot be determined or validated, as the PHY defines the
> > > supported capabilities. Any attempt to configure EEE advertisement without an
> > > attached PHY should fail immediately with an appropriate error, such as:  "EEE
> > > advertisement configuration not applied: no PHY available to validate
> > > capabilities."
> > 
> > Sorry, at this point, I give up with phylink managed EEE. What you
> > detail above is way too much for me to get involved with, and goes
> > well beyond simply:
> > 
> > 1) Fixing the cockup with the phylib-managed EEE that has caused *user*
> >    *regressions* that we need to resolve.
> > 
> > 2) Providing core functionality so that newer implementations can have
> >    a consistency of behaviour.
> > 
> > I have *no* interest in doing a total rewrite of kernel EEE
> > functionality - that goes well beyond my aims here.
> >
> > So I'm afraid that I really lost interest in reading your email, sorry.
>  
> Sorry for killing your motivation. I can feel your pain...

I just don't think it's right to throw a whole new load of problems
to be solved into the mix when we already have issues in the kernel
caused by inappropriately merged previous patches.

Andrew had a large patch set, which added the phylib core code, and
fixed up many drivers. This was taken by someone else, and only
Andrew's core phylib code was merged along with the code for their
driver, thus breaking a heck of a lot of other drivers.

Either this needs to be fixed, or why don't we just declare that we've
broken EEE in the kernel, declare that we don't support EEE at all, and
rip the whole sorry damn thing out and start again from scratch -
because what you're suggesting is basically changing *everything*
about EEE support.

Yes, what we currently do may be sub-optimal, but it's the API that
we've presented to userspace for a long time.

I just don't think it's right to decide to pile all these new issues
on top of the utter crap situation we currently have.

Oh, lookie... I just looked back in the git history to find the person
who submitted the subset of Andrew's code was... YOU. YOU broke lots of
drivers by doing so. Now you're torpedoing attempts to fix them by
trying to make it more complicated. Sorry, but your opinion has just
lost all credibility with me given the mess you previously created
and haven't been bothered to try to fix up. At least *I've* been
trying to fix you crap.
Andrew Lunn Jan. 9, 2025, 7:33 p.m. UTC | #8
> Andrew had a large patch set, which added the phylib core code, and
> fixed up many drivers. This was taken by someone else, and only
> Andrew's core phylib code was merged along with the code for their
> driver, thus breaking a heck of a lot of other drivers.

As Russell says, i did convert all existing drivers over the new
internal API, and removed some ugly parts of the old EEE core code.
I'm not too happy we only got part way with my patches. Having this in
between state makes the internal APIs much harder to deal with, and as
Russell says, we broke a few MAC drivers because the rest did not get
merged.

Before we think about extensions to the kAPI, we first need to finish
the refactor. Get all MAC drivers over to the newer internal API and
remove phy_init_eee() which really does need to die. My patches have
probably bit rotted a bit, but i doubt they are unusable. So i would
like to see them merged. I would however leave phylink drivers to
Russell. He went a slight different way than i did, and he should get
to decide how phylink should support this.

    Andrew
diff mbox series

Patch

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 55488ddba269..b9958a0533de 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -476,6 +476,8 @@  struct lan78xx_net {
 	struct phylink_config	phylink_config;
 
 	struct phy_device	*fixed_phy;
+
+	u32			tx_lpi_timer;
 };
 
 /* use ethtool to change the level for any given device */
@@ -1787,54 +1789,24 @@  static int lan78xx_set_wol(struct net_device *netdev,
 static int lan78xx_get_eee(struct net_device *net, struct ethtool_keee *edata)
 {
 	struct lan78xx_net *dev = netdev_priv(net);
-	struct phy_device *phydev = net->phydev;
 	int ret;
-	u32 buf;
 
-	ret = usb_autopm_get_interface(dev->intf);
+	ret = phylink_ethtool_get_eee(dev->phylink, edata);
 	if (ret < 0)
 		return ret;
 
-	ret = phy_ethtool_get_eee(phydev, edata);
-	if (ret < 0)
-		goto exit;
-
-	ret = lan78xx_read_reg(dev, MAC_CR, &buf);
-	if (buf & MAC_CR_EEE_EN_) {
-		/* EEE_TX_LPI_REQ_DLY & tx_lpi_timer are same uSec unit */
-		ret = lan78xx_read_reg(dev, EEE_TX_LPI_REQ_DLY, &buf);
-		edata->tx_lpi_timer = buf;
-	} else {
-		edata->tx_lpi_timer = 0;
-	}
-
-	ret = 0;
-exit:
-	usb_autopm_put_interface(dev->intf);
+	edata->tx_lpi_timer = dev->tx_lpi_timer;
 
-	return ret;
+	return 0;
 }
 
 static int lan78xx_set_eee(struct net_device *net, struct ethtool_keee *edata)
 {
 	struct lan78xx_net *dev = netdev_priv(net);
-	int ret;
-	u32 buf;
-
-	ret = usb_autopm_get_interface(dev->intf);
-	if (ret < 0)
-		return ret;
 
-	ret = phy_ethtool_set_eee(net->phydev, edata);
-	if (ret < 0)
-		goto out;
-
-	buf = (u32)edata->tx_lpi_timer;
-	ret = lan78xx_write_reg(dev, EEE_TX_LPI_REQ_DLY, buf);
-out:
-	usb_autopm_put_interface(dev->intf);
+	dev->tx_lpi_timer = edata->tx_lpi_timer;
 
-	return ret;
+	return phylink_ethtool_set_eee(dev->phylink, edata);
 }
 
 static void lan78xx_get_drvinfo(struct net_device *net,
@@ -2345,6 +2317,13 @@  static void lan78xx_mac_link_down(struct phylink_config *config,
 	if (ret < 0)
 		goto link_down_fail;
 
+	/* at least MAC_CR_EEE_EN_ should be disable for proper configuration
+	 * on link_up
+	 */
+	ret = lan78xx_update_reg(dev, MAC_CR, MAC_CR_EEE_EN_, 0);
+	if (ret < 0)
+		goto link_down_fail;
+
 	/* MAC reset seems to not affect MAC configuration, no idea if it is
 	 * really needed, but it was done in previous driver version. So, leave
 	 * it here.
@@ -2418,6 +2397,7 @@  static void lan78xx_mac_link_up(struct phylink_config *config,
 {
 	struct net_device *net = to_net_dev(config->dev);
 	struct lan78xx_net *dev = netdev_priv(net);
+	struct phy_device *phydev = net->phydev;
 	u32 mac_cr = 0;
 	int ret;
 
@@ -2439,6 +2419,18 @@  static void lan78xx_mac_link_up(struct phylink_config *config,
 	if (duplex == DUPLEX_FULL)
 		mac_cr |= MAC_CR_FULL_DUPLEX_;
 
+	if (phydev->enable_tx_lpi) {
+		/* EEE_TX_LPI_REQ_DLY should be written before MAC_CR_EEE_EN_
+		 * is set
+		 */
+		ret = lan78xx_write_reg(dev, EEE_TX_LPI_REQ_DLY,
+					dev->tx_lpi_timer);
+		if (ret < 0)
+			goto link_up_fail;
+
+		mac_cr |=  MAC_CR_EEE_EN_;
+	}
+
 	/* make sure TXEN and RXEN are disabled before reconfiguring MAC */
 	ret = lan78xx_update_reg(dev, MAC_CR, MAC_CR_SPEED_MASK_ |
 				 MAC_CR_FULL_DUPLEX_ | MAC_CR_EEE_EN_, mac_cr);
@@ -4430,6 +4422,25 @@  static int lan78xx_probe(struct usb_interface *intf,
 	dev->msg_enable = netif_msg_init(msg_level, NETIF_MSG_DRV
 					| NETIF_MSG_PROBE | NETIF_MSG_LINK);
 
+	/*
+	 * Default TX LPI (Low Power Idle) request delay count is set to 50us.
+	 *
+	 * Source: LAN7800 Documentation, DS00001992H, Section 15.1.57, Page 204.
+	 *
+	 * Reasoning:
+	 * According to the application note in the LAN7800 documentation, a
+	 * zero delay may negatively impact the TX data path’s ability to
+	 * support Gigabit operation. A value of 50us is recommended as a
+	 * reasonable default when the part operates at Gigabit speeds,
+	 * balancing stability and power efficiency in EEE mode. This delay can
+	 * be increased based on performance testing, as EEE is designed for
+	 * scenarios with mostly idle links and occasional bursts of full
+	 * bandwidth transmission. The goal is to ensure reliable Gigabit
+	 * performance without overly aggressive power optimization during
+	 * inactive periods.
+	 */
+	dev->tx_lpi_timer = 50;
+
 	skb_queue_head_init(&dev->rxq);
 	skb_queue_head_init(&dev->txq);
 	skb_queue_head_init(&dev->rxq_done);