diff mbox series

[net-next,2/2] net: phylink: Fix issues with link balancing w/ BMC present

Message ID 174481734008.986682.1350602067856870465.stgit@ahduyck-xeon-server.home.arpa (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: phylink: Fix issue w/ BMC link flap | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: edumazet@google.com
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 53 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 17 this patch: 17
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-04-17--00-00 (tests: 910)

Commit Message

Alexander Duyck April 16, 2025, 3:29 p.m. UTC
From: Alexander Duyck <alexanderduyck@fb.com>

This change is meant to address the fact that there are link imbalances
introduced when using phylink on a system with a BMC. Specifically there
are two issues.

The first issue is that if we lose link after the first call to
phylink_start but before it gets to the phylink_resolve we will end up with
the phylink interface assuming the link was always down and not calling
phylink_link_down resulting in a stuck interface.

The second issue is that when a BMC is present we are currently forcing the
link down. This results in us bouncing the link for a fraction of a second
and that will result in dropped packets for the BMC.

The third issue is just an extra "Link Down" message that is seen when
calling phylink_resume. This is addressed by identifying that the link
isn't balanced and just not displaying the down message in such a case.

To resolve these issues this change introduces a new boolean variable
link_balanced. This value will be set to 0 initially when we create the
phylink interface, and again when we bring down the link and unbalance it
in phylink_suspend. When it is set to 0 it will force us to trigger the
phylink_link_up/down call which will have us write to the hardware. As a
result we can avoid the two issues and it should not rearm without another
call to phylink_suspend.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/net/phy/phylink.c |   18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Comments

Russell King (Oracle) April 16, 2025, 4:01 p.m. UTC | #1
On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> This change is meant to address the fact that there are link imbalances
> introduced when using phylink on a system with a BMC. Specifically there
> are two issues.
> 
> The first issue is that if we lose link after the first call to
> phylink_start but before it gets to the phylink_resolve we will end up with
> the phylink interface assuming the link was always down and not calling
> phylink_link_down resulting in a stuck interface.

That is intentional.

phylink strictly orders .mac_link_down and .mac_link_up, and starts from
an initial position that the link _will_ be considered to be down. So,
it is intentional that .mac_link_down will _never_ be called after
phylink_start().

> The second issue is that when a BMC is present we are currently forcing the
> link down. This results in us bouncing the link for a fraction of a second
> and that will result in dropped packets for the BMC.

... but you don't explain how that happens.

> The third issue is just an extra "Link Down" message that is seen when
> calling phylink_resume. This is addressed by identifying that the link
> isn't balanced and just not displaying the down message in such a case.

Hmm, this one is an error, but is not as simple as "don't print the
message" as it results in a violation of the rule I mentioned above.
We need phylink_suspend() to record the state of the link at that
point, and avoid calling phylink_link_down() if the link was down
prior to suspend.
Russell King (Oracle) April 16, 2025, 5:12 p.m. UTC | #2
On Wed, Apr 16, 2025 at 05:01:09PM +0100, Russell King (Oracle) wrote:
> On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>
> > 
> > This change is meant to address the fact that there are link imbalances
> > introduced when using phylink on a system with a BMC. Specifically there
> > are two issues.
> > 
> > The first issue is that if we lose link after the first call to
> > phylink_start but before it gets to the phylink_resolve we will end up with
> > the phylink interface assuming the link was always down and not calling
> > phylink_link_down resulting in a stuck interface.
> 
> That is intentional.
> 
> phylink strictly orders .mac_link_down and .mac_link_up, and starts from
> an initial position that the link _will_ be considered to be down. So,
> it is intentional that .mac_link_down will _never_ be called after
> phylink_start().
> 
> > The second issue is that when a BMC is present we are currently forcing the
> > link down. This results in us bouncing the link for a fraction of a second
> > and that will result in dropped packets for the BMC.
> 
> ... but you don't explain how that happens.
> 
> > The third issue is just an extra "Link Down" message that is seen when
> > calling phylink_resume. This is addressed by identifying that the link
> > isn't balanced and just not displaying the down message in such a case.
> 
> Hmm, this one is an error, but is not as simple as "don't print the
> message" as it results in a violation of the rule I mentioned above.
> We need phylink_suspend() to record the state of the link at that
> point, and avoid calling phylink_link_down() if the link was down
> prior to suspend.

Okay, confirmed on nvidia Jetson Xavier NX:

[   11.838132] dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported
[   15.299757] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

LAN cable was unplugged:
[   50.436587] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down

Then the system was suspended using rtcwake for 3 seconds:

[   54.736849] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
[   54.736898] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[   54.741078] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down

This shouldn't happen. With the patch I posted, this second "Link is Down"
message is not printed, and .mac_link_down() will not be called.
Alexander Duyck April 16, 2025, 7:03 p.m. UTC | #3
On Wed, Apr 16, 2025 at 9:01 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > This change is meant to address the fact that there are link imbalances
> > introduced when using phylink on a system with a BMC. Specifically there
> > are two issues.
> >
> > The first issue is that if we lose link after the first call to
> > phylink_start but before it gets to the phylink_resolve we will end up with
> > the phylink interface assuming the link was always down and not calling
> > phylink_link_down resulting in a stuck interface.
>
> That is intentional.
>
> phylink strictly orders .mac_link_down and .mac_link_up, and starts from
> an initial position that the link _will_ be considered to be down. So,
> it is intentional that .mac_link_down will _never_ be called after
> phylink_start().

Well the issue is that with a BMC present the link may be up before we
even start using phylink. So if the link is lost while we are going
through phylink_start we will end up in an in-between state where the
link is physically down, but the MAC is still configured as though the
link is up. This will be problematic as the MAC should essentially be
discarding frames for transmit if the link is down to avoid blocking
internal Tx FIFOs.

Everything for our driver has to be a light touch as bringing down the
BMC link for any reason other than the physical link being lost is
essentially a criteria for failure as the BMC is the most essential
link on the system. So, for example the code I have in the driver for
handling a major config is currently checking in mac_prepare to verify
if we already have link based on the requested interface mode and FEC
settings and if we do the mac_config and pcs_config steps read an
internal state flag and do nothing, then we just go through to
mac_finish where we write the necessary bits to make sure the PCS/PMA
and PMA/PMD connections are enabled which essentially becomes a no-op
if the link is already enabled.

> > The second issue is that when a BMC is present we are currently forcing the
> > link down. This results in us bouncing the link for a fraction of a second
> > and that will result in dropped packets for the BMC.
>
> ... but you don't explain how that happens.

It was right there in the patch. It was the lines I removed:
@@ -2596,16 +2600,6 @@ void phylink_resume(struct phylink *pl)
        if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {
                /* Wake-on-Lan enabled, MAC handling */

-               /* Call mac_link_down() so we keep the overall state balanced.
-                * Do this under the state_mutex lock for consistency. This
-                * will cause a "Link Down" message to be printed during
-                * resume, which is harmless - the true link state will be
-                * printed when we run a resolve.
-                */
-               mutex_lock(&pl->state_mutex);
-               phylink_link_down(pl);
-               mutex_unlock(&pl->state_mutex);
-
                /* Re-apply the link parameters so that all the settings get
                 * restored to the MAC.
                 */

From a BMC perspective this forcing of the link down even if it is for
a fraction of a second is unacceptable as it can break up ssh sessions
to the BMC, especially if somebody is doing a bunch of configuration
changes on the NIC as it results in dropped frames. When compared to
the firmware based NIC approaches such as Broadcom and Nvidia/Mellanox
it is a huge negative as the BMC link is static with those NICs and
doesn't bounce no matter if the interface is being configured up/down,
the driver proved/removed ect.

The issue, even with your recent patch, is that it will still force
the link down if the link was previously up. That is the piece I need
to avoid to prevent the BMC from losing link. Ideally what I need is
to have a check of the current link state and then sync back up rather
than force the phylink state on the MAC and then clean things up after
the fact.

> > The third issue is just an extra "Link Down" message that is seen when
> > calling phylink_resume. This is addressed by identifying that the link
> > isn't balanced and just not displaying the down message in such a case.
>
> Hmm, this one is an error, but is not as simple as "don't print the
> message" as it results in a violation of the rule I mentioned above.
> We need phylink_suspend() to record the state of the link at that
> point, and avoid calling phylink_link_down() if the link was down
> prior to suspend.

Looking at your fix it doesn't resolve my core issue. All it does is
address the "third issue".

In my case the link is up and needs to stay up when I resume. The fact
that we are forcing the link down causes our BMC to drop connection
which makes it difficult to remotely manage the interface as changes
to the interface such as a simple ifconfig down; ifconfig up are
enough to cause the BMC to drop packets and potentially the ssh
session if we start losing enough of them.
Russell King (Oracle) April 16, 2025, 7:19 p.m. UTC | #4
On Wed, Apr 16, 2025 at 12:03:05PM -0700, Alexander Duyck wrote:
> On Wed, Apr 16, 2025 at 9:01 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> > > From: Alexander Duyck <alexanderduyck@fb.com>
> > >
> > > This change is meant to address the fact that there are link imbalances
> > > introduced when using phylink on a system with a BMC. Specifically there
> > > are two issues.
> > >
> > > The first issue is that if we lose link after the first call to
> > > phylink_start but before it gets to the phylink_resolve we will end up with
> > > the phylink interface assuming the link was always down and not calling
> > > phylink_link_down resulting in a stuck interface.
> >
> > That is intentional.
> >
> > phylink strictly orders .mac_link_down and .mac_link_up, and starts from
> > an initial position that the link _will_ be considered to be down. So,
> > it is intentional that .mac_link_down will _never_ be called after
> > phylink_start().
> 
> Well the issue is that with a BMC present the link may be up before we
> even start using phylink. So if the link is lost while we are going
> through phylink_start we will end up in an in-between state where the
> link is physically down, but the MAC is still configured as though the
> link is up. This will be problematic as the MAC should essentially be
> discarding frames for transmit if the link is down to avoid blocking
> internal Tx FIFOs.

So, when a Linux network driver probes, it starts out in administrative
state *DOWN*. When the administrator configures the network driver,
.ndo_open is called, which is expected to configure the network adapter.

Part of that process is to call phylink_start() as one of the last
steps, which detects whether the link is up or not. If the link is up,
then phylink will call netif_carrier_on() and .mac_link_up(). This
tells the core networking layers that the network interface is now
ready to start sending packets, and it will begin queueing packets for
the network driver to process - not before.

Prior to .ndo_open being called, the networking layers do not expect
traffic from the network device no matter what physical state the
media link is in. If .ndo_open fails, the same applies - no traffic is
expected to be passed to the core network layers from the network
layers because as far as the network stack is concerned, the interface
is still administratively down.

Thus, the fact that your BMC thinks that the link is up is irrelevant.

So, start off in a state that packet activity is suspended even if the
link is up at probe time. Only start packet activity (reception and
transmission) once .mac_link_up() has been called. Stop that activity
when .mac_link_down() is subsequently called.

There have been lots of NICs out there where the physical link doesn't
follow the adminstrative state of the network interface. This is not a
problem. It may be desirable that it does, but a desire is not the same
as "it must".

> > > The second issue is that when a BMC is present we are currently forcing the
> > > link down. This results in us bouncing the link for a fraction of a second
> > > and that will result in dropped packets for the BMC.
> >
> > ... but you don't explain how that happens.
> 
> It was right there in the patch. It was the lines I removed:

... thus further breaking phylink guarantees.

Sorry, but no.

> The issue, even with your recent patch, is that it will still force
> the link down if the link was previously up. That is the piece I need
> to avoid to prevent the BMC from losing link. Ideally what I need is
> to have a check of the current link state and then sync back up rather
> than force the phylink state on the MAC and then clean things up after
> the fact.

So don't force the link, just stop packet activity. As stated above,
nothing requires that the physical link is forced down just because
.mac_link_down() has been called.

This makes me wonder what happens to your BMC with your ideas if
userspace takes down the network interface. It sounds like all hell
breaks loose because you've taken the link down, and that's seen as
a critical failure... So taking down a network interface becomes a
critical failure - yet it's a *normal* userspace operation.
Russell King (Oracle) April 16, 2025, 8:05 p.m. UTC | #5
On Wed, Apr 16, 2025 at 08:19:38PM +0100, Russell King (Oracle) wrote:
> So, when a Linux network driver probes, it starts out in administrative
> state *DOWN*. When the administrator configures the network driver,
> .ndo_open is called, which is expected to configure the network adapter.
> 
> Part of that process is to call phylink_start() as one of the last
> steps, which detects whether the link is up or not. If the link is up,
> then phylink will call netif_carrier_on() and .mac_link_up(). This
> tells the core networking layers that the network interface is now
> ready to start sending packets, and it will begin queueing packets for
> the network driver to process - not before.
> 
> Prior to .ndo_open being called, the networking layers do not expect
> traffic from the network device no matter what physical state the
> media link is in. If .ndo_open fails, the same applies - no traffic is
> expected to be passed to the core network layers from the network
> layers because as far as the network stack is concerned, the interface
> is still administratively down.
> 
> Thus, the fact that your BMC thinks that the link is up is irrelevant.
> 
> So, start off in a state that packet activity is suspended even if the
> link is up at probe time. Only start packet activity (reception and
> transmission) once .mac_link_up() has been called. Stop that activity
> when .mac_link_down() is subsequently called.
> 
> There have been lots of NICs out there where the physical link doesn't
> follow the adminstrative state of the network interface. This is not a
> problem. It may be desirable that it does, but a desire is not the same
> as "it must".

Let me be crystal clear on this.

Phylink has a contract with all existing users. That contract is:

Initial state: link down.

Driver calls phylink_start() in its .ndo_open method.

Phylink does configuration of the PHY and link according to the
chosen link parameters by calling into the MAC, PCS, and phylib as
appropriate.

If the link is then discovered to be up (it might have been already
up before phylink_start() was called), phylink will call the various
components such as PCS and MAC to inform them that the link is now up.
This will mean calling the .mac_link_up() method. Otherwise (if the
link is discovered to be down when the interface is brought up) no
call to either .mac_link_up() nor .mac_link_down() will be made.

If the link _subsequently_ goes down, then phylink deals with that
and calls .mac_link_down() - only if .mac_link_up() was previously
called (that's one of the bugs you discovered, that on resume it
gets called anyway. I've submitted a fix for that contract breach,
which only affects a very small number of drivers - stmmac, ucc_geth
and your fbnic out of 22 total ethernet users plus however many DSA
users we have.)

Only if .mac_link_down() has been called, if the link subsequently
comes back up, then the same process happens as before resulting in
.mac_link_up() being called.

If the interface is taken down, then .mac_link_down() will be called
if and only if .mac_link_up() had been called.

The ordering of .mac_link_up() / .mac_link_down() is a strict
contract term with phylink users.

The reason for this contract: phylink users may have ordering
requirements.

For example, on mac_link_down(), they may wait for packet activity to
stop, and then place the MAC in reset. If called without a previous
.mac_link_up call, the wait stage may time out due to the MAC being
in reset. (Ocelot may suffer with this.)

Another example is fs_enet which also relies on this strict ordering
as described above.

There could be others - there are some that call into firmware on
calls to .mac_link_up() / .mac_link_down() and misordering those
depends on what the firmware is doing, which we have no visibility
of.

As I stated, this is the contract that phylink gave to users, and
the contract still stands, and can't be broken to behave differently
(e.g. calling .mac_link_down() after phylink_start() without an
intervening call to .mac_link_up()) otherwise existing users will
break. Bugs that go against that contract will be fixed, but the
contract will not be intentionally broken.
Alexander Duyck April 16, 2025, 10:58 p.m. UTC | #6
On Wed, Apr 16, 2025 at 1:05 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, Apr 16, 2025 at 08:19:38PM +0100, Russell King (Oracle) wrote:
> > So, when a Linux network driver probes, it starts out in administrative
> > state *DOWN*. When the administrator configures the network driver,
> > .ndo_open is called, which is expected to configure the network adapter.
> >
> > Part of that process is to call phylink_start() as one of the last
> > steps, which detects whether the link is up or not. If the link is up,
> > then phylink will call netif_carrier_on() and .mac_link_up(). This
> > tells the core networking layers that the network interface is now
> > ready to start sending packets, and it will begin queueing packets for
> > the network driver to process - not before.
> >
> > Prior to .ndo_open being called, the networking layers do not expect
> > traffic from the network device no matter what physical state the
> > media link is in. If .ndo_open fails, the same applies - no traffic is
> > expected to be passed to the core network layers from the network
> > layers because as far as the network stack is concerned, the interface
> > is still administratively down.
> >
> > Thus, the fact that your BMC thinks that the link is up is irrelevant.
> >
> > So, start off in a state that packet activity is suspended even if the
> > link is up at probe time. Only start packet activity (reception and
> > transmission) once .mac_link_up() has been called. Stop that activity
> > when .mac_link_down() is subsequently called.
> >
> > There have been lots of NICs out there where the physical link doesn't
> > follow the adminstrative state of the network interface. This is not a
> > problem. It may be desirable that it does, but a desire is not the same
> > as "it must".
>
> Let me be crystal clear on this.
>
> Phylink has a contract with all existing users. That contract is:
>
> Initial state: link down.
>
> Driver calls phylink_start() in its .ndo_open method.
>
> Phylink does configuration of the PHY and link according to the
> chosen link parameters by calling into the MAC, PCS, and phylib as
> appropriate.
>
> If the link is then discovered to be up (it might have been already
> up before phylink_start() was called), phylink will call the various
> components such as PCS and MAC to inform them that the link is now up.
> This will mean calling the .mac_link_up() method. Otherwise (if the
> link is discovered to be down when the interface is brought up) no
> call to either .mac_link_up() nor .mac_link_down() will be made.
>
> If the link _subsequently_ goes down, then phylink deals with that
> and calls .mac_link_down() - only if .mac_link_up() was previously
> called (that's one of the bugs you discovered, that on resume it
> gets called anyway. I've submitted a fix for that contract breach,
> which only affects a very small number of drivers - stmmac, ucc_geth
> and your fbnic out of 22 total ethernet users plus however many DSA
> users we have.)
>
> Only if .mac_link_down() has been called, if the link subsequently
> comes back up, then the same process happens as before resulting in
> .mac_link_up() being called.
>
> If the interface is taken down, then .mac_link_down() will be called
> if and only if .mac_link_up() had been called.
>
> The ordering of .mac_link_up() / .mac_link_down() is a strict
> contract term with phylink users.
>
> The reason for this contract: phylink users may have ordering
> requirements.
>
> For example, on mac_link_down(), they may wait for packet activity to
> stop, and then place the MAC in reset. If called without a previous
> .mac_link_up call, the wait stage may time out due to the MAC being
> in reset. (Ocelot may suffer with this.)
>
> Another example is fs_enet which also relies on this strict ordering
> as described above.
>
> There could be others - there are some that call into firmware on
> calls to .mac_link_up() / .mac_link_down() and misordering those
> depends on what the firmware is doing, which we have no visibility
> of.
>
> As I stated, this is the contract that phylink gave to users, and
> the contract still stands, and can't be broken to behave differently
> (e.g. calling .mac_link_down() after phylink_start() without an
> intervening call to .mac_link_up()) otherwise existing users will
> break. Bugs that go against that contract will be fixed, but the
> contract will not be intentionally broken.

The issue is as that stands the contract is inherently broken if a BMC
is present.

1. There is still the link loss during the phylink_start issue which
will essentially leave the MAC up if the link fails. This will cause
the Tx FIFO on the NIC to essentially be left to hang without flushing
out the stale packets that may have queued up when the link dropped.
This is one of the reasons why the mac_link_down call still needs to
propagate all the way back down to the hardware.

2. We already have precedent for the link being up when WOL is in use.
One concern I would have with your patch is if it will impact that or
not. I suspected part of the mac_link_down is related to cleaning up
something related to the link setup for WOL configuring the link for
some speed.

3. While it may be a part of the contract, isn't there some way we can
"renegotiate the terms"? The fact is in the case of our driver we are
essentially doing a hand-off from FW to the OS for the link when we
call ndo_open. When we are done in ndo_stop we hand ownership back
over to the firmware. Those hand-offs need to be free of link bounces.
This pattern looks very much like the WOL setup with the only
exception being trying to avoid these link bounces. I'm just wondering
if there isn't some way we can add a phylink_config value indicating
that there is a BMC present on the link so we can handle those cases.
The general idea would be to update my patch so instead of
pl->link_balanced being just set to false in suspend and at creation
time we could essentially just set it to "pl->link_balanced =
!pl->config.mac_bmc_handoff".

The fact is the code with the diff I provided did everything I needed.
I could load/unload and ifconfig up/down the driver all day and it
didn't drop a packet. As it stood the definition of the mac_config
code more or less called out that we weren't supposed to change things
unless we needed to and I had followed that. I suspect the overall
change should be small, smaller now following your fix of the message
issue, and it shouldn't impact anything other than our driver if I add
the config flag checks.
diff mbox series

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 942ce114dabd..2b9ab343942e 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -45,6 +45,7 @@  struct phylink {
 	struct phylink_pcs *pcs;
 	struct device *dev;
 	unsigned int old_link_state:1;
+	unsigned int link_balanced:1;
 
 	unsigned long phylink_disable_state; /* bitmask of disables */
 	struct phy_device *phydev;
@@ -1553,7 +1554,8 @@  static void phylink_link_down(struct phylink *pl)
 
 	pl->mac_ops->mac_link_down(pl->config, pl->act_link_an_mode,
 				   pl->cur_interface);
-	phylink_info(pl, "Link is Down\n");
+	if (pl->link_balanced)
+		phylink_info(pl, "Link is Down\n");
 }
 
 static bool phylink_link_is_up(struct phylink *pl)
@@ -1658,12 +1660,13 @@  static void phylink_resolve(struct work_struct *w)
 	if (pl->major_config_failed)
 		link_state.link = false;
 
-	if (link_state.link != cur_link_state) {
+	if (link_state.link != cur_link_state || !pl->link_balanced) {
 		pl->old_link_state = link_state.link;
 		if (!link_state.link)
 			phylink_link_down(pl);
 		else
 			phylink_link_up(pl, link_state);
+		pl->link_balanced = true;
 	}
 	if (!link_state.link && retrigger) {
 		pl->link_failed = false;
@@ -2546,6 +2549,7 @@  void phylink_suspend(struct phylink *pl, bool mac_wol)
 			netif_carrier_off(pl->netdev);
 		else
 			pl->old_link_state = false;
+		pl->link_balanced = false;
 
 		/* We do not call mac_link_down() here as we want the
 		 * link to remain up to receive the WoL packets.
@@ -2596,16 +2600,6 @@  void phylink_resume(struct phylink *pl)
 	if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {
 		/* Wake-on-Lan enabled, MAC handling */
 
-		/* Call mac_link_down() so we keep the overall state balanced.
-		 * Do this under the state_mutex lock for consistency. This
-		 * will cause a "Link Down" message to be printed during
-		 * resume, which is harmless - the true link state will be
-		 * printed when we run a resolve.
-		 */
-		mutex_lock(&pl->state_mutex);
-		phylink_link_down(pl);
-		mutex_unlock(&pl->state_mutex);
-
 		/* Re-apply the link parameters so that all the settings get
 		 * restored to the MAC.
 		 */