diff mbox series

[net,v2,2/2] net: dsa: mt7530: fix disabling EEE on failure on MT7531 and MT7988

Message ID 20240321-for-net-mt7530-fix-eee-for-mt7531-mt7988-v2-2-9af9d5041bfe@arinc9.com (mailing list archive)
State New
Series Fix EEE support for MT7531 and MT7988 SoC switch

Commit Message

Arınç ÜNAL via B4 Relay March 21, 2024, 4:29 p.m. UTC
From: Arınç ÜNAL <arinc.unal@arinc9.com>

The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
unset, the abilities are left to be determined by PHY auto polling.

The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the
switch MACs by polling the PHY, regardless of the result of phy_init_eee().

Define these bits and add them to MT7531_FORCE_MODE which is being used by
the subdriver. With this, EEE will be prevented from being enabled on the
switch MACs when phy_init_eee() fails.

Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
---
 drivers/net/dsa/mt7530.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
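
The force-bit behaviour described in the commit message can be modelled with a
minimal C sketch. It is illustrative only and not driver code:
mac_eee_enabled() is a hypothetical helper, the MT7531_FORCE_EEE* positions are
taken from the patch, and the PMCR_FORCE_EEE* positions are assumed here for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Force-enable bits; positions taken from the patch to mt7530.h. */
#define MT7531_FORCE_EEE100 BIT(26)
#define MT7531_FORCE_EEE1G  BIT(25)

/* Forced-value bits; these positions are an assumption for illustration. */
#define PMCR_FORCE_EEE1G    BIT(7)
#define PMCR_FORCE_EEE100   BIT(6)

static_assert((MT7531_FORCE_EEE100 & MT7531_FORCE_EEE1G) == 0,
	      "force-enable bits must not overlap");

/*
 * Hypothetical model of the MAC's EEE ability for one speed.
 * If the force-enable bit is set, the PMCR value bit decides.
 * Otherwise the result of PHY auto polling decides.
 */
static bool mac_eee_enabled(uint32_t pmcr, uint32_t force_bit,
			    uint32_t value_bit, bool phy_polled_eee)
{
	if (pmcr & force_bit)
		return (pmcr & value_bit) != 0;

	return phy_polled_eee;
}
```

In this model, a register value with the force-enable bit clear follows the
polled PHY state whatever the PMCR_FORCE_EEE* bits say, which is the situation
the commit message describes before the fix. With the force-enable bit set and
the value bit clear, EEE stays disabled even if polling reports it as available.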

Comments

Paolo Abeni March 26, 2024, 9:02 a.m. UTC | #1
On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote:
> From: Arınç ÜNAL <arinc.unal@arinc9.com>
> 
> The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
> PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
> abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
> unset, the abilities are left to be determined by PHY auto polling.
> 
> The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
> mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
> MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the
> switch MACs by polling the PHY, regardless of the result of phy_init_eee().
> 
> Define these bits and add them to MT7531_FORCE_MODE which is being used by
> the subdriver. With this, EEE will be prevented from being enabled on the
> switch MACs when phy_init_eee() fails.
> 
> Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>

If I read the past discussion correctly, this is a potential issue
found by code inspection and never producing problem in practice, am I
correct?

If so I think it will deserve a 3rd party tested-by tag or similar to
go in.

If nobody could provide such feedback in a little time, I suggest to
drop this patch and apply only 1/2.

Cheers,

Paolo
Arınç ÜNAL March 26, 2024, 9:19 a.m. UTC | #2
On 26.03.2024 12:02, Paolo Abeni wrote:
> On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote:
>> From: Arınç ÜNAL <arinc.unal@arinc9.com>
>>
>> The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
>> PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
>> abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
>> unset, the abilities are left to be determined by PHY auto polling.
>>
>> The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
>> made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
>> mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
>> MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the
>> switch MACs by polling the PHY, regardless of the result of phy_init_eee().
>>
>> Define these bits and add them to MT7531_FORCE_MODE which is being used by
>> the subdriver. With this, EEE will be prevented from being enabled on the
>> switch MACs when phy_init_eee() fails.
>>
>> Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
>> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
> 
> If I read the past discussion correctly, this is a potential issue
> found by code inspection and never producing problem in practice, am I
> correct?
> 
> If so I think it will deserve a 3rd party tested-by tag or similar to
> go in.
> 
> If nobody could provide such feedback in a little time, I suggest to
> drop this patch and apply only 1/2.

Whether a problem would happen in practice depends on when phy_init_eee()
fails, meaning it returns a negative non-zero code. I requested Russell to
review this patch to shed light on when phy_init_eee() would return a
negative non-zero code so we have an idea whether this patch actually fixes
a problem.

Arınç
Arınç ÜNAL March 27, 2024, 8:46 a.m. UTC | #3
On 26.03.2024 12:19, Arınç ÜNAL wrote:
> On 26.03.2024 12:02, Paolo Abeni wrote:
>> If I read the past discussion correctly, this is a potential issue
>> found by code inspection and never producing problem in practice, am I
>> correct?
>> 
>> If so I think it will deserve a 3rd party tested-by tag or similar to
>> go in.
>> 
>> If nobody could provide such feedback in a little time, I suggest to
>> drop this patch and apply only 1/2.
> 
> Whether a problem would happen in practice depends on when phy_init_eee()
> fails, meaning it returns a negative non-zero code. I requested Russell to
> review this patch to shed light on when phy_init_eee() would return a
> negative non-zero code so we have an idea whether this patch actually fixes
> a problem.

I don't suppose Russell is going to review the patch at this point. I will
submit this to net-next then. If someone actually reports a problem in
practice, I can always submit it to the stable trees.

Arınç
Russell King (Oracle) March 27, 2024, 3:50 p.m. UTC | #4
On Tue, Mar 26, 2024 at 12:19:40PM +0300, Arınç ÜNAL wrote:
> On 26.03.2024 12:02, Paolo Abeni wrote:
> > On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote:
> > > From: Arınç ÜNAL <arinc.unal@arinc9.com>
> > > 
> > > The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
> > > PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
> > > abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
> > > unset, the abilities are left to be determined by PHY auto polling.
> > > 
> > > The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> > > made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
> > > mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
> > > MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the
> > > switch MACs by polling the PHY, regardless of the result of phy_init_eee().
> > > 
> > > Define these bits and add them to MT7531_FORCE_MODE which is being used by
> > > the subdriver. With this, EEE will be prevented from being enabled on the
> > > switch MACs when phy_init_eee() fails.
> > > 
> > > Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> > > Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
> > 
> > If I read the past discussion correctly, this is a potential issue
> > found by code inspection and never producing problem in practice, am I
> > correct?
> > 
> > If so I think it will deserve a 3rd party tested-by tag or similar to
> > go in.
> > 
> > If nobody could provide such feedback in a little time, I suggest to
> > drop this patch and apply only 1/2.
> 
> Whether a problem would happen in practice depends on when phy_init_eee()
> fails, meaning it returns a negative non-zero code. I requested Russell to
> review this patch to shed light on when phy_init_eee() would return a
> negative non-zero code so we have an idea whether this patch actually fixes
> a problem.

Urgh, so I need to read the code and report back?

Well, looking at phy_init_eee(), it could return a negative value when:

1. phydev->drv is NULL
2. if genphy_c45_eee_is_active() returns negative
3. if genphy_c45_eee_is_active() returns zero, it returns
   -EPROTONOSUPPORT
4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY)

If we then look at genphy_c45_eee_is_active(), then:

genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their
non-zero return values, otherwise this function returns zero or positive
integer.

If we then look at genphy_c45_read_eee_adv(), then a failure of
phy_read_mmd() would cause a negative value to be returned.

Looking at genphy_c45_read_eee_lpa(), the same is true.

So, it can be summarised as:

- phydev->drv is NULL
- there is a communication error accessing the PHY
- EEE is not active

otherwise, it returns zero on success.

If one wishes to distinguish an error from EEE not being supported
through negotiation for the negotiated speed, it returns
-EPROTONOSUPPORT in the latter case. Other error codes mean either the
driver has been unloaded or there was a communication error.

This has been expertly determined by reading the code, which only a
phylib maintainer has the capability of doing. Thank you for using this
service.
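
A minimal sketch of how a caller could tell these outcomes apart, based on the
summary above. The enum and the classify_phy_init_eee() helper are hypothetical
names used only for illustration and are not part of phylib.

```c
#include <assert.h>
#include <errno.h>

/* errno constants are positive; kernel-style APIs return them negated. */
static_assert(EPROTONOSUPPORT > 0, "errno values are positive");

/* Hypothetical classification of a phy_init_eee()-style return value. */
enum eee_outcome {
	EEE_ACTIVE,		/* returned zero: success */
	EEE_NOT_NEGOTIATED,	/* -EPROTONOSUPPORT: EEE is not active */
	EEE_ERROR,		/* anything else: no driver or PHY access error */
};

static enum eee_outcome classify_phy_init_eee(int ret)
{
	if (ret == 0)
		return EEE_ACTIVE;

	if (ret == -EPROTONOSUPPORT)
		return EEE_NOT_NEGOTIATED;

	/* Per the summary, the remaining codes are genuine failures. */
	return EEE_ERROR;
}
```

For example, passing -EIO (used here as a stand-in for a communication error)
yields EEE_ERROR, while -EPROTONOSUPPORT yields EEE_NOT_NEGOTIATED.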
Russell King (Oracle) March 27, 2024, 3:58 p.m. UTC | #5
On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote:
> On 26.03.2024 12:19, Arınç ÜNAL wrote:
> > On 26.03.2024 12:02, Paolo Abeni wrote:
> > > If I read the past discussion correctly, this is a potential issue
> > > found by code inspection and never producing problem in practice, am I
> > > correct?
> > > 
> > > If so I think it will deserve a 3rd party tested-by tag or similar to
> > > go in.
> > > 
> > > If nobody could provide such feedback in a little time, I suggest to
> > > drop this patch and apply only 1/2.
> > 
> > Whether a problem would happen in practice depends on when phy_init_eee()
> > fails, meaning it returns a negative non-zero code. I requested Russell to
> > review this patch to shed light on when phy_init_eee() would return a
> > negative non-zero code so we have an idea whether this patch actually fixes
> > a problem.
> 
> I don't suppose Russell is going to review the patch at this point. I will
> submit this to net-next then. If someone actually reports a problem in
> practice, I can always submit it to the stable trees.

So the fact that I only saw your request this morning to look at
phy_init_eee(), and to review this patch... because... I work for
Oracle, and I've been looking at backporting Arm64 KVM patches to
our kernel, been testing and debugging that effort... and the
fact that less than 24 hours had passed since you made the original
request... yea, sorry, it's clearly my fault for not jumping on this
the moment you sent the email.

I get _so_ much email that incorrectly has me in the To: header. I
also get _so_ much email that fails to list me in the To: header
when the author wants me to respond. I don't have time to read every
email as it comes in. I certainly don't have time to read every
email in any case. I do the best I can, which varies considerably
with my workload.

I already find that being single, fitting everything in during the
day (paid work, chores, feeding oneself) is quite a mammoth task.
There is no one else to do the laundry. There is no one else to get
the shopping. There is no one else to do the washing up. There is no
one else to take the rubbish out. All this I do myself, and serially
because there is only one of me, and it all takes time away from
sitting here reading every damn email as it comes in.

And then when I end up doing something that _you_ very well could do
(reading the phy_init_eee() code to find out when it might return a
negative number) and then you send an email like this... yea... that
really gets my goat.
Russell King (Oracle) March 27, 2024, 3:59 p.m. UTC | #6
On Wed, Mar 27, 2024 at 03:58:13PM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote:
> > On 26.03.2024 12:19, Arınç ÜNAL wrote:
> > > On 26.03.2024 12:02, Paolo Abeni wrote:
> > > > If I read the past discussion correctly, this is a potential issue
> > > > found by code inspection and never producing problem in practice, am I
> > > > correct?
> > > > 
> > > > If so I think it will deserve a 3rd party tested-by tag or similar to
> > > > go in.
> > > > 
> > > > If nobody could provide such feedback in a little time, I suggest to
> > > > drop this patch and apply only 1/2.
> > > 
> > > Whether a problem would happen in practice depends on when phy_init_eee()
> > > fails, meaning it returns a negative non-zero code. I requested Russell to
> > > review this patch to shed light on when phy_init_eee() would return a
> > > negative non-zero code so we have an idea whether this patch actually fixes
> > > a problem.
> > 
> > I don't suppose Russell is going to review the patch at this point. I will
> > submit this to net-next then. If someone actually reports a problem in
> > practice, I can always submit it to the stable trees.
> 
> So the fact that I only saw your request this morning to look at
> phy_init_eee(), and to review this patch... because... I work for
> Oracle, and I've been looking at backporting Arm64 KVM patches to
> our kernel, been testing and debugging that effort... and the
> fact that less than 24 hours had passed since you made the original
> request... yea, sorry, it's clearly my fault for not jumping on this
> the moment you sent the email.
> 
> I get _so_ much email that incorrectly has me in the To: header. I
> also get _so_ much email that fails to list me in the To: header
> when the author wants me to respond. I don't have time to read every
> email as it comes in. I certainly don't have time to read every
> email in any case. I do the best I can, which varies considerably
> with my workload.
> 
> I already find that being single, fitting everything in during the
> day (paid work, chores, feeding oneself) is quite a mammoth task.
> There is no one else to do the laundry. There is no one else to get
> the shopping. There is no one else to do the washing up. There is no
> one else to take the rubbish out. All this I do myself, and serially
> because there is only one of me, and it all takes time away from
> sitting here reading every damn email as it comes in.
> 
> And then when I end up doing something that _you_ very well could do
> (reading the phy_init_eee() code to find out when it might return a
> negative number) and then you send an email like this... yea... that
> really gets my goat.

... and now I have a 1:1 with my manager for the next 30-60 minutes.
Is it okay by you for me to be offline for that period of time while
I have a chat with him?
Arınç ÜNAL March 28, 2024, 2:31 p.m. UTC | #7
On 27.03.2024 18:50, Russell King (Oracle) wrote:
> On Tue, Mar 26, 2024 at 12:19:40PM +0300, Arınç ÜNAL wrote:
>> Whether a problem would happen in practice depends on when phy_init_eee()
>> fails, meaning it returns a negative non-zero code. I requested Russell to
>> review this patch to shed light on when phy_init_eee() would return a
>> negative non-zero code so we have an idea whether this patch actually fixes
>> a problem.
> 
> Urgh, so I need to read the code and report back?
> 
> Well, looking at phy_init_eee(), it could return a negative value when:
> 
> 1. phydev->drv is NULL
> 2. if genphy_c45_eee_is_active() returns negative
> 3. if genphy_c45_eee_is_active() returns zero, it returns
>     -EPROTONOSUPPORT
> 4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY)
> 
> If we then look at genphy_c45_eee_is_active(), then:
> 
> genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their
> non-zero return values, otherwise this function returns zero or positive
> integer.
> 
> If we then look at genphy_c45_read_eee_adv(), then a failure of
> phy_read_mmd() would cause a negative value to be returned.
> 
> Looking at genphy_c45_read_eee_lpa(), the same is true.
> 
> So, it can be summarised as:
> 
> - phydev->drv is NULL
> - there is a communication error accessing the PHY
> - EEE is not active
> 
> otherwise, it returns zero on success.
> 
> If one wishes to distinguish an error from EEE not being supported
> through negotiation for the negotiated speed, it returns
> -EPROTONOSUPPORT in the latter case. Other error codes mean either the
> driver has been unloaded or there was a communication error.
> 
> This has been expertly determined by reading the code, which only a
> phylib maintainer has the capability of doing. Thank you for using this
> service.

Thanks for explaining it. I believe determining enabling/disabling EEE on
the switch MAC by polling the PHY, when one of the last two conditions in
your summary is true, wouldn't result in having EEE enabled. And it seems
to me that if phydev->drv is NULL, there would be bigger problems with the
device. So I think it'll be more fitting to submit this patch to net-next.

Arınç
Arınç ÜNAL March 28, 2024, 2:46 p.m. UTC | #8
On 27.03.2024 18:59, Russell King (Oracle) wrote:
> On Wed, Mar 27, 2024 at 03:58:13PM +0000, Russell King (Oracle) wrote:
>> On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote:
>>> On 26.03.2024 12:19, Arınç ÜNAL wrote:
>>>> Whether a problem would happen in practice depends on when phy_init_eee()
>>>> fails, meaning it returns a negative non-zero code. I requested Russell to
>>>> review this patch to shed light on when phy_init_eee() would return a
>>>> negative non-zero code so we have an idea whether this patch actually fixes
>>>> a problem.
>>>
>>> I don't suppose Russell is going to review the patch at this point. I will
>>> submit this to net-next then. If someone actually reports a problem in
>>> practice, I can always submit it to the stable trees.
>>
>> So the fact that I only saw your request this morning to look at
>> phy_init_eee(), and to review this patch... because... I work for
>> Oracle, and I've been looking at backporting Arm64 KVM patches to
>> our kernel, been testing and debugging that effort... and the
>> fact that less than 24 hours had passed since you made the original
>> request... yea, sorry, it's clearly my fault for not jumping on this
>> the moment you sent the email.
>>
>> I get _so_ much email that incorrectly has me in the To: header. I
>> also get _so_ much email that fails to list me in the To: header
>> when the author wants me to respond. I don't have time to read every
>> email as it comes in. I certainly don't have time to read every
>> email in any case. I do the best I can, which varies considerably
>> with my workload.
>>
>> I already find that being single, fitting everything in during the
>> day (paid work, chores, feeding oneself) is quite a mammoth task.
>> There is no one else to do the laundry. There is no one else to get
>> the shopping. There is no one else to do the washing up. There is no
>> one else to take the rubbish out. All this I do myself, and serially
>> because there is only one of me, and it all takes time away from
>> sitting here reading every damn email as it comes in.
>>
>> And then when I end up doing something that _you_ very well could do
>> (reading the phy_init_eee() code to find out when it might return a
>> negative number) and then you send an email like this... yea... that
>> really gets my goat.

I made the request on the 21st of March. It must have been buried under the
other emails that are incorrectly sent to you, as you've described. Of
course you're not at fault for not responding. I trust your expertise on
the topic, so I requested your comment. You're not obliged to do that,
which is why, after waiting for about 6 days, I assumed that you were not
interested in looking at this patch, so I responded with the intention to
move on.

https://lore.kernel.org/netdev/dc487e20-7d6c-48b7-a590-cb3bd815cd21@arinc9.com/

> 
> ... and now I have a 1:1 with my manager for the next 30-60 minutes.
> Is it okay by you for me to be offline for that period of time while
> I have a chat with him?

That sounds exhausting. I hope things get easier for you.

Arınç
Patch

diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 509ed5362236..5b99aeca34b4 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -299,11 +299,15 @@  enum mt7530_vlan_port_acc_frm {
 #define  MT7531_FORCE_DPX		BIT(29)
 #define  MT7531_FORCE_RX_FC		BIT(28)
 #define  MT7531_FORCE_TX_FC		BIT(27)
+#define  MT7531_FORCE_EEE100		BIT(26)
+#define  MT7531_FORCE_EEE1G		BIT(25)
 #define  MT7531_FORCE_MODE		(MT7531_FORCE_LNK | \
 					 MT7531_FORCE_SPD | \
 					 MT7531_FORCE_DPX | \
 					 MT7531_FORCE_RX_FC | \
-					 MT7531_FORCE_TX_FC)
+					 MT7531_FORCE_TX_FC | \
+					 MT7531_FORCE_EEE100 | \
+					 MT7531_FORCE_EEE1G)
 #define  PMCR_LINK_SETTINGS_MASK	(PMCR_TX_EN | PMCR_FORCE_SPEED_1000 | \
 					 PMCR_RX_EN | PMCR_FORCE_SPEED_100 | \
 					 PMCR_TX_FC_EN | PMCR_RX_FC_EN | \