Message ID | 20240321-for-net-mt7530-fix-eee-for-mt7531-mt7988-v2-2-9af9d5041bfe@arinc9.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | Fix EEE support for MT7531 and MT7988 SoC switch |
On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote:
> From: Arınç ÜNAL <arinc.unal@arinc9.com>
>
> The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
> PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
> abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
> unset, the abilities are left to be determined by PHY auto polling.
>
> The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
> mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
> MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the
> switch MACs by polling the PHY, regardless of the result of phy_init_eee().
>
> Define these bits and add them to MT7531_FORCE_MODE which is being used by
> the subdriver. With this, EEE will be prevented from being enabled on the
> switch MACs when phy_init_eee() fails.
>
> Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>

If I read the past discussion correctly, this is a potential issue
found by code inspection and never producing problem in practice, am I
correct?

If so I think it will deserve a 3rd party tested-by tag or similar to
go in.

If nobody could provide such feedback in a little time, I suggest to
drop this patch and apply only 1/2.

Cheers,

Paolo
On 26.03.2024 12:02, Paolo Abeni wrote: > On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote: >> From: Arınç ÜNAL <arinc.unal@arinc9.com> >> >> The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the >> PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE >> abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are >> unset, the abilities are left to be determined by PHY auto polling. >> >> The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") >> made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on >> mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and >> MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the >> switch MACs by polling the PHY, regardless of the result of phy_init_eee(). >> >> Define these bits and add them to MT7531_FORCE_MODE which is being used by >> the subdriver. With this, EEE will be prevented from being enabled on the >> switch MACs when phy_init_eee() fails. >> >> Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") >> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> > > If I read the past discussion correctly, this is a potential issue > found by code inspection and never producing problem in practice, am I > correct? > > If so I think it will deserve a 3rd party tested-by tag or similar to > go in. > > If nobody could provide such feedback in a little time, I suggest to > drop this patch and apply only 1/2. Whether a problem would happen in practice depends on when phy_init_eee() fails, meaning it returns a negative non-zero code. I requested Russell to review this patch to shed light on when phy_init_eee() would return a negative non-zero code so we have an idea whether this patch actually fixes a problem. Arınç
On 26.03.2024 12:19, Arınç ÜNAL wrote:
> On 26.03.2024 12:02, Paolo Abeni wrote:
>> If I read the past discussion correctly, this is a potential issue
>> found by code inspection and never producing problem in practice, am I
>> correct?
>>
>> If so I think it will deserve a 3rd party tested-by tag or similar to
>> go in.
>>
>> If nobody could provide such feedback in a little time, I suggest to
>> drop this patch and apply only 1/2.
>
> Whether a problem would happen in practice depends on when phy_init_eee()
> fails, meaning it returns a negative non-zero code. I requested Russell to
> review this patch to shed light on when phy_init_eee() would return a
> negative non-zero code so we have an idea whether this patch actually fixes
> a problem.

I don't suppose Russell is going to review the patch at this point. I will
submit this to net-next then. If someone actually reports a problem in
practice, I can always submit it to the stable trees.

Arınç
On Tue, Mar 26, 2024 at 12:19:40PM +0300, Arınç ÜNAL wrote: > On 26.03.2024 12:02, Paolo Abeni wrote: > > On Thu, 2024-03-21 at 19:29 +0300, Arınç ÜNAL via B4 Relay wrote: > > > From: Arınç ÜNAL <arinc.unal@arinc9.com> > > > > > > The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the > > > PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE > > > abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are > > > unset, the abilities are left to be determined by PHY auto polling. > > > > > > The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") > > > made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on > > > mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and > > > MT7531_FORCE_EEE100 bits. Because of this, EEE will be enabled on the > > > switch MACs by polling the PHY, regardless of the result of phy_init_eee(). > > > > > > Define these bits and add them to MT7531_FORCE_MODE which is being used by > > > the subdriver. With this, EEE will be prevented from being enabled on the > > > switch MACs when phy_init_eee() fails. > > > > > > Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features") > > > Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> > > > > If I read the past discussion correctly, this is a potential issue > > found by code inspection and never producing problem in practice, am I > > correct? > > > > If so I think it will deserve a 3rd party tested-by tag or similar to > > go in. > > > > If nobody could provide such feedback in a little time, I suggest to > > drop this patch and apply only 1/2. > > Whether a problem would happen in practice depends on when phy_init_eee() > fails, meaning it returns a negative non-zero code. I requested Russell to > review this patch to shed light on when phy_init_eee() would return a > negative non-zero code so we have an idea whether this patch actually fixes > a problem. 
Urgh, so I need to read the code and report back?

Well, looking at phy_init_eee(), it could return a negative value when:

1. phydev->drv is NULL
2. if genphy_c45_eee_is_active() returns negative
3. if genphy_c45_eee_is_active() returns zero, it returns -EPROTONOSUPPORT
4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY)

If we then look at genphy_c45_eee_is_active(), then:

genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their
non-zero return values, otherwise this function returns zero or positive
integer.

If we then look at genphy_c45_read_eee_adv(), then a failure of
phy_read_mmd() would cause a negative value to be returned.

Looking at genphy_c45_read_eee_lpa(), the same is true.

So, it can be summarised as:

- phydev->drv is NULL
- there is a communication error accessing the PHY
- EEE is not active

otherwise, it returns zero on success.

If one wishes to determine whether an error occurred vs EEE not being
supported through negotiation for the negotiated speed, it returns
-EPROTONOSUPPORT in the latter case. Other error codes mean either the
driver has been unloaded or communication error.

This has been expertly determined by reading the code, which only a
phylib maintainer has the capability of doing. Thank you for using this
service.
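The return paths listed above can be sketched as a small userspace model. This is only an illustration of the summary, not the kernel's phy_init_eee(): struct eee_model, model_phy_init_eee() and classify_eee_result() are hypothetical names, the -EIO value for a missing driver is an assumption, and so is the idea that phy_set_bits_mmd() is only reached when clock stop is requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that phy_init_eee() consults. */
struct eee_model {
	bool has_driver;      /* models phydev->drv != NULL */
	int  eee_is_active;   /* models genphy_c45_eee_is_active():
	                       * <0 read error, 0 not active, >0 active */
	bool clk_stop_enable; /* assumption: caller asked for clock stop */
	int  set_bits_result; /* models phy_set_bits_mmd(): 0 or -errno */
};

/* Walks the four failure cases from the list, in order. */
static int model_phy_init_eee(const struct eee_model *m)
{
	if (!m->has_driver)
		return -EIO;                /* 1. assumed error code */
	if (m->eee_is_active < 0)
		return m->eee_is_active;    /* 2. PHY access error propagates */
	if (m->eee_is_active == 0)
		return -EPROTONOSUPPORT;    /* 3. EEE not negotiated */
	if (m->clk_stop_enable)
		return m->set_bits_result;  /* 4. may fail on PHY access */
	return 0;
}

enum eee_result { EEE_RESULT_OK, EEE_RESULT_NOT_NEGOTIATED, EEE_RESULT_ERROR };

/* Separates "EEE not negotiated" from driver or communication errors. */
static enum eee_result classify_eee_result(int ret)
{
	if (ret == 0)
		return EEE_RESULT_OK;
	if (ret == -EPROTONOSUPPORT)
		return EEE_RESULT_NOT_NEGOTIATED;
	return EEE_RESULT_ERROR;
}
```

A caller that only needs to know whether EEE may be enabled can treat any non-zero return as "do not enable"; the classification helper matters only when the reason for the failure is of interest.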
On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote:
> On 26.03.2024 12:19, Arınç ÜNAL wrote:
> > On 26.03.2024 12:02, Paolo Abeni wrote:
> > > If I read the past discussion correctly, this is a potential issue
> > > found by code inspection and never producing problem in practice, am I
> > > correct?
> > >
> > > If so I think it will deserve a 3rd party tested-by tag or similar to
> > > go in.
> > >
> > > If nobody could provide such feedback in a little time, I suggest to
> > > drop this patch and apply only 1/2.
> >
> > Whether a problem would happen in practice depends on when phy_init_eee()
> > fails, meaning it returns a negative non-zero code. I requested Russell to
> > review this patch to shed light on when phy_init_eee() would return a
> > negative non-zero code so we have an idea whether this patch actually fixes
> > a problem.
>
> I don't suppose Russell is going to review the patch at this point. I will
> submit this to net-next then. If someone actually reports a problem in
> practice, I can always submit it to the stable trees.

So the fact that I only saw your request this morning to look at
phy_init_eee(), and to review this patch... because... I work for
Oracle, and I've been looking at backporting Arm64 KVM patches to
our kernel, been testing and debugging that effort... and the
fact that less than 24 hours had passed since you made the original
request... yea, sorry, it's clearly my fault for not jumping on this
the moment you sent the email.

I get _so_ much email that incorrectly has me in the To: header. I
also get _so_ much email that fails to list me in the To: header
when the author wants me to respond. I don't have time to read every
email as it comes in. I certainly don't have time to read every
email in any case. I do the best I can, which varies considerably
with my workload.

I already find that being single, fitting everything in during the
day (paid work, chores, feeding oneself) is quite a mammoth task.
There is no one else to do the laundry. There is no one else to get
the shopping. There is no one else to do the washing up. There is no
one else to take the rubbish out. All this I do myself, and serially
because there is only one of me, and it all takes time away from
sitting here reading every damn email as it comes in.

And then when I end up doing something that _you_ very well could do
(reading the phy_init_eee() code to find out when it might return a
negative number) and then you send an email like this... yea... that
really gets my goat.
On Wed, Mar 27, 2024 at 03:58:13PM +0000, Russell King (Oracle) wrote: > On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote: > > On 26.03.2024 12:19, Arınç ÜNAL wrote: > > > On 26.03.2024 12:02, Paolo Abeni wrote: > > > > If I read the past discussion correctly, this is a potential issue > > > > found by code inspection and never producing problem in practice, am I > > > > correct? > > > > > > > > If so I think it will deserve a 3rd party tested-by tag or similar to > > > > go in. > > > > > > > > If nobody could provide such feedback in a little time, I suggest to > > > > drop this patch and apply only 1/2. > > > > > > Whether a problem would happen in practice depends on when > > > phy_init_eee() > > > fails, meaning it returns a negative non-zero code. I requested Russell > > > to > > > review this patch to shed light on when phy_init_eee() would return a > > > negative non-zero code so we have an idea whether this patch actually > > > fixes > > > a problem. > > > > I don't suppose Russell is going to review the patch at this point. I will > > submit this to net-next then. If someone actually reports a problem in > > practice, I can always submit it to the stable trees. > > So the fact that I only saw your request this morning to look at > phy_init_eee(), and to review this patch... because... I work for > Oracle, and I've been looking at backporting Arm64 KVM patches to > our kernel, been testing and debugging that effort... and the > act that less than 24 hours had passed since you made the original > request... yea, sorry, it's clearly my fault for not jumping on this > the moment you sent the email. > > I get _so_ much email that incorrectly has me in the To: header. I > also get _so_ much email that fails to list me in the To: header > when the author wants me to respond. I don't have time to read every > email as it comes in. I certainly don't have time to read every > email in any case. 
I do the best I can, which varies considerably > with my workload. > > I already find that being single, fitting everything in during the > day (paid work, chores, feeding oneself) is quite a mammoth task. > There is no one else to do the laundry. There is no one else to get > the shopping. There is no one else to do the washing up. There is no > one else to take the rubbish out. All this I do myself, and serially > because there is only one of me, and it all takes time away from > sitting here reading every damn email as it comes in. > > And then when I end up doing something that _you_ very well could do > (reading the phy_init_eee() code to find out when it might return a > negative number) and then you send an email like this... yea... that > really gets my goat. ... and now I have a 1:1 with my manager for the next 30-60 minutes. Is it okay by you for me to be offline for that period of time while I have a chat with him?
On 27.03.2024 18:50, Russell King (Oracle) wrote: > On Tue, Mar 26, 2024 at 12:19:40PM +0300, Arınç ÜNAL wrote: >> Whether a problem would happen in practice depends on when phy_init_eee() >> fails, meaning it returns a negative non-zero code. I requested Russell to >> review this patch to shed light on when phy_init_eee() would return a >> negative non-zero code so we have an idea whether this patch actually fixes >> a problem. > > Urgh, so I need to read the code and report back? > > Well, looking at phy_init_eee(), it could return a negative vallue when: > > 1. phydev->drv is NULL > 2. if genphy_c45_eee_is_active() returns negative > 3. if genphy_c45_eee_is_active() returns zero, it returns > -EPROTONOSUPPORT > 4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY) > > If we then look at genphy_c45_eee_is_active(), then: > > genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their > non-zero return values, otherwise this function returns zero or positive > integer. > > If we then look at genphy_c45_read_eee_adv(), then a failure of > phy_read_mmd() would cause a negative value to be returned. > > Looking at genphy_c45_read_eee_lpa(), the same is true. > > So, it can be summarised as: > > - phydev->drv is NULL > - there is a communication error accessing the PHY > - EEE is not active > > otherwise, it returns zero on success. > > If one wishes to determine whether an error occurred vs EEE not being > supported through negotiation for the negotiated speed, if it returns > -EPROTONOSUPPORT in the latter case. Other error codes mean either the > driver has been unloaded or communication error. > > This has been expertly determined by reading the code, which only a > phylib maintainer has the capability of doing. Thank you for using this > service. Thanks for explaining it. 
I believe determining enabling/disabling EEE on the switch MAC by polling the PHY, when one of the last two conditions in your summary is true, wouldn't result in having EEE enabled. And it seems to me that if phydev->drv is NULL, there would be bigger problems with the device. So I think it'll be more fitting to submit this patch to net-next. Arınç
On 27.03.2024 18:59, Russell King (Oracle) wrote:
> On Wed, Mar 27, 2024 at 03:58:13PM +0000, Russell King (Oracle) wrote:
>> On Wed, Mar 27, 2024 at 11:46:19AM +0300, arinc.unal@arinc9.com wrote:
>>> On 26.03.2024 12:19, Arınç ÜNAL wrote:
>>>> Whether a problem would happen in practice depends on when phy_init_eee()
>>>> fails, meaning it returns a negative non-zero code. I requested Russell to
>>>> review this patch to shed light on when phy_init_eee() would return a
>>>> negative non-zero code so we have an idea whether this patch actually fixes
>>>> a problem.
>>>
>>> I don't suppose Russell is going to review the patch at this point. I will
>>> submit this to net-next then. If someone actually reports a problem in
>>> practice, I can always submit it to the stable trees.
>>
>> So the fact that I only saw your request this morning to look at
>> phy_init_eee(), and to review this patch... because... I work for
>> Oracle, and I've been looking at backporting Arm64 KVM patches to
>> our kernel, been testing and debugging that effort... and the
>> fact that less than 24 hours had passed since you made the original
>> request... yea, sorry, it's clearly my fault for not jumping on this
>> the moment you sent the email.
>>
>> I get _so_ much email that incorrectly has me in the To: header. I
>> also get _so_ much email that fails to list me in the To: header
>> when the author wants me to respond. I don't have time to read every
>> email as it comes in. I certainly don't have time to read every
>> email in any case. I do the best I can, which varies considerably
>> with my workload.
>>
>> I already find that being single, fitting everything in during the
>> day (paid work, chores, feeding oneself) is quite a mammoth task.
>> There is no one else to do the laundry. There is no one else to get
>> the shopping. There is no one else to do the washing up. There is no
>> one else to take the rubbish out. All this I do myself, and serially
>> because there is only one of me, and it all takes time away from
>> sitting here reading every damn email as it comes in.
>>
>> And then when I end up doing something that _you_ very well could do
>> (reading the phy_init_eee() code to find out when it might return a
>> negative number) and then you send an email like this... yea... that
>> really gets my goat.

I made the request on the 21st of March. It must have been buried under
the other emails that are incorrectly sent to you, as you've described.
Of course you're not at fault for not responding. I trust your expertise
on the topic, so I requested your comment. You're not obliged to do that,
which is why, after waiting for about 6 days, I assumed that you're not
interested in looking at this patch, so I responded with the intention to
move on.

https://lore.kernel.org/netdev/dc487e20-7d6c-48b7-a590-cb3bd815cd21@arinc9.com/

>
> ... and now I have a 1:1 with my manager for the next 30-60 minutes.
> Is it okay by you for me to be offline for that period of time while
> I have a chat with him?

That sounds exhausting. I hope things get easier for you.

Arınç
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 509ed5362236..5b99aeca34b4 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -299,11 +299,15 @@ enum mt7530_vlan_port_acc_frm {
 #define  MT7531_FORCE_DPX         BIT(29)
 #define  MT7531_FORCE_RX_FC       BIT(28)
 #define  MT7531_FORCE_TX_FC       BIT(27)
+#define  MT7531_FORCE_EEE100      BIT(26)
+#define  MT7531_FORCE_EEE1G       BIT(25)
 #define  MT7531_FORCE_MODE        (MT7531_FORCE_LNK | \
                                    MT7531_FORCE_SPD | \
                                    MT7531_FORCE_DPX | \
                                    MT7531_FORCE_RX_FC | \
-                                   MT7531_FORCE_TX_FC)
+                                   MT7531_FORCE_TX_FC | \
+                                   MT7531_FORCE_EEE100 | \
+                                   MT7531_FORCE_EEE1G)
 #define  PMCR_LINK_SETTINGS_MASK  (PMCR_TX_EN | PMCR_FORCE_SPEED_1000 | \
                                    PMCR_RX_EN | PMCR_FORCE_SPEED_100 | \
                                    PMCR_TX_FC_EN | PMCR_RX_FC_EN | \
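To make the effect of the two new bits concrete, here is a minimal userspace sketch of the behaviour the commit message describes: when a MT7531_FORCE_EEE* bit is set, the matching PMCR_FORCE_EEE* bit decides the MAC's EEE ability, otherwise the result of PHY auto polling is used. The positions of MT7531_FORCE_LNK, MT7531_FORCE_SPD, PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 are assumptions, since the hunk above does not show them, and the helpers mac_eee1g_enabled() and mac_eee100_enabled() are hypothetical, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Assumed positions, not visible in the hunk. */
#define MT7531_FORCE_LNK    BIT(31)
#define MT7531_FORCE_SPD    BIT(30)
#define PMCR_FORCE_EEE1G    BIT(7)
#define PMCR_FORCE_EEE100   BIT(6)

/* Taken from the diff. */
#define MT7531_FORCE_DPX    BIT(29)
#define MT7531_FORCE_RX_FC  BIT(28)
#define MT7531_FORCE_TX_FC  BIT(27)
#define MT7531_FORCE_EEE100 BIT(26)
#define MT7531_FORCE_EEE1G  BIT(25)
#define MT7531_FORCE_MODE   (MT7531_FORCE_LNK | \
                             MT7531_FORCE_SPD | \
                             MT7531_FORCE_DPX | \
                             MT7531_FORCE_RX_FC | \
                             MT7531_FORCE_TX_FC | \
                             MT7531_FORCE_EEE100 | \
                             MT7531_FORCE_EEE1G)

/* Model: a set force bit makes the PMCR bit authoritative;
 * a clear force bit defers to what PHY auto polling reported. */
static bool mac_eee1g_enabled(uint32_t pmcr, bool polled_eee1g)
{
	if (pmcr & MT7531_FORCE_EEE1G)
		return (pmcr & PMCR_FORCE_EEE1G) != 0;
	return polled_eee1g;
}

static bool mac_eee100_enabled(uint32_t pmcr, bool polled_eee100)
{
	if (pmcr & MT7531_FORCE_EEE100)
		return (pmcr & PMCR_FORCE_EEE100) != 0;
	return polled_eee100;
}
```

Under this model, a register value built from the old MT7531_FORCE_MODE leaves EEE up to polling even when the driver never sets PMCR_FORCE_EEE1G, which is the situation the patch is meant to close.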