Message ID | d1a1a6a6-fca4-4f1b-93b3-f2f6963b4e04.lettre@localhost (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [GIT,PULL] Allwinner Fixes for 5.10 | expand |
On Thu, Oct 29, 2020 at 8:06 PM Maxime Ripard <mripard@kernel.org> wrote: > > ---------------------------------------------------------------- > Mostly some fixes for a fallout in a PHY driver that pointed out errors > in our DTs. Can you clarify what this means for compatibility of the dtb files? I would assume that the modified .dts files all work on older kernels because you fix the setting, but at least some of them require the patch with newer kernels, right? Are they all broken without the change? Are other platforms likely to suffer the same problem? IIRC seems that at least the SynQuacer box had the same issue, but I don't yet understand how common the problem is. Arnd
On Thu, Oct 29, 2020 at 10:23 PM Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 8:06 PM Maxime Ripard <mripard@kernel.org> wrote: > > > > ---------------------------------------------------------------- > > Mostly some fixes for a fallout in a PHY driver that pointed out errors > > in our DTs. > > Can you clarify what this means for compatibility of the dtb files? > > I would assume that the modified .dts files all work on older kernels > because you fix the setting, but at least some of them require > the patch with newer kernels, right? > > Are they all broken without the change? Are other platforms > likely to suffer the same problem? IIRC seems that at least > the SynQuacer box had the same issue, but I don't yet > understand how common the problem is. To clarify: I had pulled the branch already when I replied, but the automated email for that apparently did not trigger. I would like to know the background here mainly so I can put it into my pull request to Linus. Arnd
On Thu, 29 Oct 2020 at 22:41, Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 10:23 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Thu, Oct 29, 2020 at 8:06 PM Maxime Ripard <mripard@kernel.org> wrote: > > > > > > ---------------------------------------------------------------- > > > Mostly some fixes for a fallout in a PHY driver that pointed out errors > > > in our DTs. > > > > Can you clarify what this means for compatibility of the dtb files? > > > > I would assume that the modified .dts files all work on older kernels > > because you fix the setting, but at least some of them require > > the patch with newer kernels, right? > > > > Are they all broken without the change? Are other platforms > > likely to suffer the same problem? IIRC seems that at least > > the SynQuacer box had the same issue, but I don't yet > > understand how common the problem is. > > To clarify: I had pulled the branch already when I replied, but the > automated email for that apparently did not trigger. > > I would like to know the background here mainly so I can put > it into my pull request to Linus. > The Realtek PHY driver used to ignore the TX/RX delay settings implied by the xxx in the the different rgmii-xxx phy-mode settings, and a failed attempt was made in the v5.2 timeframe to change this, and so the -xxx part has been effectively ignored all this time, meaning you could get away with providing the wrong value. Even though no platform appears to have been affected by this incorrect patch, the followup fix that repairs it has been backported to -stable, breaking all the formerly working platforms incorporating this PHY that describe the mode as 'rgmii' instead of 'rgmii-id'. I have argued that the backport of the followup fix should be reverted, given that there is a risk that production systems tracking a -stable release may be affected by this if they don't take their DT directly from the upstream kernel. I think this PR is fine, though - fixing the DTs going forward is needed in any case (although I think backporting DT changes to -stable is a bad idea but that is just my opinion)
On Thu, Oct 29, 2020 at 11:03 PM Ard Biesheuvel <ardb@kernel.org> wrote: > On Thu, 29 Oct 2020 at 22:41, Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 10:23 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > On Thu, Oct 29, 2020 at 8:06 PM Maxime Ripard <mripard@kernel.org> wrote: > > > > > > > > ---------------------------------------------------------------- > > > > Mostly some fixes for a fallout in a PHY driver that pointed out errors > > > > in our DTs. > > > > > > Can you clarify what this means for compatibility of the dtb files? > > > > > > I would assume that the modified .dts files all work on older kernels > > > because you fix the setting, but at least some of them require > > > the patch with newer kernels, right? > > > > > > Are they all broken without the change? Are other platforms > > > likely to suffer the same problem? IIRC seems that at least > > > the SynQuacer box had the same issue, but I don't yet > > > understand how common the problem is. > > > > To clarify: I had pulled the branch already when I replied, but the > > automated email for that apparently did not trigger. > > > > I would like to know the background here mainly so I can put > > it into my pull request to Linus. > > > > The Realtek PHY driver used to ignore the TX/RX delay settings implied > by the xxx in the the different rgmii-xxx phy-mode settings, and a > failed attempt was made in the v5.2 timeframe to change this, and so > the -xxx part has been effectively ignored all this time, meaning you > could get away with providing the wrong value. > > Even though no platform appears to have been affected by this > incorrect patch, the followup fix that repairs it has been backported > to -stable, breaking all the formerly working platforms incorporating > this PHY that describe the mode as 'rgmii' instead of 'rgmii-id'. I > have argued that the backport of the followup fix should be reverted, > given that there is a risk that production systems tracking a -stable > release may be affected by this if they don't take their DT directly > from the upstream kernel. > > I think this PR is fine, though - fixing the DTs going forward is > needed in any case (although I think backporting DT changes to -stable > is a bad idea but that is just my opinion) Hmm, that sounds much worse than I understood first. I think the mainline driver needs to be patched back to restore the previous behavior, and that backported to stable kernels if they are already broken by the backport. While I had already pulled these fixes, I'm dropping them now. It should have been clear that this kind of driver change is not acceptable. Arnd
On Thu, 29 Oct 2020 at 23:56, Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 11:03 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 29 Oct 2020 at 22:41, Arnd Bergmann <arnd@kernel.org> wrote: > > > On Thu, Oct 29, 2020 at 10:23 PM Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Thu, Oct 29, 2020 at 8:06 PM Maxime Ripard <mripard@kernel.org> wrote: > > > > > > > > > > ---------------------------------------------------------------- > > > > > Mostly some fixes for a fallout in a PHY driver that pointed out errors > > > > > in our DTs. > > > > > > > > Can you clarify what this means for compatibility of the dtb files? > > > > > > > > I would assume that the modified .dts files all work on older kernels > > > > because you fix the setting, but at least some of them require > > > > the patch with newer kernels, right? > > > > > > > > Are they all broken without the change? Are other platforms > > > > likely to suffer the same problem? IIRC seems that at least > > > > the SynQuacer box had the same issue, but I don't yet > > > > understand how common the problem is. > > > > > > To clarify: I had pulled the branch already when I replied, but the > > > automated email for that apparently did not trigger. > > > > > > I would like to know the background here mainly so I can put > > > it into my pull request to Linus. > > > > > > > The Realtek PHY driver used to ignore the TX/RX delay settings implied > > by the xxx in the the different rgmii-xxx phy-mode settings, and a > > failed attempt was made in the v5.2 timeframe to change this, and so > > the -xxx part has been effectively ignored all this time, meaning you > > could get away with providing the wrong value. > > > > Even though no platform appears to have been affected by this > > incorrect patch, the followup fix that repairs it has been backported > > to -stable, breaking all the formerly working platforms incorporating > > this PHY that describe the mode as 'rgmii' instead of 'rgmii-id'. I > > have argued that the backport of the followup fix should be reverted, > > given that there is a risk that production systems tracking a -stable > > release may be affected by this if they don't take their DT directly > > from the upstream kernel. > > > > I think this PR is fine, though - fixing the DTs going forward is > > needed in any case (although I think backporting DT changes to -stable > > is a bad idea but that is just my opinion) > > Hmm, that sounds much worse than I understood first. I think the > mainline driver needs to be patched back to restore the previous > behavior, and that backported to stable kernels if they are already > broken by the backport. > > While I had already pulled these fixes, I'm dropping them now. > It should have been clear that this kind of driver change is not > acceptable. > This is what Ilias and I tried to argue [0] earlier today, but Andrew has a different view. [0] https://lore.kernel.org/netdev/CAMj1kXGQDeOGj+2+tMnPhjoPJRX+eTh8-94yaH_bGwDATL7pkg@mail.gmail.com/
On Thu, Oct 29, 2020 at 11:59 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > This is what Ilias and I tried to argue [0] earlier today, but Andrew > has a different view. > > [0] https://lore.kernel.org/netdev/CAMj1kXGQDeOGj+2+tMnPhjoPJRX+eTh8-94yaH_bGwDATL7pkg@mail.gmail.com/ Could we work around it in by updating the DT contents at boot time before it gets to the driver? We've done that in the past on specific machines to avoid breaking DT compatibility, though I don't like the idea of doing it for everyone. Is there a safe fallback? I.e. would it break other platforms if we were to always replace phy-mode="rgmii" with phy-mode="rgmii-id"? I suppose we have no way of detecting the actual phy from early DT code in order to figure out if the device behind the mdio bus is affected. The best we could do might be either a list of affected boards or a list of affected ethernet controllers for which this gets updated. Arnd
On Fri, 30 Oct 2020 at 10:06, Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Oct 29, 2020 at 11:59 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > This is what Ilias and I tried to argue [0] earlier today, but Andrew > > has a different view. > > > > [0] https://lore.kernel.org/netdev/CAMj1kXGQDeOGj+2+tMnPhjoPJRX+eTh8-94yaH_bGwDATL7pkg@mail.gmail.com/ > > Could we work around it in by updating the DT contents at boot time before it > gets to the driver? We've done that in the past on specific machines to avoid > breaking DT compatibility, though I don't like the idea of doing it > for everyone. > > Is there a safe fallback? I.e. would it break other platforms if we were to > always replace phy-mode="rgmii" with phy-mode="rgmii-id"? I suppose we > have no way of detecting the actual phy from early DT code in order to > figure out if the device behind the mdio bus is affected. The best we could > do might be either a list of affected boards or a list of affected ethernet > controllers for which this gets updated. > The root of the issue is that the followup fix that was backported overrides the TX/RX delay setting configured by h/w straps. So 'rgmii' used to mean 'don't enable TX/RX delay in software' but now it means 'disable TX/RX delay in software even if it was enabled using h/w straps'. Omitting the phy-mode string entirely would help, as the PHY layer interprets this as 'leave the delay setting alone'. However, there are cases where the MAC driver may interpret this property as well, as it may have to apply the delay at the MAC end if the PHY is programmed for no delay. Also, it may need to distinguish between gmii and rgmii. So we cannot replace rgmii with anything else in the general case, the only thing we can do is leave the TX/RX delay setting alone if we detect that it is enabled by h/w straps. According to Andrew, this is possible, and he is willing to accept a patch that implements this for backporting to -stable, but otherwise treats it as someone else's problem.
On Fri, Oct 30, 2020 at 10:14 AM Ard Biesheuvel <ardb@kernel.org> wrote: > On Fri, 30 Oct 2020 at 10:06, Arnd Bergmann <arnd@kernel.org> wrote: > > > > On Thu, Oct 29, 2020 at 11:59 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > This is what Ilias and I tried to argue [0] earlier today, but Andrew > > > has a different view. > > > > > > [0] https://lore.kernel.org/netdev/CAMj1kXGQDeOGj+2+tMnPhjoPJRX+eTh8-94yaH_bGwDATL7pkg@mail.gmail.com/ > > > > Could we work around it in by updating the DT contents at boot time before it > > gets to the driver? We've done that in the past on specific machines to avoid > > breaking DT compatibility, though I don't like the idea of doing it > > for everyone. > > > > Is there a safe fallback? I.e. would it break other platforms if we were to > > always replace phy-mode="rgmii" with phy-mode="rgmii-id"? I suppose we > > have no way of detecting the actual phy from early DT code in order to > > figure out if the device behind the mdio bus is affected. The best we could > > do might be either a list of affected boards or a list of affected ethernet > > controllers for which this gets updated. > > > > The root of the issue is that the followup fix that was backported > overrides the TX/RX delay setting configured by h/w straps. > > So 'rgmii' used to mean 'don't enable TX/RX delay in software' but now > it means 'disable TX/RX delay in software even if it was enabled using > h/w straps'. > > Omitting the phy-mode string entirely would help, as the PHY layer > interprets this as 'leave the delay setting alone'. However, there are > cases where the MAC driver may interpret this property as well, as it > may have to apply the delay at the MAC end if the PHY is programmed > for no delay. Also, it may need to distinguish between gmii and rgmii. > > So we cannot replace rgmii with anything else in the general case, the > only thing we can do is leave the TX/RX delay setting alone if we > detect that it is enabled by h/w straps. According to Andrew, this is > possible, and he is willing to accept a patch that implements this for > backporting to -stable, but otherwise treats it as someone else's > problem. So I suppose the best workaround we can hope for would be to have a list of board compatible strings on which we remove any phy-mode="rgmii" properties but leave everything alone? We shouldn't have different behavior between stable kernels and future kernels looking at the same DTB, so I'd hope that would work in all combinations. Arnd
On Fri, 30 Oct 2020 at 10:45, Arnd Bergmann <arnd@kernel.org> wrote: > > On Fri, Oct 30, 2020 at 10:14 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 30 Oct 2020 at 10:06, Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > On Thu, Oct 29, 2020 at 11:59 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > > > This is what Ilias and I tried to argue [0] earlier today, but Andrew > > > > has a different view. > > > > > > > > [0] https://lore.kernel.org/netdev/CAMj1kXGQDeOGj+2+tMnPhjoPJRX+eTh8-94yaH_bGwDATL7pkg@mail.gmail.com/ > > > > > > Could we work around it in by updating the DT contents at boot time before it > > > gets to the driver? We've done that in the past on specific machines to avoid > > > breaking DT compatibility, though I don't like the idea of doing it > > > for everyone. > > > > > > Is there a safe fallback? I.e. would it break other platforms if we were to > > > always replace phy-mode="rgmii" with phy-mode="rgmii-id"? I suppose we > > > have no way of detecting the actual phy from early DT code in order to > > > figure out if the device behind the mdio bus is affected. The best we could > > > do might be either a list of affected boards or a list of affected ethernet > > > controllers for which this gets updated. > > > > > > > The root of the issue is that the followup fix that was backported > > overrides the TX/RX delay setting configured by h/w straps. > > > > So 'rgmii' used to mean 'don't enable TX/RX delay in software' but now > > it means 'disable TX/RX delay in software even if it was enabled using > > h/w straps'. > > > > Omitting the phy-mode string entirely would help, as the PHY layer > > interprets this as 'leave the delay setting alone'. However, there are > > cases where the MAC driver may interpret this property as well, as it > > may have to apply the delay at the MAC end if the PHY is programmed > > for no delay. Also, it may need to distinguish between gmii and rgmii. > > > > So we cannot replace rgmii with anything else in the general case, the > > only thing we can do is leave the TX/RX delay setting alone if we > > detect that it is enabled by h/w straps. According to Andrew, this is > > possible, and he is willing to accept a patch that implements this for > > backporting to -stable, but otherwise treats it as someone else's > > problem. > > So I suppose the best workaround we can hope for would be > to have a list of board compatible strings on which we remove > any phy-mode="rgmii" properties but leave everything alone? > > We shouldn't have different behavior between stable kernels and > future kernels looking at the same DTB, so I'd hope that would > work in all combinations. > I think it would be better to redefine 'rgmii' as 'delay unspecified', and add 'rgmii-noid' which means that tx/rx delay should be disabled even when it is set by straps. The reason is that I don't think the latter will ever be used in practice - TX/RX delay is a h/w integration choice, and I don't see why you would enable it in hardware only to disable it again in software. But of course, I'm not really a networking person :-)
On Fri, Oct 30, 2020 at 11:03 AM Ard Biesheuvel <ardb@kernel.org> wrote: > On Fri, 30 Oct 2020 at 10:45, Arnd Bergmann <arnd@kernel.org> wrote: > > > > So I suppose the best workaround we can hope for would be > > to have a list of board compatible strings on which we remove > > any phy-mode="rgmii" properties but leave everything alone? > > > > We shouldn't have different behavior between stable kernels and > > future kernels looking at the same DTB, so I'd hope that would > > work in all combinations. > > I think it would be better to redefine 'rgmii' as 'delay unspecified', > and add 'rgmii-noid' which means that tx/rx delay should be disabled > even when it is set by straps. The reason is that I don't think the > latter will ever be used in practice - TX/RX delay is a h/w > integration choice, and I don't see why you would enable it in > hardware only to disable it again in software. But of course, I'm not > really a networking person :-) I think the meaning of "rgmii" would need to be "phy driver specific delay for backwards compatibility", but I'm not sure how many extra strings need to be defined, possibly three, to mean: a. rgmii, but leave delay unchanged from firmware b. rgmii, but use delay as configured from strapping pins c. rgmii, but turn off internal delay I also wonder if there could be more than one of these in order to work with older kernels that do not understand the new strings. How would existing kernels deal with this? phy-mode = "rgmii", "rgmii-noid"; (note that this would be slightly backwards, using the more generic string before the more specific one). Arnd
On Fri, 30 Oct 2020 at 11:15, Arnd Bergmann <arnd@kernel.org> wrote: > > On Fri, Oct 30, 2020 at 11:03 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 30 Oct 2020 at 10:45, Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > > So I suppose the best workaround we can hope for would be > > > to have a list of board compatible strings on which we remove > > > any phy-mode="rgmii" properties but leave everything alone? > > > > > > We shouldn't have different behavior between stable kernels and > > > future kernels looking at the same DTB, so I'd hope that would > > > work in all combinations. > > > > I think it would be better to redefine 'rgmii' as 'delay unspecified', > > and add 'rgmii-noid' which means that tx/rx delay should be disabled > > even when it is set by straps. The reason is that I don't think the > > latter will ever be used in practice - TX/RX delay is a h/w > > integration choice, and I don't see why you would enable it in > > hardware only to disable it again in software. But of course, I'm not > > really a networking person :-) > > I think the meaning of "rgmii" would need to be "phy driver specific > delay for backwards compatibility", but I'm not sure how many extra > strings need to be defined, possibly three, to mean: > > a. rgmii, but leave delay unchanged from firmware > b. rgmii, but use delay as configured from strapping pins > c. rgmii, but turn off internal delay > > I also wonder if there could be more than one of these in order > to work with older kernels that do not understand the new strings. > How would existing kernels deal with this? > > phy-mode = "rgmii", "rgmii-noid"; > > (note that this would be slightly backwards, using the more generic > string before the more specific one). > I think this is all making it needlessly complicated, especially because it is really only needed to fix problems created by a patch that was backported without any evidence that it fixes any issues in practice. Basically, the Realtek PHY driver has always ignored the TX/RX delay settings given by software, even after the original fix went in for v5.2. Therefore, the followup fix fixed a theoretical issue, which could never have been the cause of a regression because the first fix was broken in the first place. Adhering strictly to the PHY mode going forward is fine. Backporting a change that knowingly breaks platforms, without evidence that it actually fixes any is just another symptom of our -stable disease, where everything that applies lexically is considered for -stable, without regards for what the change actually does.
From: Arnd Bergmann <arnd@arndb.de> On Thu, 29 Oct 2020 20:06:01 +0100, Maxime Ripard wrote: > Please pull the following changes for the current release. > > Thanks! > Maxime > > The following changes since commit 3650b228f83adda7e5ee532e2b90429c03f7b9ec: > > Linux 5.10-rc1 (2020-10-25 15:14:11 -0700) > > [...] Merged into arm/fixes, thanks! merge commit: b57d5437e3740bffed60ceedf74f881ab5bd6122 Arnd
On Fri, Oct 30, 2020 at 11:29 AM Ard Biesheuvel <ardb@kernel.org> wrote: > On Fri, 30 Oct 2020 at 11:15, Arnd Bergmann <arnd@kernel.org> wrote: > > On Fri, Oct 30, 2020 at 11:03 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > On Fri, 30 Oct 2020 at 10:45, Arnd Bergmann <arnd@kernel.org> wrote: > > > > > > > > So I suppose the best workaround we can hope for would be > > > > to have a list of board compatible strings on which we remove > > > > any phy-mode="rgmii" properties but leave everything alone? > > > > > > > > We shouldn't have different behavior between stable kernels and > > > > future kernels looking at the same DTB, so I'd hope that would > > > > work in all combinations. > > > > > > I think it would be better to redefine 'rgmii' as 'delay unspecified', > > > and add 'rgmii-noid' which means that tx/rx delay should be disabled > > > even when it is set by straps. The reason is that I don't think the > > > latter will ever be used in practice - TX/RX delay is a h/w > > > integration choice, and I don't see why you would enable it in > > > hardware only to disable it again in software. But of course, I'm not > > > really a networking person :-) > > > > I think the meaning of "rgmii" would need to be "phy driver specific > > delay for backwards compatibility", but I'm not sure how many extra > > strings need to be defined, possibly three, to mean: > > > > a. rgmii, but leave delay unchanged from firmware > > b. rgmii, but use delay as configured from strapping pins > > c. rgmii, but turn off internal delay > > > > I also wonder if there could be more than one of these in order > > to work with older kernels that do not understand the new strings. > > How would existing kernels deal with this? > > > > phy-mode = "rgmii", "rgmii-noid"; > > > > (note that this would be slightly backwards, using the more generic > > string before the more specific one). > > > > I think this is all making it needlessly complicated, especially > because it is really only needed to fix problems created by a patch > that was backported without any evidence that it fixes any issues in > practice. > > Basically, the Realtek PHY driver has always ignored the TX/RX delay > settings given by software, even after the original fix went in for > v5.2. Therefore, the followup fix fixed a theoretical issue, which > could never have been the cause of a regression because the first fix > was broken in the first place. > > Adhering strictly to the PHY mode going forward is fine. Backporting a > change that knowingly breaks platforms, without evidence that it > actually fixes any is just another symptom of our -stable disease, > where everything that applies lexically is considered for -stable, > without regards for what the change actually does. Sorry for delaying this, I ended up merging the fixes now as it seems worse to not have it fixed in v5.10. I actually still disagree with the driver fix even in mainline, we should really not break existing systems that are deployed with dtb files that only work by accident. Having the driver work the same way in stable and in mainline doesn't bother me as long as it doesn't regress but only warns about cases that are possibly broken. Arnd