Message ID | 1432658616-13733-1-git-send-email-l.stach@pengutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: > The GPC rewrite to IRQ domains has been on the premise that it may break > suspend/resume for new kernels on old DT, but otherwise keep things working > from a user perspective. This was an accepted compromise to be able to move > the GIC cleanup forward. > > What actually happened was that booting a new kernel on an old DT crashes > before even the console is up, so the user does not even see the warning > that the DT is too old. The warning message suggests that this has been > known before, which is clearly unacceptable. To see any early message like this one, low-level debug support is expected to be turned on. But anyway, with your change, we get a much better situation. Will send it to arm-soc soon. Thanks for the patch. Shawn > > Fix the early crash by mapping the GPC memory space if the IRQ controller > doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup > workarounds working. With this fixed the system is able to boot up > properly minus the expected suspend/resume breakage. > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de> > --- > Shawn, this is a critical fix for 4.1, to keep the combination of old > DT + new kernel at least somewhat working, to allow users to resolve > the situation on their systems. Please make sure this gets forwarded > ASAP. > --- > arch/arm/mach-imx/gpc.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c > index 4d60005e9277..bbf015056221 100644 > --- a/arch/arm/mach-imx/gpc.c > +++ b/arch/arm/mach-imx/gpc.c > @@ -280,9 +280,15 @@ void __init imx_gpc_check_dt(void) > struct device_node *np; > > np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc"); > - if (WARN_ON(!np || > - !of_find_property(np, "interrupt-controller", NULL))) > - pr_warn("Outdated DT detected, system is about to crash!!!\n"); > + if (WARN_ON(!np)) > + return; > + > + if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) { > + pr_warn("Outdated DT detected, suspend/resume will NOT work\n"); > + > + /* map GPC, so that at least CPUidle and WARs keep working */ > + gpc_base = of_iomap(np, 0); > + } > } > > #ifdef CONFIG_PM_GENERIC_DOMAINS > -- > 2.1.4 >
Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: > On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: > > The GPC rewrite to IRQ domains has been on the premise that it may break > > suspend/resume for new kernels on old DT, but otherwise keep things working > > from a user perspective. This was an accepted compromise to be able to move > > the GIC cleanup forward. > > > > What actually happened was that booting a new kernel on an old DT crashes > > before even the console is up, so the user does not even see the warning > > that the DT is too old. The warning message suggests that this has been > > known before, which is clearly unacceptable. > > To see any early message like this one, low-level debug support is > expected to be turned on. > Using low-level debug might be acceptable for a developer. From a user perspective a kernel update without a corresponding DT update should never render the machine completely broken. Keep in mind that i.MX6 isn't only used in deeply embedded system, but is already in devices where non-developer users might update the kernel. They might even use prebuilt kernels where enabling low-level debug is not an option. We are not free to break the existing DT stability rules in such a drastic way, especially if keeping things working to some extent is easily done. I have another patch to fix a power domain problem with new kernel on old DT. Will send out in a minute. Regards, Lucas > But anyway, with your change, we get a much better situation. Will send > it to arm-soc soon. Thanks for the patch. > > Shawn > > > > > Fix the early crash by mapping the GPC memory space if the IRQ controller > > doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup > > workarounds working. With this fixed the system is able to boot up > > properly minus the expected suspend/resume breakage. > > > > Signed-off-by: Lucas Stach <l.stach@pengutronix.de> > > --- > > Shawn, this is a critical fix for 4.1, to keep the combination of old > > DT + new kernel at least somewhat working, to allow users to resolve > > the situation on their systems. Please make sure this gets forwarded > > ASAP. > > --- > > arch/arm/mach-imx/gpc.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c > > index 4d60005e9277..bbf015056221 100644 > > --- a/arch/arm/mach-imx/gpc.c > > +++ b/arch/arm/mach-imx/gpc.c > > @@ -280,9 +280,15 @@ void __init imx_gpc_check_dt(void) > > struct device_node *np; > > > > np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc"); > > - if (WARN_ON(!np || > > - !of_find_property(np, "interrupt-controller", NULL))) > > - pr_warn("Outdated DT detected, system is about to crash!!!\n"); > > + if (WARN_ON(!np)) > > + return; > > + > > + if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) { > > + pr_warn("Outdated DT detected, suspend/resume will NOT work\n"); > > + > > + /* map GPC, so that at least CPUidle and WARs keep working */ > > + gpc_base = of_iomap(np, 0); > > + } > > } > > > > #ifdef CONFIG_PM_GENERIC_DOMAINS > > -- > > 2.1.4 > >
On 27/05/15 08:52, Lucas Stach wrote: > Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: >> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: >>> The GPC rewrite to IRQ domains has been on the premise that it may break >>> suspend/resume for new kernels on old DT, but otherwise keep things working >>> from a user perspective. This was an accepted compromise to be able to move >>> the GIC cleanup forward. >>> >>> What actually happened was that booting a new kernel on an old DT crashes >>> before even the console is up, so the user does not even see the warning >>> that the DT is too old. The warning message suggests that this has been >>> known before, which is clearly unacceptable. >> >> To see any early message like this one, low-level debug support is >> expected to be turned on. >> > Using low-level debug might be acceptable for a developer. > > From a user perspective a kernel update without a corresponding DT > update should never render the machine completely broken. Keep in mind > that i.MX6 isn't only used in deeply embedded system, but is already in > devices where non-developer users might update the kernel. They might > even use prebuilt kernels where enabling low-level debug is not an > option. I'd imagine that whoever provides this pre-build kernel would also deliver some form of release notes indicating the procedure. Even better, an installation script. > We are not free to break the existing DT stability rules in such a > drastic way, especially if keeping things working to some extent is > easily done. That would be on the condition that the DT was correct the first place, and accurately described the hardware. It didn't, breaking the contract we have the first place. We can argue for years about DT stability, history proves that it doesn't lead anywhere. Thanks, M.
Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier: > On 27/05/15 08:52, Lucas Stach wrote: > > Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: > >> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: > >>> The GPC rewrite to IRQ domains has been on the premise that it may break > >>> suspend/resume for new kernels on old DT, but otherwise keep things working > >>> from a user perspective. This was an accepted compromise to be able to move > >>> the GIC cleanup forward. > >>> > >>> What actually happened was that booting a new kernel on an old DT crashes > >>> before even the console is up, so the user does not even see the warning > >>> that the DT is too old. The warning message suggests that this has been > >>> known before, which is clearly unacceptable. > >> > >> To see any early message like this one, low-level debug support is > >> expected to be turned on. > >> > > Using low-level debug might be acceptable for a developer. > > > > From a user perspective a kernel update without a corresponding DT > > update should never render the machine completely broken. Keep in mind > > that i.MX6 isn't only used in deeply embedded system, but is already in > > devices where non-developer users might update the kernel. They might > > even use prebuilt kernels where enabling low-level debug is not an > > option. > > I'd imagine that whoever provides this pre-build kernel would also > deliver some form of release notes indicating the procedure. Even > better, an installation script. > > > We are not free to break the existing DT stability rules in such a > > drastic way, especially if keeping things working to some extent is > > easily done. > > That would be on the condition that the DT was correct the first place, > and accurately described the hardware. It didn't, breaking the contract > we have the first place. > > We can argue for years about DT stability, history proves that it > doesn't lead anywhere. > The fact that we are still in a learning phase with regard to DT and have made mistakes in the past (and I'm sure we will still do some in the future) isn't an excuse for not even trying to keep the stability. In this case there was a clear consent that we break DT stability by rendering suspend/resume broken. This was the accepted compromise between DT stability and the highly needed GIC cleanup. Completely breaking an entire SoC family in early boot was not part of the the compromise! Especially as it isn't really hard to keep things working to the specified extent, as this patch proves. Regards, Lucas
On 27/05/15 09:20, Lucas Stach wrote: > Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier: >> On 27/05/15 08:52, Lucas Stach wrote: >>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: >>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: >>>>> The GPC rewrite to IRQ domains has been on the premise that it may break >>>>> suspend/resume for new kernels on old DT, but otherwise keep things working >>>>> from a user perspective. This was an accepted compromise to be able to move >>>>> the GIC cleanup forward. >>>>> >>>>> What actually happened was that booting a new kernel on an old DT crashes >>>>> before even the console is up, so the user does not even see the warning >>>>> that the DT is too old. The warning message suggests that this has been >>>>> known before, which is clearly unacceptable. >>>> >>>> To see any early message like this one, low-level debug support is >>>> expected to be turned on. >>>> >>> Using low-level debug might be acceptable for a developer. >>> >>> From a user perspective a kernel update without a corresponding DT >>> update should never render the machine completely broken. Keep in mind >>> that i.MX6 isn't only used in deeply embedded system, but is already in >>> devices where non-developer users might update the kernel. They might >>> even use prebuilt kernels where enabling low-level debug is not an >>> option. >> >> I'd imagine that whoever provides this pre-build kernel would also >> deliver some form of release notes indicating the procedure. Even >> better, an installation script. >> >>> We are not free to break the existing DT stability rules in such a >>> drastic way, especially if keeping things working to some extent is >>> easily done. >> >> That would be on the condition that the DT was correct the first place, >> and accurately described the hardware. It didn't, breaking the contract >> we have the first place. >> >> We can argue for years about DT stability, history proves that it >> doesn't lead anywhere. >> > The fact that we are still in a learning phase with regard to DT and > have made mistakes in the past (and I'm sure we will still do some in > the future) isn't an excuse for not even trying to keep the stability. > > In this case there was a clear consent that we break DT stability by > rendering suspend/resume broken. This was the accepted compromise > between DT stability and the highly needed GIC cleanup. > Completely breaking an entire SoC family in early boot was not part of > the the compromise! Especially as it isn't really hard to keep things > working to the specified extent, as this patch proves. This patch series has been on the list for months, in -next for several weeks, and now in mainline for about a month. I would have appreciated your input and your patch during that time so that your pet platform would have been kept in an acceptable shape. Others have done so, you didn't. If you have such a vested interest in keeping this SoC on a smooth upgrade path, then I suggest you at least follow the various patch series potentially touching it, and contribute in a timely manner. Understand that people don't necessarily have access to your favourite piece of HW, and may overlook that kind of detail. I rely on others to let me know about it, and I encourage you to do so. Thanks, M.
Am Mittwoch, den 27.05.2015, 09:33 +0100 schrieb Marc Zyngier: > On 27/05/15 09:20, Lucas Stach wrote: > > Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier: > >> On 27/05/15 08:52, Lucas Stach wrote: > >>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: > >>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: > >>>>> The GPC rewrite to IRQ domains has been on the premise that it may break > >>>>> suspend/resume for new kernels on old DT, but otherwise keep things working > >>>>> from a user perspective. This was an accepted compromise to be able to move > >>>>> the GIC cleanup forward. > >>>>> > >>>>> What actually happened was that booting a new kernel on an old DT crashes > >>>>> before even the console is up, so the user does not even see the warning > >>>>> that the DT is too old. The warning message suggests that this has been > >>>>> known before, which is clearly unacceptable. > >>>> > >>>> To see any early message like this one, low-level debug support is > >>>> expected to be turned on. > >>>> > >>> Using low-level debug might be acceptable for a developer. > >>> > >>> From a user perspective a kernel update without a corresponding DT > >>> update should never render the machine completely broken. Keep in mind > >>> that i.MX6 isn't only used in deeply embedded system, but is already in > >>> devices where non-developer users might update the kernel. They might > >>> even use prebuilt kernels where enabling low-level debug is not an > >>> option. > >> > >> I'd imagine that whoever provides this pre-build kernel would also > >> deliver some form of release notes indicating the procedure. Even > >> better, an installation script. > >> > >>> We are not free to break the existing DT stability rules in such a > >>> drastic way, especially if keeping things working to some extent is > >>> easily done. > >> > >> That would be on the condition that the DT was correct the first place, > >> and accurately described the hardware. It didn't, breaking the contract > >> we have the first place. > >> > >> We can argue for years about DT stability, history proves that it > >> doesn't lead anywhere. > >> > > The fact that we are still in a learning phase with regard to DT and > > have made mistakes in the past (and I'm sure we will still do some in > > the future) isn't an excuse for not even trying to keep the stability. > > > > In this case there was a clear consent that we break DT stability by > > rendering suspend/resume broken. This was the accepted compromise > > between DT stability and the highly needed GIC cleanup. > > Completely breaking an entire SoC family in early boot was not part of > > the the compromise! Especially as it isn't really hard to keep things > > working to the specified extent, as this patch proves. > > This patch series has been on the list for months, in -next for several > weeks, and now in mainline for about a month. I would have appreciated > your input and your patch during that time so that your pet platform > would have been kept in an acceptable shape. Others have done so, you > didn't. > We are still in the RC phase of 4.1, so no harm has been done other than upsetting some developers and potential adventurous users. So we are still right in time to fix this. I don't have the bandwidth to test every patchset flying by and while I didn't provide an explicit ack, I was okay with it by looking at it. Honestly I trusted you that the patchset does what was stated in the commit message, which is break suspend/resume but otherwise keep things working. > If you have such a vested interest in keeping this SoC on a smooth > upgrade path, then I suggest you at least follow the various patch > series potentially touching it, and contribute in a timely manner. > Understand that people don't necessarily have access to your favourite > piece of HW, and may overlook that kind of detail. I rely on others to > let me know about it, and I encourage you to do so. > I'm fine with fixing things at the RC stages and I did not blame anyone personally. What is not okay is that the commit messages of the series didn't provide a hint for this kind of breakage. Until v6 of this series the warning message read "Outdated DT detected, suspend/resume will NOT work", only later it was changed to "Outdated DT detected, system is about to crash!!!", which seems to indicate that you were aware of the breakage, but neglected to explicitly state this in a commit message where it is easily accessible for other developers looking at the patches or resolve the problem. Also the attitude of saying "DT stability has been broken in the past so why bother with it in the future" is troubling IMHO. Regards, Lucas
On 27/05/15 10:05, Lucas Stach wrote: > Am Mittwoch, den 27.05.2015, 09:33 +0100 schrieb Marc Zyngier: >> On 27/05/15 09:20, Lucas Stach wrote: >>> Am Mittwoch, den 27.05.2015, 09:07 +0100 schrieb Marc Zyngier: >>>> On 27/05/15 08:52, Lucas Stach wrote: >>>>> Am Mittwoch, den 27.05.2015, 15:34 +0800 schrieb Shawn Guo: >>>>>> On Tue, May 26, 2015 at 06:43:36PM +0200, Lucas Stach wrote: >>>>>>> The GPC rewrite to IRQ domains has been on the premise that it may break >>>>>>> suspend/resume for new kernels on old DT, but otherwise keep things working >>>>>>> from a user perspective. This was an accepted compromise to be able to move >>>>>>> the GIC cleanup forward. >>>>>>> >>>>>>> What actually happened was that booting a new kernel on an old DT crashes >>>>>>> before even the console is up, so the user does not even see the warning >>>>>>> that the DT is too old. The warning message suggests that this has been >>>>>>> known before, which is clearly unacceptable. >>>>>> >>>>>> To see any early message like this one, low-level debug support is >>>>>> expected to be turned on. >>>>>> >>>>> Using low-level debug might be acceptable for a developer. >>>>> >>>>> From a user perspective a kernel update without a corresponding DT >>>>> update should never render the machine completely broken. Keep in mind >>>>> that i.MX6 isn't only used in deeply embedded system, but is already in >>>>> devices where non-developer users might update the kernel. They might >>>>> even use prebuilt kernels where enabling low-level debug is not an >>>>> option. >>>> >>>> I'd imagine that whoever provides this pre-build kernel would also >>>> deliver some form of release notes indicating the procedure. Even >>>> better, an installation script. >>>> >>>>> We are not free to break the existing DT stability rules in such a >>>>> drastic way, especially if keeping things working to some extent is >>>>> easily done. >>>> >>>> That would be on the condition that the DT was correct the first place, >>>> and accurately described the hardware. It didn't, breaking the contract >>>> we have the first place. >>>> >>>> We can argue for years about DT stability, history proves that it >>>> doesn't lead anywhere. >>>> >>> The fact that we are still in a learning phase with regard to DT and >>> have made mistakes in the past (and I'm sure we will still do some in >>> the future) isn't an excuse for not even trying to keep the stability. >>> >>> In this case there was a clear consent that we break DT stability by >>> rendering suspend/resume broken. This was the accepted compromise >>> between DT stability and the highly needed GIC cleanup. >>> Completely breaking an entire SoC family in early boot was not part of >>> the the compromise! Especially as it isn't really hard to keep things >>> working to the specified extent, as this patch proves. >> >> This patch series has been on the list for months, in -next for several >> weeks, and now in mainline for about a month. I would have appreciated >> your input and your patch during that time so that your pet platform >> would have been kept in an acceptable shape. Others have done so, you >> didn't. >> > We are still in the RC phase of 4.1, so no harm has been done other than > upsetting some developers and potential adventurous users. So we are > still right in time to fix this. > > I don't have the bandwidth to test every patchset flying by and while I > didn't provide an explicit ack, I was okay with it by looking at it. > Honestly I trusted you that the patchset does what was stated in the > commit message, which is break suspend/resume but otherwise keep things > working. The cover letter explicitly said that I didn't test it on such HW (only OMAP4/5 and Tegra-2). I don't have any FSL board in my zoo. >> If you have such a vested interest in keeping this SoC on a smooth >> upgrade path, then I suggest you at least follow the various patch >> series potentially touching it, and contribute in a timely manner. >> Understand that people don't necessarily have access to your favourite >> piece of HW, and may overlook that kind of detail. I rely on others to >> let me know about it, and I encourage you to do so. >> > I'm fine with fixing things at the RC stages and I did not blame anyone > personally. What is not okay is that the commit messages of the series > didn't provide a hint for this kind of breakage. > Until v6 of this series the warning message read "Outdated DT detected, > suspend/resume will NOT work", only later it was changed to "Outdated DT > detected, system is about to crash!!!", which seems to indicate that you > were aware of the breakage, but neglected to explicitly state this in a > commit message where it is easily accessible for other developers > looking at the patches or resolve the problem. You really are reading *way* too much into a message. I would have happily happily inserted a quote by Frank Zappa instead, but the meaning could have been slightly obscure. As to know about such a breakage, do you really think I care that little about the kernel to deliberately leave a bug in and paper over it? Please give me a break... > Also the attitude of saying "DT stability has been broken in the past so > why bother with it in the future" is troubling IMHO. Yes, this is a conspiracy. Read all about it. M.
diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c index 4d60005e9277..bbf015056221 100644 --- a/arch/arm/mach-imx/gpc.c +++ b/arch/arm/mach-imx/gpc.c @@ -280,9 +280,15 @@ void __init imx_gpc_check_dt(void) struct device_node *np; np = of_find_compatible_node(NULL, NULL, "fsl,imx6q-gpc"); - if (WARN_ON(!np || - !of_find_property(np, "interrupt-controller", NULL))) - pr_warn("Outdated DT detected, system is about to crash!!!\n"); + if (WARN_ON(!np)) + return; + + if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) { + pr_warn("Outdated DT detected, suspend/resume will NOT work\n"); + + /* map GPC, so that at least CPUidle and WARs keep working */ + gpc_base = of_iomap(np, 0); + } } #ifdef CONFIG_PM_GENERIC_DOMAINS
The GPC rewrite to IRQ domains has been on the premise that it may break suspend/resume for new kernels on old DT, but otherwise keep things working from a user perspective. This was an accepted compromise to be able to move the GIC cleanup forward. What actually happened was that booting a new kernel on an old DT crashes before even the console is up, so the user does not even see the warning that the DT is too old. The warning message suggests that this has been known before, which is clearly unacceptable. Fix the early crash by mapping the GPC memory space if the IRQ controller doesn't claim it. This keeps at least CPUidle and the needed CPU wakeup workarounds working. With this fixed the system is able to boot up properly minus the expected suspend/resume breakage. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> --- Shawn, this is a critical fix for 4.1, to keep the combination of old DT + new kernel at least somewhat working, to allow users to resolve the situation on their systems. Please make sure this gets forwarded ASAP. --- arch/arm/mach-imx/gpc.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)