Message ID | b80b85cd6f26d02d698463b3dd5e9783081aa780.1510176592.git.hns@goldelico.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* H. Nikolaus Schaller <hns@goldelico.com> [171108 21:32]: > commit d2b310b0234c ("ARM: debug: Use generic 8250 debug_ll for omap2 and omap3/4/5 common uarts") > commit fc23beb8a577 ("ARM: debug: Use generic 8250 debug_ll for omap3/4/5") > > switched to generic 8250 debug_ll code which seems to be incompatible > with at least OMAP5 boards (OMAP5EVM, Pyra) if CONFIG_DEBUG_LL is > still enabled in some legacy defconfig. Since this is very hard to > relate to these patches and difficult to identify, let's have the > compiler emit a warning. Hmm the issue is the existing values in .config as Kconfig.debug does default XXX if DEBUG_XXX, right? I think we have this issue in general if you enable DEBUG_LL for a SoC, then enable it for another SoC using the same .config? Regards, Tony
On Wed, Nov 08, 2017 at 02:27:55PM -0800, Tony Lindgren wrote: > * H. Nikolaus Schaller <hns@goldelico.com> [171108 21:32]: > > commit d2b310b0234c ("ARM: debug: Use generic 8250 debug_ll for omap2 and omap3/4/5 common uarts") > > commit fc23beb8a577 ("ARM: debug: Use generic 8250 debug_ll for omap3/4/5") > > > > switched to generic 8250 debug_ll code which seems to be incompatible > > with at least OMAP5 boards (OMAP5EVM, Pyra) if CONFIG_DEBUG_LL is > > still enabled in some legacy defconfig. Since this is very hard to > > relate to these patches and difficult to identify, let's have the > > compiler emit a warning. > > Hmm the issue is the existing values in .config as Kconfig.debug > does default XXX if DEBUG_XXX, right? > > I think we have this issue in general if you enable DEBUG_LL for > a SoC, then enable it for another SoC using the same .config? As I keep saying, DEBUG_LL is the low level debugging code, which is there to support people trying to debug the early bringup. It can only be configured for use with one particular port. If that doesn't match the SoC, then, if you make use of the DEBUG_LL code, it will fail to boot. Unfortunately, if you use early_printk rather than earlycon, as early_printk uses the DEBUG_LL bits, it means that your early_printk kernel is tied to the selected platform. All well known issues. early_printk should not be used in a multi- platform kernel for this very reason. We don't need a compiler warning there, we probably need better help text against DEBUG_LL and against EARLY_PRINTK.
On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: > We don't need a compiler warning there, we probably need better help > text against DEBUG_LL and against EARLY_PRINTK. Actually, this is already clearly stated against DEBUG_LL: Note that selecting this option will limit the kernel to a single UART definition, as specified below. Attempting to boot the kernel image on a different platform *will not work*, so this option should not be enabled for kernels that are intended to be portable. and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't read help texts anymore.
Hi Russel, > Am 08.11.2017 um 23:38 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > > On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: >> We don't need a compiler warning there, we probably need better help >> text against DEBUG_LL and against EARLY_PRINTK. > > Actually, this is already clearly stated against DEBUG_LL: > > Note that selecting this option will limit the kernel to a single > UART definition, as specified below. Attempting to boot the kernel > image on a different platform *will not work*, so this option should > not be enabled for kernels that are intended to be portable. > > and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't > read help texts anymore. Yes, we read, but we face a situation where it doesn't help that it is a well known problem *for you* or anyone else and that it is described in the help. We simply had a private defconfig for years and someone enabled DEBUG_LL some years ago and it worked. Then we upgrade the kernel for every -rc and it continued to work well. Then, recently when doing just one update the compiled kernel failed to boot. Nobody did change and look at the defconfig and its help. We just merged the latest patches from linus/master. Do you expect anybody to remember after 3 years of *not* changing DEBUG_LL what was written in a help message? Or to understand that DEBUG_LL has anything to do with the mysteriously appearing boot problem? Which can't easily be debugged since there are no messages despite playing with EARLY_PRINTK/EARLYCON? It took me several hours of git bisect to find out the problematic commits and needed help by the author to relate it to DEBUG_LL in the defconfig. So we are coming from the other end with much less knowledge than you have. Linux has grown to be a beast that only a handful of people fully understand. And this patch is to help those poor production kernel maintainers like me who don't... A technically different solution would be to automatically disable DEBUG_LL if it is not compatible with some other setting. BTW it appears that there was similar code elsewhere long time ago: http://elixir.free-electrons.com/linux/v3.1.10/source/arch/arm/plat-mxc/include/mach/debug-macro.S BR and thanks, Nikolaus Schaller
On Thu, Nov 09, 2017 at 06:44:49AM +0100, H. Nikolaus Schaller wrote: > Hi Russel, Nikolus, > > Am 08.11.2017 um 23:38 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > > > > On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: > >> We don't need a compiler warning there, we probably need better help > >> text against DEBUG_LL and against EARLY_PRINTK. > > > > Actually, this is already clearly stated against DEBUG_LL: > > > > Note that selecting this option will limit the kernel to a single > > UART definition, as specified below. Attempting to boot the kernel > > image on a different platform *will not work*, so this option should > > not be enabled for kernels that are intended to be portable. > > > > and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't > > read help texts anymore. > > Yes, we read, but we face a situation where it doesn't help that it is a > well known problem *for you* or anyone else and that it is described in > the help. Personally, I'd like early_printk not to be re-using this, but others disagree. It's all about knowledge and compromise. What we don't want is more warnings in the kernel - it's already hard enough for people to spot the "bad" warnings that they need to fix to have a working system (such as wrong printf specifiers) that (a) they're not going to spot this #warning and (b) it's just going to be more noise for those who do try to spot the warnings. > Do you expect anybody to remember after 3 years of *not* changing DEBUG_LL > what was written in a help message? Or to understand that DEBUG_LL has > anything to do with the mysteriously appearing boot problem? Which can't > easily be debugged since there are no messages despite playing with > EARLY_PRINTK/EARLYCON? Do you expect people to read the build log of their kernel and spot the additional warning? You might be on the ball enough to do that, but not everyone does, or is. Not everyone logs the output of the kernel build so they can review it later. I suspect that many just set the build going in a terminal, walk away and come back sometime later when they think it's done. (This view is based upon the number of warnings that I've seen crop up that are for things that are real bugs that others have introduced - had the warnings been spotted, it would've triggered a "oh, that's not right" moment.)
Hi Russel, > Am 09.11.2017 um 17:58 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > > On Thu, Nov 09, 2017 at 06:44:49AM +0100, H. Nikolaus Schaller wrote: >> Hi Russel, > > Nikolus, > >>> Am 08.11.2017 um 23:38 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: >>> >>> On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: >>>> We don't need a compiler warning there, we probably need better help >>>> text against DEBUG_LL and against EARLY_PRINTK. >>> >>> Actually, this is already clearly stated against DEBUG_LL: >>> >>> Note that selecting this option will limit the kernel to a single >>> UART definition, as specified below. Attempting to boot the kernel >>> image on a different platform *will not work*, so this option should >>> not be enabled for kernels that are intended to be portable. >>> >>> and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't >>> read help texts anymore. >> >> Yes, we read, but we face a situation where it doesn't help that it is a >> well known problem *for you* or anyone else and that it is described in >> the help. > > Personally, I'd like early_printk not to be re-using this, but others > disagree. It's all about knowledge and compromise. Sure, we have to make a lot of compromises... > > What we don't want is more warnings in the kernel - it's already > hard enough for people to spot the "bad" warnings that they need to > fix to have a working system (such as wrong printf specifiers) that > (a) they're not going to spot this #warning and (b) it's just going > to be more noise for those who do try to spot the warnings. > >> Do you expect anybody to remember after 3 years of *not* changing DEBUG_LL >> what was written in a help message? Or to understand that DEBUG_LL has >> anything to do with the mysteriously appearing boot problem? Which can't >> easily be debugged since there are no messages despite playing with >> EARLY_PRINTK/EARLYCON? > > Do you expect people to read the build log of their kernel and spot > the additional warning? I am doing it and I rarely see warnings by the compiler. Maybe sometimes during the -rc phase but not in stable. > You might be on the ball enough to do that, > but not everyone does, or is. Not everyone logs the output of the > kernel build so they can review it later. I suspect that many just > set the build going in a terminal, walk away and come back sometime > later when they think it's done. In this case they would have a good trigger and reason to look into the build log: the kernel they just built isn't booting and doesn't tell anything. Then I guess for many of us the next logical step would be to look trough the build log (but not through a defconfig that wasn't touched for ages). Well, I have to admit that if someone doesn't try a clean build, it will not see it again if the initial log wasn't captured somewhere... So it should more be an #error than a #warning. Then people must notice. But I don't know if there are situations where this would be too strict. And in this special case someone may even have the idea to enable DEBUG_LL to get better error reporting - but find it is already enabled. Not knowing that this is the reason of the problem. This is what IMHO makes this case very special. Or the other option would be to automatically disable DEBUG_LL if it conflicts. Then, there is no need for a warning that could be ignored. > (This view is based upon the number > of warnings that I've seen crop up that are for things that are real > bugs that others have introduced - had the warnings been spotted, it > would've triggered a "oh, that's not right" moment.) BR and thanks, Nikolaus
Hi Nikkylos, On Thu, Nov 09, 2017 at 06:16:29PM +0100, H. Nikolaus Schaller wrote: > Hi Russel, > > > Am 09.11.2017 um 17:58 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > > > > On Thu, Nov 09, 2017 at 06:44:49AM +0100, H. Nikolaus Schaller wrote: > >> Hi Russel, > > > > Nikolus, > > > >>> Am 08.11.2017 um 23:38 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > >>> > >>> On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: > >>>> We don't need a compiler warning there, we probably need better help > >>>> text against DEBUG_LL and against EARLY_PRINTK. > >>> > >>> Actually, this is already clearly stated against DEBUG_LL: > >>> > >>> Note that selecting this option will limit the kernel to a single > >>> UART definition, as specified below. Attempting to boot the kernel > >>> image on a different platform *will not work*, so this option should > >>> not be enabled for kernels that are intended to be portable. > >>> > >>> and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't > >>> read help texts anymore. > >> > >> Yes, we read, but we face a situation where it doesn't help that it is a > >> well known problem *for you* or anyone else and that it is described in > >> the help. > > > > Personally, I'd like early_printk not to be re-using this, but others > > disagree. It's all about knowledge and compromise. > > Sure, we have to make a lot of compromises... > > > > > What we don't want is more warnings in the kernel - it's already > > hard enough for people to spot the "bad" warnings that they need to > > fix to have a working system (such as wrong printf specifiers) that > > (a) they're not going to spot this #warning and (b) it's just going > > to be more noise for those who do try to spot the warnings. > > > > >> Do you expect anybody to remember after 3 years of *not* changing DEBUG_LL > >> what was written in a help message? Or to understand that DEBUG_LL has > >> anything to do with the mysteriously appearing boot problem? Which can't > >> easily be debugged since there are no messages despite playing with > >> EARLY_PRINTK/EARLYCON? > > > > Do you expect people to read the build log of their kernel and spot > > the additional warning? > > I am doing it and I rarely see warnings by the compiler. Maybe sometimes > during the -rc phase but not in stable. The warnings you see are compiler specific, and I hardly ever see builds that are fully clean. > > You might be on the ball enough to do that, > > but not everyone does, or is. Not everyone logs the output of the > > kernel build so they can review it later. I suspect that many just > > set the build going in a terminal, walk away and come back sometime > > later when they think it's done. > > In this case they would have a good trigger and reason to look into the > build log: the kernel they just built isn't booting and doesn't tell > anything. Then I guess for many of us the next logical step would be to > look trough the build log (but not through a defconfig that wasn't touched > for ages). > > Well, I have to admit that if someone doesn't try a clean build, it > will not see it again if the initial log wasn't captured somewhere... > > So it should more be an #error than a #warning. Then people must notice. > But I don't know if there are situations where this would be too strict. There are - it's perfectly valid to have your kernel, but with DEBUG_LL enabled and configured correctly so it outputs to a real UART on your platform. As I've explained already, DEBUG_LL exists as a way of last resort to solve the "it doesn't boot" problem, because it gives a way for the early assembler to be instrumented and debugged before anything else is up, through the kernel boot sequence and into the C code. That is its primary purpose. Adding a #error or #warning is going to turn people who most need the infrastructure to solve the "it silently doesn't boot" away. Unfortunately, DEBUG_LL is one of those things that is platform specific, it has to be to guarantee output, so situations such as missing DT can still produce output. Unfortunately, it got used for early_printk because it was easier to add early_printk rather than telling people to add a printascii() call in kernel/printk/printk.c (which is what I used to tell people, and what I _still_ do today.) DEBUG_LL has always been intended for this purpose, and it's under the "debugging" menus, because it's really not meant to be enabled for production. If you have folk who have enabled DEBUG_LL in your configurations, and then left it enabled beyond this level of debugging, I have to question the competence of that. DEBUG_LL may be an extreme example of a config option that can land your kernel in a non-bootable situation, but there are plenty of other configuration options that will severely degrade the performance of your kernel. Should we go around adding #warning's for those as well? I know that the kernel configuration is hard - and it's hard because it's large, but it is worth reviewing the options that you have in your config from time to time, especially the debug options, and question whether the debug options that are enabled are appropriate for the phase of development. So, if you're doing new development, you should have all the options that add extra checks enabled (such as lockdep, hang checks, etc). If you're moving towards the end of the development phase, and you care about performance, then you need to start turning these options off, because many of them will slow the kernel down. The configuration of debug options is not something that should be static. I suspect that you probably have a lot of debug options enabled in your kernel that affect it's performance... maybe you're due to review them as a whole?
Hi Russell, > Am 09.11.2017 um 18:35 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: > > Hi Nikkylos, > > On Thu, Nov 09, 2017 at 06:16:29PM +0100, H. Nikolaus Schaller wrote: >> Hi Russel, >> >>> Am 09.11.2017 um 17:58 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: >>> >>> On Thu, Nov 09, 2017 at 06:44:49AM +0100, H. Nikolaus Schaller wrote: >>>> Hi Russel, >>> >>> Nikolus, >>> >>>>> Am 08.11.2017 um 23:38 schrieb Russell King - ARM Linux <linux@armlinux.org.uk>: >>>>> >>>>> On Wed, Nov 08, 2017 at 10:36:04PM +0000, Russell King - ARM Linux wrote: >>>>>> We don't need a compiler warning there, we probably need better help >>>>>> text against DEBUG_LL and against EARLY_PRINTK. >>>>> >>>>> Actually, this is already clearly stated against DEBUG_LL: >>>>> >>>>> Note that selecting this option will limit the kernel to a single >>>>> UART definition, as specified below. Attempting to boot the kernel >>>>> image on a different platform *will not work*, so this option should >>>>> not be enabled for kernels that are intended to be portable. >>>>> >>>>> and since EARLY_PRINTK depends on DEBUG_LL... I guess people don't >>>>> read help texts anymore. >>>> >>>> Yes, we read, but we face a situation where it doesn't help that it is a >>>> well known problem *for you* or anyone else and that it is described in >>>> the help. >>> >>> Personally, I'd like early_printk not to be re-using this, but others >>> disagree. It's all about knowledge and compromise. >> >> Sure, we have to make a lot of compromises... >> >>> >>> What we don't want is more warnings in the kernel - it's already >>> hard enough for people to spot the "bad" warnings that they need to >>> fix to have a working system (such as wrong printf specifiers) that >>> (a) they're not going to spot this #warning and (b) it's just going >>> to be more noise for those who do try to spot the warnings. >> >>> >>>> Do you expect anybody to remember after 3 years of *not* changing DEBUG_LL >>>> what was written in a help message? Or to understand that DEBUG_LL has >>>> anything to do with the mysteriously appearing boot problem? Which can't >>>> easily be debugged since there are no messages despite playing with >>>> EARLY_PRINTK/EARLYCON? >>> >>> Do you expect people to read the build log of their kernel and spot >>> the additional warning? >> >> I am doing it and I rarely see warnings by the compiler. Maybe sometimes >> during the -rc phase but not in stable. > > The warnings you see are compiler specific, and I hardly ever see builds > that are fully clean. I can just contribute my experience that mine are usually. > >>> You might be on the ball enough to do that, >>> but not everyone does, or is. Not everyone logs the output of the >>> kernel build so they can review it later. I suspect that many just >>> set the build going in a terminal, walk away and come back sometime >>> later when they think it's done. >> >> In this case they would have a good trigger and reason to look into the >> build log: the kernel they just built isn't booting and doesn't tell >> anything. Then I guess for many of us the next logical step would be to >> look trough the build log (but not through a defconfig that wasn't touched >> for ages). >> >> Well, I have to admit that if someone doesn't try a clean build, it >> will not see it again if the initial log wasn't captured somewhere... >> >> So it should more be an #error than a #warning. Then people must notice. >> But I don't know if there are situations where this would be too strict. > > There are - it's perfectly valid to have your kernel, but with DEBUG_LL > enabled and configured correctly so it outputs to a real UART on your > platform. As I've explained already, DEBUG_LL exists as a way of last > resort to solve the "it doesn't boot" problem, because it gives a > way for the early assembler to be instrumented and debugged before > anything else is up, through the kernel boot sequence and into the C > code. Indeed - except that DEBUG_LL can be the reason for the "it doesn't boot" which I assume nobody expects. t would be like placing a voltmeter somewhere and you have power outage. > > That is its primary purpose. Adding a #error or #warning is going to > turn people who most need the infrastructure to solve the "it silently > doesn't boot" away. This confirms my assumption that a #warning seems more appropriate than an #error. You can ignore it if you know what you are doing. > > Unfortunately, DEBUG_LL is one of those things that is platform specific, > it has to be to guarantee output, so situations such as missing DT can > still produce output. Unfortunately, it got used for early_printk > because it was easier to add early_printk rather than telling people > to add a printascii() call in kernel/printk/printk.c (which is what I > used to tell people, and what I _still_ do today.) Yes, I have used printascii() in early days of board bringup (especially with broken u-boot setup)... > > DEBUG_LL has always been intended for this purpose, and it's under the > "debugging" menus, because it's really not meant to be enabled for > production. If you have folk who have enabled DEBUG_LL in your > configurations, and then left it enabled beyond this level of debugging, > I have to question the competence of that. It is not a matter of competence but experience. Simply assume that someone with only a little experience did enable it (or even copied from some nowadays unknown source) some years ago for some good reason (in the view of the team experience back then) and nobody questioned this decision afterwards. And suddenly the system fails completely. It is the same mental principle how severe bugs (and security issues) stay for years and are suddenly found to the surprise of everybody. > DEBUG_LL may be an extreme example of a config option that can land > your kernel in a non-bootable situation, but there are plenty of other > configuration options that will severely degrade the performance of > your kernel. Should we go around adding #warning's for those as well? No. If just performance is going down, you can still boot and see messages and debug other (e.g. driver) issues. And have all debugging tools around. But you can't play around to debug a non-booting kernel (except through JTAG). > > I know that the kernel configuration is hard - and it's hard because > it's large, but it is worth reviewing the options that you have in > your config from time to time, especially the debug options, We did that from time to time - but I admit it is not done in a systematic and well documented way. Something to think about and maybe automate... sed "s/\(.*DEBUG.*\)=y/# \1 is not set/g" <debug_defconfig >defconfig > and > question whether the debug options that are enabled are appropriate > for the phase of development. > > So, if you're doing new development, you should have all the options > that add extra checks enabled (such as lockdep, hang checks, etc). > If you're moving towards the end of the development phase, and you > care about performance, then you need to start turning these options > off, because many of them will slow the kernel down. > > The configuration of debug options is not something that should be > static. > > I suspect that you probably have a lot of debug options enabled in your > kernel that affect it's performance... maybe you're due to review them > as a whole? It would not solve the problem. Just assume we have two configs - one with debug disabled and one with debug enabled. Now we come to 4.14-rc1 where the code change happened. We will not immediately notice, but as soon as we build with the debug defconfig (for debugging something else), we can't even boot. So instead of fixing an important and urgent issue, we have to make the debug kernel boot again... While this might hint at a config problem, we still do not know which one. And we look for and find DEBUG_LL enabled which we think is (still) right. And it wasn't completely wrong before the patches for OMAP were integrated. While my proposed patch would immediately tell - without runtime performance influence. And in the non-debug config nobody would even see the #warning because DEBUG_LL is disabled. But it is up to you to accept or reject this idea - we for our team have solved it for us. We simply want to save others from making the same bad experience again. If this is not welcome, it has to be accepted. BR and thanks, Nikolaus
diff --git a/arch/arm/include/debug/omap2plus.S b/arch/arm/include/debug/omap2plus.S index 192a7583999c..3e7ea21cde2a 100644 --- a/arch/arm/include/debug/omap2plus.S +++ b/arch/arm/include/debug/omap2plus.S @@ -12,6 +12,10 @@ #include <linux/serial_reg.h> +#if defined(CONFIG_DEBUG_LL) +#warning Please disable CONFIG_DEBUG_LL and enable CONFIG_SERIAL_EARLYCON or OMAP5 devices will not boot +#endif + /* External port on Zoom2/3 */ #define ZOOM_UART_BASE 0x10000000 #define ZOOM_UART_VIRT 0xfa400000
commit d2b310b0234c ("ARM: debug: Use generic 8250 debug_ll for omap2 and omap3/4/5 common uarts") commit fc23beb8a577 ("ARM: debug: Use generic 8250 debug_ll for omap3/4/5") switched to generic 8250 debug_ll code which seems to be incompatible with at least OMAP5 boards (OMAP5EVM, Pyra) if CONFIG_DEBUG_LL is still enabled in some legacy defconfig. Since this is very hard to relate to these patches and difficult to identify, let's have the compiler emit a warning. Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> --- arch/arm/include/debug/omap2plus.S | 4 ++++ 1 file changed, 4 insertions(+)