Message ID | 20191106030542.868541-1-dima@arista.com (mailing list archive) |
---|---|
Headers | show |
Series | Add log level to show_stack() | expand |
On Wed 2019-11-06 03:04:51, Dmitry Safonov wrote: > Add log level argument to show_stack(). > Done in three stages: > 1. Introducing show_stack_loglvl() for every architecture > 2. Migrating old users with an explicit log level > 3. Renaming show_stack_loglvl() into show_stack() > > Justification: > o It's a design mistake to move a business-logic decision > into platform realization detail. > o I have currently two patches sets that would benefit from this work: > Removing console_loglevel jumps in sysrq driver [1] Just to clarify. The problem in sysrq driver is a bit different. It modifies console_loglevel to show even less important message on the console. IMHO, it should be solved by printing the header line with pr_error(). It is not ideal. A cleaner solution might be to introduce another loglevel that will always get pushed to the console. But I am not sure if it is worth this single line. > Hung task warning before panic [2] - suggested by Tetsuo (but he > probably didn't realise what it would involve). > o While doing (1), (2) the backtraces were adjusted to headers > and other messages for each situation - so there won't be a situation > when the backtrace is printed, but the headers are missing because > they have lesser log level (or the reverse). > o As the result in (2) plays with console_loglevel for kdb are removed. > The least important for upstream, but maybe still worth to note that > every company I've worked in so far had an off-list patch to print > backtrace with the needed log level (but only for the architecture they > cared about). > If you have other ideas how you will benefit from show_stack() with > a log level - please, reply to this cover letter. I agree with all the other justification. I would add. The backtrace is really useful for debugging. It should be possible to print it even in less critical situations. I am afraid that many people use WARN() for this purpose. But WARN() is not always appropriate. WARN() misuse huts when panic_on_warn option is used. Best Regards, Petr
On Wed, Nov 06, 2019 at 03:04:51AM +0000, Dmitry Safonov wrote: > Add log level argument to show_stack(). > Done in three stages: > 1. Introducing show_stack_loglvl() for every architecture > 2. Migrating old users with an explicit log level > 3. Renaming show_stack_loglvl() into show_stack() > > Justification: > o It's a design mistake to move a business-logic decision > into platform realization detail. > o I have currently two patches sets that would benefit from this work: > Removing console_loglevel jumps in sysrq driver [1] > Hung task warning before panic [2] - suggested by Tetsuo (but he > probably didn't realise what it would involve). > o While doing (1), (2) the backtraces were adjusted to headers > and other messages for each situation - so there won't be a situation > when the backtrace is printed, but the headers are missing because > they have lesser log level (or the reverse). > o As the result in (2) plays with console_loglevel for kdb are removed. I really don't understand that word salad. Why are you doing this?
On 11/6/19 8:35 AM, Petr Mladek wrote: > On Wed 2019-11-06 03:04:51, Dmitry Safonov wrote: >> Add log level argument to show_stack(). >> Done in three stages: >> 1. Introducing show_stack_loglvl() for every architecture >> 2. Migrating old users with an explicit log level >> 3. Renaming show_stack_loglvl() into show_stack() >> >> Justification: >> o It's a design mistake to move a business-logic decision >> into platform realization detail. >> o I have currently two patches sets that would benefit from this work: >> Removing console_loglevel jumps in sysrq driver [1] > > Just to clarify. The problem in sysrq driver is a bit different. > It modifies console_loglevel to show even less important message > on the console. > > IMHO, it should be solved by printing the header line with pr_error(). > It is not ideal. A cleaner solution might be to introduce another > loglevel that will always get pushed to the console. But I am > not sure if it is worth this single line. I believe why it's not done - there is a comment in sysrq code that said the userspace relies on the loglevel of sysrq messages (and may trigger alerts from emerg/err log level): * Raise the apparent loglevel to maximum so that the sysrq header * is shown to provide the user with positive feedback. We do not * simply emit this at KERN_EMERG as that would change message * routing in the consumers of /proc/kmsg. But I don't mind any solution. >> Hung task warning before panic [2] - suggested by Tetsuo (but he >> probably didn't realise what it would involve). >> o While doing (1), (2) the backtraces were adjusted to headers >> and other messages for each situation - so there won't be a situation >> when the backtrace is printed, but the headers are missing because >> they have lesser log level (or the reverse). >> o As the result in (2) plays with console_loglevel for kdb are removed. > >> The least important for upstream, but maybe still worth to note that >> every company I've worked in so far had an off-list patch to print >> backtrace with the needed log level (but only for the architecture they >> cared about). >> If you have other ideas how you will benefit from show_stack() with >> a log level - please, reply to this cover letter. > > I agree with all the other justification. > > I would add. The backtrace is really useful for debugging. It should > be possible to print it even in less critical situations. > > I am afraid that many people use WARN() for this purpose. But WARN() > is not always appropriate. WARN() misuse huts when panic_on_warn > option is used. Thanks, Petr.
Hi Peter, On 11/6/19 9:20 AM, Peter Zijlstra wrote: > On Wed, Nov 06, 2019 at 03:04:51AM +0000, Dmitry Safonov wrote: >> Add log level argument to show_stack(). >> Done in three stages: >> 1. Introducing show_stack_loglvl() for every architecture >> 2. Migrating old users with an explicit log level >> 3. Renaming show_stack_loglvl() into show_stack() >> >> Justification: >> o It's a design mistake to move a business-logic decision >> into platform realization detail. >> o I have currently two patches sets that would benefit from this work: >> Removing console_loglevel jumps in sysrq driver [1] >> Hung task warning before panic [2] - suggested by Tetsuo (but he >> probably didn't realise what it would involve). >> o While doing (1), (2) the backtraces were adjusted to headers >> and other messages for each situation - so there won't be a situation >> when the backtrace is printed, but the headers are missing because >> they have lesser log level (or the reverse). >> o As the result in (2) plays with console_loglevel for kdb are removed. > > I really don't understand that word salad. Why are you doing this? > Sorry, I should have tried to describe better. I'm trying to remove external users of console_loglevel by following reasons: - changing console_loglevel on SMP means that unwanted messages from other CPUs will appear (that have lower log level) - on UMP unwanted messages may appear if the code is preempted while it hasn't set the console_loglevel back to old - rising console_loglevel to print wanted message(s) may not work at all if printk() has being delayed and the console_loglevel is already set back to old value Sysrq driver changes console_loglevel because it needs to print message with a specific log level (the code said userspace relies on this). Kdb changes console_loglevel to print backtrace as the log level depends on architecture realisation. I also have patches in wip those needs to print backtrace with specific loglevel (higher when it's critical, lower when it's notice and shouldn't go to serial console). Besides on local tests I see hits those have headers (messages like "Backtrace: ") without an actual backtrace and the reverse - a backtrace without a reason for it. It's quite annoying and worth addressing by syncing headers log levels to backtraces. Thanks, Dmitry
On Wed, Nov 06, 2019 at 04:27:33PM +0000, Dmitry Safonov wrote: > Hi Peter, > > On 11/6/19 9:20 AM, Peter Zijlstra wrote: > > On Wed, Nov 06, 2019 at 03:04:51AM +0000, Dmitry Safonov wrote: > >> Add log level argument to show_stack(). > >> Done in three stages: > >> 1. Introducing show_stack_loglvl() for every architecture > >> 2. Migrating old users with an explicit log level > >> 3. Renaming show_stack_loglvl() into show_stack() > >> > >> Justification: > >> o It's a design mistake to move a business-logic decision > >> into platform realization detail. > >> o I have currently two patches sets that would benefit from this work: > >> Removing console_loglevel jumps in sysrq driver [1] > >> Hung task warning before panic [2] - suggested by Tetsuo (but he > >> probably didn't realise what it would involve). > >> o While doing (1), (2) the backtraces were adjusted to headers > >> and other messages for each situation - so there won't be a situation > >> when the backtrace is printed, but the headers are missing because > >> they have lesser log level (or the reverse). > >> o As the result in (2) plays with console_loglevel for kdb are removed. > > > > I really don't understand that word salad. Why are you doing this? > > > > Sorry, I should have tried to describe better. > > I'm trying to remove external users of console_loglevel by following > reasons: I suppose since all my machines have 'debug ignore_loglevel earlyprintk=serial,ttyS0,115200 console=ttyS0,115200' I don't have this experience. > - changing console_loglevel on SMP means that unwanted messages from > other CPUs will appear (that have lower log level) > - on UMP unwanted messages may appear if the code is preempted while it > hasn't set the console_loglevel back to old > - rising console_loglevel to print wanted message(s) may not work at all > if printk() has being delayed and the console_loglevel is already set > back to old value Sure, frobbing the global console_loglevel is bad. > I also have patches in wip those needs to print backtrace with specific > loglevel (higher when it's critical, lower when it's notice and > shouldn't go to serial console). (everything always should go to serial, serial is awesome :-) > Besides on local tests I see hits those have headers (messages like > "Backtrace: ") without an actual backtrace and the reverse - a backtrace > without a reason for it. It's quite annoying and worth addressing by > syncing headers log levels to backtraces. I suppose I'm surprised there are backtraces that are not important. Either badness happened and it needs printing, or the user asked for it and it needs printing. Perhaps we should be removing backtraces if they're not important instead of allowing to print them as lower loglevels?
On Wed, Nov 06, 2019 at 09:34:40PM +0100, Peter Zijlstra wrote: > I suppose I'm surprised there are backtraces that are not important. > Either badness happened and it needs printing, or the user asked for it > and it needs printing. Or utterly meaningless. > Perhaps we should be removing backtraces if they're not important > instead of allowing to print them as lower loglevels? Definitely! WARN_ON() is well overused - and as is typical, used without much thought. Bound to happen after Linus got shirty about BUG_ON() being over used. Everyone just grabbed the next nearest thing to assert(). As a kind of example, I've recently come across one WARN_ON() in a driver subsystem (that shall remain nameless at the moment) which very likely has multiple different devices on a platform. The WARN_ON() triggers as a result of a problem with the hardware, but because it's a WARN_ON(), you've no idea which device has a problem. The backtrace is mostly meaningless. So you know that a problem has occurred, but the kernel prints *useless* backtrace to let you know, and totally omits the *useful* information.
On (19/11/06 09:35), Petr Mladek wrote: > I agree with all the other justification. > > I would add. The backtrace is really useful for debugging. It should > be possible to print it even in less critical situations. Hmm, I don't know. Do we really need debug/info level backtraces? May be all backtraces can be converted to something more severe (so we can stop playing games with loglvl) and then we can clean up "(ab)users"? -ss
On Fri 2019-11-08 19:37:19, Sergey Senozhatsky wrote: > On (19/11/06 09:35), Petr Mladek wrote: > > I agree with all the other justification. > > > > I would add. The backtrace is really useful for debugging. It should > > be possible to print it even in less critical situations. > > Hmm, I don't know. > Do we really need debug/info level backtraces? debug is exactly the loglevel where registry content and backtrace might be very useful. It is not always important to reach the console. > May be all backtraces can be converted to something more severe > (so we can stop playing games with loglvl) and then we can > clean up "(ab)users"? IMHO, we should distinguish warning, error, crit, alert, emerg situations. Backtraces and any related messages should be filtered the same way. Any information might be useless without the context. I agree that it is complicated to pass the loglevel as a parameter. It would be better define the default log level for a given code section. It might be stored in task_struct for the normal context and in per-CPU variables for interrupt contexts. Best Regards, Petr
On 11/6/19 8:34 PM, Peter Zijlstra wrote: > On Wed, Nov 06, 2019 at 04:27:33PM +0000, Dmitry Safonov wrote: [..] >> Sorry, I should have tried to describe better. >> >> I'm trying to remove external users of console_loglevel by following >> reasons: > > I suppose since all my machines have 'debug ignore_loglevel > earlyprintk=serial,ttyS0,115200 console=ttyS0,115200' I don't have this > experience. Yeah, I remember you avoid all those functionalities of printk(), fair enough. On the other side, regular users and I'm betting most of the non-tuned distributions use /proc/sys/kernel/printk by default. (Checking on my Arch & Fedora - loglevel 4 from the box) >> - changing console_loglevel on SMP means that unwanted messages from >> other CPUs will appear (that have lower log level) >> - on UMP unwanted messages may appear if the code is preempted while it >> hasn't set the console_loglevel back to old >> - rising console_loglevel to print wanted message(s) may not work at all >> if printk() has being delayed and the console_loglevel is already set >> back to old value > > Sure, frobbing the global console_loglevel is bad. > >> I also have patches in wip those needs to print backtrace with specific >> loglevel (higher when it's critical, lower when it's notice and >> shouldn't go to serial console). > > (everything always should go to serial, serial is awesome :-) Personally I agree. Unfortunately, here @Arista there are switches (I'm speaking about the order of thousands at least) those have baud-rate 9600. It's a bit expensive being elaborate with such setup. >> Besides on local tests I see hits those have headers (messages like >> "Backtrace: ") without an actual backtrace and the reverse - a backtrace >> without a reason for it. It's quite annoying and worth addressing by >> syncing headers log levels to backtraces. > > I suppose I'm surprised there are backtraces that are not important. > Either badness happened and it needs printing, or the user asked for it > and it needs printing. > > Perhaps we should be removing backtraces if they're not important > instead of allowing to print them as lower loglevels? Well, the use-case for lower log-level is that everything goes into logs (/var/log/dmesg or /var/log/messages whatever rsyslog has settting). That has it's value: - after a failure (i.e. panic) messages, those were only signs that something goes wrong can be seen in logs which can give ideas what has happened. - before the failure, those messages go to telemetry and can be auto-magically processed remotely to take a decision (e.g. balance the traffic away). I wish all the information could be gathered in the userspace, but many existing kernel/driver messages are valuable. A more detailed example is hung task detector: we want to have the backtrace for a task that is in uninterruptible state too long, but only in logs as printing on serial console may lead to holding console lock and watchdog. Besides customer notifications and investigations, I see the value of such "bottleneck" warnings during long-running integration tests. Thanks, Dmitry
On Fri, Nov 08, 2019 at 04:28:30PM +0000, Dmitry Safonov wrote: > On 11/6/19 8:34 PM, Peter Zijlstra wrote: > > On Wed, Nov 06, 2019 at 04:27:33PM +0000, Dmitry Safonov wrote: > [..] > >> Sorry, I should have tried to describe better. > >> > >> I'm trying to remove external users of console_loglevel by following > >> reasons: > > > > I suppose since all my machines have 'debug ignore_loglevel > > earlyprintk=serial,ttyS0,115200 console=ttyS0,115200' I don't have this > > experience. > > Yeah, I remember you avoid all those functionalities of printk(), fair > enough. On the other side, regular users and I'm betting most of > the non-tuned distributions use /proc/sys/kernel/printk by default. > (Checking on my Arch & Fedora - loglevel 4 from the box) > > >> - changing console_loglevel on SMP means that unwanted messages from > >> other CPUs will appear (that have lower log level) > >> - on UMP unwanted messages may appear if the code is preempted while it > >> hasn't set the console_loglevel back to old > >> - rising console_loglevel to print wanted message(s) may not work at all > >> if printk() has being delayed and the console_loglevel is already set > >> back to old value > > > > Sure, frobbing the global console_loglevel is bad. > > > >> I also have patches in wip those needs to print backtrace with specific > >> loglevel (higher when it's critical, lower when it's notice and > >> shouldn't go to serial console). > > > > (everything always should go to serial, serial is awesome :-) > > Personally I agree. Unfortunately, here @Arista there are switches (I'm > speaking about the order of thousands at least) those have baud-rate 9600. > It's a bit expensive being elaborate with such setup. > > >> Besides on local tests I see hits those have headers (messages like > >> "Backtrace: ") without an actual backtrace and the reverse - a backtrace > >> without a reason for it. It's quite annoying and worth addressing by > >> syncing headers log levels to backtraces. > > > > I suppose I'm surprised there are backtraces that are not important. > > Either badness happened and it needs printing, or the user asked for it > > and it needs printing. > > > > Perhaps we should be removing backtraces if they're not important > > instead of allowing to print them as lower loglevels? > > Well, the use-case for lower log-level is that everything goes into logs > (/var/log/dmesg or /var/log/messages whatever rsyslog has settting). > > That has it's value: > - after a failure (i.e. panic) messages, those were only signs that > something goes wrong can be seen in logs which can give ideas what has > happened. No they don't. When the kernel panics, userspace generally stops running, so rsyslog won't be able to write them to /var/log/messages. How, by "kernel panics" I mean a real kernel panic, which probably isn't what you're talking about there. You are probably talking about the whole shebang of non-fatal kernel oops, kernel warnings and the like. If so, I'd ask you to stop confuzzilating terminology. If you really want to capture such events, then you need to have the kernel write the panic to (e.g.) flash - see the mtdoops driver.
On 11/8/19 5:30 PM, Russell King - ARM Linux admin wrote: > On Fri, Nov 08, 2019 at 04:28:30PM +0000, Dmitry Safonov wrote: [..] >> >> Well, the use-case for lower log-level is that everything goes into logs >> (/var/log/dmesg or /var/log/messages whatever rsyslog has settting). >> >> That has it's value: >> - after a failure (i.e. panic) messages, those were only signs that >> something goes wrong can be seen in logs which can give ideas what has >> happened. > > No they don't. When the kernel panics, userspace generally stops > running, so rsyslog won't be able to write them to /var/log/messages. > > How, by "kernel panics" I mean a real kernel panic, which probably > isn't what you're talking about there. You are probably talking > about the whole shebang of non-fatal kernel oops, kernel warnings > and the like. If so, I'd ask you to stop confuzzilating terminology. > > If you really want to capture such events, then you need to have the > kernel write the panic to (e.g.) flash - see the mtdoops driver. I was talking about things prior the panic: OOMs, MMC write/read warnings, hung tasks, we also have local patches to produce a warning if the mutex is being held for too long or a task is starving on CPU time by hard/soft irqs (I hope I will design something like that for upstream). I've found those warnings useful to: (a) have an early message when the things are starting going bad. (b) analyze contentions or too large scale for a box or faulty hardware for non-reproducible issues just from logs. We use kexec to save the dmesg ringbuffer content after the panic. Thanks, Dmitry
On (19/11/08 14:04), Petr Mladek wrote: [..] > I agree that it is complicated to pass the loglevel as > a parameter. It would be better define the default > log level for a given code section. It might be stored > in task_struct for the normal context and in per-CPU > variables for interrupt contexts. I do recall that we talked about per-CPU printk state bit which would start/end "just print it" section. We probably can extend it to "just log_store" type of functionality. Doesn't look like a very bad idea. "This task/context is in trouble, whatever it printk()-s is important". Per-console loglevel also might help sometimes. Slower consoles would ->write() only critical messages, faster consoles everything. Passing log_level as part of message payload, which printk machinery magically hides is not entirely exciting. What we have in the code now - printk("%s blah\n", lvl) - is not what we see in the logs. Because the leading '%s' becomes special. And printk()/sprintf() documentation should reflect that: '%s' prints a string, but sometimes it doesn't. -ss
On Mon 2019-11-11 10:23:36, Sergey Senozhatsky wrote: > On (19/11/08 14:04), Petr Mladek wrote: > [..] > > I agree that it is complicated to pass the loglevel as > > a parameter. It would be better define the default > > log level for a given code section. It might be stored > > in task_struct for the normal context and in per-CPU > > variables for interrupt contexts. > > I do recall that we talked about per-CPU printk state bit which would > start/end "just print it" section. We probably can extend it to "just > log_store" type of functionality. Doesn't look like a very bad idea. The problem with per-CPU printk is that we would need to disable interrupts. It is not always wanted. Also people might not expect this from a printk() API. > "This task/context is in trouble, whatever it printk()-s is important". It might be a minimal loglevel. More important messages would still be printed() with the higher loglevel. But yes, this per-code-section loglevel is problematic. The feedback against the patchset shows that people want it also the other way. I mean to keep pr_debug() as pr_debug(). A solution might be to use the per-code-section loglevel only instead of some special loglevel. > Per-console loglevel also might help sometimes. Slower consoles would > ->write() only critical messages, faster consoles everything. This looks like another problem to me. Anyway, this filtering will work better when the loglevel will be consistent across the related lines. > Passing log_level as part of message payload, which printk machinery > magically hides is not entirely exciting. What we have in the code > now - printk("%s blah\n", lvl) - is not what we see in the logs. > Because the leading '%s' becomes special. And printk()/sprintf() > documentation should reflect that: '%s' prints a string, but sometimes > it doesn't. I personally do not see this as a big problem. The explicitly passed loglevel makes me feel more confident that all needed printk() calls were updated. But it might be a false feeling. I do not really have any strong preference. Best Regards, Petr
Hi Sergey, Petr, On 11/11/19 1:23 AM, Sergey Senozhatsky wrote: > On (19/11/08 14:04), Petr Mladek wrote: > [..] >> I agree that it is complicated to pass the loglevel as >> a parameter. It would be better define the default >> log level for a given code section. It might be stored >> in task_struct for the normal context and in per-CPU >> variables for interrupt contexts. > > I do recall that we talked about per-CPU printk state bit which would > start/end "just print it" section. We probably can extend it to "just > log_store" type of functionality. Doesn't look like a very bad idea. > "This task/context is in trouble, whatever it printk()-s is important". I don't see how bits on task_struct or in per-cpu are easier than supplying a log level parameter down the stack. How would it work if sysrq_handle_crash() called by key-press? How would that interact with deferred printing? How would it make visible prints from current context, but not from something that preempted it? Furthermore, what problems are you trying to solve with this design? Only sysrq driver? Kdb? In my perspective it sounds too complicated and over-engineered for something that has two-three users. Also I've tried to point that I need to print backtrace sometimes with KERN_DEBUG loglevel to use it in production for early notices those needs to go only to log files and currently each architecture decides which log level it prefers. And what's so complicated about this patches set? I see only side of the testing, but the build-testing is covered with 0day bot and cost nothing and any visible regression may be found during -next period. While introducing those printk-sections may subtly break things. I mean, I definitely know less about printk() and its internals than you - so my points may be a no-sense. What I'm going to do - is to fix all build and reported issues, I'll send v2 this week and feel free to NAK it, I will forget about those patches and won't be offended. I don't see many people those are "hey, we'll benefit from this". And doing this patches set was neither quite fun (dirty business), nor something I can be later proud of (hey, I've introduced the log level parameter to printing functions!). Thanks, Dima
On (19/11/11 19:47), Dmitry Safonov wrote: [..] > I don't see how bits on task_struct or in per-cpu are easier than > supplying a log level parameter down the stack. > How would it work if sysrq_handle_crash() called by key-press? > How would that interact with deferred printing? > How would it make visible prints from current context, but not from > something that preempted it? [..] per-context log_level works pretty much the same way as per-message log_level. // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - // static DEFINE_PER_CPU(int, cpu_loglevels[4]); // @INITME.. LOGLEVEL_DEBUG + 1? static int __printing_context(void) { unsigned int preempt = preempt_count(); if (!(preempt & (NMI_MASK | HARDIRQ_MASK | SOFITRQ_OFFSET))) return 0; if (preempt & SOFITRQ_OFFSET) return 1; if (preempt & HARDIRQ_MASK) return 2; return 3; } static int adj_context_loglevel(int level) { int ctx = __printing_context(); int cpu_level = this_cpu_read(cpu_loglevels[ctx]); // this one is important if (level == LOGLEVEL_SCHED) return level; // we are not in emergency context if (cpu_level == LOGLEVEL_DEBUG + 1) return level; // we better not override these if (LOGLEVEL_EMERG <= level && level <= LOGLEVEL_ERR) return level; return cpu_level; } void printk_emergency_enter(int log_level) { int ctx; preempt_disable(); ctx = __printing_context(); this_cpu_write(cpu_loglevels[ctx], log_level); } void printk_emergency_exit(void) { int ctx = __printing_context(); this_cpu_write(cpu_loglevels[ctx], LOGLEVEL_DEBUG + 1); preempt_enable(); } void vprintk_emit(...) { level = adj_context_loglevel(level); } // // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - // // static void __show_stack(struct task_struct *task, unsigned long *sp) { printk(); ... printk(); } void show_stack(struct task_struct *task, unsigned long *sp, int log_level) { printk_emergency_enter(log_level); __show_stack(task, sp); printk_emergency_exit(); } // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - // show_stack() never schedules, disabling preemption around it should not change anything. Should it be interrupted, we will handle it via preempt count. printk_emergency_enter(log_level) handles every printk() that __show_stack() and friends do. Not worse than printk("%s Stack", lvl); all over the place. > What I'm going to do - is to fix all build and reported issues, I'll > send v2 this week and feel free to NAK it, I will forget about those > patches and won't be offended. Lovely. And - no, I'm not going to NAK platform specific changes. Just so you know. *All* I'm talking about is an alternative, less "go and touch a ton of platform code" approach. The argument "I patched so many files that I'm not even going to discuss anything now" is not productive, to say the least. Hope this clarifies. -ss
Hi Sergey, On 11/12/19 2:17 AM, Sergey Senozhatsky wrote: > On (19/11/11 19:47), Dmitry Safonov wrote: [..] >> What I'm going to do - is to fix all build and reported issues, I'll >> send v2 this week and feel free to NAK it, I will forget about those >> patches and won't be offended. > > Lovely. > And - no, I'm not going to NAK platform specific changes. Just so you know. > > *All* I'm talking about is an alternative, less "go and touch a ton of > platform code" approach. The argument "I patched so many files that I'm > not even going to discuss anything now" is not productive, to say the > least. Hope this clarifies. It probably was a wrong impression from the both sides. My impression was "You touch every architecture - we won't even consider that". Sorry for the the wrong impression from my side - I'm open for discussion. In my point of view the cost of one-time [mostly build] testing every architecture is cheaper than introducing some new smart code that will live forever. Though, again you and Petr understand more than me in printk() code, so I'm not any insisting. I'll reply to your suggestion tomorrow, it's a bit late in my tz. Thanks, Dmitry
On (19/11/12 02:40), Dmitry Safonov wrote: [..] > In my point of view the cost of one-time [mostly build] testing every > architecture is cheaper than introducing some new smart code that will > live forever. Well, there may be the need to pass loglevel deeper due to "hey __show_stack() on that arch invokes bar(), which invokes foo() and now foo() does printk(), but we don't see it". The context which decided to backtaraces decided to do so for a reason, probably, so I guess we can look at it as "a special error reporting code block". The proposed patch set passes loglevel via slightly unusual channel - via sprintf(). We probably can do it, but I would prefer to minimize the number of such printk-s in the kernel. The code snippet which I posted also does pretty unusual thing w.r.t loglevel. Both approaches are "non-standard" from that POV. > I'll reply to your suggestion tomorrow, it's a bit late in my tz. Sure. To anyone who will comment on that code snippet - this is not a "look, here is what you need to do" type of proposal. Just an alternative approach with its pros and cons. We had several requests over the years to have something like "forcibly allow all underlying printk-s from here to here" or "forcibly suppress or postpone underlying printk-s from here to here", etc. -ss
On (19/11/11 10:12), Petr Mladek wrote: [..] > > I do recall that we talked about per-CPU printk state bit which would > > start/end "just print it" section. We probably can extend it to "just > > log_store" type of functionality. Doesn't look like a very bad idea. > > The problem with per-CPU printk is that we would need to disable > interrupts. Or disable preemption and have loglevel per-CPU and per-context. preempt_count can navigate us to the right context loglevel on particular CPU. I'm talking here only about backtrace (error) reporting contexts. Those can be atomic perfectly fine. I posted a silly code snippet. [..] > But yes, this per-code-section loglevel is problematic. The feedback > against the patchset shows that people want it also the other way. > I mean to keep pr_debug() as pr_debug(). Hmm. Right. > A solution might be to use the per-code-section loglevel only instead > of some special loglevel. So maybe we can "overwrite" only KERN_DEFAULT loglevels? We certainly should not mess with SCHED or with anything in between EMERG and ERR. > The explicitly passed loglevel makes me feel more confident that > all needed printk() calls were updated. But it might be a false > feeling. I do not really have any strong preference. I'm not like really objecting, just trying to explore some other options. -ss
On (19/11/12 13:44), Sergey Senozhatsky wrote: [..] > > But yes, this per-code-section loglevel is problematic. The feedback > > against the patchset shows that people want it also the other way. > > I mean to keep pr_debug() as pr_debug(). > > Hmm. Right. > > > A solution might be to use the per-code-section loglevel only instead > > of some special loglevel. > > So maybe we can "overwrite" only KERN_DEFAULT loglevels? LOGLEVEL_DEFAULT, LOGLEVEL_NOTICE, LOGLEVEL_INFO? So we can downgrade some messages (log_store() only) or promote some messages. DEBUG perhaps should stay debug. > We certainly should not mess with SCHED or with anything in between > EMERG and ERR. [EMERG, WARN] -ss
On Tue 2019-11-12 13:57:04, Sergey Senozhatsky wrote: > On (19/11/12 13:44), Sergey Senozhatsky wrote: > [..] > > > But yes, this per-code-section loglevel is problematic. The feedback > > > against the patchset shows that people want it also the other way. > > > I mean to keep pr_debug() as pr_debug(). > > > > Hmm. Right. > > > > > A solution might be to use the per-code-section loglevel only instead > > > of some special loglevel. > > > > So maybe we can "overwrite" only KERN_DEFAULT loglevels? > > LOGLEVEL_DEFAULT, LOGLEVEL_NOTICE, LOGLEVEL_INFO? > > So we can downgrade some messages (log_store() only) or promote > some messages. > > DEBUG perhaps should stay debug. > > > We certainly should not mess with SCHED or with anything in between > > EMERG and ERR. > > [EMERG, WARN] This is getting too complicated. It would introduce too many hidden rules. While the explicitly passed loglevel parameter is straightforward and clear. I am getting more inclined to the solution introduced by this patchset. It looks reasonable for the different use-cases. Best Regards, Petr
On (19/11/12 09:35), Petr Mladek wrote: [..] > This is getting too complicated. It would introduce too many > hidden rules. While the explicitly passed loglevel parameter > is straightforward and clear. If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it (either downgrade or upgrade). That's one rule, basically. Not too complicated, I guess. > I am getting more inclined to the solution introduced by this > patchset. It looks reasonable for the different use-cases. No pressure from my side. Up to arch maintainers. -ss
On (19/11/12 19:12), Sergey Senozhatsky wrote: > On (19/11/12 09:35), Petr Mladek wrote: > [..] > > This is getting too complicated. It would introduce too many > > hidden rules. While the explicitly passed loglevel parameter > > is straightforward and clear. > > If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it > (either downgrade or upgrade). That's one rule, basically. Not too > complicated, I guess. Can be taken even a bit further than show_stack(NULL, NULL, LOGLEVEL_DEBUG); or show_stack(NULL, NULL, LOGLEVEL_ERR); For instance, spin_lock_irqsave(&rq->lock, flags); printk_emergency_enter(LOGLEVEL_SCHED); ... show_stack(...); printk(); printk(); ... spin_unlock_irqrestore(&rq->lock, flags); or spin_lock_irqsave(&uart_port->lock, flags); printk_emergency_enter(LOGLEVEL_SCHED); ... printk(); printk(); ... and so on. -ss
On 11/13/19 1:23 AM, Sergey Senozhatsky wrote: > On (19/11/12 19:12), Sergey Senozhatsky wrote: >> On (19/11/12 09:35), Petr Mladek wrote: >> [..] >>> This is getting too complicated. It would introduce too many >>> hidden rules. While the explicitly passed loglevel parameter >>> is straightforward and clear. >> >> If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it >> (either downgrade or upgrade). That's one rule, basically. Not too >> complicated, I guess. > > Can be taken even a bit further than > > show_stack(NULL, NULL, LOGLEVEL_DEBUG); > or > show_stack(NULL, NULL, LOGLEVEL_ERR); > > For instance, > > spin_lock_irqsave(&rq->lock, flags); > printk_emergency_enter(LOGLEVEL_SCHED); > ... > show_stack(...); > printk(); > printk(); > ... > spin_unlock_irqrestore(&rq->lock, flags); > > or > > spin_lock_irqsave(&uart_port->lock, flags); > printk_emergency_enter(LOGLEVEL_SCHED); > ... > printk(); > printk(); > ... Yeah, that makes sense. I believe it's up to you, Petr and Steven to decide what's better for printk(). I mean, after all you care for all this code. I guess I've pointed that in my point of view price for one-time testing code is cheaper than adding a new printk feature to swap log levels on the fly. But again, it's just how I see it - if we fix those printing problems, than in half year we will forget they ever existed, but in your proposal, there will still be some clever printk code. On the other side, also worth to note that current patches set fix problems for kdb (and for my hung task printing patch), but it's incomplete for sysrq driver. I've gone through functions used by sysrq driver and the same changes introducing log level parameter would be needed for: sched_show_task(), debug_show_all_locks(), show_regs(), show_state(), show_mem(). Some of them don't need any platform changes, but at least show_regs() needs. So, you also need to have that in mind making the decision. Thanks, Dmitry
On 11/12/19 4:25 AM, Sergey Senozhatsky wrote: > On (19/11/12 02:40), Dmitry Safonov wrote: > [..] >> In my point of view the cost of one-time [mostly build] testing every >> architecture is cheaper than introducing some new smart code that will >> live forever. > > Well, there may be the need to pass loglevel deeper due to "hey __show_stack() > on that arch invokes bar(), which invokes foo() and now foo() does printk(), > but we don't see it". The context which decided to backtaraces decided > to do so for a reason, probably, so I guess we can look at it as "a special > error reporting code block". > > The proposed patch set passes loglevel via slightly unusual channel - > via sprintf(). We probably can do it, but I would prefer to minimize > the number of such printk-s in the kernel. The code snippet which I > posted also does pretty unusual thing w.r.t loglevel. Both approaches > are "non-standard" from that POV. I don't strongly disagree, but if you look at those results: git grep 'printk("%s.*", \(lvl\|level\)' it seems to be used in quite a few places. Thanks, Dmitry
On (19/11/13 02:41), Dmitry Safonov wrote: [..] > > I don't strongly disagree, but if you look at those results: > git grep 'printk("%s.*", \(lvl\|level\)' > > it seems to be used in quite a few places. Yes, you are right, it is used in some places. That's why I said that I'd prefer to keep that number low (minimize it). But it's not 0 (that ship has sailed). -ss
On (19/11/13 02:25), Dmitry Safonov wrote: > I guess I've pointed that in my point of view price for one-time testing > code is cheaper than adding a new printk feature to swap log levels on > the fly. [..] > I've gone through functions used by sysrq driver and the same changes > introducing log level parameter would be needed for: sched_show_task(), > debug_show_all_locks(), show_regs(), show_state(), show_mem(). Some of > them don't need any platform changes, but at least show_regs() needs. Good points and nice conclusion. Well, here we go. There is a number of generally useful functions that print nice data and where people might want to have better loglevel control (for debugging purposes). show_stack() is just one of them. Patching all those functions, which you have mentioned above, is hardly a fun task to do. Hence the printk() per-CPU per-context loglevel proposition. The code there is not clever or complicated and is meant for debugging purposes only, but with just 3 lines of code one can do some stuff: /* @BTW you can't do this with "%s" KERN_FOO ;P */ + printk_emergency_enter(LOGLEVEL_SCHED); + debug_show_all_locks(); + printk_emergency_exit(); Now... I'm not sure if this whole thing is up to "printk maintainers only". If no one is going to use "emergency printk contexts" then there is no point in having that code in the kernel. -ss
On Wed 2019-11-13 15:33:34, Sergey Senozhatsky wrote: > On (19/11/13 02:25), Dmitry Safonov wrote: > > I guess I've pointed that in my point of view price for one-time testing > > code is cheaper than adding a new printk feature to swap log levels on > > the fly. > [..] > > I've gone through functions used by sysrq driver and the same changes > > introducing log level parameter would be needed for: sched_show_task(), > > debug_show_all_locks(), show_regs(), show_state(), show_mem(). Some of > > them don't need any platform changes, but at least show_regs() needs. > > Good points and nice conclusion. > > Well, here we go. There is a number of generally useful functions that > print nice data and where people might want to have better loglevel control > (for debugging purposes). show_stack() is just one of them. Could you please provide some examples so that we get an idea about the scope, usefulness, and requirements? > Patching all > those functions, which you have mentioned above, is hardly a fun task to do. > Hence the printk() per-CPU per-context loglevel proposition. The code there > is not clever or complicated and is meant for debugging purposes only, but > with just 3 lines of code one can do some stuff: > > /* @BTW you can't do this with "%s" KERN_FOO ;P */ > + printk_emergency_enter(LOGLEVEL_SCHED); > + debug_show_all_locks(); > + printk_emergency_exit(); But this will not solve situations where the original loglevel should stay from any reason. It happened in this patchset, see https://lkml.kernel.org/r/20191106091258.GS25745@shell.armlinux.org.uk https://lkml.kernel.org/r/20191106132516.GC5808@willie-the-truck We would need to investigate more potential users of this feature to see eventual requirements. If there are too many exceptions and modes then the generic API might get pretty complicated. At the moment, I am in favor of this patchset. It is huge and needed a lot of manual work. But the result is straightforward and easy to understand. Best Regards, Petr
On Tue, 12 Nov 2019 13:44:47 +0900 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > > > I do recall that we talked about per-CPU printk state bit which would > > > start/end "just print it" section. We probably can extend it to "just > > > log_store" type of functionality. Doesn't look like a very bad idea. > > > > The problem with per-CPU printk is that we would need to disable > > interrupts. > > Or disable preemption and have loglevel per-CPU and per-context. > preempt_count can navigate us to the right context loglevel on > particular CPU. I'm talking here only about backtrace (error) > reporting contexts. Those can be atomic perfectly fine. With my real-time hat on, I'm totally against disabling of preemption for this purpose. -- Steve
On Tue, 12 Nov 2019 11:17:47 +0900 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > void show_stack(struct task_struct *task, unsigned long *sp, int log_level) > { > printk_emergency_enter(log_level); > __show_stack(task, sp); > printk_emergency_exit(); > } > // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - // > > show_stack() never schedules, disabling preemption around it should > not change anything. Should it be interrupted, we will handle it via > preempt count. Please no! The whole point of the printk rewrite was to allow for printk to be preemptible and used in more contexts. The show_stack() can be all over the place and is not a fast function. Let's not disable preemption for it. -- Steve
On Wed, 6 Nov 2019 21:34:40 +0100 Peter Zijlstra <peterz@infradead.org> wrote: > I suppose I'm surprised there are backtraces that are not important. > Either badness happened and it needs printing, or the user asked for it > and it needs printing. Unfortunately that is the case. As my tests will fail if a backtrace is detected. > > Perhaps we should be removing backtraces if they're not important > instead of allowing to print them as lower loglevels? I usually end up removing backtraces for my tests, so I'm for this. Specifically this happens in the drm and i915 drivers :-p -- Steve
On Wed, 6 Nov 2019 23:25:13 +0000 Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > On Wed, Nov 06, 2019 at 09:34:40PM +0100, Peter Zijlstra wrote: > > I suppose I'm surprised there are backtraces that are not important. > > Either badness happened and it needs printing, or the user asked for it > > and it needs printing. > > Or utterly meaningless. > > > Perhaps we should be removing backtraces if they're not important > > instead of allowing to print them as lower loglevels? > > Definitely! WARN_ON() is well overused - and as is typical, used > without much thought. Bound to happen after Linus got shirty about > BUG_ON() being over used. Everyone just grabbed the next nearest thing > to assert(). > > As a kind of example, I've recently come across one WARN_ON() in a > driver subsystem (that shall remain nameless at the moment) which very > likely has multiple different devices on a platform. The WARN_ON() > triggers as a result of a problem with the hardware, but because it's a > WARN_ON(), you've no idea which device has a problem. The backtrace is > mostly meaningless. So you know that a problem has occurred, but the > kernel prints *useless* backtrace to let you know, and totally omits > the *useful* information. > I would like to bring up a topic for the next maintainers summit (although I may not even be there), that we define a clear use of WARN_ON(). I use it only if the code does something I do not expect it to do, and is considered a bug in the code if it triggers. But it appears that some drivers use it for "oh I didn't realize this hardware does something I didn't expect". And is ignored when the warn on is triggered and reported, with "you have buggy hardware" but my hardware appears to work just fine! -- Steve
On Wed, 13 Nov 2019 09:47:22 +0100 Petr Mladek <pmladek@suse.com> wrote: > At the moment, I am in favor of this patchset. It is huge and > needed a lot of manual work. But the result is straightforward and > easy to understand. I'm in favor of this patchset too. If there's other areas that need to adjust the current loglevel (say per task context), then we can cross that bridge when the need arises. But I don't want to over engineer this as the stack trace logic should have a way to explicitly state how important this stack trace really is (or better yet, we should be removing stack traces that are not important!) -- Steve
Hi Sergey, On 11/13/19 6:33 AM, Sergey Senozhatsky wrote: [..] > Well, here we go. There is a number of generally useful functions that > print nice data and where people might want to have better loglevel control > (for debugging purposes). show_stack() is just one of them. Patching all > those functions, which you have mentioned above, is hardly a fun task to do. > Hence the printk() per-CPU per-context loglevel proposition. The code there > is not clever or complicated and is meant for debugging purposes only, but > with just 3 lines of code one can do some stuff: > > /* @BTW you can't do this with "%s" KERN_FOO ;P */ > + printk_emergency_enter(LOGLEVEL_SCHED); > + debug_show_all_locks(); > + printk_emergency_exit(); Not that I want to critic your proposal more, but just to clarify what I've meant by "cleaver and complicated": I don't think semantically there is any difference between: printk_emergency_enter(LOGLEVEL_SCHED); printk(); printk_emergency_exit(); vs printk("%s ...", KERN_SHED, ...); I was speaking about complexity not about usage, but about realization inside printk_emergency_enter(): there will be some business logic that will do "it's NMI? Use loglevel given. it's KERN_ALERT? Don't downgrade the loglevel..." and other smart details those are really logic which one have to think about and later maintain. There will be also minor issues like people inserting print with one log level and seeing it in dmesg with another, but I presume those printk_enter() and printk_exit() will cover little amount of code without much nesting [as long as it's not getting overused]. And also it can be documented and people will learn about another feature of printk(). And this year I've seen the presentation "Why printk() is so complicated?" and I presumed that the approach is to keep things as simple as possible. In conclusion: I see that your proposal also solves the problem [except preemption and some pr_debug() that shouldn't be printed]. And I think your approach is better in the sense of short-term (we won't have any missed %s KERN_ in linux-next), but in a long-term it adds some amount of business logic to printk and another feature. Just in case: again, I don't mind, it's up to you, maintainers of printk. It's also not very fun for me to create those patches. But they fix console_loglevel issues (I hope we could un-export it in the end) and also I need it for my other patches those will produce warnings with debug loglevel when configured through sysctl. Thanks, Dmitry
On (19/11/13 16:40), Dmitry Safonov wrote: > It's also not very fun for me to create those patches. > But they fix console_loglevel issues (I hope we could un-export it in > the end) and also I need it for my other patches those will produce > warnings with debug loglevel when configured through sysctl. No objections. -ss
On (19/11/13 10:39), Steven Rostedt wrote: [..] > > void show_stack(struct task_struct *task, unsigned long *sp, int log_level) > > { > > printk_emergency_enter(log_level); > > __show_stack(task, sp); > > printk_emergency_exit(); > > } > > // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - // > > > > show_stack() never schedules, disabling preemption around it should > > not change anything. Should it be interrupted, we will handle it via > > preempt count. > > Please no! The whole point of the printk rewrite was to allow for > printk to be preemptible and used in more contexts. The show_stack() can > be all over the place and is not a fast function. Let's not disable > preemption for it. I never said that this code should be used all over the place. What I did say several times was that this code is for quick debugging, when one sits behind a slow serial console and wants to tweak loglevel of all printk-s of a particular context/function only. -ss