Message ID | 1468326012-15910-1-git-send-email-kernel@kyup.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, Jul 12, 2016 at 03:20:12PM +0300, Nikolay Borisov wrote: > Currently the btrfs printk infrastructure doesn't implement any > kind of ratelimiting. If you count the whole infrastructure, it does. See ctree.h and macros ending with _rl (btrfs_err_rl), and should be used where the messages are likely to flood. Otherwise I think "more is better" regarding messages as this is helpful when debugging issues. > Recently I came accross a case where due to > FS corruption an excessive amount of printk caused the softlockup > detector to trigger and reset the server. This patch aims to avoid > two types of issue: > * I want to avoid different levels of messages interefere with the > ratelimiting of oneanother so as to avoid a situation where a > flood of INFO messages causes the ratelimit to trigger, > potentially leading to supression of more important messages. Yeah, that's my concern as well. What if there's a burst of several error messages that do not fit to the limit and some of them get dropped. > * Avoid a flood of any type of messages rendering the machine > unusable While I'd rather set a per-message ratelimiting, it's possible that an unexpected error will start flooding. So some sort of per-level limiting could be implemented, as you propose, but I'd suggest to set the numbers higher. That way it would still flood up to certain level but should avoid the lockups. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/12/2016 04:20 PM, David Sterba wrote: > On Tue, Jul 12, 2016 at 03:20:12PM +0300, Nikolay Borisov wrote: >> Currently the btrfs printk infrastructure doesn't implement any >> kind of ratelimiting. > > If you count the whole infrastructure, it does. See ctree.h and macros > ending with _rl (btrfs_err_rl), and should be used where the messages > are likely to flood. Otherwise I think "more is better" regarding > messages as this is helpful when debugging issues. So I definitely didn't look at those. But now I have and it seems they implement more or less the same thing. Also, if I'm reading the code correctly, as it stands now using the _rl versions seem to be more flexible as the limits is going to be per-message rather than per-message class as it is in my proposal. So I'd rather move that particular csum related message to the _rl infrastructure. > >> Recently I came accross a case where due to >> FS corruption an excessive amount of printk caused the softlockup >> detector to trigger and reset the server. This patch aims to avoid >> two types of issue: >> * I want to avoid different levels of messages interefere with the >> ratelimiting of oneanother so as to avoid a situation where a >> flood of INFO messages causes the ratelimit to trigger, >> potentially leading to supression of more important messages. > > Yeah, that's my concern as well. What if there's a burst of several > error messages that do not fit to the limit and some of them get > dropped. > >> * Avoid a flood of any type of messages rendering the machine >> unusable > > While I'd rather set a per-message ratelimiting, it's possible that an > unexpected error will start flooding. So some sort of per-level limiting > could be implemented, as you propose, but I'd suggest to set the numbers > higher. That way it would still flood up to certain level but should > avoid the lockups. Sure, I'm happy to set the limits higher and have it act as a safety net. But then again, what would make a sensible limit - 100 messages in 5 seconds seems reasonable but this is completely arbitrary. Any thoughts? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 12, 2016 at 04:47:53PM +0300, Nikolay Borisov wrote: > > > On 07/12/2016 04:20 PM, David Sterba wrote: > > On Tue, Jul 12, 2016 at 03:20:12PM +0300, Nikolay Borisov wrote: > >> Currently the btrfs printk infrastructure doesn't implement any > >> kind of ratelimiting. > > > > If you count the whole infrastructure, it does. See ctree.h and macros > > ending with _rl (btrfs_err_rl), and should be used where the messages > > are likely to flood. Otherwise I think "more is better" regarding > > messages as this is helpful when debugging issues. > > So I definitely didn't look at those. But now I have and it seems they > implement more or less the same thing. Also, if I'm reading the code > correctly, as it stands now using the _rl versions seem to be more > flexible as the limits is going to be per-message rather than > per-message class as it is in my proposal. So I'd rather move that > particular csum related message to the _rl infrastructure. Ok. > >> Recently I came accross a case where due to > >> FS corruption an excessive amount of printk caused the softlockup > >> detector to trigger and reset the server. This patch aims to avoid > >> two types of issue: > >> * I want to avoid different levels of messages interefere with the > >> ratelimiting of oneanother so as to avoid a situation where a > >> flood of INFO messages causes the ratelimit to trigger, > >> potentially leading to supression of more important messages. > > > > Yeah, that's my concern as well. What if there's a burst of several > > error messages that do not fit to the limit and some of them get > > dropped. > > > >> * Avoid a flood of any type of messages rendering the machine > >> unusable > > > > While I'd rather set a per-message ratelimiting, it's possible that an > > unexpected error will start flooding. So some sort of per-level limiting > > could be implemented, as you propose, but I'd suggest to set the numbers > > higher. That way it would still flood up to certain level but should > > avoid the lockups. > > Sure, I'm happy to set the limits higher and have it act as a safety > net. But then again, what would make a sensible limit - 100 messages in > 5 seconds seems reasonable but this is completely arbitrary. Any thoughts? 5 seconds "should be" enough for the system to do other work, and 100 messages could mean 5-8 kB of text. We're talking about a situation when things are really bad, messages are likely to be duplicated so there should be enough information captured by the time the ratelimiting kicks in. So, let it be 100. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index fe609b81dd1b..85b0a4517e30 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -188,6 +188,25 @@ static const char * const logtypes[] = { "debug", }; +static struct ratelimit_state printk_limits[] = { + RATELIMIT_STATE_INIT(printk_limits[0], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[1], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[2], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[3], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[4], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[5], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[6], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), + RATELIMIT_STATE_INIT(printk_limits[7], DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST), +}; + void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) { struct super_block *sb = fs_info->sb; @@ -196,6 +215,7 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) va_list args; const char *type = logtypes[4]; int kern_level; + struct ratelimit_state *ratelimit; va_start(args, fmt); @@ -206,13 +226,18 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) lvl[size] = '\0'; fmt += size; type = logtypes[kern_level - '0']; - } else + ratelimit = &printk_limits[kern_level - '0']; + } else { *lvl = '\0'; + /* Default to debug output */ + ratelimit = &printk_limits[7]; + } vaf.fmt = fmt; vaf.va = &args; - printk("%sBTRFS %s (device %s): %pV\n", lvl, type, sb->s_id, &vaf); + if (__ratelimit(ratelimit)) + printk("%sBTRFS %s (device %s): %pV\n", lvl, type, sb->s_id, &vaf); va_end(args); }