[PATCHv3] lib/ratelimit: Lockless ratelimiting

Message ID	20180703225628.25684-1-dima@arista.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Dmitry Safonov <dima@arista.com> To: linux-kernel@vger.kernel.org Date: Tue, 3 Jul 2018 23:56:28 +0100 Message-Id: <20180703225628.25684-1-dima@arista.com> Subject: [Intel-gfx] [PATCHv3] lib/ratelimit: Lockless ratelimiting Precedence: list Cc: Theodore Ts'o <tytso@mit.edu>, Arnd Bergmann <arnd@arndb.de>, Dmitry Safonov <dima@arista.com>, David Airlie <airlied@linux.ie>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, intel-gfx@lists.freedesktop.org, Andy Shevchenko <andy.shevchenko@gmail.com>, dri-devel@lists.freedesktop.org, Rodrigo Vivi <rodrigo.vivi@intel.com>, Thomas Gleixner <tglx@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Dmitry Safonov July 3, 2018, 10:56 p.m. UTC

Currently ratelimit_state is protected with spin_lock. If the .lock is
taken at the moment of ___ratelimit() call, the message is suppressed to
make ratelimiting robust.

That results in the following issue issue:
      CPU0                          CPU1
------------------           ------------------
printk_ratelimit()           printk_ratelimit()
        |                             |
  try_spin_lock()              try_spin_lock()
        |                             |
time_is_before_jiffies()          return 0; // suppress

So, concurrent call of ___ratelimit() results in silently suppressing
one of the messages, regardless if the limit is reached or not.
And rc->missed is not increased in such case so the issue is covered
from user.

Convert ratelimiting to use atomic counters and drop spin_lock.

Note: That might be unexpected, but with the first interval of messages
storm one can print up to burst*2 messages. So, it doesn't guarantee
that in *any* interval amount of messages is lesser than burst.
But, that differs lightly from previous behavior where one can start
burst=5 interval and print 4 messages on the last milliseconds of
interval + new 5 messages from new interval (totally 9 messages in
lesser than interval value):

   msg0              msg1-msg4 msg0-msg4
    |                     |       |
    |                     |       |
 |--o---------------------o-|-----o--------------------|--> (t)
                          <------->
                       Lesser than burst

Dropped dev/random patch since v1 version:
lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>

Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()

Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/char/random.c            | 28 ++++++++-----------
 drivers/gpu/drm/i915/i915_perf.c |  8 ++++--
 fs/btrfs/super.c                 | 16 +++++------
 fs/xfs/scrub/scrub.c             |  2 +-
 include/linux/ratelimit.h        | 31 ++++++++++++---------
 kernel/user.c                    |  2 +-
 lib/ratelimit.c                  | 59 +++++++++++++++++++---------------------
 7 files changed, 73 insertions(+), 73 deletions(-)

Dmitry Safonov July 17, 2018, 12:59 a.m. UTC | #1

I would be glad if someone helps/bothers to review the change :C

Thanks,
Dmitry

On Tue, 2018-07-03 at 23:56 +0100, Dmitry Safonov wrote:
> Currently ratelimit_state is protected with spin_lock. If the .lock
> is
> taken at the moment of ___ratelimit() call, the message is suppressed
> to
> make ratelimiting robust.
> 
> That results in the following issue issue:
>       CPU0                          CPU1
> ------------------           ------------------
> printk_ratelimit()           printk_ratelimit()
>         |                             |
>   try_spin_lock()              try_spin_lock()
>         |                             |
> time_is_before_jiffies()          return 0; // suppress
> 
> So, concurrent call of ___ratelimit() results in silently suppressing
> one of the messages, regardless if the limit is reached or not.
> And rc->missed is not increased in such case so the issue is covered
> from user.
> 
> Convert ratelimiting to use atomic counters and drop spin_lock.
> 
> Note: That might be unexpected, but with the first interval of
> messages
> storm one can print up to burst*2 messages. So, it doesn't guarantee
> that in *any* interval amount of messages is lesser than burst.
> But, that differs lightly from previous behavior where one can start
> burst=5 interval and print 4 messages on the last milliseconds of
> interval + new 5 messages from new interval (totally 9 messages in
> lesser than interval value):
> 
>    msg0              msg1-msg4 msg0-msg4
>     |                     |       |
>     |                     |       |
>  |--o---------------------o-|-----o--------------------|--> (t)
>                           <------->
>                        Lesser than burst
> 
> Dropped dev/random patch since v1 version:
> lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
> 
> Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()
> 
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: intel-gfx@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  drivers/char/random.c            | 28 ++++++++-----------
>  drivers/gpu/drm/i915/i915_perf.c |  8 ++++--
>  fs/btrfs/super.c                 | 16 +++++------
>  fs/xfs/scrub/scrub.c             |  2 +-
>  include/linux/ratelimit.h        | 31 ++++++++++++---------
>  kernel/user.c                    |  2 +-
>  lib/ratelimit.c                  | 59 +++++++++++++++++++-----------
> ----------
>  7 files changed, 73 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index cd888d4ee605..0be31b3eadab 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -439,10 +439,8 @@ static void _crng_backtrack_protect(struct
> crng_state *crng,
>  static void process_random_ready_list(void);
>  static void _get_random_bytes(void *buf, int nbytes);
>  
> -static struct ratelimit_state unseeded_warning =
> -	RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
> -static struct ratelimit_state urandom_warning =
> -	RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
> +static struct ratelimit_state unseeded_warning =
> RATELIMIT_STATE_INIT(HZ, 3);
> +static struct ratelimit_state urandom_warning =
> RATELIMIT_STATE_INIT(HZ, 3);
>  
>  static int ratelimit_disable __read_mostly;
>  
> @@ -937,24 +935,22 @@ static void crng_reseed(struct crng_state
> *crng, struct entropy_store *r)
>  	crng->init_time = jiffies;
>  	spin_unlock_irqrestore(&crng->lock, flags);
>  	if (crng == &primary_crng && crng_init < 2) {
> +		unsigned int unseeded_miss, urandom_miss;
> +
>  		invalidate_batched_entropy();
>  		numa_crng_init();
>  		crng_init = 2;
>  		process_random_ready_list();
>  		wake_up_interruptible(&crng_init_wait);
>  		pr_notice("random: crng init done\n");
> -		if (unseeded_warning.missed) {
> -			pr_notice("random: %d get_random_xx
> warning(s) missed "
> -				  "due to ratelimiting\n",
> -				  unseeded_warning.missed);
> -			unseeded_warning.missed = 0;
> -		}
> -		if (urandom_warning.missed) {
> -			pr_notice("random: %d urandom warning(s)
> missed "
> -				  "due to ratelimiting\n",
> -				  urandom_warning.missed);
> -			urandom_warning.missed = 0;
> -		}
> +		unseeded_miss =
> atomic_xchg(&unseeded_warning.missed, 0);
> +		if (unseeded_miss)
> +			pr_notice("random: %u get_random_xx
> warning(s) missed "
> +				  "due to ratelimiting\n",
> unseeded_miss);
> +		urandom_miss = atomic_xchg(&urandom_warning.missed,
> 0);
> +		if (urandom_miss)
> +			pr_notice("random: %u urandom warning(s)
> missed "
> +				  "due to ratelimiting\n",
> urandom_miss);
>  	}
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_perf.c
> b/drivers/gpu/drm/i915/i915_perf.c
> index 019bd2d073ad..dcfba3023547 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1295,6 +1295,7 @@ free_oa_buffer(struct drm_i915_private *i915)
>  static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>  {
>  	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	unsigned int msg_missed;
>  
>  	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
>  
> @@ -1317,9 +1318,10 @@ static void i915_oa_stream_destroy(struct
> i915_perf_stream *stream)
>  
>  	put_oa_config(dev_priv, stream->oa_config);
>  
> -	if (dev_priv->perf.oa.spurious_report_rs.missed) {
> -		DRM_NOTE("%d spurious OA report notices suppressed
> due to ratelimiting\n",
> -			 dev_priv-
> >perf.oa.spurious_report_rs.missed);
> +	msg_missed = atomic_read(&dev_priv-
> >perf.oa.spurious_report_rs.missed);
> +	if (msg_missed) {
> +		DRM_NOTE("%u spurious OA report notices suppressed
> due to ratelimiting\n",
> +			 msg_missed);
>  	}
>  }
>  
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 81107ad49f3a..ffab6c23bc50 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -177,14 +177,14 @@ static const char * const logtypes[] = {
>   * messages doesn't cause more important ones to be dropped.
>   */
>  static struct ratelimit_state printk_limits[] = {
> -	RATELIMIT_STATE_INIT(printk_limits[0],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[1],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[2],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[3],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[4],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[5],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[6],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> -	RATELIMIT_STATE_INIT(printk_limits[7],
> DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
> +	RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>  };
>  
>  void btrfs_printk(const struct btrfs_fs_info *fs_info, const char
> *fmt, ...)
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 58ae76b3a421..8b58b439694a 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -349,7 +349,7 @@ xfs_scrub_experimental_warning(
>  	struct xfs_mount	*mp)
>  {
>  	static struct ratelimit_state scrub_warning =
> RATELIMIT_STATE_INIT(
> -			"xfs_scrub_warning", 86400 * HZ, 1);
> +			86400 * HZ, 1);
>  	ratelimit_set_flags(&scrub_warning,
> RATELIMIT_MSG_ON_RELEASE);
>  
>  	if (__ratelimit(&scrub_warning))
> diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
> index 8ddf79e9207a..f9a9386dd869 100644
> --- a/include/linux/ratelimit.h
> +++ b/include/linux/ratelimit.h
> @@ -4,7 +4,6 @@
>  
>  #include <linux/param.h>
>  #include <linux/sched.h>
> -#include <linux/spinlock.h>
>  
>  #define DEFAULT_RATELIMIT_INTERVAL	(5 * HZ)
>  #define DEFAULT_RATELIMIT_BURST		10
> @@ -13,38 +12,39 @@
>  #define RATELIMIT_MSG_ON_RELEASE	BIT(0)
>  
>  struct ratelimit_state {
> -	raw_spinlock_t	lock;		/* protect the
> state */
> +	atomic_t	printed;
> +	atomic_t	missed;
>  
>  	int		interval;
>  	int		burst;
> -	int		printed;
> -	int		missed;
>  	unsigned long	begin;
>  	unsigned long	flags;
>  };
>  
> -#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) {	
> 	\
> -		.lock		=
> __RAW_SPIN_LOCK_UNLOCKED(name.lock),	\
> +#define RATELIMIT_STATE_INIT(interval_init, burst_init) {		
> \
>  		.interval	= interval_init,			
> \
>  		.burst		= burst_init,			
> 	\
> +		.printed	= ATOMIC_INIT(0),			
> \
> +		.missed		= ATOMIC_INIT(0),		
> 	\
>  	}
>  
>  #define RATELIMIT_STATE_INIT_DISABLED				
> 	\
> -	RATELIMIT_STATE_INIT(ratelimit_state, 0,
> DEFAULT_RATELIMIT_BURST)
> +	RATELIMIT_STATE_INIT(0, DEFAULT_RATELIMIT_BURST)
>  
>  #define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init)	
> 	\
>  									
> \
>  	struct ratelimit_state name =				
> 	\
> -		RATELIMIT_STATE_INIT(name, interval_init,
> burst_init)	\
> +		RATELIMIT_STATE_INIT(interval_init, burst_init)
>  
>  static inline void ratelimit_state_init(struct ratelimit_state *rs,
>  					int interval, int burst)
>  {
>  	memset(rs, 0, sizeof(*rs));
>  
> -	raw_spin_lock_init(&rs->lock);
>  	rs->interval	= interval;
>  	rs->burst	= burst;
> +	atomic_set(&rs->printed, 0);
> +	atomic_set(&rs->missed, 0);
>  }
>  
>  static inline void ratelimit_default_init(struct ratelimit_state
> *rs)
> @@ -53,16 +53,21 @@ static inline void ratelimit_default_init(struct
> ratelimit_state *rs)
>  					DEFAULT_RATELIMIT_BURST);
>  }
>  
> +/*
> + * Keeping It Simple: not reenterable and not safe for concurrent
> + * ___ratelimit() call as used only by devkmsg_release().
> + */
>  static inline void ratelimit_state_exit(struct ratelimit_state *rs)
>  {
> +	int missed;
> +
>  	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
>  		return;
>  
> -	if (rs->missed) {
> +	missed = atomic_xchg(&rs->missed, 0);
> +	if (missed)
>  		pr_warn("%s: %d output lines suppressed due to
> ratelimiting\n",
> -			current->comm, rs->missed);
> -		rs->missed = 0;
> -	}
> +			current->comm, missed);
>  }
>  
>  static inline void
> diff --git a/kernel/user.c b/kernel/user.c
> index 36288d840675..a964f842d8f0 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -101,7 +101,7 @@ struct user_struct root_user = {
>  	.sigpending	= ATOMIC_INIT(0),
>  	.locked_shm     = 0,
>  	.uid		= GLOBAL_ROOT_UID,
> -	.ratelimit	=
> RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> +	.ratelimit	= RATELIMIT_STATE_INIT(0, 0),
>  };
>  
>  /*
> diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> index d01f47135239..d9b749d40108 100644
> --- a/lib/ratelimit.c
> +++ b/lib/ratelimit.c
> @@ -13,6 +13,18 @@
>  #include <linux/jiffies.h>
>  #include <linux/export.h>
>  
> +static void ratelimit_end_interval(struct ratelimit_state *rs, const
> char *func)
> +{
> +	rs->begin = jiffies;
> +
> +	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> +		unsigned int missed = atomic_xchg(&rs->missed, 0);
> +
> +		if (missed)
> +			pr_warn("%s: %u callbacks suppressed\n",
> func, missed);
> +	}
> +}
> +
>  /*
>   * __ratelimit - rate limiting
>   * @rs: ratelimit_state data
> @@ -27,45 +39,30 @@
>   */
>  int ___ratelimit(struct ratelimit_state *rs, const char *func)
>  {
> -	unsigned long flags;
> -	int ret;
> -
>  	if (!rs->interval)
>  		return 1;
>  
> -	/*
> -	 * If we contend on this state's lock then almost
> -	 * by definition we are too busy to print a message,
> -	 * in addition to the one that will be printed by
> -	 * the entity that is holding the lock already:
> -	 */
> -	if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> +	if (unlikely(!rs->burst)) {
> +		atomic_add_unless(&rs->missed, 1, -1);
> +		if (time_is_before_jiffies(rs->begin + rs-
> >interval))
> +			ratelimit_end_interval(rs, func);
> +
>  		return 0;
> +	}
>  
> -	if (!rs->begin)
> -		rs->begin = jiffies;
> +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> +		return 1;
>  
>  	if (time_is_before_jiffies(rs->begin + rs->interval)) {
> -		if (rs->missed) {
> -			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
> {
> -				printk_deferred(KERN_WARNING
> -						"%s: %d callbacks
> suppressed\n",
> -						func, rs->missed);
> -				rs->missed = 0;
> -			}
> -		}
> -		rs->begin   = jiffies;
> -		rs->printed = 0;
> -	}
> -	if (rs->burst && rs->burst > rs->printed) {
> -		rs->printed++;
> -		ret = 1;
> -	} else {
> -		rs->missed++;
> -		ret = 0;
> +		if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
> +			ratelimit_end_interval(rs, func);
>  	}
> -	raw_spin_unlock_irqrestore(&rs->lock, flags);
>  
> -	return ret;
> +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> +		return 1;
> +
> +	atomic_add_unless(&rs->missed, 1, -1);
> +
> +	return 0;
>  }
>  EXPORT_SYMBOL(___ratelimit);

Andy Shevchenko July 18, 2018, 4:49 p.m. UTC | #2

On Tue, Jul 17, 2018 at 3:59 AM, Dmitry Safonov <dima@arista.com> wrote:
> I would be glad if someone helps/bothers to review the change :C
>

Perhaps Petr and / or Steven can help you.

> Thanks,
> Dmitry
>
> On Tue, 2018-07-03 at 23:56 +0100, Dmitry Safonov wrote:
>> Currently ratelimit_state is protected with spin_lock. If the .lock
>> is
>> taken at the moment of ___ratelimit() call, the message is suppressed
>> to
>> make ratelimiting robust.
>>
>> That results in the following issue issue:
>>       CPU0                          CPU1
>> ------------------           ------------------
>> printk_ratelimit()           printk_ratelimit()
>>         |                             |
>>   try_spin_lock()              try_spin_lock()
>>         |                             |
>> time_is_before_jiffies()          return 0; // suppress
>>
>> So, concurrent call of ___ratelimit() results in silently suppressing
>> one of the messages, regardless if the limit is reached or not.
>> And rc->missed is not increased in such case so the issue is covered
>> from user.
>>
>> Convert ratelimiting to use atomic counters and drop spin_lock.
>>
>> Note: That might be unexpected, but with the first interval of
>> messages
>> storm one can print up to burst*2 messages. So, it doesn't guarantee
>> that in *any* interval amount of messages is lesser than burst.
>> But, that differs lightly from previous behavior where one can start
>> burst=5 interval and print 4 messages on the last milliseconds of
>> interval + new 5 messages from new interval (totally 9 messages in
>> lesser than interval value):
>>
>>    msg0              msg1-msg4 msg0-msg4
>>     |                     |       |
>>     |                     |       |
>>  |--o---------------------o-|-----o--------------------|--> (t)
>>                           <------->
>>                        Lesser than burst
>>
>> Dropped dev/random patch since v1 version:
>> lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
>>
>> Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()
>>
>> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: David Airlie <airlied@linux.ie>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Jani Nikula <jani.nikula@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: "Theodore Ts'o" <tytso@mit.edu>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: intel-gfx@lists.freedesktop.org
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> ---
>>  drivers/char/random.c            | 28 ++++++++-----------
>>  drivers/gpu/drm/i915/i915_perf.c |  8 ++++--
>>  fs/btrfs/super.c                 | 16 +++++------
>>  fs/xfs/scrub/scrub.c             |  2 +-
>>  include/linux/ratelimit.h        | 31 ++++++++++++---------
>>  kernel/user.c                    |  2 +-
>>  lib/ratelimit.c                  | 59 +++++++++++++++++++-----------
>> ----------
>>  7 files changed, 73 insertions(+), 73 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index cd888d4ee605..0be31b3eadab 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -439,10 +439,8 @@ static void _crng_backtrack_protect(struct
>> crng_state *crng,
>>  static void process_random_ready_list(void);
>>  static void _get_random_bytes(void *buf, int nbytes);
>>
>> -static struct ratelimit_state unseeded_warning =
>> -     RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
>> -static struct ratelimit_state urandom_warning =
>> -     RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);
>> +static struct ratelimit_state unseeded_warning =
>> RATELIMIT_STATE_INIT(HZ, 3);
>> +static struct ratelimit_state urandom_warning =
>> RATELIMIT_STATE_INIT(HZ, 3);
>>
>>  static int ratelimit_disable __read_mostly;
>>
>> @@ -937,24 +935,22 @@ static void crng_reseed(struct crng_state
>> *crng, struct entropy_store *r)
>>       crng->init_time = jiffies;
>>       spin_unlock_irqrestore(&crng->lock, flags);
>>       if (crng == &primary_crng && crng_init < 2) {
>> +             unsigned int unseeded_miss, urandom_miss;
>> +
>>               invalidate_batched_entropy();
>>               numa_crng_init();
>>               crng_init = 2;
>>               process_random_ready_list();
>>               wake_up_interruptible(&crng_init_wait);
>>               pr_notice("random: crng init done\n");
>> -             if (unseeded_warning.missed) {
>> -                     pr_notice("random: %d get_random_xx
>> warning(s) missed "
>> -                               "due to ratelimiting\n",
>> -                               unseeded_warning.missed);
>> -                     unseeded_warning.missed = 0;
>> -             }
>> -             if (urandom_warning.missed) {
>> -                     pr_notice("random: %d urandom warning(s)
>> missed "
>> -                               "due to ratelimiting\n",
>> -                               urandom_warning.missed);
>> -                     urandom_warning.missed = 0;
>> -             }
>> +             unseeded_miss =
>> atomic_xchg(&unseeded_warning.missed, 0);
>> +             if (unseeded_miss)
>> +                     pr_notice("random: %u get_random_xx
>> warning(s) missed "
>> +                               "due to ratelimiting\n",
>> unseeded_miss);
>> +             urandom_miss = atomic_xchg(&urandom_warning.missed,
>> 0);
>> +             if (urandom_miss)
>> +                     pr_notice("random: %u urandom warning(s)
>> missed "
>> +                               "due to ratelimiting\n",
>> urandom_miss);
>>       }
>>  }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index 019bd2d073ad..dcfba3023547 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1295,6 +1295,7 @@ free_oa_buffer(struct drm_i915_private *i915)
>>  static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
>>  {
>>       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +     unsigned int msg_missed;
>>
>>       BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
>>
>> @@ -1317,9 +1318,10 @@ static void i915_oa_stream_destroy(struct
>> i915_perf_stream *stream)
>>
>>       put_oa_config(dev_priv, stream->oa_config);
>>
>> -     if (dev_priv->perf.oa.spurious_report_rs.missed) {
>> -             DRM_NOTE("%d spurious OA report notices suppressed
>> due to ratelimiting\n",
>> -                      dev_priv-
>> >perf.oa.spurious_report_rs.missed);
>> +     msg_missed = atomic_read(&dev_priv-
>> >perf.oa.spurious_report_rs.missed);
>> +     if (msg_missed) {
>> +             DRM_NOTE("%u spurious OA report notices suppressed
>> due to ratelimiting\n",
>> +                      msg_missed);
>>       }
>>  }
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 81107ad49f3a..ffab6c23bc50 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -177,14 +177,14 @@ static const char * const logtypes[] = {
>>   * messages doesn't cause more important ones to be dropped.
>>   */
>>  static struct ratelimit_state printk_limits[] = {
>> -     RATELIMIT_STATE_INIT(printk_limits[0],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[1],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[2],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[3],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[4],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[5],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[6],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> -     RATELIMIT_STATE_INIT(printk_limits[7],
>> DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>> +     RATELIMIT_STATE_INIT(DEFAULT_RATELIMIT_INTERVAL, 100),
>>  };
>>
>>  void btrfs_printk(const struct btrfs_fs_info *fs_info, const char
>> *fmt, ...)
>> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
>> index 58ae76b3a421..8b58b439694a 100644
>> --- a/fs/xfs/scrub/scrub.c
>> +++ b/fs/xfs/scrub/scrub.c
>> @@ -349,7 +349,7 @@ xfs_scrub_experimental_warning(
>>       struct xfs_mount        *mp)
>>  {
>>       static struct ratelimit_state scrub_warning =
>> RATELIMIT_STATE_INIT(
>> -                     "xfs_scrub_warning", 86400 * HZ, 1);
>> +                     86400 * HZ, 1);
>>       ratelimit_set_flags(&scrub_warning,
>> RATELIMIT_MSG_ON_RELEASE);
>>
>>       if (__ratelimit(&scrub_warning))
>> diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
>> index 8ddf79e9207a..f9a9386dd869 100644
>> --- a/include/linux/ratelimit.h
>> +++ b/include/linux/ratelimit.h
>> @@ -4,7 +4,6 @@
>>
>>  #include <linux/param.h>
>>  #include <linux/sched.h>
>> -#include <linux/spinlock.h>
>>
>>  #define DEFAULT_RATELIMIT_INTERVAL   (5 * HZ)
>>  #define DEFAULT_RATELIMIT_BURST              10
>> @@ -13,38 +12,39 @@
>>  #define RATELIMIT_MSG_ON_RELEASE     BIT(0)
>>
>>  struct ratelimit_state {
>> -     raw_spinlock_t  lock;           /* protect the
>> state */
>> +     atomic_t        printed;
>> +     atomic_t        missed;
>>
>>       int             interval;
>>       int             burst;
>> -     int             printed;
>> -     int             missed;
>>       unsigned long   begin;
>>       unsigned long   flags;
>>  };
>>
>> -#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) {
>>       \
>> -             .lock           =
>> __RAW_SPIN_LOCK_UNLOCKED(name.lock),  \
>> +#define RATELIMIT_STATE_INIT(interval_init, burst_init) {
>> \
>>               .interval       = interval_init,
>> \
>>               .burst          = burst_init,
>>       \
>> +             .printed        = ATOMIC_INIT(0),
>> \
>> +             .missed         = ATOMIC_INIT(0),
>>       \
>>       }
>>
>>  #define RATELIMIT_STATE_INIT_DISABLED
>>       \
>> -     RATELIMIT_STATE_INIT(ratelimit_state, 0,
>> DEFAULT_RATELIMIT_BURST)
>> +     RATELIMIT_STATE_INIT(0, DEFAULT_RATELIMIT_BURST)
>>
>>  #define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init)
>>       \
>>
>> \
>>       struct ratelimit_state name =
>>       \
>> -             RATELIMIT_STATE_INIT(name, interval_init,
>> burst_init)   \
>> +             RATELIMIT_STATE_INIT(interval_init, burst_init)
>>
>>  static inline void ratelimit_state_init(struct ratelimit_state *rs,
>>                                       int interval, int burst)
>>  {
>>       memset(rs, 0, sizeof(*rs));
>>
>> -     raw_spin_lock_init(&rs->lock);
>>       rs->interval    = interval;
>>       rs->burst       = burst;
>> +     atomic_set(&rs->printed, 0);
>> +     atomic_set(&rs->missed, 0);
>>  }
>>
>>  static inline void ratelimit_default_init(struct ratelimit_state
>> *rs)
>> @@ -53,16 +53,21 @@ static inline void ratelimit_default_init(struct
>> ratelimit_state *rs)
>>                                       DEFAULT_RATELIMIT_BURST);
>>  }
>>
>> +/*
>> + * Keeping It Simple: not reenterable and not safe for concurrent
>> + * ___ratelimit() call as used only by devkmsg_release().
>> + */
>>  static inline void ratelimit_state_exit(struct ratelimit_state *rs)
>>  {
>> +     int missed;
>> +
>>       if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
>>               return;
>>
>> -     if (rs->missed) {
>> +     missed = atomic_xchg(&rs->missed, 0);
>> +     if (missed)
>>               pr_warn("%s: %d output lines suppressed due to
>> ratelimiting\n",
>> -                     current->comm, rs->missed);
>> -             rs->missed = 0;
>> -     }
>> +                     current->comm, missed);
>>  }
>>
>>  static inline void
>> diff --git a/kernel/user.c b/kernel/user.c
>> index 36288d840675..a964f842d8f0 100644
>> --- a/kernel/user.c
>> +++ b/kernel/user.c
>> @@ -101,7 +101,7 @@ struct user_struct root_user = {
>>       .sigpending     = ATOMIC_INIT(0),
>>       .locked_shm     = 0,
>>       .uid            = GLOBAL_ROOT_UID,
>> -     .ratelimit      =
>> RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
>> +     .ratelimit      = RATELIMIT_STATE_INIT(0, 0),
>>  };
>>
>>  /*
>> diff --git a/lib/ratelimit.c b/lib/ratelimit.c
>> index d01f47135239..d9b749d40108 100644
>> --- a/lib/ratelimit.c
>> +++ b/lib/ratelimit.c
>> @@ -13,6 +13,18 @@
>>  #include <linux/jiffies.h>
>>  #include <linux/export.h>
>>
>> +static void ratelimit_end_interval(struct ratelimit_state *rs, const
>> char *func)
>> +{
>> +     rs->begin = jiffies;
>> +
>> +     if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
>> +             unsigned int missed = atomic_xchg(&rs->missed, 0);
>> +
>> +             if (missed)
>> +                     pr_warn("%s: %u callbacks suppressed\n",
>> func, missed);
>> +     }
>> +}
>> +
>>  /*
>>   * __ratelimit - rate limiting
>>   * @rs: ratelimit_state data
>> @@ -27,45 +39,30 @@
>>   */
>>  int ___ratelimit(struct ratelimit_state *rs, const char *func)
>>  {
>> -     unsigned long flags;
>> -     int ret;
>> -
>>       if (!rs->interval)
>>               return 1;
>>
>> -     /*
>> -      * If we contend on this state's lock then almost
>> -      * by definition we are too busy to print a message,
>> -      * in addition to the one that will be printed by
>> -      * the entity that is holding the lock already:
>> -      */
>> -     if (!raw_spin_trylock_irqsave(&rs->lock, flags))
>> +     if (unlikely(!rs->burst)) {
>> +             atomic_add_unless(&rs->missed, 1, -1);
>> +             if (time_is_before_jiffies(rs->begin + rs-
>> >interval))
>> +                     ratelimit_end_interval(rs, func);
>> +
>>               return 0;
>> +     }
>>
>> -     if (!rs->begin)
>> -             rs->begin = jiffies;
>> +     if (atomic_add_unless(&rs->printed, 1, rs->burst))
>> +             return 1;
>>
>>       if (time_is_before_jiffies(rs->begin + rs->interval)) {
>> -             if (rs->missed) {
>> -                     if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
>> {
>> -                             printk_deferred(KERN_WARNING
>> -                                             "%s: %d callbacks
>> suppressed\n",
>> -                                             func, rs->missed);
>> -                             rs->missed = 0;
>> -                     }
>> -             }
>> -             rs->begin   = jiffies;
>> -             rs->printed = 0;
>> -     }
>> -     if (rs->burst && rs->burst > rs->printed) {
>> -             rs->printed++;
>> -             ret = 1;
>> -     } else {
>> -             rs->missed++;
>> -             ret = 0;
>> +             if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
>> +                     ratelimit_end_interval(rs, func);
>>       }
>> -     raw_spin_unlock_irqrestore(&rs->lock, flags);
>>
>> -     return ret;
>> +     if (atomic_add_unless(&rs->printed, 1, rs->burst))
>> +             return 1;
>> +
>> +     atomic_add_unless(&rs->missed, 1, -1);
>> +
>> +     return 0;
>>  }
>>  EXPORT_SYMBOL(___ratelimit);

Petr Mladek July 20, 2018, 3:09 p.m. UTC | #3

On Wed 2018-07-18 19:49:10, Andy Shevchenko wrote:
> On Tue, Jul 17, 2018 at 3:59 AM, Dmitry Safonov <dima@arista.com> wrote:
> > I would be glad if someone helps/bothers to review the change :C
> >
> 
> Perhaps Petr and / or Steven can help you.


> > Thanks,
> > Dmitry
> >
> > On Tue, 2018-07-03 at 23:56 +0100, Dmitry Safonov wrote:
> >> Currently ratelimit_state is protected with spin_lock. If the .lock
> >> is
> >> taken at the moment of ___ratelimit() call, the message is suppressed
> >> to
> >> make ratelimiting robust.
> >>
> >> That results in the following issue issue:
> >>       CPU0                          CPU1
> >> ------------------           ------------------
> >> printk_ratelimit()           printk_ratelimit()
> >>         |                             |
> >>   try_spin_lock()              try_spin_lock()
> >>         |                             |
> >> time_is_before_jiffies()          return 0; // suppress
> >>
> >> So, concurrent call of ___ratelimit() results in silently suppressing
> >> one of the messages, regardless if the limit is reached or not.
> >> And rc->missed is not increased in such case so the issue is covered
> >> from user.
> >>
> >> Convert ratelimiting to use atomic counters and drop spin_lock.
> >>
> >> Note: That might be unexpected, but with the first interval of
> >> messages
> >> storm one can print up to burst*2 messages. So, it doesn't guarantee
> >> that in *any* interval amount of messages is lesser than burst.
> >> But, that differs lightly from previous behavior where one can start
> >> burst=5 interval and print 4 messages on the last milliseconds of
> >> interval + new 5 messages from new interval (totally 9 messages in
> >> lesser than interval value):
> >>
> >>    msg0              msg1-msg4 msg0-msg4
> >>     |                     |       |
> >>     |                     |       |
> >>  |--o---------------------o-|-----o--------------------|--> (t)
> >>                           <------->
> >>                        Lesser than burst
> >>

I am a bit confused. Does this patch fix two problems?

   + Lost rc->missed update when try_spin_lock() fails
   + printing up to burst*2 messages in a give interval

It would make the review much easier if you split it into more
patches and fix the problems separately.

Otherwise, it looks promissing...

Best Regards,
Petr

PS: I have vacation the following two weeks. Anyway, please, CC me
if you send any new version.

Best Regards,
Petr

Dmitry Safonov July 20, 2018, 3:33 p.m. UTC | #4

On Fri, 2018-07-20 at 17:09 +0200, Petr Mladek wrote:
> > > On Tue, 2018-07-03 at 23:56 +0100, Dmitry Safonov wrote:
> > > > Currently ratelimit_state is protected with spin_lock. If the
> > > > .lock
> > > > is
> > > > taken at the moment of ___ratelimit() call, the message is
> > > > suppressed
> > > > to
> > > > make ratelimiting robust.
> > > > 
> > > > That results in the following issue issue:
> > > >       CPU0                          CPU1
> > > > ------------------           ------------------
> > > > printk_ratelimit()           printk_ratelimit()
> > > >         |                             |
> > > >   try_spin_lock()              try_spin_lock()
> > > >         |                             |
> > > > time_is_before_jiffies()          return 0; // suppress
> > > > 
> > > > So, concurrent call of ___ratelimit() results in silently
> > > > suppressing
> > > > one of the messages, regardless if the limit is reached or not.
> > > > And rc->missed is not increased in such case so the issue is
> > > > covered
> > > > from user.
> > > > 
> > > > Convert ratelimiting to use atomic counters and drop spin_lock.
> > > > 
> > > > Note: That might be unexpected, but with the first interval of
> > > > messages
> > > > storm one can print up to burst*2 messages. So, it doesn't
> > > > guarantee
> > > > that in *any* interval amount of messages is lesser than burst.
> > > > But, that differs lightly from previous behavior where one can
> > > > start
> > > > burst=5 interval and print 4 messages on the last milliseconds
> > > > of
> > > > interval + new 5 messages from new interval (totally 9 messages
> > > > in
> > > > lesser than interval value):
> > > > 
> > > >    msg0              msg1-msg4 msg0-msg4
> > > >     |                     |       |
> > > >     |                     |       |
> > > >  |--o---------------------o-|-----o--------------------|--> (t)
> > > >                           <------->
> > > >                        Lesser than burst
> > > > 
> 
> I am a bit confused. Does this patch fix two problems?
> 
>    + Lost rc->missed update when try_spin_lock() fails
>    + printing up to burst*2 messages in a give interval

What I tried to solve here (maybe I could better point it in the commit
message): is the situation where ratelimit::burst haven't been hit yet
and there are calls for __ratelimit() from different CPUs.
So, neither of the messages should be suppressed, but as the spinlock
is taken already - one of them is dropped and even without updating
missed counter.

> It would make the review much easier if you split it into more
> patches and fix the problems separately.

Hmm, not really sure: the patch changes spinlock to atomics and I'm not
sure it make much sense to add atomics before removing the spinlock..

> Otherwise, it looks promissing...
> 
> Best Regards,
> Petr
> 
> PS: I have vacation the following two weeks. Anyway, please, CC me
> if you send any new version.

Surely, thanks for your attention to the patch (and time).

Steven Rostedt Aug. 2, 2018, 1:48 a.m. UTC | #5

On Fri, 20 Jul 2018 16:33:36 +0100
Dmitry Safonov <dima@arista.com> wrote:

> On Fri, 2018-07-20 at 17:09 +0200, Petr Mladek wrote:
> > > > On Tue, 2018-07-03 at 23:56 +0100, Dmitry Safonov wrote:  
> > > > > Currently ratelimit_state is protected with spin_lock. If the
> > > > > .lock
> > > > > is
> > > > > taken at the moment of ___ratelimit() call, the message is
> > > > > suppressed
> > > > > to
> > > > > make ratelimiting robust.
> > > > > 
> > > > > That results in the following issue issue:
> > > > >       CPU0                          CPU1
> > > > > ------------------           ------------------
> > > > > printk_ratelimit()           printk_ratelimit()
> > > > >         |                             |
> > > > >   try_spin_lock()              try_spin_lock()
> > > > >         |                             |
> > > > > time_is_before_jiffies()          return 0; // suppress
> > > > > 
> > > > > So, concurrent call of ___ratelimit() results in silently
> > > > > suppressing
> > > > > one of the messages, regardless if the limit is reached or not.
> > > > > And rc->missed is not increased in such case so the issue is
> > > > > covered
> > > > > from user.
> > > > > 
> > > > > Convert ratelimiting to use atomic counters and drop spin_lock.
> > > > > 
> > > > > Note: That might be unexpected, but with the first interval of
> > > > > messages
> > > > > storm one can print up to burst*2 messages. So, it doesn't
> > > > > guarantee
> > > > > that in *any* interval amount of messages is lesser than burst.
> > > > > But, that differs lightly from previous behavior where one can
> > > > > start
> > > > > burst=5 interval and print 4 messages on the last milliseconds
> > > > > of
> > > > > interval + new 5 messages from new interval (totally 9 messages
> > > > > in
> > > > > lesser than interval value):
> > > > > 
> > > > >    msg0              msg1-msg4 msg0-msg4
> > > > >     |                     |       |
> > > > >     |                     |       |
> > > > >  |--o---------------------o-|-----o--------------------|--> (t)
> > > > >                           <------->
> > > > >                        Lesser than burst
> > > > >   
> > 
> > I am a bit confused. Does this patch fix two problems?
> > 
> >    + Lost rc->missed update when try_spin_lock() fails
> >    + printing up to burst*2 messages in a give interval  
> 
> What I tried to solve here (maybe I could better point it in the commit
> message): is the situation where ratelimit::burst haven't been hit yet
> and there are calls for __ratelimit() from different CPUs.
> So, neither of the messages should be suppressed, but as the spinlock
> is taken already - one of them is dropped and even without updating
> missed counter.
> 
> > It would make the review much easier if you split it into more
> > patches and fix the problems separately.  
> 
> Hmm, not really sure: the patch changes spinlock to atomics and I'm not
> sure it make much sense to add atomics before removing the spinlock..
> 
> > Otherwise, it looks promissing...
> > 
> > Best Regards,
> > Petr
> > 
> > PS: I have vacation the following two weeks. Anyway, please, CC me
> > if you send any new version.  
> 
> Surely, thanks for your attention to the patch (and time).
> 

I'm just catching up from my vacation. What about making rs->missed
into an atomic, and have:

	if (!raw_spin_trylock_irqsave(&rs->lock, flags)) {
		atomic_inc(&rs->missed);
		return 0;
	}

?

You would also need to do:

	if (time_is_before_jiffies(rs->begin + rs->interval)) {
		int missed = atomic_xchg(&rs->missed, 0);
		if (missed) {

So that you don't have a race between checking rs->missed and setting it
to zero.

-- Steve

Dmitry Safonov Aug. 2, 2018, 3:19 p.m. UTC | #6

Hi Steven,
Thanks for your reply,

On Wed, 2018-08-01 at 21:48 -0400, Steven Rostedt wrote:
> I'm just catching up from my vacation. What about making rs->missed
> into an atomic, and have:
> 
> 	if (!raw_spin_trylock_irqsave(&rs->lock, flags)) {
> 		atomic_inc(&rs->missed);
> 		return 0;
> 	}
> 
> ?

Uhm. Do you mean as a preparation patch to split this on two patches?
Because it will not solve the issue where one CPU has taken rs->lock,
and is updating rs->printed, checking burst and whatnot; while the
second CPU will loose the message which was even *under* burst limit.

I.e.: there are enough of printk_ratelimit() users in tree and a
message from one can be suppressed, while shouldn't.

> You would also need to do:
> 
> 	if (time_is_before_jiffies(rs->begin + rs->interval)) {
> 		int missed = atomic_xchg(&rs->missed, 0);
> 		if (missed) {
> 
> So that you don't have a race between checking rs->missed and setting
> it
> to zero.

Petr Mladek Aug. 15, 2018, 3:10 p.m. UTC | #7

On Tue 2018-07-03 23:56:28, Dmitry Safonov wrote:
> Currently ratelimit_state is protected with spin_lock. If the .lock is
> taken at the moment of ___ratelimit() call, the message is suppressed to
> make ratelimiting robust.
> 
> That results in the following issue issue:
>       CPU0                          CPU1
> ------------------           ------------------
> printk_ratelimit()           printk_ratelimit()
>         |                             |
>   try_spin_lock()              try_spin_lock()
>         |                             |
> time_is_before_jiffies()          return 0; // suppress
> 
> So, concurrent call of ___ratelimit() results in silently suppressing
> one of the messages, regardless if the limit is reached or not.
> And rc->missed is not increased in such case so the issue is covered
> from user.
> 
> Convert ratelimiting to use atomic counters and drop spin_lock.
> 
> Note: That might be unexpected, but with the first interval of messages
> storm one can print up to burst*2 messages. So, it doesn't guarantee
> that in *any* interval amount of messages is lesser than burst.
> But, that differs lightly from previous behavior where one can start
> burst=5 interval and print 4 messages on the last milliseconds of
> interval + new 5 messages from new interval (totally 9 messages in
> lesser than interval value):

I am still confused by this paragraph. Does this patch change the
behavior? What is the original and what is the new one, please?

>    msg0              msg1-msg4 msg0-msg4
>     |                     |       |
>     |                     |       |
>  |--o---------------------o-|-----o--------------------|--> (t)
>                           <------->
>                        Lesser than burst
> 
> Dropped dev/random patch since v1 version:
> lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
> 
> Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()
> 
> diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> index d01f47135239..d9b749d40108 100644
> --- a/lib/ratelimit.c
> +++ b/lib/ratelimit.c
> @@ -13,6 +13,18 @@
>  #include <linux/jiffies.h>
>  #include <linux/export.h>
>  
> +static void ratelimit_end_interval(struct ratelimit_state *rs, const char *func)
> +{
> +	rs->begin = jiffies;
> +
> +	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> +		unsigned int missed = atomic_xchg(&rs->missed, 0);
> +
> +		if (missed)
> +			pr_warn("%s: %u callbacks suppressed\n", func, missed);
> +	}
> +}
> +
>  /*
>   * __ratelimit - rate limiting
>   * @rs: ratelimit_state data
> @@ -27,45 +39,30 @@
>   */
>  int ___ratelimit(struct ratelimit_state *rs, const char *func)
>  {
> -	unsigned long flags;
> -	int ret;
> -
>  	if (!rs->interval)
>  		return 1;
>  
> -	/*
> -	 * If we contend on this state's lock then almost
> -	 * by definition we are too busy to print a message,
> -	 * in addition to the one that will be printed by
> -	 * the entity that is holding the lock already:
> -	 */
> -	if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> +	if (unlikely(!rs->burst)) {
> +		atomic_add_unless(&rs->missed, 1, -1);
> +		if (time_is_before_jiffies(rs->begin + rs->interval))
> +			ratelimit_end_interval(rs, func);

This is racy. time_is_before_jiffies() might be valid on two
CPUs in parallel. They would both call ratelimit_end_interval().
This is not longer atomic context. Therefore one might get scheduled
and set rs->begin = jiffies seconds later. I am sure that there might
be more crazy scenarios.

> +
>  		return 0;
> +	}
>  
> -	if (!rs->begin)
> -		rs->begin = jiffies;
> +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> +		return 1;
>  
>  	if (time_is_before_jiffies(rs->begin + rs->interval)) {
> -		if (rs->missed) {
> -			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> -				printk_deferred(KERN_WARNING
> -						"%s: %d callbacks suppressed\n",
> -						func, rs->missed);
> -				rs->missed = 0;
> -			}
> -		}
> -		rs->begin   = jiffies;
> -		rs->printed = 0;
> -	}
> -	if (rs->burst && rs->burst > rs->printed) {
> -		rs->printed++;
> -		ret = 1;
> -	} else {
> -		rs->missed++;
> -		ret = 0;
> +		if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
> +			ratelimit_end_interval(rs, func);
>  	}
> -	raw_spin_unlock_irqrestore(&rs->lock, flags);
>  
> -	return ret;
> +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> +		return 1;

The entire logic is complicated and hard to understand. Especially calling
ratelimit_end_interval() and atomic_add_unless(&rs->printed) twice.

> +	atomic_add_unless(&rs->missed, 1, -1);
> +
> +	return 0;
>  }


I wonder if the following code would do the job (not even compile tested!):

static void ratelimit_end_interval(struct ratelimit_state *rs, const char *func)
{
	rs->begin = jiffies;

	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
		unsigned int missed = atomic_xchg(&rs->missed, 0);

		if (missed)
			pr_warn("%s: %u callbacks suppressed\n", func, missed);
	}

	atomic_xchg(&rs->printed, 0);
}

/*
 * __ratelimit - rate limiting
 * @rs: ratelimit_state data
 * @func: name of calling function
 *
 * This enforces a rate limit: not more than @rs->burst callbacks
 * in every @rs->interval
 *
 * RETURNS:
 * 0 means callbacks will be suppressed.
 * 1 means go ahead and do it.
 */
int ___ratelimit(struct ratelimit_state *rs, const char *func)
{
	unsigned long begin, old_begin = rs->begin;

	if (!rs->interval)
		return 1;

	if (time_is_before_jiffies(rs->begin + rs->interval) &&
	    cmpxchg(&rs->begin, begin, begin + rs->interval) == begin) {
		ratelimit_end_interval(rs, func);
	}

	if (atomic_add_unless(&rs->printed, 1, rs->burst))
		return 1;

	atomic_add_unless(&rs->missed, 1, -1);
	return 0;
}
EXPORT_SYMBOL(___ratelimit);


The main logic is the same as in the original code. Only one CPU is
able to reset the interval and counters (thanks to cmpxchg).
Every caller increases either "printed" or "missed" counter.

Best Regards,
Petr

Dmitry Safonov Aug. 20, 2018, 11:14 p.m. UTC | #8

Hi Petr, thanks for review,

On Wed, 2018-08-15 at 17:10 +0200, Petr Mladek wrote:
> On Tue 2018-07-03 23:56:28, Dmitry Safonov wrote:
> > Currently ratelimit_state is protected with spin_lock. If the .lock
> > is
> > taken at the moment of ___ratelimit() call, the message is
> > suppressed to
> > make ratelimiting robust.
> > 
> > That results in the following issue issue:
> >       CPU0                          CPU1
> > ------------------           ------------------
> > printk_ratelimit()           printk_ratelimit()
> >         |                             |
> >   try_spin_lock()              try_spin_lock()
> >         |                             |
> > time_is_before_jiffies()          return 0; // suppress
> > 
> > So, concurrent call of ___ratelimit() results in silently
> > suppressing
> > one of the messages, regardless if the limit is reached or not.
> > And rc->missed is not increased in such case so the issue is
> > covered
> > from user.
> > 
> > Convert ratelimiting to use atomic counters and drop spin_lock.
> > 
> > Note: That might be unexpected, but with the first interval of
> > messages
> > storm one can print up to burst*2 messages. So, it doesn't
> > guarantee
> > that in *any* interval amount of messages is lesser than burst.
> > But, that differs lightly from previous behavior where one can
> > start
> > burst=5 interval and print 4 messages on the last milliseconds of
> > interval + new 5 messages from new interval (totally 9 messages in
> > lesser than interval value):
> 
> I am still confused by this paragraph. Does this patch change the
> behavior? What is the original and what is the new one, please?

Yeah, I could be a bit cleaner about the change.
Originally more than `burst` messages could be printed if the previous
period hasn't ended:

> 
> >    msg0              msg1-msg4 msg0-msg4
> >     |                     |       |
> >     |                     |       |
> >  |--o---------------------o-|-----o--------------------|--> (t)
> >                           <------->
> >                        Lesser than burst

But now, because I check:
+       if (atomic_add_unless(&rs->printed, 1, rs->burst))
+               return 1;

*before* checking the end of interval, the maximum number of messages
in the peak will be the same, burst*2 (burst*2 - 1 in original).
Why do I check it before the end of interval?
I thought it would be a nice optimization, making atomic_add_unless()
the only called function on the fast-path (when we haven't hit the
limit yet). If you want, I can move it into a separate patch..

> > 
> > Dropped dev/random patch since v1 version:
> > lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
> > 
> > Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()
> > 
> > diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> > index d01f47135239..d9b749d40108 100644
> > --- a/lib/ratelimit.c
> > +++ b/lib/ratelimit.c
> > @@ -13,6 +13,18 @@
> >  #include <linux/jiffies.h>
> >  #include <linux/export.h>
> >  
> > +static void ratelimit_end_interval(struct ratelimit_state *rs,
> > const char *func)
> > +{
> > +	rs->begin = jiffies;
> > +
> > +	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > +		unsigned int missed = atomic_xchg(&rs->missed, 0);
> > +
> > +		if (missed)
> > +			pr_warn("%s: %u callbacks suppressed\n",
> > func, missed);
> > +	}
> > +}
> > +
> >  /*
> >   * __ratelimit - rate limiting
> >   * @rs: ratelimit_state data
> > @@ -27,45 +39,30 @@
> >   */
> >  int ___ratelimit(struct ratelimit_state *rs, const char *func)
> >  {
> > -	unsigned long flags;
> > -	int ret;
> > -
> >  	if (!rs->interval)
> >  		return 1;
> >  
> > -	/*
> > -	 * If we contend on this state's lock then almost
> > -	 * by definition we are too busy to print a message,
> > -	 * in addition to the one that will be printed by
> > -	 * the entity that is holding the lock already:
> > -	 */
> > -	if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> > +	if (unlikely(!rs->burst)) {
> > +		atomic_add_unless(&rs->missed, 1, -1);
> > +		if (time_is_before_jiffies(rs->begin + rs-
> > >interval))
> > +			ratelimit_end_interval(rs, func);
> 
> This is racy. time_is_before_jiffies() might be valid on two
> CPUs in parallel. They would both call ratelimit_end_interval().
> This is not longer atomic context. Therefore one might get scheduled
> and set rs->begin = jiffies seconds later. I am sure that there might
> be more crazy scenarios.

That's the case with rs->burst == 0.
So, in this situation all the messages will be suppressed.
And the only reason to call ratelimit_end_interval() is to update the
start time and number of messages not printed.
I didn't want to add any complexion for this case - the worst will be
the count of messages suppressed will be imprecise for rs->burst == 0
case. Not a big deal, huh?

> 
> > +
> >  		return 0;
> > +	}
> >  
> > -	if (!rs->begin)
> > -		rs->begin = jiffies;
> > +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> > +		return 1;
> >  
> >  	if (time_is_before_jiffies(rs->begin + rs->interval)) {
> > -		if (rs->missed) {
> > -			if (!(rs->flags &
> > RATELIMIT_MSG_ON_RELEASE)) {
> > -				printk_deferred(KERN_WARNING
> > -						"%s: %d callbacks
> > suppressed\n",
> > -						func, rs->missed);
> > -				rs->missed = 0;
> > -			}
> > -		}
> > -		rs->begin   = jiffies;
> > -		rs->printed = 0;
> > -	}
> > -	if (rs->burst && rs->burst > rs->printed) {
> > -		rs->printed++;
> > -		ret = 1;
> > -	} else {
> > -		rs->missed++;
> > -		ret = 0;
> > +		if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
> > +			ratelimit_end_interval(rs, func);
> >  	}
> > -	raw_spin_unlock_irqrestore(&rs->lock, flags);
> >  
> > -	return ret;
> > +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> > +		return 1;
> 
> The entire logic is complicated and hard to understand. Especially
> calling
> ratelimit_end_interval() and atomic_add_unless(&rs->printed) twice.

Probably, I've described the reasons above.. I may add a comment or
drop the first atomic_add_unless(), but I would prefer still keep it
there, having only one check for fast-path.

> 
> > +	atomic_add_unless(&rs->missed, 1, -1);
> > +
> > +	return 0;
> >  }
> 
> 
> I wonder if the following code would do the job (not even compile
> tested!):
> 
> static void ratelimit_end_interval(struct ratelimit_state *rs, const
> char *func)
> {
> 	rs->begin = jiffies;
> 
> 	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> 		unsigned int missed = atomic_xchg(&rs->missed, 0);
> 
> 		if (missed)
> 			pr_warn("%s: %u callbacks suppressed\n", func,
> missed);
> 	}
> 
> 	atomic_xchg(&rs->printed, 0);
> }
> 
> /*
>  * __ratelimit - rate limiting
>  * @rs: ratelimit_state data
>  * @func: name of calling function
>  *
>  * This enforces a rate limit: not more than @rs->burst callbacks
>  * in every @rs->interval
>  *
>  * RETURNS:
>  * 0 means callbacks will be suppressed.
>  * 1 means go ahead and do it.
>  */
> int ___ratelimit(struct ratelimit_state *rs, const char *func)
> {
> 	unsigned long begin, old_begin = rs->begin;
> 
> 	if (!rs->interval)
> 		return 1;
> 
> 	if (time_is_before_jiffies(rs->begin + rs->interval) &&
> 	    cmpxchg(&rs->begin, begin, begin + rs->interval) == begin) 

Probably, cmpxchg(&rs->begin, begin, jiffies)?
Otherwise hitting it later will be buggy.
Also `begin` might be uninitialized? (if cmpxchg() fails)

> {
> 		ratelimit_end_interval(rs, func);
> 	}
> 
> 	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> 		return 1;
> 
> 	atomic_add_unless(&rs->missed, 1, -1);
> 	return 0;
> }
> EXPORT_SYMBOL(___ratelimit);
> 
> 
> The main logic is the same as in the original code. Only one CPU is
> able to reset the interval and counters (thanks to cmpxchg).
> Every caller increases either "printed" or "missed" counter.

Do we care about fast-path argument I did above?

Petr Mladek Aug. 29, 2018, 11:08 a.m. UTC | #9

On Tue 2018-08-21 00:14:35, Dmitry Safonov wrote:
> Hi Petr, thanks for review,
> 
> On Wed, 2018-08-15 at 17:10 +0200, Petr Mladek wrote:
> > On Tue 2018-07-03 23:56:28, Dmitry Safonov wrote:
> > > Currently ratelimit_state is protected with spin_lock. If the .lock
> > > is
> > > taken at the moment of ___ratelimit() call, the message is
> > > suppressed to
> > > make ratelimiting robust.
> > > 
> > > That results in the following issue issue:
> > >       CPU0                          CPU1
> > > ------------------           ------------------
> > > printk_ratelimit()           printk_ratelimit()
> > >         |                             |
> > >   try_spin_lock()              try_spin_lock()
> > >         |                             |
> > > time_is_before_jiffies()          return 0; // suppress
> > > 
> > > So, concurrent call of ___ratelimit() results in silently
> > > suppressing
> > > one of the messages, regardless if the limit is reached or not.
> > > And rc->missed is not increased in such case so the issue is
> > > covered
> > > from user.
> > > 
> > > Convert ratelimiting to use atomic counters and drop spin_lock.
> > > 
> > > Note: That might be unexpected, but with the first interval of
> > > messages
> > > storm one can print up to burst*2 messages. So, it doesn't
> > > guarantee
> > > that in *any* interval amount of messages is lesser than burst.
> > > But, that differs lightly from previous behavior where one can
> > > start
> > > burst=5 interval and print 4 messages on the last milliseconds of
> > > interval + new 5 messages from new interval (totally 9 messages in
> > > lesser than interval value):
> > 
> > I am still confused by this paragraph. Does this patch change the
> > behavior? What is the original and what is the new one, please?
> 
> Yeah, I could be a bit cleaner about the change.
> Originally more than `burst` messages could be printed if the previous
> period hasn't ended:

I am even more confused now. What do you mean by "if the previous
period hasn't ended"?

I do not see any way how more than 'burst' messages might be
printed between upating rs->begin.

> > 
> > >    msg0              msg1-msg4 msg0-msg4
> > >     |                     |       |
> > >     |                     |       |
> > >  |--o---------------------o-|-----o--------------------|--> (t)
> > >                           <------->
> > >                        Lesser than burst

s/Lesser than burst/Lesser than rs->period/  ?

> But now, because I check:
> +       if (atomic_add_unless(&rs->printed, 1, rs->burst))
> +               return 1;
> 
> *before* checking the end of interval, the maximum number of
> *messages in the peak will be the same, burst*2 (burst*2 - 1
> in original).

This paragraph is very confusing as well. It says that the number
of messages will be the "same". Then it mentions
burst*2 vs. burst*2 - 1. What means the word "same"?

I understand that with the old code you might see "burst*2 -1"
messages within rc->period time frame when they are printed
close around updating rs->begin.

But I do not see how it might be "burst*2" with the new code.
rs->begin must never be updated faster than rs->period.
And rs->begin is always updated together with rs->printed.
I mean that the first msg0 is always printed at the very
beginning of the interval and therefore far from the end.
Or do I miss anything?


> Why do I check it before the end of interval?
> I thought it would be a nice optimization, making atomic_add_unless()
> the only called function on the fast-path (when we haven't hit the
> limit yet). If you want, I can move it into a separate patch..

ratelimit is basically used together with printk() that is
a slow path. Your optimization is not worth the complexity.

And this brings me to the question what is the motivation for
this patch, please? Did you have a real life problem with
the original behavior?

The main purpose of the function is to reduce the amount
of duplicated messages that would mess the log, overload
consoles and do not help anything. It is used in situations
where it is expected that some messages might get lost.

It might make sense to fix rc->missed counter and
also make sure that always the first Nth < burst messages
will get printed but only when the code stays readable.


> > > Dropped dev/random patch since v1 version:
> > > lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
> > > 
> > > Dropped `name' in as it's unused in RATELIMIT_STATE_INIT()
> > > 
> > > diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> > > index d01f47135239..d9b749d40108 100644
> > > --- a/lib/ratelimit.c
> > > +++ b/lib/ratelimit.c
> > > @@ -13,6 +13,18 @@
> > >  #include <linux/jiffies.h>
> > >  #include <linux/export.h>
> > >  
> > > +static void ratelimit_end_interval(struct ratelimit_state *rs,
> > > const char *func)
> > > +{
> > > +	rs->begin = jiffies;
> > > +
> > > +	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > > +		unsigned int missed = atomic_xchg(&rs->missed, 0);
> > > +
> > > +		if (missed)
> > > +			pr_warn("%s: %u callbacks suppressed\n",
> > > func, missed);
> > > +	}
> > > +}
> > > +
> > >  /*
> > >   * __ratelimit - rate limiting
> > >   * @rs: ratelimit_state data
> > > @@ -27,45 +39,30 @@
> > >   */
> > >  int ___ratelimit(struct ratelimit_state *rs, const char *func)
> > >  {
> > > -	unsigned long flags;
> > > -	int ret;
> > > -
> > >  	if (!rs->interval)
> > >  		return 1;
> > >  
> > > -	/*
> > > -	 * If we contend on this state's lock then almost
> > > -	 * by definition we are too busy to print a message,
> > > -	 * in addition to the one that will be printed by
> > > -	 * the entity that is holding the lock already:
> > > -	 */
> > > -	if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> > > +	if (unlikely(!rs->burst)) {
> > > +		atomic_add_unless(&rs->missed, 1, -1);
> > > +		if (time_is_before_jiffies(rs->begin + rs-
> > > >interval))
> > > +			ratelimit_end_interval(rs, func);
> > 
> > This is racy. time_is_before_jiffies() might be valid on two
> > CPUs in parallel. They would both call ratelimit_end_interval().
> > This is not longer atomic context. Therefore one might get scheduled
> > and set rs->begin = jiffies seconds later. I am sure that there might
> > be more crazy scenarios.
> 
> That's the case with rs->burst == 0.
> So, in this situation all the messages will be suppressed.
> And the only reason to call ratelimit_end_interval() is to update the
> start time and number of messages not printed.
> I didn't want to add any complexion for this case - the worst will be
> the count of messages suppressed will be imprecise for rs->burst == 0
> case. Not a big deal, huh?

Ah, I see, I have missed the check !rs->burst. Well, it does not make
much sense to use this value in the real life. IMHO, it is not worth
any optimization and extra code complexity if the rest of the code
might handle this situation correctly.


> > I wonder if the following code would do the job (not even compile
> > tested!):
> > 
> > static void ratelimit_end_interval(struct ratelimit_state *rs, const
> > char *func)
> > {
> > 	rs->begin = jiffies;
> > 
> > 	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > 		unsigned int missed = atomic_xchg(&rs->missed, 0);
> > 
> > 		if (missed)
> > 			pr_warn("%s: %u callbacks suppressed\n", func,
> > missed);
> > 	}
> > 
> > 	atomic_xchg(&rs->printed, 0);
> > }
> > 
> > /*
> >  * __ratelimit - rate limiting
> >  * @rs: ratelimit_state data
> >  * @func: name of calling function
> >  *
> >  * This enforces a rate limit: not more than @rs->burst callbacks
> >  * in every @rs->interval
> >  *
> >  * RETURNS:
> >  * 0 means callbacks will be suppressed.
> >  * 1 means go ahead and do it.
> >  */
> > int ___ratelimit(struct ratelimit_state *rs, const char *func)
> > {
> > 	unsigned long begin, old_begin = rs->begin;
> > 
> > 	if (!rs->interval)
> > 		return 1;
> > 
> > 	if (time_is_before_jiffies(rs->begin + rs->interval) &&
> > 	    cmpxchg(&rs->begin, begin, begin + rs->interval) == begin) 
> 
> Probably, cmpxchg(&rs->begin, begin, jiffies)?
> Otherwise hitting it later will be buggy.
> Also `begin` might be uninitialized? (if cmpxchg() fails)

Yup, I messed this a bit. We could remove begin variable and
use:

	cmpxchg(&rs->begin, old_begin, jiffies)


Best Regards,
Petr

[PATCHv3] lib/ratelimit: Lockless ratelimiting

Commit Message

Comments

Patch