Message ID | 20220830214919.53220-4-surenb@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Code tagging framework and applications |
On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> From: Kent Overstreet <kent.overstreet@linux.dev>
>
> This patch adds lib/lazy-percpu-counter.c, which implements counters
> that start out as atomics, but lazily switch to percpu mode if the
> update rate crosses some threshold (arbitrarily set at 256 per second).
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Why not use percpu_counter? It has a per-cpu counter that is synchronised
when a batch threshold (default 32) is exceeded and can explicitly sync
the counters when required assuming the synchronised count is only needed
when reading debugfs.

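For context, here is a minimal sketch of how the existing percpu_counter API Mel is referring to is typically used; the example_* wrappers are illustrative only and are not taken from any patch in this series. percpu_counter_add_batch() only folds a CPU's local delta into the shared count once it exceeds the batch, and percpu_counter_sum() performs the explicit synchronisation Mel mentions:

#include <linux/percpu_counter.h>

static struct percpu_counter example_counter;

static int example_init(void)
{
	/* Allocates the per-CPU counters up front */
	return percpu_counter_init(&example_counter, 0, GFP_KERNEL);
}

static void example_update(void)
{
	/* Cheap local update; folded into the shared count once the delta exceeds the batch */
	percpu_counter_add_batch(&example_counter, 1, 32);
}

static s64 example_read_exact(void)
{
	/* Explicit sync across all CPUs, e.g. when reading debugfs */
	return percpu_counter_sum(&example_counter);
}

static void example_exit(void)
{
	percpu_counter_destroy(&example_counter);
}
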
On Wed, Aug 31, 2022 at 3:02 AM Mel Gorman <mgorman@suse.de> wrote:
>
> On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >
> > This patch adds lib/lazy-percpu-counter.c, which implements counters
> > that start out as atomics, but lazily switch to percpu mode if the
> > update rate crosses some threshold (arbitrarily set at 256 per second).
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>
> Why not use percpu_counter? It has a per-cpu counter that is synchronised
> when a batch threshold (default 32) is exceeded and can explicitly sync
> the counters when required assuming the synchronised count is only needed
> when reading debugfs.

The intent is to use atomic counters for places that are not updated very
often. This would save memory required for the counters. Originally I had a
config option to choose which counter type to use but with lazy counters we
sacrifice memory for performance only when needed while keeping the other
counters small.

>
> --
> Mel Gorman
> SUSE Labs

On Wed, Aug 31, 2022 at 11:02:49AM +0100, Mel Gorman wrote:
> On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <kent.overstreet@linux.dev>
> >
> > This patch adds lib/lazy-percpu-counter.c, which implements counters
> > that start out as atomics, but lazily switch to percpu mode if the
> > update rate crosses some threshold (arbitrarily set at 256 per second).
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
>
> Why not use percpu_counter? It has a per-cpu counter that is synchronised
> when a batch threshold (default 32) is exceeded and can explicitly sync
> the counters when required assuming the synchronised count is only needed
> when reading debugfs.

It doesn't switch from atomic mode to percpu mode when the update rate
crosses a threshold like lazy percpu counters does, it allocates all the
percpu counters up front - that makes it a non starter here.

Also, from my reading of the code... wtf is it even doing, and why would I
use it at all? This looks like old grotty code from ext3, it's not even
using this_cpu_add() - it does preempt_enable()/disable() just for adding
to a local percpu counter! Noooooope.

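The difference Kent is pointing at can be sketched roughly as follows; demo_count is a made-up per-CPU variable, not code from either counter implementation. this_cpu_add() is a single preemption-safe per-CPU operation (a single segment-prefixed instruction on x86), whereas the older pattern brackets the update with an explicit preempt_disable()/preempt_enable() pair:

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(u64, demo_count);

static void demo_add_old_style(u64 i)
{
	/* Pin the task to this CPU for the duration of the update */
	preempt_disable();
	__this_cpu_add(demo_count, i);
	preempt_enable();
}

static void demo_add_new_style(u64 i)
{
	/* Preemption-safe without explicitly disabling preemption */
	this_cpu_add(demo_count, i);
}
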
On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> +static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
> +{
> +	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);

Realize that this is incorrect when used under a raw_spinlock_t.

On Thu, Sep 01, 2022 at 08:51:31AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > +static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
> > +{
> > +	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
>
> Realize that this is incorrect when used under a raw_spinlock_t.

Can you elaborate?

On Thu, 1 Sep 2022 10:32:19 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> On Thu, Sep 01, 2022 at 08:51:31AM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > > +static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
> > > +{
> > > +	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
> >
> > Realize that this is incorrect when used under a raw_spinlock_t.
>
> Can you elaborate?

All allocations (including GFP_ATOMIC) grab normal spin_locks. When
PREEMPT_RT is configured, normal spin_locks turn into a mutex, whereas
raw_spinlocks do not.

Thus, if this is done within a raw_spinlock with PREEMPT_RT configured, it
can cause a schedule while holding a spinlock.

-- Steve

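A hypothetical illustration of the pattern Steven describes; none of these symbols come from the patch itself. On PREEMPT_RT the allocator's internal spinlock_t becomes a sleeping lock, so even a GFP_ATOMIC allocation can end up scheduling while the raw_spinlock_t (still a true spinning lock on RT) is held:

#include <linux/gfp.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(example_lock);

static void example_broken_on_rt(void)
{
	u64 __percpu *p;

	raw_spin_lock(&example_lock);
	/* GFP_ATOMIC does not help here: the allocator may still take a spinlock_t */
	p = alloc_percpu_gfp(u64, GFP_ATOMIC | __GFP_NOWARN);
	raw_spin_unlock(&example_lock);

	free_percpu(p);
}
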
On Thu, Sep 01, 2022 at 10:48:39AM -0400, Steven Rostedt wrote:
> On Thu, 1 Sep 2022 10:32:19 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > On Thu, Sep 01, 2022 at 08:51:31AM +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > > > +static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
> > > > +{
> > > > +	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
> > >
> > > Realize that this is incorrect when used under a raw_spinlock_t.
> >
> > Can you elaborate?
>
> All allocations (including GFP_ATOMIC) grab normal spin_locks. When
> PREEMPT_RT is configured, normal spin_locks turn into a mutex, whereas
> raw_spinlocks do not.
>
> Thus, if this is done within a raw_spinlock with PREEMPT_RT configured, it
> can cause a schedule while holding a spinlock.

Thanks, I think we should be good here but I'll document it anyways.

On Thu, Sep 01, 2022 at 10:32:19AM -0400, Kent Overstreet wrote:
> On Thu, Sep 01, 2022 at 08:51:31AM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2022 at 02:48:52PM -0700, Suren Baghdasaryan wrote:
> > > +static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
> > > +{
> > > +	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
> >
> > Realize that this is incorrect when used under a raw_spinlock_t.
>
> Can you elaborate?

Required lock order: raw_spinlock_t < spinlock_t < mutex. Allocators live
at spinlock_t.

Also see CONFIG_PROVE_RAW_LOCK_NESTING, and there might be a document
mentioning all this somewhere.

Additionally, this (obviously) also isn't NMI safe.

diff --git a/include/linux/lazy-percpu-counter.h b/include/linux/lazy-percpu-counter.h
new file mode 100644
index 000000000000..a22a2b9a9f32
--- /dev/null
+++ b/include/linux/lazy-percpu-counter.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lazy percpu counters:
+ * (C) 2022 Kent Overstreet
+ *
+ * Lazy percpu counters start out in atomic mode, then switch to percpu mode if
+ * the update rate crosses some threshold.
+ *
+ * This means we don't have to decide between low memory overhead atomic
+ * counters and higher performance percpu counters - we can have our cake and
+ * eat it, too!
+ *
+ * Internally we use an atomic64_t, where the low bit indicates whether we're in
+ * percpu mode, and the high 8 bits are a secondary counter that's incremented
+ * when the counter is modified - meaning 55 bits of precision are available for
+ * the counter itself.
+ *
+ * lazy_percpu_counter is 16 bytes (on 64 bit machines), raw_lazy_percpu_counter
+ * is 8 bytes but requires a separate unsigned long to record when the counter
+ * wraps - because sometimes multiple counters are used together and can share
+ * the same timestamp.
+ */
+
+#ifndef _LINUX_LAZY_PERCPU_COUNTER_H
+#define _LINUX_LAZY_PERCPU_COUNTER_H
+
+struct raw_lazy_percpu_counter {
+	atomic64_t v;
+};
+
+void __lazy_percpu_counter_exit(struct raw_lazy_percpu_counter *c);
+void __lazy_percpu_counter_add(struct raw_lazy_percpu_counter *c,
+			       unsigned long *last_wrap, s64 i);
+s64 __lazy_percpu_counter_read(struct raw_lazy_percpu_counter *c);
+
+static inline void __lazy_percpu_counter_sub(struct raw_lazy_percpu_counter *c,
+					     unsigned long *last_wrap, s64 i)
+{
+	__lazy_percpu_counter_add(c, last_wrap, -i);
+}
+
+struct lazy_percpu_counter {
+	struct raw_lazy_percpu_counter v;
+	unsigned long last_wrap;
+};
+
+static inline void lazy_percpu_counter_exit(struct lazy_percpu_counter *c)
+{
+	__lazy_percpu_counter_exit(&c->v);
+}
+
+static inline void lazy_percpu_counter_add(struct lazy_percpu_counter *c, s64 i)
+{
+	__lazy_percpu_counter_add(&c->v, &c->last_wrap, i);
+}
+
+static inline void lazy_percpu_counter_sub(struct lazy_percpu_counter *c, s64 i)
+{
+	__lazy_percpu_counter_sub(&c->v, &c->last_wrap, i);
+}
+
+static inline s64 lazy_percpu_counter_read(struct lazy_percpu_counter *c)
+{
+	return __lazy_percpu_counter_read(&c->v);
+}
+
+#endif /* _LINUX_LAZY_PERCPU_COUNTER_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index dc1ab2ed1dc6..fc6dbc425728 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -498,6 +498,9 @@ config ASSOCIATIVE_ARRAY
 	  for more information.
 
+config LAZY_PERCPU_COUNTER
+	bool
+
 config HAS_IOMEM
 	bool
 	depends on !NO_IOMEM
diff --git a/lib/Makefile b/lib/Makefile
index ffabc30a27d4..cc7762748708 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -163,6 +163,8 @@ obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 
+obj-$(CONFIG_LAZY_PERCPU_COUNTER) += lazy-percpu-counter.o
+
 obj-$(CONFIG_BITREVERSE) += bitrev.o
 obj-$(CONFIG_LINEAR_RANGES) += linear_ranges.o
 obj-$(CONFIG_PACKING) += packing.o
diff --git a/lib/lazy-percpu-counter.c b/lib/lazy-percpu-counter.c
new file mode 100644
index 000000000000..299ef36137ee
--- /dev/null
+++ b/lib/lazy-percpu-counter.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/atomic.h>
+#include <linux/gfp.h>
+#include <linux/jiffies.h>
+#include <linux/lazy-percpu-counter.h>
+#include <linux/percpu.h>
+
+/*
+ * We use the high bits of the atomic counter for a secondary counter, which is
+ * incremented every time the counter is touched. When the secondary counter
+ * wraps, we check the time the counter last wrapped, and if it was recent
+ * enough that means the update frequency has crossed our threshold and we
+ * switch to percpu mode:
+ */
+#define COUNTER_MOD_BITS	8
+#define COUNTER_MOD_MASK	~(~0ULL >> COUNTER_MOD_BITS)
+#define COUNTER_MOD_BITS_START	(64 - COUNTER_MOD_BITS)
+
+/*
+ * We use the low bit of the counter to indicate whether we're in atomic mode
+ * (low bit clear), or percpu mode (low bit set, counter is a pointer to actual
+ * percpu counters:
+ */
+#define COUNTER_IS_PCPU_BIT	1
+
+static inline u64 __percpu *lazy_percpu_counter_is_pcpu(u64 v)
+{
+	if (!(v & COUNTER_IS_PCPU_BIT))
+		return NULL;
+
+	v ^= COUNTER_IS_PCPU_BIT;
+	return (u64 __percpu *)(unsigned long)v;
+}
+
+static inline s64 lazy_percpu_counter_atomic_val(s64 v)
+{
+	/* Ensure output is sign extended properly: */
+	return (v << COUNTER_MOD_BITS) >>
+		(COUNTER_MOD_BITS + COUNTER_IS_PCPU_BIT);
+}
+
+static void lazy_percpu_counter_switch_to_pcpu(struct raw_lazy_percpu_counter *c)
+{
+	u64 __percpu *pcpu_v = alloc_percpu_gfp(u64, GFP_ATOMIC|__GFP_NOWARN);
+	u64 old, new, v;
+
+	if (!pcpu_v)
+		return;
+
+	preempt_disable();
+	v = atomic64_read(&c->v);
+	do {
+		if (lazy_percpu_counter_is_pcpu(v)) {
+			free_percpu(pcpu_v);
+			return;
+		}
+
+		old = v;
+		new = (unsigned long)pcpu_v | 1;
+
+		*this_cpu_ptr(pcpu_v) = lazy_percpu_counter_atomic_val(v);
+	} while ((v = atomic64_cmpxchg(&c->v, old, new)) != old);
+	preempt_enable();
+}
+
+/**
+ * __lazy_percpu_counter_exit: Free resources associated with a
+ * raw_lazy_percpu_counter
+ *
+ * @c: counter to exit
+ */
+void __lazy_percpu_counter_exit(struct raw_lazy_percpu_counter *c)
+{
+	free_percpu(lazy_percpu_counter_is_pcpu(atomic64_read(&c->v)));
+}
+EXPORT_SYMBOL_GPL(__lazy_percpu_counter_exit);
+
+/**
+ * __lazy_percpu_counter_read: Read current value of a raw_lazy_percpu_counter
+ *
+ * @c: counter to read
+ */
+s64 __lazy_percpu_counter_read(struct raw_lazy_percpu_counter *c)
+{
+	s64 v = atomic64_read(&c->v);
+	u64 __percpu *pcpu_v = lazy_percpu_counter_is_pcpu(v);
+
+	if (pcpu_v) {
+		int cpu;
+
+		v = 0;
+		for_each_possible_cpu(cpu)
+			v += *per_cpu_ptr(pcpu_v, cpu);
+	} else {
+		v = lazy_percpu_counter_atomic_val(v);
+	}
+
+	return v;
+}
+EXPORT_SYMBOL_GPL(__lazy_percpu_counter_read);
+
+/**
+ * __lazy_percpu_counter_add: Add a value to a lazy_percpu_counter
+ *
+ * @c: counter to modify
+ * @last_wrap: pointer to a timestamp, updated when mod counter wraps
+ * @i: value to add
+ */
+void __lazy_percpu_counter_add(struct raw_lazy_percpu_counter *c,
+			       unsigned long *last_wrap, s64 i)
+{
+	u64 atomic_i;
+	u64 old, v = atomic64_read(&c->v);
+	u64 __percpu *pcpu_v;
+
+	atomic_i = i << COUNTER_IS_PCPU_BIT;
+	atomic_i &= ~COUNTER_MOD_MASK;
+	atomic_i |= 1ULL << COUNTER_MOD_BITS_START;
+
+	do {
+		pcpu_v = lazy_percpu_counter_is_pcpu(v);
+		if (pcpu_v) {
+			this_cpu_add(*pcpu_v, i);
+			return;
+		}
+
+		old = v;
+	} while ((v = atomic64_cmpxchg(&c->v, old, old + atomic_i)) != old);
+
+	if (unlikely(!(v & COUNTER_MOD_MASK))) {
+		unsigned long now = jiffies;
+
+		if (*last_wrap &&
+		    unlikely(time_after(*last_wrap + HZ, now)))
+			lazy_percpu_counter_switch_to_pcpu(c);
+		else
+			*last_wrap = now;
+	}
+}
+EXPORT_SYMBOL(__lazy_percpu_counter_add);
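
For reference, a minimal usage sketch of the API declared in the header above; the alloc_tag-style counter name and the callers are hypothetical, and zero-initialization is assumed to be sufficient since the header declares no init helper (a zeroed atomic64_t has the low bit clear, i.e. atomic mode with value 0):

#include <linux/lazy-percpu-counter.h>

static struct lazy_percpu_counter bytes_allocated;

static void record_alloc(size_t size)
{
	/* Starts out atomic; switches itself to percpu mode if updated often enough */
	lazy_percpu_counter_add(&bytes_allocated, size);
}

static void record_free(size_t size)
{
	lazy_percpu_counter_sub(&bytes_allocated, size);
}

static s64 report_bytes(void)
{
	/* Sums the percpu counters if the switch has happened */
	return lazy_percpu_counter_read(&bytes_allocated);
}

static void teardown(void)
{
	/* Frees the percpu counters, if any were allocated */
	lazy_percpu_counter_exit(&bytes_allocated);
}

Note that the switch is one-way: once the update rate has crossed the threshold the counter stays in percpu mode until lazy_percpu_counter_exit() frees it.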