| Message ID | 20220913192538.3023708-2-jiebin.sun@intel.com |
|---|---|
| State | New |
| Series | ipc/msg: mitigate the lock contention in ipc/msg |
Hi Jiebin,

On 9/13/22 21:25, Jiebin Sun wrote:
> +/*
> + * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
> + * are accumulated in local per cpu counter and not in fbc->count until
> + * local count overflows PERCPU_COUNTER_LOCAL_BATCH. This makes counter
> + * write efficient.
> + * But percpu_counter_sum(), instead of percpu_counter_read(), needs to be
> + * used to add up the counts from each CPU to account for all the local
> + * counts. So percpu_counter_add_local() and percpu_counter_sub_local()
> + * should be used when a counter is updated frequently and read rarely.
> + */
> +static inline void
> +percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
> +{
> +	percpu_counter_add_batch(fbc, amount, PERCPU_COUNTER_LOCAL_BATCH);
> +}
> +

Unrelated to your patch, and not relevant for ipc/msg as the functions
are not called from interrupts, but:
Aren't there races with interrupts?

> *
>  * This function is both preempt and irq safe. The former is due to explicit
>  * preemption disable. The latter is guaranteed by the fact that the slow path
>  * is explicitly protected by an irq-safe spinlock whereas the fast patch uses
>  * this_cpu_add which is irq-safe by definition. Hence there is no need muck
>  * with irq state before calling this one
>  */
> void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> {
> 	s64 count;
>
> 	preempt_disable();
> 	count = __this_cpu_read(*fbc->counters) + amount;
> 	if (abs(count) >= batch) {
> 		unsigned long flags;
>
> 		raw_spin_lock_irqsave(&fbc->lock, flags);
> 		fbc->count += count;
> 		__this_cpu_sub(*fbc->counters, count - amount);
> 		raw_spin_unlock_irqrestore(&fbc->lock, flags);
> 	} else {
> 		this_cpu_add(*fbc->counters, amount);
> 	}
> 	preempt_enable();
> }
> EXPORT_SYMBOL(percpu_counter_add_batch);

Race 1:

start: __this_cpu_read(*fbc->counters) = INT_MAX-1.

Call: percpu_counter_add_batch(fbc, 1, INT_MAX);

Result:
  count = INT_MAX;
  if (abs(count) >= batch) { // branch taken

Before the raw_spin_lock_irqsave(): interrupt.

Within the interrupt:
  percpu_counter_add_batch(fbc, -2*(INT_MAX-1), INT_MAX)
  count = -(INT_MAX-1); branch not taken
  this_cpu_add() updates fbc->counters, new value is -(INT_MAX-1)
  exit interrupt

Back outside the interrupt:
  raw_spin_lock_irqsave()
  __this_cpu_sub(*fbc->counters, count - amount)
  will subtract INT_MAX-1 from *fbc->counters. But the value is already
  -(INT_MAX-1) -> underflow.

Race 2: (much simpler)

start: __this_cpu_read(*fbc->counters) = 0.

Call: percpu_counter_add_batch(fbc, INT_MAX-1, INT_MAX);

  amount = INT_MAX-1;
  branch not taken.

Before this_cpu_add(): interrupt.

Within the interrupt:
  call percpu_counter_add_batch(fbc, INT_MAX-1, INT_MAX)
  new value of *fbc->counters: INT_MAX-1.
  exit interrupt

Outside the interrupt:
  this_cpu_add(*fbc->counters, amount);
  <<< overflow.

Attached is an incomplete patch (untested).
If needed, I could check the whole file and add/move the required
local_irq_save() calls.

--

Manfred
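Manfred's attached patch is not reproduced in this archive. As a rough, untested sketch only (not the actual attachment), the kind of change he describes would keep interrupts disabled across the whole read-modify-write in percpu_counter_add_batch() instead of only disabling preemption, so an interrupt can no longer update the per-CPU counter between the read and the later write-back:

```c
/*
 * Illustrative sketch only (untested, not the attached patch): close the
 * races by keeping IRQs off for the entire read-modify-write sequence, so
 * an interrupt cannot update *fbc->counters between the __this_cpu_read()
 * and the later __this_cpu_sub()/this_cpu_add().
 */
void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;
	unsigned long flags;

	local_irq_save(flags);		/* replaces preempt_disable() */
	count = __this_cpu_read(*fbc->counters) + amount;
	if (abs(count) >= batch) {
		/* IRQs are already off, so a plain raw_spin_lock() suffices */
		raw_spin_lock(&fbc->lock);
		fbc->count += count;
		__this_cpu_sub(*fbc->counters, count - amount);
		raw_spin_unlock(&fbc->lock);
	} else {
		this_cpu_add(*fbc->counters, amount);
	}
	local_irq_restore(flags);	/* replaces preempt_enable() */
}
```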
On 9/18/2022 7:08 PM, Manfred Spraul wrote:
> Hi Jiebin,
>
> [analysis of the interrupt races in percpu_counter_add_batch(), quoted in
> full above, snipped]
>
> Attached is an incomplete patch (untested).
> If needed, I could check the whole file and add/move the required
> local_irq_save() calls.

The interrupt-protection patch for the real case looks good to me. Thanks.
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 01861eebed79..8ed5fba6d156 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -15,6 +15,9 @@
 #include <linux/types.h>
 #include <linux/gfp.h>
 
+/* percpu_counter batch for local add or sub */
+#define PERCPU_COUNTER_LOCAL_BATCH	INT_MAX
+
 #ifdef CONFIG_SMP
 
 struct percpu_counter {
@@ -56,6 +59,22 @@ static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
 }
 
+/*
+ * With percpu_counter_add_local() and percpu_counter_sub_local(), counts
+ * are accumulated in local per cpu counter and not in fbc->count until
+ * local count overflows PERCPU_COUNTER_LOCAL_BATCH. This makes counter
+ * write efficient.
+ * But percpu_counter_sum(), instead of percpu_counter_read(), needs to be
+ * used to add up the counts from each CPU to account for all the local
+ * counts. So percpu_counter_add_local() and percpu_counter_sub_local()
+ * should be used when a counter is updated frequently and read rarely.
+ */
+static inline void
+percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	percpu_counter_add_batch(fbc, amount, PERCPU_COUNTER_LOCAL_BATCH);
+}
+
 static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc)
 {
 	s64 ret = __percpu_counter_sum(fbc);
@@ -138,6 +157,13 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 	preempt_enable();
 }
 
+/* non-SMP percpu_counter_add_local is the same with percpu_counter_add */
+static inline void
+percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
+{
+	percpu_counter_add(fbc, amount);
+}
+
 static inline void
 percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
@@ -193,4 +219,10 @@ static inline void percpu_counter_sub(struct percpu_counter *fbc, s64 amount)
 	percpu_counter_add(fbc, -amount);
 }
 
+static inline void
+percpu_counter_sub_local(struct percpu_counter *fbc, s64 amount)
+{
+	percpu_counter_add_local(fbc, -amount);
+}
+
 #endif	/* _LINUX_PERCPU_COUNTER_H */
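For context, the usage pattern the new helpers target (frequent updates, rare reads) could look roughly like the hypothetical sketch below. The counter and function names (demo_*) are made up for illustration and are not taken from the ipc/msg series; only the percpu_counter API calls are real.

```c
#include <linux/gfp.h>
#include <linux/percpu_counter.h>

/* Hypothetical counter: updated on every operation, read only for stats. */
static struct percpu_counter demo_bytes;

static int demo_setup(void)
{
	return percpu_counter_init(&demo_bytes, 0, GFP_KERNEL);
}

static void demo_account_send(size_t len)
{
	/* Fast path: stays in the local per-CPU counter, no shared lock. */
	percpu_counter_add_local(&demo_bytes, len);
}

static void demo_account_receive(size_t len)
{
	percpu_counter_sub_local(&demo_bytes, len);
}

static s64 demo_read_bytes(void)
{
	/*
	 * Rare read path: percpu_counter_sum() is required because the local
	 * deltas are only folded into fbc->count once they exceed
	 * PERCPU_COUNTER_LOCAL_BATCH; percpu_counter_read() would miss them.
	 */
	return percpu_counter_sum(&demo_bytes);
}

static void demo_teardown(void)
{
	percpu_counter_destroy(&demo_bytes);
}
```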