Message ID | 20221016162305.2489629-1-joel@joelfernandes.org |
---|---|
Series | rcu: call_rcu() power improvements |
On Sun, Oct 16, 2022 at 04:22:52PM +0000, Joel Fernandes (Google) wrote:
> v9 version of RCU lazy patches based on rcu/next branch.
> Only change since v8 is this discussion:
> https://lore.kernel.org/rcu/20221011180142.2742289-1-joel@joelfernandes.org/T/#m8eff15110477f3430b3b02561b66f7b0d34a73b0
>
> To facilitate easier merge, I dropped tracing and other patches and just
> implemented the new changes. I will post the tracing patches later along with
> rcutop as I need to add new tracepoints that Frederic suggested.
>
> Main recent changes:
> 1. rcu_barrier() wake up only for lazy bypass list.
> 2. Make all call_rcu() default-lazy and add call_rcu_flush() API.
> 3. Take care of some callers using call_rcu_flush() API.
> 4. Several refactorings suggested by Paul/Frederic.
> 5. New call_rcu() to call_rcu_flush() conversions by Joel/Vlad/Paul.
>
> I am seeing good performance and power with these patches on real ChromeOS x86
> asymmetric hardware.
>
> Earlier cover letter with lots of details is here:
> https://lore.kernel.org/all/20220901221720.1105021-1-joel@joelfernandes.org/
>
> List of recent changes:
>
> [ Frederic Weisbecker: Program the lazy timer only if WAKE_NOT, since other
>   deferral levels wake much earlier so for those it is not needed. ]
>
> [ Frederic Weisbecker: Use flush flags to keep bypass API code clean. ]
>
> [ Frederic Weisbecker: Make rcu_barrier() wake up only if main list empty. ]
>
> [ Frederic Weisbecker: Remove extra 'else if' branch in rcu_nocb_try_bypass(). ]
>
> [ Joel: Fix issue where I was not resetting lazy_len after moving it to rdp ]
>
> [ Paul/Thomas/Joel: Make call_rcu() default lazy so users don't mess up. ]
>
> [ Paul/Frederic : Cosmetic changes, split out wakeup of nocb thread. ]
>
> [ Vlad/Joel : More call_rcu -> flush conversions ]
>
> [ debug code for detecting "wake" in kernel's call_rcu() callbacks. ]
>
> The following 2 scripts can be used to check if any callbacks in the kernel are
> doing a wake up (it is best effort and may miss some things, but we found
> issues using it)
>
> 1. Script to search for call_rcu() references and dump the callback list to a file:
> #!/bin/bash
>
> rm func-list
> touch func-list
>
> for f in $(find . \( -name "*.c" -o -name "*.h" \) | grep -v rcu); do
>
> 	funcs=$(perl -0777 -ne 'while(m/call_rcu\([&]?.+,\s?(.+)\).*;/g){print "$1\n";}' $f)
>
> 	if [ "x$funcs" != "x" ]; then
> 		for func in $funcs; do
> 			echo "$f $func" >> func-list
> 			echo "$f $func"
> 		done
> 	fi
>
> done
>
> cat func-list | sort | uniq | tee func-list-sorted
>
> 2. Script to search "wake" after callback references:
>
> #!/bin/bash
>
> while read fl; do
> 	file=$(echo $fl | cut -d " " -f1)
> 	func=$(echo $fl | cut -d " " -f2)
>
> 	grep -A 30 $func $file | grep wake > /dev/null
>
> 	if [ $? -eq 0 ]; then
> 		echo "keyword wake found after function reference $func in $file"
> 		echo "Output:"
> 		grep -A 30 $func $file
> 		echo "==========================================================="
> 	fi
> done < func-list-sorted

Very good, thank you all!

I have pulled these in for further review and testing.

I am holding off on the last one ("rcu/debug: Add wake-up debugging for
lazy callbacks") for the immediate future, but let's see how it goes.

Thanx, Paul

> Frederic Weisbecker (1):
>   rcu: Fix missing nocb gp wake on rcu_barrier()
>
> Joel Fernandes (Google) (9):
>   rcu: Make call_rcu() lazy to save power
>   rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
>   rcuscale: Add laziness and kfree tests
>   percpu-refcount: Use call_rcu_flush() for atomic switch
>   rcu/sync: Use call_rcu_flush() instead of call_rcu
>   rcu/rcuscale: Use call_rcu_flush() for async reader test
>   rcu/rcutorture: Use call_rcu_flush() where needed
>   rxrpc: Use call_rcu_flush() instead of call_rcu()
>   rcu/debug: Add wake-up debugging for lazy callbacks
>
> Uladzislau Rezki (2):
>   scsi/scsi_error: Use call_rcu_flush() instead of call_rcu()
>   workqueue: Make queue_rcu_work() use call_rcu_flush()
>
> Vineeth Pillai (1):
>   rcu: shrinker for lazy rcu
>
>  drivers/scsi/scsi_error.c |   2 +-
>  include/linux/rcupdate.h  |   7 ++
>  kernel/rcu/Kconfig        |  15 +++
>  kernel/rcu/lazy-debug.h   | 154 +++++++++++++++++++++++++++
>  kernel/rcu/rcu.h          |   8 ++
>  kernel/rcu/rcuscale.c     |  70 +++++++++++-
>  kernel/rcu/rcutorture.c   |  16 +--
>  kernel/rcu/sync.c         |   2 +-
>  kernel/rcu/tiny.c         |   2 +-
>  kernel/rcu/tree.c         | 149 ++++++++++++++++++--------
>  kernel/rcu/tree.h         |  12 ++-
>  kernel/rcu/tree_exp.h     |   2 +-
>  kernel/rcu/tree_nocb.h    | 217 ++++++++++++++++++++++++++++++++------
>  kernel/workqueue.c        |   2 +-
>  lib/percpu-refcount.c     |   3 +-
>  net/rxrpc/conn_object.c   |   2 +-
>  16 files changed, 565 insertions(+), 98 deletions(-)
>  create mode 100644 kernel/rcu/lazy-debug.h
>
> --
> 2.38.0.413.g74048e4d9e-goog
>

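The second script in the cover letter amounts to a windowed grep: print the 30 lines that follow each reference to a callback and look for the word "wake" in them. A minimal sketch of that check as a reusable function may make the heuristic easier to try on a single file. The function name has_wake_after, its optional third argument, and the sample callback my_cb are hypothetical and not part of the posted series. Unlike the posted script, this version quotes its arguments and matches the callback name as a fixed whole word.

```shell
#!/bin/bash
# Hypothetical helper, not part of the posted series.
# Succeeds if "wake" appears on, or within CONTEXT lines after, any line
# of FILE that references FUNC (same best-effort heuristic as script 2).
has_wake_after() {
	local file=$1 func=$2 context=${3:-30}
	# -F: fixed string, -w: whole word, --: stop option parsing.
	grep -F -w -A "$context" -- "$func" "$file" | grep -q wake
}

# Small demonstration on a made-up callback.
tmp=$(mktemp)
printf '%s\n' \
	'static void my_cb(struct rcu_head *rhp)' \
	'{' \
	'	wake_up(&my_wq);' \
	'}' > "$tmp"

if has_wake_after "$tmp" my_cb; then
	echo "wake found after my_cb"
fi
rm -f "$tmp"
```

As with the original scripts, this is only a textual check. It can miss wakeups done through helpers and can flag unrelated uses of the word "wake" that happen to fall inside the window.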
On Mon, Oct 17, 2022 at 9:37 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, Oct 16, 2022 at 04:22:52PM +0000, Joel Fernandes (Google) wrote:
> > v9 version of RCU lazy patches based on rcu/next branch.
> > Only change since v8 is this discussion:
> > https://lore.kernel.org/rcu/20221011180142.2742289-1-joel@joelfernandes.org/T/#m8eff15110477f3430b3b02561b66f7b0d34a73b0
> >
> > To facilitate easier merge, I dropped tracing and other patches and just
> > implemented the new changes. I will post the tracing patches later along with
> > rcutop as I need to add new tracepoints that Frederic suggested.
> >
> > Main recent changes:
> > 1. rcu_barrier() wake up only for lazy bypass list.
> > 2. Make all call_rcu() default-lazy and add call_rcu_flush() API.
> > 3. Take care of some callers using call_rcu_flush() API.
> > 4. Several refactorings suggested by Paul/Frederic.
> > 5. New call_rcu() to call_rcu_flush() conversions by Joel/Vlad/Paul.
> >
> > I am seeing good performance and power with these patches on real ChromeOS x86
> > asymmetric hardware.
> >
> > Earlier cover letter with lots of details is here:
> > https://lore.kernel.org/all/20220901221720.1105021-1-joel@joelfernandes.org/
> >
> [...]
>
> Very good, thank you all!
>
> I have pulled these in for further review and testing.
>
> I am holding off on the last one ("rcu/debug: Add wake-up debugging for
> lazy callbacks") for the immediate future, but let's see how it goes.

Thanks! And nice timing with RCU just turning 20 years old ;-)

 - Joel

>
> Thanx, Paul
>
> > Frederic Weisbecker (1):
> >   rcu: Fix missing nocb gp wake on rcu_barrier()
> >
> > Joel Fernandes (Google) (9):
> >   rcu: Make call_rcu() lazy to save power
> >   rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
> >   rcuscale: Add laziness and kfree tests
> >   percpu-refcount: Use call_rcu_flush() for atomic switch
> >   rcu/sync: Use call_rcu_flush() instead of call_rcu
> >   rcu/rcuscale: Use call_rcu_flush() for async reader test
> >   rcu/rcutorture: Use call_rcu_flush() where needed
> >   rxrpc: Use call_rcu_flush() instead of call_rcu()
> >   rcu/debug: Add wake-up debugging for lazy callbacks
> >
> > Uladzislau Rezki (2):
> >   scsi/scsi_error: Use call_rcu_flush() instead of call_rcu()
> >   workqueue: Make queue_rcu_work() use call_rcu_flush()
> >
> > Vineeth Pillai (1):
> >   rcu: shrinker for lazy rcu
> >
> >  drivers/scsi/scsi_error.c |   2 +-
> >  include/linux/rcupdate.h  |   7 ++
> >  kernel/rcu/Kconfig        |  15 +++
> >  kernel/rcu/lazy-debug.h   | 154 +++++++++++++++++++++++++++
> >  kernel/rcu/rcu.h          |   8 ++
> >  kernel/rcu/rcuscale.c     |  70 +++++++++++-
> >  kernel/rcu/rcutorture.c   |  16 +--
> >  kernel/rcu/sync.c         |   2 +-
> >  kernel/rcu/tiny.c         |   2 +-
> >  kernel/rcu/tree.c         | 149 ++++++++++++++++++--------
> >  kernel/rcu/tree.h         |  12 ++-
> >  kernel/rcu/tree_exp.h     |   2 +-
> >  kernel/rcu/tree_nocb.h    | 217 ++++++++++++++++++++++++++++++++------
> >  kernel/workqueue.c        |   2 +-
> >  lib/percpu-refcount.c     |   3 +-
> >  net/rxrpc/conn_object.c   |   2 +-
> >  16 files changed, 565 insertions(+), 98 deletions(-)
> >  create mode 100644 kernel/rcu/lazy-debug.h
> >
> > --
> > 2.38.0.413.g74048e4d9e-goog
> >

On Mon, Oct 17, 2022 at 09:47:00AM -0400, Joel Fernandes wrote:
> On Mon, Oct 17, 2022 at 9:37 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Oct 16, 2022 at 04:22:52PM +0000, Joel Fernandes (Google) wrote:
> > > v9 version of RCU lazy patches based on rcu/next branch.
> > > Only change since v8 is this discussion:
> > > https://lore.kernel.org/rcu/20221011180142.2742289-1-joel@joelfernandes.org/T/#m8eff15110477f3430b3b02561b66f7b0d34a73b0
> > >
> > > To facilitate easier merge, I dropped tracing and other patches and just
> > > implemented the new changes. I will post the tracing patches later along with
> > > rcutop as I need to add new tracepoints that Frederic suggested.
> > >
> > > Main recent changes:
> > > 1. rcu_barrier() wake up only for lazy bypass list.
> > > 2. Make all call_rcu() default-lazy and add call_rcu_flush() API.
> > > 3. Take care of some callers using call_rcu_flush() API.
> > > 4. Several refactorings suggested by Paul/Frederic.
> > > 5. New call_rcu() to call_rcu_flush() conversions by Joel/Vlad/Paul.
> > >
> > > I am seeing good performance and power with these patches on real ChromeOS x86
> > > asymmetric hardware.
> > >
> > > Earlier cover letter with lots of details is here:
> > > https://lore.kernel.org/all/20220901221720.1105021-1-joel@joelfernandes.org/
> > >
> > [...]
> >
> > Very good, thank you all!
> >
> > I have pulled these in for further review and testing.
> >
> > I am holding off on the last one ("rcu/debug: Add wake-up debugging for
> > lazy callbacks") for the immediate future, but let's see how it goes.
>
> Thanks! And nice timing with RCU just turning 20 years old ;-)

Yes, 20 years old, but with the qualifier "in the Linux kernel". ;-)

RCU was in DYNIX/ptx for almost ten years prior to that. And mechanisms
vaguely resembling RCU go back to Kung's and Lehman's 1980 paper entitled
"Concurrent Manipulation of Binary Search Trees". And maybe farther,
but that is the oldest citation I know of.

Thanx, Paul

> - Joel
>
> >
> > Thanx, Paul
> >
> > > Frederic Weisbecker (1):
> > >   rcu: Fix missing nocb gp wake on rcu_barrier()
> > >
> > > Joel Fernandes (Google) (9):
> > >   rcu: Make call_rcu() lazy to save power
> > >   rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
> > >   rcuscale: Add laziness and kfree tests
> > >   percpu-refcount: Use call_rcu_flush() for atomic switch
> > >   rcu/sync: Use call_rcu_flush() instead of call_rcu
> > >   rcu/rcuscale: Use call_rcu_flush() for async reader test
> > >   rcu/rcutorture: Use call_rcu_flush() where needed
> > >   rxrpc: Use call_rcu_flush() instead of call_rcu()
> > >   rcu/debug: Add wake-up debugging for lazy callbacks
> > >
> > > Uladzislau Rezki (2):
> > >   scsi/scsi_error: Use call_rcu_flush() instead of call_rcu()
> > >   workqueue: Make queue_rcu_work() use call_rcu_flush()
> > >
> > > Vineeth Pillai (1):
> > >   rcu: shrinker for lazy rcu
> > >
> > >  drivers/scsi/scsi_error.c |   2 +-
> > >  include/linux/rcupdate.h  |   7 ++
> > >  kernel/rcu/Kconfig        |  15 +++
> > >  kernel/rcu/lazy-debug.h   | 154 +++++++++++++++++++++++++++
> > >  kernel/rcu/rcu.h          |   8 ++
> > >  kernel/rcu/rcuscale.c     |  70 +++++++++++-
> > >  kernel/rcu/rcutorture.c   |  16 +--
> > >  kernel/rcu/sync.c         |   2 +-
> > >  kernel/rcu/tiny.c         |   2 +-
> > >  kernel/rcu/tree.c         | 149 ++++++++++++++++++--------
> > >  kernel/rcu/tree.h         |  12 ++-
> > >  kernel/rcu/tree_exp.h     |   2 +-
> > >  kernel/rcu/tree_nocb.h    | 217 ++++++++++++++++++++++++++++++++------
> > >  kernel/workqueue.c        |   2 +-
> > >  lib/percpu-refcount.c     |   3 +-
> > >  net/rxrpc/conn_object.c   |   2 +-
> > >  16 files changed, 565 insertions(+), 98 deletions(-)
> > >  create mode 100644 kernel/rcu/lazy-debug.h
> > >
> > > --
> > > 2.38.0.413.g74048e4d9e-goog
> > >