Message ID | 20220316224548.500123-1-namhyung@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | locking: Add new lock contention tracepoints (v3) | expand |
On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote: > Hello, > > There have been some requests for low-overhead kernel lock contention > monitoring. The kernel has CONFIG_LOCK_STAT to provide such an infra > either via /proc/lock_stat or tracepoints directly. > > However it's not light-weight and hard to be used in production. So > I'm trying to add new tracepoints for lock contention and using them > as a base to build a new monitoring system. Hello Namhyung, I like this series so much. Lock contentions became much more visible without serious overhead. For the series: Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> How would you use these tracepoints, is there a script you use? For testing, I just wrote a simple bpftrace script: $ sudo bpftrace -e 'BEGIN { printf("Collecting lockstats... Hit Ctrl-C to end.\n"); } tracepoint:lock:contention_begin { @start_us[tid] = nsecs / 1000; } tracepoint:lock:contention_end { if (args->ret == 0) { @stats[kstack] = hist(nsecs / 1000 - @start_us[tid]); } } END { clear(@start_us); }' And it shows its distribution of slowpath wait time like below. Great. @stats[ __traceiter_contention_end+76 __traceiter_contention_end+76 queued_spin_lock_slowpath+556 _raw_spin_lock+108 rmqueue_bulk+80 rmqueue+1060 get_page_from_freelist+372 __alloc_pages+208 alloc_pages_vma+160 alloc_zeroed_user_highpage_movable+72 do_anonymous_page+92 handle_pte_fault+320 __handle_mm_fault+252 handle_mm_fault+244 do_page_fault+340 do_translation_fault+100 do_mem_abort+76 el0_da+60 el0t_64_sync_handler+232 el0t_64_sync+420 ]: [2, 4) 1 |@ | [4, 8) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [8, 16) 25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [16, 32) 33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32, 64) 29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [64, 128) 13 |@@@@@@@@@@@@@@@@@@@@ | [128, 256) 2 |@@@ | @stats[ __traceiter_contention_end+76 __traceiter_contention_end+76 rwsem_down_write_slowpath+1216 down_write_killable+100 do_mprotect_pkey.constprop.0+176 __arm64_sys_mprotect+40 invoke_syscall+80 el0_svc_common.constprop.0+76 do_el0_svc+52 el0_svc+48 el0t_64_sync_handler+164 el0t_64_sync+420 ]: [1] 13 |@@@@@@@@@@@ | [2, 4) 42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [4, 8) 5 |@@@@ | [8, 16) 10 |@@@@@@@@ | [16, 32) 60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32, 64) 41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [64, 128) 40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [128, 256) 14 |@@@@@@@@@@@@ | [256, 512) 7 |@@@@@@ | [512, 1K) 6 |@@@@@ | [1K, 2K) 2 |@ | [2K, 4K) 1 | | @stats[ __traceiter_contention_end+76 __traceiter_contention_end+76 queued_spin_lock_slowpath+556 _raw_spin_lock+108 futex_wake+168 do_futex+200 __arm64_sys_futex+112 invoke_syscall+80 el0_svc_common.constprop.0+76 do_el0_svc+52 el0_svc+48 el0t_64_sync_handler+164 el0t_64_sync+420 ]: [0] 3 | | [1] 2515 |@ | [2, 4) 8747 |@@@@@ | [4, 8) 17052 |@@@@@@@@@@ | [8, 16) 46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [16, 32) 82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [32, 64) 46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [64, 128) 99 | | [128, 256) 1 | | [256, 512) 8 | | [512, 1K) 0 | | [1K, 2K) 0 | | [2K, 4K) 0 | | [4K, 8K) 0 | | [8K, 16K) 0 | | [16K, 32K) 5 | | [32K, 64K) 12 | | [64K, 128K) 34 | | [128K, 256K) 68 | | [256K, 512K) 7 | | > * Changes in v3 > - move the tracepoints deeper in the slow path > - remove the caller ip > - don't use task state in the flags > > * Changes in v2 > - do not use lockdep infrastructure > - add flags argument to lock:contention_begin tracepoint > > I added a flags argument in the contention_begin to classify locks in > question. It can tell whether it's a spinlock, reader-writer lock or > a mutex. With stacktrace, users can identify which lock is contended. > > The patch 01 added the tracepoints and move the definition to the > mutex.c file so that it can see the tracepoints without lockdep. > > The patch 02 actually installs the tracepoints in the locking code. > To minimize the overhead, they were added in the slow path of the code > separately. As spinlocks are defined in the arch headers, I couldn't > handle them all. I've just added it to generic queued spinlock and > rwlocks only. Each arch can add the tracepoints later. > > This series base on the current tip/locking/core and you get it from > 'locking/tracepoint-v3' branch in my tree at: > > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git > > > Thanks, > Namhyung > > > Namhyung Kim (2): > locking: Add lock contention tracepoints > locking: Apply contention tracepoints in the slow path > > include/trace/events/lock.h | 54 +++++++++++++++++++++++++++++++++-- > kernel/locking/lockdep.c | 1 - > kernel/locking/mutex.c | 6 ++++ > kernel/locking/percpu-rwsem.c | 3 ++ > kernel/locking/qrwlock.c | 9 ++++++ > kernel/locking/qspinlock.c | 5 ++++ > kernel/locking/rtmutex.c | 11 +++++++ > kernel/locking/rwbase_rt.c | 3 ++ > kernel/locking/rwsem.c | 9 ++++++ > kernel/locking/semaphore.c | 14 ++++++++- > 10 files changed, 110 insertions(+), 5 deletions(-) > > > base-commit: cd27ccfc727e99352321c0c75012ab9c5a90321e > -- > 2.35.1.894.gb6a874cedc-goog >
Hello Hyeonggon, On Thu, Mar 17, 2022 at 10:32 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote: > > On Wed, Mar 16, 2022 at 03:45:46PM -0700, Namhyung Kim wrote: > > Hello, > > > > There have been some requests for low-overhead kernel lock contention > > monitoring. The kernel has CONFIG_LOCK_STAT to provide such an infra > > either via /proc/lock_stat or tracepoints directly. > > > > However it's not light-weight and hard to be used in production. So > > I'm trying to add new tracepoints for lock contention and using them > > as a base to build a new monitoring system. > > Hello Namhyung, > I like this series so much. > Lock contentions became much more visible without serious overhead. > > For the series: > Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Thanks! > > How would you use these tracepoints, is there a script you use? Not yet. But I'm thinking something similar to your script. Probably I'll extend 'perf lock' command to have this kind of output but it doesn't have lock name and use avg/min/max time instead of histogram. Thanks, Namhyung > For testing, I just wrote a simple bpftrace script: > > $ sudo bpftrace -e 'BEGIN > { > printf("Collecting lockstats... Hit Ctrl-C to end.\n"); > } > > tracepoint:lock:contention_begin > { > @start_us[tid] = nsecs / 1000; > } > > tracepoint:lock:contention_end > { > if (args->ret == 0) { > @stats[kstack] = hist(nsecs / 1000 - @start_us[tid]); > } > } > > END > { > clear(@start_us); > }' > > And it shows its distribution of slowpath wait time like below. Great. > > @stats[ > __traceiter_contention_end+76 > __traceiter_contention_end+76 > queued_spin_lock_slowpath+556 > _raw_spin_lock+108 > rmqueue_bulk+80 > rmqueue+1060 > get_page_from_freelist+372 > __alloc_pages+208 > alloc_pages_vma+160 > alloc_zeroed_user_highpage_movable+72 > do_anonymous_page+92 > handle_pte_fault+320 > __handle_mm_fault+252 > handle_mm_fault+244 > do_page_fault+340 > do_translation_fault+100 > do_mem_abort+76 > el0_da+60 > el0t_64_sync_handler+232 > el0t_64_sync+420 > ]: > [2, 4) 1 |@ | > [4, 8) 30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [8, 16) 25 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [16, 32) 33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| > [32, 64) 29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [64, 128) 13 |@@@@@@@@@@@@@@@@@@@@ | > [128, 256) 2 |@@@ | > > > @stats[ > __traceiter_contention_end+76 > __traceiter_contention_end+76 > rwsem_down_write_slowpath+1216 > down_write_killable+100 > do_mprotect_pkey.constprop.0+176 > __arm64_sys_mprotect+40 > invoke_syscall+80 > el0_svc_common.constprop.0+76 > do_el0_svc+52 > el0_svc+48 > el0t_64_sync_handler+164 > el0t_64_sync+420 > ]: > [1] 13 |@@@@@@@@@@@ | > [2, 4) 42 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [4, 8) 5 |@@@@ | > [8, 16) 10 |@@@@@@@@ | > [16, 32) 60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| > [32, 64) 41 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [64, 128) 40 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [128, 256) 14 |@@@@@@@@@@@@ | > [256, 512) 7 |@@@@@@ | > [512, 1K) 6 |@@@@@ | > [1K, 2K) 2 |@ | > [2K, 4K) 1 | | > > @stats[ > __traceiter_contention_end+76 > __traceiter_contention_end+76 > queued_spin_lock_slowpath+556 > _raw_spin_lock+108 > futex_wake+168 > do_futex+200 > __arm64_sys_futex+112 > invoke_syscall+80 > el0_svc_common.constprop.0+76 > do_el0_svc+52 > el0_svc+48 > el0t_64_sync_handler+164 > el0t_64_sync+420 > ]: > [0] 3 | | > [1] 2515 |@ | > [2, 4) 8747 |@@@@@ | > [4, 8) 17052 |@@@@@@@@@@ | > [8, 16) 46706 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [16, 32) 82105 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| > [32, 64) 46918 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | > [64, 128) 99 | | > [128, 256) 1 | | > [256, 512) 8 | | > [512, 1K) 0 | | > [1K, 2K) 0 | | > [2K, 4K) 0 | | > [4K, 8K) 0 | | > [8K, 16K) 0 | | > [16K, 32K) 5 | | > [32K, 64K) 12 | | > [64K, 128K) 34 | | > [128K, 256K) 68 | | > [256K, 512K) 7 | | >