Message ID | 20170208192659.GA32582@beast (mailing list archive) |
---|---|
State | New, archived |
On Wed, Feb 8, 2017 at 11:26 AM, Kees Cook <keescook@chromium.org> wrote:
>
> Given that the tracer can give the same information, this patch entirely
> removes CONFIG_TIMER_STATS.
>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Acked-by: John Stultz <john.stultz@linaro.org>

Looks good to me. Let's wait for the 4.11 merge window though. I'm
assuming I'll get this through the tip timer tree..

	Linus
On Wed, Feb 08, 2017 at 11:54:30AM -0800, Linus Torvalds wrote:
> On Wed, Feb 8, 2017 at 11:26 AM, Kees Cook <keescook@chromium.org> wrote:
> >
> > Given that the tracer can give the same information, this patch entirely
> > removes CONFIG_TIMER_STATS.
> >
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > Acked-by: John Stultz <john.stultz@linaro.org>
>
> Looks good to me. Let's wait for the 4.11 merge window though. I'm
> assuming I'll get this through the tip timer tree..

From a quick look at the source, powertop uses this file, and appears to
handle it gracefully if it fails to open, but it will lose functionality
to determine if a timer is deferred.

	Dave
Hi,

Don't forget to send to linux-api@vger.kernel.org

On Wednesday, 08 February 2017 at 11:26 -0800, Kees Cook wrote:
> Currently CONFIG_TIMER_STATS exposes process information across > namespaces: > > kernel/time/timer_list.c print_timer(): > > SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); > > /proc/timer_list: > > #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, > cron/2570 > > Given that the tracer can give the same information, this patch > entirely > removes CONFIG_TIMER_STATS. > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Kees Cook <keescook@chromium.org> > Acked-by: John Stultz <john.stultz@linaro.org> > --- > v2: > - dropped doc comments for removed structure elements; thx 0-day > builder. > --- > Documentation/timers/timer_stats.txt | 73 ------ > include/linux/hrtimer.h | 11 - > include/linux/timer.h | 45 ---- > kernel/kthread.c | 1 - > kernel/time/Makefile | 1 - > kernel/time/hrtimer.c | 38 ---- > kernel/time/timer.c | 48 +--- > kernel/time/timer_list.c | 10 - > kernel/time/timer_stats.c | 425 ----------------------- > ------------ > kernel/workqueue.c | 2 - > lib/Kconfig.debug | 14 -- > 11 files changed, 2 insertions(+), 666 deletions(-) > delete mode 100644 Documentation/timers/timer_stats.txt > delete mode 100644 kernel/time/timer_stats.c > > diff --git a/Documentation/timers/timer_stats.txt > b/Documentation/timers/timer_stats.txt > deleted file mode 100644 > index de835ee97455..000000000000 > --- a/Documentation/timers/timer_stats.txt > +++ /dev/null > @@ -1,73 +0,0 @@ > -timer_stats - timer usage statistics > ------------------------------------- > - > -timer_stats is a debugging facility to make the timer (ab)usage in a > Linux > -system visible to kernel and userspace developers. If enabled in the > config > -but not used it has almost zero runtime overhead, and a relatively > small > -data structure overhead. Even if collection is enabled runtime all > the > -locking is per-CPU and lookup is hashed.
> - > -timer_stats should be used by kernel and userspace developers to > verify that > -their code does not make unduly use of timers. This helps to avoid > unnecessary > -wakeups, which should be avoided to optimize power consumption. > - > -It can be enabled by CONFIG_TIMER_STATS in the "Kernel hacking" > configuration > -section. > - > -timer_stats collects information about the timer events which are > fired in a > -Linux system over a sample period: > - > -- the pid of the task(process) which initialized the timer > -- the name of the process which initialized the timer > -- the function where the timer was initialized > -- the callback function which is associated to the timer > -- the number of events (callbacks) > - > -timer_stats adds an entry to /proc: /proc/timer_stats > - > -This entry is used to control the statistics functionality and to > read out the > -sampled information. > - > -The timer_stats functionality is inactive on bootup. > - > -To activate a sample period issue: > -# echo 1 >/proc/timer_stats > - > -To stop a sample period issue: > -# echo 0 >/proc/timer_stats > - > -The statistics can be retrieved by: > -# cat /proc/timer_stats > - > -While sampling is enabled, each readout from /proc/timer_stats will > see > -newly updated statistics. Once sampling is disabled, the sampled > information > -is kept until a new sample period is started. This allows multiple > readouts. 
> - > -Sample output of /proc/timer_stats: > - > -Timerstats sample period: 3.888770 s > - 12, 0 swapper hrtimer_stop_sched_tick > (hrtimer_sched_tick) > - 15, 1 swapper hcd_submit_urb (rh_timer_func) > - 4, 959 kedac schedule_timeout (process_timeout) > - 1, 0 swapper page_writeback_init (wb_timer_fn) > - 28, 0 swapper hrtimer_stop_sched_tick > (hrtimer_sched_tick) > - 22, 2948 IRQ 4 tty_flip_buffer_push > (delayed_work_timer_fn) > - 3, 3100 bash schedule_timeout (process_timeout) > - 1, 1 swapper queue_delayed_work_on > (delayed_work_timer_fn) > - 1, 1 swapper queue_delayed_work_on > (delayed_work_timer_fn) > - 1, 1 swapper neigh_table_init_no_netlink > (neigh_periodic_timer) > - 1, 2292 ip __netdev_watchdog_up (dev_watchdog) > - 1, 23 events/1 do_cache_clean (delayed_work_timer_fn) > -90 total events, 30.0 events/sec > - > -The first column is the number of events, the second column the pid, > the third > -column is the name of the process. The forth column shows the > function which > -initialized the timer and in parenthesis the callback function which > was > -executed on expiry. > - > - Thomas, Ingo > - > -Added flag to indicate 'deferrable timer' in /proc/timer_stats. 
A > deferrable > -timer will appear as follows > - 10D, 1 swapper queue_delayed_work_on > (delayed_work_timer_fn) > - > diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h > index cdab81ba29f8..e52b427223ba 100644 > --- a/include/linux/hrtimer.h > +++ b/include/linux/hrtimer.h > @@ -88,12 +88,6 @@ enum hrtimer_restart { > * @base: pointer to the timer base (per cpu and per clock) > * @state: state information (See bit values above) > * @is_rel: Set if the timer was armed relative > - * @start_pid: timer statistics field to store the pid of the task > which > - * started the timer > - * @start_site: timer statistics field to store the site > where the timer > - * was started > - * @start_comm: timer statistics field to store the name of the > process which > - * started the timer > * > * The hrtimer structure must be initialized by hrtimer_init() > */ > @@ -104,11 +98,6 @@ struct hrtimer { > struct hrtimer_clock_base *base; > u8 state; > u8 is_rel; > -#ifdef CONFIG_TIMER_STATS > - int start_pid; > - void *start_site; > - char start_comm[16]; > -#endif > }; > > /** > diff --git a/include/linux/timer.h b/include/linux/timer.h > index 51d601f192d4..5a209b84fd9e 100644 > --- a/include/linux/timer.h > +++ b/include/linux/timer.h > @@ -20,11 +20,6 @@ struct timer_list { > unsigned long data; > u32 flags; > > -#ifdef CONFIG_TIMER_STATS > - int start_pid; > - void *start_site; > - char start_comm[16]; > -#endif > #ifdef CONFIG_LOCKDEP > struct lockdep_map lockdep_map; > #endif > @@ -197,46 +192,6 @@ extern int mod_timer_pending(struct timer_list > *timer, unsigned long expires); > */ > #define NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1) > > -/* > - * Timer-statistics info: > - */ > -#ifdef CONFIG_TIMER_STATS > - > -extern int timer_stats_active; > - > -extern void init_timer_stats(void); > - > -extern void timer_stats_update_stats(void *timer, pid_t pid, void > *startf, > - void *timerf, char *comm, u32 > flags); > - > -extern void 
__timer_stats_timer_set_start_info(struct timer_list > *timer, > - void *addr); > - > -static inline void timer_stats_timer_set_start_info(struct > timer_list *timer) > -{ > - if (likely(!timer_stats_active)) > - return; > - __timer_stats_timer_set_start_info(timer, > __builtin_return_address(0)); > -} > - > -static inline void timer_stats_timer_clear_start_info(struct > timer_list *timer) > -{ > - timer->start_site = NULL; > -} > -#else > -static inline void init_timer_stats(void) > -{ > -} > - > -static inline void timer_stats_timer_set_start_info(struct > timer_list *timer) > -{ > -} > - > -static inline void timer_stats_timer_clear_start_info(struct > timer_list *timer) > -{ > -} > -#endif > - > extern void add_timer(struct timer_list *timer); > > extern int try_to_del_timer_sync(struct timer_list *timer); > diff --git a/kernel/kthread.c b/kernel/kthread.c > index 2318fba86277..8461a4372e8a 100644 > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -850,7 +850,6 @@ void __kthread_queue_delayed_work(struct > kthread_worker *worker, > > list_add(&work->node, &worker->delayed_work_list); > work->worker = worker; > - timer_stats_timer_set_start_info(&dwork->timer); > timer->expires = jiffies + delay; > add_timer(timer); > } > diff --git a/kernel/time/Makefile b/kernel/time/Makefile > index 976840d29a71..938dbf33ef49 100644 > --- a/kernel/time/Makefile > +++ b/kernel/time/Makefile > @@ -15,6 +15,5 @@ ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y) > endif > obj-$(CONFIG_GENERIC_SCHED_CLOCK) += sched_clock.o > obj-$(CONFIG_TICK_ONESHOT) += tick-oneshot.o > tick-sched.o > -obj-$(CONFIG_TIMER_STATS) += timer_stats.o > obj-$(CONFIG_DEBUG_FS) += > timekeeping_debug.o > obj-$(CONFIG_TEST_UDELAY) += test_udelay.o > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index c6ecedd3b839..edabde646e58 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -766,34 +766,6 @@ void hrtimers_resume(void) > clock_was_set_delayed(); > } > > -static 
inline void timer_stats_hrtimer_set_start_info(struct hrtimer > *timer) > -{ > -#ifdef CONFIG_TIMER_STATS > - if (timer->start_site) > - return; > - timer->start_site = __builtin_return_address(0); > - memcpy(timer->start_comm, current->comm, TASK_COMM_LEN); > - timer->start_pid = current->pid; > -#endif > -} > - > -static inline void timer_stats_hrtimer_clear_start_info(struct > hrtimer *timer) > -{ > -#ifdef CONFIG_TIMER_STATS > - timer->start_site = NULL; > -#endif > -} > - > -static inline void timer_stats_account_hrtimer(struct hrtimer > *timer) > -{ > -#ifdef CONFIG_TIMER_STATS > - if (likely(!timer_stats_active)) > - return; > - timer_stats_update_stats(timer, timer->start_pid, timer- > >start_site, > - timer->function, timer->start_comm, > 0); > -#endif > -} > - > /* > * Counterpart to lock_hrtimer_base above: > */ > @@ -932,7 +904,6 @@ remove_hrtimer(struct hrtimer *timer, struct > hrtimer_clock_base *base, bool rest > * rare case and less expensive than a smp call. > */ > debug_deactivate(timer); > - timer_stats_hrtimer_clear_start_info(timer); > reprogram = base->cpu_base == > this_cpu_ptr(&hrtimer_bases); > > if (!restart) > @@ -990,8 +961,6 @@ void hrtimer_start_range_ns(struct hrtimer > *timer, ktime_t tim, > /* Switch the timer base, if necessary: */ > new_base = switch_hrtimer_base(timer, base, mode & > HRTIMER_MODE_PINNED); > > - timer_stats_hrtimer_set_start_info(timer); > - > leftmost = enqueue_hrtimer(timer, new_base); > if (!leftmost) > goto unlock; > @@ -1128,12 +1097,6 @@ static void __hrtimer_init(struct hrtimer > *timer, clockid_t clock_id, > base = hrtimer_clockid_to_base(clock_id); > timer->base = &cpu_base->clock_base[base]; > timerqueue_init(&timer->node); > - > -#ifdef CONFIG_TIMER_STATS > - timer->start_site = NULL; > - timer->start_pid = -1; > - memset(timer->start_comm, 0, TASK_COMM_LEN); > -#endif > } > > /** > @@ -1217,7 +1180,6 @@ static void __run_hrtimer(struct > hrtimer_cpu_base *cpu_base, > 
raw_write_seqcount_barrier(&cpu_base->seq); > > __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0); > - timer_stats_account_hrtimer(timer); > fn = timer->function; > > /* > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index ec33a6933eae..82a6bfa0c307 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -571,38 +571,6 @@ internal_add_timer(struct timer_base *base, > struct timer_list *timer) > trigger_dyntick_cpu(base, timer); > } > > -#ifdef CONFIG_TIMER_STATS > -void __timer_stats_timer_set_start_info(struct timer_list *timer, > void *addr) > -{ > - if (timer->start_site) > - return; > - > - timer->start_site = addr; > - memcpy(timer->start_comm, current->comm, TASK_COMM_LEN); > - timer->start_pid = current->pid; > -} > - > -static void timer_stats_account_timer(struct timer_list *timer) > -{ > - void *site; > - > - /* > - * start_site can be concurrently reset by > - * timer_stats_timer_clear_start_info() > - */ > - site = READ_ONCE(timer->start_site); > - if (likely(!site)) > - return; > - > - timer_stats_update_stats(timer, timer->start_pid, site, > - timer->function, timer->start_comm, > - timer->flags); > -} > - > -#else > -static void timer_stats_account_timer(struct timer_list *timer) {} > -#endif > - > #ifdef CONFIG_DEBUG_OBJECTS_TIMERS > > static struct debug_obj_descr timer_debug_descr; > @@ -789,11 +757,6 @@ static void do_init_timer(struct timer_list > *timer, unsigned int flags, > { > timer->entry.pprev = NULL; > timer->flags = flags | raw_smp_processor_id(); > -#ifdef CONFIG_TIMER_STATS > - timer->start_site = NULL; > - timer->start_pid = -1; > - memset(timer->start_comm, 0, TASK_COMM_LEN); > -#endif > lockdep_init_map(&timer->lockdep_map, name, key, 0); > } > > @@ -1001,8 +964,6 @@ __mod_timer(struct timer_list *timer, unsigned > long expires, bool pending_only) > base = lock_timer_base(timer, &flags); > } > > - timer_stats_timer_set_start_info(timer); > - > ret = detach_if_pending(timer, base, false); > if (!ret && 
pending_only) > goto out_unlock; > @@ -1130,7 +1091,6 @@ void add_timer_on(struct timer_list *timer, int > cpu) > struct timer_base *new_base, *base; > unsigned long flags; > > - timer_stats_timer_set_start_info(timer); > BUG_ON(timer_pending(timer) || !timer->function); > > new_base = get_timer_cpu_base(timer->flags, cpu); > @@ -1176,7 +1136,6 @@ int del_timer(struct timer_list *timer) > > debug_assert_init(timer); > > - timer_stats_timer_clear_start_info(timer); > if (timer_pending(timer)) { > base = lock_timer_base(timer, &flags); > ret = detach_if_pending(timer, base, true); > @@ -1204,10 +1163,9 @@ int try_to_del_timer_sync(struct timer_list > *timer) > > base = lock_timer_base(timer, &flags); > > - if (base->running_timer != timer) { > - timer_stats_timer_clear_start_info(timer); > + if (base->running_timer != timer) > ret = detach_if_pending(timer, base, true); > - } > + > spin_unlock_irqrestore(&base->lock, flags); > > return ret; > @@ -1331,7 +1289,6 @@ static void expire_timers(struct timer_base > *base, struct hlist_head *head) > unsigned long data; > > timer = hlist_entry(head->first, struct timer_list, > entry); > - timer_stats_account_timer(timer); > > base->running_timer = timer; > detach_timer(timer, true); > @@ -1868,7 +1825,6 @@ static void __init init_timer_cpus(void) > void __init init_timers(void) > { > init_timer_cpus(); > - init_timer_stats(); > open_softirq(TIMER_SOFTIRQ, run_timer_softirq); > } > > diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c > index afe6cd1944fc..387a3a5aa388 100644 > --- a/kernel/time/timer_list.c > +++ b/kernel/time/timer_list.c > @@ -62,21 +62,11 @@ static void > print_timer(struct seq_file *m, struct hrtimer *taddr, struct > hrtimer *timer, > int idx, u64 now) > { > -#ifdef CONFIG_TIMER_STATS > - char tmp[TASK_COMM_LEN + 1]; > -#endif > SEQ_printf(m, " #%d: ", idx); > print_name_offset(m, taddr); > SEQ_printf(m, ", "); > print_name_offset(m, timer->function); > SEQ_printf(m, ", S:%02x", 
timer->state); > -#ifdef CONFIG_TIMER_STATS > - SEQ_printf(m, ", "); > - print_name_offset(m, timer->start_site); > - memcpy(tmp, timer->start_comm, TASK_COMM_LEN); > - tmp[TASK_COMM_LEN] = 0; > - SEQ_printf(m, ", %s/%d", tmp, timer->start_pid); > -#endif > SEQ_printf(m, "\n"); > SEQ_printf(m, " # expires at %Lu-%Lu nsecs [in %Ld to %Ld > nsecs]\n", > (unsigned long > long)ktime_to_ns(hrtimer_get_softexpires(timer)), > diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c > deleted file mode 100644 > index afddded947df..000000000000 > --- a/kernel/time/timer_stats.c > +++ /dev/null > @@ -1,425 +0,0 @@ > -/* > - * kernel/time/timer_stats.c > - * > - * Collect timer usage statistics. > - * > - * Copyright(C) 2006, Red Hat, Inc., Ingo Molnar > - * Copyright(C) 2006 Timesys Corp., Thomas Gleixner <tglx@timesys.co > m> > - * > - * timer_stats is based on timer_top, a similar functionality which > was part of > - * Con Kolivas dyntick patch set. It was developed by Daniel Petrini > at the > - * Instituto Nokia de Tecnologia - INdT - Manaus. timer_top's design > was based > - * on dynamic allocation of the statistics entries and linear search > based > - * lookup combined with a global lock, rather than the static array, > hash > - * and per-CPU locking which is used by timer_stats. It was written > for the > - * pre hrtimer kernel code and therefore did not take hrtimers into > account. > - * Nevertheless it provided the base for the timer_stats > implementation and > - * was a helpful source of inspiration. Kudos to Daniel and the > Nokia folks > - * for this effort. > - * > - * timer_top.c is > - * Copyright (C) 2005 Instituto Nokia de Tecnologia - INdT - > Manaus > - * Written by Daniel Petrini <d.pensator@gmail.com> > - * timer_top.c was released under the GNU General Public > License version 2 > - * > - * We export the addresses and counting of timer functions being > called, > - * the pid and cmdline from the owner process if applicable. 
> - * > - * Start/stop data collection: > - * # echo [1|0] >/proc/timer_stats > - * > - * Display the information collected so far: > - * # cat /proc/timer_stats > - * > - * This program is free software; you can redistribute it and/or > modify > - * it under the terms of the GNU General Public License version 2 as > - * published by the Free Software Foundation. > - */ > - > -#include <linux/proc_fs.h> > -#include <linux/module.h> > -#include <linux/spinlock.h> > -#include <linux/sched.h> > -#include <linux/seq_file.h> > -#include <linux/kallsyms.h> > - > -#include <linux/uaccess.h> > - > -/* > - * This is our basic unit of interest: a timer expiry event > identified > - * by the timer, its start/expire functions and the PID of the task > that > - * started the timer. We count the number of times an event happens: > - */ > -struct entry { > - /* > - * Hash list: > - */ > - struct entry *next; > - > - /* > - * Hash keys: > - */ > - void *timer; > - void *start_func; > - void *expire_func; > - pid_t pid; > - > - /* > - * Number of timeout events: > - */ > - unsigned long count; > - u32 flags; > - > - /* > - * We save the command-line string to preserve > - * this information past task exit: > - */ > - char comm[TASK_COMM_LEN + 1]; > - > -} ____cacheline_aligned_in_smp; > - > -/* > - * Spinlock protecting the tables - not taken during lookup: > - */ > -static DEFINE_RAW_SPINLOCK(table_lock); > - > -/* > - * Per-CPU lookup locks for fast hash lookup: > - */ > -static DEFINE_PER_CPU(raw_spinlock_t, tstats_lookup_lock); > - > -/* > - * Mutex to serialize state changes with show-stats activities: > - */ > -static DEFINE_MUTEX(show_mutex); > - > -/* > - * Collection status, active/inactive: > - */ > -int __read_mostly timer_stats_active; > - > -/* > - * Beginning/end timestamps of measurement: > - */ > -static ktime_t time_start, time_stop; > - > -/* > - * tstat entry structs only get allocated while collection is > - * active and never freed during that time - this 
simplifies > - * things quite a bit. > - * > - * They get freed when a new collection period is started. > - */ > -#define MAX_ENTRIES_BITS 10 > -#define MAX_ENTRIES (1UL << MAX_ENTRIES_BITS) > - > -static unsigned long nr_entries; > -static struct entry entries[MAX_ENTRIES]; > - > -static atomic_t overflow_count; > - > -/* > - * The entries are in a hash-table, for fast lookup: > - */ > -#define TSTAT_HASH_BITS (MAX_ENTRIES_BITS - 1) > -#define TSTAT_HASH_SIZE (1UL << TSTAT_HASH_BITS) > -#define TSTAT_HASH_MASK (TSTAT_HASH_SIZE - 1) > - > -#define __tstat_hashfn(entry) > \ > - (((unsigned long)(entry)->timer ^ > \ > - (unsigned long)(entry)->start_func ^ > \ > - (unsigned long)(entry)->expire_func ^ > \ > - (unsigned long)(entry)->pid ) & > TSTAT_HASH_MASK) > - > -#define tstat_hashentry(entry) (tstat_hash_table + > __tstat_hashfn(entry)) > - > -static struct entry *tstat_hash_table[TSTAT_HASH_SIZE] > __read_mostly; > - > -static void reset_entries(void) > -{ > - nr_entries = 0; > - memset(entries, 0, sizeof(entries)); > - memset(tstat_hash_table, 0, sizeof(tstat_hash_table)); > - atomic_set(&overflow_count, 0); > -} > - > -static struct entry *alloc_entry(void) > -{ > - if (nr_entries >= MAX_ENTRIES) > - return NULL; > - > - return entries + nr_entries++; > -} > - > -static int match_entries(struct entry *entry1, struct entry *entry2) > -{ > - return entry1->timer == entry2->timer && > - entry1->start_func == entry2->start_func && > - entry1->expire_func == entry2->expire_func && > - entry1->pid == entry2->pid; > -} > - > -/* > - * Look up whether an entry matching this item is present > - * in the hash already. 
Must be called with irqs off and the > - * lookup lock held: > - */ > -static struct entry *tstat_lookup(struct entry *entry, char *comm) > -{ > - struct entry **head, *curr, *prev; > - > - head = tstat_hashentry(entry); > - curr = *head; > - > - /* > - * The fastpath is when the entry is already hashed, > - * we do this with the lookup lock held, but with the > - * table lock not held: > - */ > - while (curr) { > - if (match_entries(curr, entry)) > - return curr; > - > - curr = curr->next; > - } > - /* > - * Slowpath: allocate, set up and link a new hash entry: > - */ > - prev = NULL; > - curr = *head; > - > - raw_spin_lock(&table_lock); > - /* > - * Make sure we have not raced with another CPU: > - */ > - while (curr) { > - if (match_entries(curr, entry)) > - goto out_unlock; > - > - prev = curr; > - curr = curr->next; > - } > - > - curr = alloc_entry(); > - if (curr) { > - *curr = *entry; > - curr->count = 0; > - curr->next = NULL; > - memcpy(curr->comm, comm, TASK_COMM_LEN); > - > - smp_mb(); /* Ensure that curr is initialized before > insert */ > - > - if (prev) > - prev->next = curr; > - else > - *head = curr; > - } > - out_unlock: > - raw_spin_unlock(&table_lock); > - > - return curr; > -} > - > -/** > - * timer_stats_update_stats - Update the statistics for a timer. > - * @timer: pointer to either a timer_list or a hrtimer > - * @pid: the pid of the task which set up the timer > - * @startf: pointer to the function which did the timer setup > - * @timerf: pointer to the timer callback function of the > timer > - * @comm: name of the process which set up the timer > - * @tflags: The flags field of the timer > - * > - * When the timer is already registered, then the event counter is > - * incremented. Otherwise the timer is registered in a free slot. 
> - */ > -void timer_stats_update_stats(void *timer, pid_t pid, void *startf, > - void *timerf, char *comm, u32 tflags) > -{ > - /* > - * It doesn't matter which lock we take: > - */ > - raw_spinlock_t *lock; > - struct entry *entry, input; > - unsigned long flags; > - > - if (likely(!timer_stats_active)) > - return; > - > - lock = &per_cpu(tstats_lookup_lock, raw_smp_processor_id()); > - > - input.timer = timer; > - input.start_func = startf; > - input.expire_func = timerf; > - input.pid = pid; > - input.flags = tflags; > - > - raw_spin_lock_irqsave(lock, flags); > - if (!timer_stats_active) > - goto out_unlock; > - > - entry = tstat_lookup(&input, comm); > - if (likely(entry)) > - entry->count++; > - else > - atomic_inc(&overflow_count); > - > - out_unlock: > - raw_spin_unlock_irqrestore(lock, flags); > -} > - > -static void print_name_offset(struct seq_file *m, unsigned long > addr) > -{ > - char symname[KSYM_NAME_LEN]; > - > - if (lookup_symbol_name(addr, symname) < 0) > - seq_printf(m, "<%p>", (void *)addr); > - else > - seq_printf(m, "%s", symname); > -} > - > -static int tstats_show(struct seq_file *m, void *v) > -{ > - struct timespec64 period; > - struct entry *entry; > - unsigned long ms; > - long events = 0; > - ktime_t time; > - int i; > - > - mutex_lock(&show_mutex); > - /* > - * If still active then calculate up to now: > - */ > - if (timer_stats_active) > - time_stop = ktime_get(); > - > - time = ktime_sub(time_stop, time_start); > - > - period = ktime_to_timespec64(time); > - ms = period.tv_nsec / 1000000; > - > - seq_puts(m, "Timer Stats Version: v0.3\n"); > - seq_printf(m, "Sample period: %ld.%03ld s\n", > (long)period.tv_sec, ms); > - if (atomic_read(&overflow_count)) > - seq_printf(m, "Overflow: %d entries\n", > atomic_read(&overflow_count)); > - seq_printf(m, "Collection: %s\n", timer_stats_active ? 
> "active" : "inactive"); > - > - for (i = 0; i < nr_entries; i++) { > - entry = entries + i; > - if (entry->flags & TIMER_DEFERRABLE) { > - seq_printf(m, "%4luD, %5d %-16s ", > - entry->count, entry->pid, entry- > >comm); > - } else { > - seq_printf(m, " %4lu, %5d %-16s ", > - entry->count, entry->pid, entry- > >comm); > - } > - > - print_name_offset(m, (unsigned long)entry- > >start_func); > - seq_puts(m, " ("); > - print_name_offset(m, (unsigned long)entry- > >expire_func); > - seq_puts(m, ")\n"); > - > - events += entry->count; > - } > - > - ms += period.tv_sec * 1000; > - if (!ms) > - ms = 1; > - > - if (events && period.tv_sec) > - seq_printf(m, "%ld total events, %ld.%03ld > events/sec\n", > - events, events * 1000 / ms, > - (events * 1000000 / ms) % 1000); > - else > - seq_printf(m, "%ld total events\n", events); > - > - mutex_unlock(&show_mutex); > - > - return 0; > -} > - > -/* > - * After a state change, make sure all concurrent lookup/update > - * activities have stopped: > - */ > -static void sync_access(void) > -{ > - unsigned long flags; > - int cpu; > - > - for_each_online_cpu(cpu) { > - raw_spinlock_t *lock = &per_cpu(tstats_lookup_lock, > cpu); > - > - raw_spin_lock_irqsave(lock, flags); > - /* nothing */ > - raw_spin_unlock_irqrestore(lock, flags); > - } > -} > - > -static ssize_t tstats_write(struct file *file, const char __user > *buf, > - size_t count, loff_t *offs) > -{ > - char ctl[2]; > - > - if (count != 2 || *offs) > - return -EINVAL; > - > - if (copy_from_user(ctl, buf, count)) > - return -EFAULT; > - > - mutex_lock(&show_mutex); > - switch (ctl[0]) { > - case '0': > - if (timer_stats_active) { > - timer_stats_active = 0; > - time_stop = ktime_get(); > - sync_access(); > - } > - break; > - case '1': > - if (!timer_stats_active) { > - reset_entries(); > - time_start = ktime_get(); > - smp_mb(); > - timer_stats_active = 1; > - } > - break; > - default: > - count = -EINVAL; > - } > - mutex_unlock(&show_mutex); > - > - return count; > -} > - 
> -static int tstats_open(struct inode *inode, struct file *filp) > -{ > - return single_open(filp, tstats_show, NULL); > -} > - > -static const struct file_operations tstats_fops = { > - .open = tstats_open, > - .read = seq_read, > - .write = tstats_write, > - .llseek = seq_lseek, > - .release = single_release, > -}; > - > -void __init init_timer_stats(void) > -{ > - int cpu; > - > - for_each_possible_cpu(cpu) > - raw_spin_lock_init(&per_cpu(tstats_lookup_lock, > cpu)); > -} > - > -static int __init init_tstats_procfs(void) > -{ > - struct proc_dir_entry *pe; > - > - pe = proc_create("timer_stats", 0644, NULL, &tstats_fops); > - if (!pe) > - return -ENOMEM; > - return 0; > -} > -__initcall(init_tstats_procfs); > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 1d9fb6543a66..072cbc9b175d 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -1523,8 +1523,6 @@ static void __queue_delayed_work(int cpu, > struct workqueue_struct *wq, > return; > } > > - timer_stats_timer_set_start_info(&dwork->timer); > - > dwork->wq = wq; > dwork->cpu = cpu; > timer->expires = jiffies + delay; > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index eb9e9a7870fa..132af338d6dd 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -980,20 +980,6 @@ config DEBUG_TIMEKEEPING > > If unsure, say N. > > -config TIMER_STATS > - bool "Collect kernel timers statistics" > - depends on DEBUG_KERNEL && PROC_FS > - help > - If you say Y here, additional code will be inserted into > the > - timer routines to collect statistics about kernel timers > being > - reprogrammed. The statistics can be read from > /proc/timer_stats. > - The statistics collection is started by writing 1 to > /proc/timer_stats, > - writing 0 stops it. This feature is useful to collect > information > - about timer usage patterns in kernel and userspace. 
This > feature > - is lightweight if enabled in the kernel config but not > activated > - (it defaults to deactivated on bootup and will only be > activated > - if some application like powertop activates it > explicitly). > - > config DEBUG_PREEMPT > bool "Debug preemptible kernel" > depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT > -- > 2.7.4 > >
On Wed, 8 Feb 2017, Dave Jones wrote:
> On Wed, Feb 08, 2017 at 11:54:30AM -0800, Linus Torvalds wrote:
> > On Wed, Feb 8, 2017 at 11:26 AM, Kees Cook <keescook@chromium.org> wrote:
> > >
> > > Given that the tracer can give the same information, this patch entirely
> > > removes CONFIG_TIMER_STATS.
> > >
> > > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Kees Cook <keescook@chromium.org>
> > > Acked-by: John Stultz <john.stultz@linaro.org>
> >
> > Looks good to me. Let's wait for the 4.11 merge window though. I'm
> > assuming I'll get this through the tip timer tree..
>
> From a quick look at the source, powertop uses this file, and appears to
> handle it gracefully if it fails to open, but it will lose functionality
> to determine if a timer is deferred.

The lookup happens when evaluating a timer_expire_entry tracing event.

With current kernels the same information can be retrieved from the
timer_start tracing event (flags field). And that's way more sensible than
scanning timer_stats because when the trace is evaluated there is no
guarantee at all that the timer is still queued and exposed there.

That should be trivial to fix in powertop.

Thanks,

	tglx
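
To illustrate the suggestion above, here is a rough userspace sketch of reading
the deferrable bit from the flags field of a textual timer_start trace line.
It is not powertop code. The helper name trace_line_is_deferrable and its
deferrable_mask parameter are hypothetical, and the line layout is an
assumption: the sketch only expects a "flags=" token whose value is either
hexadecimal or a symbolic string in which 'D' marks a deferrable timer. The
real layout and the bit value depend on the kernel version and should be
checked against the event's format description in the tracing directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from powertop or the kernel).
 *
 * Returns true if a textual timer_start trace line appears to describe
 * a deferrable timer. Assumed input forms for the flags field:
 *   numeric:  "... flags=0x00080001"  (tested against deferrable_mask)
 *   symbolic: "... flags=D|P"         ('D' taken to mean deferrable)
 * The caller supplies deferrable_mask because the bit value is
 * kernel-version specific.
 */
bool trace_line_is_deferrable(const char *line, unsigned long deferrable_mask)
{
	const char *p;

	assert(line != NULL);

	/* Locate the flags field; a line without one is "not deferrable". */
	p = strstr(line, "flags=");
	if (!p)
		return false;
	p += strlen("flags=");

	/* Numeric form: parse the hex value and test the supplied bit. */
	if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
		unsigned long flags = strtoul(p, NULL, 16);

		return (flags & deferrable_mask) != 0;
	}

	/* Symbolic form: scan the token up to the next blank or line end. */
	for (; *p && *p != ' ' && *p != '\n'; p++) {
		if (*p == 'D')
			return true;
	}
	return false;
}
```

Because the decision is made from the timer_start record itself, it does not
depend on the timer still being queued when the trace is evaluated, which is
the point made in the reply above.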
diff --git a/Documentation/timers/timer_stats.txt b/Documentation/timers/timer_stats.txt
deleted file mode 100644
index de835ee97455..000000000000
--- a/Documentation/timers/timer_stats.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-timer_stats - timer usage statistics
-------------------------------------
-
-timer_stats is a debugging facility to make the timer (ab)usage in a Linux
-system visible to kernel and userspace developers. If enabled in the config
-but not used it has almost zero runtime overhead, and a relatively small
-data structure overhead. Even if collection is enabled runtime all the
-locking is per-CPU and lookup is hashed.
-
-timer_stats should be used by kernel and userspace developers to verify that
-their code does not make unduly use of timers. This helps to avoid unnecessary
-wakeups, which should be avoided to optimize power consumption.
-
-It can be enabled by CONFIG_TIMER_STATS in the "Kernel hacking" configuration
-section.
-
-timer_stats collects information about the timer events which are fired in a
-Linux system over a sample period:
-
-- the pid of the task(process) which initialized the timer
-- the name of the process which initialized the timer
-- the function where the timer was initialized
-- the callback function which is associated to the timer
-- the number of events (callbacks)
-
-timer_stats adds an entry to /proc: /proc/timer_stats
-
-This entry is used to control the statistics functionality and to read out the
-sampled information.
-
-The timer_stats functionality is inactive on bootup.
-
-To activate a sample period issue:
-# echo 1 >/proc/timer_stats
-
-To stop a sample period issue:
-# echo 0 >/proc/timer_stats
-
-The statistics can be retrieved by:
-# cat /proc/timer_stats
-
-While sampling is enabled, each readout from /proc/timer_stats will see
-newly updated statistics. Once sampling is disabled, the sampled information
-is kept until a new sample period is started. This allows multiple readouts.
-
-Sample output of /proc/timer_stats:
-
-Timerstats sample period: 3.888770 s
-  12,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
-  15,     1 swapper          hcd_submit_urb (rh_timer_func)
-   4,   959 kedac            schedule_timeout (process_timeout)
-   1,     0 swapper          page_writeback_init (wb_timer_fn)
-  28,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
-  22,  2948 IRQ 4            tty_flip_buffer_push (delayed_work_timer_fn)
-   3,  3100 bash             schedule_timeout (process_timeout)
-   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
-   1,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
-   1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
-   1,  2292 ip               __netdev_watchdog_up (dev_watchdog)
-   1,    23 events/1         do_cache_clean (delayed_work_timer_fn)
-90 total events, 30.0 events/sec
-
-The first column is the number of events, the second column the pid, the third
-column is the name of the process. The forth column shows the function which
-initialized the timer and in parenthesis the callback function which was
-executed on expiry.
-
-    Thomas, Ingo
-
-Added flag to indicate 'deferrable timer' in /proc/timer_stats. A deferrable
-timer will appear as follows
-  10D,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
-
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index cdab81ba29f8..e52b427223ba 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -88,12 +88,6 @@ enum hrtimer_restart {
  * @base:	pointer to the timer base (per cpu and per clock)
  * @state:	state information (See bit values above)
  * @is_rel:	Set if the timer was armed relative
- * @start_pid:	timer statistics field to store the pid of the task which
- *		started the timer
- * @start_site:	timer statistics field to store the site where the timer
- *		was started
- * @start_comm:	timer statistics field to store the name of the process which
- *		started the timer
  *
  * The hrtimer structure must be initialized by hrtimer_init()
  */
@@ -104,11 +98,6 @@ struct hrtimer {
 	struct hrtimer_clock_base	*base;
 	u8				state;
 	u8				is_rel;
-#ifdef CONFIG_TIMER_STATS
-	int				start_pid;
-	void				*start_site;
-	char				start_comm[16];
-#endif
 };
 
 /**
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 51d601f192d4..5a209b84fd9e 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -20,11 +20,6 @@ struct timer_list {
 	unsigned long		data;
 	u32			flags;
 
-#ifdef CONFIG_TIMER_STATS
-	int			start_pid;
-	void			*start_site;
-	char			start_comm[16];
-#endif
 #ifdef CONFIG_LOCKDEP
 	struct lockdep_map	lockdep_map;
 #endif
@@ -197,46 +192,6 @@ extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
  */
 #define NEXT_TIMER_MAX_DELTA	((1UL << 30) - 1)
 
-/*
- * Timer-statistics info:
- */
-#ifdef CONFIG_TIMER_STATS
-
-extern int timer_stats_active;
-
-extern void init_timer_stats(void);
-
-extern void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
-				     void *timerf, char *comm, u32 flags);
-
-extern void __timer_stats_timer_set_start_info(struct timer_list *timer,
-					       void *addr);
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
-	if (likely(!timer_stats_active))
-		return;
-	__timer_stats_timer_set_start_info(timer, __builtin_return_address(0));
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
-	timer->start_site = NULL;
-}
-#else
-static inline void init_timer_stats(void)
-{
-}
-
-static inline void timer_stats_timer_set_start_info(struct timer_list *timer)
-{
-}
-
-static inline void timer_stats_timer_clear_start_info(struct timer_list *timer)
-{
-}
-#endif
-
 extern void add_timer(struct timer_list *timer);
 
 extern int try_to_del_timer_sync(struct timer_list *timer);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 2318fba86277..8461a4372e8a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -850,7 +850,6 @@ void __kthread_queue_delayed_work(struct kthread_worker *worker,
 
 	list_add(&work->node, &worker->delayed_work_list);
 	work->worker = worker;
-	timer_stats_timer_set_start_info(&dwork->timer);
 	timer->expires = jiffies + delay;
 	add_timer(timer);
 }
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 976840d29a71..938dbf33ef49 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -15,6 +15,5 @@ ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
 endif
 obj-$(CONFIG_GENERIC_SCHED_CLOCK)		+= sched_clock.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-oneshot.o tick-sched.o
-obj-$(CONFIG_TIMER_STATS)			+= timer_stats.o
 obj-$(CONFIG_DEBUG_FS)				+= timekeeping_debug.o
 obj-$(CONFIG_TEST_UDELAY)			+= test_udelay.o
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index c6ecedd3b839..edabde646e58 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -766,34 +766,6 @@ void hrtimers_resume(void)
 	clock_was_set_delayed();
 }
 
-static inline void timer_stats_hrtimer_set_start_info(struct hrtimer *timer)
-{
-#ifdef CONFIG_TIMER_STATS
-	if (timer->start_site)
-		return;
-	timer->start_site = __builtin_return_address(0);
-	memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
-	timer->start_pid = current->pid;
-#endif
-}
-
-static inline void timer_stats_hrtimer_clear_start_info(struct hrtimer *timer)
-{
-#ifdef CONFIG_TIMER_STATS
-	timer->start_site = NULL;
-#endif
-}
-
-static inline void timer_stats_account_hrtimer(struct hrtimer *timer)
-{
-#ifdef CONFIG_TIMER_STATS
-	if (likely(!timer_stats_active))
-		return;
-	timer_stats_update_stats(timer, timer->start_pid, timer->start_site,
-				 timer->function, timer->start_comm, 0);
-#endif
-}
-
 /*
  * Counterpart to lock_hrtimer_base above:
  */
@@ -932,7 +904,6 @@ remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool rest
 		 * rare case and less expensive than a smp call.
 		 */
 		debug_deactivate(timer);
-		timer_stats_hrtimer_clear_start_info(timer);
 		reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
 
 		if (!restart)
@@ -990,8 +961,6 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 	/* Switch the timer base, if necessary: */
 	new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
 
-	timer_stats_hrtimer_set_start_info(timer);
-
 	leftmost = enqueue_hrtimer(timer, new_base);
 	if (!leftmost)
 		goto unlock;
@@ -1128,12 +1097,6 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 	base = hrtimer_clockid_to_base(clock_id);
 	timer->base = &cpu_base->clock_base[base];
 	timerqueue_init(&timer->node);
-
-#ifdef CONFIG_TIMER_STATS
-	timer->start_site = NULL;
-	timer->start_pid = -1;
-	memset(timer->start_comm, 0, TASK_COMM_LEN);
-#endif
 }
 
 /**
@@ -1217,7 +1180,6 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
 	raw_write_seqcount_barrier(&cpu_base->seq);
 
 	__remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
-	timer_stats_account_hrtimer(timer);
 	fn = timer->function;
 
 	/*
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ec33a6933eae..82a6bfa0c307 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -571,38 +571,6 @@ internal_add_timer(struct timer_base *base, struct timer_list *timer)
 	trigger_dyntick_cpu(base, timer);
 }
 
-#ifdef CONFIG_TIMER_STATS
-void __timer_stats_timer_set_start_info(struct timer_list *timer, void *addr)
-{
-	if (timer->start_site)
-		return;
-
-	timer->start_site = addr;
-	memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
-	timer->start_pid = current->pid;
-}
-
-static void timer_stats_account_timer(struct timer_list *timer)
-{
-	void *site;
-
-	/*
-	 * start_site can be concurrently reset by
-	 * timer_stats_timer_clear_start_info()
-	 */
-	site = READ_ONCE(timer->start_site);
-	if (likely(!site))
-		return;
-
-	timer_stats_update_stats(timer, timer->start_pid, site,
-				 timer->function, timer->start_comm,
-				 timer->flags);
-}
-
-#else
-static void timer_stats_account_timer(struct timer_list *timer) {}
-#endif
-
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 
 static struct debug_obj_descr timer_debug_descr;
@@ -789,11 +757,6 @@ static void do_init_timer(struct timer_list *timer, unsigned int flags,
 {
 	timer->entry.pprev = NULL;
 	timer->flags = flags | raw_smp_processor_id();
-#ifdef CONFIG_TIMER_STATS
-	timer->start_site = NULL;
-	timer->start_pid = -1;
-	memset(timer->start_comm, 0, TASK_COMM_LEN);
-#endif
 	lockdep_init_map(&timer->lockdep_map, name, key, 0);
 }
 
@@ -1001,8 +964,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
 		base = lock_timer_base(timer, &flags);
 	}
 
-	timer_stats_timer_set_start_info(timer);
-
 	ret = detach_if_pending(timer, base, false);
 	if (!ret && pending_only)
 		goto out_unlock;
@@ -1130,7 +1091,6 @@ void add_timer_on(struct timer_list *timer, int cpu)
 	struct timer_base *new_base, *base;
 	unsigned long flags;
 
-	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
 
 	new_base = get_timer_cpu_base(timer->flags, cpu);
@@ -1176,7 +1136,6 @@ int del_timer(struct timer_list *timer)
 
 	debug_assert_init(timer);
 
-	timer_stats_timer_clear_start_info(timer);
 	if (timer_pending(timer)) {
 		base = lock_timer_base(timer, &flags);
 		ret = detach_if_pending(timer, base, true);
@@ -1204,10 +1163,9 @@ int try_to_del_timer_sync(struct timer_list *timer)
 
 	base = lock_timer_base(timer, &flags);
 
-	if (base->running_timer != timer) {
-		timer_stats_timer_clear_start_info(timer);
+	if (base->running_timer != timer)
 		ret = detach_if_pending(timer, base, true);
-	}
+
 	spin_unlock_irqrestore(&base->lock, flags);
 
 	return ret;
@@ -1331,7 +1289,6 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 		unsigned long data;
 
 		timer = hlist_entry(head->first, struct timer_list, entry);
-		timer_stats_account_timer(timer);
 
 		base->running_timer = timer;
 		detach_timer(timer, true);
@@ -1868,7 +1825,6 @@ static void __init init_timer_cpus(void)
 void __init init_timers(void)
 {
 	init_timer_cpus();
-	init_timer_stats();
 	open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
 }
 
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index afe6cd1944fc..387a3a5aa388 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -62,21 +62,11 @@ static void
 print_timer(struct seq_file *m, struct hrtimer *taddr, struct hrtimer *timer,
 	    int idx, u64 now)
 {
-#ifdef CONFIG_TIMER_STATS
-	char tmp[TASK_COMM_LEN + 1];
-#endif
 	SEQ_printf(m, " #%d: ", idx);
 	print_name_offset(m, taddr);
 	SEQ_printf(m, ", ");
 	print_name_offset(m, timer->function);
 	SEQ_printf(m, ", S:%02x", timer->state);
-#ifdef CONFIG_TIMER_STATS
-	SEQ_printf(m, ", ");
-	print_name_offset(m, timer->start_site);
-	memcpy(tmp, timer->start_comm, TASK_COMM_LEN);
-	tmp[TASK_COMM_LEN] = 0;
-	SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
-#endif
 	SEQ_printf(m, "\n");
 	SEQ_printf(m, " # expires at %Lu-%Lu nsecs [in %Ld to %Ld nsecs]\n",
 		(unsigned long long)ktime_to_ns(hrtimer_get_softexpires(timer)),
diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
deleted file mode 100644
index afddded947df..000000000000
--- a/kernel/time/timer_stats.c
+++ /dev/null
@@ -1,425 +0,0 @@
-/*
- * kernel/time/timer_stats.c
- *
- * Collect timer usage statistics.
- *
- * Copyright(C) 2006, Red Hat, Inc., Ingo Molnar
- * Copyright(C) 2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com>
- *
- * timer_stats is based on timer_top, a similar functionality which was part of
- * Con Kolivas dyntick patch set. It was developed by Daniel Petrini at the
- * Instituto Nokia de Tecnologia - INdT - Manaus. timer_top's design was based
- * on dynamic allocation of the statistics entries and linear search based
- * lookup combined with a global lock, rather than the static array, hash
- * and per-CPU locking which is used by timer_stats. It was written for the
- * pre hrtimer kernel code and therefore did not take hrtimers into account.
- * Nevertheless it provided the base for the timer_stats implementation and
- * was a helpful source of inspiration. Kudos to Daniel and the Nokia folks
- * for this effort.
- *
- * timer_top.c is
- *	Copyright (C) 2005 Instituto Nokia de Tecnologia - INdT - Manaus
- *	Written by Daniel Petrini <d.pensator@gmail.com>
- *	timer_top.c was released under the GNU General Public License version 2
- *
- * We export the addresses and counting of timer functions being called,
- * the pid and cmdline from the owner process if applicable.
- *
- * Start/stop data collection:
- * # echo [1|0] >/proc/timer_stats
- *
- * Display the information collected so far:
- * # cat /proc/timer_stats
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/proc_fs.h>
-#include <linux/module.h>
-#include <linux/spinlock.h>
-#include <linux/sched.h>
-#include <linux/seq_file.h>
-#include <linux/kallsyms.h>
-
-#include <linux/uaccess.h>
-
-/*
- * This is our basic unit of interest: a timer expiry event identified
- * by the timer, its start/expire functions and the PID of the task that
- * started the timer. We count the number of times an event happens:
- */
-struct entry {
-	/*
-	 * Hash list:
-	 */
-	struct entry		*next;
-
-	/*
-	 * Hash keys:
-	 */
-	void			*timer;
-	void			*start_func;
-	void			*expire_func;
-	pid_t			pid;
-
-	/*
-	 * Number of timeout events:
-	 */
-	unsigned long		count;
-	u32			flags;
-
-	/*
-	 * We save the command-line string to preserve
-	 * this information past task exit:
-	 */
-	char			comm[TASK_COMM_LEN + 1];
-
-} ____cacheline_aligned_in_smp;
-
-/*
- * Spinlock protecting the tables - not taken during lookup:
- */
-static DEFINE_RAW_SPINLOCK(table_lock);
-
-/*
- * Per-CPU lookup locks for fast hash lookup:
- */
-static DEFINE_PER_CPU(raw_spinlock_t, tstats_lookup_lock);
-
-/*
- * Mutex to serialize state changes with show-stats activities:
- */
-static DEFINE_MUTEX(show_mutex);
-
-/*
- * Collection status, active/inactive:
- */
-int __read_mostly timer_stats_active;
-
-/*
- * Beginning/end timestamps of measurement:
- */
-static ktime_t time_start, time_stop;
-
-/*
- * tstat entry structs only get allocated while collection is
- * active and never freed during that time - this simplifies
- * things quite a bit.
- *
- * They get freed when a new collection period is started.
- */
-#define MAX_ENTRIES_BITS	10
-#define MAX_ENTRIES		(1UL << MAX_ENTRIES_BITS)
-
-static unsigned long nr_entries;
-static struct entry entries[MAX_ENTRIES];
-
-static atomic_t overflow_count;
-
-/*
- * The entries are in a hash-table, for fast lookup:
- */
-#define TSTAT_HASH_BITS		(MAX_ENTRIES_BITS - 1)
-#define TSTAT_HASH_SIZE		(1UL << TSTAT_HASH_BITS)
-#define TSTAT_HASH_MASK		(TSTAT_HASH_SIZE - 1)
-
-#define __tstat_hashfn(entry)						\
-	(((unsigned long)(entry)->timer ^				\
-	  (unsigned long)(entry)->start_func ^				\
-	  (unsigned long)(entry)->expire_func ^				\
-	  (unsigned long)(entry)->pid ) & TSTAT_HASH_MASK)
-
-#define tstat_hashentry(entry)	(tstat_hash_table + __tstat_hashfn(entry))
-
-static struct entry *tstat_hash_table[TSTAT_HASH_SIZE] __read_mostly;
-
-static void reset_entries(void)
-{
-	nr_entries = 0;
-	memset(entries, 0, sizeof(entries));
-	memset(tstat_hash_table, 0, sizeof(tstat_hash_table));
-	atomic_set(&overflow_count, 0);
-}
-
-static struct entry *alloc_entry(void)
-{
-	if (nr_entries >= MAX_ENTRIES)
-		return NULL;
-
-	return entries + nr_entries++;
-}
-
-static int match_entries(struct entry *entry1, struct entry *entry2)
-{
-	return entry1->timer == entry2->timer &&
-	       entry1->start_func == entry2->start_func &&
-	       entry1->expire_func == entry2->expire_func &&
-	       entry1->pid == entry2->pid;
-}
-
-/*
- * Look up whether an entry matching this item is present
- * in the hash already. Must be called with irqs off and the
- * lookup lock held:
- */
-static struct entry *tstat_lookup(struct entry *entry, char *comm)
-{
-	struct entry **head, *curr, *prev;
-
-	head = tstat_hashentry(entry);
-	curr = *head;
-
-	/*
-	 * The fastpath is when the entry is already hashed,
-	 * we do this with the lookup lock held, but with the
-	 * table lock not held:
-	 */
-	while (curr) {
-		if (match_entries(curr, entry))
-			return curr;
-
-		curr = curr->next;
-	}
-	/*
-	 * Slowpath: allocate, set up and link a new hash entry:
-	 */
-	prev = NULL;
-	curr = *head;
-
-	raw_spin_lock(&table_lock);
-	/*
-	 * Make sure we have not raced with another CPU:
-	 */
-	while (curr) {
-		if (match_entries(curr, entry))
-			goto out_unlock;
-
-		prev = curr;
-		curr = curr->next;
-	}
-
-	curr = alloc_entry();
-	if (curr) {
-		*curr = *entry;
-		curr->count = 0;
-		curr->next = NULL;
-		memcpy(curr->comm, comm, TASK_COMM_LEN);
-
-		smp_mb(); /* Ensure that curr is initialized before insert */
-
-		if (prev)
-			prev->next = curr;
-		else
-			*head = curr;
-	}
- out_unlock:
-	raw_spin_unlock(&table_lock);
-
-	return curr;
-}
-
-/**
- * timer_stats_update_stats - Update the statistics for a timer.
- * @timer:	pointer to either a timer_list or a hrtimer
- * @pid:	the pid of the task which set up the timer
- * @startf:	pointer to the function which did the timer setup
- * @timerf:	pointer to the timer callback function of the timer
- * @comm:	name of the process which set up the timer
- * @tflags:	The flags field of the timer
- *
- * When the timer is already registered, then the event counter is
- * incremented. Otherwise the timer is registered in a free slot.
- */
-void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
-			      void *timerf, char *comm, u32 tflags)
-{
-	/*
-	 * It doesn't matter which lock we take:
-	 */
-	raw_spinlock_t *lock;
-	struct entry *entry, input;
-	unsigned long flags;
-
-	if (likely(!timer_stats_active))
-		return;
-
-	lock = &per_cpu(tstats_lookup_lock, raw_smp_processor_id());
-
-	input.timer = timer;
-	input.start_func = startf;
-	input.expire_func = timerf;
-	input.pid = pid;
-	input.flags = tflags;
-
-	raw_spin_lock_irqsave(lock, flags);
-	if (!timer_stats_active)
-		goto out_unlock;
-
-	entry = tstat_lookup(&input, comm);
-	if (likely(entry))
-		entry->count++;
-	else
-		atomic_inc(&overflow_count);
-
- out_unlock:
-	raw_spin_unlock_irqrestore(lock, flags);
-}
-
-static void print_name_offset(struct seq_file *m, unsigned long addr)
-{
-	char symname[KSYM_NAME_LEN];
-
-	if (lookup_symbol_name(addr, symname) < 0)
-		seq_printf(m, "<%p>", (void *)addr);
-	else
-		seq_printf(m, "%s", symname);
-}
-
-static int tstats_show(struct seq_file *m, void *v)
-{
-	struct timespec64 period;
-	struct entry *entry;
-	unsigned long ms;
-	long events = 0;
-	ktime_t time;
-	int i;
-
-	mutex_lock(&show_mutex);
-	/*
-	 * If still active then calculate up to now:
-	 */
-	if (timer_stats_active)
-		time_stop = ktime_get();
-
-	time = ktime_sub(time_stop, time_start);
-
-	period = ktime_to_timespec64(time);
-	ms = period.tv_nsec / 1000000;
-
-	seq_puts(m, "Timer Stats Version: v0.3\n");
-	seq_printf(m, "Sample period: %ld.%03ld s\n", (long)period.tv_sec, ms);
-	if (atomic_read(&overflow_count))
-		seq_printf(m, "Overflow: %d entries\n", atomic_read(&overflow_count));
-	seq_printf(m, "Collection: %s\n", timer_stats_active ? "active" : "inactive");
-
-	for (i = 0; i < nr_entries; i++) {
-		entry = entries + i;
-		if (entry->flags & TIMER_DEFERRABLE) {
-			seq_printf(m, "%4luD, %5d %-16s ",
-				entry->count, entry->pid, entry->comm);
-		} else {
-			seq_printf(m, " %4lu, %5d %-16s ",
-				entry->count, entry->pid, entry->comm);
-		}
-
-		print_name_offset(m, (unsigned long)entry->start_func);
-		seq_puts(m, " (");
-		print_name_offset(m, (unsigned long)entry->expire_func);
-		seq_puts(m, ")\n");
-
-		events += entry->count;
-	}
-
-	ms += period.tv_sec * 1000;
-	if (!ms)
-		ms = 1;
-
-	if (events && period.tv_sec)
-		seq_printf(m, "%ld total events, %ld.%03ld events/sec\n",
-			   events, events * 1000 / ms,
-			   (events * 1000000 / ms) % 1000);
-	else
-		seq_printf(m, "%ld total events\n", events);
-
-	mutex_unlock(&show_mutex);
-
-	return 0;
-}
-
-/*
- * After a state change, make sure all concurrent lookup/update
- * activities have stopped:
- */
-static void sync_access(void)
-{
-	unsigned long flags;
-	int cpu;
-
-	for_each_online_cpu(cpu) {
-		raw_spinlock_t *lock = &per_cpu(tstats_lookup_lock, cpu);
-
-		raw_spin_lock_irqsave(lock, flags);
-		/* nothing */
-		raw_spin_unlock_irqrestore(lock, flags);
-	}
-}
-
-static ssize_t tstats_write(struct file *file, const char __user *buf,
-			    size_t count, loff_t *offs)
-{
-	char ctl[2];
-
-	if (count != 2 || *offs)
-		return -EINVAL;
-
-	if (copy_from_user(ctl, buf, count))
-		return -EFAULT;
-
-	mutex_lock(&show_mutex);
-	switch (ctl[0]) {
-	case '0':
-		if (timer_stats_active) {
-			timer_stats_active = 0;
-			time_stop = ktime_get();
-			sync_access();
-		}
-		break;
-	case '1':
-		if (!timer_stats_active) {
-			reset_entries();
-			time_start = ktime_get();
-			smp_mb();
-			timer_stats_active = 1;
-		}
-		break;
-	default:
-		count = -EINVAL;
-	}
-	mutex_unlock(&show_mutex);
-
-	return count;
-}
-
-static int tstats_open(struct inode *inode, struct file *filp)
-{
-	return single_open(filp, tstats_show, NULL);
-}
-
-static const struct file_operations tstats_fops = {
-	.open		= tstats_open,
-	.read		= seq_read,
-	.write		= tstats_write,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-};
-
-void __init init_timer_stats(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(&per_cpu(tstats_lookup_lock, cpu));
-}
-
-static int __init init_tstats_procfs(void)
-{
-	struct proc_dir_entry *pe;
-
-	pe = proc_create("timer_stats", 0644, NULL, &tstats_fops);
-	if (!pe)
-		return -ENOMEM;
-	return 0;
-}
-__initcall(init_tstats_procfs);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1d9fb6543a66..072cbc9b175d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1523,8 +1523,6 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 		return;
 	}
 
-	timer_stats_timer_set_start_info(&dwork->timer);
-
 	dwork->wq = wq;
 	dwork->cpu = cpu;
 	timer->expires = jiffies + delay;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eb9e9a7870fa..132af338d6dd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -980,20 +980,6 @@ config DEBUG_TIMEKEEPING
 
 	  If unsure, say N.
 
-config TIMER_STATS
-	bool "Collect kernel timers statistics"
-	depends on DEBUG_KERNEL && PROC_FS
-	help
-	  If you say Y here, additional code will be inserted into the
-	  timer routines to collect statistics about kernel timers being
-	  reprogrammed. The statistics can be read from /proc/timer_stats.
-	  The statistics collection is started by writing 1 to /proc/timer_stats,
-	  writing 0 stops it. This feature is useful to collect information
-	  about timer usage patterns in kernel and userspace. This feature
-	  is lightweight if enabled in the kernel config but not activated
-	  (it defaults to deactivated on bootup and will only be activated
-	  if some application like powertop activates it explicitly).
-
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT
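For userspace tools that still have to run on kernels which carry
/proc/timer_stats, the entry lines printed by the removed tstats_show()
("%4luD, %5d %-16s " for deferrable timers, " %4lu, %5d %-16s " otherwise)
could be parsed roughly as below. This is only a sketch: struct tstat_line
and parse_tstat_line() are hypothetical names, and only the count, the 'D'
marker and the pid are extracted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one /proc/timer_stats entry line. */
struct tstat_line {
	unsigned long count;	/* number of expiry events */
	int deferrable;		/* 1 if the count carried the 'D' suffix */
	int pid;		/* pid of the task that started the timer */
};

/*
 * Parse the leading "<count>[D], <pid>" fields of an entry line.
 * Returns 0 on success and -1 for anything else, which includes the
 * header lines and the "total events" summary line.
 */
int parse_tstat_line(const char *line, struct tstat_line *out)
{
	char sep;
	int used = 0;

	assert(line != NULL && out != NULL);

	/* Count followed by either 'D' (deferrable) or the ',' separator. */
	if (sscanf(line, " %lu%c%n", &out->count, &sep, &used) != 2)
		return -1;

	out->deferrable = (sep == 'D');
	if (out->deferrable) {
		/* The ',' separator follows the 'D' marker. */
		if (line[used] != ',')
			return -1;
		used++;
	} else if (sep != ',') {
		return -1;
	}

	return sscanf(line + used, " %d", &out->pid) == 1 ? 0 : -1;
}
```

A caller would open /proc/timer_stats, treat a failed open as "interface not
available" (the case this patch makes permanent), and feed each line read
from the file to parse_tstat_line().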