Message ID | 20240628065111.59718-2-adrian.hunter@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V8,01/12] perf/core: Add aux_pause, aux_resume, aux_start_paused | expand |
On Fri, Jun 28, 2024 at 09:51:00AM +0300, Adrian Hunter wrote: > Add aux_paused to struct perf_event for AUX area events to keep track of > the "paused" state. aux_paused is initialized to aux_start_paused. > @@ -798,6 +810,9 @@ struct perf_event { > /* for aux_output events */ > struct perf_event *aux_event; > > + /* for AUX area events */ > + unsigned int aux_paused; > + > void (*destroy)(struct perf_event *); > struct rcu_head rcu_head; > Should this not be part of struct hw_perf_event for whatever hw event implements this AUX stuff? In fact, I would expect PERF_HES_PAUSED or something to go in perf_event::hw::state.
On Fri, Jun 28, 2024 at 09:51:00AM +0300, Adrian Hunter wrote: > + union { > + __u32 aux_action; > + struct { > + __u32 aux_start_paused : 1, /* start AUX area tracing paused */ > + aux_pause : 1, /* on overflow, pause AUX area tracing */ > + aux_resume : 1, /* on overflow, resume AUX area tracing */ > + __reserved_3 : 29; > + }; > + }; > @@ -12860,7 +12930,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, > * Grouping is not supported for kernel events, neither is 'AUX', > * make sure the caller's intentions are adjusted. > */ > - if (attr->aux_output) > + if (attr->aux_output || attr->aux_action) > return ERR_PTR(-EINVAL); > AFAICT this is the only usage of aux_action. But in a few patches time you'll introduce a helper along the lines of has_aux_action() that tests all the individual bits. Combined with perf_copy_attr() ensuring __reserved_3 is actually 0, I'm thinking that should all be enough to render this aux_action field surplus to requirement, and we can simplify all this somewhat, no?
On 1/07/24 13:26, Peter Zijlstra wrote: > On Fri, Jun 28, 2024 at 09:51:00AM +0300, Adrian Hunter wrote: > >> Add aux_paused to struct perf_event for AUX area events to keep track of >> the "paused" state. aux_paused is initialized to aux_start_paused. > >> @@ -798,6 +810,9 @@ struct perf_event { >> /* for aux_output events */ >> struct perf_event *aux_event; >> >> + /* for AUX area events */ >> + unsigned int aux_paused; >> + >> void (*destroy)(struct perf_event *); >> struct rcu_head rcu_head; >> > > Should this not be part of struct hw_perf_event for whatever hw event > implements this AUX stuff? > > In fact, I would expect PERF_HES_PAUSED or something to go in > perf_event::hw::state. Sorry for the slow reply due to vacation. There doesn't seem to be anything preventing pause/resume from "racing" (PMI vs task context) with changes in the aux event state, so aux_paused must be separate from perf_event::hw::state. But putting it in perf_event::hw should be fine.
On 1/07/24 13:52, Peter Zijlstra wrote: > On Fri, Jun 28, 2024 at 09:51:00AM +0300, Adrian Hunter wrote: > >> + union { >> + __u32 aux_action; >> + struct { >> + __u32 aux_start_paused : 1, /* start AUX area tracing paused */ >> + aux_pause : 1, /* on overflow, pause AUX area tracing */ >> + aux_resume : 1, /* on overflow, resume AUX area tracing */ >> + __reserved_3 : 29; >> + }; >> + }; > >> @@ -12860,7 +12930,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, >> * Grouping is not supported for kernel events, neither is 'AUX', >> * make sure the caller's intentions are adjusted. >> */ >> - if (attr->aux_output) >> + if (attr->aux_output || attr->aux_action) >> return ERR_PTR(-EINVAL); >> > > AFAICT this is the only usage of aux_action. But in a few patches time > you'll introduce a helper along the lines of has_aux_action() that tests > all the individual bits. > > Combined with perf_copy_attr() ensuring __reserved_3 is actually 0, I'm > thinking that should all be enough to render this aux_action field > surplus to requirement, and we can simplify all this somewhat, no? It is used in tool's patches and will be used more when additional fields are added for Intel PT Trigger Tracing. See the information in the cover letter about that: [PATCH V8 00/12] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing https://lore.kernel.org/lkml/20240628065111.59718-1-adrian.hunter@intel.com/ In general, without aux_action there is no way to pass all the related fields together except by using struct perf_event_attr itself.
On Thu, Jul 11, 2024 at 01:42:33PM +0300, Adrian Hunter wrote: > On 1/07/24 13:26, Peter Zijlstra wrote: > > On Fri, Jun 28, 2024 at 09:51:00AM +0300, Adrian Hunter wrote: > > > >> Add aux_paused to struct perf_event for AUX area events to keep track of > >> the "paused" state. aux_paused is initialized to aux_start_paused. > > > >> @@ -798,6 +810,9 @@ struct perf_event { > >> /* for aux_output events */ > >> struct perf_event *aux_event; > >> > >> + /* for AUX area events */ > >> + unsigned int aux_paused; > >> + > >> void (*destroy)(struct perf_event *); > >> struct rcu_head rcu_head; > >> > > > > Should this not be part of struct hw_perf_event for whatever hw event > > implements this AUX stuff? > > > > In fact, I would expect PERF_HES_PAUSED or something to go in > > perf_event::hw::state. > > Sorry for the slow reply due to vacation. > > There doesn't seem to be anything preventing pause/resume from > "racing" (PMI vs task context) with changes in the aux event state, > so aux_paused must be separate from perf_event::hw::state. So normally, context switching perf would be under pmu_disable(), which should ensure no PMI happens while we're reprogramming the PMU. That said, I think for the aux case we have perf_event_stop() that can race vs PMI, so yeah, bummer. Sticking it in hw_perf_event would be good though. Also perhaps add a comment about exactly why this cannot be in ::state.
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index a5304ae8c654..50dbf8fa1f54 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -291,6 +291,7 @@ struct perf_event_pmu_context; #define PERF_PMU_CAP_NO_EXCLUDE 0x0040 #define PERF_PMU_CAP_AUX_OUTPUT 0x0080 #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100 +#define PERF_PMU_CAP_AUX_PAUSE 0x0200 struct perf_output_handle; @@ -363,6 +364,8 @@ struct pmu { #define PERF_EF_START 0x01 /* start the counter when adding */ #define PERF_EF_RELOAD 0x02 /* reload the counter when starting */ #define PERF_EF_UPDATE 0x04 /* update the counter when stopping */ +#define PERF_EF_PAUSE 0x08 /* AUX area event, pause tracing */ +#define PERF_EF_RESUME 0x10 /* AUX area event, resume tracing */ /* * Adds/Removes a counter to/from the PMU, can be done inside a @@ -402,6 +405,15 @@ struct pmu { * * ->start() with PERF_EF_RELOAD will reprogram the counter * value, must be preceded by a ->stop() with PERF_EF_UPDATE. + * + * ->stop() with PERF_EF_PAUSE will stop as simply as possible. Will not + * overlap another ->stop() with PERF_EF_PAUSE nor ->start() with + * PERF_EF_RESUME. + * + * ->start() with PERF_EF_RESUME will start as simply as possible but + * only if the counter is not otherwise stopped. Will not overlap + * another ->start() with PERF_EF_RESUME nor ->stop() with + * PERF_EF_PAUSE. */ void (*start) (struct perf_event *event, int flags); void (*stop) (struct perf_event *event, int flags); @@ -798,6 +810,9 @@ struct perf_event { /* for aux_output events */ struct perf_event *aux_event; + /* for AUX area events */ + unsigned int aux_paused; + void (*destroy)(struct perf_event *); struct rcu_head rcu_head; diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 3a64499b0f5d..0c557f0a17b3 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -511,7 +511,16 @@ struct perf_event_attr { __u16 sample_max_stack; __u16 __reserved_2; __u32 aux_sample_size; - __u32 __reserved_3; + + union { + __u32 aux_action; + struct { + __u32 aux_start_paused : 1, /* start AUX area tracing paused */ + aux_pause : 1, /* on overflow, pause AUX area tracing */ + aux_resume : 1, /* on overflow, resume AUX area tracing */ + __reserved_3 : 29; + }; + }; /* * User provided data if sigtrap=1, passed back to user via diff --git a/kernel/events/core.c b/kernel/events/core.c index 8f908f077935..a2e63f359aa0 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2097,7 +2097,8 @@ static void perf_put_aux_event(struct perf_event *event) static bool perf_need_aux_event(struct perf_event *event) { - return !!event->attr.aux_output || !!event->attr.aux_sample_size; + return event->attr.aux_output || event->attr.aux_sample_size || + event->attr.aux_pause || event->attr.aux_resume; } static int perf_get_aux_event(struct perf_event *event, @@ -2122,6 +2123,10 @@ static int perf_get_aux_event(struct perf_event *event, !perf_aux_output_match(event, group_leader)) return 0; + if ((event->attr.aux_pause || event->attr.aux_resume) && + !(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) + return 0; + if (event->attr.aux_sample_size && !group_leader->pmu->snapshot_aux) return 0; @@ -7878,6 +7883,51 @@ void perf_prepare_header(struct perf_event_header *header, WARN_ON_ONCE(header->size & 7); } +static void __perf_event_aux_pause(struct perf_event *event, bool pause) +{ + if (pause) { + if (!event->aux_paused) { + event->aux_paused = 1; + event->pmu->stop(event, PERF_EF_PAUSE); + } + } else { + if (event->aux_paused) { + event->aux_paused = 0; + event->pmu->start(event, PERF_EF_RESUME); + } + } +} + +static void perf_event_aux_pause(struct perf_event *event, bool pause) +{ + struct perf_buffer *rb; + unsigned long flags; + + if (WARN_ON_ONCE(!event)) + return; + + rb = ring_buffer_get(event); + if (!rb) + return; + + local_irq_save(flags); + /* + * Guard against NMI, NMI loses here. The critical setion is demarked by + * rb->aux_in_pause_resume == 1. An NMI in the critical section will not + * process a pause/resume. + */ + if (READ_ONCE(rb->aux_in_pause_resume)) + goto out_restore; + WRITE_ONCE(rb->aux_in_pause_resume, 1); + barrier(); + __perf_event_aux_pause(event, pause); + barrier(); + WRITE_ONCE(rb->aux_in_pause_resume, 0); +out_restore: + local_irq_restore(flags); + ring_buffer_put(rb); +} + static __always_inline int __perf_event_output(struct perf_event *event, struct perf_sample_data *data, @@ -7891,6 +7941,9 @@ __perf_event_output(struct perf_event *event, struct perf_event_header header; int err; + if (event->attr.aux_pause) + perf_event_aux_pause(event->aux_event, true); + /* protect the callchain buffers */ rcu_read_lock(); @@ -7907,6 +7960,10 @@ __perf_event_output(struct perf_event *event, exit: rcu_read_unlock(); + + if (event->attr.aux_resume) + perf_event_aux_pause(event->aux_event, false); + return err; } @@ -12060,11 +12117,24 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, } if (event->attr.aux_output && - !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) { + (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) || + event->attr.aux_pause || event->attr.aux_resume)) { err = -EOPNOTSUPP; goto err_pmu; } + if (event->attr.aux_pause && event->attr.aux_resume) { + err = -EINVAL; + goto err_pmu; + } + + if (event->attr.aux_start_paused && + !(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) { + err = -EOPNOTSUPP; + goto err_pmu; + } + event->aux_paused = event->attr.aux_start_paused; + if (cgroup_fd != -1) { err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader); if (err) @@ -12860,7 +12930,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, * Grouping is not supported for kernel events, neither is 'AUX', * make sure the caller's intentions are adjusted. */ - if (attr->aux_output) + if (attr->aux_output || attr->aux_action) return ERR_PTR(-EINVAL); event = perf_event_alloc(attr, cpu, task, NULL, NULL, diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 5150d5f84c03..3320f78117dc 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -51,6 +51,7 @@ struct perf_buffer { void (*free_aux)(void *); refcount_t aux_refcount; int aux_in_sampling; + int aux_in_pause_resume; void **aux_pages; void *aux_priv;