Message ID | 20231123121851.10826-3-adrian.hunter@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | perf/core: Add ability for an event to "pause" or "resume" AUX area tracing | expand |
On 23/11/2023 12:18, Adrian Hunter wrote: > Prevent tracing to start if aux_paused. > > Implement pause_resume() callback. When aux_paused, stop tracing. When > not aux_paused, only start tracing if it isn't currently meant to be > stopped. > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > arch/x86/events/intel/pt.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 42a55794004a..aa883b64814a 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event) > struct pt *pt = this_cpu_ptr(&pt_ctx); > u64 ctl = event->hw.config; > > + if (event->aux_paused) > + return; > + > ctl |= RTIT_CTL_TRACEEN; > if (READ_ONCE(pt->vmx_on)) > perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL); > @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx); > * PMU callbacks > */ > > +static void pt_event_pause_resume(struct perf_event *event) > +{ > + if (event->aux_paused) > + pt_config_stop(event); > + else if (!event->hw.state) > + pt_config_start(event); > +} It seems like having a single pause/resume callback rather than separate pause and resume ones pushes some of the event state management into the individual drivers and would be prone to code duplication and divergent behavior. Would it be possible to move the conditions from here into the core code and call separate functions instead? > + > static void pt_event_start(struct perf_event *event, int mode) > { > struct hw_perf_event *hwc = &event->hw; > @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > pt_pmu.pmu.del = pt_event_del; > pt_pmu.pmu.start = pt_event_start; > pt_pmu.pmu.stop = pt_event_stop; > + pt_pmu.pmu.pause_resume = pt_event_pause_resume; The general idea seems ok to me. Is there a reason to not use the existing start() stop() callbacks, rather than adding a new one? I assume it's intended to be something like an optimisation where you can turn it on and off without having to do the full setup, teardown and emit an AUX record because you know the process being traced never gets switched out? Could you make it so that it works out of the box, with the option of later optimisation if you do something like this (not here but something like this in events/core.c): /* Use specialised pause/resume if it exists, otherwise use more * expensive start/stop. */ if (pmu->pause_resume) pmu->pause_resume(...) else pmu->stop(...) > pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux; > pt_pmu.pmu.read = pt_event_read; > pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: > On 23/11/2023 12:18, Adrian Hunter wrote: > > +static void pt_event_pause_resume(struct perf_event *event) > > +{ > > + if (event->aux_paused) > > + pt_config_stop(event); > > + else if (!event->hw.state) > > + pt_config_start(event); > > +} > > It seems like having a single pause/resume callback rather than separate > pause and resume ones pushes some of the event state management into the > individual drivers and would be prone to code duplication and divergent > behavior. > > Would it be possible to move the conditions from here into the core code > and call separate functions instead? > > > + > > static void pt_event_start(struct perf_event *event, int mode) > > { > > struct hw_perf_event *hwc = &event->hw; > > @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > > pt_pmu.pmu.del = pt_event_del; > > pt_pmu.pmu.start = pt_event_start; > > pt_pmu.pmu.stop = pt_event_stop; > > + pt_pmu.pmu.pause_resume = pt_event_pause_resume; > > The general idea seems ok to me. Is there a reason to not use the > existing start() stop() callbacks, rather than adding a new one? > > I assume it's intended to be something like an optimisation where you > can turn it on and off without having to do the full setup, teardown and > emit an AUX record because you know the process being traced never gets > switched out? So the actual scheduling uses ->add() / ->del(), the ->start() / ->stop() methods are something that can be used after ->add() and before ->del() to 'temporarily' pause things. Pretty much exactly what is required here I think. We currently use this for PMI throttling and adaptive frequency stuff, but there is no reason it could not also be used for this. As is, we don't track the paused state across ->del() / ->add(), but perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ bits to manage things.
On 29/11/23 12:58, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >> On 23/11/2023 12:18, Adrian Hunter wrote: > >>> +static void pt_event_pause_resume(struct perf_event *event) >>> +{ >>> + if (event->aux_paused) >>> + pt_config_stop(event); >>> + else if (!event->hw.state) >>> + pt_config_start(event); >>> +} >> >> It seems like having a single pause/resume callback rather than separate >> pause and resume ones pushes some of the event state management into the >> individual drivers and would be prone to code duplication and divergent >> behavior. >> >> Would it be possible to move the conditions from here into the core code >> and call separate functions instead? >> >>> + >>> static void pt_event_start(struct perf_event *event, int mode) >>> { >>> struct hw_perf_event *hwc = &event->hw; >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>> pt_pmu.pmu.del = pt_event_del; >>> pt_pmu.pmu.start = pt_event_start; >>> pt_pmu.pmu.stop = pt_event_stop; >>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >> >> The general idea seems ok to me. Is there a reason to not use the >> existing start() stop() callbacks, rather than adding a new one? >> >> I assume it's intended to be something like an optimisation where you >> can turn it on and off without having to do the full setup, teardown and >> emit an AUX record because you know the process being traced never gets >> switched out? > > So the actual scheduling uses ->add() / ->del(), the ->start() / > ->stop() methods are something that can be used after ->add() and before > ->del() to 'temporarily' pause things. > > Pretty much exactly what is required here I think. We currently use this > for PMI throttling and adaptive frequency stuff, but there is no reason > it could not also be used for this. > > As is, we don't track the paused state across ->del() / ->add(), but > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ > bits to manage things. > > I am not sure stop / start play nice with NMI's from other events e.g. PMC NMI wants to pause or resume AUX but what if AUX event is currently being processed in ->stop() or ->start()? Or maybe that can't happen?
On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: > On 29/11/23 12:58, Peter Zijlstra wrote: > > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: > >> On 23/11/2023 12:18, Adrian Hunter wrote: > > > >>> +static void pt_event_pause_resume(struct perf_event *event) > >>> +{ > >>> + if (event->aux_paused) > >>> + pt_config_stop(event); > >>> + else if (!event->hw.state) > >>> + pt_config_start(event); > >>> +} > >> > >> It seems like having a single pause/resume callback rather than separate > >> pause and resume ones pushes some of the event state management into the > >> individual drivers and would be prone to code duplication and divergent > >> behavior. > >> > >> Would it be possible to move the conditions from here into the core code > >> and call separate functions instead? > >> > >>> + > >>> static void pt_event_start(struct perf_event *event, int mode) > >>> { > >>> struct hw_perf_event *hwc = &event->hw; > >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) > >>> pt_pmu.pmu.del = pt_event_del; > >>> pt_pmu.pmu.start = pt_event_start; > >>> pt_pmu.pmu.stop = pt_event_stop; > >>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; > >> > >> The general idea seems ok to me. Is there a reason to not use the > >> existing start() stop() callbacks, rather than adding a new one? > >> > >> I assume it's intended to be something like an optimisation where you > >> can turn it on and off without having to do the full setup, teardown and > >> emit an AUX record because you know the process being traced never gets > >> switched out? > > > > So the actual scheduling uses ->add() / ->del(), the ->start() / > > ->stop() methods are something that can be used after ->add() and before > > ->del() to 'temporarily' pause things. > > > > Pretty much exactly what is required here I think. We currently use this > > for PMI throttling and adaptive frequency stuff, but there is no reason > > it could not also be used for this. > > > > As is, we don't track the paused state across ->del() / ->add(), but > > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ > > bits to manage things. > > > > > > I am not sure stop / start play nice with NMI's from other events e.g. > > PMC NMI wants to pause or resume AUX but what if AUX event is currently > being processed in ->stop() or ->start()? Or maybe that can't happen? I think that can happen, and pt_event_stop() can actually handle some of that, while your pause_resume() thing, which uses pt_config_stop() does not. But yes, I think that if you add pt_event_{stop,start}() calls from *other* events their PMI, then you get to deal with more 'fun'. Something like: perf_addr_filters_adjust() __perf_addr_filters_adjust() perf_event_stop() __perf_event_stop() event->pmu->stop() <NMI> ... perf_event_overflow() pt_event->pmu->stop() </NMI> event->pmu->start() // whoopsie! Should now be possible. I think what you want to do is rename pt->handle_nmi into pt->stop_count and make it a counter, then ->stop() increments it, and ->start() decrements it and everybody ensures the thing doesn't get restart while !0 etc.. I suspect you need to guard the generic part of this feature with a new PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once they've audited things. James, does that work for you?
On 29/11/2023 12:23, Peter Zijlstra wrote: > On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: >> On 29/11/23 12:58, Peter Zijlstra wrote: >>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >>>> On 23/11/2023 12:18, Adrian Hunter wrote: >>> >>>>> +static void pt_event_pause_resume(struct perf_event *event) >>>>> +{ >>>>> + if (event->aux_paused) >>>>> + pt_config_stop(event); >>>>> + else if (!event->hw.state) >>>>> + pt_config_start(event); >>>>> +} >>>> >>>> It seems like having a single pause/resume callback rather than separate >>>> pause and resume ones pushes some of the event state management into the >>>> individual drivers and would be prone to code duplication and divergent >>>> behavior. >>>> >>>> Would it be possible to move the conditions from here into the core code >>>> and call separate functions instead? >>>> >>>>> + >>>>> static void pt_event_start(struct perf_event *event, int mode) >>>>> { >>>>> struct hw_perf_event *hwc = &event->hw; >>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>>>> pt_pmu.pmu.del = pt_event_del; >>>>> pt_pmu.pmu.start = pt_event_start; >>>>> pt_pmu.pmu.stop = pt_event_stop; >>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >>>> >>>> The general idea seems ok to me. Is there a reason to not use the >>>> existing start() stop() callbacks, rather than adding a new one? >>>> >>>> I assume it's intended to be something like an optimisation where you >>>> can turn it on and off without having to do the full setup, teardown and >>>> emit an AUX record because you know the process being traced never gets >>>> switched out? >>> >>> So the actual scheduling uses ->add() / ->del(), the ->start() / >>> ->stop() methods are something that can be used after ->add() and before >>> ->del() to 'temporarily' pause things. >>> >>> Pretty much exactly what is required here I think. We currently use this >>> for PMI throttling and adaptive frequency stuff, but there is no reason >>> it could not also be used for this. >>> >>> As is, we don't track the paused state across ->del() / ->add(), but >>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ >>> bits to manage things. >>> >>> >> >> I am not sure stop / start play nice with NMI's from other events e.g. >> >> PMC NMI wants to pause or resume AUX but what if AUX event is currently >> being processed in ->stop() or ->start()? Or maybe that can't happen? > > I think that can happen, and pt_event_stop() can actually handle some of > that, while your pause_resume() thing, which uses pt_config_stop() does > not. > > But yes, I think that if you add pt_event_{stop,start}() calls from > *other* events their PMI, then you get to deal with more 'fun'. > > Something like: > > perf_addr_filters_adjust() > __perf_addr_filters_adjust() > perf_event_stop() > __perf_event_stop() > event->pmu->stop() > <NMI> > ... > perf_event_overflow() > pt_event->pmu->stop() > </NMI> > event->pmu->start() // whoopsie! > > Should now be possible. > > I think what you want to do is rename pt->handle_nmi into pt->stop_count > and make it a counter, then ->stop() increments it, and ->start() > decrements it and everybody ensures the thing doesn't get restart while > !0 etc.. > > I suspect you need to guard the generic part of this feature with a new > PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once > they've audited things. > > James, does that work for you? > Yep I think that would work. If I understand it with the stop_count counter, to be able to support the new CAP, every device would basically have to implement a similar counter? Would it be possible to do that reference counting on the outside of start() and stop() in the core code? So that a device only ever sees one call to start and one call to stop and doesn't need to do any extra tracking?
On 30/11/23 12:07, James Clark wrote: > > > On 29/11/2023 12:23, Peter Zijlstra wrote: >> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote: >>> On 29/11/23 12:58, Peter Zijlstra wrote: >>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote: >>>>> On 23/11/2023 12:18, Adrian Hunter wrote: >>>> >>>>>> +static void pt_event_pause_resume(struct perf_event *event) >>>>>> +{ >>>>>> + if (event->aux_paused) >>>>>> + pt_config_stop(event); >>>>>> + else if (!event->hw.state) >>>>>> + pt_config_start(event); >>>>>> +} >>>>> >>>>> It seems like having a single pause/resume callback rather than separate >>>>> pause and resume ones pushes some of the event state management into the >>>>> individual drivers and would be prone to code duplication and divergent >>>>> behavior. >>>>> >>>>> Would it be possible to move the conditions from here into the core code >>>>> and call separate functions instead? >>>>> >>>>>> + >>>>>> static void pt_event_start(struct perf_event *event, int mode) >>>>>> { >>>>>> struct hw_perf_event *hwc = &event->hw; >>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void) >>>>>> pt_pmu.pmu.del = pt_event_del; >>>>>> pt_pmu.pmu.start = pt_event_start; >>>>>> pt_pmu.pmu.stop = pt_event_stop; >>>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume; >>>>> >>>>> The general idea seems ok to me. Is there a reason to not use the >>>>> existing start() stop() callbacks, rather than adding a new one? >>>>> >>>>> I assume it's intended to be something like an optimisation where you >>>>> can turn it on and off without having to do the full setup, teardown and >>>>> emit an AUX record because you know the process being traced never gets >>>>> switched out? >>>> >>>> So the actual scheduling uses ->add() / ->del(), the ->start() / >>>> ->stop() methods are something that can be used after ->add() and before >>>> ->del() to 'temporarily' pause things. >>>> >>>> Pretty much exactly what is required here I think. We currently use this >>>> for PMI throttling and adaptive frequency stuff, but there is no reason >>>> it could not also be used for this. >>>> >>>> As is, we don't track the paused state across ->del() / ->add(), but >>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_ >>>> bits to manage things. >>>> >>>> >>> >>> I am not sure stop / start play nice with NMI's from other events e.g. >>> >>> PMC NMI wants to pause or resume AUX but what if AUX event is currently >>> being processed in ->stop() or ->start()? Or maybe that can't happen? >> >> I think that can happen, and pt_event_stop() can actually handle some of >> that, while your pause_resume() thing, which uses pt_config_stop() does >> not. >> >> But yes, I think that if you add pt_event_{stop,start}() calls from >> *other* events their PMI, then you get to deal with more 'fun'. >> >> Something like: >> >> perf_addr_filters_adjust() >> __perf_addr_filters_adjust() >> perf_event_stop() >> __perf_event_stop() >> event->pmu->stop() >> <NMI> >> ... >> perf_event_overflow() >> pt_event->pmu->stop() >> </NMI> >> event->pmu->start() // whoopsie! >> >> Should now be possible. >> >> I think what you want to do is rename pt->handle_nmi into pt->stop_count >> and make it a counter, then ->stop() increments it, and ->start() >> decrements it and everybody ensures the thing doesn't get restart while >> !0 etc.. >> >> I suspect you need to guard the generic part of this feature with a new >> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once >> they've audited things. >> >> James, does that work for you? >> > > Yep I think that would work. > > If I understand it with the stop_count counter, to be able to support > the new CAP, every device would basically have to implement a similar > counter? > > Would it be possible to do that reference counting on the outside of > start() and stop() in the core code? So that a device only ever sees one > call to start and one call to stop and doesn't need to do any extra > tracking? stop_counter does not seem right for pauses and resumes because they should not accumulate. We want: pause stop pause resume start resume not: pause stop pause resume resume start Also stop_counter has issues like: stop_counter 0 -> 1 <NMI> stop_counter 1 -> 0 start() </NMI> stop() <- stop_counter is now out of sync or: stop_counter 1 -> 0 <NMI> stop_counter 0 -> 1 stop() </NMI> start() <- stop_counter is now out of sync Also Intel PT implementation uses low-level start / stop for pause / resume, which can be made not to conflict with regular start / stop because they only toggle TRACEEN bit.
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 42a55794004a..aa883b64814a 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event) struct pt *pt = this_cpu_ptr(&pt_ctx); u64 ctl = event->hw.config; + if (event->aux_paused) + return; + ctl |= RTIT_CTL_TRACEEN; if (READ_ONCE(pt->vmx_on)) perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL); @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx); * PMU callbacks */ +static void pt_event_pause_resume(struct perf_event *event) +{ + if (event->aux_paused) + pt_config_stop(event); + else if (!event->hw.state) + pt_config_start(event); +} + static void pt_event_start(struct perf_event *event, int mode) { struct hw_perf_event *hwc = &event->hw; @@ -1798,6 +1809,7 @@ static __init int pt_init(void) pt_pmu.pmu.del = pt_event_del; pt_pmu.pmu.start = pt_event_start; pt_pmu.pmu.stop = pt_event_stop; + pt_pmu.pmu.pause_resume = pt_event_pause_resume; pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux; pt_pmu.pmu.read = pt_event_read; pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
Prevent tracing to start if aux_paused. Implement pause_resume() callback. When aux_paused, stop tracing. When not aux_paused, only start tracing if it isn't currently meant to be stopped. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> --- arch/x86/events/intel/pt.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)