diff mbox

[RFC,2/2] arm64: kernel: perf: add pmu CPU PM notifier

Message ID 1426008682-5680-2-git-send-email-lorenzo.pieralisi@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Lorenzo Pieralisi March 10, 2015, 5:31 p.m. UTC
When a CPU is being profiled through PMU events and it enters suspend
or idle states, the PMU registers content can be lost, which means that
counters that were relied upon on power down entry are reset on power
up to values that are incosistent with the profile session.

This patch adds a CPU PM notifier to arm64 perf code, that detects
on entry if events are being monitored, and if so, it returns
failure to the CPU PM notification chain, causing the suspend
thread or the idle thread to abort power down, therefore preventing
registers content loss.

By triggering CPU PM notification failure this patch prevents
suspending a system if the suspend thread is being profiled and
it also prevents entering idle deep states on cores that have profile
events in use, somehow limiting power management capabilities when
there are active perf sessions.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/kernel/perf_event.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Comments

Kevin Hilman March 11, 2015, 4:02 p.m. UTC | #1
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:

> When a CPU is being profiled through PMU events and it enters suspend
> or idle states, the PMU registers content can be lost, which means that
> counters that were relied upon on power down entry are reset on power
> up to values that are incosistent with the profile session.
>
> This patch adds a CPU PM notifier to arm64 perf code, that detects
> on entry if events are being monitored, and if so, it returns
> failure to the CPU PM notification chain, causing the suspend
> thread or the idle thread to abort power down, therefore preventing
> registers content loss.
>
> By triggering CPU PM notification failure this patch prevents
> suspending a system if the suspend thread is being profiled and
> it also prevents entering idle deep states on cores that have profile
> events in use, somehow limiting power management capabilities when
> there are active perf sessions.

I guess that's one choice.  Couldn't you also stop the PMU and
save/restore it's context in the notifiers? so that you wouldn't affect
PM capabilities?

That would imply that you lose the ability to profile after a certain
point in suspend/idle, but maybe that's a better trade off than having
profiling disable certain PM features?

Kevin
Lorenzo Pieralisi March 12, 2015, 10:27 a.m. UTC | #2
[Cc'ing Dave]

On Wed, Mar 11, 2015 at 04:02:17PM +0000, Kevin Hilman wrote:
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> 
> > When a CPU is being profiled through PMU events and it enters suspend
> > or idle states, the PMU registers content can be lost, which means that
> > counters that were relied upon on power down entry are reset on power
> > up to values that are incosistent with the profile session.
> >
> > This patch adds a CPU PM notifier to arm64 perf code, that detects
> > on entry if events are being monitored, and if so, it returns
> > failure to the CPU PM notification chain, causing the suspend
> > thread or the idle thread to abort power down, therefore preventing
> > registers content loss.
> >
> > By triggering CPU PM notification failure this patch prevents
> > suspending a system if the suspend thread is being profiled and
> > it also prevents entering idle deep states on cores that have profile
> > events in use, somehow limiting power management capabilities when
> > there are active perf sessions.
> 
> I guess that's one choice.  Couldn't you also stop the PMU and
> save/restore it's context in the notifiers? so that you wouldn't affect
> PM capabilities?

Yes, that's why I sent this an RFC. This solution can also be easily
ported to power domains, when we put them in place.

To save/restore PMU counters we can either reuse perf core (IIRC Dave
had a stab at this, it is not a trivial patch) or do arch specific
save/restore (with related buffers for registers context) but I think
Will does not like the idea at all (and he has a point since the context
memory is already there in perf core).

> That would imply that you lose the ability to profile after a certain
> point in suspend/idle, but maybe that's a better trade off than having
> profiling disable certain PM features?

I am ok either way, as long as we make a decision, it has been hanging
in the balance for aeons.

Thanks,
Lorenzo
Ashwin Chaugule March 12, 2015, 1:18 p.m. UTC | #3
On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
>
>> When a CPU is being profiled through PMU events and it enters suspend
>> or idle states, the PMU registers content can be lost, which means that
>> counters that were relied upon on power down entry are reset on power
>> up to values that are incosistent with the profile session.
>>
>> This patch adds a CPU PM notifier to arm64 perf code, that detects
>> on entry if events are being monitored, and if so, it returns
>> failure to the CPU PM notification chain, causing the suspend
>> thread or the idle thread to abort power down, therefore preventing
>> registers content loss.
>>
>> By triggering CPU PM notification failure this patch prevents
>> suspending a system if the suspend thread is being profiled and
>> it also prevents entering idle deep states on cores that have profile
>> events in use, somehow limiting power management capabilities when
>> there are active perf sessions.
>
> I guess that's one choice.  Couldn't you also stop the PMU and
> save/restore it's context in the notifiers? so that you wouldn't affect
> PM capabilities?
>
> That would imply that you lose the ability to profile after a certain
> point in suspend/idle, but maybe that's a better trade off than having
> profiling disable certain PM features?

I had something like that a few years ago on the Kraits and Scorpions [1].

[1] https://www.codeaurora.org/cgit/quic/la/kernel/msm/commit/arch/arm/kernel?h=msm-3.4&id=b5ca687960f0fea2f4735e83ca5c9543474c19de

Cheers,
Ashwin.
Lorenzo Pieralisi March 13, 2015, 5:40 p.m. UTC | #4
On Thu, Mar 12, 2015 at 01:18:54PM +0000, Ashwin Chaugule wrote:
> On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> >
> >> When a CPU is being profiled through PMU events and it enters suspend
> >> or idle states, the PMU registers content can be lost, which means that
> >> counters that were relied upon on power down entry are reset on power
> >> up to values that are incosistent with the profile session.
> >>
> >> This patch adds a CPU PM notifier to arm64 perf code, that detects
> >> on entry if events are being monitored, and if so, it returns
> >> failure to the CPU PM notification chain, causing the suspend
> >> thread or the idle thread to abort power down, therefore preventing
> >> registers content loss.
> >>
> >> By triggering CPU PM notification failure this patch prevents
> >> suspending a system if the suspend thread is being profiled and
> >> it also prevents entering idle deep states on cores that have profile
> >> events in use, somehow limiting power management capabilities when
> >> there are active perf sessions.
> >
> > I guess that's one choice.  Couldn't you also stop the PMU and
> > save/restore it's context in the notifiers? so that you wouldn't affect
> > PM capabilities?
> >
> > That would imply that you lose the ability to profile after a certain
> > point in suspend/idle, but maybe that's a better trade off than having
> > profiling disable certain PM features?
> 
> I had something like that a few years ago on the Kraits and Scorpions [1].

That's another option, but the point is understanding how we want to
tackle the issue, by preventing power down or by restoring the
PMU registers.

BTW, does your patch below need optimizing ? IIUC it "restores" the PMU
counters even when that's not needed, I spotted some code in your tree
that adds additional checks (ie check if returning from idle but I am not
sure that's viable).

All in all going to deep idle must emulate a perf context switch through power
down, I am pretty sured Dave implemented something like that, I will
chase the code and then we will see what's the best option.

Lorenzo

> [1] https://www.codeaurora.org/cgit/quic/la/kernel/msm/commit/arch/arm/kernel?h=msm-3.4&id=b5ca687960f0fea2f4735e83ca5c9543474c19de
> 
> Cheers,
> Ashwin.
>
Kevin Hilman March 13, 2015, 11:31 p.m. UTC | #5
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:

> On Thu, Mar 12, 2015 at 01:18:54PM +0000, Ashwin Chaugule wrote:
>> On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
>> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
>> >
>> >> When a CPU is being profiled through PMU events and it enters suspend
>> >> or idle states, the PMU registers content can be lost, which means that
>> >> counters that were relied upon on power down entry are reset on power
>> >> up to values that are incosistent with the profile session.
>> >>
>> >> This patch adds a CPU PM notifier to arm64 perf code, that detects
>> >> on entry if events are being monitored, and if so, it returns
>> >> failure to the CPU PM notification chain, causing the suspend
>> >> thread or the idle thread to abort power down, therefore preventing
>> >> registers content loss.
>> >>
>> >> By triggering CPU PM notification failure this patch prevents
>> >> suspending a system if the suspend thread is being profiled and
>> >> it also prevents entering idle deep states on cores that have profile
>> >> events in use, somehow limiting power management capabilities when
>> >> there are active perf sessions.
>> >
>> > I guess that's one choice.  Couldn't you also stop the PMU and
>> > save/restore it's context in the notifiers? so that you wouldn't affect
>> > PM capabilities?
>> >
>> > That would imply that you lose the ability to profile after a certain
>> > point in suspend/idle, but maybe that's a better trade off than having
>> > profiling disable certain PM features?
>> 
>> I had something like that a few years ago on the Kraits and Scorpions [1].
>
> That's another option, but the point is understanding how we want to
> tackle the issue, by preventing power down or by restoring the
> PMU registers.

Personally, I think the save/restore approach is preferred.  IMO, it's
more intuitive from the perspective of a user who doesn't understand all
the mechanics and also actually allows you to profile most of the
low-power paths and still actually hit the low power states.

> BTW, does your patch below need optimizing ? IIUC it "restores" the PMU
> counters even when that's not needed, I spotted some code in your tree
> that adds additional checks (ie check if returning from idle but I am not
> sure that's viable).

Such an optimization could also be done when we (someday) move this
cpu_pm notifier stuff to a proper pm_domain associated with the
CPU/cluster.

Kevin
Will Deacon March 17, 2015, 5:24 p.m. UTC | #6
On Fri, Mar 13, 2015 at 11:31:37PM +0000, Kevin Hilman wrote:
> Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> 
> > On Thu, Mar 12, 2015 at 01:18:54PM +0000, Ashwin Chaugule wrote:
> >> On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
> >> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> >> >
> >> >> When a CPU is being profiled through PMU events and it enters suspend
> >> >> or idle states, the PMU registers content can be lost, which means that
> >> >> counters that were relied upon on power down entry are reset on power
> >> >> up to values that are incosistent with the profile session.
> >> >>
> >> >> This patch adds a CPU PM notifier to arm64 perf code, that detects
> >> >> on entry if events are being monitored, and if so, it returns
> >> >> failure to the CPU PM notification chain, causing the suspend
> >> >> thread or the idle thread to abort power down, therefore preventing
> >> >> registers content loss.
> >> >>
> >> >> By triggering CPU PM notification failure this patch prevents
> >> >> suspending a system if the suspend thread is being profiled and
> >> >> it also prevents entering idle deep states on cores that have profile
> >> >> events in use, somehow limiting power management capabilities when
> >> >> there are active perf sessions.
> >> >
> >> > I guess that's one choice.  Couldn't you also stop the PMU and
> >> > save/restore it's context in the notifiers? so that you wouldn't affect
> >> > PM capabilities?
> >> >
> >> > That would imply that you lose the ability to profile after a certain
> >> > point in suspend/idle, but maybe that's a better trade off than having
> >> > profiling disable certain PM features?
> >> 
> >> I had something like that a few years ago on the Kraits and Scorpions [1].
> >
> > That's another option, but the point is understanding how we want to
> > tackle the issue, by preventing power down or by restoring the
> > PMU registers.
> 
> Personally, I think the save/restore approach is preferred.  IMO, it's
> more intuitive from the perspective of a user who doesn't understand all
> the mechanics and also actually allows you to profile most of the
> low-power paths and still actually hit the low power states.

I agree that save/restore is the nicest thing to do, but until we have a
generic description of the power-domains in device-tree I really don't want
PMU-specific hacks in the arch code to deal with this (since many platforms
do not lose PMU state over suspend).

What Lorenzo is proposing is a stop-gap to prevent perf silently losing its
state on platforms where the register contents is lost.

Will
Lorenzo Pieralisi March 30, 2015, 5:45 p.m. UTC | #7
On Tue, Mar 17, 2015 at 05:24:18PM +0000, Will Deacon wrote:
> On Fri, Mar 13, 2015 at 11:31:37PM +0000, Kevin Hilman wrote:
> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > 
> > > On Thu, Mar 12, 2015 at 01:18:54PM +0000, Ashwin Chaugule wrote:
> > >> On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
> > >> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
> > >> >
> > >> >> When a CPU is being profiled through PMU events and it enters suspend
> > >> >> or idle states, the PMU registers content can be lost, which means that
> > >> >> counters that were relied upon on power down entry are reset on power
> > >> >> up to values that are incosistent with the profile session.
> > >> >>
> > >> >> This patch adds a CPU PM notifier to arm64 perf code, that detects
> > >> >> on entry if events are being monitored, and if so, it returns
> > >> >> failure to the CPU PM notification chain, causing the suspend
> > >> >> thread or the idle thread to abort power down, therefore preventing
> > >> >> registers content loss.
> > >> >>
> > >> >> By triggering CPU PM notification failure this patch prevents
> > >> >> suspending a system if the suspend thread is being profiled and
> > >> >> it also prevents entering idle deep states on cores that have profile
> > >> >> events in use, somehow limiting power management capabilities when
> > >> >> there are active perf sessions.
> > >> >
> > >> > I guess that's one choice.  Couldn't you also stop the PMU and
> > >> > save/restore it's context in the notifiers? so that you wouldn't affect
> > >> > PM capabilities?
> > >> >
> > >> > That would imply that you lose the ability to profile after a certain
> > >> > point in suspend/idle, but maybe that's a better trade off than having
> > >> > profiling disable certain PM features?
> > >> 
> > >> I had something like that a few years ago on the Kraits and Scorpions [1].
> > >
> > > That's another option, but the point is understanding how we want to
> > > tackle the issue, by preventing power down or by restoring the
> > > PMU registers.
> > 
> > Personally, I think the save/restore approach is preferred.  IMO, it's
> > more intuitive from the perspective of a user who doesn't understand all
> > the mechanics and also actually allows you to profile most of the
> > low-power paths and still actually hit the low power states.
> 
> I agree that save/restore is the nicest thing to do, but until we have a
> generic description of the power-domains in device-tree I really don't want
> PMU-specific hacks in the arch code to deal with this (since many platforms
> do not lose PMU state over suspend).
> 
> What Lorenzo is proposing is a stop-gap to prevent perf silently losing its
> state on platforms where the register contents is lost.

Yes, that's exactly the goal. I am fine either way, at the moment we are
not in a position to define whether a core loses the PMU context, that's
certain.

I am not sure to understand why people want the core to really go
to shutdown and save/restore perf context if perf is on (eg profiling a
cpu for cache stats, clearly powering down a core biases the results
and might even obfuscate them).

It is not an easy call. If we go for this patch approach I am happy to
rebase and retest the series again, for power domains I fear we have to
wait given the sheer amount of changes in ARM CPUidle world at the moment,
too much stuff going on there.

Lorenzo
Kevin Hilman March 31, 2015, 4:35 p.m. UTC | #8
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:

> On Tue, Mar 17, 2015 at 05:24:18PM +0000, Will Deacon wrote:
>> On Fri, Mar 13, 2015 at 11:31:37PM +0000, Kevin Hilman wrote:
>> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
>> > 
>> > > On Thu, Mar 12, 2015 at 01:18:54PM +0000, Ashwin Chaugule wrote:
>> > >> On 11 March 2015 at 12:02, Kevin Hilman <khilman@kernel.org> wrote:
>> > >> > Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> writes:
>> > >> >
>> > >> >> When a CPU is being profiled through PMU events and it enters suspend
>> > >> >> or idle states, the PMU registers content can be lost, which means that
>> > >> >> counters that were relied upon on power down entry are reset on power
>> > >> >> up to values that are incosistent with the profile session.
>> > >> >>
>> > >> >> This patch adds a CPU PM notifier to arm64 perf code, that detects
>> > >> >> on entry if events are being monitored, and if so, it returns
>> > >> >> failure to the CPU PM notification chain, causing the suspend
>> > >> >> thread or the idle thread to abort power down, therefore preventing
>> > >> >> registers content loss.
>> > >> >>
>> > >> >> By triggering CPU PM notification failure this patch prevents
>> > >> >> suspending a system if the suspend thread is being profiled and
>> > >> >> it also prevents entering idle deep states on cores that have profile
>> > >> >> events in use, somehow limiting power management capabilities when
>> > >> >> there are active perf sessions.
>> > >> >
>> > >> > I guess that's one choice.  Couldn't you also stop the PMU and
>> > >> > save/restore it's context in the notifiers? so that you wouldn't affect
>> > >> > PM capabilities?
>> > >> >
>> > >> > That would imply that you lose the ability to profile after a certain
>> > >> > point in suspend/idle, but maybe that's a better trade off than having
>> > >> > profiling disable certain PM features?
>> > >> 
>> > >> I had something like that a few years ago on the Kraits and Scorpions [1].
>> > >
>> > > That's another option, but the point is understanding how we want to
>> > > tackle the issue, by preventing power down or by restoring the
>> > > PMU registers.
>> > 
>> > Personally, I think the save/restore approach is preferred.  IMO, it's
>> > more intuitive from the perspective of a user who doesn't understand all
>> > the mechanics and also actually allows you to profile most of the
>> > low-power paths and still actually hit the low power states.
>> 
>> I agree that save/restore is the nicest thing to do, but until we have a
>> generic description of the power-domains in device-tree I really don't want
>> PMU-specific hacks in the arch code to deal with this (since many platforms
>> do not lose PMU state over suspend).
>> 
>> What Lorenzo is proposing is a stop-gap to prevent perf silently losing its
>> state on platforms where the register contents is lost.
>
> Yes, that's exactly the goal. I am fine either way, at the moment we are
> not in a position to define whether a core loses the PMU context, that's
> certain.
>
> I am not sure to understand why people want the core to really go
> to shutdown and save/restore perf context if perf is on (eg profiling a
> cpu for cache stats, clearly powering down a core biases the results
> and might even obfuscate them).

The main thing I was considering is when using perf to profile the
low-power paths (suspend or idle.)

> It is not an easy call. If we go for this patch approach I am happy to
> rebase and retest the series again, for power domains I fear we have to
> wait given the sheer amount of changes in ARM CPUidle world at the moment,
> too much stuff going on there.

Ultimately, I'm OK with the proposed solution as a short/medium term
solution.  When our PM domain and CPUidle integration story gets a
little better, we can revisit this.

Kevin
diff mbox

Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 83f21d8..4809fca 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -22,6 +22,7 @@ 
 
 #include <linux/bitmap.h>
 #include <linux/cpu.h>
+#include <linux/cpu_pm.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kernel.h>
@@ -1316,6 +1317,53 @@  static struct pmu_hw_events *armpmu_get_cpu_events(void)
 	return this_cpu_ptr(&cpu_hw_events);
 }
 
+#ifdef CONFIG_CPU_PM
+static int cpu_pm_pmu_notifier(struct notifier_block *self,
+			       unsigned long cmd, void *v)
+{
+	switch (cmd) {
+	case CPU_PM_ENTER: {
+		/*
+		 * Check events in use and fail if events are being monitored.
+		 * On failure, system suspend and idle entry will consequently
+		 * abort power down operations, preventing CPU reset on power
+		 * down, that would cause PMU registers reset in turn.
+		 * The obvious drawback is that we can't enter idle
+		 * states if a cpu is being profiled, and we can't
+		 * profile the suspend thread (ie profiling it has the side
+		 * effect of disabling system suspend), but that's the price
+		 * to pay to keep the PMU registers powered up.
+		 */
+		struct pmu_hw_events *hw_events = cpu_pmu->get_hw_events();
+		int enabled = bitmap_weight(hw_events->used_mask,
+					    cpu_pmu->num_events);
+		return enabled ? NOTIFY_BAD : NOTIFY_OK;
+	}
+	case CPU_PM_EXIT:
+		if (cpu_pmu->reset)
+			cpu_pmu->reset(NULL);
+		return NOTIFY_OK;
+	case CPU_PM_ENTER_FAILED:
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_pm_pmu_notifier_block = {
+	.notifier_call = cpu_pm_pmu_notifier,
+};
+
+static void cpu_pm_pmu_init(void)
+{
+	cpu_pm_register_notifier(&cpu_pm_pmu_notifier_block);
+}
+
+#else
+static inline void cpu_pm_pmu_init(void) { }
+#endif /* CONFIG_CPU_PM */
+
 /*
  * PMU hardware loses all context when a CPU goes offline.
  * When a CPU is hotplugged back in, since some hardware registers are
@@ -1352,6 +1400,8 @@  static void __init cpu_pmu_init(struct arm_pmu *armpmu)
 	armpmu->get_hw_events = armpmu_get_cpu_events;
 
 	register_cpu_notifier(&cpu_pmu_notifier_block);
+
+	cpu_pm_pmu_init();
 }
 
 static int __init init_hw_perf_events(void)