diff mbox

[PATCHv3,5/5] arm-cci: CCI-500: Work around PMU counter writes

Message ID 1447783407-18027-6-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suzuki K Poulose Nov. 17, 2015, 6:03 p.m. UTC
The CCI PMU driver sets the event counter to the half of the maximum
value(2^31) it can count before we start the counters via
pmu_event_set_period(). This is done to give us the best chance to
handle the overflow interrupt, taking care of extreme interrupt latencies.

However, CCI-500 comes with advanced power saving schemes, which
disables the clock to the event counters unless the counters are enabled to
count (PMCR.CEN). This prevents the driver from writing the period to the
counters before starting them.  Also, there is no way we can reset the
individual event counter to 0 (PMCR.RST resets all the counters, losing
their current readings). However the value of the counter is preserved and
could be read back, when the counters are not enabled.

So we cannot reliably use the counters and compute the number of events
generated during the sampling period since we don't have the value of the
counter at start.

This patch works around this issue by changing writes to the counter
with the following steps.

 1) Disable all the counters (remembering any counters which were enabled)
 2) Save the current event and program the target counter to count an
    invalid event, which by spec is guaranteed to not-generate any events.
 3) Enable the target counter.
 4) Enable the CCI PMU
 5) Write to the target counter.
 6) Disable the CCI PMU and the target counter
 7) Restore the event back on the target counter.
 8) Restore the status of the all the counters

Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
---
 drivers/bus/arm-cci.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Comments

Mark Rutland Dec. 10, 2015, 3:42 p.m. UTC | #1
On Tue, Nov 17, 2015 at 06:03:27PM +0000, Suzuki K. Poulose wrote:
> The CCI PMU driver sets the event counter to the half of the maximum
> value(2^31) it can count before we start the counters via
> pmu_event_set_period(). This is done to give us the best chance to
> handle the overflow interrupt, taking care of extreme interrupt latencies.
> 
> However, CCI-500 comes with advanced power saving schemes, which
> disables the clock to the event counters unless the counters are enabled to
> count (PMCR.CEN). This prevents the driver from writing the period to the
> counters before starting them.  Also, there is no way we can reset the
> individual event counter to 0 (PMCR.RST resets all the counters, losing
> their current readings). However the value of the counter is preserved and
> could be read back, when the counters are not enabled.
> 
> So we cannot reliably use the counters and compute the number of events
> generated during the sampling period since we don't have the value of the
> counter at start.
> 
> This patch works around this issue by changing writes to the counter
> with the following steps.
> 
>  1) Disable all the counters (remembering any counters which were enabled)
>  2) Save the current event and program the target counter to count an
>     invalid event, which by spec is guaranteed to not-generate any events.
>  3) Enable the target counter.
>  4) Enable the CCI PMU
>  5) Write to the target counter.
>  6) Disable the CCI PMU and the target counter
>  7) Restore the event back on the target counter.
>  8) Restore the status of the all the counters
> 
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/bus/arm-cci.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
> index 88b612f..6020a02 100644
> --- a/drivers/bus/arm-cci.c
> +++ b/drivers/bus/arm-cci.c
> @@ -835,6 +835,52 @@ static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
>  	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
>  }
>  
> +#ifdef CONFIG_ARM_CCI500_PMU
> +
> +/*
> + * CCI-500 has advanced power saving policies, which could gate the
> + * clocks to the PMU counters, which makes the writes to them ineffective.
> + * The only way to write to those counters is when the global counters
> + * are enabled and the particular counter is enabled.
> + *
> + * So we do the following :
> + *
> + * 1) Disable all the PMU counters, saving their current state
> + * 2) Save the programmed event, and write an invalid event code
> + *    to the event control register for the counter, so that the
> + *    counters are not modified.
> + * 3) Enable the counter control for the counter.
> + * 4) Enable the global PMU profiling
> + * 5) Set the counter value
> + * 6) Disable the counter, global PMU.
> + * 7) Restore the event in the target counter
> + * 8) Restore the status of the rest of the counters.
> + *
> + * We choose an event code which has very little chances of getting
> + * assigned a valid code for step(2). We use the highest possible
> + * event code (0x1f) for the master interface 0.
> + */
> +#define CCI500_INVALID_EVENT	((CCI500_PORT_M0 << CCI500_PMU_EVENT_SOURCE_SHIFT) | \
> +				 (CCI500_PMU_EVENT_CODE_MASK << CCI500_PMU_EVENT_CODE_SHIFT))
> +static void cci500_pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
> +{
> +	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
> +	u32 event;
> +
> +	pmu_disable_counters(cci_pmu, mask);
> +	event = pmu_get_event(cci_pmu, idx);
> +	pmu_set_event(cci_pmu, idx, CCI500_INVALID_EVENT);
> +	pmu_enable_counter(cci_pmu, idx);
> +	__cci_pmu_enable();
> +	__pmu_write_counter(cci_pmu, value, idx);
> +	__cci_pmu_disable();
> +	pmu_disable_counter(cci_pmu, idx);
> +	pmu_set_event(cci_pmu, idx, event);
> +	pmu_restore_counters(cci_pmu, mask);
> +}
> +
> +#endif	/* CONFIG_ARM_CCI500_PMU */
> +
>  static void pmu_write_counter(struct perf_event *event, u32 value)
>  {
>  	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
> @@ -1422,6 +1468,7 @@ static struct cci_pmu_model cci_pmu_models[] = {
>  			},
>  		},
>  		.validate_hw_event = cci500_validate_hw_event,
> +		.write_counter	= cci500_pmu_write_counter,
>  	},

This should work, but it seems very heavyweight given we do it for each
write.

Can we not amortize this by using the {start,commit,cancel}_txn hooks?

Either we can handle 1-4 and 6-8 in those, or we can copy everything
into a shadow state and apply it all in one go at commit_txn time.

Or is that not possible for some reason I've missed?

Thanks,
Mark.
Suzuki K Poulose Dec. 11, 2015, 11:28 a.m. UTC | #2
On 10/12/15 15:42, Mark Rutland wrote:
> On Tue, Nov 17, 2015 at 06:03:27PM +0000, Suzuki K. Poulose wrote:
>> The CCI PMU driver sets the event counter to the half of the maximum
>> value(2^31) it can count before we start the counters via
>> pmu_event_set_period(). This is done to give us the best chance to
>> handle the overflow interrupt, taking care of extreme interrupt latencies.


>
> This should work, but it seems very heavyweight given we do it for each
> write.
>
> Can we not amortize this by using the {start,commit,cancel}_txn hooks?
>
> Either we can handle 1-4 and 6-8 in those, or we can copy everything
> into a shadow state and apply it all in one go at commit_txn time.

I took a look at it. The only worrying part is, if pmu->add() will be
called outside *_txn().

from linux/perf_event.h:

         /*
          * Adds/Removes a counter to/from the PMU, can be done inside a
          * transaction, see the ->*_txn() methods.
          *

As of now it is only called within the transactions, but the comment somehow
doesn't look like enforces it.

Thoughts ?

Suzuki
Peter Zijlstra Dec. 11, 2015, 12:10 p.m. UTC | #3
On Fri, Dec 11, 2015 at 11:28:45AM +0000, Suzuki K. Poulose wrote:
> On 10/12/15 15:42, Mark Rutland wrote:

> >This should work, but it seems very heavyweight given we do it for each
> >write.
> >
> >Can we not amortize this by using the {start,commit,cancel}_txn hooks?
> >
> >Either we can handle 1-4 and 6-8 in those, or we can copy everything
> >into a shadow state and apply it all in one go at commit_txn time.
> 
> I took a look at it. The only worrying part is, if pmu->add() will be
> called outside *_txn().
> 
> from linux/perf_event.h:
> 
>         /*
>          * Adds/Removes a counter to/from the PMU, can be done inside a
>          * transaction, see the ->*_txn() methods.
>          *
> 
> As of now it is only called within the transactions, but the comment somehow
> doesn't look like enforces it.

Right, txn stuff is intended to be optional. However a txn
implementation must track if one is in progress, so the ::add() method
can check against that.

Also note that there exist a callchain into pmu->add() that does not
start a txn. See:

	__perf_event_enable()
		if (event != leader)
			event_sched_in()
				event->pmu->add()

That said, you can also use pmu->pmu_{en,dis}able() to batch stuff (x86
does this too), add/del, start/stop are guaranteed to be called with the
PMU disabled (as per the comments in struct pmu).


on x86:

For ::add(), we delay touching the hardware until ::pmu_enable() time.

!txn ::add() will do a schedulability test to see if the pmu had place
for the new event and then record the details of it.

txn ::add() will just record the details.

::commit_txn will do the schedulability test for the txn, if that fails
we undo bits.

::pmu_enable rewrites the hardware registers, moves events about if
needed and configures the new event.
Mark Rutland Dec. 11, 2015, 12:14 p.m. UTC | #4
On Fri, Dec 11, 2015 at 11:28:45AM +0000, Suzuki K. Poulose wrote:
> On 10/12/15 15:42, Mark Rutland wrote:
> >On Tue, Nov 17, 2015 at 06:03:27PM +0000, Suzuki K. Poulose wrote:
> >>The CCI PMU driver sets the event counter to the half of the maximum
> >>value(2^31) it can count before we start the counters via
> >>pmu_event_set_period(). This is done to give us the best chance to
> >>handle the overflow interrupt, taking care of extreme interrupt latencies.
> 
> 
> >
> >This should work, but it seems very heavyweight given we do it for each
> >write.
> >
> >Can we not amortize this by using the {start,commit,cancel}_txn hooks?
> >
> >Either we can handle 1-4 and 6-8 in those, or we can copy everything
> >into a shadow state and apply it all in one go at commit_txn time.
> 
> I took a look at it. The only worrying part is, if pmu->add() will be
> called outside *_txn().

It looks like that happns.

If we __perf_event_enable an events which is not a leader, we may call
event_sched_in (which will call pmu->add) outside of a transaction. The
__perf_event_disable path is similar w.r.t. pmu->del.

So it does look like we can't rely on being in a transaction there.

Assuming that's deliberate, we could follow the example of other PMU
drivers and keep track of whether or not we're in a transaction. If not,
we do all the heavyweight work inline.

Thanks,
Mark.
diff mbox

Patch

diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 88b612f..6020a02 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -835,6 +835,52 @@  static void __pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
 	pmu_write_register(cci_pmu, value, idx, CCI_PMU_CNTR);
 }
 
+#ifdef CONFIG_ARM_CCI500_PMU
+
+/*
+ * CCI-500 has advanced power saving policies, which could gate the
+ * clocks to the PMU counters, which makes the writes to them ineffective.
+ * The only way to write to those counters is when the global counters
+ * are enabled and the particular counter is enabled.
+ *
+ * So we do the following :
+ *
+ * 1) Disable all the PMU counters, saving their current state
+ * 2) Save the programmed event, and write an invalid event code
+ *    to the event control register for the counter, so that the
+ *    counters are not modified.
+ * 3) Enable the counter control for the counter.
+ * 4) Enable the global PMU profiling
+ * 5) Set the counter value
+ * 6) Disable the counter, global PMU.
+ * 7) Restore the event in the target counter
+ * 8) Restore the status of the rest of the counters.
+ *
+ * We choose an event code which has very little chances of getting
+ * assigned a valid code for step(2). We use the highest possible
+ * event code (0x1f) for the master interface 0.
+ */
+#define CCI500_INVALID_EVENT	((CCI500_PORT_M0 << CCI500_PMU_EVENT_SOURCE_SHIFT) | \
+				 (CCI500_PMU_EVENT_CODE_MASK << CCI500_PMU_EVENT_CODE_SHIFT))
+static void cci500_pmu_write_counter(struct cci_pmu *cci_pmu, u32 value, int idx)
+{
+	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
+	u32 event;
+
+	pmu_disable_counters(cci_pmu, mask);
+	event = pmu_get_event(cci_pmu, idx);
+	pmu_set_event(cci_pmu, idx, CCI500_INVALID_EVENT);
+	pmu_enable_counter(cci_pmu, idx);
+	__cci_pmu_enable();
+	__pmu_write_counter(cci_pmu, value, idx);
+	__cci_pmu_disable();
+	pmu_disable_counter(cci_pmu, idx);
+	pmu_set_event(cci_pmu, idx, event);
+	pmu_restore_counters(cci_pmu, mask);
+}
+
+#endif	/* CONFIG_ARM_CCI500_PMU */
+
 static void pmu_write_counter(struct perf_event *event, u32 value)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(event->pmu);
@@ -1422,6 +1468,7 @@  static struct cci_pmu_model cci_pmu_models[] = {
 			},
 		},
 		.validate_hw_event = cci500_validate_hw_event,
+		.write_counter	= cci500_pmu_write_counter,
 	},
 #endif
 };