diff mbox

[v4,4/5] arm64/perf: Enable PMCR long cycle counter bit

Message ID 467597048eda3004bd69f1fbe3981aab111e00dd.1455810755.git.jglauber@cavium.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jan Glauber Feb. 18, 2016, 4:50 p.m. UTC
With the long cycle counter bit (LC) disabled the cycle counter is not
working on ThunderX SOC (ThunderX only implements Aarch64).
Also, according to documentation LC == 0 is deprecated.

To keep the code simple the patch does not introduce 64 bit wide counter
functions. Instead writing the cycle counter always sets the upper
32 bits so overflow interrupts are generated as before.

Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Comments

Will Deacon Feb. 18, 2016, 5:34 p.m. UTC | #1
On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> With the long cycle counter bit (LC) disabled the cycle counter is not
> working on ThunderX SOC (ThunderX only implements Aarch64).
> Also, according to documentation LC == 0 is deprecated.
> 
> To keep the code simple the patch does not introduce 64 bit wide counter
> functions. Instead writing the cycle counter always sets the upper
> 32 bits so overflow interrupts are generated as before.
> 
> Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>

What does this mean? Do we need Andrew's S-o-B, or is this a fresh patch?

Will
Jan Glauber Feb. 18, 2016, 6:28 p.m. UTC | #2
On Thu, Feb 18, 2016 at 05:34:28PM +0000, Will Deacon wrote:
> On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> > With the long cycle counter bit (LC) disabled the cycle counter is not
> > working on ThunderX SOC (ThunderX only implements Aarch64).
> > Also, according to documentation LC == 0 is deprecated.
> > 
> > To keep the code simple the patch does not introduce 64 bit wide counter
> > functions. Instead writing the cycle counter always sets the upper
> > 32 bits so overflow interrupts are generated as before.
> > 
> > Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> 
> What does this mean? Do we need Andrew's S-o-B, or is this a fresh patch?
> 
> Will

I've modified Andrew's patch. I assumed his formal S-o-B is not
required. Please correct me if I'm wrong.

Jan
David Daney Feb. 18, 2016, 6:57 p.m. UTC | #3
On 02/18/2016 09:34 AM, Will Deacon wrote:
> On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
>> With the long cycle counter bit (LC) disabled the cycle counter is not
>> working on ThunderX SOC (ThunderX only implements Aarch64).
>> Also, according to documentation LC == 0 is deprecated.
>>
>> To keep the code simple the patch does not introduce 64 bit wide counter
>> functions. Instead writing the cycle counter always sets the upper
>> 32 bits so overflow interrupts are generated as before.
>>
>> Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
>
> What does this mean? Do we need Andrew's S-o-B, or is this a fresh patch?

I don't believe we need Andrew's S-o-B as the assertion of the 
Developer's Certificate of Origin 1.1 clauses (a), (b) and (d) is being 
made.  Specifically, clause (c) does not apply.

However this may be a gray area, so we could put on Andrew's S-o-B if 
that would make everybody happier.

David Daney


>
> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Jan Glauber Feb. 22, 2016, 12:45 p.m. UTC | #4
On Thu, Feb 18, 2016 at 05:34:28PM +0000, Will Deacon wrote:
> On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> > With the long cycle counter bit (LC) disabled the cycle counter is not
> > working on ThunderX SOC (ThunderX only implements Aarch64).
> > Also, according to documentation LC == 0 is deprecated.
> > 
> > To keep the code simple the patch does not introduce 64 bit wide counter
> > functions. Instead writing the cycle counter always sets the upper
> > 32 bits so overflow interrupts are generated as before.
> > 
> > Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> 
> What does this mean? Do we need Andrew's S-o-B, or is this a fresh patch?

Hi Will,

Please let me know if I should repost or not, FWIW I got Andrew's S-o-B on the
patch.

Thanks, Jan

> Will
Will Deacon Feb. 22, 2016, 1:41 p.m. UTC | #5
On Mon, Feb 22, 2016 at 01:45:14PM +0100, Jan Glauber wrote:
> On Thu, Feb 18, 2016 at 05:34:28PM +0000, Will Deacon wrote:
> > On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> > > With the long cycle counter bit (LC) disabled the cycle counter is not
> > > working on ThunderX SOC (ThunderX only implements Aarch64).
> > > Also, according to documentation LC == 0 is deprecated.
> > > 
> > > To keep the code simple the patch does not introduce 64 bit wide counter
> > > functions. Instead writing the cycle counter always sets the upper
> > > 32 bits so overflow interrupts are generated as before.
> > > 
> > > Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> > 
> > What does this mean? Do we need Andrew's S-o-B, or is this a fresh patch?
> 
> Please let me know if I should repost or not, FWIW I got Andrew's S-o-B on the
> patch.

I think it's fine. This should all be in -next as of last Friday anyhow.

Will
Will Deacon Feb. 29, 2016, 3:39 p.m. UTC | #6
Hi Jan,

I've queued this lot on my perf/updates branch, but I just noticed an
oddity whilst dealing with some potential conflicts with the kvm tree.

On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> With the long cycle counter bit (LC) disabled the cycle counter is not
> working on ThunderX SOC (ThunderX only implements Aarch64).
> Also, according to documentation LC == 0 is deprecated.
> 
> To keep the code simple the patch does not introduce 64 bit wide counter
> functions. Instead writing the cycle counter always sets the upper
> 32 bits so overflow interrupts are generated as before.
> 
> Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> 
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> ---
>  arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 0ed05f6..c68fa98 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -405,6 +405,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
>  #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
>  #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
>  #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
> +#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
>  #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
>  #define	ARMV8_PMCR_N_MASK	0x1f
>  #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */

You haven't extended this mask to cover the LC bit, so it will be ignored
by armv8pmu_pmcr_write afaict.

How did you test this? I can easily update the mask, but it would be
good to know that it doesn't end up cause a breakage.

Will
Jan Glauber March 1, 2016, 7:21 a.m. UTC | #7
On Mon, Feb 29, 2016 at 03:39:35PM +0000, Will Deacon wrote:
> Hi Jan,
> 
> I've queued this lot on my perf/updates branch, but I just noticed an
> oddity whilst dealing with some potential conflicts with the kvm tree.
> 
> On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> > With the long cycle counter bit (LC) disabled the cycle counter is not
> > working on ThunderX SOC (ThunderX only implements Aarch64).
> > Also, according to documentation LC == 0 is deprecated.
> > 
> > To keep the code simple the patch does not introduce 64 bit wide counter
> > functions. Instead writing the cycle counter always sets the upper
> > 32 bits so overflow interrupts are generated as before.
> > 
> > Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> > 
> > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > ---
> >  arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index 0ed05f6..c68fa98 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -405,6 +405,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
> >  #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
> >  #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
> >  #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
> > +#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
> >  #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
> >  #define	ARMV8_PMCR_N_MASK	0x1f
> >  #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
> 
> You haven't extended this mask to cover the LC bit, so it will be ignored
> by armv8pmu_pmcr_write afaict.

This is weird. I've double checked and I missed this mask. Annoying.
Nevertheless it works for me without the LC bit set.

> How did you test this? I can easily update the mask, but it would be
> good to know that it doesn't end up cause a breakage.
 
For testing I used:
- perf top and perf record & report
- looked at interrupt numbers in /proc/interrupts

Without the patch _no_ samples at all are recorded and the interrupt does
not occur. With the patch I get samples and see a reasonable number of
interrupts.

Extending the mask so the LC bit is covered would make sense, I'm going
to test this now.

Jan

> Will
Jan Glauber March 1, 2016, 3:10 p.m. UTC | #8
On Mon, Feb 29, 2016 at 03:39:35PM +0000, Will Deacon wrote:
> Hi Jan,
> 
> I've queued this lot on my perf/updates branch, but I just noticed an
> oddity whilst dealing with some potential conflicts with the kvm tree.
> 
> On Thu, Feb 18, 2016 at 05:50:13PM +0100, Jan Glauber wrote:
> > With the long cycle counter bit (LC) disabled the cycle counter is not
> > working on ThunderX SOC (ThunderX only implements Aarch64).
> > Also, according to documentation LC == 0 is deprecated.
> > 
> > To keep the code simple the patch does not introduce 64 bit wide counter
> > functions. Instead writing the cycle counter always sets the upper
> > 32 bits so overflow interrupts are generated as before.
> > 
> > Original patch from Andrew Pinksi <Andrew.Pinksi@caviumnetworks.com>
> > 
> > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > ---
> >  arch/arm64/kernel/perf_event.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index 0ed05f6..c68fa98 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -405,6 +405,7 @@ static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
> >  #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
> >  #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
> >  #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
> > +#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
> >  #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
> >  #define	ARMV8_PMCR_N_MASK	0x1f
> >  #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
> 
> You haven't extended this mask to cover the LC bit, so it will be ignored
> by armv8pmu_pmcr_write afaict.
> 
> How did you test this? I can easily update the mask, but it would be
> good to know that it doesn't end up cause a breakage.

Please update the mask, I've tested with ARMV8_PMCR_MASK set to 0x7f
and it works fine.

It seems like it would work even without the LC bit set because we
set the upper bits again after an interrupt, but reading the documentation
we should really use ARMV8_PMCR_LC.

thanks,
Jan

> Will
diff mbox

Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 0ed05f6..c68fa98 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -405,6 +405,7 @@  static const struct attribute_group *armv8_pmuv3_attr_groups[] = {
 #define ARMV8_PMCR_D		(1 << 3) /* CCNT counts every 64th cpu cycle */
 #define ARMV8_PMCR_X		(1 << 4) /* Export to ETM */
 #define ARMV8_PMCR_DP		(1 << 5) /* Disable CCNT if non-invasive debug*/
+#define ARMV8_PMCR_LC		(1 << 6) /* Overflow on 64 bit cycle counter */
 #define	ARMV8_PMCR_N_SHIFT	11	 /* Number of counters supported */
 #define	ARMV8_PMCR_N_MASK	0x1f
 #define	ARMV8_PMCR_MASK		0x3f	 /* Mask for writable bits */
@@ -494,9 +495,16 @@  static inline void armv8pmu_write_counter(struct perf_event *event, u32 value)
 	if (!armv8pmu_counter_valid(cpu_pmu, idx))
 		pr_err("CPU%u writing wrong counter %d\n",
 			smp_processor_id(), idx);
-	else if (idx == ARMV8_IDX_CYCLE_COUNTER)
-		asm volatile("msr pmccntr_el0, %0" :: "r" (value));
-	else if (armv8pmu_select_counter(idx) == idx)
+	else if (idx == ARMV8_IDX_CYCLE_COUNTER) {
+		/*
+		 * Set the upper 32bits as this is a 64bit counter but we only
+		 * count using the lower 32bits and we want an interrupt when
+		 * it overflows.
+		 */
+		u64 value64 = 0xffffffff00000000ULL | value;
+
+		asm volatile("msr pmccntr_el0, %0" :: "r" (value64));
+	} else if (armv8pmu_select_counter(idx) == idx)
 		asm volatile("msr pmxevcntr_el0, %0" :: "r" (value));
 }
 
@@ -768,8 +776,11 @@  static void armv8pmu_reset(void *info)
 		armv8pmu_disable_intens(idx);
 	}
 
-	/* Initialize & Reset PMNC: C and P bits. */
-	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C);
+	/*
+	 * Initialize & Reset PMNC. Request overflow interrupt for
+	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
+	 */
+	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C | ARMV8_PMCR_LC);
 }
 
 static int armv8_pmuv3_map_event(struct perf_event *event)