[2/2] arm64: pmu: Wire-up L2 cache events for ARMv8 PMUv3

Message ID 20170420190546.7453-4-f.fainelli@gmail.com (mailing list archive)
State New, archived
Commit Message

Florian Fainelli April 20, 2017, 7:05 p.m. UTC
The ARMv8 PMUv3 cache map did not include the L2 cache events, add
them.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 arch/arm64/kernel/perf_event.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Will Deacon April 25, 2017, 12:44 p.m. UTC | #1
Hi Florian,

On Thu, Apr 20, 2017 at 12:05:46PM -0700, Florian Fainelli wrote:
> The ARMv8 PMUv3 cache map did not include the L2 cache events, add
> them.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  arch/arm64/kernel/perf_event.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 4f011cdd756d..a664c575f3fd 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -264,6 +264,11 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
>  	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE,
>  	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
>  
> +	[C(LL)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE,
> +	[C(LL)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL,
> +	[C(LL)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE,
> +	[C(LL)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL,

I don't think this is correct in general. 'LL' stands for "last-level",
which may be L3 or even a system cache in the interconnect. Tying that to L2
is the wrong thing to do from perf's generic event perspective.

I'm ok with what you're proposing for A53 (where the PMU can only count
events out to the L2), but I'm reluctant to make this change for the generic
PMUv3 events.

Will
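
The distinction drawn above can be sketched in plain C. This is a simplified, userspace illustration and not the kernel's actual code: the enum names, `struct cache_event`, `generic_pmuv3_map`, `a53_map`, and `lookup_cache_event()` are all hypothetical stand-ins. Only the event numbers (0x01, 0x14, 0x16, 0x17) are taken from the PMUv3 architectural event list. The idea is that a generic map leaves the last-level slots unsupported, while a map for a core whose PMU counts no further than L2 can reasonably fill them with the L2D events.

```c
/* Hypothetical, simplified stand-ins for perf's generic cache-event indices. */
enum cache_type   { CACHE_L1D, CACHE_L1I, CACHE_LL, CACHE_TYPE_MAX };
enum cache_op     { OP_READ, OP_WRITE, CACHE_OP_MAX };
enum cache_result { RESULT_ACCESS, RESULT_MISS, CACHE_RESULT_MAX };

/* PMUv3 architectural event numbers used in this sketch. */
#define EVT_L1I_CACHE_REFILL	0x01
#define EVT_L1I_CACHE		0x14
#define EVT_L2D_CACHE		0x16
#define EVT_L2D_CACHE_REFILL	0x17

/*
 * An explicit "supported" flag is used because event number 0 is itself a
 * valid event, so a zero-initialised slot must not be read as "event 0".
 */
struct cache_event {
	int supported;
	unsigned int event;
};

/* Generic map (illustrative): nothing is claimed about the last-level cache. */
static const struct cache_event
generic_pmuv3_map[CACHE_TYPE_MAX][CACHE_OP_MAX][CACHE_RESULT_MAX] = {
	[CACHE_L1I][OP_READ][RESULT_ACCESS]	= { 1, EVT_L1I_CACHE },
	[CACHE_L1I][OP_READ][RESULT_MISS]	= { 1, EVT_L1I_CACHE_REFILL },
	/* CACHE_LL slots stay zero-initialised, i.e. unsupported. */
};

/* Core-specific map (illustrative): L2 is the furthest level the PMU counts. */
static const struct cache_event
a53_map[CACHE_TYPE_MAX][CACHE_OP_MAX][CACHE_RESULT_MAX] = {
	[CACHE_L1I][OP_READ][RESULT_ACCESS]	= { 1, EVT_L1I_CACHE },
	[CACHE_L1I][OP_READ][RESULT_MISS]	= { 1, EVT_L1I_CACHE_REFILL },

	[CACHE_LL][OP_READ][RESULT_ACCESS]	= { 1, EVT_L2D_CACHE },
	[CACHE_LL][OP_READ][RESULT_MISS]	= { 1, EVT_L2D_CACHE_REFILL },
	[CACHE_LL][OP_WRITE][RESULT_ACCESS]	= { 1, EVT_L2D_CACHE },
	[CACHE_LL][OP_WRITE][RESULT_MISS]	= { 1, EVT_L2D_CACHE_REFILL },
};

/* Return the hardware event number for a slot, or -1 if it is unsupported. */
static int lookup_cache_event(
	const struct cache_event map[CACHE_TYPE_MAX][CACHE_OP_MAX][CACHE_RESULT_MAX],
	enum cache_type type, enum cache_op op, enum cache_result result)
{
	const struct cache_event *e = &map[type][op][result];

	return e->supported ? (int)e->event : -1;
}
```

With these definitions, `lookup_cache_event(generic_pmuv3_map, CACHE_LL, OP_READ, RESULT_MISS)` reports the slot as unsupported, whereas the same lookup against `a53_map` yields the L2D refill event. Keeping the two tables separate means a part with an L3 or a system cache is never told that its L2 counters describe the last level.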
Florian Fainelli April 25, 2017, 5:13 p.m. UTC | #2
On 04/25/2017 05:44 AM, Will Deacon wrote:
> I don't think this is correct in general. 'LL' stands for "last-level",
> which may be L3 or even a system cache in the interconnect. Tying that to L2
> is the wrong thing to do from perf's generic event perspective.
> 
> I'm ok with what you're proposing for A53 (where the PMU can only count
> events out to the L2), but I'm reluctant to make this change for the generic
> PMUv3 events.

That makes sense, shall I resubmit the first patch by itself or can you
or Catalin take it as-is?

Thanks!
Will Deacon April 27, 2017, 5:36 p.m. UTC | #3
On Tue, Apr 25, 2017 at 10:13:51AM -0700, Florian Fainelli wrote:
> That makes sense, shall I resubmit the first patch by itself or can you
> or Catalin take it as-is?

I'll talk to Catalin tomorrow and try to get the A53 bit queued.

Will
Catalin Marinas April 28, 2017, 2:15 p.m. UTC | #4
On Thu, Apr 27, 2017 at 06:36:42PM +0100, Will Deacon wrote:
> I'll talk to Catalin tomorrow and try to get the A53 bit queued.

I queued patch 1/2. Shall I add your ack?
Will Deacon April 28, 2017, 2:17 p.m. UTC | #5
On Fri, Apr 28, 2017 at 03:15:01PM +0100, Catalin Marinas wrote:
> I queued patch 1/2. Shall I add your ack?

Yes, please.

Will
Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4f011cdd756d..a664c575f3fd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -264,6 +264,11 @@  static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE,
 	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
 
+	[C(LL)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE,
+	[C(LL)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL,
+	[C(LL)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE,
+	[C(LL)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L2D_CACHE_REFILL,
+
 	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
 	[C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB,