[PATCH/RFC] arm64: pmu: add Qualcomm Technologies extensions

Message ID 1488385085-19238-1-git-send-email-nleeder@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Neil Leeder March 1, 2017, 4:18 p.m. UTC
Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.

The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
extensions to the architected PMU events.

The extended events are implemented by a set of registers which
are programmed by shifting an event code into a group field.
The PMNx event then points to that region/group combo.

Restrictions that allow only one concurrent use of each region/group
combination are also enforced.

Signed-off-by: Neil Leeder <nleeder@codeaurora.org>
---
The Qualcomm Technologies CPU PMU extensions have an additional set of registers
which need to be programmed when configuring an event. These are the PMRESRs,
which are similar to the krait & scorpion registers in armv7, and the L2
variants in the Qualcomm Technologies L2 PMU driver.

This is an alpha patch where I'm looking for comments on design decisions. I'll
hit the highlights here, with my rationale for the decisions and some possible
alternatives.

For some functions (reset, get_event_idx) I've been able to add qc_* wrappers
for the additional code, and within them call the armv8pmu_* functions for
common code.

For [enable|disable]_event this isn't possible because the body has to be
surrounded by a spinlock. I considered the alternative of duplicating the
function and adding qc-specific code, but went with calling the qc functions
from within the armv8pmu functions. One reason for that is that future features,
for example adding chaining support, may have to be duplicated in both
functions which I wanted to avoid.

There are additional constraints on qc events. The armv7 implementation checks
these in get_event_idx, but during L2 PMU reviews it was thought better to do
these during init processing where possible. I added these in the map_event
callback because it's the only callback from within armpmu_event_init(). I'm not
sure if that would be considered stretching the acceptable use of that interface,
so I'm open to other suggestions.

The qc driver also needs to check conflicts between events, using a bitmap. This
has similar use to the hw_events->used_mask. I added the event_conflicts bitmap
into hw_events, although I'm not sure if that's the best place for an
implementation-specific field. An alternative would be a static DEFINE_PER_CPU
bitmap, although that didn't seem as clean, but may be more acceptable.

qc_max_resr is a variable, rather than a constant, to accommodate future
processors with different numbers of RESRs.

This requires Jeremy Linton's patch sequence to add arm64 CPU PMU ACPI support:
https://patchwork.kernel.org/patch/9533677/

Thanks,
Neil

 arch/arm64/kernel/perf_event.c | 344 ++++++++++++++++++++++++++++++++++++++++-
 include/linux/perf/arm_pmu.h   |   8 +
 2 files changed, 348 insertions(+), 4 deletions(-)

Comments

Mark Rutland March 1, 2017, 6:10 p.m. UTC | #1
Hi Neil,

On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
> Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
> 
> The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
> extensions to the architected PMU events.

Is this a strict superset of PMUv3 (that could validly be treated as
just PMUv3), or do those IMP DEF parts need to be poked to use this at
all?

What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?

Quite frankly, I'm less than thrilled about the prospect of
IMPLEMENTATION DEFINED CPU PMUs that fall outside of the architected PMU
model, especially for ACPI systems where the raison d'être is standards
and uniformity, and where we have no sensible mechanism to provide
information regarding IMPLEMENTATION DEFINED functionality.

This has knock-on effects for other things, like userspace PMU access
and/or virtualization, and this multiplies the support effort.

KVM already has (architected) PMU support, and without a corresponding
KVM patch this is at best insufficient. I don't imagine the KVM folk
will be too thrilled about the prospect of emulating an IMPLEMENTATION
DEFINED CPU feature like this.

Note that system PMUs are a very different case, since these are not
(implicitly) passed through to guests, nor do they have significant
overlap with architected functionality or each other. It's fine to add
support for those to the host kernel.

> The extended events are implemented by a set of registers which
> are programmed by shifting an event code into a group field.
> The PMNx event then points to that region/group combo.
> 
> Restrictions that allow only one concurrent use of each region/group
> combination are also enforced.
> 
> Signed-off-by: Neil Leeder <nleeder@codeaurora.org>
> ---
> The Qualcomm Technologies CPU PMU extensions have an additional set of registers
> which need to be programmed when configuring an event. These are the PMRESRs,
> which are similar to the krait & scorpion registers in armv7, and the L2
> variants in the Qualcomm Technologies L2 PMU driver.

If these *must* be programmed, it sounds like this isn't PMUv3
compatible.

[...]

> There are additional constraints on qc events. The armv7 implementation checks
> these in get_event_idx, but during L2 PMU reviews it was thought better to do
> these during init processing where possible. I added these in the map_event
> callback because it's the only callback from within armpmu_event_init(). I'm not
> sure if that would be considered stretching the acceptable use of that interface,
> so I'm open to other suggestions.

As a general rule for PMU drivers:

* pmu::event_init() should check whether the entire group the event is
  in (i.e. the parent and all siblings) can potentially be allocated
  into counters simultaneously. If it is always impossible for the whole
  group to go on, pmu::event_init should fail, since the group is
  impossible.

* pmu::add() needs to handle inter-group and intra-group conflicts that
  can arise dynamically dependent on how (other) events are scheduled.
  This also needs to fail in the event of a conflict.

> The qc driver also needs to check conflicts between events, using a bitmap. This
> has similar use to the hw_events->used_mask. I added the event_conflicts bitmap
> into hw_events, although I'm not sure if that's the best place for an
> implementation-specific field. An alternative would be a static DEFINE_PER_CPU
> bitmap, although that didn't seem as clean, but may be more acceptable.
> 
> qc_max_resr is a variable, rather than a constant, to accommodate future
> processors with different numbers of RESRs.
> 
> This requires Jeremy Linton's patch sequence to add arm64 CPU PMU ACPI support:
> https://patchwork.kernel.org/patch/9533677/

As a heads-up, I'm currently working on an alternative approach that you
can find at:

git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm/perf/acpi

That's a work-in-progress, and there are a few issues (notably IRQ
management) that I'm currently fixing up. I also intend to add a PMUv3
check to the PMUv3 probe.

[...]

> +static bool qc_pmu;

Sorry, but a global boolean to describe whether a (single ?) PMU
instance is a particular implementation is not going to fly.

> +static void qc_pmu_enable_event(struct perf_event *event,
> +				struct hw_perf_event *hwc, int idx);
> +static void qc_pmu_disable_event(struct perf_event *event,
> +				 struct hw_perf_event *hwc);
> +
>  /*
>   * Perf Events' indices
>   */
> @@ -704,10 +730,13 @@ static void armv8pmu_enable_event(struct perf_event *event)
>  	 */
>  	armv8pmu_disable_counter(idx);
>  
> -	/*
> -	 * Set event (if destined for PMNx counters).
> -	 */
> -	armv8pmu_write_evtype(idx, hwc->config_base);
> +	if (qc_pmu)
> +		qc_pmu_enable_event(event, hwc, idx);
> +	else
> +		/*
> +		 * Set event (if destined for PMNx counters).
> +		 */
> +		armv8pmu_write_evtype(idx, hwc->config_base);

Similarly, this is not a workable structure for these functions.

[...]

> +static void qc_pmu_reset(void *info)
> +{
> +	qc_clear_resrs();
> +	armv8pmu_reset(info);
> +}

Is the QC-specific reset required only if we use the QC-specific events,
or is that required for the standard PMUv3 features to be usable?

Are standard PMUv3 events guaranteed to work if we didn't call
qc_clear_resrs()?

If this is not required for the standard PMUv3 features, and/or any IMP
DEF reset is performed by FW, it looks like we *may* be able to treat
this as PMUv3.

> +static void qc_pmu_enable_event(struct perf_event *event,
> +				struct hw_perf_event *hwc, int idx)
> +{
> +	unsigned int reg, code, group;
> +
> +	if (QC_EVT_PFX(hwc->config_base) != QC_EVT_PREFIX) {
> +		armv8pmu_write_evtype(idx, hwc->config_base);
> +		return;
> +	}

This really shows that this is not a workable structure. It's hideous to
hook the PMUv3 code to call this, then try to duplicate what the PMUv3
code would have done in this case.

[...]

> +static int armv8_falkor_pmu_init(struct arm_pmu *cpu_pmu)
> +{
> +	armv8_pmu_init(cpu_pmu);
> +	cpu_pmu->name			= "qcom_pmuv3";
> +	cpu_pmu->map_event		= armv8_qc_map_event;
> +	cpu_pmu->reset			= qc_pmu_reset;
> +	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] =
> +		&armv8_pmuv3_events_attr_group;
> +	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] =
> +		&qc_pmu_format_attr_group;
> +	cpu_pmu->get_event_idx		= qc_get_event_idx;
> +	cpu_pmu->clear_event_idx	= qc_clear_event_idx;
> +
> +	qc_max_resr = QC_FALKOR_MAX_RESR;
> +	qc_clear_resrs();
> +	qc_pmu = true;
> +
> +	if (qc_max_resr > QC_MAX_RESRS) {
> +		/* Sanity check */
> +		pr_err("qcom_pmuv3: max number of RESRs exceeded\n");
> +		return -EINVAL;
> +	}
> +
> +	return armv8pmu_probe_pmu(cpu_pmu);
> +}
> +
>  static const struct of_device_id armv8_pmu_of_device_ids[] = {
>  	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
>  	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},
> @@ -1087,6 +1421,8 @@ static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
>   * aren't supported by the current PMU are disabled.
>   */
>  static const struct pmu_probe_info armv8_pmu_probe_table[] = {
> +	PMU_PROBE(QCOM_CPU_PART_FALKOR_V1 << MIDR_PARTNUM_SHIFT,
> +		  MIDR_PARTNUM_MASK, armv8_falkor_pmu_init),
>  	PMU_PROBE(0, 0, armv8_pmuv3_init), /* enable all defined counters */
>  	{ /* sentinel value */ }

We don't special-case other PMUs here, and the plan for ACPI was to
rely solely on the architectural information, i.e. the architected ID
registers associated with PMUv3.

I don't think we should special-case implementations like this.
My series removes this MIDR matching.

Thanks,
Mark.
Marc Zyngier March 1, 2017, 7:48 p.m. UTC | #2
On Wed, Mar 01 2017 at  6:10:33 pm GMT, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Neil,
>
> On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
>> Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
>> 
>> The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
>> extensions to the architected PMU events.
>
> Is this a strict superset of PMUv3 (that could validly be treated as
> just PMUv3), or do those IMP DEF parts need to be poked to use this at
> all?
>
> What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?
>
> Quite frankly, I'm less than thrilled about the prospect of
> IMPLEMENTATION DEFINED CPU PMUs that fall outside of the architected PMU
> model, especially for ACPI systems where the raison d'être is standards
> and uniformity, and where we have no sensible mechanism to provide
> information regarding IMPLEMENTATION DEFINED functionality.
>
> This has knock-on effects for other things, like userspace PMU access
> and/or virtualization, and this multiplies the support effort.
>
> KVM already has (architected) PMU support, and without a corresponding
> KVM patch this is at best insufficient. I don't imagine the KVM folk
> will be too thrilled about the prospect of emulating an IMPLEMENTATION
> DEFINED CPU feature like this.

Indeed. Another nasty property of these "added value" features is that
they break things like migration. Once the guest has booted on one of
these systems, it cannot be moved to another system. This puts a bit of
a dent in the architecture, frankly...

	M.
Neil Leeder March 1, 2017, 9:36 p.m. UTC | #3
Hi Mark,
Thanks for the quick response.

On 3/1/2017 1:10 PM, Mark Rutland wrote:
> Hi Neil,
>
> On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
>> Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
>>
>> The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
>> extensions to the architected PMU events.
>
> Is this a strict superset of PMUv3 (that could validly be treated as
> just PMUv3), or do those IMP DEF parts need to be poked to use this at
> all?
>
> What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?

It's a strict superset. If you don't use any of the extensions then it
behaves as a PMUv3 for architected events. ID_AA64DFR0_EL1.PMUVer = 1.

> Quite frankly, I'm less than thrilled about the prospect of
> IMPLEMENTATION DEFINED CPU PMUs that fall outside of the architected PMU
> model, especially for ACPI systems where the raison d'être is standards
> and uniformity, and where we have no sensible mechanism to provide
> information regarding IMPLEMENTATION DEFINED functionality.
>
> This has knock-on effects for other things, like userspace PMU access
> and/or virtualization, and this multiplies the support effort.
>
> KVM already has (architected) PMU support, and without a corresponding
> KVM patch this is at best insufficient. I don't imagine the KVM folk
> will be too thrilled about the prospect of emulating an IMPLEMENTATION
> DEFINED CPU feature like this.

Does KVM handle ARMv7 PMU implementations? If so, do you know what it 
does for the scorpion_* and krait_* implementations in 
arch/arm/kernel/perf_events_v7.c? These extensions in ARMv8 are very 
similar to the krait extensions, with some 64-bit tweaks, so could be 
handled by KVM the same way it handles the ARMv7 cases.

I'm not sure about userspace changes - these extensions use a different 
config format to specify an event, but otherwise are transparent to 
userspace.

[...]

>> The Qualcomm Technologies CPU PMU extensions have an additional set of registers
>> which need to be programmed when configuring an event. These are the PMRESRs,
>> which are similar to the krait & scorpion registers in armv7, and the L2
>> variants in the Qualcomm Technologies L2 PMU driver.
>
> If these *must* be programmed, it sounds like this isn't PMUv3
> compatible.

Sorry for the imprecise wording. They only have to be programmed when 
using an event in these extensions, not for architected events.

>> There are additional constraints on qc events. The armv7 implementation checks
>> these in get_event_idx, but during L2 PMU reviews it was thought better to do
>> these during init processing where possible. I added these in the map_event
>> callback because it's the only callback from within armpmu_event_init(). I'm not
>> sure if that would be considered stretching the acceptable use of that interface,
>> so I'm open to other suggestions.
>
> As a general rule for PMU drivers:
>
> * pmu::event_init() should check whether the entire group the event is
>   in (i.e. the parent and all siblings) can potentially be allocated
>   into counters simultaneously. If it is always impossible for the whole
>   group to go on, pmu::event_init should fail, since the group is
>   impossible.
>
> * pmu::add() needs to handle inter-group and intra-group conflicts that
>   can arise dynamically dependent on how (other) events are scheduled.
>   This also needs to fail in the event of a conflict.
>

Checking whether there is a conflict within the group is what 
qc_verify_event() does, as it's called from the map_event callback.

[...]

>> This requires Jeremy Linton's patch sequence to add arm64 CPU PMU ACPI support:
>> https://patchwork.kernel.org/patch/9533677/
>
> As a heads-up, I'm currently working on an alternative approach that you
> can find at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm/perf/acpi
>
> That's a work-in-progress, and there are a few issues (notably IRQ
> management) that I'm currently fixing up. I also intend to add a PMUv3
> check to the PMUv3 probe.

Thanks - I'll check this out and I'll be interested to see the final 
version.

>> +static bool qc_pmu;
>
> Sorry, but a global boolean to describe whether a (single ?) PMU
> instance is a particular implementation is not going to fly.
>

Yeah, I wasn't too happy about that, so I was looking for alternatives. 
Duplicate the enable/disable_event() functions and add qc-specific 
processing there? Add additional callbacks to be called from within 
armv8pmu_enable/disable_event and only populate them for the qc case? 
Something else?

[...]

>
>> +static void qc_pmu_reset(void *info)
>> +{
>> +	qc_clear_resrs();
>> +	armv8pmu_reset(info);
>> +}
>
> Is the QC-specific reset required only if we use the QC-specific events,
> or is that required for the standard PMUv3 features to be usable?
>
> Are standard PMUv3 events guaranteed to work if we didn't call
> qc_clear_resrs()?
>
> If this is not required for the standard PMUv3 features, and/or any IMP
> DEF reset is performed by FW, it looks like we *may* be able to treat
> this as PMUv3.

This reset is only required for the QC extensions.

>> +static void qc_pmu_enable_event(struct perf_event *event,
>> +				struct hw_perf_event *hwc, int idx)
>> +{
>> +	unsigned int reg, code, group;
>> +
>> +	if (QC_EVT_PFX(hwc->config_base) != QC_EVT_PREFIX) {
>> +		armv8pmu_write_evtype(idx, hwc->config_base);
>> +		return;
>> +	}
>
> This really shows that this is not a workable structure. It's hideous to
> hook the PMUv3 code to call this, then try to duplicate what the PMUv3
> code would have done in this case.

I can rework this once there's an acceptable way to handle the 
qc-specific enable/disables.

>>  static const struct of_device_id armv8_pmu_of_device_ids[] = {
>>  	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
>>  	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},
>> @@ -1087,6 +1421,8 @@ static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
>>   * aren't supported by the current PMU are disabled.
>>   */
>>  static const struct pmu_probe_info armv8_pmu_probe_table[] = {
>> +	PMU_PROBE(QCOM_CPU_PART_FALKOR_V1 << MIDR_PARTNUM_SHIFT,
>> +		  MIDR_PARTNUM_MASK, armv8_falkor_pmu_init),
>>  	PMU_PROBE(0, 0, armv8_pmuv3_init), /* enable all defined counters */
>>  	{ /* sentinel value */ }
>
> We don't special-case other PMUs here, and the plan for ACPI was to
> rely solely on the architectural information, i.e. the architected ID
> registers associated with PMUv3.
>
> I don't think we should special-case implementations like this.
> My series removes this MIDR matching.

So would there be equivalent ACPI support for the various 3rd-party 
implementations (vulcan, thunder... and then qc) as there is with device 
tree?

> Thanks,
> Mark.
>

Thanks,
Neil
Marc Zyngier March 2, 2017, 9:05 a.m. UTC | #4
On 01/03/17 21:36, Leeder, Neil wrote:
> Hi Mark,
> Thanks for the quick response.
> 
> On 3/1/2017 1:10 PM, Mark Rutland wrote:
>> Hi Neil,
>>
>> On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
>>> Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
>>>
>>> The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
>>> extensions to the architected PMU events.
>>
>> Is this a strict superset of PMUv3 (that could validly be treated as
>> just PMUv3), or do those IMP DEF parts need to be poked to use this at
>> all?
>>
>> What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?
> 
> It's a strict superset. If you don't use any of the extensions then it
> behaves as a PMUv3 for architected events. ID_AA64DFR0_EL1.PMUVer = 1.
> 
>> Quite frankly, I'm less than thrilled about the prospect of
>> IMPLEMENTATION DEFINED CPU PMUs that fall outside of the architected PMU
>> model, especially for ACPI systems where the raison d'être is standards
>> and uniformity, and where we have no sensible mechanism to provide
>> information regarding IMPLEMENTATION DEFINED functionality.
>>
>> This has knock-on effects for other things, like userspace PMU access
>> and/or virtualization, and this multiplies the support effort.
>>
>> KVM already has (architected) PMU support, and without a corresponding
>> KVM patch this is at best insufficient. I don't imagine the KVM folk
>> will be too thrilled about the prospect of emulating an IMPLEMENTATION
>> DEFINED CPU feature like this.
> 
> Does KVM handle ARMv7 PMU implementations? If so, do you know what it 
> does for the scorpion_* and krait_* implementations in 
> arch/arm/kernel/perf_events_v7.c? These extensions in ARMv8 are very 
> similar to the krait extensions, with some 64-bit tweaks, so could be 
> handled by KVM the same way it handles the ARMv7 cases.

No, KVM doesn't handle the ARMv7 PMU at all. I'm not aware of the
virtualization extensions being available on Scorpion or Krait, which
makes it a moot point. What it handles is the PMUv3 architecture.

Thanks,

	M.
Mark Rutland March 2, 2017, 3:16 p.m. UTC | #5
Hi,

On Wed, Mar 01, 2017 at 04:36:07PM -0500, Leeder, Neil wrote:
> On 3/1/2017 1:10 PM, Mark Rutland wrote:
> >On Wed, Mar 01, 2017 at 11:18:05AM -0500, Neil Leeder wrote:
> >>Adds CPU PMU perf events support for Qualcomm Technologies' Falkor CPU.
> >>
> >>The Qualcomm Technologies CPU PMU is named qcom_pmuv3 and provides
> >>extensions to the architected PMU events.
> >
> >Is this a strict superset of PMUv3 (that could validly be treated as
> >just PMUv3), or do those IMP DEF parts need to be poked to use this at
> >all?
> >
> >What is reported by ID_AA64DFR0_EL1.PMUVer on these CPUs?
> 
> It's a strict superset. If you don't use any of the extensions then
> it behaves as a PMUv3 for architected events. ID_AA64DFR0_EL1.PMUVer
> = 1.

> >>The Qualcomm Technologies CPU PMU extensions have an additional set of registers
> >>which need to be programmed when configuring an event. These are the PMRESRs,
> >>which are similar to the krait & scorpion registers in armv7, and the L2
> >>variants in the Qualcomm Technologies L2 PMU driver.
> >
> >If these *must* be programmed, it sounds like this isn't PMUv3
> >compatible.
> 
> Sorry for the imprecise wording. They only have to be programmed
> when using an event in these extensions, not for architected events.

Ok, so given that and the ID_AA64DFR0_EL1.PMUVer value, we can treat
this as a PMUv3.

Given that in general we do not support IMP DEF features like this,
given that this cannot work with KVM, and given that the only probing
mechanism we have is to identify the implementation, I'm not keen on
trying to support more than that.

> >> static const struct of_device_id armv8_pmu_of_device_ids[] = {
> >> 	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
> >> 	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},
> >>@@ -1087,6 +1421,8 @@ static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
> >>  * aren't supported by the current PMU are disabled.
> >>  */
> >> static const struct pmu_probe_info armv8_pmu_probe_table[] = {
> >>+	PMU_PROBE(QCOM_CPU_PART_FALKOR_V1 << MIDR_PARTNUM_SHIFT,
> >>+		  MIDR_PARTNUM_MASK, armv8_falkor_pmu_init),
> >> 	PMU_PROBE(0, 0, armv8_pmuv3_init), /* enable all defined counters */
> >> 	{ /* sentinel value */ }
> >
> >We don't special-case other PMUs here, and the plan for ACPI was to
> >rely solely on the architectural information, i.e. the architected ID
> >registers associated with PMUv3.
> >
> >I don't think we should special-case implementations like this.
> >My series removes this MIDR matching.
> 
> So would there be equivalent ACPI support for the various 3rd-party
> implementations (vulcan, thunder... and then qc) as there is with
> device tree?

So far, those are all PMUv3, and the code handling the compatible string
only sets up two things: a unique name (so as to avoid sysfs clashes),
and default event mapping (which largely could/should be derived from
PMCEID*).

The naming will differ when using ACPI. In Jeremy's series and in mine,
we append a unique suffix to the armv8_pmuv3 string.

We could certainly take a look at using PMCEID* to do a better job of
event mapping.

Thanks,
Mark.
Neil Leeder March 2, 2017, 7:30 p.m. UTC | #6
Hi Mark Z.,

On 3/2/2017 4:05 AM, Marc Zyngier wrote:
> On 01/03/17 21:36, Leeder, Neil wrote:
>> On 3/1/2017 1:10 PM, Mark Rutland wrote:
>>> KVM already has (architected) PMU support, and without a corresponding
>>> KVM patch this is at best insufficient. I don't imagine the KVM folk
>>> will be too thrilled about the prospect of emulating an IMPLEMENTATION
>>> DEFINED CPU feature like this.
>>
>> Does KVM handle ARMv7 PMU implementations? If so, do you know what it
>> does for the scorpion_* and krait_* implementations in
>> arch/arm/kernel/perf_events_v7.c? These extensions in ARMv8 are very
>> similar to the krait extensions, with some 64-bit tweaks, so could be
>> handled by KVM the same way it handles the ARMv7 cases.
>
> No, KVM doesn't handle the ARMv7 PMU at all. I'm not aware of the
> virtualization extensions being available on Scorpion or Krait, which
> makes it a moot point. What it handles is the PMUv3 architecture.

Thank you for the explanation.

This driver is specifically for Qualcomm Technologies server chips. They 
will not be in a heterogeneous environment with non-Qualcomm processors, 
so there should be no migration issues.

If we were to provide a patch which added KVM support for the 4 
additional registers here, would you consider reviewing it, or is adding 
implementation-defined registers a show-stopper?

Thanks,
Neil
Marc Zyngier March 3, 2017, 10:17 a.m. UTC | #7
On Thu, Mar 02 2017 at  7:30:53 pm GMT, "Leeder, Neil" <nleeder@codeaurora.org> wrote:
> Hi Mark Z.,
>
> On 3/2/2017 4:05 AM, Marc Zyngier wrote:
>> On 01/03/17 21:36, Leeder, Neil wrote:
>>> On 3/1/2017 1:10 PM, Mark Rutland wrote:
>>>> KVM already has (architected) PMU support, and without a corresponding
>>>> KVM patch this is at best insufficient. I don't imagine the KVM folk
>>>> will be too thrilled about the prospect of emulating an IMPLEMENTATION
>>>> DEFINED CPU feature like this.
>>>
>>> Does KVM handle ARMv7 PMU implementations? If so, do you know what it
>>> does for the scorpion_* and krait_* implementations in
>>> arch/arm/kernel/perf_events_v7.c? These extensions in ARMv8 are very
>>> similar to the krait extensions, with some 64-bit tweaks, so could be
>>> handled by KVM the same way it handles the ARMv7 cases.
>>
>> No, KVM doesn't handle the ARMv7 PMU at all. I'm not aware of the
>> virtualization extensions being available on Scorpion or Krait, which
>> makes it a moot point. What it handles is the PMUv3 architecture.
>
> Thank you for the explanation.
>
> This driver is specifically for Qualcomm Technologies server
> chips. They will not be in a heterogeneous environment with
> non-Qualcomm processors, so there should be no migration issues.

How do you know that? I'm afraid this is not something you or I can
guarantee (and even less enforce).

> If we were to provide a patch which added KVM support for the 4
> additional registers here, would you consider reviewing it, or is
> adding implementation-defined registers a show-stopper?

At the moment, it seems that there is a consensus against adding support
for an IMPDEF PMU.

Thanks,

	M.

Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 0fbd7ef..ed4842d 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -535,6 +535,32 @@ 
 	.attrs = armv8_pmuv3_format_attrs,
 };
 
+/* NRCCG format for qc perf raw codes. */
+PMU_FORMAT_ATTR(prefix, "config:16-19");
+PMU_FORMAT_ATTR(reg,    "config:12-15");
+PMU_FORMAT_ATTR(code,   "config:4-11");
+PMU_FORMAT_ATTR(group,  "config:0-3");
+
+static struct attribute *qc_ev_formats[] = {
+	&format_attr_prefix.attr,
+	&format_attr_reg.attr,
+	&format_attr_code.attr,
+	&format_attr_group.attr,
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group qc_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = qc_ev_formats,
+};
+
+static bool qc_pmu;
+static void qc_pmu_enable_event(struct perf_event *event,
+				struct hw_perf_event *hwc, int idx);
+static void qc_pmu_disable_event(struct perf_event *event,
+				 struct hw_perf_event *hwc);
+
 /*
  * Perf Events' indices
  */
@@ -704,10 +730,13 @@  static void armv8pmu_enable_event(struct perf_event *event)
 	 */
 	armv8pmu_disable_counter(idx);
 
-	/*
-	 * Set event (if destined for PMNx counters).
-	 */
-	armv8pmu_write_evtype(idx, hwc->config_base);
+	if (qc_pmu)
+		qc_pmu_enable_event(event, hwc, idx);
+	else
+		/*
+		 * Set event (if destined for PMNx counters).
+		 */
+		armv8pmu_write_evtype(idx, hwc->config_base);
 
 	/*
 	 * Enable interrupt for this counter
@@ -740,6 +769,9 @@  static void armv8pmu_disable_event(struct perf_event *event)
 	 */
 	armv8pmu_disable_counter(idx);
 
+	if (qc_pmu)
+		qc_pmu_disable_event(event, hwc);
+
 	/*
 	 * Disable interrupt for this counter
 	 */
@@ -929,6 +961,269 @@  static int armv8_pmuv3_map_event(struct perf_event *event)
 	return hw_event_id;
 }
 
+/*
+ * Events for Qualcomm Technologies CPU PMU can be envisioned as a 2-dimensional
+ * array. Each column represents a group of events. There are 8 groups.
+ * Only one entry from each group can be in use at a time.
+ *
+ * There are several of these arrays, each controlled by a Region Event
+ * Selection Register (RESR).
+ *
+ * To distinguish Qualcomm Technologies events from ARM architectural events
+ * there is a prefix value specified in event encoding. Currently the only
+ * non-0 value defined is 1.
+ *
+ * Qualcomm Technologies events are specified as 0xNRCCG, where:
+ *   N  = Prefix (1 = Qualcomm Technologies events)
+ *   R  = RESR
+ *   CC = code (2 hex digits specifying array row)
+ *   G  = group (array column).
+ *
+ * In addition the ARM architectural events are also supported. They are
+ * differentiated from the Qualcomm Technologies events by having Prefix = 0.
+ */
+#define pmresr0_el0         sys_reg(3, 5, 11, 3, 0)
+#define pmresr1_el0         sys_reg(3, 5, 11, 3, 2)
+#define pmresr2_el0         sys_reg(3, 5, 11, 3, 4)
+#define pmxevcntcr_el0      sys_reg(3, 5, 11, 0, 3)
+
+#define QC_RESR_ENABLE      BIT_ULL(63)
+
+#define QC_EVT_PREFIX       1
+#define QC_EVT_PFX_SHIFT    16
+#define QC_EVT_REG_SHIFT    12
+#define QC_EVT_CODE_SHIFT   4
+#define QC_EVT_GRP_SHIFT    0
+#define QC_EVT_MASK         GENMASK(QC_EVT_PFX_SHIFT + 3,  0)
+#define QC_EVT_PFX_MASK     GENMASK(QC_EVT_PFX_SHIFT + 3,  QC_EVT_PFX_SHIFT)
+#define QC_EVT_REG_MASK     GENMASK(QC_EVT_REG_SHIFT + 3,  QC_EVT_REG_SHIFT)
+#define QC_EVT_CODE_MASK    GENMASK(QC_EVT_CODE_SHIFT + 7, QC_EVT_CODE_SHIFT)
+#define QC_EVT_GRP_MASK     GENMASK(QC_EVT_GRP_SHIFT + 3,  QC_EVT_GRP_SHIFT)
+#define QC_EVT_PFX(event)   (((event) & QC_EVT_PFX_MASK)  >> QC_EVT_PFX_SHIFT)
+#define QC_EVT_REG(event)   (((event) & QC_EVT_REG_MASK)  >> QC_EVT_REG_SHIFT)
+#define QC_EVT_CODE(event)  (((event) & QC_EVT_CODE_MASK) >> QC_EVT_CODE_SHIFT)
+#define QC_EVT_GROUP(event) (((event) & QC_EVT_GRP_MASK)  >> QC_EVT_GRP_SHIFT)
+
+#define QC_GROUPS_PER_REG   8
+#define QC_BITS_PER_GROUP   8
+#define QC_MAX_GROUP        7
+#define QC_FALKOR_MAX_RESR  2
+
+static int qc_max_resr;
+
+/*
+ * No CPU implementation can exceed this number of RESRs.
+ *
+ * Used as a sanity check: detect a future CPU with number of RESRs * groups
+ * which exceeds the size of the event_conflicts element.
+ */
+#define QC_MAX_RESRS (ARMPMU_MAX_EVENT_CONFLICTS / (QC_MAX_GROUP + 1))
+
+static const u8 qc_evt_type_base[3] = {0xd8, 0xe0, 0xe8};
+
+static inline void qc_write_pmxevcntcr(u32 val)
+{
+	write_sysreg_s(val, pmxevcntcr_el0);
+}
+
+static void qc_write_pmresr(int reg, u64 val)
+{
+	if (reg > qc_max_resr)
+		return;
+
+	switch (reg) {
+	case 0:
+		write_sysreg_s(val, pmresr0_el0);
+		break;
+	case 1:
+		write_sysreg_s(val, pmresr1_el0);
+		break;
+	case 2:
+		write_sysreg_s(val, pmresr2_el0);
+		break;
+	}
+}
+
+static u64 qc_read_pmresr(int reg)
+{
+	u64 val = 0;
+
+	if (reg > qc_max_resr)
+		return 0;
+
+	switch (reg) {
+	case 0:
+		val = read_sysreg_s(pmresr0_el0);
+		break;
+	case 1:
+		val = read_sysreg_s(pmresr1_el0);
+		break;
+	case 2:
+		val = read_sysreg_s(pmresr2_el0);
+		break;
+	}
+
+	return val;
+}
+
+static inline u64 qc_get_columnmask(u32 group)
+{
+	u32 shift = QC_BITS_PER_GROUP * group;
+	u32 mask_size = QC_BITS_PER_GROUP;
+
+	/*
+	 * The max group is 1 bit smaller than the other groups,
+	 * because the MS bit in the register is the enable.
+	 */
+	if (group == QC_MAX_GROUP)
+		mask_size--;
+
+	return GENMASK_ULL(shift + mask_size - 1, shift);
+}
+
+static void qc_set_resr(int reg, int code, int group)
+{
+	u64 val;
+
+	val = qc_read_pmresr(reg) & ~qc_get_columnmask(group);
+	val |= ((u64)code << (group * QC_BITS_PER_GROUP));
+	val |= QC_RESR_ENABLE;
+	qc_write_pmresr(reg, val);
+}
+
+static void qc_clear_resr(int reg, int group)
+{
+	u64 val = qc_read_pmresr(reg) & ~qc_get_columnmask(group);
+
+	qc_write_pmresr(reg, val);
+}
+
+static void qc_clear_resrs(void)
+{
+	unsigned int i;
+
+	for (i = 0; i <= qc_max_resr; i++)
+		qc_write_pmresr(i, 0);
+}
+
+static void qc_pmu_reset(void *info)
+{
+	qc_clear_resrs();
+	armv8pmu_reset(info);
+}
+
+static int qc_verify_event(struct perf_event *event)
+{
+	struct perf_event *sibling;
+	u8 prefix  = QC_EVT_PFX(event->attr.config);
+	u8 reg     = QC_EVT_REG(event->attr.config);
+	u8 group   = QC_EVT_GROUP(event->attr.config);
+
+	/* No prefix, so not a qc event - nothing else to verify */
+	if (!prefix)
+		return 0;
+
+	if ((group > QC_MAX_GROUP) || (reg > qc_max_resr) ||
+	    (prefix != QC_EVT_PREFIX))
+		return -ENOENT;
+
+	if ((event != event->group_leader) &&
+	    (QC_EVT_GROUP(event->group_leader->attr.config) == group)) {
+		pr_debug_ratelimited(
+			"Column exclusion: conflicting events %llx %llx\n",
+			event->group_leader->attr.config,
+			event->attr.config);
+		return -ENOENT;
+	}
+
+	list_for_each_entry(sibling, &event->group_leader->sibling_list,
+			    group_entry) {
+		if ((sibling != event) &&
+		    (QC_EVT_GROUP(sibling->attr.config) == group)) {
+			pr_debug_ratelimited(
+				"Column exclusion: conflicting events %llx %llx\n",
+				sibling->attr.config,
+				event->attr.config);
+			return -ENOENT;
+		}
+	}
+
+	return 0;
+}
+
+static void qc_pmu_enable_event(struct perf_event *event,
+				struct hw_perf_event *hwc, int idx)
+{
+	unsigned int reg, code, group;
+
+	if (QC_EVT_PFX(hwc->config_base) != QC_EVT_PREFIX) {
+		armv8pmu_write_evtype(idx, hwc->config_base);
+		return;
+	}
+
+	reg = QC_EVT_REG(hwc->config_base);
+	code = QC_EVT_CODE(hwc->config_base);
+	group = QC_EVT_GROUP(hwc->config_base);
+
+	armv8pmu_write_evtype(idx,
+			      (hwc->config_base & ~QC_EVT_MASK) |
+			      qc_evt_type_base[reg] | group);
+	qc_write_pmxevcntcr(0);
+	qc_set_resr(reg, code, group);
+}
+
+static void qc_pmu_disable_event(struct perf_event *event,
+				 struct hw_perf_event *hwc)
+{
+	if (QC_EVT_PFX(hwc->config_base) == QC_EVT_PREFIX)
+		qc_clear_resr(QC_EVT_REG(hwc->config_base),
+			      QC_EVT_GROUP(hwc->config_base));
+}
+
+static int qc_get_event_idx(struct pmu_hw_events *cpuc,
+			    struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+	int bit = -1;
+	unsigned int reg, group;
+
+	/*
+	 * Check for column exclusion: event column already in use by another
+	 * event. This is for events which are not in the same group.
+	 * Conflicting events in the same group are detected in event_init.
+	 */
+	if (QC_EVT_PFX(hwc->config_base) == QC_EVT_PREFIX) {
+		reg = QC_EVT_REG(hwc->config_base);
+		group = QC_EVT_GROUP(hwc->config_base);
+
+		bit = reg * QC_GROUPS_PER_REG + group;
+		if (test_bit(bit, cpuc->event_conflicts))
+			return -EAGAIN;
+	}
+
+	idx = armv8pmu_get_event_idx(cpuc, event);
+
+	if ((idx >= 0) && (bit >= 0))
+		set_bit(bit, cpuc->event_conflicts);
+
+	return idx;
+}
+
+static void qc_clear_event_idx(struct pmu_hw_events *cpuc,
+			    struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	unsigned int reg, group;
+
+	if (QC_EVT_PFX(hwc->config_base) == QC_EVT_PREFIX) {
+		reg = QC_EVT_REG(hwc->config_base);
+		group = QC_EVT_GROUP(hwc->config_base);
+		clear_bit(reg * QC_GROUPS_PER_REG + group,
+			  cpuc->event_conflicts);
+	}
+}
+
 static int armv8_a53_map_event(struct perf_event *event)
 {
 	return armpmu_map_event(event, &armv8_a53_perf_map,
@@ -957,6 +1252,19 @@  static int armv8_vulcan_map_event(struct perf_event *event)
 				ARMV8_PMU_EVTYPE_EVENT);
 }
 
+static int armv8_qc_map_event(struct perf_event *event)
+{
+	int err;
+
+	err = qc_verify_event(event);
+	if (err < 0)
+		return err;
+
+	return armpmu_map_event(event, &armv8_pmuv3_perf_map,
+				&armv8_pmuv3_perf_cache_map,
+				QC_EVT_MASK);
+}
+
 static void __armv8pmu_probe_pmu(void *info)
 {
 	struct arm_pmu *cpu_pmu = info;
@@ -1071,6 +1379,32 @@  static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
 	return armv8pmu_probe_pmu(cpu_pmu);
 }
 
+static int armv8_falkor_pmu_init(struct arm_pmu *cpu_pmu)
+{
+	armv8_pmu_init(cpu_pmu);
+	cpu_pmu->name			= "qcom_pmuv3";
+	cpu_pmu->map_event		= armv8_qc_map_event;
+	cpu_pmu->reset			= qc_pmu_reset;
+	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] =
+		&armv8_pmuv3_events_attr_group;
+	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] =
+		&qc_pmu_format_attr_group;
+	cpu_pmu->get_event_idx		= qc_get_event_idx;
+	cpu_pmu->clear_event_idx	= qc_clear_event_idx;
+
+	qc_max_resr = QC_FALKOR_MAX_RESR;
+
+	/*
+	 * Sanity check: qc_max_resr is a 0-based index, so qc_max_resr + 1
+	 * RESRs must fit in the event_conflicts bitmap.
+	 */
+	if (qc_max_resr >= QC_MAX_RESRS) {
+		pr_err("qcom_pmuv3: max number of RESRs exceeded\n");
+		return -EINVAL;
+	}
+
+	qc_clear_resrs();
+	qc_pmu = true;
+
+	return armv8pmu_probe_pmu(cpu_pmu);
+}
+
 static const struct of_device_id armv8_pmu_of_device_ids[] = {
 	{.compatible = "arm,armv8-pmuv3",	.data = armv8_pmuv3_init},
 	{.compatible = "arm,cortex-a53-pmu",	.data = armv8_a53_pmu_init},
@@ -1087,6 +1421,8 @@  static int armv8_vulcan_pmu_init(struct arm_pmu *cpu_pmu)
  * aren't supported by the current PMU are disabled.
  */
 static const struct pmu_probe_info armv8_pmu_probe_table[] = {
+	PMU_PROBE(QCOM_CPU_PART_FALKOR_V1 << MIDR_PARTNUM_SHIFT,
+		  MIDR_PARTNUM_MASK, armv8_falkor_pmu_init),
 	PMU_PROBE(0, 0, armv8_pmuv3_init), /* enable all defined counters */
 	{ /* sentinel value */ }
 };
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index f652cd1..d9b5f44 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -37,6 +37,8 @@  struct arm_pmu_platdata {
  */
 #define ARMPMU_MAX_HWEVENTS		32
 
+#define ARMPMU_MAX_EVENT_CONFLICTS	64
+
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
 #define CACHE_OP_UNSUPPORTED		0xFFFF
@@ -65,6 +67,12 @@  struct pmu_hw_events {
 	DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS);
 
 	/*
+	 * Keep track of implementation-specific events that
+	 * hardware can't schedule concurrently
+	 */
+	DECLARE_BITMAP(event_conflicts, ARMPMU_MAX_EVENT_CONFLICTS);
+
+	/*
 	 * Hardware lock to serialize accesses to PMU registers. Needed for the
 	 * read/modify/write sequences.
 	 */