diff mbox series

[V5,4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection

Message ID 20221107062514.2851047-5-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64/perf: Enable branch stack sampling | expand

Commit Message

Anshuman Khandual Nov. 7, 2022, 6:25 a.m. UTC
This adds arm pmu infrastrure to probe BRBE implementation's attributes via
driver exported callbacks later. The actual BRBE feature detection will be
added by the driver itself.

CPU specific BRBE entries, cycle count, format support gets detected during
PMU init. This information gets saved in per-cpu struct pmu_hw_events which
later helps in operating BRBE during a perf event context.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

Mark Rutland Nov. 18, 2022, 6:01 p.m. UTC | #1
On Mon, Nov 07, 2022 at 11:55:11AM +0530, Anshuman Khandual wrote:
> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
> driver exported callbacks later. The actual BRBE feature detection will be
> added by the driver itself.
> 
> CPU specific BRBE entries, cycle count, format support gets detected during
> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
> later helps in operating BRBE during a perf event context.

Do we expect this to vary between CPUs handled by the same struct arm_pmu ?

> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
> index 933b96e243b8..acdc445081aa 100644
> --- a/drivers/perf/arm_pmu_platform.c
> +++ b/drivers/perf/arm_pmu_platform.c
> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>  	return err;
>  }
>  
> +static void arm_brbe_probe_cpu(void *info)
> +{
> +	struct pmu_hw_events *hw_events;
> +	struct arm_pmu *armpmu = info;
> +
> +	/*
> +	 * Return from here, if BRBE driver has not been
> +	 * implemented for this PMU. This helps prevent
> +	 * kernel crash later when brbe_probe() will be
> +	 * called on the PMU.
> +	 */
> +	if (!armpmu->brbe_probe)
> +		return;

Since this is a field on struct arm_pmu, why doesn't armpmu_request_brbe()
check this before calling smp_call_function_single(), to avoid the redundant
IPI?

> +
> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
> +	armpmu->brbe_probe(hw_events);
> +}
> +
> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
> +{
> +	int cpu, err = 0;
> +
> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);

Why does this need to be called on each CPU in the supported_cpus mask?

I don't see anything here to handle late hotplug, so this looks suspicious.
Either we're missing something, or it's redundant at boot time.

Thanks,
Mark.

> +		if (err)
> +			return err;
> +	}
> +	return err;
> +}
> +
>  static void armpmu_free_irqs(struct arm_pmu *armpmu)
>  {
>  	int cpu;
> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>  	if (ret)
>  		goto out_free_irqs;
>  
> +	ret = armpmu_request_brbe(pmu);
> +	if (ret)
> +		goto out_free_irqs;
> +
>  	ret = armpmu_register(pmu);
>  	if (ret) {
>  		dev_err(dev, "failed to register PMU devices!\n");
> -- 
> 2.25.1
>
Anshuman Khandual Nov. 21, 2022, 6:36 a.m. UTC | #2
On 11/18/22 23:31, Mark Rutland wrote:
> On Mon, Nov 07, 2022 at 11:55:11AM +0530, Anshuman Khandual wrote:
>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>> driver exported callbacks later. The actual BRBE feature detection will be
>> added by the driver itself.
>>
>> CPU specific BRBE entries, cycle count, format support gets detected during
>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>> later helps in operating BRBE during a perf event context.
> 
> Do we expect this to vary between CPUs handled by the same struct arm_pmu ?

BRBE registers are per CPU, and the spec does not assert about BRBE properties
being the same across the system, served via same the struct arm_pmu. Hence it
would be inaccurate to make that assumption, which might have just avoided all
these IPI based probes during boot.

> 
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>> index 933b96e243b8..acdc445081aa 100644
>> --- a/drivers/perf/arm_pmu_platform.c
>> +++ b/drivers/perf/arm_pmu_platform.c
>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>  	return err;
>>  }
>>  
>> +static void arm_brbe_probe_cpu(void *info)
>> +{
>> +	struct pmu_hw_events *hw_events;
>> +	struct arm_pmu *armpmu = info;
>> +
>> +	/*
>> +	 * Return from here, if BRBE driver has not been
>> +	 * implemented for this PMU. This helps prevent
>> +	 * kernel crash later when brbe_probe() will be
>> +	 * called on the PMU.
>> +	 */
>> +	if (!armpmu->brbe_probe)
>> +		return;
> 
> Since this is a field on struct arm_pmu, why doesn't armpmu_request_brbe()
> check this before calling smp_call_function_single(), to avoid the redundant
> IPI?

Makes sense, I will move the check inside armpmu_request_brbe() with return
code -ENODEV when not available.

> 
>> +
>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>> +	armpmu->brbe_probe(hw_events);
>> +}
>> +
>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>> +{
>> +	int cpu, err = 0;
>> +
>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
> 
> Why does this need to be called on each CPU in the supported_cpus mask?

Is not supported_cpus derived after partitioning the IRQ in pmu_parse_percpu_irq().
The idea is to fill up BRBE buffer attributes, on all such supported cpus which could
trigger PMU interrupt. Is the concern, that not all cpus in supported_cpus mask might
not be online during boot, hence IPIs could not be served, hence BRBE attributed for
them could not be fetched ?

> 
> I don't see anything here to handle late hotplug, so this looks suspicious.

Right, I should add cpu hotplug handling, otherwise risk loosing BRBE support on cpus
which might have been offline during boot i.e when above IPI based probe happened ?

> Either we're missing something, or it's redundant at boot time.

Should we add cpu hotplug online-offline handlers like some other PMU drivers ? Let
me know if there are some other concerns.

cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
			arm_brbe_cpu_startup,
		        arm_brbe_cpu_teardown)
Mark Rutland Nov. 21, 2022, 11:39 a.m. UTC | #3
On Mon, Nov 21, 2022 at 12:06:31PM +0530, Anshuman Khandual wrote:
> 
> 
> On 11/18/22 23:31, Mark Rutland wrote:
> > On Mon, Nov 07, 2022 at 11:55:11AM +0530, Anshuman Khandual wrote:
> >> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
> >> driver exported callbacks later. The actual BRBE feature detection will be
> >> added by the driver itself.
> >>
> >> CPU specific BRBE entries, cycle count, format support gets detected during
> >> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
> >> later helps in operating BRBE during a perf event context.
> > 
> > Do we expect this to vary between CPUs handled by the same struct arm_pmu ?
> 
> BRBE registers are per CPU, and the spec does not assert about BRBE properties
> being the same across the system, served via same the struct arm_pmu.

The same is true of the PMU, and struct arm_pmu does not cover the whole
system, it covers each *micro-architecture* within the system.

I think BRBE should be treated the same, i.e. uniform *within* a struct
arm_pmu.

> Hence it would be inaccurate to make that assumption, which might have just
> avoided all these IPI based probes during boot.

FWIW, I would be happy to IPI all CPUs during boot to verify uniformity of CPUs
within an arm_pmu; I just don't think that BRBE should be treated differently
from the rest of the PMU features.

[...]

> >> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
> >> +	armpmu->brbe_probe(hw_events);
> >> +}
> >> +
> >> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
> >> +{
> >> +	int cpu, err = 0;
> >> +
> >> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
> >> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
> > 
> > Why does this need to be called on each CPU in the supported_cpus mask?
> 
> Is not supported_cpus derived after partitioning the IRQ in pmu_parse_percpu_irq().
> The idea is to fill up BRBE buffer attributes, on all such supported cpus which could
> trigger PMU interrupt. Is the concern, that not all cpus in supported_cpus mask might
> not be online during boot, hence IPIs could not be served, hence BRBE attributed for
> them could not be fetched ?

As above, I think this is solvable if we mandate that BRBE must be uniform
*within* an arm_pmu's supported CPUs; then we only need one CPU in the
supported_cpus mask to be present at boot time, as with the rest of the PMU
code.

We could *verify* that when onlining a CPU.

> > I don't see anything here to handle late hotplug, so this looks suspicious.
> 
> Right, I should add cpu hotplug handling, otherwise risk loosing BRBE support on cpus
> which might have been offline during boot i.e when above IPI based probe happened ?
> 
> > Either we're missing something, or it's redundant at boot time.
> 
> Should we add cpu hotplug online-offline handlers like some other PMU drivers ? Let
> me know if there are some other concerns.
> 
> cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
> 			arm_brbe_cpu_startup,
> 		        arm_brbe_cpu_teardown)

We *could* add that, but that's going to require ordering against the existing
hooks for probing arm_pmu.

Why can't this hang off the exising hooks for arm_pmu? We're treating this as
part of the PMU anyway, so I don't understand why we should probe it
separately.

Thanks,
Mark.
Anshuman Khandual Nov. 28, 2022, 8:24 a.m. UTC | #4
On 11/21/22 17:09, Mark Rutland wrote:
> On Mon, Nov 21, 2022 at 12:06:31PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 11/18/22 23:31, Mark Rutland wrote:
>>> On Mon, Nov 07, 2022 at 11:55:11AM +0530, Anshuman Khandual wrote:
>>>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>>>> driver exported callbacks later. The actual BRBE feature detection will be
>>>> added by the driver itself.
>>>>
>>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>>> later helps in operating BRBE during a perf event context.
>>>
>>> Do we expect this to vary between CPUs handled by the same struct arm_pmu ?
>>
>> BRBE registers are per CPU, and the spec does not assert about BRBE properties
>> being the same across the system, served via same the struct arm_pmu.
> 
> The same is true of the PMU, and struct arm_pmu does not cover the whole
> system, it covers each *micro-architecture* within the system.
> 
> I think BRBE should be treated the same, i.e. uniform *within* a struct
> arm_pmu.

Understood, detected on one and verified on all ?

> 
>> Hence it would be inaccurate to make that assumption, which might have just
>> avoided all these IPI based probes during boot.
> 
> FWIW, I would be happy to IPI all CPUs during boot to verify uniformity of CPUs
> within an arm_pmu; I just don't think that BRBE should be treated differently
> from the rest of the PMU features.

Hence BRBE probing should be done inside an updated __armv8pmu_probe_pmu().

static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
{
        struct armv8pmu_probe_info probe = {
                .pmu = cpu_pmu,
                .present = false,
        };
        int ret;

        ret = smp_call_function_any(&cpu_pmu->supported_cpus,
                                    __armv8pmu_probe_pmu,
                                    &probe, 1);
        if (ret)
                return ret;

        return probe.present ? 0 : -ENODEV;
}

But if BRBE is assumed (and verified) to be same across the micro-architecture,
then following BRBE attributes when captured should be part of 'struct arm_pmu'
instead of 'struct pmu_hw_events' as is the case currently.

        /* Detected BRBE attributes */
        bool                            brbe_v1p1;
        int                             brbe_cc;
        int                             brbe_nr;
        int                             brbe_format;

> 
> [...]
> 
>>>> +	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>>> +	armpmu->brbe_probe(hw_events);
>>>> +}
>>>> +
>>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>>> +{
>>>> +	int cpu, err = 0;
>>>> +
>>>> +	for_each_cpu(cpu, &armpmu->supported_cpus) {
>>>> +		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>>
>>> Why does this need to be called on each CPU in the supported_cpus mask?
>>
>> Is not supported_cpus derived after partitioning the IRQ in pmu_parse_percpu_irq().
>> The idea is to fill up BRBE buffer attributes, on all such supported cpus which could
>> trigger PMU interrupt. Is the concern, that not all cpus in supported_cpus mask might
>> not be online during boot, hence IPIs could not be served, hence BRBE attributed for
>> them could not be fetched ?
> 
> As above, I think this is solvable if we mandate that BRBE must be uniform
> *within* an arm_pmu's supported CPUs; then we only need one CPU in the
> supported_cpus mask to be present at boot time, as with the rest of the PMU
> code.
> 
> We could *verify* that when onlining a CPU.

Understood.

> 
>>> I don't see anything here to handle late hotplug, so this looks suspicious.
>>
>> Right, I should add cpu hotplug handling, otherwise risk loosing BRBE support on cpus
>> which might have been offline during boot i.e when above IPI based probe happened ?
>>
>>> Either we're missing something, or it's redundant at boot time.
>>
>> Should we add cpu hotplug online-offline handlers like some other PMU drivers ? Let
>> me know if there are some other concerns.
>>
>> cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRVNAME,
>> 			arm_brbe_cpu_startup,
>> 		        arm_brbe_cpu_teardown)
> 
> We *could* add that, but that's going to require ordering against the existing
> hooks for probing arm_pmu.

Right.

> 
> Why can't this hang off the exising hooks for arm_pmu? We're treating this as
> part of the PMU anyway, so I don't understand why we should probe it
> separately.
Okay, will try and see what all changes are required to move the probing into generic
arm_pmu probe, and capture the BRBE attributes inside struct arm_pmu.
diff mbox series

Patch

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 933b96e243b8..acdc445081aa 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -172,6 +172,36 @@  static int armpmu_request_irqs(struct arm_pmu *armpmu)
 	return err;
 }
 
+static void arm_brbe_probe_cpu(void *info)
+{
+	struct pmu_hw_events *hw_events;
+	struct arm_pmu *armpmu = info;
+
+	/*
+	 * Return from here, if BRBE driver has not been
+	 * implemented for this PMU. This helps prevent
+	 * kernel crash later when brbe_probe() will be
+	 * called on the PMU.
+	 */
+	if (!armpmu->brbe_probe)
+		return;
+
+	hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
+	armpmu->brbe_probe(hw_events);
+}
+
+static int armpmu_request_brbe(struct arm_pmu *armpmu)
+{
+	int cpu, err = 0;
+
+	for_each_cpu(cpu, &armpmu->supported_cpus) {
+		err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
+		if (err)
+			return err;
+	}
+	return err;
+}
+
 static void armpmu_free_irqs(struct arm_pmu *armpmu)
 {
 	int cpu;
@@ -229,6 +259,10 @@  int arm_pmu_device_probe(struct platform_device *pdev,
 	if (ret)
 		goto out_free_irqs;
 
+	ret = armpmu_request_brbe(pmu);
+	if (ret)
+		goto out_free_irqs;
+
 	ret = armpmu_register(pmu);
 	if (ret) {
 		dev_err(dev, "failed to register PMU devices!\n");