diff mbox series

[v3,2/8] drivers/perf: hisi: Improve the detection of associated CPUs

Message ID 20241026072424.29887-3-yangyicong@huawei.com (mailing list archive)
State New, archived
Headers show
Series Refactor the common parts to the HiSilicon Uncore PMU core and cleanups | expand

Commit Message

Yicong Yang Oct. 26, 2024, 7:24 a.m. UTC
From: Yicong Yang <yangyicong@hisilicon.com>

Currently the associated CPUs are detected in the cpuhp online
callback. If the CPU's sccl_id or the ccl_id matches the PMU's,
they're associated. There's an exception that some PMUs locate
on the SICL and will match no CPUs. The events of these PMUs
can be opened on any online CPUs. To handle this we just check
whether the PMU's sccl_id is -1, if so we know it locates on
SICL and make any CPU associated to it.

This can be tweaked so in this patch just do the below changes:
- If the PMU doesn't match any CPU then associated it to online CPUs
- Choose the target CPU according to the NUMA affinity for opening
  events

The function is implemented by hisi_pmu_init_associated_cpus() and
invoked in hisi_pmu_init().

Also the associated_cpus are maintained with all the online CPUs. This
is redundant since we'll always schedule the events on the online CPUs.
Get rid of this and make associated_cpus contain offline CPUs as well.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++-----
 drivers/perf/hisilicon/hisi_uncore_pmu.h |  5 +++
 2 files changed, 49 insertions(+), 12 deletions(-)

Comments

Will Deacon Oct. 29, 2024, 1:28 p.m. UTC | #1
On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> Currently the associated CPUs are detected in the cpuhp online
> callback. If the CPU's sccl_id or the ccl_id matches the PMU's,
> they're associated. There's an exception that some PMUs locate
> on the SICL and will match no CPUs. The events of these PMUs
> can be opened on any online CPUs. To handle this we just check
> whether the PMU's sccl_id is -1, if so we know it locates on
> SICL and make any CPU associated to it.
> 
> This can be tweaked so in this patch just do the below changes:
> - If the PMU doesn't match any CPU then associated it to online CPUs
> - Choose the target CPU according to the NUMA affinity for opening
>   events
> 
> The function is implemented by hisi_pmu_init_associated_cpus() and
> invoked in hisi_pmu_init().
> 
> Also the associated_cpus are maintained with all the online CPUs. This
> is redundant since we'll always schedule the events on the online CPUs.
> Get rid of this and make associated_cpus contain offline CPUs as well.

I don't really understand the rationale for this change. Why is the new
behaviour better than the old one?

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++-----
>  drivers/perf/hisilicon/hisi_uncore_pmu.h |  5 +++
>  2 files changed, 49 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> index 416f72a813fc..c3549e16e0c3 100644
> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
> @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu)
>  }
>  EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU);
>  
> +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu)
> +{
> +	/*
> +	 * If the associated_cpus has already been initialized, for example
> +	 * determined by comparing the sccl_id and ccl_id with the CPU's
> +	 * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity
> +	 * and could be opened on any online CPU.
> +	 */
> +	if (!cpumask_empty(&hisi_pmu->associated_cpus))
> +		return;
> +
> +	cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask);

Is it always safe to access 'cpu_online_mask' here?

Will
Yicong Yang Oct. 30, 2024, 2:04 p.m. UTC | #2
On 2024/10/29 21:28, Will Deacon wrote:
> On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> Currently the associated CPUs are detected in the cpuhp online
>> callback. If the CPU's sccl_id or the ccl_id matches the PMU's,
>> they're associated. There's an exception that some PMUs locate
>> on the SICL and will match no CPUs. The events of these PMUs
>> can be opened on any online CPUs. To handle this we just check
>> whether the PMU's sccl_id is -1, if so we know it locates on
>> SICL and make any CPU associated to it.
>>
>> This can be tweaked so in this patch just do the below changes:
>> - If the PMU doesn't match any CPU then associated it to online CPUs
>> - Choose the target CPU according to the NUMA affinity for opening
>>   events
>>
>> The function is implemented by hisi_pmu_init_associated_cpus() and
>> invoked in hisi_pmu_init().
>>
>> Also the associated_cpus are maintained with all the online CPUs. This
>> is redundant since we'll always schedule the events on the online CPUs.
>> Get rid of this and make associated_cpus contain offline CPUs as well.
> 
> I don't really understand the rationale for this change. Why is the new
> behaviour better than the old one?
> 

Thanks for taking a look.

As mentioned in the commit, we have 2 types of PMU here:
1) PMUs locate on SCCL (Super CPU Cluster *), associated with certain CCL(CPU cluster *)(e.g. L3C PMU)
   or not (e.g. DDRC PMU)
2) PMUs locate on the SICL (Super IO Cluster *), which has no association with certain CPU
   topology (e.g. CPA PMU)

Currently we find associated CPUs in the cpuhp callbacks by comparing the CPU's
MPIDR with the PMU's SCCL ID and CCL ID. This will be fine for type 1) PMUs but
not for type 2) PMUs since no CPU will match a SICL ID. We do a trick here,
make type 2) PMUs's SCCL to -1 and match all the CPUs. So in fact type 2) PMUs
are already associated with online CPUs but in an implicit way. With this patch
the behaviour will be more clear.

Another thing is about NUMA locality. Usually for type 2) PMUs they maybe on
different NUMA node which is not considered in current approach, all SICL PMUs
are stacked on CPU0. With the patch the NUMA node information will be considered
if provided by the firmware.

[*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1

>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++-----
>>  drivers/perf/hisilicon/hisi_uncore_pmu.h |  5 +++
>>  2 files changed, 49 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> index 416f72a813fc..c3549e16e0c3 100644
>> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU);
>>  
>> +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu)
>> +{
>> +	/*
>> +	 * If the associated_cpus has already been initialized, for example
>> +	 * determined by comparing the sccl_id and ccl_id with the CPU's
>> +	 * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity
>> +	 * and could be opened on any online CPU.
>> +	 */
>> +	if (!cpumask_empty(&hisi_pmu->associated_cpus))
>> +		return;
>> +
>> +	cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask);
> 
> Is it always safe to access 'cpu_online_mask' here?
> 

It should be safe. This function will be called in hisi_pmu_init() in the module
init stage so the cpu_online_mask() should be populated, at least the current
running CPU is contained. The hisi_pmu->associated_cpus will be used along
with the online CPU checking in cpuhp callbacks so we're unlikely to use an
offline CPU mistakenly. Checked cpumask_local_spread() and it also use the
cpu_online_mask directly without synchronization against cpuhp process, so
it also should be ok here. Please correct my if any cases I missed.

Thanks,
Yicong
Yicong Yang Nov. 6, 2024, 8:33 a.m. UTC | #3
Hi Will,

Further comment on this patch?

On 2024/10/30 22:04, Yicong Yang wrote:
> On 2024/10/29 21:28, Will Deacon wrote:
>> On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote:
>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>
>>> Currently the associated CPUs are detected in the cpuhp online
>>> callback. If the CPU's sccl_id or the ccl_id matches the PMU's,
>>> they're associated. There's an exception that some PMUs locate
>>> on the SICL and will match no CPUs. The events of these PMUs
>>> can be opened on any online CPUs. To handle this we just check
>>> whether the PMU's sccl_id is -1, if so we know it locates on
>>> SICL and make any CPU associated to it.
>>>
>>> This can be tweaked so in this patch just do the below changes:
>>> - If the PMU doesn't match any CPU then associated it to online CPUs
>>> - Choose the target CPU according to the NUMA affinity for opening
>>>   events
>>>
>>> The function is implemented by hisi_pmu_init_associated_cpus() and
>>> invoked in hisi_pmu_init().
>>>
>>> Also the associated_cpus are maintained with all the online CPUs. This
>>> is redundant since we'll always schedule the events on the online CPUs.
>>> Get rid of this and make associated_cpus contain offline CPUs as well.
>>
>> I don't really understand the rationale for this change. Why is the new
>> behaviour better than the old one?
>>
> 
> Thanks for taking a look.
> 
> As mentioned in the commit, we have 2 types of PMU here:
> 1) PMUs locate on SCCL (Super CPU Cluster *), associated with certain CCL(CPU cluster *)(e.g. L3C PMU)
>    or not (e.g. DDRC PMU)
> 2) PMUs locate on the SICL (Super IO Cluster *), which has no association with certain CPU
>    topology (e.g. CPA PMU)
> 
> Currently we find associated CPUs in the cpuhp callbacks by comparing the CPU's
> MPIDR with the PMU's SCCL ID and CCL ID. This will be fine for type 1) PMUs but
> not for type 2) PMUs since no CPU will match a SICL ID. We do a trick here,
> make type 2) PMUs's SCCL to -1 and match all the CPUs. So in fact type 2) PMUs
> are already associated with online CPUs but in an implicit way. With this patch
> the behaviour will be more clear.
> 
> Another thing is about NUMA locality. Usually for type 2) PMUs they maybe on
> different NUMA node which is not considered in current approach, all SICL PMUs
> are stacked on CPU0. With the patch the NUMA node information will be considered
> if provided by the firmware.
> 
> [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1
> 

Another thing is that if the associated CPUs are all offline, previous approach
will show -1 by cpumask while with this patch user will always see a usable CPU
(this is ok since the uncore events are is not associated with certain CPU context
and can be open on any online CPU):

// In the previous approach
[root@localhost devices]# cat hisi_sccl3_l3c1/cpumask
8
// offline all the CPUs sharing sccl3_l3c1
[root@localhost devices]# cat hisi_sccl3_l3c1/cpumask
-1

This should make no difference for the function since perf tool will find an online
CPU for opening the event even if it gets -1 from the cpumask. But -1 maybe a bit
confusing.

Thanks.

>>>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>> ---
>>>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++-----
>>>  drivers/perf/hisilicon/hisi_uncore_pmu.h |  5 +++
>>>  2 files changed, 49 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>>> index 416f72a813fc..c3549e16e0c3 100644
>>> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
>>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>>> @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu)
>>>  }
>>>  EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU);
>>>  
>>> +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu)
>>> +{
>>> +	/*
>>> +	 * If the associated_cpus has already been initialized, for example
>>> +	 * determined by comparing the sccl_id and ccl_id with the CPU's
>>> +	 * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity
>>> +	 * and could be opened on any online CPU.
>>> +	 */
>>> +	if (!cpumask_empty(&hisi_pmu->associated_cpus))
>>> +		return;
>>> +
>>> +	cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask);
>>
>> Is it always safe to access 'cpu_online_mask' here?
>>
> 
> It should be safe. This function will be called in hisi_pmu_init() in the module
> init stage so the cpu_online_mask() should be populated, at least the current
> running CPU is contained. The hisi_pmu->associated_cpus will be used along
> with the online CPU checking in cpuhp callbacks so we're unlikely to use an
> offline CPU mistakenly. Checked cpumask_local_spread() and it also use the
> cpu_online_mask directly without synchronization against cpuhp process, so
> it also should be ok here. Please correct my if any cases I missed.
> 
> Thanks,
> Yicong
> 
> 
> 
> .
>
Will Deacon Nov. 6, 2024, 11:51 a.m. UTC | #4
On Wed, Nov 06, 2024 at 04:33:18PM +0800, Yicong Yang wrote:
> Hi Will,
> 
> Further comment on this patch?

You failed to convince me that copying the online mask is safe, but I
haven't had time to look into that in more depth myself. Saying "It should
be safe" isn't really enough -- it _must_ be safe!

Will
Yicong Yang Nov. 7, 2024, 2:11 p.m. UTC | #5
On 2024/11/6 19:51, Will Deacon wrote:
> On Wed, Nov 06, 2024 at 04:33:18PM +0800, Yicong Yang wrote:
>> Hi Will,
>>
>> Further comment on this patch?
> 
> You failed to convince me that copying the online mask is safe, but I
> haven't had time to look into that in more depth myself. Saying "It should
> be safe" isn't really enough -- it _must_ be safe!
> 

I assume the "safe" here means we won't have trouble if CPU in the copied cpumask
offlined since we don't synchronize with the cpuhp here. Accessing cpu_online_mask
itself is safe since it's a piece of static memory.

We'll initialize the hisi_pmu::associated_cpus from cpu_online_mask in the below
cases. For other cases the associated CPUs is initalized in the cpuhp callback
and we won't come here.
1) for a PMU does have associated CPUs like L3C PMU, but the associcated CPUs
   not onlined at probe
2) for a PMU has no association like CPA PMU which locates on a SICL. Before this
   patch this kind of PMU'associated CPUs are initliazed in the cpuhp callbacks

For the above 2 cases it is safe since the driver's not using hisi_pmu::assoicated_cpus
directly but combined with cpu_online_masks. PMU indicates the user the preferred CPU
to open events on by "cpumask" sysfs which shows hisi_pmu::on_cpu. It's initialized/updated in:
1) hisi_pmu_init_associated_cpus() by cpumask_local_spread() which tries to
   found a nearest CPU of the node that the device locates
2) cpuhp callbacks we registered where holds the cpumap upate lock.

Actually we allow hisi_pmu::associated_cpus contains offline CPUs which is also
mentioned in the commit.

Based on above it's safe to use the cpu_online_mask here. But I find one case
doesn't covered by this patch - if boot with maxcpus=1, offline CPU X and offline
CPU0 (both are not assocaited CPUs), hisi_pmu::on_cpu will not be updated. Since
hisi_pmu::associated_cpus won't be updated when onlining an unassociated CPU
and hisi_pmu::on_cpu is selected from the intersection of the associated CPUs
and the online CPUs, the intersection will be empty in this case.

I'll see how to handle in this case and update the patch.

Thanks.
diff mbox series

Patch

diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index 416f72a813fc..c3549e16e0c3 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -399,6 +399,27 @@  void hisi_uncore_pmu_disable(struct pmu *pmu)
 }
 EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU);
 
+static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu)
+{
+	/*
+	 * If the associated_cpus has already been initialized, for example
+	 * determined by comparing the sccl_id and ccl_id with the CPU's
+	 * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity
+	 * and could be opened on any online CPU.
+	 */
+	if (!cpumask_empty(&hisi_pmu->associated_cpus))
+		return;
+
+	cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask);
+	hisi_pmu->dummy_associated_cpus = true;
+
+	/*
+	 * Otherwise the associated CPUs haven't been online yet. Pick a
+	 * nearest CPU according to the PMU's numa node instead.
+	 */
+	hisi_pmu->on_cpu = cpumask_local_spread(0, dev_to_node(hisi_pmu->dev));
+	WARN_ON(irq_set_affinity(hisi_pmu->irq, cpumask_of(hisi_pmu->on_cpu)));
+}
 
 /*
  * The Super CPU Cluster (SCCL) and CPU Cluster (CCL) IDs can be
@@ -446,10 +467,6 @@  static bool hisi_pmu_cpu_is_associated_pmu(struct hisi_pmu *hisi_pmu)
 {
 	int sccl_id, ccl_id;
 
-	/* If SCCL_ID is -1, the PMU is in a SICL and has no CPU affinity */
-	if (hisi_pmu->sccl_id == -1)
-		return true;
-
 	if (hisi_pmu->ccl_id == -1) {
 		/* If CCL_ID is -1, the PMU only shares the same SCCL */
 		hisi_read_sccl_and_ccl_id(&sccl_id, NULL);
@@ -467,13 +484,29 @@  int hisi_uncore_pmu_online_cpu(unsigned int cpu, struct hlist_node *node)
 	struct hisi_pmu *hisi_pmu = hlist_entry_safe(node, struct hisi_pmu,
 						     node);
 
-	if (!hisi_pmu_cpu_is_associated_pmu(hisi_pmu))
-		return 0;
+	/*
+	 * If the CPU is not in the associated_cpus, it maybe a new CPU.
+	 * Test whether it's associated or not.
+	 */
+	if (!cpumask_test_cpu(cpu, &hisi_pmu->associated_cpus)) {
+		if (!hisi_pmu_cpu_is_associated_pmu(hisi_pmu))
+			return 0;
+
+		/*
+		 * We found an associated CPU so we don't need to use the dummy
+		 * associated CPUs. Update it.
+		 */
+		if (hisi_pmu->dummy_associated_cpus) {
+			cpumask_clear(&hisi_pmu->associated_cpus);
+			hisi_pmu->dummy_associated_cpus = false;
+		}
 
-	cpumask_set_cpu(cpu, &hisi_pmu->associated_cpus);
+		cpumask_set_cpu(cpu, &hisi_pmu->associated_cpus);
+	}
 
-	/* If another CPU is already managing this PMU, simply return. */
-	if (hisi_pmu->on_cpu != -1)
+	/* If another associated CPU is already managing this PMU, simply return. */
+	if (hisi_pmu->on_cpu != -1 &&
+	    cpumask_test_cpu(hisi_pmu->on_cpu, &hisi_pmu->associated_cpus))
 		return 0;
 
 	/* Use this CPU in cpumask for event counting */
@@ -492,9 +525,6 @@  int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
 						     node);
 	unsigned int target;
 
-	if (!cpumask_test_and_clear_cpu(cpu, &hisi_pmu->associated_cpus))
-		return 0;
-
 	/* Nothing to do if this CPU doesn't own the PMU */
 	if (hisi_pmu->on_cpu != cpu)
 		return 0;
@@ -521,6 +551,8 @@  void hisi_pmu_init(struct hisi_pmu *hisi_pmu, struct module *module)
 {
 	struct pmu *pmu = &hisi_pmu->pmu;
 
+	hisi_pmu_init_associated_cpus(hisi_pmu);
+
 	pmu->module             = module;
 	pmu->parent             = hisi_pmu->dev;
 	pmu->task_ctx_nr        = perf_invalid_context;
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.h b/drivers/perf/hisilicon/hisi_uncore_pmu.h
index 25b2d43b72bf..0e2f844b5fd9 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.h
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.h
@@ -89,6 +89,11 @@  struct hisi_pmu {
 	struct hisi_pmu_hwevents pmu_events;
 	/* associated_cpus: All CPUs associated with the PMU */
 	cpumask_t associated_cpus;
+	/*
+	 * If the real associated CPUs not onlined by the time initializing,
+	 * we'll initialize with online CPUs and indicate it.
+	 */
+	bool dummy_associated_cpus;
 	/* CPU used for counting */
 	int on_cpu;
 	int irq;