Message ID | 20241026072424.29887-3-yangyicong@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Refactor the common parts to the HiSilicon Uncore PMU core and cleanups | expand |
On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote: > From: Yicong Yang <yangyicong@hisilicon.com> > > Currently the associated CPUs are detected in the cpuhp online > callback. If the CPU's sccl_id or the ccl_id matches the PMU's, > they're associated. There's an exception that some PMUs locate > on the SICL and will match no CPUs. The events of these PMUs > can be opened on any online CPUs. To handle this we just check > whether the PMU's sccl_id is -1, if so we know it locates on > SICL and make any CPU associated to it. > > This can be tweaked so in this patch just do the below changes: > - If the PMU doesn't match any CPU then associated it to online CPUs > - Choose the target CPU according to the NUMA affinity for opening > events > > The function is implemented by hisi_pmu_init_associated_cpus() and > invoked in hisi_pmu_init(). > > Also the associated_cpus are maintained with all the online CPUs. This > is redundant since we'll always schedule the events on the online CPUs. > Get rid of this and make associated_cpus contain offline CPUs as well. I don't really understand the rationale for this change. Why is the new behaviour better than the old one? > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> > --- > drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++----- > drivers/perf/hisilicon/hisi_uncore_pmu.h | 5 +++ > 2 files changed, 49 insertions(+), 12 deletions(-) > > diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c > index 416f72a813fc..c3549e16e0c3 100644 > --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c > +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c > @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu) > } > EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU); > > +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu) > +{ > + /* > + * If the associated_cpus has already been initialized, for example > + * determined by comparing the sccl_id and ccl_id with the CPU's > + * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity > + * and could be opened on any online CPU. > + */ > + if (!cpumask_empty(&hisi_pmu->associated_cpus)) > + return; > + > + cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask); Is it always safe to access 'cpu_online_mask' here? Will
On 2024/10/29 21:28, Will Deacon wrote: > On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote: >> From: Yicong Yang <yangyicong@hisilicon.com> >> >> Currently the associated CPUs are detected in the cpuhp online >> callback. If the CPU's sccl_id or the ccl_id matches the PMU's, >> they're associated. There's an exception that some PMUs locate >> on the SICL and will match no CPUs. The events of these PMUs >> can be opened on any online CPUs. To handle this we just check >> whether the PMU's sccl_id is -1, if so we know it locates on >> SICL and make any CPU associated to it. >> >> This can be tweaked so in this patch just do the below changes: >> - If the PMU doesn't match any CPU then associated it to online CPUs >> - Choose the target CPU according to the NUMA affinity for opening >> events >> >> The function is implemented by hisi_pmu_init_associated_cpus() and >> invoked in hisi_pmu_init(). >> >> Also the associated_cpus are maintained with all the online CPUs. This >> is redundant since we'll always schedule the events on the online CPUs. >> Get rid of this and make associated_cpus contain offline CPUs as well. > > I don't really understand the rationale for this change. Why is the new > behaviour better than the old one? > Thanks for taking a look. As mentioned in the commit, we have 2 types of PMU here: 1) PMUs locate on SCCL (Super CPU Cluster *), associated with certain CCL(CPU cluster *)(e.g. L3C PMU) or not (e.g. DDRC PMU) 2) PMUs locate on the SICL (Super IO Cluster *), which has no association with certain CPU topology (e.g. CPA PMU) Currently we find associated CPUs in the cpuhp callbacks by comparing the CPU's MPIDR with the PMU's SCCL ID and CCL ID. This will be fine for type 1) PMUs but not for type 2) PMUs since no CPU will match a SICL ID. We do a trick here, make type 2) PMUs's SCCL to -1 and match all the CPUs. So in fact type 2) PMUs are already associated with online CPUs but in an implicit way. With this patch the behaviour will be more clear. Another thing is about NUMA locality. Usually for type 2) PMUs they maybe on different NUMA node which is not considered in current approach, all SICL PMUs are stacked on CPU0. With the patch the NUMA node information will be considered if provided by the firmware. [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1 >> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> >> --- >> drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++----- >> drivers/perf/hisilicon/hisi_uncore_pmu.h | 5 +++ >> 2 files changed, 49 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c >> index 416f72a813fc..c3549e16e0c3 100644 >> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c >> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c >> @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu) >> } >> EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU); >> >> +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu) >> +{ >> + /* >> + * If the associated_cpus has already been initialized, for example >> + * determined by comparing the sccl_id and ccl_id with the CPU's >> + * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity >> + * and could be opened on any online CPU. >> + */ >> + if (!cpumask_empty(&hisi_pmu->associated_cpus)) >> + return; >> + >> + cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask); > > Is it always safe to access 'cpu_online_mask' here? > It should be safe. This function will be called in hisi_pmu_init() in the module init stage so the cpu_online_mask() should be populated, at least the current running CPU is contained. The hisi_pmu->associated_cpus will be used along with the online CPU checking in cpuhp callbacks so we're unlikely to use an offline CPU mistakenly. Checked cpumask_local_spread() and it also use the cpu_online_mask directly without synchronization against cpuhp process, so it also should be ok here. Please correct my if any cases I missed. Thanks, Yicong
Hi Will, Further comment on this patch? On 2024/10/30 22:04, Yicong Yang wrote: > On 2024/10/29 21:28, Will Deacon wrote: >> On Sat, Oct 26, 2024 at 03:24:18PM +0800, Yicong Yang wrote: >>> From: Yicong Yang <yangyicong@hisilicon.com> >>> >>> Currently the associated CPUs are detected in the cpuhp online >>> callback. If the CPU's sccl_id or the ccl_id matches the PMU's, >>> they're associated. There's an exception that some PMUs locate >>> on the SICL and will match no CPUs. The events of these PMUs >>> can be opened on any online CPUs. To handle this we just check >>> whether the PMU's sccl_id is -1, if so we know it locates on >>> SICL and make any CPU associated to it. >>> >>> This can be tweaked so in this patch just do the below changes: >>> - If the PMU doesn't match any CPU then associated it to online CPUs >>> - Choose the target CPU according to the NUMA affinity for opening >>> events >>> >>> The function is implemented by hisi_pmu_init_associated_cpus() and >>> invoked in hisi_pmu_init(). >>> >>> Also the associated_cpus are maintained with all the online CPUs. This >>> is redundant since we'll always schedule the events on the online CPUs. >>> Get rid of this and make associated_cpus contain offline CPUs as well. >> >> I don't really understand the rationale for this change. Why is the new >> behaviour better than the old one? >> > > Thanks for taking a look. > > As mentioned in the commit, we have 2 types of PMU here: > 1) PMUs locate on SCCL (Super CPU Cluster *), associated with certain CCL(CPU cluster *)(e.g. L3C PMU) > or not (e.g. DDRC PMU) > 2) PMUs locate on the SICL (Super IO Cluster *), which has no association with certain CPU > topology (e.g. CPA PMU) > > Currently we find associated CPUs in the cpuhp callbacks by comparing the CPU's > MPIDR with the PMU's SCCL ID and CCL ID. This will be fine for type 1) PMUs but > not for type 2) PMUs since no CPU will match a SICL ID. We do a trick here, > make type 2) PMUs's SCCL to -1 and match all the CPUs. So in fact type 2) PMUs > are already associated with online CPUs but in an implicit way. With this patch > the behaviour will be more clear. > > Another thing is about NUMA locality. Usually for type 2) PMUs they maybe on > different NUMA node which is not considered in current approach, all SICL PMUs > are stacked on CPU0. With the patch the NUMA node information will be considered > if provided by the firmware. > > [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/perf/hisi-pmu.rst?h=v6.12-rc1 > Another thing is that if the associated CPUs are all offline, previous approach will show -1 by cpumask while with this patch user will always see a usable CPU (this is ok since the uncore events are is not associated with certain CPU context and can be open on any online CPU): // In the previous approach [root@localhost devices]# cat hisi_sccl3_l3c1/cpumask 8 // offline all the CPUs sharing sccl3_l3c1 [root@localhost devices]# cat hisi_sccl3_l3c1/cpumask -1 This should make no difference for the function since perf tool will find an online CPU for opening the event even if it gets -1 from the cpumask. But -1 maybe a bit confusing. Thanks. >>> >>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> >>> --- >>> drivers/perf/hisilicon/hisi_uncore_pmu.c | 56 +++++++++++++++++++----- >>> drivers/perf/hisilicon/hisi_uncore_pmu.h | 5 +++ >>> 2 files changed, 49 insertions(+), 12 deletions(-) >>> >>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c >>> index 416f72a813fc..c3549e16e0c3 100644 >>> --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c >>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c >>> @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu) >>> } >>> EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU); >>> >>> +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu) >>> +{ >>> + /* >>> + * If the associated_cpus has already been initialized, for example >>> + * determined by comparing the sccl_id and ccl_id with the CPU's >>> + * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity >>> + * and could be opened on any online CPU. >>> + */ >>> + if (!cpumask_empty(&hisi_pmu->associated_cpus)) >>> + return; >>> + >>> + cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask); >> >> Is it always safe to access 'cpu_online_mask' here? >> > > It should be safe. This function will be called in hisi_pmu_init() in the module > init stage so the cpu_online_mask() should be populated, at least the current > running CPU is contained. The hisi_pmu->associated_cpus will be used along > with the online CPU checking in cpuhp callbacks so we're unlikely to use an > offline CPU mistakenly. Checked cpumask_local_spread() and it also use the > cpu_online_mask directly without synchronization against cpuhp process, so > it also should be ok here. Please correct my if any cases I missed. > > Thanks, > Yicong > > > > . >
On Wed, Nov 06, 2024 at 04:33:18PM +0800, Yicong Yang wrote: > Hi Will, > > Further comment on this patch? You failed to convince me that copying the online mask is safe, but I haven't had time to look into that in more depth myself. Saying "It should be safe" isn't really enough -- it _must_ be safe! Will
On 2024/11/6 19:51, Will Deacon wrote: > On Wed, Nov 06, 2024 at 04:33:18PM +0800, Yicong Yang wrote: >> Hi Will, >> >> Further comment on this patch? > > You failed to convince me that copying the online mask is safe, but I > haven't had time to look into that in more depth myself. Saying "It should > be safe" isn't really enough -- it _must_ be safe! > I assume the "safe" here means we won't have trouble if CPU in the copied cpumask offlined since we don't synchronize with the cpuhp here. Accessing cpu_online_mask itself is safe since it's a piece of static memory. We'll initialize the hisi_pmu::associated_cpus from cpu_online_mask in the below cases. For other cases the associated CPUs is initalized in the cpuhp callback and we won't come here. 1) for a PMU does have associated CPUs like L3C PMU, but the associcated CPUs not onlined at probe 2) for a PMU has no association like CPA PMU which locates on a SICL. Before this patch this kind of PMU'associated CPUs are initliazed in the cpuhp callbacks For the above 2 cases it is safe since the driver's not using hisi_pmu::assoicated_cpus directly but combined with cpu_online_masks. PMU indicates the user the preferred CPU to open events on by "cpumask" sysfs which shows hisi_pmu::on_cpu. It's initialized/updated in: 1) hisi_pmu_init_associated_cpus() by cpumask_local_spread() which tries to found a nearest CPU of the node that the device locates 2) cpuhp callbacks we registered where holds the cpumap upate lock. Actually we allow hisi_pmu::associated_cpus contains offline CPUs which is also mentioned in the commit. Based on above it's safe to use the cpu_online_mask here. But I find one case doesn't covered by this patch - if boot with maxcpus=1, offline CPU X and offline CPU0 (both are not assocaited CPUs), hisi_pmu::on_cpu will not be updated. Since hisi_pmu::associated_cpus won't be updated when onlining an unassociated CPU and hisi_pmu::on_cpu is selected from the intersection of the associated CPUs and the online CPUs, the intersection will be empty in this case. I'll see how to handle in this case and update the patch. Thanks.
diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c index 416f72a813fc..c3549e16e0c3 100644 --- a/drivers/perf/hisilicon/hisi_uncore_pmu.c +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c @@ -399,6 +399,27 @@ void hisi_uncore_pmu_disable(struct pmu *pmu) } EXPORT_SYMBOL_NS_GPL(hisi_uncore_pmu_disable, HISI_PMU); +static void hisi_pmu_init_associated_cpus(struct hisi_pmu *hisi_pmu) +{ + /* + * If the associated_cpus has already been initialized, for example + * determined by comparing the sccl_id and ccl_id with the CPU's + * mpidr_el1, then do nothing here. Otherwise the PMU has no affinity + * and could be opened on any online CPU. + */ + if (!cpumask_empty(&hisi_pmu->associated_cpus)) + return; + + cpumask_copy(&hisi_pmu->associated_cpus, cpu_online_mask); + hisi_pmu->dummy_associated_cpus = true; + + /* + * Otherwise the associated CPUs haven't been online yet. Pick a + * nearest CPU according to the PMU's numa node instead. + */ + hisi_pmu->on_cpu = cpumask_local_spread(0, dev_to_node(hisi_pmu->dev)); + WARN_ON(irq_set_affinity(hisi_pmu->irq, cpumask_of(hisi_pmu->on_cpu))); +} /* * The Super CPU Cluster (SCCL) and CPU Cluster (CCL) IDs can be @@ -446,10 +467,6 @@ static bool hisi_pmu_cpu_is_associated_pmu(struct hisi_pmu *hisi_pmu) { int sccl_id, ccl_id; - /* If SCCL_ID is -1, the PMU is in a SICL and has no CPU affinity */ - if (hisi_pmu->sccl_id == -1) - return true; - if (hisi_pmu->ccl_id == -1) { /* If CCL_ID is -1, the PMU only shares the same SCCL */ hisi_read_sccl_and_ccl_id(&sccl_id, NULL); @@ -467,13 +484,29 @@ int hisi_uncore_pmu_online_cpu(unsigned int cpu, struct hlist_node *node) struct hisi_pmu *hisi_pmu = hlist_entry_safe(node, struct hisi_pmu, node); - if (!hisi_pmu_cpu_is_associated_pmu(hisi_pmu)) - return 0; + /* + * If the CPU is not in the associated_cpus, it maybe a new CPU. + * Test whether it's associated or not. + */ + if (!cpumask_test_cpu(cpu, &hisi_pmu->associated_cpus)) { + if (!hisi_pmu_cpu_is_associated_pmu(hisi_pmu)) + return 0; + + /* + * We found an associated CPU so we don't need to use the dummy + * associated CPUs. Update it. + */ + if (hisi_pmu->dummy_associated_cpus) { + cpumask_clear(&hisi_pmu->associated_cpus); + hisi_pmu->dummy_associated_cpus = false; + } - cpumask_set_cpu(cpu, &hisi_pmu->associated_cpus); + cpumask_set_cpu(cpu, &hisi_pmu->associated_cpus); + } - /* If another CPU is already managing this PMU, simply return. */ - if (hisi_pmu->on_cpu != -1) + /* If another associated CPU is already managing this PMU, simply return. */ + if (hisi_pmu->on_cpu != -1 && + cpumask_test_cpu(hisi_pmu->on_cpu, &hisi_pmu->associated_cpus)) return 0; /* Use this CPU in cpumask for event counting */ @@ -492,9 +525,6 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node) node); unsigned int target; - if (!cpumask_test_and_clear_cpu(cpu, &hisi_pmu->associated_cpus)) - return 0; - /* Nothing to do if this CPU doesn't own the PMU */ if (hisi_pmu->on_cpu != cpu) return 0; @@ -521,6 +551,8 @@ void hisi_pmu_init(struct hisi_pmu *hisi_pmu, struct module *module) { struct pmu *pmu = &hisi_pmu->pmu; + hisi_pmu_init_associated_cpus(hisi_pmu); + pmu->module = module; pmu->parent = hisi_pmu->dev; pmu->task_ctx_nr = perf_invalid_context; diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.h b/drivers/perf/hisilicon/hisi_uncore_pmu.h index 25b2d43b72bf..0e2f844b5fd9 100644 --- a/drivers/perf/hisilicon/hisi_uncore_pmu.h +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.h @@ -89,6 +89,11 @@ struct hisi_pmu { struct hisi_pmu_hwevents pmu_events; /* associated_cpus: All CPUs associated with the PMU */ cpumask_t associated_cpus; + /* + * If the real associated CPUs not onlined by the time initializing, + * we'll initialize with online CPUs and indicate it. + */ + bool dummy_associated_cpus; /* CPU used for counting */ int on_cpu; int irq;