
[v1] arch_topology: Make cpu_capacity sysfs node as read-only

Message ID 1551886073-16217-1-git-send-email-clingutla@codeaurora.org (mailing list archive)
State New, archived
Series [v1] arch_topology: Make cpu_capacity sysfs node as read-only

Commit Message

Chandrasekhar L March 6, 2019, 3:27 p.m. UTC
If the user updates any cpu's cpu_capacity, the new value is applied to
all of its online sibling cpus. But this is not always correct, as
sibling cpus (on ARM, cpus of the same microarchitecture) can have
different cpu_capacity values with different performance characteristics.
So applying the user-supplied cpu_capacity to all cpu siblings is not
correct.

Another problem is that the current code assumes that 'all cpus in a
cluster or with the same package_id (core_siblings) have the same
cpu_capacity'. But with commit '5bdd2b3f0f8 ("arm64: topology: add
support to remove cpu topology sibling masks")', when a cpu is hotplugged
out, its information is cleared from its sibling cpus. So a user-supplied
cpu_capacity is applied only to the sibling cpus that are online at the
time. If a cpu is hotplugged in afterwards, it will have a different
cpu_capacity than its siblings, which breaks the above assumption.

So instead of mucking around with the core sibling mask for user-supplied
values, use the device tree to set cpu capacity, and make the cpu_capacity
node read-only so that it still exposes the asymmetry between cpus in the
system.

Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
---
 drivers/base/arch_topology.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

Comments

Juri Lelli March 7, 2019, 7:28 a.m. UTC | #1
Hi,

On 06/03/19 20:57, Lingutla Chandrasekhar wrote:
> If the user updates any cpu's cpu_capacity, the new value is applied to
> all of its online sibling cpus. But this is not always correct, as
> sibling cpus (on ARM, cpus of the same microarchitecture) can have
> different cpu_capacity values with different performance characteristics.
> So applying the user-supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes that 'all cpus in a
> cluster or with the same package_id (core_siblings) have the same
> cpu_capacity'. But with commit '5bdd2b3f0f8 ("arm64: topology: add
> support to remove cpu topology sibling masks")', when a cpu is hotplugged
> out, its information is cleared from its sibling cpus. So a user-supplied
> cpu_capacity is applied only to the sibling cpus that are online at the
> time. If a cpu is hotplugged in afterwards, it will have a different
> cpu_capacity than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for user-supplied
> values, use the device tree to set cpu capacity, and make the cpu_capacity
> node read-only so that it still exposes the asymmetry between cpus in the
> system.
> 
> Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
> ---
>  drivers/base/arch_topology.c | 33 +--------------------------------
>  1 file changed, 1 insertion(+), 32 deletions(-)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index edfcf8d..d455897 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -7,7 +7,6 @@
>   */
>  
>  #include <linux/acpi.h>
> -#include <linux/arch_topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/device.h>
> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
>  static void update_topology_flags_workfn(struct work_struct *work);
>  static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>  
> -static ssize_t cpu_capacity_store(struct device *dev,
> -				  struct device_attribute *attr,
> -				  const char *buf,
> -				  size_t count)
> -{
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -	int this_cpu = cpu->dev.id;
> -	int i;
> -	unsigned long new_capacity;
> -	ssize_t ret;
> -
> -	if (!count)
> -		return 0;
> -
> -	ret = kstrtoul(buf, 0, &new_capacity);
> -	if (ret)
> -		return ret;
> -	if (new_capacity > SCHED_CAPACITY_SCALE)
> -		return -EINVAL;
> -
> -	mutex_lock(&cpu_scale_mutex);
> -	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> -		topology_set_cpu_scale(i, new_capacity);
> -	mutex_unlock(&cpu_scale_mutex);
> -
> -	schedule_work(&update_topology_flags_work);
> -
> -	return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);

There are cases in which this needs to be RW, as recently discussed
https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

IMHO, if the core_sibling assumption doesn't work in all cases, one
should be looking into fixing it, rather than making this RO.

Best,

- Juri
Quentin Perret March 7, 2019, 9:31 a.m. UTC | #2
Hi Juri,

On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> There are cases in which this needs to be RW, as recently discussed
> https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

Yeah, there's that problem when you can't fix your DT... But I guess
this is a problem for _all_ values in the DT, not just capacities, right?
But I'd expect those other values just can't be fixed from
userspace most of the time; you just have to live with sub-optimal
values. So I don't find it unreasonable to do the same for capacities.

> IMHO, if the core_sibling assumption doesn't work in all cases, one
> should be looking into fixing it, rather than making this RO.

It's just that this thing keeps causing more harm than good IMO.
It's quite severely broken ATM, and it prevents us from assuming
'stable' capacity values in places where we'd like to do so (e.g. EAS).

And I'm not aware of a single platform where this is used. So, I'm
personally all for removing the write capability if we can.

Thanks,
Quentin
Juri Lelli March 7, 2019, 9:57 a.m. UTC | #3
Hi,

On 07/03/19 09:31, Quentin Perret wrote:
> Hi Juri,
> 
> On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> > There are cases in which this needs to be RW, as recently discussed
> > https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/
> 
> Yeah, there's that problem when you can't fix your DT... But I guess
> this is a problem for _all_ values in the DT, not just capacities, right?
> But I'd expect those other values just can't be fixed from
> userspace most of the time; you just have to live with sub-optimal
> values. So I don't find it unreasonable to do the same for capacities.
> 
> > IMHO, if the core_sibling assumption doesn't work in all cases, one
> > should be looking into fixing it, rather than making this RO.
> 
> It's just that this thing keeps causing more harm than good IMO.
> It's quite severely broken ATM, and it prevents us from assuming
> 'stable' capacity values in places where we'd like to do so (e.g. EAS).
> 
> And I'm not aware of a single platform where this is used. So, I'm
> personally all for removing the write capability if we can.

If people think it's best to simply make this RO, I won't be against it.
Just pointed out a conversation we recently had. Guess we could also
make it RW again (properly) in the future if somebody complains.

Best,

- Juri
Quentin Perret March 7, 2019, 12:14 p.m. UTC | #4
On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> If people think it's best to simply make this RO, I won't be against it.
> Just pointed out a conversation we recently had. Guess we could also
> make it RW again (properly) in the future if somebody complains.

Right, now is probably the time to give it a go before folks start
depending on it. And if I am wrong (and that happens more often than I'd
like unfortunately :-)) and there are users of that thing, then the
revert should be trivial.

Thanks,
Quentin
Sudeep Holla March 7, 2019, 3:04 p.m. UTC | #5
On Thu, Mar 07, 2019 at 12:14:03PM +0000, Quentin Perret wrote:
> On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> > If people think it's best to simply make this RO, I won't be against it.
> > Just pointed out a conversation we recently had. Guess we could also
> > make it RW again (properly) in the future if somebody complains.
>
> Right, now is probably the time to give it a go before folks start
> depending on it. And if I am wrong (and that happens more often than I'd
> like unfortunately :-)) and there are users of that thing, then the
> revert should be trivial.
>

+1 on all the points above ;) (I may also be getting things wrong here,
but I am not convinced that we can resolve the issue for all the possible
ARM vendor combinations we may have to address)

We should come up with some *magical* cpumask that we can use if we
want to retain this write capability. And the only way I see to do that
is using DT, which in turn eliminates the need for write capability
in this sysfs node.

So I am going to ack the $subject patch for now.

--
Regards,
Sudeep
Sudeep Holla March 7, 2019, 3:19 p.m. UTC | #6
On Wed, Mar 06, 2019 at 08:57:53PM +0530, Lingutla Chandrasekhar wrote:
> If the user updates any cpu's cpu_capacity, the new value is applied to
> all of its online sibling cpus. But this is not always correct, as
> sibling cpus (on ARM, cpus of the same microarchitecture) can have
> different cpu_capacity values with different performance characteristics.
> So applying the user-supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes that 'all cpus in a
> cluster or with the same package_id (core_siblings) have the same
> cpu_capacity'. But with commit '5bdd2b3f0f8 ("arm64: topology: add
> support to remove cpu topology sibling masks")', when a cpu is hotplugged
> out, its information is cleared from its sibling cpus. So a user-supplied
> cpu_capacity is applied only to the sibling cpus that are online at the
> time. If a cpu is hotplugged in afterwards, it will have a different
> cpu_capacity than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for user-supplied
> values, use the device tree to set cpu capacity, and make the cpu_capacity
> node read-only so that it still exposes the asymmetry between cpus in the
> system.
>

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

IIRC this was added for 2 possible use cases. Though I don't completely
agree with them, no one had any objections (including me; I wonder how I
only noticed it now, but anyway it's too late)

1. For systems that don't provide this information via device-tree/any
   firmware, though that's the highly recommended way. With more complex
   topologies on the horizon, I can't think of any other sane way to
   fetch/deduce this information *correctly*.

2. For some sort of tuning (to avoid a rebuild and reboot), but that's
   questionable, as this is not a software characteristic. It's more
   like deriving hardware characteristics using software experiments.
   So, for me, this is comparable to other hardware latencies we have,
   like CPU idle entry/exit latencies. They are tuned, but not in
   production kernels. So if there's a case for adding back a
   write-capable interface, I would prefer it in debugfs, with this
   sysfs node remaining a read-only ABI.

Hope that helps.

--
Regards,
Sudeep
Dietmar Eggemann March 8, 2019, 11:45 a.m. UTC | #7
On 3/6/19 4:27 PM, Lingutla Chandrasekhar wrote:

[...]

> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
>   static void update_topology_flags_workfn(struct work_struct *work);
>   static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>   
> -static ssize_t cpu_capacity_store(struct device *dev,
> -				  struct device_attribute *attr,
> -				  const char *buf,
> -				  size_t count)
> -{
> -	struct cpu *cpu = container_of(dev, struct cpu, dev);
> -	int this_cpu = cpu->dev.id;
> -	int i;
> -	unsigned long new_capacity;
> -	ssize_t ret;
> -
> -	if (!count)
> -		return 0;
> -
> -	ret = kstrtoul(buf, 0, &new_capacity);
> -	if (ret)
> -		return ret;
> -	if (new_capacity > SCHED_CAPACITY_SCALE)
> -		return -EINVAL;
> -
> -	mutex_lock(&cpu_scale_mutex);

Since we can't write to cpu_scale from here anymore, we could get rid of
cpu_scale_mutex. topology_normalize_cpu_scale()->topology_set_cpu_scale()
is now only called from:

[    0.202628]  topology_normalize_cpu_scale+0x28/0x30
[    0.207529]  init_cpu_topology+0x168/0x1e8
[    0.211644]  smp_prepare_cpus+0x2c/0x108
[    0.215585]  kernel_init_freeable+0x104/0x518
[    0.219963]  kernel_init+0x18/0x110
[    0.223469]  ret_from_fork+0x10/0x1c

for dts capacity-dmips-mhz properties

and

[    3.130180]  topology_normalize_cpu_scale.part.0+0xac/0xd0
[    3.135619]  init_cpu_capacity_callback+0x100/0x178
[    3.140459]  notifier_call_chain+0x5c/0xa0
[    3.144522]  blocking_notifier_call_chain+0x64/0x88
[    3.149363]  cpufreq_set_policy+0xd8/0x3c8
[    3.153427]  cpufreq_init_policy+0x78/0xc8

for cpufreq max frequency related adjustments to cpu capacity.

The mutex was introduced for the sysfs interface here: 
https://lore.kernel.org/lkml/1468932048-31635-8-git-send-email-juri.lelli@arm.com

> -	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> -		topology_set_cpu_scale(i, new_capacity);
> -	mutex_unlock(&cpu_scale_mutex);
> -
> -	schedule_work(&update_topology_flags_work);
> -
> -	return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);
>   
>   static int register_cpu_capacity_sysctl(void)
>   {
> 

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

on Arm64 Juno with v5.0

Patch

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..d455897 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -7,7 +7,6 @@ 
  */
 
 #include <linux/acpi.h>
-#include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/device.h>
@@ -51,37 +50,7 @@  static ssize_t cpu_capacity_show(struct device *dev,
 static void update_topology_flags_workfn(struct work_struct *work);
 static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
 
-static ssize_t cpu_capacity_store(struct device *dev,
-				  struct device_attribute *attr,
-				  const char *buf,
-				  size_t count)
-{
-	struct cpu *cpu = container_of(dev, struct cpu, dev);
-	int this_cpu = cpu->dev.id;
-	int i;
-	unsigned long new_capacity;
-	ssize_t ret;
-
-	if (!count)
-		return 0;
-
-	ret = kstrtoul(buf, 0, &new_capacity);
-	if (ret)
-		return ret;
-	if (new_capacity > SCHED_CAPACITY_SCALE)
-		return -EINVAL;
-
-	mutex_lock(&cpu_scale_mutex);
-	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
-		topology_set_cpu_scale(i, new_capacity);
-	mutex_unlock(&cpu_scale_mutex);
-
-	schedule_work(&update_topology_flags_work);
-
-	return count;
-}
-
-static DEVICE_ATTR_RW(cpu_capacity);
+static DEVICE_ATTR_RO(cpu_capacity);
 
 static int register_cpu_capacity_sysctl(void)
 {