Message ID | 1551886073-16217-1-git-send-email-clingutla@codeaurora.org (mailing list archive)
---|---
State | New, archived
Series | [v1] arch_topology: Make cpu_capacity sysfs node read-only
Hi,

On 06/03/19 20:57, Lingutla Chandrasekhar wrote:
> If a user updates any cpu's cpu_capacity, the new value is applied to all
> of its online sibling cpus. This need not be correct: sibling cpus (in
> ARM, cpus with the same micro-architecture) can have different
> cpu_capacity values with different performance characteristics, so
> propagating a user supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes 'all cpus in a cluster,
> or with the same package_id (core_siblings), have the same cpu_capacity'.
> But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove cpu
> topology sibling masks"), when a cpu is hotplugged out, its information
> gets cleared in its sibling cpus. A user supplied cpu_capacity is then
> applied only to the cpus that are online siblings at the time; if any cpu
> is hotplugged in afterwards, it ends up with a different cpu_capacity
> than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for a user
> supplied value, use the device tree to set cpu capacity, and make the
> cpu_capacity node read-only so it can still be used to inspect the
> asymmetry between cpus in the system.
>
> Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
> ---
>  drivers/base/arch_topology.c | 33 +--------------------------------
>  1 file changed, 1 insertion(+), 32 deletions(-)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index edfcf8d..d455897 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -7,7 +7,6 @@
>   */
>
>  #include <linux/acpi.h>
> -#include <linux/arch_topology.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
>  #include <linux/device.h>
> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
>  static void update_topology_flags_workfn(struct work_struct *work);
>  static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>
> -static ssize_t cpu_capacity_store(struct device *dev,
> -                                  struct device_attribute *attr,
> -                                  const char *buf,
> -                                  size_t count)
> -{
> -        struct cpu *cpu = container_of(dev, struct cpu, dev);
> -        int this_cpu = cpu->dev.id;
> -        int i;
> -        unsigned long new_capacity;
> -        ssize_t ret;
> -
> -        if (!count)
> -                return 0;
> -
> -        ret = kstrtoul(buf, 0, &new_capacity);
> -        if (ret)
> -                return ret;
> -        if (new_capacity > SCHED_CAPACITY_SCALE)
> -                return -EINVAL;
> -
> -        mutex_lock(&cpu_scale_mutex);
> -        for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> -                topology_set_cpu_scale(i, new_capacity);
> -        mutex_unlock(&cpu_scale_mutex);
> -
> -        schedule_work(&update_topology_flags_work);
> -
> -        return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);

There are cases in which this needs to be RW, as recently discussed:

https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

IMHO, if the core_sibling assumption doesn't work in all cases, one
should be looking into fixing it, rather than making this RO.

Best,

- Juri
Hi Juri,

On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> There are cases in which this needs to be RW, as recently discussed:
> https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

Yeah, there's that problem when you can't fix your DT ... But I guess
this is a problem for _all_ values in the DT, not just capacities, right?
And for those other values, I'd expect that most of the time you simply
can't fix them from userspace, you just have to live with sub-optimal
values. So I don't find it unreasonable to do the same for capacities.

> IMHO, if the core_sibling assumption doesn't work in all cases, one
> should be looking into fixing it, rather than making this RO.

It's just that this thing keeps causing more harm than it helps, IMO.
It's quite severely broken ATM, and it prevents us from assuming
'stable' capacity values in places where we'd like to do so (e.g. EAS).

And I'm not aware of a single platform where this is used. So, I'm
personally all for removing the write capability if we can.

Thanks,
Quentin
Hi,

On 07/03/19 09:31, Quentin Perret wrote:
> Hi Juri,
>
> On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> > There are cases in which this needs to be RW, as recently discussed:
> > https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/
>
> Yeah, there's that problem when you can't fix your DT ... But I guess
> this is a problem for _all_ values in the DT, not just capacities, right?
> And for those other values, I'd expect that most of the time you simply
> can't fix them from userspace, you just have to live with sub-optimal
> values. So I don't find it unreasonable to do the same for capacities.
>
> > IMHO, if the core_sibling assumption doesn't work in all cases, one
> > should be looking into fixing it, rather than making this RO.
>
> It's just that this thing keeps causing more harm than it helps, IMO.
> It's quite severely broken ATM, and it prevents us from assuming
> 'stable' capacity values in places where we'd like to do so (e.g. EAS).
>
> And I'm not aware of a single platform where this is used. So, I'm
> personally all for removing the write capability if we can.

If people think it's best to simply make this RO, I won't be against it;
I just pointed out a conversation we recently had. I guess we could also
make it RW again (properly) in the future if somebody complains.

Best,

- Juri
On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> If people think it's best to simply make this RO, I won't be against it;
> I just pointed out a conversation we recently had. I guess we could also
> make it RW again (properly) in the future if somebody complains.

Right, now is probably the time to give it a go, before folks start
depending on it. And if I am wrong (and that happens more often than I'd
like, unfortunately :-)) and there are users of that thing, then the
revert should be trivial.

Thanks,
Quentin
On Thu, Mar 07, 2019 at 12:14:03PM +0000, Quentin Perret wrote:
> On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> > If people think it's best to simply make this RO, I won't be against it;
> > I just pointed out a conversation we recently had. I guess we could also
> > make it RW again (properly) in the future if somebody complains.
>
> Right, now is probably the time to give it a go, before folks start
> depending on it. And if I am wrong (and that happens more often than I'd
> like, unfortunately :-)) and there are users of that thing, then the
> revert should be trivial.

+1 on all the points above ;) (I may also be getting things wrong here,
but I am not convinced that we can resolve the issue for all the possible
ARM vendor combinations we may have to address.)

We would have to come up with some *magical* cpumask to use if we wanted
to retain the write capability, and the only way I can see to build one
is from DT, which in turn eliminates the need for this sysfs node to be
writable at all. So I am going to ack the $subject patch for now.

--
Regards,
Sudeep
On Wed, Mar 06, 2019 at 08:57:53PM +0530, Lingutla Chandrasekhar wrote:
> If a user updates any cpu's cpu_capacity, the new value is applied to all
> of its online sibling cpus. This need not be correct: sibling cpus (in
> ARM, cpus with the same micro-architecture) can have different
> cpu_capacity values with different performance characteristics, so
> propagating a user supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes 'all cpus in a cluster,
> or with the same package_id (core_siblings), have the same cpu_capacity'.
> But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove cpu
> topology sibling masks"), when a cpu is hotplugged out, its information
> gets cleared in its sibling cpus. A user supplied cpu_capacity is then
> applied only to the cpus that are online siblings at the time; if any cpu
> is hotplugged in afterwards, it ends up with a different cpu_capacity
> than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for a user
> supplied value, use the device tree to set cpu capacity, and make the
> cpu_capacity node read-only so it can still be used to inspect the
> asymmetry between cpus in the system.

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

IIRC this was added for two possible uses. Though I didn't completely
agree, no one had any objections (including me, though I wonder how/why I
failed to notice the problems back then; anyway, it's too late now):

1. For systems that don't provide this information via device-tree or any
other firmware, though that is the highly recommended way. With more
complex topologies on the horizon, I can't think of any other sane way to
fetch/deduce this information *correctly*.

2. For some sort of tuning (avoiding a rebuild and reboot), but that's
questionable, as this is not a software characteristic. It's more like
deriving hardware characteristics through software experiments. For me,
this is comparable to hardware latencies like the CPU idle entry/exit
latencies: they do get tuned, just not in production kernels.

So if there's a case for adding this back as a write-capable node, I
would prefer that to be in debugfs, with this sysfs node remaining
read-only ABI.

Hope that helps.

--
Regards,
Sudeep
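The debugfs knob Sudeep alludes to was never posted in this thread. A
minimal sketch of what one could look like is below; everything here is
hypothetical (the "cpu_capacity" directory name, the init function, and
the per-cpu file layout are invented), it deliberately skips the
update_topology_flags_work plumbing the removed sysfs store did, and the
topology_get_cpu_scale(NULL, cpu) signature matches the v5.0-era
linux/arch_topology.h, which has since changed:

    #include <linux/arch_topology.h>
    #include <linux/cpu.h>
    #include <linux/debugfs.h>
    #include <linux/init.h>
    #include <linux/sched/topology.h>

    static int capacity_get(void *data, u64 *val)
    {
            /* data carries the cpu number, stashed as a pointer */
            *val = topology_get_cpu_scale(NULL, (long)data);
            return 0;
    }

    static int capacity_set(void *data, u64 val)
    {
            if (val > SCHED_CAPACITY_SCALE)
                    return -EINVAL;

            /* Tune this one cpu only; no sibling-mask guessing. */
            topology_set_cpu_scale((long)data, val);
            return 0;
    }

    DEFINE_DEBUGFS_ATTRIBUTE(capacity_fops, capacity_get, capacity_set,
                             "%llu\n");

    static int __init cpu_capacity_debugfs_init(void)
    {
            struct dentry *dir = debugfs_create_dir("cpu_capacity", NULL);
            char name[24];
            int cpu;

            /* One RW file per cpu: /sys/kernel/debug/cpu_capacity/cpuN */
            for_each_possible_cpu(cpu) {
                    snprintf(name, sizeof(name), "cpu%d", cpu);
                    debugfs_create_file(name, 0644, dir,
                                        (void *)(long)cpu, &capacity_fops);
            }
            return 0;
    }
    late_initcall(cpu_capacity_debugfs_init);

Because debugfs is explicitly not ABI, such a knob could be reworked or
dropped later without the compatibility concerns raised in this thread.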
On 3/6/19 4:27 PM, Lingutla Chandrasekhar wrote:

[...]

> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
>  static void update_topology_flags_workfn(struct work_struct *work);
>  static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>
> -static ssize_t cpu_capacity_store(struct device *dev,
> -                                  struct device_attribute *attr,
> -                                  const char *buf,
> -                                  size_t count)
> -{
> -        struct cpu *cpu = container_of(dev, struct cpu, dev);
> -        int this_cpu = cpu->dev.id;
> -        int i;
> -        unsigned long new_capacity;
> -        ssize_t ret;
> -
> -        if (!count)
> -                return 0;
> -
> -        ret = kstrtoul(buf, 0, &new_capacity);
> -        if (ret)
> -                return ret;
> -        if (new_capacity > SCHED_CAPACITY_SCALE)
> -                return -EINVAL;
> -
> -        mutex_lock(&cpu_scale_mutex);

Since we can't write to cpu_scale from here anymore, we could get rid of
cpu_scale_mutex. topology_normalize_cpu_scale()->topology_set_cpu_scale()
is now only called from:

[    0.202628] topology_normalize_cpu_scale+0x28/0x30
[    0.207529] init_cpu_topology+0x168/0x1e8
[    0.211644] smp_prepare_cpus+0x2c/0x108
[    0.215585] kernel_init_freeable+0x104/0x518
[    0.219963] kernel_init+0x18/0x110
[    0.223469] ret_from_fork+0x10/0x1c

for the dts capacity-dmips-mhz properties, and from:

[    3.130180] topology_normalize_cpu_scale.part.0+0xac/0xd0
[    3.135619] init_cpu_capacity_callback+0x100/0x178
[    3.140459] notifier_call_chain+0x5c/0xa0
[    3.144522] blocking_notifier_call_chain+0x64/0x88
[    3.149363] cpufreq_set_policy+0xd8/0x3c8
[    3.153427] cpufreq_init_policy+0x78/0xc8

for the cpufreq max frequency related adjustments to cpu capacity.

The mutex was introduced for the sysfs interface here:

https://lore.kernel.org/lkml/1468932048-31635-8-git-send-email-juri.lelli@arm.com

> -        for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> -                topology_set_cpu_scale(i, new_capacity);
> -        mutex_unlock(&cpu_scale_mutex);
> -
> -        schedule_work(&update_topology_flags_work);
> -
> -        return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);
>
>  static int register_cpu_capacity_sysctl(void)
>  {

Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

on Arm64 Juno with v5.0
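The follow-up Dietmar is hinting at would drop the lock from the
normalize path. A sketch of topology_normalize_cpu_scale() without
cpu_scale_mutex, paraphrased from the v5.0 source with the pr_debug
lines removed; this is an assumption about what such a cleanup could
look like, not the actual follow-up patch:

    void topology_normalize_cpu_scale(void)
    {
            u64 capacity;
            int cpu;

            if (!raw_capacity)
                    return;

            /*
             * Per Dietmar's analysis, the remaining callers are the boot-time
             * DT parsing and the cpufreq policy-init notifier, so no lock is
             * taken around the per-cpu updates anymore.
             */
            for_each_possible_cpu(cpu) {
                    capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT)
                            / capacity_scale;
                    topology_set_cpu_scale(cpu, capacity);
            }
    }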
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..d455897 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -7,7 +7,6 @@
  */
 
 #include <linux/acpi.h>
-#include <linux/arch_topology.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/device.h>
@@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
 static void update_topology_flags_workfn(struct work_struct *work);
 static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
 
-static ssize_t cpu_capacity_store(struct device *dev,
-                                  struct device_attribute *attr,
-                                  const char *buf,
-                                  size_t count)
-{
-        struct cpu *cpu = container_of(dev, struct cpu, dev);
-        int this_cpu = cpu->dev.id;
-        int i;
-        unsigned long new_capacity;
-        ssize_t ret;
-
-        if (!count)
-                return 0;
-
-        ret = kstrtoul(buf, 0, &new_capacity);
-        if (ret)
-                return ret;
-        if (new_capacity > SCHED_CAPACITY_SCALE)
-                return -EINVAL;
-
-        mutex_lock(&cpu_scale_mutex);
-        for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
-                topology_set_cpu_scale(i, new_capacity);
-        mutex_unlock(&cpu_scale_mutex);
-
-        schedule_work(&update_topology_flags_work);
-
-        return count;
-}
-
-static DEVICE_ATTR_RW(cpu_capacity);
+static DEVICE_ATTR_RO(cpu_capacity);
 
 static int register_cpu_capacity_sysctl(void)
 {
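For context, the read side the patch leaves in place is the plain show
handler already present in drivers/base/arch_topology.c; a paraphrase of
the v5.0-era source (the topology_get_cpu_scale() signature has changed
in later trees):

    static ssize_t cpu_capacity_show(struct device *dev,
                                     struct device_attribute *attr,
                                     char *buf)
    {
            struct cpu *cpu = container_of(dev, struct cpu, dev);

            /* cpu_scale for this cpu, in the range [0..SCHED_CAPACITY_SCALE] */
            return sprintf(buf, "%lu\n",
                           topology_get_cpu_scale(NULL, cpu->dev.id));
    }

DEVICE_ATTR_RO(cpu_capacity) expects exactly this cpu_capacity_show()
helper and registers the attribute with 0444 permissions, which is why
deleting the store handler and switching the macro is the whole patch.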
If a user updates any cpu's cpu_capacity, the new value is applied to all
of its online sibling cpus. This need not be correct: sibling cpus (in
ARM, cpus with the same micro-architecture) can have different
cpu_capacity values with different performance characteristics, so
propagating a user supplied cpu_capacity to all cpu siblings is not
correct.

Another problem is that the current code assumes 'all cpus in a cluster,
or with the same package_id (core_siblings), have the same cpu_capacity'.
But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove cpu
topology sibling masks"), when a cpu is hotplugged out, its information
gets cleared in its sibling cpus. A user supplied cpu_capacity is then
applied only to the cpus that are online siblings at the time; if any cpu
is hotplugged in afterwards, it ends up with a different cpu_capacity
than its siblings, which breaks the above assumption.

So instead of mucking around with the core sibling mask for a user
supplied value, use the device tree to set cpu capacity, and make the
cpu_capacity node read-only so it can still be used to inspect the
asymmetry between cpus in the system.

Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
---
 drivers/base/arch_topology.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)
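The device-tree route the commit message points to is the standard
capacity-dmips-mhz property (Documentation/devicetree/bindings/arm/
cpu-capacity.txt). An illustrative snippet for a big.LITTLE-style system;
the node names, reg values, and capacities below are examples, not taken
from this thread:

    cpus {
            #address-cells = <2>;
            #size-cells = <0>;

            cpu@0 {
                    device_type = "cpu";
                    compatible = "arm,cortex-a53";
                    reg = <0x0 0x0>;
                    /* relative capacity, normalized against the largest
                       cpu and scaled to SCHED_CAPACITY_SCALE at boot */
                    capacity-dmips-mhz = <578>;
            };

            cpu@100 {
                    device_type = "cpu";
                    compatible = "arm,cortex-a57";
                    reg = <0x0 0x100>;
                    capacity-dmips-mhz = <1024>;
            };
    };

With per-cpu values described this way, the kernel never has to guess
which siblings share a capacity, which is exactly the ambiguity the
removed sysfs store tripped over.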