
ARM: report present cpus in /proc/cpuinfo

Message ID 4E012198.6010405@nvidia.com (mailing list archive)
State New, archived

Commit Message

Jon Mayo June 21, 2011, 10:56 p.m. UTC
Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
online has changed slightly. Previously, online meant a CPU was
schedulable, and conversely offline meant it was not. But with the
current power management infrastructure there are CPUs that can be
scheduled (after they are woken up automatically), yet are not
considered "online" because the masks and flags for them are not set.

---
  arch/arm/kernel/setup.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Fabio Estevam June 21, 2011, 10:59 p.m. UTC | #1
On Tue, Jun 21, 2011 at 7:56 PM, Jon Mayo <jmayo@nvidia.com> wrote:
> Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
> online has changed slightly. Previously, online meant a CPU was
> schedulable, and conversely offline meant it was not. But with the
> current power management infrastructure there are CPUs that can be
> scheduled (after they are woken up automatically), yet are not
> considered "online" because the masks and flags for them are not set.
>
> ---
>  arch/arm/kernel/setup.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

You missed your Signed-off-by line.

Regards,

Fabio Estevam
Russell King - ARM Linux June 21, 2011, 11:05 p.m. UTC | #2
On Tue, Jun 21, 2011 at 03:56:24PM -0700, Jon Mayo wrote:
> Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
> online has changed slightly. Previously, online meant a CPU was
> schedulable, and conversely offline meant it was not. But with the
> current power management infrastructure there are CPUs that can be
> scheduled (after they are woken up automatically), yet are not
> considered "online" because the masks and flags for them are not set.

There be sharks here.  glibc can read /proc/cpuinfo to find out how
many CPUs are online.  glibc can also read /proc/stat to determine
that number.

Both files should be using the same test to ensure consistency.  That
is using the online mask, not the present mask.
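
For reference, the counting both of those consumers rely on boils down
to "one cpuN line per online CPU".  A minimal userspace sketch (this
mirrors the idea, not glibc's actual implementation):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int count_online_cpus(void)
{
	FILE *f = fopen("/proc/stat", "r");
	char line[256];
	int n = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		/* "cpu " is the aggregate line; "cpu0", "cpu1", ... are per-CPU */
		if (!strncmp(line, "cpu", 3) && isdigit((unsigned char)line[3]))
			n++;
	fclose(f);
	return n;
}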

If a CPU is hot unplugged (and therefore the bit is clear in the online
map) then it is not available for scheduling, and the system will not
wake the CPU up, no matter how high the load becomes, without userspace
assistance.

So I don't think your change is correct, and the code as it stands is
right.
Jon Mayo June 21, 2011, 11:24 p.m. UTC | #3
On 06/21/2011 04:05 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:56:24PM -0700, Jon Mayo wrote:
>> Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
>> online has changed slightly. Previously, online meant a CPU was
>> schedulable, and conversely offline meant it was not. But with the
>> current power management infrastructure there are CPUs that can be
>> scheduled (after they are woken up automatically), yet are not
>> considered "online" because the masks and flags for them are not set.
>
> There be sharks here.  glibc can read /proc/cpuinfo to find out how
> many CPUs are online.  glibc can also read /proc/stat to determine
> that number.
>

Yeah, that's the issue I had with this patch. I couldn't come up with a
way to make /proc/stat behave the same without impacting other arches.

Also, what you described is something we call a race. :) Reading and
parsing cpuinfo then stat, or vice versa, is not atomic. glibc is just
going to have to suck it up and deal with cpu1-3 on my system popping in
and out randomly in both cpuinfo and stat with the current implementation.

If you feel that glibc can't handle this inconsistency built into the
current implementation, then we should probably get together and agree
on a fix. (I'm not sure what the right answer is, hence my proposed
patch without any sort of Signed-off-by.)

> Both files should be using the same test to ensure consistency.  That
> is using the online mask, not the present mask.
>
> If a CPU is hot unplugged (and therefore the bit is clear in the online
> map) then it is not available for scheduling, and the system will not
> wake the CPU up, no matter how high the load becomes, without userspace
> assistance.
>
> So I don't think your change is correct, and the code as it stands is
> right.

I don't think the behavior of ARM Linux makes sense. Neither change is
truly correct in my mind. What I feel is the correct behavior is a list
(in both stat and cpuinfo) of all CPUs either running a task or ready to
run a task. cpu_possible_mask, cpu_present_mask, and cpu_online_mask
don't have semantics on ARM that I feel are right. (I don't understand
what cpu_active_mask is, but it's probably not what I want either.)
Russell King - ARM Linux June 21, 2011, 11:36 p.m. UTC | #4
On Tue, Jun 21, 2011 at 04:24:16PM -0700, Jon Mayo wrote:
> On 06/21/2011 04:05 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 03:56:24PM -0700, Jon Mayo wrote:
>>> Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
>>> online has changed slightly. Previously, online meant a CPU was
>>> schedulable, and conversely offline meant it was not. But with the
>>> current power management infrastructure there are CPUs that can be
>>> scheduled (after they are woken up automatically), yet are not
>>> considered "online" because the masks and flags for them are not set.
>>
>> There be sharks here.  glibc can read /proc/cpuinfo to find out how
>> many CPUs are online.  glibc can also read /proc/stat to determine
>> that number.
>>
>
> Yeah, that's the issue I had with this patch. I couldn't come up with a
> way to make /proc/stat behave the same without impacting other arches.
>
> Also, what you described is something we call a race. :) Reading and
> parsing cpuinfo then stat, or vice versa, is not atomic. glibc is just
> going to have to suck it up and deal with cpu1-3 on my system popping in
> and out randomly in both cpuinfo and stat with the current
> implementation.

Well, that's how it is - Linus tends to be opposed to adding syscalls
to allow glibc to get this kind of information, blaming glibc for being
dumb instead.  If you look at the glibc source, you'll see that it has
a hint about a 'syscall' for __get_nprocs - that's quite old and has
never happened.  I don't hold out much hope of anything changing anytime
soon.

> I don't think the behavior of ARM Linux makes sense. Neither change is
> truly correct in my mind. What I feel is the correct behavior is a list
> (in both stat and cpuinfo) of all CPUs either running a task or ready to
> run a task.

That _is_ what you have listed in /proc/cpuinfo and /proc/stat.

> cpu_possible_mask, cpu_present_mask, and cpu_online_mask  
> don't have semantics on ARM that I feel are right. (I don't understand
> what cpu_active_mask is, but it's probably not what I want either.)

They have their defined meaning.

cpu_possible_mask - the CPU number may be available
cpu_present_mask - the CPU number is present and is available to be brought
	online upon request by the hotplug code
cpu_online_mask - the CPU is becoming available for scheduling
cpu_active_mask - the CPU is fully online and available for scheduling

CPUs only spend a _very_ short time in the online but !active state
(about the time it takes the CPU asking for it to be brought up to
notice that it has been brought up, and for the scheduler migration
code to receive the notification that the CPU is now online.)  So
you can regard the active mask as a mere copy of the online mask for
most purposes.

CPUs may be set in the possible mask but not the present mask - that
can happen if you limit the number of CPUs on the kernel command line.
However, we have no way to bring those CPUs to "present" status, and
so they are not available for bringing online - as far as the software
is concerned, they're as good as being physically unplugged.

CPUs in the possible mask indicate CPUs which can be present while
this kernel is running.

So, actually, our use of these is correct - it can't be any different
to any other architecture, because the code which interprets the state
from these masks is all architecture independent.

For instance, the generic hotplug code will refuse to bring a cpu
online by doing this test:

        if (cpu_online(cpu) || !cpu_present(cpu))
                return -EINVAL;

So, we (in arch/arm) can't change that decision.  Similarly, online &&
active must both be set in order for any process to be scheduled onto
that CPU - if any process is on a CPU which is going offline (and
therefore !active, !online) then it will be migrated off that CPU by
generic code before the CPU goes offline.

I think what you're getting confused over is that within nvidia, you're
probably dynamically hotplugging CPUs, and so offline CPUs are apparently
available to the system if the load on the system rises.  That's not
something in the generic kernel, and is a custom addition.  Such an
addition _can_ be seen to change the definition of the above masks,
but that's not the fault of the kernel - that's the way you're driving
the hotplug system.

So, I don't believe there's anything whatsoever wrong here, and I
don't believe that we're doing anything different from any other SMP
platform wrt these masks.
Jon Mayo June 22, 2011, 12:08 a.m. UTC | #5
On 06/21/2011 04:36 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 04:24:16PM -0700, Jon Mayo wrote:
>> On 06/21/2011 04:05 PM, Russell King - ARM Linux wrote:
>>> On Tue, Jun 21, 2011 at 03:56:24PM -0700, Jon Mayo wrote:
>>>> Because ARM Linux likes to dynamically hotplug CPUs, the meaning of
>>>> online has changed slightly. Previously, online meant a CPU was
>>>> schedulable, and conversely offline meant it was not. But with the
>>>> current power management infrastructure there are CPUs that can be
>>>> scheduled (after they are woken up automatically), yet are not
>>>> considered "online" because the masks and flags for them are not set.
>>>
>>> There be sharks here.  glibc can read /proc/cpuinfo to find out how
>>> many CPUs are online.  glibc can also read /proc/stat to determine
>>> that number.
>>>
>>
>> Yeah, that's the issue I had with this patch. I couldn't come up with a
>> way to make /proc/stat behave the same without impacting other arches.
>>
>> Also, what you described is something we call a race. :) Reading and
>> parsing cpuinfo then stat, or vice versa, is not atomic. glibc is just
>> going to have to suck it up and deal with cpu1-3 on my system popping in
>> and out randomly in both cpuinfo and stat with the current
>> implementation.
>
> Well, that's how it is - Linus tends to be opposed to adding syscalls
> to allow glibc to get this kind of information, blaming glibc for being
> dumb instead.  If you look at the glibc source, you'll see that it has
> a hint about a 'syscall' for __get_nprocs - that's quite old and has
> never happened.  I don't hold out much hope of anything changing anytime
> soon.
>

This issue has had me concerned for a while, because in userspace it
can be advantageous to allocate per-cpu structures at start-up for some
threading tricks. But if you use the wrong count, funny things can
happen.
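
An illustrative sketch of that failure mode (assuming a table sized
from the online count at startup):

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

struct slot { long counter; };

int main(void)
{
	/* Sized from the *online* count at startup... */
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	struct slot *slots = calloc(n, sizeof(*slots));

	/* ...but if cpu3 was offline then and gets plugged back in
	 * later, sched_getcpu() can return 3 while n == 1, indexing
	 * past the end of the table. */
	slots[sched_getcpu()].counter++;

	free(slots);
	return 0;
}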

>> I don't think the behavior of ARM Linux makes sense. Neither change is
>> truly correct in my mind. What I feel is the correct behavior is a list
>> (in both stat and cpuinfo) of all CPUs either running a task or ready to
>> run a task.
>
> That _is_ what you have listed in /proc/cpuinfo and /proc/stat.
>

What I see is that my idle CPUs are not there, because we hot unplug
them so their power domains can be turned off. Scheduling them can
happen, but only if an extra step occurs. From user space it's
transparent; in kernel space there is a whole framework making decisions
about when to dynamically turn on what.

>> cpu_possible_mask, cpu_present_mask, and cpu_online_mask
>> don't have semantics on ARM that I feel are right. (I don't understand
>> what cpu_active_mask is, but it's probably not what I want either.)
>
> They have their defined meaning.
>
> cpu_possible_mask - the CPU number may be available
> cpu_present_mask - the CPU number is present and is available to be brought
> 	online upon request by the hotplug code
> cpu_online_mask - the CPU is becoming available for scheduling
> cpu_active_mask - the CPU is fully online and available for scheduling
>
> CPUs only spend a _very_ short time in the online but !active state
> (about the time it takes the CPU asking for it to be brought up to
> notice that it has been brought up, and for the scheduler migration
> code to receive the notification that the CPU is now online.)  So
> you can regard the active mask as a mere copy of the online mask for
> most purposes.
>
> CPUs may be set in the possible mask but not the present mask - that
> can happen if you limit the number of CPUs on the kernel command line.
> However, we have no way to bring those CPUs to "present" status, and
> so they are not available for bringing online - as far as the software
> is concerned, they're as good as being physically unplugged.
>

I don't see a use for that semantic. Why shouldn't we add a couple of
lines of code to the kernel to scrub out unusable situations?

> CPUs in the possible mask indicate CPUs which can be present while
> this kernel is running.
>
> So, actually, our use of these is correct - it can't be any different
> to any other architecture, because the code which interprets the state
> from these masks is all architecture independent.
>
> For instance, the generic hotplug code will refuse to bring a cpu
> online by doing this test:
>
>          if (cpu_online(cpu) || !cpu_present(cpu))
>                  return -EINVAL;
>
> So, we (in arch/arm) can't change that decision.  Similarly, online &&
> active must both be set in order for any process to be scheduled onto
> that CPU - if any process is on a CPU which is going offline (and
> therefore !active, !online) then it will be migrated off that CPU by
> generic code before the CPU goes offline.
>

I will accept that. But then does that mean we (either arch/arm or 
mach-tegra) have used the cpu hotplug system incorrectly?

> I think what you're getting confused over is that within nvidia, you're
> probably dynamically hotplugging CPUs, and so offline CPUs are apparently
> available to the system if the load on the system rises.  That's not
> something in the generic kernel, and is a custom addition.  Such an
> addition _can_ be seen to change the definition of the above masks,
> but that's not the fault of the kernel - that's the way you're driving
> the hotplug system.
>

Sorry, I thought we weren't the only ones in ARM driving it this way.
If what we've done is strange, I'd like to correct it.

> So, I don't believe there's anything whatsoever wrong here, and I
> don't believe that we're doing anything different from any other SMP
> platform wrt these masks.

If I think of a big mainframe or Xeon server with hotplug CPUs, the
way the masks work makes perfect sense. I push a button, all the
processes get cleared from the CPU, and it is marked as offline. I pull
the card from the cabinet, and then it is !present. Maybe I insert a new
card at a later date. It's just like any other sort of hotplug thing.

I think my issue with cpuinfo/stat's output is that the semantics for
"online" on this one architecture (mach-tegra), and possibly others
(??), are different from what I would expect.

ps - thanks for your time, I really do appreciate this.
Russell King - ARM Linux June 22, 2011, 9:36 a.m. UTC | #6
On Tue, Jun 21, 2011 at 05:08:12PM -0700, Jon Mayo wrote:
> This issue has had me concerned for a while, because in userspace it
> can be advantageous to allocate per-cpu structures at start-up for some
> threading tricks. But if you use the wrong count, funny things can
> happen.

Again, if you look at the glibc sources, you'll find that they (in
theory) provide two different calls - getconf(_SC_NPROCESSORS_CONF)
and getconf(_SC_NPROCESSORS_ONLN).  See sysdeps/posix/sysconf.c.

That suggests you should be using getconf(_SC_NPROCESSORS_CONF).

However, these are provided by __get_nprocs_conf() and __get_nprocs()
respectively. See sysdeps/unix/sysv/linux/getsysstats.c.  Notice this
comment:

/* As far as I know Linux has no separate numbers for configured and
   available processors.  So make the `get_nprocs_conf' function an
   alias.  */
strong_alias (__get_nprocs, __get_nprocs_conf)

So, getconf(_SC_NPROCESSORS_CONF) and getconf(_SC_NPROCESSORS_ONLN)
will probably return the same thing - and it seems to me that you
require that to be fixed.  That's not for us in ARM to sort out -
that's a _generic_ kernel and glibc issue, and needs to be discussed
elsewhere.
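
In other words, with that alias in place a quick check like this
(illustrative) will print the same number twice:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* With __get_nprocs_conf aliased to __get_nprocs, both values
	 * track the online count. */
	printf("_SC_NPROCESSORS_CONF = %ld\n", sysconf(_SC_NPROCESSORS_CONF));
	printf("_SC_NPROCESSORS_ONLN = %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
	return 0;
}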

>>> I don't think the behavior of ARM Linux makes sense. Neither change is
>>> truly correct in my mind. What I feel is the correct behavior is a list
>>> (in both stat and cpuinfo) of all CPUs either running a task or ready to
>>> run a task.
>>
>> That _is_ what you have listed in /proc/cpuinfo and /proc/stat.
>>
>
> What I see is that my idle CPUs are not there, because we hot unplug
> them so their power domains can be turned off. Scheduling them can
> happen, but only if an extra step occurs. From user space it's
> transparent; in kernel space there is a whole framework making decisions
> about when to dynamically turn on what.

Exactly.  You're complaining that the kernel's interpretation of the masks
is not correct because you're driving it with a user program which
effectively changes that behaviour.

So, if we change the interpretation of the masks, we'll then have people
who aren't using your user program complaining that the masks are wrong
for them.  It's a no-win situation - there is no overall benefit to
changing the kernel.

The fact that you're using a user program which dynamically hot-plugs
CPUs means that _you're_ changing the system behaviour by running that
program, and _you're_ changing the meaning of those masks.

>>> cpu_possible_mask, cpu_present_mask, and cpu_online_mask
>>> don't have semantics on ARM that I feel are right. (I don't understand
>>> what cpu_active_mask is, but it's probably not what I want either.)
>>
>> They have their defined meaning.
>>
>> cpu_possible_mask - the CPU number may be available
>> cpu_present_mask - the CPU number is present and is available to be brought
>> 	online upon request by the hotplug code
>> cpu_online_mask - the CPU is becoming available for scheduling
>> cpu_active_mask - the CPU is fully online and available for scheduling
>>
>> CPUs only spend a _very_ short time in the online but !active state
>> (about the time it takes the CPU asking for it to be brought up to
>> notice that it has been brought up, and for the scheduler migration
>> code to receive the notification that the CPU is now online.)  So
>> you can regard the active mask as a mere copy of the online mask for
>> most purposes.
>>
>> CPUs may be set in the possible mask but not the present mask - that
>> can happen if you limit the number of CPUs on the kernel command line.
>> However, we have no way to bring those CPUs to "present" status, and
>> so they are not available for bringing online - as far as the software
>> is concerned, they're as good as being physically unplugged.
>>
>
> I don't see a use for that semantic. Why shouldn't we add a couple of
> lines of code to the kernel to scrub out unusable situations?

Think about it - if you have real hot-pluggable CPUs (servers do), do
you _really_ want to try to bring online a possible CPU (iow, there's
a socket on the board) but one which isn't present (iow, the socket is
empty)?

That's what the possible + !present case caters for.  Possible tells
the kernel how many CPUs to allocate per-cpu data structures for.
Present tells it whether a CPU can be onlined or not.
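
That division of labour is visible in how per-cpu data is normally
handled (a generic sketch, nothing Tegra-specific):

#include <linux/cpumask.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, my_counter);

static void init_counters(void)
{
	int cpu;

	/* Storage exists for every *possible* CPU, even ones that are
	 * not present; only present CPUs can ever be brought online
	 * and touch their copy. */
	for_each_possible_cpu(cpu)
		per_cpu(my_counter, cpu) = 0;
}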

>> So, we (in arch/arm) can't change that decision.  Similarly, online &&
>> active must both be set in order for any process to be scheduled onto
>> that CPU - if any process is on a CPU which is going offline (and
>> therefore !active, !online) then it will be migrated off that CPU by
>> generic code before the CPU goes offline.
>>
>
> I will accept that. But then does that mean we (either arch/arm or  
> mach-tegra) have used the cpu hotplug system incorrectly?

It means you're using it in ways that it was not originally designed
to be used - which is for the physical act of hot-plugging CPUs in
servers.  CPU hotplug was never designed from the outset for this
kind of dynamic CPU power management.

Yes, you _can_ use the CPU hotplug interfaces to do this, but as you're
finding, there are problems with doing this.

We can't go around making ARM use CPU hotplug differently from everyone
else because that'll make things extremely fragile.  As you've already
found out, glibc getconf() ultimately uses /proc/stat to return the
number of CPUs.  So in order to allow dynamic hotplugging _and_ return
the sensible 'online CPUs' where 'online' means both those which are
currently running and those which are dormant, you need to change
generic code.

Plus, there's the issue of CPU affinity for processes and IRQs.  With
current CPU hotplug, a process which has chosen to bind to a particular
CPU will have that binding destroyed when the CPU is hot unplugged, and
its affinity will be broken.  It will be run on a different CPU.  This
is probably not the semantics you desire.
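
For example, a thread that pins itself like this (plain userspace,
hypothetical) silently loses its binding if cpu1 is hot unplugged:

#define _GNU_SOURCE
#include <sched.h>

static int pin_to_cpu1(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(1, &set);
	/* If cpu1 is later hot unplugged, this binding is discarded
	 * and the task runs on whatever CPU remains online. */
	return sched_setaffinity(0, sizeof(set), &set);
}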

I'd argue that trying to do dynamic hotplugging is the wrong approach,
especially as there is CPUidle (see below).

>> I think what you're getting confused over is that within nvidia, you're
>>> probably dynamically hotplugging CPUs, and so offline CPUs are apparently
>> available to the system if the load on the system rises.  That's not
>> something in the generic kernel, and is a custom addition.  Such an
>> addition _can_ be seen to change the definition of the above masks,
>> but that's not the fault of the kernel - that's the way you're driving
>> the hotplug system.
>>
>
> Sorry, I thought we weren't the only ones in ARM driving it this way.
> If what we've done is strange, I'd like to correct it.

I'm aware of ARM having done something like this in their SMP group
in the early days of SMP support, but I wasn't aware that it was still
being actively pursued.  I guess that it never really got out of the
prototyping stage (or maybe it did but they chose not to address these
issues).

> If I think of a big mainframe or Xeon server with hotplug CPUs, the
> way the masks work makes perfect sense. I push a button, all the
> processes get cleared from the CPU, and it is marked as offline. I pull
> the card from the cabinet, and then it is !present. Maybe I insert a new
> card at a later date. It's just like any other sort of hotplug thing.
>
> I think my issue with cpuinfo/stat's output is that the semantics for
> "online" on this one architecture (mach-tegra), and possibly others
> (??), are different from what I would expect.

No, it's no different.  As I've explained above, the difference is that
you're running a userspace program which automatically does the
hotplugging depending on the system load.

That _can_ be viewed as fundamentally changing the system behaviour
because CPUs which are offlined are still available for scheduling
should the system load become high enough to trigger it.

I think there's an easier way to solve this problem: there is the CPU
idle infrastructure, which allows idle CPUs to remain online while
allowing them to power down when they're not required.  Because they
remain online, the scheduler will migrate tasks to them if they're
not doing anything, and maybe that's something else to look at.

The other advantage of CPUidle is that you're not breaking the affinity
of anything when the CPU is powered down, unlike hotplug.

So, I think you really should be looking at CPUidle, rather than trying
to do dynamic hotplugging based on system load.
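
To sketch the shape of that (every name below is illustrative, and the
cpuidle API differs between kernel versions, so treat this as an
outline rather than a drop-in driver):

#include <linux/cpuidle.h>
#include <linux/init.h>
#include <linux/module.h>
#include <asm/proc-fns.h>

/* Idle state entry: the CPU powers down in WFI but stays online, so
 * the scheduler keeps it in its picture. */
static int tegra_enter_wfi(struct cpuidle_device *dev,
			   struct cpuidle_state *state)
{
	cpu_do_idle();	/* WFI */
	return 0;	/* residency in us; a real driver measures this */
}

static struct cpuidle_driver tegra_idle_driver = {
	.name	= "tegra_idle",
	.owner	= THIS_MODULE,
};

static int __init tegra_cpuidle_init(void)
{
	/* A real driver also sets up one cpuidle_device per CPU, with
	 * a state table whose ->enter is tegra_enter_wfi, and calls
	 * cpuidle_register_device() for each. */
	return cpuidle_register_driver(&tegra_idle_driver);
}
device_initcall(tegra_cpuidle_init);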
Jon Mayo June 22, 2011, 7:26 p.m. UTC | #7
On 06/22/2011 02:36 AM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 05:08:12PM -0700, Jon Mayo wrote:
>> This issue has had me concerned for a while, because in userspace it
>> can be advantageous to allocate per-cpu structures at start-up for some
>> threading tricks. But if you use the wrong count, funny things can
>> happen.
>
> Again, if you look at the glibc sources, you'll find that they (in
> theory) provide two different calls - getconf(_SC_NPROCESSORS_CONF)
> and getconf(_SC_NPROCESSORS_ONLN).  See sysdeps/posix/sysconf.c.
>
> That suggests you should be using getconf(_SC_NPROCESSORS_CONF).
>
> However, these are provided by __get_nprocs_conf() and __get_nprocs()
> respectively. See sysdeps/unix/sysv/linux/getsysstats.c.  Notice this
> comment:
>
> /* As far as I know Linux has no separate numbers for configured and
>     available processors.  So make the `get_nprocs_conf' function an
>     alias.  */
> strong_alias (__get_nprocs, __get_nprocs_conf)
>
> So, getconf(_SC_NPROCESSORS_CONF) and getconf(_SC_NPROCESSORS_ONLN)
> will probably return the same thing - and it seems to me that you
> require that to be fixed.  That's not for us in ARM to sort out -
> that's a _generic_ kernel and glibc issue, and needs to be discussed
> elsewhere.
>

Thanks for that. I don't look at glibc too much. I tend to run 
everything but glibc.

>>>> I don't think the behavior of ARM Linux makes sense. Neither change is
>>>> truly correct in my mind. What I feel is the correct behavior is a list
>>>> (in both stat and cpuinfo) of all CPUs either running a task or ready to
>>>> run a task.
>>>
>>> That _is_ what you have listed in /proc/cpuinfo and /proc/stat.
>>>
>>
>> What I see is that my idle CPUs are not there, because we hot unplug
>> them so their power domains can be turned off. Scheduling them can
>> happen, but only if an extra step occurs. From user space it's
>> transparent; in kernel space there is a whole framework making decisions
>> about when to dynamically turn on what.
>
> Exactly.  You're complaining that the kernel's interpretation of the masks
> is not correct because you're driving it with a user program which
> effectively changes that behaviour.
>

A small correction to your statement: I'm driving it entirely from the
kernel, and presenting something that isn't quite what user programs expect.

> So, if we change the interpretation of the masks, we'll then have people
> who aren't using your user program complaining that the masks are wrong
> for them.  It's a no-win situation - there is no overall benefit to
> changing the kernel.
>
> The fact that you're using a user program which dynamically hot-plugs
> CPUs means that _you're_ changing the system behaviour by running that
> program, and _you're_ changing the meaning of those masks.
>

Yeah, I'm not doing that. This is stuff in mach-tegra that dynamically
hotplugs CPUs.

>>>> cpu_possible_mask, cpu_present_mask, and cpu_online_mask
>>>> don't have semantics on ARM that I feel are right. (I don't understand
>>>> what cpu_active_mask is, but it's probably not what I want either.)
>>>
>>> They have their defined meaning.
>>>
>>> cpu_possible_mask - the CPU number may be available
>>> cpu_present_mask - the CPU number is present and is available to be brought
>>> 	online upon request by the hotplug code
>>> cpu_online_mask - the CPU is becoming available for scheduling
>>> cpu_active_mask - the CPU is fully online and available for scheduling
>>>
>>> CPUs only spend a _very_ short time in the online but !active state
>>> (about the time it takes the CPU asking for it to be brought up to
>>> notice that it has been brought up, and for the scheduler migration
>>> code to receive the notification that the CPU is now online.)  So
>>> you can regard the active mask as a mere copy of the online mask for
>>> most purposes.
>>>
>>> CPUs may be set in the possible mask but not the present mask - that
>>> can happen if you limit the number of CPUs on the kernel command line.
>>> However, we have no way to bring those CPUs to "present" status, and
>>> so they are not available for bringing online - as far as the software
>>> is concerned, they're as good as being physically unplugged.
>>>
>>
>> I don't see a use for that semantic. Why shouldn't we add a couple of
>> lines of code to the kernel to scrub out unusable situations?
>
> Think about it - if you have real hot-pluggable CPUs (servers do), do
> you _really_ want to try to bring online a possible CPU (iow, there's
> a socket on the board) but one which isn't present (iow, the socket is
>> empty)?
>
> That's what the possible + !present case caters for.  Possible tells
> the kernel how many CPUs to allocate per-cpu data structures for.
>> Present tells it whether a CPU can be onlined or not.
>

Yes, that's the difference between present and possible. I'm not
suggesting we report CPUs that do not exist. I'm suggesting we report
CPUs that are present, online or not.

>>> So, we (in arch/arm) can't change that decision.  Similarly, online &&
>>> active must both be set in order for any process to be scheduled onto
>>> that CPU - if any process is on a CPU which is going offline (and
>>> therefore !active, !online) then it will be migrated off that CPU by
>>> generic code before the CPU goes offline.
>>>
>>
>> I will accept that. But then does that mean we (either arch/arm or
>> mach-tegra) have used the cpu hotplug system incorrectly?
>
> It means you're using it in ways that it was not originally designed
> to be used - which is for the physical act of hot-plugging CPUs in
> servers.  CPU hotplug was never designed from the outset for this
> kind of dynamic CPU power management.
>
> Yes, you _can_ use the CPU hotplug interfaces to do this, but as you're
> finding, there are problems with doing this.
>
> We can't go around making ARM use CPU hotplug differently from everyone
> else because that'll make things extremely fragile.  As you've already
> found out, glibc getconf() ultimately uses /proc/stat to return the
> number of CPUs.  So in order to allow dynamic hotplugging _and_ return
> the sensible 'online CPUs' where 'online' means both those which are
> currently running and those which are dormant, you need to change
> generic code.
>

I agree 100%.

> Plus, there's the issue of CPU affinity for processes and IRQs.  With
> current CPU hotplug, a process which has chosen to bind to a particular
> CPU will have that binding destroyed when the CPU is hot unplugged, and
> its affinity will be broken.  It will be run on a different CPU.  This
> is probably not the semantics you desire.
>

Or maybe it is. I haven't decided yet. For power reasons I might want to 
ignore the affinity until demand goes up for more cores.

> I'd argue that trying to do dynamic hotplugging is the wrong approach,
> especially as there is CPUidle (see below).
>
>>> I think what you're getting confused over is that within nvidia, you're
>>> probably dynamically hotplugging CPUs, and so offline CPUs are apparently
>>> available to the system if the load on the system rises.  That's not
>>> something in the generic kernel, and is a custom addition.  Such an
>>> addition _can_ be seen to change the definition of the above masks,
>>> but that's not the fault of the kernel - that's the way you're driving
>>> the hotplug system.
>>>
>>
>> Sorry, I thought we weren't the only ones in ARM driving it this way.
>> If what we've done is strange, I'd like to correct it.
>
> I'm aware of ARM having done something like this in their SMP group
> in the early days of SMP support, but I wasn't aware that it was still
> being actively pursued.  I guess that it never really got out of the
> prototyping stage (or maybe it did but they chose not to address these
> issues).
>
>> If I think of a big mainframe or Xeon server with hotplug CPUs, the
>> way the masks work makes perfect sense. I push a button, all the
>> processes get cleared from the CPU, and it is marked as offline. I pull
>> the card from the cabinet, and then it is !present. Maybe I insert a new
>> card at a later date. It's just like any other sort of hotplug thing.
>>
>> I think my issue with cpuinfo/stat's output is that the semantics for
>> "online" on this one architecture (mach-tegra), and possibly others
>> (??), are different from what I would expect.
>
> No, it's no different.  As I've explained above, the difference is that
> you're running a userspace program which automatically does the
> hotplugging depending on the system load.
>

No, I'm not. No user space program at all.

> That _can_ be viewed as fundamentally changing the system behaviour
> because CPUs which are offlined are still available for scheduling
> should the system load become high enough to trigger it.
>
> I think there's an easier way to solve this problem: there is the CPU
> idle infrastructure, which allows idle CPUs to remain online while
> allowing them to power down when they're not required.  Because they
> remain online, the scheduler will migrate tasks to them if they're
> not doing anything, and maybe that's something else to look at.
>
> The other advantage of CPUidle is that you're not breaking the affinity
> of anything when the CPU is powered down, unlike hotplug.
>
> So, I think you really should be looking at CPUidle, rather than trying
> to do dynamic hotplugging based on system load.

The disadvantage of CPUidle is that there is no way, that I can see, to
handle asymmetric power domains. I don't ever want to turn off cpu0;
its power domain is coupled to a bunch of other things. But any
additional cores (one or more) are on a different power domain (shared
between all the additional cores).

If cpu0 is idle, I will want to kick everyone off cpu1 and push them
onto cpu0, then shut cpu1 off. It gets worse with more cores (a bunch
of companies have announced four-core ARMs already, for example).
Russell King - ARM Linux June 22, 2011, 8:19 p.m. UTC | #8
On Wed, Jun 22, 2011 at 12:26:11PM -0700, Jon Mayo wrote:
> On 06/22/2011 02:36 AM, Russell King - ARM Linux wrote:
>> Think about it - if you have real hot-pluggable CPUs (servers do), do
>> you _really_ want to try to bring online a possible CPU (iow, there's
>> a socket on the board) but one which isn't present (iow, the socket is
>> empty)?
>>
>> That's what the possible + !present case caters for.  Possible tells
>> the kernel how many CPUs to allocate per-cpu data structures for.
>> Present tells it whether a CPU can be onlined or not.
>>
>
> Yes, that's the difference between present and possible. I'm not  
> suggesting we report CPUs that do not exist. I'm suggesting we report
> CPUs that are present, online or not.

Which is _what_ we do.  The problem is that mach-tegra is causing
the established, well-defined APIs to mean something else, and then
you're complaining that those APIs don't mean what they were defined
to mean.

You're really shooting yourself in the foot here, and at this point
there is nothing left to discuss.

I can't help you.  You need to discuss this with folk who look after
the hotplug CPU stuff.

You no longer have an ARM architecture problem; your problem is that
you're abusing stuff to get what you want and then complaining that
stuff doesn't work as you want it.
Jon Mayo June 22, 2011, 8:54 p.m. UTC | #9
On 06/22/2011 01:19 PM, Russell King - ARM Linux wrote:
> On Wed, Jun 22, 2011 at 12:26:11PM -0700, Jon Mayo wrote:
>> On 06/22/2011 02:36 AM, Russell King - ARM Linux wrote:
>>> Think about it - if you have real hot-pluggable CPUs (servers do), do
>>> you _really_ want to try to bring online a possible CPU (iow, there's
>>> a socket on the board) but one which isn't present (iow, the socket is
>>> empty)?
>>>
>>> That's what the possible + !present case caters for.  Possible tells
>>> the kernel how many CPUs to allocate per-cpu data structures for.
>>> Present tells it whether a CPU can be onlined or not.
>>>
>>
>> Yes, that's the difference between present and possible. I'm not
>> suggesting we report CPUs that do not exist. I'm suggesting we report
>> CPUs that are present, online or not.
>
> Which is _what_ we do.  The problem is that mach-tegra is causing
> the established, well-defined APIs to mean something else, and then
> you're complaining that those APIs don't mean what they were defined
> to mean.
>

In arch/arm/kernel/setup.c:

#if defined(CONFIG_SMP)
         for_each_online_cpu(i) {

No, the ARM kernel reports online CPUs, not present CPUs. I now agree
that this is the correct behavior, and that it is consistent with all
other platforms. But your responses about present versus possible don't
match the code.

> You're really shooting yourself in the foot here, and at this point
> there is nothing left to discuss.
>
> I can't help you.  You need to discuss this with folk who look after
> the hotplug CPU stuff.
>
> You no longer have an ARM architecture problem; your problem is that
> you're abusing stuff to get what you want and then complaining that
> stuff doesn't work as you want it.

I'm not complaining, I was seeking advice on the right way to do things. 
You've given me advice. Thank you. End-of-thread.
Russell King - ARM Linux June 22, 2011, 8:57 p.m. UTC | #10
On Wed, Jun 22, 2011 at 01:54:40PM -0700, Jon Mayo wrote:
> On 06/22/2011 01:19 PM, Russell King - ARM Linux wrote:
>> On Wed, Jun 22, 2011 at 12:26:11PM -0700, Jon Mayo wrote:
>>> On 06/22/2011 02:36 AM, Russell King - ARM Linux wrote:
>>>> Think about it - if you have real hot-pluggable CPUs (servers do), do
>>>> you _really_ want to try to bring online a possible CPU (iow, there's
>>>> a socket on the board) but one which isn't present (iow, the socket is
>>>> empty)?
>>>>
>>>> That's what the possible + !present case caters for.  Possible tells
>>>> the kernel how many CPUs to allocate per-cpu data structures for.
>>>> Present tells it whether a CPU can be onlined or not.
>>>>
>>>
>>> Yes, that's the difference between present and possible. I'm not
>>> suggesting we report CPUs that do not exist. I'm suggesting we report
>>> CPUs that are present, online or not.
>>
>> Which is _what_ we do.  The problem is that mach-tegra is causing
>> the established, well-defined APIs to mean something else, and then
>> you're complaining that those APIs don't mean what they were defined
>> to mean.
>>
>
> In arch/arm/kernel/setup.c:
>
> #if defined(CONFIG_SMP)
>         for_each_online_cpu(i) {
>
> No, the ARM kernel reports online CPUs, not present CPUs. I now agree
> that this is the correct behavior, and that it is consistent with all
> other platforms. But your responses about present versus possible don't
> match the code.

Yes they do, and that is enforced by generic code.

If you disagree, then say _where_ and _why_, don't just say "don't
match".

Patch

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index d5231ae..2c52b4e 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -905,7 +905,7 @@  static int c_show(struct seq_file *m, void *v)
  		   cpu_name, read_cpuid_id() & 15, elf_platform);

  #if defined(CONFIG_SMP)
-	for_each_online_cpu(i) {
+	for_each_present_cpu(i) {
  		/*
  		 * glibc reads /proc/cpuinfo to determine the number of
  		 * online processors, looking for lines beginning with