
[RFC,v2,02/10] CPU hotplug: Provide APIs for "full" atomic readers to prevent CPU offline

Message ID 20121205184313.3750.17752.stgit@srivatsabhat.in.ibm.com (mailing list archive)
State RFC, archived
Headers show

Commit Message

Srivatsa S. Bhat Dec. 5, 2012, 6:43 p.m. UTC
Some of the atomic hotplug readers cannot tolerate CPUs going offline while
they are in their critical section. That is, they can't get away with just
synchronizing with the updates to the cpu_online_mask; they really need to
synchronize with the entire CPU tear-down sequence, because they are very
much involved in the hotplug related code paths.

Such "full" atomic hotplug readers need a way to *actually* and *truly*
prevent CPUs from going offline while they are active.

The intent of this patch is to provide synchronization APIs for such "full"
atomic hotplug readers. [get/put_online_cpus_atomic_full()]

Some important design requirements and considerations:
-----------------------------------------------------

1. Scalable synchronization

Any synchronization on the atomic hotplug readers' side must be highly
scalable - avoid global single-holder locks/counters etc. These paths
currently use the extremely fast preempt_disable(), so its replacement
should not become ridiculously costly, nor should it needlessly serialize
the readers among themselves.

2. Should not have ABBA deadlock possibilities between the 2 types of atomic
readers ("light" vs "full")

Atomic readers who can get away with a stable online mask ("light" readers)
and atomic readers who need full synchronization with CPU offline ("full"
readers) must not end up in ABBA deadlocks because of the APIs they
respectively use.

Also, we should not impose any ordering restrictions on how the 2 types of
readers can nest. They should be allowed to nest freely in any way they want,
and we should guarantee that they won't deadlock.
(preempt_disable() imposed no ordering restrictions before; neither should we.)

3. preempt_disable() was recursive. The replacement should also be recursive.

Implementation of the design:
----------------------------

Basically, we use another reader-writer lock for synchronizing the "full"
hotplug readers with the writer. But since we want to avoid ABBA deadlock
possibilities, we need to be careful as well as clever while designing
these "full" reader APIs.

Simplification: All "full" readers are also "light" readers
-----------------------------------------------------------

This simplification helps us get rid of ABBA deadlock possibilities, because
the lock ordering remains consistent for both types of readers, and looks
something like this:

Light reader:
------------
    Take light-lock for read

        /* Critical section */

    Release the light-lock

Full reader:
-----------
    Take light-lock for read

        Take full-lock for read

            /* Critical section */

        Release the full-lock

    Release the light-lock


But then, the writer path should be cleverly designed in such a way that
after the update to cpu_online_mask, only the light-readers can continue;
the full-readers continue to spin until the entire CPU offline operation is
complete.

So the lock ordering in the writer should look like this:

Writer:
------

    Take light-lock for write

        Take the full-lock for write

            Update cpu_online_mask (flip the bit)


    /*
     * Now allow only the light-readers to continue, while keeping the
     * full-readers spinning (ie., release the light-lock alone).
     */
    Release the light-lock

    /* Continue CPU tear-down, calling CPU_DYING notifiers */

    /* Finally, allow the full-readers to continue */
    Release the full-lock


It can be verified that, with this scheme, there is no possibility of any
ABBA deadlocks, and that the 2 types of atomic readers can nest in any
way they want, without fear.


We expect atomic hotplug readers that need full synchronization with CPU
offline (and cannot get away with just a stable online mask) to be rare.
Otherwise, we could end up recreating the effect of stop_machine() without
even using stop_machine()! [That is, if too many readers are of this kind,
everybody will wait for the entire CPU offline to finish, which is almost
like having stop_machine() itself.]

So we hope that most atomic hotplug readers are of the "light" type.
That would keep things fast and scalable and make CPU offline operations
painless.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    4 ++++
 kernel/cpu.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Srivatsa S. Bhat Dec. 5, 2012, 7:01 p.m. UTC | #1
Replaying what Tejun wrote:

On 12/06/2012 12:13 AM, Srivatsa S. Bhat wrote:
> Some of the atomic hotplug readers cannot tolerate CPUs going offline while
> they are in their critical section. That is, they can't get away with just
> synchronizing with the updates to the cpu_online_mask; they really need to
> synchronize with the entire CPU tear-down sequence, because they are very
> much involved in the hotplug related code paths.
> 
> Such "full" atomic hotplug readers need a way to *actually* and *truly*
> prevent CPUs from going offline while they are active.
> 

I don't think this is a good idea.  You really should just need
get/put_online_cpus() and get/put_online_cpus_atomic().  The former
stay the same as they are.  The latter replaces what
preempt_disable/enable() was protecting.  Let's please not go
overboard unless we know they're necessary.  I strongly suspect that
breaking up reader side from preempt_disable and making writer side a
bit lighter should be enough.  Conceptually, it really should be a
simple conversion - convert preempt_disable/enable() pairs protecting
CPU on/offlining w/ get/put_cpu_online_atomic() and wrap the
stop_machine() section with the matching write lock.

Thanks.

-- tejun 

Srivatsa S. Bhat Dec. 5, 2012, 8:31 p.m. UTC | #2
> Replaying what Tejun wrote:
> 
> On 12/06/2012 12:13 AM, Srivatsa S. Bhat wrote:
>> Some of the atomic hotplug readers cannot tolerate CPUs going offline while
>> they are in their critical section. That is, they can't get away with just
>> synchronizing with the updates to the cpu_online_mask; they really need to
>> synchronize with the entire CPU tear-down sequence, because they are very
>> much involved in the hotplug related code paths.
>>
>> Such "full" atomic hotplug readers need a way to *actually* and *truly*
>> prevent CPUs from going offline while they are active.
>>
> 
> I don't think this is a good idea.  You really should just need
> get/put_online_cpus() and get/put_online_cpus_atomic().  The former
> stay the same as they are.  The latter replaces what
> preempt_disable/enable() was protecting.  Let's please not go
> overboard unless we know they're necessary.  I strongly suspect that
> breaking up reader side from preempt_disable and making writer side a
> bit lighter should be enough.  Conceptually, it really should be a
> simple conversion - convert preempt_disable/enable() pairs protecting
> CPU on/offlining w/ get/put_cpu_online_atomic() and wrap the
> stop_machine() section with the matching write lock.
> 

Yes, that _sounds_ sufficient, but IMHO it won't be, in practice. The
*number* of call-sites that you need to convert from preempt_disable/enable
to get/put_online_cpus_atomic() won't be too many, however the *frequency*
of usage of those call-sites can potentially be very high.

For example, the IPI path (smp_call_function_*) needs to use the new APIs
instead of preempt_disable(); and this is quite a hot path. So if we replace
preempt_disable/enable() with a synchronization mechanism that spins
the reader *throughout* the CPU offline operation, and provide no light-weight
alternative API, then even such very hot readers will have to bear the brunt.

And IPIs and interrupts are the work-generators in a system. Since they
can be hotplug readers, if we spin them like this, we effectively end up
recreating the stop_machine() "effect", without even using stop_machine().

This is what I meant in my reply yesterday too:
https://lkml.org/lkml/2012/12/4/349

That's why we need a light-weight variant IMHO, so that we can use it
at least where feasible, like the IPI path (smp_call_function_*) for example.
That'll help us avoid the "stop_machine effect", hoping that most readers
are of the light type. As I mentioned in the cover-letter, most readers
_are_ of the light type (e.g., 5 patches in this series deal with light
readers, only 1 patch deals with a heavy/full reader). I don't see why
we should unnecessarily slow down every reader just because a minority of
readers actually need full synchronization with CPU offline.

Regards,
Srivatsa S. Bhat

Tejun Heo Dec. 5, 2012, 8:57 p.m. UTC | #3
Hello,

On Thu, Dec 06, 2012 at 02:01:35AM +0530, Srivatsa S. Bhat wrote:
> Yes, that _sounds_ sufficient, but IMHO it won't be, in practice. The
> *number* of call-sites that you need to convert from preempt_disable/enable
> to get/put_online_cpus_atomic() won't be too many, however the *frequency*
> of usage of those call-sites can potentially be very high.

I don't think that will be the case and, even if it is, doing it this
way would make it difficult to tell.  The right thing to do is
replacing stop_machine with finer grained percpu locking first.
Refining it further should happen iff that isn't enough and there
isn't a simpler solution.  So, let's please do the simple conversion
first.

Thanks.
Srivatsa S. Bhat Dec. 6, 2012, 4:31 a.m. UTC | #4
On 12/06/2012 02:27 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Dec 06, 2012 at 02:01:35AM +0530, Srivatsa S. Bhat wrote:
>> Yes, that _sounds_ sufficient, but IMHO it won't be, in practice. The
>> *number* of call-sites that you need to convert from preempt_disable/enable
>> to get/put_online_cpus_atomic() won't be too many, however the *frequency*
>> of usage of those call-sites can potentially be very high.
> 
> I don't think that will be the case and, even if it is, doing it this
> way would make it difficult to tell.  The right thing to do is
> replacing stop_machine with finer grained percpu locking first.
> Refining it further should happen iff that isn't enough and there
> isn't a simpler solution.  So, let's please do the simple conversion
> first.
> 

Hmm, OK, that sounds like a good plan. So I'll drop the "light" and
"full" variants for now and work on providing straightforward
get/put_online_cpus_atomic() APIs.

Thank you!
 
Regards,
Srivatsa S. Bhat


Patch

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index dd0a3ee..e2a9c49 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -177,6 +177,8 @@  extern void get_online_cpus(void);
 extern void put_online_cpus(void);
 extern void get_online_cpus_atomic_light(void);
 extern void put_online_cpus_atomic_light(void);
+extern void get_online_cpus_atomic_full(void);
+extern void put_online_cpus_atomic_full(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -202,6 +204,8 @@  static inline void cpu_hotplug_driver_unlock(void)
 #define put_online_cpus()	do { } while (0)
 #define get_online_cpus_atomic_light()	do { } while (0)
 #define put_online_cpus_atomic_light()	do { } while (0)
+#define get_online_cpus_atomic_full()	do { } while (0)
+#define put_online_cpus_atomic_full()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 381593c..c71c723 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -112,6 +112,46 @@  void put_online_cpus_atomic_light(void)
 }
 EXPORT_SYMBOL_GPL(put_online_cpus_atomic_light);
 
+/*
+ * Reader-writer lock to synchronize between "full/heavy" atomic hotplug
+ * readers and the hotplug writer while doing CPU offline operation.
+ * "Full/heavy" atomic hotplug readers are those who need to synchronize
+ * with the full CPU take-down sequence, and not just the bit flip in the
+ * cpu_online_mask.
+ */
+static DEFINE_RWLOCK(full_hotplug_rwlock);
+
+/*
+ * Some atomic hotplug readers need to synchronize with the entire CPU
+ * tear-down sequence, and not just with the update of the cpu_online_mask.
+ * Such readers are called "full" atomic hotplug readers.
+ *
+ * The following APIs help them synchronize fully with the CPU offline
+ * operation.
+ *
+ * You can call this function recursively.
+ *
+ * Also, you can mix and match (nest) "full" and "light" atomic hotplug
+ * readers in any way you want (without worrying about their ordering).
+ * The respective APIs have been designed in such a way as to provide
+ * the guarantee that you won't end up in a deadlock.
+ */
+void get_online_cpus_atomic_full(void)
+{
+	preempt_disable();
+	read_lock(&light_hotplug_rwlock);
+	read_lock(&full_hotplug_rwlock);
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic_full);
+
+void put_online_cpus_atomic_full(void)
+{
+	read_unlock(&full_hotplug_rwlock);
+	read_unlock(&light_hotplug_rwlock);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic_full);
+
 static struct {
 	struct task_struct *active_writer;
 	struct mutex lock; /* Synchronizes accesses to refcount, */
@@ -318,9 +358,13 @@  static int __ref take_cpu_down(void *_param)
 	 */
 	write_lock_irqsave(&light_hotplug_rwlock, flags);
 
+	/* Disable the atomic hotplug readers who need full synchronization */
+	write_lock(&full_hotplug_rwlock);
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0) {
+		write_unlock(&full_hotplug_rwlock);
 		write_unlock_irqrestore(&light_hotplug_rwlock, flags);
 		return err;
 	}
@@ -338,6 +382,9 @@  static int __ref take_cpu_down(void *_param)
 
 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
 
+	/* Enable the atomic hotplug readers who need full synchronization */
+	write_unlock(&full_hotplug_rwlock);
+
 	local_irq_restore(flags);
 	return 0;
 }