[RFC,v2,01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline

There are places where preempt_disable() is used to prevent any CPU from
going offline during the critical section. Let us call them as "atomic
hotplug readers" (atomic because they run in atomic contexts).

Often, these atomic hotplug readers have a simple need : they want the cpu
online mask that they work with (inside their critical section), to be
stable, i.e., it should be guaranteed that CPUs in that mask won't go
offline during the critical section. The important point here is that
they don't really need to synchronize with the actual CPU tear-down
sequence. All they need is synchronization with the updates to the
cpu_online_mask. (Hence the term "light", for light-weight).

The intent of this patch is to provide synchronization APIs for such
"light" atomic hotplug readers. [ get/put_online_cpus_atomic_light() ]

Fundamental idea behind the design:
-----------------------------------

Simply put, in the hotplug writer path, have appropriate locking around the
update to the cpu_online_mask in the CPU tear-down sequence. And once the
update is done, release the lock and allow the "light" atomic hotplug readers
to go ahead. Meanwhile, the hotplug writer can safely continue the actual
CPU tear-down sequence (running CPU_DYING notifiers etc) since the "light"
atomic readers don't really care about those operations (and hence don't
need to synchronize with them).

Also, once the hotplug writer completes taking the CPU offline, it should
not start any new cpu_down() operations until all existing "light" atomic
hotplug readers have completed.

Some important design requirements and considerations:
-----------------------------------------------------

1. The "light" atomic hotplug readers should ideally *never* have to wait
   for the hotplug writer (cpu_down()) for too long (like entire duration
   of CPU offline, for example). Because, these atomic hotplug readers can be
   in very hot-paths like interrupt handling/IPI and hence, if they have to
   wait for an ongoing cpu_down() to complete, it would pretty much
   introduce the same performance/latency problems as stop_machine().

2. Any synchronization at the atomic hotplug readers side must be highly
   scalable - avoid global single-holder locks/counters etc. Because, these
   paths currently use the extremely fast preempt_disable(); our replacement
   to preempt_disable() should not become ridiculously costly and also should
   not serialize the readers among themselves needlessly.

3. preempt_disable() was recursive. The replacement should also be recursive.

Implementation of the design:
----------------------------

At the core, we use a reader-writer lock to synchronize the update to the
cpu_online_mask. That way, multiple "light" atomic hotplug readers can co-exist
and the writer can acquire the lock only when all the readers have completed.

Once acquired, the writer holds the "light" lock only during the duration of
the update to the cpu_online_mask. That way, the readers don't have to spin
for too long (ie., the write-hold-time for the "light" lock is tiny), which
keeps the readers in good shape.

Reader-writer lock are recursive, so they can be used in a nested fashion in
the reader-path. Together, these satisfy all the 3 requirements mentioned above.

Also, since we don't use per-cpu locks (because rwlocks themselves are quite
scalable for readers), we don't end up in any lock ordering problems that can
occur if we try to use per-cpu locks.

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    4 ++
 kernel/cpu.c        |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC,v2,01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline

Commit Message

Comments

Patch