diff mbox series

[v1-alt] xen/livepatch: Make check_for_livepatch_work() faster in the common case

Message ID 20240111201129.4010175-1-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series [v1-alt] xen/livepatch: Make check_for_livepatch_work() faster in the common case | expand

Commit Message

Andrew Cooper Jan. 11, 2024, 8:11 p.m. UTC
When livepatching is enabled, this function is used all the time.  Really do
check the fastpath first, and annotate it likely() as this is the right answer
100% of the time (to many significant figures).  This cuts out 3 pointer
dereferences in the "nothing to do path".

However, GCC still needs some help to persuade it not to set the full stack
frame (6 spilled registers, 3 slots of locals) even on the fastpath.

Create a new check_for_livepatch_work() with the fastpath only, and make the
"new" do_livepatch_work() noinline.  This causes the fastpath to need no stack
frame, making it faster still.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v1-alt:
 * Manually split the functions.

Experimenting with __attribute__((cold)) was disappointing.  Vs this patch, it
creates an extra check_for_livepatch_work.cold function(and section) which is
just `jmp do_livepatch_work`.
---
 xen/common/livepatch.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Jan Beulich Jan. 12, 2024, 10:31 a.m. UTC | #1
On 11.01.2024 21:11, Andrew Cooper wrote:
> When livepatching is enabled, this function is used all the time.  Really do
> check the fastpath first, and annotate it likely() as this is the right answer
> 100% of the time (to many significant figures).  This cuts out 3 pointer
> dereferences in the "nothing to do path".
> 
> However, GCC still needs some help to persuade it not to set the full stack
> frame (6 spilled registers, 3 slots of locals) even on the fastpath.
> 
> Create a new check_for_livepatch_work() with the fastpath only, and make the
> "new" do_livepatch_work() noinline.  This causes the fastpath to need no stack
> frame, making it faster still.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
diff mbox series

Patch

diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 1209fea2566c..2c4b84382798 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -1693,7 +1693,7 @@  static int livepatch_spin(atomic_t *counter, s_time_t timeout,
  * The main function which manages the work of quiescing the system and
  * patching code.
  */
-void check_for_livepatch_work(void)
+static void noinline do_livepatch_work(void)
 {
 #define ACTION(x) [LIVEPATCH_ACTION_##x] = #x
     static const char *const names[] = {
@@ -1711,10 +1711,6 @@  void check_for_livepatch_work(void)
          !is_idle_domain(current->sched_unit->domain) )
         return;
 
-    /* Fast path: no work to do. */
-    if ( !per_cpu(work_to_do, cpu ) )
-        return;
-
     smp_rmb();
     /* In case we aborted, other CPUs can skip right away. */
     if ( !livepatch_work.do_work )
@@ -1864,6 +1860,17 @@  void check_for_livepatch_work(void)
     }
 }
 
+void check_for_livepatch_work(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    /* Fast path: no work to do. */
+    if ( likely(!per_cpu(work_to_do, cpu)) )
+        return;
+
+    do_livepatch_work();
+}
+
 /*
  * Only allow dependent payload is applied on top of the correct
  * build-id.