
[RFC,DO-NOT-APPLY] x86/IRQ: don't pass offline CPU(s) to on_selected_cpus()

Message ID 0396a0fa-e318-493a-9858-30f63304cc99@suse.com (mailing list archive)
State New

Commit Message

Jan Beulich Feb. 11, 2025, 9:38 a.m. UTC
on_selected_cpus() asserts that it's only passed online CPUs. Between
__cpu_disable() removing a CPU from the online map and fixup_eoi()
cleaning up for the CPU being offlined there's a window, though, where
the CPU in question can still appear in an action's cpu_eoi_map. Remove
offline CPUs, as they're (supposed to be) taken care of by fixup_eoi().
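
A minimal sketch of the pre-patch desc_guest_eoi() tail that can trip the
assertion (taken from the hunk removed below; the comments are added for
illustration only):

    /*
     * action->cpu_eoi_map can still contain a CPU which __cpu_disable()
     * has already removed from cpu_online_map, but which fixup_eoi()
     * has not yet cleaned up.
     */
    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);

    if ( !cpumask_empty(&cpu_eoi_map) )
        /* on_selected_cpus() asserts every CPU in the mask is online. */
        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);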

Move the entire tail of desc_guest_eoi() into a new helper, thus
also streamlining processing for other call sites when the sole
remaining CPU is the local one. Move and split the assertion, to be a
functionally equivalent replacement for the earlier BUG_ON() in
__pirq_guest_unbind(). Note that we can't use scratch_cpumask there, in
particular in the context of a timer handler and with data held there
needing to stay intact across process_pending_softirqs().
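
The caller-side contract of the new helper, as used in __pirq_guest_unbind()
(sketch extracted from the patch below):

    /*
     * invoke_set_eoi_ready() is entered with desc->lock held and returns
     * whether the lock is still held.  With @wait == true it avoids
     * dropping the lock when no remote CPUs need signalling, so the
     * caller only re-acquires it when the helper reports it was dropped.
     */
    if ( !invoke_set_eoi_ready(desc, true) )
        spin_lock_irq(&desc->lock);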

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While this builds okay, for now I didn't even consider testing it: Both
aspects below (plus the ugly locking arrangement) speak against dealing
with the issue this way, imo. Cc-ing REST rather than just x86 for this
reason.

RFC: Don't we need {get,put}_cpu_maps() around quite a few more uses of
     cpu_online_map (e.g. _clear_irq_vector() when called from
     destroy_irq(), or about every caller of smp_call_function())? Roger
     suggests using stop-machine context for CPU {on,off}lining, but I
     seem to vaguely recall that there was a reason not to do so at
     least for the offlining part.

RFC: I doubt process_pending_softirqs() is okay to use from a timer
     handler. (Callers of, in particular, {desc,pirq}_guest_eoi() would
     also need checking for process_pending_softirqs() to be okay to use
     in their contexts.)
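
As a reference for the first RFC question, the pattern the patch uses in
invoke_set_eoi_ready() to guard the cpu_online_map access (sketch copied
from the hunk below; other consumers of cpu_online_map would need a
similar wrapping):

    /* get_cpu_maps() is a trylock; keep the system alive while waiting. */
    while ( !get_cpu_maps() )
        process_pending_softirqs();

    /* Drop CPUs which went offline; fixup_eoi() takes care of those. */
    cpumask_and(&cpu_eoi_map, &cpu_eoi_map, &cpu_online_map);
    if ( !cpumask_empty(&cpu_eoi_map) )
        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, wait);

    put_cpu_maps();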

Comments

Roger Pau Monne Feb. 11, 2025, 6:37 p.m. UTC | #1
On Tue, Feb 11, 2025 at 10:38:41AM +0100, Jan Beulich wrote:
> on_selected_cpus() asserts that it's only passed online CPUs. Between
> __cpu_disable() removing a CPU from the online map and fixup_eoi()
> cleaning up for the CPU being offlined there's a window, though, where
> the CPU in question can still appear in an action's cpu_eoi_map. Remove
> offline CPUs, as they're (supposed to be) taken care of by fixup_eoi().
> 
> Move the entire tail of desc_guest_eoi() into a new helper, thus
> streamlining processinf also for other call sites when the sole
               ^ processing

> remaining CPU is the local one. Move and split the assertion, to be a
> functionally equivalent replacement for the earlier BUG_ON() in
> __pirq_guest_unbind(). Note that we can't use scratch_cpumask there, in
> particular in the context of a timer handler and with data held there
> needing to stay intact across process_pending_softirqs().

I would probably add the crash backtrace to the commit message.

> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> While this builds okay, for now I didn't even consider testing it: Both
> aspects below (plus the ugly locking arrangement) speak against dealing
> with the issue this way, imo. Cc-ing REST rather than just x86 for this
> reason.

Indeed, the locking is especially problematic here, both for being
ugly and because I assume EOI is a hot path.

> RFC: Don't we need {get,put}_cpu_maps() around quite a few more uses of
>      cpu_online_map (e.g. _clear_irq_vector() when called from
>      destroy_irq(), or about every caller of smp_call_function())?

Indeed, that's my understanding also.  Quite a lot of users of
cpu_online_mask likely require wrapping in {get,put}_cpu_maps() calls;
otherwise they very likely incur TOCTOU races.

>      Roger
>      suggests using stop-machine context for CPU {on,off}lining, but I
>      seem to vaguely recall that there was a reason not to do so at
>      least for the offlining part.

I would have to explore this more thoroughly; it does seem feasible
at first sight, but the devil (and the bugs) is in the details.

I don't think we want to go to the lengths of what you do below for
quite a lot of users of cpu_online_mask.

> 
> RFC: I doubt process_pending_softirqs() is okay to use from a timer
>      handler. (Callers of, in particular, {desc,pirq}_guest_eoi() would
>      also need checking for process_pending_softirqs() to be okay to use
>      in their contexts.)

Yeah, I would be wary of doing softirq processing from this context,
the more so because (even if it's not the common case) I would be afraid
of the unbounded latency that process_pending_softirqs() could introduce.

> 
> --- unstable.orig/xen/arch/x86/irq.c	2021-04-08 11:10:47.000000000 +0200
> +++ unstable/xen/arch/x86/irq.c	2025-02-11 09:54:38.532567287 +0100
> @@ -6,6 +6,7 @@
>   */
>  
>  #include <xen/init.h>
> +#include <xen/cpu.h>
>  #include <xen/delay.h>
>  #include <xen/errno.h>
>  #include <xen/event.h>
> @@ -1183,7 +1184,7 @@ static inline void clear_pirq_eoi(struct
>      }
>  }
>  
> -static void cf_check set_eoi_ready(void *data);
> +static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait);
>  
>  static void cf_check irq_guest_eoi_timer_fn(void *data)
>  {
> @@ -1224,18 +1225,13 @@ static void cf_check irq_guest_eoi_timer
>  
>      switch ( action->ack_type )
>      {
> -        cpumask_t *cpu_eoi_map;
> -
>      case ACKTYPE_UNMASK:
>          if ( desc->handler->end )
>              desc->handler->end(desc, 0);
>          break;
>  
>      case ACKTYPE_EOI:
> -        cpu_eoi_map = this_cpu(scratch_cpumask);
> -        cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
> -        spin_unlock_irq(&desc->lock);
> -        on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
> +        invoke_set_eoi_ready(desc, false);
>          return;
>      }
>  
> @@ -1468,6 +1464,49 @@ static void cf_check set_eoi_ready(void
>      flush_ready_eoi();
>  }
>  
> +/*
> + * Locking is somewhat special here: In all cases the function is invoked with
> + * desc's lock held.  When @wait is true, the function tries to avoid unlocking.
> + * It always returns whether the lock is still held.
> + */
> +static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait)
> +{
> +    const irq_guest_action_t *action = guest_action(desc);
> +    cpumask_t cpu_eoi_map;
> +
> +    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
> +
> +    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
> +    {
> +        ASSERT(action->ack_type == ACKTYPE_EOI);
> +        __set_eoi_ready(desc);
> +        spin_unlock(&desc->lock);
> +        flush_ready_eoi();
> +        local_irq_enable();
> +    }
> +    else if ( wait && cpumask_empty(&cpu_eoi_map) )
> +        return true;
> +    else
> +    {
> +        ASSERT(action->ack_type == ACKTYPE_EOI);
> +        spin_unlock_irq(&desc->lock);
> +    }
> +
> +    if ( cpumask_empty(&cpu_eoi_map) )
> +        return false;
> +
> +    while ( !get_cpu_maps() )
> +        process_pending_softirqs();
> +
> +    cpumask_and(&cpu_eoi_map, &cpu_eoi_map, &cpu_online_map);
> +    if ( !cpumask_empty(&cpu_eoi_map) )
> +        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, wait);
> +
> +    put_cpu_maps();
> +
> +    return false;
> +}
> +
>  void pirq_guest_eoi(struct pirq *pirq)

Possibly for pirq_guest_eoi(), when called from the PHYSDEVOP_eoi
hypercall, we should propagate whether the EOI has been performed, and
otherwise use a hypercall continuation to retry?
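
A purely hypothetical sketch of that suggestion (the bool return from
pirq_guest_eoi() and the continuation format string are assumptions, not
part of the patch or of the current code):

    /* In the PHYSDEVOP_eoi handler, roughly: */
    if ( !pirq_guest_eoi(pirq) )   /* assumed to report whether the EOI was issued */
        return hypercall_create_continuation(__HYPERVISOR_physdev_op,
                                             "ih", PHYSDEVOP_eoi, arg);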

As said above however, this approach is ugly enough, and we would require
similar adjustments in so many other places that also use cpu_online_mask,
that I would only consider going this route as a last resort.

Thanks, Roger.

Patch

--- unstable.orig/xen/arch/x86/irq.c	2021-04-08 11:10:47.000000000 +0200
+++ unstable/xen/arch/x86/irq.c	2025-02-11 09:54:38.532567287 +0100
@@ -6,6 +6,7 @@ 
  */
 
 #include <xen/init.h>
+#include <xen/cpu.h>
 #include <xen/delay.h>
 #include <xen/errno.h>
 #include <xen/event.h>
@@ -1183,7 +1184,7 @@  static inline void clear_pirq_eoi(struct
     }
 }
 
-static void cf_check set_eoi_ready(void *data);
+static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait);
 
 static void cf_check irq_guest_eoi_timer_fn(void *data)
 {
@@ -1224,18 +1225,13 @@  static void cf_check irq_guest_eoi_timer
 
     switch ( action->ack_type )
     {
-        cpumask_t *cpu_eoi_map;
-
     case ACKTYPE_UNMASK:
         if ( desc->handler->end )
             desc->handler->end(desc, 0);
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
-        cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
-        spin_unlock_irq(&desc->lock);
-        on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        invoke_set_eoi_ready(desc, false);
         return;
     }
 
@@ -1468,6 +1464,49 @@  static void cf_check set_eoi_ready(void
     flush_ready_eoi();
 }
 
+/*
+ * Locking is somewhat special here: In all cases the function is invoked with
+ * desc's lock held.  When @wait is true, the function tries to avoid unlocking.
+ * It always returns whether the lock is still held.
+ */
+static bool invoke_set_eoi_ready(struct irq_desc *desc, bool wait)
+{
+    const irq_guest_action_t *action = guest_action(desc);
+    cpumask_t cpu_eoi_map;
+
+    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
+
+    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
+    {
+        ASSERT(action->ack_type == ACKTYPE_EOI);
+        __set_eoi_ready(desc);
+        spin_unlock(&desc->lock);
+        flush_ready_eoi();
+        local_irq_enable();
+    }
+    else if ( wait && cpumask_empty(&cpu_eoi_map) )
+        return true;
+    else
+    {
+        ASSERT(action->ack_type == ACKTYPE_EOI);
+        spin_unlock_irq(&desc->lock);
+    }
+
+    if ( cpumask_empty(&cpu_eoi_map) )
+        return false;
+
+    while ( !get_cpu_maps() )
+        process_pending_softirqs();
+
+    cpumask_and(&cpu_eoi_map, &cpu_eoi_map, &cpu_online_map);
+    if ( !cpumask_empty(&cpu_eoi_map) )
+        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, wait);
+
+    put_cpu_maps();
+
+    return false;
+}
+
 void pirq_guest_eoi(struct pirq *pirq)
 {
     struct irq_desc *desc;
@@ -1481,7 +1520,6 @@  void pirq_guest_eoi(struct pirq *pirq)
 void desc_guest_eoi(struct irq_desc *desc, struct pirq *pirq)
 {
     irq_guest_action_t *action = guest_action(desc);
-    cpumask_t           cpu_eoi_map;
 
     if ( unlikely(!action) ||
          unlikely(!test_and_clear_bool(pirq->masked)) ||
@@ -1502,24 +1540,7 @@  void desc_guest_eoi(struct irq_desc *des
         return;
     }
 
-    ASSERT(action->ack_type == ACKTYPE_EOI);
-        
-    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-
-    if ( __cpumask_test_and_clear_cpu(smp_processor_id(), &cpu_eoi_map) )
-    {
-        __set_eoi_ready(desc);
-        spin_unlock(&desc->lock);
-        flush_ready_eoi();
-        local_irq_enable();
-    }
-    else
-    {
-        spin_unlock_irq(&desc->lock);
-    }
-
-    if ( !cpumask_empty(&cpu_eoi_map) )
-        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
+    invoke_set_eoi_ready(desc, false);
 }
 
 int pirq_guest_unmask(struct domain *d)
@@ -1734,7 +1755,6 @@  static irq_guest_action_t *__pirq_guest_
     struct domain *d, struct pirq *pirq, struct irq_desc *desc)
 {
     irq_guest_action_t *action = guest_action(desc);
-    cpumask_t           cpu_eoi_map;
     int                 i;
 
     if ( unlikely(action == NULL) )
@@ -1765,12 +1785,7 @@  static irq_guest_action_t *__pirq_guest_
         if ( test_and_clear_bool(pirq->masked) &&
              (--action->in_flight == 0) &&
              (action->nr_guests != 0) )
-        {
-            cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-            spin_unlock_irq(&desc->lock);
-            on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 0);
-            spin_lock_irq(&desc->lock);
-        }
+            invoke_set_eoi_ready(desc, false);
         break;
     }
 
@@ -1796,14 +1811,8 @@  static irq_guest_action_t *__pirq_guest_
      * would need to flush all ready EOIs before returning as otherwise the
      * desc->handler could change and we would call the wrong 'end' hook.
      */
-    cpumask_copy(&cpu_eoi_map, action->cpu_eoi_map);
-    if ( !cpumask_empty(&cpu_eoi_map) )
-    {
-        BUG_ON(action->ack_type != ACKTYPE_EOI);
-        spin_unlock_irq(&desc->lock);
-        on_selected_cpus(&cpu_eoi_map, set_eoi_ready, desc, 1);
+    if ( !invoke_set_eoi_ready(desc, true) )
         spin_lock_irq(&desc->lock);
-    }
 
     BUG_ON(!cpumask_empty(action->cpu_eoi_map));