diff mbox series

[v2,5/7] x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()

Message ID 20240610142043.11924-6-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86/irq: fixes for CPU hot{,un}plug | expand

Commit Message

Roger Pau Monné June 10, 2024, 2:20 p.m. UTC
Given the current logic it's possible for ->arch.old_cpu_mask to get out of
sync: if a CPU set in old_cpu_mask is offlined and then onlined again
without old_cpu_mask having been updated, the data in the mask will no
longer be accurate, as when brought back online the CPU will no longer have
old_vector configured to handle the old interrupt source.

If there's an interrupt movement in progress, and the to-be-offlined CPU (which
is the call context) is in the old_cpu_mask, clear it and update the mask, so it
doesn't contain stale data.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/irq.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Comments

Jan Beulich June 11, 2024, 12:45 p.m. UTC | #1
On 10.06.2024 16:20, Roger Pau Monne wrote:
> Given the current logic it's possible for ->arch.old_cpu_mask to get out of
> sync: if a CPU set in old_cpu_mask is offlined and then onlined
> again without old_cpu_mask having been updated the data in the mask will no
> longer be accurate, as when brought back online the CPU will no longer have
> old_vector configured to handle the old interrupt source.
> 
> If there's an interrupt movement in progress, and the to be offlined CPU (which
> is the call context) is in the old_cpu_mask clear it and update the mask, so it
> doesn't contain stale data.

This imo is too __cpu_disable()-centric. In the code you cover the
smp_send_stop() case afaict, where it's all _other_ CPUs which are being
offlined. As we're not meaning to bring CPUs online again in that case,
dealing with the situation likely isn't needed. Yet the description should
imo at least make clear that the case was considered.

> @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>                                 affinity);
>          }
>  
> +        if ( desc->arch.move_in_progress &&
> +             !cpumask_test_cpu(cpu, &cpu_online_map) &&

This part of the condition is, afaict, what covers (excludes) the
smp_send_stop() case. Might be nice to have a brief comment here, thus
also clarifying ...

> +             cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
> +        {
> +            /*
> +             * This CPU is going offline, remove it from ->arch.old_cpu_mask
> +             * and possibly release the old vector if the old mask becomes
> +             * empty.
> +             *
> +             * Note cleaning ->arch.old_cpu_mask is required if the CPU is
> +             * brought offline and then online again, as when re-onlined the
> +             * per-cpu vector table will no longer have ->arch.old_vector
> +             * setup, and hence ->arch.old_cpu_mask would be stale.
> +             */
> +            cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
> +            if ( cpumask_empty(desc->arch.old_cpu_mask) )
> +            {
> +                desc->arch.move_in_progress = 0;
> +                release_old_vec(desc);
> +            }

... that none of this is really wanted or necessary in that other case.
Assuming my understanding above is correct, the code change itself is
once again
Reviewed-by: Jan Beulich <jbeulich@suse.com>
yet here I'm uncertain whether to offer on-commit editing, as it's not
really clear to me whether there's a dependency on patch 4.

Jan
Jan Beulich June 11, 2024, 1:47 p.m. UTC | #2
On 10.06.2024 16:20, Roger Pau Monne wrote:
> @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>                                 affinity);
>          }
>  
> +        if ( desc->arch.move_in_progress &&
> +             !cpumask_test_cpu(cpu, &cpu_online_map) &&

Btw - any reason you're open-coding !cpu_online() here? I've noticed this
in the context of patch 7, where a little further down a !cpu_online() is
being added. Those likely all want to be consistent.

Jan
Roger Pau Monné June 12, 2024, 8:36 a.m. UTC | #3
On Tue, Jun 11, 2024 at 03:47:03PM +0200, Jan Beulich wrote:
> On 10.06.2024 16:20, Roger Pau Monne wrote:
> > @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                                 affinity);
> >          }
> >  
> > +        if ( desc->arch.move_in_progress &&
> > +             !cpumask_test_cpu(cpu, &cpu_online_map) &&
> 
> Btw - any reason you're open-coding !cpu_online() here? I've noticed this
> in the context of patch 7, where a little further down a !cpu_online() is
> being added. Those likely all want to be consistent.

No reason really - just me not realizing we had that helper.  Can
adjust in next version.

Thanks, Roger.
Roger Pau Monné June 12, 2024, 8:47 a.m. UTC | #4
On Tue, Jun 11, 2024 at 02:45:09PM +0200, Jan Beulich wrote:
> On 10.06.2024 16:20, Roger Pau Monne wrote:
> > Given the current logic it's possible for ->arch.old_cpu_mask to get out of
> > sync: if a CPU set in old_cpu_mask is offlined and then onlined
> > again without old_cpu_mask having been updated the data in the mask will no
> > longer be accurate, as when brought back online the CPU will no longer have
> > old_vector configured to handle the old interrupt source.
> > 
> > If there's an interrupt movement in progress, and the to be offlined CPU (which
> > is the call context) is in the old_cpu_mask clear it and update the mask, so it
> > doesn't contain stale data.
> 
> This imo is too __cpu_disable()-centric. In the code you cover the
> smp_send_stop() case afaict, where it's all _other_ CPUs which are being
> offlined. As we're not meaning to bring CPUs online again in that case,
> dealing with the situation likely isn't needed. Yet the description should
> imo at least make clear that the case was considered.

What about adding the following paragraph:

Note that when the system is going down fixup_irqs() will be called by
smp_send_stop() from CPU 0 with a mask with only CPU 0 on it,
effectively asking to move all interrupts to the current caller (CPU
0) which is the only CPU online.  In that case we don't care to
migrate interrupts that are in the process of being moved, as it's
likely we won't be able to move all interrupts to CPU 0 due to vector
shortage anyway.

> 
> > @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                                 affinity);
> >          }
> >  
> > +        if ( desc->arch.move_in_progress &&
> > +             !cpumask_test_cpu(cpu, &cpu_online_map) &&
> 
> This part of the condition is, afaict, what covers (excludes) the
> smp_send_stop() case. Might be nice to have a brief comment here, thus
> also clarifying ...

Would you be fine with:

        if ( desc->arch.move_in_progress &&
             /*
              * Only attempt to migrate if the current CPU is going
              * offline, otherwise the whole system is going down and
              * leaving stale interrupts is fine.
              */
             !cpumask_test_cpu(cpu, &cpu_online_map) &&
             cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )


> > +             cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
> > +        {
> > +            /*
> > +             * This CPU is going offline, remove it from ->arch.old_cpu_mask
> > +             * and possibly release the old vector if the old mask becomes
> > +             * empty.
> > +             *
> > +             * Note cleaning ->arch.old_cpu_mask is required if the CPU is
> > +             * brought offline and then online again, as when re-onlined the
> > +             * per-cpu vector table will no longer have ->arch.old_vector
> > +             * setup, and hence ->arch.old_cpu_mask would be stale.
> > +             */
> > +            cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
> > +            if ( cpumask_empty(desc->arch.old_cpu_mask) )
> > +            {
> > +                desc->arch.move_in_progress = 0;
> > +                release_old_vec(desc);
> > +            }
> 
> ... that none of this is really wanted or necessary in that other case.
> Assuming my understanding above is correct, the code change itself is
> once again

It is.  For the smp_send_stop() case we don't care much about leaving
stale data around, as the system is going down.  It's also likely
impossible to move all interrupts to CPU0 due to vector shortage, so
some interrupts will be left assigned to different CPUs.

> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> yet here I'm uncertain whether to offer on-commit editing, as it's not
> really clear to me whether there's a dependency on patch 4.

No, in principle it should be fine to skip patch 4, but I would like
to do another round of testing before confirming.

Thanks, Roger.
Jan Beulich June 12, 2024, 9:04 a.m. UTC | #5
On 12.06.2024 10:47, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 02:45:09PM +0200, Jan Beulich wrote:
>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>> Given the current logic it's possible for ->arch.old_cpu_mask to get out of
>>> sync: if a CPU set in old_cpu_mask is offlined and then onlined
>>> again without old_cpu_mask having been updated the data in the mask will no
>>> longer be accurate, as when brought back online the CPU will no longer have
>>> old_vector configured to handle the old interrupt source.
>>>
>>> If there's an interrupt movement in progress, and the to be offlined CPU (which
>>> is the call context) is in the old_cpu_mask clear it and update the mask, so it
>>> doesn't contain stale data.
>>
>> This imo is too __cpu_disable()-centric. In the code you cover the
>> smp_send_stop() case afaict, where it's all _other_ CPUs which are being
>> offlined. As we're not meaning to bring CPUs online again in that case,
>> dealing with the situation likely isn't needed. Yet the description should
>> imo at least make clear that the case was considered.
> 
> What about adding the following paragraph:

Sounds good, just maybe one small adjustment:

> Note that when the system is going down fixup_irqs() will be called by
> smp_send_stop() from CPU 0 with a mask with only CPU 0 on it,
> effectively asking to move all interrupts to the current caller (CPU
> 0) which is the only CPU online.  In that case we don't care to

"... the only CPU to remain online."

> migrate interrupts that are in the process of being moved, as it's
> likely we won't be able to move all interrupts to CPU 0 due to vector
> shortage anyway.
> 
>>
>>> @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>                                 affinity);
>>>          }
>>>  
>>> +        if ( desc->arch.move_in_progress &&
>>> +             !cpumask_test_cpu(cpu, &cpu_online_map) &&
>>
>> This part of the condition is, afaict, what covers (excludes) the
>> smp_send_stop() case. Might be nice to have a brief comment here, thus
>> also clarifying ...
> 
> Would you be fine with:
> 
>         if ( desc->arch.move_in_progress &&
>              /*
>               * Only attempt to migrate if the current CPU is going
>               * offline, otherwise the whole system is going down and
>               * leaving stale interrupts is fine.
>               */
>              !cpumask_test_cpu(cpu, &cpu_online_map) &&
>              cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )

Sure, this is even more verbose (i.e. better) than I was after.

Jan
Roger Pau Monné June 12, 2024, 10:41 a.m. UTC | #6
On Wed, Jun 12, 2024 at 11:04:26AM +0200, Jan Beulich wrote:
> On 12.06.2024 10:47, Roger Pau Monné wrote:
> > On Tue, Jun 11, 2024 at 02:45:09PM +0200, Jan Beulich wrote:
> >> On 10.06.2024 16:20, Roger Pau Monne wrote:
> >>> Given the current logic it's possible for ->arch.old_cpu_mask to get out of
> >>> sync: if a CPU set in old_cpu_mask is offlined and then onlined
> >>> again without old_cpu_mask having been updated the data in the mask will no
> >>> longer be accurate, as when brought back online the CPU will no longer have
> >>> old_vector configured to handle the old interrupt source.
> >>>
> >>> If there's an interrupt movement in progress, and the to be offlined CPU (which
> >>> is the call context) is in the old_cpu_mask clear it and update the mask, so it
> >>> doesn't contain stale data.
> >>
> >> This imo is too __cpu_disable()-centric. In the code you cover the
> >> smp_send_stop() case afaict, where it's all _other_ CPUs which are being
> >> offlined. As we're not meaning to bring CPUs online again in that case,
> >> dealing with the situation likely isn't needed. Yet the description should
> >> imo at least make clear that the case was considered.
> > 
> > What about adding the following paragraph:
> 
> Sounds good, just maybe one small adjustment:
> 
> > Note that when the system is going down fixup_irqs() will be called by
> > smp_send_stop() from CPU 0 with a mask with only CPU 0 on it,
> > effectively asking to move all interrupts to the current caller (CPU
> > 0) which is the only CPU online.  In that case we don't care to
> 
> "... the only CPU to remain online."

Right, that's better.

Thanks, Roger.
Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 306e7756112f..f07e09b63b53 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2546,7 +2546,7 @@  void fixup_irqs(const cpumask_t *mask, bool verbose)
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = false, set_affinity = true;
-        unsigned int vector;
+        unsigned int vector, cpu = smp_processor_id();
         cpumask_t *affinity = this_cpu(scratch_cpumask);
 
         if ( irq == 2 )
@@ -2589,6 +2589,28 @@  void fixup_irqs(const cpumask_t *mask, bool verbose)
                                affinity);
         }
 
+        if ( desc->arch.move_in_progress &&
+             !cpumask_test_cpu(cpu, &cpu_online_map) &&
+             cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
+        {
+            /*
+             * This CPU is going offline, remove it from ->arch.old_cpu_mask
+             * and possibly release the old vector if the old mask becomes
+             * empty.
+             *
+             * Note cleaning ->arch.old_cpu_mask is required if the CPU is
+             * brought offline and then online again, as when re-onlined the
+             * per-cpu vector table will no longer have ->arch.old_vector
+             * setup, and hence ->arch.old_cpu_mask would be stale.
+             */
+            cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
+            if ( cpumask_empty(desc->arch.old_cpu_mask) )
+            {
+                desc->arch.move_in_progress = 0;
+                release_old_vec(desc);
+            }
+        }
+
         /*
          * Avoid shuffling the interrupt around as long as current target CPUs
          * are a subset of the input mask.  What fixup_irqs() cares about is