[v2,0/6] x86: fixes/improvements for scratch cpumask
mbox series

Message ID 20200217184324.73762-1-roger.pau@citrix.com
Headers show
Series
  • x86: fixes/improvements for scratch cpumask
Related show

Message

Roger Pau Monné Feb. 17, 2020, 6:43 p.m. UTC
Hello,

Commit:

5500d265a2a8fa63d60c08beb549de8ec82ff7a5
x86/smp: use APIC ALLBUT destination shorthand when possible

Introduced a bogus usage of the scratch cpumask: it was used in a
function that could be called from interrupt context, and hence using
the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
together with also preventing the usage of any per-CPU variables when
send_IPI_mask is called from #MC or #NMI context. Previous patches are
preparatory changes.

Patch #6 adds some debug infrastructure to make sure the scratch cpumask
is used in the right context, and hence should prevent further missuses.

Thanks, Roger.

Roger Pau Monne (6):
  x86/smp: unify header includes in smp.h
  x86: introduce a nmi_count tracking variable
  x86: track when in #MC context
  x86: track when in #NMI context
  x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/cpu/mcheck/mce.c |  2 ++
 xen/arch/x86/io_apic.c        |  6 +++--
 xen/arch/x86/irq.c            | 13 ++++++---
 xen/arch/x86/mm.c             | 30 ++++++++++++++-------
 xen/arch/x86/msi.c            |  4 ++-
 xen/arch/x86/nmi.c            | 11 ++++----
 xen/arch/x86/smp.c            | 51 ++++++++++++++++++++++++++++++++++-
 xen/arch/x86/smpboot.c        | 10 +++++--
 xen/arch/x86/traps.c          | 10 ++++++-
 xen/include/asm-x86/hardirq.h | 23 +++++++++++++++-
 xen/include/asm-x86/nmi.h     |  2 ++
 xen/include/asm-x86/smp.h     | 15 ++++++++---
 xen/include/xen/irq_cpustat.h |  1 +
 13 files changed, 148 insertions(+), 30 deletions(-)

Comments

Jürgen Groß Feb. 18, 2020, 7:40 a.m. UTC | #1
On 17.02.20 19:43, Roger Pau Monne wrote:
> Hello,
> 
> Commit:
> 
> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> x86/smp: use APIC ALLBUT destination shorthand when possible
> 
> Introduced a bogus usage of the scratch cpumask: it was used in a
> function that could be called from interrupt context, and hence using
> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
> together with also preventing the usage of any per-CPU variables when
> send_IPI_mask is called from #MC or #NMI context. Previous patches are
> preparatory changes.
> 
> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
> is used in the right context, and hence should prevent further missuses.

I wonder whether it wouldn't be better to have a common percpu scratch
cpumask handling instead of introducing local ones all over the
hypervisor.

So basically an array of percpu cpumasks allocated when bringing up a
cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
cpus), a percpu counter of the next free index and get_ and put_
functions acting in a lifo manner.

This would help removing all the still existing cpumasks on the stack
and any illegal nesting would be avoided. The only remaining question
would be the size of the array.

Thoughts?


Juergen
Roger Pau Monné Feb. 18, 2020, 10:15 a.m. UTC | #2
On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote:
> On 17.02.20 19:43, Roger Pau Monne wrote:
> > Hello,
> > 
> > Commit:
> > 
> > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> > x86/smp: use APIC ALLBUT destination shorthand when possible
> > 
> > Introduced a bogus usage of the scratch cpumask: it was used in a
> > function that could be called from interrupt context, and hence using
> > the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
> > together with also preventing the usage of any per-CPU variables when
> > send_IPI_mask is called from #MC or #NMI context. Previous patches are
> > preparatory changes.
> > 
> > Patch #6 adds some debug infrastructure to make sure the scratch cpumask
> > is used in the right context, and hence should prevent further missuses.
> 
> I wonder whether it wouldn't be better to have a common percpu scratch
> cpumask handling instead of introducing local ones all over the
> hypervisor.

But the scratch CPU mask already accomplishes this, it's a single
per-CPU mask allocated when the CPU is brought up.

> So basically an array of percpu cpumasks allocated when bringing up a
> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
> cpus), a percpu counter of the next free index and get_ and put_
> functions acting in a lifo manner.

Sizing this array would be complicated IMO, and it's possible that a
failure to get a mask in certain places could lead to a panic.

> This would help removing all the still existing cpumasks on the stack
> and any illegal nesting would be avoided. The only remaining question
> would be the size of the array.
> 
> Thoughts?

I'm mostly worried about the size of such stack, since we would then
allow nested calls to the get_ cpumask helper. Also I'm not sure
whether we should prevent the usage in interrupt context then, in
order to try to limit the nesting as much as possible.

I think this series is a move in the right direction, and we can add
the per-CPU stack afterwards (as the get_/put_ helpers will already be
there).

Thanks, Roger.
Andrew Cooper Feb. 18, 2020, 10:26 a.m. UTC | #3
On 18/02/2020 07:40, Jürgen Groß wrote:
> On 17.02.20 19:43, Roger Pau Monne wrote:
>> Hello,
>>
>> Commit:
>>
>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>
>> Introduced a bogus usage of the scratch cpumask: it was used in a
>> function that could be called from interrupt context, and hence using
>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>> together with also preventing the usage of any per-CPU variables when
>> send_IPI_mask is called from #MC or #NMI context. Previous patches are
>> preparatory changes.
>>
>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>> is used in the right context, and hence should prevent further missuses.
>
> I wonder whether it wouldn't be better to have a common percpu scratch
> cpumask handling instead of introducing local ones all over the
> hypervisor.
>
> So basically an array of percpu cpumasks allocated when bringing up a
> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
> cpus), a percpu counter of the next free index and get_ and put_
> functions acting in a lifo manner.
>
> This would help removing all the still existing cpumasks on the stack
> and any illegal nesting would be avoided. The only remaining question
> would be the size of the array.
>
> Thoughts?

I like the approach, but there is a major caveat.

It is certainly problematic that we have both cpumask_scratch and
scratch_cpumask with have different rules for how to use safely, and now
we're gaining custom logic to fix up the recursive safety of one of them.

That said, I'm pretty sure it will be x86 specific, because the safety
of this is tied to using per-pcpu stacks rather than per-vcpu stacks. 
The only reason the scheduler cpumasks are safe for use on ARM is
because the scheduler code which uses them doesn't call schedule() in
the middle of use.

As a consequence, I don't think this is infrastructure which would be
safe for common code to use.

~Andrew
Jürgen Groß Feb. 18, 2020, 10:46 a.m. UTC | #4
On 18.02.20 11:15, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote:
>> On 17.02.20 19:43, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> Commit:
>>>
>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>> function that could be called from interrupt context, and hence using
>>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>>> together with also preventing the usage of any per-CPU variables when
>>> send_IPI_mask is called from #MC or #NMI context. Previous patches are
>>> preparatory changes.
>>>
>>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>>> is used in the right context, and hence should prevent further missuses.
>>
>> I wonder whether it wouldn't be better to have a common percpu scratch
>> cpumask handling instead of introducing local ones all over the
>> hypervisor.
> 
> But the scratch CPU mask already accomplishes this, it's a single
> per-CPU mask allocated when the CPU is brought up.

This one, yes. There are others which are just defined like:

DEFINE_PER_CPU(cpumask_t, ...)

>> So basically an array of percpu cpumasks allocated when bringing up a
>> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
>> cpus), a percpu counter of the next free index and get_ and put_
>> functions acting in a lifo manner.
> 
> Sizing this array would be complicated IMO, and it's possible that a
> failure to get a mask in certain places could lead to a panic.

The question is whether a silent double usage is better (which your
series is already addressing at least for the scratch cpumask).

>> This would help removing all the still existing cpumasks on the stack
>> and any illegal nesting would be avoided. The only remaining question
>> would be the size of the array.
>>
>> Thoughts?
> 
> I'm mostly worried about the size of such stack, since we would then
> allow nested calls to the get_ cpumask helper. Also I'm not sure
> whether we should prevent the usage in interrupt context then, in
> order to try to limit the nesting as much as possible.

No, excluding interrupt context would add the need for special
purpose masks again.

> I think this series is a move in the right direction, and we can add
> the per-CPU stack afterwards (as the get_/put_ helpers will already be
> there).

Yes, I completely agree.


Juergen
Jürgen Groß Feb. 18, 2020, 10:54 a.m. UTC | #5
On 18.02.20 11:26, Andrew Cooper wrote:
> On 18/02/2020 07:40, Jürgen Groß wrote:
>> On 17.02.20 19:43, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> Commit:
>>>
>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>> function that could be called from interrupt context, and hence using
>>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>>> together with also preventing the usage of any per-CPU variables when
>>> send_IPI_mask is called from #MC or #NMI context. Previous patches are
>>> preparatory changes.
>>>
>>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>>> is used in the right context, and hence should prevent further missuses.
>>
>> I wonder whether it wouldn't be better to have a common percpu scratch
>> cpumask handling instead of introducing local ones all over the
>> hypervisor.
>>
>> So basically an array of percpu cpumasks allocated when bringing up a
>> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
>> cpus), a percpu counter of the next free index and get_ and put_
>> functions acting in a lifo manner.
>>
>> This would help removing all the still existing cpumasks on the stack
>> and any illegal nesting would be avoided. The only remaining question
>> would be the size of the array.
>>
>> Thoughts?
> 
> I like the approach, but there is a major caveat.
> 
> It is certainly problematic that we have both cpumask_scratch and
> scratch_cpumask with have different rules for how to use safely, and now
> we're gaining custom logic to fix up the recursive safety of one of them.
> 
> That said, I'm pretty sure it will be x86 specific, because the safety
> of this is tied to using per-pcpu stacks rather than per-vcpu stacks.
> The only reason the scheduler cpumasks are safe for use on ARM is
> because the scheduler code which uses them doesn't call schedule() in
> the middle of use.

No, the reason the scheduler cpumasks are safe is that using one of
those requires to take the scheduler lock of the cpu having the mask in
its percpu data.

That restriction could probably be dropped in case the scheduler would
be using the common infrastructure.


Juergen