mbox series

[0/3] x86: fixes/improvements for scratch cpumask

Message ID 20200212164949.56434-1-roger.pau@citrix.com (mailing list archive)
Headers show
Series x86: fixes/improvements for scratch cpumask | expand

Message

Roger Pau Monné Feb. 12, 2020, 4:49 p.m. UTC
Hello,

Commit:

5500d265a2a8fa63d60c08beb549de8ec82ff7a5
x86/smp: use APIC ALLBUT destination shorthand when possible

Introduced a bogus usage of the scratch cpumask: it was used in a
function that could be called from interrupt context, and hence using
the scratch cpumask there is not safe. Patch #2 is a fix for that usage.

Patch #3 adds some debug infrastructure to make sure the scratch cpumask
is used in the right context, and hence should prevent further missuses.

Thanks, Roger.

Roger Pau Monne (3):
  x86/smp: unify header includes in smp.h
  x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/io_apic.c    |  6 ++++--
 xen/arch/x86/irq.c        | 13 ++++++++++---
 xen/arch/x86/mm.c         | 30 +++++++++++++++++++++---------
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 14 +++++++++++++-
 xen/arch/x86/smpboot.c    | 33 ++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/smp.h | 15 +++++++++++----
 7 files changed, 94 insertions(+), 21 deletions(-)

Comments

Sander Eikelenboom Feb. 12, 2020, 4:53 p.m. UTC | #1
On 12/02/2020 17:49, Roger Pau Monne wrote:
> Hello,
> 
> Commit:
> 
> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> x86/smp: use APIC ALLBUT destination shorthand when possible
> 
> Introduced a bogus usage of the scratch cpumask: it was used in a
> function that could be called from interrupt context, and hence using
> the scratch cpumask there is not safe. Patch #2 is a fix for that usage.
> 
> Patch #3 adds some debug infrastructure to make sure the scratch cpumask
> is used in the right context, and hence should prevent further missuses.
> 
> Thanks, Roger.

Hi Roger,

Do you still want me to test the "panic" patch ?
Or test this series instead ?

--
Sander


> Roger Pau Monne (3):
>   x86/smp: unify header includes in smp.h
>   x86/smp: use a dedicated scratch cpumask in send_IPI_mask
>   x86: add accessors for scratch cpu mask
> 
>  xen/arch/x86/io_apic.c    |  6 ++++--
>  xen/arch/x86/irq.c        | 13 ++++++++++---
>  xen/arch/x86/mm.c         | 30 +++++++++++++++++++++---------
>  xen/arch/x86/msi.c        |  4 +++-
>  xen/arch/x86/smp.c        | 14 +++++++++++++-
>  xen/arch/x86/smpboot.c    | 33 ++++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/smp.h | 15 +++++++++++----
>  7 files changed, 94 insertions(+), 21 deletions(-)
>
Roger Pau Monné Feb. 12, 2020, 5:01 p.m. UTC | #2
On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote:
> On 12/02/2020 17:49, Roger Pau Monne wrote:
> > Hello,
> > 
> > Commit:
> > 
> > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> > x86/smp: use APIC ALLBUT destination shorthand when possible
> > 
> > Introduced a bogus usage of the scratch cpumask: it was used in a
> > function that could be called from interrupt context, and hence using
> > the scratch cpumask there is not safe. Patch #2 is a fix for that usage.
> > 
> > Patch #3 adds some debug infrastructure to make sure the scratch cpumask
> > is used in the right context, and hence should prevent further missuses.
> > 
> > Thanks, Roger.
> 
> Hi Roger,
> 
> Do you still want me to test the "panic" patch ?
> Or test this series instead ?

I've been able to trigger this myself, so if you can give a try to the
series in order to assert it fixes your issue that would be great.

Thanks.
Sander Eikelenboom Feb. 12, 2020, 5:13 p.m. UTC | #3
On 12/02/2020 18:01, Roger Pau Monné wrote:
> On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote:
>> On 12/02/2020 17:49, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> Commit:
>>>
>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>> function that could be called from interrupt context, and hence using
>>> the scratch cpumask there is not safe. Patch #2 is a fix for that usage.
>>>
>>> Patch #3 adds some debug infrastructure to make sure the scratch cpumask
>>> is used in the right context, and hence should prevent further missuses.
>>>
>>> Thanks, Roger.
>>
>> Hi Roger,
>>
>> Do you still want me to test the "panic" patch ?
>> Or test this series instead ?
> 
> I've been able to trigger this myself, so if you can give a try to the
> series in order to assert it fixes your issue that would be great.
> 
> Thanks.
> 

Sure, compiling now, will report back tomorrow morning.
--
Sander
Julien Grall Feb. 12, 2020, 9:05 p.m. UTC | #4
On 12/02/2020 17:49, Roger Pau Monne wrote:
> Hello,

Hi Roger,

> Commit:
> 
> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> x86/smp: use APIC ALLBUT destination shorthand when possible

There is a more subtle problem introduced by this patch. I thought I 
would mention it here just in case this affect the approach you have 
chosen in this series.

get_cpu_maps() is used by stop_machine_run() to serialize the callers. 
If the latter fails to acquire the lock, it will bail out. 
Unfortunately, rcu_barrier() is implemented using stop_machine_run() and 
will be turned to pretty much a NOP if the latter fails (e.g the lock 
cannot be acquired).

This means that the rcu_barrier() will not do the expected job and 
potentially introduce unknown issues (e.g use-after-free...).

Before your patch, it would have been pretty hard to hit the problem 
above. After, you can race more easily with rcu_barrier() as sending IPI 
is pretty common.

Sadly, I don't have a suggestion yet how to fix this problem.

Cheers,
Sander Eikelenboom Feb. 13, 2020, 9:31 a.m. UTC | #5
On 12/02/2020 18:13, Sander Eikelenboom wrote:
> On 12/02/2020 18:01, Roger Pau Monné wrote:
>> On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote:
>>> On 12/02/2020 17:49, Roger Pau Monne wrote:
>>>> Hello,
>>>>
>>>> Commit:
>>>>
>>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>
>>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>>> function that could be called from interrupt context, and hence using
>>>> the scratch cpumask there is not safe. Patch #2 is a fix for that usage.
>>>>
>>>> Patch #3 adds some debug infrastructure to make sure the scratch cpumask
>>>> is used in the right context, and hence should prevent further missuses.
>>>>
>>>> Thanks, Roger.
>>>
>>> Hi Roger,
>>>
>>> Do you still want me to test the "panic" patch ?
>>> Or test this series instead ?
>>
>> I've been able to trigger this myself, so if you can give a try to the
>> series in order to assert it fixes your issue that would be great.
>>
>> Thanks.
>>
> 
> Sure, compiling now, will report back tomorrow morning.
> --
> Sander
> 

Haven't seen the issue yet, so it seems fixed.
Thanks !
--
Sander
Jan Beulich Feb. 13, 2020, 9:53 a.m. UTC | #6
On 12.02.2020 22:05, Julien Grall wrote:
> On 12/02/2020 17:49, Roger Pau Monne wrote:
>> Commit:
>>
>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>> x86/smp: use APIC ALLBUT destination shorthand when possible
> 
> There is a more subtle problem introduced by this patch. I thought I 
> would mention it here just in case this affect the approach you have 
> chosen in this series.
> 
> get_cpu_maps() is used by stop_machine_run() to serialize the callers. 
> If the latter fails to acquire the lock, it will bail out. 
> Unfortunately, rcu_barrier() is implemented using stop_machine_run() and 
> will be turned to pretty much a NOP if the latter fails (e.g the lock 
> cannot be acquired).
> 
> This means that the rcu_barrier() will not do the expected job and 
> potentially introduce unknown issues (e.g use-after-free...).
> 
> Before your patch, it would have been pretty hard to hit the problem 
> above. After, you can race more easily with rcu_barrier() as sending IPI 
> is pretty common.
> 
> Sadly, I don't have a suggestion yet how to fix this problem.

A frequent use like on the IPI sending path suggests the lock may
want to become an r/w one, where both parties you mention want to
acquire it in read mode.

Jan
Roger Pau Monné Feb. 13, 2020, 10:05 a.m. UTC | #7
On Thu, Feb 13, 2020 at 10:53:43AM +0100, Jan Beulich wrote:
> On 12.02.2020 22:05, Julien Grall wrote:
> > On 12/02/2020 17:49, Roger Pau Monne wrote:
> >> Commit:
> >>
> >> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> >> x86/smp: use APIC ALLBUT destination shorthand when possible
> > 
> > There is a more subtle problem introduced by this patch. I thought I 
> > would mention it here just in case this affect the approach you have 
> > chosen in this series.
> > 
> > get_cpu_maps() is used by stop_machine_run() to serialize the callers. 
> > If the latter fails to acquire the lock, it will bail out. 
> > Unfortunately, rcu_barrier() is implemented using stop_machine_run() and 
> > will be turned to pretty much a NOP if the latter fails (e.g the lock 
> > cannot be acquired).
> > 
> > This means that the rcu_barrier() will not do the expected job and 
> > potentially introduce unknown issues (e.g use-after-free...).
> > 
> > Before your patch, it would have been pretty hard to hit the problem 
> > above. After, you can race more easily with rcu_barrier() as sending IPI 
> > is pretty common.
> > 
> > Sadly, I don't have a suggestion yet how to fix this problem.
> 
> A frequent use like on the IPI sending path suggests the lock may
> want to become an r/w one, where both parties you mention want to
> acquire it in read mode.

I'm already preparing a patch in this direction.

Roger.