Message ID | 20200212164949.56434-1-roger.pau@citrix.com (mailing list archive) |
---|---|
Headers | show |
Series | x86: fixes/improvements for scratch cpumask | expand |
On 12/02/2020 17:49, Roger Pau Monne wrote: > Hello, > > Commit: > > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > x86/smp: use APIC ALLBUT destination shorthand when possible > > Introduced a bogus usage of the scratch cpumask: it was used in a > function that could be called from interrupt context, and hence using > the scratch cpumask there is not safe. Patch #2 is a fix for that usage. > > Patch #3 adds some debug infrastructure to make sure the scratch cpumask > is used in the right context, and hence should prevent further missuses. > > Thanks, Roger. Hi Roger, Do you still want me to test the "panic" patch ? Or test this series instead ? -- Sander > Roger Pau Monne (3): > x86/smp: unify header includes in smp.h > x86/smp: use a dedicated scratch cpumask in send_IPI_mask > x86: add accessors for scratch cpu mask > > xen/arch/x86/io_apic.c | 6 ++++-- > xen/arch/x86/irq.c | 13 ++++++++++--- > xen/arch/x86/mm.c | 30 +++++++++++++++++++++--------- > xen/arch/x86/msi.c | 4 +++- > xen/arch/x86/smp.c | 14 +++++++++++++- > xen/arch/x86/smpboot.c | 33 ++++++++++++++++++++++++++++++++- > xen/include/asm-x86/smp.h | 15 +++++++++++---- > 7 files changed, 94 insertions(+), 21 deletions(-) >
On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote: > On 12/02/2020 17:49, Roger Pau Monne wrote: > > Hello, > > > > Commit: > > > > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > > x86/smp: use APIC ALLBUT destination shorthand when possible > > > > Introduced a bogus usage of the scratch cpumask: it was used in a > > function that could be called from interrupt context, and hence using > > the scratch cpumask there is not safe. Patch #2 is a fix for that usage. > > > > Patch #3 adds some debug infrastructure to make sure the scratch cpumask > > is used in the right context, and hence should prevent further missuses. > > > > Thanks, Roger. > > Hi Roger, > > Do you still want me to test the "panic" patch ? > Or test this series instead ? I've been able to trigger this myself, so if you can give a try to the series in order to assert it fixes your issue that would be great. Thanks.
On 12/02/2020 18:01, Roger Pau Monné wrote: > On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote: >> On 12/02/2020 17:49, Roger Pau Monne wrote: >>> Hello, >>> >>> Commit: >>> >>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >>> x86/smp: use APIC ALLBUT destination shorthand when possible >>> >>> Introduced a bogus usage of the scratch cpumask: it was used in a >>> function that could be called from interrupt context, and hence using >>> the scratch cpumask there is not safe. Patch #2 is a fix for that usage. >>> >>> Patch #3 adds some debug infrastructure to make sure the scratch cpumask >>> is used in the right context, and hence should prevent further missuses. >>> >>> Thanks, Roger. >> >> Hi Roger, >> >> Do you still want me to test the "panic" patch ? >> Or test this series instead ? > > I've been able to trigger this myself, so if you can give a try to the > series in order to assert it fixes your issue that would be great. > > Thanks. > Sure, compiling now, will report back tomorrow morning. -- Sander
On 12/02/2020 17:49, Roger Pau Monne wrote: > Hello, Hi Roger, > Commit: > > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > x86/smp: use APIC ALLBUT destination shorthand when possible There is a more subtle problem introduced by this patch. I thought I would mention it here just in case this affect the approach you have chosen in this series. get_cpu_maps() is used by stop_machine_run() to serialize the callers. If the latter fails to acquire the lock, it will bail out. Unfortunately, rcu_barrier() is implemented using stop_machine_run() and will be turned to pretty much a NOP if the latter fails (e.g the lock cannot be acquired). This means that the rcu_barrier() will not do the expected job and potentially introduce unknown issues (e.g use-after-free...). Before your patch, it would have been pretty hard to hit the problem above. After, you can race more easily with rcu_barrier() as sending IPI is pretty common. Sadly, I don't have a suggestion yet how to fix this problem. Cheers,
On 12/02/2020 18:13, Sander Eikelenboom wrote: > On 12/02/2020 18:01, Roger Pau Monné wrote: >> On Wed, Feb 12, 2020 at 05:53:39PM +0100, Sander Eikelenboom wrote: >>> On 12/02/2020 17:49, Roger Pau Monne wrote: >>>> Hello, >>>> >>>> Commit: >>>> >>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >>>> x86/smp: use APIC ALLBUT destination shorthand when possible >>>> >>>> Introduced a bogus usage of the scratch cpumask: it was used in a >>>> function that could be called from interrupt context, and hence using >>>> the scratch cpumask there is not safe. Patch #2 is a fix for that usage. >>>> >>>> Patch #3 adds some debug infrastructure to make sure the scratch cpumask >>>> is used in the right context, and hence should prevent further missuses. >>>> >>>> Thanks, Roger. >>> >>> Hi Roger, >>> >>> Do you still want me to test the "panic" patch ? >>> Or test this series instead ? >> >> I've been able to trigger this myself, so if you can give a try to the >> series in order to assert it fixes your issue that would be great. >> >> Thanks. >> > > Sure, compiling now, will report back tomorrow morning. > -- > Sander > Haven't seen the issue yet, so it seems fixed. Thanks ! -- Sander
On 12.02.2020 22:05, Julien Grall wrote: > On 12/02/2020 17:49, Roger Pau Monne wrote: >> Commit: >> >> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >> x86/smp: use APIC ALLBUT destination shorthand when possible > > There is a more subtle problem introduced by this patch. I thought I > would mention it here just in case this affect the approach you have > chosen in this series. > > get_cpu_maps() is used by stop_machine_run() to serialize the callers. > If the latter fails to acquire the lock, it will bail out. > Unfortunately, rcu_barrier() is implemented using stop_machine_run() and > will be turned to pretty much a NOP if the latter fails (e.g the lock > cannot be acquired). > > This means that the rcu_barrier() will not do the expected job and > potentially introduce unknown issues (e.g use-after-free...). > > Before your patch, it would have been pretty hard to hit the problem > above. After, you can race more easily with rcu_barrier() as sending IPI > is pretty common. > > Sadly, I don't have a suggestion yet how to fix this problem. A frequent use like on the IPI sending path suggests the lock may want to become an r/w one, where both parties you mention want to acquire it in read mode. Jan
On Thu, Feb 13, 2020 at 10:53:43AM +0100, Jan Beulich wrote: > On 12.02.2020 22:05, Julien Grall wrote: > > On 12/02/2020 17:49, Roger Pau Monne wrote: > >> Commit: > >> > >> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > >> x86/smp: use APIC ALLBUT destination shorthand when possible > > > > There is a more subtle problem introduced by this patch. I thought I > > would mention it here just in case this affect the approach you have > > chosen in this series. > > > > get_cpu_maps() is used by stop_machine_run() to serialize the callers. > > If the latter fails to acquire the lock, it will bail out. > > Unfortunately, rcu_barrier() is implemented using stop_machine_run() and > > will be turned to pretty much a NOP if the latter fails (e.g the lock > > cannot be acquired). > > > > This means that the rcu_barrier() will not do the expected job and > > potentially introduce unknown issues (e.g use-after-free...). > > > > Before your patch, it would have been pretty hard to hit the problem > > above. After, you can race more easily with rcu_barrier() as sending IPI > > is pretty common. > > > > Sadly, I don't have a suggestion yet how to fix this problem. > > A frequent use like on the IPI sending path suggests the lock may > want to become an r/w one, where both parties you mention want to > acquire it in read mode. I'm already preparing a patch in this direction. Roger.