Message ID | 20200217184324.73762-1-roger.pau@citrix.com (mailing list archive) |
---|---|
Headers | show |
Series | x86: fixes/improvements for scratch cpumask | expand |
On 17.02.20 19:43, Roger Pau Monne wrote: > Hello, > > Commit: > > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > x86/smp: use APIC ALLBUT destination shorthand when possible > > Introduced a bogus usage of the scratch cpumask: it was used in a > function that could be called from interrupt context, and hence using > the scratch cpumask there is not safe. Patch #5 is a fix for that usage, > together with also preventing the usage of any per-CPU variables when > send_IPI_mask is called from #MC or #NMI context. Previous patches are > preparatory changes. > > Patch #6 adds some debug infrastructure to make sure the scratch cpumask > is used in the right context, and hence should prevent further missuses. I wonder whether it wouldn't be better to have a common percpu scratch cpumask handling instead of introducing local ones all over the hypervisor. So basically an array of percpu cpumasks allocated when bringing up a cpu (this spares memory as the masks wouldn't need to cover NR_CPUS cpus), a percpu counter of the next free index and get_ and put_ functions acting in a lifo manner. This would help removing all the still existing cpumasks on the stack and any illegal nesting would be avoided. The only remaining question would be the size of the array. Thoughts? Juergen
On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote: > On 17.02.20 19:43, Roger Pau Monne wrote: > > Hello, > > > > Commit: > > > > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 > > x86/smp: use APIC ALLBUT destination shorthand when possible > > > > Introduced a bogus usage of the scratch cpumask: it was used in a > > function that could be called from interrupt context, and hence using > > the scratch cpumask there is not safe. Patch #5 is a fix for that usage, > > together with also preventing the usage of any per-CPU variables when > > send_IPI_mask is called from #MC or #NMI context. Previous patches are > > preparatory changes. > > > > Patch #6 adds some debug infrastructure to make sure the scratch cpumask > > is used in the right context, and hence should prevent further missuses. > > I wonder whether it wouldn't be better to have a common percpu scratch > cpumask handling instead of introducing local ones all over the > hypervisor. But the scratch CPU mask already accomplishes this, it's a single per-CPU mask allocated when the CPU is brought up. > So basically an array of percpu cpumasks allocated when bringing up a > cpu (this spares memory as the masks wouldn't need to cover NR_CPUS > cpus), a percpu counter of the next free index and get_ and put_ > functions acting in a lifo manner. Sizing this array would be complicated IMO, and it's possible that a failure to get a mask in certain places could lead to a panic. > This would help removing all the still existing cpumasks on the stack > and any illegal nesting would be avoided. The only remaining question > would be the size of the array. > > Thoughts? I'm mostly worried about the size of such stack, since we would then allow nested calls to the get_ cpumask helper. Also I'm not sure whether we should prevent the usage in interrupt context then, in order to try to limit the nesting as much as possible. I think this series is a move in the right direction, and we can add the per-CPU stack afterwards (as the get_/put_ helpers will already be there). Thanks, Roger.
On 18/02/2020 07:40, Jürgen Groß wrote: > On 17.02.20 19:43, Roger Pau Monne wrote: >> Hello, >> >> Commit: >> >> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >> x86/smp: use APIC ALLBUT destination shorthand when possible >> >> Introduced a bogus usage of the scratch cpumask: it was used in a >> function that could be called from interrupt context, and hence using >> the scratch cpumask there is not safe. Patch #5 is a fix for that usage, >> together with also preventing the usage of any per-CPU variables when >> send_IPI_mask is called from #MC or #NMI context. Previous patches are >> preparatory changes. >> >> Patch #6 adds some debug infrastructure to make sure the scratch cpumask >> is used in the right context, and hence should prevent further missuses. > > I wonder whether it wouldn't be better to have a common percpu scratch > cpumask handling instead of introducing local ones all over the > hypervisor. > > So basically an array of percpu cpumasks allocated when bringing up a > cpu (this spares memory as the masks wouldn't need to cover NR_CPUS > cpus), a percpu counter of the next free index and get_ and put_ > functions acting in a lifo manner. > > This would help removing all the still existing cpumasks on the stack > and any illegal nesting would be avoided. The only remaining question > would be the size of the array. > > Thoughts? I like the approach, but there is a major caveat. It is certainly problematic that we have both cpumask_scratch and scratch_cpumask with have different rules for how to use safely, and now we're gaining custom logic to fix up the recursive safety of one of them. That said, I'm pretty sure it will be x86 specific, because the safety of this is tied to using per-pcpu stacks rather than per-vcpu stacks. The only reason the scheduler cpumasks are safe for use on ARM is because the scheduler code which uses them doesn't call schedule() in the middle of use. As a consequence, I don't think this is infrastructure which would be safe for common code to use. ~Andrew
On 18.02.20 11:15, Roger Pau Monné wrote: > On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote: >> On 17.02.20 19:43, Roger Pau Monne wrote: >>> Hello, >>> >>> Commit: >>> >>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >>> x86/smp: use APIC ALLBUT destination shorthand when possible >>> >>> Introduced a bogus usage of the scratch cpumask: it was used in a >>> function that could be called from interrupt context, and hence using >>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage, >>> together with also preventing the usage of any per-CPU variables when >>> send_IPI_mask is called from #MC or #NMI context. Previous patches are >>> preparatory changes. >>> >>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask >>> is used in the right context, and hence should prevent further missuses. >> >> I wonder whether it wouldn't be better to have a common percpu scratch >> cpumask handling instead of introducing local ones all over the >> hypervisor. > > But the scratch CPU mask already accomplishes this, it's a single > per-CPU mask allocated when the CPU is brought up. This one, yes. There are others which are just defined like: DEFINE_PER_CPU(cpumask_t, ...) >> So basically an array of percpu cpumasks allocated when bringing up a >> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS >> cpus), a percpu counter of the next free index and get_ and put_ >> functions acting in a lifo manner. > > Sizing this array would be complicated IMO, and it's possible that a > failure to get a mask in certain places could lead to a panic. The question is whether a silent double usage is better (which your series is already addressing at least for the scratch cpumask). >> This would help removing all the still existing cpumasks on the stack >> and any illegal nesting would be avoided. The only remaining question >> would be the size of the array. >> >> Thoughts? > > I'm mostly worried about the size of such stack, since we would then > allow nested calls to the get_ cpumask helper. Also I'm not sure > whether we should prevent the usage in interrupt context then, in > order to try to limit the nesting as much as possible. No, excluding interrupt context would add the need for special purpose masks again. > I think this series is a move in the right direction, and we can add > the per-CPU stack afterwards (as the get_/put_ helpers will already be > there). Yes, I completely agree. Juergen
On 18.02.20 11:26, Andrew Cooper wrote: > On 18/02/2020 07:40, Jürgen Groß wrote: >> On 17.02.20 19:43, Roger Pau Monne wrote: >>> Hello, >>> >>> Commit: >>> >>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5 >>> x86/smp: use APIC ALLBUT destination shorthand when possible >>> >>> Introduced a bogus usage of the scratch cpumask: it was used in a >>> function that could be called from interrupt context, and hence using >>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage, >>> together with also preventing the usage of any per-CPU variables when >>> send_IPI_mask is called from #MC or #NMI context. Previous patches are >>> preparatory changes. >>> >>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask >>> is used in the right context, and hence should prevent further missuses. >> >> I wonder whether it wouldn't be better to have a common percpu scratch >> cpumask handling instead of introducing local ones all over the >> hypervisor. >> >> So basically an array of percpu cpumasks allocated when bringing up a >> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS >> cpus), a percpu counter of the next free index and get_ and put_ >> functions acting in a lifo manner. >> >> This would help removing all the still existing cpumasks on the stack >> and any illegal nesting would be avoided. The only remaining question >> would be the size of the array. >> >> Thoughts? > > I like the approach, but there is a major caveat. > > It is certainly problematic that we have both cpumask_scratch and > scratch_cpumask with have different rules for how to use safely, and now > we're gaining custom logic to fix up the recursive safety of one of them. > > That said, I'm pretty sure it will be x86 specific, because the safety > of this is tied to using per-pcpu stacks rather than per-vcpu stacks. > The only reason the scheduler cpumasks are safe for use on ARM is > because the scheduler code which uses them doesn't call schedule() in > the middle of use. No, the reason the scheduler cpumasks are safe is that using one of those requires to take the scheduler lock of the cpu having the mask in its percpu data. That restriction could probably be dropped in case the scheduler would be using the common infrastructure. Juergen