Message ID | 20220627173605.514504-1-tony.luck@intel.com (mailing list archive) |
---|---|
Series | Handle corrected machine check interrupt storms |
On Mon, Jun 27, 2022 at 10:36:00AM -0700, Tony Luck wrote:
> Extend the logic of handling Intel's corrected machine check interrupt
> storms to AMD's threshold interrupts.
>
> First two patches are from Tony which cleans up the existing storm
> handling for Intel and proposes per CPU per bank storm handling.
>
> Third and fourth patches do some cleanup and refactoring on the CMCI
> storm handling in order to extend similar workaround for AMD's threshold
> interrupt storms. These two patches could be merged into Tony's second
> patch of CMCI storm mitigation.
>
> AMD's storm mitigation for threshold interrupts also relies on per CPU
> per bank approach similar to Intel. But unlike CMCI storm handling it does
> not set thresholds to reduce rate of interrupts on a storm. Rather it
> turns off the interrupt on the current CPU and bank if there is a storm
> and re-enables back the interrupts when the storm subsides.
>
> It is okay to turn off threshold interrupts on AMD systems as other error
> severities continue to be handled even if the threshold interrupts are
> turned off. Uncorrected errors will generate a #MC and deferred errors
> have a unique separate deferred error interrupt. The final patch adds
> support for handling threshold interrupt storms on AMD systems.
>
> Changes since v1:
>
> 1) Fix shift computation when keeping track of bank history. Shift
>    should be "1" when a storm is in progress (because polling once per
>    second). When a storm is not in progress shift should be based on
>    number of seconds since the bank was last checked.
>
> 2) Changed Smita's code in part 0003 to avoid use of a function pointer
>    (since the kernel is avoiding indirect branch points that might be
>    trainable for various Spectre-like issues).
>
> Smita Koralahalli (3):
>   x86/mce: Introduce mce_handle_storm() to deal with begin/end of storms
>   x86/mce: Handle AMD threshold interrupt storms
>   x86/mce: Move storm handling to core.
>
> Tony Luck (2):
>   x86/mce: Remove old CMCI storm mitigation code
>   x86/mce: Add per-bank CMCI storm mitigation
>
>  arch/x86/kernel/cpu/mce/amd.c      |  49 ++++++++
>  arch/x86/kernel/cpu/mce/core.c     | 139 +++++++++++++++++-----
>  arch/x86/kernel/cpu/mce/intel.c    | 179 +++++++----------------------
>  arch/x86/kernel/cpu/mce/internal.h |  33 ++++--
>  4 files changed, 230 insertions(+), 170 deletions(-)
>
> --

Hi Tony,

Is there an updated version of this set? I can help review and test.
Smita is focusing on other items at the moment.

Thanks!
-Yazen
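
For readers following the thread, the bank-history shift described in item 1 of "Changes since v1", together with the AMD-style "turn the interrupt off during a storm" behaviour, might look roughly like the user-space sketch below. This is not the code from the series: `struct bank_storm`, `history_shift()`, `track_bank()`, both thresholds, the seconds-based clock, and the choice to treat a zero-second gap as a shift of 1 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define STORM_BEGIN_ERRORS 5   /* set bits in history needed to declare a storm */
#define STORM_END_POLLS    30  /* consecutive clean polls needed to end a storm */

/* Hypothetical per-CPU, per-bank state; not the kernel's actual layout. */
struct bank_storm {
	uint64_t history;     /* bit 0 = most recent check, 1 = error seen */
	uint64_t last_check;  /* time of the previous check, in seconds */
	bool in_storm;
	bool irq_enabled;     /* AMD-style mitigation: interrupt off in a storm */
};

/* Portable population count (number of set bits). */
static unsigned int count_bits(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Shift is 1 while a storm is in progress (the bank is polled once per
 * second). Otherwise it is the number of seconds since the last check,
 * capped at 64 so the caller can simply clear the history.
 */
static unsigned int history_shift(const struct bank_storm *b, uint64_t now)
{
	uint64_t delta;

	assert(now >= b->last_check);
	if (b->in_storm)
		return 1;
	delta = now - b->last_check;
	if (delta == 0)
		return 1;	/* assumption: every check records one bit */
	return delta > 64 ? 64 : (unsigned int)delta;
}

/* Record one check of the bank and update the storm state. */
static void track_bank(struct bank_storm *b, uint64_t now, bool error_seen)
{
	unsigned int shift = history_shift(b, now);

	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	b->history = shift < 64 ? b->history << shift : 0;
	b->last_check = now;
	if (error_seen)
		b->history |= 1;

	if (b->in_storm) {
		uint64_t recent = (UINT64_C(1) << STORM_END_POLLS) - 1;

		/* Storm subsides once the most recent polls are all clean. */
		if ((b->history & recent) == 0) {
			b->in_storm = false;
			b->irq_enabled = true;
		}
	} else if (error_seen && count_bits(b->history) >= STORM_BEGIN_ERRORS) {
		b->in_storm = true;
		b->irq_enabled = false;	/* fall back to polling */
	}
}
```

Keeping the history as a bitmap makes both decisions cheap: entering a storm is a population count over the window, and leaving it is a mask test over the most recent polls. The time-based shift matters outside a storm because checks are then irregular, so old error bits need to age out in proportion to elapsed time rather than to the number of checks.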