Message ID | 20130619175438.2852.93449.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote: > The Corrected Machine Check structure (CMC) in HEST has a flag which can be > set by the firmware to indicate to the OS that it prefers to process the > corrected error events first. In this scenario, the OS is expected to not > monitor for corrected errors (through CMCI/polling). Instead, the firmware > notifies the OS on corrected error events through GHES. > > Linux already has support for GHES. This patch adds support for parsing CMC > structure and to disable CMCI/polling if the firmware first flag is set. > > Further, the list of machine check bank structures at the end of CMC is used > to determine which MCA banks function in FF mode, so that we continue to > monitor error events on the other banks. > > > - Naveen > > -- > Changes: > - Incorporated comments from Boris and Tony from the previous thread at > http://thread.gmane.org/gmane.linux.acpi.devel/60802 > - Added patch to disable firmware first mode through a boot option. > > > Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> > --- > arch/x86/include/asm/mce.h | 3 ++ > arch/x86/kernel/cpu/mcheck/mce-internal.h | 3 ++ > arch/x86/kernel/cpu/mcheck/mce.c | 25 ++++++++++++++++++ > arch/x86/kernel/cpu/mcheck/mce_intel.c | 40 +++++++++++++++++++++++------ > drivers/acpi/apei/hest.c | 36 ++++++++++++++++++++++++++ > 5 files changed, 99 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index fa5f71e..380fff8 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -188,6 +188,9 @@ extern void register_mce_write_callback(ssize_t (*)(struct file *filp, > const char __user *ubuf, > size_t usize, loff_t *off)); > > +/* Disable CMCI/polling for MCA bank claimed by firmware */ > +extern void mce_disable_ce_bank(int bank); > + > /* > * Exception handler > */ > diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h > index 5b7d4fa..193edc1 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h > +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h > @@ -25,6 +25,9 @@ int mce_severity(struct mce *a, int tolerant, char **msg); > struct dentry *mce_get_debugfs_dir(void); > > extern struct mce_bank *mce_banks; > +extern mce_banks_t mce_banks_ce_disabled; > + > +void cmci_disable_bank(int bank); > > #ifdef CONFIG_X86_MCE_INTEL > unsigned long mce_intel_adjust_timer(unsigned long interval); > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c > index 9239504..9512034 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -94,6 +94,9 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = { > [0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL > }; > > +/* MCA banks controlled through firmware first for corrected errors */ > +mce_banks_t mce_banks_ce_disabled; Why the second bitfield? The cleared bits in mce_poll_banks should be the banks which are handled in FF mode, no? > + > static DEFINE_PER_CPU(struct work_struct, mce_work); > > static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs); > @@ -1932,6 +1935,28 @@ static struct miscdevice mce_chrdev_device = { > &mce_chrdev_ops, > }; > > +static void __mce_disable_ce_bank(void *arg) > +{ > + int bank = *((int *)arg); > + > + /* Ensure we don't poll this bank */ No need for that comment. > + __clear_bit(bank, __get_cpu_var(mce_poll_banks)); > + /* Disable CMCI */ No need for that comment either - function name cannot be more descriptive :-) > + cmci_disable_bank(bank); > +} > + > +void mce_disable_ce_bank(int bank) Yeah, let's call it ...disable_poll_bank because we're disabling polling for those banks. And yes, we poll for errors for which no MCE exception is generated and those happen to be corrected but still... > +{ > + if (bank >= mca_cfg.banks) { > + pr_warning(FW_BUG > + "Ignoring request to disable invalid MCA bank %d.\n", > + bank); > + return; > + } > + set_bit(bank, mce_banks_ce_disabled); > + on_each_cpu(__mce_disable_ce_bank, &bank, 1); > +} > + > /* > * mce=off Disables machine check > * mce=no_cmci Disables CMCI > diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c > index ae1697c..78256c0 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c > @@ -191,6 +191,10 @@ static void cmci_discover(int banks) > if (test_bit(i, owned)) > continue; > > + /* Skip banks in firmware first mode */ > + if (test_bit(i, mce_banks_ce_disabled)) > + continue; > + > rdmsrl(MSR_IA32_MCx_CTL2(i), val); > > /* Already owned by someone else? */ > @@ -259,6 +263,20 @@ void cmci_recheck(void) > local_irq_restore(flags); > } > > +/* Caller must hold the lock on cmci_discover_lock */ > +static void __cmci_disable_bank(int bank) > +{ > + u64 val; > + > + if (!test_bit(bank, __get_cpu_var(mce_banks_owned))) > + return; > + /* Disable CMCI */ > + rdmsrl(MSR_IA32_MCx_CTL2(bank), val); > + val &= ~MCI_CTL2_CMCI_EN; > + wrmsrl(MSR_IA32_MCx_CTL2(bank), val); > + __clear_bit(bank, __get_cpu_var(mce_banks_owned)); > +} > + > /* > * Disable CMCI on this CPU for all banks it owns when it goes down. > * This allows other CPUs to claim the banks on rediscovery. > @@ -268,19 +286,12 @@ void cmci_clear(void) > unsigned long flags; > int i; > int banks; > - u64 val; > > if (!cmci_supported(&banks)) > return; > raw_spin_lock_irqsave(&cmci_discover_lock, flags); > for (i = 0; i < banks; i++) { > - if (!test_bit(i, __get_cpu_var(mce_banks_owned))) > - continue; > - /* Disable CMCI */ > - rdmsrl(MSR_IA32_MCx_CTL2(i), val); > - val &= ~MCI_CTL2_CMCI_EN; > - wrmsrl(MSR_IA32_MCx_CTL2(i), val); > - __clear_bit(i, __get_cpu_var(mce_banks_owned)); > + __cmci_disable_bank(i); > } This leaves only one line so no need for the {} braces anymore. > raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); > } > @@ -315,6 +326,19 @@ void cmci_reenable(void) > cmci_discover(banks); > } > > +void cmci_disable_bank(int bank) > +{ > + int banks; > + unsigned long flags; > + > + if (!cmci_supported(&banks)) > + return; > + > + raw_spin_lock_irqsave(&cmci_discover_lock, flags); > + __cmci_disable_bank(bank); > + raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); > +} You need a prototype for that if you're going to call it from mce.c but CONFIG_X86_MCE_INTEL is not set: arch/x86/built-in.o: In function `__mce_disable_ce_bank': /home/boris/kernel/linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c:1945: undefined reference to `cmci_disable_bank' make: *** [vmlinux] Error 1 > + > static void intel_init_cmci(void) > { > int banks; > diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c > index f5ef5d5..d8c69ba 100644 > --- a/drivers/acpi/apei/hest.c > +++ b/drivers/acpi/apei/hest.c > @@ -36,6 +36,7 @@ > #include <linux/io.h> > #include <linux/platform_device.h> > #include <acpi/apei.h> > +#include <asm/mce.h> > > #include "apei-internal.h" > > @@ -121,6 +122,39 @@ int apei_hest_parse(apei_hest_func_t func, void *data) > } > EXPORT_SYMBOL_GPL(apei_hest_parse); > > +/* > + * Check if firmware advertises firmware first mode. We need FF bit to be set > + * along with a set of MC banks which work in FF mode. > + */ > +static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) > +{ > + int i; > + struct acpi_hest_ia_corrected *cmc; > + struct acpi_hest_ia_error_bank *mc_bank; > + > + if (hest_hdr->type != ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) > + return 0; > + > + cmc = (struct acpi_hest_ia_corrected *)hest_hdr; > + if (!cmc->enabled || !(cmc->flags & ACPI_HEST_FIRMWARE_FIRST)) > + return 0; > + > + /* > + * We expect HEST to provide a list of MC banks that > + * report errors through firmware first mode. ... in firmware first mode. > + */ > + if (cmc->num_hardware_banks == 0) if (!cmc->num_hardware_banks) > + return 0; Err, why does this function return 0 when the sanity checks above fail? apei_hest_parse actually looks at the retval and advances the hest_hdr. > + > + pr_info(HEST_PFX "Enabling Firmware First mode for corrected errors.\n"); > + > + mc_bank = (struct acpi_hest_ia_error_bank *)(cmc + 1); > + for (i = 0; i < cmc->num_hardware_banks; i++, mc_bank++) > + mce_disable_ce_bank(mc_bank->bank_number); > + > + return 0; IOW, we should return 0 only here. [ … ]
On 06/20/2013 01:09 PM, Borislav Petkov wrote: > On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote: >> The Corrected Machine Check structure (CMC) in HEST has a flag which can be >> set by the firmware to indicate to the OS that it prefers to process the >> corrected error events first. In this scenario, the OS is expected to not >> monitor for corrected errors (through CMCI/polling). Instead, the firmware >> notifies the OS on corrected error events through GHES. >> >> Linux already has support for GHES. This patch adds support for parsing CMC >> structure and to disable CMCI/polling if the firmware first flag is set. >> >> Further, the list of machine check bank structures at the end of CMC is used >> to determine which MCA banks function in FF mode, so that we continue to >> monitor error events on the other banks. >> >> >> - Naveen >> >> -- >> Changes: >> - Incorporated comments from Boris and Tony from the previous thread at >> http://thread.gmane.org/gmane.linux.acpi.devel/60802 >> - Added patch to disable firmware first mode through a boot option. >> >> >> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> >> --- >> arch/x86/include/asm/mce.h | 3 ++ >> arch/x86/kernel/cpu/mcheck/mce-internal.h | 3 ++ >> arch/x86/kernel/cpu/mcheck/mce.c | 25 ++++++++++++++++++ >> arch/x86/kernel/cpu/mcheck/mce_intel.c | 40 +++++++++++++++++++++++------ >> drivers/acpi/apei/hest.c | 36 ++++++++++++++++++++++++++ >> 5 files changed, 99 insertions(+), 8 deletions(-) >> >> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h >> index fa5f71e..380fff8 100644 >> --- a/arch/x86/include/asm/mce.h >> +++ b/arch/x86/include/asm/mce.h >> @@ -188,6 +188,9 @@ extern void register_mce_write_callback(ssize_t (*)(struct file *filp, >> const char __user *ubuf, >> size_t usize, loff_t *off)); >> >> +/* Disable CMCI/polling for MCA bank claimed by firmware */ >> +extern void mce_disable_ce_bank(int bank); >> + >> /* >> * Exception handler >> */ >> diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h >> index 5b7d4fa..193edc1 100644 >> --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h >> +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h >> @@ -25,6 +25,9 @@ int mce_severity(struct mce *a, int tolerant, char **msg); >> struct dentry *mce_get_debugfs_dir(void); >> >> extern struct mce_bank *mce_banks; >> +extern mce_banks_t mce_banks_ce_disabled; >> + >> +void cmci_disable_bank(int bank); >> >> #ifdef CONFIG_X86_MCE_INTEL >> unsigned long mce_intel_adjust_timer(unsigned long interval); >> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c >> index 9239504..9512034 100644 >> --- a/arch/x86/kernel/cpu/mcheck/mce.c >> +++ b/arch/x86/kernel/cpu/mcheck/mce.c >> @@ -94,6 +94,9 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = { >> [0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL >> }; >> >> +/* MCA banks controlled through firmware first for corrected errors */ >> +mce_banks_t mce_banks_ce_disabled; > > Why the second bitfield? The cleared bits in mce_poll_banks should be > the banks which are handled in FF mode, no? We need this bitfield to prevent enabling CMCI in future cmci_discover() invocations. See usage in cmci_discover() further below. > >> + >> static DEFINE_PER_CPU(struct work_struct, mce_work); >> >> static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs); >> @@ -1932,6 +1935,28 @@ static struct miscdevice mce_chrdev_device = { >> &mce_chrdev_ops, >> }; >> >> +static void __mce_disable_ce_bank(void *arg) >> +{ >> + int bank = *((int *)arg); >> + >> + /* Ensure we don't poll this bank */ > > No need for that comment. > >> + __clear_bit(bank, __get_cpu_var(mce_poll_banks)); >> + /* Disable CMCI */ > > No need for that comment either - function name cannot be more > descriptive :-) Sure, will remove these comments. > >> + cmci_disable_bank(bank); >> +} >> + >> +void mce_disable_ce_bank(int bank) > > Yeah, let's call it ...disable_poll_bank because we're disabling polling > for those banks. And yes, we poll for errors for which no MCE exception > is generated and those happen to be corrected but still... We actually also disable CMCI here. So, in essence, we are disabling these banks for all sorts of direct corrected error reporting. I thought of naming this disable_ce_on_bank() or disable_ce_bank(), but felt that the mce_ prefix would be good to have. If that isn't necessary, I can rename this to disable_ce_on_bank() which sounds more accurate to me. Is that ok? > >> +{ >> + if (bank >= mca_cfg.banks) { >> + pr_warning(FW_BUG >> + "Ignoring request to disable invalid MCA bank %d.\n", >> + bank); >> + return; >> + } >> + set_bit(bank, mce_banks_ce_disabled); >> + on_each_cpu(__mce_disable_ce_bank, &bank, 1); >> +} >> + >> /* >> * mce=off Disables machine check >> * mce=no_cmci Disables CMCI >> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c >> index ae1697c..78256c0 100644 >> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c >> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c >> @@ -191,6 +191,10 @@ static void cmci_discover(int banks) >> if (test_bit(i, owned)) >> continue; >> >> + /* Skip banks in firmware first mode */ >> + if (test_bit(i, mce_banks_ce_disabled)) >> + continue; >> + >> rdmsrl(MSR_IA32_MCx_CTL2(i), val); >> >> /* Already owned by someone else? */ >> @@ -259,6 +263,20 @@ void cmci_recheck(void) >> local_irq_restore(flags); >> } >> >> +/* Caller must hold the lock on cmci_discover_lock */ >> +static void __cmci_disable_bank(int bank) >> +{ >> + u64 val; >> + >> + if (!test_bit(bank, __get_cpu_var(mce_banks_owned))) >> + return; >> + /* Disable CMCI */ >> + rdmsrl(MSR_IA32_MCx_CTL2(bank), val); >> + val &= ~MCI_CTL2_CMCI_EN; >> + wrmsrl(MSR_IA32_MCx_CTL2(bank), val); >> + __clear_bit(bank, __get_cpu_var(mce_banks_owned)); >> +} >> + >> /* >> * Disable CMCI on this CPU for all banks it owns when it goes down. >> * This allows other CPUs to claim the banks on rediscovery. >> @@ -268,19 +286,12 @@ void cmci_clear(void) >> unsigned long flags; >> int i; >> int banks; >> - u64 val; >> >> if (!cmci_supported(&banks)) >> return; >> raw_spin_lock_irqsave(&cmci_discover_lock, flags); >> for (i = 0; i < banks; i++) { >> - if (!test_bit(i, __get_cpu_var(mce_banks_owned))) >> - continue; >> - /* Disable CMCI */ >> - rdmsrl(MSR_IA32_MCx_CTL2(i), val); >> - val &= ~MCI_CTL2_CMCI_EN; >> - wrmsrl(MSR_IA32_MCx_CTL2(i), val); >> - __clear_bit(i, __get_cpu_var(mce_banks_owned)); >> + __cmci_disable_bank(i); >> } > > This leaves only one line so no need for the {} braces anymore. Ah, ok. > >> raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); >> } >> @@ -315,6 +326,19 @@ void cmci_reenable(void) >> cmci_discover(banks); >> } >> >> +void cmci_disable_bank(int bank) >> +{ >> + int banks; >> + unsigned long flags; >> + >> + if (!cmci_supported(&banks)) >> + return; >> + >> + raw_spin_lock_irqsave(&cmci_discover_lock, flags); >> + __cmci_disable_bank(bank); >> + raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); >> +} > > You need a prototype for that if you're going to call it from mce.c but > CONFIG_X86_MCE_INTEL is not set: > > arch/x86/built-in.o: In function `__mce_disable_ce_bank': > /home/boris/kernel/linux-2.6/arch/x86/kernel/cpu/mcheck/mce.c:1945: undefined reference to `cmci_disable_bank' > make: *** [vmlinux] Error 1 Good catch - missed this. Will address this. > >> + >> static void intel_init_cmci(void) >> { >> int banks; >> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c >> index f5ef5d5..d8c69ba 100644 >> --- a/drivers/acpi/apei/hest.c >> +++ b/drivers/acpi/apei/hest.c >> @@ -36,6 +36,7 @@ >> #include <linux/io.h> >> #include <linux/platform_device.h> >> #include <acpi/apei.h> >> +#include <asm/mce.h> >> >> #include "apei-internal.h" >> >> @@ -121,6 +122,39 @@ int apei_hest_parse(apei_hest_func_t func, void *data) >> } >> EXPORT_SYMBOL_GPL(apei_hest_parse); >> >> +/* >> + * Check if firmware advertises firmware first mode. We need FF bit to be set >> + * along with a set of MC banks which work in FF mode. >> + */ >> +static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) >> +{ >> + int i; >> + struct acpi_hest_ia_corrected *cmc; >> + struct acpi_hest_ia_error_bank *mc_bank; >> + >> + if (hest_hdr->type != ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) >> + return 0; >> + >> + cmc = (struct acpi_hest_ia_corrected *)hest_hdr; >> + if (!cmc->enabled || !(cmc->flags & ACPI_HEST_FIRMWARE_FIRST)) >> + return 0; >> + >> + /* >> + * We expect HEST to provide a list of MC banks that >> + * report errors through firmware first mode. > > ... in firmware first mode. Ok. > >> + */ >> + if (cmc->num_hardware_banks == 0) > > if (!cmc->num_hardware_banks) Ok. > >> + return 0; > > Err, why does this function return 0 when the sanity checks above fail? > apei_hest_parse actually looks at the retval and advances the hest_hdr. > >> + >> + pr_info(HEST_PFX "Enabling Firmware First mode for corrected errors.\n"); >> + >> + mc_bank = (struct acpi_hest_ia_error_bank *)(cmc + 1); >> + for (i = 0; i < cmc->num_hardware_banks; i++, mc_bank++) >> + mce_disable_ce_bank(mc_bank->bank_number); >> + >> + return 0; > > IOW, we should return 0 only here. Quite the contrary actually. We want apei_hest_parse() to continue to the next error source structure _until_ we find CMC error source. This requires us to return 0 at the very first check we have for error source type. After that, we should actually return a non-zero value so we don't continue looking at further entries (as per APEI, only one error source of type CMC is permitted in HEST.) I will make this change. As always, thanks for the detailed review! - Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote: > We need this bitfield to prevent enabling CMCI in future > cmci_discover() invocations. See usage in cmci_discover() further > below. So?! /* Skip banks in firmware first mode */ if (!test_bit(i, __get_cpu_var(mce_poll_banks)) continue; ... > >Yeah, let's call it ...disable_poll_bank because we're disabling polling > >for those banks. And yes, we poll for errors for which no MCE exception > >is generated and those happen to be corrected but still... > > We actually also disable CMCI here. So, in essence, we are disabling > these banks for all sorts of direct corrected error reporting. I > thought of naming this disable_ce_on_bank() or disable_ce_bank(), > but felt that the mce_ prefix would be good to have. If that isn't > necessary, I can rename this to disable_ce_on_bank() which sounds > more accurate to me. Is that ok? No, mce_disable_bank() removes the respective bank from the polling bitfield and cmci_disable_bank() actually disables CMCI which is Intel-only. So leave it at mce_disable_bank and that should be fine. Thanks.
On 06/21/2013 12:59 AM, Borislav Petkov wrote: > On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote: >> We need this bitfield to prevent enabling CMCI in future >> cmci_discover() invocations. See usage in cmci_discover() further >> below. > > So?! > > /* Skip banks in firmware first mode */ > if (!test_bit(i, __get_cpu_var(mce_poll_banks)) > continue; This won't work across cpu offline/online, right? We will end up _not_ enabling CMCI on certain banks where we should have. > > ... > >>> Yeah, let's call it ...disable_poll_bank because we're disabling polling >>> for those banks. And yes, we poll for errors for which no MCE exception >>> is generated and those happen to be corrected but still... >> >> We actually also disable CMCI here. So, in essence, we are disabling >> these banks for all sorts of direct corrected error reporting. I >> thought of naming this disable_ce_on_bank() or disable_ce_bank(), >> but felt that the mce_ prefix would be good to have. If that isn't >> necessary, I can rename this to disable_ce_on_bank() which sounds >> more accurate to me. Is that ok? > > No, mce_disable_bank() removes the respective bank from the polling > bitfield and cmci_disable_bank() actually disables CMCI which is > Intel-only. So leave it at mce_disable_bank and that should be fine. Ok. I will rename this to mce_disable_bank() from mce_disable_ce_bank(). Another thing: for hest_parse_cmc(), does the below look good? cmc = (struct acpi_hest_ia_corrected *)hest_hdr; if (!cmc->enabled) return 0; #define ACPI_HEST_PARSING_DONE 1 /* * We expect HEST to provide a list of MC banks that * report errors in firmware first mode. */ if (!(cmc->flags & ACPI_HEST_FIRMWARE_FIRST) || !cmc->num_hardware_banks) return ACPI_HEST_PARSING_DONE; The return value doesn't really matter since we don't check it, but returning an error looked like the wrong thing to do as well. Thanks, Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote: > This won't work across cpu offline/online, right? We will end up > _not_ enabling CMCI on certain banks where we should have. Huh, don't understand. cmci_discover runs on each CPU. After you've run hest_parse_cmc early during boot and cleared the mce_poll_banks bits, nothing will set them again so CPU hotplug doesn't matter... > Another thing: for hest_parse_cmc(), does the below look good? > > cmc = (struct acpi_hest_ia_corrected *)hest_hdr; > if (!cmc->enabled) > return 0; > > #define ACPI_HEST_PARSING_DONE 1 > /* > * We expect HEST to provide a list of MC banks that > * report errors in firmware first mode. > */ > if (!(cmc->flags & ACPI_HEST_FIRMWARE_FIRST) || > !cmc->num_hardware_banks) > return ACPI_HEST_PARSING_DONE; > > The return value doesn't really matter since we don't check it, but > returning an error looked like the wrong thing to do as well. I'd add a comment above the "return 1" statement to explain why I'm doing this. It is much more verbose even than a well-named macro :) Thanks.
On 06/21/2013 02:27 AM, Borislav Petkov wrote: > On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote: >> This won't work across cpu offline/online, right? We will end up >> _not_ enabling CMCI on certain banks where we should have. > > Huh, don't understand. cmci_discover runs on each CPU. After you've run > hest_parse_cmc early during boot and cleared the mce_poll_banks bits, > nothing will set them again so CPU hotplug doesn't matter... Exactly, but mce_poll_banks also doesn't have bits set for banks on which CMCI is enabled. Let's say we have a cpu with 2 banks (not shared), none of which work in FF mode. Both these banks support CMCI, so mce_poll_banks won't have these bits set. On cpu offline, we call cmci_clear() which disables CMCI on these two banks before offlining it. When this cpu is brought online again, we call cmci_discover() which sees that mce_poll_banks doesn't have these two banks enabled and will skip enabling CMCI thinking these are in FF. Right? > >> Another thing: for hest_parse_cmc(), does the below look good? >> >> cmc = (struct acpi_hest_ia_corrected *)hest_hdr; >> if (!cmc->enabled) >> return 0; >> >> #define ACPI_HEST_PARSING_DONE 1 >> /* >> * We expect HEST to provide a list of MC banks that >> * report errors in firmware first mode. >> */ >> if (!(cmc->flags & ACPI_HEST_FIRMWARE_FIRST) || >> !cmc->num_hardware_banks) >> return ACPI_HEST_PARSING_DONE; >> >> The return value doesn't really matter since we don't check it, but >> returning an error looked like the wrong thing to do as well. > > I'd add a comment above the "return 1" statement to explain why I'm > doing this. It is much more verbose even than a well-named macro :) Okay :) Thanks, Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote: > Exactly, but mce_poll_banks also doesn't have bits set for banks on > which CMCI is enabled. > > Let's say we have a cpu with 2 banks (not shared), none of which work > in FF mode. Both these banks support CMCI, so mce_poll_banks won't > have these bits set. > > On cpu offline, we call cmci_clear() which disables CMCI on these two > banks before offlining it. When this cpu is brought online again, we > call cmci_discover() which sees that mce_poll_banks doesn't have these > two banks enabled and will skip enabling CMCI thinking these are in > FF. Hmm, mce_intel has yet another bitfield - mce_banks_owned. (Btw, this is why I have a problem with adding yet another bitfield). The way I understand it is, if a bit is set in the owned bitfield, those banks belong to CMCI and are not polled. Now, can we use both mce_banks_owned and mce_poll_banks? If a bit in both bifields is cleared, the corresponding bank is not polled *and* is not owned by CMCI => it is in FF mode. Makes sense?
On 06/21/2013 01:04 PM, Borislav Petkov wrote: > On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote: >> Exactly, but mce_poll_banks also doesn't have bits set for banks on >> which CMCI is enabled. >> >> Let's say we have a cpu with 2 banks (not shared), none of which work >> in FF mode. Both these banks support CMCI, so mce_poll_banks won't >> have these bits set. >> >> On cpu offline, we call cmci_clear() which disables CMCI on these two >> banks before offlining it. When this cpu is brought online again, we >> call cmci_discover() which sees that mce_poll_banks doesn't have these >> two banks enabled and will skip enabling CMCI thinking these are in >> FF. > > Hmm, mce_intel has yet another bitfield - mce_banks_owned. (Btw, this is > why I have a problem with adding yet another bitfield). > > The way I understand it is, if a bit is set in the owned bitfield, those > banks belong to CMCI and are not polled. > > Now, can we use both mce_banks_owned and mce_poll_banks? If a bit in > both bifields is cleared, the corresponding bank is not polled *and* is > not owned by CMCI => it is in FF mode. > > Makes sense? > Yes, but I'm afraid this won't work either - mce_banks_owned is cleared during cpu offline. This is necessary since a cmci rediscover is triggered on cpu offline, so that if this bank is shared across cores, a different cpu can claim ownership of this bank. The difference between the new bitfield and the existing bitfields is that the new one is not per-cpu. This is a global list of banks across cpus that we do not want enabled. Thanks, Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote: > Yes, but I'm afraid this won't work either - mce_banks_owned is > cleared during cpu offline. This is necessary since a cmci > rediscover is triggered on cpu offline, so that if this bank is > shared across cores, a different cpu can claim ownership of this > bank. What for? Sounds strange to me. Anyway, the reason for which mce_banks_owned won't work is because it is in mce_intel and we need a mce.c global bitfield for banks which are excluded from CE handling. So ok, I'm persuaded, yet another bitfield it is ... :-\ Thanks.
On 06/21/2013 02:06 PM, Borislav Petkov wrote: > On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote: >> Yes, but I'm afraid this won't work either - mce_banks_owned is >> cleared during cpu offline. This is necessary since a cmci >> rediscover is triggered on cpu offline, so that if this bank is >> shared across cores, a different cpu can claim ownership of this >> bank. > > What for? Sounds strange to me. Look at section "15.5.1 CMCI Local APIC Interface" from Intel SDM Vol. 3, and the subsequent section on "System Software Recommendation for Managing CMCI and Machine Check Resources": "For example, if a corrected bit error in a cache shared by two logical processors caused a CMCI, the interrupt will be delivered to both logical processors sharing that microarchitectural sub-system." In other words, some of the MC banks are shared across logical cpus in a core and some across all cores in a package. During initialization, the first cpu in a core ends up owning most of the banks specific to the core/package. When this cpu is offlined, we would want the second cpu in that core to discover and enable CMCI for those MC banks which it shares with the first cpu. As an example, consider a hypothetical single-core Intel processor with Hyperthreading. On init, let's say the first cpu ends up owning banks 1, 2, 3 and 4; and the second cpu ends up owning banks 1 and 2. This would mean that MC banks 1 and 2 are "hyperthread"-specific, while banks 3 and 4 are shared. Now, if we offline the first cpu, it disables CMCI on all 4 banks. However, banks 3 and 4 are shared. So, if we now do a cmci rediscovery, the second cpu will see that banks 3 and 4 don't have CMCI enabled and will then claim ownership of those so that we can continue to receive and process CMCIs from those subsystems. Makes sense now? Thanks, Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 03:02:04PM +0530, Naveen N. Rao wrote: > As an example, consider a hypothetical single-core Intel processor > with Hyperthreading. On init, let's say the first cpu ends up owning > banks 1, 2, 3 and 4; and the second cpu ends up owning banks 1 and > 2. This would mean that MC banks 1 and 2 are "hyperthread"-specific, > while banks 3 and 4 are shared. Now, if we offline the first cpu, it > disables CMCI on all 4 banks. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is what I find strange - having to disable CMCI, especially on a shared bank, just to reenable it right back on the next core. But I guess this is a constraint dictated by the hardware...
On Fri, Jun 21, 2013 at 1:36 AM, Borislav Petkov <bp@alien8.de> wrote:
> So ok, I'm persuaded, yet another bitfield it is ... :-\
Let's add some more comments on what each of these bitfields mean. Otherwise
we will be going back over this next time we have a patch that touches one
of them and we've all forgotten the subtle details explained in this
e-mail thread.
-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 21, 2013 at 09:47:31AM -0700, Tony Luck wrote: > On Fri, Jun 21, 2013 at 1:36 AM, Borislav Petkov <bp@alien8.de> wrote: > > So ok, I'm persuaded, yet another bitfield it is ... :-\ > > Let's add some more comments on what each of these bitfields mean. Otherwise > we will be going back over this next time we have a patch that touches one > of them and we've all forgotten the subtle details explained in this > e-mail thread. Very good point - it's like you're reading my mind! I thought about suggesting exactly that but then it slipped my mind with all the details and commotion here. :-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h index fa5f71e..380fff8 100644 --- a/arch/x86/include/asm/mce.h +++ b/arch/x86/include/asm/mce.h @@ -188,6 +188,9 @@ extern void register_mce_write_callback(ssize_t (*)(struct file *filp, const char __user *ubuf, size_t usize, loff_t *off)); +/* Disable CMCI/polling for MCA bank claimed by firmware */ +extern void mce_disable_ce_bank(int bank); + /* * Exception handler */ diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h index 5b7d4fa..193edc1 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h @@ -25,6 +25,9 @@ int mce_severity(struct mce *a, int tolerant, char **msg); struct dentry *mce_get_debugfs_dir(void); extern struct mce_bank *mce_banks; +extern mce_banks_t mce_banks_ce_disabled; + +void cmci_disable_bank(int bank); #ifdef CONFIG_X86_MCE_INTEL unsigned long mce_intel_adjust_timer(unsigned long interval); diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 9239504..9512034 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -94,6 +94,9 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = { [0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL }; +/* MCA banks controlled through firmware first for corrected errors */ +mce_banks_t mce_banks_ce_disabled; + static DEFINE_PER_CPU(struct work_struct, mce_work); static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs); @@ -1932,6 +1935,28 @@ static struct miscdevice mce_chrdev_device = { &mce_chrdev_ops, }; +static void __mce_disable_ce_bank(void *arg) +{ + int bank = *((int *)arg); + + /* Ensure we don't poll this bank */ + __clear_bit(bank, __get_cpu_var(mce_poll_banks)); + /* Disable CMCI */ + cmci_disable_bank(bank); +} + +void mce_disable_ce_bank(int bank) +{ + if (bank >= mca_cfg.banks) { + pr_warning(FW_BUG + "Ignoring request to disable invalid MCA bank %d.\n", + bank); + return; + } + set_bit(bank, mce_banks_ce_disabled); + on_each_cpu(__mce_disable_ce_bank, &bank, 1); +} + /* * mce=off Disables machine check * mce=no_cmci Disables CMCI diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c index ae1697c..78256c0 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c @@ -191,6 +191,10 @@ static void cmci_discover(int banks) if (test_bit(i, owned)) continue; + /* Skip banks in firmware first mode */ + if (test_bit(i, mce_banks_ce_disabled)) + continue; + rdmsrl(MSR_IA32_MCx_CTL2(i), val); /* Already owned by someone else? */ @@ -259,6 +263,20 @@ void cmci_recheck(void) local_irq_restore(flags); } +/* Caller must hold the lock on cmci_discover_lock */ +static void __cmci_disable_bank(int bank) +{ + u64 val; + + if (!test_bit(bank, __get_cpu_var(mce_banks_owned))) + return; + /* Disable CMCI */ + rdmsrl(MSR_IA32_MCx_CTL2(bank), val); + val &= ~MCI_CTL2_CMCI_EN; + wrmsrl(MSR_IA32_MCx_CTL2(bank), val); + __clear_bit(bank, __get_cpu_var(mce_banks_owned)); +} + /* * Disable CMCI on this CPU for all banks it owns when it goes down. * This allows other CPUs to claim the banks on rediscovery. @@ -268,19 +286,12 @@ void cmci_clear(void) unsigned long flags; int i; int banks; - u64 val; if (!cmci_supported(&banks)) return; raw_spin_lock_irqsave(&cmci_discover_lock, flags); for (i = 0; i < banks; i++) { - if (!test_bit(i, __get_cpu_var(mce_banks_owned))) - continue; - /* Disable CMCI */ - rdmsrl(MSR_IA32_MCx_CTL2(i), val); - val &= ~MCI_CTL2_CMCI_EN; - wrmsrl(MSR_IA32_MCx_CTL2(i), val); - __clear_bit(i, __get_cpu_var(mce_banks_owned)); + __cmci_disable_bank(i); } raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); } @@ -315,6 +326,19 @@ void cmci_reenable(void) cmci_discover(banks); } +void cmci_disable_bank(int bank) +{ + int banks; + unsigned long flags; + + if (!cmci_supported(&banks)) + return; + + raw_spin_lock_irqsave(&cmci_discover_lock, flags); + __cmci_disable_bank(bank); + raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); +} + static void intel_init_cmci(void) { int banks; diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c index f5ef5d5..d8c69ba 100644 --- a/drivers/acpi/apei/hest.c +++ b/drivers/acpi/apei/hest.c @@ -36,6 +36,7 @@ #include <linux/io.h> #include <linux/platform_device.h> #include <acpi/apei.h> +#include <asm/mce.h> #include "apei-internal.h" @@ -121,6 +122,39 @@ int apei_hest_parse(apei_hest_func_t func, void *data) } EXPORT_SYMBOL_GPL(apei_hest_parse); +/* + * Check if firmware advertises firmware first mode. We need FF bit to be set + * along with a set of MC banks which work in FF mode. + */ +static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) +{ + int i; + struct acpi_hest_ia_corrected *cmc; + struct acpi_hest_ia_error_bank *mc_bank; + + if (hest_hdr->type != ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) + return 0; + + cmc = (struct acpi_hest_ia_corrected *)hest_hdr; + if (!cmc->enabled || !(cmc->flags & ACPI_HEST_FIRMWARE_FIRST)) + return 0; + + /* + * We expect HEST to provide a list of MC banks that + * report errors through firmware first mode. + */ + if (cmc->num_hardware_banks == 0) + return 0; + + pr_info(HEST_PFX "Enabling Firmware First mode for corrected errors.\n"); + + mc_bank = (struct acpi_hest_ia_error_bank *)(cmc + 1); + for (i = 0; i < cmc->num_hardware_banks; i++, mc_bank++) + mce_disable_ce_bank(mc_bank->bank_number); + + return 0; +} + struct ghes_arr { struct platform_device **ghes_devs; unsigned int count; @@ -227,6 +261,8 @@ void __init acpi_hest_init(void) goto err; } + apei_hest_parse(hest_parse_cmc, NULL); + if (!ghes_disable) { rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count); if (rc)
The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. - Naveen -- Changes: - Incorporated comments from Boris and Tony from the previous thread at http://thread.gmane.org/gmane.linux.acpi.devel/60802 - Added patch to disable firmware first mode through a boot option. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> --- arch/x86/include/asm/mce.h | 3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h | 3 ++ arch/x86/kernel/cpu/mcheck/mce.c | 25 ++++++++++++++++++ arch/x86/kernel/cpu/mcheck/mce_intel.c | 40 +++++++++++++++++++++++------ drivers/acpi/apei/hest.c | 36 ++++++++++++++++++++++++++ 5 files changed, 99 insertions(+), 8 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html