Message ID | 1433185940-24770-4-git-send-email-zjzhang@codeaurora.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | Andy Gross |
[ Cc'ing Boris and Tony. Folks original patch is here,
  https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]

On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>
> With ACPI APEI firmware first handling, generic hardware error
> record is updated by firmware in GHES memory region. When firmware
> updated GHES memory region in DDR without going through cache,
> Linux reads stale data from cache.
>
> GHES memory region should be mapped with cache attributes
> according to EFI memory map when applicable. If firmware updates
> DDR directly, EFI memory map has GHES memory region defined as
> uncached; If firmware updates cache, EFI memory map has GHES
> memory region defined as cached.
>
> When EFI is configured, map IRQ page using efi_remap() provided by
> EFI subsystem.

[...]

> @@ -159,6 +160,7 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  	return (void __iomem *)vaddr;
>  }
>
> +#ifndef CONFIG_EFI
>  static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
>  {
>  	unsigned long vaddr;

Sprinkling CONFIG_EFI like this is wrong. On x86 we run kernels built
with CONFIG_EFI on machines with BIOS - you can't make the EFI vs.
non-EFI decision at compile-time.

So this patch looks like a potential regression to me since on x86
ghes_ioremap_pfn_irq() would not be used anymore and instead we'd be
using efi_remap() which will perform an ioremap_nocache() if it gets
called after efi_free_boot_services().

And based on the comments in the apei code, that's going to cause issues
because ioremap() does not work in atomic context, not to mention the
fact that we've gone from a cached mapping to an uncached one.

Instead, I suggest you modify ghes_ioremap_* to query the EFI memmap (if
it's available at runtime) to lookup the correct mapping attributes.

But I've Cc'd some more people who have actually worked on this code,
since I'm not one of them.
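Matt's suggestion, a runtime lookup in place of a compile-time CONFIG_EFI switch, can be sketched in user space as below. Only the EFI_MEMORY_* bit values come from the UEFI specification; the struct, the GHES_MAP_* types, the function names and the fallback policy are assumptions made for illustration. In the kernel the table would be the firmware memory map (some architectures provide an efi_mem_attributes() helper for this kind of query), and the idea is that the result would select page protections for the pre-reserved GHES page rather than return an enum, so no ioremap() call would be needed in atomic context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cacheability capability bits, as defined by the UEFI specification. */
#define EFI_MEMORY_UC 0x1ULL /* uncacheable */
#define EFI_MEMORY_WC 0x2ULL /* write-combining */
#define EFI_MEMORY_WT 0x4ULL /* write-through */
#define EFI_MEMORY_WB 0x8ULL /* write-back (normal cached memory) */
#define EFI_PAGE_SHIFT 12    /* EFI memory-map pages are always 4 KiB */

/* Hypothetical, simplified stand-in for one EFI memory-map descriptor. */
struct memmap_entry {
	uint64_t phys_addr; /* first byte of the region */
	uint64_t num_pages; /* region length in EFI pages */
	uint64_t attribute; /* EFI_MEMORY_* capability bits */
};

/* Hypothetical mapping types a ghes_ioremap_*() variant could select. */
enum ghes_map_type {
	GHES_MAP_CACHED,
	GHES_MAP_WRITE_THROUGH,
	GHES_MAP_WRITE_COMBINE,
	GHES_MAP_UNCACHED,
};

/* Attribute bits of the entry covering paddr, or 0 if none covers it. */
static uint64_t memmap_attributes(const struct memmap_entry *map, size_t n,
				  uint64_t paddr)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].phys_addr;
		uint64_t end = start + (map[i].num_pages << EFI_PAGE_SHIFT);

		if (paddr >= start && paddr < end)
			return map[i].attribute;
	}
	return 0;
}

/*
 * Runtime decision, no CONFIG_EFI #ifdef: without a memory map (for
 * example an x86 BIOS boot of a CONFIG_EFI kernel), keep the cached
 * mapping GHES used before the patch; otherwise use the most cacheable
 * type the firmware says the region supports.
 */
static enum ghes_map_type ghes_choose_mapping(const struct memmap_entry *map,
					      size_t n, uint64_t paddr)
{
	uint64_t attr;

	if (map == NULL)
		return GHES_MAP_CACHED;

	attr = memmap_attributes(map, n, paddr);
	if (attr == 0)			/* address not described: old default */
		return GHES_MAP_CACHED;
	if (attr & EFI_MEMORY_WB)
		return GHES_MAP_CACHED;
	if (attr & EFI_MEMORY_WT)
		return GHES_MAP_WRITE_THROUGH;
	if (attr & EFI_MEMORY_WC)
		return GHES_MAP_WRITE_COMBINE;
	return GHES_MAP_UNCACHED;	/* e.g. only EFI_MEMORY_UC is set */
}
```

With a map whose entry for the GHES page carries only EFI_MEMORY_UC, ghes_choose_mapping() returns GHES_MAP_UNCACHED; with no map at all it keeps the cached default, so a CONFIG_EFI kernel booted from BIOS behaves as before.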
On Fri, Jun 05, 2015 at 10:57:01AM +0100, Matt Fleming wrote:
> [ Cc'ing Boris and Tony. Folks original patch is here,
>   https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
>
> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
> > From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> >
> > With ACPI APEI firmware first handling, generic hardware error
> > record is updated by firmware in GHES memory region. When firmware
> > updated GHES memory region in DDR without going through cache,

What is DDR?

I think this needs to be clarified first before we go any further.

I picked up on the sidelines that this might be arm64-specific stuff. If
so, your approach is wrong: you're merging efi_* facilities from x86 and
ia64 into generic efi ones but then doing CONFIG_EFI ifdeffery in GHES.

What you should do instead is have arch-specific:

	ghes_ioremap_pfn_irq()
	ghes_iounmap_irq()
	...

and whatever else functionality which is different on your arch and
which get called from the generic ghes.c driver.

In the arch-specific ones you can go wild with the ifdeffery and whatnot
is needed on that specific arch.

Something like that, at least.
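Borislav's alternative keeps ghes.c generic and pushes the per-architecture differences behind hooks. One common kernel idiom for this is an arch header that defines a function plus a same-named macro, with the generic header supplying a fallback under #ifndef. The sketch below folds all three layers into one file for illustration; the hook name arch_ghes_map_type() and its placeholder policy are hypothetical, not an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping types the hook can return. */
enum ghes_map_type { GHES_MAP_CACHED, GHES_MAP_UNCACHED };

/* ---- arch header (hypothetical): only arches that need a hook ---- */
/* Defining a same-named macro tells the generic header to skip its fallback. */
#define arch_ghes_map_type arch_ghes_map_type
static inline enum ghes_map_type arch_ghes_map_type(uint64_t paddr)
{
	/*
	 * Placeholder policy standing in for an arch-specific query such as
	 * a firmware memory-map lookup: one hypothetical uncached page.
	 */
	if (paddr >= 0x80000000ULL && paddr < 0x80001000ULL)
		return GHES_MAP_UNCACHED;
	return GHES_MAP_CACHED;
}

/* ---- generic header: fallback used when no arch provides the hook ---- */
#ifndef arch_ghes_map_type
static inline enum ghes_map_type arch_ghes_map_type(uint64_t paddr)
{
	(void)paddr;
	return GHES_MAP_CACHED;	/* current behaviour: a cached mapping */
}
#endif

/* ---- generic ghes.c: calls the hook, contains no per-arch #ifdefs ---- */
static enum ghes_map_type ghes_map_type_for(uint64_t paddr)
{
	enum ghes_map_type t = arch_ghes_map_type(paddr);

	assert(t == GHES_MAP_CACHED || t == GHES_MAP_UNCACHED);
	return t;
}
```

Deleting the arch-header section makes every address fall back to GHES_MAP_CACHED, which is what an architecture without special needs would get; the ifdeffery stays confined to the architecture that needs it.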
Thanks Matt for the review. Yes, you are right on, I am following
this:
> modify ghes_ioremap_* to query the EFI memmap (if
> it's available at runtime) to lookup the correct mapping attributes.

Jonathan

On 6/5/2015 2:57 AM, Matt Fleming wrote:
> [ Cc'ing Boris and Tony. Folks original patch is here,
>   https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
>
> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> With ACPI APEI firmware first handling, generic hardware error
>> record is updated by firmware in GHES memory region. When firmware
>> updated GHES memory region in DDR without going through cache,
>> Linux reads stale data from cache.
>>
>> GHES memory region should be mapped with cache attributes
>> according to EFI memory map when applicable. If firmware updates
>> DDR directly, EFI memory map has GHES memory region defined as
>> uncached; If firmware updates cache, EFI memory map has GHES
>> memory region defined as cached.
>>
>> When EFI is configured, map IRQ page using efi_remap() provided by
>> EFI subsystem.
>
> [...]
>
>> @@ -159,6 +160,7 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>  	return (void __iomem *)vaddr;
>>  }
>>
>> +#ifndef CONFIG_EFI
>>  static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
>>  {
>>  	unsigned long vaddr;
>
> Sprinkling CONFIG_EFI like this is wrong. On x86 we run kernels built
> with CONFIG_EFI on machines with BIOS - you can't make the EFI vs.
> non-EFI decision at compile-time.
>
> So this patch looks like a potential regression to me since on x86
> ghes_ioremap_pfn_irq() would not be used anymore and instead we'd be
> using efi_remap() which will perform an ioremap_nocache() if it gets
> called after efi_free_boot_services().
>
> And based on the comments in the apei code, that's going to cause issues
> because ioremap() does not work in atomic context, not to mention the
> fact that we've gone from a cached mapping to an uncached one.
>
> Instead, I suggest you modify ghes_ioremap_* to query the EFI memmap (if
> it's available at runtime) to lookup the correct mapping attributes.
>
> But I've Cc'd some more people who have actually worked on this code,
> since I'm not one of them.
>
On Fri, Jun 05, 2015 at 09:43:26AM -0700, Zhang, Jonathan Zhixiong wrote:
> Thanks Matt for the review. Yes, you are right on, I am following
> this:
> > modify ghes_ioremap_* to query the EFI memmap (if
> > it's available at runtime) to lookup the correct mapping attributes.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

So please do not top-post.

Thanks.
Thank you, Borislav, for the review. Please see comments inline...

On 6/5/2015 3:25 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 10:57:01AM +0100, Matt Fleming wrote:
>> [ Cc'ing Boris and Tony. Folks original patch is here,
>>   https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
>>
>> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> With ACPI APEI firmware first handling, generic hardware error
>>> record is updated by firmware in GHES memory region. When firmware
>>> updated GHES memory region in DDR without going through cache,
>
> What is DDR?
>
> I think this needs to be clarified first before we go any further.

I thought the word "memory" might be confusing, because there are
memories on the system that are not accessible by Linux. In this
context, the APEI error data is accessed (read and write) by both Linux
and platform firmware; hence both sides should access the memory using
the same cache attribute. I wanted to emphasize that even though DDR
is normally cacheable, when the platform accesses it with an uncached
attribute, Linux should do the same.

I will try to make this clearer in the next version of the patch.

>
> I picked up on the sidelines that this might be arm64-specific stuff. If
> so, your approach is wrong: you're merging efi_* facilities from x86 and
> ia64 into generic efi ones but then doing CONFIG_EFI ifdeffery in GHES.
>
> What you should do instead is have arch-specific:
>
>	ghes_ioremap_pfn_irq()
>	ghes_iounmap_irq()
>	...
>
> and whatever else functionality which is different on your arch and
> which get called from the generic ghes.c driver.
>
> In the arch-specific ones you can go wild with the ifdeffery and whatnot
> is needed on that specific arch.
>
> Something like that, at least.

Makes total sense.
I was trying to reduce binary size for non-EFI systems, but as Matt
pointed out in his review, on x86 even BIOS-based systems may run
kernels built with CONFIG_EFI. I will submit a new version accordingly.
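Jonathan's point that both sides must use the same cache attribute can be illustrated with a toy, single-threaded model: firmware writes the error record straight to RAM, while Linux reads either through a cached mapping that is not kept coherent with those writes or through an uncached one. The struct and function names are hypothetical, and real behavior depends on the platform's coherency rules, which the model does not capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one word of the GHES error-status block. */
struct shared_word {
	uint32_t ram;		/* contents of the RAM location */
	uint32_t cache;		/* Linux-side cached copy of that location */
	bool cache_valid;	/* whether the cached copy has been filled */
};

/* Firmware writes RAM directly; the Linux-side cache is not updated. */
static void firmware_write_uncached(struct shared_word *w, uint32_t v)
{
	assert(w != NULL);
	w->ram = v;
}

/* Linux read through a cacheable mapping: the first read fills the cache. */
static uint32_t linux_read_cached(struct shared_word *w)
{
	if (!w->cache_valid) {
		w->cache = w->ram;
		w->cache_valid = true;
	}
	return w->cache;
}

/* Linux read through an uncached mapping: always fetches from RAM. */
static uint32_t linux_read_uncached(const struct shared_word *w)
{
	return w->ram;
}
```

After an initial cached read, firmware_write_uncached(&w, 0x1234) leaves linux_read_cached() returning the old value while linux_read_uncached() returns 0x1234, which is the stale read the patch description refers to.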
On 6/5/2015 9:50 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 09:43:26AM -0700, Zhang, Jonathan Zhixiong wrote:
>> Thanks Matt for the review. Yes, you are right on, I am following
>> this:
>>> modify ghes_ioremap_* to query the EFI memmap (if
>>> it's available at runtime) to lookup the correct mapping attributes.
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
> So please do not top-post.
>
> Thanks.
>
Will do. Thanks for the advice, Borislav.
On Fri, Jun 05, 2015 at 10:05:13AM -0700, Zhang, Jonathan Zhixiong wrote:
> >What is DDR?
> >
> >I think this needs to be clarified first before we go any further.
> I thought the word "memory" might be confusing, because there are

So you mean normal RAM here?

> memories on the system that are not accessible by Linux. In this
> context, the APEI error data is accessed (read and write) by both Linux
> and platform firmware; hence both sides should access the memory using
> the same cache attribute. I wanted to emphasize that even though DDR
> is normally cacheable, when the platform accesses it with an uncached
> attribute, Linux should do the same.

Makes sense.

Btw, do we need synchronization between firmware and Linux then? Does
Linux need to know when it is ok to touch that memory?
On 6/5/2015 10:12 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 10:05:13AM -0700, Zhang, Jonathan Zhixiong wrote:
>>> What is DDR?
>>>
>>> I think this needs to be clarified first before we go any further.
>> I thought the word "memory" might be confusing, because there are
>
> So you mean normal RAM here?

Yes, exactly. I should use the word RAM instead.

>
>> memories on the system that are not accessible by Linux. In this
>> context, the APEI error data is accessed (read and write) by both Linux
>> and platform firmware; hence both sides should access the memory using
>> the same cache attribute. I wanted to emphasize that even though DDR
>> is normally cacheable, when the platform accesses it with an uncached
>> attribute, Linux should do the same.
>
> Makes sense.
>
> Btw, do we need synchronization between firmware and Linux then? Does
> Linux need to know when it is ok to touch that memory?

Good question. Linux zeroes out the error status code in the error data
after the data is consumed. This is good, but it alone does not solve
the synchronization concern.

For interrupt-notification-type (SCI or NMI) error sources, this may
not be an issue, since both sides can operate under the rule that the
error data is only ever overwritten, never appended to. But what about
the poll notification type? In this case, the platform gathers errors
and updates the memory region as needed, while Linux checks the same
memory region periodically.

An ACPI APEI proposal intended to solve this concern has been discussed
in the UEFI Forum. The idea is for the OS to send the platform a signal
(by updating a designated register) after the error data is consumed.
That way, the platform does not touch the memory region while the OS is
still accessing it. After this proposal is approved and published, I
will submit a patch to implement it.
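The acknowledgment scheme outlined above for polled error sources can be modeled as a single-threaded handshake. The email only says the OS signals the platform by updating a designated register; the read_ack flag, the block layout, and the rule that firmware waits for the acknowledgment before writing again are assumptions for illustration. A real implementation on shared memory would also need memory barriers and matching cache attributes on the region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy error-status block: block_status != 0 means a record is present. */
struct error_block {
	uint32_t block_status;
	char data[32];
};

/* Shared state, including a hypothetical acknowledgment register. */
struct shared_region {
	struct error_block blk;
	uint32_t read_ack;	/* 1: Linux has consumed the last record */
};

/* Firmware side: publish a record only after Linux has acknowledged. */
static bool firmware_publish(struct shared_region *r, const char *msg)
{
	if (!r->read_ack)
		return false;		/* Linux may still be reading; retry */
	r->read_ack = 0;
	strncpy(r->blk.data, msg, sizeof(r->blk.data) - 1);
	r->blk.data[sizeof(r->blk.data) - 1] = '\0';
	r->blk.block_status = 1;
	return true;
}

/* Linux side, e.g. from a poll timer: copy, clear status, then acknowledge. */
static bool linux_poll(struct shared_region *r, char *out, size_t outlen)
{
	assert(out != NULL && outlen > 0);
	if (r->blk.block_status == 0)
		return false;		/* nothing new since the last poll */
	strncpy(out, r->blk.data, outlen - 1);
	out[outlen - 1] = '\0';
	r->blk.block_status = 0;	/* zero the status after consuming it */
	r->read_ack = 1;		/* hand the region back to firmware */
	return true;
}
```

The region starts with read_ack set, so firmware may publish once; a second publish is refused until linux_poll() has copied the record and written the acknowledgment, which is the "do not touch it while the OS is reading" property the proposal aims for.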
```diff
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e82d0976a5d0..56875ca76aa7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -48,6 +48,7 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/nmi.h>
+#include <linux/efi.h>
 
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
@@ -159,6 +160,7 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 	return (void __iomem *)vaddr;
 }
 
+#ifndef CONFIG_EFI
 static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 {
 	unsigned long vaddr;
@@ -169,6 +171,7 @@ static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 
 	return (void __iomem *)vaddr;
 }
+#endif
 
 static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
 {
@@ -180,6 +183,7 @@ static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
 	arch_apei_flush_tlb_one(vaddr);
 }
 
+#ifndef CONFIG_EFI
 static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
 {
 	unsigned long vaddr = (unsigned long __force)vaddr_ptr;
@@ -189,6 +193,7 @@ static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
 	unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
 	arch_apei_flush_tlb_one(vaddr);
 }
+#endif
 
 static int ghes_estatus_pool_init(void)
 {
@@ -309,7 +314,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 			vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
 		} else {
 			spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
+#ifdef CONFIG_EFI
+			vaddr = efi_remap(paddr & PAGE_MASK, PAGE_SIZE);
+#else
 			vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+#endif
 		}
 		trunk = PAGE_SIZE - offset;
 		trunk = min(trunk, len);
@@ -324,7 +333,11 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 			ghes_iounmap_nmi(vaddr);
 			raw_spin_unlock(&ghes_ioremap_lock_nmi);
 		} else {
+#ifdef CONFIG_EFI
+			iounmap(vaddr);
+#else
 			ghes_iounmap_irq(vaddr);
+#endif
 			spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
 		}
 	}
```
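Whichever mapping function is used, the loop in ghes_copy_tofrom_phys() maps and copies at most one page per iteration. That is why the patch's efi_remap() call maps PAGE_SIZE bytes at paddr & PAGE_MASK while ghes_ioremap_pfn_irq() takes the page frame number paddr >> PAGE_SHIFT; both name the same page. The user-space sketch below reproduces that chunking arithmetic. The trunk computation follows the hunk above; the offset formula, the 4 KiB page size, and the struct chunk / split_by_page() names are assumptions, since the hunk does not show them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define PAGE_SIZE (1ULL << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* One per-page step of a physical-memory copy. */
struct chunk {
	uint64_t page;		/* page-aligned physical address to map */
	uint64_t offset;	/* where the data starts within that page */
	uint32_t len;		/* bytes copied while this page is mapped */
};

/*
 * Split [paddr, paddr + len) into per-page steps, mirroring the loop in
 * ghes_copy_tofrom_phys(): map one page, copy up to the end of that page,
 * unmap, advance.  Stops early if out[] fills up; returns steps written.
 */
static size_t split_by_page(uint64_t paddr, uint32_t len,
			    struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (len > 0 && n < max) {
		uint64_t offset = paddr - (paddr & PAGE_MASK);
		uint64_t trunk = PAGE_SIZE - offset;

		if (trunk > len)
			trunk = len;
		/* Same page either way: (paddr >> PAGE_SHIFT) is its frame number. */
		out[n].page = paddr & PAGE_MASK;
		out[n].offset = offset;
		out[n].len = (uint32_t)trunk;
		n++;
		len -= (uint32_t)trunk;
		paddr += trunk;
	}
	return n;
}
```

For example, a 0x30-byte record at 0x1ff0 straddles a page boundary and is copied in two steps: 0x10 bytes from the page at 0x1000, then 0x20 bytes from the page at 0x2000. Each step takes and releases the mapping under the lock, which is why the mapping call must be safe in atomic context.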