[V2,3/3] acpi, apei: use EFI memmap to map GHES memory

Message ID 1433185940-24770-4-git-send-email-zjzhang@codeaurora.org (mailing list archive)
State Not Applicable, archived
Delegated to: Andy Gross

Commit Message

Jonathan (Zhixiong) Zhang June 1, 2015, 7:12 p.m. UTC
From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

With ACPI APEI firmware-first handling, the generic hardware error
record is updated by firmware in the GHES memory region. When firmware
updates the GHES memory region in DDR without going through the cache,
Linux reads stale data from the cache.

The GHES memory region should be mapped with cache attributes
taken from the EFI memory map when applicable. If firmware updates
DDR directly, the EFI memory map defines the GHES memory region as
uncached; if firmware updates the cache, the EFI memory map defines
the GHES memory region as cached.

When EFI is configured, map the IRQ page using efi_remap(), provided by
the EFI subsystem.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Matt Fleming June 5, 2015, 9:57 a.m. UTC | #1
[ Cc'ing Boris and Tony. Folks, the original patch is here,
  https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]

On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> 
> With ACPI APEI firmware first handling, generic hardware error
> record is updated by firmware in GHES memory region. When firmware
> updated GHES memory region in DDR without going through cache,
> Linux reads stale data from cache.
> 
> GHES memory region should be mapped with cache attributes
> according to EFI memory map when applicable. If firmware updates
> DDR directly, EFI memory map has GHES memory region defined as
> uncached; If firmware updates cache, EFI memory map has GHES
> memory region defined as cached.
> 
> When EFI is configured, map IRQ page using efi_remap() provided by
> EFI subsystem.

[...]

> @@ -159,6 +160,7 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  	return (void __iomem *)vaddr;
>  }
>  
> +#ifndef CONFIG_EFI
>  static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
>  {
>  	unsigned long vaddr;

Sprinkling CONFIG_EFI like this is wrong. On x86 we run kernels built
with CONFIG_EFI on machines with BIOS - you can't make the EFI vs.
non-EFI decision at compile-time.

So this patch looks like a potential regression to me since on x86
ghes_ioremap_pfn_irq() would not be used anymore and instead we'd be
using efi_remap() which will perform an ioremap_nocache() if it gets
called after efi_free_boot_services().

And based on the comments in the apei code, that's going to cause issues
because ioremap() does not work in atomic context, not to mention the
fact that we've gone from a cached mapping to an uncached one.

Instead, I suggest you modify ghes_ioremap_* to query the EFI memmap (if
it's available at runtime) to lookup the correct mapping attributes.

But I've Cc'd some more people who have actually worked on this code,
since I'm not one of them.
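
A minimal userspace sketch of the runtime approach suggested above, not kernel code. The struct names, the `available` flag and `ghes_pick_cache_mode()` are hypothetical stand-ins invented for illustration; only the attribute bit values come from the UEFI specification. It assumes the existing cached mapping stays the default whenever no EFI memory map is present at runtime (for example on a BIOS boot of a kernel built with CONFIG_EFI).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC 0x1ULL
#define EFI_MEMORY_WC 0x2ULL
#define EFI_MEMORY_WT 0x4ULL
#define EFI_MEMORY_WB 0x8ULL

#define EFI_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for one EFI memory descriptor. */
struct mock_efi_desc {
	uint64_t phys_start;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Hypothetical stand-in for the memory map as seen at runtime. */
struct mock_efi_memmap {
	bool available;	/* false on a BIOS boot, whatever the build config */
	const struct mock_efi_desc *descs;
	size_t count;
};

enum ghes_cache_mode { GHES_MAP_CACHED, GHES_MAP_UNCACHED };

/*
 * Decide at runtime how a GHES page should be mapped. Falls back to the
 * cached mapping when there is no memory map or no matching descriptor.
 */
static enum ghes_cache_mode
ghes_pick_cache_mode(const struct mock_efi_memmap *map, uint64_t paddr)
{
	size_t i;

	if (!map || !map->available)
		return GHES_MAP_CACHED;

	for (i = 0; i < map->count; i++) {
		const struct mock_efi_desc *d = &map->descs[i];
		uint64_t end = d->phys_start + d->num_pages * EFI_PAGE_SIZE;

		if (paddr < d->phys_start || paddr >= end)
			continue;
		/* Prefer write-back when the region supports it. */
		if (d->attribute & EFI_MEMORY_WB)
			return GHES_MAP_CACHED;
		if (d->attribute & (EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT))
			return GHES_MAP_UNCACHED;
		break;
	}
	return GHES_MAP_CACHED;
}
```

The point of the sketch is that the EFI vs. non-EFI decision is taken per call from runtime state, so one binary behaves correctly on both kinds of machine.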
Borislav Petkov June 5, 2015, 10:25 a.m. UTC | #2
On Fri, Jun 05, 2015 at 10:57:01AM +0100, Matt Fleming wrote:
> [ Cc'ing Boris and Tony. Folks, the original patch is here,
>   https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
> 
> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
> > From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> > 
> > With ACPI APEI firmware first handling, generic hardware error
> > record is updated by firmware in GHES memory region. When firmware
> > updated GHES memory region in DDR without going through cache,

What is DDR?

I think this needs to be clarified first before we go any further.

I picked up on the sidelines that this might be arm64-specific stuff. If
so, your approach is wrong: you're merging efi_* facilities from x86 and
ia64 into generic efi ones but then doing CONFIG_EFI ifdeffery in GHES.

What you should do instead is have arch-specific:

ghes_ioremap_pfn_irq()
ghes_iounmap_irq()
...

and whatever else functionality which is different on your arch and
which get called from the generic ghes.c driver.

In the arch-specific ones you can go wild with the ifdeffery and whatnot
is needed on that specific arch.

Something like that, at least.
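
One way to picture the split described above, as a userspace sketch only: the generic driver calls through a small hook table and each architecture supplies its own implementation. The `ghes_arch_ops` table, `ghes_map_prot()` and the demo address window are hypothetical names made up for this illustration; none of them exist in ghes.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ghes_prot { GHES_PROT_CACHED, GHES_PROT_UNCACHED };

/* Hypothetical per-architecture hook table used by the generic driver. */
struct ghes_arch_ops {
	enum ghes_prot (*mem_prot)(uint64_t paddr);
};

/* Default hook: keep the cached mapping the generic code used before. */
static enum ghes_prot default_mem_prot(uint64_t paddr)
{
	(void)paddr;
	return GHES_PROT_CACHED;
}

static const struct ghes_arch_ops ghes_default_ops = { default_mem_prot };

/*
 * Hypothetical architecture override. A real one would consult the
 * firmware memory map; this demo treats a fixed window as uncached.
 */
#define DEMO_FW_BASE 0x80000000ULL
#define DEMO_FW_SIZE 0x10000ULL

static enum ghes_prot demo_arch_mem_prot(uint64_t paddr)
{
	if (paddr >= DEMO_FW_BASE && paddr < DEMO_FW_BASE + DEMO_FW_SIZE)
		return GHES_PROT_UNCACHED;
	return GHES_PROT_CACHED;
}

static const struct ghes_arch_ops demo_arch_ops = { demo_arch_mem_prot };

/* Generic side: no ifdefs, only a call through the hook table. */
static enum ghes_prot ghes_map_prot(const struct ghes_arch_ops *ops,
				    uint64_t paddr)
{
	if (!ops || !ops->mem_prot)
		ops = &ghes_default_ops;
	return ops->mem_prot(paddr);
}
```

In the kernel the same separation would more likely be done with arch headers or weak functions than with a pointer table; the table is used here only to keep the example self-contained.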
Jonathan (Zhixiong) Zhang June 5, 2015, 4:43 p.m. UTC | #3
Thanks Matt for the review. Yes, you are right on, I am following
this:
 > modify ghes_ioremap_* to query the EFI memmap (if
 > it's available at runtime) to lookup the correct mapping attributes.

Jonathan

On 6/5/2015 2:57 AM, Matt Fleming wrote:
> [ Cc'ing Boris and Tony. Folks, the original patch is here,
>    https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
>
> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> With ACPI APEI firmware first handling, generic hardware error
>> record is updated by firmware in GHES memory region. When firmware
>> updated GHES memory region in DDR without going through cache,
>> Linux reads stale data from cache.
>>
>> GHES memory region should be mapped with cache attributes
>> according to EFI memory map when applicable. If firmware updates
>> DDR directly, EFI memory map has GHES memory region defined as
>> uncached; If firmware updates cache, EFI memory map has GHES
>> memory region defined as cached.
>>
>> When EFI is configured, map IRQ page using efi_remap() provided by
>> EFI subsystem.
>
> [...]
>
>> @@ -159,6 +160,7 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>   	return (void __iomem *)vaddr;
>>   }
>>
>> +#ifndef CONFIG_EFI
>>   static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
>>   {
>>   	unsigned long vaddr;
>
> Sprinkling CONFIG_EFI like this is wrong. On x86 we run kernels built
> with CONFIG_EFI on machines with BIOS - you can't make the EFI vs.
> non-EFI decision at compile-time.
>
> So this patch looks like a potential regression to me since on x86
> ghes_ioremap_pfn_irq() would not be used anymore and instead we'd be
> using efi_remap() which will perform an ioremap_nocache() if it gets
> called after efi_free_boot_services().
>
> And based on the comments in the apei code, that's going to cause issues
> because ioremap() does not work in atomic context, not to mention the
> fact that we've gone from a cached mapping to an uncached one.
>
> Instead, I suggest you modify ghes_ioremap_* to query the EFI memmap (if
> it's available at runtime) to lookup the correct mapping attributes.
>
> But I've Cc'd some more people who have actually worked on this code,
> since I'm not one of them.
>
Borislav Petkov June 5, 2015, 4:50 p.m. UTC | #4
On Fri, Jun 05, 2015 at 09:43:26AM -0700, Zhang, Jonathan Zhixiong wrote:
> Thanks Matt for the review. Yes, you are right on, I am following
> this:
> > modify ghes_ioremap_* to query the EFI memmap (if
> > it's available at runtime) to lookup the correct mapping attributes.

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

So please do not top-post.

Thanks.
Jonathan (Zhixiong) Zhang June 5, 2015, 5:05 p.m. UTC | #5
Thank you, Borislav, for the review. Please see comments inline.

On 6/5/2015 3:25 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 10:57:01AM +0100, Matt Fleming wrote:
>> [ Cc'ing Boris and Tony. Folks, the original patch is here,
>>    https://lkml.kernel.org/r/1433185940-24770-4-git-send-email-zjzhang@codeaurora.org ]
>>
>> On Mon, 01 Jun, at 12:12:20PM, Jonathan (Zhixiong) Zhang wrote:
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> With ACPI APEI firmware first handling, generic hardware error
>>> record is updated by firmware in GHES memory region. When firmware
>>> updated GHES memory region in DDR without going through cache,
>
> What is DDR?
>
> I think this needs to be clarified first before we go any further.
I thought the word "memory" might be confusing, because there are
memories on the system that are not accessible by Linux. In this
context, the APEI error data is accessed (read and written) by both Linux
and platform firmware; hence both sides should access the memory using
the same cache attribute. I wanted to emphasize that even though
DDR is normally cacheable, when the platform accesses it with the
uncached attribute, Linux should do the same.
I will try to make this clearer in the next version of the patch.
>
> I picked up on the sidelines that this might be arm64-specific stuff. If
> so, your approach is wrong: you're merging efi_* facilities from x86 and
> ia64 into generic efi ones but then doing CONFIG_EFI ifdeffery in GHES.
>
> What you should do instead is have arch-specific:
>
> ghes_ioremap_pfn_irq()
> ghes_iounmap_irq()
> ...
>
> and whatever else functionality which is different on your arch and
> which get called from the generic ghes.c driver.
>
> In the arch-specific ones you can go wild with the ifdeffery and whatnot
> is needed on that specific arch.
>
> Something like that, at least.
Makes total sense. I was trying to reduce binary size for non-EFI
systems, but as Matt pointed out in his feedback, on x86 even
BIOS-based systems have CONFIG_EFI enabled. I will submit a new version
accordingly.
Jonathan (Zhixiong) Zhang June 5, 2015, 5:06 p.m. UTC | #6
On 6/5/2015 9:50 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 09:43:26AM -0700, Zhang, Jonathan Zhixiong wrote:
>> Thanks Matt for the review. Yes, you are right on, I am following
>> this:
>>> modify ghes_ioremap_* to query the EFI memmap (if
>>> it's available at runtime) to lookup the correct mapping attributes.
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
> So please do not top-post.
>
> Thanks.
>
Will do. Thanks for the advice, Borislav.
Borislav Petkov June 5, 2015, 5:12 p.m. UTC | #7
On Fri, Jun 05, 2015 at 10:05:13AM -0700, Zhang, Jonathan Zhixiong wrote:
> >What is DDR?
> >
> >I think this needs to be clarified first before we go any further.
> I thought the word "memory" might be confusing, because there are

So you mean normal RAM here?

> memories on the system that are not accessible by Linux. In this
> context, the APEI error data is accessed (read and written) by both Linux
> and platform firmware; hence both sides should access the memory using
> the same cache attribute. I wanted to emphasize that even though
> DDR is normally cacheable, when the platform accesses it with the
> uncached attribute, Linux should do the same.

Makes sense.

Btw, do we need synchronization between firmware and Linux then? Does
Linux need to know when it is ok to touch that memory?
Jonathan (Zhixiong) Zhang June 5, 2015, 9:43 p.m. UTC | #8
On 6/5/2015 10:12 AM, Borislav Petkov wrote:
> On Fri, Jun 05, 2015 at 10:05:13AM -0700, Zhang, Jonathan Zhixiong wrote:
>>> What is DDR?
>>>
>>> I think this needs to be clarified first before we go any further.
>> I thought the word "memory" might be confusing, because there are
>
> So you mean normal RAM here?
Yes, exactly. I should use the word RAM instead.
>
>> memories on the system that are not accessible by Linux. In this
>> context, the APEI error data is accessed (read and written) by both Linux
>> and platform firmware; hence both sides should access the memory using
>> the same cache attribute. I wanted to emphasize that even though
>> DDR is normally cacheable, when the platform accesses it with the
>> uncached attribute, Linux should do the same.
>
> Makes sense.
>
> Btw, do we need synchronization between firmware and Linux then? Does
> Linux need to know when it is ok to touch that memory?
Good question. Linux zeroes out the error status code in the error data
after the data is consumed. This is good, but it alone does not solve
the synchronization concern.

For an error source with an interrupt notification type (SCI or NMI),
this may not be an issue, since both sides can operate under the rule
that the error data is only overwritten but never appended. But what
about the poll notification type? In this case, the platform gathers
errors and updates the memory region as needed; Linux checks the same
memory region periodically.

An ACPI APEI proposal intended to solve this concern has been discussed
in the UEFI forum. The idea is to have the OS send the platform a signal
(by updating a designated register) after the error data is consumed.
The platform then does not touch the memory region while the OS is
still accessing it.

After this proposal is approved and published, I will submit a patch
to implement it.
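
The handshake idea described above can be sketched as a small single-threaded simulation. Everything here is an assumption made for illustration: the field names, the `ack` flag standing in for the "designated register", and both helper functions are hypothetical, and the proposal's actual register layout is not described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one polled error source shared with firmware. */
struct mock_error_source {
	uint32_t block_status;	/* non-zero while a record awaits the OS */
	uint32_t ack;		/* 1 once the OS has released the buffer */
	uint32_t payload;	/* stands in for the error record itself */
};

/* Firmware side: refuse to overwrite a record the OS has not consumed. */
static bool fw_publish(struct mock_error_source *src, uint32_t payload)
{
	if (!src->ack)
		return false;
	src->ack = 0;
	src->payload = payload;
	src->block_status = 1;
	return true;
}

/* OS side: consume a pending record, clear its status, then signal. */
static bool os_poll(struct mock_error_source *src, uint32_t *out)
{
	if (!src->block_status)
		return false;
	*out = src->payload;
	src->block_status = 0;	/* zero the status after consuming */
	src->ack = 1;		/* tell the platform the region is free */
	return true;
}
```

A real implementation would also need the memory barriers and mapping attributes discussed earlier in the thread; the sketch only shows the ownership rule, namely that firmware writes only after the OS has signalled.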

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e82d0976a5d0..56875ca76aa7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -48,6 +48,7 @@ 
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/nmi.h>
+#include <linux/efi.h>
 
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
@@ -159,6 +160,7 @@  static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 	return (void __iomem *)vaddr;
 }
 
+#ifndef CONFIG_EFI
 static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 {
 	unsigned long vaddr;
@@ -169,6 +171,7 @@  static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 
 	return (void __iomem *)vaddr;
 }
+#endif
 
 static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
 {
@@ -180,6 +183,7 @@  static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
 	arch_apei_flush_tlb_one(vaddr);
 }
 
+#ifndef CONFIG_EFI
 static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
 {
 	unsigned long vaddr = (unsigned long __force)vaddr_ptr;
@@ -189,6 +193,7 @@  static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
 	unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
 	arch_apei_flush_tlb_one(vaddr);
 }
+#endif
 
 static int ghes_estatus_pool_init(void)
 {
@@ -309,7 +314,11 @@  static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 			vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
 		} else {
 			spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
+#ifdef CONFIG_EFI
+			vaddr = efi_remap(paddr & PAGE_MASK, PAGE_SIZE);
+#else
 			vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
+#endif
 		}
 		trunk = PAGE_SIZE - offset;
 		trunk = min(trunk, len);
@@ -324,7 +333,11 @@  static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
 			ghes_iounmap_nmi(vaddr);
 			raw_spin_unlock(&ghes_ioremap_lock_nmi);
 		} else {
+#ifdef CONFIG_EFI
+			iounmap(vaddr);
+#else
 			ghes_iounmap_irq(vaddr);
+#endif
 			spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
 		}
 	}