[3/3] x86: add support for the non-standard protected e820 type

Message ID 1427299449-26722-4-git-send-email-hch@lst.de (mailing list archive)
State New, archived

Commit Message

Christoph Hellwig March 25, 2015, 4:04 p.m. UTC
Various recent bioses support NVDIMMs or ADR using a non-standard
e820 memory type, and Intel supplied reference Linux code using this
type to various vendors.

Wire this e820 table type up to export platform devices for the pmem
driver so that we can use it in Linux, and also provide a memmap=
argument to manually tag memory as protected, which can be used
if the bios doesn't use the standard nonstandard interface, or
we just want to test the pmem driver with regular memory.

Based on an earlier patch from Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/kernel-parameters.txt |  6 ++++
 arch/x86/Kconfig                    | 13 +++++++
 arch/x86/include/asm/setup.h        |  6 ++++
 arch/x86/include/uapi/asm/e820.h    | 10 ++++++
 arch/x86/kernel/Makefile            |  1 +
 arch/x86/kernel/e820.c              | 22 +++++++++++-
 arch/x86/kernel/pmem.c              | 70 +++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup.c             |  2 ++
 8 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/pmem.c

Comments

Elliott, Robert (Server Storage) March 25, 2015, 7:47 p.m. UTC | #1
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Wednesday, March 25, 2015 11:04 AM
> Subject: [PATCH 3/3] x86: add support for the non-standard protected e820
> type
> 
> Various recent bioses support NVDIMMs or ADR using a non-standard
> e820 memory type, and Intel supplied reference Linux code using this
> type to various vendors.

If this goes into the kernel, I think someone should request that the
ACPI specification mark the value 12 as permanently tainted.  Otherwise
they could assign it to some new meaning that conflicts with all
of this.

A few editorial nits follow...

> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-
> parameters.txt
> index bfcb1a6..98eeaca 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be
> entirely omitted.
>  			         or
>  			         memmap=0x10000$0x18690000
> 
> +	memmap=nn[KMG]!ss[KMG]
> +			[KNL,X86] Mark specific memory as protected.
> +			Region of memory to be used, from ss to ss+nn.
> +			The memory region may be marked as type 12 and
> +			is NVDIMM or ADR memory.

It can be confusing that E820h type values differ from UEFI
memory map type values, so it might be worth emphasizing that this is
an E820h type value.

Showing hex alongside would also clarify that it is indeed a 
decimal 12.

Suggestion: "e820 type 12 (0xc)"

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b7d31ca..93a27e4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1430,6 +1430,19 @@ config ILLEGAL_POINTER_VALUE
> 
>  source "mm/Kconfig"
> 
> +config X86_PMEM_LEGACY
> +	bool "Support non-stanard NVDIMMs and ADR protected memory"

stanard s/b standard


> +	help
> +	  Treat memory marked using the non-stard e820 type of 12 as used

stard s/b standard

> +	  by the Intel Sandy Bridge-EP reference BIOS as protected memory.
> +	  The kernel will the offer these regions to the pmem driver so

the s/b then

> +	  they can be used for persistent storage.
> +
> +	  If you say N the kernel will treat the ADR region like an e820
> +	  reserved region.
> +
> +	  Say Y if unsure
> +

...
> diff --git a/arch/x86/include/uapi/asm/e820.h
> b/arch/x86/include/uapi/asm/e820.h
> index d993e33..e040950 100644
> --- a/arch/x86/include/uapi/asm/e820.h
> +++ b/arch/x86/include/uapi/asm/e820.h
> @@ -33,6 +33,16 @@
>  #define E820_NVS	4
>  #define E820_UNUSABLE	5
> 
> +/*
> + * This is a non-standardized way to represent ADR or NVDIMM regions that
> + * persist over a reboot.  The kernel will ignore their special
> capabilities
> + * unless the CONFIG_X86_PMEM_LEGACY option is set.
> + *
> + * Note that older platforms also used 6 for the same type of memory,
> + * but newer versions switched to 12 as 6 was assigned differently.  Some
> + * time they will learn..
> + */
> +#define E820_PROTECTED_KERN	12

The p in pmem means persistent, not protected.  To me, protected sounds 
like a security feature.  I suggest using a different macro name and 
text strings.

...
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index de088e3..8c6a976 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -48,10 +48,22 @@ unsigned long pci_mem_start = 0xaeedbabe;
>  EXPORT_SYMBOL(pci_mem_start);
>  #endif
> 
> +/*
> + * Memory protected by the system ADR (asynchronous dram refresh)
> + * mechanism is accounted as ram for purposes of establishing max_pfn
> + * and mem_map.
> + */

ADR doesn't really protect or ensure persistence; it just puts memory 
into self-refresh mode. Batteries/capacitors and other logic are what
provide the persistence.

...
> @@ -154,6 +166,9 @@ static void __init e820_print_type(u32 type)
>  	case E820_UNUSABLE:
>  		printk(KERN_CONT "unusable");
>  		break;
> +	case E820_PROTECTED_KERN:
> +		printk(KERN_CONT "protected (type %u)\n", type);
> +		break;

Same "protect" comment applies there, and a few other places in
the patch not excerpted.


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ross Zwisler March 25, 2015, 8:23 p.m. UTC | #2
On Wed, 2015-03-25 at 17:04 +0100, Christoph Hellwig wrote:
> Various recent bioses support NVDIMMs or ADR using a non-standard
> e820 memory type, and Intel supplied reference Linux code using this
> type to various vendors.
> 
> Wire this e820 table type up to export platform devices for the pmem
> driver so that we can use it in Linux, and also provide a memmap=
> argument to manually tag memory as protected, which can be used
> if the bios doesn't use the standard nonstandard interface, or
> we just want to test the pmem driver with regular memory.
> 
> Based on an earlier patch from Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

<snip>

> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b7d31ca..93a27e4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1430,6 +1430,19 @@ config ILLEGAL_POINTER_VALUE
>  
>  source "mm/Kconfig"
>  
> +config X86_PMEM_LEGACY
> +	bool "Support non-stanard NVDIMMs and ADR protected memory"
> +	help
> +	  Treat memory marked using the non-stard e820 type of 12 as used
> +	  by the Intel Sandy Bridge-EP reference BIOS as protected memory.
> +	  The kernel will the offer these regions to the pmem driver so
> +	  they can be used for persistent storage.
> +
> +	  If you say N the kernel will treat the ADR region like an e820
> +	  reserved region.
> +
> +	  Say Y if unsure

Would it make sense to have this default to "y", or is that too strong?


Ross Zwisler March 25, 2015, 8:25 p.m. UTC | #3
On Wed, 2015-03-25 at 17:04 +0100, Christoph Hellwig wrote:
> Various recent bioses support NVDIMMs or ADR using a non-standard
> e820 memory type, and Intel supplied reference Linux code using this
> type to various vendors.
> 
> Wire this e820 table type up to export platform devices for the pmem
> driver so that we can use it in Linux, and also provide a memmap=
> argument to manually tag memory as protected, which can be used
> if the bios doesn't use the standard nonstandard interface, or
> we just want to test the pmem driver with regular memory.

<snip>

> @@ -154,6 +166,9 @@ static void __init e820_print_type(u32 type)
>  	case E820_UNUSABLE:
>  		printk(KERN_CONT "unusable");
>  		break;
> +	case E820_PROTECTED_KERN:
> +		printk(KERN_CONT "protected (type %u)\n", type);

I don't think we want a newline in this string.


Dan Williams March 25, 2015, 8:29 p.m. UTC | #4
On Wed, Mar 25, 2015 at 1:23 PM, Ross Zwisler
<ross.zwisler@linux.intel.com> wrote:
> On Wed, 2015-03-25 at 17:04 +0100, Christoph Hellwig wrote:
>> Various recent bioses support NVDIMMs or ADR using a non-standard
>> e820 memory type, and Intel supplied reference Linux code using this
>> type to various vendors.
>>
>> Wire this e820 table type up to export platform devices for the pmem
>> driver so that we can use it in Linux, and also provide a memmap=
>> argument to manually tag memory as protected, which can be used
>> if the bios doesn't use the standard nonstandard interface, or
>> we just want to test the pmem driver with regular memory.
>>
>> Based on an earlier patch from Dave Jiang <dave.jiang@intel.com>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> <snip>
>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index b7d31ca..93a27e4 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1430,6 +1430,19 @@ config ILLEGAL_POINTER_VALUE
>>
>>  source "mm/Kconfig"
>>
>> +config X86_PMEM_LEGACY
>> +     bool "Support non-stanard NVDIMMs and ADR protected memory"
>> +     help
>> +       Treat memory marked using the non-stard e820 type of 12 as used
>> +       by the Intel Sandy Bridge-EP reference BIOS as protected memory.
>> +       The kernel will the offer these regions to the pmem driver so
>> +       they can be used for persistent storage.
>> +
>> +       If you say N the kernel will treat the ADR region like an e820
>> +       reserved region.
>> +
>> +       Say Y if unsure
>
> Would it make sense to have this default to "y", or is that too strong?

We never default new enabling to y.  Maybe some exceptions, but this
isn't one of them in my mind.
Dan Williams March 25, 2015, 8:35 p.m. UTC | #5
On Wed, Mar 25, 2015 at 9:04 AM, Christoph Hellwig <hch@lst.de> wrote:
> Various recent bioses support NVDIMMs or ADR using a non-standard
> e820 memory type, and Intel supplied reference Linux code using this
> type to various vendors.
>
> Wire this e820 table type up to export platform devices for the pmem
> driver so that we can use it in Linux, and also provide a memmap=
> argument to manually tag memory as protected, which can be used
> if the bios doesn't use the standard nonstandard interface, or
> we just want to test the pmem driver with regular memory.
>
> Based on an earlier patch from Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
[..]
> +static __init int register_pmem_devices(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < e820.nr_map; i++) {
> +               struct e820entry *ei = &e820.map[i];
> +
> +               if (ei->type == E820_PROTECTED_KERN) {
> +                       struct resource res = {
> +                               .flags  = IORESOURCE_MEM,
> +                               .start  = ei->addr,
> +                               .end    = ei->addr + ei->size - 1,
> +                       };
> +                       register_pmem_device(&res);
> +               }
> +       }
> +
> +       return 0;
> +}

Aside from the s/E820_PROTECTED_KERN/E820_PMEM/ suggestion this looks
ok to me.  The "vaporware" new way can be a superset of this
mechanism.
Christoph Hellwig March 26, 2015, 8:02 a.m. UTC | #6
On Wed, Mar 25, 2015 at 07:47:26PM +0000, Elliott, Robert (Server Storage) wrote:
> If this goes into the kernel, I think someone should request that the
> ACPI specification mark the value 12 as permanently tainted.  Otherwise
> they could assign it to some new meaning that conflicts with all
> of this.

I think reusing it now would create huge problems, but I have no idea
how to even talk to the people writing the ACPI spec.

> It can be confusing that E820h type values differ from UEFI
> memory map type values, so it might be worth emphasizing that this is
> an E820h type value.
> 
> Showing hex alongside would also clarify that it is indeed a 
> decimal 12.
> 
> Suggestion: "e820 type 12 (0xc)"

I've fixed this as well as the various typos.
Christoph Hellwig March 26, 2015, 8:03 a.m. UTC | #7
On Wed, Mar 25, 2015 at 02:25:33PM -0600, Ross Zwisler wrote:
> > +	case E820_PROTECTED_KERN:
> > +		printk(KERN_CONT "protected (type %u)\n", type);
> 
> I don't think we want a newline in this string.

Fixed.
Patch

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a6..98eeaca 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1965,6 +1965,12 @@  bytes respectively. Such letter suffixes can also be entirely omitted.
 			         or
 			         memmap=0x10000$0x18690000
 
+	memmap=nn[KMG]!ss[KMG]
+			[KNL,X86] Mark specific memory as protected.
+			Region of memory to be used, from ss to ss+nn.
+			The memory region may be marked as type 12 and
+			is NVDIMM or ADR memory.
+
 	memory_corruption_check=0/1 [X86]
 			Some BIOSes seem to corrupt the first 64k of
 			memory when doing things like suspend/resume.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..93a27e4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1430,6 +1430,19 @@  config ILLEGAL_POINTER_VALUE
 
 source "mm/Kconfig"
 
+config X86_PMEM_LEGACY
+	bool "Support non-stanard NVDIMMs and ADR protected memory"
+	help
+	  Treat memory marked using the non-stard e820 type of 12 as used
+	  by the Intel Sandy Bridge-EP reference BIOS as protected memory.
+	  The kernel will the offer these regions to the pmem driver so
+	  they can be used for persistent storage.
+
+	  If you say N the kernel will treat the ADR region like an e820
+	  reserved region.
+
+	  Say Y if unsure
+
 config HIGHPTE
 	bool "Allocate 3rd-level pagetables from highmem"
 	depends on HIGHMEM
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ff4e7b2..2352fde 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -57,6 +57,12 @@  extern void x86_ce4100_early_setup(void);
 static inline void x86_ce4100_early_setup(void) { }
 #endif
 
+#ifdef CONFIG_X86_PMEM_LEGACY
+void reserve_pmem(void);
+#else
+static inline void reserve_pmem(void) { }
+#endif
+
 #ifndef _SETUP
 
 #include <asm/espfix.h>
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index d993e33..e040950 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -33,6 +33,16 @@ 
 #define E820_NVS	4
 #define E820_UNUSABLE	5
 
+/*
+ * This is a non-standardized way to represent ADR or NVDIMM regions that
+ * persist over a reboot.  The kernel will ignore their special capabilities
+ * unless the CONFIG_X86_PMEM_LEGACY option is set.
+ *
+ * Note that older platforms also used 6 for the same type of memory,
+ * but newer versions switched to 12 as 6 was assigned differently.  Some
+ * time they will learn..
+ */
+#define E820_PROTECTED_KERN	12
 
 /*
  * reserved RAM used by kernel itself
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cdb1b70..971f18c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -94,6 +94,7 @@  obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)	+= pvclock.o
+obj-$(CONFIG_X86_PMEM_LEGACY)	+= pmem.o
 
 obj-$(CONFIG_PCSPKR_PLATFORM)	+= pcspeaker.o
 
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index de088e3..8c6a976 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -48,10 +48,22 @@  unsigned long pci_mem_start = 0xaeedbabe;
 EXPORT_SYMBOL(pci_mem_start);
 #endif
 
+/*
+ * Memory protected by the system ADR (asynchronous dram refresh)
+ * mechanism is accounted as ram for purposes of establishing max_pfn
+ * and mem_map.
+ */
+#ifdef CONFIG_X86_PMEM_LEGACY
+static inline bool is_e820_ram(__u32 type)
+{
+	return type == E820_RAM || type == E820_PROTECTED_KERN;
+}
+#else
 static inline bool is_e820_ram(__u32 type)
 {
 	return type == E820_RAM;
 }
+#endif
 
 /*
  * This function checks if any part of the range <start,end> is mapped
@@ -154,6 +166,9 @@  static void __init e820_print_type(u32 type)
 	case E820_UNUSABLE:
 		printk(KERN_CONT "unusable");
 		break;
+	case E820_PROTECTED_KERN:
+		printk(KERN_CONT "protected (type %u)\n", type);
+		break;
 	default:
 		printk(KERN_CONT "type %u", type);
 		break;
@@ -871,6 +886,9 @@  static int __init parse_memmap_one(char *p)
 	} else if (*p == '$') {
 		start_at = memparse(p+1, &p);
 		e820_add_region(start_at, mem_size, E820_RESERVED);
+	} else if (*p == '!') {
+		start_at = memparse(p+1, &p);
+		e820_add_region(start_at, mem_size, E820_PROTECTED_KERN);
 	} else
 		e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
 
@@ -912,6 +930,7 @@  static inline const char *e820_type_to_string(int e820_type)
 	case E820_ACPI:	return "ACPI Tables";
 	case E820_NVS:	return "ACPI Non-volatile Storage";
 	case E820_UNUSABLE:	return "Unusable memory";
+	case E820_PROTECTED_KERN: return "Protected RAM";
 	default:	return "reserved";
 	}
 }
@@ -946,7 +965,8 @@  void __init e820_reserve_resources(void)
 		 * pcibios_resource_survey()
 		 */
 		if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
-			res->flags |= IORESOURCE_BUSY;
+			if (e820.map[i].type != E820_PROTECTED_KERN)
+				res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
 		res++;
diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c
new file mode 100644
index 0000000..902e2f9
--- /dev/null
+++ b/arch/x86/kernel/pmem.c
@@ -0,0 +1,70 @@ 
+/*
+ * Copyright (c) 2009, Intel Corporation.
+ * Copyright (c) 2015, Christoph Hellwig.
+ */
+#include <linux/memblock.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <asm/e820.h>
+#include <asm/page_types.h>
+#include <asm/setup.h>
+
+void __init reserve_pmem(void)
+{
+	int i;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		if (ei->type != E820_PROTECTED_KERN)
+			continue;
+
+		memblock_reserve(ei->addr, ei->addr + ei->size);
+		max_pfn_mapped = init_memory_mapping(
+				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+				ei->addr + ei->size);
+	}
+}
+		
+static __init void register_pmem_device(struct resource *res)
+{
+	struct platform_device *pdev;
+	int error;
+
+	pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO);
+	if (!pdev)
+		return;
+
+	error = platform_device_add_resources(pdev, res, 1);
+	if (error)
+		goto out_put_pdev;
+
+	error = platform_device_add(pdev);
+	if (error)
+		goto out_put_pdev;
+	return;
+out_put_pdev:
+	dev_warn(&pdev->dev, "failed to add pmem device!\n");
+	platform_device_put(pdev);
+}
+
+static __init int register_pmem_devices(void)
+{
+	int i;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+
+		if (ei->type == E820_PROTECTED_KERN) {
+			struct resource res = {
+				.flags	= IORESOURCE_MEM,
+				.start	= ei->addr,
+				.end	= ei->addr + ei->size - 1,
+			};
+			register_pmem_device(&res);
+		}
+	}
+
+	return 0;
+}
+device_initcall(register_pmem_devices);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0a2421c..f2bed2b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1158,6 +1158,8 @@  void __init setup_arch(char **cmdline_p)
 
 	early_acpi_boot_init();
 
+	reserve_pmem();
+
 	initmem_init();
 	dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT);