diff mbox series

[RFC,1/5] efi: Detect UEFI 2.8 Special Purpose Memory

Message ID 155440491334.3190322.44013027330479237.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show
Series EFI Special Purpose Memory Support | expand

Commit Message

Dan Williams April 4, 2019, 7:08 p.m. UTC
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a special
purpose".

The proposed Linux behavior for special purpose memory is that it is
reserved for direct-access (device-dax) by default and not available for
any kernel usage, not even as an OOM fallback. Later, through udev
scripts or another init mechanism, these device-dax claimed ranges can
be reconfigured and hot-added to the available System-RAM with a unique
node identifier.

A follow-on patch integrates parsing of the ACPI HMAT to identify the
node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
now, arrange for EFI_MEMORY_SP memory to be reserved.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/Kconfig                  |   18 ++++++++++++++++++
 arch/x86/boot/compressed/eboot.c  |    5 ++++-
 arch/x86/boot/compressed/kaslr.c  |    2 +-
 arch/x86/include/asm/e820/types.h |    9 +++++++++
 arch/x86/kernel/e820.c            |    9 +++++++--
 arch/x86/platform/efi/efi.c       |   10 +++++++++-
 include/linux/efi.h               |   14 ++++++++++++++
 include/linux/ioport.h            |    1 +
 8 files changed, 63 insertions(+), 5 deletions(-)

Comments

Ard Biesheuvel April 6, 2019, 4:21 a.m. UTC | #1
Hi Dan,

On Thu, 4 Apr 2019 at 21:21, Dan Williams <dan.j.williams@intel.com> wrote:
>
> UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> interpretation of the EFI Memory Types as "reserved for a special
> purpose".
>
> The proposed Linux behavior for special purpose memory is that it is
> reserved for direct-access (device-dax) by default and not available for
> any kernel usage, not even as an OOM fallback. Later, through udev
> scripts or another init mechanism, these device-dax claimed ranges can
> be reconfigured and hot-added to the available System-RAM with a unique
> node identifier.
>
> A follow-on patch integrates parsing of the ACPI HMAT to identify the
> node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> now, arrange for EFI_MEMORY_SP memory to be reserved.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andy Shevchenko <andy@infradead.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/Kconfig                  |   18 ++++++++++++++++++
>  arch/x86/boot/compressed/eboot.c  |    5 ++++-
>  arch/x86/boot/compressed/kaslr.c  |    2 +-
>  arch/x86/include/asm/e820/types.h |    9 +++++++++
>  arch/x86/kernel/e820.c            |    9 +++++++--
>  arch/x86/platform/efi/efi.c       |   10 +++++++++-
>  include/linux/efi.h               |   14 ++++++++++++++
>  include/linux/ioport.h            |    1 +
>  8 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c1f9b3cf437c..cb9ca27de7a5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1961,6 +1961,24 @@ config EFI_MIXED
>
>            If unsure, say N.
>
> +config EFI_SPECIAL_MEMORY
> +       bool "EFI Special Purpose Memory Support"
> +       depends on EFI
> +       ---help---
> +         On systems that have mixed performance classes of memory EFI
> +         may indicate special purpose memory with an attribute (See
> +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> +         attribute may have unique performance characteristics compared
> +         to the system's general purpose "System RAM" pool. On the
> +         expectation that such memory has application specific usage
> +         answer Y to arrange for the kernel to reserve it for
> +         direct-access (device-dax) by default. The memory range can
> +         later be optionally assigned to the page allocator by system
> +         administrator policy. Say N to have the kernel treat this
> +         memory as general purpose by default.
> +
> +         If unsure, say Y.
> +

EFI_MEMORY_SP is now part of the UEFI spec proper, so it does not make
sense to make any understanding of it Kconfigurable.

Instead, what I would prefer is to implement support for EFI_MEMORY_SP
unconditionally (including the ability to identify it in the debug
dump of the memory map etc), in a way that all architectures can use
it. Then, I think we should never treat it as ordinary memory and make
it the firmware's problem not to use the EFI_MEMORY_SP attribute in
cases where it results in undesired behavior in the OS.

Also, sInce there is a generic component and a x86 component, can you
please split those up?

You only cc'ed me on patch #1 this time, but could you please cc me on
the entire series for v2? Thanks.


>  config SECCOMP
>         def_bool y
>         prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> index 544ac4fafd11..9b90fae21abe 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -560,7 +560,10 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
> -                       e820_type = E820_TYPE_RAM;
> +                       if (is_efi_special(d))
> +                               e820_type = E820_TYPE_SPECIAL;
> +                       else
> +                               e820_type = E820_TYPE_RAM;
>                         break;
>
>                 case EFI_ACPI_MEMORY_NVS:
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 2e53c056ba20..897e46eb9714 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -757,7 +757,7 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
>                  *
>                  * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
>                  */
> -               if (md->type != EFI_CONVENTIONAL_MEMORY)
> +               if (md->type != EFI_CONVENTIONAL_MEMORY || is_efi_special(md))
>                         continue;
>
>                 if (efi_mirror_found &&
> diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> index c3aa4b5e49e2..0ab8abae2e8b 100644
> --- a/arch/x86/include/asm/e820/types.h
> +++ b/arch/x86/include/asm/e820/types.h
> @@ -28,6 +28,15 @@ enum e820_type {
>          */
>         E820_TYPE_PRAM          = 12,
>
> +       /*
> +        * Special-purpose / application-specific memory is indicated to
> +        * the system via the EFI_MEMORY_SP attribute. Define an e820
> +        * translation of this memory type for the purpose of
> +        * reserving this range and marking it with the
> +        * IORES_DESC_APPLICATION_RESERVED designation.
> +        */
> +       E820_TYPE_SPECIAL       = 0xefffffff,
> +
>         /*
>          * Reserved RAM used by the kernel itself if
>          * CONFIG_INTEL_TXT=y is enabled, memory of this type
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 2879e234e193..9f50dd0bbb04 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -176,6 +176,7 @@ static void __init e820_print_type(enum e820_type type)
>         switch (type) {
>         case E820_TYPE_RAM:             /* Fall through: */
>         case E820_TYPE_RESERVED_KERN:   pr_cont("usable");                      break;
> +       case E820_TYPE_SPECIAL:         /* Fall through: */
>         case E820_TYPE_RESERVED:        pr_cont("reserved");                    break;
>         case E820_TYPE_ACPI:            pr_cont("ACPI data");                   break;
>         case E820_TYPE_NVS:             pr_cont("ACPI NVS");                    break;
> @@ -1023,6 +1024,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
>         case E820_TYPE_UNUSABLE:        return "Unusable memory";
>         case E820_TYPE_PRAM:            return "Persistent Memory (legacy)";
>         case E820_TYPE_PMEM:            return "Persistent Memory";
> +       case E820_TYPE_SPECIAL:         /* Fall-through: */
>         case E820_TYPE_RESERVED:        return "Reserved";
>         default:                        return "Unknown E820 type";
>         }
> @@ -1038,6 +1040,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
>         case E820_TYPE_UNUSABLE:        /* Fall-through: */
>         case E820_TYPE_PRAM:            /* Fall-through: */
>         case E820_TYPE_PMEM:            /* Fall-through: */
> +       case E820_TYPE_SPECIAL:         /* Fall-through: */
>         case E820_TYPE_RESERVED:        /* Fall-through: */
>         default:                        return IORESOURCE_MEM;
>         }
> @@ -1050,6 +1053,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
>         case E820_TYPE_NVS:             return IORES_DESC_ACPI_NV_STORAGE;
>         case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
>         case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> +       case E820_TYPE_SPECIAL:         return IORES_DESC_APPLICATION_RESERVED;
>         case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>         case E820_TYPE_RAM:             /* Fall-through: */
>         case E820_TYPE_UNUSABLE:        /* Fall-through: */
> @@ -1065,13 +1069,14 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
>                 return true;
>
>         /*
> -        * Treat persistent memory like device memory, i.e. reserve it
> -        * for exclusive use of a driver
> +        * Treat persistent memory and other special memory ranges like
> +        * device memory, i.e. reserve it for exclusive use of a driver
>          */
>         switch (type) {
>         case E820_TYPE_RESERVED:
>         case E820_TYPE_PRAM:
>         case E820_TYPE_PMEM:
> +       case E820_TYPE_SPECIAL:
>                 return false;
>         case E820_TYPE_RESERVED_KERN:
>         case E820_TYPE_RAM:
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e1cb01a22fa8..d227751f331b 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -139,7 +139,9 @@ static void __init do_add_efi_memmap(void)
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
> -                       if (md->attribute & EFI_MEMORY_WB)
> +                       if (is_efi_special(md))
> +                               e820_type = E820_TYPE_SPECIAL;
> +                       else if (md->attribute & EFI_MEMORY_WB)
>                                 e820_type = E820_TYPE_RAM;
>                         else
>                                 e820_type = E820_TYPE_RESERVED;
> @@ -753,6 +755,12 @@ static bool should_map_region(efi_memory_desc_t *md)
>         if (IS_ENABLED(CONFIG_X86_32))
>                 return false;
>
> +       /*
> +        * Special purpose memory is not mapped by default.
> +        */
> +       if (is_efi_special(md))
> +               return false;
> +
>         /*
>          * Map all of RAM so that we can access arguments in the 1:1
>          * mapping when making EFI runtime calls.
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 54357a258b35..cecbc2bda1da 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -112,6 +112,7 @@ typedef     struct {
>  #define EFI_MEMORY_MORE_RELIABLE \
>                                 ((u64)0x0000000000010000ULL)    /* higher reliability */
>  #define EFI_MEMORY_RO          ((u64)0x0000000000020000ULL)    /* read-only */
> +#define EFI_MEMORY_SP          ((u64)0x0000000000040000ULL)    /* special purpose */
>  #define EFI_MEMORY_RUNTIME     ((u64)0x8000000000000000ULL)    /* range requires runtime mapping */
>  #define EFI_MEMORY_DESCRIPTOR_VERSION  1
>
> @@ -128,6 +129,19 @@ typedef struct {
>         u64 attribute;
>  } efi_memory_desc_t;
>
> +#ifdef CONFIG_EFI_SPECIAL_MEMORY
> +static inline bool is_efi_special(efi_memory_desc_t *md)
> +{
> +       return md->type == EFI_CONVENTIONAL_MEMORY
> +               && (md->attribute & EFI_MEMORY_SP);
> +}
> +#else
> +static inline bool is_efi_special(efi_memory_desc_t *md)
> +{
> +       return false;
> +}
> +#endif
> +
>  typedef struct {
>         efi_guid_t guid;
>         u32 headersize;
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..2d79841ee9b9 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -133,6 +133,7 @@ enum {
>         IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
>         IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
>         IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
> +       IORES_DESC_APPLICATION_RESERVED         = 8,
>  };
>
>  /* helpers to define resources */
>
Dan Williams April 9, 2019, 4:43 p.m. UTC | #2
On Fri, Apr 5, 2019 at 9:21 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> Hi Dan,
>
> On Thu, 4 Apr 2019 at 21:21, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > interpretation of the EFI Memory Types as "reserved for a special
> > purpose".
> >
> > The proposed Linux behavior for special purpose memory is that it is
> > reserved for direct-access (device-dax) by default and not available for
> > any kernel usage, not even as an OOM fallback. Later, through udev
> > scripts or another init mechanism, these device-dax claimed ranges can
> > be reconfigured and hot-added to the available System-RAM with a unique
> > node identifier.
> >
> > A follow-on patch integrates parsing of the ACPI HMAT to identify the
> > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > now, arrange for EFI_MEMORY_SP memory to be reserved.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Darren Hart <dvhart@infradead.org>
> > Cc: Andy Shevchenko <andy@infradead.org>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  arch/x86/Kconfig                  |   18 ++++++++++++++++++
> >  arch/x86/boot/compressed/eboot.c  |    5 ++++-
> >  arch/x86/boot/compressed/kaslr.c  |    2 +-
> >  arch/x86/include/asm/e820/types.h |    9 +++++++++
> >  arch/x86/kernel/e820.c            |    9 +++++++--
> >  arch/x86/platform/efi/efi.c       |   10 +++++++++-
> >  include/linux/efi.h               |   14 ++++++++++++++
> >  include/linux/ioport.h            |    1 +
> >  8 files changed, 63 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index c1f9b3cf437c..cb9ca27de7a5 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1961,6 +1961,24 @@ config EFI_MIXED
> >
> >            If unsure, say N.
> >
> > +config EFI_SPECIAL_MEMORY
> > +       bool "EFI Special Purpose Memory Support"
> > +       depends on EFI
> > +       ---help---
> > +         On systems that have mixed performance classes of memory EFI
> > +         may indicate special purpose memory with an attribute (See
> > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > +         attribute may have unique performance characteristics compared
> > +         to the system's general purpose "System RAM" pool. On the
> > +         expectation that such memory has application specific usage
> > +         answer Y to arrange for the kernel to reserve it for
> > +         direct-access (device-dax) by default. The memory range can
> > +         later be optionally assigned to the page allocator by system
> > +         administrator policy. Say N to have the kernel treat this
> > +         memory as general purpose by default.
> > +
> > +         If unsure, say Y.
> > +
>
> EFI_MEMORY_SP is now part of the UEFI spec proper, so it does not make
> sense to make any understanding of it Kconfigurable.

No, I think you're misunderstanding what this Kconfig option is trying
to achieve.

The configuration capability is solely for the default kernel policy.
As can already be seen by Christoph's response [1] the thought that
the firmware gets more leeway to dictate to Linux memory policy may be
objectionable.

[1]: https://lore.kernel.org/lkml/20190409121318.GA16955@infradead.org/

So the Kconfig option is gating whether the kernel simply ignores the
attribute and gives it to the page allocator by default. Anything
fancier, like sub-dividing how much is OS managed vs device-dax
accessed requires the OS to reserve it all from the page-allocator by
default until userspace policy can be applied.

> Instead, what I would prefer is to implement support for EFI_MEMORY_SP
> unconditionally (including the ability to identify it in the debug
> dump of the memory map etc), in a way that all architectures can use
> it. Then, I think we should never treat it as ordinary memory and make
> it the firmware's problem not to use the EFI_MEMORY_SP attribute in
> cases where it results in undesired behavior in the OS.

No, a policy of "never treat it as ordinary memory" confuses the base
intent of the attribute which is an optional hint to get the OS to not
put immovable / non-critical allocations in what could be a precious
resource.

Moreover, the interface for platform firmware to indicate that a
memory range should never be treated as ordinary memory is simply the
existing "reserved" memory type, not this attribute. That's the
mechanism to use when platform firmware knows that a driver is needed
for a given mmio resource.

> Also, sInce there is a generic component and a x86 component, can you
> please split those up?

Sure, can do.

>
> You only cc'ed me on patch #1 this time, but could you please cc me on
> the entire series for v2? Thanks.

Yes, will do, and thanks for taking a look.
Ard Biesheuvel April 9, 2019, 5:21 p.m. UTC | #3
On Tue, 9 Apr 2019 at 09:44, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Apr 5, 2019 at 9:21 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >
> > Hi Dan,
> >
> > On Thu, 4 Apr 2019 at 21:21, Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > interpretation of the EFI Memory Types as "reserved for a special
> > > purpose".
> > >
> > > The proposed Linux behavior for special purpose memory is that it is
> > > reserved for direct-access (device-dax) by default and not available for
> > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > scripts or another init mechanism, these device-dax claimed ranges can
> > > be reconfigured and hot-added to the available System-RAM with a unique
> > > node identifier.
> > >
> > > A follow-on patch integrates parsing of the ACPI HMAT to identify the
> > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > > now, arrange for EFI_MEMORY_SP memory to be reserved.
> > >
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Cc: Darren Hart <dvhart@infradead.org>
> > > Cc: Andy Shevchenko <andy@infradead.org>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  arch/x86/Kconfig                  |   18 ++++++++++++++++++
> > >  arch/x86/boot/compressed/eboot.c  |    5 ++++-
> > >  arch/x86/boot/compressed/kaslr.c  |    2 +-
> > >  arch/x86/include/asm/e820/types.h |    9 +++++++++
> > >  arch/x86/kernel/e820.c            |    9 +++++++--
> > >  arch/x86/platform/efi/efi.c       |   10 +++++++++-
> > >  include/linux/efi.h               |   14 ++++++++++++++
> > >  include/linux/ioport.h            |    1 +
> > >  8 files changed, 63 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index c1f9b3cf437c..cb9ca27de7a5 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -1961,6 +1961,24 @@ config EFI_MIXED
> > >
> > >            If unsure, say N.
> > >
> > > +config EFI_SPECIAL_MEMORY
> > > +       bool "EFI Special Purpose Memory Support"
> > > +       depends on EFI
> > > +       ---help---
> > > +         On systems that have mixed performance classes of memory EFI
> > > +         may indicate special purpose memory with an attribute (See
> > > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > > +         attribute may have unique performance characteristics compared
> > > +         to the system's general purpose "System RAM" pool. On the
> > > +         expectation that such memory has application specific usage
> > > +         answer Y to arrange for the kernel to reserve it for
> > > +         direct-access (device-dax) by default. The memory range can
> > > +         later be optionally assigned to the page allocator by system
> > > +         administrator policy. Say N to have the kernel treat this
> > > +         memory as general purpose by default.
> > > +
> > > +         If unsure, say Y.
> > > +
> >
> > EFI_MEMORY_SP is now part of the UEFI spec proper, so it does not make
> > sense to make any understanding of it Kconfigurable.
>
> No, I think you're misunderstanding what this Kconfig option is trying
> to achieve.
>
> The configuration capability is solely for the default kernel policy.
> As can already be seen by Christoph's response [1] the thought that
> the firmware gets more leeway to dictate to Linux memory policy may be
> objectionable.
>
> [1]: https://lore.kernel.org/lkml/20190409121318.GA16955@infradead.org/
>
> So the Kconfig option is gating whether the kernel simply ignores the
> attribute and gives it to the page allocator by default. Anything
> fancier, like sub-dividing how much is OS managed vs device-dax
> accessed requires the OS to reserve it all from the page-allocator by
> default until userspace policy can be applied.
>

I don't think this policy should dictate whether we pretend that the
attribute doesn't exist in the first place. We should just wire up the
bit fully, and only apply this policy at the very end.

> > Instead, what I would prefer is to implement support for EFI_MEMORY_SP
> > unconditionally (including the ability to identify it in the debug
> > dump of the memory map etc), in a way that all architectures can use
> > it. Then, I think we should never treat it as ordinary memory and make
> > it the firmware's problem not to use the EFI_MEMORY_SP attribute in
> > cases where it results in undesired behavior in the OS.
>
> No, a policy of "never treat it as ordinary memory" confuses the base
> intent of the attribute which is an optional hint to get the OS to not
> put immovable / non-critical allocations in what could be a precious
> resource.
>

The base intent is to prevent the OS from using memory that is
co-located with an accelerator for any purpose other than what the
accelerator needs it for. Having a Kconfigurable policy that may be
disabled kind of misses the point IMO. I think 'optional hint' doesn't
quite capture the intent.

> Moreover, the interface for platform firmware to indicate that a
> memory range should never be treated as ordinary memory is simply the
> existing "reserved" memory type, not this attribute. That's the
> mechanism to use when platform firmware knows that a driver is needed
> for a given mmio resource.
>

Reserved memory is memory that simply should never touched at all by
the OS, and on ARM, we take care never to map it anywhere. However, it
could be annotated with the EFI_MEMORY_RUNTIME attribute in order for
the OS to provide a virtual mapping for it on behalf of the runtime
services, which is why it needs to be listed in the memory map at all.
This has nothing to do with usable memory that should or not should be
used in a certain way by the OS.

> > Also, sInce there is a generic component and a x86 component, can you
> > please split those up?
>
> Sure, can do.
>
> >
> > You only cc'ed me on patch #1 this time, but could you please cc me on
> > the entire series for v2? Thanks.
>
> Yes, will do, and thanks for taking a look.
Dan Williams April 10, 2019, 2:10 a.m. UTC | #4
On Tue, Apr 9, 2019 at 10:21 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Tue, 9 Apr 2019 at 09:44, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Apr 5, 2019 at 9:21 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > >
> > > Hi Dan,
> > >
> > > On Thu, 4 Apr 2019 at 21:21, Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > purpose".
> > > >
> > > > The proposed Linux behavior for special purpose memory is that it is
> > > > reserved for direct-access (device-dax) by default and not available for
> > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > node identifier.
> > > >
> > > > A follow-on patch integrates parsing of the ACPI HMAT to identify the
> > > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > > > now, arrange for EFI_MEMORY_SP memory to be reserved.
> > > >
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > Cc: Darren Hart <dvhart@infradead.org>
> > > > Cc: Andy Shevchenko <andy@infradead.org>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > ---
> > > >  arch/x86/Kconfig                  |   18 ++++++++++++++++++
> > > >  arch/x86/boot/compressed/eboot.c  |    5 ++++-
> > > >  arch/x86/boot/compressed/kaslr.c  |    2 +-
> > > >  arch/x86/include/asm/e820/types.h |    9 +++++++++
> > > >  arch/x86/kernel/e820.c            |    9 +++++++--
> > > >  arch/x86/platform/efi/efi.c       |   10 +++++++++-
> > > >  include/linux/efi.h               |   14 ++++++++++++++
> > > >  include/linux/ioport.h            |    1 +
> > > >  8 files changed, 63 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > > index c1f9b3cf437c..cb9ca27de7a5 100644
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -1961,6 +1961,24 @@ config EFI_MIXED
> > > >
> > > >            If unsure, say N.
> > > >
> > > > +config EFI_SPECIAL_MEMORY
> > > > +       bool "EFI Special Purpose Memory Support"
> > > > +       depends on EFI
> > > > +       ---help---
> > > > +         On systems that have mixed performance classes of memory EFI
> > > > +         may indicate special purpose memory with an attribute (See
> > > > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > > > +         attribute may have unique performance characteristics compared
> > > > +         to the system's general purpose "System RAM" pool. On the
> > > > +         expectation that such memory has application specific usage
> > > > +         answer Y to arrange for the kernel to reserve it for
> > > > +         direct-access (device-dax) by default. The memory range can
> > > > +         later be optionally assigned to the page allocator by system
> > > > +         administrator policy. Say N to have the kernel treat this
> > > > +         memory as general purpose by default.
> > > > +
> > > > +         If unsure, say Y.
> > > > +
> > >
> > > EFI_MEMORY_SP is now part of the UEFI spec proper, so it does not make
> > > sense to make any understanding of it Kconfigurable.
> >
> > No, I think you're misunderstanding what this Kconfig option is trying
> > to achieve.
> >
> > The configuration capability is solely for the default kernel policy.
> > As can already be seen by Christoph's response [1] the thought that
> > the firmware gets more leeway to dictate to Linux memory policy may be
> > objectionable.
> >
> > [1]: https://lore.kernel.org/lkml/20190409121318.GA16955@infradead.org/
> >
> > So the Kconfig option is gating whether the kernel simply ignores the
> > attribute and gives it to the page allocator by default. Anything
> > fancier, like sub-dividing how much is OS managed vs device-dax
> > accessed requires the OS to reserve it all from the page-allocator by
> > default until userspace policy can be applied.
> >
>
> I don't think this policy should dictate whether we pretend that the
> attribute doesn't exist in the first place. We should just wire up the
> bit fully, and only apply this policy at the very end.

The bit is just a policy hint, if the kernel is not implementing any
of the policy why even check for the bit?

>
> > > Instead, what I would prefer is to implement support for EFI_MEMORY_SP
> > > unconditionally (including the ability to identify it in the debug
> > > dump of the memory map etc), in a way that all architectures can use
> > > it. Then, I think we should never treat it as ordinary memory and make
> > > it the firmware's problem not to use the EFI_MEMORY_SP attribute in
> > > cases where it results in undesired behavior in the OS.
> >
> > No, a policy of "never treat it as ordinary memory" confuses the base
> > intent of the attribute which is an optional hint to get the OS to not
> > put immovable / non-critical allocations in what could be a precious
> > resource.
> >
>
> The base intent is to prevent the OS from using memory that is
> co-located with an accelerator for any purpose other than what the
> accelerator needs it for. Having a Kconfigurable policy that may be
> disabled kind of misses the point IMO. I think 'optional hint' doesn't
> quite capture the intent.

That's not my understanding, and an EFI attribute is the wrong
mechanism to meet such a requirement. If this memory is specifically
meant for use with a given accelerator then it had better be marked
reserved and the accelerator driver is then responsible for publishing
the resource to the OS if at all.

You did prompt me to go back and re-read the wording in the spec. It
still seems clear to me that the attribute is an optional hint not a
hard requirement. Whether the OS honors an optional hint is an OS
policy and I fail to see why the OS should bother to detect the bit
without implementing any associated policy.

> > Moreover, the interface for platform firmware to indicate that a
> > memory range should never be treated as ordinary memory is simply the
> > existing "reserved" memory type, not this attribute. That's the
> > mechanism to use when platform firmware knows that a driver is needed
> > for a given mmio resource.
> >
>
> Reserved memory is memory that simply should never touched at all by
> the OS, and on ARM, we take care never to map it anywhere.

That's not a guarantee, at least on x86. Some shipping persistent
memory platforms describe it as reserved and then the ACPI NFIT
further describes what that reserved memory contains and how the OS
can use it. See commit af1996ef59db "ACPI: Change NFIT driver to
insert new resource".
Ard Biesheuvel April 12, 2019, 8:43 p.m. UTC | #5
On Tue, 9 Apr 2019 at 19:11, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Tue, Apr 9, 2019 at 10:21 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > On Tue, 9 Apr 2019 at 09:44, Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Fri, Apr 5, 2019 at 9:21 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > > >
> > > > Hi Dan,
> > > >
> > > > On Thu, 4 Apr 2019 at 21:21, Dan Williams <dan.j.williams@intel.com> wrote:
> > > > >
> > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > > purpose".
> > > > >
> > > > > The proposed Linux behavior for special purpose memory is that it is
> > > > > reserved for direct-access (device-dax) by default and not available for
> > > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > > node identifier.
> > > > >
> > > > > A follow-on patch integrates parsing of the ACPI HMAT to identify the
> > > > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > > > > now, arrange for EFI_MEMORY_SP memory to be reserved.
> > > > >
> > > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > > Cc: Darren Hart <dvhart@infradead.org>
> > > > > Cc: Andy Shevchenko <andy@infradead.org>
> > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > > ---
> > > > >  arch/x86/Kconfig                  |   18 ++++++++++++++++++
> > > > >  arch/x86/boot/compressed/eboot.c  |    5 ++++-
> > > > >  arch/x86/boot/compressed/kaslr.c  |    2 +-
> > > > >  arch/x86/include/asm/e820/types.h |    9 +++++++++
> > > > >  arch/x86/kernel/e820.c            |    9 +++++++--
> > > > >  arch/x86/platform/efi/efi.c       |   10 +++++++++-
> > > > >  include/linux/efi.h               |   14 ++++++++++++++
> > > > >  include/linux/ioport.h            |    1 +
> > > > >  8 files changed, 63 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > > > index c1f9b3cf437c..cb9ca27de7a5 100644
> > > > > --- a/arch/x86/Kconfig
> > > > > +++ b/arch/x86/Kconfig
> > > > > @@ -1961,6 +1961,24 @@ config EFI_MIXED
> > > > >
> > > > >            If unsure, say N.
> > > > >
> > > > > +config EFI_SPECIAL_MEMORY
> > > > > +       bool "EFI Special Purpose Memory Support"
> > > > > +       depends on EFI
> > > > > +       ---help---
> > > > > +         On systems that have mixed performance classes of memory EFI
> > > > > +         may indicate special purpose memory with an attribute (See
> > > > > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > > > > +         attribute may have unique performance characteristics compared
> > > > > +         to the system's general purpose "System RAM" pool. On the
> > > > > +         expectation that such memory has application specific usage
> > > > > +         answer Y to arrange for the kernel to reserve it for
> > > > > +         direct-access (device-dax) by default. The memory range can
> > > > > +         later be optionally assigned to the page allocator by system
> > > > > +         administrator policy. Say N to have the kernel treat this
> > > > > +         memory as general purpose by default.
> > > > > +
> > > > > +         If unsure, say Y.
> > > > > +
> > > >
> > > > EFI_MEMORY_SP is now part of the UEFI spec proper, so it does not make
> > > > sense to make any understanding of it Kconfigurable.
> > >
> > > No, I think you're misunderstanding what this Kconfig option is trying
> > > to achieve.
> > >
> > > The configuration capability is solely for the default kernel policy.
> > > As can already be seen by Christoph's response [1] the thought that
> > > the firmware gets more leeway to dictate to Linux memory policy may be
> > > objectionable.
> > >
> > > [1]: https://lore.kernel.org/lkml/20190409121318.GA16955@infradead.org/
> > >
> > > So the Kconfig option is gating whether the kernel simply ignores the
> > > attribute and gives it to the page allocator by default. Anything
> > > fancier, like sub-dividing how much is OS managed vs device-dax
> > > accessed requires the OS to reserve it all from the page-allocator by
> > > default until userspace policy can be applied.
> > >
> >
> > I don't think this policy should dictate whether we pretend that the
> > attribute doesn't exist in the first place. We should just wire up the
> > bit fully, and only apply this policy at the very end.
>
> The bit is just a policy hint, if the kernel is not implementing any
> of the policy why even check for the bit?
>

Because I would like things like the EFI memory map dumping code etc
to report the bit regardless of whether we are honoring it or not.

> >
> > > > Instead, what I would prefer is to implement support for EFI_MEMORY_SP
> > > > unconditionally (including the ability to identify it in the debug
> > > > dump of the memory map etc), in a way that all architectures can use
> > > > it. Then, I think we should never treat it as ordinary memory and make
> > > > it the firmware's problem not to use the EFI_MEMORY_SP attribute in
> > > > cases where it results in undesired behavior in the OS.
> > >
> > > No, a policy of "never treat it as ordinary memory" confuses the base
> > > intent of the attribute which is an optional hint to get the OS to not
> > > put immovable / non-critical allocations in what could be a precious
> > > resource.
> > >
> >
> > The base intent is to prevent the OS from using memory that is
> > co-located with an accelerator for any purpose other than what the
> > accelerator needs it for. Having a Kconfigurable policy that may be
> > disabled kind of misses the point IMO. I think 'optional hint' doesn't
> > quite capture the intent.
>
> That's not my understanding, and an EFI attribute is the wrong
> mechanism to meet such a requirement. If this memory is specifically
> meant for use with a given accelerator then it had better be marked
> reserved and the accelerator driver is then responsible for publishing
> the resource to the OS if at all.
>
> You did prompt me to go back and re-read the wording in the spec. It
> still seems clear to me that the attribute is an optional hint not a
> hard requirement. Whether the OS honors an optional hint is an OS
> policy and I fail to see why the OS should bother to detect the bit
> without implementing any associated policy.
>

Because not taking a hint is not the same thing as pretending it isn't
there to begin with.

> > > Moreover, the interface for platform firmware to indicate that a
> > > memory range should never be treated as ordinary memory is simply the
> > > existing "reserved" memory type, not this attribute. That's the
> > > mechanism to use when platform firmware knows that a driver is needed
> > > for a given mmio resource.
> > >
> >
> > Reserved memory is memory that simply should never touched at all by
> > the OS, and on ARM, we take care never to map it anywhere.
>
> That's not a guarantee, at least on x86. Some shipping persistent
> memory platforms describe it as reserved and then the ACPI NFIT
> further describes what that reserved memory contains and how the OS
> can use it. See commit af1996ef59db "ACPI: Change NFIT driver to
> insert new resource".

The UEFI spec is pretty clear about the fact that reserved memory
shouldn't ever be touched by the OS. The fact that x86 platforms exist
that violate this doesn't mean we should abuse it in other ways as
well.
Dan Williams April 12, 2019, 9:18 p.m. UTC | #6
On Fri, Apr 12, 2019 at 1:44 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
[..]
> > > I don't think this policy should dictate whether we pretend that the
> > > attribute doesn't exist in the first place. We should just wire up the
> > > bit fully, and only apply this policy at the very end.
> >
> > The bit is just a policy hint, if the kernel is not implementing any
> > of the policy why even check for the bit?
> >
>
> Because I would like things like the EFI memory map dumping code etc
> to report the bit regardless of whether we are honoring it or not.

Ok, I'll split it out just for reporting purposes, and come up with a
different mechanism to indicate whether the OS might not be honoring
the expectations of the attribute.

[..]
> Because not taking a hint is not the same thing as pretending it isn't
> there to begin with.

True, and I was missing the enabling to go update where the kernel
goes to report attributes, but for the applications that care they
will still want to debug when the kernel may be placing unwanted
allocations in the "special purpose" range.

> > > > Moreover, the interface for platform firmware to indicate that a
> > > > memory range should never be treated as ordinary memory is simply the
> > > > existing "reserved" memory type, not this attribute. That's the
> > > > mechanism to use when platform firmware knows that a driver is needed
> > > > for a given mmio resource.
> > > >
> > >
> > > Reserved memory is memory that simply should never touched at all by
> > > the OS, and on ARM, we take care never to map it anywhere.
> >
> > That's not a guarantee, at least on x86. Some shipping persistent
> > memory platforms describe it as reserved and then the ACPI NFIT
> > further describes what that reserved memory contains and how the OS
> > can use it. See commit af1996ef59db "ACPI: Change NFIT driver to
> > insert new resource".
>
> The UEFI spec is pretty clear about the fact that reserved memory
> shouldn't ever be touched by the OS. The fact that x86 platforms exist
> that violate this doesn't mean we should abuse it in other ways as
> well.

I think we're talking about 2 different "reserved" memory types, and
it was my fault for not being precise enough. The e820 reserved memory
type has been used for things like PCI memory-mapped I/O or other
memory ranges for which the OS should expect a device-driver to claim.
So when I said EFI_RESERVED_TYPE is safe to use as driver memory I
literally meant this interpretation from do_add_efi_memmap():

                default:
                        /*
                         * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
                         * EFI_RUNTIME_SERVICES_DATA EFI_MEMORY_MAPPED_IO
                         * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
                         */
                        e820_type = E820_TYPE_RESERVED;
                        break;

...where EFI_RESERVED_TYPE is identical to EFI_MEMORY_MAPPED_IO
relative to E820_TYPE_RESERVED.

The policy taken by these patches is that EFI_CONVENTIONAL_MEMORY
marked with the EFI_MEMORY_SP attribute is treated as
E820_TYPE_RESERVED by default and given to the device-dax driver with
the option to hotplug it as E820_TYPE_RAM at a later time with its own
numa description.

I'm generally pushing back on the argument that EFI_MEMORY_SP ==
EFI_RESERVED_TYPE, especially when the type is explicitly set to
EFI_CONVENTIONAL_MEMORY.
Enrico Weigelt, metux IT consult April 15, 2019, 11:43 a.m. UTC | #7
On 09.04.19 18:43, Dan Williams wrote:

>>> UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
>>> interpretation of the EFI Memory Types as "reserved for a special
>>> purpose".

What exactly is that suppose to mean ?
Little pacmans might come around and eat the bits ? :o
Who writes specs like that ? What drugs do did take ?


I vote for calling this the lore-ipsum-area ;-)


--mtx
diff mbox series

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1f9b3cf437c..cb9ca27de7a5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1961,6 +1961,24 @@  config EFI_MIXED
 
 	   If unsure, say N.
 
+config EFI_SPECIAL_MEMORY
+	bool "EFI Special Purpose Memory Support"
+	depends on EFI
+	---help---
+	  On systems that have mixed performance classes of memory EFI
+	  may indicate special purpose memory with an attribute (See
+	  EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
+	  attribute may have unique performance characteristics compared
+	  to the system's general purpose "System RAM" pool. On the
+	  expectation that such memory has application specific usage
+	  answer Y to arrange for the kernel to reserve it for
+	  direct-access (device-dax) by default. The memory range can
+	  later be optionally assigned to the page allocator by system
+	  administrator policy. Say N to have the kernel treat this
+	  memory as general purpose by default.
+
+	  If unsure, say Y.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 544ac4fafd11..9b90fae21abe 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -560,7 +560,10 @@  setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
 		case EFI_BOOT_SERVICES_CODE:
 		case EFI_BOOT_SERVICES_DATA:
 		case EFI_CONVENTIONAL_MEMORY:
-			e820_type = E820_TYPE_RAM;
+			if (is_efi_special(d))
+				e820_type = E820_TYPE_SPECIAL;
+			else
+				e820_type = E820_TYPE_RAM;
 			break;
 
 		case EFI_ACPI_MEMORY_NVS:
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 2e53c056ba20..897e46eb9714 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -757,7 +757,7 @@  process_efi_entries(unsigned long minimum, unsigned long image_size)
 		 *
 		 * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
 		 */
-		if (md->type != EFI_CONVENTIONAL_MEMORY)
+		if (md->type != EFI_CONVENTIONAL_MEMORY || is_efi_special(md))
 			continue;
 
 		if (efi_mirror_found &&
diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
index c3aa4b5e49e2..0ab8abae2e8b 100644
--- a/arch/x86/include/asm/e820/types.h
+++ b/arch/x86/include/asm/e820/types.h
@@ -28,6 +28,15 @@  enum e820_type {
 	 */
 	E820_TYPE_PRAM		= 12,
 
+	/*
+	 * Special-purpose / application-specific memory is indicated to
+	 * the system via the EFI_MEMORY_SP attribute. Define an e820
+	 * translation of this memory type for the purpose of
+	 * reserving this range and marking it with the
+	 * IORES_DESC_APPLICATION_RESERVED designation.
+	 */
+	E820_TYPE_SPECIAL	= 0xefffffff,
+
 	/*
 	 * Reserved RAM used by the kernel itself if
 	 * CONFIG_INTEL_TXT=y is enabled, memory of this type
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..9f50dd0bbb04 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -176,6 +176,7 @@  static void __init e820_print_type(enum e820_type type)
 	switch (type) {
 	case E820_TYPE_RAM:		/* Fall through: */
 	case E820_TYPE_RESERVED_KERN:	pr_cont("usable");			break;
+	case E820_TYPE_SPECIAL:		/* Fall through: */
 	case E820_TYPE_RESERVED:	pr_cont("reserved");			break;
 	case E820_TYPE_ACPI:		pr_cont("ACPI data");			break;
 	case E820_TYPE_NVS:		pr_cont("ACPI NVS");			break;
@@ -1023,6 +1024,7 @@  static const char *__init e820_type_to_string(struct e820_entry *entry)
 	case E820_TYPE_UNUSABLE:	return "Unusable memory";
 	case E820_TYPE_PRAM:		return "Persistent Memory (legacy)";
 	case E820_TYPE_PMEM:		return "Persistent Memory";
+	case E820_TYPE_SPECIAL:		/* Fall-through: */
 	case E820_TYPE_RESERVED:	return "Reserved";
 	default:			return "Unknown E820 type";
 	}
@@ -1038,6 +1040,7 @@  static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
 	case E820_TYPE_PRAM:		/* Fall-through: */
 	case E820_TYPE_PMEM:		/* Fall-through: */
+	case E820_TYPE_SPECIAL:		/* Fall-through: */
 	case E820_TYPE_RESERVED:	/* Fall-through: */
 	default:			return IORESOURCE_MEM;
 	}
@@ -1050,6 +1053,7 @@  static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	case E820_TYPE_NVS:		return IORES_DESC_ACPI_NV_STORAGE;
 	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
 	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+	case E820_TYPE_SPECIAL:		return IORES_DESC_APPLICATION_RESERVED;
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
 	case E820_TYPE_RAM:		/* Fall-through: */
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
@@ -1065,13 +1069,14 @@  static bool __init do_mark_busy(enum e820_type type, struct resource *res)
 		return true;
 
 	/*
-	 * Treat persistent memory like device memory, i.e. reserve it
-	 * for exclusive use of a driver
+	 * Treat persistent memory and other special memory ranges like
+	 * device memory, i.e. reserve it for exclusive use of a driver
 	 */
 	switch (type) {
 	case E820_TYPE_RESERVED:
 	case E820_TYPE_PRAM:
 	case E820_TYPE_PMEM:
+	case E820_TYPE_SPECIAL:
 		return false;
 	case E820_TYPE_RESERVED_KERN:
 	case E820_TYPE_RAM:
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e1cb01a22fa8..d227751f331b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -139,7 +139,9 @@  static void __init do_add_efi_memmap(void)
 		case EFI_BOOT_SERVICES_CODE:
 		case EFI_BOOT_SERVICES_DATA:
 		case EFI_CONVENTIONAL_MEMORY:
-			if (md->attribute & EFI_MEMORY_WB)
+			if (is_efi_special(md))
+				e820_type = E820_TYPE_SPECIAL;
+			else if (md->attribute & EFI_MEMORY_WB)
 				e820_type = E820_TYPE_RAM;
 			else
 				e820_type = E820_TYPE_RESERVED;
@@ -753,6 +755,12 @@  static bool should_map_region(efi_memory_desc_t *md)
 	if (IS_ENABLED(CONFIG_X86_32))
 		return false;
 
+	/*
+	 * Special purpose memory is not mapped by default.
+	 */
+	if (is_efi_special(md))
+		return false;
+
 	/*
 	 * Map all of RAM so that we can access arguments in the 1:1
 	 * mapping when making EFI runtime calls.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 54357a258b35..cecbc2bda1da 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -112,6 +112,7 @@  typedef	struct {
 #define EFI_MEMORY_MORE_RELIABLE \
 				((u64)0x0000000000010000ULL)	/* higher reliability */
 #define EFI_MEMORY_RO		((u64)0x0000000000020000ULL)	/* read-only */
+#define EFI_MEMORY_SP		((u64)0x0000000000040000ULL)	/* special purpose */
 #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
 #define EFI_MEMORY_DESCRIPTOR_VERSION	1
 
@@ -128,6 +129,19 @@  typedef struct {
 	u64 attribute;
 } efi_memory_desc_t;
 
+#ifdef CONFIG_EFI_SPECIAL_MEMORY
+static inline bool is_efi_special(efi_memory_desc_t *md)
+{
+	return md->type == EFI_CONVENTIONAL_MEMORY
+		&& (md->attribute & EFI_MEMORY_SP);
+}
+#else
+static inline bool is_efi_special(efi_memory_desc_t *md)
+{
+	return false;
+}
+#endif
+
 typedef struct {
 	efi_guid_t guid;
 	u32 headersize;
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..2d79841ee9b9 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@  enum {
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
 	IORES_DESC_DEVICE_PUBLIC_MEMORY		= 7,
+	IORES_DESC_APPLICATION_RESERVED		= 8,
 };
 
 /* helpers to define resources */