[RFC,v1,10/18] x86/efi: Access EFI related tables in the clear

Message ID 20160426225740.13567.85438.stgit@tlendack-t1.amdoffice.net (mailing list archive)
State New, archived

Commit Message

Tom Lendacky April 26, 2016, 10:57 p.m. UTC
The EFI tables are not encrypted and need to be accessed as such. Be sure
to memmap them without the encryption attribute set. For EFI support that
lives outside of the arch/x86 tree, create a routine that uses the __weak
attribute so that it can be overridden by an architecture specific routine.

When freeing boot services related memory, since it has been mapped as
un-encrypted, be sure to change the mapping to encrypted for future use.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cacheflush.h  |    3 +
 arch/x86/include/asm/mem_encrypt.h |   22 +++++++++++
 arch/x86/kernel/setup.c            |    6 +--
 arch/x86/mm/mem_encrypt.c          |   56 +++++++++++++++++++++++++++
 arch/x86/mm/pageattr.c             |   75 ++++++++++++++++++++++++++++++++++++
 arch/x86/platform/efi/efi.c        |   26 +++++++-----
 arch/x86/platform/efi/efi_64.c     |    9 +++-
 arch/x86/platform/efi/quirks.c     |   12 +++++-
 drivers/firmware/efi/efi.c         |   18 +++++++--
 drivers/firmware/efi/esrt.c        |   12 +++---
 include/linux/efi.h                |    3 +
 11 files changed, 212 insertions(+), 30 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Matt Fleming May 10, 2016, 1:43 p.m. UTC | #1
On Tue, 26 Apr, at 05:57:40PM, Tom Lendacky wrote:
> The EFI tables are not encrypted and need to be accessed as such. Be sure
> to memmap them without the encryption attribute set. For EFI support that
> lives outside of the arch/x86 tree, create a routine that uses the __weak
> attribute so that it can be overridden by an architecture specific routine.
> 
> When freeing boot services related memory, since it has been mapped as
> un-encrypted, be sure to change the mapping to encrypted for future use.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/cacheflush.h  |    3 +
>  arch/x86/include/asm/mem_encrypt.h |   22 +++++++++++
>  arch/x86/kernel/setup.c            |    6 +--
>  arch/x86/mm/mem_encrypt.c          |   56 +++++++++++++++++++++++++++
>  arch/x86/mm/pageattr.c             |   75 ++++++++++++++++++++++++++++++++++++
>  arch/x86/platform/efi/efi.c        |   26 +++++++-----
>  arch/x86/platform/efi/efi_64.c     |    9 +++-
>  arch/x86/platform/efi/quirks.c     |   12 +++++-
>  drivers/firmware/efi/efi.c         |   18 +++++++--
>  drivers/firmware/efi/esrt.c        |   12 +++---
>  include/linux/efi.h                |    3 +
>  11 files changed, 212 insertions(+), 30 deletions(-)

The size of this change is completely unexpected. Why is there so much
churn to work around this new feature?

Is it not possible to maintain some kind of kernel virtual address
mapping so memremap*() and friends can figure out when to twiddle the
mapping attributes and map with/without encryption?

These API changes place an undue burden on developers that don't even
care about SME.
Borislav Petkov May 10, 2016, 1:57 p.m. UTC | #2
On Tue, May 10, 2016 at 02:43:58PM +0100, Matt Fleming wrote:
> Is it not possible to maintain some kind of kernel virtual address
> mapping so memremap*() and friends can figure out when to twiddle the
> mapping attributes and map with/without encryption?

I guess we can move the sme_* specific stuff one indirection layer
below, i.e., in the *memremap() routines so that callers don't have to
care... That should keep the churn down...
Tom Lendacky May 12, 2016, 6:20 p.m. UTC | #3
On 05/10/2016 08:57 AM, Borislav Petkov wrote:
> On Tue, May 10, 2016 at 02:43:58PM +0100, Matt Fleming wrote:
>> Is it not possible to maintain some kind of kernel virtual address
>> mapping so memremap*() and friends can figure out when to twiddle the
>> mapping attributes and map with/without encryption?
> 
> I guess we can move the sme_* specific stuff one indirection layer
> below, i.e., in the *memremap() routines so that callers don't have to
> care... That should keep the churn down...
> 

We could do that, but we'll have to generate that list of addresses so
that it can be checked against the range being mapped.  Since this is
part of early memmap support searching that list every time might not be
too bad. I'll have to look into that and see what that looks like.

Thanks,
Tom
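
The list check Tom describes above could look roughly like the
following. This is only a minimal userspace sketch, not kernel code:
the struct, the table contents and the function name are all
hypothetical, and a real version would have to build the table by
parsing the EFI tables rather than hard-coding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one physical range left unencrypted by firmware. */
struct unenc_range {
	uint64_t start;	/* first byte of the range */
	uint64_t size;	/* length in bytes, non-zero */
};

/* Illustrative entries only; real ones would come from the EFI tables. */
const struct unenc_range unenc_ranges[] = {
	{ 0x1000, 0x2000 },	/* covers [0x1000, 0x3000) */
	{ 0x9000, 0x1000 },	/* covers [0x9000, 0xa000) */
};

/* Return true if [phys, phys + size) overlaps any recorded range. */
bool range_is_unencrypted(uint64_t phys, uint64_t size)
{
	size_t i;

	assert(size != 0);
	for (i = 0; i < sizeof(unenc_ranges) / sizeof(unenc_ranges[0]); i++) {
		const struct unenc_range *r = &unenc_ranges[i];

		/* Two half-open intervals overlap if each starts before the other ends. */
		if (phys < r->start + r->size && r->start < phys + size)
			return true;
	}
	return false;
}
```

A linear scan like this is what "searching that list every time" would
amount to; with only a handful of entries the cost per early mapping is
small.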
Tom Lendacky May 24, 2016, 2:54 p.m. UTC | #4
On 05/12/2016 01:20 PM, Tom Lendacky wrote:
> On 05/10/2016 08:57 AM, Borislav Petkov wrote:
>> On Tue, May 10, 2016 at 02:43:58PM +0100, Matt Fleming wrote:
>>> Is it not possible to maintain some kind of kernel virtual address
>>> mapping so memremap*() and friends can figure out when to twiddle the
>>> mapping attributes and map with/without encryption?
>>
>> I guess we can move the sme_* specific stuff one indirection layer
>> below, i.e., in the *memremap() routines so that callers don't have to
>> care... That should keep the churn down...
>>
> 
> We could do that, but we'll have to generate that list of addresses so
> that it can be checked against the range being mapped.  Since this is
> part of early memmap support searching that list every time might not be
> too bad. I'll have to look into that and see what that looks like.

I looked into this and this would be a large change also to parse tables
and build lists.  It occurred to me that this could all be taken care of
if the early_memremap calls were changed to early_ioremap calls. Looking
in the git log I see that they were originally early_ioremap calls but
were changed to early_memremap calls with this commit:

commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()")

Looking at the early_memremap code and the early_ioremap code they both
call __early_ioremap so I don't see how this change makes any
difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are
identical in this case).

Is it safe to change these back to early_ioremap calls (at least on
x86)?

Thanks,
Tom

> 
> Thanks,
> Tom
> 
Daniel Kiper May 25, 2016, 4:09 p.m. UTC | #5
On Tue, May 24, 2016 at 09:54:31AM -0500, Tom Lendacky wrote:
> On 05/12/2016 01:20 PM, Tom Lendacky wrote:
> > On 05/10/2016 08:57 AM, Borislav Petkov wrote:
> >> On Tue, May 10, 2016 at 02:43:58PM +0100, Matt Fleming wrote:
> >>> Is it not possible to maintain some kind of kernel virtual address
> >>> mapping so memremap*() and friends can figure out when to twiddle the
> >>> mapping attributes and map with/without encryption?
> >>
> >> I guess we can move the sme_* specific stuff one indirection layer
> >> below, i.e., in the *memremap() routines so that callers don't have to
> >> care... That should keep the churn down...
> >>
> >
> > We could do that, but we'll have to generate that list of addresses so
> > that it can be checked against the range being mapped.  Since this is
> > part of early memmap support searching that list every time might not be
> > too bad. I'll have to look into that and see what that looks like.
>
> I looked into this and this would be a large change also to parse tables
> and build lists.  It occurred to me that this could all be taken care of
> if the early_memremap calls were changed to early_ioremap calls. Looking
> in the git log I see that they were originally early_ioremap calls but
> were changed to early_memremap calls with this commit:
>
> commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()")
>
> Looking at the early_memremap code and the early_ioremap code they both
> call __early_ioremap so I don't see how this change makes any
> difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are
> identical in this case).
>
> Is it safe to change these back to early_ioremap calls (at least on
> x86)?

Commit f955371ca9d3986bca100666041fcfa9b6d21962 (x86: remove the Xen-specific
_PAGE_IOMAP PTE flag) made commit abc93f8eb6e4 unnecessary. Though, IMO, it
is still valid code cleanup. So, if it is not very strongly needed I would
not revert this change.

Daniel
Matt Fleming May 25, 2016, 7:30 p.m. UTC | #6
On Tue, 24 May, at 09:54:31AM, Tom Lendacky wrote:
> 
> I looked into this and this would be a large change also to parse tables
> and build lists.  It occurred to me that this could all be taken care of
> if the early_memremap calls were changed to early_ioremap calls. Looking
> in the git log I see that they were originally early_ioremap calls but
> were changed to early_memremap calls with this commit:
> 
> commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()")
> 
> Looking at the early_memremap code and the early_ioremap code they both
> call __early_ioremap so I don't see how this change makes any
> difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are
> identical in this case).
> 
> Is it safe to change these back to early_ioremap calls (at least on
> x86)?

I really don't want to begin mixing early_ioremap() calls and
early_memremap() calls for any of the EFI code if it can be avoided.

There is slow but steady progress to move more and more of the
architecture specific EFI code out into generic code. Swapping
early_memremap() for early_ioremap() would be a step backwards,
because FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are not identical on
ARM/arm64.

Could you point me at the patch in this series that fixes up
early_ioremap() to work with mem encrypt/decrypt? I took another
(quick) look through but couldn't find it.
Tom Lendacky May 26, 2016, 1:45 p.m. UTC | #7
On 05/25/2016 02:30 PM, Matt Fleming wrote:
> On Tue, 24 May, at 09:54:31AM, Tom Lendacky wrote:
>>
>> I looked into this and this would be a large change also to parse tables
>> and build lists.  It occurred to me that this could all be taken care of
>> if the early_memremap calls were changed to early_ioremap calls. Looking
>> in the git log I see that they were originally early_ioremap calls but
>> were changed to early_memremap calls with this commit:
>>
>> commit abc93f8eb6e4 ("efi: Use early_mem*() instead of early_io*()")
>>
>> Looking at the early_memremap code and the early_ioremap code they both
>> call __early_ioremap so I don't see how this change makes any
>> difference (especially since FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are
>> identical in this case).
>>
>> Is it safe to change these back to early_ioremap calls (at least on
>> x86)?
> 
> I really don't want to begin mixing early_ioremap() calls and
> early_memremap() calls for any of the EFI code if it can be avoided.

I definitely wouldn't mix them, it would be all or nothing.

> 
> There is slow but steady progress to move more and more of the
> architecture specific EFI code out into generic code. Swapping
> early_memremap() for early_ioremap() would be a step backwards,
> because FIXMAP_PAGE_NORMAL and FIXMAP_PAGE_IO are not identical on
> ARM/arm64.

Maybe adding something similar to __acpi_map_table would be more
appropriate?

> 
> Could you point me at the patch in this series that fixes up
> early_ioremap() to work with mem encrypt/decrypt? I took another
> (quick) look through but couldn't find it.

The patch in question is patch 6/18 where PAGE_KERNEL is changed to
include the _PAGE_ENC attribute (the encryption mask). This now
makes FIXMAP_PAGE_NORMAL contain the encryption mask while
FIXMAP_PAGE_IO does not. In this way, anything mapped using the
early_ioremap call won't be mapped encrypted.

Thanks,
Tom
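
The effect described in the reply above can be modelled in a few lines
of userspace C. This is a sketch under stated assumptions, not the
actual patch: the SKETCH_* names are hypothetical stand-ins for the
kernel constants, and bit 47 is an arbitrary choice for the encryption
mask made purely for illustration.

```c
#include <stdint.h>

/* Assumed bit position for the encryption mask; illustration only. */
#define SKETCH_PAGE_ENC		(1ULL << 47)
#define SKETCH_PAGE_PRESENT	(1ULL << 0)
#define SKETCH_PAGE_RW		(1ULL << 1)

/* Stand-in for FIXMAP_PAGE_NORMAL once PAGE_KERNEL carries the mask. */
#define SKETCH_FIXMAP_PAGE_NORMAL \
	(SKETCH_PAGE_PRESENT | SKETCH_PAGE_RW | SKETCH_PAGE_ENC)
/* Stand-in for FIXMAP_PAGE_IO, which does not carry the mask. */
#define SKETCH_FIXMAP_PAGE_IO \
	(SKETCH_PAGE_PRESENT | SKETCH_PAGE_RW)

/* Combine a page-aligned physical address with protection bits. */
uint64_t sketch_make_pte(uint64_t phys, uint64_t prot)
{
	return (phys & ~0xfffULL) | prot;
}

/* Report whether an entry built above would be treated as encrypted. */
int sketch_pte_is_encrypted(uint64_t pte)
{
	return (pte & SKETCH_PAGE_ENC) != 0;
}
```

With these definitions, an entry built with the NORMAL protection
reports as encrypted and one built with the IO protection does not,
which is the distinction the early_ioremap idea relies on.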

> 
Matt Fleming June 8, 2016, 10:07 a.m. UTC | #8
(Sorry for the delay)

On Thu, 26 May, at 08:45:58AM, Tom Lendacky wrote:
> 
> The patch in question is patch 6/18 where PAGE_KERNEL is changed to
> include the _PAGE_ENC attribute (the encryption mask). This now
> makes FIXMAP_PAGE_NORMAL contain the encryption mask while
> FIXMAP_PAGE_IO does not. In this way, anything mapped using the
> early_ioremap call won't be mapped encrypted.

There are semantics attached to early_ioremap() that do not apply in
this case: it implies that you're mapping an MMIO region, but for EFI we
just care about noting where the firmware (not the kernel) populated the
region with data. Similar problems exist for other early boot code such
as the devicetree stuff.

early_ioremap() is not the answer.

What you really want is just some way to distinguish kernel-owned
regions from those owned by "somebody else".

I have no problem updating early_memremap() to take a @flags argument
to make that distinction, provided that the naming is generic and not
tied to AMD's SME technology via an "sme" prefix/suffix.

And making it generic should allow it to be easily sprinkled into the
shared architecture code in drivers/firmware/efi/ without issue.

I'm going to follow up with some additional comments/questions on
PATCH 10.
Matt Fleming June 8, 2016, 11:18 a.m. UTC | #9
On Tue, 26 Apr, at 05:57:40PM, Tom Lendacky wrote:
> The EFI tables are not encrypted and need to be accessed as such. Be sure
> to memmap them without the encryption attribute set. For EFI support that
> lives outside of the arch/x86 tree, create a routine that uses the __weak
> attribute so that it can be overridden by an architecture specific routine.
> 
> When freeing boot services related memory, since it has been mapped as
> un-encrypted, be sure to change the mapping to encrypted for future use.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/cacheflush.h  |    3 +
>  arch/x86/include/asm/mem_encrypt.h |   22 +++++++++++
>  arch/x86/kernel/setup.c            |    6 +--
>  arch/x86/mm/mem_encrypt.c          |   56 +++++++++++++++++++++++++++
>  arch/x86/mm/pageattr.c             |   75 ++++++++++++++++++++++++++++++++++++
>  arch/x86/platform/efi/efi.c        |   26 +++++++-----
>  arch/x86/platform/efi/efi_64.c     |    9 +++-
>  arch/x86/platform/efi/quirks.c     |   12 +++++-
>  drivers/firmware/efi/efi.c         |   18 +++++++--
>  drivers/firmware/efi/esrt.c        |   12 +++---
>  include/linux/efi.h                |    3 +
>  11 files changed, 212 insertions(+), 30 deletions(-)

[...]

> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 994a7df8..871b213 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -53,6 +53,7 @@
>  #include <asm/x86_init.h>
>  #include <asm/rtc.h>
>  #include <asm/uv/uv.h>
> +#include <asm/mem_encrypt.h>
>  
>  #define EFI_DEBUG
>  
> @@ -261,12 +262,12 @@ static int __init efi_systab_init(void *phys)
>  		u64 tmp = 0;
>  
>  		if (efi_setup) {
> -			data = early_memremap(efi_setup, sizeof(*data));
> +			data = sme_early_memremap(efi_setup, sizeof(*data));
>  			if (!data)
>  				return -ENOMEM;
>  		}

Beware, this data comes from a previous kernel that kexec'd this
kernel. Unless you've updated bzImage64_load() to allocate an
unencrypted region 'efi_setup' will in fact be encrypted.

> @@ -690,6 +691,7 @@ static void *realloc_pages(void *old_memmap, int old_shift)
>  	ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
>  	if (!ret)
>  		goto out;
> +	sme_set_mem_dec(ret, PAGE_SIZE << (old_shift + 1));
>  
>  	/*
>  	 * A first-time allocation doesn't have anything to copy.

I'm not sure why it's necessary to mark this region as unencrypted,
because at this point the kernel controls the platform and when we
call into the firmware it should be using our page tables. I wouldn't
expect the firmware to mess with the SYSCFG MSR either.

Have you come across a situation where the above was required?

> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 49e4dd4..834a992 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -223,7 +223,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  	if (efi_enabled(EFI_OLD_MEMMAP))
>  		return 0;
>  
> -	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
> +	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
>  	pgd = efi_pgd;
>  
>  	/*

Huh? Why does __pa() now OR in sme_me_mask? I thought SME only
required the page table structures to be modified, not the end
address?

> @@ -262,7 +262,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  		pfn = md->phys_addr >> PAGE_SHIFT;
>  		npages = md->num_pages;
>  
> -		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages, _PAGE_RW)) {
> +		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages,
> +					    _PAGE_RW | _PAGE_ENC)) {
>  			pr_err("Failed to map 1:1 memory\n");
>  			return 1;
>  		}

Could you push the _PAGE_ENC addition down into
kernel_map_pages_in_pgd()? Other flags are also handled that way, see
_PAGE_PRESENT.

> @@ -272,6 +273,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  	if (!page)
>  		panic("Unable to allocate EFI runtime stack < 4GB\n");
>  
> +	sme_set_mem_dec(page_address(page), PAGE_SIZE);
>  	efi_scratch.phys_stack = virt_to_phys(page_address(page));
>  	efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
>  

We should not need to mark the stack as unencrypted, the firmware
should respect our SME settings, right?

> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index ab50ada..dde4fb6b 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -13,6 +13,7 @@
>  #include <linux/dmi.h>
>  #include <asm/efi.h>
>  #include <asm/uv/uv.h>
> +#include <asm/mem_encrypt.h>
>  
>  #define EFI_MIN_RESERVE 5120
>  
> @@ -265,6 +266,13 @@ void __init efi_free_boot_services(void)
>  		if (md->attribute & EFI_MEMORY_RUNTIME)
>  			continue;
>  
> +		/*
> +		 * Change the mapping to encrypted memory before freeing.
> +		 * This insures any future allocations of this mapped area
> +		 * are used encrypted.
> +		 */
> +		sme_set_mem_enc(__va(start), size);
> +
>  		free_bootmem_late(start, size);
>  	}
>  

I don't think it's necessary to have to mark the __va() mapping of
these regions as encrypted at this point. They should be set up that
way initially.

The reason is that it'd be a bug if these regions were accessed via
the __va() mappings before this point. Unless there's something I'm
missing.

> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 3a69ed5..25010c7 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -76,6 +76,16 @@ static int __init parse_efi_cmdline(char *str)
>  }
>  early_param("efi", parse_efi_cmdline);
>  
> +/*
> + * If memory encryption is supported, then an override to this function
> + * will be provided.
> + */
> +void __weak __init *efi_me_early_memremap(resource_size_t phys_addr,
> +					  unsigned long size)
> +{
> +	return early_memremap(phys_addr, size);
> +}
> +
>  struct kobject *efi_kobj;
>  
>  /*

Like I said in my other mail, I'd much prefer to see this buried in
arch/x86 by passing a flag to early_memremap() which can be parsed in
arch directories.
Tom Lendacky June 9, 2016, 4:16 p.m. UTC | #10
On 06/08/2016 05:07 AM, Matt Fleming wrote:
> (Sorry for the delay)

No worries, thanks for all the feedback.

> 
> On Thu, 26 May, at 08:45:58AM, Tom Lendacky wrote:
>>
>> The patch in question is patch 6/18 where PAGE_KERNEL is changed to
>> include the _PAGE_ENC attribute (the encryption mask). This now
>> makes FIXMAP_PAGE_NORMAL contain the encryption mask while
>> FIXMAP_PAGE_IO does not. In this way, anything mapped using the
>> early_ioremap call won't be mapped encrypted.
> 
> There are semantics attached to early_ioremap() that do not apply in
> this case: it implies that you're mapping an MMIO region, but for EFI we
> just care about noting where the firmware (not the kernel) populated the
> region with data. Similar problems exist for other early boot code such
> as the devicetree stuff.
> 
> early_ioremap() is not the answer.
> 
> What you really want is just some way to distinguish kernel-owned
> regions from those owned by "somebody else".
> 
> I have no problem updating early_memremap() to take a @flags argument
> to make that distinction, provided that the naming is generic and not
> tied to AMD's SME technology via an "sme" prefix/suffix.

So maybe something along the lines of an enum that would have entries
(initially) like KERNEL_DATA (equal to zero) and EFI_DATA. Others could
be added later as needed.

Would you then want to allow the protection attributes to be updated
by architecture specific code through something like a __weak function?
In the x86 case I can add this function as a non-SME specific function
that would initially just have the SME-related mask modification in it.

Thanks,
Tom

> 
> And making it generic should allow it to be easily sprinkled into the
> shared architecture code in drivers/firmware/efi/ without issue.
> 
> I'm going to follow up with some additional comments/questions on
> PATCH 10.
> 
Tom Lendacky June 9, 2016, 6:33 p.m. UTC | #11
On 06/08/2016 06:18 AM, Matt Fleming wrote:
> On Tue, 26 Apr, at 05:57:40PM, Tom Lendacky wrote:
>> The EFI tables are not encrypted and need to be accessed as such. Be sure
>> to memmap them without the encryption attribute set. For EFI support that
>> lives outside of the arch/x86 tree, create a routine that uses the __weak
>> attribute so that it can be overridden by an architecture specific routine.
>>
>> When freeing boot services related memory, since it has been mapped as
>> un-encrypted, be sure to change the mapping to encrypted for future use.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/cacheflush.h  |    3 +
>>  arch/x86/include/asm/mem_encrypt.h |   22 +++++++++++
>>  arch/x86/kernel/setup.c            |    6 +--
>>  arch/x86/mm/mem_encrypt.c          |   56 +++++++++++++++++++++++++++
>>  arch/x86/mm/pageattr.c             |   75 ++++++++++++++++++++++++++++++++++++
>>  arch/x86/platform/efi/efi.c        |   26 +++++++-----
>>  arch/x86/platform/efi/efi_64.c     |    9 +++-
>>  arch/x86/platform/efi/quirks.c     |   12 +++++-
>>  drivers/firmware/efi/efi.c         |   18 +++++++--
>>  drivers/firmware/efi/esrt.c        |   12 +++---
>>  include/linux/efi.h                |    3 +
>>  11 files changed, 212 insertions(+), 30 deletions(-)
> 
> [...]
> 
>> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>> index 994a7df8..871b213 100644
>> --- a/arch/x86/platform/efi/efi.c
>> +++ b/arch/x86/platform/efi/efi.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/x86_init.h>
>>  #include <asm/rtc.h>
>>  #include <asm/uv/uv.h>
>> +#include <asm/mem_encrypt.h>
>>  
>>  #define EFI_DEBUG
>>  
>> @@ -261,12 +262,12 @@ static int __init efi_systab_init(void *phys)
>>  		u64 tmp = 0;
>>  
>>  		if (efi_setup) {
>> -			data = early_memremap(efi_setup, sizeof(*data));
>> +			data = sme_early_memremap(efi_setup, sizeof(*data));
>>  			if (!data)
>>  				return -ENOMEM;
>>  		}
> 
> Beware, this data comes from a previous kernel that kexec'd this
> kernel. Unless you've updated bzImage64_load() to allocate an
> unencrypted region 'efi_setup' will in fact be encrypted.

Yes, I missed the kexec path originally and need to take that into
account in general.

> 
>> @@ -690,6 +691,7 @@ static void *realloc_pages(void *old_memmap, int old_shift)
>>  	ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
>>  	if (!ret)
>>  		goto out;
>> +	sme_set_mem_dec(ret, PAGE_SIZE << (old_shift + 1));
>>  
>>  	/*
>>  	 * A first-time allocation doesn't have anything to copy.
> 
> I'm not sure why it's necessary to mark this region as unencrypted,
> because at this point the kernel controls the platform and when we
> call into the firmware it should be using our page tables. I wouldn't
> expect the firmware to mess with the SYSCFG MSR either.
> 
> Have you come across a situation where the above was required?

I was trying to play it safe here, but as you say, the firmware should
be using our page tables so we can get rid of this call. The problem
will actually be if we transition to a 32-bit efi. The encryption bit
will be lost in cr3 and so the pgd table will have to be un-encrypted.
The entries in the pgd can have the encryption bit set so I would only
need to worry about the pgd itself. I'll have to update the
efi_alloc_page_tables routine.
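
To illustrate the cr3 concern: if the register value only keeps its low
32 bits during such a transition, any encryption bit placed above bit 31
is dropped. A minimal userspace sketch, assuming (for illustration only)
that the encryption bit sits at bit 47; the names are hypothetical:

```c
#include <stdint.h>

/* Assumed position of the encryption bit; illustration only. */
#define SKETCH_ENC_BIT	47

/* Model a 64-bit cr3 value pointing at an encrypted pgd. */
uint64_t sketch_cr3_with_enc(uint64_t pgd_phys)
{
	return pgd_phys | (1ULL << SKETCH_ENC_BIT);
}

/* Model what survives when only the low 32 bits are kept. */
uint32_t sketch_cr3_32bit(uint64_t cr3)
{
	return (uint32_t)cr3;
}
```

The truncated value still points at the same pgd page but no longer
marks it as encrypted, which is why the pgd itself would need to be
un-encrypted in that case.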

> 
>> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
>> index 49e4dd4..834a992 100644
>> --- a/arch/x86/platform/efi/efi_64.c
>> +++ b/arch/x86/platform/efi/efi_64.c
>> @@ -223,7 +223,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  	if (efi_enabled(EFI_OLD_MEMMAP))
>>  		return 0;
>>  
>> -	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
>> +	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
>>  	pgd = efi_pgd;
>>  
>>  	/*
> 
> Huh? Why does __pa() now OR in sme_me_mask? I thought SME only
> required the page table structures to be modified, not the end
> address?

The encryption bit in the cr3 register will indicate if the pgd table
is encrypted or not. Based on my comment above about the pgd having
to be un-encrypted in case we have to transition to 32-bit efi, this
can be removed.

> 
>> @@ -262,7 +262,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  		pfn = md->phys_addr >> PAGE_SHIFT;
>>  		npages = md->num_pages;
>>  
>> -		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages, _PAGE_RW)) {
>> +		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages,
>> +					    _PAGE_RW | _PAGE_ENC)) {
>>  			pr_err("Failed to map 1:1 memory\n");
>>  			return 1;
>>  		}
> 
> Could you push the _PAGE_ENC addition down into
> kernel_map_pages_in_pgd()? Other flags are also handled that way, see
> _PAGE_PRESENT.

I'll look into this a bit more. From looking at it I don't want the
_PAGE_ENC bit set for the memmap unless it gets re-allocated (which
I missed in these patches). Let me see what I can do with this.

> 
>> @@ -272,6 +273,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  	if (!page)
>>  		panic("Unable to allocate EFI runtime stack < 4GB\n");
>>  
>> +	sme_set_mem_dec(page_address(page), PAGE_SIZE);
>>  	efi_scratch.phys_stack = virt_to_phys(page_address(page));
>>  	efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
>>  
> 
> We should not need to mark the stack as unencrypted, the firmware
> should respect our SME settings, right?

Yup, you're correct. I think we can get rid of this call, too.

> 
>> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>> index ab50ada..dde4fb6b 100644
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/dmi.h>
>>  #include <asm/efi.h>
>>  #include <asm/uv/uv.h>
>> +#include <asm/mem_encrypt.h>
>>  
>>  #define EFI_MIN_RESERVE 5120
>>  
>> @@ -265,6 +266,13 @@ void __init efi_free_boot_services(void)
>>  		if (md->attribute & EFI_MEMORY_RUNTIME)
>>  			continue;
>>  
>> +		/*
>> +		 * Change the mapping to encrypted memory before freeing.
>> +		 * This insures any future allocations of this mapped area
>> +		 * are used encrypted.
>> +		 */
>> +		sme_set_mem_enc(__va(start), size);
>> +
>>  		free_bootmem_late(start, size);
>>  	}
>>  
> 
> I don't think it's necessary to have to mark the __va() mapping of
> these regions as encrypted at this point. They should be set up that
> way initially.
> 
> The reason is that it'd be a bug if these regions were accessed via
> the __va() mappings before this point. Unless there's something I'm
> missing.

I'll look further into this, but I saw that this area of virtual memory
was mapped un-encrypted and after freeing the boot services the
mappings were somehow reused as un-encrypted for DMA which assumes
(unless using swiotlb) encrypted. This resulted in DMA data being
transferred in as encrypted and then accessed un-encrypted.

Thanks,
Tom

> 
>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>> index 3a69ed5..25010c7 100644
>> --- a/drivers/firmware/efi/efi.c
>> +++ b/drivers/firmware/efi/efi.c
>> @@ -76,6 +76,16 @@ static int __init parse_efi_cmdline(char *str)
>>  }
>>  early_param("efi", parse_efi_cmdline);
>>  
>> +/*
>> + * If memory encryption is supported, then an override to this function
>> + * will be provided.
>> + */
>> +void __weak __init *efi_me_early_memremap(resource_size_t phys_addr,
>> +					  unsigned long size)
>> +{
>> +	return early_memremap(phys_addr, size);
>> +}
>> +
>>  struct kobject *efi_kobj;
>>  
>>  /*
> 
> Like I said in my other mail, I'd much prefer to see this buried in
> arch/x86 by passing a flag to early_memremap() which can be parsed in
> arch directories.
> 
Matt Fleming June 13, 2016, 12:03 p.m. UTC | #12
On Thu, 09 Jun, at 11:16:40AM, Tom Lendacky wrote:
> 
> So maybe something along the lines of an enum that would have entries
> (initially) like KERNEL_DATA (equal to zero) and EFI_DATA. Others could
> be added later as needed.
 
Sure, that works for me, though maybe BOOT_DATA would be more
applicable considering the devicetree case too.

> Would you then want to allow the protection attributes to be updated
> by architecture specific code through something like a __weak function?
> In the x86 case I can add this function as a non-SME specific function
> that would initially just have the SME-related mask modification in it.

Would we need a new function? Couldn't we just have a new
FIXMAP_PAGE_* constant? e.g. would something like this work?

---

enum memremap_owner {
	KERNEL_DATA = 0,
	BOOT_DATA,
};

void __init *
early_memremap(resource_size_t phys_addr, unsigned long size,
	       enum memremap_owner owner)
{
	pgprot_t prot;

	switch (owner) {
	case BOOT_DATA:
		prot = FIXMAP_PAGE_BOOT;
		break;
	case KERNEL_DATA:	/* FALLTHROUGH */
	default:
		prot = FIXMAP_PAGE_NORMAL;
		break;
	}

	return (__force void *)__early_ioremap(phys_addr, size, prot);
}
Matt Fleming June 13, 2016, 12:34 p.m. UTC | #13
On Mon, 13 Jun, at 01:03:22PM, Matt Fleming wrote:
> 
> Would we need a new function? Couldn't we just have a new
> FIXMAP_PAGE_* constant? e.g. would something like this work?
> 
> ---
> 
> enum memremap_owner {
> 	KERNEL_DATA = 0,
> 	BOOT_DATA,
> };
> 
> void __init *
> early_memremap(resource_size_t phys_addr, unsigned long size,
> 	       enum memremap_owner owner)
> {
> 	pgprot_t prot;
> 
> 	switch (owner) {
> 	case BOOT_DATA:
> 		prot = FIXMAP_PAGE_BOOT;
> 		break;
> 	case KERNEL_DATA:	/* FALLTHROUGH */
> 	default:
> 		prot = FIXMAP_PAGE_NORMAL;
> 		break;
> 	}
> 
> 	return (__force void *)__early_ioremap(phys_addr, size, prot);
> }

Although it occurs to me that if there's a trivial 1:1 mapping between
memremap_owner and FIXMAP_PAGE_* we might as well just add a new
early_memremap_boot() that uses the correct FIXMAP_PAGE_* constant,
akin to early_memremap_ro().
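
To make the early_memremap_boot() idea concrete, here is a minimal
user-space sketch. It is not kernel code: the types, the sketch_* names,
the protection values and the bit-47 encryption position are all
assumptions made for illustration, and FIXMAP_PAGE_BOOT is the
hypothetical constant discussed above. The helpers only record the
requested protection instead of touching page tables.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for kernel types; every value here is illustrative. */
typedef uint64_t resource_size_t;
typedef struct { uint64_t val; } pgprot_t;

#define SKETCH_PAGE_ENC    (1ULL << 47)	/* assumed encryption-bit position */
#define SKETCH_PAGE_KERNEL 0x163ULL	/* placeholder base protection bits */

/* Normal kernel data is mapped encrypted; boot data is mapped in the clear. */
#define FIXMAP_PAGE_NORMAL ((pgprot_t){ SKETCH_PAGE_KERNEL | SKETCH_PAGE_ENC })
#define FIXMAP_PAGE_BOOT   ((pgprot_t){ SKETCH_PAGE_KERNEL })	/* hypothetical */

/* Records what a mapping request asked for instead of editing page tables. */
struct sketch_mapping {
	resource_size_t phys_addr;
	unsigned long size;
	pgprot_t prot;
};

static inline struct sketch_mapping
sketch_early_ioremap(resource_size_t phys_addr, unsigned long size,
		     pgprot_t prot)
{
	struct sketch_mapping m = { phys_addr, size, prot };

	assert(size > 0);	/* a zero-length mapping makes no sense here */
	return m;
}

/* Existing behaviour: kernel-owned data, encrypted mapping. */
static inline struct sketch_mapping
sketch_early_memremap(resource_size_t phys_addr, unsigned long size)
{
	return sketch_early_ioremap(phys_addr, size, FIXMAP_PAGE_NORMAL);
}

/* Proposed sibling, in the style of early_memremap_ro(): boot data. */
static inline struct sketch_mapping
sketch_early_memremap_boot(resource_size_t phys_addr, unsigned long size)
{
	return sketch_early_ioremap(phys_addr, size, FIXMAP_PAGE_BOOT);
}
```

With a dedicated wrapper the callers state ownership by the function
they call, so no enum argument has to be threaded through every
existing early_memremap() user.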
Matt Fleming June 13, 2016, 1:51 p.m. UTC | #14
On Thu, 09 Jun, at 01:33:30PM, Tom Lendacky wrote:
> 
> I was trying to play it safe here, but as you say, the firmware should
> be using our page tables so we can get rid of this call. The problem
> will actually be if we transition to a 32-bit efi. The encryption bit
> will be lost in cr3 and so the pgd table will have to be un-encrypted.
> The entries in the pgd can have the encryption bit set so I would only
> need to worry about the pgd itself. I'll have to update the
> efi_alloc_page_tables routine.
 
Interesting, I hadn't expected 32-bit EFI to be an option for
platforms with the SME technology. I'd assumed we could just ignore
that.

Are you saying that the encryption bit isn't supported in 32-bit
compatibility mode? We don't do a "full" switch to 32-bit protected
mode when in mixed mode, just load a 32-bit code segment descriptor.
The page tables are not modified at all.

> The encryption bit in the cr3 register will indicate if the pgd table
> is encrypted or not. Based on my comment above about the pgd having
> to be un-encrypted in case we have to transition to 32-bit efi, this
> can be removed.
 
I'm not (yet) sure that the pgd needs to be unencrypted for 32-bit EFI
when running a 64-bit kernel. In the AMD Programmer's Manual, Section
7.10.3 Operating Modes seems to indicate that running encrypted should
work fine.

> I'll look into this a bit more. From looking at it I don't want the
> _PAGE_ENC bit set for the memmap unless it gets re-allocated (which
> I missed in these patches). Let me see what I can do with this.
 
I don't understand your comment about re-allocating the memmap.

The kernel builds its own EFI memory map at runtime, initially based
on the memory map provided by the firmware. We always allocate a new
memory map.

In efi_setup_page_tables() we're building our own page tables, which
should be encrypted, and mapping EFI regions described by the memmap
into those page tables.

So unless we're mapping an MMIO region (in which case _PAGE_PCD is set
in @flags for kernel_map_pages_in_pgd()) I would expect _PAGE_ENC to
be set.
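
The expectation above can be sketched as a small flag-selection helper.
This is a user-space illustration only: sketch_efi_region_flags() and
its is_mmio parameter are hypothetical, and the encryption bit is
assumed to sit at bit 47 purely for the example. The RW and PCD
positions are the usual x86 PTE bits 1 and 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_RW  (1ULL << 1)	/* x86 PTE read/write bit */
#define SKETCH_PAGE_PCD (1ULL << 4)	/* x86 PTE cache-disable bit */
#define SKETCH_PAGE_ENC (1ULL << 47)	/* assumed encryption-bit position */

/*
 * Pick the flags for one EFI region: MMIO is mapped uncached and in the
 * clear, everything else is ordinary RAM and is mapped encrypted.
 */
static inline uint64_t sketch_efi_region_flags(bool is_mmio)
{
	uint64_t flags = SKETCH_PAGE_RW;

	if (is_mmio)
		flags |= SKETCH_PAGE_PCD;
	else
		flags |= SKETCH_PAGE_ENC;

	/* The two attributes are never combined in this sketch. */
	assert(!((flags & SKETCH_PAGE_PCD) && (flags & SKETCH_PAGE_ENC)));
	return flags;
}
```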

> I'll look further into this, but I saw that this area of virtual memory
> was mapped un-encrypted and after freeing the boot services the
> mappings were somehow reused as un-encrypted for DMA which assumes
> (unless using swiotlb) encrypted. This resulted in DMA data being
> transferred in as encrypted and then accessed un-encrypted.

That the mappings were re-used isn't a surprise.

efi_free_boot_services() lifts the reservation that was put in place
during efi_reserve_boot_services() and releases the pages to the
kernel's memory allocators.

What is surprising is that they were marked unencrypted at all.
There's nothing special about these pages as far as the __va() region
is concerned.
Tom Lendacky June 13, 2016, 3:16 p.m. UTC | #15
On 06/13/2016 07:03 AM, Matt Fleming wrote:
> On Thu, 09 Jun, at 11:16:40AM, Tom Lendacky wrote:
>>
>> So maybe something along the lines of an enum that would have entries
>> (initially) like KERNEL_DATA (equal to zero) and EFI_DATA. Others could
>> be added later as needed.
>  
> Sure, that works for me, though maybe BOOT_DATA would be more
> applicable considering the devicetree case too.
> 
>> Would you then want to allow the protection attributes to be updated
>> by architecture specific code through something like a __weak function?
>> In the x86 case I can add this function as a non-SME specific function
>> that would initially just have the SME-related mask modification in it.
> 
> Would we need a new function? Couldn't we just have a new
> FIXMAP_PAGE_* constant? e.g. would something like this work?

Looking forward to the virtualization support (SEV), the VM will be
completely encrypted from the time it is started. In this case all of
the UEFI data will be encrypted and I would need to ensure that the
mapping reflects that. When I do the SEV patches, I can change the
FIXMAP #define to add some logic to return a value, so I think the
FIXMAP_PAGE_ idea can work.
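
As a rough illustration of "add some logic to return a value", the
constant could expand to a function call. This is a user-space sketch
under assumptions: sketch_sev_active, sketch_fixmap_page_boot(), the
placeholder protection bits and the bit-47 encryption position are all
made up for the example, and FIXMAP_PAGE_BOOT is still the hypothetical
name from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_ENC    (1ULL << 47)	/* assumed encryption-bit position */
#define SKETCH_PAGE_KERNEL 0x163ULL	/* placeholder base protection bits */

/* Hypothetical state: true when running as a fully encrypted guest. */
static bool sketch_sev_active;

/*
 * Bare-metal SME: firmware wrote the boot data in the clear, so map it
 * without the encryption bit. Encrypted guest: the boot data is
 * encrypted like everything else, so keep the bit set.
 */
static inline uint64_t sketch_fixmap_page_boot(void)
{
	uint64_t prot = SKETCH_PAGE_KERNEL;

	if (sketch_sev_active)
		prot |= SKETCH_PAGE_ENC;

	assert((prot & SKETCH_PAGE_KERNEL) == SKETCH_PAGE_KERNEL);
	return prot;
}

/* Callers keep using a constant-looking name. */
#define FIXMAP_PAGE_BOOT sketch_fixmap_page_boot()
```

The point of the macro is that the generic early-remap code would not
need to know which case it is in.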

Thanks,
Tom

> 
> ---
> 
> enum memremap_owner {
> 	KERNEL_DATA = 0,
> 	BOOT_DATA,
> };
> 
> void __init *
> early_memremap(resource_size_t phys_addr, unsigned long size,
> 	       enum memremap_owner owner)
> {
> 	pgprot_t prot;
> 
> 	switch (owner) {
> 	case BOOT_DATA:
> 		prot = FIXMAP_PAGE_BOOT;
> 		break;
> 	case KERNEL_DATA:	/* FALLTHROUGH */
> 	default:
> 		prot = FIXMAP_PAGE_NORMAL;
> 		break;
> 	}
> 
> 	return (__force void *)__early_ioremap(phys_addr, size, prot);
> }
> 
Tom Lendacky June 15, 2016, 1:17 p.m. UTC | #16
On 06/13/2016 08:51 AM, Matt Fleming wrote:
> On Thu, 09 Jun, at 01:33:30PM, Tom Lendacky wrote:
>>
>> I was trying to play it safe here, but as you say, the firmware should
>> be using our page tables so we can get rid of this call. The problem
>> will actually be if we transition to a 32-bit efi. The encryption bit
>> will be lost in cr3 and so the pgd table will have to be un-encrypted.
>> The entries in the pgd can have the encryption bit set so I would only
>> need to worry about the pgd itself. I'll have to update the
>> efi_alloc_page_tables routine.
>  
> Interesting, I hadn't expected 32-bit EFI to be an option for
> platforms with the SME technology. I'd assumed we could just ignore
> that.

We may be able to do that.

> 
> Are you saying that the encryption bit isn't supported in 32-bit
> compatibility mode? We don't do a "full" switch to 32-bit protected
> mode when in mixed mode, just load a 32-bit code segment descriptor.
> The page tables are not modified at all.

The encryption bit is supported in 32-bit compatibility mode and since
we're not doing the "full" switch the cr3 register will remain as a
64-bit register so we can leave the pgd table encrypted.
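
A small user-space illustration of why the width of cr3 matters. It
assumes the encryption bit is physical address bit 47, which is only an
example value (the real position is reported by the CPU), and the
sketch_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ENC_BIT  47			/* assumed position */
#define SKETCH_ENC_MASK (1ULL << SKETCH_ENC_BIT)

/* cr3 value for a pgd that is itself stored encrypted. */
static inline uint64_t sketch_cr3_encrypted(uint64_t pgd_phys)
{
	assert((pgd_phys & SKETCH_ENC_MASK) == 0);	/* plain address in */
	return pgd_phys | SKETCH_ENC_MASK;
}

/* What would survive if cr3 were only 32 bits wide. */
static inline uint64_t sketch_cr3_as_32bit(uint64_t cr3)
{
	return (uint32_t)cr3;
}

static inline bool sketch_cr3_has_enc(uint64_t cr3)
{
	return (cr3 & SKETCH_ENC_MASK) != 0;
}
```

Since mixed mode only loads a 32-bit code segment and leaves cr3 at its
full width, the truncation case does not arise and the pgd can stay
encrypted.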

> 
>> The encryption bit in the cr3 register will indicate if the pgd table
>> is encrypted or not. Based on my comment above about the pgd having
>> to be un-encrypted in case we have to transition to 32-bit efi, this
>> can be removed.
>  
> I'm not (yet) sure that the pgd needs to be unencrypted for 32-bit EFI
> when running a 64-bit kernel. In the AMD Programmer's Manual, Section
> 7.10.3 Operating Modes seems to indicate that running encrypted should
> work fine.
> 
>> I'll look into this a bit more. From looking at it I don't want the
>> _PAGE_ENC bit set for the memmap unless it gets re-allocated (which
>> I missed in these patches). Let me see what I can do with this.
>  
> I don't understand your comment about re-allocating the memmap.
> 
> The kernel builds its own EFI memory map at runtime, initially based
> on the memory map provided by the firmware. We always allocate a new
> memory map.

Sorry, I mis-interpreted the efi_map_regions function/loop and see
that the memmap is always allocated by the kernel.

> 
> In efi_setup_page_tables() we're building our own page tables, which
> should be encrypted, and mapping EFI regions described by the memmap
> into those page tables.
> 
> So unless we're mapping an MMIO region (in which case _PAGE_PCD is set
> in @flags for kernel_map_pages_in_pgd()) I would expect _PAGE_ENC to
> be set.
> 
>> I'll look further into this, but I saw that this area of virtual memory
>> was mapped un-encrypted and after freeing the boot services the
>> mappings were somehow reused as un-encrypted for DMA which assumes
>> (unless using swiotlb) encrypted. This resulted in DMA data being
>> transferred in as encrypted and then accessed un-encrypted.
> 
> That the mappings were re-used isn't a surprise.
> 
> efi_free_boot_services() lifts the reservation that was put in place
> during efi_reserve_boot_services() and releases the pages to the
> kernel's memory allocators.
> 
> What is surprising is that they were marked unencrypted at all.
> There's nothing special about these pages as far as the __va() region
> is concerned.

Right, let me keep looking into this to see if I can pin down what
was (or is) happening.

Thanks,
Tom

> 
Tom Lendacky June 16, 2016, 2:38 p.m. UTC | #17
On 06/15/2016 08:17 AM, Tom Lendacky wrote:
> On 06/13/2016 08:51 AM, Matt Fleming wrote:
>> On Thu, 09 Jun, at 01:33:30PM, Tom Lendacky wrote:
>>>

[...]

>>
>>> I'll look further into this, but I saw that this area of virtual memory
>>> was mapped un-encrypted and after freeing the boot services the
>>> mappings were somehow reused as un-encrypted for DMA which assumes
>>> (unless using swiotlb) encrypted. This resulted in DMA data being
>>> transferred in as encrypted and then accessed un-encrypted.
>>
>> That the mappings were re-used isn't a surprise.
>>
>> efi_free_boot_services() lifts the reservation that was put in place
>> during efi_reserve_boot_services() and releases the pages to the
>> kernel's memory allocators.
>>
>> What is surprising is that they were marked unencrypted at all.
>> There's nothing special about these pages as far as the __va() region
>> is concerned.
> 
> Right, let me keep looking into this to see if I can pin down what
> was (or is) happening.

Ok, I think this was happening before the commit to build our own
EFI page table structures:

commit 67a9108ed ("x86/efi: Build our own page table structures")

Before this commit the boot services ended up mapped into the kernel
page table entries as un-encrypted during efi_map_regions() and I needed
to change those entries back to encrypted. With your change above,
this appears to no longer be needed.
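
For reference, "changing those entries back to encrypted" comes down to
a set-mask/clear-mask update on each page table entry, which is how
set_memory_enc() and set_memory_dec() in this patch fill in mask_set and
mask_clr. The following is a user-space sketch of that bit arithmetic
only; the sketch_* names and the bit-47 encryption position are
assumptions, and no TLB or cache flushing is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_ENC (1ULL << 47)	/* assumed encryption-bit position */

/* Clear the bits in mask_clr, then set the bits in mask_set. */
static inline uint64_t sketch_apply_masks(uint64_t pte, uint64_t mask_set,
					  uint64_t mask_clr)
{
	assert((mask_set & mask_clr) == 0);	/* masks must not overlap */
	return (pte & ~mask_clr) | mask_set;
}

/* Mark an entry encrypted: mask_set = ENC, mask_clr = 0. */
static inline uint64_t sketch_pte_enc(uint64_t pte)
{
	return sketch_apply_masks(pte, SKETCH_PAGE_ENC, 0);
}

/* Mark an entry un-encrypted: mask_set = 0, mask_clr = ENC. */
static inline uint64_t sketch_pte_dec(uint64_t pte)
{
	return sketch_apply_masks(pte, 0, SKETCH_PAGE_ENC);
}
```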

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
Matt Fleming June 17, 2016, 3:51 p.m. UTC | #18
On Thu, 16 Jun, at 09:38:31AM, Tom Lendacky wrote:
> 
> Ok, I think this was happening before the commit to build our own
> EFI page table structures:
> 
> commit 67a9108ed ("x86/efi: Build our own page table structures")
> 
> Before this commit the boot services ended up mapped into the kernel
> page table entries as un-encrypted during efi_map_regions() and I needed
> to change those entries back to encrypted. With your change above,
> this appears to no longer be needed.

Great news! Things are as they should be ;)
Patch

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 61518cf..bfb08e5 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -13,6 +13,7 @@ 
  * Executability : eXeutable, NoteXecutable
  * Read/Write    : ReadOnly, ReadWrite
  * Presence      : NotPresent
+ * Encryption    : ENCrypted, DECrypted
  *
  * Within a category, the attributes are mutually exclusive.
  *
@@ -48,6 +49,8 @@  int set_memory_ro(unsigned long addr, int numpages);
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_enc(unsigned long addr, int numpages);
+int set_memory_dec(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 2785493..42868f5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -23,13 +23,23 @@  extern unsigned long sme_me_mask;
 
 u8 sme_get_me_loss(void);
 
+int sme_set_mem_enc(void *vaddr, unsigned long size);
+int sme_set_mem_dec(void *vaddr, unsigned long size);
+
 void __init sme_early_mem_enc(resource_size_t paddr,
 			      unsigned long size);
 void __init sme_early_mem_dec(resource_size_t paddr,
 			      unsigned long size);
 
+void __init *sme_early_memremap(resource_size_t paddr,
+				unsigned long size);
+
 void __init sme_early_init(void);
 
+/* Architecture __weak replacement functions */
+void __init *efi_me_early_memremap(resource_size_t paddr,
+				   unsigned long size);
+
 #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
 #define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
 
@@ -44,6 +54,16 @@  static inline u8 sme_get_me_loss(void)
 	return 0;
 }
 
+static inline int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+	return 0;
+}
+
+static inline int sme_set_mem_dec(void *vaddr, unsigned long size)
+{
+	return 0;
+}
+
 static inline void __init sme_early_mem_enc(resource_size_t paddr,
 					    unsigned long size)
 {
@@ -63,6 +83,8 @@  static inline void __init sme_early_init(void)
 
 #define __sme_va		__va
 
+#define sme_early_memremap	early_memremap
+
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif	/* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1d29cf9..2e460fb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -424,7 +424,7 @@  static void __init parse_setup_data(void)
 	while (pa_data) {
 		u32 data_len, data_type;
 
-		data = early_memremap(pa_data, sizeof(*data));
+		data = sme_early_memremap(pa_data, sizeof(*data));
 		data_len = data->len + sizeof(struct setup_data);
 		data_type = data->type;
 		pa_next = data->next;
@@ -457,7 +457,7 @@  static void __init e820_reserve_setup_data(void)
 		return;
 
 	while (pa_data) {
-		data = early_memremap(pa_data, sizeof(*data));
+		data = sme_early_memremap(pa_data, sizeof(*data));
 		e820_update_range(pa_data, sizeof(*data)+data->len,
 			 E820_RAM, E820_RESERVED_KERN);
 		pa_data = data->next;
@@ -477,7 +477,7 @@  static void __init memblock_x86_reserve_range_setup_data(void)
 
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
-		data = early_memremap(pa_data, sizeof(*data));
+		data = sme_early_memremap(pa_data, sizeof(*data));
 		memblock_reserve(pa_data, sizeof(*data) + data->len);
 		pa_data = data->next;
 		early_memunmap(data, sizeof(*data));
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5f19ede..7d56d1b 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,12 +14,55 @@ 
 #include <linux/mm.h>
 
 #include <asm/mem_encrypt.h>
+#include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
 
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char me_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
+int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+	unsigned long addr, numpages;
+
+	if (!sme_me_mask)
+		return 0;
+
+	addr = (unsigned long)vaddr & PAGE_MASK;
+	numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	/*
+	 * The set_memory_xxx functions take an integer for numpages, make
+	 * sure it doesn't exceed that.
+	 */
+	if (numpages > INT_MAX)
+		return -EINVAL;
+
+	return set_memory_enc(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_enc);
+
+int sme_set_mem_dec(void *vaddr, unsigned long size)
+{
+	unsigned long addr, numpages;
+
+	if (!sme_me_mask)
+		return 0;
+
+	addr = (unsigned long)vaddr & PAGE_MASK;
+	numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	/*
+	 * The set_memory_xxx functions take an integer for numpages, make
+	 * sure it doesn't exceed that.
+	 */
+	if (numpages > INT_MAX)
+		return -EINVAL;
+
+	return set_memory_dec(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_dec);
+
 void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
 {
 	void *src, *dst;
@@ -104,6 +147,12 @@  void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
 	}
 }
 
+void __init *sme_early_memremap(resource_size_t paddr,
+				unsigned long size)
+{
+	return early_memremap_dec(paddr, size);
+}
+
 void __init sme_early_init(void)
 {
 	unsigned int i;
@@ -117,3 +166,10 @@  void __init sme_early_init(void)
 	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
 		protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
 }
+
+/* Architecture __weak replacement functions */
+void __init *efi_me_early_memremap(resource_size_t paddr,
+				   unsigned long size)
+{
+	return sme_early_memremap(paddr, size);
+}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c055302..0384fb3 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1731,6 +1731,81 @@  int set_memory_4k(unsigned long addr, int numpages)
 					__pgprot(0), 1, 0, NULL);
 }
 
+static int __set_memory_enc_dec(struct cpa_data *cpa)
+{
+	unsigned long addr;
+	int numpages;
+	int ret;
+
+	if (*cpa->vaddr & ~PAGE_MASK) {
+		*cpa->vaddr &= PAGE_MASK;
+
+		/* People should not be passing in unaligned addresses */
+		WARN_ON_ONCE(1);
+	}
+
+	addr = *cpa->vaddr;
+	numpages = cpa->numpages;
+
+	/* Must avoid aliasing mappings in the highmem code */
+	kmap_flush_unused();
+	vm_unmap_aliases();
+
+	ret = __change_page_attr_set_clr(cpa, 1);
+
+	/* Check whether we really changed something */
+	if (!(cpa->flags & CPA_FLUSHTLB))
+		goto out;
+
+	/*
+	 * On success we use CLFLUSH, when the CPU supports it to
+	 * avoid the WBINVD.
+	 */
+	if (!ret && static_cpu_has(X86_FEATURE_CLFLUSH))
+		cpa_flush_range(addr, numpages, 1);
+	else
+		cpa_flush_all(1);
+
+out:
+	return ret;
+}
+
+int set_memory_enc(unsigned long addr, int numpages)
+{
+	struct cpa_data cpa;
+
+	if (!sme_me_mask)
+		return 0;
+
+	memset(&cpa, 0, sizeof(cpa));
+	cpa.vaddr = &addr;
+	cpa.numpages = numpages;
+	cpa.mask_set = __pgprot(_PAGE_ENC);
+	cpa.mask_clr = __pgprot(0);
+	cpa.pgd = init_mm.pgd;
+
+	return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_enc);
+
+int set_memory_dec(unsigned long addr, int numpages)
+{
+	struct cpa_data cpa;
+
+	if (!sme_me_mask)
+		return 0;
+
+	memset(&cpa, 0, sizeof(cpa));
+	cpa.vaddr = &addr;
+	cpa.numpages = numpages;
+	cpa.mask_set = __pgprot(0);
+	cpa.mask_clr = __pgprot(_PAGE_ENC);
+	cpa.pgd = init_mm.pgd;
+
+	return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_dec);
+
 int set_pages_uc(struct page *page, int numpages)
 {
 	unsigned long addr = (unsigned long)page_address(page);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 994a7df8..871b213 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -53,6 +53,7 @@ 
 #include <asm/x86_init.h>
 #include <asm/rtc.h>
 #include <asm/uv/uv.h>
+#include <asm/mem_encrypt.h>
 
 #define EFI_DEBUG
 
@@ -261,12 +262,12 @@  static int __init efi_systab_init(void *phys)
 		u64 tmp = 0;
 
 		if (efi_setup) {
-			data = early_memremap(efi_setup, sizeof(*data));
+			data = sme_early_memremap(efi_setup, sizeof(*data));
 			if (!data)
 				return -ENOMEM;
 		}
-		systab64 = early_memremap((unsigned long)phys,
-					 sizeof(*systab64));
+		systab64 = sme_early_memremap((unsigned long)phys,
+					      sizeof(*systab64));
 		if (systab64 == NULL) {
 			pr_err("Couldn't map the system table!\n");
 			if (data)
@@ -314,8 +315,8 @@  static int __init efi_systab_init(void *phys)
 	} else {
 		efi_system_table_32_t *systab32;
 
-		systab32 = early_memremap((unsigned long)phys,
-					 sizeof(*systab32));
+		systab32 = sme_early_memremap((unsigned long)phys,
+					      sizeof(*systab32));
 		if (systab32 == NULL) {
 			pr_err("Couldn't map the system table!\n");
 			return -ENOMEM;
@@ -361,8 +362,8 @@  static int __init efi_runtime_init32(void)
 {
 	efi_runtime_services_32_t *runtime;
 
-	runtime = early_memremap((unsigned long)efi.systab->runtime,
-			sizeof(efi_runtime_services_32_t));
+	runtime = sme_early_memremap((unsigned long)efi.systab->runtime,
+				     sizeof(efi_runtime_services_32_t));
 	if (!runtime) {
 		pr_err("Could not map the runtime service table!\n");
 		return -ENOMEM;
@@ -385,8 +386,8 @@  static int __init efi_runtime_init64(void)
 {
 	efi_runtime_services_64_t *runtime;
 
-	runtime = early_memremap((unsigned long)efi.systab->runtime,
-			sizeof(efi_runtime_services_64_t));
+	runtime = sme_early_memremap((unsigned long)efi.systab->runtime,
+				     sizeof(efi_runtime_services_64_t));
 	if (!runtime) {
 		pr_err("Could not map the runtime service table!\n");
 		return -ENOMEM;
@@ -444,8 +445,8 @@  static int __init efi_memmap_init(void)
 		return 0;
 
 	/* Map the EFI memory map */
-	memmap.map = early_memremap((unsigned long)memmap.phys_map,
-				   memmap.nr_map * memmap.desc_size);
+	memmap.map = sme_early_memremap((unsigned long)memmap.phys_map,
+					memmap.nr_map * memmap.desc_size);
 	if (memmap.map == NULL) {
 		pr_err("Could not map the memory map!\n");
 		return -ENOMEM;
@@ -490,7 +491,7 @@  void __init efi_init(void)
 	/*
 	 * Show what we know for posterity
 	 */
-	c16 = tmp = early_memremap(efi.systab->fw_vendor, 2);
+	c16 = tmp = sme_early_memremap(efi.systab->fw_vendor, 2);
 	if (c16) {
 		for (i = 0; i < sizeof(vendor) - 1 && *c16; ++i)
 			vendor[i] = *c16++;
@@ -690,6 +691,7 @@  static void *realloc_pages(void *old_memmap, int old_shift)
 	ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
 	if (!ret)
 		goto out;
+	sme_set_mem_dec(ret, PAGE_SIZE << (old_shift + 1));
 
 	/*
 	 * A first-time allocation doesn't have anything to copy.
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 49e4dd4..834a992 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -223,7 +223,7 @@  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	if (efi_enabled(EFI_OLD_MEMMAP))
 		return 0;
 
-	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
 	pgd = efi_pgd;
 
 	/*
@@ -262,7 +262,8 @@  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 		pfn = md->phys_addr >> PAGE_SHIFT;
 		npages = md->num_pages;
 
-		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages, _PAGE_RW)) {
+		if (kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, npages,
+					    _PAGE_RW | _PAGE_ENC)) {
 			pr_err("Failed to map 1:1 memory\n");
 			return 1;
 		}
@@ -272,6 +273,7 @@  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	if (!page)
 		panic("Unable to allocate EFI runtime stack < 4GB\n");
 
+	sme_set_mem_dec(page_address(page), PAGE_SIZE);
 	efi_scratch.phys_stack = virt_to_phys(page_address(page));
 	efi_scratch.phys_stack += PAGE_SIZE; /* stack grows down */
 
@@ -279,7 +281,8 @@  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	text = __pa(_text);
 	pfn = text >> PAGE_SHIFT;
 
-	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages,
+				    _PAGE_RW | _PAGE_ENC)) {
 		pr_err("Failed to map kernel text 1:1\n");
 		return 1;
 	}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index ab50ada..dde4fb6b 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -13,6 +13,7 @@ 
 #include <linux/dmi.h>
 #include <asm/efi.h>
 #include <asm/uv/uv.h>
+#include <asm/mem_encrypt.h>
 
 #define EFI_MIN_RESERVE 5120
 
@@ -265,6 +266,13 @@  void __init efi_free_boot_services(void)
 		if (md->attribute & EFI_MEMORY_RUNTIME)
 			continue;
 
+		/*
+		 * Change the mapping to encrypted memory before freeing.
+		 * This ensures any future allocations of this mapped area
+		 * are used encrypted.
+		 */
+		sme_set_mem_enc(__va(start), size);
+
 		free_bootmem_late(start, size);
 	}
 
@@ -292,7 +300,7 @@  int __init efi_reuse_config(u64 tables, int nr_tables)
 	if (!efi_enabled(EFI_64BIT))
 		return 0;
 
-	data = early_memremap(efi_setup, sizeof(*data));
+	data = sme_early_memremap(efi_setup, sizeof(*data));
 	if (!data) {
 		ret = -ENOMEM;
 		goto out;
@@ -303,7 +311,7 @@  int __init efi_reuse_config(u64 tables, int nr_tables)
 
 	sz = sizeof(efi_config_table_64_t);
 
-	p = tablep = early_memremap(tables, nr_tables * sz);
+	p = tablep = sme_early_memremap(tables, nr_tables * sz);
 	if (!p) {
 		pr_err("Could not map Configuration table!\n");
 		ret = -ENOMEM;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3a69ed5..25010c7 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -76,6 +76,16 @@  static int __init parse_efi_cmdline(char *str)
 }
 early_param("efi", parse_efi_cmdline);
 
+/*
+ * If memory encryption is supported, then an override to this function
+ * will be provided.
+ */
+void __weak __init *efi_me_early_memremap(resource_size_t phys_addr,
+					  unsigned long size)
+{
+	return early_memremap(phys_addr, size);
+}
+
 struct kobject *efi_kobj;
 
 /*
@@ -289,9 +299,9 @@  int __init efi_mem_desc_lookup(u64 phys_addr, efi_memory_desc_t *out_md)
 		 * So just always get our own virtual map on the CPU.
 		 *
 		 */
-		md = early_memremap(p, sizeof (*md));
+		md = efi_me_early_memremap(p, sizeof (*md));
 		if (!md) {
-			pr_err_once("early_memremap(%pa, %zu) failed.\n",
+			pr_err_once("efi_me_early_memremap(%pa, %zu) failed.\n",
 				    &p, sizeof (*md));
 			return -ENOMEM;
 		}
@@ -431,8 +441,8 @@  int __init efi_config_init(efi_config_table_type_t *arch_tables)
 	/*
 	 * Let's see what config tables the firmware passed to us.
 	 */
-	config_tables = early_memremap(efi.systab->tables,
-				       efi.systab->nr_tables * sz);
+	config_tables = efi_me_early_memremap(efi.systab->tables,
+					      efi.systab->nr_tables * sz);
 	if (config_tables == NULL) {
 		pr_err("Could not map Configuration table!\n");
 		return -ENOMEM;
diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
index 75feb3f..7a96bc6 100644
--- a/drivers/firmware/efi/esrt.c
+++ b/drivers/firmware/efi/esrt.c
@@ -273,10 +273,10 @@  void __init efi_esrt_init(void)
 		return;
 	}
 
-	va = early_memremap(efi.esrt, size);
+	va = efi_me_early_memremap(efi.esrt, size);
 	if (!va) {
-		pr_err("early_memremap(%p, %zu) failed.\n", (void *)efi.esrt,
-		       size);
+		pr_err("efi_me_early_memremap(%p, %zu) failed.\n",
+		       (void *)efi.esrt, size);
 		return;
 	}
 
@@ -323,10 +323,10 @@  void __init efi_esrt_init(void)
 	/* remap it with our (plausible) new pages */
 	early_memunmap(va, size);
 	size += entries_size;
-	va = early_memremap(efi.esrt, size);
+	va = efi_me_early_memremap(efi.esrt, size);
 	if (!va) {
-		pr_err("early_memremap(%p, %zu) failed.\n", (void *)efi.esrt,
-		       size);
+		pr_err("efi_me_early_memremap(%p, %zu) failed.\n",
+		       (void *)efi.esrt, size);
 		return;
 	}
 
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 1626474..557c774 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -957,6 +957,9 @@  extern void __init efi_fake_memmap(void);
 static inline void efi_fake_memmap(void) { }
 #endif
 
+extern void __weak __init *efi_me_early_memremap(resource_size_t phys_addr,
+						 unsigned long size);
+
 /* Iterate through an efi_memory_map */
 #define for_each_efi_memory_desc(m, md)					   \
 	for ((md) = (m)->map;						   \