[RFC,v3,13/20] x86: DMA support for memory encryption

Message ID 20161110003723.3280.62636.stgit@tlendack-t1.amdoffice.net (mailing list archive)
State New, archived

Commit Message

Tom Lendacky Nov. 10, 2016, 12:37 a.m. UTC
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/dma-mapping.h |    5 ++-
 arch/x86/include/asm/mem_encrypt.h |    5 +++
 arch/x86/kernel/pci-dma.c          |   11 ++++---
 arch/x86/kernel/pci-nommu.c        |    2 +
 arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
 arch/x86/mm/mem_encrypt.c          |   17 +++++++++++
 include/linux/swiotlb.h            |    1 +
 init/main.c                        |   13 ++++++++
 lib/swiotlb.c                      |   58 +++++++++++++++++++++++++++++++-----
 9 files changed, 103 insertions(+), 17 deletions(-)



Comments

Radim Krčmář Nov. 15, 2016, 2:39 p.m. UTC | #1
2016-11-09 18:37-0600, Tom Lendacky:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create un-encrypted bounce buffers for use by these devices.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
>  				 enum dma_data_direction dir,
>  				 unsigned long attrs)
>  {
> -	dma_addr_t bus = page_to_phys(page) + offset;
> +	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>  	WARN_ON(size == 0);
>  	if (!check_addr("map_single", dev, bus, size))
>  		return DMA_ERROR_CODE;
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> @@ -12,6 +12,8 @@
>  int swiotlb __read_mostly;
>  
>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
>   * This returns non-zero if we are forced to use swiotlb (by the boot
> - * option).
> + * option). If memory encryption is enabled then swiotlb will be set
> + * to 1 so that bounce buffers are allocated and used for devices that
> + * do not support the addressing range required for the encryption mask.
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
>  	int use_swiotlb = swiotlb | swiotlb_force;
>  
> -	if (swiotlb_force)
> +	if (swiotlb_force || sme_me_mask)
>  		swiotlb = 1;
>  
>  	return use_swiotlb;

We want to return 1 even if only sme_me_mask is 1, because the return
value is used for detection.  The following would be less obscure, IMO:

	if (swiotlb_force || sme_me_mask)
		swiotlb = 1;

	return swiotlb;

> diff --git a/init/main.c b/init/main.c
> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>  	 */
>  	locking_selftest();
>  
> +	/*
> +	 * This needs to be called before any devices perform DMA
> +	 * operations that might use the swiotlb bounce buffers.
> +	 * This call will mark the bounce buffers as un-encrypted so
> +	 * that their usage will not cause "plain-text" data to be
> +	 * decrypted when accessed.
> +	 */
> +	mem_encrypt_init();

(Comments below are connected to the reason why we call this.)

> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
> +/*
> + * If memory encryption is active, the DMA address for an encrypted page may
> + * be beyond the range of the device. If bounce buffers are required be sure
> + * that they are not on an encrypted page. This should be called before the
> + * iotlb area is used.
> + */
> +void __init swiotlb_clear_encryption(void)
> +{
> +	void *vaddr;
> +	unsigned long bytes;
> +
> +	if (no_iotlb_memory || !io_tlb_start || late_alloc)

io_tlb_start seems redundant -- when can !no_iotlb_memory &&
!io_tlb_start happen?

Is the order of calls
  1) swiotlb init
  2) SME init
  3) swiotlb late init 
?

We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?

> +		return;
> +
> +	vaddr = phys_to_virt(io_tlb_start);
> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +
> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
> +	bytes = PAGE_ALIGN(io_tlb_overflow);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +}
> +
> @@ -541,7 +583,7 @@ static phys_addr_t
>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>  	   enum dma_data_direction dir)
>  {
> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);

We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit?  (Which changes a lot ...)

Thanks.
Michael S. Tsirkin Nov. 15, 2016, 3:16 p.m. UTC | #2
On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create un-encrypted bounce buffers for use by these devices.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/dma-mapping.h |    5 ++-
>  arch/x86/include/asm/mem_encrypt.h |    5 +++
>  arch/x86/kernel/pci-dma.c          |   11 ++++---
>  arch/x86/kernel/pci-nommu.c        |    2 +
>  arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
>  arch/x86/mm/mem_encrypt.c          |   17 +++++++++++
>  include/linux/swiotlb.h            |    1 +
>  init/main.c                        |   13 ++++++++
>  lib/swiotlb.c                      |   58 +++++++++++++++++++++++++++++++-----
>  9 files changed, 103 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index 4446162..c9cdcae 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -12,6 +12,7 @@
>  #include <asm/io.h>
>  #include <asm/swiotlb.h>
>  #include <linux/dma-contiguous.h>
> +#include <asm/mem_encrypt.h>
>  
>  #ifdef CONFIG_ISA
>  # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
> @@ -69,12 +70,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
>  
>  static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
>  {
> -	return paddr;
> +	return paddr | sme_me_mask;
>  }
>  
>  static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
>  {
> -	return daddr;
> +	return daddr & ~sme_me_mask;
>  }
>  #endif /* CONFIG_X86_DMA_REMAP */
>  
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index d544481..a024451 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -35,6 +35,11 @@ void __init sme_encrypt_ramdisk(resource_size_t paddr,
>  
>  void __init sme_early_init(void);
>  
> +/* Architecture __weak replacement functions */
> +void __init mem_encrypt_init(void);
> +
> +void swiotlb_set_mem_unenc(void *vaddr, unsigned long size);
> +
>  #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
>  #define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
>  
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index d30c377..0ce28df 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -92,9 +92,12 @@ again:
>  	/* CMA can be used only in the context which permits sleeping */
>  	if (gfpflags_allow_blocking(flag)) {
>  		page = dma_alloc_from_contiguous(dev, count, get_order(size));
> -		if (page && page_to_phys(page) + size > dma_mask) {
> -			dma_release_from_contiguous(dev, page, count);
> -			page = NULL;
> +		if (page) {
> +			addr = phys_to_dma(dev, page_to_phys(page));
> +			if (addr + size > dma_mask) {
> +				dma_release_from_contiguous(dev, page, count);
> +				page = NULL;
> +			}
>  		}
>  	}
>  	/* fallback */
> @@ -103,7 +106,7 @@ again:
>  	if (!page)
>  		return NULL;
>  
> -	addr = page_to_phys(page);
> +	addr = phys_to_dma(dev, page_to_phys(page));
>  	if (addr + size > dma_mask) {
>  		__free_pages(page, get_order(size));
>  
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> index 00e71ce..922c10d 100644
> --- a/arch/x86/kernel/pci-nommu.c
> +++ b/arch/x86/kernel/pci-nommu.c
> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
>  				 enum dma_data_direction dir,
>  				 unsigned long attrs)
>  {
> -	dma_addr_t bus = page_to_phys(page) + offset;
> +	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>  	WARN_ON(size == 0);
>  	if (!check_addr("map_single", dev, bus, size))
>  		return DMA_ERROR_CODE;
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index b47edb8..34a9e524 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -12,6 +12,8 @@
>  #include <asm/dma.h>
>  #include <asm/xen/swiotlb-xen.h>
>  #include <asm/iommu_table.h>
> +#include <asm/mem_encrypt.h>
> +
>  int swiotlb __read_mostly;
>  
>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>   *
>   * This returns non-zero if we are forced to use swiotlb (by the boot
> - * option).
> + * option). If memory encryption is enabled then swiotlb will be set
> + * to 1 so that bounce buffers are allocated and used for devices that
> + * do not support the addressing range required for the encryption mask.
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
>  	int use_swiotlb = swiotlb | swiotlb_force;
>  
> -	if (swiotlb_force)
> +	if (swiotlb_force || sme_me_mask)
>  		swiotlb = 1;
>  
>  	return use_swiotlb;
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 41cfdf9..e351003 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -13,6 +13,8 @@
>  #include <linux/linkage.h>
>  #include <linux/init.h>
>  #include <linux/mm.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/swiotlb.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/fixmap.h>
> @@ -240,3 +242,18 @@ void __init sme_early_init(void)
>  	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
>  		protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
>  }
> +
> +/* Architecture __weak replacement functions */
> +void __init mem_encrypt_init(void)
> +{
> +	if (!sme_me_mask)
> +		return;
> +
> +	/* Make SWIOTLB use an unencrypted DMA area */
> +	swiotlb_clear_encryption();
> +}
> +
> +void swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
> +{
> +	sme_set_mem_unenc(vaddr, size);
> +}
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 5f81f8a..5c909fc 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -29,6 +29,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
>  unsigned long swiotlb_size_or_default(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> +extern void __init swiotlb_clear_encryption(void);
>  
>  /*
>   * Enumeration for sync targets
> diff --git a/init/main.c b/init/main.c
> index a8a58e2..ae37f0d 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -458,6 +458,10 @@ void __init __weak thread_stack_cache_init(void)
>  }
>  #endif
>  
> +void __init __weak mem_encrypt_init(void)
> +{
> +}
> +
>  /*
>   * Set up kernel memory allocators
>   */
> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>  	 */
>  	locking_selftest();
>  
> +	/*
> +	 * This needs to be called before any devices perform DMA
> +	 * operations that might use the swiotlb bounce buffers.
> +	 * This call will mark the bounce buffers as un-encrypted so
> +	 * that their usage will not cause "plain-text" data to be
> +	 * decrypted when accessed.
> +	 */
> +	mem_encrypt_init();
> +
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	if (initrd_start && !initrd_below_start_ok &&
>  	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 22e13a0..638e99c 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/gfp.h>
>  #include <linux/scatterlist.h>
> +#include <linux/mem_encrypt.h>
>  
>  #include <asm/io.h>
>  #include <asm/dma.h>
> @@ -131,6 +132,17 @@ unsigned long swiotlb_size_or_default(void)
>  	return size ? size : (IO_TLB_DEFAULT_SIZE);
>  }
>  
> +void __weak swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
> +{
> +}
> +
> +/* For swiotlb, clear memory encryption mask from dma addresses */
> +static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
> +				      phys_addr_t address)
> +{
> +	return phys_to_dma(hwdev, address) & ~sme_me_mask;
> +}
> +
>  /* Note that this doesn't work with highmem page */
>  static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
>  				      volatile void *address)
> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
>  	       bytes >> 20, vstart, vend - 1);
>  }
>  
> +/*
> + * If memory encryption is active, the DMA address for an encrypted page may
> + * be beyond the range of the device. If bounce buffers are required be sure
> + * that they are not on an encrypted page. This should be called before the
> + * iotlb area is used.

Makes sense, but I think at least a dmesg warning here
might be a good idea.

A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
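
E.g. a flag wired up with early_param() -- the name here is completely
made up, just to illustrate the plumbing:

	static bool sme_strict __ro_after_init;

	static int __init sme_strict_setup(char *str)
	{
		sme_strict = true;
		return 0;
	}
	early_param("sme_strict", sme_strict_setup);

The DMA setup path could then refuse (or at least loudly warn about)
devices that would have to fall back to unencrypted bounce buffers
whenever sme_strict is set.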


> + */
> +void __init swiotlb_clear_encryption(void)
> +{
> +	void *vaddr;
> +	unsigned long bytes;
> +
> +	if (no_iotlb_memory || !io_tlb_start || late_alloc)
> +		return;
> +
> +	vaddr = phys_to_virt(io_tlb_start);
> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +
> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
> +	bytes = PAGE_ALIGN(io_tlb_overflow);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +}
> +
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  {
>  	void *v_overflow_buffer;
> @@ -294,6 +331,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  	io_tlb_start = virt_to_phys(tlb);
>  	io_tlb_end = io_tlb_start + bytes;
>  
> +	/* Keep TLB in unencrypted memory if memory encryption is active */
> +	swiotlb_set_mem_unenc(tlb, bytes);
>  	memset(tlb, 0, bytes);
>  
>  	/*
> @@ -304,6 +343,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  	if (!v_overflow_buffer)
>  		goto cleanup2;
>  
> +	/* Keep overflow in unencrypted memory if memory encryption is active */
> +	swiotlb_set_mem_unenc(v_overflow_buffer, io_tlb_overflow);
> +	memset(v_overflow_buffer, 0, io_tlb_overflow);
>  	io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
>  
>  	/*
> @@ -541,7 +583,7 @@ static phys_addr_t
>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>  	   enum dma_data_direction dir)
>  {
> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>  
>  	return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size, dir);
>  }
> @@ -659,7 +701,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
>  			goto err_warn;
>  
>  		ret = phys_to_virt(paddr);
> -		dev_addr = phys_to_dma(hwdev, paddr);
> +		dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
>  
>  		/* Confirm address can be DMA'd by device */
>  		if (dev_addr + size - 1 > dma_mask) {
> @@ -758,15 +800,15 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
>  	map = map_single(dev, phys, size, dir);
>  	if (map == SWIOTLB_MAP_ERROR) {
>  		swiotlb_full(dev, size, dir, 1);
> -		return phys_to_dma(dev, io_tlb_overflow_buffer);
> +		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
>  	}
>  
> -	dev_addr = phys_to_dma(dev, map);
> +	dev_addr = swiotlb_phys_to_dma(dev, map);
>  
>  	/* Ensure that the address returned is DMA'ble */
>  	if (!dma_capable(dev, dev_addr, size)) {
>  		swiotlb_tbl_unmap_single(dev, map, size, dir);
> -		return phys_to_dma(dev, io_tlb_overflow_buffer);
> +		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
>  	}
>  
>  	return dev_addr;
> @@ -901,7 +943,7 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
>  				sg_dma_len(sgl) = 0;
>  				return 0;
>  			}
> -			sg->dma_address = phys_to_dma(hwdev, map);
> +			sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
>  		} else
>  			sg->dma_address = dev_addr;
>  		sg_dma_len(sg) = sg->length;
> @@ -985,7 +1027,7 @@ EXPORT_SYMBOL(swiotlb_sync_sg_for_device);
>  int
>  swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
>  {
> -	return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
> +	return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
>  }
>  EXPORT_SYMBOL(swiotlb_dma_mapping_error);
>  
> @@ -998,6 +1040,6 @@ EXPORT_SYMBOL(swiotlb_dma_mapping_error);
>  int
>  swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> -	return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
> +	return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
>  }
>  EXPORT_SYMBOL(swiotlb_dma_supported);
> 
Tom Lendacky Nov. 15, 2016, 5:02 p.m. UTC | #3
On 11/15/2016 8:39 AM, Radim Krčmář wrote:
> 2016-11-09 18:37-0600, Tom Lendacky:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create un-encrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
>> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
>>  				 enum dma_data_direction dir,
>>  				 unsigned long attrs)
>>  {
>> -	dma_addr_t bus = page_to_phys(page) + offset;
>> +	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
>>  	WARN_ON(size == 0);
>>  	if (!check_addr("map_single", dev, bus, size))
>>  		return DMA_ERROR_CODE;
>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>> @@ -12,6 +12,8 @@
>>  int swiotlb __read_mostly;
>>  
>>  void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>   *
>>   * This returns non-zero if we are forced to use swiotlb (by the boot
>> - * option).
>> + * option). If memory encryption is enabled then swiotlb will be set
>> + * to 1 so that bounce buffers are allocated and used for devices that
>> + * do not support the addressing range required for the encryption mask.
>>   */
>>  int __init pci_swiotlb_detect_override(void)
>>  {
>>  	int use_swiotlb = swiotlb | swiotlb_force;
>>  
>> -	if (swiotlb_force)
>> +	if (swiotlb_force || sme_me_mask)
>>  		swiotlb = 1;
>>  
>>  	return use_swiotlb;
> 
> We want to return 1 even if only sme_me_mask is 1, because the return
> value is used for detection.  The following would be less obscure, IMO:
> 
> 	if (swiotlb_force || sme_me_mask)
> 		swiotlb = 1;
> 
> 	return swiotlb;

If we do that then all DMA would go through the swiotlb bounce buffers.
By setting swiotlb to 1 we indicate that the bounce buffers will be
needed for those devices that can't support the addressing range when
the encryption bit is set (48 bit DMA). But if the device can support
the addressing range we won't use the bounce buffers.
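
A minimal sketch of where that per-mapping decision happens, hand-trimmed
from lib/swiotlb.c (so not verbatim):

	dma_addr_t swiotlb_map_page(struct device *dev, struct page *page, ...)
	{
		phys_addr_t phys = page_to_phys(page) + offset;
		dma_addr_t dev_addr = phys_to_dma(dev, phys); /* C-bit now set */

		/* Device can reach the encrypted address: no bouncing. */
		if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
			return dev_addr;

		/* Otherwise bounce through the unencrypted SWIOTLB buffer. */
		map = map_single(dev, phys, size, dir);
		...
	}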

> 
>> diff --git a/init/main.c b/init/main.c
>> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
>>  	 */
>>  	locking_selftest();
>>  
>> +	/*
>> +	 * This needs to be called before any devices perform DMA
>> +	 * operations that might use the swiotlb bounce buffers.
>> +	 * This call will mark the bounce buffers as un-encrypted so
>> +	 * that their usage will not cause "plain-text" data to be
>> +	 * decrypted when accessed.
>> +	 */
>> +	mem_encrypt_init();
> 
> (Comments below are connected to the reason why we call this.)
> 
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
>> +/*
>> + * If memory encryption is active, the DMA address for an encrypted page may
>> + * be beyond the range of the device. If bounce buffers are required be sure
>> + * that they are not on an encrypted page. This should be called before the
>> + * iotlb area is used.
>> + */
>> +void __init swiotlb_clear_encryption(void)
>> +{
>> +	void *vaddr;
>> +	unsigned long bytes;
>> +
>> +	if (no_iotlb_memory || !io_tlb_start || late_alloc)
> 
> io_tlb_start seems redundant -- when can !no_iotlb_memory &&
> !io_tlb_start happen?

Yes, the io_tlb_start check can be removed.

> 
> Is the order of calls
>   1) swiotlb init
>   2) SME init
>   3) swiotlb late init 
> ?

Yes, sort of.  The swiotlb late init may not be called.

> 
> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
> directly?

When swiotlb is allocated in swiotlb_init(), it is too early to make
use of the API to change the page attributes. Because of this,
the callback to make those changes is needed.
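
So the sequence ends up being, roughly:

	1) swiotlb_init()       - bounce buffers allocated (memblock), still
	                          mapped encrypted
	2) mem_encrypt_init()   - swiotlb_clear_encryption() changes the page
	                          attributes and zeroes the buffers
	3) swiotlb late init    - only if it runs at all; it can call
	                          swiotlb_set_mem_unenc() directly at
	                          allocation time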

> 
>> +		return;
>> +
>> +	vaddr = phys_to_virt(io_tlb_start);
>> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
>> +	swiotlb_set_mem_unenc(vaddr, bytes);
>> +	memset(vaddr, 0, bytes);
>> +
>> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
>> +	bytes = PAGE_ALIGN(io_tlb_overflow);
>> +	swiotlb_set_mem_unenc(vaddr, bytes);
>> +	memset(vaddr, 0, bytes);
>> +}
>> +
>> @@ -541,7 +583,7 @@ static phys_addr_t
>>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>>  	   enum dma_data_direction dir)
>>  {
>> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
> 
> We have decrypted io_tlb_start before, so shouldn't its physical address
> be saved without the sme bit?  (Which changes a lot ...)

I'm not sure what you mean here, can you elaborate a bit more?

Thanks,
Tom

> 
> Thanks.
> 
Radim Krčmář Nov. 15, 2016, 6:17 p.m. UTC | #4
2016-11-15 11:02-0600, Tom Lendacky:
> On 11/15/2016 8:39 AM, Radim Krčmář wrote:
>> 2016-11-09 18:37-0600, Tom Lendacky:
>>> Since DMA addresses will effectively look like 48-bit addresses when the
>>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>>> device performing the DMA does not support 48-bits. SWIOTLB will be
>>> initialized to create un-encrypted bounce buffers for use by these devices.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>> ---
>>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>>   *
>>>   * This returns non-zero if we are forced to use swiotlb (by the boot
>>> - * option).
>>> + * option). If memory encryption is enabled then swiotlb will be set
>>> + * to 1 so that bounce buffers are allocated and used for devices that
>>> + * do not support the addressing range required for the encryption mask.
>>>   */
>>>  int __init pci_swiotlb_detect_override(void)
>>>  {
>>>  	int use_swiotlb = swiotlb | swiotlb_force;
>>>  
>>> -	if (swiotlb_force)
>>> +	if (swiotlb_force || sme_me_mask)
>>>  		swiotlb = 1;
>>>  
>>>  	return use_swiotlb;
>> 
>> We want to return 1 even if only sme_me_mask is 1, because the return
>> value is used for detection.  The following would be less obscure, IMO:
>> 
>> 	if (swiotlb_force || sme_me_mask)
>> 		swiotlb = 1;
>> 
>> 	return swiotlb;
> 
> If we do that then all DMA would go through the swiotlb bounce buffers.

No, that is decided, for example, in swiotlb_map_page(), and we need to
call pci_swiotlb_init() to register that function.

> By setting swiotlb to 1 we indicate that the bounce buffers will be
> needed for those devices that can't support the addressing range when
> the encryption bit is set (48 bit DMA). But if the device can support
> the addressing range we won't use the bounce buffers.

If we return 0 here, then pci_swiotlb_init() will not be called =>
dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.

>> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
>> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
>> directly?
> 
> When swiotlb is allocated in swiotlb_init(), it is too early to make
> use of the api to the change the page attributes. Because of this,
> the callback to make those changes is needed.

Thanks. (I don't know page table setup enough to see a lesser evil. :])

>>> @@ -541,7 +583,7 @@ static phys_addr_t
>>>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>>>  	   enum dma_data_direction dir)
>>>  {
>>> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>>> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>> 
>> We have decrypted io_tlb_start before, so shouldn't its physical address
>> be saved without the sme bit?  (Which changes a lot ...)
> 
> I'm not sure what you mean here, can you elaborate a bit more?

The C-bit (sme bit) is a part of the physical address.
If we know that a certain physical page should be accessed as
unencrypted (the bounce buffer) then the C-bit is 0.
I'm wondering why we save the physical address with the C-bit set when
we know that it can't be accessed that way (because we remove it every
time).
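
To make the arithmetic concrete (user-space illustration only, and I'm
assuming the C-bit is bit 47 here -- the real position comes from CPUID):

	#include <stdio.h>
	#include <inttypes.h>

	int main(void)
	{
		const uint64_t sme_me_mask = 1ULL << 47;  /* assumed C-bit */
		uint64_t paddr = 0x12340000ULL;           /* bounce buffer page */

		uint64_t enc = paddr | sme_me_mask;  /* phys_to_dma() result now */
		uint64_t dec = enc & ~sme_me_mask;   /* swiotlb_phys_to_dma() result */

		printf("with C-bit:    %#018" PRIx64 "\n", enc);
		printf("without C-bit: %#018" PRIx64 "\n", dec);
		return 0;
	}

A device with a 32-bit (or even 44-bit) DMA mask cannot generate the
first address, which is why I'd expect the bounce buffer address to be
handed out without the C-bit everywhere.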

The naming is a bit confusing, because physical addresses are actually
virtualized by SME -- maybe we should be calling them SME addresses?
Tom Lendacky Nov. 15, 2016, 6:29 p.m. UTC | #5
On 11/15/2016 9:16 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create un-encrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/dma-mapping.h |    5 ++-
>>  arch/x86/include/asm/mem_encrypt.h |    5 +++
>>  arch/x86/kernel/pci-dma.c          |   11 ++++---
>>  arch/x86/kernel/pci-nommu.c        |    2 +
>>  arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
>>  arch/x86/mm/mem_encrypt.c          |   17 +++++++++++
>>  include/linux/swiotlb.h            |    1 +
>>  init/main.c                        |   13 ++++++++
>>  lib/swiotlb.c                      |   58 +++++++++++++++++++++++++++++++-----
>>  9 files changed, 103 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
>> index 4446162..c9cdcae 100644
>> --- a/arch/x86/include/asm/dma-mapping.h
>> +++ b/arch/x86/include/asm/dma-mapping.h

..SNIP...

>>  
>> +/*
>> + * If memory encryption is active, the DMA address for an encrypted page may
>> + * be beyond the range of the device. If bounce buffers are required be sure
>> + * that they are not on an encrypted page. This should be called before the
>> + * iotlb area is used.
> 
> Makes sense, but I think at least a dmesg warning here
> might be a good idea.

Good idea.  Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used?  Or maybe
both?

> 
> A boot flag that says "don't enable devices that don't support
> encryption" might be a good idea, too, since most people
> don't read dmesg output and won't notice the message.

I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.
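
Something like this hypothetical helper (name invented, not part of this
series) is what I have in mind:

	/*
	 * Hypothetical: a device can DMA to encrypted memory only if its
	 * DMA mask covers the C-bit; otherwise it is limited to the
	 * unencrypted SWIOTLB bounce buffers.
	 */
	static bool device_supports_encrypted_dma(u64 dma_mask)
	{
		return !sme_me_mask || (dma_mask & sme_me_mask) == sme_me_mask;
	}

dma_set_mask() (or the first mapping call) could then warn, or fail when
the proposed opt-out flag is set, if this returns false.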

Thanks,
Tom

> 
> 
Michael S. Tsirkin Nov. 15, 2016, 7:16 p.m. UTC | #6
On Tue, Nov 15, 2016 at 12:29:35PM -0600, Tom Lendacky wrote:
> On 11/15/2016 9:16 AM, Michael S. Tsirkin wrote:
> > On Wed, Nov 09, 2016 at 06:37:23PM -0600, Tom Lendacky wrote:
> >> Since DMA addresses will effectively look like 48-bit addresses when the
> >> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> >> device performing the DMA does not support 48-bits. SWIOTLB will be
> >> initialized to create un-encrypted bounce buffers for use by these devices.
> >>
> >> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> >> ---
> >>  arch/x86/include/asm/dma-mapping.h |    5 ++-
> >>  arch/x86/include/asm/mem_encrypt.h |    5 +++
> >>  arch/x86/kernel/pci-dma.c          |   11 ++++---
> >>  arch/x86/kernel/pci-nommu.c        |    2 +
> >>  arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
> >>  arch/x86/mm/mem_encrypt.c          |   17 +++++++++++
> >>  include/linux/swiotlb.h            |    1 +
> >>  init/main.c                        |   13 ++++++++
> >>  lib/swiotlb.c                      |   58 +++++++++++++++++++++++++++++++-----
> >>  9 files changed, 103 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> >> index 4446162..c9cdcae 100644
> >> --- a/arch/x86/include/asm/dma-mapping.h
> >> +++ b/arch/x86/include/asm/dma-mapping.h
> 
> ..SNIP...
> 
> >>  
> >> +/*
> >> + * If memory encryption is active, the DMA address for an encrypted page may
> >> + * be beyond the range of the device. If bounce buffers are required be sure
> >> + * that they are not on an encrypted page. This should be called before the
> >> + * iotlb area is used.
> > 
> > Makes sense, but I think at least a dmesg warning here
> > might be a good idea.
> 
> Good idea.  Should it be a warning when it is first being set up or
> a warning the first time the bounce buffers need to be used?  Or maybe
> both?
> 
> > 
> > A boot flag that says "don't enable devices that don't support
> > encryption" might be a good idea, too, since most people
> > don't read dmesg output and won't notice the message.
> 
> I'll look into this. It might be something that can be checked as
> part of the device setting its DMA mask or the first time a DMA
> API is used if the device doesn't explicitly set its mask.
> 
> Thanks,
> Tom
> 
> > 

I think setup time is nicer if it's possible.

Tom Lendacky Nov. 15, 2016, 8:33 p.m. UTC | #7
On 11/15/2016 12:17 PM, Radim Krčmář wrote:
> 2016-11-15 11:02-0600, Tom Lendacky:
>> On 11/15/2016 8:39 AM, Radim Krčmář wrote:
>>> 2016-11-09 18:37-0600, Tom Lendacky:
>>>> Since DMA addresses will effectively look like 48-bit addresses when the
>>>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>>>> device performing the DMA does not support 48-bits. SWIOTLB will be
>>>> initialized to create un-encrypted bounce buffers for use by these devices.
>>>>
>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>>> ---
>>>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>>>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>>>   * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>>>   *
>>>>   * This returns non-zero if we are forced to use swiotlb (by the boot
>>>> - * option).
>>>> + * option). If memory encryption is enabled then swiotlb will be set
>>>> + * to 1 so that bounce buffers are allocated and used for devices that
>>>> + * do not support the addressing range required for the encryption mask.
>>>>   */
>>>>  int __init pci_swiotlb_detect_override(void)
>>>>  {
>>>>  	int use_swiotlb = swiotlb | swiotlb_force;
>>>>  
>>>> -	if (swiotlb_force)
>>>> +	if (swiotlb_force || sme_me_mask)
>>>>  		swiotlb = 1;
>>>>  
>>>>  	return use_swiotlb;
>>>
>>> We want to return 1 even if only sme_me_mask is 1, because the return
>>> value is used for detection.  The following would be less obscure, IMO:
>>>
>>> 	if (swiotlb_force || sme_me_mask)
>>> 		swiotlb = 1;
>>>
>>> 	return swiotlb;
>>
>> If we do that then all DMA would go through the swiotlb bounce buffers.
> 
> No, that is decided for example in swiotlb_map_page() and we need to
> call pci_swiotlb_init() to register that function.
> 
>> By setting swiotlb to 1 we indicate that the bounce buffers will be
>> needed for those devices that can't support the addressing range when
>> the encryption bit is set (48 bit DMA). But if the device can support
>> the addressing range we won't use the bounce buffers.
> 
> If we return 0 here, then pci_swiotlb_init() will not be called =>
> dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.
> 

Ok, I see why this was working for me...  By setting swiotlb = 1 and
returning 0 it was continuing on to pci_swiotlb_detect_4gb() in the
IOMMU table, which would return the current value of swiotlb, which is
1, and so the swiotlb ops were set up.

So the change that you mentioned will work, thanks for pointing that out
and getting me to dig deeper on it.  I'll update the patch.

>>> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
>>> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
>>> directly?
>>
>> When swiotlb is allocated in swiotlb_init(), it is too early to make
>> use of the API to change the page attributes. Because of this,
>> the callback to make those changes is needed.
> 
> Thanks. (I don't know page table setup enough to see a lesser evil. :])
> 
>>>> @@ -541,7 +583,7 @@ static phys_addr_t
>>>>  map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>>>>  	   enum dma_data_direction dir)
>>>>  {
>>>> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>>>> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>>>
>>> We have decrypted io_tlb_start before, so shouldn't its physical address
>>> be saved without the sme bit?  (Which changes a lot ...)
>>
>> I'm not sure what you mean here, can you elaborate a bit more?
> 
> The C-bit (sme bit) is a part of the physical address.

The C-bit (sme_me_mask) isn't part of the physical address for
io_tlb_start, but since the original call was to phys_to_dma(), which
now will automatically "or" in the C-bit, I needed to adjust that by
using swiotlb_phys_to_dma() to remove the C-bit.

> If we know that a certain physical page should be accessed as
> unencrypted (the bounce buffer) then the C-bit is 0.
> I'm wondering why we save the physical address with the C-bit set when
> we know that it can't be accessed that way (because we remove it every
> time).

It's not saved with the C-bit, but the phys_to_dma call will "or" in the
C-bit automatically.  And since this is common code I need to leave that
call to phys_to_dma in.

> 
> The naming is a bit confusing, because physical addresses are actually
> virtualized by SME -- maybe we should be calling them SME addresses?

Interesting idea...  I'll have to look at how that plays out in the
patches and documentation.

Thanks,
Tom

> 
Borislav Petkov Nov. 22, 2016, 11:38 a.m. UTC | #8
On Tue, Nov 15, 2016 at 12:29:35PM -0600, Tom Lendacky wrote:
> > Makes sense, but I think at least a dmesg warning here
> > might be a good idea.
> 
> Good idea.  Should it be a warning when it is first being set up or
> > a warning the first time the bounce buffers need to be used?  Or maybe
> both?

Ok, let me put my user hat on...

(... puts a felt hat ...)

so what am I supposed to do about this as a user? Go and physically
remove those devices because I want to enable SME?!

IMO, the only thing we should do is issue a *single* warning -
pr_warn_once - along the lines of:

"... devices present which due to SME will use bounce buffers and will
cause their speed to diminish. Boot with sme=debug to see full info".

And then sme=debug will dump the whole gory details. I don't think
screaming for each device is going to change anything in many cases.
99% of people don't care - they just want shit to work.
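
I.e., something along these lines (untested sketch, exact placement TBD):

	if (sme_me_mask && !dma_capable(dev, phys_to_dma(dev, phys), size))
		pr_warn_once("SME: some devices require unencrypted SWIOTLB bounce buffers and will see slower DMA. Boot with sme=debug for the full list.\n");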

> > A boot flag that says "don't enable devices that don't support
> > encryption" might be a good idea, too, since most people
> > don't read dmesg output and won't notice the message.
> 
> I'll look into this. It might be something that can be checked as
> part of the device setting its DMA mask or the first time a DMA
> API is used if the device doesn't explicitly set its mask.

Still with my user hat on, what would be the purpose of such an option?

We already use bounce buffers so those devices do support encryption,
albeit slower.

felt hat is confused.
Michael S. Tsirkin Nov. 22, 2016, 3:22 p.m. UTC | #9
On Tue, Nov 22, 2016 at 12:38:59PM +0100, Borislav Petkov wrote:
> On Tue, Nov 15, 2016 at 12:29:35PM -0600, Tom Lendacky wrote:
> > > Makes sense, but I think at least a dmesg warning here
> > > might be a good idea.
> > 
> > Good idea.  Should it be a warning when it is first being set up or
> > a warning the first time the bounce buffers need to be used?  Or maybe
> > both?
> 
> Ok, let me put my user hat on...
> 
> (... puts a felt hat ...)
> 
> so what am I supposed to do about this as a user? Go and physically
> remove those devices because I want to enable SME?!
> 
> IMO, the only thing we should do is issue a *single* warning -
> pr_warn_once - along the lines of:
> 
> "... devices present which due to SME will use bounce buffers and will
> cause their speed to diminish. Boot with sme=debug to see full info".
> 
> And then sme=debug will dump the whole gory details. I don't think
> screaming for each device is going to change anything in many cases.
> 99% of people don't care - they just want shit to work.

The issue is it's a (potential) security hole, not a slowdown.


> > > A boot flag that says "don't enable devices that don't support
> > > encryption" might be a good idea, too, since most people
> > > don't read dmesg output and won't notice the message.
> > 
> > I'll look into this. It might be something that can be checked as
> > part of the device setting its DMA mask or the first time a DMA
> > API is used if the device doesn't explicitly set its mask.
> 
> Still with my user hat on, what would be the purpose of such an option?
> 
> We already use bounce buffers so those devices do support encryption,
> albeit slower.
> 
> felt hat is confused.

To disable insecure things. If someone enables SEV one might have an
expectation of security.  Might help push vendors to do the right thing
as a side effect.

> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.
Borislav Petkov Nov. 22, 2016, 3:41 p.m. UTC | #10
On Tue, Nov 22, 2016 at 05:22:38PM +0200, Michael S. Tsirkin wrote:
> The issue is it's a (potential) security hole, not a slowdown.

How? Because the bounce buffers will be unencrypted and someone might
intercept them?

> To disable unsecure things. If someone enables SEV one might have an
> expectation of security.  Might help push vendors to do the right thing
> as a side effect.

Ok, you're looking at the SEV-cloud-multiple-guests aspect. Right, that
makes sense.

I guess for SEV we should even flip the logic: disable such devices by
default, with an opt-in option to enable them that issues a big fat
warning.
I'd even want to let the guest users know that they're on a system which
cannot give them encrypted DMA to some devices...
Michael S. Tsirkin Nov. 22, 2016, 8:41 p.m. UTC | #11
On Tue, Nov 22, 2016 at 04:41:37PM +0100, Borislav Petkov wrote:
> On Tue, Nov 22, 2016 at 05:22:38PM +0200, Michael S. Tsirkin wrote:
> > The issue is it's a (potential) security hole, not a slowdown.
> 
> How? Because the bounce buffers will be unencrypted and someone might
> intercept them?

Or even modify them. Guests generally trust devices since they
assume they are under their control.

> > To disable unsecure things. If someone enables SEV one might have an
> > expectation of security.  Might help push vendors to do the right thing
> > as a side effect.
> 
> Ok, you're looking at the SEV-cloud-multiple-guests aspect. Right, that
> makes sense.
> 
> I guess for SEV we should even flip the logic: disable such devices by
> default and an opt-in option to enable them and issue a big fat warning.
> I'd even want to let the guest users know that they're on a system which
> cannot give them encrypted DMA to some devices...
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Patch

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@ 
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -69,12 +70,12 @@  static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return paddr;
+	return paddr | sme_me_mask;
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	return daddr;
+	return daddr & ~sme_me_mask;
 }
 #endif /* CONFIG_X86_DMA_REMAP */
 
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index d544481..a024451 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -35,6 +35,11 @@  void __init sme_encrypt_ramdisk(resource_size_t paddr,
 
 void __init sme_early_init(void);
 
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size);
+
 #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
 #define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30c377..0ce28df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -92,9 +92,12 @@  again:
 	/* CMA can be used only in the context which permits sleeping */
 	if (gfpflags_allow_blocking(flag)) {
 		page = dma_alloc_from_contiguous(dev, count, get_order(size));
-		if (page && page_to_phys(page) + size > dma_mask) {
-			dma_release_from_contiguous(dev, page, count);
-			page = NULL;
+		if (page) {
+			addr = phys_to_dma(dev, page_to_phys(page));
+			if (addr + size > dma_mask) {
+				dma_release_from_contiguous(dev, page, count);
+				page = NULL;
+			}
 		}
 	}
 	/* fallback */
@@ -103,7 +106,7 @@  again:
 	if (!page)
 		return NULL;
 
-	addr = page_to_phys(page);
+	addr = phys_to_dma(dev, page_to_phys(page));
 	if (addr + size > dma_mask) {
 		__free_pages(page, get_order(size));
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 00e71ce..922c10d 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@  static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
 				 enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	dma_addr_t bus = page_to_phys(page) + offset;
+	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
 	WARN_ON(size == 0);
 	if (!check_addr("map_single", dev, bus, size))
 		return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index b47edb8..34a9e524 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@ 
 #include <asm/dma.h>
 #include <asm/xen/swiotlb-xen.h>
 #include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
 int swiotlb __read_mostly;
 
 void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,13 +66,15 @@  static struct dma_map_ops swiotlb_dma_ops = {
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
  * This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
  */
 int __init pci_swiotlb_detect_override(void)
 {
 	int use_swiotlb = swiotlb | swiotlb_force;
 
-	if (swiotlb_force)
+	if (swiotlb_force || sme_me_mask)
 		swiotlb = 1;
 
 	return use_swiotlb;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 41cfdf9..e351003 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -13,6 +13,8 @@ 
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -240,3 +242,18 @@  void __init sme_early_init(void)
 	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
 		protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
 }
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+	if (!sme_me_mask)
+		return;
+
+	/* Make SWIOTLB use an unencrypted DMA area */
+	swiotlb_clear_encryption();
+}
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+	sme_set_mem_unenc(vaddr, size);
+}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5f81f8a..5c909fc 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -29,6 +29,7 @@  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_clear_encryption(void);
 
 /*
  * Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index a8a58e2..ae37f0d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -458,6 +458,10 @@  void __init __weak thread_stack_cache_init(void)
 }
 #endif
 
+void __init __weak mem_encrypt_init(void)
+{
+}
+
 /*
  * Set up kernel memory allocators
  */
@@ -598,6 +602,15 @@  asmlinkage __visible void __init start_kernel(void)
 	 */
 	locking_selftest();
 
+	/*
+	 * This needs to be called before any devices perform DMA
+	 * operations that might use the swiotlb bounce buffers.
+	 * This call will mark the bounce buffers as un-encrypted so
+	 * that their usage will not cause "plain-text" data to be
+	 * decrypted when accessed.
+	 */
+	mem_encrypt_init();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0..638e99c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@ 
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -131,6 +132,17 @@  unsigned long swiotlb_size_or_default(void)
 	return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+void __weak swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+				      phys_addr_t address)
+{
+	return phys_to_dma(hwdev, address) & ~sme_me_mask;
+}
+
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
 				      volatile void *address)
@@ -159,6 +171,31 @@  void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
+ */
+void __init swiotlb_clear_encryption(void)
+{
+	void *vaddr;
+	unsigned long bytes;
+
+	if (no_iotlb_memory || !io_tlb_start || late_alloc)
+		return;
+
+	vaddr = phys_to_virt(io_tlb_start);
+	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+	swiotlb_set_mem_unenc(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+
+	vaddr = phys_to_virt(io_tlb_overflow_buffer);
+	bytes = PAGE_ALIGN(io_tlb_overflow);
+	swiotlb_set_mem_unenc(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+}
+
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
@@ -294,6 +331,8 @@  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	io_tlb_start = virt_to_phys(tlb);
 	io_tlb_end = io_tlb_start + bytes;
 
+	/* Keep TLB in unencrypted memory if memory encryption is active */
+	swiotlb_set_mem_unenc(tlb, bytes);
 	memset(tlb, 0, bytes);
 
 	/*
@@ -304,6 +343,9 @@  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	if (!v_overflow_buffer)
 		goto cleanup2;
 
+	/* Keep overflow in unencrypted memory if memory encryption is active */
+	swiotlb_set_mem_unenc(v_overflow_buffer, io_tlb_overflow);
+	memset(v_overflow_buffer, 0, io_tlb_overflow);
 	io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
 
 	/*
@@ -541,7 +583,7 @@  static phys_addr_t
 map_single(struct device *hwdev, phys_addr_t phys, size_t size,
 	   enum dma_data_direction dir)
 {
-	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
 
 	return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size, dir);
 }
@@ -659,7 +701,7 @@  swiotlb_alloc_coherent(struct device *hwdev, size_t size,
 			goto err_warn;
 
 		ret = phys_to_virt(paddr);
-		dev_addr = phys_to_dma(hwdev, paddr);
+		dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
 
 		/* Confirm address can be DMA'd by device */
 		if (dev_addr + size - 1 > dma_mask) {
@@ -758,15 +800,15 @@  dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	map = map_single(dev, phys, size, dir);
 	if (map == SWIOTLB_MAP_ERROR) {
 		swiotlb_full(dev, size, dir, 1);
-		return phys_to_dma(dev, io_tlb_overflow_buffer);
+		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 	}
 
-	dev_addr = phys_to_dma(dev, map);
+	dev_addr = swiotlb_phys_to_dma(dev, map);
 
 	/* Ensure that the address returned is DMA'ble */
 	if (!dma_capable(dev, dev_addr, size)) {
 		swiotlb_tbl_unmap_single(dev, map, size, dir);
-		return phys_to_dma(dev, io_tlb_overflow_buffer);
+		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 	}
 
 	return dev_addr;
@@ -901,7 +943,7 @@  swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
 				sg_dma_len(sgl) = 0;
 				return 0;
 			}
-			sg->dma_address = phys_to_dma(hwdev, map);
+			sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
 		} else
 			sg->dma_address = dev_addr;
 		sg_dma_len(sg) = sg->length;
@@ -985,7 +1027,7 @@  EXPORT_SYMBOL(swiotlb_sync_sg_for_device);
 int
 swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
 {
-	return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+	return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
 }
 EXPORT_SYMBOL(swiotlb_dma_mapping_error);
 
@@ -998,6 +1040,6 @@  EXPORT_SYMBOL(swiotlb_dma_mapping_error);
 int
 swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-	return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+	return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
 EXPORT_SYMBOL(swiotlb_dma_supported);