
[RFC,13/62] x86/mm: Add hooks to allocate and free encrypted pages

Message ID 20190508144422.13171-14-kirill.shutemov@linux.intel.com (mailing list archive)
State New, archived
Series Intel MKTME enabling

Commit Message

Kirill A. Shutemov May 8, 2019, 2:43 p.m. UTC
Hook into the page allocator to allocate and free encrypted pages
properly.

The hardware/CPU does not enforce coherency between mappings of the same
physical page with different KeyIDs or encryption keys.
We are responsible for cache management.

Flush the cache when allocating an encrypted page and when returning the
page to the free pool.

prep_encrypted_page() also takes care of zeroing the page. We have to do
this after the KeyID is set for the page.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mktme.h | 17 +++++++++++++
 arch/x86/mm/mktme.c          | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

Comments

Peter Zijlstra June 14, 2019, 9:34 a.m. UTC | #1
On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:

> +/* Prepare page to be used for encryption. Called from page allocator. */
> +void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> +{
> +	int i;
> +
> +	/*
> +	 * The hardware/CPU does not enforce coherency between mappings
> +	 * of the same physical page with different KeyIDs or
> +	 * encryption keys. We are responsible for cache management.
> +	 */

On alloc we should flush the unencrypted (key=0) range, while on free
(below) we should flush the encrypted (key!=0) range.

But I seem to have missed where page_address() does the right thing
here.

> +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		/* All pages coming out of the allocator should have KeyID 0 */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> +		lookup_page_ext(page)->keyid = keyid;
> +

So presumably page_address() is affected by this keyid, and the below
clear_highpage() then accesses the 'right' location?

> +		/* Clear the page after the KeyID is set. */
> +		if (zero)
> +			clear_highpage(page);
> +
> +		page++;
> +	}
> +}
> +
> +/*
> + * Handles freeing of encrypted page.
> + * Called from page allocator on freeing encrypted page.
> + */
> +void free_encrypted_page(struct page *page, int order)
> +{
> +	int i;
> +
> +	/*
> +	 * The hardware/CPU does not enforce coherency between mappings
> +	 * of the same physical page with different KeyIDs or
> +	 * encryption keys. We are responsible for cache management.
> +	 */

I still don't like that comment much; yes the hardware doesn't do it,
and yes we have to do it, but it doesn't explain the actual scheme
employed to do so.

> +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> +
> +	for (i = 0; i < (1 << order); i++) {
> +		/* Check if the page has reasonable KeyID */
> +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);

It should also check keyid > 0, so maybe:

	(unsigned)(keyid - 1) > keyids-1

instead?
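
To spell it out (illustration only, not part of the patch): valid KeyIDs
here are 1 .. mktme_nr_keyids, and keyid == 0 wraps around to UINT_MAX in
the unsigned compare, so a single test covers both bounds:

	WARN_ON_ONCE((unsigned)(lookup_page_ext(page)->keyid - 1) >
		     mktme_nr_keyids - 1);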

> +		lookup_page_ext(page)->keyid = 0;
> +		page++;
> +	}
> +}
> -- 
> 2.20.1
>
Peter Zijlstra June 14, 2019, 11:04 a.m. UTC | #2
On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> 
> > +		lookup_page_ext(page)->keyid = keyid;

> > +		lookup_page_ext(page)->keyid = 0;

Also, perhaps paranoid; but do we want something like:

static inline void page_set_keyid(struct page *page, int keyid)
{
	/* ensure nothing creeps after changing the keyid */
	barrier();
	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
	barrier();
	/* ensure nothing creeps before changing the keyid */
}

And this is very much assuming there is no concurrency through the
allocator locks.
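
For illustration, the allocation loop on top of such a helper
(page_set_keyid() is the helper proposed above, not an existing
interface) would read something like:

	for (i = 0; i < (1 << order); i++) {
		/* All pages coming out of the allocator should have KeyID 0 */
		WARN_ON_ONCE(page_keyid(page));
		page_set_keyid(page, keyid);

		/* Clear the page only after the new KeyID is in place. */
		if (zero)
			clear_highpage(page);

		page++;
	}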
Kirill A . Shutemov June 14, 2019, 1:14 p.m. UTC | #3
On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> 
> > +/* Prepare page to be used for encryption. Called from page allocator. */
> > +void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
> > +{
> > +	int i;
> > +
> > +	/*
> > +	 * The hardware/CPU does not enforce coherency between mappings
> > +	 * of the same physical page with different KeyIDs or
> > +	 * encryption keys. We are responsible for cache management.
> > +	 */
> 
> On alloc we should flush the unencrypted (key=0) range, while on free
> (below) we should flush the encrypted (key!=0) range.
> 
> But I seem to have missed where page_address() does the right thing
> here.

As you've seen by now, it will be addressed later in the patchset. I'll
update the changelog to indicate that page_address() handles KeyIDs
correctly.

> > +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* All pages coming out of the allocator should have KeyID 0 */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
> > +		lookup_page_ext(page)->keyid = keyid;
> > +
> 
> So presumably page_address() is affected by this keyid, and the below
> clear_highpage() then accesses the 'right' location?

Yes. clear_highpage() -> kmap_atomic() -> page_address().

> > +		/* Clear the page after the KeyID is set. */
> > +		if (zero)
> > +			clear_highpage(page);
> > +
> > +		page++;
> > +	}
> > +}
> > +
> > +/*
> > + * Handles freeing of encrypted page.
> > + * Called from page allocator on freeing encrypted page.
> > + */
> > +void free_encrypted_page(struct page *page, int order)
> > +{
> > +	int i;
> > +
> > +	/*
> > +	 * The hardware/CPU does not enforce coherency between mappings
> > +	 * of the same physical page with different KeyIDs or
> > +	 * encryption keys. We are responsible for cache management.
> > +	 */
> 
> I still don't like that comment much; yes the hardware doesn't do it,
> and yes we have to do it, but it doesn't explain the actual scheme
> employed to do so.

Fair enough. I'll do better.

> > +	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		/* Check if the page has reasonable KeyID */
> > +		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
> 
> It should also check keyid > 0, so maybe:
> 
> 	(unsigned)(keyid - 1) > keyids-1
> 
> instead?

Makes sense.
Kirill A . Shutemov June 14, 2019, 1:28 p.m. UTC | #4
On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > 
> > > +		lookup_page_ext(page)->keyid = keyid;
> 
> > > +		lookup_page_ext(page)->keyid = 0;
> 
> Also, perhaps paranoid; but do we want something like:
> 
> static inline void page_set_keyid(struct page *page, int keyid)
> {
> 	/* ensure nothing creeps after changing the keyid */
> 	barrier();
> 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> 	barrier();
> 	/* ensure nothing creeps before changing the keyid */
> }
> 
> And this is very much assuming there is no concurrency through the
> allocator locks.

There's no concurrency for this page: it has been taken off the free list,
but has not yet been passed on to the user. Nobody else sees the page before
allocation is finished.

And barriers/WRITE_ONCE() look excessive to me. It's just yet another bit of
the page's metadata and I don't see why it has to be handled in a special
way.

Does it relax your paranoia? :P
Peter Zijlstra June 14, 2019, 1:43 p.m. UTC | #5
On Fri, Jun 14, 2019 at 04:28:36PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > > 
> > > > +		lookup_page_ext(page)->keyid = keyid;
> > 
> > > > +		lookup_page_ext(page)->keyid = 0;
> > 
> > Also, perhaps paranoid; but do we want something like:
> > 
> > static inline void page_set_keyid(struct page *page, int keyid)
> > {
> > 	/* ensure nothing creeps after changing the keyid */
> > 	barrier();
> > 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> > 	barrier();
> > 	/* ensure nothing creeps before changing the keyid */
> > }
> > 
> > And this is very much assuming there is no concurrency through the
> > allocator locks.
> 
> There's no concurrency for this page: it has been taken off the free list,
> but has not yet been passed on to the user. Nobody else sees the page
> before allocation is finished.
> 
> And barriers/WRITE_ONCE() look excessive to me. It's just yet another bit
> of the page's metadata and I don't see why it has to be handled in a
> special way.
> 
> Does it relax your paranoia? :P

Not really, it all 'works' because clflush_cache_range() includes mb()
and page_address() has an address dependency on the store, and there are
no other sites that will ever change 'keyid', which is all kind of
fragile.

At the very least that should be explicitly called out in a comment.
Kirill A . Shutemov June 14, 2019, 10:41 p.m. UTC | #6
On Fri, Jun 14, 2019 at 03:43:35PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 04:28:36PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Jun 14, 2019 at 01:04:58PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 14, 2019 at 11:34:09AM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 05:43:33PM +0300, Kirill A. Shutemov wrote:
> > > > 
> > > > > +		lookup_page_ext(page)->keyid = keyid;
> > > 
> > > > > +		lookup_page_ext(page)->keyid = 0;
> > > 
> > > Also, perhaps paranoid; but do we want something like:
> > > 
> > > static inline void page_set_keyid(struct page *page, int keyid)
> > > {
> > > 	/* ensure nothing creeps after changing the keyid */
> > > 	barrier();
> > > 	WRITE_ONCE(lookup_page_ext(page)->keyid, keyid);
> > > 	barrier();
> > > 	/* ensure nothing creeps before changing the keyid */
> > > }
> > > 
> > > And this is very much assuming there is no concurrency through the
> > > allocator locks.
> > 
> > There's no concurrency for this page: it has been taken off the free
> > list, but has not yet been passed on to the user. Nobody else sees the
> > page before allocation is finished.
> > 
> > And barriers/WRITE_ONCE() look excessive to me. It's just yet another
> > bit of the page's metadata and I don't see why it has to be handled in
> > a special way.
> > 
> > Does it relax your paranoia? :P
> 
> Not really, it all 'works' because clflush_cache_range() includes mb()
> and page_address() has an address dependency on the store, and there are
> no other sites that will ever change 'keyid', which is all kind of
> fragile.

Hm. I don't follow how the mb() in clflush_cache_range() is relevant...

Any following access of the page's memory by the kernel will go through
page_keyid() and therefore I believe there's always an address dependency
on the store.

Am I missing something?

> At the very least that should be explicitly called out in a comment.
>
Peter Zijlstra June 17, 2019, 9:25 a.m. UTC | #7
On Sat, Jun 15, 2019 at 01:41:31AM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 14, 2019 at 03:43:35PM +0200, Peter Zijlstra wrote:

> > Not really, it all 'works' because clflush_cache_range() includes mb()
> > and page_address() has an address dependency on the store, and there are
> > no other sites that will ever change 'keyid', which is all kind of
> > fragile.
> 
> Hm. I don't follow how the mb() in clflush_cache_range() is relevant...
> 
> Any following access of the page's memory by the kernel will go through
> page_keyid() and therefore I believe there's always an address dependency
> on the store.
> 
> Am I missing something?

The dependency doesn't help with prior calls; consider:

	addr = page_address(page);

	*addr = foo;

	page->key_id = bar;

	addr2 = page_address(page);


Without a barrier() between '*addr = foo' and 'page->key_id = bar', the
compiler is allowed to reorder these stores.

Now, the clflush stuff we do, that already hard orders things -- we need
to be done writing before we start flushing -- so we can/do rely on
that, but we should explicitly mention that.

Now, for the second part, addr2 must observe bar because of the address
dependency; the compiler is not allowed to mess that up, but again, that is
something we should put in a comment.
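
As a sketch of what such a comment could look like against the alloc path
in the patch (wording illustrative only):

	/*
	 * Flush while the page is still mapped with KeyID 0.  The mb()s in
	 * clflush_cache_range() also keep earlier stores through the old
	 * mapping from being reordered past the ->keyid update below.
	 */
	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));

	for (i = 0; i < (1 << order); i++) {
		/* All pages coming out of the allocator should have KeyID 0 */
		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
		/*
		 * Later kernel accesses go through page_address() ->
		 * page_keyid() and carry an address dependency on this
		 * store, so clear_highpage() below already uses the new
		 * KeyID mapping.  Nothing else changes ->keyid while the
		 * page is being allocated, so no stronger ordering is
		 * needed.
		 */
		lookup_page_ext(page)->keyid = keyid;

		if (zero)
			clear_highpage(page);
		page++;
	}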

Patch

diff --git a/arch/x86/include/asm/mktme.h b/arch/x86/include/asm/mktme.h
index b5afa31b4526..6e604126f0bc 100644
--- a/arch/x86/include/asm/mktme.h
+++ b/arch/x86/include/asm/mktme.h
@@ -40,6 +40,23 @@  static inline int vma_keyid(struct vm_area_struct *vma)
 	return __vma_keyid(vma);
 }
 
+#define prep_encrypted_page prep_encrypted_page
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero);
+static inline void prep_encrypted_page(struct page *page, int order,
+		int keyid, bool zero)
+{
+	if (keyid)
+		__prep_encrypted_page(page, order, keyid, zero);
+}
+
+#define HAVE_ARCH_FREE_PAGE
+void free_encrypted_page(struct page *page, int order);
+static inline void arch_free_page(struct page *page, int order)
+{
+	if (page_keyid(page))
+		free_encrypted_page(page, order);
+}
+
 #else
 #define mktme_keyid_mask	((phys_addr_t)0)
 #define mktme_nr_keyids		0
diff --git a/arch/x86/mm/mktme.c b/arch/x86/mm/mktme.c
index d4a1a9e9b1c0..43489c098e60 100644
--- a/arch/x86/mm/mktme.c
+++ b/arch/x86/mm/mktme.c
@@ -1,4 +1,5 @@ 
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/mktme.h>
 
 /* Mask to extract KeyID from physical address. */
@@ -37,3 +38,51 @@  int __vma_keyid(struct vm_area_struct *vma)
 	pgprotval_t prot = pgprot_val(vma->vm_page_prot);
 	return (prot & mktme_keyid_mask) >> mktme_keyid_shift;
 }
+
+/* Prepare page to be used for encryption. Called from page allocator. */
+void __prep_encrypted_page(struct page *page, int order, int keyid, bool zero)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* All pages coming out of the allocator should have KeyID 0 */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid);
+		lookup_page_ext(page)->keyid = keyid;
+
+		/* Clear the page after the KeyID is set. */
+		if (zero)
+			clear_highpage(page);
+
+		page++;
+	}
+}
+
+/*
+ * Handles freeing of encrypted page.
+ * Called from page allocator on freeing encrypted page.
+ */
+void free_encrypted_page(struct page *page, int order)
+{
+	int i;
+
+	/*
+	 * The hardware/CPU does not enforce coherency between mappings
+	 * of the same physical page with different KeyIDs or
+	 * encryption keys. We are responsible for cache management.
+	 */
+	clflush_cache_range(page_address(page), PAGE_SIZE * (1UL << order));
+
+	for (i = 0; i < (1 << order); i++) {
+		/* Check if the page has reasonable KeyID */
+		WARN_ON_ONCE(lookup_page_ext(page)->keyid > mktme_nr_keyids);
+		lookup_page_ext(page)->keyid = 0;
+		page++;
+	}
+}