[RFC,12/15] kmap: Add stray write protection for device pages

Message ID 20200714070220.3500839-13-ira.weiny@intel.com (mailing list archive)
State New
Series PKS: Add Protection Keys Supervisor (PKS) support

Commit Message

Ira Weiny July 14, 2020, 7:02 a.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

Device managed pages may have additional protections.  These protections
need to be removed prior to valid use by kernel users.

Check for special treatment of device managed pages in kmap and take
action if needed.  We use kmap as an interface for generic kernel code
because under normal circumstances it would be a bug for general kernel
code to not use kmap prior to accessing kernel memory.  Therefore, this
should allow any valid kernel users to seamlessly use these pages
without issues.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
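
For illustration only (not part of this patch): a minimal sketch of a generic
kernel caller under this scheme.  The helper below is hypothetical; the point is
that it contains no protection-specific calls, because kmap()/kunmap() handle
them internally.

#include <linux/highmem.h>
#include <linux/string.h>

/* Hypothetical caller: copy data out of a page that may be protected. */
static void copy_from_protected_page(struct page *page, void *dst, size_t len)
{
	void *src;

	src = kmap(page);	/* lifts the extra protection if the page has it */
	memcpy(dst, src, len);
	kunmap(page);		/* restores the protection */
}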

Comments

Peter Zijlstra July 14, 2020, 8:44 a.m. UTC | #1
On Tue, Jul 14, 2020 at 12:02:17AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> Device managed pages may have additional protections.  These protections
> need to be removed prior to valid use by kernel users.
> 
> Check for special treatment of device managed pages in kmap and take
> action if needed.  We use kmap as an interface for generic kernel code
> because under normal circumstances it would be a bug for general kernel
> code to not use kmap prior to accessing kernel memory.  Therefore, this
> should allow any valid kernel users to seamlessly use these pages
> without issues.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index d6e82e3de027..7f809d8d5a94 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -8,6 +8,7 @@
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
>  #include <linux/hardirq.h>
> +#include <linux/memremap.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>  
>  #include <asm/kmap_types.h>
>  
> +static inline void enable_access(struct page *page)
> +{
> +	if (!page_is_access_protected(page))
> +		return;
> +	dev_access_enable();
> +}
> +
> +static inline void disable_access(struct page *page)
> +{
> +	if (!page_is_access_protected(page))
> +		return;
> +	dev_access_disable();
> +}

So, if I followed along correctly, you're proposing to do a WRMSR per
k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
Ira Weiny July 14, 2020, 7:06 p.m. UTC | #2
On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 12:02:17AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > Device managed pages may have additional protections.  These protections
> > need to be removed prior to valid use by kernel users.
> > 
> > Check for special treatment of device managed pages in kmap and take
> > action if needed.  We use kmap as an interface for generic kernel code
> > because under normal circumstances it would be a bug for general kernel
> > code to not use kmap prior to accessing kernel memory.  Therefore, this
> > should allow any valid kernel users to seamlessly use these pages
> > without issues.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index d6e82e3de027..7f809d8d5a94 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -8,6 +8,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/hardirq.h>
> > +#include <linux/memremap.h>
> >  
> >  #include <asm/cacheflush.h>
> >  
> > @@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >  
> >  #include <asm/kmap_types.h>
> >  
> > +static inline void enable_access(struct page *page)
> > +{
> > +	if (!page_is_access_protected(page))
> > +		return;
> > +	dev_access_enable();
> > +}
> > +
> > +static inline void disable_access(struct page *page)
> > +{
> > +	if (!page_is_access_protected(page))
> > +		return;
> > +	dev_access_disable();
> > +}
> 
> So, if I followed along correctly, you're proposing to do a WRMSR per
> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(

Only to pages which have this additional protection, i.e. not DRAM.

User mappings of this memory are not affected (they would be covered by User PKeys
if desired).  User mappings to persistent memory are the primary use case and the
performant path.

Ira
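
(For reference: the check relies on page_is_access_protected(), introduced
earlier in the series.  A rough sketch of the shape such a check takes is below;
the PGMAP_PROT_ENABLED flag name is an assumption for illustration, not the
series' actual definition.)

#include <linux/mm.h>
#include <linux/memremap.h>

static inline bool page_is_access_protected(struct page *page)
{
	/*
	 * Ordinary DRAM is not ZONE_DEVICE, so the common case returns
	 * here and kmap()/kunmap() never reach the PKRS WRMSR.
	 */
	if (!is_zone_device_page(page))
		return false;

	/* Hypothetical flag; the real check comes with the pgmap patches. */
	return !!(page->pgmap->flags & PGMAP_PROT_ENABLED);
}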
Peter Zijlstra July 14, 2020, 7:29 p.m. UTC | #3
On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:

> > So, if I followed along correctly, you're proposing to do a WRMSR per
> > k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
> 
> Only to pages which have this additional protection, ie not DRAM.
> 
> User mappings of this memory is not affected (would be covered by User PKeys if
> desired).  User mappings to persistent memory are the primary use case and the
> performant path.

Because performance to non-volatile memory doesn't matter? I think Dave
has a better answer here ...
Dave Hansen July 14, 2020, 7:42 p.m. UTC | #4
On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
>> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
>>> So, if I followed along correctly, you're proposing to do a WRMSR per
>>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
>> Only to pages which have this additional protection, ie not DRAM.
>>
>> User mappings of this memory is not affected (would be covered by User PKeys if
>> desired).  User mappings to persistent memory are the primary use case and the
>> performant path.
> Because performance to non-volatile memory doesn't matter? I think Dave
> has a better answer here ...

So, these WRMSRs are less evil than normal.  They're architecturally
non-serializing instructions, just like the others in the SDM WRMSR
documentation:

	Note that WRMSR to the IA32_TSC_DEADLINE MSR (MSR index 6E0H)
	and the X2APIC MSRs (MSR indices 802H to 83FH) are  not
	serializing.

This section of the SDM needs to be updated for the PKRS.  Also note
that the PKRS WRMSR is similar in its ordering properties to WRPKRU:

	WRPKRU will never execute speculatively. Memory accesses
	affected by PKRU register will not execute (even speculatively)
	until all prior executions of WRPKRU have completed execution
	and updated the PKRU register.

Which means we don't have to do silliness like LFENCE before WRMSR to
get ordering *back*.  This is another tidbit that needs to get added to
the SDM.  It should probably also get captured in the changelog.

But, either way, this *will* make accessing PMEM more expensive from the
kernel.  No escaping that.  But, we've also got customers saying they
won't deploy PMEM until we mitigate this stray write issue.  Those folks
are quite willing to pay the increased in-kernel cost for increased
protection from stray kernel writes.  Intel is also quite motivated
because we really like increasing the number of PMEM deployments. :)

Ira, can you make sure this all gets pulled into the changelogs somewhere?
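
(A sketch of one way the PKRS update itself could stay cheap on top of that:
cache the last value written per CPU and skip redundant WRMSRs.  The
write_pkrs()/pkrs_cache names are illustrative only, and MSR_IA32_PKRS is
assumed to be defined elsewhere in the series.)

#include <linux/percpu.h>
#include <asm/msr.h>

static DEFINE_PER_CPU(u32, pkrs_cache);

static inline void write_pkrs(u32 new_pkrs)
{
	u32 *pkrs;

	/* Stay on one CPU between reading and updating the cached value. */
	preempt_disable();
	pkrs = this_cpu_ptr(&pkrs_cache);
	if (*pkrs != new_pkrs) {
		*pkrs = new_pkrs;
		/* The WRMSR itself is non-serializing, per the SDM notes above. */
		wrmsrl(MSR_IA32_PKRS, new_pkrs);
	}
	preempt_enable();
}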
Peter Zijlstra July 14, 2020, 7:49 p.m. UTC | #5
On Tue, Jul 14, 2020 at 12:42:11PM -0700, Dave Hansen wrote:
> On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> >> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> >>> So, if I followed along correctly, you're proposing to do a WRMSR per
> >>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
> >> Only to pages which have this additional protection, ie not DRAM.
> >>
> >> User mappings of this memory is not affected (would be covered by User PKeys if
> >> desired).  User mappings to persistent memory are the primary use case and the
> >> performant path.
> > Because performance to non-volatile memory doesn't matter? I think Dave
> > has a better answer here ...
> 
> So, these WRMSRs are less evil than normal.  They're architecturally
> non-serializing instructions,

Excellent, that should make these a fair bit faster than regular MSRs.

> But, either way, this *will* make accessing PMEM more expensive from the
> kernel.  No escaping that. 

There's no free lunch, it's just that regular MSRs are fairly horrible.
Ira Weiny July 14, 2020, 8 p.m. UTC | #6
On Tue, Jul 14, 2020 at 12:42:11PM -0700, Dave Hansen wrote:
> On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> >> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> >>> So, if I followed along correctly, you're proposing to do a WRMSR per
> >>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
> >> Only to pages which have this additional protection, ie not DRAM.
> >>
> >> User mappings of this memory is not affected (would be covered by User PKeys if
> >> desired).  User mappings to persistent memory are the primary use case and the
> >> performant path.
> > Because performance to non-volatile memory doesn't matter? I think Dave
> > has a better answer here ...
> 
> So, these WRMSRs are less evil than normal.  They're architecturally
> non-serializing instructions, just like the others in the SDM WRMSR
> documentation:
> 
> 	Note that WRMSR to the IA32_TSC_DEADLINE MSR (MSR index 6E0H)
> 	and the X2APIC MSRs (MSR indices 802H to 83FH) are  not
> 	serializing.
> 
> This section of the SDM needs to be updated for the PKRS.  Also note
> that the PKRS WRMSR is similar in its ordering properties to WRPKRU:
> 
> 	WRPKRU will never execute speculatively. Memory accesses
> 	affected by PKRU register will not execute (even speculatively)
> 	until all prior executions of WRPKRU have completed execution
> 	and updated the PKRU register.
> 
> Which means we don't have to do silliness like LFENCE before WRMSR to
> get ordering *back*.  This is another tidbit that needs to get added to
> the SDM.  It should probably also get captured in the changelog.
> 
> But, either way, this *will* make accessing PMEM more expensive from the
> kernel.  No escaping that.  But, we've also got customers saying they
> won't deploy PMEM until we mitigate this stray write issue.  Those folks
> are quite willing to pay the increased in-kernel cost for increased
> protection from stray kernel writes.  Intel is also quite motivated
> because we really like increasing the number of PMEM deployments. :)
> 
> Ira, can you make sure this all gets pulled into the changelogs somewhere?

Yes of course.  Thanks for writing that up.

Ira

Patch

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d6e82e3de027..7f809d8d5a94 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,6 +8,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/memremap.h>
 
 #include <asm/cacheflush.h>
 
@@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 
 #include <asm/kmap_types.h>
 
+static inline void enable_access(struct page *page)
+{
+	if (!page_is_access_protected(page))
+		return;
+	dev_access_enable();
+}
+
+static inline void disable_access(struct page *page)
+{
+	if (!page_is_access_protected(page))
+		return;
+	dev_access_disable();
+}
+
 #ifdef CONFIG_HIGHMEM
 extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
 extern void kunmap_atomic_high(void *kvaddr);
@@ -55,6 +70,11 @@ static inline void *kmap(struct page *page)
 	else
 		addr = kmap_high(page);
 	kmap_flush_tlb((unsigned long)addr);
+	/*
+	 * Even non-highmem pages may have additional access protections which
+	 * need to be checked and potentially enabled.
+	 */
+	enable_access(page);
 	return addr;
 }
 
@@ -63,6 +83,11 @@ void kunmap_high(struct page *page);
 static inline void kunmap(struct page *page)
 {
 	might_sleep();
+	/*
+	 * Even non-highmem pages may have additional access protections which
+	 * need to be checked and potentially disabled.
+	 */
+	disable_access(page);
 	if (!PageHighMem(page))
 		return;
 	kunmap_high(page);
@@ -85,6 +110,7 @@ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
 	preempt_disable();
 	pagefault_disable();
+	enable_access(page);
 	if (!PageHighMem(page))
 		return page_address(page);
 	return kmap_atomic_high_prot(page, prot);
@@ -137,6 +163,7 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; }
 static inline void *kmap(struct page *page)
 {
 	might_sleep();
+	enable_access(page);
 	return page_address(page);
 }
 
@@ -146,6 +173,7 @@ static inline void kunmap_high(struct page *page)
 
 static inline void kunmap(struct page *page)
 {
+	disable_access(page);
 #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
 	kunmap_flush_on_unmap(page_address(page));
 #endif
@@ -155,6 +183,7 @@ static inline void *kmap_atomic(struct page *page)
 {
 	preempt_disable();
 	pagefault_disable();
+	enable_access(page);
 	return page_address(page);
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
@@ -216,7 +245,8 @@ static inline void kmap_atomic_idx_pop(void)
 #define kunmap_atomic(addr)                                     \
 do {                                                            \
 	BUILD_BUG_ON(__same_type((addr), struct page *));       \
-	kunmap_atomic_high(addr);                                  \
+	disable_access(kmap_to_page(addr));                     \
+	kunmap_atomic_high(addr);                               \
 	pagefault_enable();                                     \
 	preempt_enable();                                       \
 } while (0)
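
For completeness, a hypothetical atomic-context caller of the interface above;
as with kmap(), the protection handling is invisible to the caller:

#include <linux/highmem.h>
#include <linux/string.h>

/* Hypothetical caller: zero a page that may carry the extra protection. */
static void zero_protected_page(struct page *page)
{
	void *addr = kmap_atomic(page);		/* enable_access() runs in here if needed */

	memset(addr, 0, PAGE_SIZE);
	kunmap_atomic(addr);			/* disable_access() runs on the way out */
}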