Message ID | 20210505003032.489164-1-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | PKS write protected page tables | expand |
On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote: > > This is based on V6 [1] of the core PKS infrastructure patches. PKS > infrastructure follow-on’s are planned to enable keys to be set to the same > permissions globally. Since this usage needs a key to be set globally > read-only by default, a small temporary solution is hacked up in patch 8. Long > term, PKS protected page tables would use a better and more generic solution > to achieve this. Before you send this out I've been thinking about this more and I think I would prefer you not call this 'globally' setting the key. Because you don't really want to be able to update the key globally like I originally suggested for kmap(). What is required is to set a different default for the key which gets used by all threads by 'default'. What is really missing is how to get the default changed after it may have been used by some threads... thus the 'global' nature... Perhaps I am picking nits here but I think it may go over better with Thomas and the maintainers. Or maybe not... :-) Would it be too much trouble to call this a 'default' change? Because that is really what you implement? Ira
On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote: > This is a POC for write protecting page tables with PKS (Protection Keys for > Supervisor) [1]. The basic idea is to make the page tables read only, except > temporarily on a per-cpu basis when they need to be modified. I’m looking for > opinions on whether people like the general direction of this in terms of > value and implementation. Yay! > Why would people want this? > =========================== > Page tables are the basis for many types of protections and as such, are a > juicy target for attackers. Mapping them read-only will make them harder to > use in attacks. > > This protects against an attacker that has acquired the ability to write to > the page tables. It's not foolproof because an attacker who can execute > arbitrary code can either disable PKS directly, or simply call the same > functions that the kernel uses for legitimate page table writes. I think it absolutely has value. The exploit techniques I'm aware of that target the page table are usually attempting to upgrade an arbitrary write into execution (e.g. write to kernel text after setting kernel text writable in the page table) or similar "data only" attacks (make sensitive page writable). It looks like PKS-protected page tables would be much like the RO-protected text pages in the sense that there is already code in the kernel to do things to make it writable, change text, and set it read-only again (alternatives, ftrace, etc). That said, making the PKS manipulation code be inline to page-writing code would make it less available for ROP/JOP, if an attack DID want to go that route. > Why use PKS for this? > ===================== > PKS is an upcoming CPU feature that allows supervisor virtual memory > permissions to be changed without flushing the TLB, like PKU does for user > memory. Protecting page tables would normally be really expensive because you > would have to do it with paging itself. PKS helps by providing a way to toggle > the writability of the page tables with just a per-cpu MSR. The per-cpu-ness is really important for both performance and for avoiding temporal attacks where an arbitrary write in one CPU is timed against a page table write in another CPU. > Performance impacts > =================== > Setting direct map permissions on whatever random page gets allocated for a > page table would result in a lot of kernel range shootdowns and direct map > large page shattering. So the way the PKS page table memory is created is > similar to this module page clustering series[2], where a cache of pages is > replenished from 2MB pages such that the direct map permissions and associated > breakage is localized on the direct map. In the PKS page tables case, a PKS > key is pre-applied to the direct map for pages in the cache. > > There would be some costs of memory overhead in order to protect the direct > map page tables. There would also be some extra kernel range shootdowns to > replenish the cache on occasion, from setting the PKS key on the direct map of > the new pages. I don’t have any actual performance data yet. What CPU models are expected to have PKS? > This is based on V6 [1] of the core PKS infrastructure patches. PKS > infrastructure follow-on’s are planned to enable keys to be set to the same > permissions globally. Since this usage needs a key to be set globally > read-only by default, a small temporary solution is hacked up in patch 8. Long > term, PKS protected page tables would use a better and more generic solution > to achieve this. > > [1] https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/ Ah, neat! > [2] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/ Ooh. What does this do for performance? It sounds like less TLB pressure, IIUC?
On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote: > It looks like PKS-protected page tables would be much like the > RO-protected text pages in the sense that there is already code in > the kernel to do things to make it writable, change text, and set it > read-only again (alternatives, ftrace, etc). We don't actually modify text by changing the mapping at all. We modify through a writable (but not executable) temporary alias on the page (on x86). Once a mapping is RX it will *never* be writable again (until we tear it all down).
On 5/5/21 2:30 AM, Rick Edgecombe wrote: > This is a POC for write protecting page tables with PKS (Protection Keys for > Supervisor) [1]. The basic idea is to make the page tables read only, except > temporarily on a per-cpu basis when they need to be modified. I’m looking for > opinions on whether people like the general direction of this in terms of > value and implementation. > > Why would people want this? > =========================== > Page tables are the basis for many types of protections and as such, are a > juicy target for attackers. Mapping them read-only will make them harder to > use in attacks. > > This protects against an attacker that has acquired the ability to write to > the page tables. It's not foolproof because an attacker who can execute > arbitrary code can either disable PKS directly, or simply call the same > functions that the kernel uses for legitimate page table writes. Yeah, it's a good idea. I've once used a similar approach locally during debugging a problem that appeared to be stray writes hitting page tables, and without PKS I indeed made the whole pages read-only when not touched by the designated code. > Why use PKS for this? > ===================== > PKS is an upcoming CPU feature that allows supervisor virtual memory > permissions to be changed without flushing the TLB, like PKU does for user > memory. Protecting page tables would normally be really expensive because you > would have to do it with paging itself. PKS helps by providing a way to toggle > the writability of the page tables with just a per-cpu MSR. I can see in patch 8/9 that you are flipping the MSR around individual operations on page table entries. In my patch I hooked making the page table writable to obtaining the page table lock (IIRC I had only the PTE level fully handled though). Wonder if that would be better tradeoff even for your MSR approach? Vlastimil > Performance impacts > =================== > Setting direct map permissions on whatever random page gets allocated for a > page table would result in a lot of kernel range shootdowns and direct map > large page shattering. So the way the PKS page table memory is created is > similar to this module page clustering series[2], where a cache of pages is > replenished from 2MB pages such that the direct map permissions and associated > breakage is localized on the direct map. In the PKS page tables case, a PKS > key is pre-applied to the direct map for pages in the cache. > > There would be some costs of memory overhead in order to protect the direct > map page tables. There would also be some extra kernel range shootdowns to > replenish the cache on occasion, from setting the PKS key on the direct map of > the new pages. I don’t have any actual performance data yet. > > This is based on V6 [1] of the core PKS infrastructure patches. PKS > infrastructure follow-on’s are planned to enable keys to be set to the same > permissions globally. Since this usage needs a key to be set globally > read-only by default, a small temporary solution is hacked up in patch 8. Long > term, PKS protected page tables would use a better and more generic solution > to achieve this. > > [1] > https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/ > [2] > https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com > / > > Thanks, > > Rick > > > Rick Edgecombe (9): > list: Support getting most recent element in list_lru > list: Support list head not in object for list_lru > x86/mm/cpa: Add grouped page allocations > mm: Explicitly zero page table lock ptr > x86, mm: Use cache of page tables > x86/mm/cpa: Add set_memory_pks() > x86/mm/cpa: Add perm callbacks to grouped pages > x86, mm: Protect page tables with PKS > x86, cpa: PKS protect direct map page tables > > arch/x86/boot/compressed/ident_map_64.c | 5 + > arch/x86/include/asm/pgalloc.h | 6 + > arch/x86/include/asm/pgtable.h | 26 +- > arch/x86/include/asm/pgtable_64.h | 33 ++- > arch/x86/include/asm/pkeys_common.h | 8 +- > arch/x86/include/asm/set_memory.h | 23 ++ > arch/x86/mm/init.c | 40 +++ > arch/x86/mm/pat/set_memory.c | 312 +++++++++++++++++++++++- > arch/x86/mm/pgtable.c | 144 ++++++++++- > include/asm-generic/pgalloc.h | 42 +++- > include/linux/list_lru.h | 26 ++ > include/linux/mm.h | 7 + > mm/Kconfig | 6 +- > mm/list_lru.c | 38 ++- > mm/memory.c | 1 + > mm/swap.c | 7 + > mm/swap_state.c | 6 + > 17 files changed, 705 insertions(+), 25 deletions(-) >
On Wed, May 05, 2021 at 01:08:35PM +0200, Vlastimil Babka wrote: > On 5/5/21 2:30 AM, Rick Edgecombe wrote: > > Why use PKS for this? > > ===================== > > PKS is an upcoming CPU feature that allows supervisor virtual memory > > permissions to be changed without flushing the TLB, like PKU does for user > > memory. Protecting page tables would normally be really expensive because you > > would have to do it with paging itself. PKS helps by providing a way to toggle > > the writability of the page tables with just a per-cpu MSR. > > I can see in patch 8/9 that you are flipping the MSR around individual > operations on page table entries. In my patch I hooked making the page table > writable to obtaining the page table lock (IIRC I had only the PTE level fully > handled though). Wonder if that would be better tradeoff even for your MSR approach? There's also the HIGHPTE code we could abuse to kmap an alias while we're at it.
On Wed, May 05, 2021 at 10:37:29AM +0200, Peter Zijlstra wrote: > On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote: > > > It looks like PKS-protected page tables would be much like the > > RO-protected text pages in the sense that there is already code in > > the kernel to do things to make it writable, change text, and set it > > read-only again (alternatives, ftrace, etc). > > We don't actually modify text by changing the mapping at all. We modify > through a writable (but not executable) temporary alias on the page (on > x86). > > Once a mapping is RX it will *never* be writable again (until we tear it > all down). Yes, quite true. I was trying to answer the concern about "is it okay that there is a routine in the kernel that can write to page tables (via temporary disabling of PKS)?" by saying "yes, this is fine -- we already have similar routines in the kernel that bypass memory protections, and that's okay because the defense is primarily about blocking flaws that allow attacker-controlled writes to be used to leverage greater control over kernel state, of which the page tables are pretty central. :)
On Wed, 2021-05-05 at 13:56 +0200, Peter Zijlstra wrote: > On Wed, May 05, 2021 at 01:08:35PM +0200, Vlastimil Babka wrote: > > On 5/5/21 2:30 AM, Rick Edgecombe wrote: > > > > Why use PKS for this? > > > ===================== > > > PKS is an upcoming CPU feature that allows supervisor virtual > > > memory > > > permissions to be changed without flushing the TLB, like PKU does > > > for user > > > memory. Protecting page tables would normally be really expensive > > > because you > > > would have to do it with paging itself. PKS helps by providing a > > > way to toggle > > > the writability of the page tables with just a per-cpu MSR. > > > > I can see in patch 8/9 that you are flipping the MSR around > > individual > > operations on page table entries. In my patch I hooked making the > > page table > > writable to obtaining the page table lock (IIRC I had only the PTE > > level fully > > handled though). Wonder if that would be better tradeoff even for > > your MSR approach? > Hmm, I see, that could reduce the sprinkling of the enable/disable calls. It seems some(most?) of the kernel address space page table modification don't take the page table locks though so those callers would have to call something else to be able to write. > There's also the HIGHPTE code we could abuse to kmap an alias while > we're at it. For a non-PKS debug feature? It might fit pretty easily into CONFIG_DEBUG_PAGEALLOC on top of this series. enable/disable_pgtable_write() could take a pointer that is ignored in PKS mode, but triggers a cpa call in a CONFIG_DEBUG_PAGETABLE_WRITE mode.
On Tue, 2021-05-04 at 23:25 -0700, Kees Cook wrote: > > infrastructure follow-on’s are planned to enable keys to be set to > > the same > > permissions globally. Since this usage needs a key to be set globally > > read-only by default, a small temporary solution is hacked up in > > patch 8. Long > > term, PKS protected page tables would use a better and more generic > > solution > > to achieve this. > > > > [1] > > https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/ > > Ah, neat! > > > [2] > > https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/ > > Ooh. What does this do for performance? It sounds like less TLB > pressure, IIUC? Yea, less TLB pressure, faster page table walks in theory. There was some testing that showed having all 4k pages was bad for performance: https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/ I'm not sure exactly how much breakage is needed before problems start to show up, but there was also someone posting that large amounts of tracing was noticeable for their workload.
On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote: > On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote: > > > Performance impacts > > =================== > > Setting direct map permissions on whatever random page gets allocated for a > > page table would result in a lot of kernel range shootdowns and direct map > > large page shattering. So the way the PKS page table memory is created is > > similar to this module page clustering series[2], where a cache of pages is > > replenished from 2MB pages such that the direct map permissions and associated > > breakage is localized on the direct map. In the PKS page tables case, a PKS > > key is pre-applied to the direct map for pages in the cache. > > > > There would be some costs of memory overhead in order to protect the direct > > map page tables. There would also be some extra kernel range shootdowns to > > replenish the cache on occasion, from setting the PKS key on the direct map of > > the new pages. I don’t have any actual performance data yet. > > What CPU models are expected to have PKS? Supervisor Memory Protection Keys (PKS) is a feature which is found on Intel’s Sapphire Rapids (and later) “Scalable Processor” Server CPUs. It will also be available in future non-server Intel parts. Also qemu has some support as well. https://www.qemu.org/2021/04/30/qemu-6-0-0/ Ira