[RFC,v4,part-1,0/7] ASI - Part I (ASI Infrastructure and PTI)

Message ID 20200504144939.11318-1-alexandre.chartre@oracle.com (mailing list archive)

Message

Alexandre Chartre May 4, 2020, 2:49 p.m. UTC
This is version 4 of the kernel Address Space Isolation (ASI) RFC. I have
broken it down into three distinct parts:

 - Part I: ASI Infrastructure and PTI (this part)
 - Part II: Decorated Page-Table
 - Part III: ASI Test Driver and CLI

Part I is similar to RFCv3 [3] with some small bug fixes. Parts II and III
extend the initial patchset: part II introduces decorated page-table in
order to provide convenient page-table management functions, and part III
provides a driver and CLI for testing ASI (using parts I and II).

KVM ASI will come later and will rely on the ASI infrastructure (part I)
and decorated page-table (part II).

Patches are based on v5.7-rc4.

Background
==========
Kernel Address Space Isolation aims to use address spaces to isolate some
parts of the kernel (for example KVM) to prevent leaking sensitive data
between CPU hyper-threads under speculative execution attacks.

Over the past years, various speculative attacks (like L1TF or MDS) have
highlighted that data can leak between CPU threads through the CPU (micro)
architecture. In particular, a malicious virtual machine running on a CPU
thread can target data used by a sibling CPU thread from the same CPU core.
Thus, a malicious VM can potentially access data from another VM or from
the host system if they are running on sibling CPU threads.

Core Scheduling [4] can prevent a malicious VM from attacking another VM
by running the same VM on all CPU threads of a CPU core. However, a
malicious VM can still target the host system when the sibling CPU thread
exits the VM and returns to the host.

Address Space Isolation can be applied to KVM to mitigate this VM-to-host
attack by removing secrets from the kernel address space used when running
KVM, thus preventing a malicious VM from collecting any sensitive data
from the host.

Address Space Isolation can also be used to implement Page Table Isolation
(PTI [5]) which reduces kernel mappings present in user address spaces to
prevent the Meltdown attack.

Details
=======

ASI
---
An ASI is created by calling asi_create() with a specified ASI type. The
ASI type manages data common to all ASIs of the same type. It is used, in
particular, to manage per-ASI-type TLB/PCID information.

Then the ASI can be entered with asi_enter() and exited with asi_exit().
When an ASI is in use, any interrupt/exception/NMI will cause the ASI to
be interrupted (ASI_INTERRUPT) and the ASI will be resumed (ASI_RESUME)
when the interrupt/exception/NMI returns.

asi_enter()/asi_exit() and ASI_INTERRUPT/ASI_RESUME switch between the
ASI and the full kernel page-table by updating the CR3 register.
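
The enter/exit and interrupt/resume behavior described above can be
pictured with a small user-space model. This is only a sketch under
assumptions: the cr3 variable, the struct layouts, the function signatures
and the nesting counter are made up for illustration and are not the code
from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; these definitions are assumptions, not kernel code. */
#define KERNEL_PGD 0x1000UL              /* made-up full kernel page-table */

struct asi_type { const char *name; };   /* data shared by ASIs of one type */
struct asi { struct asi_type *type; unsigned long pgd; };

static unsigned long cr3 = KERNEL_PGD;   /* stands in for the CR3 register */
static struct asi *cpu_asi;              /* ASI in use on this "CPU", if any */
static int intr_depth;                   /* interrupt/exception/NMI nesting */

static void asi_enter(struct asi *asi) { cpu_asi = asi; cr3 = asi->pgd; }
static void asi_exit(void)             { cpu_asi = NULL; cr3 = KERNEL_PGD; }

/* Models ASI_INTERRUPT: only the outermost interrupt leaves the ASI. */
static void asi_interrupt(void)
{
    if (!cpu_asi)
        return;
    if (intr_depth++ == 0)
        cr3 = KERNEL_PGD;
}

/* Models ASI_RESUME: only the outermost return goes back to the ASI. */
static void asi_resume(void)
{
    if (!cpu_asi)
        return;
    if (--intr_depth == 0)
        cr3 = cpu_asi->pgd;
}

/* Walks through enter, a nested interrupt, and exit; returns final depth. */
int asi_demo(void)
{
    struct asi_type type = { "demo" };
    struct asi asi = { &type, 0x2000UL };

    asi_enter(&asi);   assert(cr3 == 0x2000UL);
    asi_interrupt();   assert(cr3 == KERNEL_PGD);  /* interrupt */
    asi_interrupt();   assert(cr3 == KERNEL_PGD);  /* nested NMI */
    asi_resume();      assert(cr3 == KERNEL_PGD);  /* still interrupted */
    asi_resume();      assert(cr3 == 0x2000UL);    /* ASI resumed */
    asi_exit();        assert(cr3 == KERNEL_PGD);
    return intr_depth;
}
```

Calling asi_demo() from a main() exercises the whole sequence. The nesting
counter is one simple way to model an NMI arriving while an interrupt is
already being handled.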

If a task using ASI is scheduled out then its ASI state is saved and it
will be restored when the task is scheduled back.
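
As a rough illustration of the save/restore on context switch, here is a
user-space sketch. The per-task saved_asi field and the sched_out() and
sched_in() helpers are hypothetical names chosen for this example; the
real scheduler hooks are in the patches.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; names and layouts are assumptions. */
#define KERNEL_PGD 0x1000UL

struct asi { unsigned long pgd; };
struct task { struct asi *saved_asi; };  /* hypothetical per-task ASI slot */

static unsigned long cr3 = KERNEL_PGD;   /* stands in for the CR3 register */
static struct asi *cpu_asi;              /* ASI session of this "CPU" */

static void asi_enter(struct asi *asi) { cpu_asi = asi; cr3 = asi->pgd; }

/* Task is scheduled out: save its ASI and run with the kernel page-table. */
static void sched_out(struct task *prev)
{
    prev->saved_asi = cpu_asi;
    cpu_asi = NULL;
    cr3 = KERNEL_PGD;
}

/* Task is scheduled back: restore the ASI it was using, if any. */
static void sched_in(struct task *next)
{
    cpu_asi = next->saved_asi;
    next->saved_asi = NULL;
    if (cpu_asi)
        cr3 = cpu_asi->pgd;
}

/* Returns 1 if the first task gets its ASI back after being rescheduled. */
int asi_sched_demo(void)
{
    struct asi asi = { 0x2000UL };
    struct task t1 = { NULL }, t2 = { NULL };

    asi_enter(&asi);                 /* t1 is running with an ASI */
    sched_out(&t1);
    sched_in(&t2);                   /* t2 has no ASI */
    assert(cr3 == KERNEL_PGD && cpu_asi == NULL);

    sched_out(&t2);
    sched_in(&t1);                   /* t1 is scheduled back */
    assert(cr3 == 0x2000UL);
    return cpu_asi == &asi;
}
```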

A page fault occurring while an ASI is in use will either cause the ASI to
be aborted (switch back to the full kernel page-table) or to be preserved.
The behavior depends on the ASI type. For example, for PTI the ASI is
preserved and the kernel page fault handler handles the fault on behalf
of the ASI. But for KVM ASI, the ASI will be aborted and the fault will
be retried with the full kernel page-table.
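
The per-type fault behavior can be sketched as a policy stored in the ASI
type. This is a user-space model under assumptions: the asi_fault_policy
enum, the on_fault field and the asi_fault() helper are hypothetical and
only illustrate the preserve/abort distinction.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; names and layouts are assumptions. */
#define KERNEL_PGD 0x1000UL

enum asi_fault_policy { ASI_FAULT_PRESERVE, ASI_FAULT_ABORT };

struct asi_type { const char *name; enum asi_fault_policy on_fault; };
struct asi { const struct asi_type *type; unsigned long pgd; };

static unsigned long cr3 = KERNEL_PGD;   /* stands in for the CR3 register */
static struct asi *cpu_asi;              /* ASI in use on this "CPU", if any */

static void asi_enter(struct asi *asi) { cpu_asi = asi; cr3 = asi->pgd; }

/* Returns 1 if the ASI was aborted and the faulting access must be retried. */
static int asi_fault(void)
{
    if (!cpu_asi || cpu_asi->type->on_fault == ASI_FAULT_PRESERVE)
        return 0;                        /* fault handled on behalf of the ASI */
    cpu_asi = NULL;
    cr3 = KERNEL_PGD;                    /* back to the full kernel page-table */
    return 1;
}

/* Returns the number of aborted ASIs after one fault in each type. */
int asi_fault_demo(void)
{
    static const struct asi_type user_type = { "user", ASI_FAULT_PRESERVE };
    static const struct asi_type kvm_type  = { "kvm",  ASI_FAULT_ABORT };
    struct asi user = { &user_type, 0x2000UL };
    struct asi kvm  = { &kvm_type,  0x3000UL };
    int aborted = 0;

    asi_enter(&user);
    aborted += asi_fault();
    assert(cr3 == 0x2000UL);             /* PTI-like: ASI preserved */

    asi_enter(&kvm);
    aborted += asi_fault();
    assert(cr3 == KERNEL_PGD);           /* KVM-like: ASI aborted */
    return aborted;
}
```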

PTI
---
PTI is now implemented with ASI (user ASI) if both CONFIG_ADDRESS_SPACE_ISOLATION
and CONFIG_PAGE_TABLE_ISOLATION are set. The behavior of PTI is unchanged
but it is now using the ASI infrastructure. 

For each user process, a user ASI is defined with the PTI page-table. The
user ASI is used when running userland code, and it is exited when entering
a syscall. The user ASI is re-entered when the syscall returns to userland.
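
A minimal user-space sketch of this syscall behavior, assuming a
hypothetical per-process mm structure that embeds the user ASI (the field
and helper names are made up for the example):

```c
#include <assert.h>

/* User-space model only; names and layouts are assumptions. */
#define KERNEL_PGD 0x1000UL

struct asi { unsigned long pgd; };       /* pgd is the PTI page-table here */
struct mm { struct asi user_asi; };      /* hypothetical: one user ASI per process */

static unsigned long cr3 = KERNEL_PGD;   /* stands in for the CR3 register */

/* Entering a syscall exits the user ASI. */
static void syscall_entry(void) { cr3 = KERNEL_PGD; }

/* Returning to userland re-enters the user ASI of the process. */
static void return_to_user(const struct mm *mm) { cr3 = mm->user_asi.pgd; }

/* Returns 1 once a full userland/syscall/userland round trip has run. */
int pti_demo(void)
{
    struct mm mm = { { 0x2000UL } };

    return_to_user(&mm);   assert(cr3 == 0x2000UL);    /* running userland code */
    syscall_entry();       assert(cr3 == KERNEL_PGD);  /* in the syscall */
    return_to_user(&mm);   assert(cr3 == 0x2000UL);    /* back in userland */
    return 1;
}
```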

KVM
---
As already mentioned, KVM ASI is not present in this patchset. KVM ASI
will be implemented on top of this infrastructure. Basically, the KVM ASI
patchset will:
  - define a KVM ASI type (DEFINE_ASI_TYPE)
  - create and fill a page-table to be used by the KVM ASI
  - create a KVM ASI (asi_create_kvm())
  - enter the KVM ASI (asi_enter()) on KVM_RUN ioctl
  - exit the KVM ASI (asi_exit())

A fault occurring while the KVM ASI is in use will cause the ASI to be
aborted, and the code will continue running with the full kernel
page-table until the KVM ASI is explicitly re-entered.
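
The planned KVM sequence can be sketched in user space as follows. All
signatures are assumptions: the plain kvm_asi_type object stands in for
whatever DEFINE_ASI_TYPE generates, asi_create_kvm() is given a made-up
argument, and kvm_run_once() is a hypothetical stand-in for the KVM_RUN
ioctl path.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; names, layouts and signatures are assumptions. */
#define KERNEL_PGD 0x1000UL

struct asi_type { const char *name; };
struct asi { const struct asi_type *type; unsigned long pgd; };

/* Stands in for a type defined with DEFINE_ASI_TYPE. */
static const struct asi_type kvm_asi_type = { "kvm" };

static unsigned long cr3 = KERNEL_PGD;   /* stands in for the CR3 register */
static struct asi *cpu_asi;              /* ASI in use on this "CPU", if any */
static struct asi kvm_asi_storage;

/* Create the KVM ASI from an already filled page-table. */
static struct asi *asi_create_kvm(unsigned long pgd)
{
    kvm_asi_storage.type = &kvm_asi_type;
    kvm_asi_storage.pgd = pgd;
    return &kvm_asi_storage;
}

static void asi_enter(struct asi *asi) { cpu_asi = asi; cr3 = asi->pgd; }
static void asi_exit(void)             { cpu_asi = NULL; cr3 = KERNEL_PGD; }
static void asi_abort(void)            { cpu_asi = NULL; cr3 = KERNEL_PGD; }

/* One modeled KVM_RUN; returns 1 if it ran to the end inside the ASI. */
static int kvm_run_once(struct asi *asi, int fault)
{
    int finished_in_asi;

    asi_enter(asi);                      /* explicit (re-)entry on each run */
    if (fault)
        asi_abort();                     /* continue with the kernel page-table */
    finished_in_asi = (cpu_asi == asi);
    asi_exit();
    return finished_in_asi;
}

/* Returns how many of three runs completed inside the KVM ASI. */
int kvm_asi_demo(void)
{
    struct asi *asi = asi_create_kvm(0x3000UL);
    int in_asi = 0;

    in_asi += kvm_run_once(asi, 0);      /* no fault: stays in the ASI */
    in_asi += kvm_run_once(asi, 1);      /* fault: ASI aborted */
    in_asi += kvm_run_once(asi, 0);      /* next run re-enters the ASI */
    assert(cr3 == KERNEL_PGD);
    return in_asi;
}
```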

Status
======
The code looks stable and it supports running a full kernel build and
also LTP tests. Performance impact is expected to be limited as the new
code only adds a small number of assembly instructions on syscall and
interrupts. There's probably also room for reducing this number of
instructions.

Changes 
=======
RFCv4:

- Fix crash when booting with PTI disabled
- Fix issue when task using ASI is scheduled-in

RFCv3:

- Add ASI Type

- Add generic TLB flushing mechanism for ASI. This mechanism is similar
  to the context tracking done when switching mm.

- When ASI is in use, it is interrupted on interrupt/exception/NMI and
  resumed when the interrupt/exception/NMI returns.

- If a task using ASI is scheduled in/out then save/restore the corresponding
  ASI and update the cpu ASI session.

- Implement PTI with ASI.

- Remove KVM ASI from the patchset. KVM ASI will be provided in a separate
  patchset on top of the ASI infrastructure.

- Remove functions to manage, populate and clear page-tables. These functions
  were only used to build the KVM ASI page-table. Also such functions should
  be generic page-table functions and not specific to ASI. Mike Rapoport is also
  looking at making these functions generic.


References
==========
[1] ASI RFCv1 - https://lkml.org/lkml/2019/5/13/515
[2] ASI RFCv2 - https://lore.kernel.org/lkml/1562855138-19507-1-git-send-email-alexandre.chartre@oracle.com
[3] ASI RFCv3 - https://lore.kernel.org/lkml/1582734120-26757-1-git-send-email-alexandre.chartre@oracle.com
[4] Core Scheduling - https://lwn.net/Articles/803652
[5] Page Table Isolation (PTI) - https://www.kernel.org/doc/html/latest/x86/pti.html


Thanks,

alex.

-----

Alexandre Chartre (7):
  mm/x86: Introduce kernel Address Space Isolation (ASI)
  mm/asi: ASI entry/exit interface
  mm/asi: Improve TLB flushing when switching to an ASI pagetable
  mm/asi: Interrupt ASI on interrupt/exception/NMI
  mm/asi: Exit/enter ASI when task enters/exits scheduler
  mm/asi: ASI fault handler
  mm/asi: Implement PTI with ASI

 arch/x86/entry/calling.h           |  37 ++-
 arch/x86/entry/common.c            |  29 ++-
 arch/x86/entry/entry_64.S          |  28 ++
 arch/x86/include/asm/asi.h         | 289 +++++++++++++++++++++
 arch/x86/include/asm/asi_session.h |  24 ++
 arch/x86/include/asm/mmu_context.h |  20 +-
 arch/x86/include/asm/tlbflush.h    |  23 +-
 arch/x86/kernel/asm-offsets.c      |   5 +
 arch/x86/mm/Makefile               |   1 +
 arch/x86/mm/asi.c                  | 402 +++++++++++++++++++++++++++++
 arch/x86/mm/fault.c                |  20 ++
 arch/x86/mm/pti.c                  |  28 +-
 include/linux/mm_types.h           |   5 +
 include/linux/sched.h              |   9 +
 kernel/fork.c                      |  17 ++
 kernel/sched/core.c                |  17 ++
 security/Kconfig                   |  10 +
 17 files changed, 946 insertions(+), 18 deletions(-)
 create mode 100644 arch/x86/include/asm/asi.h
 create mode 100644 arch/x86/include/asm/asi_session.h
 create mode 100644 arch/x86/mm/asi.c

Comments

Dave Hansen May 12, 2020, 5:45 p.m. UTC | #1
On 5/4/20 7:49 AM, Alexandre Chartre wrote:
> This version 4 of the kernel Address Space Isolation (ASI) RFC. I have
> broken it down into three distinct parts:
> 
>  - Part I: ASI Infrastructure and PTI (this part)
>  - Part II: Decorated Page-Table
>  - Part III: ASI Test Driver and CLI
> 
> Part I is similar to RFCv3 [3] with some small bug fixes. Parts II and III
> extend the initial patchset: part II introduces decorated page-table in
> order to provide convenient page-table management functions, and part III
> provides a driver and CLI for testing ASI (using parts I and II).

These look interesting.  I haven't found any holes in your methods,
although the interrupt depth tracking worries me a bit.  I tried and
failed to do a similar thing with PTI in the NMI path, but you might
have just bested me there. :)

It's very interesting that you've been able to implement PTI underneath
all of this, and the "test driver" is really entertaining!

That said, this is working in some of the nastiest corners of the x86
code and this is going to take quite an investment to get reviewed.  I'm
not *quite* sure it's all worth it.

So, this isn't being ignored, I'm just not quite sure what to do with
it, yet.

Alexandre Chartre May 12, 2020, 7:25 p.m. UTC | #2
Hi Dave,

On 5/12/20 7:45 PM, Dave Hansen wrote:
> On 5/4/20 7:49 AM, Alexandre Chartre wrote:
>> This version 4 of the kernel Address Space Isolation (ASI) RFC. I have
>> broken it down into three distinct parts:
>>
>>   - Part I: ASI Infrastructure and PTI (this part)
>>   - Part II: Decorated Page-Table
>>   - Part III: ASI Test Driver and CLI
>>
>> Part I is similar to RFCv3 [3] with some small bug fixes. Parts II and III
>> extend the initial patchset: part II introduces decorated page-table in
>> order to provide convenient page-table management functions, and part III
>> provides a driver and CLI for testing ASI (using parts I and II).
> 
> These look interesting.  I haven't found any holes in your methods,
> although the interrupt depth tracking worries me a bit.  I tried and
> failed to do a similar thing with PTI in the NMI path, but you might
> have just bested me there. :)

Thanks for taking a look. I am glad it seems okay. I have run several tests
and have been unable to make it fail (so far), while previous versions were
easily breakable.

> It's very interesting that you've been able to implement PTI underneath
> all of this, and the "test driver" is really entertaining!

Yeah, this is a kind of PTI on steroids: part of the implementation is
based on the PTI implementation, made more generic. The test driver
has proven very useful for testing and debugging. I am currently using it
(with some extensions) for helping me define the KVM ASI: I can connect the
driver to a KVM ASI, dump the KVM ASI faults and dynamically add mappings;
this is very handy.

> That said, this is working in some of the nastiest corners of the x86
> code and this is going to take quite an investment to get reviewed.  I'm
> not *quite* sure it's all worth it.

I am also concerned about making changes in all these nasty corners. I am a
bit more confident now that it works well enough to implement PTI, because
PTI provides a good stress test for ASI. I am also waiting for (and reviewing)
all x86/entry changes from tglx; this greatly cleans up the entry code and will
hopefully help with the integration of ASI. I will rebase as soon as all these
changes are integrated and check the benefit for ASI.

> So, this isn't being ignored, I'm just not quite sure what to do with
> it, yet.
> 

I am working on defining ASI for KVM. Hopefully this will provide a good
usage example, and make the changes more compelling.

Thanks.

alex.

Andy Lutomirski May 12, 2020, 8:07 p.m. UTC | #3
> On May 12, 2020, at 10:45 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 5/4/20 7:49 AM, Alexandre Chartre wrote:
>> This version 4 of the kernel Address Space Isolation (ASI) RFC. I have
>> broken it down into three distinct parts:
>> 
>> - Part I: ASI Infrastructure and PTI (this part)
>> - Part II: Decorated Page-Table
>> - Part III: ASI Test Driver and CLI
>> 
>> Part I is similar to RFCv3 [3] with some small bug fixes. Parts II and III
>> extend the initial patchset: part II introduces decorated page-table in
>> order to provide convenient page-table management functions, and part III
>> provides a driver and CLI for testing ASI (using parts I and II).
> 
> These look interesting.  I haven't found any holes in your methods,
> although the interrupt depth tracking worries me a bit.  I tried and
> failed to do a similar thing with PTI in the NMI path, but you might
> have just bested me there. :)
> 
> It's very interesting that you've been able to implement PTI underneath
> all of this, and the "test driver" is really entertaining!
> 
> That said, this is working in some of the nastiest corners of the x86
> code and this is going to take quite an investment to get reviewed.  I'm
> not *quite* sure it's all worth it.
> 
> So, this isn't being ignored, I'm just not quite sure what to do with
> it, yet.

I’m going to wait until the dust settles on tglx’s big entry rework before I look at this.