[RFC,06/27] KVM: x86: Exit KVM isolation on IRQ entry

Message ID 1557758315-12667-7-git-send-email-alexandre.chartre@oracle.com
Series KVM Address Space Isolation

Commit Message

Alexandre Chartre May 13, 2019, 2:38 p.m. UTC
From: Liran Alon <liran.alon@oracle.com>

Subsequent commits will change most KVM #VMExit handlers to run
in the KVM isolated address space. Any interrupt raised while
executing in the KVM address space needs its handler to switch
back to the host address space.

This patch makes sure that IRQ handlers run in the full host
address space instead of the KVM isolated address space.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
---
 arch/x86/include/asm/apic.h    |    4 ++--
 arch/x86/include/asm/hardirq.h |   10 ++++++++++
 arch/x86/kernel/smp.c          |    2 +-
 arch/x86/platform/uv/tlb_uv.c  |    2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)
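
The IRQ entry points patched below all funnel into a new helper,
kvm_cpu_may_access_sensitive_data(), which keeps the existing L1TF
flush hint and additionally invokes a KVM-installed callback. As a
minimal sketch of the KVM-side wiring this implies: everything here
except the kvm_isolation_exit_handler pointer itself (which this
patch declares in <asm/hardirq.h>) is an assumption about later
patches in the series, not code from this one.

#include <linux/export.h>
#include <linux/mm.h>			/* init_mm */
#include <linux/percpu.h>
#include <asm/special_insns.h>		/* write_cr3() */

/* Sketch only: assumed per-cpu flag tracking whether this CPU is
 * currently running on the KVM isolated page-table.
 */
static DEFINE_PER_CPU(bool, kvm_isolation_active);

static void kvm_isolation_exit(void)
{
	/* Nothing to do if already on the full kernel page-table. */
	if (!this_cpu_read(kvm_isolation_active))
		return;

	this_cpu_write(kvm_isolation_active, false);
	write_cr3(__pa(init_mm.pgd));	/* back to the kernel page-table */
}

/* The pointer declared in <asm/hardirq.h>, called on IRQ entry via
 * kvm_cpu_may_access_sensitive_data().
 */
void (*kvm_isolation_exit_handler)(void) = kvm_isolation_exit;
EXPORT_SYMBOL(kvm_isolation_exit_handler);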

Comments

Andy Lutomirski May 13, 2019, 3:51 p.m. UTC | #1
On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
<alexandre.chartre@oracle.com> wrote:
>
> From: Liran Alon <liran.alon@oracle.com>
>
> Subsequent commits will change most KVM #VMExit handlers to run
> in the KVM isolated address space. Any interrupt raised while
> executing in the KVM address space needs its handler to switch
> back to the host address space.
>
> This patch makes sure that IRQ handlers run in the full host
> address space instead of the KVM isolated address space.

IMO this needs to be somewhere a lot more central.  What about NMI and
MCE?  Or async page faults?  Or any other entry?

--Andy
Alexandre Chartre May 13, 2019, 4:28 p.m. UTC | #2
On 5/13/19 5:51 PM, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:
>>
>> From: Liran Alon <liran.alon@oracle.com>
>>
>> Subsequent commits will change most KVM #VMExit handlers to run
>> in the KVM isolated address space. Any interrupt raised while
>> executing in the KVM address space needs its handler to switch
>> back to the host address space.
>>
>> This patch makes sure that IRQ handlers run in the full host
>> address space instead of the KVM isolated address space.
> 
> IMO this needs to be somewhere a lot more central.  What about NMI and
> MCE?  Or async page faults?  Or any other entry?
> 

Actually, I am not sure this is really useful because the IRQ
handler probably faults before it tries to exit isolation, so
the isolation exit will be done by the kvm page fault handler.
I need to check that.

alex.
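
To make the lazy behaviour Alexandre describes concrete: a rough
sketch of what such an exit from the kvm page fault path could look
like. The function and variable names here are invented for
illustration and are not code from this series.

#include <linux/percpu.h>
#include <linux/ptrace.h>

/* Sketch only: assumed per-cpu isolation state and exit primitive. */
static DEFINE_PER_CPU(bool, kvm_isolation_active);

static void kvm_isolation_exit(void)
{
	this_cpu_write(kvm_isolation_active, false);
	/* the CR3 switch back to the kernel page-table would go here */
}

/* Called early from the #PF handler.  Returns true if the fault was
 * only due to running on the KVM isolated page-table; exiting
 * isolation maps the missing data and the access can be retried.
 */
static bool kvm_isolation_handle_fault(struct pt_regs *regs,
				       unsigned long address)
{
	if (!this_cpu_read(kvm_isolation_active))
		return false;	/* regular #PF, handle as usual */

	kvm_isolation_exit();
	return true;
}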
Andy Lutomirski May 13, 2019, 6:13 p.m. UTC | #3
On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
<alexandre.chartre@oracle.com> wrote:
>
>
>
> On 5/13/19 5:51 PM, Andy Lutomirski wrote:
> > On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
> > <alexandre.chartre@oracle.com> wrote:
> >>
> >> From: Liran Alon <liran.alon@oracle.com>
> >>
> >> Subsequent commits will change most KVM #VMExit handlers to run
> >> in the KVM isolated address space. Any interrupt raised while
> >> executing in the KVM address space needs its handler to switch
> >> back to the host address space.
> >>
> >> This patch makes sure that IRQ handlers run in the full host
> >> address space instead of the KVM isolated address space.
> >
> > IMO this needs to be somewhere a lot more central.  What about NMI and
> > MCE?  Or async page faults?  Or any other entry?
> >
>
> Actually, I am not sure this is really useful because the IRQ
> handler probably faults before it tries to exit isolation, so
> the isolation exit will be done by the kvm page fault handler.
> I need to check that.
>

The whole idea of having #PF exit with a different CR3 than was loaded
on entry seems questionable to me.  I'd be a lot more comfortable with
the whole idea if a page fault due to accessing the wrong data was an
OOPS and the code instead just did the right thing directly.

--Andy
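
For contrast with the lazy exit sketched above, the strict variant
Andy argues for would treat such a fault as a bug rather than fix it
up. Again a sketch with invented names:

#include <linux/bug.h>
#include <linux/percpu.h>
#include <linux/ptrace.h>

/* Sketch only: a fault taken while on the KVM isolated page-table
 * oopses instead of being fixed up by silently switching CR3.
 */
static DEFINE_PER_CPU(bool, kvm_isolation_active);

static void kvm_isolation_check_fault(struct pt_regs *regs,
				      unsigned long address)
{
	/* The code should have exited isolation explicitly before
	 * touching data that only exists in the kernel page-table.
	 */
	BUG_ON(this_cpu_read(kvm_isolation_active));
}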
Peter Zijlstra May 14, 2019, 7:07 a.m. UTC | #4
On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
> <alexandre.chartre@oracle.com> wrote:

> > Actually, I am not sure this is really useful because the IRQ
> > handler probably faults before it tries to exit isolation, so
> > the isolation exit will be done by the kvm page fault handler.
> > I need to check that.
> >
> 
> The whole idea of having #PF exit with a different CR3 than was loaded
> on entry seems questionable to me.  I'd be a lot more comfortable with
> the whole idea if a page fault due to accessing the wrong data was an
> OOPS and the code instead just did the right thing directly.

So I've run into this idea before; it basically allows a lazy
approach to things.

I'm somewhat conflicted: on the one hand, changing CR3 from
#PF is a natural extension in that #PF already changes page-tables
(for userspace / vmalloc etc.); on the other hand, there's a thin
line between being lazy and being sloppy.

If we're going down this route, I think we need a very coherent
design and strong rules.
Alexandre Chartre May 14, 2019, 7:58 a.m. UTC | #5
On 5/14/19 9:07 AM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
>> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
>> <alexandre.chartre@oracle.com> wrote:
> 
>>> Actually, I am not sure this is really useful because the IRQ
>>> handler probably faults before it tries to exit isolation, so
>>> the isolation exit will be done by the kvm page fault handler.
>>> I need to check that.
>>>
>>
>> The whole idea of having #PF exit with a different CR3 than was loaded
>> on entry seems questionable to me.  I'd be a lot more comfortable with
>> the whole idea if a page fault due to accessing the wrong data was an
>> OOPS and the code instead just did the right thing directly.
> 
> So I've run into this idea before; it basically allows a lazy
> approach to things.
>
> I'm somewhat conflicted: on the one hand, changing CR3 from
> #PF is a natural extension in that #PF already changes page-tables
> (for userspace / vmalloc etc.); on the other hand, there's a thin
> line between being lazy and being sloppy.
>
> If we're going down this route, I think we need a very coherent
> design and strong rules.
> 

Right. We should in particular ensure that the KVM page-table
remains a subset of the kernel page-table: page-table changes
(e.g. for vmalloc) should happen in the kernel page-table and
not in the kvm page-table.

So we should probably enforce switching to the kernel page-table
when doing operations like vmalloc. The current code doesn't
enforce this, but I can see it fault when doing any allocation
(because the kvm page-table doesn't have all the structures used
during an allocation).

alex.
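
One way to enforce that, sketched under the assumption that explicit
kvm_isolation_enter()/kvm_isolation_exit() primitives exist (they
are not part of this patch), is to bracket the allocation:

#include <linux/vmalloc.h>

/* Sketch only: run allocations on the full kernel page-table so the
 * allocator never walks structures missing from the KVM page-table.
 */
extern void kvm_isolation_enter(void);	/* assumed primitives */
extern void kvm_isolation_exit(void);

static void *kvm_isolated_vmalloc(unsigned long size)
{
	void *ptr;

	kvm_isolation_exit();	/* switch to the full kernel page-table */
	ptr = vmalloc(size);
	kvm_isolation_enter();	/* back to the KVM isolated page-table */

	return ptr;
}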

Patch

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 130e81e..606da8f 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -515,7 +515,7 @@  static inline unsigned int read_apic_id(void)
 static inline void entering_irq(void)
 {
 	irq_enter();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 }
 
 static inline void entering_ack_irq(void)
@@ -528,7 +528,7 @@  static inline void ipi_entering_ack_irq(void)
 {
 	irq_enter();
 	ack_APIC_irq();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 }
 
 static inline void exiting_irq(void)
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index d9069bb..e082ecb 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -80,4 +80,14 @@  static inline bool kvm_get_cpu_l1tf_flush_l1d(void)
 static inline void kvm_set_cpu_l1tf_flush_l1d(void) { }
 #endif /* IS_ENABLED(CONFIG_KVM_INTEL) */
 
+#ifdef CONFIG_HAVE_KVM
+extern void (*kvm_isolation_exit_handler)(void);
+
+static inline void kvm_cpu_may_access_sensitive_data(void)
+{
+	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_isolation_exit_handler();
+}
+#endif
+
 #endif /* _ASM_X86_HARDIRQ_H */
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d..b99fda0 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,7 +261,7 @@  __visible void __irq_entry smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 
 	if (trace_resched_ipi_enabled()) {
 		/*
diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
index 1297e18..83a17ca 100644
--- a/arch/x86/platform/uv/tlb_uv.c
+++ b/arch/x86/platform/uv/tlb_uv.c
@@ -1285,7 +1285,7 @@  void uv_bau_message_interrupt(struct pt_regs *regs)
 	struct msg_desc msgdesc;
 
 	ack_APIC_irq();
-	kvm_set_cpu_l1tf_flush_l1d();
+	kvm_cpu_may_access_sensitive_data();
 	time_start = get_cycles();
 
 	bcp = &per_cpu(bau_control, smp_processor_id());
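
To illustrate the coverage question raised in the review: the hunks
above only touch IRQ entry paths, so the NMI (and similarly MCE)
entry paths would need the same call. A sketch of the shape of that
change, with nmi_handler_body() standing in for the existing NMI
code; none of this is part of the patch itself:

#include <linux/ptrace.h>
#include <asm/hardirq.h>	/* kvm_cpu_may_access_sensitive_data() */

/* Sketch only: placeholder for the existing NMI handling. */
static void nmi_handler_body(struct pt_regs *regs)
{
}

void sketch_do_nmi(struct pt_regs *regs)
{
	/* Exit the KVM isolated address space before touching data
	 * that may only be mapped in the full kernel page-table.
	 */
	kvm_cpu_may_access_sensitive_data();
	nmi_handler_body(regs);
}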