diff mbox series

[v19,120/130] KVM: TDX: Add a method to ignore dirty logging

Message ID 1491dd247829bf1a29df1904aeed5ed6b464d29c.1708933498.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Headers show
Series [v19,001/130] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro | expand

Commit Message

Isaku Yamahata Feb. 26, 2024, 8:27 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Currently TDX KVM doesn't support tracking dirty pages (yet).  Implement a
method to ignore it.  Because the flag for kvm memory slot to enable dirty
logging isn't accepted for TDX, warn on the method is called for TDX.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/vmx/main.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Edgecombe, Rick P March 15, 2024, 12:06 a.m. UTC | #1
On Mon, 2024-02-26 at 00:27 -0800, isaku.yamahata@intel.com wrote:
>  
> +static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
> +{
> +       if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
> +               return;
> +
> +       vmx_update_cpu_dirty_logging(vcpu);
> +}

Discussed this first part offline, but logging it here. Since
guest_memfd cannot have dirty logging, this is essentially bugging the
VM if somehow they manage anyway. But it should be blocked via the code
in check_memory_region_flags().

On the subject of warnings and KVM_BUG_ON(), my feeling so far is that
this series is quite aggressive about these. Is it due the complexity
of the series? I think maybe we can remove some of the simple ones, but
not sure if there was already some discussion on what level is
appropriate.
Isaku Yamahata March 15, 2024, 1:35 a.m. UTC | #2
On Fri, Mar 15, 2024 at 12:06:31AM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Mon, 2024-02-26 at 00:27 -0800, isaku.yamahata@intel.com wrote:
> >  
> > +static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
> > +{
> > +       if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
> > +               return;
> > +
> > +       vmx_update_cpu_dirty_logging(vcpu);
> > +}
> 
> Discussed this first part offline, but logging it here. Since
> guest_memfd cannot have dirty logging, this is essentially bugging the
> VM if somehow they manage anyway. But it should be blocked via the code
> in check_memory_region_flags().

Will drop this patch.


> On the subject of warnings and KVM_BUG_ON(), my feeling so far is that
> this series is quite aggressive about these. Is it due the complexity
> of the series? I think maybe we can remove some of the simple ones, but
> not sure if there was already some discussion on what level is
> appropriate.

KVM_BUG_ON() was helpful at the early stage.  Because we don't hit them
recently, it's okay to remove them.  Will remove them.
Edgecombe, Rick P March 15, 2024, 2:01 p.m. UTC | #3
On Thu, 2024-03-14 at 18:35 -0700, Isaku Yamahata wrote:
> > On the subject of warnings and KVM_BUG_ON(), my feeling so far is
> > that
> > this series is quite aggressive about these. Is it due the
> > complexity
> > of the series? I think maybe we can remove some of the simple ones,
> > but
> > not sure if there was already some discussion on what level is
> > appropriate.
> 
> KVM_BUG_ON() was helpful at the early stage.  Because we don't hit
> them
> recently, it's okay to remove them.  Will remove them.

Hmm. We probably need to do it case by case.
Isaku Yamahata March 18, 2024, 5:12 p.m. UTC | #4
On Fri, Mar 15, 2024 at 02:01:43PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Thu, 2024-03-14 at 18:35 -0700, Isaku Yamahata wrote:
> > > On the subject of warnings and KVM_BUG_ON(), my feeling so far is
> > > that
> > > this series is quite aggressive about these. Is it due the
> > > complexity
> > > of the series? I think maybe we can remove some of the simple ones,
> > > but
> > > not sure if there was already some discussion on what level is
> > > appropriate.
> > 
> > KVM_BUG_ON() was helpful at the early stage.  Because we don't hit
> > them
> > recently, it's okay to remove them.  Will remove them.
> 
> Hmm. We probably need to do it case by case.

I categorize as follows. Unless otherwise, I'll update this series.

- dirty log check
  As we will drop this ptach, we'll have no call site.

- KVM_BUG_ON() in main.c
  We should drop them because their logic isn't complex.
  
- KVM_BUG_ON() in tdx.c
  - The error check of the return value from SEAMCALL
    We should keep it as it's unexpected error from TDX module. When we hit
    this, we should mark the guest bugged and prevent further operation.  It's
    hard to deduce the reason.  TDX mdoule might be broken.

  - Other check
    We should drop them.
Edgecombe, Rick P March 18, 2024, 5:43 p.m. UTC | #5
On Mon, 2024-03-18 at 10:12 -0700, Isaku Yamahata wrote:
> I categorize as follows. Unless otherwise, I'll update this series.
> 
> - dirty log check
>   As we will drop this ptach, we'll have no call site.
> 
> - KVM_BUG_ON() in main.c
>   We should drop them because their logic isn't complex.
What about "KVM: TDX: Add methods to ignore guest instruction
emulation"? Is it cleanly blocked somehow?

>   
> - KVM_BUG_ON() in tdx.c
>   - The error check of the return value from SEAMCALL
>     We should keep it as it's unexpected error from TDX module. When
> we hit
>     this, we should mark the guest bugged and prevent further
> operation.  It's
>     hard to deduce the reason.  TDX mdoule might be broken.
Yes. Makes sense.

> 
>   - Other check
>     We should drop them.

Offhand, I'm not sure what is in this category.
Isaku Yamahata March 18, 2024, 11:16 p.m. UTC | #6
On Mon, Mar 18, 2024 at 05:43:33PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Mon, 2024-03-18 at 10:12 -0700, Isaku Yamahata wrote:
> > I categorize as follows. Unless otherwise, I'll update this series.
> > 
> > - dirty log check
> >   As we will drop this ptach, we'll have no call site.
> > 
> > - KVM_BUG_ON() in main.c
> >   We should drop them because their logic isn't complex.
> What about "KVM: TDX: Add methods to ignore guest instruction
> emulation"? Is it cleanly blocked somehow?

KVM fault handler, kvm_mmu_page_fault(), is the caller into the emulation,
It should skip the emulation.

As the second guard, x86_emulate_instruction(), calls
check_emulate_instruction() callback to check if the emulation can/should be
done.  TDX callback can return it as X86EMUL_UNHANDLEABLE.  Then, the flow goes
to user space as error.  I'll update the vt_check_emulate_instruction().



> > - KVM_BUG_ON() in tdx.c
> >   - The error check of the return value from SEAMCALL
> >     We should keep it as it's unexpected error from TDX module. When
> > we hit
> >     this, we should mark the guest bugged and prevent further
> > operation.  It's
> >     hard to deduce the reason.  TDX mdoule might be broken.
> Yes. Makes sense.
> 
> > 
> >   - Other check
> >     We should drop them.
> 
> Offhand, I'm not sure what is in this category.

- Checking error code on TD enter/exit
  I'll revise how to check error from TD enter/exit.  We'll have new code.  I
  will update wrapper function to take struct kvm_tdx or struct tdx_vcpu, and
  revise to remove random check.  Cleanups related to kvm_rebooting,
  TDX_SW_ERROR, kvm_spurious_fault()

- Remaining random check for debug.
  The examples are as follows.  They were added for debug.


@@ -797,18 +788,14 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu)
         * list_{del,add}() on associated_tdvcpus list later.
         */
        tdx_disassociate_vp_on_cpu(vcpu);
-       WARN_ON_ONCE(vcpu->cpu != -1);
 
        /*
         * This methods can be called when vcpu allocation/initialization
         * failed. So it's possible that hkid, tdvpx and tdvpr are not assigned
         * yet.
         */
-       if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) {
-               WARN_ON_ONCE(tdx->tdvpx_pa);
-               WARN_ON_ONCE(tdx->tdvpr_pa);
+       if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
                return;
-       }
 
        if (tdx->tdvpx_pa) {
                for (i = 0; i < tdx_info->nr_tdvpx_pages; i++) {
@@ -831,9 +818,9 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 
        /* vcpu_deliver_init method silently discards INIT event. */
-       if (KVM_BUG_ON(init_event, vcpu->kvm))
+       if (init_event)
                return;
-       if (KVM_BUG_ON(is_td_vcpu_created(to_tdx(vcpu)), vcpu->kvm))
+       if (is_td_vcpu_created(to_tdx(vcpu)))
                return;
 
        /*
@@ -831,9 +818,9 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 
        /* vcpu_deliver_init method silently discards INIT event. */
-       if (KVM_BUG_ON(init_event, vcpu->kvm))
+       if (init_event)
                return;
-       if (KVM_BUG_ON(is_td_vcpu_created(to_tdx(vcpu)), vcpu->kvm))
+       if (is_td_vcpu_created(to_tdx(vcpu)))
                return;
 
        /*
@@ -831,9 +818,9 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 
        /* vcpu_deliver_init method silently discards INIT event. */
-       if (KVM_BUG_ON(init_event, vcpu->kvm))
+       if (init_event)
                return;
-       if (KVM_BUG_ON(is_td_vcpu_created(to_tdx(vcpu)), vcpu->kvm))
+       if (is_td_vcpu_created(to_tdx(vcpu)))
                return;
 
        /*
Isaku Yamahata March 22, 2024, 10:57 p.m. UTC | #7
On Mon, Mar 18, 2024 at 04:16:56PM -0700,
Isaku Yamahata <isaku.yamahata@intel.com> wrote:

> On Mon, Mar 18, 2024 at 05:43:33PM +0000,
> "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:
> 
> > On Mon, 2024-03-18 at 10:12 -0700, Isaku Yamahata wrote:
> > > I categorize as follows. Unless otherwise, I'll update this series.
> > > 
> > > - dirty log check
> > >   As we will drop this ptach, we'll have no call site.
> > > 
> > > - KVM_BUG_ON() in main.c
> > >   We should drop them because their logic isn't complex.
> > What about "KVM: TDX: Add methods to ignore guest instruction
> > emulation"? Is it cleanly blocked somehow?
> 
> KVM fault handler, kvm_mmu_page_fault(), is the caller into the emulation,
> It should skip the emulation.
> 
> As the second guard, x86_emulate_instruction(), calls
> check_emulate_instruction() callback to check if the emulation can/should be
> done.  TDX callback can return it as X86EMUL_UNHANDLEABLE.  Then, the flow goes
> to user space as error.  I'll update the vt_check_emulate_instruction().

Oops. It was wrong. It should be X86EMUL_RETRY_INSTR.  RETRY_INSTR means, let
vcpu execute the intrusion again, UNHANDLEABLE means, emulator can't emulate,
inject exception or give up with KVM_EXIT_INTERNAL_ERROR.

For TDX, we'd like to inject #VE to the guest so that the guest #VE handler
can issue TDG.VP.VMCALL<MMIO>.  The default non-present sept value has
#VE suppress bit set.  As first step, EPT violation occurs. then KVM sets
up mmio_spte with #VE suppress bit cleared. Then X86EMUL_RETRY_INSTR tells
kvm to resume vcpu to inject #VE.
Edgecombe, Rick P March 22, 2024, 11:05 p.m. UTC | #8
On Fri, 2024-03-22 at 15:57 -0700, Isaku Yamahata wrote:
> > KVM fault handler, kvm_mmu_page_fault(), is the caller into the
> > emulation,
> > It should skip the emulation.
> > 
> > As the second guard, x86_emulate_instruction(), calls
> > check_emulate_instruction() callback to check if the emulation
> > can/should be
> > done.  TDX callback can return it as X86EMUL_UNHANDLEABLE.  Then,
> > the flow goes
> > to user space as error.  I'll update the
> > vt_check_emulate_instruction().
> 
> Oops. It was wrong. It should be X86EMUL_RETRY_INSTR.  RETRY_INSTR
> means, let
> vcpu execute the intrusion again, UNHANDLEABLE means, emulator can't
> emulate,
> inject exception or give up with KVM_EXIT_INTERNAL_ERROR.
> 
> For TDX, we'd like to inject #VE to the guest so that the guest #VE
> handler
> can issue TDG.VP.VMCALL<MMIO>.  The default non-present sept value
> has
> #VE suppress bit set.  As first step, EPT violation occurs. then KVM
> sets
> up mmio_spte with #VE suppress bit cleared. Then X86EMUL_RETRY_INSTR
> tells
> kvm to resume vcpu to inject #VE.

Ah, so in a normal VM it would:
 - get ept violation for no memslot
 - setup MMIO PTE for later
 - go ahead and head to the emulator anyway instead of waiting to
refault

In TDX, it could skip the last step and head right to the guest, but
instead if heads to the emulator and gets told to retry there. It's a
good solution in that there is already an way to cleanly insert logic
at check_emulate_instruction(). Otherwise we would need to add another
check somewhere to know in TDX to consider the fault resolved. 

I was initially thinking if it gets anywhere near the emulator things
have gone wrong and we are missing something. The downside is that we
can't find those issues. If something tries to use the emulator it will
just fault in a loop instead of exiting to userspace or bugging the VM.
Hmm.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 9aaa9bd7abf3..991bfdecaed2 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -834,6 +834,14 @@  static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	return vmx_get_mt_mask(vcpu, gfn, is_mmio);
 }
 
+static void vt_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
+{
+	if (KVM_BUG_ON(is_td_vcpu(vcpu), vcpu->kvm))
+		return;
+
+	vmx_update_cpu_dirty_logging(vcpu);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -1008,7 +1016,7 @@  struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sched_in = vt_sched_in,
 
 	.cpu_dirty_log_size = PML_ENTITY_NUM,
-	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
+	.update_cpu_dirty_logging = vt_update_cpu_dirty_logging,
 
 	.nested_ops = &vmx_nested_ops,