[v2,3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants

Message ID 20220113233020.3986005-4-dmatlack@google.com (mailing list archive)
State New, archived
Series KVM: x86/mmu: Fix write-protection bug in the TDP MMU

Commit Message

David Matlack Jan. 13, 2022, 11:30 p.m. UTC
SPTEs are tagged with software-only bits to indicate whether they are
"MMU-writable" and "Host-writable". These bits are used to determine why
KVM has marked an SPTE as read-only.

Document these bits and their invariants, and enforce the invariants
with new WARNs in spte_can_locklessly_be_made_writable() to ensure they
are not accidentally violated in the future.

Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITABLE next to
EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to
both.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/spte.h | 42 +++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

Comments

Sean Christopherson Jan. 14, 2022, 10:29 p.m. UTC | #1
On Thu, Jan 13, 2022, David Matlack wrote:
> +/*
> + * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
> + * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
> + * that map guest pages in read-only memslots and read-only VMAs.
> + *
> + * Invariants:
> + *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
> + *
> + *
> + * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
> + * allows writes to the guest page mapped by the SPTE. This bit is cleared when
> + * the guest page mapped by the SPTE contains a page table that is being
> + * monitored for shadow paging. In this case the SPTE can only be made writable
> + * by unsyncing the shadow page under the mmu_lock.
> + *
> + * Invariants:
> + *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
> + *  - If MMU-writable is set, Host-writable must be set.
> + *
> + * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
> + * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
> + * PT_WRITABLE_MASK upon the next write from the guest and record the write in
> + * the dirty log (see fast_page_fault()).
> + */
> +
> +/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
> +#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
> +#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)

Ha, so there's a massive comment above is_writable_pte() that covers a lot of
the same material.  More below.

> +
>  /*
>   * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
>   * to not overlap the A/D type mask or the saved access bits of access-tracked
> @@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
>  
>  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
>  {
> -	return (spte & shadow_host_writable_mask) &&
> -	       (spte & shadow_mmu_writable_mask);
> +	if (spte & shadow_mmu_writable_mask) {
> +		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> +		return true;
> +	}
> +
> +	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);

I don't like having the WARNs here.  This is a moderately hot path, there are a
decent number of call sites, and the WARNs won't actually help detect the offender,
i.e. whoever wrote the bad SPTE long since got away.

And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)

Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().

> +	return false;

To kill a few birds with fewer stones, what if we:

  a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
     definitions.

  b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
     is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
     HOST-Writable.

  c. Drop the WARN in mark_spte_for_access_track().

  d. Call spte_check_writable_invariants() when setting SPTEs.

  e. Document everything in a comment above spte_check_writable_invariants().

David Matlack Jan. 18, 2022, 5:45 p.m. UTC | #2
On Fri, Jan 14, 2022 at 2:29 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 13, 2022, David Matlack wrote:
> > +/*
> > + * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
> > + * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
> > + * that map guest pages in read-only memslots and read-only VMAs.
> > + *
> > + * Invariants:
> > + *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
> > + *
> > + *
> > + * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
> > + * allows writes to the guest page mapped by the SPTE. This bit is cleared when
> > + * the guest page mapped by the SPTE contains a page table that is being
> > + * monitored for shadow paging. In this case the SPTE can only be made writable
> > + * by unsyncing the shadow page under the mmu_lock.
> > + *
> > + * Invariants:
> > + *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
> > + *  - If MMU-writable is set, Host-writable must be set.
> > + *
> > + * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
> > + * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
> > + * PT_WRITABLE_MASK upon the next write from the guest and record the write in
> > + * the dirty log (see fast_page_fault()).
> > + */
> > +
> > +/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
> > +#define DEFAULT_SPTE_HOST_WRITEABLE  BIT_ULL(9)
> > +#define DEFAULT_SPTE_MMU_WRITEABLE   BIT_ULL(10)
>
> Ha, so there's a massive comment above is_writable_pte() that covers a lot of
> the same material.  More below.
>
> > +
> >  /*
> >   * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
> >   * to not overlap the A/D type mask or the saved access bits of access-tracked
> > @@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
> >
> >  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
> >  {
> > -     return (spte & shadow_host_writable_mask) &&
> > -            (spte & shadow_mmu_writable_mask);
> > +     if (spte & shadow_mmu_writable_mask) {
> > +             WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> > +             return true;
> > +     }
> > +
> > +     WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
>
> I don't like having the WARNs here.  This is a moderately hot path, there are a
> decent number of call sites, and the WARNs won't actually help detect the offender,
> i.e. whoever wrote the bad SPTE long since got away.

Re: hot path. The "return true" case (for fast_page_fault()) already
had to do 2 bitwise-ANDs and compares, so this patch shouldn't make
that any worse.

But that's a good point that it doesn't help with detecting the
offender. I agree these WARNs should move to where SPTEs are set.

>
> And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)
>
> Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().
>
> > +     return false;
>
> To kill a few birds with fewer stones, what if we:
>
>   a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
>      definitions.
>
>   b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
>      is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
>      HOST-Writable.
>
>   c. Drop the WARN in mark_spte_for_access_track().
>
>   d. Call spte_check_writable_invariants() when setting SPTEs.
>
>   e. Document everything in a comment above spte_check_writable_invariants().

Sounds good. I'll send a follow-up series.
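
For reference, a minimal sketch of what the proposed spte_check_writable_invariants() helper could look like (hypothetical: the name and placement follow the suggestion above, shadow_host_writable_mask, shadow_mmu_writable_mask and PT_WRITABLE_MASK are the existing KVM definitions, and the actual follow-up series may differ):

/*
 * Illustrative only: enforce the documented invariants at the point an
 * SPTE is written. A writable SPTE must be MMU-writable, and an
 * MMU-writable SPTE must be Host-writable.
 */
static inline void spte_check_writable_invariants(u64 spte)
{
	if (spte & shadow_mmu_writable_mask)
		WARN_ONCE(!(spte & shadow_host_writable_mask),
			  "kvm: MMU-writable SPTE is not Host-writable: %llx",
			  spte);
	else
		WARN_ONCE(spte & PT_WRITABLE_MASK,
			  "kvm: writable SPTE is not MMU-writable: %llx", spte);
}

Called from the SPTE-setting paths rather than from the readers, a check along these lines would flag the offending write itself instead of firing long after the bad SPTE was created.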

Patch

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a4af2a42695c..be6a007a4af3 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -60,10 +60,6 @@  static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
-#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
-#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
-
 /*
  * The mask/shift to use for saving the original R/X bits when marking the PTE
  * as not-present for access tracking purposes. We do not save the W bit as the
@@ -78,6 +74,35 @@  static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 					 SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
 static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
 
+/*
+ * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
+ * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
+ * that map guest pages in read-only memslots and read-only VMAs.
+ *
+ * Invariants:
+ *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
+ *
+ *
+ * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
+ * allows writes to the guest page mapped by the SPTE. This bit is cleared when
+ * the guest page mapped by the SPTE contains a page table that is being
+ * monitored for shadow paging. In this case the SPTE can only be made writable
+ * by unsyncing the shadow page under the mmu_lock.
+ *
+ * Invariants:
+ *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
+ *  - If MMU-writable is set, Host-writable must be set.
+ *
+ * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
+ * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
+ * PT_WRITABLE_MASK upon the next write from the guest and record the write in
+ * the dirty log (see fast_page_fault()).
+ */
+
+/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
+
 /*
  * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
  * to not overlap the A/D type mask or the saved access bits of access-tracked
@@ -316,8 +341,13 @@  static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
 
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-	return (spte & shadow_host_writable_mask) &&
-	       (spte & shadow_mmu_writable_mask);
+	if (spte & shadow_mmu_writable_mask) {
+		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
+		return true;
+	}
+
+	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
+	return false;
 }
 
 static inline u64 get_mmio_spte_generation(u64 spte)
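
As context for the "locklessly set PT_WRITABLE_MASK" behaviour described in the new comment, here is a simplified, illustrative sketch of the fast-path fix-up (not the actual fast_page_fault() code; example_fast_make_writable() is a made-up name, and the real path also handles access-tracked SPTEs and retries):

static bool example_fast_make_writable(u64 *sptep)
{
	u64 old_spte = READ_ONCE(*sptep);
	u64 new_spte;

	/* Only MMU-writable (and therefore Host-writable) SPTEs qualify. */
	if (!spte_can_locklessly_be_made_writable(old_spte))
		return false;

	new_spte = old_spte | PT_WRITABLE_MASK;

	/* A racing update loses; the caller retries or takes mmu_lock. */
	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
		return false;

	/* The caller then records the write in the dirty log. */
	return true;
}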