Message ID | 20220113233020.3986005-4-dmatlack@google.com |
---|---|
State | New, archived |
Series | KVM: x86/mmu: Fix write-protection bug in the TDP MMU |
On Thu, Jan 13, 2022, David Matlack wrote:
> +/*
> + * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
> + * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
> + * that map guest pages in read-only memslots and read-only VMAs.
> + *
> + * Invariants:
> + * - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
> + *
> + *
> + * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
> + * allows writes to the guest page mapped by the SPTE. This bit is cleared when
> + * the guest page mapped by the SPTE contains a page table that is being
> + * monitored for shadow paging. In this case the SPTE can only be made writable
> + * by unsyncing the shadow page under the mmu_lock.
> + *
> + * Invariants:
> + * - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
> + * - If MMU-writable is set, Host-writable must be set.
> + *
> + * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
> + * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
> + * PT_WRITABLE_MASK upon the next write from the guest and record the write in
> + * the dirty log (see fast_page_fault()).
> + */
> +
> +/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
> +#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
> +#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)

Ha, so there's a massive comment above is_writable_pte() that covers a lot of
the same material. More below.

> +
>  /*
>   * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
>   * to not overlap the A/D type mask or the saved access bits of access-tracked
> @@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
>  
>  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
>  {
> -	return (spte & shadow_host_writable_mask) &&
> -	       (spte & shadow_mmu_writable_mask);
> +	if (spte & shadow_mmu_writable_mask) {
> +		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> +		return true;
> +	}
> +
> +	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);

I don't like having the WARNs here. This is a moderately hot path, there are a
decent number of call sites, and the WARNs won't actually help detect the offender,
i.e. whoever wrote the bad SPTE long since got away.

And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)

Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().

> +	return false;

To kill a few birds with fewer stones, what if we:

  a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
     definitions.

  b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
     is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
     HOST-Writable.

  c. Drop the WARN in mark_spte_for_access_track().

  d. Call spte_check_writable_invariants() when setting SPTEs.

  e. Document everything in a comment above spte_check_writable_invariants().
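Items b, d and e can be pictured with a small user-space model. Item b phrases both rules as "iff", but the patch's own comment allows an MMU-writable SPTE whose PT_WRITABLE_MASK has been cleared for dirty logging, so this sketch checks only the one-way implications that the comment documents. The bit positions (bit 1 as the x86 R/W bit, bits 9 and 10 as in DEFAULT_SPTE_{HOST,MMU}_WRITEABLE), warn_if() as a stand-in for WARN_ON_ONCE(), and the set_spte() wrapper are all assumptions for illustration, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Assumed non-EPT layout: bit 1 is the hardware R/W bit, bits 9/10 are the
 * software Host-writable and MMU-writable bits from the patch. */
#define PT_WRITABLE_MASK        (1ULL << 1)
#define SPTE_HOST_WRITABLE_MASK (1ULL << 9)
#define SPTE_MMU_WRITABLE_MASK  (1ULL << 10)

/* The checks below are only meaningful if the three bits are distinct. */
static_assert((PT_WRITABLE_MASK & SPTE_HOST_WRITABLE_MASK) == 0 &&
	      (PT_WRITABLE_MASK & SPTE_MMU_WRITABLE_MASK) == 0 &&
	      (SPTE_HOST_WRITABLE_MASK & SPTE_MMU_WRITABLE_MASK) == 0,
	      "writable bits must not overlap");

static unsigned int invariant_violations;

/* Hypothetical user-space stand-in for WARN_ON_ONCE(): report and count. */
static bool warn_if(bool cond, const char *what)
{
	if (cond) {
		invariant_violations++;
		fprintf(stderr, "invariant violated: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical spte_check_writable_invariants():
 *  - MMU-writable requires Host-writable.
 *  - PT_WRITABLE_MASK requires MMU-writable (and so, transitively,
 *    Host-writable).
 * MMU-writable with PT_WRITABLE_MASK clear is legal (dirty logging).
 */
static void spte_check_writable_invariants(u64 spte)
{
	if (spte & SPTE_MMU_WRITABLE_MASK)
		warn_if(!(spte & SPTE_HOST_WRITABLE_MASK),
			"MMU-writable SPTE is not Host-writable");
	else
		warn_if(spte & PT_WRITABLE_MASK,
			"writable SPTE is not MMU-writable");
}

/* Hypothetical setter for item d: validate where the SPTE is written, so
 * the report points at the code that produced the bad value. */
static void set_spte(u64 *sptep, u64 new_spte)
{
	spte_check_writable_invariants(new_spte);
	*sptep = new_spte;
}
```

Moving the check into the setter is what addresses the "offender long since got away" concern: the violation is reported in the call stack that created the SPTE, not in a later reader.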
On Fri, Jan 14, 2022 at 2:29 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 13, 2022, David Matlack wrote:
> >  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
> >  {
> > -	return (spte & shadow_host_writable_mask) &&
> > -	       (spte & shadow_mmu_writable_mask);
> > +	if (spte & shadow_mmu_writable_mask) {
> > +		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> > +		return true;
> > +	}
> > +
> > +	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
>
> I don't like having the WARNs here. This is a moderately hot path, there are a
> decent number of call sites, and the WARNs won't actually help detect the offender,
> i.e. whoever wrote the bad SPTE long since got away.

Re: hot path. The "return true" case (for fast_page_fault()) already had to do 2
bitwise-ANDs and compares, so this patch shouldn't make that any worse.

But that's a good point that it doesn't help with detecting the offender. I agree
these WARNs should move to where SPTEs are set.

>
> And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)
>
> Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().
>
> > +	return false;
>
> To kill a few birds with fewer stones, what if we:
>
>   a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
>      definitions.
>
>   b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
>      is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
>      HOST-Writable.
>
>   c. Drop the WARN in mark_spte_for_access_track().
>
>   d. Call spte_check_writable_invariants() when setting SPTEs.
>
>   e. Document everything in a comment above spte_check_writable_invariants().

Sounds good. I'll send a follow-up series.
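The hot-path argument depends on the new function returning true based on the MMU-writable bit alone. That is equivalent to the old two-mask test only for SPTEs that satisfy the documented invariants. The sketch below is a user-space check with assumed bit positions. It enumerates all eight combinations of the three bits and counts how many invariant-respecting SPTEs the old and new predicates disagree on. The function names are hypothetical, and the WARNs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumed non-EPT layout, as in the patch's DEFAULT_SPTE_* bits. */
#define PT_WRITABLE_MASK        (1ULL << 1)
#define SPTE_HOST_WRITABLE_MASK (1ULL << 9)
#define SPTE_MMU_WRITABLE_MASK  (1ULL << 10)

static_assert((SPTE_HOST_WRITABLE_MASK & SPTE_MMU_WRITABLE_MASK) == 0,
	      "software bits must not overlap");

/* Pre-patch predicate: both software bits must be set. */
static bool can_lockless_old(u64 spte)
{
	return (spte & SPTE_HOST_WRITABLE_MASK) &&
	       (spte & SPTE_MMU_WRITABLE_MASK);
}

/* Post-patch predicate without the WARNs: MMU-writable alone decides. */
static bool can_lockless_new(u64 spte)
{
	return spte & SPTE_MMU_WRITABLE_MASK;
}

/* True if spte obeys the invariants documented in the patch comment. */
static bool spte_is_consistent(u64 spte)
{
	if ((spte & SPTE_MMU_WRITABLE_MASK) && !(spte & SPTE_HOST_WRITABLE_MASK))
		return false;
	if ((spte & PT_WRITABLE_MASK) && !(spte & SPTE_MMU_WRITABLE_MASK))
		return false;
	return true;
}

/* Count consistent SPTEs on which the two predicates disagree. */
static int count_mismatches(void)
{
	const u64 bits[3] = { PT_WRITABLE_MASK, SPTE_HOST_WRITABLE_MASK,
			      SPTE_MMU_WRITABLE_MASK };
	int mismatches = 0;

	for (unsigned int combo = 0; combo < 8; combo++) {
		u64 spte = 0;

		for (int i = 0; i < 3; i++)
			if (combo & (1u << i))
				spte |= bits[i];
		if (spte_is_consistent(spte) &&
		    can_lockless_old(spte) != can_lockless_new(spte))
			mismatches++;
	}
	return mismatches;
}
```

The only inputs on which the predicates differ are MMU-writable SPTEs that are not Host-writable. That is exactly the case the first WARN in the patch was meant to flag.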
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a4af2a42695c..be6a007a4af3 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -60,10 +60,6 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
-#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
-#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
-
 /*
  * The mask/shift to use for saving the original R/X bits when marking the PTE
  * as not-present for access tracking purposes. We do not save the W bit as the
@@ -78,6 +74,35 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 					 SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
 static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
 
+/*
+ * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
+ * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
+ * that map guest pages in read-only memslots and read-only VMAs.
+ *
+ * Invariants:
+ * - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
+ *
+ *
+ * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
+ * allows writes to the guest page mapped by the SPTE. This bit is cleared when
+ * the guest page mapped by the SPTE contains a page table that is being
+ * monitored for shadow paging. In this case the SPTE can only be made writable
+ * by unsyncing the shadow page under the mmu_lock.
+ *
+ * Invariants:
+ * - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
+ * - If MMU-writable is set, Host-writable must be set.
+ *
+ * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
+ * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
+ * PT_WRITABLE_MASK upon the next write from the guest and record the write in
+ * the dirty log (see fast_page_fault()).
+ */
+
+/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
+
 /*
  * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
  * to not overlap the A/D type mask or the saved access bits of access-tracked
@@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
 
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-	return (spte & shadow_host_writable_mask) &&
-	       (spte & shadow_mmu_writable_mask);
+	if (spte & shadow_mmu_writable_mask) {
+		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
+		return true;
+	}
+
+	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
+	return false;
 }
 
 static inline u64 get_mmio_spte_generation(u64 spte)
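The new comment says that, for MMU-writable SPTEs whose PT_WRITABLE_MASK was cleared for dirty logging, KVM will locklessly set PT_WRITABLE_MASK on the next guest write and record the write in the dirty log (see fast_page_fault()). The following is a heavily simplified user-space model of that idea using a C11 compare-and-exchange. It is not KVM's fast_page_fault(). try_fast_write_fault(), the dirty_log array, NR_GFNS and the bit positions are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumed non-EPT layout, as in the patch's DEFAULT_SPTE_* bits. */
#define PT_WRITABLE_MASK        (1ULL << 1)
#define SPTE_HOST_WRITABLE_MASK (1ULL << 9)
#define SPTE_MMU_WRITABLE_MASK  (1ULL << 10)

/* Hypothetical dirty log: one flag per guest frame number. */
#define NR_GFNS 16
static bool dirty_log[NR_GFNS];

/*
 * Model of a lockless write-fault fix-up. If the SPTE is MMU-writable but
 * has W cleared, try to set W atomically and log the write. Returns true
 * if the fault is resolved, false if the caller must take a slow path.
 */
static bool try_fast_write_fault(_Atomic u64 *sptep, unsigned int gfn)
{
	u64 old = atomic_load(sptep);

	assert(gfn < NR_GFNS);

	if (old & PT_WRITABLE_MASK)
		return true;	/* already writable, e.g. fixed by another vCPU */
	if (!(old & SPTE_MMU_WRITABLE_MASK))
		return false;	/* write-tracked or host read-only: slow path */

	/* Succeeds only if nobody changed the SPTE since it was read. */
	if (!atomic_compare_exchange_strong(sptep, &old, old | PT_WRITABLE_MASK))
		return false;

	dirty_log[gfn] = true;
	return true;
}
```

Returning false on a failed compare-and-exchange keeps the sketch short. The caller is assumed to retry or fall back to a path that takes the lock.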
SPTEs are tagged with software-only bits to indicate whether they are
"MMU-writable" and "Host-writable". These bits are used to determine why KVM
has marked an SPTE as read-only.

Document these bits and their invariants, and enforce the invariants with new
WARNs in spte_can_locklessly_be_made_writable() to ensure they are not
accidentally violated in the future.

Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITEABLE next to
EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to both.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/spte.h | 42 +++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)
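The commit message says these bits let KVM determine why an SPTE is read-only. Read together with the invariants in the patch comment, the bits give an ordered classification, as in the following sketch. spte_ro_reason() and the enum names are hypothetical, and the bit positions again assume the non-EPT layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumed non-EPT layout, as in the patch's DEFAULT_SPTE_* bits. */
#define PT_WRITABLE_MASK        (1ULL << 1)
#define SPTE_HOST_WRITABLE_MASK (1ULL << 9)
#define SPTE_MMU_WRITABLE_MASK  (1ULL << 10)

enum ro_reason {
	RO_NONE,		/* PT_WRITABLE_MASK set: not read-only */
	RO_HOST,		/* Host-writable clear: read-only memslot or VMA */
	RO_MMU_WRITE_TRACKED,	/* MMU-writable clear: page holds a shadowed guest page table */
	RO_DIRTY_LOG,		/* MMU-writable set, W clear: e.g. dirty logging */
};

/* Classify an SPTE that is assumed to satisfy the documented invariants. */
static enum ro_reason spte_ro_reason(u64 spte)
{
	assert(!(spte & SPTE_MMU_WRITABLE_MASK) || (spte & SPTE_HOST_WRITABLE_MASK));
	assert(!(spte & PT_WRITABLE_MASK) || (spte & SPTE_MMU_WRITABLE_MASK));

	if (spte & PT_WRITABLE_MASK)
		return RO_NONE;
	/* Host-writable clear implies MMU-writable clear, so check it first. */
	if (!(spte & SPTE_HOST_WRITABLE_MASK))
		return RO_HOST;
	if (!(spte & SPTE_MMU_WRITABLE_MASK))
		return RO_MMU_WRITE_TRACKED;
	return RO_DIRTY_LOG;
}
```

Only the last case is one that spte_can_locklessly_be_made_writable() reports as fixable without mmu_lock. The write-tracked case requires unsyncing the shadow page under the lock, and the host case cannot be made writable at all.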