| Message ID | 20240529145628.3272630-13-maz@kernel.org (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | KVM: arm64: nv: Shadow stage-2 page table handling |
On Wed, May 29, 2024 at 03:56:24PM +0100, Marc Zyngier wrote:
> Populate bits [56:55] of the leaf entry with the level provided
> by the guest's S2 translation. This will allow us to better scope
> the invalidation by remembering the mapping size.
>
> Of course, this assume that the guest will issue an invalidation
> with an address that falls into the same leaf. If the guest doesn't,
> we'll over-invalidate.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |  8 ++++++++
>  arch/arm64/kvm/mmu.c                | 17 +++++++++++++++--
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index fcb0de3a93fe..971dbe533730 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -5,6 +5,7 @@
>  #include <linux/bitfield.h>
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_pgtable.h>
>
>  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
>  {
> @@ -195,4 +196,11 @@ static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
>  }
>  #endif
>
> +#define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
> +
> +static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
> +{
> +	return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
> +}
> +

It might be nice to keep all of the software fields for (in)valid in
a central place so we can add some documentation. I fear this is going
to get rather complicated as more pieces of pKVM land upstream and we
find new and fun ways to cram data into stage-2.

>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4ed93a384255..f3a8ec70bd29 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1598,11 +1598,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * Potentially reduce shadow S2 permissions to match the guest's own
>  	 * S2. For exec faults, we'd only reach this point if the guest
>  	 * actually allowed it (see kvm_s2_handle_perm_fault).
> +	 *
> +	 * Also encode the level of the nested translation in the SW bits of
> +	 * the PTE/PMD/PUD. This will be retrived on TLB invalidation from
> +	 * the guest.

typo: retrieved

Also, it might be helpful to add some color here to indicate the encoded
TTL is used to represent the span of a single virtual TLB entry,
providing scope to the TLBI by address.

Before I actually read what was going on, I thought the TTL in the PTE
was used for matching invalidation scopes that have a valid TTL.
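For readers following along, here is a minimal, standalone sketch of the encoding under discussion; it is not kernel code. It assumes, based on the commit message's [56:55] range, that the two software bits sit at PTE bits 55 (SW0) and 56 (SW1), and the field_prep()/field_get() helpers merely mimic the kernel's FIELD_PREP()/FIELD_GET() macros for a contiguous mask.

```c
/*
 * Minimal sketch of the level encoding, assuming SW0 = bit 55 and
 * SW1 = bit 56, per the commit message's [56:55] range.
 */
#include <stdint.h>
#include <stdio.h>

#define SW0		(UINT64_C(1) << 55)
#define SW1		(UINT64_C(1) << 56)
#define GUEST_MAP_SZ	(SW1 | SW0)	/* stand-in for KVM_NV_GUEST_MAP_SZ */

/* Pack/unpack a value into/out of a contiguous bit mask. */
static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

int main(void)
{
	/* A two-bit field holds levels 0..3, enough for any S2 leaf level. */
	for (uint64_t level = 0; level <= 3; level++) {
		uint64_t sw_bits = field_prep(GUEST_MAP_SZ, level);

		printf("level %u -> bits [56:55] = %u\n",
		       (unsigned int)level,
		       (unsigned int)field_get(GUEST_MAP_SZ, sw_bits));
	}
	return 0;
}
```

The round-trip shows why two software bits are enough: the level is the only piece of information that has to survive in the shadow leaf for the guest's mapping size to be recovered later.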
On Mon, 03 Jun 2024 19:05:23 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, May 29, 2024 at 03:56:24PM +0100, Marc Zyngier wrote:
> > Populate bits [56:55] of the leaf entry with the level provided
> > by the guest's S2 translation. This will allow us to better scope
> > the invalidation by remembering the mapping size.
> > 
> > Of course, this assume that the guest will issue an invalidation
> > with an address that falls into the same leaf. If the guest doesn't,
> > we'll over-invalidate.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_nested.h |  8 ++++++++
> >  arch/arm64/kvm/mmu.c                | 17 +++++++++++++++--
> >  2 files changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index fcb0de3a93fe..971dbe533730 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -5,6 +5,7 @@
> >  #include <linux/bitfield.h>
> >  #include <linux/kvm_host.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_pgtable.h>
> > 
> >  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> >  {
> > @@ -195,4 +196,11 @@ static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
> >  }
> >  #endif
> > 
> > +#define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
> > +
> > +static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
> > +{
> > +	return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
> > +}
> > +
> 
> It might be nice to keep all of the software fields for (in)valid in
> a central place so we can add some documentation. I fear this is going
> to get rather complicated as more pieces of pKVM land upstream and we
> find new and fun ways to cram data into stage-2.

I had that at some point, but it then became clear that pKVM and NV
were pretty much incompatible in their current respective incarnation.
To get them to play together, you'd need to reinvent the NV wheel
solely at EL2, something that nobody is looking forward to.

What I'm aiming at with this digression is that although they use the
same bits, NV and pKVM are never using them at the same time. If we
shove them at the same location, we make it less clear what is used
when (hence pKVM keeping its toys in mem_protect.h).

But maybe you had a scheme in mind that would avoid this situation?

> 
> >  #endif /* __ARM64_KVM_NESTED_H */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 4ed93a384255..f3a8ec70bd29 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1598,11 +1598,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	 * Potentially reduce shadow S2 permissions to match the guest's own
> >  	 * S2. For exec faults, we'd only reach this point if the guest
> >  	 * actually allowed it (see kvm_s2_handle_perm_fault).
> > +	 *
> > +	 * Also encode the level of the nested translation in the SW bits of
> > +	 * the PTE/PMD/PUD. This will be retrived on TLB invalidation from
> > +	 * the guest.
> 
> typo: retrieved
> 
> Also, it might be helpful to add some color here to indicate the encoded
> TTL is used to represent the span of a single virtual TLB entry,
> providing scope to the TLBI by address.
> 
> Before I actually read what was going on, I thought the TTL in the PTE
> was used for matching invalidation scopes that have a valid TTL.
How about this:

	/*
	 * Also encode the level of the original translation in the SW bits of
	 * the leaf entry as a proxy for the span of that translation. This will
	 * be retrieved on TLB invalidation from the guest and used to limit
	 * the invalidation scope if a TTL hint or a range isn't provided.
	 */

Thanks,

	M.
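To illustrate the behaviour the proposed comment describes, the sketch below shows one way a level recovered from the SW bits could bound a by-address invalidation. It is an illustration only, not the KVM implementation from the later patches: the helper names are made up, and a 4K granule is assumed, where level 1/2/3 leaves cover 1GB/2MB/4KB respectively.

```c
/*
 * Illustration only: derive the span of one virtual TLB entry from the
 * level remembered in the shadow leaf, and use it to scope a by-address
 * invalidation. Assumes a 4K granule; names are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

#define GRANULE_SHIFT		12				/* 4K pages */
#define LEVEL_SHIFT(lvl)	(GRANULE_SHIFT + 9 * (3 - (lvl)))

struct inval_range {
	uint64_t base;
	uint64_t size;
};

/* Turn "invalidation address + remembered level" into a bounded range. */
static struct inval_range scope_from_level(uint64_t ipa, unsigned int level)
{
	uint64_t size = UINT64_C(1) << LEVEL_SHIFT(level);

	return (struct inval_range){
		.base = ipa & ~(size - 1),	/* align down to the leaf */
		.size = size,			/* 1GB, 2MB or 4KB for level 1, 2, 3 */
	};
}

int main(void)
{
	/* e.g. a TLBI address that hit a leaf remembered as level 2 (2MB) */
	struct inval_range r = scope_from_level(0x40123000, 2);

	printf("invalidate [0x%llx, +0x%llx)\n",
	       (unsigned long long)r.base, (unsigned long long)r.size);
	return 0;
}
```

In other words, the remembered level acts as a fallback TTL: when the guest gives neither a TTL hint nor a range, the shadow leaf itself says how big the virtual TLB entry was.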
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index fcb0de3a93fe..971dbe533730 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -5,6 +5,7 @@
 #include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_pgtable.h>
 
 static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 {
@@ -195,4 +196,11 @@ static inline bool kvm_auth_eretax(struct kvm_vcpu *vcpu, u64 *elr)
 }
 #endif
 
+#define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
+
+static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
+{
+	return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
+}
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4ed93a384255..f3a8ec70bd29 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1598,11 +1598,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Potentially reduce shadow S2 permissions to match the guest's own
 	 * S2. For exec faults, we'd only reach this point if the guest
 	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 *
+	 * Also encode the level of the nested translation in the SW bits of
+	 * the PTE/PMD/PUD. This will be retrived on TLB invalidation from
+	 * the guest.
 	 */
 	if (nested) {
 		writable &= kvm_s2_trans_writable(nested);
 		if (!kvm_s2_trans_readable(nested))
 			prot &= ~KVM_PGTABLE_PROT_R;
+
+		prot |= kvm_encode_nested_level(nested);
 	}
 
 	read_lock(&kvm->mmu_lock);
@@ -1661,14 +1667,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_is_perm && vma_pagesize == fault_granule)
+	if (fault_is_perm && vma_pagesize == fault_granule) {
+		/*
+		 * Drop the SW bits in favour of those stored in the
+		 * PTE, which will be preserved.
+		 */
+		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-	else
+	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache,
 					     KVM_PGTABLE_WALK_HANDLE_FAULT |
 					     KVM_PGTABLE_WALK_SHARED);
+	}
+
 out_unlock:
 	read_unlock(&kvm->mmu_lock);
Populate bits [56:55] of the leaf entry with the level provided
by the guest's S2 translation. This will allow us to better scope
the invalidation by remembering the mapping size.

Of course, this assume that the guest will issue an invalidation
with an address that falls into the same leaf. If the guest doesn't,
we'll over-invalidate.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  8 ++++++++
 arch/arm64/kvm/mmu.c                | 17 +++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)
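To make the over-invalidation caveat above concrete, here is a small, purely illustrative example with hypothetical addresses (4K granule assumed): the level remembered at fault time describes one specific guest leaf, so only invalidation addresses falling inside that leaf can benefit from it.

```c
/*
 * Hypothetical numbers, illustrating the caveat from the commit message
 * (4K granule assumed). Suppose the guest maps a 2MB (level-2) block and
 * the shadow leaf remembers level 2: an invalidation address inside that
 * block can be scoped to 2MB, one outside it cannot use the hint and
 * more than necessary gets invalidated.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t leaf_base = 0x40000000;		/* guest level-2 block */
	const uint64_t leaf_size = UINT64_C(2) << 20;	/* 2MB */
	const uint64_t tlbi_addr[] = { 0x40001000, 0x40200000 };

	for (int i = 0; i < 2; i++) {
		int same_leaf = tlbi_addr[i] >= leaf_base &&
				tlbi_addr[i] < leaf_base + leaf_size;

		printf("TLBI @0x%llx: %s\n",
		       (unsigned long long)tlbi_addr[i],
		       same_leaf ? "same leaf, hint scopes it to 2MB"
				 : "different leaf, hint unusable, over-invalidate");
	}
	return 0;
}
```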