Message ID | 20201130121847.91808-4-wangyanan55@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix several bugs in KVM stage 2 translation | expand |
On Mon, Nov 30, 2020 at 08:18:47PM +0800, Yanan Wang wrote: > If we get a FSC_PERM fault, just using (logging_active && writable) to determine > calling kvm_pgtable_stage2_map(). There will be two more cases we should consider. > > (1) After logging_active is configged back to false from true. When we get a > FSC_PERM fault with write_fault and adjustment of hugepage is needed, we should > merge tables back to a block entry. This case is ignored by still calling > kvm_pgtable_stage2_relax_perms(), which will lead to an endless loop and guest > panic due to soft lockup. > > (2) We use (FSC_PERM && logging_active && writable) to determine collapsing > a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes > we may only need to relax permissions when trying to write to a page other than > a block. In this condition, using kvm_pgtable_stage2_relax_perms() will be fine. > > The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup level > at which a D-abort or I-abort occured. By comparing granule of the fault lookup > level with vma_pagesize, we can strictly distinguish conditions of calling > kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), and the above > two cases will be well considered. > > Suggested-by: Keqian Zhu <zhukeqian1@huawei.com> > Signed-off-by: Yanan Wang <wangyanan55@huawei.com> > --- > arch/arm64/include/asm/esr.h | 1 + > arch/arm64/include/asm/kvm_emulate.h | 5 +++++ > arch/arm64/kvm/mmu.c | 11 +++++++++-- > 3 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h > index 22c81f1edda2..85a3e49f92f4 100644 > --- a/arch/arm64/include/asm/esr.h > +++ b/arch/arm64/include/asm/esr.h > @@ -104,6 +104,7 @@ > /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */ > #define ESR_ELx_FSC (0x3F) > #define ESR_ELx_FSC_TYPE (0x3C) > +#define ESR_ELx_FSC_LEVEL (0x03) > #define ESR_ELx_FSC_EXTABT (0x10) > #define ESR_ELx_FSC_SERROR (0x11) > #define ESR_ELx_FSC_ACCESS (0x08) > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > index 5ef2669ccd6c..2e0e8edf6306 100644 > --- a/arch/arm64/include/asm/kvm_emulate.h > +++ b/arch/arm64/include/asm/kvm_emulate.h > @@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc > return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE; > } > > +static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu) > +{ > + return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL; > +{ > + > static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu) > { > switch (kvm_vcpu_trap_get_fault(vcpu)) { > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 1a01da9fdc99..75814a02d189 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > gfn_t gfn; > kvm_pfn_t pfn; > bool logging_active = memslot_is_logging(memslot); > - unsigned long vma_pagesize; > + unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); > + unsigned long vma_pagesize, fault_granule; > enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; > struct kvm_pgtable *pgt; > > + fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level); I like the idea, but is this macro reliable for stage-2 page-tables, given that we could have a concatenated pgd? Will
Hi Will, On 2020/11/30 21:49, Will Deacon wrote: > On Mon, Nov 30, 2020 at 08:18:47PM +0800, Yanan Wang wrote: >> If we get a FSC_PERM fault, just using (logging_active && writable) to determine >> calling kvm_pgtable_stage2_map(). There will be two more cases we should consider. >> >> (1) After logging_active is configged back to false from true. When we get a >> FSC_PERM fault with write_fault and adjustment of hugepage is needed, we should >> merge tables back to a block entry. This case is ignored by still calling >> kvm_pgtable_stage2_relax_perms(), which will lead to an endless loop and guest >> panic due to soft lockup. >> >> (2) We use (FSC_PERM && logging_active && writable) to determine collapsing >> a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes >> we may only need to relax permissions when trying to write to a page other than >> a block. In this condition, using kvm_pgtable_stage2_relax_perms() will be fine. >> >> The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup level >> at which a D-abort or I-abort occured. By comparing granule of the fault lookup >> level with vma_pagesize, we can strictly distinguish conditions of calling >> kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), and the above >> two cases will be well considered. >> >> Suggested-by: Keqian Zhu <zhukeqian1@huawei.com> >> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> >> --- >> arch/arm64/include/asm/esr.h | 1 + >> arch/arm64/include/asm/kvm_emulate.h | 5 +++++ >> arch/arm64/kvm/mmu.c | 11 +++++++++-- >> 3 files changed, 15 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h >> index 22c81f1edda2..85a3e49f92f4 100644 >> --- a/arch/arm64/include/asm/esr.h >> +++ b/arch/arm64/include/asm/esr.h >> @@ -104,6 +104,7 @@ >> /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */ >> #define ESR_ELx_FSC (0x3F) >> #define ESR_ELx_FSC_TYPE (0x3C) >> +#define ESR_ELx_FSC_LEVEL (0x03) >> #define ESR_ELx_FSC_EXTABT (0x10) >> #define ESR_ELx_FSC_SERROR (0x11) >> #define ESR_ELx_FSC_ACCESS (0x08) >> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h >> index 5ef2669ccd6c..2e0e8edf6306 100644 >> --- a/arch/arm64/include/asm/kvm_emulate.h >> +++ b/arch/arm64/include/asm/kvm_emulate.h >> @@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc >> return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE; >> } >> >> +static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu) >> +{ >> + return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL; >> +{ >> + >> static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu) >> { >> switch (kvm_vcpu_trap_get_fault(vcpu)) { >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c >> index 1a01da9fdc99..75814a02d189 100644 >> --- a/arch/arm64/kvm/mmu.c >> +++ b/arch/arm64/kvm/mmu.c >> @@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, >> gfn_t gfn; >> kvm_pfn_t pfn; >> bool logging_active = memslot_is_logging(memslot); >> - unsigned long vma_pagesize; >> + unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); >> + unsigned long vma_pagesize, fault_granule; >> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; >> struct kvm_pgtable *pgt; >> >> + fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level); > I like the idea, but is this macro reliable for stage-2 page-tables, given > that we could have a concatenated pgd? > > Will > . Yes, it's fine even when we have a concatenated pgd table. No matter a concatenated pgd will be made or not, the initial lookup level (start _level) is set in VTCR_EL2 register. The MMU hardware walker will know the start_level according to information in VTCR_EL2. This idea runs well in practice on host where ia_bits is 40, PAGE_SIZE is 4k, and a concatenated pgd is made for guest stage2. According to the kernel info printed, the start_level is 1, and stage 2 translation runs as expected. Yanan
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h index 22c81f1edda2..85a3e49f92f4 100644 --- a/arch/arm64/include/asm/esr.h +++ b/arch/arm64/include/asm/esr.h @@ -104,6 +104,7 @@ /* Shared ISS fault status code(IFSC/DFSC) for Data/Instruction aborts */ #define ESR_ELx_FSC (0x3F) #define ESR_ELx_FSC_TYPE (0x3C) +#define ESR_ELx_FSC_LEVEL (0x03) #define ESR_ELx_FSC_EXTABT (0x10) #define ESR_ELx_FSC_SERROR (0x11) #define ESR_ELx_FSC_ACCESS (0x08) diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index 5ef2669ccd6c..2e0e8edf6306 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -350,6 +350,11 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE; } +static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu) +{ + return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL; +{ + static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu) { switch (kvm_vcpu_trap_get_fault(vcpu)) { diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 1a01da9fdc99..75814a02d189 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -754,10 +754,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, gfn_t gfn; kvm_pfn_t pfn; bool logging_active = memslot_is_logging(memslot); - unsigned long vma_pagesize; + unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); + unsigned long vma_pagesize, fault_granule; enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; struct kvm_pgtable *pgt; + fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level); write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu); VM_BUG_ON(write_fault && exec_fault); @@ -896,7 +898,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) prot |= KVM_PGTABLE_PROT_X; - if (fault_status == FSC_PERM && !(logging_active && writable)) { + /* + * Under the premise of getting a FSC_PERM fault, we just need to relax + * permissions only if vma_pagesize equals fault_granule. Otherwise, + * kvm_pgtable_stage2_map() should be called to change block size. + */ + if (fault_status == FSC_PERM && vma_pagesize == fault_granule) { ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot); } else { ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
If we get a FSC_PERM fault, just using (logging_active && writable) to determine calling kvm_pgtable_stage2_map(). There will be two more cases we should consider. (1) After logging_active is configged back to false from true. When we get a FSC_PERM fault with write_fault and adjustment of hugepage is needed, we should merge tables back to a block entry. This case is ignored by still calling kvm_pgtable_stage2_relax_perms(), which will lead to an endless loop and guest panic due to soft lockup. (2) We use (FSC_PERM && logging_active && writable) to determine collapsing a block entry into a table by calling kvm_pgtable_stage2_map(). But sometimes we may only need to relax permissions when trying to write to a page other than a block. In this condition, using kvm_pgtable_stage2_relax_perms() will be fine. The ISS filed bit[1:0] in ESR_EL2 regesiter indicates the stage2 lookup level at which a D-abort or I-abort occured. By comparing granule of the fault lookup level with vma_pagesize, we can strictly distinguish conditions of calling kvm_pgtable_stage2_relax_perms() or kvm_pgtable_stage2_map(), and the above two cases will be well considered. Suggested-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> --- arch/arm64/include/asm/esr.h | 1 + arch/arm64/include/asm/kvm_emulate.h | 5 +++++ arch/arm64/kvm/mmu.c | 11 +++++++++-- 3 files changed, 15 insertions(+), 2 deletions(-)