diff mbox series

[v2] KVM: arm64: Inject exception on out-of-IPA-range translation fault

Message ID 20220427220434.3097449-1-maz@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: arm64: Inject exception on out-of-IPA-range translation fault | expand

Commit Message

Marc Zyngier April 27, 2022, 10:04 p.m. UTC
When taking a translation fault for an IPA that is outside of
the range defined by the hypervisor (between the HW PARange and
the IPA range), we stupidly treat it as an IO and forward the access
to userspace. Of course, userspace can't do much with it, and things
end badly.

Arguably, the guest is braindead, but we should at least catch the
case and inject an exception.

Check the faulting IPA against:
- the sanitised PARange: inject an address size fault
- the IPA size: inject an abort

Reported-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  1 +
 arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
 3 files changed, 48 insertions(+)

Comments

Alexandru Elisei April 28, 2022, 8:46 a.m. UTC | #1
Hi,

On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> When taking a translation fault for an IPA that is outside of
> the range defined by the hypervisor (between the HW PARange and
> the IPA range), we stupidly treat it as an IO and forward the access
> to userspace. Of course, userspace can't do much with it, and things
> end badly.
> 
> Arguably, the guest is braindead, but we should at least catch the
> case and inject an exception.
> 
> Check the faulting IPA against:
> - the sanitised PARange: inject an address size fault
> - the IPA size: inject an abort
> 
> Reported-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  1 +
>  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
>  3 files changed, 48 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 7496deab025a..f71358271b71 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
>  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
>  
>  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index b47df73e98d7..ba20405d2dc2 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
>  		inject_abt64(vcpu, true, addr);
>  }
>  
> +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long addr, esr;
> +
> +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> +
> +	if (kvm_vcpu_trap_is_iabt(vcpu))
> +		kvm_inject_pabt(vcpu, addr);
> +	else
> +		kvm_inject_dabt(vcpu, addr);
> +
> +	/*
> +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> +	 * Size Fault at level 0, as if exceeding PARange.
> +	 *
> +	 * Non-LPAE guests will only get the external abort, as there
> +	 * is no way to to describe the ASF.
> +	 */
> +	if (vcpu_el1_is_32bit(vcpu) &&
> +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> +		return;
> +
> +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> +	esr &= ~GENMASK_ULL(5, 0);
> +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +}
> +
>  /**
>   * kvm_inject_undefined - inject an undefined instruction into the guest
>   * @vcpu: The vCPU in which to inject the exception
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 53ae2c0640bc..5400fc020164 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
> +	if (fault_status == FSC_FAULT) {
> +		/* Beyond sanitised PARange (which is the IPA limit) */
> +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> +			kvm_inject_size_fault(vcpu);
> +			return 1;
> +		}
> +
> +		/* Falls between the IPA range and the PARange? */
> +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> +
> +			if (is_iabt)
> +				kvm_inject_pabt(vcpu, fault_ipa);
> +			else
> +				kvm_inject_dabt(vcpu, fault_ipa);
> +			return 1;
> +		}

Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
aborts? From the guest's point of view, the IPA is valid because it's
inside the HW PARange, so it's not entirely impossible that the VMM put a
device there.

Thanks,
Alex

> +	}
> +
>  	/* Synchronous External Abort? */
>  	if (kvm_vcpu_abt_issea(vcpu)) {
>  		/*
> -- 
> 2.34.1
>
Marc Zyngier April 28, 2022, 3:22 p.m. UTC | #2
On Thu, 28 Apr 2022 09:46:21 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> > When taking a translation fault for an IPA that is outside of
> > the range defined by the hypervisor (between the HW PARange and
> > the IPA range), we stupidly treat it as an IO and forward the access
> > to userspace. Of course, userspace can't do much with it, and things
> > end badly.
> > 
> > Arguably, the guest is braindead, but we should at least catch the
> > case and inject an exception.
> > 
> > Check the faulting IPA against:
> > - the sanitised PARange: inject an address size fault
> > - the IPA size: inject an abort
> > 
> > Reported-by: Christoffer Dall <christoffer.dall@arm.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h |  1 +
> >  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
> >  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
> >  3 files changed, 48 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index 7496deab025a..f71358271b71 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> >  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> >  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> >  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
> >  
> >  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
> >  
> > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > index b47df73e98d7..ba20405d2dc2 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
> >  		inject_abt64(vcpu, true, addr);
> >  }
> >  
> > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> > +{
> > +	unsigned long addr, esr;
> > +
> > +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> > +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > +
> > +	if (kvm_vcpu_trap_is_iabt(vcpu))
> > +		kvm_inject_pabt(vcpu, addr);
> > +	else
> > +		kvm_inject_dabt(vcpu, addr);
> > +
> > +	/*
> > +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> > +	 * Size Fault at level 0, as if exceeding PARange.
> > +	 *
> > +	 * Non-LPAE guests will only get the external abort, as there
> > +	 * is no way to to describe the ASF.
> > +	 */
> > +	if (vcpu_el1_is_32bit(vcpu) &&
> > +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> > +		return;
> > +
> > +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> > +	esr &= ~GENMASK_ULL(5, 0);
> > +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> > +}
> > +
> >  /**
> >   * kvm_inject_undefined - inject an undefined instruction into the guest
> >   * @vcpu: The vCPU in which to inject the exception
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 53ae2c0640bc..5400fc020164 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> >  
> > +	if (fault_status == FSC_FAULT) {
> > +		/* Beyond sanitised PARange (which is the IPA limit) */
> > +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > +			kvm_inject_size_fault(vcpu);
> > +			return 1;
> > +		}
> > +
> > +		/* Falls between the IPA range and the PARange? */
> > +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> > +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > +
> > +			if (is_iabt)
> > +				kvm_inject_pabt(vcpu, fault_ipa);
> > +			else
> > +				kvm_inject_dabt(vcpu, fault_ipa);
> > +			return 1;
> > +		}
> 
> Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
> aborts? From the guest's point of view, the IPA is valid because it's
> inside the HW PARange, so it's not entirely impossible that the VMM put a
> device there.

Sure. But the generated IPA is outside of the range the VMM has asked
to handle. The IPA space describes the whole of the guest address
space, and there shouldn't be anything outside of it.

We actually state in the documentation that the IPA Size limit *is*
the physical address size for the VM. If the VMM places something
outside if the IPA space and still expect something to be reported to
it, we have a problem (in some cases, we may want to actually put page
tables in place even for MMIO that traps to userspace -- see my
earlier work on MMIO guard).

Does it make sense to you?

	M.
Alexandru Elisei April 28, 2022, 4:07 p.m. UTC | #3
Hi,

On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote:
> On Thu, 28 Apr 2022 09:46:21 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi,
> > 
> > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> > > When taking a translation fault for an IPA that is outside of
> > > the range defined by the hypervisor (between the HW PARange and
> > > the IPA range), we stupidly treat it as an IO and forward the access
> > > to userspace. Of course, userspace can't do much with it, and things
> > > end badly.
> > > 
> > > Arguably, the guest is braindead, but we should at least catch the
> > > case and inject an exception.
> > > 
> > > Check the faulting IPA against:
> > > - the sanitised PARange: inject an address size fault
> > > - the IPA size: inject an abort
> > > 
> > > Reported-by: Christoffer Dall <christoffer.dall@arm.com>
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_emulate.h |  1 +
> > >  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
> > >  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
> > >  3 files changed, 48 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > index 7496deab025a..f71358271b71 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> > >  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> > >  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > >  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
> > >  
> > >  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
> > >  
> > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > > index b47df73e98d7..ba20405d2dc2 100644
> > > --- a/arch/arm64/kvm/inject_fault.c
> > > +++ b/arch/arm64/kvm/inject_fault.c
> > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
> > >  		inject_abt64(vcpu, true, addr);
> > >  }
> > >  
> > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> > > +{
> > > +	unsigned long addr, esr;
> > > +
> > > +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> > > +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > +
> > > +	if (kvm_vcpu_trap_is_iabt(vcpu))
> > > +		kvm_inject_pabt(vcpu, addr);
> > > +	else
> > > +		kvm_inject_dabt(vcpu, addr);
> > > +
> > > +	/*
> > > +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> > > +	 * Size Fault at level 0, as if exceeding PARange.
> > > +	 *
> > > +	 * Non-LPAE guests will only get the external abort, as there
> > > +	 * is no way to to describe the ASF.
> > > +	 */
> > > +	if (vcpu_el1_is_32bit(vcpu) &&
> > > +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> > > +		return;
> > > +
> > > +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> > > +	esr &= ~GENMASK_ULL(5, 0);
> > > +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> > > +}
> > > +
> > >  /**
> > >   * kvm_inject_undefined - inject an undefined instruction into the guest
> > >   * @vcpu: The vCPU in which to inject the exception
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 53ae2c0640bc..5400fc020164 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > >  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> > >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > >  
> > > +	if (fault_status == FSC_FAULT) {
> > > +		/* Beyond sanitised PARange (which is the IPA limit) */
> > > +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > +			kvm_inject_size_fault(vcpu);
> > > +			return 1;
> > > +		}
> > > +
> > > +		/* Falls between the IPA range and the PARange? */
> > > +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> > > +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > +
> > > +			if (is_iabt)
> > > +				kvm_inject_pabt(vcpu, fault_ipa);
> > > +			else
> > > +				kvm_inject_dabt(vcpu, fault_ipa);
> > > +			return 1;
> > > +		}
> > 
> > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
> > aborts? From the guest's point of view, the IPA is valid because it's
> > inside the HW PARange, so it's not entirely impossible that the VMM put a
> > device there.
> 
> Sure. But the generated IPA is outside of the range the VMM has asked
> to handle. The IPA space describes the whole of the guest address
> space, and there shouldn't be anything outside of it.
> 
> We actually state in the documentation that the IPA Size limit *is*
> the physical address size for the VM. If the VMM places something
> outside if the IPA space and still expect something to be reported to
> it, we have a problem (in some cases, we may want to actually put page
> tables in place even for MMIO that traps to userspace -- see my
> earlier work on MMIO guard).

If you mean this bit:

On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
address used by the VM.

And then below there is this paragraph:

Please note that configuring the IPA size does not affect the capability
exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
**size of the address translated by the stage2 level (guest physical to
host physical address translations)**.

Emphasis added by me.

It looks to me like the two paragraph state different things, first says
the IPA size caps "the physical address size for a VM", the second that it
caps the RAM size - "size of the address translated by the stage 2 level.

I have no problem with either, but it looks confusing.

So if a VMM that wants to put devices above RAM it must make sure that the
IPA size is extended to match, did I get that right?

I'm also a bit confused about the rationale. Why is the PARange exposed to
the guest in effect the wrong value, because the true PARange is defined by
IPA size?

Thanks,
Alex
Marc Zyngier April 28, 2022, 5:55 p.m. UTC | #4
On Thu, 28 Apr 2022 17:07:21 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote:
> > On Thu, 28 Apr 2022 09:46:21 +0100,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > Hi,
> > > 
> > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> > > > When taking a translation fault for an IPA that is outside of
> > > > the range defined by the hypervisor (between the HW PARange and
> > > > the IPA range), we stupidly treat it as an IO and forward the access
> > > > to userspace. Of course, userspace can't do much with it, and things
> > > > end badly.
> > > > 
> > > > Arguably, the guest is braindead, but we should at least catch the
> > > > case and inject an exception.
> > > > 
> > > > Check the faulting IPA against:
> > > > - the sanitised PARange: inject an address size fault
> > > > - the IPA size: inject an abort
> > > > 
> > > > Reported-by: Christoffer Dall <christoffer.dall@arm.com>
> > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > ---
> > > >  arch/arm64/include/asm/kvm_emulate.h |  1 +
> > > >  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
> > > >  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
> > > >  3 files changed, 48 insertions(+)
> > > > 
> > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > > index 7496deab025a..f71358271b71 100644
> > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> > > >  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> > > >  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > >  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
> > > >  
> > > >  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
> > > >  
> > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > > > index b47df73e98d7..ba20405d2dc2 100644
> > > > --- a/arch/arm64/kvm/inject_fault.c
> > > > +++ b/arch/arm64/kvm/inject_fault.c
> > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
> > > >  		inject_abt64(vcpu, true, addr);
> > > >  }
> > > >  
> > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	unsigned long addr, esr;
> > > > +
> > > > +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> > > > +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > +
> > > > +	if (kvm_vcpu_trap_is_iabt(vcpu))
> > > > +		kvm_inject_pabt(vcpu, addr);
> > > > +	else
> > > > +		kvm_inject_dabt(vcpu, addr);
> > > > +
> > > > +	/*
> > > > +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> > > > +	 * Size Fault at level 0, as if exceeding PARange.
> > > > +	 *
> > > > +	 * Non-LPAE guests will only get the external abort, as there
> > > > +	 * is no way to to describe the ASF.
> > > > +	 */
> > > > +	if (vcpu_el1_is_32bit(vcpu) &&
> > > > +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> > > > +		return;
> > > > +
> > > > +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> > > > +	esr &= ~GENMASK_ULL(5, 0);
> > > > +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> > > > +}
> > > > +
> > > >  /**
> > > >   * kvm_inject_undefined - inject an undefined instruction into the guest
> > > >   * @vcpu: The vCPU in which to inject the exception
> > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > index 53ae2c0640bc..5400fc020164 100644
> > > > --- a/arch/arm64/kvm/mmu.c
> > > > +++ b/arch/arm64/kvm/mmu.c
> > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > >  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> > > >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > > >  
> > > > +	if (fault_status == FSC_FAULT) {
> > > > +		/* Beyond sanitised PARange (which is the IPA limit) */
> > > > +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > > +			kvm_inject_size_fault(vcpu);
> > > > +			return 1;
> > > > +		}
> > > > +
> > > > +		/* Falls between the IPA range and the PARange? */
> > > > +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> > > > +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > +
> > > > +			if (is_iabt)
> > > > +				kvm_inject_pabt(vcpu, fault_ipa);
> > > > +			else
> > > > +				kvm_inject_dabt(vcpu, fault_ipa);
> > > > +			return 1;
> > > > +		}
> > > 
> > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
> > > aborts? From the guest's point of view, the IPA is valid because it's
> > > inside the HW PARange, so it's not entirely impossible that the VMM put a
> > > device there.
> > 
> > Sure. But the generated IPA is outside of the range the VMM has asked
> > to handle. The IPA space describes the whole of the guest address
> > space, and there shouldn't be anything outside of it.
> > 
> > We actually state in the documentation that the IPA Size limit *is*
> > the physical address size for the VM. If the VMM places something
> > outside if the IPA space and still expect something to be reported to
> > it, we have a problem (in some cases, we may want to actually put page
> > tables in place even for MMIO that traps to userspace -- see my
> > earlier work on MMIO guard).
> 
> If you mean this bit:
> 
> On arm64, the physical address size for a VM (IPA Size limit) is limited
> to 40bits by default. The limit can be configured if the host supports the
> extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> identifier, where IPA_Bits is the maximum width of any physical
> address used by the VM.
>
> And then below there is this paragraph:
> 
> Please note that configuring the IPA size does not affect the capability
> exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
> **size of the address translated by the stage2 level (guest physical to
> host physical address translations)**.

I don't see that as a contradiction. It just says that we don't
repaint PARange.

> 
> Emphasis added by me.
> 
> It looks to me like the two paragraph state different things, first says
> the IPA size caps "the physical address size for a VM", the second that it
> caps the RAM size - "size of the address translated by the stage 2 level.

That's not the way I understand it. It just gives a textbook
definition of the IPA space. And to be clear, this is just an
implementation detail. We should be able to populate all full IPA
space with faulting entries and keep the behaviour the same.

> I have no problem with either, but it looks confusing.
> 
> So if a VMM that wants to put devices above RAM it must make sure that the
> IPA size is extended to match, did I get that right?

Yes. Anything that is reacheable by a memory transaction has to fit in
the IPA space.

> I'm also a bit confused about the rationale. Why is the PARange exposed to
> the guest in effect the wrong value, because the true PARange is defined by
> IPA size?

PARange and IPA range don't have the same granularity. You can't
express a PARange of 37 bits, for example, while it is perfectly
possible for the IPA range. And they do cover two different concepts:
the IPA space means nothing to the guest.

	M.
Alexandru Elisei April 29, 2022, 10:44 a.m. UTC | #5
Hi,

On Thu, Apr 28, 2022 at 06:55:56PM +0100, Marc Zyngier wrote:
> On Thu, 28 Apr 2022 17:07:21 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi,
> > 
> > On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote:
> > > On Thu, 28 Apr 2022 09:46:21 +0100,
> > > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote:
> > > > > When taking a translation fault for an IPA that is outside of
> > > > > the range defined by the hypervisor (between the HW PARange and
> > > > > the IPA range), we stupidly treat it as an IO and forward the access
> > > > > to userspace. Of course, userspace can't do much with it, and things
> > > > > end badly.
> > > > > 
> > > > > Arguably, the guest is braindead, but we should at least catch the
> > > > > case and inject an exception.
> > > > > 
> > > > > Check the faulting IPA against:
> > > > > - the sanitised PARange: inject an address size fault
> > > > > - the IPA size: inject an abort
> > > > > 
> > > > > Reported-by: Christoffer Dall <christoffer.dall@arm.com>
> > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > ---
> > > > >  arch/arm64/include/asm/kvm_emulate.h |  1 +
> > > > >  arch/arm64/kvm/inject_fault.c        | 28 ++++++++++++++++++++++++++++
> > > > >  arch/arm64/kvm/mmu.c                 | 19 +++++++++++++++++++
> > > > >  3 files changed, 48 insertions(+)
> > > > > 
> > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > > > index 7496deab025a..f71358271b71 100644
> > > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> > > > >  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> > > > >  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > > >  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
> > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
> > > > >  
> > > > >  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
> > > > >  
> > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> > > > > index b47df73e98d7..ba20405d2dc2 100644
> > > > > --- a/arch/arm64/kvm/inject_fault.c
> > > > > +++ b/arch/arm64/kvm/inject_fault.c
> > > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
> > > > >  		inject_abt64(vcpu, true, addr);
> > > > >  }
> > > > >  
> > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +	unsigned long addr, esr;
> > > > > +
> > > > > +	addr  = kvm_vcpu_get_fault_ipa(vcpu);
> > > > > +	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > > +
> > > > > +	if (kvm_vcpu_trap_is_iabt(vcpu))
> > > > > +		kvm_inject_pabt(vcpu, addr);
> > > > > +	else
> > > > > +		kvm_inject_dabt(vcpu, addr);
> > > > > +
> > > > > +	/*
> > > > > +	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
> > > > > +	 * Size Fault at level 0, as if exceeding PARange.
> > > > > +	 *
> > > > > +	 * Non-LPAE guests will only get the external abort, as there
> > > > > +	 * is no way to to describe the ASF.
> > > > > +	 */
> > > > > +	if (vcpu_el1_is_32bit(vcpu) &&
> > > > > +	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
> > > > > +		return;
> > > > > +
> > > > > +	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
> > > > > +	esr &= ~GENMASK_ULL(5, 0);
> > > > > +	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * kvm_inject_undefined - inject an undefined instruction into the guest
> > > > >   * @vcpu: The vCPU in which to inject the exception
> > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > index 53ae2c0640bc..5400fc020164 100644
> > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > > >  	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> > > > >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > > > >  
> > > > > +	if (fault_status == FSC_FAULT) {
> > > > > +		/* Beyond sanitised PARange (which is the IPA limit) */
> > > > > +		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > > > +			kvm_inject_size_fault(vcpu);
> > > > > +			return 1;
> > > > > +		}
> > > > > +
> > > > > +		/* Falls between the IPA range and the PARange? */
> > > > > +		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
> > > > > +			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
> > > > > +
> > > > > +			if (is_iabt)
> > > > > +				kvm_inject_pabt(vcpu, fault_ipa);
> > > > > +			else
> > > > > +				kvm_inject_dabt(vcpu, fault_ipa);
> > > > > +			return 1;
> > > > > +		}
> > > > 
> > > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO
> > > > aborts? From the guest's point of view, the IPA is valid because it's
> > > > inside the HW PARange, so it's not entirely impossible that the VMM put a
> > > > device there.
> > > 
> > > Sure. But the generated IPA is outside of the range the VMM has asked
> > > to handle. The IPA space describes the whole of the guest address
> > > space, and there shouldn't be anything outside of it.
> > > 
> > > We actually state in the documentation that the IPA Size limit *is*
> > > the physical address size for the VM. If the VMM places something
> > > outside if the IPA space and still expect something to be reported to
> > > it, we have a problem (in some cases, we may want to actually put page
> > > tables in place even for MMIO that traps to userspace -- see my
> > > earlier work on MMIO guard).
> > 
> > If you mean this bit:
> > 
> > On arm64, the physical address size for a VM (IPA Size limit) is limited
> > to 40bits by default. The limit can be configured if the host supports the
> > extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> > KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> > identifier, where IPA_Bits is the maximum width of any physical
> > address used by the VM.
> >
> > And then below there is this paragraph:
> > 
> > Please note that configuring the IPA size does not affect the capability
> > exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
> > **size of the address translated by the stage2 level (guest physical to
> > host physical address translations)**.
> 
> I don't see that as a contradiction. It just says that we don't
> repaint PARange.
> 
> > 
> > Emphasis added by me.
> > 
> > It looks to me like the two paragraph state different things, first says
> > the IPA size caps "the physical address size for a VM", the second that it
> > caps the RAM size - "size of the address translated by the stage 2 level.
> 
> That's not the way I understand it. It just gives a textbook
> definition of the IPA space. And to be clear, this is just an
> implementation detail. We should be able to populate all full IPA
> space with faulting entries and keep the behaviour the same.
> 
> > I have no problem with either, but it looks confusing.
> > 
> > So if a VMM that wants to put devices above RAM it must make sure that the
> > IPA size is extended to match, did I get that right?
> 
> Yes. Anything that is reacheable by a memory transaction has to fit in
> the IPA space.
> 
> > I'm also a bit confused about the rationale. Why is the PARange exposed to
> > the guest in effect the wrong value, because the true PARange is defined by
> > IPA size?
> 
> PARange and IPA range don't have the same granularity. You can't
> express a PARange of 37 bits, for example, while it is perfectly
> possible for the IPA range. And they do cover two different concepts:
> the IPA space means nothing to the guest.

I see, thank you for the detailed explanation!

Thanks,
Alex
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 7496deab025a..f71358271b71 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -40,6 +40,7 @@  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_vabt(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index b47df73e98d7..ba20405d2dc2 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -145,6 +145,34 @@  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 		inject_abt64(vcpu, true, addr);
 }
 
+void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
+{
+	unsigned long addr, esr;
+
+	addr  = kvm_vcpu_get_fault_ipa(vcpu);
+	addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+
+	if (kvm_vcpu_trap_is_iabt(vcpu))
+		kvm_inject_pabt(vcpu, addr);
+	else
+		kvm_inject_dabt(vcpu, addr);
+
+	/*
+	 * If AArch64 or LPAE, set FSC to 0 to indicate an Address
+	 * Size Fault at level 0, as if exceeding PARange.
+	 *
+	 * Non-LPAE guests will only get the external abort, as there
+	 * is no way to to describe the ASF.
+	 */
+	if (vcpu_el1_is_32bit(vcpu) &&
+	    !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE))
+		return;
+
+	esr = vcpu_read_sys_reg(vcpu, ESR_EL1);
+	esr &= ~GENMASK_ULL(5, 0);
+	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+}
+
 /**
  * kvm_inject_undefined - inject an undefined instruction into the guest
  * @vcpu: The vCPU in which to inject the exception
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 53ae2c0640bc..5400fc020164 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1337,6 +1337,25 @@  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
+	if (fault_status == FSC_FAULT) {
+		/* Beyond sanitised PARange (which is the IPA limit) */
+		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
+			kvm_inject_size_fault(vcpu);
+			return 1;
+		}
+
+		/* Falls between the IPA range and the PARange? */
+		if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) {
+			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+
+			if (is_iabt)
+				kvm_inject_pabt(vcpu, fault_ipa);
+			else
+				kvm_inject_dabt(vcpu, fault_ipa);
+			return 1;
+		}
+	}
+
 	/* Synchronous External Abort? */
 	if (kvm_vcpu_abt_issea(vcpu)) {
 		/*