
[RESEND,RFC,v1,1/5] arm64: Add TLB Conflict Abort Exception handler to KVM

Message ID 20241211160218.41404-2-miko.lenczewski@arm.com (mailing list archive)
State New, archived
Series Initial BBML2 support for contpte_convert()

Commit Message

Mikołaj Lenczewski Dec. 11, 2024, 4:01 p.m. UTC
Currently, KVM does not handle the case of a stage 2 TLB conflict abort
exception. The Arm ARM specifies that the worst-case handling of such an
exception requires a `tlbi vmalls12e1`. Perform such an invalidation
when this exception is encountered.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
---
 arch/arm64/include/asm/esr.h | 8 ++++++++
 arch/arm64/kvm/mmu.c         | 6 ++++++
 2 files changed, 14 insertions(+)

Comments

Marc Zyngier Dec. 11, 2024, 5:40 p.m. UTC | #1
On Wed, 11 Dec 2024 16:01:37 +0000,
Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> 
> Currently, KVM does not handle the case of a stage 2 TLB conflict abort
> exception. The Arm ARM specifies that the worst-case handling of such an
> exception requires a `tlbi vmalls12e1`.

Not quite. It says (I_JCCRT):

<quote>
* For the EL1&0 translation regime, when stage 2 translations are in
  use, either VMALLS12E1 or ALLE1.
</quote>

> Perform such an invalidation when this exception is encountered.

What you fail to describe is *why* this is needed. You know it, I know
it, but not everybody does. A reference to the ARM ARM would
definitely be helpful.

> 
> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
> ---
>  arch/arm64/include/asm/esr.h | 8 ++++++++
>  arch/arm64/kvm/mmu.c         | 6 ++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index d1b1a33f9a8b..8a66f81ca291 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -121,6 +121,7 @@
>  #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
>  #define ESR_ELx_FSC_SECC	(0x18)
>  #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
> +#define ESR_ELx_FSC_TLBABT	(0x30)
>  
>  /* Status codes for individual page table levels */
>  #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
> @@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
>  	       (esr == ESR_ELx_FSC_ACCESS_L(0));
>  }
>  
> +static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
> +{
> +	esr = esr & ESR_ELx_FSC;
> +
> +	return esr == ESR_ELx_FSC_TLBABT;
> +}
> +
>  /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
>  static inline bool esr_iss_is_eretax(unsigned long esr)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c9d46ad57e52..c8c6f5a97a1b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
> +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
> +		// does a `tlbi vmalls12e1is`

nit: this isn't a very useful comment.

> +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> +		return 1;
> +	}

That's not enough, unfortunately. A nested VM has *many* VMIDs (the
flattening of all translation contexts that the guest uses).

So you can either iterate over all the valid VMIDs owned by this
guest, or more simply issue a TLBI ALLE1, which will do the trick in a
much more efficient way.

The other thing is that you are using an IS invalidation, which is
farther reaching than necessary. Why would you invalidate the TLBs for
CPUs that are only innocent bystanders? A non-shareable invalidation
seems preferable to me.

> +
>  	if (esr_fsc_is_translation_fault(esr)) {
>  		/* Beyond sanitised PARange (which is the IPA limit) */
>  		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {

But it also begs the question: why only KVM, and not the host? This
handler will only take effect for a TLB Conflict abort delivered from
an EL1 guest to EL2.

However, it doesn't seem to me that the host is equipped to deal with
this sort of exception for itself. Shouldn't you start with that?

Thanks,

	M.
Ryan Roberts Dec. 12, 2024, 9:23 a.m. UTC | #2
On 11/12/2024 17:40, Marc Zyngier wrote:
> On Wed, 11 Dec 2024 16:01:37 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
>>
>> Currently, KVM does not handle the case of a stage 2 TLB conflict abort
>> exception. The Arm ARM specifies that the worst-case handling of such an
>> exception requires a `tlbi vmalls12e1`.
> 
> Not quite. It says (I_JCCRT):
> 
> <quote>
> * For the EL1&0 translation regime, when stage 2 translations are in
>   use, either VMALLS12E1 or ALLE1.
> </quote>
> 
>> Perform such an invalidation when this exception is encountered.
> 
> What you fail to describe is *why* this is needed. You know it, I know
> it, but not everybody does. A reference to the ARM ARM would
> definitely be helpful.
> 
>>
>> Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
>> ---
>>  arch/arm64/include/asm/esr.h | 8 ++++++++
>>  arch/arm64/kvm/mmu.c         | 6 ++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d1b1a33f9a8b..8a66f81ca291 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -121,6 +121,7 @@
>>  #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
>>  #define ESR_ELx_FSC_SECC	(0x18)
>>  #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
>> +#define ESR_ELx_FSC_TLBABT	(0x30)
>>  
>>  /* Status codes for individual page table levels */
>>  #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
>> @@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
>>  	       (esr == ESR_ELx_FSC_ACCESS_L(0));
>>  }
>>  
>> +static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
>> +{
>> +	esr = esr & ESR_ELx_FSC;
>> +
>> +	return esr == ESR_ELx_FSC_TLBABT;
>> +}
>> +
>>  /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
>>  static inline bool esr_iss_is_eretax(unsigned long esr)
>>  {
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c9d46ad57e52..c8c6f5a97a1b 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>>  
>> +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
>> +		// does a `tlbi vmalls12e1is`
> 
> nit: this isn't a very useful comment.
> 
>> +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
>> +		return 1;
>> +	}
> 
> That's not enough, unfortunately. A nested VM has *many* VMIDs (the
> flattening of all translation contexts that the guest uses).
> 
> So you can either iterate over all the valid VMIDs owned by this
> guest, or more simply issue a TLBI ALLE1, which will do the trick in a
> much more efficient way.
> 
> The other thing is that you are using an IS invalidation, which is
> farther reaching than necessary. Why would you invalidate the TLBs for
> CPUs that are only innocent bystanders? A non-shareable invalidation
> seems preferable to me.
> 
>> +
>>  	if (esr_fsc_is_translation_fault(esr)) {
>>  		/* Beyond sanitised PARange (which is the IPA limit) */
>>  		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> 
> But it also begs the question: why only KVM, and not the host? This
> handler will only take effect for a TLB Conflict abort delivered from
> an EL1 guest to EL2.

Hi Marc,

I believe the intent of this patch is to protect the host/KVM against a guest
that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
any operations that are allowed by the arch to cause a conflict abort. Therefore
the host doesn't need to handle it. But a guest could be taking advantage of
BBML2 and therefore it's architecturally possible for a conflict abort to be
raised to EL2. I think today that would take down the host?

So really I think this could be considered a stand-alone KVM hardening improvement?

> 
> However, it doesn't seem to me that the host is equipped to deal with
> this sort of exception for itself. Shouldn't you start with that?

If the host isn't doing any BBML2 operations it doesn't need to handle it, I
don't think? Obviously that changes later in the series and Miko is adding the
required handling to the host.

Thanks,
Ryan

> 
> Thanks,
> 
> 	M.
>
Marc Zyngier Dec. 12, 2024, 9:57 a.m. UTC | #3
Hi Ryan,

On Thu, 12 Dec 2024 09:23:20 +0000,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> Hi Marc,
> 
> I believe the intent of this patch is to protect the host/KVM against a guest
> that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
> any operations that are allowed by the arch to cause a conflict abort. Therefore
> the host doesn't need to handle it. But a guest could be taking advantage of
> BBML2 and therefore it's architecturally possible for a conflict abort to be
> raised to EL2. I think today that would take down the host?
> 
> So really I think this could be considered a stand-alone KVM
> hardening improvement?

I'm not disputing the need for a TLB Conflict abort handler. It will
be a good addition once we agree on what needs to be done.

> > However, it doesn't seem to me that the host is equipped to deal with
> > this sort of exception for itself. Shouldn't you start with that?
> 
> If the host isn't doing any BBML2 operations it doesn't need to handle it, I
> don't think? Obviously that changes later in the series and Miko is adding the
> required handling to the host.

Yes, and that's what I overlooked yesterday, and I replied to that
change this morning.

Thanks,

	M.
Ryan Roberts Dec. 12, 2024, 10:37 a.m. UTC | #4
On 12/12/2024 09:57, Marc Zyngier wrote:
> Hi Ryan,
> 
> On Thu, 12 Dec 2024 09:23:20 +0000,
> Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> Hi Marc,
>>
>> I believe the intent of this patch is to protect the host/KVM against a guest
>> that is using BBML2. The host/KVM always assumes BBML0 and therefore doesn't do
>> any operations that are allowed by the arch to cause a conflict abort. Therefore
>> the host doesn't need to handle it. But a guest could be taking advantage of
>> BBML2 and therefore it's architecturally possible for a conflict abort to be
>> raised to EL2. I think today that would take down the host?
>>
>> So really I think this could be considered a stand-alone KVM
>> hardening improvement?
> 
> I'm not disputing the need for a TLB Conflict abort handler. It will
> be a good addition once we agree on what needs to be done.

OK great, glad we are on the same page. I'll leave Miko to work through the details.

> 
>>> However, it doesn't seem to me that the host is equipped to deal with
>>> this sort of exception for itself. Shouldn't you start with that?
>>
>> If the host isn't doing any BBML2 operations it doesn't need to handle it, I
>> don't think? Obviously that changes later in the series and Miko is adding the
>> required handling to the host.
> 
> Yes, and that's what I overlooked yesterday, and I replied to that
> change this morning.
> 
> Thanks,
> 
> 	M.
>
Mikołaj Lenczewski Dec. 13, 2024, 4:24 p.m. UTC | #5
Apologies again for spam (replied instead of group-replied).

On Wed, Dec 11, 2024 at 05:40:36PM +0000, Marc Zyngier wrote:
> On Wed, 11 Dec 2024 16:01:37 +0000,
> Mikołaj Lenczewski <miko.lenczewski@arm.com> wrote:
> > 
> > Currently, KVM does not handle the case of a stage 2 TLB conflict abort
> > exception. The Arm ARM specifies that the worst-case handling of such an
> > exception requires a `tlbi vmalls12e1`.
> 
> Not quite. It says (I_JCCRT):
> 
> <quote>
> * For the EL1&0 translation regime, when stage 2 translations are in
>   use, either VMALLS12E1 or ALLE1.
> </quote>
> 
> > Perform such an invalidation when this exception is encountered.
> 
> What you fail to describe is *why* this is needed. You know it, I know
> it, but not everybody does. A reference to the ARM ARM would
> definitely be helpful.
> 

You are correct. Will update the commit message.

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c9d46ad57e52..c8c6f5a97a1b 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> >  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> >  
> > +	if (esr_fsc_is_tlb_conflict_abort(esr)) {
> > +		// does a `tlbi vmalls12e1is`
> 
> nit: this isn't a very useful comment.
> 

Will remove it.

> > +		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> > +		return 1;
> > +	}
> 
> That's not enough, unfortunately. A nested VM has *many* VMIDs (the
> flattening of all translation contexts that the guest uses).
> 
> So you can either iterate over all the valid VMIDs owned by this
> guest, or more simply issue a TLBI ALLE1, which will do the trick in a
> much more efficient way.
> 
> The other thing is that you are using an IS invalidation, which is
> farther reaching than necessary. Why would you invalidate the TLBs for
> CPUs that are only innocent bystanders? A non-shareable invalidation
> seems preferable to me.
>

You are completely correct here. I had forgotten about nested VMs, and
agree that issuing a `tlbi alle1` is simpler and more efficient. I agree
also on not using an IS invalidation.
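
[Editor's note: following the discussion above, a respun hunk might look like the sketch below. This is only an illustration of the agreed direction (a non-shareable `tlbi alle1` instead of a per-VMID IS invalidation), not the actual v2; barrier choice and exact placement would need review.]

```diff
 	if (esr_fsc_is_tlb_conflict_abort(esr)) {
-		// does a `tlbi vmalls12e1is`
-		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+		/*
+		 * Arm ARM (I_JCCRT): either VMALLS12E1 or ALLE1 resolves
+		 * the conflict. ALLE1 also covers every VMID of a nested
+		 * guest, and non-shareable scope avoids invalidating the
+		 * TLBs of bystander CPUs.
+		 */
+		asm volatile("tlbi alle1\n\tdsb nsh\n\tisb" : : : "memory");
 		return 1;
 	}
```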

Patch

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d1b1a33f9a8b..8a66f81ca291 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -121,6 +121,7 @@ 
 #define ESR_ELx_FSC_SEA_TTW(n)	(0x14 + (n))
 #define ESR_ELx_FSC_SECC	(0x18)
 #define ESR_ELx_FSC_SECC_TTW(n)	(0x1c + (n))
+#define ESR_ELx_FSC_TLBABT	(0x30)
 
 /* Status codes for individual page table levels */
 #define ESR_ELx_FSC_ACCESS_L(n)	(ESR_ELx_FSC_ACCESS + (n))
@@ -464,6 +465,13 @@ static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
 	       (esr == ESR_ELx_FSC_ACCESS_L(0));
 }
 
+static inline bool esr_fsc_is_tlb_conflict_abort(unsigned long esr)
+{
+	esr = esr & ESR_ELx_FSC;
+
+	return esr == ESR_ELx_FSC_TLBABT;
+}
+
 /* Indicate whether ESR.EC==0x1A is for an ERETAx instruction */
 static inline bool esr_iss_is_eretax(unsigned long esr)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c9d46ad57e52..c8c6f5a97a1b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,6 +1756,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
+	if (esr_fsc_is_tlb_conflict_abort(esr)) {
+		// does a `tlbi vmalls12e1is`
+		__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+		return 1;
+	}
+
 	if (esr_fsc_is_translation_fault(esr)) {
 		/* Beyond sanitised PARange (which is the IPA limit) */
 		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {