diff mbox series

KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch

Message ID 20200909210527.1926996-1-maz@kernel.org (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch | expand

Commit Message

Marc Zyngier Sept. 9, 2020, 9:05 p.m. UTC
KVM currently assumes that an instruction abort can never be a write.
This is in general true, except when the abort is triggered by
a S1PTW on instruction fetch that tries to update the S1 page tables
(to set AF, for example).

This can happen if the page tables have been paged out and brought
back in without seeing a direct write to them (they are thus marked
read only), and the fault handling code will make the PT executable(!)
instead of writable. The guest gets stuck forever.

In these conditions, the permission fault must be considered as
a write so that the Stage-1 update can take place. This is essentially
the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM:
Take S1 walks into account when determining S2 write faults").

Update both kvm_is_write_fault() to return true on IABT+S1PTW, as well
as kvm_vcpu_trap_is_iabt() to return false in the same conditions.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
This could do with some cleanup (kvm_vcpu_dabt_iss1tw has nothing to do
with data aborts), but I've chosen to keep the patch simple in order to
ease backporting.

 arch/arm64/include/asm/kvm_emulate.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Sasha Levin Sept. 10, 2020, 4:34 p.m. UTC | #1
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.7, v5.4.63, v4.19.143, v4.14.196, v4.9.235, v4.4.235.

v5.8.7: Build OK!
v5.4.63: Build OK!
v4.19.143: Failed to apply! Possible dependencies:
    0e20f5e25556 ("KVM: arm/arm64: Cleanup MMIO handling")
    5c37f1ae1c33 ("KVM: arm64: Ask the compiler to __always_inline functions used at HYP")
    5ffdfaedfa0a ("arm64: mm: Support Common Not Private translations")
    86d0dd34eaff ("arm64: cpufeature: add feature for CRC32 instructions")
    caab277b1de0 ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234")
    d94d71cb45fd ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266")

v4.14.196: Failed to apply! Possible dependencies:
    1fc5dce78ad1 ("arm64/sve: Low-level SVE architectural state manipulation functions")
    2e0f2478ea37 ("arm64/sve: Probe SVE capabilities and usable vector lengths")
    43994d824e84 ("arm64/sve: Detect SVE and activate runtime support")
    5c37f1ae1c33 ("KVM: arm64: Ask the compiler to __always_inline functions used at HYP")
    611a7bc74ed2 ("arm64: docs: describe ELF hwcaps")
    746647c75afb ("arm64: entry.S convert el0_sync")
    7582e22038a2 ("arm64/sve: Backend logic for setting the vector length")
    79ab047c75d6 ("arm64/sve: Support vector length resetting for new processes")
    94ef7ecbdf6f ("arm64: fpsimd: Correctly annotate exception helpers called from asm")
    bc0ee4760364 ("arm64/sve: Core task context handling")
    ddd25ad1fde8 ("arm64/sve: Kconfig update and conditional compilation support")

v4.9.235: Failed to apply! Possible dependencies:
    016f98afd050 ("irqchip/gic-v3: Use nops macro for Cavium ThunderX erratum 23154")
    0e9884fe63c6 ("arm64: sysreg: subsume GICv3 sysreg definitions")
    328191c05ed7 ("irqchip/gic-v3-its: Specialise flush_dcache operation")
    38fd94b0275c ("arm64: Work around Falkor erratum 1003")
    43994d824e84 ("arm64/sve: Detect SVE and activate runtime support")
    47863d41ecf8 ("arm64: sysreg: sort by encoding")
    4aa8a472c33f ("arm64: Documentation - Expose CPU feature registers")
    5c37f1ae1c33 ("KVM: arm64: Ask the compiler to __always_inline functions used at HYP")
    611a7bc74ed2 ("arm64: docs: describe ELF hwcaps")
    6e01398fe450 ("arm64: arch_timer: document Hisilicon erratum 161010101")
    b20d1ba3cf4b ("arm64: cpufeature: allow for version discrepancy in PMU implementations")
    b389d7997acb ("arm64: cpufeature: treat unknown fields as RES0")
    bca8f17f57bd ("arm64: Get rid of asm/opcodes.h")
    c7a3c61fc606 ("arm64: sysreg: add performance monitor registers")
    cd9e1927a525 ("arm64: Work around broken .inst when defective gas is detected")
    d9ff80f83ecb ("arm64: Work around Falkor erratum 1009")
    eab43e88734f ("arm64: cpufeature: Cleanup feature bit tables")
    eeb1efbcb83c ("arm64: cpu_errata: Add capability to advertise Cortex-A73 erratum 858921")
    f31deaadff0d ("arm64: cpufeature: Don't enforce system-wide SPE capability")
    fe4fbdbcddea ("arm64: cpufeature: Track user visible fields")

v4.4.235: Failed to apply! Possible dependencies:
    06282fd2c2bf ("arm64: KVM: Implement vgic-v2 save/restore")
    0e9884fe63c6 ("arm64: sysreg: subsume GICv3 sysreg definitions")
    1b8e83c04ee2 ("arm64: KVM: vgic-v3: Avoid accessing ICH registers")
    2d81d425b6d5 ("irqchip/gicv3-its: Introduce two helper functions for accessing BASERn")
    328191c05ed7 ("irqchip/gic-v3-its: Specialise flush_dcache operation")
    3c13b8f435ac ("KVM: arm/arm64: vgic-v3: Make the LR indexing macro public")
    3faf24ea894a ("irqchip/gicv3-its: Implement two-level(indirect) device table support")
    466b7d168881 ("irqchip/gicv3-its: Don't allow devices whose ID is outside range")
    4b75c4598b5b ("irqchip/gicv3-its: Add a new function for parsing device table BASERn")
    5c37f1ae1c33 ("KVM: arm64: Ask the compiler to __always_inline functions used at HYP")
    91ef84428a86 ("irqchip/gic-v3: Reset BPR during initialization")
    9347359ad0ae ("irqchip/gicv3-its: Split its_alloc_tables() into two functions")
    b5525ce898eb ("arm64: KVM: Move GIC accessors to arch_gicv3.h")
    c76a0a6695c6 ("arm64: KVM: Add a HYP-specific header file")
    d44ffa5ae70a ("irqchip/gic-v3: Convert arm64 GIC accessors to {read,write}_sysreg_s")
    f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore")
    fd451b90e78c ("arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?
Will Deacon Sept. 11, 2020, 3:59 p.m. UTC | #2
On Wed, Sep 09, 2020 at 10:05:27PM +0100, Marc Zyngier wrote:
> KVM currently assumes that an instruction abort can never be a write.
> This is in general true, except when the abort is triggered by
> a S1PTW on instruction fetch that tries to update the S1 page tables
> (to set AF, for example).
> 
> This can happen if the page tables have been paged out and brought
> back in without seeing a direct write to them (they are thus marked
> read only), and the fault handling code will make the PT executable(!)
> instead of writable. The guest gets stuck forever.
> 
> In these conditions, the permission fault must be considered as
> a write so that the Stage-1 update can take place. This is essentially
> the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM:
> Take S1 walks into account when determining S2 write faults").
> 
> Update both kvm_is_write_fault() to return true on IABT+S1PTW, as well
> as kvm_vcpu_trap_is_iabt() to return false in the same conditions.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> This could do with some cleanup (kvm_vcpu_dabt_iss1tw has nothing to do
> with data aborts), but I've chosen to keep the patch simple in order to
> ease backporting.
> 
>  arch/arm64/include/asm/kvm_emulate.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d21676409a24..33d7e16edaa3 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -480,7 +480,8 @@ static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
>  {
> -	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
> +	return (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW &&
> +		!kvm_vcpu_dabt_iss1tw(vcpu));
>  }
>  
>  static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
> @@ -520,6 +521,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> +	if (kvm_vcpu_dabt_iss1tw(vcpu))
> +		return true;
> +

Hmm, I'm a bit uneasy about the interaction of this with
kvm_handle_guest_abort() if we take an S1PTW fault on instruction fetch
with our page-tables sitting in a read-only memslot. In this case, I
think we'll end up injecting a data abort into the guest instead of an
instruction abort. It hurts my brain thinking about it though.

Overall, I'd be inclined to:

  1. Rename kvm_vcpu_dabt_iss1tw() to kvm_vcpu_abt_iss1tw()

  2. Introduce something like kvm_is_exec_fault() as:

	return kvm_vcpu_trap_is_iabt() && !kvm_vcpu_abt_iss1tw();

  3. Use that new function in user_mem_abort() to assign 'exec_fault'

  4. Hack kvm_is_write_fault() as you have done above.

Which I _think_ should work (famous last words)...

The only nasty bit is that we then duplicate the kvm_vcpu_dabt_iss1tw()
check in both kvm_is_write_fault() and kvm_vcpu_dabt_iswrite(). Perhaps
we could remove the latter function in favour of the first? Anyway,
obviously this sort of cleanup isn't for stable.

Will
Marc Zyngier Sept. 14, 2020, 9:32 a.m. UTC | #3
On 2020-09-11 16:59, Will Deacon wrote:
> On Wed, Sep 09, 2020 at 10:05:27PM +0100, Marc Zyngier wrote:
>> KVM currently assumes that an instruction abort can never be a write.
>> This is in general true, except when the abort is triggered by
>> a S1PTW on instruction fetch that tries to update the S1 page tables
>> (to set AF, for example).
>> 
>> This can happen if the page tables have been paged out and brought
>> back in without seeing a direct write to them (they are thus marked
>> read only), and the fault handling code will make the PT executable(!)
>> instead of writable. The guest gets stuck forever.
>> 
>> In these conditions, the permission fault must be considered as
>> a write so that the Stage-1 update can take place. This is essentially
>> the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: 
>> KVM:
>> Take S1 walks into account when determining S2 write faults").
>> 
>> Update both kvm_is_write_fault() to return true on IABT+S1PTW, as well
>> as kvm_vcpu_trap_is_iabt() to return false in the same conditions.
>> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>> This could do with some cleanup (kvm_vcpu_dabt_iss1tw has nothing to 
>> do
>> with data aborts), but I've chosen to keep the patch simple in order 
>> to
>> ease backporting.
>> 
>>  arch/arm64/include/asm/kvm_emulate.h | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index d21676409a24..33d7e16edaa3 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -480,7 +480,8 @@ static __always_inline u8 
>> kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
>> 
>>  static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
>>  {
>> -	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
>> +	return (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW &&
>> +		!kvm_vcpu_dabt_iss1tw(vcpu));
>>  }
>> 
>>  static __always_inline u8 kvm_vcpu_trap_get_fault(const struct 
>> kvm_vcpu *vcpu)
>> @@ -520,6 +521,9 @@ static __always_inline int 
>> kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>> 
>>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>  {
>> +	if (kvm_vcpu_dabt_iss1tw(vcpu))
>> +		return true;
>> +
> 
> Hmm, I'm a bit uneasy about the interaction of this with
> kvm_handle_guest_abort() if we take an S1PTW fault on instruction fetch
> with our page-tables sitting in a read-only memslot. In this case, I
> think we'll end up injecting a data abort into the guest instead of an
> instruction abort. It hurts my brain thinking about it though.

Good point.

> 
> Overall, I'd be inclined to:
> 
>   1. Rename kvm_vcpu_dabt_iss1tw() to kvm_vcpu_abt_iss1tw()
> 
>   2. Introduce something like kvm_is_exec_fault() as:
> 
> 	return kvm_vcpu_trap_is_iabt() && !kvm_vcpu_abt_iss1tw();
> 
>   3. Use that new function in user_mem_abort() to assign 'exec_fault'
> 
>   4. Hack kvm_is_write_fault() as you have done above.
> 
> Which I _think_ should work (famous last words)...

That's what I initially had, but went for ease of backporting instead...
Back to square one.

> The only nasty bit is that we then duplicate the kvm_vcpu_dabt_iss1tw()
> check in both kvm_is_write_fault() and kvm_vcpu_dabt_iswrite(). Perhaps
> we could remove the latter function in favour of the first? Anyway,
> obviously this sort of cleanup isn't for stable.

I've had a look, and I believe this is safe. We always check for S1PTW
*before* using kvm_vcpu_dabt_iswrite() anyway. I'll add that as a second
patch that we can merge later if required.

         M.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d21676409a24..33d7e16edaa3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -480,7 +480,8 @@  static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
+	return (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW &&
+		!kvm_vcpu_dabt_iss1tw(vcpu));
 }
 
 static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
@@ -520,6 +521,9 @@  static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
+	if (kvm_vcpu_dabt_iss1tw(vcpu))
+		return true;
+
 	if (kvm_vcpu_trap_is_iabt(vcpu))
 		return false;