[RFC,v2,19/31] KVM: arm64: Describe AT instruction emulation design

Message ID 1507000273-3735-17-git-send-email-jintack.lim@linaro.org (mailing list archive)
State New, archived

Commit Message

Jintack Lim Oct. 3, 2017, 3:11 a.m. UTC
This design overview will help to digest the subsequent patches that
implement AT instruction emulation.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
---
 arch/arm64/kvm/sys_regs.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

Comments

James Morse Oct. 3, 2017, 5:37 p.m. UTC | #1
Hi Jintack,

On 03/10/17 04:11, Jintack Lim wrote:
> This design overview will help to digest the subsequent patches that
> implement AT instruction emulation.

> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 8d04926..d8728cc 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1621,6 +1621,72 @@ static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
>  	{ SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
>  };
>  
> +/*
> + * AT instruction emulation
> + *
> + * We emulate AT instructions executed in the virtual EL2.

> + * Basic strategy for the stage-1 translation emulation is to load proper
> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
> + * in EL2. See below for more detail.

What happens if the guest memory containing some stage1-page-table has been
unmapped from stage2? (e.g. it's swapped to disk).

(there is some background to this: I tried to implement the kvm_translate
ioctl() using this approach, running 'at s1e1*' from EL2. I ran into problems
when parts of the guest's stage1 page tables had been unmapped from stage2.)

From memory, I found that the AT instructions would fault-in those pages when
run from EL1, but when executing the same instruction at EL2 they just failed
without any hint of which IPA needed mapping in.

I can try digging for any leftover code if we want to set up a test case for this...


Thanks,

James
Jintack Lim Oct. 3, 2017, 9:11 p.m. UTC | #2
Hi James,

On Tue, Oct 3, 2017 at 1:37 PM, James Morse <james.morse@arm.com> wrote:
> Hi Jintack,
>
> On 03/10/17 04:11, Jintack Lim wrote:
>> This design overview will help to digest the subsequent patches that
>> implement AT instruction emulation.
>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 8d04926..d8728cc 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1621,6 +1621,72 @@ static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
>>       { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
>>  };
>>
>> +/*
>> + * AT instruction emulation
>> + *
>> + * We emulate AT instructions executed in the virtual EL2.
>
>> + * Basic strategy for the stage-1 translation emulation is to load proper
>> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
>> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
>> + * in EL2. See below for more detail.
>
> What happens if the guest memory containing some stage1-page-table has been
> unmapped from stage2? (e.g. it's swapped to disk).
>
> (there is some background to this: I tried to implement the kvm_translate
> ioctl() using this approach, running 'at s1e1*' from EL2. I ran into problems
> when parts of the guest's stage1 page tables had been unmapped from stage2.)
>
> From memory, I found that the AT instructions would fault-in those pages when
> run from EL1, but when executing the same instruction at EL2 they just failed
> without any hint of which IPA needed mapping in.

I don't think I've encountered this case yet, probably because I
usually don't set up a swap partition.

In fact, I couldn't find pseudocode for the AT instructions. If you
happen to have it, is the behavior you observed described in the ARM
ARM?

Thanks,
Jintack

>
> I can try digging for any leftover code if we want to set up a test case for this...
>
>
> Thanks,
>
> James
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>
Marc Zyngier Oct. 4, 2017, 9:13 a.m. UTC | #3
On 03/10/17 22:11, Jintack Lim wrote:
> Hi James,
> 
> On Tue, Oct 3, 2017 at 1:37 PM, James Morse <james.morse@arm.com> wrote:
>> Hi Jintack,
>>
>> On 03/10/17 04:11, Jintack Lim wrote:
>>> This design overview will help to digest the subsequent patches that
>>> implement AT instruction emulation.
>>
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 8d04926..d8728cc 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -1621,6 +1621,72 @@ static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
>>>       { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
>>>  };
>>>
>>> +/*
>>> + * AT instruction emulation
>>> + *
>>> + * We emulate AT instructions executed in the virtual EL2.
>>
>>> + * Basic strategy for the stage-1 translation emulation is to load proper
>>> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
>>> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
>>> + * in EL2. See below for more detail.
>>
>> What happens if the guest memory containing some stage1-page-table has been
>> unmapped from stage2? (e.g. it's swapped to disk).
>>
>> (there is some background to this: I tried to implement the kvm_translate
>> ioctl() using this approach, running 'at s1e1*' from EL2. I ran into problems
>> when parts of the guest's stage1 page tables had been unmapped from stage2.)
>>
>> From memory, I found that the AT instructions would fault-in those pages when
>> run from EL1, but when executing the same instruction at EL2 they just failed
>> without any hint of which IPA needed mapping in.

Let me see if I follow:

AT S1E1 at EL1 should only generate a fault if the page table walking
itself generates a fault (the guest page tables have been swapped out),
and the fault is taken to EL2. At that point, that's a normal
translation fault, which EL2 can easily resolve and restart the AT
instruction. This is in fact no different from a faulting load/store.

Doing the same thing at EL2 would indeed simply report a failed
translation rather than generate a fault, which I think is what you're
observing. After all, it is the hypervisor that unmapped those pages, so
it might as well properly track what is happening.

It is a bit of an odd case because the AT here is executed at vEL2
(EL1), and trapped to EL2 because of the NV bits. If it wasn't trapped,
everything would just work. In this case, I can't see any other way but
to walk the S1PT by hand, having put all the other vcpus on hold to
avoid concurrent modifications... Yes, this sucks. If only AT could do
partial walks...

The saving grace is that this only happens in the unmapped S1PT case.
The above can be used as a fallback if the AT S1 from EL2 actually fails.
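
As a rough illustration of that fallback, the sketch below tries the
hardware translation first and only walks the stage-1 tables in software
when the result reports a failure. Everything in it is hypothetical:
at_fn, translate_s1() and the demo_* callbacks are made-up names, and the
callbacks merely stand in for the real AT instruction run from EL2 and for
a by-hand S1PT walk done with the other vcpus on hold. The only
architectural fact it relies on is that PAR_EL1.F (bit 0) is set when a
translation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAR_F	(1ULL << 0)	/* PAR_EL1.F: the translation failed */

/* Hypothetical callback: takes a VA, returns a PAR_EL1-style value. */
typedef uint64_t (*at_fn)(uint64_t va);

/*
 * Fast path first: AT S1 executed from EL2. If it reports a failure,
 * the guest's stage-1 tables may simply be unmapped from stage 2, so
 * retry with the slow software walk, which is able to fault pages in.
 */
static uint64_t translate_s1(at_fn hw_at_s1, at_fn sw_walk_s1,
			     uint64_t va, bool *used_fallback)
{
	uint64_t par;

	assert(hw_at_s1 && sw_walk_s1 && used_fallback);

	par = hw_at_s1(va);
	*used_fallback = false;
	if (par & PAR_F) {
		*used_fallback = true;
		par = sw_walk_s1(va);
	}
	return par;
}

/* Demo stand-ins only: identity-map the page, or always fail. */
static uint64_t demo_hw_at_ok(uint64_t va)   { return va & ~0xfffULL; }
static uint64_t demo_hw_at_fail(uint64_t va) { (void)va; return PAR_F; }
static uint64_t demo_sw_walk(uint64_t va)    { return va & ~0xfffULL; }
```

With the demo callbacks, translate_s1(demo_hw_at_fail, demo_sw_walk, ...)
takes the slow path, while passing demo_hw_at_ok never reaches the software
walk, which matches the point that the cost is only paid in the unmapped
S1PT case.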

> I don't think I've encountered this case yet, probably because I
> usually don't set up a swap partition.
> 
> In fact, I couldn't find pseudocode for the AT instructions. If you
> happen to have it, is the behavior you observed described in the ARM
> ARM?

See J1.1.5 in the ARMv8 ARM Rev B.a, and the various comments indicating
how this applies to Address Translation instructions. There is also some
description of what is expected from the AT instructions in D4.2.11.

Thanks,

	M.
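
For readers following the thread, the two tables in this patch's design
comment boil down to a small selection function. The following is a
minimal sketch under stated assumptions, not the code from the series:
struct at_setup, enum at_ctx and the at_setup_*() helpers are hypothetical
names, and passing vNV and vNV1 through individually is an assumption,
since the tables only list them as (0, 0) or (1, 1). The HCR_EL2 bit
positions (TGE 27, E2H 34, NV 42, NV1 43) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_TGE	(1ULL << 27)
#define HCR_E2H	(1ULL << 34)
#define HCR_NV	(1ULL << 42)
#define HCR_NV1	(1ULL << 43)

/* Which virtual context is loaded into the physical EL1 registers. */
enum at_ctx { AT_CTX_VEL1, AT_CTX_VEL2 };

struct at_setup {
	enum at_ctx ctx;
	bool nv;	/* physical HCR_EL2.NV */
	bool nv1;	/* physical HCR_EL2.NV1 */
	bool tge;	/* physical HCR_EL2.TGE, clear in every row */
};

/* Table 1 (S1E2x): only the virtual E2H bit matters. */
static struct at_setup at_setup_el2(uint64_t vhcr)
{
	bool vhe = !!(vhcr & HCR_E2H);
	struct at_setup s = {
		.ctx = AT_CTX_VEL2, .nv = !vhe, .nv1 = !vhe, .tge = false,
	};

	return s;
}

/* Table 2 (EL0/EL1 AT): (vE2H, vTGE) picks the context, vNV/vNV1 pass through. */
static struct at_setup at_setup_el01(uint64_t vhcr)
{
	bool el2_regime = (vhcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE);
	struct at_setup s = {
		.ctx = el2_regime ? AT_CTX_VEL2 : AT_CTX_VEL1,
		.nv = !!(vhcr & HCR_NV),
		.nv1 = !!(vhcr & HCR_NV1),
		.tge = false,
	};

	return s;
}

/* Usage example: a few rows from each table. */
static void at_setup_selftest(void)
{
	assert(at_setup_el2(0).nv && at_setup_el2(0).nv1);
	assert(!at_setup_el2(HCR_E2H).nv && !at_setup_el2(HCR_E2H).nv1);
	assert(at_setup_el01(0).ctx == AT_CTX_VEL1);
	assert(at_setup_el01(HCR_E2H | HCR_TGE).ctx == AT_CTX_VEL2);
	assert(at_setup_el01(HCR_NV | HCR_NV1).nv);
}
```

Keeping the decision in one pure function like this would make each table
row checkable without touching any system register.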

Patch

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 8d04926..d8728cc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1621,6 +1621,72 @@  static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
 	{ SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
 };
 
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------|
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set the physical NV and NV1 bits to use the EL2 page table format for a
+ * non-VHE guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor
+ * uses the EL1 page table format, we don't set those bits.
+ *
+ * We must clear the physical TGE bit so that the EL2 translation regime is not
+ * used when the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E[01]x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------|
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *(0, 0) in the (vE2H, vTGE) column actually means any value other than
+ *  (1, 1). It is shown as (0, 0) just for readability.
+ *
+ * We set the physical EL1 virtual memory control registers depending on the
+ * (vE2H, vTGE) pair. When the pair is (0, 0), where AT instructions are
+ * supposed to use the EL0/EL1 translation regime, we load the EL1 registers
+ * with the virtual EL1 registers (i.e. EL1 registers from the guest
+ * hypervisor's point of view). When the pair is (1, 1), however, AT
+ * instructions are defined to apply the EL2 translation regime. To emulate
+ * this behavior, we load the EL1 registers with the virtual EL2 context
+ * (i.e. the shadow registers).
+ *
+ * We respect the virtual NV and NV1 bits for the emulation. When those bits are
+ * set, the guest hypervisor wants to use the EL2 page table format for the EL1
+ * translation regime. We emulate this by setting the physical NV and NV1 bits.
+ */
+
 #define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
 	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
 static struct sys_reg_desc sys_insn_descs[] = {