Message ID: 20211122121844.867-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)
Series: kvm/arm: New VMID allocator based on asid
Hi,

A gentle ping on this series. Please take a look and let me know whether the
new approach taken in this revision is good enough. Appreciate your feedback.

Thanks,
Shameer

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Shameer Kolothum
> Sent: 22 November 2021 12:19
> To: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org
> Cc: maz@kernel.org; will@kernel.org; catalin.marinas@arm.com;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
>   invalid one on vCPU schedule out. Introduced an
>   INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
>   Since the basic allocator algorithm reserves vmid #0, it is never
>   used as an active VMID. This (hopefully) will fix the issue of
>   unnecessarily reserving VMID space with active_vmids when those
>   VMs are no longer active[0], and at the same time address the
>   problem noted in v3 wherein everything ends up in the slow path[1].
>
> Testing:
> - Ran with VMID bits set to 4 and maxcpus set to 8 on D06. The test
>   involves running 50 guests with 4 vCPUs concurrently. Each
>   guest executes hackbench 5 times before exiting.
>   No crash was observed for a 4-day continuous run.
>   The latest branch is here:
>   https://github.com/hisilicon/kernel-dev/tree/private-v5.16-rc1-vmid-v4
>
> - TLA+ model. Modified the asidalloc model to incorporate the new
>   VMID algo. The main differences are:
>   - flush_tlb_all() instead of local_flush_tlb_all() on rollover.
>   - Introduced INVALID_VMID and vCPU sched-out logic.
>   - No CnP (removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
>   - Removed the UniqueVMIDPerCPU invariant for now, as it looks like,
>     because of the speculative fetching with flush_tlb_all(), there
>     is a small window where this gets triggered. If I change the
>     logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
>     be fine. With my limited knowledge of TLA+ modelling, it is not
>     clear to me whether this is a problem with the above logic
>     or with the VMID model implementation. I'd really appreciate any
>     help with the model.
>   The initial VMID TLA+ model is here:
>   https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
>
> Please take a look and let me know.
>
> Thanks,
> Shameer
>
> [0] https://lore.kernel.org/kvmarm/20210721160614.GC11003@willie-the-truck/
> [1] https://lore.kernel.org/kvmarm/20210803114034.GB30853@willie-the-truck/
>
> History:
> --------
> v2 --> v3
> - Dropped adding a new static key and cpufeature for retrieving
>   supported VMID bits. Instead, we now make use of the
>   kvm_arm_vmid_bits variable (patch #2).
> - Since we expect less frequent rollover in the case of VMIDs,
>   the TLB invalidation is now broadcast on rollover instead
>   of keeping per-CPU flush_pending info and issuing a local
>   context flush.
> - Clear active_vmids on vCPU schedule out to avoid unnecessarily
>   reserving the VMID space (patch #3).
> - I have kept struct kvm_vmid as it is for now (instead of a
>   typedef as suggested), as we may soon add another variable to
>   it when we introduce pinned KVM VMID support.
>
> RFCv1 --> v2
> - Dropped "pinned VMID" support for now.
> - Dropped RFC tag.
>
> RFCv1
> https://lore.kernel.org/kvmarm/20210506165232.1969-1-shameerali.kolothum.thodi@huawei.com/
>
> Julien Grall (1):
>   KVM: arm64: Align the VMID allocation with the arm64 ASID
>
> Shameer Kolothum (3):
>   KVM: arm64: Introduce a new VMID allocator for KVM
>   KVM: arm64: Make VMID bits accessible outside of allocator
>   KVM: arm64: Make active_vmids invalid on vCPU schedule out
>
>  arch/arm64/include/asm/kvm_host.h     |  10 +-
>  arch/arm64/include/asm/kvm_mmu.h      |   4 +-
>  arch/arm64/kernel/image-vars.h        |   3 +
>  arch/arm64/kvm/Makefile               |   2 +-
>  arch/arm64/kvm/arm.c                  | 106 +++-----------
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   3 +-
>  arch/arm64/kvm/mmu.c                  |   1 -
>  arch/arm64/kvm/vmid.c                 | 196 ++++++++++++++++++++++++++
>  8 files changed, 228 insertions(+), 97 deletions(-)
>  create mode 100644 arch/arm64/kvm/vmid.c
>
> --
> 2.17.1
Hi Shameer,

On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> - TLA+ model. Modified the asidalloc model to incorporate the new
>   VMID algo. The main differences are:
>   - flush_tlb_all() instead of local_flush_tlb_all() on rollover.
>   - Introduced INVALID_VMID and vCPU sched-out logic.
>   - No CnP (removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
>   - Removed the UniqueVMIDPerCPU invariant for now, as it looks like,
>     because of the speculative fetching with flush_tlb_all(), there
>     is a small window where this gets triggered. If I change the
>     logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
>     be fine. With my limited knowledge of TLA+ modelling, it is not
>     clear to me whether this is a problem with the above logic
>     or with the VMID model implementation. I'd really appreciate any
>     help with the model.
>   The initial VMID TLA+ model is here:
>   https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1

I only had a brief look at the TLA+ model and I don't understand why you
have a separate 'sched_out' process. It would run in parallel with
'sched', but AFAICT you can't really schedule a guest out while you are
in the middle of scheduling it in. I'd rather use the same 'sched'
process and either schedule in an inactive task or schedule out an
active one for a given CPU.

Also, active_vmids[] for example is defined on the CPUS domain, but you
call vcpu_sched_out() from a process that's not in the CPUS domain but
in the SCHED_OUT one.

Regarding UniqueVMIDPerCPU, I think we need to figure out why it
happens. The fact that flush_tlb_all() was made to simulate the
speculative TLB loads is not relevant. In a different spec I have,
arm64kpti.tla, I just used another process that invokes an update_tlbs()
macro so that it can happen at any time. I didn't bother to update the
ASID spec in a similar way, but it may be useful.

The corresponding UniqueASIDPerCPU meant that for any two TLB entries on
a single CPU, if they correspond to different tasks (pgd), they should
have different ASIDs. That's a strong requirement; otherwise we end up
with the wrong translation.

Why did you remove the CnP? Do we have this disabled for KVM guests?
Hi Catalin,

> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@arm.com]
> Sent: 18 January 2022 12:33
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; will@kernel.org;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> [...]
>
> I only had a brief look at the TLA+ model and I don't understand why you
> have a separate 'sched_out' process. It would run in parallel with
> 'sched', but AFAICT you can't really schedule a guest out while you are
> in the middle of scheduling it in. I'd rather use the same 'sched'
> process and either schedule in an inactive task or schedule out an
> active one for a given CPU.
>
> Also, active_vmids[] for example is defined on the CPUS domain, but you
> call vcpu_sched_out() from a process that's not in the CPUS domain but
> in the SCHED_OUT one.

Many thanks for taking a look. My bad! The 'sched_out' process would
indeed run in parallel and defeat the purpose. I must say I was really
confused by the TLA+ syntax and am still not very confident with it.
Based on the above suggestion, I have modified it as below:

\* vCPU is scheduled out by KVM
macro vcpu_sched_out() {
    active_kvm[self].task := 0;
    active_vmids[self] := INVALID_VMID;
}

....

\* About to run a Guest VM
process (sched \in CPUS)
{
sched_loop:
    while (TRUE) {
        with (t \in TASKS) {
            if (t # ActiveTask(self))
                call kvm_arm_vmid_update(t);
            else
                vcpu_sched_out();
        }
    }
}

Please let me know if this is OK.

> Regarding UniqueVMIDPerCPU, I think we need to figure out why it
> happens. The fact that flush_tlb_all() was made to simulate the
> speculative TLB loads is not relevant. In a different spec I have,
> arm64kpti.tla, I just used another process that invokes an update_tlbs()
> macro so that it can happen at any time. I didn't bother to update the
> ASID spec in a similar way, but it may be useful.

Ok. I will take a look at that and try it.

> The corresponding UniqueASIDPerCPU meant that for any two TLB entries
> on a single CPU, if they correspond to different tasks (pgd), they
> should have different ASIDs. That's a strong requirement; otherwise we
> end up with the wrong translation.

Yes, I understand that it is a strong requirement. Also, I thought this
is something that would trigger easily with the test setup I had on real
hardware. But testing stayed on for days, so I was not sure whether it
is a problem with the TLA+ implementation or not.

> Why did you remove the CnP? Do we have this disabled for KVM guests?

I removed the CnP-related invariants to simplify things for the first
version. Also, I am not sure what specific changes we need to make for
CnP here. Do we still need the switch to the global pg_dir to prevent
any speculative reading? I didn't see that being done in KVM anywhere at
the moment. Maybe I am missing something.

On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
check.

Thanks,
Shameer

> --
> Catalin
On Wed, Jan 19, 2022 at 09:23:31AM +0000, Shameerali Kolothum Thodi wrote:
> > On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> > > - TLA+ model. Modified the asidalloc model to incorporate the new
> > >   VMID algo.
> > [...]
> > I only had a brief look at the TLA+ model and I don't understand why
> > you have a separate 'sched_out' process. [...]
> > Also, active_vmids[] for example is defined on the CPUS domain, but
> > you call vcpu_sched_out() from a process that's not in the CPUS
> > domain but in the SCHED_OUT one.
>
> Many thanks for taking a look. My bad! The 'sched_out' process would
> indeed run in parallel and defeat the purpose. I must say I was really
> confused by the TLA+ syntax and am still not very confident with it.

Yeah, it can be confusing. If you have time, you could give CBMC a try,
and the 'spec' would be pretty close to your C version. Each CPU would
be modelled as a thread, with an extra thread that simulates the
speculative TLB look-ups for all CPUs, together with the asserts for the
invariants. The spinlocks would be pthread_mutexes.

> Based on the above suggestion, I have modified it as below:
>
> \* vCPU is scheduled out by KVM
> macro vcpu_sched_out() {
>     active_kvm[self].task := 0;
>     active_vmids[self] := INVALID_VMID;
> }

Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
this directly below and avoid another macro. Well, whatever you find
clearer.

What confuses me is that your INVALID_VMID looks like a valid one: vmid
0, generation 1. Do you ensure that you never allocate VMID 0?

> \* About to run a Guest VM
> process (sched \in CPUS)
> {
> sched_loop:
>     while (TRUE) {
>         with (t \in TASKS) {
>             if (t # ActiveTask(self))
>                 call kvm_arm_vmid_update(t);
>             else
>                 vcpu_sched_out();
>         }
>     }
> }

Yes, that's what I meant.

> > The corresponding UniqueASIDPerCPU meant that for any two TLB
> > entries on a single CPU, if they correspond to different tasks
> > (pgd), they should have different ASIDs. That's a strong
> > requirement; otherwise we end up with the wrong translation.
>
> Yes, I understand that it is a strong requirement. Also, I thought
> this is something that would trigger easily with the test setup I had
> on real hardware. But testing stayed on for days, so I was not sure
> whether it is a problem with the TLA+ implementation or not.

Well, you'd have to check the TLA+ state trace and see how it got there,
whether the last state would be a valid one. It's either something
missing in the spec that the hardware enforces, or the algorithm is
wrong and just hard to hit in practice. If this condition is violated in
hardware for a very brief period, e.g. due to some TLBI, you'd not
notice an issue under normal circumstances. But it's still incorrect.

> > Why did you remove the CnP? Do we have this disabled for KVM guests?
>
> I removed the CnP-related invariants to simplify things for the first
> version. Also, I am not sure what specific changes we need to make for
> CnP here. Do we still need the switch to the global pg_dir to prevent
> any speculative reading? I didn't see that being done in KVM anywhere
> at the moment. Maybe I am missing something.

It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
sets VTTBR_EL2.CnP to 0 (I haven't checked).

> On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
> fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
> check.

This was added later as part of try-to-unmap, and I only checked it with
CnP = FALSE. I'll need to look into it; it's possible that
flush_tlb_all() doesn't take into account that the pte is unmapped (as
cpu_switch_mm() does). I'll add a separate thread for speculative TLB
loads; it's easier to have them in one place. Thanks.
> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@arm.com]
> Sent: 19 January 2022 11:50
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; will@kernel.org;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> [...]
>
> Yeah, it can be confusing. If you have time, you could give CBMC a try,
> and the 'spec' would be pretty close to your C version. Each CPU would
> be modelled as a thread, with an extra thread that simulates the
> speculative TLB look-ups for all CPUs, together with the asserts for
> the invariants. The spinlocks would be pthread_mutexes.

Ok, will take a look.

> > Based on the above suggestion, I have modified it as below:
> >
> > \* vCPU is scheduled out by KVM
> > macro vcpu_sched_out() {
> >     active_kvm[self].task := 0;
> >     active_vmids[self] := INVALID_VMID;
> > }
>
> Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
> this directly below and avoid another macro. Well, whatever you find
> clearer.

Sure, will change that.

> What confuses me is that your INVALID_VMID looks like a valid one: vmid
> 0, generation 1. Do you ensure that you never allocate VMID 0?

This is the same as in asidalloc, where cur_idx starts at 1 for any new
allocation. I think that guarantees VMID 0 is never allocated.

> > [...PlusCal 'sched' process...]
>
> Yes, that's what I meant.

Ok.

> > Yes, I understand that it is a strong requirement. Also, I thought
> > this is something that would trigger easily with the test setup I
> > had on real hardware. But testing stayed on for days, so I was not
> > sure whether it is a problem with the TLA+ implementation or not.
>
> Well, you'd have to check the TLA+ state trace and see how it got
> there, whether the last state would be a valid one. It's either
> something missing in the spec that the hardware enforces, or the
> algorithm is wrong and just hard to hit in practice. If this condition
> is violated in hardware for a very brief period, e.g. due to some TLBI,
> you'd not notice an issue under normal circumstances. But it's still
> incorrect.

Hmm.. I had a quick implementation with a separate thread for the
speculative load, as done in arm64kpti.tla, but I still get the
UniqueVMIDPerCPU violation error. I will go through the state trace and
see whether I can interpret something :)

> > I removed the CnP-related invariants to simplify things for the
> > first version. [...]
>
> It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
> sets VTTBR_EL2.CnP to 0 (I haven't checked).

I think KVM sets CnP here:
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/kvm_mmu.h#L264

But I haven't figured out what else we need to do in this new VMID
allocator for the CnP case. Marc, Will?

> > On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
> > fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
> > check.
>
> This was added later as part of try-to-unmap, and I only checked it
> with CnP = FALSE. I'll need to look into it; it's possible that
> flush_tlb_all() doesn't take into account that the pte is unmapped (as
> cpu_switch_mm() does). I'll add a separate thread for speculative TLB
> loads; it's easier to have them in one place. Thanks.

Ok.

Thanks,
Shameer
On Mon, 22 Nov 2021 12:18:40 +0000, Shameer Kolothum wrote:
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
>   invalid one on vCPU schedule out. Introduced an
>   INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
>   Since the basic allocator algorithm reserves vmid #0, it is never
>   used as an active VMID. This (hopefully) will fix the issue of
>   unnecessarily reserving VMID space with active_vmids when those
>   VMs are no longer active[0], and at the same time address the
>   problem noted in v3 wherein everything ends up in the slow path[1].
>
> [...]

Applied to next, thanks!

[1/4] KVM: arm64: Introduce a new VMID allocator for KVM
      commit: 417838392f2e657ee25cc30e373ff4c35a0faa90
[2/4] KVM: arm64: Make VMID bits accessible outside of allocator
      commit: f8051e960922a9de8e42159103d5d9c697ef17ec
[3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID
      commit: 3248136b3637e1671e4fa46e32e2122f9ec4bc3d
[4/4] KVM: arm64: Make active_vmids invalid on vCPU schedule out
      commit: 100b4f092f878dc379f1fcef9ce567c25dee3473

Cheers,
M.