Message ID: 20211122121844.867-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)
Series: kvm/arm: New VMID allocator based on asid
Hi,

A gentle ping on this series. Please take a look and let me know whether the
new approach taken in this revision is good enough. Appreciate your feedback.

Thanks,
Shameer

> -----Original Message-----
> From: linux-arm-kernel [mailto:linux-arm-kernel-bounces@lists.infradead.org]
> On Behalf Of Shameer Kolothum
> Sent: 22 November 2021 12:19
> To: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org
> Cc: maz@kernel.org; will@kernel.org; catalin.marinas@arm.com;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
>   invalid one on vCPU schedule out. Introduced an
>   INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
>   Since the basic allocator algorithm reserves vmid #0, it is never
>   used as an active VMID. This (hopefully) will fix the issue of
>   unnecessarily reserving VMID space with active_vmids when those
>   VMs are no longer active[0], and at the same time address the
>   problem noted in v3 wherein everything ends up in the slow path[1].
>
> Testing:
> - Ran with VMID bits set to 4 and maxcpus set to 8 on D06. The test
>   involves running 50 guests with 4 vCPUs concurrently. Each
>   guest executes hackbench 5 times before exiting.
>   No crash was observed for a 4-day continuous run.
>   The latest branch is here:
>   https://github.com/hisilicon/kernel-dev/tree/private-v5.16-rc1-vmid-v4
>
> - TLA+ model. Modified the asidalloc model to incorporate the new
>   VMID algo. The main differences are:
>   - flush_tlb_all() instead of local_flush_tlb_all() on rollover.
>   - Introduced INVALID_VMID and vCPU sched-out logic.
>   - No CnP (removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
>   - Removed the UniqueVMIDPerCPU invariant for now, as it looks like,
>     because of the speculative fetching with flush_tlb_all(), there
>     is a small window where this gets triggered. If I change the
>     logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
>     be fine. With my limited knowledge of TLA+ modelling, it is not
>     clear to me whether this is a problem with the above logic
>     or with the VMID model implementation. I'd really appreciate any
>     help with the model.
>   The initial VMID TLA+ model is here:
>   https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1
>
> Please take a look and let me know.
>
> Thanks,
> Shameer
>
> [0] https://lore.kernel.org/kvmarm/20210721160614.GC11003@willie-the-truck/
> [1] https://lore.kernel.org/kvmarm/20210803114034.GB30853@willie-the-truck/
>
> History:
> --------
> v2 --> v3
> - Dropped adding a new static key and cpufeature for retrieving
>   supported VMID bits. Instead, we now make use of the
>   kvm_arm_vmid_bits variable (patch #2).
> - Since we expect less frequent rollover in the case of VMIDs,
>   the TLB invalidation is now broadcast on rollover instead
>   of keeping per-CPU flush_pending info and issuing a local
>   context flush.
> - Clear active_vmids on vCPU schedule out to avoid unnecessarily
>   reserving the VMID space (patch #3).
> - I have kept struct kvm_vmid as it is for now (instead of a
>   typedef as suggested), as we may soon add another variable to
>   it when we introduce pinned KVM VMID support.
>
> RFCv1 --> v2
> - Dropped "pinned VMID" support for now.
> - Dropped RFC tag.
>
> RFCv1
> https://lore.kernel.org/kvmarm/20210506165232.1969-1-shameerali.kolothum.thodi@huawei.com/
>
> Julien Grall (1):
>   KVM: arm64: Align the VMID allocation with the arm64 ASID
>
> Shameer Kolothum (3):
>   KVM: arm64: Introduce a new VMID allocator for KVM
>   KVM: arm64: Make VMID bits accessible outside of allocator
>   KVM: arm64: Make active_vmids invalid on vCPU schedule out
>
>  arch/arm64/include/asm/kvm_host.h     |  10 +-
>  arch/arm64/include/asm/kvm_mmu.h      |   4 +-
>  arch/arm64/kernel/image-vars.h        |   3 +
>  arch/arm64/kvm/Makefile               |   2 +-
>  arch/arm64/kvm/arm.c                  | 106 +++-----------
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   3 +-
>  arch/arm64/kvm/mmu.c                  |   1 -
>  arch/arm64/kvm/vmid.c                 | 196 ++++++++++++++++++++++++++
>  8 files changed, 228 insertions(+), 97 deletions(-)
>  create mode 100644 arch/arm64/kvm/vmid.c
>
> --
> 2.17.1
Hi Shameer,

On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> - TLA+ model. Modified the asidalloc model to incorporate the new
>   VMID algo. The main differences are:
>   - flush_tlb_all() instead of local_flush_tlb_all() on rollover.
>   - Introduced INVALID_VMID and vCPU sched-out logic.
>   - No CnP (removed UniqueASIDAllCPUs & UniqueASIDActiveTask invariants).
>   - Removed the UniqueVMIDPerCPU invariant for now, as it looks like,
>     because of the speculative fetching with flush_tlb_all(), there
>     is a small window where this gets triggered. If I change the
>     logic back to local_flush_tlb_all(), UniqueVMIDPerCPU seems to
>     be fine. With my limited knowledge of TLA+ modelling, it is not
>     clear to me whether this is a problem with the above logic
>     or with the VMID model implementation. I'd really appreciate any
>     help with the model.
>   The initial VMID TLA+ model is here:
>   https://github.com/shamiali2008/kernel-tla/tree/private-vmidalloc-v1

I only had a brief look at the TLA+ model and I don't understand why you
have a separate 'sched_out' process. It would run in parallel with
'sched', but AFAICT you can't really schedule a guest out while you are
in the middle of scheduling it in. I'd rather use the same 'sched'
process and either schedule in an inactive task or schedule out an
active one for a given CPU.

Also, active_vmids[] for example is defined on the CPUS domain, but you
call vcpu_sched_out() from a process that's not in the CPUS domain but
in the SCHED_OUT one.

Regarding UniqueVMIDPerCPU, I think we need to figure out why it
happens. The fact that flush_tlb_all() was made to simulate the
speculative TLB loads is not relevant. In a different spec I have,
arm64kpti.tla, I just used another process that invokes an update_tlbs()
macro so that it can happen at any time. I didn't bother to update the
ASID spec in a similar way, but it may be useful.

The corresponding UniqueASIDPerCPU meant that for any two TLB entries on
a single CPU, if they correspond to different tasks (pgd), they should
have different ASIDs. That's a strong requirement; otherwise we end up
with the wrong translation.

Why did you remove the CnP? Do we have this disabled for KVM guests?
Hi Catalin,

> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@arm.com]
> Sent: 18 January 2022 12:33
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; will@kernel.org;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> [...]
>
> I only had a brief look at the TLA+ model and I don't understand why you
> have a separate 'sched_out' process. It would run in parallel with
> 'sched', but AFAICT you can't really schedule a guest out while you are
> in the middle of scheduling it in. I'd rather use the same 'sched'
> process and either schedule in an inactive task or schedule out an
> active one for a given CPU.
>
> Also, active_vmids[] for example is defined on the CPUS domain, but you
> call vcpu_sched_out() from a process that's not in the CPUS domain but
> in the SCHED_OUT one.

Many thanks for taking a look. My bad! The 'sched_out' process would
indeed run in parallel and defeat the purpose. I must say I was really
confused by the TLA+ syntax and am still not very confident with it.
Based on the above suggestion, I have modified it as below:

\* vCPU is scheduled out by KVM
macro vcpu_sched_out() {
    active_kvm[self].task := 0;
    active_vmids[self] := INVALID_VMID;
}

....

\* About to run a Guest VM
process (sched \in CPUS)
{
sched_loop:
    while (TRUE) {
        with (t \in TASKS) {
            if (t # ActiveTask(self))
                call kvm_arm_vmid_update(t);
            else
                vcpu_sched_out();
        }
    }
}

Please let me know if this is OK.

> Regarding UniqueVMIDPerCPU, I think we need to figure out why it
> happens. The fact that flush_tlb_all() was made to simulate the
> speculative TLB loads is not relevant. In a different spec I have,
> arm64kpti.tla, I just used another process that invokes an update_tlbs()
> macro so that it can happen at any time. I didn't bother to update the
> ASID spec in a similar way, but it may be useful.

Ok. I will take a look at that and try it.

> The corresponding UniqueASIDPerCPU meant that for any two TLB entries
> on a single CPU, if they correspond to different tasks (pgd), they
> should have different ASIDs. That's a strong requirement; otherwise we
> end up with the wrong translation.

Yes, I understand that it is a strong requirement. Also, I thought this
is something that would trigger easily with the test setup I had on real
hardware. But testing stayed on for days, so I was not sure whether it
is a problem with the TLA+ implementation or not.

> Why did you remove the CnP? Do we have this disabled for KVM guests?

I removed the CnP-related invariants to simplify things for the first
version. Also, I am not sure what specific changes we need to make for
CnP here. Do we still need the switch to the global pg_dir to prevent
any speculative reading? I didn't see that being done in KVM anywhere at
the moment. Maybe I am missing something.

On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
check.

Thanks,
Shameer

> --
> Catalin
On Wed, Jan 19, 2022 at 09:23:31AM +0000, Shameerali Kolothum Thodi wrote:
> > On Mon, Nov 22, 2021 at 12:18:40PM +0000, Shameer Kolothum wrote:
> > > - TLA+ model. Modified the asidalloc model to incorporate the new
> > >   VMID algo.
> > [...]
> > I only had a brief look at the TLA+ model and I don't understand why
> > you have a separate 'sched_out' process. [...]
> > Also, active_vmids[] for example is defined on the CPUS domain, but
> > you call vcpu_sched_out() from a process that's not in the CPUS
> > domain but in the SCHED_OUT one.
>
> Many thanks for taking a look. My bad! The 'sched_out' process would
> indeed run in parallel and defeat the purpose. I must say I was really
> confused by the TLA+ syntax and am still not very confident with it.

Yeah, it can be confusing. If you have time, you could give CBMC a try,
and the 'spec' would be pretty close to your C version. Each CPU would
be modelled as a thread, with an extra thread that simulates the
speculative TLB look-ups for all CPUs, together with the asserts for the
invariants. The spinlocks would be pthread_mutexes.

> Based on the above suggestion, I have modified it as below:
>
> \* vCPU is scheduled out by KVM
> macro vcpu_sched_out() {
>     active_kvm[self].task := 0;
>     active_vmids[self] := INVALID_VMID;
> }

Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
this directly below and avoid another macro. Well, whatever you find
clearer.

What confuses me is that your INVALID_VMID looks like a valid one: vmid
0, generation 1. Do you ensure that you never allocate VMID 0?

> \* About to run a Guest VM
> process (sched \in CPUS)
> {
> sched_loop:
>     while (TRUE) {
>         with (t \in TASKS) {
>             if (t # ActiveTask(self))
>                 call kvm_arm_vmid_update(t);
>             else
>                 vcpu_sched_out();
>         }
>     }
> }

Yes, that's what I meant.

> > The corresponding UniqueASIDPerCPU meant that for any two TLB
> > entries on a single CPU, if they correspond to different tasks
> > (pgd), they should have different ASIDs. That's a strong
> > requirement; otherwise we end up with the wrong translation.
>
> Yes, I understand that it is a strong requirement. Also, I thought
> this is something that would trigger easily with the test setup I had
> on real hardware. But testing stayed on for days, so I was not sure
> whether it is a problem with the TLA+ implementation or not.

Well, you'd have to check the TLA+ state trace and see how it got there,
whether the last state would be a valid one. It's either something
missing in the spec that the hardware enforces, or the algorithm is
wrong and just hard to hit in practice. If this condition is violated in
hardware for a very brief period, e.g. due to some TLBI, you'd not
notice an issue under normal circumstances. But it's still incorrect.

> > Why did you remove the CnP? Do we have this disabled for KVM guests?
>
> I removed the CnP-related invariants to simplify things for the first
> version. Also, I am not sure what specific changes we need to make for
> CnP here. Do we still need the switch to the global pg_dir to prevent
> any speculative reading? I didn't see that being done in KVM anywhere
> at the moment. Maybe I am missing something.

It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
sets VTTBR_EL2.CnP to 0 (I haven't checked).

> On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
> fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
> check.

This was added later as part of try-to-unmap, and I only checked it with
CnP = FALSE. I'll need to look into it; it's possible that
flush_tlb_all() doesn't take into account that the pte is unmapped (as
cpu_switch_mm() does). I'll add a separate thread for speculative TLB
loads; it's easier to have them in one place. Thanks.
> -----Original Message-----
> From: Catalin Marinas [mailto:catalin.marinas@arm.com]
> Sent: 19 January 2022 11:50
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-kernel@vger.kernel.org; maz@kernel.org; will@kernel.org;
> james.morse@arm.com; julien.thierry.kdev@gmail.com;
> suzuki.poulose@arm.com; jean-philippe@linaro.org;
> Alexandru.Elisei@arm.com; qperret@google.com; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v4 0/4] kvm/arm: New VMID allocator based on asid
>
> [...]
>
> Yeah, it can be confusing. If you have time, you could give CBMC a try,
> and the 'spec' would be pretty close to your C version. Each CPU would
> be modelled as a thread, with an extra thread that simulates the
> speculative TLB look-ups for all CPUs, together with the asserts for
> the invariants. The spinlocks would be pthread_mutexes.

Ok, will take a look.

> > Based on the above suggestion, I have modified it as below:
> >
> > \* vCPU is scheduled out by KVM
> > macro vcpu_sched_out() {
> >     active_kvm[self].task := 0;
> >     active_vmids[self] := INVALID_VMID;
> > }
>
> Could you call cpu_switch_kvm(0, INVALID_VMID) instead? You could do
> this directly below and avoid another macro. Well, whatever you find
> clearer.

Sure, will change that.

> What confuses me is that your INVALID_VMID looks like a valid one: vmid
> 0, generation 1. Do you ensure that you never allocate VMID 0?

This is the same as in asidalloc, where cur_idx starts at 1 for any new
allocation. I think that guarantees VMID 0 is never allocated.

> > [...PlusCal 'sched' process...]
>
> Yes, that's what I meant.

Ok.

> > Yes, I understand that it is a strong requirement. Also, I thought
> > this is something that would trigger easily with the test setup I
> > had on real hardware. But testing stayed on for days, so I was not
> > sure whether it is a problem with the TLA+ implementation or not.
>
> Well, you'd have to check the TLA+ state trace and see how it got
> there, whether the last state would be a valid one. It's either
> something missing in the spec that the hardware enforces, or the
> algorithm is wrong and just hard to hit in practice. If this condition
> is violated in hardware for a very brief period, e.g. due to some TLBI,
> you'd not notice an issue under normal circumstances. But it's still
> incorrect.

Hmm.. I had a quick implementation with a separate thread for the
speculative load, as done in arm64kpti.tla, but I still get the
UniqueVMIDPerCPU violation error. I will go through the state trace and
see whether I can interpret something :)

> > I removed the CnP-related invariants to simplify things for the
> > first version. [...]
>
> It makes sense to ignore CnP for now. Maybe KVM doesn't even bother and
> sets VTTBR_EL2.CnP to 0 (I haven't checked).

I think KVM sets CnP here:
https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/kvm_mmu.h#L264

But I haven't figured out what else we need to do in this new VMID
allocator for the CnP case. Marc, Will?

> > On a side note, in my setup, the CnP=TRUE case for asidalloc.tla now
> > fails with "Error: Invariant TLBEmptyInvalPTE is violated.". Please
> > check.
>
> This was added later as part of try-to-unmap, and I only checked it
> with CnP = FALSE. I'll need to look into it; it's possible that
> flush_tlb_all() doesn't take into account that the pte is unmapped (as
> cpu_switch_mm() does). I'll add a separate thread for speculative TLB
> loads; it's easier to have them in one place. Thanks.

Ok.

Thanks,
Shameer
On Mon, 22 Nov 2021 12:18:40 +0000, Shameer Kolothum wrote:
> Changes from v3:
> - Main change is in patch #4, where the VMID is now set to an
>   invalid one on vCPU schedule out. Introduced an
>   INVALID_ACTIVE_VMID which is basically a VMID 0 with generation 1.
>   Since the basic allocator algorithm reserves vmid #0, it is never
>   used as an active VMID. This (hopefully) will fix the issue of
>   unnecessarily reserving VMID space with active_vmids when those
>   VMs are no longer active[0], and at the same time address the
>   problem noted in v3 wherein everything ends up in the slow path[1].
>
> [...]

Applied to next, thanks!

[1/4] KVM: arm64: Introduce a new VMID allocator for KVM
      commit: 417838392f2e657ee25cc30e373ff4c35a0faa90
[2/4] KVM: arm64: Make VMID bits accessible outside of allocator
      commit: f8051e960922a9de8e42159103d5d9c697ef17ec
[3/4] KVM: arm64: Align the VMID allocation with the arm64 ASID
      commit: 3248136b3637e1671e4fa46e32e2122f9ec4bc3d
[4/4] KVM: arm64: Make active_vmids invalid on vCPU schedule out
      commit: 100b4f092f878dc379f1fcef9ce567c25dee3473

Cheers,
M.