
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

Message ID 20210303164505.68492-1-maz@kernel.org (mailing list archive)
State New, archived

Commit Message

Marc Zyngier March 3, 2021, 4:45 p.m. UTC
It recently became apparent that the ARMv8 architecture has interesting
rules regarding attributes being used when fetching instructions
if the MMU is off at Stage-1.

In this situation, the CPU is allowed to fetch from the PoC and
allocate into the I-cache (unless the memory is mapped with
the XN attribute at Stage-2).

If we transpose this to vcpus sharing a single physical CPU,
it is possible for a vcpu running with its MMU off to influence
another vcpu running with its MMU on, as the latter is expected to
fetch from the PoU (and self-patching code doesn't flush below that
level).

In order to solve this, reuse the vcpu-private TLB invalidation
code to apply the same policy to the I-cache, nuking it every time
the vcpu runs on a physical CPU that ran another vcpu of the same
VM in the past.

This involves renaming __kvm_tlb_flush_local_vmid() to
__kvm_flush_cpu_context(), and inserting a local i-cache invalidation
there.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h   | 4 ++--
 arch/arm64/kvm/arm.c               | 7 ++++++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
 arch/arm64/kvm/hyp/nvhe/tlb.c      | 3 ++-
 arch/arm64/kvm/hyp/vhe/tlb.c       | 3 ++-
 5 files changed, 15 insertions(+), 8 deletions(-)

Comments

Catalin Marinas March 5, 2021, 7:07 p.m. UTC | #1
On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).

Digging through the ARM ARM is hard. Do we have this behaviour with FWB
as well?
Marc Zyngier March 6, 2021, 10:54 a.m. UTC | #2
On Fri, 05 Mar 2021 19:07:09 +0000,
Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> > It recently became apparent that the ARMv8 architecture has interesting
> > rules regarding attributes being used when fetching instructions
> > if the MMU is off at Stage-1.
> > 
> > In this situation, the CPU is allowed to fetch from the PoC and
> > allocate into the I-cache (unless the memory is mapped with
> > the XN attribute at Stage-2).
> 
> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
> as well?

The ARM ARM doesn't seem to mention FWB at all when it comes to
instruction fetch, which is sort of expected as it only covers the
D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
though, as the I-side would then snoop the D-side.
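
A minimal sketch of that idea, assuming the existing ARM64_HAS_CACHE_DIC
capability (shown here with the kernel-side cpus_have_const_cap() helper;
the EL2 object would need its hyp-side equivalent):

	void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
	{
		struct tlb_inv_context cxt;

		__tlb_switch_to_guest(mmu, &cxt);

		__tlbi(vmalle1);
		/* Sketch: skip the nuke when the I-side snoops the D-side */
		if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
			asm volatile("ic iallu");
		dsb(nsh);
		isb();

		__tlb_switch_to_host(&cxt);
	}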

Thanks,

	M.
Catalin Marinas March 6, 2021, 2:15 p.m. UTC | #3
On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
> On Fri, 05 Mar 2021 19:07:09 +0000,
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> > On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> > > It recently became apparent that the ARMv8 architecture has interesting
> > > rules regarding attributes being used when fetching instructions
> > > if the MMU is off at Stage-1.
> > > 
> > > In this situation, the CPU is allowed to fetch from the PoC and
> > > allocate into the I-cache (unless the memory is mapped with
> > > the XN attribute at Stage-2).
> > 
> > Digging through the ARM ARM is hard. Do we have this behaviour with FWB
> > as well?
> 
> The ARM ARM doesn't seem to mention FWB at all when it comes to
> instruction fetch, which is sort of expected as it only covers the
> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
> though, as the I-side would then snoop the D-side.

Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
don't think I-cache snooping the D-cache would happen to the PoU when
the S1 MMU is off.

My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
accesses may be held in the I-cache.

The FWB rules on combining S1 and S2 say that Normal Non-cacheable at
S1 is "upgraded" to cacheable. This should happen irrespective of
whether the S1 MMU is on or off and should apply to both I and D
accesses (since it does not explicitly say). So I think we could skip
this IC IALLU when FWB is present.
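
As a sketch on top of this patch (ARM64_HAS_STAGE2_FWB is the existing
capability; whether FWB really allows this is what's being discussed):

	__tlbi(vmalle1);
	/* Sketch: elide the nuke when FWB upgrades NC Stage-1 to cacheable */
	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
		asm volatile("ic iallu");
	dsb(nsh);
	isb();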

The same logic should apply when the VMM copies the VM text. With FWB,
we probably only need D-cache maintenance to PoU and only if
CTR_EL0.IDC==0. I haven't checked what the code currently does.
Alexandru Elisei March 8, 2021, 4:53 p.m. UTC | #4
Hello,

It's not clear to me why this patch is needed. If one VCPU in the VM is generating
code, is it not the software running in the VM responsible for keeping track of
the MMU state of the other VCPUs and making sure the new code is executed
correctly? Why should KVM get involved?

I don't see how this is different from running on bare metal (no hypervisor), where
one CPU with the MMU on generates code that another CPU with the MMU off must execute.

Some comments below.

On 3/6/21 2:15 PM, Catalin Marinas wrote:
> On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
>> On Fri, 05 Mar 2021 19:07:09 +0000,
>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
>>>> It recently became apparent that the ARMv8 architecture has interesting
>>>> rules regarding attributes being used when fetching instructions
>>>> if the MMU is off at Stage-1.
>>>>
>>>> In this situation, the CPU is allowed to fetch from the PoC and
>>>> allocate into the I-cache (unless the memory is mapped with
>>>> the XN attribute at Stage-2).
>>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
>>> as well?
>> The ARM ARM doesn't seem to mention FWB at all when it comes to
>> instruction fetch, which is sort of expected as it only covers the
>> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
>> though, as the I-side would then snoop the D-side.
> Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
> PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
> don't think I-cache snooping the D-cache would happen to the PoU when
> the S1 MMU is off.

FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means that no dcache
clean is required for instruction to data coherence (page D13-3086). I interpret
that as saying that with FEAT_FWB, CTR_EL0.IDC is effectively 1, which means that
dcache clean is not required for instruction generation, and icache invalidation
is required only if CTR_EL0.DIC = 0 (according to B2-158).

> My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
> are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
> accesses may be held in the I-cache.

Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that instruction
fetches are to Normal Cacheable, Inner and Outer Read-Allocate memory (ARM DDI
0487G.a, pages D5-2709 and indirectly at D13-3586).

Like you've pointed out, as mentioned in D4.4.6, it is always possible that
instruction fetches are held in the instruction cache, regardless of the state of
the SCTLR_EL1.M bit.

> The FWB rules on combining S1 and S2 say that Normal Non-cacheable at
> S1 is "upgraded" to cacheable. This should happen irrespective of
> whether the S1 MMU is on or off and should apply to both I and D
> accesses (since it does not explicitly say). So I think we could skip
> this IC IALLU when FWB is present.
>
> The same logic should apply when the VMM copies the VM text. With FWB,
> we probably only need D-cache maintenance to PoU and only if
> CTR_EL0.IDC==0. I haven't checked what the code currently does.

When FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't need a dcache
clean in this case.
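
For the VMM text copy, that reading would reduce to something like the
sketch below. dcache_clean_pou() is a hypothetical helper name;
invalidate_icache_range() is the existing arm64 one (which already skips
itself when CTR_EL0.DIC is set):

	/* Sketch: maintenance after the VMM copies guest text */
	if (!cpus_have_const_cap(ARM64_HAS_CACHE_IDC))
		dcache_clean_pou(start, end);		/* hypothetical */
	if (!cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
		invalidate_icache_range(start, end);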

Thanks,

Alex
Marc Zyngier March 8, 2021, 8:03 p.m. UTC | #5
Hi Alex,

On Mon, 08 Mar 2021 16:53:09 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hello,
> 
> It's not clear to me why this patch is needed. If one VCPU in the VM is generating
> code, is it not the software running in the VM responsible for keeping track of
> the MMU state of the other VCPUs and making sure the new code is executed
> correctly? Why should KVM get involved?
> 
> I don't see how this is different from running on bare metal (no
> hypervisor), where one CPU with the MMU on generates code that another
> CPU with the MMU off must execute.

The difference is that so far, we have always considered i-caches to
be private to each CPU. With a hypervisor that allows migration of
vcpus from one physical CPU to another, the i-cache isn't private
anymore from the perspective of the vcpus.

> 
> Some comments below.
> 
> On 3/6/21 2:15 PM, Catalin Marinas wrote:
> > On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
> >> On Fri, 05 Mar 2021 19:07:09 +0000,
> >> Catalin Marinas <catalin.marinas@arm.com> wrote:
> >>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> >>>> It recently became apparent that the ARMv8 architecture has interesting
> >>>> rules regarding attributes being used when fetching instructions
> >>>> if the MMU is off at Stage-1.
> >>>>
> >>>> In this situation, the CPU is allowed to fetch from the PoC and
> >>>> allocate into the I-cache (unless the memory is mapped with
> >>>> the XN attribute at Stage-2).
> >>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
> >>> as well?
> >> The ARM ARM doesn't seem to mention FWB at all when it comes to
> >> instruction fetch, which is sort of expected as it only covers the
> >> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
> >> though, as the I-side would then snoop the D-side.
> > Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
> > PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
> > don't think I-cache snooping the D-cache would happen to the PoU when
> > the S1 MMU is off.
> 
> FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means
> that no dcache clean is required for instruction to data coherence
> (page D13-3086). I interpret that as saying that with FEAT_FWB,
> CTR_EL0.IDC is effectively 1, which means that dcache clean is not
> required for instruction generation, and icache invalidation is
> required only if CTR_EL0.DIC = 0 (according to B2-158).
> 
> > My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
> > are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
> > accesses may be held in the I-cache.
> 
> Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that
> instruction fetches are to Normal Cacheable, Inner and Outer
> Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly
> at D13-3586).

I think that's the allocation in unified caches, and not necessarily
the i-cache, given that it also mention things such as "Inner
Write-Through", which makes no sense for the i-cache.

> Like you've pointed out, as mentioned in D4.4.6, it is always
> possible that instruction fetches are held in the instruction cache,
> regardless of the state of the SCTLR_EL1.M bit.

Exactly, and that's what breaks things.

> >>> The FWB rules on combining S1 and S2 say that Normal Non-cacheable at
> > S1 is "upgraded" to cacheable. This should happen irrespective of
> > whether the S1 MMU is on or off and should apply to both I and D
> >>> accesses (since it does not explicitly say). So I think we could skip
> > this IC IALLU when FWB is present.
> >
> > The same logic should apply when the VMM copies the VM text. With FWB,
> > we probably only need D-cache maintenance to PoU and only if
> > CTR_EL0.IDC==0. I haven't checked what the code currently does.
> 
> When FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't
> need a dcache clean in this case.

But that isn't what concerns me. FWB is exclusively documented in
terms of d-cache, and doesn't describe how that affects the
instruction fetch (which is why I'm reluctant to attribute any effect
to it).

Thanks,

	M.
Will Deacon March 9, 2021, 1:26 p.m. UTC | #6
On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).
> 
> If we transpose this to vcpus sharing a single physical CPU,
> it is possible for a vcpu running with its MMU off to influence
> another vcpu running with its MMU on, as the latter is expected to
> fetch from the PoU (and self-patching code doesn't flush below that
> level).
> 
> In order to solve this, reuse the vcpu-private TLB invalidation
> code to apply the same policy to the I-cache, nuking it every time
> the vcpu runs on a physical CPU that ran another vcpu of the same
> VM in the past.
> 
> This involves renaming __kvm_tlb_flush_local_vmid() to
> __kvm_flush_cpu_context(), and inserting a local i-cache invalidation
> there.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h   | 4 ++--
>  arch/arm64/kvm/arm.c               | 7 ++++++-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
>  arch/arm64/kvm/hyp/nvhe/tlb.c      | 3 ++-
>  arch/arm64/kvm/hyp/vhe/tlb.c       | 3 ++-
>  5 files changed, 15 insertions(+), 8 deletions(-)

Since the FWB discussion doesn't affect the correctness of this patch:

Acked-by: Will Deacon <will@kernel.org>

Will
Catalin Marinas March 9, 2021, 2:38 p.m. UTC | #7
On Tue, Mar 09, 2021 at 01:26:46PM +0000, Will Deacon wrote:
> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
> > It recently became apparent that the ARMv8 architecture has interesting
> > rules regarding attributes being used when fetching instructions
> > if the MMU is off at Stage-1.
> > 
> > In this situation, the CPU is allowed to fetch from the PoC and
> > allocate into the I-cache (unless the memory is mapped with
> > the XN attribute at Stage-2).
> > 
> > If we transpose this to vcpus sharing a single physical CPU,
> > it is possible for a vcpu running with its MMU off to influence
> > another vcpu running with its MMU on, as the latter is expected to
> > fetch from the PoU (and self-patching code doesn't flush below that
> > level).
> > 
> > In order to solve this, reuse the vcpu-private TLB invalidation
> > code to apply the same policy to the I-cache, nuking it every time
> > the vcpu runs on a physical CPU that ran another vcpu of the same
> > VM in the past.
> > 
> > This involves renaming __kvm_tlb_flush_local_vmid() to
> > __kvm_flush_cpu_context(), and inserting a local i-cache invalidation
> > there.
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   | 4 ++--
> >  arch/arm64/kvm/arm.c               | 7 ++++++-
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
> >  arch/arm64/kvm/hyp/nvhe/tlb.c      | 3 ++-
> >  arch/arm64/kvm/hyp/vhe/tlb.c       | 3 ++-
> >  5 files changed, 15 insertions(+), 8 deletions(-)
> 
> Since the FWB discussion doesn't affect the correctness of this patch:
> 
> Acked-by: Will Deacon <will@kernel.org>

I agree. We can optimise it later for FWB.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Alexandru Elisei March 9, 2021, 5:07 p.m. UTC | #8
Hi Marc,

On 3/8/21 8:03 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On Mon, 08 Mar 2021 16:53:09 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hello,
>>
>> It's not clear to me why this patch is needed. If one VCPU in the VM is generating
>> code, is it not the software running in the VM responsible for keeping track of
>> the MMU state of the other VCPUs and making sure the new code is executed
>> correctly? Why should KVM get involved?
>>
>> I don't see how this is different from running on bare metal (no
>> hypervisor), where one CPU with the MMU on generates code that another
>> CPU with the MMU off must execute.
> The difference is that so far, we have always considered i-caches to
> be private to each CPU. With a hypervisor that allows migration of
> vcpus from one physical CPU to another, the i-cache isn't private
> anymore from the perspective of the vcpus.

I think I understand what the problem is. VCPU X running on CPU A with MMU off
fetches instructions from PoC and allocates them into the icache. VCPU Y running
on CPU B generates code and does dcache clean to PoU + icache invalidate, gets
scheduled on CPU A and executes the stale instructions fetched by VCPU X from PoC.

>
>> Some comments below.
>>
>> On 3/6/21 2:15 PM, Catalin Marinas wrote:
>>> On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
>>>> On Fri, 05 Mar 2021 19:07:09 +0000,
>>>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
>>>>>> It recently became apparent that the ARMv8 architecture has interesting
>>>>>> rules regarding attributes being used when fetching instructions
>>>>>> if the MMU is off at Stage-1.
>>>>>>
>>>>>> In this situation, the CPU is allowed to fetch from the PoC and
>>>>>> allocate into the I-cache (unless the memory is mapped with
>>>>>> the XN attribute at Stage-2).
>>>>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
>>>>> as well?
>>>> The ARM ARM doesn't seem to mention FWB at all when it comes to
>>>> instruction fetch, which is sort of expected as it only covers the
>>>> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
>>>> though, as the I-side would then snoop the D-side.
>>> Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
>>> PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
>>> don't think I-cache snooping the D-cache would happen to the PoU when
>>> the S1 MMU is off.
>> FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means
>> that no dcache clean is required for instruction to data coherence
>> (page D13-3086). I interpret that as saying that with FEAT_FWB,
>> CTR_EL0.IDC is effectively 1, which means that dcache clean is not
>> required for instruction generation, and icache invalidation is
>> required only if CTR_EL0.DIC = 0 (according to B2-158).
>>
>>> My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
>>> are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
>>> accesses may be held in the I-cache.
>> Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that
>> instruction fetches are to Normal Cacheable, Inner and Outer
>> Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly
>> at D13-3586).
> I think that's the allocation in unified caches, and not necessarily
> the i-cache, given that it also mention things such as "Inner
> Write-Through", which makes no sense for the i-cache.
>> Like you've pointed out, as mentioned in D4.4.6, it is always
>> possible that instruction fetches are held in the instruction cache,
>> regardless of the state of the SCTLR_EL1.M bit.
> Exactly, and that's what breaks things.
>
>>> The FWB rules on combining S1 and S2 say that Normal Non-cacheable at
>>> S1 is "upgraded" to cacheable. This should happen irrespective of
>>> whether the S1 MMU is on or off and should apply to both I and D
>>> accesses (since it does not explicitly say). So I think we could skip
>>> this IC IALLU when FWB is present.
>>>
>>> The same logic should apply when the VMM copies the VM text. With FWB,
>>> we probably only need D-cache maintenance to PoU and only if
>>> CTR_EL0.IDC==0. I haven't checked what the code currently does.
>> When FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't
>> need a dcache clean in this case.
> But that isn't what concerns me. FWB is exclusively documented in
> terms of d-cache, and doesn't describe how that affects the
> instruction fetch (which is why I'm reluctant to attribute any effect
> to it).

I tend to agree with this. FEAT_S2FWB is described in terms of resultant memory
type, cacheability attribute and cacheability hints, which in the architecture
don't affect the need to do instruction cache invalidation or data cache clean
when generating instructions.

There's also this part which is specifically targeted at instruction generation
(page D5-2761):

"When FEAT_S2FWB is implemented, the architecture requires that CLIDR_EL1.{LOUU,
LOIUS} are zero so that no levels of data cache need to be cleaned in order to
manage coherency with instruction fetches."

There's no mention of not needing to do instruction invalidation. I think the
invalidation is still necessary with FWB when CTR_EL0.DIC == 0b0.
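
So any eventual elision would have to key off DIC rather than FWB alone;
a sketch using the existing SYS_CTR_EL0 accessor and bit definitions:

	u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);

	/* Sketch: IC IALLU is only skippable when DIC == 1, FWB or not */
	if (!(ctr & BIT(CTR_DIC_SHIFT)))	/* CTR_EL0.DIC, bit 29 */
		asm volatile("ic iallu");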

Thanks,

Alex

> Thanks,
>
> 	M.
>
Marc Zyngier March 9, 2021, 6:01 p.m. UTC | #9
On Wed, 3 Mar 2021 16:45:05 +0000, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
      commit: 01dc9262ff5797b675c32c0c6bc682777d23de05

Cheers,

	M.

Patch

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 9c0e396dd03f..a7ab84f781f7 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -47,7 +47,7 @@ 
 #define __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context		2
 #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa		3
 #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid		4
-#define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_local_vmid	5
+#define __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context		5
 #define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff		6
 #define __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs			7
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config		8
@@ -183,10 +183,10 @@  DECLARE_KVM_HYP_SYM(__bp_harden_hyp_vecs);
 #define __bp_harden_hyp_vecs	CHOOSE_HYP_SYM(__bp_harden_hyp_vecs)
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
-extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fc4c95dd2d26..7f06ba76698d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -385,11 +385,16 @@  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
 	/*
+	 * We guarantee that both TLBs and I-cache are private to each
+	 * vcpu. If detecting that a vcpu from the same VM has
+	 * previously run on the same physical CPU, call into the
+	 * hypervisor code to nuke the relevant contexts.
+	 *
 	 * We might get preempted before the vCPU actually runs, but
 	 * over-invalidation doesn't affect correctness.
 	 */
 	if (*last_ran != vcpu->vcpu_id) {
-		kvm_call_hyp(__kvm_tlb_flush_local_vmid, mmu);
+		kvm_call_hyp(__kvm_flush_cpu_context, mmu);
 		*last_ran = vcpu->vcpu_id;
 	}
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 8f129968204e..936328207bde 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -46,11 +46,11 @@  static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 	__kvm_tlb_flush_vmid(kern_hyp_va(mmu));
 }
 
-static void handle___kvm_tlb_flush_local_vmid(struct kvm_cpu_context *host_ctxt)
+static void handle___kvm_flush_cpu_context(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
 
-	__kvm_tlb_flush_local_vmid(kern_hyp_va(mmu));
+	__kvm_flush_cpu_context(kern_hyp_va(mmu));
 }
 
 static void handle___kvm_timer_set_cntvoff(struct kvm_cpu_context *host_ctxt)
@@ -115,7 +115,7 @@  static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_flush_vm_context),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid),
-	HANDLE_FUNC(__kvm_tlb_flush_local_vmid),
+	HANDLE_FUNC(__kvm_flush_cpu_context),
 	HANDLE_FUNC(__kvm_timer_set_cntvoff),
 	HANDLE_FUNC(__kvm_enable_ssbs),
 	HANDLE_FUNC(__vgic_v3_get_gic_config),
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index fbde89a2c6e8..229b06748c20 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -123,7 +123,7 @@  void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 	__tlb_switch_to_host(&cxt);
 }
 
-void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
+void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
 
@@ -131,6 +131,7 @@  void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
 	__tlb_switch_to_guest(mmu, &cxt);
 
 	__tlbi(vmalle1);
+	asm volatile("ic iallu");
 	dsb(nsh);
 	isb();
 
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index fd7895945bbc..66f17349f0c3 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -127,7 +127,7 @@  void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 	__tlb_switch_to_host(&cxt);
 }
 
-void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
+void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
 
@@ -135,6 +135,7 @@  void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
 	__tlb_switch_to_guest(mmu, &cxt);
 
 	__tlbi(vmalle1);
+	asm volatile("ic iallu");
 	dsb(nsh);
 	isb();