diff mbox series

[v2,2/2] KVM: selftests: arm64: Explicitly set the page attrs to Inner-Shareable

Message ID 20250405001042.1470552-3-rananta@google.com (mailing list archive)
State New
Headers show
Series KVM : selftests: arm64: Explicitly set the page attrs to Inner-Shareable | expand

Commit Message

Raghavendra Rao Ananta April 5, 2025, 12:10 a.m. UTC
Atomic instructions such as 'ldset' over (global) variables in the guest
is observed to cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION
DEFINED fault (Unsupported Exclusive or Atomic access)). The observation
was particularly apparent on Neoverse-N3.

According to ARM ARM DDI0487L.a B2.2.6 (Possible implementation
restrictions on using atomic instructions), atomic instructions are
architecturally guaranteed for Inner Shareable and Outer Shareable
attributes. For Non-Shareable attribute, the atomic instructions are
not atomic and issuing such an instruction can lead to the FSC
mentioned in this case (among other things).

Moreover, according to DDI0487L.a C3.2.6 (Single-copy atomic 64-byte
load/store), it is implementation defined that a data abort with the
mentioned FSC is reported for the first stage of translation that
provides an inappropriate memory type. It's likely that Neoverse-N3
chose to implement these two and why we see an FSC of 0x35 in EL1 upon
executing atomic instructions.

ARM64 KVM selftests sets no shareable attributes, which makes them
Non-Shareable by default. Hence, explicitly set them as Inner-Shareable
to fix this issue.

Suggested-by: Oliver Upton <oupton@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 tools/testing/selftests/kvm/include/arm64/processor.h | 1 +
 tools/testing/selftests/kvm/lib/arm64/processor.c     | 3 +++
 2 files changed, 4 insertions(+)

Comments

Oliver Upton April 5, 2025, 2:50 a.m. UTC | #1
On Fri, Apr 04, 2025 at 05:31:49PM -0700, Mingwei Zhang wrote:
> On Fri, Apr 4, 2025 at 5:10 PM Raghavendra Rao Ananta
> <rananta@google.com> wrote:
> >
> > Atomic instructions such as 'ldset' over (global) variables in the guest
> > is observed to cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION
> > DEFINED fault (Unsupported Exclusive or Atomic access)). The observation
> > was particularly apparent on Neoverse-N3.
> >
> > According to ARM ARM DDI0487L.a B2.2.6 (Possible implementation
> > restrictions on using atomic instructions), atomic instructions are
> > architecturally guaranteed for Inner Shareable and Outer Shareable
> > attributes. For Non-Shareable attribute, the atomic instructions are
> > not atomic and issuing such an instruction can lead to the FSC
> > mentioned in this case (among other things).
> >
> > Moreover, according to DDI0487L.a C3.2.6 (Single-copy atomic 64-byte
> > load/store), it is implementation defined that a data abort with the
> > mentioned FSC is reported for the first stage of translation that
> > provides an inappropriate memory type. It's likely that Neoverse-N3
> > chose to implement these two and why we see an FSC of 0x35 in EL1 upon
> > executing atomic instructions.

Ok, can we please drop this second reference?

This is talking about something else (FEAT_LS64) that happens to share
the same FSC as an unsupported atomic instruction. I mentioned this to
you internally as an illustration of how different implementations may
behave when determining if the attributes support a particular access,
but it isn't actually relevant to this change.

> nit: It's likely that Neoverse-N3 chose to implement this option (the
> first option) instead of reporting at the final enabled stage of
> translation

I would much rather we rely on the language that describes what the
architecture guarantees rather than speculate as to how Neoverse-N3
behaves.

Mentioning that the breakage was observed on Neoverse-N3 is still useful
to add to the changelog.

> I have minor question here: The DDI0487L C3.2.6 (Single-copy atomic
> 64-byte load/store) mentioned
> 
> """
> When the instructions access a memory type that is not one of the
> following, a data abort for unsupported Exclusive or atomic access is
> generated:
> 
> • Normal Inner Non-cacheable, Outer Non-cacheable.
> """
> 
> So, the above is the "Normal Inner Non-cacheable", but in our case we
> have "Normal and non-shareable" in stage-1 mapping, right? I know it
> is very close, but it seems the situation is still only "one bit" away
> in my understanding...

This citation relates to FEAT_LS64. If you look at B2.2.6 instead, it
reads:

"""
The memory types for which it is architecturally guaranteed that the
atomic instructions will be atomic are:

 - Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory
   with Read allocation hints and Write allocation hints and not
   transient.

 - Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory
   with Read allocation hints and Write allocation hints and not
   transient.
"""

and

"""
If the atomic insturctions are not atomic in regard to other agents that
access memory, then performing an atomic instruction to such a location
can have one or more of the following effects:

[...]

 - The instruction generates an IMPLEMENTATION DEFINED fault reported
   using the Data Abort Fault status code of ESR_ELx.DFSC = 110101
"""

The memory type used by KVM selftests is *Non-Shareable*, which is not
one of the memory types guaranteed by the architecture to work.

Thanks,
Oliver
Mingwei Zhang April 5, 2025, 7:24 a.m. UTC | #2
On Fri, Apr 4, 2025 at 7:51 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Apr 04, 2025 at 05:31:49PM -0700, Mingwei Zhang wrote:
> > On Fri, Apr 4, 2025 at 5:10 PM Raghavendra Rao Ananta
> > <rananta@google.com> wrote:
> > >
> > > Atomic instructions such as 'ldset' over (global) variables in the guest
> > > is observed to cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION
> > > DEFINED fault (Unsupported Exclusive or Atomic access)). The observation
> > > was particularly apparent on Neoverse-N3.
> > >
> > > According to ARM ARM DDI0487L.a B2.2.6 (Possible implementation
> > > restrictions on using atomic instructions), atomic instructions are
> > > architecturally guaranteed for Inner Shareable and Outer Shareable
> > > attributes. For Non-Shareable attribute, the atomic instructions are
> > > not atomic and issuing such an instruction can lead to the FSC
> > > mentioned in this case (among other things).
> > >
> > > Moreover, according to DDI0487L.a C3.2.6 (Single-copy atomic 64-byte
> > > load/store), it is implementation defined that a data abort with the
> > > mentioned FSC is reported for the first stage of translation that
> > > provides an inappropriate memory type. It's likely that Neoverse-N3
> > > chose to implement these two and why we see an FSC of 0x35 in EL1 upon
> > > executing atomic instructions.
>
> Ok, can we please drop this second reference?
>
> This is talking about something else (FEAT_LS64) that happens to share
> the same FSC as an unsupported atomic instruction. I mentioned this to
> you internally as an illustration of how different implementations may
> behave when determining if the attributes support a particular access,
> but it isn't actually relevant to this change.
>
> > nit: It's likely that Neoverse-N3 chose to implement this option (the
> > first option) instead of reporting at the final enabled stage of
> > translation
>
> I would much rather we rely on the language that describes what the
> architecture guarantees rather than speculate as to how Neoverse-N3
> behaves.
>
> Mentioning that the breakage was observed on Neoverse-N3 is still useful
> to add to the changelog.
>
> > I have minor question here: The DDI0487L C3.2.6 (Single-copy atomic
> > 64-byte load/store) mentioned
> >
> > """
> > When the instructions access a memory type that is not one of the
> > following, a data abort for unsupported Exclusive or atomic access is
> > generated:
> >
> > • Normal Inner Non-cacheable, Outer Non-cacheable.
> > """
> >
> > So, the above is the "Normal Inner Non-cacheable", but in our case we
> > have "Normal and non-shareable" in stage-1 mapping, right? I know it
> > is very close, but it seems the situation is still only "one bit" away
> > in my understanding...
>
> This citation relates to FEAT_LS64. If you look at B2.2.6 instead, it
> reads:
>
> """
> The memory types for which it is architecturally guaranteed that the
> atomic instructions will be atomic are:
>
>  - Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory
>    with Read allocation hints and Write allocation hints and not
>    transient.
>
>  - Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory
>    with Read allocation hints and Write allocation hints and not
>    transient.
> """

Agree that the above should be the right place to cite. C3.2.6 seems
to discuss atomic instruction with memory attributes bits(which points
to MAIR_EL1 and MAIR_EL2?). In this case, it is more related to
shareability bits instead.


>
> and
>
> """
> If the atomic insturctions are not atomic in regard to other agents that
> access memory, then performing an atomic instruction to such a location
> can have one or more of the following effects:
>
> [...]
>
>  - The instruction generates an IMPLEMENTATION DEFINED fault reported
>    using the Data Abort Fault status code of ESR_ELx.DFSC = 110101
> """
>
> The memory type used by KVM selftests is *Non-Shareable*, which is not
> one of the memory types guaranteed by the architecture to work.
>
> Thanks,
> Oliver
Marc Zyngier April 5, 2025, 9:33 a.m. UTC | #3
On Sat, 05 Apr 2025 08:24:30 +0100,
Mingwei Zhang <mizhang@google.com> wrote:
> 
> On Fri, Apr 4, 2025 at 7:51 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Fri, Apr 04, 2025 at 05:31:49PM -0700, Mingwei Zhang wrote:
> > > On Fri, Apr 4, 2025 at 5:10 PM Raghavendra Rao Ananta
> > > <rananta@google.com> wrote:
> > > >
> > > > Atomic instructions such as 'ldset' over (global) variables in the guest
> > > > is observed to cause an EL1 data abort with FSC 0x35 (IMPLEMENTATION
> > > > DEFINED fault (Unsupported Exclusive or Atomic access)). The observation
> > > > was particularly apparent on Neoverse-N3.
> > > >
> > > > According to ARM ARM DDI0487L.a B2.2.6 (Possible implementation
> > > > restrictions on using atomic instructions), atomic instructions are
> > > > architecturally guaranteed for Inner Shareable and Outer Shareable
> > > > attributes. For Non-Shareable attribute, the atomic instructions are
> > > > not atomic and issuing such an instruction can lead to the FSC
> > > > mentioned in this case (among other things).
> > > >
> > > > Moreover, according to DDI0487L.a C3.2.6 (Single-copy atomic 64-byte
> > > > load/store), it is implementation defined that a data abort with the
> > > > mentioned FSC is reported for the first stage of translation that
> > > > provides an inappropriate memory type. It's likely that Neoverse-N3
> > > > chose to implement these two and why we see an FSC of 0x35 in EL1 upon
> > > > executing atomic instructions.
> >
> > Ok, can we please drop this second reference?
> >
> > This is talking about something else (FEAT_LS64) that happens to share
> > the same FSC as an unsupported atomic instruction. I mentioned this to
> > you internally as an illustration of how different implementations may
> > behave when determining if the attributes support a particular access,
> > but it isn't actually relevant to this change.
> >
> > > nit: It's likely that Neoverse-N3 chose to implement this option (the
> > > first option) instead of reporting at the final enabled stage of
> > > translation
> >
> > I would much rather we rely on the language that describes what the
> > architecture guarantees rather than speculate as to how Neoverse-N3
> > behaves.
> >
> > Mentioning that the breakage was observed on Neoverse-N3 is still useful
> > to add to the changelog.
> >
> > > I have minor question here: The DDI0487L C3.2.6 (Single-copy atomic
> > > 64-byte load/store) mentioned
> > >
> > > """
> > > When the instructions access a memory type that is not one of the
> > > following, a data abort for unsupported Exclusive or atomic access is
> > > generated:
> > >
> > > • Normal Inner Non-cacheable, Outer Non-cacheable.
> > > """
> > >
> > > So, the above is the "Normal Inner Non-cacheable", but in our case we
> > > have "Normal and non-shareable" in stage-1 mapping, right? I know it
> > > is very close, but it seems the situation is still only "one bit" away
> > > in my understanding...
> >
> > This citation relates to FEAT_LS64. If you look at B2.2.6 instead, it
> > reads:
> >
> > """
> > The memory types for which it is architecturally guaranteed that the
> > atomic instructions will be atomic are:
> >
> >  - Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory
> >    with Read allocation hints and Write allocation hints and not
> >    transient.
> >
> >  - Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory
> >    with Read allocation hints and Write allocation hints and not
> >    transient.
> > """
> 
> Agree that the above should be the right place to cite. C3.2.6 seems
> to discuss atomic instruction with memory attributes bits(which points
> to MAIR_EL1 and MAIR_EL2?). In this case, it is more related to
> shareability bits instead.

Not sure what you mean by this reference to MAIR.

These constraints apply to any memory access, including those that are
*not* the direct effect of an instruction.  For example, an atomic
page table update generated by the HW (AF or DB update) is only
guaranteed to succeed if the PTW is itself configured to use ISH+WB or
OSH+WB, and could otherwise result in any of the ill effects described
in B2.2.6.

	M.
diff mbox series

Patch

diff --git a/tools/testing/selftests/kvm/include/arm64/processor.h b/tools/testing/selftests/kvm/include/arm64/processor.h
index 7d88ff22013a..b0fc0f945766 100644
--- a/tools/testing/selftests/kvm/include/arm64/processor.h
+++ b/tools/testing/selftests/kvm/include/arm64/processor.h
@@ -113,6 +113,7 @@ 
 #define PMD_TYPE_TABLE		BIT(1)
 #define PTE_TYPE_PAGE		BIT(1)
 
+#define PTE_SHARED		(UL(3) << 8) /* SH[1:0], inner shareable */
 #define PTE_AF			BIT(10)
 
 #define PTE_ADDR_MASK(page_shift)	GENMASK(47, (page_shift))
diff --git a/tools/testing/selftests/kvm/lib/arm64/processor.c b/tools/testing/selftests/kvm/lib/arm64/processor.c
index da5802c8a59c..9d69904cb608 100644
--- a/tools/testing/selftests/kvm/lib/arm64/processor.c
+++ b/tools/testing/selftests/kvm/lib/arm64/processor.c
@@ -172,6 +172,9 @@  static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 	}
 
 	pg_attr = PTE_AF | PTE_ATTRINDX(attr_idx) | PTE_TYPE_PAGE | PTE_VALID;
+	if (!use_lpa2_pte_format(vm))
+		pg_attr |= PTE_SHARED;
+
 	*ptep = addr_pte(vm, paddr, pg_attr);
 }