mbox series

[0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS

Message ID 20240702234155.2106399-1-richard.henderson@linaro.org (mailing list archive)
Headers show
Series target/arm: Fix unwind from dc zva and FEAT_MOPS | expand

Message

Richard Henderson July 2, 2024, 11:41 p.m. UTC
While looking into Zoltan's attempt to speed up ppc64 DCBZ
(data cache block set to zero), I wondered what AArch64 was
doing differently.  It turned out that Arm is the only user
of tlb_vaddr_to_host.

None of the code sequences in use between AArch64, Power64 and S390X
are 100% safe, with race conditions vs mmap et al, however, AArch64
is the only one that will fail this single threaded test case.  Use
of these new functions fixes the race condition as well, though I
have not yet touched the other guests.

I thought about exposing accel/tcg/user-retaddr.h for direct use
from the targets, but perhaps these wrappers are cleaner.  RFC?


r~


Richard Henderson (2):
  accel/tcg: Introduce memset_ra, memmove_ra
  target/arm: Use memset_ra, memmove_ra in helper-a64.c

 include/exec/cpu_ldst.h            | 40 ++++++++++++++++
 accel/tcg/user-exec.c              | 22 +++++++++
 target/arm/tcg/helper-a64.c        | 10 ++--
 tests/tcg/multiarch/memset-fault.c | 77 ++++++++++++++++++++++++++++++
 4 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tests/tcg/multiarch/memset-fault.c

Comments

Ilya Leoshkevich July 4, 2024, 2:50 p.m. UTC | #1
On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
> While looking into Zoltan's attempt to speed up ppc64 DCBZ
> (data cache block set to zero), I wondered what AArch64 was
> doing differently.  It turned out that Arm is the only user
> of tlb_vaddr_to_host.
> 
> None of the code sequences in use between AArch64, Power64 and S390X
> are 100% safe, with race conditions vs mmap et al, however, AArch64
> is the only one that will fail this single threaded test case.  Use
> of these new functions fixes the race condition as well, though I
> have not yet touched the other guests.
> 
> I thought about exposing accel/tcg/user-retaddr.h for direct use
> from the targets, but perhaps these wrappers are cleaner.  RFC?
> 
> 
> r~
> 
> 
> Richard Henderson (2):
>   accel/tcg: Introduce memset_ra, memmove_ra
>   target/arm: Use memset_ra, memmove_ra in helper-a64.c
> 
>  include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>  accel/tcg/user-exec.c              | 22 +++++++++
>  target/arm/tcg/helper-a64.c        | 10 ++--
>  tests/tcg/multiarch/memset-fault.c | 77
> ++++++++++++++++++++++++++++++
>  4 files changed, 144 insertions(+), 5 deletions(-)
>  create mode 100644 tests/tcg/multiarch/memset-fault.c

This sounds good to me.

I haven't debugged it, but I wonder why doesn't s390x fail here.
For XC with src == dst, it does access_memset() -> do_access_memset()
-> memset() without setting the RA. And I don't think that anything
around it sets the RA either.
Richard Henderson July 4, 2024, 3:18 p.m. UTC | #2
On 7/4/24 07:50, Ilya Leoshkevich wrote:
> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>> (data cache block set to zero), I wondered what AArch64 was
>> doing differently.  It turned out that Arm is the only user
>> of tlb_vaddr_to_host.
>>
>> None of the code sequences in use between AArch64, Power64 and S390X
>> are 100% safe, with race conditions vs mmap et al, however, AArch64
>> is the only one that will fail this single threaded test case.  Use
>> of these new functions fixes the race condition as well, though I
>> have not yet touched the other guests.
>>
>> I thought about exposing accel/tcg/user-retaddr.h for direct use
>> from the targets, but perhaps these wrappers are cleaner.  RFC?
>>
>>
>> r~
>>
>>
>> Richard Henderson (2):
>>    accel/tcg: Introduce memset_ra, memmove_ra
>>    target/arm: Use memset_ra, memmove_ra in helper-a64.c
>>
>>   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>>   accel/tcg/user-exec.c              | 22 +++++++++
>>   target/arm/tcg/helper-a64.c        | 10 ++--
>>   tests/tcg/multiarch/memset-fault.c | 77
>> ++++++++++++++++++++++++++++++
>>   4 files changed, 144 insertions(+), 5 deletions(-)
>>   create mode 100644 tests/tcg/multiarch/memset-fault.c
> 
> This sounds good to me.
> 
> I haven't debugged it, but I wonder why doesn't s390x fail here.
> For XC with src == dst, it does access_memset() -> do_access_memset()
> -> memset() without setting the RA. And I don't think that anything
> around it sets the RA either.

s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises 
the exception when it isn't.  In contrast, for user-only, tlb_vaddr_to_host *only* 
performs the guest -> host address mapping, i.e. (addr + guest_base).


r~
Richard Henderson July 4, 2024, 9:48 p.m. UTC | #3
On 7/4/24 08:18, Richard Henderson wrote:
> On 7/4/24 07:50, Ilya Leoshkevich wrote:
>> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
>>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>>> (data cache block set to zero), I wondered what AArch64 was
>>> doing differently.  It turned out that Arm is the only user
>>> of tlb_vaddr_to_host.
>>>
>>> None of the code sequences in use between AArch64, Power64 and S390X
>>> are 100% safe, with race conditions vs mmap et al, however, AArch64
>>> is the only one that will fail this single threaded test case.  Use
>>> of these new functions fixes the race condition as well, though I
>>> have not yet touched the other guests.
>>>
>>> I thought about exposing accel/tcg/user-retaddr.h for direct use
>>> from the targets, but perhaps these wrappers are cleaner.  RFC?
>>>
>>>
>>> r~
>>>
>>>
>>> Richard Henderson (2):
>>>    accel/tcg: Introduce memset_ra, memmove_ra
>>>    target/arm: Use memset_ra, memmove_ra in helper-a64.c
>>>
>>>   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
>>>   accel/tcg/user-exec.c              | 22 +++++++++
>>>   target/arm/tcg/helper-a64.c        | 10 ++--
>>>   tests/tcg/multiarch/memset-fault.c | 77
>>> ++++++++++++++++++++++++++++++
>>>   4 files changed, 144 insertions(+), 5 deletions(-)
>>>   create mode 100644 tests/tcg/multiarch/memset-fault.c
>>
>> This sounds good to me.
>>
>> I haven't debugged it, but I wonder why doesn't s390x fail here.
>> For XC with src == dst, it does access_memset() -> do_access_memset()
>> -> memset() without setting the RA. And I don't think that anything
>> around it sets the RA either.
> 
> s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises 
> the exception when it isn't.  In contrast, for user-only, tlb_vaddr_to_host *only* 
> performs the guest -> host address mapping, i.e. (addr + guest_base).

I should clarify: probe_access_flags verifies that the page is mapped *at that moment*, 
but does not take the mmap_lock.  So the race is that the page can be unmapped by another 
thread after probe_access_flags and before the memset completes.


r~
Ilya Leoshkevich July 5, 2024, 5:24 p.m. UTC | #4
On Thu, 2024-07-04 at 14:48 -0700, Richard Henderson wrote:
> On 7/4/24 08:18, Richard Henderson wrote:
> > On 7/4/24 07:50, Ilya Leoshkevich wrote:
> > > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote:
> > > > While looking into Zoltan's attempt to speed up ppc64 DCBZ
> > > > (data cache block set to zero), I wondered what AArch64 was
> > > > doing differently.  It turned out that Arm is the only user
> > > > of tlb_vaddr_to_host.
> > > > 
> > > > None of the code sequences in use between AArch64, Power64 and
> > > > S390X
> > > > are 100% safe, with race conditions vs mmap et al, however,
> > > > AArch64
> > > > is the only one that will fail this single threaded test case. 
> > > > Use
> > > > of these new functions fixes the race condition as well, though
> > > > I
> > > > have not yet touched the other guests.
> > > > 
> > > > I thought about exposing accel/tcg/user-retaddr.h for direct
> > > > use
> > > > from the targets, but perhaps these wrappers are cleaner.  RFC?
> > > > 
> > > > 
> > > > r~
> > > > 
> > > > 
> > > > Richard Henderson (2):
> > > >    accel/tcg: Introduce memset_ra, memmove_ra
> > > >    target/arm: Use memset_ra, memmove_ra in helper-a64.c
> > > > 
> > > >   include/exec/cpu_ldst.h            | 40 ++++++++++++++++
> > > >   accel/tcg/user-exec.c              | 22 +++++++++
> > > >   target/arm/tcg/helper-a64.c        | 10 ++--
> > > >   tests/tcg/multiarch/memset-fault.c | 77
> > > > ++++++++++++++++++++++++++++++
> > > >   4 files changed, 144 insertions(+), 5 deletions(-)
> > > >   create mode 100644 tests/tcg/multiarch/memset-fault.c
> > > 
> > > This sounds good to me.
> > > 
> > > I haven't debugged it, but I wonder why doesn't s390x fail here.
> > > For XC with src == dst, it does access_memset() ->
> > > do_access_memset()
> > > -> memset() without setting the RA. And I don't think that
> > > anything
> > > around it sets the RA either.
> > 
> > s390x uses probe_access_flags, which verifies the page is mapped
> > and writable, and raises 
> > the exception when it isn't.  In contrast, for user-only,
> > tlb_vaddr_to_host *only* 
> > performs the guest -> host address mapping, i.e. (addr +
> > guest_base).
> 
> I should clarify: probe_access_flags verifies that the page is mapped
> *at that moment*, 
> but does not take the mmap_lock.  So the race is that the page can be
> unmapped by another 
> thread after probe_access_flags and before the memset completes.

I see, thanks. I completely overlooked the access_prepare() calls.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Peter Maydell July 8, 2024, 2:25 p.m. UTC | #5
On Wed, 3 Jul 2024 at 00:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> While looking into Zoltan's attempt to speed up ppc64 DCBZ
> (data cache block set to zero), I wondered what AArch64 was
> doing differently.  It turned out that Arm is the only user
> of tlb_vaddr_to_host.

riscv also seems to use it in vext_ldff(), fwiw.

-- PMM
Richard Henderson July 9, 2024, 4:17 p.m. UTC | #6
On 7/8/24 07:25, Peter Maydell wrote:
> On Wed, 3 Jul 2024 at 00:43, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> While looking into Zoltan's attempt to speed up ppc64 DCBZ
>> (data cache block set to zero), I wondered what AArch64 was
>> doing differently.  It turned out that Arm is the only user
>> of tlb_vaddr_to_host.
> 
> riscv also seems to use it in vext_ldff(), fwiw.

So it does, followed by a second probe for read.
That's quite wrong...

But you're right that it has a similar race condition.


r~