Message ID | 20240702234155.2106399-1-richard.henderson@linaro.org (mailing list archive) |
---|---|
Headers | show |
Series | target/arm: Fix unwind from dc zva and FEAT_MOPS | expand |
On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: > While looking into Zoltan's attempt to speed up ppc64 DCBZ > (data cache block set to zero), I wondered what AArch64 was > doing differently. It turned out that Arm is the only user > of tlb_vaddr_to_host. > > None of the code sequences in use between AArch64, Power64 and S390X > are 100% safe, with race conditions vs mmap et al, however, AArch64 > is the only one that will fail this single threaded test case. Use > of these new functions fixes the race condition as well, though I > have not yet touched the other guests. > > I thought about exposing accel/tcg/user-retaddr.h for direct use > from the targets, but perhaps these wrappers are cleaner. RFC? > > > r~ > > > Richard Henderson (2): > accel/tcg: Introduce memset_ra, memmove_ra > target/arm: Use memset_ra, memmove_ra in helper-a64.c > > include/exec/cpu_ldst.h | 40 ++++++++++++++++ > accel/tcg/user-exec.c | 22 +++++++++ > target/arm/tcg/helper-a64.c | 10 ++-- > tests/tcg/multiarch/memset-fault.c | 77 > ++++++++++++++++++++++++++++++ > 4 files changed, 144 insertions(+), 5 deletions(-) > create mode 100644 tests/tcg/multiarch/memset-fault.c This sounds good to me. I haven't debugged it, but I wonder why doesn't s390x fail here. For XC with src == dst, it does access_memset() -> do_access_memset() -> memset() without setting the RA. And I don't think that anything around it sets the RA either.
On 7/4/24 07:50, Ilya Leoshkevich wrote: > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: >> While looking into Zoltan's attempt to speed up ppc64 DCBZ >> (data cache block set to zero), I wondered what AArch64 was >> doing differently. It turned out that Arm is the only user >> of tlb_vaddr_to_host. >> >> None of the code sequences in use between AArch64, Power64 and S390X >> are 100% safe, with race conditions vs mmap et al, however, AArch64 >> is the only one that will fail this single threaded test case. Use >> of these new functions fixes the race condition as well, though I >> have not yet touched the other guests. >> >> I thought about exposing accel/tcg/user-retaddr.h for direct use >> from the targets, but perhaps these wrappers are cleaner. RFC? >> >> >> r~ >> >> >> Richard Henderson (2): >> accel/tcg: Introduce memset_ra, memmove_ra >> target/arm: Use memset_ra, memmove_ra in helper-a64.c >> >> include/exec/cpu_ldst.h | 40 ++++++++++++++++ >> accel/tcg/user-exec.c | 22 +++++++++ >> target/arm/tcg/helper-a64.c | 10 ++-- >> tests/tcg/multiarch/memset-fault.c | 77 >> ++++++++++++++++++++++++++++++ >> 4 files changed, 144 insertions(+), 5 deletions(-) >> create mode 100644 tests/tcg/multiarch/memset-fault.c > > This sounds good to me. > > I haven't debugged it, but I wonder why doesn't s390x fail here. > For XC with src == dst, it does access_memset() -> do_access_memset() > -> memset() without setting the RA. And I don't think that anything > around it sets the RA either. s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises the exception when it isn't. In contrast, for user-only, tlb_vaddr_to_host *only* performs the guest -> host address mapping, i.e. (addr + guest_base). r~
On 7/4/24 08:18, Richard Henderson wrote: > On 7/4/24 07:50, Ilya Leoshkevich wrote: >> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: >>> While looking into Zoltan's attempt to speed up ppc64 DCBZ >>> (data cache block set to zero), I wondered what AArch64 was >>> doing differently. It turned out that Arm is the only user >>> of tlb_vaddr_to_host. >>> >>> None of the code sequences in use between AArch64, Power64 and S390X >>> are 100% safe, with race conditions vs mmap et al, however, AArch64 >>> is the only one that will fail this single threaded test case. Use >>> of these new functions fixes the race condition as well, though I >>> have not yet touched the other guests. >>> >>> I thought about exposing accel/tcg/user-retaddr.h for direct use >>> from the targets, but perhaps these wrappers are cleaner. RFC? >>> >>> >>> r~ >>> >>> >>> Richard Henderson (2): >>> accel/tcg: Introduce memset_ra, memmove_ra >>> target/arm: Use memset_ra, memmove_ra in helper-a64.c >>> >>> include/exec/cpu_ldst.h | 40 ++++++++++++++++ >>> accel/tcg/user-exec.c | 22 +++++++++ >>> target/arm/tcg/helper-a64.c | 10 ++-- >>> tests/tcg/multiarch/memset-fault.c | 77 >>> ++++++++++++++++++++++++++++++ >>> 4 files changed, 144 insertions(+), 5 deletions(-) >>> create mode 100644 tests/tcg/multiarch/memset-fault.c >> >> This sounds good to me. >> >> I haven't debugged it, but I wonder why doesn't s390x fail here. >> For XC with src == dst, it does access_memset() -> do_access_memset() >> -> memset() without setting the RA. And I don't think that anything >> around it sets the RA either. > > s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises > the exception when it isn't. In contrast, for user-only, tlb_vaddr_to_host *only* > performs the guest -> host address mapping, i.e. (addr + guest_base). I should clarify: probe_access_flags verifies that the page is mapped *at that moment*, but does not take the mmap_lock. So the race is that the page can be unmapped by another thread after probe_access_flags and before the memset completes. r~
On Thu, 2024-07-04 at 14:48 -0700, Richard Henderson wrote: > On 7/4/24 08:18, Richard Henderson wrote: > > On 7/4/24 07:50, Ilya Leoshkevich wrote: > > > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: > > > > While looking into Zoltan's attempt to speed up ppc64 DCBZ > > > > (data cache block set to zero), I wondered what AArch64 was > > > > doing differently. It turned out that Arm is the only user > > > > of tlb_vaddr_to_host. > > > > > > > > None of the code sequences in use between AArch64, Power64 and > > > > S390X > > > > are 100% safe, with race conditions vs mmap et al, however, > > > > AArch64 > > > > is the only one that will fail this single threaded test case. > > > > Use > > > > of these new functions fixes the race condition as well, though > > > > I > > > > have not yet touched the other guests. > > > > > > > > I thought about exposing accel/tcg/user-retaddr.h for direct > > > > use > > > > from the targets, but perhaps these wrappers are cleaner. RFC? > > > > > > > > > > > > r~ > > > > > > > > > > > > Richard Henderson (2): > > > > accel/tcg: Introduce memset_ra, memmove_ra > > > > target/arm: Use memset_ra, memmove_ra in helper-a64.c > > > > > > > > include/exec/cpu_ldst.h | 40 ++++++++++++++++ > > > > accel/tcg/user-exec.c | 22 +++++++++ > > > > target/arm/tcg/helper-a64.c | 10 ++-- > > > > tests/tcg/multiarch/memset-fault.c | 77 > > > > ++++++++++++++++++++++++++++++ > > > > 4 files changed, 144 insertions(+), 5 deletions(-) > > > > create mode 100644 tests/tcg/multiarch/memset-fault.c > > > > > > This sounds good to me. > > > > > > I haven't debugged it, but I wonder why doesn't s390x fail here. > > > For XC with src == dst, it does access_memset() -> > > > do_access_memset() > > > -> memset() without setting the RA. And I don't think that > > > anything > > > around it sets the RA either. > > > > s390x uses probe_access_flags, which verifies the page is mapped > > and writable, and raises > > the exception when it isn't. In contrast, for user-only, > > tlb_vaddr_to_host *only* > > performs the guest -> host address mapping, i.e. (addr + > > guest_base). > > I should clarify: probe_access_flags verifies that the page is mapped > *at that moment*, > but does not take the mmap_lock. So the race is that the page can be > unmapped by another > thread after probe_access_flags and before the memset completes. I see, thanks. I completely overlooked the access_prepare() calls. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
On Wed, 3 Jul 2024 at 00:43, Richard Henderson <richard.henderson@linaro.org> wrote: > > While looking into Zoltan's attempt to speed up ppc64 DCBZ > (data cache block set to zero), I wondered what AArch64 was > doing differently. It turned out that Arm is the only user > of tlb_vaddr_to_host. riscv also seems to use it in vext_ldff(), fwiw. -- PMM
On 7/8/24 07:25, Peter Maydell wrote: > On Wed, 3 Jul 2024 at 00:43, Richard Henderson > <richard.henderson@linaro.org> wrote: >> >> While looking into Zoltan's attempt to speed up ppc64 DCBZ >> (data cache block set to zero), I wondered what AArch64 was >> doing differently. It turned out that Arm is the only user >> of tlb_vaddr_to_host. > > riscv also seems to use it in vext_ldff(), fwiw. So it does, followed by a second probe for read. That's quite wrong... But you're right that it has a similar race condition. r~