diff mbox series

[26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

Message ID 20190403034358.21999-27-richard.henderson@linaro.org (mailing list archive)
State New, archived
Headers show
Series tcg: Add CPUClass::tlb_fill | expand

Commit Message

Richard Henderson April 3, 2019, 3:43 a.m. UTC
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.  But for
SVE we have a true non-faulting case which requires the new
probing form of tlb_fill.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/cpu_ldst.h | 40 ++++--------------------
 accel/tcg/cputlb.c      | 69 ++++++++++++++++++++++++++++++++++++-----
 target/arm/sve_helper.c |  6 +---
 3 files changed, 68 insertions(+), 47 deletions(-)

Comments

Peter Maydell April 29, 2019, 5:41 p.m. UTC | #1
On Wed, 3 Apr 2019 at 05:05, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Most of the existing users would continue around a loop which
> would fault the tlb entry in via a normal load/store.  But for
> SVE we have a true non-faulting case which requires the new
> probing form of tlb_fill.

So am I right in thinking that this fixes a bug where we
previously would mark a load as faulted if the memory happened
not to be in the TLB, whereas now we will correctly pull in the
TLB entry and do the load ?

(Since guest code ought to be handling the "non-first-load
faulted" case by looping round or otherwise arranging to
retry, nothing in practice would have noticed this bug, right?)

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/exec/cpu_ldst.h | 40 ++++--------------------
>  accel/tcg/cputlb.c      | 69 ++++++++++++++++++++++++++++++++++++-----
>  target/arm/sve_helper.c |  6 +---
>  3 files changed, 68 insertions(+), 47 deletions(-)
>
> diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
> index d78041d7a0..be8c3f4da2 100644
> --- a/include/exec/cpu_ldst.h
> +++ b/include/exec/cpu_ldst.h
> @@ -440,43 +440,15 @@ static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
>   * This is the equivalent of the initial fast-path code used by
>   * TCG backends for guest load and store accesses.
>   */

The doc comment which this is the last two lines of needs
updating, I think -- with the changed implementation it's
no longer just the equivalent of the fast-path bit of code,
and it doesn't return NULL on a TLB miss any more.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
Richard Henderson May 9, 2019, 5:24 a.m. UTC | #2
On 4/29/19 10:41 AM, Peter Maydell wrote:
> On Wed, 3 Apr 2019 at 05:05, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Most of the existing users would continue around a loop which
>> would fault the tlb entry in via a normal load/store.  But for
>> SVE we have a true non-faulting case which requires the new
>> probing form of tlb_fill.
> 
> So am I right in thinking that this fixes a bug where we
> previously would mark a load as faulted if the memory happened
> not to be in the TLB, whereas now we will correctly pull in the
> TLB entry and do the load ?

Yes.

> (Since guest code ought to be handling the "non-first-load
> faulted" case by looping round or otherwise arranging to
> retry, nothing in practice would have noticed this bug, right?)

Yes.

The only case with changed behaviour is (expected to be) SVE no-fault, where
the loop you mention would have produced different incorrect results.


r~
Peter Maydell May 9, 2019, 8:56 a.m. UTC | #3
On Thu, 9 May 2019 at 06:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 4/29/19 10:41 AM, Peter Maydell wrote:
> > On Wed, 3 Apr 2019 at 05:05, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> Most of the existing users would continue around a loop which
> >> would fault the tlb entry in via a normal load/store.  But for
> >> SVE we have a true non-faulting case which requires the new
> >> probing form of tlb_fill.
> >
> > So am I right in thinking that this fixes a bug where we
> > previously would mark a load as faulted if the memory happened
> > not to be in the TLB, whereas now we will correctly pull in the
> > TLB entry and do the load ?
>
> Yes.
>
> > (Since guest code ought to be handling the "non-first-load
> > faulted" case by looping round or otherwise arranging to
> > retry, nothing in practice would have noticed this bug, right?)
>
> Yes.
>
> The only case with changed behaviour is (expected to be) SVE no-fault, where
> the loop you mention would have produced different incorrect results.

OK. If we're fixing a guest-visible bug it would be nice to
describe that in the commit message.

thanks
-- PMM
Richard Henderson May 9, 2019, 10:24 p.m. UTC | #4
On 5/9/19 1:56 AM, Peter Maydell wrote:
> On Thu, 9 May 2019 at 06:24, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 4/29/19 10:41 AM, Peter Maydell wrote:
>>> On Wed, 3 Apr 2019 at 05:05, Richard Henderson
>>> <richard.henderson@linaro.org> wrote:
>>>>
>>>> Most of the existing users would continue around a loop which
>>>> would fault the tlb entry in via a normal load/store.  But for
>>>> SVE we have a true non-faulting case which requires the new
>>>> probing form of tlb_fill.
>>>
>>> So am I right in thinking that this fixes a bug where we
>>> previously would mark a load as faulted if the memory happened
>>> not to be in the TLB, whereas now we will correctly pull in the
>>> TLB entry and do the load ?
>>
>> Yes.
>>
>>> (Since guest code ought to be handling the "non-first-load
>>> faulted" case by looping round or otherwise arranging to
>>> retry, nothing in practice would have noticed this bug, right?)
>>
>> Yes.
>>
>> The only case with changed behaviour is (expected to be) SVE no-fault, where
>> the loop you mention would have produced different incorrect results.
> 
> OK. If we're fixing a guest-visible bug it would be nice to
> describe that in the commit message.

The commit message now reads, in part,

But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB.  Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception.  (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)


r~
diff mbox series

Patch

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index d78041d7a0..be8c3f4da2 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -440,43 +440,15 @@  static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
  * This is the equivalent of the initial fast-path code used by
  * TCG backends for guest load and store accesses.
  */
+#ifdef CONFIG_USER_ONLY
 static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
-                                      int access_type, int mmu_idx)
+                                      MMUAccessType access_type, int mmu_idx)
 {
-#if defined(CONFIG_USER_ONLY)
     return g2h(addr);
-#else
-    CPUTLBEntry *tlbentry = tlb_entry(env, mmu_idx, addr);
-    abi_ptr tlb_addr;
-    uintptr_t haddr;
-
-    switch (access_type) {
-    case 0:
-        tlb_addr = tlbentry->addr_read;
-        break;
-    case 1:
-        tlb_addr = tlb_addr_write(tlbentry);
-        break;
-    case 2:
-        tlb_addr = tlbentry->addr_code;
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (!tlb_hit(tlb_addr, addr)) {
-        /* TLB entry is for a different page */
-        return NULL;
-    }
-
-    if (tlb_addr & ~TARGET_PAGE_MASK) {
-        /* IO access */
-        return NULL;
-    }
-
-    haddr = addr + tlbentry->addend;
-    return (void *)haddr;
-#endif /* defined(CONFIG_USER_ONLY) */
 }
+#else
+void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+                        MMUAccessType access_type, int mmu_idx);
+#endif
 
 #endif /* CPU_LDST_H */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7f59d815db..959b6d4ded 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1006,6 +1006,16 @@  static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
     }
 }
 
+static inline target_ulong tlb_read_ofs(CPUTLBEntry *entry, size_t ofs)
+{
+#if TCG_OVERSIZED_GUEST
+    return *(target_ulong *)((uintptr_t)entry + ofs);
+#else
+    /* ofs might correspond to .addr_write, so use atomic_read */
+    return atomic_read((target_ulong *)((uintptr_t)entry + ofs));
+#endif
+}
+
 /* Return true if ADDR is present in the victim tlb, and has been copied
    back to the main tlb.  */
 static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
@@ -1016,14 +1026,7 @@  static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
     assert_cpu_is_self(ENV_GET_CPU(env));
     for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
         CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
-        target_ulong cmp;
-
-        /* elt_ofs might correspond to .addr_write, so use atomic_read */
-#if TCG_OVERSIZED_GUEST
-        cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
-#else
-        cmp = atomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs));
-#endif
+        target_ulong cmp = tlb_read_ofs(vtlb, elt_ofs);
 
         if (cmp == page) {
             /* Found entry in victim tlb, swap tlb and iotlb.  */
@@ -1107,6 +1110,56 @@  void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx,
     }
 }
 
+void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+                        MMUAccessType access_type, int mmu_idx)
+{
+    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+    uintptr_t tlb_addr, page;
+    size_t elt_ofs;
+
+    switch (access_type) {
+    case MMU_DATA_LOAD:
+        elt_ofs = offsetof(CPUTLBEntry, addr_read);
+        break;
+    case MMU_DATA_STORE:
+        elt_ofs = offsetof(CPUTLBEntry, addr_write);
+        break;
+    case MMU_INST_FETCH:
+        elt_ofs = offsetof(CPUTLBEntry, addr_code);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    page = addr & TARGET_PAGE_MASK;
+    tlb_addr = tlb_read_ofs(entry, elt_ofs);
+
+    if (!tlb_hit_page(tlb_addr, page)) {
+        uintptr_t index = tlb_index(env, mmu_idx, addr);
+
+        if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page)) {
+            CPUState *cs = ENV_GET_CPU(env);
+            CPUClass *cc = CPU_GET_CLASS(cs);
+
+            if (!cc->tlb_fill(cs, addr, 0, access_type, mmu_idx, true, 0)) {
+                /* Non-faulting page table read failed.  */
+                return NULL;
+            }
+
+            /* TLB resize via tlb_fill may have moved the entry.  */
+            entry = tlb_entry(env, mmu_idx, addr);
+        }
+        tlb_addr = tlb_read_ofs(entry, elt_ofs);
+    }
+
+    if (tlb_addr & ~TARGET_PAGE_MASK) {
+        /* IO access */
+        return NULL;
+    }
+
+    return (void *)(addr + entry->addend);
+}
+
 /* Probe for a read-modify-write atomic operation.  Do not allow unaligned
  * operations, or io operations to proceed.  Return the host address.  */
 static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index bc847250dd..fd434c66ea 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4598,11 +4598,7 @@  static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
      * in the real world, obviously.)
      *
      * Then there are the annoying special cases with watchpoints...
-     *
-     * TODO: Add a form of tlb_fill that does not raise an exception,
-     * with a form of tlb_vaddr_to_host and a set of loads to match.
-     * The non_fault_vaddr_to_host would handle everything, usually,
-     * and the loads would handle the iomem path for watchpoints.
+     * TODO: Add a form of non-faulting loads using cc->tlb_fill(probe=true).
      */
     host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
     split = max_for_page(addr, mem_off, mem_max);