[v2,10/45] translate-all: guarantee that tb_hash only holds valid TBs

Message ID 1500235468-15341-11-git-send-email-cota@braap.org (mailing list archive)
State New, archived

Commit Message

Emilio Cota July 16, 2017, 8:03 p.m. UTC
This gets rid of the need to check the tb->invalid bit during lookups.

After this change we do not need atomics to operate on tb->invalid: setting
and checking its value is serialised with tb_lock.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/cpu-exec.c      | 3 +--
 accel/tcg/translate-all.c | 8 ++++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

Comments

Richard Henderson July 17, 2017, 10:55 p.m. UTC | #1
On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>   
>       assert_tb_locked();
>   
> -    atomic_set(&tb->invalid, true);
> -
>       /* remove the TB from the hash list */
>       phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>       h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
>       qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
>   
> +    /*
> +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> +     * eliminates the need to check this bit on lookups.
> +     */
> +    tb->invalid = true;

I believe you need atomic_store_release here.  Previously we were relying on 
the lock acquisition in qht_remove to provide the required memory barrier.

We definitely need to make sure this reaches memory before we zap the TB in the 
CPU_FOREACH loop.
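
For illustration only, a minimal sketch of the release-store variant being suggested here, assuming the atomic_store_release() macro from qemu/atomic.h and the tb_phys_invalidate() context shown in the patch below (this is not part of the posted patch):

    /* remove the TB from the hash list */
    qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);

    /*
     * Publish the invalid flag with release semantics so the store is
     * visible to other vCPU threads before the jmp_cache entries are
     * cleared in the CPU_FOREACH loop below.
     */
    atomic_store_release(&tb->invalid, true);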


r~
Emilio Cota July 18, 2017, 12:27 a.m. UTC | #2
On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
> On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> >@@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >      assert_tb_locked();
> >-    atomic_set(&tb->invalid, true);
> >-
> >      /* remove the TB from the hash list */
> >      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
> >      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
> >      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> >+    /*
> >+     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> >+     * eliminates the need to check this bit on lookups.
> >+     */
> >+    tb->invalid = true;
> 
> I believe you need atomic_store_release here.  Previously we were relying on
> the lock acquisition in qht_remove to provide the required memory barrier.
> 
> We definitely need to make sure this reaches memory before we zap the TB in
> the CPU_FOREACH loop.

After this patch tb->invalid is only read/set with tb_lock held, so no need for
atomics while accessing it.

		E.
Richard Henderson July 18, 2017, 3:40 a.m. UTC | #3
On 07/17/2017 02:27 PM, Emilio G. Cota wrote:
> On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
>> On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
>>> @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>>       assert_tb_locked();
>>> -    atomic_set(&tb->invalid, true);
>>> -
>>>       /* remove the TB from the hash list */
>>>       phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>>>       h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
>>>       qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
>>> +    /*
>>> +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
>>> +     * eliminates the need to check this bit on lookups.
>>> +     */
>>> +    tb->invalid = true;
>>
>> I believe you need atomic_store_release here.  Previously we were relying on
>> the lock acquisition in qht_remove to provide the required memory barrier.
>>
>> We definitely need to make sure this reaches memory before we zap the TB in
>> the CPU_FOREACH loop.
> 
> After this patch tb->invalid is only read/set with tb_lock held, so no need for
> atomics while accessing it.

I think there's a path by which we do get stale data.
For threads A and B,

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Removes TB from hash
        (B) Sets tb->invalid
        (B) Clears FORALL_CPU jmp_cache
   (A) Store TB into local jmp_cache

... and since we never check for invalid again, there's nothing to evict TB 
from the jmp_cache again.
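
To make the window concrete, a rough sketch of thread (A)'s lock-free lookup path; the helper names (tb_htable_lookup, tb_jmp_cache_hash_func) follow the existing code, but the snippet is illustrative rather than a quote of the series:

    /* (A) looks the TB up in the qht without holding tb_lock */
    tb = tb_htable_lookup(cpu, pc, cs_base, flags);
    /* (B) can remove the TB, set tb->invalid and clear every CPU's
     * tb_jmp_cache at exactly this point ... */
    if (tb != NULL) {
        /* ... after which (A) caches the now-stale TB, and with the
         * invalid check gone nothing ever evicts it. */
        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
    }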

Here's a plan that will make me happy:

(1) Drop this patch, leaving the set of tb->invalid (or CF_INVALID) in place.
(2) Include CF_INVALID in the mask of bits compared in tb_lookup__cpu_state.
     (a) At this point in the patch set that's just

	(tb->cflags & CF_INVALID) == 0

     (b) Later in the patch series when CF_PARALLEL is introduced
         (and CF_HASH_MASK, let's call it, instead of the cf_mask
         function you have now), this becomes

         (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask

So that we continue to check CF_INVALID each time we lookup a TB, but now we 
get it for free as a part of the other flags check.
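
As a sketch of how that folded comparison could read in tb_lookup__cpu_state() once CF_INVALID and CF_HASH_MASK exist (the surrounding field and parameter names are assumptions based on this series, not final code):

    /*
     * CF_INVALID is never set in cf_mask, so an invalidated TB can
     * never compare equal here; no separate tb->invalid check needed.
     */
    if (likely(tb &&
               tb->pc == pc &&
               tb->cs_base == cs_base &&
               tb->flags == flags &&
               (atomic_read(&tb->cflags) & (CF_HASH_MASK | CF_INVALID)) == cf_mask)) {
        return tb;
    }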


r~
Emilio Cota July 18, 2017, 4:54 a.m. UTC | #4
On Mon, Jul 17, 2017 at 17:40:29 -1000, Richard Henderson wrote:
> On 07/17/2017 02:27 PM, Emilio G. Cota wrote:
> >On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
> >>On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> >>>@@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>>      assert_tb_locked();
> >>>-    atomic_set(&tb->invalid, true);
> >>>-
> >>>      /* remove the TB from the hash list */
> >>>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
> >>>      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
> >>>      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> >>>+    /*
> >>>+     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> >>>+     * eliminates the need to check this bit on lookups.
> >>>+     */
> >>>+    tb->invalid = true;
> >>
> >>I believe you need atomic_store_release here.  Previously we were relying on
> >>the lock acquisition in qht_remove to provide the required memory barrier.
> >>
> >>We definitely need to make sure this reaches memory before we zap the TB in
> >>the CPU_FOREACH loop.
> >
> >After this patch tb->invalid is only read/set with tb_lock held, so no need for
> >atomics while accessing it.
> 
> I think there's a path by which we do get stale data.
> For threads A and B,
> 
>   (A) Lookup succeeds for TB in hash without tb_lock
>        (B) Removes TB from hash
>        (B) Sets tb->invalid
>        (B) Clears FORALL_CPU jmp_cache
>   (A) Store TB into local jmp_cache
> 
> ... and since we never check for invalid again, there's nothing to evict TB
> from the jmp_cache again.

Ouch. Yes I see it now.

What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
and that biased me into thinking that it's not needed. But I should have
tried harder. Also, that's a bug, and yet another reason to have tb_lookup,
so that we fix these things at once in one place.

> Here's a plan that will make me happy:
> 
> (1) Drop this patch, leaving the set of tb->invalid (or CF_INVALID) in place.
> (2) Include CF_INVALID in the mask of bits compared in tb_lookup__cpu_state.
>     (a) At this point in the patch set that's just
> 
> 	(tb->cflags & CF_INVALID) == 0
> 
>     (b) Later in the patch series when CF_PARALLEL is introduced
>         (and CF_HASH_MASK, let's call it, instead of the cf_mask
>         function you have now), this becomes
> 
>         (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
> 
> So that we continue to check CF_INVALID each time we lookup a TB, but now we
> get it for free as a part of the other flags check.

With the annoying atomic_read thrown in there :-) but yes, will do.

		E.
Richard Henderson July 18, 2017, 5:29 a.m. UTC | #5
On 07/17/2017 06:54 PM, Emilio G. Cota wrote:
> What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
> and that biased me into thinking that it's not needed. But I should have
> tried harder. Also, that's a bug, and yet another reason to have tb_lookup,
> so that we fix these things at once in one place.

Yes, me as well.  Quite right we need only one copy of this code.

>>          (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
>>
>> So that we continue to check CF_INVALID each time we lookup a TB, but now we
>> get it for free as a part of the other flags check.
> 
> With the annoying atomic_read thrown in there :-) but yes, will do.

Yes of course.  ;-)


r~
Emilio Cota July 18, 2017, 11:30 p.m. UTC | #6
On Mon, Jul 17, 2017 at 19:29:57 -1000, Richard Henderson wrote:
> On 07/17/2017 06:54 PM, Emilio G. Cota wrote:
> >What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
> >and that biased me into thinking that it's not needed. But I should have
> >tried harder. Also, that's a bug, and yet another reason to have tb_lookup,
> >so that we fix these things at once in one place.
> 
> Yes, me as well.  Quite right we need only one copy of this code.
> 
> >>         (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
> >>
> >>So that we continue to check CF_INVALID each time we lookup a TB, but now we
> >>get it for free as a part of the other flags check.
> >
> >With the annoying atomic_read thrown in there :-) but yes, will do.
> 
> Yes of course.  ;-)

Gaah, we'll need to update all readers of tb->cflags, of which we have
plenty (~145, mostly in target code) to avoid C11 undefined behaviour
and make Paolo happy.

Should I do those updates in the same patch where tb->invalid is brought
over to cflags? Alternatives: have a later patch where all readers
are converted to atomic_read, or keep tb->invalid as a separate field (we
could use that 4-byte hole in struct tb_tc..)

		E.
Richard Henderson July 18, 2017, 11:43 p.m. UTC | #7
On 07/18/2017 01:30 PM, Emilio G. Cota wrote:
> Should I do those updates in the same patch where tb->invalid is brought
> over to cflags? Alternatives: have a later patch where all readers
> are converted to atomic_read, or keep tb->invalid as a separate field (we
> could use that 4-byte hole in struct tb_tc..)

I would prefer the readers be converted in a separate patch.
I wonder if an accessor inline might be in order?

static inline uint32_t tb_cflags(TranslationBlock *tb)
{
    return atomic_read(&tb->cflags);
}

That might keep line lengths from expanding too much...
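
A converted reader would then look roughly like this; CF_USE_ICOUNT is just one existing cflags bit, picked for illustration:

    /* before: if (tb->cflags & CF_USE_ICOUNT) { ... } */
    if (tb_cflags(tb) & CF_USE_ICOUNT) {
        gen_io_start();
    }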

Please don't try to do anything clever to re-use padding holes.
I think that's just confusing and probably premature optimization.


r~

Patch

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c4c289b..9b5ce13 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -292,8 +292,7 @@  static bool tb_cmp(const void *p, const void *d)
         tb->page_addr[0] == desc->phys_page1 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
-        tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-        !atomic_read(&tb->invalid)) {
+        tb->trace_vcpu_dstate == desc->trace_vcpu_dstate) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a124181..6d4c05f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1073,13 +1073,17 @@  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     assert_tb_locked();
 
-    atomic_set(&tb->invalid, true);
-
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
     qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
+    /*
+     * Mark the TB as invalid *after* it's been removed from tb_hash, which
+     * eliminates the need to check this bit on lookups.
+     */
+    tb->invalid = true;
+
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
         p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);