Message ID | 1500235468-15341-11-git-send-email-cota@braap.org (mailing list archive)
State      | New, archived
On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>
>      assert_tb_locked();
>
> -    atomic_set(&tb->invalid, true);
> -
>      /* remove the TB from the hash list */
>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
>      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
>
> +    /*
> +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> +     * eliminates the need to check this bit on lookups.
> +     */
> +    tb->invalid = true;

I believe you need atomic_store_release here. Previously we were relying on the lock acquisition in qht_remove to provide the required memory barrier.

We definitely need to make sure this reaches memory before we zap the TB in the CPU_FOREACH loop.

r~
On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
> On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> > @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >      assert_tb_locked();
> > -    atomic_set(&tb->invalid, true);
> > -
> >      /* remove the TB from the hash list */
> >      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
> >      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
> >      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> > +    /*
> > +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> > +     * eliminates the need to check this bit on lookups.
> > +     */
> > +    tb->invalid = true;
>
> I believe you need atomic_store_release here. Previously we were relying on
> the lock acquisition in qht_remove to provide the required memory barrier.
>
> We definitely need to make sure this reaches memory before we zap the TB in
> the CPU_FOREACH loop.

After this patch tb->invalid is only read/set with tb_lock held, so no need for atomics while accessing it.

		E.
On 07/17/2017 02:27 PM, Emilio G. Cota wrote:
> On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
>> On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
>>> @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>>      assert_tb_locked();
>>> -    atomic_set(&tb->invalid, true);
>>> -
>>>      /* remove the TB from the hash list */
>>>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>>>      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
>>>      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
>>> +    /*
>>> +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
>>> +     * eliminates the need to check this bit on lookups.
>>> +     */
>>> +    tb->invalid = true;
>>
>> I believe you need atomic_store_release here. Previously we were relying on
>> the lock acquisition in qht_remove to provide the required memory barrier.
>>
>> We definitely need to make sure this reaches memory before we zap the TB in
>> the CPU_FOREACH loop.
>
> After this patch tb->invalid is only read/set with tb_lock held, so no need for
> atomics while accessing it.

I think there's a path by which we do get stale data. For threads A and B,

  (A) Lookup succeeds for TB in hash without tb_lock
  (B) Removes TB from hash
  (B) Sets tb->invalid
  (B) Clears FORALL_CPU jmp_cache
  (A) Store TB into local jmp_cache

... and since we never check for invalid again, there's nothing to evict TB from the jmp_cache again.

Here's a plan that will make me happy:

 (1) Drop this patch, leaving the set of tb->invalid (or CF_INVALID) in place.
 (2) Include CF_INVALID in the mask of bits compared in tb_lookup__cpu_state.
     (a) At this point in the patch set that's just

           (tb->cflags & CF_INVALID) == 0

     (b) Later in the patch series when CF_PARALLEL is introduced
         (and CF_HASH_MASK, let's call it, instead of the cf_mask
         function you have now), this becomes

           (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask

So that we continue to check CF_INVALID each time we lookup a TB, but now we get it for free as a part of the other flags check.

r~
On Mon, Jul 17, 2017 at 17:40:29 -1000, Richard Henderson wrote:
> On 07/17/2017 02:27 PM, Emilio G. Cota wrote:
> > On Mon, Jul 17, 2017 at 12:55:03 -1000, Richard Henderson wrote:
> >> On 07/16/2017 10:03 AM, Emilio G. Cota wrote:
> >>> @@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>>      assert_tb_locked();
> >>> -    atomic_set(&tb->invalid, true);
> >>> -
> >>>      /* remove the TB from the hash list */
> >>>      phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
> >>>      h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
> >>>      qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> >>> +    /*
> >>> +     * Mark the TB as invalid *after* it's been removed from tb_hash, which
> >>> +     * eliminates the need to check this bit on lookups.
> >>> +     */
> >>> +    tb->invalid = true;
> >>
> >> I believe you need atomic_store_release here. Previously we were relying on
> >> the lock acquisition in qht_remove to provide the required memory barrier.
> >>
> >> We definitely need to make sure this reaches memory before we zap the TB in
> >> the CPU_FOREACH loop.
> >
> > After this patch tb->invalid is only read/set with tb_lock held, so no need for
> > atomics while accessing it.
>
> I think there's a path by which we do get stale data.
> For threads A and B,
>
>   (A) Lookup succeeds for TB in hash without tb_lock
>   (B) Removes TB from hash
>   (B) Sets tb->invalid
>   (B) Clears FORALL_CPU jmp_cache
>   (A) Store TB into local jmp_cache
>
> ... and since we never check for invalid again, there's nothing to evict TB
> from the jmp_cache again.

Ouch. Yes, I see it now.

What threw me off was that in lookup_tb_ptr we're not checking tb->invalid, and that biased me into thinking that it's not needed. But I should have tried harder. Also, that's a bug, and yet another reason to have tb_lookup, so that we fix these things at once in one place.

> Here's a plan that will make me happy:
>
>  (1) Drop this patch, leaving the set of tb->invalid (or CF_INVALID) in place.
>  (2) Include CF_INVALID in the mask of bits compared in tb_lookup__cpu_state.
>      (a) At this point in the patch set that's just
>
>            (tb->cflags & CF_INVALID) == 0
>
>      (b) Later in the patch series when CF_PARALLEL is introduced
>          (and CF_HASH_MASK, let's call it, instead of the cf_mask
>          function you have now), this becomes
>
>            (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
>
> So that we continue to check CF_INVALID each time we lookup a TB, but now we
> get it for free as a part of the other flags check.

With the annoying atomic_read thrown in there :-) but yes, will do.

		E.
On 07/17/2017 06:54 PM, Emilio G. Cota wrote:
> What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
> and that biased me into thinking that it's not needed. But I should have
> tried harder. Also, that's a bug, and yet another reason to have tb_lookup,
> so that we fix these things at once in one place.

Yes, me as well. Quite right we need only one copy of this code.

>>            (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
>>
>> So that we continue to check CF_INVALID each time we lookup a TB, but now we
>> get it for free as a part of the other flags check.
>
> With the annoying atomic_read thrown in there :-) but yes, will do.

Yes of course. ;-)

r~
On Mon, Jul 17, 2017 at 19:29:57 -1000, Richard Henderson wrote:
> On 07/17/2017 06:54 PM, Emilio G. Cota wrote:
> > What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
> > and that biased me into thinking that it's not needed. But I should have
> > tried harder. Also, that's a bug, and yet another reason to have tb_lookup,
> > so that we fix these things at once in one place.
>
> Yes, me as well. Quite right we need only one copy of this code.
>
> >>            (tb->cflags & (CF_HASH_MASK | CF_INVALID)) == cf_mask
> >>
> >> So that we continue to check CF_INVALID each time we lookup a TB, but now we
> >> get it for free as a part of the other flags check.
> >
> > With the annoying atomic_read thrown in there :-) but yes, will do.
>
> Yes of course. ;-)

Gaah, we'll need to update all readers of tb->cflags, of which we have plenty (~145, mostly in target code), to avoid C11 undefined behaviour and make Paolo happy.

Should I do those updates in the same patch where tb->invalid is brought over to cflags? Alternatives: have a later patch where all readers are converted to atomic_read, or keep tb->invalid as a separate field (we could use that 4-byte hole in struct tb_tc..)

		E.
On 07/18/2017 01:30 PM, Emilio G. Cota wrote:
> Should I do those updates in the same patch where tb->invalid is brought
> over to cflags? Alternatives: have a later patch where all readers
> are converted to atomic_read, or keep tb->invalid as a separate field (we
> could use that 4-byte hole in struct tb_tc..)

I would prefer the readers be converted in a separate patch.

I wonder if an accessor inline might be in order?

  static inline uint32_t tb_cflags(TranslationBlock *tb)
  {
      return atomic_read(&tb->cflags);
  }

That might keep line lengths from expanding too much...

Please don't try to do anything clever to re-use padding holes. I think that's just confusing and probably premature optimization.

r~
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c4c289b..9b5ce13 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -292,8 +292,7 @@ static bool tb_cmp(const void *p, const void *d)
         tb->page_addr[0] == desc->phys_page1 &&
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
-        tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-        !atomic_read(&tb->invalid)) {
+        tb->trace_vcpu_dstate == desc->trace_vcpu_dstate) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a124181..6d4c05f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1073,13 +1073,17 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     assert_tb_locked();
 
-    atomic_set(&tb->invalid, true);
-
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
     qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
+    /*
+     * Mark the TB as invalid *after* it's been removed from tb_hash, which
+     * eliminates the need to check this bit on lookups.
+     */
+    tb->invalid = true;
+
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
         p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
This gets rid of the need to check the tb->invalid bit during lookups. After this change we do not need atomics to operate on tb->invalid: setting and checking its value is serialised with tb_lock.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/cpu-exec.c      | 3 +--
 accel/tcg/translate-all.c | 8 ++++++--
 2 files changed, 7 insertions(+), 4 deletions(-)