Message ID: 1467389770-9738-2-git-send-email-alex.bennee@linaro.org (mailing list archive)
State: New, archived
On 07/01/2016 09:16 AM, Alex Bennée wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
>  * using atomic_read() to look up a TB when not holding 'tb_lock';
>  * using atomic_write() to remove a TB from each CPU's local cache on
>    TB invalidation.
>
> Second, add some memory barriers to ensure we don't put the TB being
> invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
> CPU's local cache because it is being invalidated by some other thread
> then it must not be found in the shared TB hash table. Otherwise we'd
> put it back to CPU's local cache.
>
> Note that this patch does *not* make CPU's TLB invalidation safe if it
> is done from some other thread while the CPU is in its execution loop.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> [AJB: fixed missing atomic set, tweak title]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> AJB:
>   - tweak title
>   - fixed missing set of tb_jmp_cache
> ---
>  cpu-exec.c      | 9 +++++++--
>  translate-all.c | 7 ++++++-
>  2 files changed, 13 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>

r~
On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
(snip)
> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>         is executed. */
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>      tb_lock();
> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> diff --git a/translate-all.c b/translate-all.c
> index eaa95e4..1fcfe79 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>          invalidate_page_bitmap(p);
>      }
>
> +    /* Ensure that we won't find the TB in the shared hash table
> +     * if we con't see it in CPU's local cache.

s/con't/can't/

> +     * Pairs with smp_rmb() in tb_find_slow(). */
> +    smp_wmb();

This fence is already embedded in qht_remove, since it internally
calls seqlock_write_end() on a successful removal, so we could get
away with a comment instead of emitting a redundant fence.
However, if qht ever changed its implementation this would have
to be taken into account. So I'd be OK with emitting the
fence here too.

> +
>      /* remove the TB from the hash list */
>      h = tb_jmp_cache_hash_func(tb->pc);
>      CPU_FOREACH(cpu) {
>          if (cpu->tb_jmp_cache[h] == tb) {

Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

> -            cpu->tb_jmp_cache[h] = NULL;
> +            atomic_set(&cpu->tb_jmp_cache[h], NULL);

Other than that,

Reviewed-by: Emilio G. Cota <cota@braap.org>
On 07/01/2016 05:17 PM, Emilio G. Cota wrote:
> On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
> (snip)
>> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>      tb_lock();
>> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>> diff --git a/translate-all.c b/translate-all.c
>> index eaa95e4..1fcfe79 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>          invalidate_page_bitmap(p);
>>      }
>>
>> +    /* Ensure that we won't find the TB in the shared hash table
>> +     * if we con't see it in CPU's local cache.
>
> s/con't/can't/
>
>> +     * Pairs with smp_rmb() in tb_find_slow(). */
>> +    smp_wmb();
>
> This fence is already embedded in qht_remove, since it internally
> calls seqlock_write_end() on a successful removal...

No. There's stuff that happens after qht_remove and before this barrier:
tb_page_remove and invalidate_page_bitmap.

r~
Emilio G. Cota <cota@braap.org> writes:

> On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
> (snip)
>> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>      tb_lock();
>> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
>> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>>          tb = tb_find_slow(cpu, pc, cs_base, flags);
>> diff --git a/translate-all.c b/translate-all.c
>> index eaa95e4..1fcfe79 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>>          invalidate_page_bitmap(p);
>>      }
>>
>> +    /* Ensure that we won't find the TB in the shared hash table
>> +     * if we con't see it in CPU's local cache.
>
> s/con't/can't/
>
>> +     * Pairs with smp_rmb() in tb_find_slow(). */
>> +    smp_wmb();
>
> This fence is already embedded in qht_remove, since it internally
> calls seqlock_write_end() on a successful removal, so we could get
> away with a comment instead of emitting a redundant fence.
> However, if qht ever changed its implementation this would have
> to be taken into account. So I'd be OK with emitting the
> fence here too.
>
>> +
>>      /* remove the TB from the hash list */
>>      h = tb_jmp_cache_hash_func(tb->pc);
>>      CPU_FOREACH(cpu) {
>>          if (cpu->tb_jmp_cache[h] == tb) {
>
> Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {

Oops, good catch.

>
>> -            cpu->tb_jmp_cache[h] = NULL;
>> +            atomic_set(&cpu->tb_jmp_cache[h], NULL);
>
> Other than that,
>
> Reviewed-by: Emilio G. Cota <cota@braap.org>

--
Alex Bennée
On Sat, Jul 02, 2016 at 08:09:35 +0100, Alex Bennée wrote:
>
> Emilio G. Cota <cota@braap.org> writes:
>
> > On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >> From: Sergey Fedorov <serge.fdrv@gmail.com>
> > (snip)
> >> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> >>         is executed. */
> >>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>      tb_lock();
> >> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>                   tb->flags != flags)) {
> >>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> >> diff --git a/translate-all.c b/translate-all.c
> >> index eaa95e4..1fcfe79 100644
> >> --- a/translate-all.c
> >> +++ b/translate-all.c
> >> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>          invalidate_page_bitmap(p);
> >>      }
> >>
> >> +    /* Ensure that we won't find the TB in the shared hash table
> >> +     * if we con't see it in CPU's local cache.
> >
> > s/con't/can't/
> >
> >> +     * Pairs with smp_rmb() in tb_find_slow(). */
> >> +    smp_wmb();
> >
> > This fence is already embedded in qht_remove, since it internally
> > calls seqlock_write_end() on a successful removal, so we could get
> > away with a comment instead of emitting a redundant fence.
> > However, if qht ever changed its implementation this would have
> > to be taken into account. So I'd be OK with emitting the
> > fence here too.
> >
> >> +
> >>      /* remove the TB from the hash list */
> >>      h = tb_jmp_cache_hash_func(tb->pc);
> >>      CPU_FOREACH(cpu) {
> >>          if (cpu->tb_jmp_cache[h] == tb) {
> >
> > Missing atomic_read here: if (atomic_read(cpu->tb_jmp_cache[...])) {
>
> Oops, good catch.

My mistake. An atomic_read here isn't needed: as the commit message
points out, we only need atomic_read when tb_lock isn't held.
In this case tb_lock is held, so we only use atomic accesses for
writing to the array.

		E.
On Fri, Jul 01, 2016 at 17:32:01 -0700, Richard Henderson wrote:
> On 07/01/2016 05:17 PM, Emilio G. Cota wrote:
> > On Fri, Jul 01, 2016 at 17:16:09 +0100, Alex Bennée wrote:
> >> From: Sergey Fedorov <serge.fdrv@gmail.com>
> > (snip)
> >> @@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> >>         is executed. */
> >>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>      tb_lock();
> >> -    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
> >> +    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
> >>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
> >>                   tb->flags != flags)) {
> >>          tb = tb_find_slow(cpu, pc, cs_base, flags);
> >> diff --git a/translate-all.c b/translate-all.c
> >> index eaa95e4..1fcfe79 100644
> >> --- a/translate-all.c
> >> +++ b/translate-all.c
> >> @@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
> >>          invalidate_page_bitmap(p);
> >>      }
> >>
> >> +    /* Ensure that we won't find the TB in the shared hash table
> >> +     * if we con't see it in CPU's local cache.
> >
> > s/con't/can't/
> >
> >> +     * Pairs with smp_rmb() in tb_find_slow(). */
> >> +    smp_wmb();
> >
> > This fence is already embedded in qht_remove, since it internally
> > calls seqlock_write_end() on a successful removal...
>
> No. There's stuff that happens after qht_remove and before this barrier:
> tb_page_remove and invalidate_page_bitmap.

I can't see how that "tb page" stuff you refer to is relevant here.
AFAICT the barrier pairing in this patch only applies to tb_jmp_cache
and qht. If as you say it does not, then all comments and the commit
message are wrong.

What am I missing?

		Emilio
On 05/07/2016 00:31, Emilio G. Cota wrote:
> My mistake. An atomic_read here isn't needed: as the commit message
> points out, we only need atomic_read when tb_lock isn't held. In this
> case tb_lock is held, so we only use atomic accesses for writing
> to the array.

It's harmless though. In C11 and C++11 it would even be required, so I
think it's better to add it even though our compilers don't yet enforce
it.

Paolo
diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d..10ce1cb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -285,6 +285,11 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 {
     TranslationBlock *tb;
 
+    /* Ensure that we won't find a TB in the shared hash table
+     * if it is being invalidated by some other thread.
+     * Otherwise we'd put it back to CPU's local cache.
+     * Pairs with smp_wmb() in tb_phys_invalidate(). */
+    smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
     if (tb) {
         goto found;
@@ -315,7 +320,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 
 found:
     /* we add the TB in the virtual pc hash table */
-    cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+    atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     return tb;
 }
 
@@ -333,7 +338,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index eaa95e4..1fcfe79 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1004,11 +1004,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
+    /* Ensure that we won't find the TB in the shared hash table
+     * if we con't see it in CPU's local cache.
+     * Pairs with smp_rmb() in tb_find_slow(). */
+    smp_wmb();
+
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
         if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 