Message ID | 1464138802-23503-15-git-send-email-cota@braap.org
---|---
State | New, archived
On 25/05/16 04:13, Emilio G. Cota wrote:
> Having a fixed-size hash table for keeping track of all translation blocks
> is suboptimal: some workloads are just too big or too small to get maximum
> performance from the hash table. The MRU promotion policy helps improve
> performance when the hash table is a little undersized, but it cannot
> make up for severely undersized hash tables.
>
> Furthermore, frequent MRU promotions result in writes that are a scalability
> bottleneck. For scalability, lookups should only perform reads, not writes.
> This is not a big deal for now, but it will become one once MTTCG matures.
>
> The appended fixes these issues by using qht as the implementation of
> the TB hash table. This solution is superior to other alternatives considered,
> namely:
>
> - master: implementation in QEMU before this patchset
> - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
> - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
>               MRU is implemented here by adding an intermediate struct
>               that contains the u32 hash and a pointer to the TB; this
>               allows us, on an MRU promotion, to copy said struct (that is not
>               at the head), and put this new copy at the head. After a grace
>               period, the original non-head struct can be eliminated, and
>               after another grace period, freed.
> - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
>                    no MRU for lookups; MRU for inserts.
> - qht-dyn-mru: dynamic number of buckets + xxhash + qht w/ auto-resize +
>                MRU for both lookups and inserts.
> The appended solution is the following:
> - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
>                  no MRU for lookups; MRU for inserts.
>
> The plots below compare the considered solutions. The Y axis shows the
> boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
> sweeps the number of buckets (or initial number of buckets for qht-autoresize).
> The plots in PNG format (and with errorbars) can be seen here:
>   http://imgur.com/a/Awgnq
>
> Each test runs 5 times, and the entire QEMU process is pinned to a
> single core for repeatability of results.
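The MRU promotion policy discussed above is plain move-to-front on a bucket's chain. A minimal sketch (with illustrative names, not QEMU's actual code) shows why every successful lookup under that policy turns into a write to the shared bucket head, which is exactly the scalability problem the cover text describes:

#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch only: 'Entry' and 'bucket' are hypothetical
 * stand-ins for TranslationBlock and tb_phys_hash[h]. */
struct Entry {
    uint32_t key;
    struct Entry *next;
};

static struct Entry *lookup_mru(struct Entry **bucket, uint32_t key)
{
    struct Entry **prev = bucket;
    struct Entry *e = *bucket;

    while (e != NULL && e->key != key) {
        prev = &e->next;
        e = e->next;
    }
    if (e != NULL && e != *bucket) {
        /* MRU promotion: unlink the entry and reinsert it at the head.
         * These stores are what turn a read-only lookup into a write. */
        *prev = e->next;
        e->next = *bucket;
        *bucket = e;
    }
    return e;
}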
> Host: Intel Xeon E5-2690
>
> [ASCII plot: boot time (20-28 s, Y axis) vs. log2 number of buckets
>  (14-24, X axis) for master, xxhash, xxhash-rcu, qht-fixed-nomru,
>  qht-dyn-mru and qht-dyn-nomru; see the PNG link above for a readable
>  version.]
>
> Host: Intel i7-4790K
>
> [ASCII plot: boot time (9.5-14.5 s, Y axis) vs. log2 number of buckets
>  (14-24, X axis) for the same six configurations; see the PNG link
>  above.]
>
> Note that the original point before this patch series is X=15 for "master";
> the little sensitivity to the increased number of buckets is due to the
> poor hashing function in master.
>
> xxhash-rcu has significant overhead due to the constant churn of allocating
> and deallocating intermediate structs for implementing MRU. An alternative
> would be to consider failed lookups as "maybe not there", and then
> acquire the external lock (tb_lock in this case) to really confirm that
> there was indeed a failed lookup. This, however, would not be enough
> to implement dynamic resizing--this is more complex: see
> "Resizable, Scalable, Concurrent Hash Tables via Relativistic
> Programming" by Triplett, McKenney and Walpole. This solution was
> discarded due to the very coarse RCU read critical sections that we have
> in MTTCG; resizing requires waiting for readers after every pointer update,
> and resizes require many pointer updates, so this would quickly become
> prohibitive.
>
> qht-fixed-nomru shows that MRU promotion is advisable for undersized
> hash tables.
>
> However, qht-dyn-mru shows that MRU promotion is not important if the
> hash table is properly sized: there is virtually no difference in
> performance between qht-dyn-nomru and qht-dyn-mru.
>
> Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
> X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
> can achieve with optimum sizing of the hash table, while keeping the hash
> table scalable for readers.
>
> The improvement we get before and after this patch for booting debian jessie
> with arm-softmmu is:
>
> - Intel Xeon E5-2690: 10.5% less time
> - Intel i7-4790K: 5.2% less time
>
> We could get this same improvement _for this particular workload_ by
> statically increasing the size of the hash table. But this would hurt
> workloads that do not need a large hash table. The dynamic (upward)
> resizing allows us to start small and enlarge the hash table as needed.
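The "start small and grow" behaviour just described follows the standard grow-on-load-factor pattern. A rough sketch of the general technique (not qht's actual internals, which must also cope with concurrent readers): storing the full 32-bit hash alongside each entry lets a resize re-mask stored hashes instead of rehashing keys.

#include <stdint.h>
#include <stdlib.h>

struct entry {
    uint32_t hash;          /* full 32-bit hash, stored at insert time */
    void *val;
    struct entry *next;
};

struct table {
    struct entry **buckets; /* n_buckets is always a power of two */
    size_t n_buckets;
    size_t n_entries;
};

static void grow(struct table *t)
{
    size_t new_n = t->n_buckets * 2;
    struct entry **nb = calloc(new_n, sizeof(*nb));

    /* redistribute every chain; re-mask the stored hash, don't rehash */
    for (size_t i = 0; i < t->n_buckets; i++) {
        struct entry *e = t->buckets[i];
        while (e != NULL) {
            struct entry *next = e->next;
            size_t j = e->hash & (new_n - 1);
            e->next = nb[j];
            nb[j] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->n_buckets = new_n;
}

static void maybe_grow(struct table *t)
{
    if (t->n_entries > t->n_buckets) { /* average chain length exceeds 1 */
        grow(t);
    }
}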
> A quick note on downsizing: the table is resized back to 2**15 buckets
> on every tb_flush; this makes sense because it is not guaranteed that the
> table will reach the same number of TBs later on (e.g. most bootup code is
> thrown away after boot); it makes sense to grow the hash table as
> more code blocks are translated. This also avoids the complication of
> having to build downsizing hysteresis logic into qht.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  cpu-exec.c              | 86 ++++++++++++++++++++++++-------------------------
>  include/exec/exec-all.h |  9 +++---
>  include/exec/tb-hash.h  |  3 +-
>  translate-all.c         | 85 ++++++++++++++++++++++--------------------------
>  4 files changed, 86 insertions(+), 97 deletions(-)
<snip>
> -            if (tb->page_addr[1] == -1) {
> -                /* done, we have a match */
> -                break;
> -            } else {
> -                /* check next page if needed */
> -                target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
> -                                          TARGET_PAGE_SIZE;
> -                tb_page_addr_t phys_page2 = get_page_addr_code(env, virt_page2);

get_page_addr_code() can trigger a spurious exception here. However, this
patch is not at fault, because this could happen before this patch and even
before this patch series.

Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>

Kind regards,
Sergey
<snip>
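For quick reference, these are the qht entry points the patch relies on, collected from the diff below; treat this as a usage digest rather than complete qht documentation:

/* Initialization (tcg_exec_init): an initial size hint plus automatic growth. */
qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, QHT_MODE_AUTO_RESIZE);

/* Lookup (tb_find_physical): read-only; tb_cmp() resolves collisions
 * for the full 32-bit hash h against the lookup descriptor 'desc'. */
tb = qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);

/* Insert (tb_link_page) and remove (tb_phys_invalidate): both take the
 * same full 32-bit hash h computed by tb_hash_func(). */
qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);

/* Flush (tb_flush): shrink back to the initial 2**15 buckets. */
qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);

/* Debug iteration (DEBUG_TB_CHECK): visit every entry in the table. */
qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);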
Emilio G. Cota <cota@braap.org> writes:

> Having a fixed-size hash table for keeping track of all translation blocks
> is suboptimal: some workloads are just too big or too small to get maximum
<snip>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  cpu-exec.c              | 86 ++++++++++++++++++++++++-------------------------
>  include/exec/exec-all.h |  9 +++---
>  include/exec/tb-hash.h  |  3 +-
>  translate-all.c         | 85 ++++++++++++++++++++++--------------------------
>  4 files changed, 86 insertions(+), 97 deletions(-)

There are some conflicts with master here due to the move of TBContext.

--
Alex Bennée
diff --git a/cpu-exec.c b/cpu-exec.c
index 1735032..6a2350d 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -224,57 +224,57 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
+struct tb_desc {
+    target_ulong pc;
+    target_ulong cs_base;
+    CPUArchState *env;
+    tb_page_addr_t phys_page1;
+    uint32_t flags;
+};
+
+static bool tb_cmp(const void *p, const void *d)
+{
+    const TranslationBlock *tb = p;
+    const struct tb_desc *desc = d;
+
+    if (tb->pc == desc->pc &&
+        tb->page_addr[0] == desc->phys_page1 &&
+        tb->cs_base == desc->cs_base &&
+        tb->flags == desc->flags) {
+        /* check next page if needed */
+        if (tb->page_addr[1] == -1) {
+            return true;
+        } else {
+            tb_page_addr_t phys_page2;
+            target_ulong virt_page2;
+
+            virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+            phys_page2 = get_page_addr_code(desc->env, virt_page2);
+            if (tb->page_addr[1] == phys_page2) {
+                return true;
+            }
+        }
+    }
+    return false;
+}
+
 static TranslationBlock *tb_find_physical(CPUState *cpu,
                                           target_ulong pc,
                                           target_ulong cs_base,
                                           uint32_t flags)
 {
-    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
-    TranslationBlock *tb, **tb_hash_head, **ptb1;
+    tb_page_addr_t phys_pc;
+    struct tb_desc desc;
     uint32_t h;
-    tb_page_addr_t phys_pc, phys_page1;
 
-    /* find translated block using physical mappings */
-    phys_pc = get_page_addr_code(env, pc);
-    phys_page1 = phys_pc & TARGET_PAGE_MASK;
+    desc.env = (CPUArchState *)cpu->env_ptr;
+    desc.cs_base = cs_base;
+    desc.flags = flags;
+    desc.pc = pc;
+    phys_pc = get_page_addr_code(desc.env, pc);
+    desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_hash_func(phys_pc, pc, flags);
-
-    /* Start at head of the hash entry */
-    ptb1 = tb_hash_head = &tcg_ctx.tb_ctx.tb_phys_hash[h];
-    tb = *ptb1;
-
-    while (tb) {
-        if (tb->pc == pc &&
-            tb->page_addr[0] == phys_page1 &&
-            tb->cs_base == cs_base &&
-            tb->flags == flags) {
-
-            if (tb->page_addr[1] == -1) {
-                /* done, we have a match */
-                break;
-            } else {
-                /* check next page if needed */
-                target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
-                                          TARGET_PAGE_SIZE;
-                tb_page_addr_t phys_page2 = get_page_addr_code(env, virt_page2);
-
-                if (tb->page_addr[1] == phys_page2) {
-                    break;
-                }
-            }
-        }
-
-        ptb1 = &tb->phys_hash_next;
-        tb = *ptb1;
-    }
-
-    if (tb) {
-        /* Move the TB to the head of the list */
-        *ptb1 = tb->phys_hash_next;
-        tb->phys_hash_next = *tb_hash_head;
-        *tb_hash_head = tb;
-    }
-    return tb;
+    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
 static TranslationBlock *tb_find_slow(CPUState *cpu,
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 85528f9..68e73b6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -21,6 +21,7 @@
 #define _EXEC_ALL_H_
 
 #include "qemu-common.h"
+#include "qemu/qht.h"
 
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
@@ -212,8 +213,8 @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, ...)
 
 #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
 
-#define CODE_GEN_PHYS_HASH_BITS     15
-#define CODE_GEN_PHYS_HASH_SIZE     (1 << CODE_GEN_PHYS_HASH_BITS)
+#define CODE_GEN_HTABLE_BITS     15
+#define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
 
 /* Estimated block size for TB allocation.  */
 /* ??? The following is based on a 2015 survey of x86_64 host output.
@@ -250,8 +251,6 @@ struct TranslationBlock {
 
     void *tc_ptr;    /* pointer to the translated code */
     uint8_t *tc_search;  /* pointer to search data */
-    /* next matching tb for physical address. */
-    struct TranslationBlock *phys_hash_next;
     /* original tb when cflags has CF_NOCACHE */
     struct TranslationBlock *orig_tb;
     /* first and second physical page containing code. The lower bit
@@ -296,7 +295,7 @@ typedef struct TBContext TBContext;
 struct TBContext {
 
     TranslationBlock *tbs;
-    TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
+    struct qht htable;
     int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 88ccfd1..1d0200b 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -20,7 +20,6 @@
 #ifndef EXEC_TB_HASH
 #define EXEC_TB_HASH
 
-#include "exec/exec-all.h"
 #include "exec/tb-hash-xx.h"
 
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
@@ -49,7 +48,7 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-    return tb_hash_func5(phys_pc, pc, flags) & (CODE_GEN_PHYS_HASH_SIZE - 1);
+    return tb_hash_func5(phys_pc, pc, flags);
 }
 
 #endif
diff --git a/translate-all.c b/translate-all.c
index c48fccb..5357737 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -734,6 +734,13 @@ static inline void code_gen_alloc(size_t tb_size)
     qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
 
+static void tb_htable_init(void)
+{
+    unsigned int mode = QHT_MODE_AUTO_RESIZE;
+
+    qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+}
+
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
    (in bytes) allocated to the translation buffer. Zero means default
    size. */
@@ -741,6 +748,7 @@ void tcg_exec_init(unsigned long tb_size)
 {
     cpu_gen_init();
     page_init();
+    tb_htable_init();
     code_gen_alloc(tb_size);
 #if defined(CONFIG_SOFTMMU)
     /* There's no guest base to take into account, so go ahead and
@@ -845,7 +853,7 @@ void tb_flush(CPUState *cpu)
         cpu->tb_flushed = true;
     }
 
-    memset(tcg_ctx.tb_ctx.tb_phys_hash, 0, sizeof(tcg_ctx.tb_ctx.tb_phys_hash));
+    qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
@@ -856,60 +864,46 @@ void tb_flush(CPUState *cpu)
 
 #ifdef DEBUG_TB_CHECK
 
-static void tb_invalidate_check(target_ulong address)
+static void
+do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 {
-    TranslationBlock *tb;
-    int i;
+    TranslationBlock *tb = p;
+    target_ulong addr = *(target_ulong *)userp;
 
-    address &= TARGET_PAGE_MASK;
-    for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
-        for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
-             tb = tb->phys_hash_next) {
-            if (!(address + TARGET_PAGE_SIZE <= tb->pc ||
-                  address >= tb->pc + tb->size)) {
-                printf("ERROR invalidate: address=" TARGET_FMT_lx
-                       " PC=%08lx size=%04x\n",
-                       address, (long)tb->pc, tb->size);
-            }
-        }
+    if (!(addr + TARGET_PAGE_SIZE <= tb->pc || addr >= tb->pc + tb->size)) {
+        printf("ERROR invalidate: address=" TARGET_FMT_lx
+               " PC=%08lx size=%04x\n", addr, (long)tb->pc, tb->size);
     }
 }
 
-/* verify that all the pages have correct rights for code */
-static void tb_page_check(void)
+static void tb_invalidate_check(target_ulong address)
 {
-    TranslationBlock *tb;
-    int i, flags1, flags2;
-
-    for (i = 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
-        for (tb = tcg_ctx.tb_ctx.tb_phys_hash[i]; tb != NULL;
-             tb = tb->phys_hash_next) {
-            flags1 = page_get_flags(tb->pc);
-            flags2 = page_get_flags(tb->pc + tb->size - 1);
-            if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
-                printf("ERROR page flags: PC=%08lx size=%04x f1=%x f2=%x\n",
-                       (long)tb->pc, tb->size, flags1, flags2);
-            }
-        }
-    }
+    address &= TARGET_PAGE_MASK;
+    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
 }
 
-#endif
-
-static inline void tb_hash_remove(TranslationBlock **ptb, TranslationBlock *tb)
+static void
+do_tb_page_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 {
-    TranslationBlock *tb1;
+    TranslationBlock *tb = p;
+    int flags1, flags2;
 
-    for (;;) {
-        tb1 = *ptb;
-        if (tb1 == tb) {
-            *ptb = tb1->phys_hash_next;
-            break;
-        }
-        ptb = &tb1->phys_hash_next;
+    flags1 = page_get_flags(tb->pc);
+    flags2 = page_get_flags(tb->pc + tb->size - 1);
+    if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
+        printf("ERROR page flags: PC=%08lx size=%04x f1=%x f2=%x\n",
+               (long)tb->pc, tb->size, flags1, flags2);
    }
 }
 
+/* verify that all the pages have correct rights for code */
+static void tb_page_check(void)
+{
+    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
+}
+
+#endif
+
 static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 {
     TranslationBlock *tb1;
@@ -997,7 +991,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
+    qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
@@ -1127,13 +1121,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                          tb_page_addr_t phys_page2)
 {
     uint32_t h;
-    TranslationBlock **ptb;
 
     /* add in the hash table */
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
-    ptb = &tcg_ctx.tb_ctx.tb_phys_hash[h];
-    tb->phys_hash_next = *ptb;
-    *ptb = tb;
+    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
     /* add in the page list */
     tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
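One detail worth spelling out from the tb-hash.h hunk above: tb_hash_func() used to fold the xxhash value down to a fixed bucket index, whereas qht now consumes the full 32-bit hash and reduces it internally against its current bucket count. Conceptually (the internal masking shown in the comment is illustrative, assuming power-of-two sizing, not qht's literal code):

/* Before: the caller reduced the hash to a fixed-size table index. */
h = tb_hash_func5(phys_pc, pc, flags) & (CODE_GEN_PHYS_HASH_SIZE - 1);
tb = tcg_ctx.tb_ctx.tb_phys_hash[h];

/* After: the full 32-bit hash is handed to qht, which can reduce it
 * against whatever bucket count the table has grown to, e.g.
 *     bucket = h & (n_buckets - 1);
 * so a resize never needs to recompute tb_hash_func5(). */
h = tb_hash_func5(phys_pc, pc, flags);
tb = qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);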