Message ID | 1458222382-6498-5-git-send-email-sergey.fedorov@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On 17/03/2016 14:46, sergey.fedorov@linaro.org wrote: > void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) > { > - CPUState *cpu; > PageDesc *p; > unsigned int h, n1; > + tb_page_addr_t pc; > tb_page_addr_t phys_pc; > TranslationBlock *tb1, *tb2; > > - /* remove the TB from the hash list */ > - phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); > - h = tb_phys_hash_func(phys_pc); > - tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb); > - > - /* remove the TB from the page list */ > - if (tb->page_addr[0] != page_addr) { > - p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); > - tb_page_remove(&p->first_tb, tb); > - invalidate_page_bitmap(p); > - } > - if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) { > - p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); > - tb_page_remove(&p->first_tb, tb); > - invalidate_page_bitmap(p); > - } > - > - tcg_ctx.tb_ctx.tb_invalidated_flag = 1; > - Did you investigate the removal of this setting of tb_invalidated_flag? My recollection is that it is okay to remove it because at worst it would cause a tb_add_jump from an invalidated source to a valid destination. This should be harmless as long as the source has been tb_phys_invalidated and not tb_flushed. But this needs to be checked. Paolo
On 17/03/16 18:09, Paolo Bonzini wrote: > > On 17/03/2016 14:46, sergey.fedorov@linaro.org wrote: >> void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) >> { >> - CPUState *cpu; >> PageDesc *p; >> unsigned int h, n1; >> + tb_page_addr_t pc; >> tb_page_addr_t phys_pc; >> TranslationBlock *tb1, *tb2; >> >> - /* remove the TB from the hash list */ >> - phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); >> - h = tb_phys_hash_func(phys_pc); >> - tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb); >> - >> - /* remove the TB from the page list */ >> - if (tb->page_addr[0] != page_addr) { >> - p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); >> - tb_page_remove(&p->first_tb, tb); >> - invalidate_page_bitmap(p); >> - } >> - if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) { >> - p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); >> - tb_page_remove(&p->first_tb, tb); >> - invalidate_page_bitmap(p); >> - } >> - >> - tcg_ctx.tb_ctx.tb_invalidated_flag = 1; >> - > Did you investigate the removal of this setting of tb_invalidated_flag? > > My recollection is that it is okay to remove it because at worse it > would cause a tb_add_jump from an invalidated source to a valid > destination. This should be harmless as long as the source has been > tb_phys_invalidated and not tb_flushed. But this needs to be checked. Thanks for pointing that out. I should investigate it to make sure, although arm32/arm64/x86_64 Linux boots fine and Alex's latest kvm-unit-tests pass. Kind regards, Sergey
On 17/03/16 18:14, Sergey Fedorov wrote: > On 17/03/16 18:09, Paolo Bonzini wrote: >> >> On 17/03/2016 14:46, sergey.fedorov@linaro.org wrote: >>> void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t >>> page_addr) >>> { >>> - CPUState *cpu; >>> PageDesc *p; >>> unsigned int h, n1; >>> + tb_page_addr_t pc; >>> tb_page_addr_t phys_pc; >>> TranslationBlock *tb1, *tb2; >>> - /* remove the TB from the hash list */ >>> - phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK); >>> - h = tb_phys_hash_func(phys_pc); >>> - tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb); >>> - >>> - /* remove the TB from the page list */ >>> - if (tb->page_addr[0] != page_addr) { >>> - p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS); >>> - tb_page_remove(&p->first_tb, tb); >>> - invalidate_page_bitmap(p); >>> - } >>> - if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) { >>> - p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS); >>> - tb_page_remove(&p->first_tb, tb); >>> - invalidate_page_bitmap(p); >>> - } >>> - >>> - tcg_ctx.tb_ctx.tb_invalidated_flag = 1; >>> - >> Did you investigate the removal of this setting of tb_invalidated_flag? >> >> My recollection is that it is okay to remove it because at worse it >> would cause a tb_add_jump from an invalidated source to a valid >> destination. This should be harmless as long as the source has been >> tb_phys_invalidated and not tb_flushed. But this needs to be checked. > > Thanks for pointing that. I should investigate it to make sure, > although arm32/arm64/x86_64 Linux boots fine as well as the latest > Alex's kmv-unit-tests pass. The use pattern of 'tb_invalidated_flag' is a bit intricate; correct me, if I'm wrong about the following. Basically, 'tb_invalidated_flag' was meant to catch two events: * some TB has been invalidated by tb_phys_invalidate(); * the whole translation buffer has been flushed by tb_flush(). 
Then it is checked to ensure: * the last executed TB can be safely patched to directly call the next one in cpu_exec(); * the original TB should be provided for further possible invalidation along with the temporarily generated TB when in cpu_exec_nocache(). What I think we can't be sure about is the cpu_exec_nocache() case. It could look like a kind of corner case, but it could result in stalls if the original TB isn't invalidated properly, see b4ac20b4df0d ("cpu-exec: fix cpu_exec_nocache"). So I would suggest the following solutions: (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it in cpu_exec() when deciding on whether to patch the last executed TB or not (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer flushes; capture it before calling tb_gen_code() and compare to it afterwards to check if tb_flush() has been called in between What do you think? Kind regards, Sergey
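[Editor's note] Proposal (2) above can be sketched as follows. This is a minimal stand-alone illustration, not QEMU code: the flush counter and the two stubs stand in for tcg_ctx.tb_ctx.tb_flush_count, tb_flush() and tb_gen_code().

```c
#include <stdbool.h>

/* Hypothetical stand-in for tcg_ctx.tb_ctx.tb_flush_count. */
static unsigned tb_flush_count;

static void tb_flush(void)
{
    tb_flush_count++;           /* the real tb_flush() also resets the buffer */
}

/* Code-generation stub: flushes when the translation buffer is full. */
static void tb_gen_code(bool buffer_full)
{
    if (buffer_full) {
        tb_flush();
    }
}

/* Proposal (2): capture the count before tb_gen_code() and compare it
 * afterwards; a mismatch means tb_flush() ran in between. */
static bool tb_gen_code_detect_flush(bool buffer_full)
{
    unsigned before = tb_flush_count;
    tb_gen_code(buffer_full);
    return tb_flush_count != before;
}
```

The point of the counter over a flag is that it cannot be "consumed" by another caller: any number of observers can compare their own snapshot against the current value.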
On 17/03/16 16:46, sergey.fedorov@linaro.org wrote: > First the translation block is invalidated, for which a simple write > to tb->pc is enough. This means that cpu-exec will not pick up anymore > the block, though it may still execute it through chained jumps. This > also replaces the NULLing out of the pointer in the CPUs' local cache. However, using 'tb->pc' to mark a TB as invalid is probably not such a good idea. There may be some cases when PC could become equal to -1. For example, ARMv6-M uses PC >= 0xFFFFFFF0 to perform exception return. So we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag. Kind regards, Sergey
On 28/03/2016 20:42, Sergey Fedorov wrote: > On 17/03/16 16:46, sergey.fedorov@linaro.org wrote: >> First the translation block is invalidated, for which a simple write >> to tb->pc is enough. This means that cpu-exec will not pick up anymore >> the block, though it may still execute it through chained jumps. This >> also replaces the NULLing out of the pointer in the CPUs' local cache. > > Although, using 'tb->pc' to mark a TB as invalid is probably not such a > good idea. There may be some cases when PC could become equal to -1. For > example, ARMv6-M uses PC >= 0xFFFFFFF0 to perform exception return. So > we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag. It is also possible to use tb->flags for that. I suspect that all-ones tb flags is never valid, but it could also be a #define. Paolo
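[Editor's note] The idea of poisoning one of the lookup keys can be sketched like this. Everything here is illustrative: TB_INVALID_FLAGS and this trimmed-down TranslationBlock are invented for the sketch, and whether all-ones flags are really invalid on every target is exactly the open question in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trimmed-down TranslationBlock holding only the three lookup keys. */
typedef struct TranslationBlock {
    uint64_t pc;
    uint64_t cs_base;
    uint32_t flags;
} TranslationBlock;

/* Assumed (not guaranteed) never to be a valid flag combination. */
#define TB_INVALID_FLAGS 0xFFFFFFFFu

static inline void tb_mark_invalid(TranslationBlock *tb)
{
    tb->flags = TB_INVALID_FLAGS;
}

/* The lookup compare is unchanged: an invalidated TB simply fails the
 * existing flags check, so no extra "valid" field or compare is needed. */
static inline bool tb_matches(const TranslationBlock *tb, uint64_t pc,
                              uint64_t cs_base, uint32_t flags)
{
    return tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags;
}
```

This is the appeal of the approach: it is O(1), touches only the invalidated TB, and costs nothing on the lookup path.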
On 28/03/2016 17:18, Sergey Fedorov wrote: > The use pattern of 'tb_invalidated_flag' is a bit intricate; correct me, > if I'm wrong about the following. Basically, 'tb_invalidated_flag' was > meant to catch two events: > * some TB has been invalidated by tb_phys_invalidate(); This is patch 4. > * the whole translation buffer has been flushed by tb_flush(). This is patch 5. > Then it is checked to ensure: > * the last executed TB can be safely patched to directly call the next > one in cpu_exec(); > * the original TB should be provided for further possible invalidation > along with the temporarily generated TB when in cpu_exec_nocache(). > > [...] I would suggest the following solution: > (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it > in cpu_exec() when deciding on whether to patch the last executed > TB or not > (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer > flushes; capture it before calling tb_gen_code() and compare to it > afterwards to check if tb_flush() has been called in between Of course that would work, but it would be slower. I think it is unnecessary for two reasons: 1) There are two calls to cpu_exec_nocache. One exits immediately with "break;", the other always sets "next_tb = 0;". Therefore it is safe in both cases for cpu_exec_nocache to hijack cpu->tb_invalidated_flag. 2) if it were broken, it would _also_ be broken before these patches because cpu_exec_nocache always runs with tb_lock taken. So I think documenting the assumptions is better than changing them at the same time as doing other changes. Your observation that tb->pc==-1 is not necessarily safe still holds of course. Probably the best thing is an inline that can do one of: 1) set cs_base to an invalid value (anything nonzero is enough except on x86 and SPARC; SPARC can use all-ones) 2) sets the flags to an invalid combination (x86 can use all ones) 3) sets the PC to an invalid value (no one really needs it) Paolo
On 03/28/2016 01:58 PM, Paolo Bonzini wrote: > > > On 28/03/2016 20:42, Sergey Fedorov wrote: >> On 17/03/16 16:46, sergey.fedorov@linaro.org wrote: >>> First the translation block is invalidated, for which a simple write >>> to tb->pc is enough. This means that cpu-exec will not pick up anymore >>> the block, though it may still execute it through chained jumps. This >>> also replaces the NULLing out of the pointer in the CPUs' local cache. >> >> Although, using 'tb->pc' to mark a TB as invalid is probably not such a >> good idea. There may be some cases when PC could become equal to -1. For >> example, ARMv6-M uses PC >= 0xFFFFFFF0 to perform exception return. So >> we'd better introduce a separate 'tb->valid' or 'tb->invalid' flag. > > It is also possible to use tb->flags for that. I suspect that all-ones > tb flags is never valid, but it could also be a #define. That might work by accident, but it might not. You'd need to reserve a bit across all of the targets. r~
On 29/03/16 00:21, Paolo Bonzini wrote: > > On 28/03/2016 17:18, Sergey Fedorov wrote: >> The use pattern of 'tb_invalidated_flag' is a bit intricate; correct me, >> if I'm wrong about the following. Basically, 'tb_invalidated_flag' was >> meant to catch two events: >> * some TB has been invalidated by tb_phys_invalidate(); > This is patch 4. > >> * the whole translation buffer has been flushed by tb_flush(). > This is patch 5. > >> Then it is checked to ensure: >> * the last executed TB can be safely patched to directly call the next >> one in cpu_exec(); >> * the original TB should be provided for further possible invalidation >> along with the temporarily generated TB when in cpu_exec_nocache(). >> >> [...] I would suggest the following solution: >> (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it >> in cpu_exec() when deciding on whether to patch the last executed >> TB or not >> (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer >> flushes; capture it before calling tb_gen_code() and compare to it >> afterwards to check if tb_flush() has been called in between > Of course that would work, but it would be slower. What's going to be slower? > I think it is > unnecessary for two reasons: > > 1) There are two calls to cpu_exec_nocache. One exits immediately with > "break;", the other always sets "next_tb = 0;". Therefore it is safe in > both cases for cpu_exec_nocache to hijack cpu->tb_invalidated_flag. > > 2) if it were broken, it would _also_ be broken before these patches > because cpu_exec_nocache always runs with tb_lock taken. I can't see how cpu_exec_nocache() always runs with tb_lock taken. > So I think > documenting the assumptions is better than changing them at the same > time as doing other changes. I'm not sure I understand you here exactly, but if implementing my proposal, it'd rather be a separate patch/series, I think. > Your observation that tb->pc==-1 is not necessarily safe still holds of > course. 
> Probably the best thing is an inline that can do one of: > > 1) set cs_base to an invalid value (anything nonzero is enough except on > x86 and SPARC; SPARC can use all-ones) > > 2) sets the flags to an invalid combination (x86 can use all ones) > > 3) sets the PC to an invalid value (no one really needs it) It's a bit tricky. Is it really worth doing so instead of using a separate dedicated flag? Mainly, it should cost one extra compare on TB look-up. I suppose it's a kind of trade-off between performance and code clarity. Kind regards, Sergey
On 29/03/2016 12:03, Sergey Fedorov wrote: >>> >> [...] I would suggest the following solution: >>> >> (1) Use 'tb->pc' as an indicator of whether TB is valid; check for it >>> >> in cpu_exec() when deciding on whether to patch the last executed >>> >> TB or not >>> >> (2) Use 'tcg_ctx.tb_ctx.tb_flush_count' to check for translation buffer >>> >> flushes; capture it before calling tb_gen_code() and compare to it >>> >> afterwards to check if tb_flush() has been called in between >> > Of course that would work, but it would be slower. > What's going to be slower? Checking tb->pc (or something similar) *and* tb_flush_count in tb_find_physical. > > I think it is > > unnecessary for two reasons: > > > > 1) There are two calls to cpu_exec_nocache. One exits immediately with > > "break;", the other always sets "next_tb = 0;". Therefore it is safe in > > both cases for cpu_exec_nocache to hijack cpu->tb_invalidated_flag. > > > > 2) if it were broken, it would _also_ be broken before these patches > > because cpu_exec_nocache always runs with tb_lock taken. > > I can't see how cpu_exec_nocache() always runs with tb_lock taken. It takes the lock itself. :) From Fred's "tcg: protect TBContext with tb_lock", as it is in my mttcg branch: @@ -194,17 +194,23 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles, if (max_cycles > CF_COUNT_MASK) max_cycles = CF_COUNT_MASK; + tb_lock(); cpu->tb_invalidated_flag = 0; tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags, max_cycles | CF_NOCACHE); tb->orig_tb = cpu->tb_invalidated_flag ? NULL : orig_tb; cpu->current_tb = tb; + tb_unlock(); + /* execute the generated code */ trace_exec_tb_nocache(tb, tb->pc); cpu_tb_exec(cpu, tb->tc_ptr); + + tb_lock(); cpu->current_tb = NULL; tb_phys_invalidate(tb, -1); tb_free(tb); + tb_unlock(); } #endif It takes the lock before resetting tb_invalidated_flag. cpu_exec_nocache is not used in user-mode emulation, so it's okay if qemu.git doesn't take the lock yet. 
(This kind of misunderstanding about which code is thread-safe is going to be common until we have MTTCG. This was the reason for the patch "cpu-exec: elide more icount code if CONFIG_USER_ONLY"). > > So I think > > documenting the assumptions is better than changing them at the same > > time as doing other changes. > > I'm not sure I understand you here exactly, but if implementing my > proposal, it'd rather be a separate patch/series, I think. Exactly. For the purpose of these 5 patches, I would just document above cpu_exec_nocache that callers should ensure that next_tb is zero. Alternatively, you could add a patch that does old_tb_invalidated_flag = cpu->tb_invalidated_flag; cpu->tb_invalidated_flag = 0; ... cpu->tb_invalidated_flag |= old_tb_invalidated_flag; it could use the single global flag (and then it would be between patch 4 and patch 5) or it could use the CPU-specific one (and then it would be after patch 5). However, I think documenting the requirements is fine. >> > Your observation that tb->pc==-1 is not necessarily safe still holds of >> > course. Probably the best thing is an inline that can do one of: >> > >> > 1) set cs_base to an invalid value (anything nonzero is enough except on >> > x86 and SPARC; SPARC can use all-ones) >> > >> > 2) sets the flags to an invalid combination (x86 can use all ones) >> > >> > 3) sets the PC to an invalid value (no one really needs it) > > It's a bit tricky. Does it really worth doing so instead of using a > separate dedicated flag? Mainly, it should cost one extra compare on TB > look-up. I suppose it's a kind of trade-off between performance and code > clarity. I think a new inline function cpu_make_tb_invalid would not be too tricky. Just setting "tb->cs_base = -1;" is pretty much obvious for all the targets that do not use cs_base at all and for SPARC which sets it to a PC (and thus a multiple of four). x86 is the odd one out. Thanks, Paolo
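[Editor's note] The save-and-restore alternative Paolo sketches inline can be written out like this. A single global boolean stands in for the flag (either the global or the per-CPU variant); tb_gen_code() is a stub, and all names are illustrative.

```c
#include <stdbool.h>

/* Stand-in for tb_invalidated_flag (global or cpu->tb_invalidated_flag). */
static bool tb_invalidated_flag;

/* Stub: code generation may flush the buffer, which sets the flag. */
static void tb_gen_code(bool causes_flush)
{
    if (causes_flush) {
        tb_invalidated_flag = true;
    }
}

/* Hijack the flag for the duration of cpu_exec_nocache(), but merge the
 * caller's pending value back in so no invalidation is ever lost. */
static bool exec_nocache_sketch(bool causes_flush)
{
    bool old_flag = tb_invalidated_flag;
    tb_invalidated_flag = false;
    tb_gen_code(causes_flush);
    bool flushed_here = tb_invalidated_flag; /* did *our* tb_gen_code flush? */
    tb_invalidated_flag |= old_flag;
    return flushed_here;
}
```

The restore on the last line is what distinguishes this from the plain hijack: the caller's pending flag survives the nocache excursion.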
On 29/03/16 13:37, Paolo Bonzini wrote: > cpu_exec_nocache is not used in user-mode emulation, so it's okay if > qemu.git doesn't take the lock yet. (This kind of misunderstanding > about which code is thread-safe is going to be common until we have > MTTCG. This was the reason for the patch "cpu-exec: elide more icount > code if CONFIG_USER_ONLY"). So I'd better pull in here the rebased version of this patch from Alex's "Base enabling patches for MTTCG" series. Kind regards, Sergey
Sergey Fedorov <serge.fdrv@gmail.com> writes: > On 29/03/16 13:37, Paolo Bonzini wrote: >> cpu_exec_nocache is not used in user-mode emulation, so it's okay if >> qemu.git doesn't take the lock yet. (This kind of misunderstanding >> about which code is thread-safe is going to be common until we have >> MTTCG. This was the reason for the patch "cpu-exec: elide more icount >> code if CONFIG_USER_ONLY"). > > So I'd better pull in here the rebased version of this patch from Alex's > "Base enabling patches for MTTCG" series. OK, sounds good. -- Alex Bennée
On 29/03/16 13:37, Paolo Bonzini wrote: >>>> Your observation that tb->pc==-1 is not necessarily safe still holds of >>>> >> > course. Probably the best thing is an inline that can do one of: >>>> >> > >>>> >> > 1) set cs_base to an invalid value (anything nonzero is enough except on >>>> >> > x86 and SPARC; SPARC can use all-ones) >>>> >> > >>>> >> > 2) sets the flags to an invalid combination (x86 can use all ones) >>>> >> > >>>> >> > 3) sets the PC to an invalid value (no one really needs it) >> > >> > It's a bit tricky. Does it really worth doing so instead of using a >> > separate dedicated flag? Mainly, it should cost one extra compare on TB >> > look-up. I suppose it's a kind of trade-off between performance and code >> > clarity. > I think a new inline function cpu_make_tb_invalid would not be too tricky. > Just setting "tb->cs_base = -1;" is pretty much obvious for all the targets > that do not use cs_base at all and for SPARC which sets it to a PC (and > thus a multiple of four). x86 is the odd one out. So what would you suggest to use for x86? I can't think of something that looks like a really compelling combination when I look at cpu_get_tb_cpu_state() in target-i386/cpu.h. Personally, I'm not so happy trying to use pc/cs_base/flags to mark an invalid TB. Are my worries unreasonable? :) Some other thoughts? Anyway, I am wondering if there is still a way to clear tb_phys_hash and tb_jmp_cache safely. Maybe something like this: * Remove the TB from physical hash list * Memory barrier * Remove the TB from each vCPU's virtual address hash cache Would that work? Kind regards, Sergey
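[Editor's note] The three-step ordering proposed above can be sketched with C11 atomics. The single-slot "hash structures" are stand-ins invented for the sketch; the real tb_phys_hash and per-vCPU tb_jmp_cache are lists and arrays, and QEMU uses its own barrier macros rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct TranslationBlock {
    int dummy;
} TranslationBlock;

/* One-bucket stand-ins for tb_phys_hash and a vCPU's tb_jmp_cache. */
static _Atomic(TranslationBlock *) tb_phys_hash_bucket;
static _Atomic(TranslationBlock *) tb_jmp_cache_slot;

static void tb_invalidate_ordered(TranslationBlock *tb)
{
    /* 1. Remove the TB from the physical hash list: from here on,
     *    tb_find_slow cannot return it. */
    if (atomic_load(&tb_phys_hash_bucket) == tb) {
        atomic_store(&tb_phys_hash_bucket, NULL);
    }

    /* 2. Write barrier: step 1 is visible before step 3. */
    atomic_thread_fence(memory_order_release);

    /* 3. Remove the TB from each vCPU's virtual-address cache:
     *    tb_find_fast now misses and falls through to the (already
     *    emptied) slow path. */
    if (atomic_load(&tb_jmp_cache_slot) == tb) {
        atomic_store(&tb_jmp_cache_slot, NULL);
    }
}
```

The barrier enforces exactly the property being asked about: a reader that still finds the TB in its jump cache at step 3 may execute it once more, but a reader that misses there can never resurrect it through the slow path.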
On 14/04/2016 16:45, Sergey Fedorov wrote: > So what would you suggest to use for x86? I can't think of something > that looks like a really compelling combination when I look at > cpu_get_tb_cpu_state() in target-i386/cpu.h. On x86 I think we should define HF_INVALID_TB to an invalid flag combination. I can think of several solutions: - #defining HF_INVALID_TB to an invalid combination (e.g. HF_CS64_MASK, because it always appears together with HF_LMA_MASK; see code that updates those hflags in cpu_x86_load_seg_cache, cpu_x86_update_cr0, kvm_get_sregs). Advantage: doesn't waste a bit, reasonably self documenting. Disadvantage: a bit tricky, but still my favorite. - rename HF_SOFTMMU_MASK to HF_INVALID_MASK (it's always the same as CONFIG_SOFTMMU so we can remove it), then #define HF_INVALID_TB HF_INVALID_MASK. Advantage: obviously correct. Disadvantage: wastes a bit. My second favorite. - #defining HF_INVALID_TB to -1. Advantage: ?!? Disadvantage: everything. Looks tame, actually a huge hack. - #defining HF_INVALID_TB to the "wrong" direction HF_SOFTMMU_MASK (i.e. to 0 if CONFIG_SOFTMMU). Advantage: obviously correct. Disadvantage: huge hack, HF_SOFTMMU_MASK is unused anyway. Choose your own favorite. :) (Setting cs_base to -1 actually would work on 64-bit x86, but not on qemu-system-i386). > Personally, I'm not so > happy trying to use pc/cs_base/flags to mark an invalid TB. Are my > worries unreasonable? :) Can you explain your worries? > > The advantages are that it's O(1) and it obviously doesn't affect other > TBs than the invalidated one. > Anyway, I am wondering if there is still a way to clear tb_phys_hash and > tb_jmp_cache safely. > > Maybe something like this: > * Remove the TB from physical hash list So at this point tb_find_slow cannot find it. > * Memory barrier > * Remove the TB from each vCPU's virtual address hash cache tb_find_fast then cannot find it either. > Would that work? This is very similar to the current code.
From 10,000 feet, because tb_find_fast calls tb_find_slow, this could indeed work, but I'm a bit concerned about how to order the removal of the jump lists. The usage of "tcg_ctx.tb_ctx.tb_invalidated_flag = 1" in the existing code is what worries me. Indeed the motivation of this patch was removing that single line of code to prepare for the move of tb_invalidated_flag to CPUState. Also, this loop will not be thread-safe anymore as soon as Fred's "tb_jmp_cache lookup outside tb_lock" goes in: CPU_FOREACH(cpu) { if (cpu->tb_jmp_cache[h] == tb) { cpu->tb_jmp_cache[h] = NULL; } } It should use atomic_cmpxchg (slow!) or unconditionally NULL out cpu->tb_jmp_cache (a bit hacky). Preparing for that change is an added bonus of the tb-hacking approach. Paolo
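[Editor's note] The atomic_cmpxchg variant of that CPU_FOREACH loop might look like this, with the vCPU iteration reduced to a plain array (NR_CPUS and the one-slot-per-CPU cache are invented for the sketch):

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct TranslationBlock {
    int dummy;
} TranslationBlock;

#define NR_CPUS 4

/* One tb_jmp_cache slot per vCPU, standing in for cpu->tb_jmp_cache[h]. */
static _Atomic(TranslationBlock *) tb_jmp_cache[NR_CPUS];

/* Clear a slot only if it still holds this TB: a concurrent lookup that
 * has already replaced the entry is left alone.  This is the "slow"
 * option; the hacky alternative is an unconditional NULL store. */
static void tb_jmp_cache_clear(TranslationBlock *tb)
{
    for (int i = 0; i < NR_CPUS; i++) {
        TranslationBlock *expected = tb;
        atomic_compare_exchange_strong(&tb_jmp_cache[i], &expected, NULL);
    }
}
```

The compare-and-swap is what makes the loop safe against a racing lock-free reader: the plain read-then-write in the quoted loop could overwrite an entry that another thread installed between the check and the store.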
On 14/04/16 18:13, Paolo Bonzini wrote: > This is very similar to the current code. From 10,000 feet, because > tb_find_fast calls tb_find_slow, this could indeed work, but I'm a bit > concerned about how to order the removal of the jump lists. The usage > of "tcg_ctx.tb_ctx.tb_invalidated_flag = 1" in the existing code was > what worries me. Indeed the motivation of this patch was removing that > single line of code to prepare for the move of tb_invalidated_flag to > CPUState. I'm just preparing a patch with a long commit message which describes and removes this setting of tb_invalidated_flag :) I think we can do this safely and even benefit in performance. > > Also, this loop will not be thread-safe anymore as soon as Fred's > "tb_jmp_cache lookup outside tb_lock" goes in: > > CPU_FOREACH(cpu) { > if (cpu->tb_jmp_cache[h] == tb) { > cpu->tb_jmp_cache[h] = NULL; > } > } > > It should use atomic_cmpxchg (slow!) or to unconditionally NULL out > cpu->tb_jmp_cache (a bit hacky). Preparing for that change is an added > bonus of the tb-hacking approach. IIUC we always modify tb_jmp_cache/tb_phys_hash under tb_lock. We're just gonna do tb_jmp_cache lookup outside of tb_lock. I think a write memory barrier in tb_phys_invalidate() paired with a read memory barrier in tb_find_fast()/tb_find_slow() would be enough to make it safe. Also we need to use atomic read/set for tb_jmp_cache, but cmpxchg is not necessary. So I'm going to give it a thought then. Kind regards, Sergey
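[Editor's note] The lock-free lookup side described above could be sketched as follows. The hash function and cache size are invented, and in QEMU a miss would fall back to tb_find_slow() under tb_lock rather than return NULL.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TB_JMP_CACHE_SIZE 8

typedef struct TranslationBlock {
    uint64_t pc;
} TranslationBlock;

static _Atomic(TranslationBlock *) tb_jmp_cache[TB_JMP_CACHE_SIZE];

static size_t tb_jmp_cache_hash(uint64_t pc)
{
    return (size_t)(pc % TB_JMP_CACHE_SIZE);
}

/* Writer side, under tb_lock: the release store pairs with the acquire
 * load below, so a reader that sees the pointer also sees the TB. */
static void tb_jmp_cache_set(TranslationBlock *tb)
{
    atomic_store_explicit(&tb_jmp_cache[tb_jmp_cache_hash(tb->pc)], tb,
                          memory_order_release);
}

/* Lock-free reader: atomic load plus the usual pc check.  A stale or
 * NULL entry is harmless, it just sends us to the slow path; no
 * cmpxchg is needed on this side. */
static TranslationBlock *tb_find_fast_sketch(uint64_t pc)
{
    TranslationBlock *tb =
        atomic_load_explicit(&tb_jmp_cache[tb_jmp_cache_hash(pc)],
                             memory_order_acquire);
    return (tb && tb->pc == pc) ? tb : NULL;
}
```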
On 14/04/2016 17:36, Sergey Fedorov wrote: > IIUC we always modify tb_jmp_cache/tb_phys_hash under tb_lock. We're > just gonna do tb_jmp_cache lookup outside of tb_lock. I think write > memory barrier in tb_phys_invalidate() paired with read memory memory > barrier in tb_find_fast()/tb_find_slow() would be enough to make it > safe. Also we need to use atomic read/set to tb_jmp_cache but cmpxchg is > not necessary. So I'm going to give it a thought then. Sounds good! Paolo
On 14/04/16 18:13, Paolo Bonzini wrote: > This is very similar to the current code. From 10,000 feet, because > tb_find_fast calls tb_find_slow, this could indeed work, but I'm a bit > concerned about how to order the removal of the jump lists. As long as we always link/unlink TBs under tb_lock the exact order doesn't seem to be important. Kind regards, Sergey
On 14/04/16 18:13, Paolo Bonzini wrote: > On 14/04/2016 16:45, Sergey Fedorov wrote: >> Personally, I'm not so >> happy trying to use pc/cs_base/flags to mark an invalid TB. Are my >> worries unreasonable? :) > Can you explain your worries? > > The advantages are that it's O(1) and it obviously doesn't affect other > TBs than the invalidated one. Well, it looks a bit hacky; I don't like hacky code because it is hard to read, modify, and be confident in. I would like to introduce a dedicated boolean field to mark a TB as valid/invalid. But this approach adds one more compare in tb_find_fast() and tb_find_slow(). Luckily, it may not be an issue if my proposal with memory barriers and atomic access to 'tb_jmp_cache' solves the problem. Kind regards, Sergey
diff --git a/translate-all.c b/translate-all.c
index a1ac9841de48..1db5a914d9a3 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -966,40 +966,21 @@ static inline void tb_jmp_remove(TranslationBlock *tb, int n)
 /* invalidate one TB */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
-    CPUState *cpu;
     PageDesc *p;
     unsigned int h, n1;
+    tb_page_addr_t pc;
     tb_page_addr_t phys_pc;
     TranslationBlock *tb1, *tb2;
 
-    /* remove the TB from the hash list */
-    phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
-    h = tb_phys_hash_func(phys_pc);
-    tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
-
-    /* remove the TB from the page list */
-    if (tb->page_addr[0] != page_addr) {
-        p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-        tb_page_remove(&p->first_tb, tb);
-        invalidate_page_bitmap(p);
-    }
-    if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
-        p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-        tb_page_remove(&p->first_tb, tb);
-        invalidate_page_bitmap(p);
-    }
-
-    tcg_ctx.tb_ctx.tb_invalidated_flag = 1;
-
-    /* remove the TB from the hash list */
-    h = tb_jmp_cache_hash_func(tb->pc);
-    CPU_FOREACH(cpu) {
-        if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
-        }
-    }
+    /* First invalidate the translation block. CPUs will not use it anymore
+     * from their local caches.
+     */
+    pc = tb->pc;
+    tb->pc = -1;
 
-    /* suppress this TB from the two jump lists */
+    /* Then suppress this TB from the two jump lists. CPUs will not jump
+     * anymore into this translation block.
+     */
     tb_jmp_remove(tb, 0);
     tb_jmp_remove(tb, 1);
 
@@ -1017,6 +998,25 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     }
     tb->jmp_first = (TranslationBlock *)((uintptr_t)tb | 2); /* fail safe */
 
+    /* Now remove the TB from the hash list, so that tb_find_slow
+     * cannot find it anymore.
+     */
+    phys_pc = tb->page_addr[0] + (pc & ~TARGET_PAGE_MASK);
+    h = tb_phys_hash_func(phys_pc);
+    tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
+
+    /* remove the TB from the page list */
+    if (tb->page_addr[0] != page_addr) {
+        p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
+        tb_page_remove(&p->first_tb, tb);
+        invalidate_page_bitmap(p);
+    }
+    if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
+        p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
+        tb_page_remove(&p->first_tb, tb);
+        invalidate_page_bitmap(p);
+    }
+
     tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
 }