Message ID: 145625173880.12025.6630606700468410319.stgit@localhost (mailing list archive)
State: New, archived
On 23 February 2016 at 18:22, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>

> +### Guest events, keep at bottom
> +
> +# @vaddr: Access' virtual address.
> +# @size : Access' size (bytes).
> +# @store: Whether the access is a store.
> +#
> +# Start virtual memory access (before any potential access violation).
> +#
> +# Does not include memory accesses performed by devices.
> +#
> +# Targets: TCG(all)
> +disable vcpu tcg guest_vmem(TCGv vaddr, uint8_t size, uint8_t store) "size=%d store=%d", "vaddr=0x%016"PRIx64" size=%d store=%d"

Shouldn't we also be tracing some of the other information in the memop?
In particular, the endianness of the access seems useful. You could also
say whether we're zero- or sign-extending the access, though I guess you
could defend not printing that since we don't print anything about what
the target code does with the loaded data.

Otherwise I think this looks OK (though the various paths memory accesses
take are a bit opaque since I've forgotten how this bit of QEMU works).

thanks
-- PMM
Peter Maydell writes:
> On 23 February 2016 at 18:22, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
>> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>

>> +### Guest events, keep at bottom
>> +
>> +# @vaddr: Access' virtual address.
>> +# @size : Access' size (bytes).
>> +# @store: Whether the access is a store.
>> +#
>> +# Start virtual memory access (before any potential access violation).
>> +#
>> +# Does not include memory accesses performed by devices.
>> +#
>> +# Targets: TCG(all)
>> +disable vcpu tcg guest_vmem(TCGv vaddr, uint8_t size, uint8_t store) "size=%d store=%d", "vaddr=0x%016"PRIx64" size=%d store=%d"

> Shouldn't we also be tracing some of the other information in the memop?
> In particular, the endianness of the access seems useful. You could also
> say whether we're zero- or sign-extending the access, though I guess you
> could defend not printing that since we don't print anything about what
> the target code does with the loaded data.

> Otherwise I think this looks OK (though the various paths memory accesses
> take are a bit opaque since I've forgotten how this bit of QEMU works).

Mmmmm, the endianness seems more of a vCPU property than one of the memory
access. A separate event could be added for that (e.g., at vCPU
initialization/hot-plug and whenever it is dynamically changed, as on ARM).

For the sign extension and memory value, what about adding multiple events?
What I did for instructions is have a simple event and one with extended
information, so that we can tweak the performance of a tracing QEMU by
choosing one or the other. We could do the same for memory accesses (e.g.,
also show the memory value, sign extension and physical address).
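Purely as an illustration of that split (the second event and its extra
arguments are hypothetical; only the declaration syntax mirrors the
guest_vmem line from this patch), the trace-events entries could look
something like:

  disable vcpu tcg guest_vmem(TCGv vaddr, uint8_t size, uint8_t store) "size=%d store=%d", "vaddr=0x%016"PRIx64" size=%d store=%d"
  disable vcpu tcg guest_vmem_info(TCGv vaddr, uint8_t size, uint8_t store, uint8_t sign, uint8_t be) "size=%d store=%d sign=%d be=%d", "vaddr=0x%016"PRIx64" size=%d store=%d sign=%d be=%d"

with the cheap event used by default and the detailed one reserved for users
willing to pay the extra per-argument TCG overhead.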
On 03/17/2016 12:22 PM, Lluís Vilanova wrote:
> Mmmmm, the endianness seems more of a vCPU property than one of the memory
> access.

On the contrary. Plenty of CPUs have endian-swapping load/store insns:
x86 (Haswell's movbe), PowerPC, SPARCv9, s390. Maybe others I've forgotten.

r~
On 17 March 2016 at 19:22, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
> Mmmmm, the endianness seems more of a vCPU property than one of the memory
> access. A separate event could be added for that (e.g., at vCPU
> initialization/hot-plug and whenever it is dynamically changed, as on ARM).

We've chosen to implement it in QEMU as a property of the memory access.
Consider for instance instructions like x86 movbe -- sometimes an individual
instruction will be a BE access even if the CPU more usually does LE accesses.
Equally, CPUs might have different accesses for data and code, or other
complicated things.

I think we're probably better off tracing, as part of the memory access, the
things we've implemented as memory access properties or flags; at least
that's consistent.

> For the sign extension and memory value, what about adding multiple events?
> What I did for instructions is have a simple event and one with extended
> information, so that we can tweak the performance of a tracing QEMU by
> choosing one or the other. We could do the same for memory accesses (e.g.,
> also show the memory value, sign extension and physical address).

I think the info that's in the memop value is probably worth just printing
always; it won't make much difference.

Trying to trace physaddrs is very tricky -- in the case of a TLB hit there
is no guarantee you can still identify the physaddr of what you're accessing
(the guest might have changed the page tables and not invalidated the TLB).

thanks
-- PMM
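For reference, the properties in question are already encoded as flag bits in
the memop, so extracting them for a trace event is cheap. A rough sketch in C
of pulling the individual properties out of a TCGMemOp, assuming the MO_*
flags from tcg.h are in scope as in tcg/tcg-op.c (illustration only, not part
of the patch):

  /* Decode the access properties carried by a TCGMemOp so they can be
   * passed to a trace event alongside vaddr/size/store. */
  static inline uint8_t memop_size_bytes(TCGMemOp op)
  {
      return 1 << (op & MO_SIZE);         /* 1, 2, 4 or 8 bytes */
  }

  static inline uint8_t memop_sign_extend(TCGMemOp op)
  {
      return (op & MO_SIGN) != 0;         /* sign- vs. zero-extending load */
  }

  static inline uint8_t memop_big_endian(TCGMemOp op)
  {
      return (op & MO_BSWAP) == MO_BE;    /* effective endianness of the access */
  }

The first helper is exactly what the patch's tcg_memop_size() computes; sign
and endianness would just be two more small arguments to the event.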
Peter Maydell writes:
> On 17 March 2016 at 19:22, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
>> Mmmmm, the endianness seems more of a vCPU property than one of the memory
>> access. A separate event could be added for that (e.g., at vCPU
>> initialization/hot-plug and whenever it is dynamically changed, as on ARM).

> We've chosen to implement it in QEMU as a property of the memory access.
> Consider for instance instructions like x86 movbe -- sometimes an individual
> instruction will be a BE access even if the CPU more usually does LE
> accesses. Equally, CPUs might have different accesses for data and code,
> or other complicated things.

> I think we're probably better off tracing, as part of the memory access, the
> things we've implemented as memory access properties or flags; at least
> that's consistent.

Oh, I wasn't aware of the endianness swapping, but your argument makes sense.

>> For the sign extension and memory value, what about adding multiple events?
>> What I did for instructions is have a simple event and one with extended
>> information, so that we can tweak the performance of a tracing QEMU by
>> choosing one or the other. We could do the same for memory accesses (e.g.,
>> also show the memory value, sign extension and physical address).

> I think the info that's in the memop value is probably worth just printing
> always; it won't make much difference.

But is the endianness, sign extension, etc. valuable information when the
data itself is not part of the trace? If it is, I can easily add that; but
there is still some additional per-argument overhead when generating a TCG
call to the exec-time tracing routine, hence my aim of adding two trace
events with different levels of detail. I haven't measured the impact, though.

> Trying to trace physaddrs is very tricky -- in the case of a TLB hit there
> is no guarantee you can still identify the physaddr of what you're accessing
> (the guest might have changed the page tables and not invalidated the TLB).

I was looking at how to modify the soft TLB code to generate that information
for the trace event, but it requires changes in every TCG target. In any
case, that should always provide the same physical address used by QEMU, so
the event info is "correct" in that sense. Or did I miss something?

Cheers,
  Lluis
On 18 March 2016 at 18:50, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
> Peter Maydell writes:
>> Trying to trace physaddrs is very tricky -- in the case of a TLB hit there
>> is no guarantee you can still identify the physaddr of what you're
>> accessing (the guest might have changed the page tables and not
>> invalidated the TLB).
>
> I was looking at how to modify the soft TLB code to generate that
> information for the trace event, but it requires changes in every TCG
> target. In any case, that should always provide the same physical address
> used by QEMU, so the event info is "correct" in that sense. Or did I miss
> something?

Consider the sequence:
 * guest makes an access to vaddr V, currently mapped to physaddr P1
   (which is host address H)
 * we put the mapping V -> H into QEMU's TLB
 * guest changes its page tables so V now maps to P2, but doesn't do a
   TLB flush
 * guest makes another access to vaddr V
 * we hit in QEMU's TLB, so we know V and H; but we don't know P1
   (because we don't record that in the TLB) and we can't even get it by
   walking the page table, because at this point V maps to P2, not P1.
   (And for sw-TLB guest archs like MIPS you can't even do a V-to-P
   lookup in QEMU non-intrusively.)

(This is often defined to be unpredictable or similar in the guest
architecture. But a buggy guest might do it, and tracing the wrong thing
would be pretty confusing if you were trying to track down that bug.)

thanks
-- PMM
Peter Maydell writes:
> On 18 March 2016 at 18:50, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
>> Peter Maydell writes:
>>> Trying to trace physaddrs is very tricky -- in the case of a TLB hit there
>>> is no guarantee you can still identify the physaddr of what you're
>>> accessing (the guest might have changed the page tables and not
>>> invalidated the TLB).
>>
>> I was looking at how to modify the soft TLB code to generate that
>> information for the trace event, but it requires changes in every TCG
>> target. In any case, that should always provide the same physical address
>> used by QEMU, so the event info is "correct" in that sense. Or did I miss
>> something?

> Consider the sequence:
>  * guest makes an access to vaddr V, currently mapped to physaddr P1
>    (which is host address H)
>  * we put the mapping V -> H into QEMU's TLB
>  * guest changes its page tables so V now maps to P2, but doesn't do a
>    TLB flush
>  * guest makes another access to vaddr V
>  * we hit in QEMU's TLB, so we know V and H; but we don't know P1
>    (because we don't record that in the TLB) and we can't even get it by
>    walking the page table, because at this point V maps to P2, not P1.
>    (And for sw-TLB guest archs like MIPS you can't even do a V-to-P
>    lookup in QEMU non-intrusively.)

> (This is often defined to be unpredictable or similar in the guest
> architecture. But a buggy guest might do it, and tracing the wrong thing
> would be pretty confusing if you were trying to track down that bug.)

Oh! Yes, I seem to remember that code path now; I checked it a really long
time ago. I was assuming that whenever this event is enabled at compile time,
I would have to modify QEMU's TLB to store the guest physical address (then
used by the tracing event).

Cheers,
  Lluis
On 20 March 2016 at 18:09, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
> Oh! Yes, I seem to remember that code path now; I checked it a really long
> time ago. I was assuming that whenever this event is enabled at compile
> time, I would have to modify QEMU's TLB to store the guest physical address
> (then used by the tracing event).

I guess we could maybe put that into the iotlb. You definitely don't want it
in the CPUTLBEntry struct, as that one is space-critical for performance.
(If you're really lucky you can reconstruct the physaddr from the iotlb addr
field, but I suspect you can't.)

Once you've decided to take the hit of keeping the paddr in the TLB, it's
probably faster to just unconditionally store it rather than doing a
"store if trace event X enabled" test.

PS: you probably also want to be able to trace the MemTxAttrs (which tells
you whether you have an S or NS access on ARM, among other things).

thanks
-- PMM
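A very rough sketch of that direction (the paddr field is an invented
addition; the real CPUIOTLBEntry only carries addr and attrs, and the
variable names follow tlb_set_page_with_attrs() in cputlb.c):

  /* Invented extension: remember the guest physical address at TLB-fill
   * time so an exec-time trace event can report it on a TLB hit. */
  typedef struct CPUIOTLBEntry {
      hwaddr addr;
      MemTxAttrs attrs;
      hwaddr paddr;       /* new: guest physical address of the mapped page */
  } CPUIOTLBEntry;

  /* In tlb_set_page_with_attrs(), once the TLB index for vaddr is known: */
  env->iotlb[mmu_idx][index].paddr = paddr & TARGET_PAGE_MASK;

  /* On a hit, the traced physical address would then be: */
  hwaddr trace_paddr = env->iotlb[mmu_idx][index].paddr
                       | (vaddr & ~TARGET_PAGE_MASK);

Whether the extra hwaddr per entry is acceptable is exactly the
space/performance trade-off mentioned above.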
Peter Maydell writes:
> On 20 March 2016 at 18:09, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
>> Oh! Yes, I seem to remember that code path now; I checked it a really long
>> time ago. I was assuming that whenever this event is enabled at compile
>> time, I would have to modify QEMU's TLB to store the guest physical address
>> (then used by the tracing event).

> I guess we could maybe put that into the iotlb. You definitely don't want it
> in the CPUTLBEntry struct, as that one is space-critical for performance.
> (If you're really lucky you can reconstruct the physaddr from the iotlb addr
> field, but I suspect you can't.)

> Once you've decided to take the hit of keeping the paddr in the TLB, it's
> probably faster to just unconditionally store it rather than doing a
> "store if trace event X enabled" test.

I meant to make the check at compile time, since we have defines to check
which events are enabled/disabled in trace-events.

> PS: you probably also want to be able to trace the MemTxAttrs (which tells
> you whether you have an S or NS access on ARM, among other things).

I'll keep these in mind for a separate series with extended memory info.

Thanks,
  Lluis
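For completeness, such a compile-time guard could look roughly like this; the
exact name of the generated define is an assumption here, following the usual
TRACE_<EVENT>_ENABLED pattern, and the paddr field is the invented one from
the sketch above:

  /* Only pay for the extra TLB bookkeeping when the event is compiled in. */
  #if defined(TRACE_GUEST_VMEM_ENABLED) && TRACE_GUEST_VMEM_ENABLED
      iotlb_entry->paddr = paddr & TARGET_PAGE_MASK;
  #endif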
diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index 3091c00..516f378 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -23,6 +23,11 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
+
+#if !defined(SOFTMMU_CODE_ACCESS)
+#include "trace.h"
+#endif
+
 #if DATA_SIZE == 8
 #define SUFFIX q
 #define USUFFIX q
@@ -80,6 +85,10 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
     int mmu_idx;
     TCGMemOpIdx oi;
 
+#if !defined(SOFTMMU_CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 0);
+#endif
+
     addr = ptr;
     page_index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     mmu_idx = CPU_MMU_INDEX;
@@ -112,6 +121,10 @@ glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
     int mmu_idx;
     TCGMemOpIdx oi;
 
+#if !defined(SOFTMMU_CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 0);
+#endif
+
     addr = ptr;
     page_index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     mmu_idx = CPU_MMU_INDEX;
@@ -148,6 +161,10 @@ glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
     int mmu_idx;
     TCGMemOpIdx oi;
 
+#if !defined(SOFTMMU_CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 1);
+#endif
+
     addr = ptr;
     page_index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     mmu_idx = CPU_MMU_INDEX;
diff --git a/include/exec/cpu_ldst_useronly_template.h b/include/exec/cpu_ldst_useronly_template.h
index 040b147..cde3d00 100644
--- a/include/exec/cpu_ldst_useronly_template.h
+++ b/include/exec/cpu_ldst_useronly_template.h
@@ -22,6 +22,11 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
+
+#if !defined(CODE_ACCESS)
+#include "trace.h"
+#endif
+
 #if DATA_SIZE == 8
 #define SUFFIX q
 #define USUFFIX q
@@ -53,6 +58,9 @@
 static inline RES_TYPE
 glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
 {
+#if !defined(CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 0);
+#endif
     return glue(glue(ld, USUFFIX), _p)(g2h(ptr));
 }
 
@@ -68,6 +76,9 @@ glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
 static inline int
 glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
 {
+#if !defined(CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 0);
+#endif
     return glue(glue(lds, SUFFIX), _p)(g2h(ptr));
 }
 
@@ -85,6 +96,9 @@
 static inline void
 glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr, RES_TYPE v)
 {
+#if !defined(CODE_ACCESS)
+    trace_guest_vmem_exec(ENV_GET_CPU(env), ptr, DATA_SIZE, 1);
+#endif
     glue(glue(st, SUFFIX), _p)(g2h(ptr), v);
 }
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index f554b86..789e427 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "tcg.h"
 #include "tcg-op.h"
+#include "trace-tcg.h"
 
 /* Reduce the number of ifdefs below.  This assumes that all uses of
    TCGV_HIGH and TCGV_LOW are properly protected by a conditional that
@@ -1904,22 +1905,44 @@ static void gen_ldst_i64(TCGOpcode opc, TCGv_i64 val, TCGv addr,
 #endif
 }
 
-void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+static inline uint8_t tcg_memop_size(TCGMemOp op)
+{
+    return 1 << (op & MO_SIZE);
+}
+
+static inline void do_tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     memop = tcg_canonicalize_memop(memop, 0, 0);
     gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
 }
 
-void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    trace_guest_vmem_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+                         addr, tcg_memop_size(memop), 0);
+    do_tcg_gen_qemu_ld_i32(val, addr, idx, memop);
+}
+
+static inline void do_tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     memop = tcg_canonicalize_memop(memop, 0, 1);
     gen_ldst_i32(INDEX_op_qemu_st_i32, val, addr, memop, idx);
 }
 
+void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    trace_guest_vmem_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+                         addr, tcg_memop_size(memop), 1);
+    do_tcg_gen_qemu_st_i32(val, addr, idx, memop);
+}
+
 void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
+    trace_guest_vmem_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+                         addr, tcg_memop_size(memop), 0);
+
     if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
-        tcg_gen_qemu_ld_i32(TCGV_LOW(val), addr, idx, memop);
+        do_tcg_gen_qemu_ld_i32(TCGV_LOW(val), addr, idx, memop);
         if (memop & MO_SIGN) {
             tcg_gen_sari_i32(TCGV_HIGH(val), TCGV_LOW(val), 31);
         } else {
@@ -1934,8 +1957,11 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 
 void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
+    trace_guest_vmem_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+                         addr, tcg_memop_size(memop), 0);
+
     if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
-        tcg_gen_qemu_st_i32(TCGV_LOW(val), addr, idx, memop);
+        do_tcg_gen_qemu_st_i32(TCGV_LOW(val), addr, idx, memop);
         return;
     }
 
diff --git a/trace-events b/trace-events
index f986c81..1088fe0 100644
--- a/trace-events
+++ b/trace-events
@@ -1890,3 +1890,16 @@ qio_channel_command_new_pid(void *ioc, int writefd, int readfd, int pid) "Comman
 qio_channel_command_new_spawn(void *ioc, const char *binary, int flags) "Command new spawn ioc=%p binary=%s flags=%d"
 qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
 qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"
+
+### Guest events, keep at bottom
+
+# @vaddr: Access' virtual address.
+# @size : Access' size (bytes).
+# @store: Whether the access is a store.
+#
+# Start virtual memory access (before any potential access violation).
+#
+# Does not include memory accesses performed by devices.
+#
+# Targets: TCG(all)
+disable vcpu tcg guest_vmem(TCGv vaddr, uint8_t size, uint8_t store) "size=%d store=%d", "vaddr=0x%016"PRIx64" size=%d store=%d"
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
---
 include/exec/cpu_ldst_template.h          |   17 +++++++++++++++
 include/exec/cpu_ldst_useronly_template.h |   14 ++++++++++++
 tcg/tcg-op.c                              |   34 ++++++++++++++++++++++++++---
 trace-events                              |   13 +++++++++++
 4 files changed, 74 insertions(+), 4 deletions(-)