Message ID | 20160531183928.29406-3-bobby.prani@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On 05/31/2016 11:39 AM, Pranith Kumar wrote:
> +    case INDEX_op_mb:
> +        tcg_out_mb(s);

You need to look at the barrier type and DTRT.  In particular, the Linux
smp_rmb and smp_wmb types need not emit any code.

> +    { INDEX_op_mb, { "r" } },

You certainly do *not* need the constant argument loaded into a register.
This should remain { }.


r~
On Tue, May 31, 2016 at 4:27 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/31/2016 11:39 AM, Pranith Kumar wrote:
>>
>> +    case INDEX_op_mb:
>> +        tcg_out_mb(s);
>
> You need to look at the barrier type and DTRT.  In particular, the Linux
> smp_rmb and smp_wmb types need not emit any code.

These are converted to 'lfence' and 'sfence' instructions. Depending on
the target backend, I think we still need to emit barrier instructions.
For example, if the target backend is ARMv7, we need to emit a 'dmb'
instruction for both x86 fence instructions. I am not sure why they
would not emit any code.

>> +    { INDEX_op_mb, { "r" } },
>
> You certainly do *not* need the constant argument loaded into a register.
> This should remain { }.

OK, I will fix this.

Thanks,
On 06/01/2016 11:49 AM, Pranith Kumar wrote:
> On Tue, May 31, 2016 at 4:27 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 05/31/2016 11:39 AM, Pranith Kumar wrote:
>>>
>>> +    case INDEX_op_mb:
>>> +        tcg_out_mb(s);
>>
>> You need to look at the barrier type and DTRT.  In particular, the Linux
>> smp_rmb and smp_wmb types need not emit any code.
>
> These are converted to 'lfence' and 'sfence' instructions. Depending on
> the target backend, I think we still need to emit barrier instructions.
> For example, if the target backend is ARMv7, we need to emit a 'dmb'
> instruction for both x86 fence instructions. I am not sure why they
> would not emit any code.

Because x86 has a strong memory model.

It does not require barriers to keep normal loads and stores in order.  The
primary reason for the *fence instructions is to order the "non-temporal"
memory operations that are part of the SSE instruction set, which we're not
using at all.

This is why you'll find

/*
 * Because of the strongly ordered storage model, wmb() and rmb() are nops
 * here (a compiler barrier only).  QEMU doesn't do accesses to write-combining
 * qemu memory or non-temporal load/stores from C code.
 */
#define smp_wmb()   barrier()
#define smp_rmb()   barrier()

for x86 and s390.


r~
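(As an aside: barrier() in the snippet above is a compiler-level fence only. A minimal sketch of the usual GCC-style definition, shown for illustration rather than as QEMU's exact macro:)

```c
/* Compiler-only barrier: emits no machine instruction.  The empty asm
   with a "memory" clobber merely stops the compiler from reordering
   memory accesses across this point. */
#define barrier()   __asm__ __volatile__("" ::: "memory")
```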
On Wed, Jun 1, 2016 at 5:17 PM, Richard Henderson <rth@twiddle.net> wrote:
>
> Because x86 has a strong memory model.
>
> It does not require barriers to keep normal loads and stores in order.  The
> primary reason for the *fence instructions is to order the "non-temporal"
> memory operations that are part of the SSE instruction set, which we're not
> using at all.
>
> This is why you'll find
>
> /*
>  * Because of the strongly ordered storage model, wmb() and rmb() are nops
>  * here (a compiler barrier only).  QEMU doesn't do accesses to write-combining
>  * qemu memory or non-temporal load/stores from C code.
>  */
> #define smp_wmb()   barrier()
> #define smp_rmb()   barrier()
>
> for x86 and s390.

OK, for the x86 target that is true; I think I got the context confused.
On an x86 target we can elide the read and write barriers, but we still
need to generate 'mfence' to prevent store-after-load reordering. I will
refine this in the next version.

Thanks,
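(A minimal sketch of what that refinement might look like, assuming the barrier kind is passed to the backend as the op's constant argument. The TCG_MO_ST_LD flag name below is illustrative of a "stores must stay ordered before later loads" bit, not something defined by this patch:)

```c
/* Hypothetical barrier-type-aware version for an x86 host: only a
   store-after-load (full) barrier needs a real instruction; pure
   read and write barriers need no code at all on x86. */
static void tcg_out_mb(TCGContext *s, TCGArg a0)
{
    if (a0 & TCG_MO_ST_LD) {   /* assumed "order stores before later loads" flag */
        /* mfence */
        tcg_out8(s, 0x0f);
        tcg_out8(s, 0xae);
        tcg_out8(s, 0xf0);
    }
    /* smp_rmb/smp_wmb-style barriers: nothing to emit. */
}
```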
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 8fd37f4..1fd5a99 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -121,6 +121,16 @@ static bool have_cmov;
 # define have_cmov 0
 #endif
 
+/* For 32-bit, we are going to attempt to determine at runtime whether
+   sse2 support is available.  */
+#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
+# define have_sse2 1
+#elif defined(CONFIG_CPUID_H) && defined(bit_SSE2)
+static bool have_sse2;
+#else
+# define have_sse2 0
+#endif
+
 /* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
    going to attempt to determine at runtime whether movbe is available.  */
 #if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
@@ -686,6 +696,21 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s)
+{
+    if (have_sse2) {
+        /* mfence */
+        tcg_out8(s, 0x0f);
+        tcg_out8(s, 0xae);
+        tcg_out8(s, 0xf0);
+    } else {
+        /* lock orl $0,0(%esp) */
+        tcg_out8(s, 0xf0);
+        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
+        tcg_out8(s, 0);
+    }
+}
+
 static inline void tcg_out_push(TCGContext *s, int reg)
 {
     tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
@@ -2114,6 +2139,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2179,6 +2207,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
     { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
 
+    { INDEX_op_mb, { "r" } },
+
 #if TCG_TARGET_REG_BITS == 32
     { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
@@ -2356,6 +2386,11 @@ static void tcg_target_init(TCGContext *s)
        available, we'll use a small forward branch.  */
     have_cmov = (d & bit_CMOV) != 0;
 #endif
+#ifndef have_sse2
+    /* Likewise, almost all hardware supports SSE2, but we do
+       have a locked memory operation to use as a substitute.  */
+    have_sse2 = (d & bit_SSE2) != 0;
+#endif
 #ifndef have_movbe
     /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
        need to probe for it.  */
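(A side note on the non-SSE2 fallback in tcg_out_mb above: any LOCK-prefixed read-modify-write acts as a full barrier on x86, which is what the emitted "lock orl $0,0(%esp)" bytes correspond to. Roughly the inline-asm equivalent, shown only as an illustration, not as code from the patch:)

```c
/* Illustration of the fallback full barrier used when mfence (SSE2)
   is unavailable: a LOCK-prefixed RMW on any memory location orders
   both loads and stores on x86. */
static inline void full_barrier_fallback(void)
{
    int dummy = 0;
    __asm__ __volatile__("lock; orl $0, %0" : "+m"(dummy) : : "memory");
}
```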