[06/37] target/i386: add ALU load/writeback core

Message ID 20220911230418.340941-7-pbonzini@redhat.com (mailing list archive)
State New, archived
Series target/i386: new decoder + AVX implementation

Commit Message

Paolo Bonzini Sept. 11, 2022, 11:03 p.m. UTC
Add generic code generation that takes care of preparing operands
around calls to decode.e.gen in a table-driven manner, so that ALU
operations need not take care of that.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/tcg/decode-new.c.inc |  20 +++-
 target/i386/tcg/decode-new.h     |   1 +
 target/i386/tcg/emit.c.inc       | 152 +++++++++++++++++++++++++++++++
 target/i386/tcg/translate.c      |  24 +++++
 4 files changed, 195 insertions(+), 2 deletions(-)

Comments

Richard Henderson Sept. 12, 2022, 10:02 a.m. UTC | #1
On 9/12/22 00:03, Paolo Bonzini wrote:
> Add generic code generation that takes care of preparing operands
> around calls to decode.e.gen in a table-driven manner, so that ALU
> operations need not take care of that.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/tcg/decode-new.c.inc |  20 +++-
>   target/i386/tcg/decode-new.h     |   1 +
>   target/i386/tcg/emit.c.inc       | 152 +++++++++++++++++++++++++++++++
>   target/i386/tcg/translate.c      |  24 +++++
>   4 files changed, 195 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index de8ef51a2d..7f76051b2d 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -228,7 +228,7 @@ static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp
>               *ot = MO_64;
>               return true;
>           }
> -        if (s->vex_l && e->s0 != X86_SIZE_qq) {
> +        if (s->vex_l && e->s0 != X86_SIZE_qq && e->s1 != X86_SIZE_qq) {
>               return false;
>           }

Squash back?

> diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
> index e86364ffc1..6fa0062d6a 100644
> --- a/target/i386/tcg/emit.c.inc
> +++ b/target/i386/tcg/emit.c.inc
> @@ -29,3 +29,155 @@ static void gen_load_ea(DisasContext *s, AddressParts *mem)
>       TCGv ea = gen_lea_modrm_1(s, *mem);
>       gen_lea_v_seg(s, s->aflag, ea, mem->def_seg, s->override);
>   }
> +
> +static void gen_mmx_offset(TCGv_ptr ptr, X86DecodedOp *op)
> +{
> +    if (!op->has_ea) {
> +        op->offset = offsetof(CPUX86State, fpregs[op->n].mmx);
> +    } else {
> +        op->offset = offsetof(CPUX86State, mmx_t0);
> +    }
> +    tcg_gen_addi_ptr(ptr, cpu_env, op->offset);

It's a shame to generate this so early, when you don't know if you'll need it. Better to 
build these in the gen_binary_int_sse helper, immediately before they're required?
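Something along these lines, I suppose (sketch only -- I'm guessing at the
eventual gen_binary_int_sse signature and reusing SSEFunc_0_epp from
translate.c; the op[]/ptrN fields are the ones this series adds):

    /* Decode path: record the offset only, emit no TCG ops yet.  */
    static void gen_mmx_offset(X86DecodedOp *op)
    {
        op->offset = op->has_ea ? offsetof(CPUX86State, mmx_t0)
                                : offsetof(CPUX86State, fpregs[op->n].mmx);
    }

    /* Emitter: materialize the pointers right before the helper call.  */
    static void gen_binary_int_sse(DisasContext *s, X86DecodedInsn *decode,
                                   SSEFunc_0_epp fn)
    {
        tcg_gen_addi_ptr(s->ptr0, cpu_env, decode->op[0].offset);
        tcg_gen_addi_ptr(s->ptr2, cpu_env, decode->op[2].offset);
        fn(cpu_env, s->ptr0, s->ptr2);
    }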

> +
> +    /*
> +     * ptr is for passing to helpers, and points to the MMXReg; op->offset
> +     * is for TCG ops and points to the operand.
> +     */
> +    if (op->ot == MO_32) {
> +        op->offset += offsetof(MMXReg, MMX_L(0));
> +    }

I guess you'd need an op->offset_base if you do the above...
Switch and g_assert_not_reached on invalid ot?
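For instance, with offset_base as the new field (and covering all four
element sizes, which may be more than the ALU ops here actually need):

    static void gen_mmx_offset(X86DecodedOp *op)
    {
        if (!op->has_ea) {
            op->offset_base = offsetof(CPUX86State, fpregs[op->n].mmx);
        } else {
            op->offset_base = offsetof(CPUX86State, mmx_t0);
        }

        /* offset_base is for helper pointers; offset is for TCG ops on the operand.  */
        switch (op->ot) {
        case MO_8:
            op->offset = op->offset_base + offsetof(MMXReg, MMX_B(0));
            break;
        case MO_16:
            op->offset = op->offset_base + offsetof(MMXReg, MMX_W(0));
            break;
        case MO_32:
            op->offset = op->offset_base + offsetof(MMXReg, MMX_L(0));
            break;
        case MO_64:
            op->offset = op->offset_base + offsetof(MMXReg, MMX_Q(0));
            break;
        default:
            g_assert_not_reached();
        }
    }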

> +static int xmm_offset(MemOp ot)
> +{
> +    if (ot == MO_8) {
> +        return offsetof(ZMMReg, ZMM_B(0));
> +    } else if (ot == MO_16) {
> +        return offsetof(ZMMReg, ZMM_W(0));
> +    } else if (ot == MO_32) {
> +        return offsetof(ZMMReg, ZMM_L(0));
> +    } else if (ot == MO_64) {
> +        return offsetof(ZMMReg, ZMM_Q(0));
> +    } else if (ot == MO_128) {
> +        return offsetof(ZMMReg, ZMM_X(0));
> +    } else if (ot == MO_256) {
> +        return offsetof(ZMMReg, ZMM_Y(0));
> +    } else {
> +       abort();

Switch, g_assert_not_reached().
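i.e.

    static int xmm_offset(MemOp ot)
    {
        switch (ot) {
        case MO_8:
            return offsetof(ZMMReg, ZMM_B(0));
        case MO_16:
            return offsetof(ZMMReg, ZMM_W(0));
        case MO_32:
            return offsetof(ZMMReg, ZMM_L(0));
        case MO_64:
            return offsetof(ZMMReg, ZMM_Q(0));
        case MO_128:
            return offsetof(ZMMReg, ZMM_X(0));
        case MO_256:
            return offsetof(ZMMReg, ZMM_Y(0));
        default:
            g_assert_not_reached();
        }
    }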

> +static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs)
> +{
> +    if (ot == MO_8) {
> +        gen_op_ld_v(s, MO_8, temp, s->A0);
> +        tcg_gen_st8_tl(temp, cpu_env, dest_ofs);
> +    } else if (ot == MO_16) {
> +        gen_op_ld_v(s, MO_16, temp, s->A0);
> +        tcg_gen_st16_tl(temp, cpu_env, dest_ofs);
> +    } else if (ot == MO_32) {
> +        gen_op_ld_v(s, MO_32, temp, s->A0);
> +        tcg_gen_st32_tl(temp, cpu_env, dest_ofs);
> +    } else if (ot == MO_64) {
> +        gen_ldq_env_A0(s, dest_ofs);
> +    } else if (ot == MO_128) {
> +        gen_ldo_env_A0(s, dest_ofs);
> +    } else if (ot == MO_256) {
> +        gen_ldy_env_A0(s, dest_ofs);
> +    }

Likewise.

> +static void gen_writeback(DisasContext *s, X86DecodedOp *op)
> +{
> +    switch (op->unit) {
> +    case X86_OP_SKIP:
> +        break;
> +    case X86_OP_SEG:
> +        /* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
> +        gen_movl_seg_T0(s, op->n);
> +        if (s->base.is_jmp) {
> +            gen_jmp_im(s, s->pc - s->cs_base);
> +            if (op->n == R_SS) {
> +                s->flags &= ~HF_TF_MASK;
> +                gen_eob_inhibit_irq(s, true);
> +            } else {
> +                gen_eob(s);
> +            }
> +        }
> +        break;
> +    case X86_OP_CR:
> +    case X86_OP_DR:
> +        /* TBD */
> +        break;

Leave these adjacent with default abort until needed?
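i.e. just

    case X86_OP_CR:
    case X86_OP_DR:
    default:
        g_assert_not_reached();

until a later patch actually implements CR/DR writeback.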

> +    default:
> +        abort();
> +    }

g_assert_not_reached.

> +static inline void gen_ldy_env_A0(DisasContext *s, int offset)
> +{
> +    int mem_index = s->mem_index;
> +    gen_ldo_env_A0(s, offset);
> +    tcg_gen_addi_tl(s->tmp0, s->A0, 16);
> +    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
> +    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2)));
> +    tcg_gen_addi_tl(s->tmp0, s->A0, 24);
> +    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
> +    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3)));
> +}
> +
> +static inline void gen_sty_env_A0(DisasContext *s, int offset)
> +{
> +    int mem_index = s->mem_index;
> +    gen_sto_env_A0(s, offset);
> +    tcg_gen_addi_tl(s->tmp0, s->A0, 16);
> +    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2)));
> +    tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
> +    tcg_gen_addi_tl(s->tmp0, s->A0, 24);
> +    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3)));
> +    tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
> +}

No need for inline markers.

Note that there's an outstanding patch set that enforces alignment restrictions (for 
ldy/sty it would only be for vmovdqa etc):

https://lore.kernel.org/qemu-devel/20220830034816.57091-2-ricky@rzhou.org/

but it's definitely something that ought to be built into the new decoder from the start.
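A rough sketch of how the ymm case could carry that from day one (the align
flag and its plumbing from the vmovdqa/vmovdqu emitters are assumptions on my
part; MO_ALIGN_32 is the existing MemOp flag):

    static void gen_ldy_env_A0(DisasContext *s, int offset, bool align)
    {
        int mem_index = s->mem_index;
        MemOp mop = MO_LEUQ | (align ? MO_ALIGN_32 : 0);

        /* Only the first access checks the 32-byte alignment of the operand.  */
        tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, mem_index, mop);
        tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(0)));
        tcg_gen_addi_tl(s->tmp0, s->A0, 8);
        tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
        tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(1)));
        tcg_gen_addi_tl(s->tmp0, s->A0, 16);
        tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
        tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2)));
        tcg_gen_addi_tl(s->tmp0, s->A0, 24);
        tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
        tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3)));
    }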


r~

Patch

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index de8ef51a2d..7f76051b2d 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -228,7 +228,7 @@  static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp
             *ot = MO_64;
             return true;
         }
-        if (s->vex_l && e->s0 != X86_SIZE_qq) {
+        if (s->vex_l && e->s0 != X86_SIZE_qq && e->s1 != X86_SIZE_qq) {
             return false;
         }
         *ot = MO_128;
@@ -741,7 +741,23 @@  static target_ulong disas_insn_new(DisasContext *s, CPUState *cpu, int b)
     if (decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea) {
         gen_load_ea(s, &decode.mem);
     }
-    decode.e.gen(s, env, &decode);
+    if (s->prefix & PREFIX_LOCK) {
+        if (decode.op[0].unit != X86_OP_INT || !decode.op[0].has_ea) {
+            goto illegal_op;
+        }
+        gen_load(s, s->T1, NULL, &decode.op[2], decode.immediate);
+        decode.e.gen(s, env, &decode);
+    } else {
+        if (decode.op[0].unit == X86_OP_MMX) {
+            gen_mmx_offset(s->ptr0, &decode.op[0]);
+        } else if (decode.op[0].unit == X86_OP_SSE) {
+            gen_xmm_offset(s->ptr0, &decode.op[0]);
+        }
+        gen_load(s, s->T0, s->ptr1, &decode.op[1], decode.immediate);
+        gen_load(s, s->T1, s->ptr2, &decode.op[2], decode.immediate);
+        decode.e.gen(s, env, &decode);
+        gen_writeback(s, &decode.op[0]);
+    }
     return s->pc;
  illegal_op:
     gen_illegal_opcode(s);
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index fb44560aae..a2d3c3867f 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -168,6 +168,7 @@  typedef struct X86DecodedOp {
     MemOp ot;     /* For b/c/d/p/s/q/v/w/y/z */
     X86OpUnit unit;
     bool has_ea;
+    int offset;   /* For MMX and SSE */
 } X86DecodedOp;
 
 struct X86DecodedInsn {
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index e86364ffc1..6fa0062d6a 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -29,3 +29,155 @@  static void gen_load_ea(DisasContext *s, AddressParts *mem)
     TCGv ea = gen_lea_modrm_1(s, *mem);
     gen_lea_v_seg(s, s->aflag, ea, mem->def_seg, s->override);
 }
+
+static void gen_mmx_offset(TCGv_ptr ptr, X86DecodedOp *op)
+{
+    if (!op->has_ea) {
+        op->offset = offsetof(CPUX86State, fpregs[op->n].mmx);
+    } else {
+        op->offset = offsetof(CPUX86State, mmx_t0);
+    }
+    tcg_gen_addi_ptr(ptr, cpu_env, op->offset);
+
+    /*
+     * ptr is for passing to helpers, and points to the MMXReg; op->offset
+     * is for TCG ops and points to the operand.
+     */
+    if (op->ot == MO_32) {
+        op->offset += offsetof(MMXReg, MMX_L(0));
+    }
+}
+
+static int xmm_offset(MemOp ot)
+{
+    if (ot == MO_8) {
+        return offsetof(ZMMReg, ZMM_B(0));
+    } else if (ot == MO_16) {
+        return offsetof(ZMMReg, ZMM_W(0));
+    } else if (ot == MO_32) {
+        return offsetof(ZMMReg, ZMM_L(0));
+    } else if (ot == MO_64) {
+        return offsetof(ZMMReg, ZMM_Q(0));
+    } else if (ot == MO_128) {
+        return offsetof(ZMMReg, ZMM_X(0));
+    } else if (ot == MO_256) {
+        return offsetof(ZMMReg, ZMM_Y(0));
+    } else {
+       abort();
+    }
+}
+
+static void gen_xmm_offset(TCGv_ptr ptr, X86DecodedOp *op)
+{
+    if (!op->has_ea) {
+        op->offset = ZMM_OFFSET(op->n);
+    } else {
+        op->offset = offsetof(CPUX86State, xmm_t0);
+    }
+    /*
+     * ptr is for passing to helpers, and points to the ZMMReg; op->offset
+     * is for TCG ops (especially gvec) and points to the base of the vector.
+     */
+    tcg_gen_addi_ptr(ptr, cpu_env, op->offset);
+    op->offset += xmm_offset(op->ot);
+}
+
+static void gen_load_sse(DisasContext *s, TCGv temp, MemOp ot, int dest_ofs)
+{
+    if (ot == MO_8) {
+        gen_op_ld_v(s, MO_8, temp, s->A0);
+        tcg_gen_st8_tl(temp, cpu_env, dest_ofs);
+    } else if (ot == MO_16) {
+        gen_op_ld_v(s, MO_16, temp, s->A0);
+        tcg_gen_st16_tl(temp, cpu_env, dest_ofs);
+    } else if (ot == MO_32) {
+        gen_op_ld_v(s, MO_32, temp, s->A0);
+        tcg_gen_st32_tl(temp, cpu_env, dest_ofs);
+    } else if (ot == MO_64) {
+        gen_ldq_env_A0(s, dest_ofs);
+    } else if (ot == MO_128) {
+        gen_ldo_env_A0(s, dest_ofs);
+    } else if (ot == MO_256) {
+        gen_ldy_env_A0(s, dest_ofs);
+    }
+}
+
+static void gen_load(DisasContext *s, TCGv v, TCGv_ptr ptr, X86DecodedOp *op, uint64_t imm)
+{
+    switch (op->unit) {
+    case X86_OP_SKIP:
+        return;
+    case X86_OP_SEG:
+        tcg_gen_ld32u_tl(v, cpu_env,
+                         offsetof(CPUX86State,segs[op->n].selector));
+        break;
+    case X86_OP_CR:
+        tcg_gen_ld_tl(v, cpu_env, offsetof(CPUX86State, cr[op->n]));
+        break;
+    case X86_OP_DR:
+        tcg_gen_ld_tl(v, cpu_env, offsetof(CPUX86State, dr[op->n]));
+        break;
+    case X86_OP_INT:
+        if (op->has_ea) {
+            gen_op_ld_v(s, op->ot, v, s->A0);
+        } else {
+            gen_op_mov_v_reg(s, op->ot, v, op->n);
+        }
+        break;
+    case X86_OP_IMM:
+        tcg_gen_movi_tl(v, imm);
+        break;
+
+    case X86_OP_MMX:
+        gen_mmx_offset(ptr, op);
+        goto load_vector;
+
+    case X86_OP_SSE:
+        gen_xmm_offset(ptr, op);
+    load_vector:
+        if (op->has_ea) {
+            gen_load_sse(s, v, op->ot, op->offset);
+        }
+        break;
+
+    default:
+        abort();
+    }
+}
+
+static void gen_writeback(DisasContext *s, X86DecodedOp *op)
+{
+    switch (op->unit) {
+    case X86_OP_SKIP:
+        break;
+    case X86_OP_SEG:
+        /* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
+        gen_movl_seg_T0(s, op->n);
+        if (s->base.is_jmp) {
+            gen_jmp_im(s, s->pc - s->cs_base);
+            if (op->n == R_SS) {
+                s->flags &= ~HF_TF_MASK;
+                gen_eob_inhibit_irq(s, true);
+            } else {
+                gen_eob(s);
+            }
+        }
+        break;
+    case X86_OP_CR:
+    case X86_OP_DR:
+        /* TBD */
+        break;
+    case X86_OP_INT:
+        if (op->has_ea) {
+            gen_op_st_v(s, op->ot, s->T0, s->A0);
+        } else {
+            gen_op_mov_reg_v(s, op->ot, op->n, s->T0);
+        }
+        break;
+    case X86_OP_MMX:
+    case X86_OP_SSE:
+        break;
+    default:
+        abort();
+    }
+}
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index f66bf2ac79..7e9920e29c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2831,6 +2831,30 @@  static inline void gen_sto_env_A0(DisasContext *s, int offset)
     tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
 }
 
+static inline void gen_ldy_env_A0(DisasContext *s, int offset)
+{
+    int mem_index = s->mem_index;
+    gen_ldo_env_A0(s, offset);
+    tcg_gen_addi_tl(s->tmp0, s->A0, 16);
+    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2)));
+    tcg_gen_addi_tl(s->tmp0, s->A0, 24);
+    tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
+    tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3)));
+}
+
+static inline void gen_sty_env_A0(DisasContext *s, int offset)
+{
+    int mem_index = s->mem_index;
+    gen_sto_env_A0(s, offset);
+    tcg_gen_addi_tl(s->tmp0, s->A0, 16);
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2)));
+    tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
+    tcg_gen_addi_tl(s->tmp0, s->A0, 24);
+    tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3)));
+    tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
+}
+
 static inline void gen_op_movo(DisasContext *s, int d_offset, int s_offset)
 {
     tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(XMMReg, XMM_Q(0)));