From patchwork Fri May 31 17:44:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681978 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1AE54C25B75 for ; Fri, 31 May 2024 17:46:59 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6Jh-0001jy-6N; Fri, 31 May 2024 13:45:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6Jf-0001jA-CN for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:23 -0400 Received: from mail-pl1-x636.google.com ([2607:f8b0:4864:20::636]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Jb-0007Tq-Bz for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:23 -0400 Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1f44b42d1caso18081995ad.0 for ; Fri, 31 May 2024 10:45:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177518; x=1717782318; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gkDPJZZ6oFc98bcOGd+vVk2JDeWVu0QYYXVhgDKykJw=; b=LPS9xjJsDGF6ZRK4A7CPWJscSGsDFC2nElheKD4lWJkNOnBlbfMxaTxch+NqUexN3a Gn4BQV7aJ5MRXokZAoPSZKDuPfwq++VdBi8wekbPZX7MViXhUrFvyKIap59k80k3fz7F LVEJdX8yt3v30+nqbNubGHiVYhWv/z0SfzRUdJeJ56AWB0iFA0RuSRYh86u1Y7qCMqb3 IcMHfsf7hQrd/dcucQJPA4QB5gLv2YYLmxUAWZ2vrQ0XXkqgRMXkvB1Spvl7cpeB2LQb YP6ts/FnkGLRGX0xliRFA3MEC2vwFQdXpeFoPe5Q0PsKfv9j4wWX3TrXAJ5Fhefn8bQB xPEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177518; x=1717782318; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gkDPJZZ6oFc98bcOGd+vVk2JDeWVu0QYYXVhgDKykJw=; b=TJuDzl/9L7SiTTsUs9pJh3zLJ6EILyJyEu4M2hYN2wUDtYTNWw4FI4qCv/gHF+kqU5 kk29AItEEoedgKb8b+DIgpCMa16CGidbqdDfTojERPaD3OGZNsYpDYjZy3aBF9EbfueV NwxZ4bU6Osw+G8h+flYvHFA3gXFnh5WToSs6CqjDrFrvpPuEzqAEtguPYrlZTMFR2aia +/YPtWUmj71q6Jdfuh6iXvVUaSBgCtM3Vkqd0Wqep8r2od//ZdyU/aLIKJ6w/DnbdORq tAGXjOsqOjspg3GHw+0TmNr3j/ixMzLy8E9xe+my8KrEG5OpTIKEwmbvwJOvlN1xWryJ Y6hQ== X-Gm-Message-State: AOJu0Yy90rlrbLOMxAxdVtmP7neqU7+nWyytPMf80BZVwriI3DgMoDJA NLjpq2ZOniJw5LQfwNx4NrBzIRRTo7YQojLMrwsQ9hxVZQRNxj6aarlH5KIVPTJ/N2V2tC0FJzy dJvympPyO+DnpWerpluOjz4+DcohV+uDTBSvrbdPQLt9kGIqrOoJezwy+i3ywe25LAw7v4OMZr4 1LBMLR2RlJw1zhFmojhI5eC1tSaCI33TUiG/VrqQ== X-Google-Smtp-Source: AGHT+IEegyQ3zmhFdvdlKucd4ilLr6J2UoyqzrlcipflnsQVvYBqsg/Vu5dgRz2bJTb/6BrdoMmTBg== X-Received: by 2002:a17:903:32d2:b0:1e5:3c5:55a5 with SMTP id d9443c01a7336-1f636fd9da4mr25829665ad.8.1717177517274; Fri, 31 May 2024 10:45:17 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:17 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Palmer Dabbelt , Alistair Francis , Bin Meng , Weiwei Li , Liu Zhiwei , Richard Henderson Subject: [RFC PATCH v2 1/6] target/riscv: Separate vector segment ld/st instructions Date: Sat, 1 Jun 2024 01:44:48 +0800 Message-Id: <20240531174504.281461-2-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::636; envelope-from=max.chou@sifive.com; helo=mail-pl1-x636.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This commit separate the helper function implementations of vector segment load/store instructions from other vector load/store instructions. This can improve performance by avoiding unnecessary segment operation when NF = 1. Signed-off-by: Max Chou --- target/riscv/helper.h | 4 + target/riscv/insn32.decode | 11 ++- target/riscv/insn_trans/trans_rvv.c.inc | 61 +++++++++++++++ target/riscv/vector_helper.c | 100 +++++++++++++++++++++--- 4 files changed, 164 insertions(+), 12 deletions(-) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index 451261ce5a4..aaf68eadfb7 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -158,18 +158,22 @@ DEF_HELPER_FLAGS_3(hyp_hsv_d, TCG_CALL_NO_WG, void, env, tl, tl) /* Vector functions */ DEF_HELPER_3(vsetvl, tl, env, tl, tl) DEF_HELPER_5(vle8_v, void, ptr, ptr, tl, env, i32) +DEF_HELPER_5(vlsege8_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle16_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle32_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle64_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle8_v_mask, void, ptr, ptr, tl, env, i32) +DEF_HELPER_5(vlsege8_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle16_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle32_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vle64_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse8_v, void, ptr, ptr, tl, env, i32) +DEF_HELPER_5(vssege8_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse16_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse32_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse64_v, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse8_v_mask, void, ptr, ptr, tl, env, i32) +DEF_HELPER_5(vssege8_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse16_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse32_v_mask, void, ptr, ptr, tl, env, i32) DEF_HELPER_5(vse64_v_mask, void, ptr, ptr, tl, env, i32) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index f22df04cfd1..0712e9f6314 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -77,6 +77,7 @@ @r2 ....... ..... ..... ... ..... ....... &r2 %rs1 %rd @r2_vm_1 ...... . ..... ..... ... ..... ....... &rmr vm=1 %rs2 %rd @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd +@r2_nf_1_vm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm nf=1 %rs1 %rd @r2_vm ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd @r1_vm ...... vm:1 ..... ..... ... ..... ....... %rd @r_nfvm ... ... vm:1 ..... ..... ... ..... ....... &rnfvm %nf %rs2 %rs1 %rd @@ -349,11 +350,17 @@ hsv_d 0110111 ..... ..... 100 00000 1110011 @r2_s # *** Vector loads and stores are encoded within LOADFP/STORE-FP *** # Vector unit-stride load/store insns. -vle8_v ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm +{ + vle8_v 000 000 . 00000 ..... 000 ..... 0000111 @r2_nf_1_vm + vlsege8_v ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm +} vle16_v ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm vle32_v ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm vle64_v ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm -vse8_v ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm +{ + vse8_v 000 000 . 00000 ..... 000 ..... 0100111 @r2_nf_1_vm + vssege8_v ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm +} vse16_v ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm vse32_v ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm vse64_v ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index 3a3896ba06c..1e4fa797a86 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -719,6 +719,40 @@ GEN_VEXT_TRANS(vle16_v, MO_16, r2nfvm, ld_us_op, ld_us_check) GEN_VEXT_TRANS(vle32_v, MO_32, r2nfvm, ld_us_op, ld_us_check) GEN_VEXT_TRANS(vle64_v, MO_64, r2nfvm, ld_us_op, ld_us_check) +static bool ld_us_seg_op(DisasContext *s, arg_r2nfvm *a, uint8_t eew) +{ + uint32_t data = 0; + gen_helper_ldst_us *fn; + static gen_helper_ldst_us * const fns[2][4] = { + /* masked unit stride load */ + { gen_helper_vlsege8_v_mask, gen_helper_vle16_v_mask, + gen_helper_vle32_v_mask, gen_helper_vle64_v_mask }, + /* unmasked unit stride load */ + { gen_helper_vlsege8_v, gen_helper_vle16_v, + gen_helper_vle32_v, gen_helper_vle64_v } + }; + + fn = fns[a->vm][eew]; + if (fn == NULL) { + return false; + } + + /* + * Vector load/store instructions have the EEW encoded + * directly in the instructions. The maximum vector size is + * calculated with EMUL rather than LMUL. + */ + uint8_t emul = vext_get_emul(s, eew); + data = FIELD_DP32(data, VDATA, VM, a->vm); + data = FIELD_DP32(data, VDATA, LMUL, emul); + data = FIELD_DP32(data, VDATA, NF, a->nf); + data = FIELD_DP32(data, VDATA, VTA, s->vta); + data = FIELD_DP32(data, VDATA, VMA, s->vma); + return ldst_us_trans(a->rd, a->rs1, data, fn, s, false); +} + +GEN_VEXT_TRANS(vlsege8_v, MO_8, r2nfvm, ld_us_seg_op, ld_us_check) + static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t eew) { uint32_t data = 0; @@ -756,6 +790,33 @@ GEN_VEXT_TRANS(vse16_v, MO_16, r2nfvm, st_us_op, st_us_check) GEN_VEXT_TRANS(vse32_v, MO_32, r2nfvm, st_us_op, st_us_check) GEN_VEXT_TRANS(vse64_v, MO_64, r2nfvm, st_us_op, st_us_check) +static bool st_us_seg_op(DisasContext *s, arg_r2nfvm *a, uint8_t eew) +{ + uint32_t data = 0; + gen_helper_ldst_us *fn; + static gen_helper_ldst_us * const fns[2][4] = { + /* masked unit stride store */ + { gen_helper_vssege8_v_mask, gen_helper_vse16_v_mask, + gen_helper_vse32_v_mask, gen_helper_vse64_v_mask }, + /* unmasked unit stride store */ + { gen_helper_vssege8_v, gen_helper_vse16_v, + gen_helper_vse32_v, gen_helper_vse64_v } + }; + + fn = fns[a->vm][eew]; + if (fn == NULL) { + return false; + } + + uint8_t emul = vext_get_emul(s, eew); + data = FIELD_DP32(data, VDATA, VM, a->vm); + data = FIELD_DP32(data, VDATA, LMUL, emul); + data = FIELD_DP32(data, VDATA, NF, a->nf); + return ldst_us_trans(a->rd, a->rs1, data, fn, s, true); +} + +GEN_VEXT_TRANS(vssege8_v, MO_8, r2nfvm, st_us_seg_op, st_us_check) + /* *** unit stride mask load and store */ diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index 1b4d5a8e378..440c33c141b 100644 --- a/target/riscv/vector_helper.c +++ b/target/riscv/vector_helper.c @@ -201,6 +201,32 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base, uint32_t desc, uint32_t vm, vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uintptr_t ra) +{ + uint32_t i; + uint32_t max_elems = vext_max_elems(desc, log2_esz); + uint32_t esz = 1 << log2_esz; + uint32_t vma = vext_vma(desc); + + for (i = env->vstart; i < env->vl; i++, env->vstart++) { + if (!vm && !vext_elem_mask(v0, i)) { + /* set masked-off elements to 1s */ + vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz); + continue; + } + target_ulong addr = base + stride * i; + ldst_elem(env, adjust_addr(env, addr), i, vd, ra); + } + env->vstart = 0; + + vext_set_tail_elems_1s(env->vl, vd, desc, 1, esz, max_elems); +} + +static void +vext_ldst_stride_segment(void *vd, void *v0, target_ulong base, + target_ulong stride, CPURISCVState *env, + uint32_t desc, uint32_t vm, + vext_ldst_elem_fn *ldst_elem, + uint32_t log2_esz, uintptr_t ra) { uint32_t i, k; uint32_t nf = vext_nf(desc); @@ -236,8 +262,8 @@ void HELPER(NAME)(void *vd, void * v0, target_ulong base, \ uint32_t desc) \ { \ uint32_t vm = vext_vm(desc); \ - vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \ - ctzl(sizeof(ETYPE)), GETPC()); \ + vext_ldst_stride_segment(vd, v0, base, stride, env, desc, vm, \ + LOAD_FN, ctzl(sizeof(ETYPE)), GETPC()); \ } GEN_VEXT_LD_STRIDE(vlse8_v, int8_t, lde_b) @@ -251,8 +277,8 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ uint32_t desc) \ { \ uint32_t vm = vext_vm(desc); \ - vext_ldst_stride(vd, v0, base, stride, env, desc, vm, STORE_FN, \ - ctzl(sizeof(ETYPE)), GETPC()); \ + vext_ldst_stride_segment(vd, v0, base, stride, env, desc, vm, \ + STORE_FN, ctzl(sizeof(ETYPE)), GETPC()); \ } GEN_VEXT_ST_STRIDE(vsse8_v, int8_t, ste_b) @@ -269,6 +295,26 @@ static void vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl, uintptr_t ra) +{ + uint32_t i; + uint32_t max_elems = vext_max_elems(desc, log2_esz); + uint32_t esz = 1 << log2_esz; + + /* load bytes from guest memory */ + for (i = env->vstart; i < evl; i++, env->vstart++) { + target_ulong addr = base + (i << log2_esz); + ldst_elem(env, adjust_addr(env, addr), i, vd, ra); + } + env->vstart = 0; + + vext_set_tail_elems_1s(evl, vd, desc, 1, esz, max_elems); +} + +/* unmasked unit-stride segment load and store operation */ +static void +vext_ldst_us_segment(void *vd, target_ulong base, CPURISCVState *env, + uint32_t desc, vext_ldst_elem_fn *ldst_elem, + uint32_t log2_esz, uint32_t evl, uintptr_t ra) { uint32_t i, k; uint32_t nf = vext_nf(desc); @@ -312,10 +358,27 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ } +#define GEN_VEXT_LD_US_SEG(NAME, ETYPE, LOAD_FN) \ +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \ + CPURISCVState *env, uint32_t desc) \ +{ \ + uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE)); \ + vext_ldst_stride_segment(vd, v0, base, stride, env, desc, false, \ + LOAD_FN, ctzl(sizeof(ETYPE)), GETPC()); \ +} \ + \ +void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ + CPURISCVState *env, uint32_t desc) \ +{ \ + vext_ldst_us_segment(vd, base, env, desc, LOAD_FN, \ + ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ +} + GEN_VEXT_LD_US(vle8_v, int8_t, lde_b) -GEN_VEXT_LD_US(vle16_v, int16_t, lde_h) -GEN_VEXT_LD_US(vle32_v, int32_t, lde_w) -GEN_VEXT_LD_US(vle64_v, int64_t, lde_d) +GEN_VEXT_LD_US_SEG(vlsege8_v, int8_t, lde_b) +GEN_VEXT_LD_US_SEG(vle16_v, int16_t, lde_h) +GEN_VEXT_LD_US_SEG(vle32_v, int32_t, lde_w) +GEN_VEXT_LD_US_SEG(vle64_v, int64_t, lde_d) #define GEN_VEXT_ST_US(NAME, ETYPE, STORE_FN) \ void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \ @@ -333,10 +396,27 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ } +#define GEN_VEXT_ST_US_SEG(NAME, ETYPE, STORE_FN) \ +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \ + CPURISCVState *env, uint32_t desc) \ +{ \ + uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE)); \ + vext_ldst_stride_segment(vd, v0, base, stride, env, desc, false, \ + STORE_FN, ctzl(sizeof(ETYPE)), GETPC()); \ +} \ + \ +void HELPER(NAME)(void *vd, void *v0, target_ulong base, \ + CPURISCVState *env, uint32_t desc) \ +{ \ + vext_ldst_us_segment(vd, base, env, desc, STORE_FN, \ + ctzl(sizeof(ETYPE)), env->vl, GETPC()); \ +} + GEN_VEXT_ST_US(vse8_v, int8_t, ste_b) -GEN_VEXT_ST_US(vse16_v, int16_t, ste_h) -GEN_VEXT_ST_US(vse32_v, int32_t, ste_w) -GEN_VEXT_ST_US(vse64_v, int64_t, ste_d) +GEN_VEXT_ST_US_SEG(vssege8_v, int8_t, ste_b) +GEN_VEXT_ST_US_SEG(vse16_v, int16_t, ste_h) +GEN_VEXT_ST_US_SEG(vse32_v, int32_t, ste_w) +GEN_VEXT_ST_US_SEG(vse64_v, int64_t, ste_d) /* * unit stride mask load and store, EEW = 1 From patchwork Fri May 31 17:44:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681973 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E40ACC25B75 for ; Fri, 31 May 2024 17:46:03 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6Jr-0001qz-NA; Fri, 31 May 2024 13:45:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6Jq-0001qX-26 for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:34 -0400 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Jj-0007WY-14 for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:33 -0400 Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1f6342c5fa8so8737075ad.1 for ; Fri, 31 May 2024 10:45:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177522; x=1717782322; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gNUjy3Lyly2ANVQLK35XiynkZqf6DtnuQXKWv8PGktk=; b=FoQpFzhv18orLpNtdahmGjjyvTDziPEu2rWiFuKariBndk72dNPgAO7+2Epi3IYS9s fb1nWUB2oFiGjAJ85ANpcJyXhqeGlgrgUBoRjZiNkXaAp528DeYA3YHdzNfRS6U52TN2 dAcPHlMhdln/kNh2QUWAxhKpWTvfMxCoNCbiXnc3fiJcfZsmmj2Vh6vAwQK2u8aAAJ99 FDBlfKmlW03DZSeJy7+qRxKktA9qYhSff4vdaU3B68pKdkx5ej6Wo7vgmm37e5uGv1wC DlXTPBN98THgGIYJuQx+qTCQGRMlf0YmUpOf+wldn03noR7g2+bpBf6CoqBDHRFwP/g9 DrDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177522; x=1717782322; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gNUjy3Lyly2ANVQLK35XiynkZqf6DtnuQXKWv8PGktk=; b=AViJjq2K2aOKdf1Fo2tFqQQ73TkRIy9J78DQxVEqU16tjTFU0C6UGjgOJdP0rYaPbI viMnnMSvGkJ4J2oiGIm1u7hmVOD6rjfQEKJLcEgA7SDUgn/GjzTm7SyqFvaikmZHRGv7 47AX+Fq6vu7fzpdzWH4nDUl02nbyXbnug+DRB3HvxGC0u2FKbs68CID4IZ+n7K0JOxUg HHtE3PHqLLzBiEe6Uj6SWORYTPG3oTgxSUign/XiWcXBxp6SgNafAJMJTZ0eLeUuWY0T MrIQIesMtBztVlwY4FhwVF39fes8ioUtxd2Bs9qDXb1vCNg212nWQFLA6zOg66ewaVG1 By2w== X-Gm-Message-State: AOJu0YxNdUyuizJTzYNEV64ZjOTbfVAdI4ZePrYqiUzsjbRiRlClVwCX Z8k6gBKCI6ZVDZYKhI1dr5dfAMT7ElSdYp+LVHiRAIOS01EahihMuUXubiXs1Ky/J3Mw4kCOQ4S xPybXKCncI5rQtPDNT9nRT5JFMol/fR+v0shAqOsZ2a+seY9rMSgbDf60X3dVpLbS0V7cGrTa6v 7f8DPoHPaiYQDROBm2U4ePm0xeV4LXoGYAangZMw== X-Google-Smtp-Source: AGHT+IGUGjtUJ1hN8pGESw4gO6VOz43UlJixyBVehNle/WKiJXMIR/pDwvAq8FI01JD1YEoatIYU5g== X-Received: by 2002:a17:902:da89:b0:1f6:2ab1:6037 with SMTP id d9443c01a7336-1f6370ad379mr32146895ad.53.1717177522174; Fri, 31 May 2024 10:45:22 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:21 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Richard Henderson , Paolo Bonzini Subject: [RFC PATCH v2 2/6] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb Date: Sat, 1 Jun 2024 01:44:49 +0800 Message-Id: <20240531174504.281461-3-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::635; envelope-from=max.chou@sifive.com; helo=mail-pl1-x635.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org If there are not any QEMU plugin memory callback functions, checking before calling the qemu_plugin_vcpu_mem_cb function can reduce the function call overhead. Signed-off-by: Max Chou --- accel/tcg/ldst_common.c.inc | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/accel/tcg/ldst_common.c.inc b/accel/tcg/ldst_common.c.inc index c82048e377e..87ceb954873 100644 --- a/accel/tcg/ldst_common.c.inc +++ b/accel/tcg/ldst_common.c.inc @@ -125,7 +125,9 @@ void helper_st_i128(CPUArchState *env, uint64_t addr, Int128 val, MemOpIdx oi) static void plugin_load_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi) { - qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R); + if (cpu_plugin_mem_cbs_enabled(env_cpu(env))) { + qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R); + } } uint8_t cpu_ldb_mmu(CPUArchState *env, abi_ptr addr, MemOpIdx oi, uintptr_t ra) @@ -188,7 +190,9 @@ Int128 cpu_ld16_mmu(CPUArchState *env, abi_ptr addr, static void plugin_store_cb(CPUArchState *env, abi_ptr addr, MemOpIdx oi) { - qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W); + if (cpu_plugin_mem_cbs_enabled(env_cpu(env))) { + qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_W); + } } void cpu_stb_mmu(CPUArchState *env, abi_ptr addr, uint8_t val, From patchwork Fri May 31 17:44:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681976 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9AB18C25B75 for ; Fri, 31 May 2024 17:46:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6Jp-0001q1-72; Fri, 31 May 2024 13:45:33 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6Jn-0001na-Jj for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:31 -0400 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Jk-0007YG-Id for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:31 -0400 Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1f612d7b0f5so14727105ad.0 for ; Fri, 31 May 2024 10:45:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177527; x=1717782327; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=E78ZGy3OpFHzGq7HIlOvBGwIH3ekaZUgpTv+1PnTGBc=; b=FlgR1ZAnRfbSXr8atAAqIt02wICb46u/x2dkAwJ5xqeBjWoZ5TQzyDlrhR6wYDUZP2 vYfWNoDXTcewrcFskZW/7PGAC9SKuu/7oZfXXQRVkJ98HpzkCNeV/ZePDM2zZQJcreWh 7mrzNM/YWvtnIH7WHD0KUARxOlFu9daa/YxZCiONe5xBquWtPvOP2NRsH2L1rg4Q4AxJ 4NBYNNH6iMISSxI/6D8gSNVay7yLMKabIPt4QvYfNYR2i3DaiLJHlwBgp4F4CAidjyCc Mu4seL6zOrSlpYtv64UUxWsHqExI4DIxN7mz7EoeUUjtSMO2E8SMWDN8vQTklgmNZjVA REoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177527; x=1717782327; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=E78ZGy3OpFHzGq7HIlOvBGwIH3ekaZUgpTv+1PnTGBc=; b=vcRZ2w3wV/ju0vch1gq2OppiPYv1IWfqYFi6EG/pZOMnPdvlLdSODrDZGVpF2FTI8X qQYg0+RfmNYestK/CCFGdAFqofDnKF8N/WUVRX+BrsA2awZgx0mefYxvYlkXE+n78Z6f SzdoI6OugVVMDWgMArXXqn+zWJrDHqZdFCkqjtMh/ijrjMsSjq13k7qjaBrTq4PFQ6iR CYM1WzGy1ltqf/xMYjTkFXLf5DQYDl8dtcQT2V8GaXQx8hpAw2tcUxhHs5BrGCWimNBu Prv0oVxwG44ug+abopiGX+7JbaqcEq/cO/n5vZ6GCV0iC7obSdSlFzpNL5KIgBP6lHS4 TmEg== X-Gm-Message-State: AOJu0YyjeHVk8NaHNoJGZ70qYJyuoIXT+JEDhVjH6ObNUL0Kg9gFFTcQ a3yb131KzGo2s8tO3+NrMV5NGFw0etQrJXJaL+8RbqSuqUNSfOh1uF8HuAfbshfsejT8UWjB7n1 Q+xMCORQ6YRDU4eYj0MT7Kb+eGd6gk8iB19jT0zp+uJBhTLWTSrfL3jFRH+eifhoiOyGzWawGZJ VepASYBOYfQ1sZnvEl0Scf+mXmIheWTZn236kw9w== X-Google-Smtp-Source: AGHT+IEUMBJF6b1FvcfqYtWJcok/WJziyY+Fc3Fm4liI4hAtAOHeA1SNfJWhq2p23lAqFMz8Td3JCA== X-Received: by 2002:a17:902:ecc6:b0:1f6:2fca:361f with SMTP id d9443c01a7336-1f635a8302amr36876745ad.29.1717177526862; Fri, 31 May 2024 10:45:26 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:26 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Richard Henderson , Palmer Dabbelt , Alistair Francis , Bin Meng , Weiwei Li , Liu Zhiwei Subject: [RFC PATCH v2 3/6] target/riscv: Inline vext_ldst_us and corresponding function for performance Date: Sat, 1 Jun 2024 01:44:50 +0800 Message-Id: <20240531174504.281461-4-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::635; envelope-from=max.chou@sifive.com; helo=mail-pl1-x635.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org In the vector unit-stride load/store helper functions. the vext_ldst_us function corresponding most of the execution time. Inline the functions can avoid the function call overhead to improve the helper function performance. Signed-off-by: Max Chou Reviewed-by: Richard Henderson --- target/riscv/vector_helper.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index 440c33c141b..cb7267c3217 100644 --- a/target/riscv/vector_helper.c +++ b/target/riscv/vector_helper.c @@ -149,25 +149,27 @@ static inline void vext_set_elem_mask(void *v0, int index, typedef void vext_ldst_elem_fn(CPURISCVState *env, abi_ptr addr, uint32_t idx, void *vd, uintptr_t retaddr); -#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF) \ -static void NAME(CPURISCVState *env, abi_ptr addr, \ - uint32_t idx, void *vd, uintptr_t retaddr)\ -{ \ - ETYPE *cur = ((ETYPE *)vd + H(idx)); \ - *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr); \ -} \ +#define GEN_VEXT_LD_ELEM(NAME, ETYPE, H, LDSUF) \ +static inline QEMU_ALWAYS_INLINE \ +void NAME(CPURISCVState *env, abi_ptr addr, \ + uint32_t idx, void *vd, uintptr_t retaddr) \ +{ \ + ETYPE *cur = ((ETYPE *)vd + H(idx)); \ + *cur = cpu_##LDSUF##_data_ra(env, addr, retaddr); \ +} \ GEN_VEXT_LD_ELEM(lde_b, int8_t, H1, ldsb) GEN_VEXT_LD_ELEM(lde_h, int16_t, H2, ldsw) GEN_VEXT_LD_ELEM(lde_w, int32_t, H4, ldl) GEN_VEXT_LD_ELEM(lde_d, int64_t, H8, ldq) -#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF) \ -static void NAME(CPURISCVState *env, abi_ptr addr, \ - uint32_t idx, void *vd, uintptr_t retaddr)\ -{ \ - ETYPE data = *((ETYPE *)vd + H(idx)); \ - cpu_##STSUF##_data_ra(env, addr, data, retaddr); \ +#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF) \ +static inline QEMU_ALWAYS_INLINE \ +void NAME(CPURISCVState *env, abi_ptr addr, \ + uint32_t idx, void *vd, uintptr_t retaddr) \ +{ \ + ETYPE data = *((ETYPE *)vd + H(idx)); \ + cpu_##STSUF##_data_ra(env, addr, data, retaddr); \ } GEN_VEXT_ST_ELEM(ste_b, int8_t, H1, stb) @@ -291,7 +293,7 @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d) */ /* unmasked unit-stride load and store operation */ -static void +static inline QEMU_ALWAYS_INLINE void vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc, vext_ldst_elem_fn *ldst_elem, uint32_t log2_esz, uint32_t evl, uintptr_t ra) From patchwork Fri May 31 17:44:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5DF8CC41513 for ; Fri, 31 May 2024 17:47:00 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6Jt-0001s1-LP; Fri, 31 May 2024 13:45:37 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6Jr-0001qy-5q for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:35 -0400 Received: from mail-pl1-x632.google.com ([2607:f8b0:4864:20::632]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Jp-0007cO-CU for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:34 -0400 Received: by mail-pl1-x632.google.com with SMTP id d9443c01a7336-1f44b45d6abso16475715ad.0 for ; Fri, 31 May 2024 10:45:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177532; x=1717782332; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VglV/qt6rHvbvSgSjaTMT0boXZYOnbayc2OjwjqMVsk=; b=Rvi7d5AWZrvm3VJKexTywZB/I2w6wKfkF5sgJ+mTSvjqE32Fc6N5gC6bX64cEYJqGC qZN6hott+L2dy2Zz3ykM+RybSMszuzrRU/dBFCsh67AYYltLHCIevLQkbNb/rG5NRxXt Na9R8xBlESB06aS4+rhghZns4o/AFbUIhQDk2QsrwNjbMj5qFjorDgO//XDIpZRExJAn L4GcvvXJcvwIazXJsE2jzDB2yaKdLdgENh774ZiH3EpQ6FfpTVDE9+KisWTdvEZis++L MmpAmaMcndkkAE8GPEX/NkkOkCwzHKHM89fxeNchlZv6suu3LnMb/MdGEO0dmUu5aQMQ Fmtw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177532; x=1717782332; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VglV/qt6rHvbvSgSjaTMT0boXZYOnbayc2OjwjqMVsk=; b=rmqbAamSn/zXmI56mwOzRUt3ruvvARVYuqMYRJ+CRSDWwVfZKd/dal6gNtsi1GQoCS PwQEZRcqNQH+ft7Rhws0y6Gb0u/vYltXMnBWePmJtlSwfzuOu0lyQDQmzPDJZtdkSNpN Ozud+IPg8fmXpawneU3uhklLtn+Z+7v3W5Pjx86r6kBja6FPh6D9qDeWbMXXoCg6N87D TfQhyEjWmBE/K87Gr24HfTGyfbrhHHZdw2aO8JP/t26sOoTszW4BAE1eoB7V7IvtjAJf cdZqnLi1q4zZbmYBGHpcFoKHp9D1DX+8qiQLbvTIhxVZps9IeHF2dYDegY+PsxWAv8Vx s79Q== X-Gm-Message-State: AOJu0Yxx9cVvYld9aCPgqroo6DwqM+yt0IqUJDVD3h6ROXrafpL9YMrg MD48z+oGg38pvvJQDMwpAOqDmwSC8qUaZdcZVHJO1gyuz9pu8HWMt3X3PmzgmgQU88hMbtV0dQ6 mtfG448WYKDMqxl7drH+1Qgp591Rb/TX+GLaLvvXT4ZEDkoT2PprIT6QlDr81J02o5qd49Uwkb/ 6/MIgBnff7mha5B99CvQfhpHRTlBsNq2E4FmjRVw== X-Google-Smtp-Source: AGHT+IGrW1WvAV7CGEKP3jIuTMxCbXh4eSRvs9djyfvLCiCC4NI739KLdE7MiF4C/Z4Da1o5Am7XiQ== X-Received: by 2002:a17:903:22cb:b0:1f4:8372:db24 with SMTP id d9443c01a7336-1f63704b335mr33008705ad.38.1717177531741; Fri, 31 May 2024 10:45:31 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:31 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Palmer Dabbelt , Alistair Francis , Bin Meng , Weiwei Li , Liu Zhiwei Subject: [RFC PATCH v2 4/6] target/riscv: Add check_probe_[read|write] helper functions Date: Sat, 1 Jun 2024 01:44:51 +0800 Message-Id: <20240531174504.281461-5-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::632; envelope-from=max.chou@sifive.com; helo=mail-pl1-x632.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The helper_check_probe_[read|write] functions wrap the probe_pages function to perform virtual address resolution for continuous vector load/store instructions. Signed-off-by: Max Chou --- target/riscv/helper.h | 4 ++++ target/riscv/vector_helper.c | 12 ++++++++++++ 2 files changed, 16 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index aaf68eadfb7..f4bc907e85f 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1,6 +1,10 @@ /* Exceptions */ DEF_HELPER_2(raise_exception, noreturn, env, i32) +/* Probe page */ +DEF_HELPER_FLAGS_3(check_probe_read, TCG_CALL_NO_WG, void, env, tl, tl) +DEF_HELPER_FLAGS_3(check_probe_write, TCG_CALL_NO_WG, void, env, tl, tl) + /* Floating Point - rounding mode */ DEF_HELPER_FLAGS_2(set_rounding_mode, TCG_CALL_NO_WG, void, env, i32) DEF_HELPER_FLAGS_2(set_rounding_mode_chkfrm, TCG_CALL_NO_WG, void, env, i32) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index cb7267c3217..9263ab26b19 100644 --- a/target/riscv/vector_helper.c +++ b/target/riscv/vector_helper.c @@ -136,6 +136,18 @@ static void probe_pages(CPURISCVState *env, target_ulong addr, } } +void HELPER(check_probe_read)(CPURISCVState *env, target_ulong addr, + target_ulong len) +{ + probe_pages(env, addr, len, GETPC(), MMU_DATA_LOAD); +} + +void HELPER(check_probe_write)(CPURISCVState *env, target_ulong addr, + target_ulong len) +{ + probe_pages(env, addr, len, GETPC(), MMU_DATA_STORE); +} + static inline void vext_set_elem_mask(void *v0, int index, uint8_t value) { From patchwork Fri May 31 17:44:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 45890C27C4F for ; Fri, 31 May 2024 17:46:26 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6KB-00021t-47; Fri, 31 May 2024 13:45:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6Jx-0001ul-FM for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:44 -0400 Received: from mail-pl1-x62e.google.com ([2607:f8b0:4864:20::62e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Ju-0007fg-Tp for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:41 -0400 Received: by mail-pl1-x62e.google.com with SMTP id d9443c01a7336-1f62a628b4cso12636575ad.1 for ; Fri, 31 May 2024 10:45:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177536; x=1717782336; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=USFoNRQZ4JeBfsf5SGx5N9OQ/v8517aQy/lRnAVzBe4=; b=ETkdgQ6U18k2lsNHHT565TsJA53p2kKmxmA7XpaOLuf5B6A9eDmd8UZeDEQykj5SGp CwtIRoBdQ0Uqqp5INxktGie9vwngesoNWCpVhQO47rYErHUi7gVvXDi9xVb8jHTgCtvI aQtm1vDQ2Nmt1PhL+bdM9t8FlN0Z800RBtY2CcF6CN30eWJ2DLiBPVGD7d91Ut7b6hDP qJ4n0LFbRjI/YxdcOTaBJ+DGuCCa25WIhK+iNWSwm50e5E5Xbu3iVkfMHCkNbMuKirTx vjFBWK4OMpTRT2eef0OYWmgqa8fd+hoS0d1+q3OZf1mdaMVFujoWeJB6z2dMJC/XjZd/ nHtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177536; x=1717782336; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=USFoNRQZ4JeBfsf5SGx5N9OQ/v8517aQy/lRnAVzBe4=; b=C3q0dzlaIfv3Hku3Ksn838ptYrlp3A/xWXwOV1L0x63f0UCiH9oHbHFPll5BCwNw2s DYKTeAycvzMgAnZWQ/81geidj0fbts456jyL3SvaoZAZJ9kWtaWUKI1+EAvo2flBEWeg Z+KqVPs2ZvZ2FxUvnxA6irx/j2DdVkP6Uh/wOFqlB8hpma8kIvIOl4WvYVUBXCxxZGi1 il3O8kwFLJ3Bdzd2ZB4n3CKLG4hjBKgMPNqZeao06gwBu/ArScALUjF+qxbC76jOLlk7 FZTaxxyi60EpDzzyA6erU8JerABSC9M5noSs1hmCY3D8CaXVh8MPHyJf+OKh/TcAqZKs Bqqg== X-Gm-Message-State: AOJu0YyWvGe/q1aySq92JeD95aJ4gGynLTS1scNNNRRvPcejJKFN7Rp7 /fNQuR5SAE35hsUxOHZOV83RzeTIO4lVs1/L5DQE0tZedhIm1PNkz9E4ZLsSf6XPN1v4Ksg7xwu uuGV8/iR5IOYWwO1zbT3Sjm17N+7hNYZ5p4z1018F7H3ny5Lr0U95/VpDTxkh43GqBEa63kR6US wLmA//VranruNm56ZVFnyXPW7rxUCIVVTDFym4cQ== X-Google-Smtp-Source: AGHT+IFOytKgGcIwhycDTkjN3u3UrI7ejFH27Zz0XVNEoWlUGGCQYnvJc5JDHxbCZRuqO+fH+b72Eg== X-Received: by 2002:a17:902:d4d0:b0:1f4:f02f:cb14 with SMTP id d9443c01a7336-1f6370a0c13mr32942015ad.47.1717177536013; Fri, 31 May 2024 10:45:36 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:35 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Palmer Dabbelt , Alistair Francis , Bin Meng , Weiwei Li , Liu Zhiwei , Richard Henderson Subject: [RFC PATCH v2 5/6] target/riscv: rvv: Optimize v[l|s]e8.v with limitations Date: Sat, 1 Jun 2024 01:44:52 +0800 Message-Id: <20240531174504.281461-6-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::62e; envelope-from=max.chou@sifive.com; helo=mail-pl1-x62e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The vector unit-stride load/store instructions (e.g. vle8.v/vse8.v) perform continuous load/store. We can replace the corresponding helper functions by TCG ops to copy more data at a time with following assumptions: * Perform virtual address resolution once for entire vector at beginning * Without mask * Without tail agnostic * Both host and target are little endian Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 197 +++++++++++++++++++++++- 1 file changed, 195 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index 1e4fa797a86..bbac73bb12b 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -714,7 +714,105 @@ static bool ld_us_check(DisasContext *s, arg_r2nfvm* a, uint8_t eew) vext_check_load(s, a->rd, a->nf, a->vm, eew); } -GEN_VEXT_TRANS(vle8_v, MO_8, r2nfvm, ld_us_op, ld_us_check) +static bool trans_vle8_v(DisasContext *s, arg_r2nfvm * a) +{ + + if (ld_us_check(s, a, MO_8)) { + if (!HOST_BIG_ENDIAN && s->vstart_eq_zero && s->vta == 0 && a->vm) { + uint32_t vofs = vreg_ofs(s, a->rd); + uint32_t midx = s->mem_idx; + + TCGv_i64 t0, t1; + TCGv_i128 t16; + TCGv_ptr tp; + TCGv_ptr i = tcg_temp_new_ptr(); + TCGv len_remain = tcg_temp_new(); + TCGv rs1 = get_gpr(s, a->rs1, EXT_NONE); + TCGv addr = tcg_temp_new(); + + TCGLabel *loop_128 = gen_new_label(); + TCGLabel *remain_64 = gen_new_label(); + TCGLabel *remain_32 = gen_new_label(); + TCGLabel *remain_16 = gen_new_label(); + TCGLabel *remain_8 = gen_new_label(); + TCGLabel *over = gen_new_label(); + + tcg_gen_mov_tl(addr, rs1); + tcg_gen_mov_tl(len_remain, cpu_vl); + tcg_gen_muli_tl(len_remain, len_remain, a->nf); + tcg_gen_movi_ptr(i, 0); + + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); + gen_helper_check_probe_read(tcg_env, addr, len_remain); + + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 16, remain_64); + + gen_set_label(loop_128); + + t16 = tcg_temp_new_i128(); + tcg_gen_qemu_ld_i128(t16, addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 16); + + tp = tcg_temp_new_ptr(); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 16); + + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + tcg_gen_extr_i128_i64(t0, t1, t16); + + tcg_gen_st_i64(t0, tp, vofs); + tcg_gen_st_i64(t1, tp, vofs + 8); + tcg_gen_subi_tl(len_remain, len_remain, 16); + + tcg_gen_brcondi_tl(TCG_COND_GEU, len_remain, 16, loop_128); + + gen_set_label(remain_64); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 8, remain_32); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 8); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 8); + tcg_gen_st_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 8); + + gen_set_label(remain_32); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 4, remain_16); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 4); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 4); + tcg_gen_st32_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 4); + + gen_set_label(remain_16); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 2, remain_8); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUW | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 2); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 2); + tcg_gen_st16_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 2); + + gen_set_label(remain_8); + tcg_gen_brcondi_tl(TCG_COND_EQ, len_remain, 0, over); + tcg_gen_qemu_ld_i64(t0, addr, midx, + MO_LE | MO_8 | MO_ATOM_NONE); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_st8_i64(t0, tp, vofs); + + gen_set_label(over); + + finalize_rvv_inst(s); + } else { + return ld_us_op(s, a, MO_8); + } + return true; + } + return false; +} + GEN_VEXT_TRANS(vle16_v, MO_16, r2nfvm, ld_us_op, ld_us_check) GEN_VEXT_TRANS(vle32_v, MO_32, r2nfvm, ld_us_op, ld_us_check) GEN_VEXT_TRANS(vle64_v, MO_64, r2nfvm, ld_us_op, ld_us_check) @@ -785,7 +883,102 @@ static bool st_us_check(DisasContext *s, arg_r2nfvm* a, uint8_t eew) vext_check_store(s, a->rd, a->nf, eew); } -GEN_VEXT_TRANS(vse8_v, MO_8, r2nfvm, st_us_op, st_us_check) +static bool trans_vse8_v(DisasContext *s, arg_r2nfvm * a) +{ + if (st_us_check(s, a, MO_8)) { + if (!HOST_BIG_ENDIAN && s->vstart_eq_zero && s->vta == 0 && a->vm) { + uint32_t vofs = vreg_ofs(s, a->rd); + uint32_t midx = s->mem_idx; + + TCGv_i64 t0, t1; + TCGv_i128 t16; + TCGv_ptr tp; + TCGv_ptr i = tcg_temp_new_ptr(); + TCGv len_remain = tcg_temp_new(); + TCGv rs1 = get_gpr(s, a->rs1, EXT_NONE); + TCGv addr = tcg_temp_new(); + + TCGLabel *loop_128 = gen_new_label(); + TCGLabel *remain_64 = gen_new_label(); + TCGLabel *remain_32 = gen_new_label(); + TCGLabel *remain_16 = gen_new_label(); + TCGLabel *remain_8 = gen_new_label(); + TCGLabel *over = gen_new_label(); + + tcg_gen_mov_tl(addr, rs1); + tcg_gen_mov_tl(len_remain, cpu_vl); + tcg_gen_muli_tl(len_remain, len_remain, a->nf); + tcg_gen_movi_ptr(i, 0); + + tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over); + gen_helper_check_probe_write(tcg_env, addr, len_remain); + + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 16, remain_64); + + gen_set_label(loop_128); + + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + tp = tcg_temp_new_ptr(); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_ld_i64(t1, tp, vofs + 8); + tcg_gen_addi_ptr(i, i, 16); + + t16 = tcg_temp_new_i128(); + tcg_gen_concat_i64_i128(t16, t0, t1); + + tcg_gen_qemu_st_i128(t16, addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 16); + tcg_gen_subi_tl(len_remain, len_remain, 16); + + tcg_gen_brcondi_tl(TCG_COND_GEU, len_remain, 16, loop_128); + + gen_set_label(remain_64); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 8, remain_32); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 8); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 8); + tcg_gen_subi_tl(len_remain, len_remain, 8); + + gen_set_label(remain_32); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 4, remain_16); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 4); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 4); + tcg_gen_subi_tl(len_remain, len_remain, 4); + + gen_set_label(remain_16); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 2, remain_8); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 2); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUW | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 2); + tcg_gen_subi_tl(len_remain, len_remain, 2); + + gen_set_label(remain_8); + tcg_gen_brcondi_tl(TCG_COND_EQ, len_remain, 0, over); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LE | MO_8 | MO_ATOM_NONE); + + gen_set_label(over); + + finalize_rvv_inst(s); + } else { + return st_us_op(s, a, MO_8); + } + return true; + } + return false; +} + GEN_VEXT_TRANS(vse16_v, MO_16, r2nfvm, st_us_op, st_us_check) GEN_VEXT_TRANS(vse32_v, MO_32, r2nfvm, st_us_op, st_us_check) GEN_VEXT_TRANS(vse64_v, MO_64, r2nfvm, st_us_op, st_us_check) From patchwork Fri May 31 17:44:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Max Chou X-Patchwork-Id: 13681980 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 281DEC25B75 for ; Fri, 31 May 2024 17:47:04 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sD6KE-00029D-QY; Fri, 31 May 2024 13:45:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sD6K0-0001z7-JU for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:46 -0400 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sD6Jy-0007iI-8N for qemu-devel@nongnu.org; Fri, 31 May 2024 13:45:44 -0400 Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1f4a5344ec7so17660415ad.1 for ; Fri, 31 May 2024 10:45:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sifive.com; s=google; t=1717177541; x=1717782341; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gCcFfZSpG0DTooCBgGsai6/Bi/5BGaMYihmO+rZ2ifU=; b=GW65MqIac/6mJdgzc3ZF6BCT4E1D5Vq/bQfGE8JV2udLkXquYKggwGXKY0Xu3MNrBs Vp9Snn9xtOsWu6cMm8ei6bbydUQJQYmrMyHwqlNgcd6TbQS5+HjRR7PLGFyxmRJJErq/ VIX79CSb2MT8Wn069qFKUXsPiucWJiK90vW5xoysEGw8tUb0GoR7UhNJcY9gh9kFkbET 4NE85bgbiCVgIKETvtMtyiQgkFeiFBbS7IWQnm5xV03rB7Bw/+f7QlUrGPdolt77t93X 6MQtx+8+uDo4F/+pVDDyaef4EJ9lUFi8mpwFYRT45I+nAzG9ZRVr9B25kVeNOBuJqgf2 xFlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717177541; x=1717782341; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gCcFfZSpG0DTooCBgGsai6/Bi/5BGaMYihmO+rZ2ifU=; b=w/x3W/ubN6urzonBvFZF6EVLb+wRriu1V2spoltTuyakRGiz2VNZJrz13FYbD7FENt KcJn4QwgCFIgQmlFyw78ECpufzYZkThZnmGukJVMTO9u2Aq6G35l654/uwDM2VkTjeAY NBT2Oc7PMENztc4udWHssKJLO1hAKTHrFrlyNnoLn4A3e1GW1eso951tD4xODBG2xSyQ erk5kWBWIk9DXUsCEMk6uaCWYPe1LB6BKaPOPT8Hurn+UwyGvWej46uNSnki8UaqNkFG uYhxmU/egyqof/4Zq1raP7eLvv41A1S3euwmQGEnguQyMLBoQNKadUqumiWPpXPo8ph1 N6pA== X-Gm-Message-State: AOJu0Yz7HRv+fHB+4XEB7ODF/0KsE/aeLcZmEbSG+SRCrhybdoXDdZWX UgGR1sAgn933YVy8dQFdvXEeSSNrbmuC8akhwqyuzcLVjtI82wBztQFDrkC0TLDaD/pIE6M/vGf cnI2Wy+tsQMoTxjiakl8t6ISFbAmADiILUnM6osUecqaii4UP0+TWTdKgjYwtiAv3bgLNyfCccf WwifKYdyCnEfj8Ax3yC2aXX58MDNHbhfsO5EvrwQ== X-Google-Smtp-Source: AGHT+IFtLgu8sT264xYDuQlZcz7mHFjTM8M3L7/mtylmHcLtt/e8zP6zBA5ztZLACl5ufm+3GwGwxA== X-Received: by 2002:a17:902:6806:b0:1f4:8a31:5a4c with SMTP id d9443c01a7336-1f635a4f54bmr34390495ad.24.1717177540488; Fri, 31 May 2024 10:45:40 -0700 (PDT) Received: from duncan.localdomain (114-35-142-126.hinet-ip.hinet.net. [114.35.142.126]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f63236ce6dsm19389875ad.95.2024.05.31.10.45.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 31 May 2024 10:45:40 -0700 (PDT) From: Max Chou To: qemu-devel@nongnu.org, qemu-riscv@nongnu.org Cc: dbarboza@ventanamicro.com, Max Chou , Palmer Dabbelt , Alistair Francis , Bin Meng , Weiwei Li , Liu Zhiwei , Richard Henderson Subject: [RFC PATCH v2 6/6] target/riscv: rvv: Optimize vl8re8.v/vs8r.v with limitations Date: Sat, 1 Jun 2024 01:44:53 +0800 Message-Id: <20240531174504.281461-7-max.chou@sifive.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240531174504.281461-1-max.chou@sifive.com> References: <20240531174504.281461-1-max.chou@sifive.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::635; envelope-from=max.chou@sifive.com; helo=mail-pl1-x635.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The vector load/store whole register instructions (e.g. vl8re8.v/vs8r.v) perform unmasked continuous load/store. We can optimize these instructions by replacing the corresponding helper functions by TCG ops to copy more data at a time with following assumptions: * Host and target are little endian Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 196 +++++++++++++++++++++++- 1 file changed, 194 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index bbac73bb12b..44763ccec06 100644 --- a/target/riscv/insn_trans/trans_rvv.c.inc +++ b/target/riscv/insn_trans/trans_rvv.c.inc @@ -1402,11 +1402,108 @@ GEN_LDST_WHOLE_TRANS(vl4re8_v, 4) GEN_LDST_WHOLE_TRANS(vl4re16_v, 4) GEN_LDST_WHOLE_TRANS(vl4re32_v, 4) GEN_LDST_WHOLE_TRANS(vl4re64_v, 4) -GEN_LDST_WHOLE_TRANS(vl8re8_v, 8) GEN_LDST_WHOLE_TRANS(vl8re16_v, 8) GEN_LDST_WHOLE_TRANS(vl8re32_v, 8) GEN_LDST_WHOLE_TRANS(vl8re64_v, 8) +static bool trans_vl8re8_v(DisasContext *s, arg_r2 * a) +{ + if (require_rvv(s) && QEMU_IS_ALIGNED(a->rd, 8)) { + if (!HOST_BIG_ENDIAN && s->vstart_eq_zero) { + uint32_t vofs = vreg_ofs(s, a->rd); + uint32_t midx = s->mem_idx; + uint32_t evl = s->cfg_ptr->vlenb << 3; + + TCGv_i64 t0, t1; + TCGv_i128 t16; + TCGv_ptr tp; + TCGv_ptr i = tcg_temp_new_ptr(); + TCGv len_remain = tcg_temp_new(); + TCGv rs1 = get_gpr(s, a->rs1, EXT_NONE); + TCGv addr = tcg_temp_new(); + + TCGLabel *loop_128 = gen_new_label(); + TCGLabel *remain_64 = gen_new_label(); + TCGLabel *remain_32 = gen_new_label(); + TCGLabel *remain_16 = gen_new_label(); + TCGLabel *remain_8 = gen_new_label(); + TCGLabel *over = gen_new_label(); + + tcg_gen_mov_tl(addr, rs1); + tcg_gen_movi_tl(len_remain, evl); + tcg_gen_movi_ptr(i, 0); + + tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over); + gen_helper_check_probe_read(tcg_env, addr, len_remain); + + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 16, remain_64); + + gen_set_label(loop_128); + + t16 = tcg_temp_new_i128(); + tcg_gen_qemu_ld_i128(t16, addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 16); + + tp = tcg_temp_new_ptr(); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 16); + + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + tcg_gen_extr_i128_i64(t0, t1, t16); + + tcg_gen_st_i64(t0, tp, vofs); + tcg_gen_st_i64(t1, tp, vofs + 8); + tcg_gen_subi_tl(len_remain, len_remain, 16); + + tcg_gen_brcondi_tl(TCG_COND_GEU, len_remain, 16, loop_128); + + gen_set_label(remain_64); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 8, remain_32); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 8); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 8); + tcg_gen_st_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 8); + + gen_set_label(remain_32); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 4, remain_16); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 4); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 4); + tcg_gen_st32_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 4); + + gen_set_label(remain_16); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 2, remain_8); + tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUW | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 2); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_addi_ptr(i, i, 2); + tcg_gen_st16_i64(t0, tp, vofs); + tcg_gen_subi_tl(len_remain, len_remain, 2); + + gen_set_label(remain_8); + tcg_gen_brcondi_tl(TCG_COND_EQ, len_remain, 0, over); + tcg_gen_qemu_ld_i64(t0, addr, midx, + MO_LE | MO_8 | MO_ATOM_NONE); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_st8_i64(t0, tp, vofs); + + gen_set_label(over); + + finalize_rvv_inst(s); + } else { + return ldst_whole_trans(a->rd, a->rs1, 8, gen_helper_vl8re8_v, s); + } + return true; + } + return false; +} + /* * The vector whole register store instructions are encoded similar to * unmasked unit-stride store of elements with EEW=8. @@ -1414,7 +1511,102 @@ GEN_LDST_WHOLE_TRANS(vl8re64_v, 8) GEN_LDST_WHOLE_TRANS(vs1r_v, 1) GEN_LDST_WHOLE_TRANS(vs2r_v, 2) GEN_LDST_WHOLE_TRANS(vs4r_v, 4) -GEN_LDST_WHOLE_TRANS(vs8r_v, 8) + +static bool trans_vs8r_v(DisasContext *s, arg_r2 * a) +{ + if (require_rvv(s) && QEMU_IS_ALIGNED(a->rd, 8)) { + if (!HOST_BIG_ENDIAN && s->vstart_eq_zero) { + uint32_t vofs = vreg_ofs(s, a->rd); + uint32_t midx = s->mem_idx; + uint32_t evl = s->cfg_ptr->vlenb << 3; + + TCGv_i64 t0, t1; + TCGv_i128 t16; + TCGv_ptr tp; + TCGv_ptr i = tcg_temp_new_ptr(); + TCGv len_remain = tcg_temp_new(); + TCGv rs1 = get_gpr(s, a->rs1, EXT_NONE); + TCGv addr = tcg_temp_new(); + + TCGLabel *loop_128 = gen_new_label(); + TCGLabel *remain_64 = gen_new_label(); + TCGLabel *remain_32 = gen_new_label(); + TCGLabel *remain_16 = gen_new_label(); + TCGLabel *remain_8 = gen_new_label(); + TCGLabel *over = gen_new_label(); + + tcg_gen_mov_tl(addr, rs1); + tcg_gen_movi_tl(len_remain, evl); + tcg_gen_movi_ptr(i, 0); + + tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over); + gen_helper_check_probe_write(tcg_env, addr, len_remain); + + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 16, remain_64); + + gen_set_label(loop_128); + + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + tp = tcg_temp_new_ptr(); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_ld_i64(t1, tp, vofs + 8); + tcg_gen_addi_ptr(i, i, 16); + + t16 = tcg_temp_new_i128(); + tcg_gen_concat_i64_i128(t16, t0, t1); + + tcg_gen_qemu_st_i128(t16, addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 16); + tcg_gen_subi_tl(len_remain, len_remain, 16); + + tcg_gen_brcondi_tl(TCG_COND_GEU, len_remain, 16, loop_128); + + gen_set_label(remain_64); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 8, remain_32); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 8); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 8); + tcg_gen_subi_tl(len_remain, len_remain, 8); + + gen_set_label(remain_32); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 4, remain_16); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 4); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 4); + tcg_gen_subi_tl(len_remain, len_remain, 4); + + gen_set_label(remain_16); + tcg_gen_brcondi_tl(TCG_COND_LTU, len_remain, 2, remain_8); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_addi_ptr(i, i, 2); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUW | MO_ATOM_NONE); + tcg_gen_addi_tl(addr, addr, 2); + tcg_gen_subi_tl(len_remain, len_remain, 2); + + gen_set_label(remain_8); + tcg_gen_brcondi_tl(TCG_COND_EQ, len_remain, 0, over); + tcg_gen_add_ptr(tp, tcg_env, i); + tcg_gen_ld_i64(t0, tp, vofs); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LE | MO_8 | MO_ATOM_NONE); + + gen_set_label(over); + + finalize_rvv_inst(s); + } else { + return ldst_whole_trans(a->rd, a->rs1, 8, gen_helper_vl8re8_v, s); + } + return true; + } + return false; +} /* *** Vector Integer Arithmetic Instructions