From patchwork Tue Oct 18 17:24:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Simpson X-Patchwork-Id: 13010855 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5429BC433FE for ; Tue, 18 Oct 2022 17:33:32 +0000 (UTC) Received: from localhost ([::1]:32928 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1okqT4-0004ao-WA for qemu-devel@archiver.kernel.org; Tue, 18 Oct 2022 13:33:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:38084) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1okqKn-0007xv-KG for qemu-devel@nongnu.org; Tue, 18 Oct 2022 13:24:59 -0400 Received: from mx0b-0031df01.pphosted.com ([205.220.180.131]:14672) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1okqKl-0002Bu-EP for qemu-devel@nongnu.org; Tue, 18 Oct 2022 13:24:57 -0400 Received: from pps.filterd (m0279869.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 29IFU9kt023356; Tue, 18 Oct 2022 17:24:49 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=from : to : cc : subject : date : message-id : mime-version : content-type : content-transfer-encoding; s=qcppdkim1; bh=waYmqSjdyxfUQpQlp2ywcDx1KHg+mhuKzQkbBz38HdY=; b=VM4kRUMAYBaN9RO7E92kH5Wbaw8cRhFCxM2ytNi92hNmtd9QdGkqSKdKlTVl0UE0WmEu ulH0yH3+73oeQB+rV9L33woDf9DWfr/ndmltMCsfd5t5wnCfihgQkYUs9zTYihIBfQ+i CaacE0MwoW0eHCj9ls/0Kab5RINXMDe/25VwcOCfbO21NZO3GEZUhweKeUkisJsMeRgZ EGFlJlTDLrCM/AnKURI9lJTyqQ+J66xLLeg/Ntz3S0+qns7Zds8O8q/or8NNTsaUXyTO m7aqDWOC1EUNi80iK+wsaLF+y6GIEXcNPGao1vE+tZA5OLDZ54ZbeN4lge3J3t8c+LRa /A== Received: from nalasppmta04.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3k9jf21xrf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 18 Oct 2022 17:24:49 +0000 Received: from pps.filterd (NALASPPMTA04.qualcomm.com [127.0.0.1]) by NALASPPMTA04.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTP id 29IHOmCa008558; Tue, 18 Oct 2022 17:24:48 GMT Received: from pps.reinject (localhost [127.0.0.1]) by NALASPPMTA04.qualcomm.com (PPS) with ESMTPS id 3k7nxkpxuk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 18 Oct 2022 17:24:48 +0000 Received: from NALASPPMTA04.qualcomm.com (NALASPPMTA04.qualcomm.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 29IHOl2i008553; Tue, 18 Oct 2022 17:24:47 GMT Received: from hu-devc-lv-u18-c.qualcomm.com (hu-tsimpson-lv.qualcomm.com [10.47.235.220]) by NALASPPMTA04.qualcomm.com (PPS) with ESMTP id 29IHOlTT008552; Tue, 18 Oct 2022 17:24:47 +0000 Received: by hu-devc-lv-u18-c.qualcomm.com (Postfix, from userid 47164) id 5CCB15000A7; Tue, 18 Oct 2022 10:24:47 -0700 (PDT) From: Taylor Simpson To: qemu-devel@nongnu.org Cc: tsimpson@quicinc.com, richard.henderson@linaro.org, f4bug@amsat.org, ale@rev.ng, anjo@rev.ng, bcain@quicinc.com, quic_mathbern@quicinc.com Subject: [PATCH v2] Hexagon (target/hexagon) Fix predicated assignment to .tmp and .cur Date: Tue, 18 Oct 2022 10:24:46 -0700 Message-Id: <20221018172446.25766-1-tsimpson@quicinc.com> X-Mailer: git-send-email 2.17.1 MIME-Version: 1.0 X-QCInternal: smtphost X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-GUID: CIXKjEqNngimqVWf-tcTTlgrwoHzbnMT X-Proofpoint-ORIG-GUID: CIXKjEqNngimqVWf-tcTTlgrwoHzbnMT X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-10-18_06,2022-10-18_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 phishscore=0 impostorscore=0 clxscore=1011 lowpriorityscore=0 suspectscore=0 priorityscore=1501 mlxscore=0 mlxlogscore=861 adultscore=0 malwarescore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2209130000 definitions=main-2210180098 Received-SPF: pass client-ip=205.220.180.131; envelope-from=tsimpson@qualcomm.com; helo=mx0b-0031df01.pphosted.com X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" *** Changes in v2 *** Update test case to use both true and false predicates Add fix for .cur Here are example instructions with a predicated .tmp/.cur assignment if (p1) v12.tmp = vmem(r7 + #0) if (p0) v12.cur = vmem(r9 + #0) The .tmp/.cur indicates that references to v12 in the same packet take the result of the load. However, when the predicate is false, the value at the start of the packet should be used. After the packet commits, the .tmp value is dropped, but the .cur value is maintained. To fix this bug, we preload the original value from the HVX register into the temporary used for the result. Test cases added to tests/tcg/hexagon/hvx_misc.c Co-authored-by: Matheus Tavares Bernardino Signed-off-by: Matheus Tavares Bernardino Signed-off-by: Taylor Simpson --- target/hexagon/translate.h | 12 +++++- tests/tcg/hexagon/hvx_misc.c | 72 +++++++++++++++++++++++++++++++++ target/hexagon/gen_tcg_funcs.py | 16 ++++++++ 3 files changed, 99 insertions(+), 1 deletion(-) diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h index a245172827..2d563cea14 100644 --- a/target/hexagon/translate.h +++ b/target/hexagon/translate.h @@ -1,5 +1,5 @@ /* - * Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved. + * Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -83,6 +83,16 @@ static inline bool is_preloaded(DisasContext *ctx, int num) return test_bit(num, ctx->regs_written); } +static inline bool is_tmp_vreg_preloaded(DisasContext *ctx, int num) +{ + return test_bit(num, ctx->vregs_updated_tmp); +} + +static inline bool is_future_vreg_preloaded(DisasContext *ctx, int num) +{ + return test_bit(num, ctx->vregs_select); +} + intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum, int num, bool alloc_ok); intptr_t ctx_tmp_vreg_off(DisasContext *ctx, int regnum, diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c index 6e2c9ab3cd..53d5c9b44f 100644 --- a/tests/tcg/hexagon/hvx_misc.c +++ b/tests/tcg/hexagon/hvx_misc.c @@ -541,6 +541,75 @@ static void test_vshuff(void) check_output_b(__LINE__, 1); } +static void test_load_tmp_predicated(void) +{ + void *p0 = buffer0; + void *p1 = buffer1; + void *pout = output; + bool pred = true; + + for (int i = 0; i < BUFSIZE; i++) { + /* + * Load into v12 as .tmp with a predicate + * When the predicate is true, we get the vector from buffer1[i] + * When the predicate is false, we get a vector of all 1's + * Regardless of the predicate, the next packet should have + * a vector of all 1's + */ + asm("v3 = vmem(%0 + #0)\n\t" + "r1 = #1\n\t" + "v12 = vsplat(r1)\n\t" + "p1 = !cmp.eq(%3, #0)\n\t" + "{\n\t" + " if (p1) v12.tmp = vmem(%1 + #0)\n\t" + " v4.w = vadd(v12.w, v3.w)\n\t" + "}\n\t" + "v4.w = vadd(v4.w, v12.w)\n\t" + "vmem(%2 + #0) = v4\n\t" + : : "r"(p0), "r"(p1), "r"(pout), "r"(pred) + : "r1", "p1", "v12", "v3", "v4", "v6", "memory"); + p0 += sizeof(MMVector); + p1 += sizeof(MMVector); + pout += sizeof(MMVector); + + for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) { + expect[i].w[j] = + pred ? buffer0[i].w[j] + buffer1[i].w[j] + 1 + : buffer0[i].w[j] + 2; + } + pred = !pred; + } + + check_output_w(__LINE__, BUFSIZE); +} + +static void test_load_cur_predicated(void) +{ + bool pred = true; + for (int i = 0; i < BUFSIZE; i++) { + asm volatile("p0 = !cmp.eq(%3, #0)\n\t" + "v3 = vmem(%0+#0)\n\t" + /* + * Preload v4 to make sure that the assignment from the + * packet below is not being ignored when pred is false. + */ + "r0 = #0x01237654\n\t" + "v4 = vsplat(r0)\n\t" + "{\n\t" + " if (p0) v3.cur = vmem(%1+#0)\n\t" + " v4 = v3\n\t" + "}\n\t" + "vmem(%2+#0) = v4\n\t" + : + : "r"(&buffer0[i]), "r"(&buffer1[i]), + "r"(&output[i]), "r"(pred) + : "r0", "p0", "v3", "v4", "memory"); + expect[i] = pred ? buffer1[i] : buffer0[i]; + pred = !pred; + } + check_output_w(__LINE__, BUFSIZE); +} + int main() { init_buffers(); @@ -578,6 +647,9 @@ int main() test_vshuff(); + test_load_tmp_predicated(); + test_load_cur_predicated(); + puts(err ? "FAIL" : "PASS"); return err ? 1 : 0; } diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py index 6dea02b0b9..de0e06ab71 100755 --- a/target/hexagon/gen_tcg_funcs.py +++ b/target/hexagon/gen_tcg_funcs.py @@ -173,6 +173,22 @@ def genptr_decl(f, tag, regtype, regid, regno): f.write(" ctx_future_vreg_off(ctx, %s%sN," % \ (regtype, regid)) f.write(" 1, true);\n"); + if regid != "y" and 'A_CONDEXEC' in hex_common.attribdict[tag]: + if hex_common.is_tmp_result(tag): + preload_test_fn = "is_tmp_vreg_preloaded" + else: + preload_test_fn = "is_future_vreg_preloaded" + f.write(" if (!%s(ctx, %s)) {\n" % (preload_test_fn, regN)) + f.write(" intptr_t src_off =") + f.write(" offsetof(CPUHexagonState, VRegs[%s%sN]);\n"% \ + (regtype, regid)) + f.write(" tcg_gen_gvec_mov(MO_64, %s%sV_off,\n" % \ + (regtype, regid)) + f.write(" src_off,\n") + f.write(" sizeof(MMVector),\n") + f.write(" sizeof(MMVector));\n") + f.write(" }\n") + if (not hex_common.skip_qemu_helper(tag)): f.write(" TCGv_ptr %s%sV = tcg_temp_new_ptr();\n" % \ (regtype, regid))