[64/99] tcg/i386: Fix dup_vec in non-AVX2 codepath


Message ID

20180723201748.25573-65-mdroth@linux.vnet.ibm.com (mailing list archive)

State

New, archived

Headers

From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: qemu-devel@nongnu.org
Date: Mon, 23 Jul 2018 15:17:13 -0500
Message-Id: <20180723201748.25573-65-mdroth@linux.vnet.ibm.com>
In-Reply-To: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
References: <20180723201748.25573-1-mdroth@linux.vnet.ibm.com>
Subject: [Qemu-devel] [PATCH 64/99] tcg/i386: Fix dup_vec in non-AVX2
 codepath
Precedence: list
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Richard Henderson <richard.henderson@linaro.org>, qemu-stable@nongnu.org
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Series

Patch Round-up for stable 2.12.1, freeze on 2018-07-30

Commit Message

Michael Roth July 23, 2018, 8:17 p.m. UTC

From: Peter Maydell <peter.maydell@linaro.org>

The VPUNPCKL* instructions are all "non-destructive source",
indicated by "NDS" in the encoding string in the x86 ISA manual.
This means that they take two source operands, one of which is
encoded in the VEX.vvvv field. We were incorrectly treating them
as if they were destructive-source and passing 0 as the 'v'
argument of tcg_out_vex_modrm(). This meant we were always
using %xmm0 as one of the source operands, causing incorrect
results if the register allocator happened to want to use
something else. For instance the input AArch64 insn:
 DUP v26.16b, w21
which becomes TCG IR ops:
 dup_vec v128,e8,tmp2,x21
 st_vec v128,e8,tmp2,env,$0xa40
was assembled to:
0x607c568c:  c4 c1 7a 7e 86 e8 00 00  vmovq    0xe8(%r14), %xmm0
0x607c5694:  00
0x607c5695:  c5 f9 60 c8              vpunpcklbw %xmm0, %xmm0, %xmm1
0x607c5699:  c5 f9 61 c9              vpunpcklwd %xmm1, %xmm0, %xmm1
0x607c569d:  c5 f9 70 c9 00           vpshufd  $0, %xmm1, %xmm1
0x607c56a2:  c4 c1 7a 7f 8e 40 0a 00  vmovdqu  %xmm1, 0xa40(%r14)
0x607c56aa:  00

when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1".
This resulted in our incorrectly setting the output vector to
q26=0000320000003200:0000320000003200
when given an input of x21 == 0000000002803200
rather than the expected all-zeroes.
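
The data flow above can be reproduced with a small software model. This is
only an illustrative sketch, not QEMU code: Vec128, unpack_low(),
broadcast_dword0(), dup8_buggy() and dup8_fixed() are hypothetical names, and
the model assumes the usual low-half interleave semantics of the unpack
instructions (first source lane in the low position, second source lane in
the high position).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit vector register modelled as 16 little-endian bytes. */
typedef struct { uint8_t b[16]; } Vec128;

/* Load a 64-bit value into the low half, zeroing the high half (like vmovq). */
static Vec128 from_u64(uint64_t x)
{
    Vec128 v;
    memset(&v, 0, sizeof(v));
    for (size_t i = 0; i < 8; i++) {
        v.b[i] = (uint8_t)(x >> (8 * i));
    }
    return v;
}

/* Read 8 little-endian bytes starting at p. */
static uint64_t read_u64(const uint8_t *p)
{
    uint64_t x = 0;
    for (size_t i = 0; i < 8; i++) {
        x |= (uint64_t)p[i] << (8 * i);
    }
    return x;
}

static uint64_t low_u64(Vec128 v)  { return read_u64(&v.b[0]); }
static uint64_t high_u64(Vec128 v) { return read_u64(&v.b[8]); }

/*
 * Interleave the low lanes of two sources, `lane` bytes at a time:
 * result = { src1[0], src2[0], src1[1], src2[1], ... }.
 * src1 stands for the VEX.vvvv operand, src2 for the ModRM r/m operand.
 */
static Vec128 unpack_low(Vec128 src1, Vec128 src2, size_t lane)
{
    Vec128 d;
    assert(lane == 1 || lane == 2 || lane == 4 || lane == 8);
    for (size_t i = 0; i < 8 / lane; i++) {
        memcpy(&d.b[(2 * i) * lane], &src1.b[i * lane], lane);
        memcpy(&d.b[(2 * i + 1) * lane], &src2.b[i * lane], lane);
    }
    return d;
}

/* Copy the lowest 32-bit lane into all four lanes (like vpshufd $0). */
static Vec128 broadcast_dword0(Vec128 src)
{
    Vec128 d;
    for (size_t i = 0; i < 4; i++) {
        memcpy(&d.b[4 * i], &src.b[0], 4);
    }
    return d;
}

/* Sequence as emitted before the fix: the second unpack reads %xmm0. */
static Vec128 dup8_buggy(uint64_t x)
{
    Vec128 xmm0 = from_u64(x);
    Vec128 xmm1 = unpack_low(xmm0, xmm0, 1);   /* vpunpcklbw %xmm0, %xmm0, %xmm1 */
    xmm1 = unpack_low(xmm0, xmm1, 2);          /* vpunpcklwd %xmm1, %xmm0, %xmm1 */
    return broadcast_dword0(xmm1);             /* vpshufd $0, %xmm1, %xmm1 */
}

/* Intended sequence: both sources of the second unpack are %xmm1. */
static Vec128 dup8_fixed(uint64_t x)
{
    Vec128 xmm0 = from_u64(x);
    Vec128 xmm1 = unpack_low(xmm0, xmm0, 1);   /* vpunpcklbw %xmm0, %xmm0, %xmm1 */
    xmm1 = unpack_low(xmm1, xmm1, 2);          /* vpunpcklwd %xmm1, %xmm1, %xmm1 */
    return broadcast_dword0(xmm1);             /* vpshufd $0, %xmm1, %xmm1 */
}
```

In this model, dup8_buggy(0x0000000002803200) yields 0000320000003200 in both
64-bit halves, matching the bad q26 value quoted above, while dup8_fixed()
yields all-zeroes for the same input.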

Pass the correct source register number to tcg_out_vex_modrm()
for these insns.
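
For readers checking the disassembly by hand, the register selected by
VEX.vvvv can be read straight out of the two-byte (C5) VEX prefix. The
sketch below is illustrative only and is not part of the patch: vex2_vvvv()
and vex2_set_vvvv() are hypothetical helpers, and they assume the standard
two-byte VEX layout in which the second byte holds R (bit 7), vvvv
(bits 6:3), L (bit 2) and pp (bits 1:0), with R and vvvv stored inverted.

```c
#include <assert.h>
#include <stdint.h>

/* Return the register number encoded in VEX.vvvv of a two-byte VEX prefix. */
static unsigned vex2_vvvv(const uint8_t *insn)
{
    assert(insn[0] == 0xc5);                  /* two-byte VEX escape */
    return ((insn[1] >> 3) & 0xfu) ^ 0xfu;    /* field is stored inverted */
}

/* Return the second prefix byte with VEX.vvvv rewritten to select `reg`. */
static uint8_t vex2_set_vvvv(uint8_t byte1, unsigned reg)
{
    assert(reg < 16);
    return (uint8_t)((byte1 & 0x87u) | ((~reg & 0xfu) << 3));
}
```

Applied to the bytes "c5 f9 61 c9" from the listing above, vex2_vvvv()
returns 0, i.e. %xmm0. Under the same layout, selecting %xmm1 instead changes
the second byte from f9 to f1.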

Fixes: 770c2fc7bb70804a
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit 7eb30ef0ba2eb59e7430d4848ae8d4bf4e50f768)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 tcg/i386/tcg-target.inc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index d7e59e79c5..5357909fff 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -854,11 +854,11 @@  static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
         switch (vece) {
         case MO_8:
             /* ??? With zero in a register, use PSHUFB.  */
-            tcg_out_vex_modrm(s, OPC_PUNPCKLBW, r, 0, a);
+            tcg_out_vex_modrm(s, OPC_PUNPCKLBW, r, a, a);
             a = r;
             /* FALLTHRU */
         case MO_16:
-            tcg_out_vex_modrm(s, OPC_PUNPCKLWD, r, 0, a);
+            tcg_out_vex_modrm(s, OPC_PUNPCKLWD, r, a, a);
             a = r;
             /* FALLTHRU */
         case MO_32:
@@ -867,7 +867,7 @@  static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
             tcg_out8(s, 0);
             break;
         case MO_64:
-            tcg_out_vex_modrm(s, OPC_PUNPCKLQDQ, r, 0, a);
+            tcg_out_vex_modrm(s, OPC_PUNPCKLQDQ, r, a, a);
             break;
         default:
             g_assert_not_reached();