From: Jan Beulich
To: "xen-devel@lists.xenproject.org"
Cc: Andrew Cooper, Wei Liu, Roger Pau Monne
Date: Mon, 1 Jul 2019 11:17:01 +0000
Message-ID: <637adea4-61a9-6260-3464-01a20f0c6214@suse.com>
Subject: [Xen-devel] [PATCH v9 02/23] x86emul: support remaining misc AVX512{F,BW} insns

This completes support of AVX512BW in the insn emulator, and leaves just
the scatter/gather ones open in the AVX512F set.

Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper
---
v5: New.
---
TBD: The *blendm* built-in functions don't reliably produce the intended
insns, as the respective moves are about as good a fit for the compiler
when looking for a match for the intended operation. We'd need to switch
to inline assembly if we wanted to guarantee the testing of those insns.
Thoughts?
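The simd.c hunks in this patch replace the masked moves in mix() with blend built-ins and invert every mask constant (0b0101... becomes 0b1010...). The scalar sketch below illustrates why that leaves mix() unchanged. It assumes the built-ins select like the _mm512_mask_mov_ps() and _mm512_mask_blend_ps() intrinsics: for a masked move a set mask bit takes the first vector operand, for a blend it takes the second. The function names are hypothetical and only merge-masking is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NELEM 16 /* single-precision lanes in a 512-bit vector */

/* Old mix(): merge-masked move.  Lanes whose mask bit is set take x,
 * all other lanes keep the merge operand y. */
static void mix_masked_move(float dst[NELEM], const float x[NELEM],
                            const float y[NELEM], uint16_t k)
{
    for ( unsigned int i = 0; i < NELEM; i++ )
        dst[i] = ((k >> i) & 1) ? x[i] : y[i];
}

/* New mix(): vblendmps-style blend.  Lanes whose mask bit is set take
 * the second source y, all other lanes take the first source x. */
static void mix_blend(float dst[NELEM], const float x[NELEM],
                      const float y[NELEM], uint16_t k)
{
    for ( unsigned int i = 0; i < NELEM; i++ )
        dst[i] = ((k >> i) & 1) ? y[i] : x[i];
}

/* Returns 1 if the masked move with mask k and the blend with the
 * inverted mask ~k produce identical vectors, 0 otherwise. */
static int mix_forms_agree(const float x[NELEM], const float y[NELEM],
                           uint16_t k)
{
    float a[NELEM], b[NELEM];

    mix_masked_move(a, x, y, k);
    mix_blend(b, x, y, (uint16_t)~k);

    for ( unsigned int i = 0; i < NELEM; i++ )
        if ( a[i] != b[i] )
            return 0;

    return 1;
}
```

Because the two forms are interchangeable lane by lane, the compiler is free to implement a blend built-in with masked moves, which is the concern raised in the TBD note.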
--- a/tools/tests/x86_emulator/evex-disp8.c
+++ b/tools/tests/x86_emulator/evex-disp8.c
@@ -105,6 +105,8 @@ enum esz {
 
 static const struct test avx512f_all[] = {
     INSN_FP(add, 0f, 58),
+    INSN(align, 66, 0f3a, 03, vl, dq, vl),
+    INSN(blendm, 66, 0f38, 65, vl, sd, vl),
     INSN(broadcastss, 66, 0f38, 18, el, d, el),
     INSN_FP(cmp, 0f, c2),
     INSN(comisd, 66, 0f, 2f, el, q, el),
@@ -207,6 +209,7 @@ static const struct test avx512f_all[] =
     INSN(paddq, 66, 0f, d4, vl, q, vl),
     INSN(pand, 66, 0f, db, vl, dq, vl),
     INSN(pandn, 66, 0f, df, vl, dq, vl),
+    INSN(pblendm, 66, 0f38, 64, vl, dq, vl),
 // pbroadcast, 66, 0f38, 7c, dq64
     INSN(pbroadcastd, 66, 0f38, 58, el, d, el),
     INSN(pbroadcastq, 66, 0f38, 59, el, q, el),
@@ -354,6 +357,7 @@ static const struct test avx512f_512[] =
 };
 
 static const struct test avx512bw_all[] = {
+    INSN(dbpsadbw, 66, 0f3a, 42, vl, b, vl),
     INSN(movdqu8, f2, 0f, 6f, vl, b, vl),
     INSN(movdqu8, f2, 0f, 7f, vl, b, vl),
     INSN(movdqu16, f2, 0f, 6f, vl, w, vl),
@@ -373,6 +377,7 @@ static const struct test avx512bw_all[]
     INSN(palignr, 66, 0f3a, 0f, vl, b, vl),
     INSN(pavgb, 66, 0f, e0, vl, b, vl),
     INSN(pavgw, 66, 0f, e3, vl, w, vl),
+    INSN(pblendm, 66, 0f38, 66, vl, bw, vl),
     INSN(pbroadcastb, 66, 0f38, 78, el, b, el),
 // pbroadcastb, 66, 0f38, 7a, b
     INSN(pbroadcastw, 66, 0f38, 79, el_2, b, vl),
--- a/tools/tests/x86_emulator/simd.c
+++ b/tools/tests/x86_emulator/simd.c
@@ -297,7 +297,7 @@ static inline vec_t movlhps(vec_t x, vec
 # define max(x, y) BR_(maxps, _mask, x, y, undef(), ~0)
 # define min(x, y) BR_(minps, _mask, x, y, undef(), ~0)
 # endif
-# define mix(x, y) B(movaps, _mask, x, y, (0b0101010101010101 & ALL_TRUE))
+# define mix(x, y) B(blendmps_, _mask, x, y, (0b1010101010101010 & ALL_TRUE))
 # define scale(x, y) BR(scalefps, _mask, x, y, undef(), ~0)
 # if VEC_SIZE == 64 && defined(__AVX512ER__)
 # define recip(x) BR(rcp28ps, _mask, x, undef(), ~0)
@@ -370,7 +370,7 @@ static inline vec_t movlhps(vec_t x, vec
 # define max(x, y) BR_(maxpd, _mask, x, y, undef(), ~0)
 # define min(x, y) BR_(minpd, _mask, x, y, undef(), ~0)
 # endif
-# define mix(x, y) B(movapd, _mask, x, y, 0b01010101)
+# define mix(x, y) B(blendmpd_, _mask, x, y, 0b10101010)
 # define scale(x, y) BR(scalefpd, _mask, x, y, undef(), ~0)
 # if VEC_SIZE == 64 && defined(__AVX512ER__)
 # define recip(x) BR(rcp28pd, _mask, x, undef(), ~0)
@@ -564,8 +564,9 @@ static inline vec_t movlhps(vec_t x, vec
                              0b00011011, (vsi_t)undef(), ~0))
 # define swap2(x) ((vec_t)B_(permvarsi, _mask, (vsi_t)(x), (vsi_t)(inv - 1), (vsi_t)undef(), ~0))
 # endif
-# define mix(x, y) ((vec_t)B(movdqa32_, _mask, (vsi_t)(x), (vsi_t)(y), \
-                             (0b0101010101010101 & ((1 << ELEM_COUNT) - 1))))
+# define mix(x, y) ((vec_t)B(blendmd_, _mask, (vsi_t)(x), (vsi_t)(y), \
+                             (0b1010101010101010 & ((1 << ELEM_COUNT) - 1))))
+# define rotr(x, n) ((vec_t)B(alignd, _mask, (vsi_t)(x), (vsi_t)(x), n, (vsi_t)undef(), ~0))
 # define shrink1(x) ((half_t)B(pmovqd, _mask, (vdi_t)(x), (vsi_half_t){}, ~0))
 # elif INT_SIZE == 8 || UINT_SIZE == 8
 # define broadcast(x) ({ \
@@ -602,7 +603,8 @@ static inline vec_t movlhps(vec_t x, vec
                              0b01001110, (vsi_t)undef(), ~0))
 # define swap2(x) ((vec_t)B(permvardi, _mask, (vdi_t)(x), (vdi_t)(inv - 1), (vdi_t)undef(), ~0))
 # endif
-# define mix(x, y) ((vec_t)B(movdqa64_, _mask, (vdi_t)(x), (vdi_t)(y), 0b01010101))
+# define mix(x, y) ((vec_t)B(blendmq_, _mask, (vdi_t)(x), (vdi_t)(y), 0b10101010))
+# define rotr(x, n) ((vec_t)B(alignq, _mask, (vdi_t)(x), (vdi_t)(x), n, (vdi_t)undef(), ~0))
 # if VEC_SIZE == 32
 # define swap3(x) ((vec_t)B_(permdi, _mask, (vdi_t)(x), 0b00011011, (vdi_t)undef(), ~0))
 # elif VEC_SIZE == 64
@@ -654,8 +656,8 @@ static inline vec_t movlhps(vec_t x, vec
 # define interleave_hi(x, y) ((vec_t)B(vpermi2varqi, _mask, (vqi_t)(x), interleave_hi, (vqi_t)(y), ~0))
 # define interleave_lo(x, y) ((vec_t)B(vpermt2varqi, _mask, interleave_lo, (vqi_t)(x), (vqi_t)(y), ~0))
 # endif
-# define mix(x, y) ((vec_t)B(movdquqi, _mask, (vqi_t)(x), (vqi_t)(y), \
-                             (0b0101010101010101010101010101010101010101010101010101010101010101LL & ALL_TRUE)))
+# define mix(x, y) ((vec_t)B(blendmb_, _mask, (vqi_t)(x), (vqi_t)(y), \
+                             (0b1010101010101010101010101010101010101010101010101010101010101010LL & ALL_TRUE)))
 # define shrink1(x) ((half_t)B(pmovwb, _mask, (vhi_t)(x), (vqi_half_t){}, ~0))
 # define shrink2(x) ((quarter_t)B(pmovdb, _mask, (vsi_t)(x), (vqi_quarter_t){}, ~0))
 # define shrink3(x) ((eighth_t)B(pmovqb, _mask, (vdi_t)(x), (vqi_eighth_t){}, ~0))
@@ -687,8 +689,8 @@ static inline vec_t movlhps(vec_t x, vec
 # define interleave_hi(x, y) ((vec_t)B(vpermi2varhi, _mask, (vhi_t)(x), interleave_hi, (vhi_t)(y), ~0))
 # define interleave_lo(x, y) ((vec_t)B(vpermt2varhi, _mask, interleave_lo, (vhi_t)(x), (vhi_t)(y), ~0))
 # endif
-# define mix(x, y) ((vec_t)B(movdquhi, _mask, (vhi_t)(x), (vhi_t)(y), \
-                             (0b01010101010101010101010101010101 & ALL_TRUE)))
+# define mix(x, y) ((vec_t)B(blendmw_, _mask, (vhi_t)(x), (vhi_t)(y), \
+                             (0b10101010101010101010101010101010 & ALL_TRUE)))
 # define shrink1(x) ((half_t)B(pmovdw, _mask, (vsi_t)(x), (vhi_half_t){}, ~0))
 # define shrink2(x) ((quarter_t)B(pmovqw, _mask, (vdi_t)(x), (vhi_quarter_t){}, ~0))
 # define swap2(x) ((vec_t)B(permvarhi, _mask, (vhi_t)(x), (vhi_t)(inv - 1), (vhi_t)undef(), ~0))
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -484,6 +484,7 @@ static const struct ext0f38_table {
     [0x5b] = { .simd_size = simd_256, .two_op = 1, .d8s = d8s_vl_by_2 },
     [0x62] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_bw },
     [0x63] = { .simd_size = simd_packed_int, .to_mem = 1, .two_op = 1, .d8s = d8s_bw },
+    [0x64 ... 0x66] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x75 ... 0x76] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x77] = { .simd_size = simd_packed_fp, .d8s = d8s_vl },
     [0x78] = { .simd_size = simd_other, .two_op = 1 },
@@ -550,6 +551,7 @@ static const struct ext0f3a_table {
     [0x00] = { .simd_size = simd_packed_int, .two_op = 1, .d8s = d8s_vl },
     [0x01] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl },
     [0x02] = { .simd_size = simd_packed_int },
+    [0x03] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x04 ... 0x05] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl },
     [0x06] = { .simd_size = simd_packed_fp },
     [0x08 ... 0x09] = { .simd_size = simd_packed_fp, .two_op = 1, .d8s = d8s_vl },
@@ -581,8 +583,7 @@ static const struct ext0f3a_table {
     [0x3b] = { .simd_size = simd_256, .to_mem = 1, .two_op = 1, .d8s = d8s_vl_by_2 },
     [0x3e ... 0x3f] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x40 ... 0x41] = { .simd_size = simd_packed_fp },
-    [0x42] = { .simd_size = simd_packed_int },
-    [0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
+    [0x42 ... 0x43] = { .simd_size = simd_packed_int, .d8s = d8s_vl },
     [0x44] = { .simd_size = simd_packed_int },
     [0x46] = { .simd_size = simd_packed_int },
     [0x48 ... 0x49] = { .simd_size = simd_packed_fp, .four_op = 1 },
@@ -6178,6 +6179,8 @@ x86_emulate(
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x47): /* vpsllv{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x4c): /* vrcp14p{s,d} [xyz]mm/mem,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x4e): /* vrsqrt14p{s,d} [xyz]mm/mem,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x64): /* vpblendm{d,q} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x65): /* vblendmp{s,d} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     avx512f_no_sae:
         host_and_vcpu_must_have(avx512f);
         generate_exception_if(ea.type != OP_MEM && evex.brs, EXC_UD);
@@ -6937,6 +6940,7 @@ x86_emulate(
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x0b): /* vpmulhrsw [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x1c): /* vpabsb [xyz]mm/mem,[xyz]mm{k} */
     case X86EMUL_OPC_EVEX_66(0x0f38, 0x1d): /* vpabsw [xyz]mm/mem,[xyz]mm{k} */
+    case X86EMUL_OPC_EVEX_66(0x0f38, 0x66): /* vpblendm{b,w} [xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
         host_and_vcpu_must_have(avx512bw);
         generate_exception_if(evex.brs, EXC_UD);
         elem_bytes = 1 << (b & 1);
@@ -8106,10 +8110,12 @@ x86_emulate(
         goto simd_0f_to_gpr;
 
     CASE_SIMD_PACKED_FP(_EVEX, 0x0f, 0xc6): /* vshufp{s,d} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
-        fault_suppression = false;
         generate_exception_if(evex.w != (evex.pfx & VEX_PREFIX_DOUBLE_MASK),
                               EXC_UD);
         /* fall through */
+    case X86EMUL_OPC_EVEX_66(0x0f3a, 0x03): /* valign{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
+        fault_suppression = false;
+        /* fall through */
     case X86EMUL_OPC_EVEX_66(0x0f3a, 0x25): /* vpternlog{d,q} $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
     avx512f_imm8_no_sae:
         host_and_vcpu_must_have(avx512f);
@@ -9450,6 +9456,9 @@ x86_emulate(
         insn_bytes = PFX_BYTES + 4;
         break;
 
+    case X86EMUL_OPC_EVEX_66(0x0f3a, 0x42): /* vdbpsadbw $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
+        generate_exception_if(evex.w, EXC_UD);
+        /* fall through */
     case X86EMUL_OPC_EVEX_66(0x0f3a, 0x0f): /* vpalignr $imm8,[xyz]mm/mem,[xyz]mm,[xyz]mm{k} */
         fault_suppression = false;
         goto avx512bw_imm;
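valign{d,q} is routed onto the same fault_suppression = false path as vshufp{s,d}, and simd.c gains rotr() built from alignd/alignq with the same vector passed as both sources. The sketch below is a scalar model of the dword element selection, following the SDM description of VALIGND: the first source supplies the upper half and the register-or-memory operand the lower half of a double-width temporary, which is shifted right by imm8 elements. The function names are hypothetical and merge/zero masking is not modelled. The second helper illustrates one plausible reason for disabling fault suppression: destination mask bit i does not correspond to memory element i.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of valignd's element selection.
 * src1 is the upper and src2 (the register-or-memory operand) the lower
 * half of a 2*n element temporary, which is shifted right by imm
 * elements; its low n elements form the result.  n is 4, 8 or 16
 * (128/256/512-bit vectors of dwords) and only the low log2(n) bits of
 * imm are used.  dst must not overlap src1 or src2.
 */
static void valignd_model(uint32_t *dst, const uint32_t *src1,
                          const uint32_t *src2, unsigned int n,
                          unsigned int imm)
{
    assert(n == 4 || n == 8 || n == 16);
    imm &= n - 1;

    for ( unsigned int i = 0; i < n; i++ )
    {
        unsigned int j = i + imm;   /* index into the 2*n temporary */

        dst[i] = j < n ? src2[j] : src1[j - n];
    }
}

/*
 * Which src2 (possibly memory) elements are read for destination
 * opmask k?  Lane i reads src2[i + imm] only while i + imm < n, so the
 * set of src2 elements touched is k shifted left by imm and truncated
 * to n bits, not k itself.
 */
static uint32_t valignd_src2_elems(uint32_t k, unsigned int n,
                                   unsigned int imm)
{
    assert(n == 4 || n == 8 || n == 16);
    imm &= n - 1;

    return (k << imm) & ((1u << n) - 1);
}
```

Passing the same vector as both sources, as rotr() does, gives dst[i] = x[(i + imm) % n], an element rotation towards lane 0.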
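The new .d8s = d8s_vl decode-table entries, and the trailing vl field of the new evex-disp8.c test lines, concern EVEX compressed displacement (disp8*N). Per the Intel SDM, for full-vector memory operands N is the vector length in bytes, or the element size when embedded broadcast (EVEX.b) is used with a memory operand. The sketch below models that rule, assuming this is what d8s_vl encodes for these opcodes; the helper and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte displacement encoded by an EVEX disp8 for a
 * full-vector memory operand.
 *   disp8    - the signed 8-bit displacement byte from the encoding
 *   vl_bytes - vector length in bytes (16, 32 or 64)
 *   brs      - non-zero if EVEX.b requests embedded broadcast
 *   w        - EVEX.W, selecting 4- or 8-byte broadcast elements
 */
static int32_t evex_disp8_fv(int8_t disp8, unsigned int vl_bytes,
                             int brs, int w)
{
    unsigned int n;

    assert(vl_bytes == 16 || vl_bytes == 32 || vl_bytes == 64);

    /* disp8*N scale: one element when broadcasting, else a full vector. */
    n = brs ? (w ? 8 : 4) : vl_bytes;

    return (int32_t)disp8 * (int32_t)n;
}
```

Under this rule a disp8 of 1 on a 512-bit vblendmps memory operand addresses base + 64, while the same byte with a {1to16} broadcast addresses base + 4. vpblendm{b,w} is handled on a path that raises #UD for EVEX.b, so only the vector-length case applies to it.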