From patchwork Wed Jul 17 06:38:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jan Beulich X-Patchwork-Id: 11047315 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 845DD1398 for ; Wed, 17 Jul 2019 06:40:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 70CFE28306 for ; Wed, 17 Jul 2019 06:40:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6240F286C4; Wed, 17 Jul 2019 06:40:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 494EE28306 for ; Wed, 17 Jul 2019 06:40:52 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hndb5-0005Te-R4; Wed, 17 Jul 2019 06:39:27 +0000 Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hndb4-0005TK-N6 for xen-devel@lists.xenproject.org; Wed, 17 Jul 2019 06:39:26 +0000 X-Inumbo-ID: 9da4b672-a85d-11e9-8980-bc764e045a96 Received: from m4a0039g.houston.softwaregrp.com (unknown [15.124.2.85]) by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS id 9da4b672-a85d-11e9-8980-bc764e045a96; Wed, 17 Jul 2019 06:39:24 +0000 (UTC) Received: FROM m4a0039g.houston.softwaregrp.com (15.120.17.146) BY m4a0039g.houston.softwaregrp.com WITH ESMTP; Wed, 17 Jul 2019 06:39:23 +0000 Received: from M9W0068.microfocus.com (2002:f79:bf::f79:bf) by M4W0334.microfocus.com (2002:f78:1192::f78:1192) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1591.10; Wed, 17 Jul 2019 06:38:23 +0000 Received: from NAM03-BY2-obe.outbound.protection.outlook.com (15.124.72.10) by M9W0068.microfocus.com (15.121.0.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1591.10 via Frontend Transport; Wed, 17 Jul 2019 06:38:22 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VRVHBc+bppKuUQm0+mFusPP3T2n9veJPdgIbUfkDYy8dPCakmLfcfzT3l5iC5m7aiX2ObnzCVPbricP80nA1MGmW2UfR7HxYGqvHrtkSl73KbDflKL8rt5hNE1dlgUhDKkLE2gCC3GDEzK5+BWfbqZ/ljtjWMf4obIOhCrcZzoQ2v+3lJ0iuoaxYHCnXdqW05nfCdQcnIX90D7tQY2JzOLpQYBlLSpK8PhPoPBfG8mJOuCDKPLdRzFrNpcVRHfP0C0jstFWrsOaqnT0NGAGO0Cir/XXfcYLcP4es8M7ouvUMUIktqgHw299ssrcWE+7d0S3Ij8T7ZRcruYJF1UDw6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=CfRQK99pYTJ6eVz2utS2XM7WJTU5MaALjgxIqBCwSqY=; b=NwkAU36ftt2X/9moFoG1ztrLU1DZvEgBvpgLxZBnEezZ5fvHv5K7EGYT4s8dJTx5onSREaeDmxTmrRPq8m5RUXAw9ko3YbXbQAoy4jiD9Tbe4VNL7pjfR2jsYiU3gco8iKWbLHadq9zwok284O1zULrrdEo70Wz147jVGjlNfDM8vvgtGYv76jMrb8bnwDYrZF3aVn1n7DAvyUgkZb/63XHtFj14y5ObeOCVlsvFRu2FNrqANNv7jqk1q0y65fwR+8QmZ4yT8ZcResNkIrUviBgns11h65+rEuYFrg6abNPghy9WUVile9fmJKlG+u2w9xLeuwbLu8ojxaXP0+5piw== ARC-Authentication-Results: i=1; mx.microsoft.com 1;spf=pass smtp.mailfrom=suse.com;dmarc=pass action=none header.from=suse.com;dkim=pass header.d=suse.com;arc=none Received: from DM6PR18MB3401.namprd18.prod.outlook.com (10.255.174.218) by DM6PR18MB2841.namprd18.prod.outlook.com (20.179.51.217) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2073.14; Wed, 17 Jul 2019 06:38:21 +0000 Received: from DM6PR18MB3401.namprd18.prod.outlook.com ([fe80::1fe:35f6:faf3:78c7]) by DM6PR18MB3401.namprd18.prod.outlook.com ([fe80::1fe:35f6:faf3:78c7%7]) with mapi id 15.20.2073.012; Wed, 17 Jul 2019 06:38:21 +0000 From: Jan Beulich To: "xen-devel@lists.xenproject.org" Thread-Topic: [PATCH v10 12/13] x86emul: add a SHA test case to the harness Thread-Index: AQHVPGo6cLkrLi53skSH5jxhnM12fw== Date: Wed, 17 Jul 2019 06:38:21 +0000 Message-ID: References: <0ccca19e-7bbb-ab1e-c0bb-a568b02874e0@suse.com> In-Reply-To: <0ccca19e-7bbb-ab1e-c0bb-a568b02874e0@suse.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: DB6PR1001CA0046.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:4:55::32) To DM6PR18MB3401.namprd18.prod.outlook.com (2603:10b6:5:1cc::26) authentication-results: spf=none (sender IP is ) smtp.mailfrom=JBeulich@suse.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [87.234.252.170] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 7fbcd435-6150-43c2-677f-08d70a815c81 x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600148)(711020)(4605104)(1401327)(2017052603328)(7193020); SRVR:DM6PR18MB2841; x-ms-traffictypediagnostic: DM6PR18MB2841: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:2089; x-forefront-prvs: 01018CB5B3 x-forefront-antispam-report: SFV:NSPM; SFS:(10019020)(4636009)(376002)(346002)(396003)(39860400002)(366004)(136003)(199004)(189003)(8676002)(6116002)(5660300002)(25786009)(30864003)(53936002)(256004)(66946007)(8936002)(81166006)(446003)(476003)(6512007)(76176011)(99286004)(2616005)(11346002)(486006)(81156014)(53946003)(66066001)(31686004)(478600001)(52116002)(186003)(4326008)(14444005)(54906003)(86362001)(6436002)(316002)(7736002)(102836004)(3846002)(6506007)(31696002)(6916009)(80792005)(68736007)(2501003)(6486002)(26005)(386003)(36756003)(66476007)(66446008)(66556008)(64756008)(2906002)(305945005)(71190400001)(14454004)(5640700003)(2351001)(71200400001); DIR:OUT; SFP:1102; SCL:1; SRVR:DM6PR18MB2841; H:DM6PR18MB3401.namprd18.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: suse.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: L0+nvgDbowUT+yUPUQE9ocf3/YsO7XAGRqoeQRlKjB7zBarEzTaWufDFx0t/SSDWVRug49fwixci+SaXM8F5i/Z8+2sZ3YWnAlhazb6UkWvwy1VjoU6C2MkdzaOjJ49S4UOseYgAAY9TJPnmPXFrBk+t243wv2CQFcPkdxcfRJJaBsz9/4GddLo7DDMLKsDYYK7thy5hbMDC9P1lmTVnaXM+OCt8U0iKM81FB2s7Vj1PoFU2772i/A2X/fZfH0aRXlPb23Blz23yWh6bHHttzigDnkWgjNmp+NdLAcpPw3Quc4fLjEb23JXYqWBc9nwylbhYV7H42rJ9Dd9lHaxu6qoI3Die6bqKOiOMa+yonPaCwly5TRcVRz9IBZZWkwWVJAZRAF6wVrG9MVG08DdAOoe5wHqYLo/01if4Ji19N7k= Content-ID: MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: 7fbcd435-6150-43c2-677f-08d70a815c81 X-MS-Exchange-CrossTenant-originalarrivaltime: 17 Jul 2019 06:38:21.6437 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 856b813c-16e5-49a5-85ec-6f081e13b527 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: JBeulich@suse.com X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR18MB2841 X-OriginatorOrg: suse.com Subject: [Xen-devel] [PATCH v10 12/13] x86emul: add a SHA test case to the harness X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Andrew Cooper , Wei Liu , RogerPau Monne Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP Also use this for AVX512VL VPRO{L,R}{,V}D as well as some further shifts testing. Signed-off-by: Jan Beulich Acked-by: Andrew Cooper --- v8: New. --- a/tools/tests/x86_emulator/Makefile +++ b/tools/tests/x86_emulator/Makefile @@ -20,8 +20,9 @@ SIMD := 3dnow sse sse2 sse4 avx avx2 xop FMA := fma4 fma SG := avx2-sg avx512f-sg avx512vl-sg AES := ssse3-aes avx-aes avx2-vaes avx512bw-vaes +SHA := sse4-sha avx-sha avx512f-sha GF := sse2-gf avx2-gf avx512bw-gf -TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(GF) +TESTCASES := blowfish $(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF) OPMASK := avx512f avx512dq avx512bw @@ -148,6 +149,10 @@ define simd-aes-defs $(1)-cflags := $(foreach vec,$($(patsubst %-aes,sse,$(1))-vecs) $($(patsubst %-vaes,%,$(1))-vecs), \ "-D_$(vec) -maes $(addprefix -m,$(subst -,$(space),$(1))) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") endef +define simd-sha-defs +$(1)-cflags := $(foreach vec,$(sse-vecs), \ + "-D_$(vec) $(addprefix -m,$(subst -,$(space),$(1))) -Os -DVEC_SIZE=$(vec)") +endef define simd-gf-defs $(1)-cflags := $(foreach vec,$($(1:-gf=)-vecs), \ "-D_$(vec) -mgfni -m$(1:-gf=) $(call non-sse,$(1)) -Os -DVEC_SIZE=$(vec)") @@ -159,6 +164,7 @@ endef $(foreach flavor,$(SIMD) $(FMA),$(eval $(call simd-defs,$(flavor)))) $(foreach flavor,$(SG),$(eval $(call simd-sg-defs,$(flavor)))) $(foreach flavor,$(AES),$(eval $(call simd-aes-defs,$(flavor)))) +$(foreach flavor,$(SHA),$(eval $(call simd-sha-defs,$(flavor)))) $(foreach flavor,$(GF),$(eval $(call simd-gf-defs,$(flavor)))) $(foreach flavor,$(OPMASK),$(eval $(call opmask-defs,$(flavor)))) @@ -212,10 +218,13 @@ $(addsuffix .c,$(SG)): $(addsuffix .c,$(AES)): ln -sf simd-aes.c $@ +$(addsuffix .c,$(SHA)): + ln -sf simd-sha.c $@ + $(addsuffix .c,$(GF)): ln -sf simd-gf.c $@ -$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(GF)): simd.h +$(addsuffix .h,$(SIMD) $(FMA) $(SG) $(AES) $(SHA) $(GF)): simd.h xop.h avx512f.h: simd-fma.c --- /dev/null +++ b/tools/tests/x86_emulator/simd-sha.c @@ -0,0 +1,392 @@ +#define INT_SIZE 4 + +#include "simd.h" +ENTRY(sha_test); + +#define SHA(op, a...) __builtin_ia32_sha ## op(a) + +#ifdef __AVX512F__ +# define ALL_TRUE (~0ULL >> (64 - ELEM_COUNT)) +# define eq(x, y) (B(pcmpeqd, _mask, x, y, -1) == ALL_TRUE) +# define blend(x, y, sel) B(movdqa32_, _mask, y, x, sel) +# define rot_c(f, r, x, n) B(pro ## f ## d, _mask, x, n, undef(), ~0) +# define rot_s(f, r, x, n) ({ /* gcc does not support embedded broadcast */ \ + vec_t r_; \ + asm ( "vpro" #f "vd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define rot_v(d, x, n) B(pro ## d ## vd, _mask, x, n, undef(), ~0) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "vps" #d "lvd %2%{1to%c3%}, %1, %0" \ + : "=v" (r_) \ + : "v" (x), "m" (n), "i" (ELEM_COUNT) ); \ + r_; \ +}) +# define vshift(d, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_; \ + asm ( "vps" #d "ldq %2, %1, %0" \ + : "=v" (r_) : "m" (x), "i" ((n) * ELEM_SIZE) ); \ + r_; \ +}) +#else +# define to_bool(cmp) (__builtin_ia32_pmovmskb128(cmp) == 0xffff) +# define eq(x, y) to_bool((x) == (y)) +# define blend(x, y, sel) \ + ((vec_t)__builtin_ia32_pblendw128((vhi_t)(x), (vhi_t)(y), \ + ((sel) & 1 ? 0x03 : 0) | \ + ((sel) & 2 ? 0x0c : 0) | \ + ((sel) & 4 ? 0x30 : 0) | \ + ((sel) & 8 ? 0xc0 : 0))) +# define rot_c(f, r, x, n) (sh ## f ## _c(x, n) | sh ## r ## _c(x, 32 - (n))) +# define rot_s(f, r, x, n) ({ /* gcc does not allow memory operands */ \ + vec_t r_, t_, n_ = (vec_t){ 32 } - (n); \ + asm ( "ps" #f "ld %2, %0; ps" #r "ld %3, %1; por %1, %0" \ + : "=&x" (r_), "=&x" (t_) \ + : "m" (n), "m" (n_), "0" (x), "1" (x) ); \ + r_; \ +}) +static inline unsigned int rotl(unsigned int x, unsigned int n) +{ + return (x << (n & 0x1f)) | (x >> ((32 - n) & 0x1f)); +} +static inline unsigned int rotr(unsigned int x, unsigned int n) +{ + return (x >> (n & 0x1f)) | (x << ((32 - n) & 0x1f)); +} +# define rot_v(d, x, n) ({ \ + vec_t t_; \ + unsigned int i_; \ + for ( i_ = 0; i_ < ELEM_COUNT; ++i_ ) \ + t_[i_] = rot ## d((x)[i_], (n)[i_]); \ + t_; \ +}) +# define shift_s(d, x, n) ({ \ + vec_t r_; \ + asm ( "ps" #d "ld %1, %0" : "=&x" (r_) : "m" (n), "0" (x) ); \ + r_; \ +}) +# define vshift(d, x, n) \ + (vec_t)(__builtin_ia32_ps ## d ## ldqi128((vdi_t)(x), (n) * ELEM_SIZE * 8)) +#endif + +#define alignr(x, y, n) ((vec_t)__builtin_ia32_palignr128((vdi_t)(x), (vdi_t)(y), (n) * 8)) +#define hadd(x, y) __builtin_ia32_phaddd128(x, y) +#define rol_c(x, n) rot_c(l, r, x, n) +#define rol_s(x, n) rot_s(l, r, x, n) +#define rol_v(x, n...) rot_v(l, x, n) +#define ror_c(x, n) rot_c(r, l, x, n) +#define ror_s(x, n) rot_s(r, l, x, n) +#define ror_v(x, n...) rot_v(r, x, n) +#define shl_c(x, n) __builtin_ia32_pslldi128(x, n) +#define shl_s(x, n) shift_s(l, x, n) +#define shr_c(x, n) __builtin_ia32_psrldi128(x, n) +#define shr_s(x, n) shift_s(r, x, n) +#define shuf(x, s) __builtin_ia32_pshufd(x, s) +#define swap(x) shuf(x, 0b00011011) +#define vshl(x, n) vshift(l, x, n) +#define vshr(x, n) vshift(r, x, n) + +static inline vec_t sha256_sigma0(vec_t w) +{ + vec_t res; + + touch(w); + res = ror_c(w, 7); + touch(w); + res ^= rol_c(w, 14); + touch(w); + res ^= shr_c(w, 3); + touch(w); + + return res; +} + +static inline vec_t sha256_sigma1(vec_t w) +{ + vec_t _17 = { 17 }, _19 = { 19 }, _10 = { 10 }; + + return ror_s(w, _17) ^ ror_s(w, _19) ^ shr_s(w, _10); +} + +static inline vec_t sha256_Sigma0(vec_t w) +{ + vec_t res, n1 = { 0, 0, 2, 2 }, n2 = { 0, 0, 13, 13 }, n3 = { 0, 0, 10, 10 }; + + touch(n1); + res = ror_v(w, n1); + touch(n2); + res ^= ror_v(w, n2); + touch(n3); + + return res ^ rol_v(w, n3); +} + +static inline vec_t sha256_Sigma1(vec_t w) +{ + return ror_c(w, 6) ^ ror_c(w, 11) ^ rol_c(w, 7); +} + +int sha_test(void) +{ + unsigned int i; + vec_t src, one = { 1 }; + vqi_t raw = {}; + + for ( i = 1; i < VEC_SIZE; ++i ) + raw[i] = i; + src = (vec_t)raw; + + for ( i = 0; i < 256; i += VEC_SIZE ) + { + vec_t x, y, tmp, hash = -src; + vec_t a, b, c, d, e, g, h; + unsigned int k, r; + + touch(src); + x = SHA(1msg1, hash, src); + touch(src); + y = hash ^ alignr(hash, src, 8); + touch(src); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = hash ^ alignr(src, hash, 12); + touch(tmp); + y = rol_c(tmp, 1); + tmp = hash ^ alignr(src, y, 12); + touch(tmp); + y = rol_c(tmp, 1); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1msg2, hash, src); + touch(src); + tmp = rol_s(hash ^ alignr(src, hash, 12), one); + y = rol_s(hash ^ alignr(src, tmp, 12), one); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(1nexte, hash, src); + touch(src); + touch(hash); + tmp = rol_c(hash, 30); + tmp[2] = tmp[1] = tmp[0] = 0; + + if ( !eq(x, src + tmp) ) return __LINE__; + + /* + * SHA1RNDS4 + * + * SRC1 = { A0, B0, C0, D0 } + * SRC2 = W' = { W[0]E0, W[1], W[2], W[3] } + * + * (NB that the notation is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) + * + * In order to pick a simple rounds function, an immediate value of + * 1 is used; 3 would also be a possibility. + * + * Applying + * + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0) + W'[0] + K + * E1 = D0 + * D1 = C0 + * C1 = ROL30(B0) + * B1 = A0 + * + * iteratively four times and resolving round variable values to + * A and B0, C0, and D0 we get + * + * A4 = ROL5(A3) + (A2 ^ ROL30(A1) ^ ROL30(A0)) + W'[3] + ROL30(B0) + K + * A3 = ROL5(A2) + (A1 ^ ROL30(A0) ^ ROL30(B0)) + W'[2] + C0 + K + * A2 = ROL5(A1) + (A0 ^ ROL30(B0) ^ C0 ) + W'[1] + D0 + K + * A1 = ROL5(A0) + (B0 ^ C0 ^ D0 ) + W'[0] + K + * + * (respective per-column variable names: + * y a b c d src e k + * ) + * + * with + * + * B4 = A3 + * C4 = ROL30(A2) + * D4 = ROL30(A1) + * E4 = ROL30(A0) + * + * and hence + * + * DST = { A4, A3, ROL30(A2), ROL30(A1) } + */ + + touch(src); + x = SHA(1rnds4, hash, src, 1); + touch(src); + + a = vshr(hash, 3); + b = vshr(hash, 2); + touch(hash); + d = rol_c(hash, 30); + touch(hash); + d = blend(d, hash, 0b0011); + c = vshr(d, 1); + e = vshl(d, 1); + tmp = (vec_t){}; + k = rol_c(SHA(1rnds4, tmp, tmp, 1), 2)[0]; + + for ( r = 0; r < 4; ++r ) + { + y = rol_c(a, 5) + (b ^ c ^ d) + swap(src) + e + k; + + switch ( r ) + { + case 0: + c[3] = rol_c(y, 30)[0]; + /* fall through */ + case 1: + b[r + 2] = y[r]; + /* fall through */ + case 2: + a[r + 1] = y[r]; + break; + } + + switch ( r ) + { + case 3: + if ( a[3] != y[2] ) return __LINE__; + /* fall through */ + case 2: + if ( a[2] != y[1] ) return __LINE__; + if ( b[3] != y[1] ) return __LINE__; + /* fall through */ + case 1: + if ( a[1] != y[0] ) return __LINE__; + if ( b[2] != y[0] ) return __LINE__; + if ( c[3] != rol_c(y, 30)[0] ) return __LINE__; + break; + } + } + + a = blend(rol_c(y, 30), y, 0b1100); + + if ( !eq(x, a) ) return __LINE__; + + touch(src); + x = SHA(256msg1, hash, src); + touch(src); + y = hash + sha256_sigma0(alignr(src, hash, 4)); + + if ( !eq(x, y) ) return __LINE__; + + touch(src); + x = SHA(256msg2, hash, src); + touch(src); + tmp = hash + sha256_sigma1(alignr(hash, src, 8)); + y = hash + sha256_sigma1(alignr(tmp, src, 8)); + + if ( !eq(x, y) ) return __LINE__; + + /* + * SHA256RNDS2 + * + * SRC1 = { C0, D0, G0, H0 } + * SRC2 = { A0, B0, E0, F0 } + * XMM0 = W' = { ?, ?, WK1, WK0 } + * + * (NB that the notation again is not C-like, i.e. elements are listed + * high-to-low everywhere in this comment.) + * + * Ch(E,F,G) = (E & F) ^ (~E & G) + * Maj(A,B,C) = (A & B) ^ (A & C) ^ (B & C) + * + * Σ0(A) = ROR2(A) ^ ROR13(A) ^ ROR22(A) + * Σ1(E) = ROR6(E) ^ ROR11(E) ^ ROR25(E) + * + * Applying + * + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * B1 = A0 + * C1 = B0 + * D1 = C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * F1 = E0 + * G1 = F0 + * H1 = G0 + * + * iteratively four times and resolving round variable values to + * A / E and B0, C0, D0, F0, G0, and H0 we get + * + * A2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + Maj(A1, A0, B0) + Σ0(A1) + * A1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + Maj(A0, B0, C0) + Σ0(A0) + * E2 = Ch(E1, E0, F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = Ch(E0, F0, G0) + Σ1(E0) + WK0 + H0 + D0 + * + * with + * + * B2 = A1 + * F2 = E1 + * + * and hence + * + * DST = { A2, A1, E2, E1 } + * + * which we can simplify a little, by letting A0, B0, and E0 be zero + * and F0 = ~G0, and by then utilizing + * + * Ch(0, 0, x) = x + * Ch(x, 0, y) = ~x & y + * Maj(x, 0, 0) = Maj(0, x, 0) = Maj(0, 0, x) = 0 + * + * A2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + Σ0(A1) + * A1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + Σ0(A0) + * E2 = (~E1 & F0) + Σ1(E1) + WK1 + G0 + C0 + * E1 = (~E0 & G0) + Σ1(E0) + WK0 + H0 + D0 + * + * (respective per-column variable names: + * y e g e src h d + * ) + */ + + tmp = (vec_t){ ~hash[1] }; + touch(tmp); + x = SHA(256rnds2, hash, tmp, src); + touch(tmp); + + e = y = (vec_t){}; + d = alignr(y, hash, 8); + g = (vec_t){ hash[1], tmp[0], hash[1], tmp[0] }; + h = shuf(hash, 0b01000100); + + for ( r = 0; r < 2; ++r ) + { + y = (~e & g) + sha256_Sigma1(e) + shuf(src, 0b01000100) + + h + sha256_Sigma0(d); + + if ( !r ) + { + d[3] = y[2]; + e[3] = e[1] = y[0]; + } + else if ( d[3] != y[2] ) + return __LINE__; + else if ( e[1] != y[0] ) + return __LINE__; + else if ( e[3] != y[0] ) + return __LINE__; + } + + if ( !eq(x, y) ) return __LINE__; + + src += 0x01010101 * VEC_SIZE; + } + + return 0; +} --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -14,8 +14,10 @@ asm ( ".pushsection .test, \"ax\", @prog #include "sse2-gf.h" #include "ssse3-aes.h" #include "sse4.h" +#include "sse4-sha.h" #include "avx.h" #include "avx-aes.h" +#include "avx-sha.h" #include "fma4.h" #include "fma.h" #include "avx2.h" @@ -28,6 +30,7 @@ asm ( ".pushsection .test, \"ax\", @prog #include "avx512bw-opmask.h" #include "avx512f.h" #include "avx512f-sg.h" +#include "avx512f-sha.h" #include "avx512vl-sg.h" #include "avx512bw.h" #include "avx512bw-vaes.h" @@ -155,6 +158,21 @@ static bool simd_check_avx512vbmi_vl(voi return cpu_has_avx512_vbmi && cpu_has_avx512vl; } +static bool simd_check_sse4_sha(void) +{ + return cpu_has_sha && cpu_has_sse4_2; +} + +static bool simd_check_avx_sha(void) +{ + return cpu_has_sha && cpu_has_avx; +} + +static bool simd_check_avx512f_sha_vl(void) +{ + return cpu_has_sha && cpu_has_avx512vl; +} + static bool simd_check_avx2_vaes(void) { return cpu_has_aesni && cpu_has_vaes && cpu_has_avx2; @@ -450,6 +468,9 @@ static const struct { AVX512VL(_VBMI+VL u16x8, avx512vbmi, 16u2), AVX512VL(_VBMI+VL s16x16, avx512vbmi, 32i2), AVX512VL(_VBMI+VL u16x16, avx512vbmi, 32u2), + SIMD(SHA, sse4_sha, 16), + SIMD(AVX+SHA, avx_sha, 16), + AVX512VL(VL+SHA, avx512f_sha, 16), SIMD(VAES (VEX/x32), avx2_vaes, 32), SIMD(VAES (EVEX/x64), avx512bw_vaes, 64), AVX512VL(VL+VAES (x16), avx512bw_vaes, 16), --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -142,6 +142,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_ifma (cp.feat.avx512_ifma && xcr0_mask(0xe6)) #define cpu_has_avx512er (cp.feat.avx512er && xcr0_mask(0xe6)) #define cpu_has_avx512cd (cp.feat.avx512cd && xcr0_mask(0xe6)) +#define cpu_has_sha cp.feat.sha #define cpu_has_avx512bw (cp.feat.avx512bw && xcr0_mask(0xe6)) #define cpu_has_avx512vl (cp.feat.avx512vl && xcr0_mask(0xe6)) #define cpu_has_avx512_vbmi (cp.feat.avx512_vbmi && xcr0_mask(0xe6))