From patchwork Wed Apr 12 11:00:23 2023
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 13208774
X-Patchwork-Delegate: herbert@gondor.apana.org.au
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: Ard Biesheuvel, Herbert Xu, Eric Biggers, Kees Cook
Subject: [PATCH v2 01/13] crypto: x86/aegis128 - Use RIP-relative addressing
Date: Wed, 12 Apr 2023 13:00:23 +0200
Message-Id: <20230412110035.361447-2-ardb@kernel.org>
In-Reply-To: <20230412110035.361447-1-ardb@kernel.org>
References: <20230412110035.361447-1-ardb@kernel.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Prefer RIP-relative addressing where possible, which removes the need
for boot time relocation fixups.
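For background, an illustrative standalone sketch of the two addressing forms
(the label and constant below are made up, not taken from this file):

	.section .rodata
	.align 16
.Lexample_const:
	.octa 0x000102030405060708090a0b0c0d0e0f

	.text
	/* absolute form: the 32-bit displacement must hold the run-time
	 * address of .Lexample_const, so the reference is emitted with a
	 * relocation that has to be fixed up at boot if the kernel is not
	 * running at its link-time address */
	movdqa	.Lexample_const, %xmm0

	/* RIP-relative form: the displacement is the distance from the next
	 * instruction to the constant, which is fixed at link time and stays
	 * correct wherever the image is loaded */
	movdqa	.Lexample_const(%rip), %xmm0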
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/aegis128-aesni-asm.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/crypto/aegis128-aesni-asm.S b/arch/x86/crypto/aegis128-aesni-asm.S index cdf3215ec272ced2..ad7f4c89162568b0 100644 --- a/arch/x86/crypto/aegis128-aesni-asm.S +++ b/arch/x86/crypto/aegis128-aesni-asm.S @@ -201,8 +201,8 @@ SYM_FUNC_START(crypto_aegis128_aesni_init) movdqa KEY, STATE4 /* load the constants: */ - movdqa .Laegis128_const_0, STATE2 - movdqa .Laegis128_const_1, STATE1 + movdqa .Laegis128_const_0(%rip), STATE2 + movdqa .Laegis128_const_1(%rip), STATE1 pxor STATE2, STATE3 pxor STATE1, STATE4 @@ -682,7 +682,7 @@ SYM_TYPED_FUNC_START(crypto_aegis128_aesni_dec_tail) punpcklbw T0, T0 punpcklbw T0, T0 punpcklbw T0, T0 - movdqa .Laegis128_counter, T1 + movdqa .Laegis128_counter(%rip), T1 pcmpgtb T1, T0 pand T0, MSG From patchwork Wed Apr 12 11:00:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208775 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE5ECC7619A for ; Wed, 12 Apr 2023 11:00:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229559AbjDLLA5 (ORCPT ); Wed, 12 Apr 2023 07:00:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229659AbjDLLAy (ORCPT ); Wed, 12 Apr 2023 07:00:54 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A0666A7A for ; Wed, 12 Apr 2023 04:00:51 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id EA42C63256 for ; Wed, 12 Apr 2023 11:00:50 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3EA4DC433EF; Wed, 12 Apr 2023 11:00:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297250; bh=byw+Jg+9hm2CvseYnTXzAmr7kwzuwBHIZKxwiR9Q0+o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aM4VIu+FVsk96+BslgqN+KILMrukQfXkAuNyuwHHu+B3jDUJ2gMQUo5DVBZUAyE0W LIxKifrZuQsxW0WTIZ057R2MraWs2z8xbBb8emJhwE5qDAQi9P8ZMS+evWzmWoMPj4 9s3xViA8jP7c2V+gUVhrbg7vtxO9jrRfSylUxMi44mhaVypGOcNcfbLvfj9BF3+zaY BB3IiVJ/jskHfCISa5gBLyNUFpGqUgLEMU8Wz0SaIlNlecsXzWd3aXqrSqk+gbFiSE OJF4AS+2NiTcf3GsWYyjg77ssj6Ve0VVQTITwHvk/l780OZRSif0A2RYabWz9YhFbW oWvNM0EaQU05A== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 02/13] crypto: x86/aesni - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:24 +0200 Message-Id: <20230412110035.361447-3-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3052; i=ardb@kernel.org; h=from:subject; bh=byw+Jg+9hm2CvseYnTXzAmr7kwzuwBHIZKxwiR9Q0+o=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs32X+8z4Lho07d+nsFRLsXmJ4f0vVbk11p1WGH7Sl+ 
PZdPJ7ZUcrCIMbBICumyCIw+++7nacnStU6z5KFmcPKBDKEgYtTACYS+IGRYdGpL6Xsu7Lb/m5T teuWXcLWeWm/7w0/1bN5Vx4vXNFZqcTIcOxPyMxdhTLvc6587u+zCzi66PzN+tjdBgfNnsxInSl exwwA X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. In the GCM case, we can get rid of the oversized permutation array entirely while at it. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/aesni-intel_asm.S | 2 +- arch/x86/crypto/aesni-intel_avx-x86_64.S | 36 ++++---------------- 2 files changed, 8 insertions(+), 30 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index 837c1e0aa0217783..ca99a2274d551015 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -2717,7 +2717,7 @@ SYM_FUNC_END(aesni_cts_cbc_dec) * BSWAP_MASK == endian swapping mask */ SYM_FUNC_START_LOCAL(_aesni_inc_init) - movaps .Lbswap_mask, BSWAP_MASK + movaps .Lbswap_mask(%rip), BSWAP_MASK movaps IV, CTR pshufb BSWAP_MASK, CTR mov $1, TCTR_LOW diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S index 0852ab573fd306ac..b6ca80f188ff9418 100644 --- a/arch/x86/crypto/aesni-intel_avx-x86_64.S +++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S @@ -154,30 +154,6 @@ SHIFT_MASK: .octa 0x0f0e0d0c0b0a09080706050403020100 ALL_F: .octa 0xffffffffffffffffffffffffffffffff .octa 0x00000000000000000000000000000000 -.section .rodata -.align 16 -.type aad_shift_arr, @object -.size aad_shift_arr, 272 -aad_shift_arr: - .octa 0xffffffffffffffffffffffffffffffff - .octa 0xffffffffffffffffffffffffffffff0C - .octa 0xffffffffffffffffffffffffffff0D0C - .octa 0xffffffffffffffffffffffffff0E0D0C - .octa 0xffffffffffffffffffffffff0F0E0D0C - .octa 0xffffffffffffffffffffff0C0B0A0908 - .octa 0xffffffffffffffffffff0D0C0B0A0908 - .octa 0xffffffffffffffffff0E0D0C0B0A0908 - .octa 0xffffffffffffffff0F0E0D0C0B0A0908 - .octa 0xffffffffffffff0C0B0A090807060504 - .octa 0xffffffffffff0D0C0B0A090807060504 - .octa 0xffffffffff0E0D0C0B0A090807060504 - .octa 0xffffffff0F0E0D0C0B0A090807060504 - .octa 0xffffff0C0B0A09080706050403020100 - .octa 0xffff0D0C0B0A09080706050403020100 - .octa 0xff0E0D0C0B0A09080706050403020100 - .octa 0x0F0E0D0C0B0A09080706050403020100 - - .text @@ -646,11 +622,13 @@ _get_AAD_rest4\@: _get_AAD_rest0\@: /* finalize: shift out the extra bytes we read, and align left. 
since pslldq can only shift by an immediate, we use - vpshufb and an array of shuffle masks */ - movq %r12, %r11 - salq $4, %r11 - vmovdqu aad_shift_arr(%r11), \T1 - vpshufb \T1, \T7, \T7 + vpshufb and a pair of shuffle masks */ + leaq ALL_F(%rip), %r11 + subq %r12, %r11 + vmovdqu 16(%r11), \T1 + andq $~3, %r11 + vpshufb (%r11), \T7, \T7 + vpand \T1, \T7, \T7 _get_AAD_rest_final\@: vpshufb SHUF_MASK(%rip), \T7, \T7 vpxor \T8, \T7, \T7 From patchwork Wed Apr 12 11:00:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208776 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 586DAC77B6E for ; Wed, 12 Apr 2023 11:00:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229659AbjDLLA6 (ORCPT ); Wed, 12 Apr 2023 07:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229671AbjDLLAz (ORCPT ); Wed, 12 Apr 2023 07:00:55 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 128CB6EAD for ; Wed, 12 Apr 2023 04:00:53 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 839F762889 for ; Wed, 12 Apr 2023 11:00:52 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C6A09C4339E; Wed, 12 Apr 2023 11:00:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297251; bh=mPiYY79mrBdPcJSB+Lsy+wY627fy3181XFaLtnuKu5s=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hAeYX4yUEGV7RoHrUzJYtIrZX+gSYmK/pzgcyRhLIB9v4rqDdaP8RKg9k1GdQgUTW piOWHVTg5gnokZE4aBgfYBjABrx8+1T88npplEupBvToYoUYPCbCMKAisxxfxiH6Tq UVGeebr0Ld/PYuZkq1XWx23ZHsFB9ShVB3ay5N9HPxwKbIyqmY5zY5AxnrvU8Dmdbh gWMUlMwPopoP295mjndq9RK8YYk/xBYgBGN5I8qDGfpuIz8GSW+qZHPkh7L/qo/yEM MIipNTjUDocdfBPG5L3VCZm0TKhdMnnvUb/wKy2kNYZRYCqD4SIlVk8jYnO2GQ8AaI U5w1N9hJVmMoA== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 03/13] crypto: x86/aria - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:25 +0200 Message-Id: <20230412110035.361447-4-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=7533; i=ardb@kernel.org; h=from:subject; bh=mPiYY79mrBdPcJSB+Lsy+wY627fy3181XFaLtnuKu5s=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs31VUdMnKu8c/HrL9kHczKWKjwxwBqzUmUs/nRzprG 2ot1K/tKGVhEONgkBVTZBGY/ffdztMTpWqdZ8nCzGFlAhnCwMUpABOJyGdkOHx9+2rZo4a2XyZX e8ts3ZSq9sGELzlx1+KOjc4pyl+arzAytBl9KA7YKqtcI7V51XNjXb0AhasvXN6/YlUsaewsk/n NBwA= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. 
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/aria-aesni-avx-asm_64.S | 28 ++++++++++---------- arch/x86/crypto/aria-aesni-avx2-asm_64.S | 28 ++++++++++---------- arch/x86/crypto/aria-gfni-avx512-asm_64.S | 24 ++++++++--------- 3 files changed, 40 insertions(+), 40 deletions(-) diff --git a/arch/x86/crypto/aria-aesni-avx-asm_64.S b/arch/x86/crypto/aria-aesni-avx-asm_64.S index 9243f6289d34bfbf..7c1abc513f34621e 100644 --- a/arch/x86/crypto/aria-aesni-avx-asm_64.S +++ b/arch/x86/crypto/aria-aesni-avx-asm_64.S @@ -80,7 +80,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vmovdqu .Lshufb_16x16b, a0; \ + vmovdqu .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -132,7 +132,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vmovdqu .Lshufb_16x16b, a0; \ + vmovdqu .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -300,11 +300,11 @@ x4, x5, x6, x7, \ t0, t1, t2, t3, \ t4, t5, t6, t7) \ - vmovdqa .Ltf_s2_bitmatrix, t0; \ - vmovdqa .Ltf_inv_bitmatrix, t1; \ - vmovdqa .Ltf_id_bitmatrix, t2; \ - vmovdqa .Ltf_aff_bitmatrix, t3; \ - vmovdqa .Ltf_x2_bitmatrix, t4; \ + vmovdqa .Ltf_s2_bitmatrix(%rip), t0; \ + vmovdqa .Ltf_inv_bitmatrix(%rip), t1; \ + vmovdqa .Ltf_id_bitmatrix(%rip), t2; \ + vmovdqa .Ltf_aff_bitmatrix(%rip), t3; \ + vmovdqa .Ltf_x2_bitmatrix(%rip), t4; \ vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \ vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \ vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \ @@ -324,13 +324,13 @@ x4, x5, x6, x7, \ t0, t1, t2, t3, \ t4, t5, t6, t7) \ - vmovdqa .Linv_shift_row, t0; \ - vmovdqa .Lshift_row, t1; \ - vbroadcastss .L0f0f0f0f, t6; \ - vmovdqa .Ltf_lo__inv_aff__and__s2, t2; \ - vmovdqa .Ltf_hi__inv_aff__and__s2, t3; \ - vmovdqa .Ltf_lo__x2__and__fwd_aff, t4; \ - vmovdqa .Ltf_hi__x2__and__fwd_aff, t5; \ + vmovdqa .Linv_shift_row(%rip), t0; \ + vmovdqa .Lshift_row(%rip), t1; \ + vbroadcastss .L0f0f0f0f(%rip), t6; \ + vmovdqa .Ltf_lo__inv_aff__and__s2(%rip), t2; \ + vmovdqa .Ltf_hi__inv_aff__and__s2(%rip), t3; \ + vmovdqa .Ltf_lo__x2__and__fwd_aff(%rip), t4; \ + vmovdqa .Ltf_hi__x2__and__fwd_aff(%rip), t5; \ \ vaesenclast t7, x0, x0; \ vaesenclast t7, x4, x4; \ diff --git a/arch/x86/crypto/aria-aesni-avx2-asm_64.S b/arch/x86/crypto/aria-aesni-avx2-asm_64.S index 82a14b4ad920f792..c60fa2980630379b 100644 --- a/arch/x86/crypto/aria-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/aria-aesni-avx2-asm_64.S @@ -96,7 +96,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vbroadcasti128 .Lshufb_16x16b, a0; \ + vbroadcasti128 .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -148,7 +148,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vbroadcasti128 .Lshufb_16x16b, a0; \ + vbroadcasti128 .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -307,11 +307,11 @@ x4, x5, x6, x7, \ t0, t1, t2, t3, \ t4, t5, t6, t7) \ - vpbroadcastq .Ltf_s2_bitmatrix, t0; \ - vpbroadcastq .Ltf_inv_bitmatrix, t1; \ - vpbroadcastq .Ltf_id_bitmatrix, t2; \ - vpbroadcastq .Ltf_aff_bitmatrix, t3; \ - vpbroadcastq .Ltf_x2_bitmatrix, t4; \ + vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \ + vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \ + vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \ + vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \ + vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \ 
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \ vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \ vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \ @@ -332,12 +332,12 @@ t4, t5, t6, t7) \ vpxor t7, t7, t7; \ vpxor t6, t6, t6; \ - vbroadcasti128 .Linv_shift_row, t0; \ - vbroadcasti128 .Lshift_row, t1; \ - vbroadcasti128 .Ltf_lo__inv_aff__and__s2, t2; \ - vbroadcasti128 .Ltf_hi__inv_aff__and__s2, t3; \ - vbroadcasti128 .Ltf_lo__x2__and__fwd_aff, t4; \ - vbroadcasti128 .Ltf_hi__x2__and__fwd_aff, t5; \ + vbroadcasti128 .Linv_shift_row(%rip), t0; \ + vbroadcasti128 .Lshift_row(%rip), t1; \ + vbroadcasti128 .Ltf_lo__inv_aff__and__s2(%rip), t2; \ + vbroadcasti128 .Ltf_hi__inv_aff__and__s2(%rip), t3; \ + vbroadcasti128 .Ltf_lo__x2__and__fwd_aff(%rip), t4; \ + vbroadcasti128 .Ltf_hi__x2__and__fwd_aff(%rip), t5; \ \ vextracti128 $1, x0, t6##_x; \ vaesenclast t7##_x, x0##_x, x0##_x; \ @@ -369,7 +369,7 @@ vaesdeclast t7##_x, t6##_x, t6##_x; \ vinserti128 $1, t6##_x, x6, x6; \ \ - vpbroadcastd .L0f0f0f0f, t6; \ + vpbroadcastd .L0f0f0f0f(%rip), t6; \ \ /* AES inverse shift rows */ \ vpshufb t0, x0, x0; \ diff --git a/arch/x86/crypto/aria-gfni-avx512-asm_64.S b/arch/x86/crypto/aria-gfni-avx512-asm_64.S index 3193f07014506655..860887e5d02ed6ef 100644 --- a/arch/x86/crypto/aria-gfni-avx512-asm_64.S +++ b/arch/x86/crypto/aria-gfni-avx512-asm_64.S @@ -80,7 +80,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vbroadcasti64x2 .Lshufb_16x16b, a0; \ + vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \ vmovdqu64 st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -132,7 +132,7 @@ transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vbroadcasti64x2 .Lshufb_16x16b, a0; \ + vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \ vmovdqu64 st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -308,11 +308,11 @@ x4, x5, x6, x7, \ t0, t1, t2, t3, \ t4, t5, t6, t7) \ - vpbroadcastq .Ltf_s2_bitmatrix, t0; \ - vpbroadcastq .Ltf_inv_bitmatrix, t1; \ - vpbroadcastq .Ltf_id_bitmatrix, t2; \ - vpbroadcastq .Ltf_aff_bitmatrix, t3; \ - vpbroadcastq .Ltf_x2_bitmatrix, t4; \ + vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \ + vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \ + vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \ + vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \ + vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \ vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \ vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \ vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \ @@ -332,11 +332,11 @@ y4, y5, y6, y7, \ t0, t1, t2, t3, \ t4, t5, t6, t7) \ - vpbroadcastq .Ltf_s2_bitmatrix, t0; \ - vpbroadcastq .Ltf_inv_bitmatrix, t1; \ - vpbroadcastq .Ltf_id_bitmatrix, t2; \ - vpbroadcastq .Ltf_aff_bitmatrix, t3; \ - vpbroadcastq .Ltf_x2_bitmatrix, t4; \ + vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \ + vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \ + vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \ + vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \ + vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \ vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \ vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \ vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \ From patchwork Wed Apr 12 11:00:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208779 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10B91C77B72 for ; Wed, 12 Apr 2023 11:01:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229697AbjDLLA6 (ORCPT ); Wed, 12 Apr 2023 07:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229705AbjDLLAz (ORCPT ); Wed, 12 Apr 2023 07:00:55 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87F846E80 for ; Wed, 12 Apr 2023 04:00:54 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 159DE632EB for ; Wed, 12 Apr 2023 11:00:54 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5ADAEC433D2; Wed, 12 Apr 2023 11:00:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297253; bh=xZcJ5AirwQWbpJlVHXNZgshaCxkp54ObCSL5N21hZ3o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fqAkP3GEUiZctr4ZXuqo3itXRhh0r8nA+EAqYjN7jQpfYa+SAeoDSaujk8BroO2dX fg95HwZqqAZe7hxIsBt1xjzFv6VyURWFpOSKx5eZUHTQ5PTPQe+WWB1iAVL8HexY9i T5KpT9mbCKySP4gDXdOMRMiBTFmbK+8Va0TaHSKqrnfQX8in6ljBfXKJX+zuqemSRl wMGmEdUH+HWgV1xFRwvpEQi+h+RL/wTIpASdEe5ewRWccvLhrAmOPIVWaVszKBuE4Q FBUXEiScUhfU7jovsWlkriJWKDtksYOb0YFviRAbfOn0WlboHzu7uBmpnPjfREgEwM OEsSUojjKuZbg== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 04/13] crypto: x86/camellia - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:26 +0200 Message-Id: <20230412110035.361447-5-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=7312; i=ardb@kernel.org; h=from:subject; bh=xZcJ5AirwQWbpJlVHXNZgshaCxkp54ObCSL5N21hZ3o=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs333PtpSW1Gq58pJnIZdf8NS8WKO6Yf2ecxnKyYt/8 Lvdm1rWUcrCIMbBICumyCIw+++7nacnStU6z5KFmcPKBDKEgYtTACZy8iTDf5/bFYIszjuPpty5 0i3odSJq3b1EoY2zKlewS1gemR/CtJDhr5zKScXa35+Eb55cqxPgEX61fOKsHFf358WWkRuWblG 0YAcA X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. 
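The camellia-x86_64 conversion below (like the cast5/cast6, des3_ede and
crc32c ones later in the series) runs into an extra wrinkle: RIP-relative
addressing cannot be combined with an index register, so a scaled table
lookup cannot simply gain a (%rip) suffix. The table base has to be
materialised into a register first. A sketch of the general pattern, with
made-up table and register names:

	/* before: absolute table base plus scaled index, needs a boot time fixup */
	xorq	table(, %rcx, 8), %rax

	/* after: form the base RIP-relatively, then index off the register,
	 * at the cost of one extra leaq and a scratch register per lookup */
	leaq	table(%rip), %rdx
	xorq	(%rdx, %rcx, 8), %rax

The GCM AAD rework earlier in the series sidesteps the same limitation
differently, by dropping the 272-byte aad_shift_arr table altogether and
deriving the masks it needs from the adjacent SHIFT_MASK/ALL_F constants.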
Co-developed-by: Thomas Garnier Signed-off-by: Thomas Garnier Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 ++++++++++---------- arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 ++++++++++---------- arch/x86/crypto/camellia-x86_64-asm_64.S | 6 ++-- 3 files changed, 34 insertions(+), 32 deletions(-) diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S index 4a30618281ec2e9e..646477a13e110fed 100644 --- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S @@ -52,10 +52,10 @@ /* \ * S-function with AES subbytes \ */ \ - vmovdqa .Linv_shift_row, t4; \ - vbroadcastss .L0f0f0f0f, t7; \ - vmovdqa .Lpre_tf_lo_s1, t0; \ - vmovdqa .Lpre_tf_hi_s1, t1; \ + vmovdqa .Linv_shift_row(%rip), t4; \ + vbroadcastss .L0f0f0f0f(%rip), t7; \ + vmovdqa .Lpre_tf_lo_s1(%rip), t0; \ + vmovdqa .Lpre_tf_hi_s1(%rip), t1; \ \ /* AES inverse shift rows */ \ vpshufb t4, x0, x0; \ @@ -68,8 +68,8 @@ vpshufb t4, x6, x6; \ \ /* prefilter sboxes 1, 2 and 3 */ \ - vmovdqa .Lpre_tf_lo_s4, t2; \ - vmovdqa .Lpre_tf_hi_s4, t3; \ + vmovdqa .Lpre_tf_lo_s4(%rip), t2; \ + vmovdqa .Lpre_tf_hi_s4(%rip), t3; \ filter_8bit(x0, t0, t1, t7, t6); \ filter_8bit(x7, t0, t1, t7, t6); \ filter_8bit(x1, t0, t1, t7, t6); \ @@ -83,8 +83,8 @@ filter_8bit(x6, t2, t3, t7, t6); \ \ /* AES subbytes + AES shift rows */ \ - vmovdqa .Lpost_tf_lo_s1, t0; \ - vmovdqa .Lpost_tf_hi_s1, t1; \ + vmovdqa .Lpost_tf_lo_s1(%rip), t0; \ + vmovdqa .Lpost_tf_hi_s1(%rip), t1; \ vaesenclast t4, x0, x0; \ vaesenclast t4, x7, x7; \ vaesenclast t4, x1, x1; \ @@ -95,16 +95,16 @@ vaesenclast t4, x6, x6; \ \ /* postfilter sboxes 1 and 4 */ \ - vmovdqa .Lpost_tf_lo_s3, t2; \ - vmovdqa .Lpost_tf_hi_s3, t3; \ + vmovdqa .Lpost_tf_lo_s3(%rip), t2; \ + vmovdqa .Lpost_tf_hi_s3(%rip), t3; \ filter_8bit(x0, t0, t1, t7, t6); \ filter_8bit(x7, t0, t1, t7, t6); \ filter_8bit(x3, t0, t1, t7, t6); \ filter_8bit(x6, t0, t1, t7, t6); \ \ /* postfilter sbox 3 */ \ - vmovdqa .Lpost_tf_lo_s2, t4; \ - vmovdqa .Lpost_tf_hi_s2, t5; \ + vmovdqa .Lpost_tf_lo_s2(%rip), t4; \ + vmovdqa .Lpost_tf_hi_s2(%rip), t5; \ filter_8bit(x2, t2, t3, t7, t6); \ filter_8bit(x5, t2, t3, t7, t6); \ \ @@ -443,7 +443,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vmovdqu .Lshufb_16x16b, a0; \ + vmovdqu .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -482,7 +482,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \ y6, y7, rio, key) \ vmovq key, x0; \ - vpshufb .Lpack_bswap, x0, x0; \ + vpshufb .Lpack_bswap(%rip), x0, x0; \ \ vpxor 0 * 16(rio), x0, y7; \ vpxor 1 * 16(rio), x0, y6; \ @@ -533,7 +533,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) vmovdqu x0, stack_tmp0; \ \ vmovq key, x0; \ - vpshufb .Lpack_bswap, x0, x0; \ + vpshufb .Lpack_bswap(%rip), x0, x0; \ \ vpxor x0, y7, y7; \ vpxor x0, y6, y6; \ diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S index deaf62aa73a6b09c..a0eb94e53b1bb12d 100644 --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S @@ -64,12 +64,12 @@ /* \ * S-function with AES subbytes \ */ \ - vbroadcasti128 .Linv_shift_row, t4; \ - vpbroadcastd .L0f0f0f0f, t7; \ - 
vbroadcasti128 .Lpre_tf_lo_s1, t5; \ - vbroadcasti128 .Lpre_tf_hi_s1, t6; \ - vbroadcasti128 .Lpre_tf_lo_s4, t2; \ - vbroadcasti128 .Lpre_tf_hi_s4, t3; \ + vbroadcasti128 .Linv_shift_row(%rip), t4; \ + vpbroadcastd .L0f0f0f0f(%rip), t7; \ + vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \ + vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \ + vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \ + vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \ \ /* AES inverse shift rows */ \ vpshufb t4, x0, x0; \ @@ -115,8 +115,8 @@ vinserti128 $1, t2##_x, x6, x6; \ vextracti128 $1, x1, t3##_x; \ vextracti128 $1, x4, t2##_x; \ - vbroadcasti128 .Lpost_tf_lo_s1, t0; \ - vbroadcasti128 .Lpost_tf_hi_s1, t1; \ + vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \ + vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \ vaesenclast t4##_x, x2##_x, x2##_x; \ vaesenclast t4##_x, t6##_x, t6##_x; \ vinserti128 $1, t6##_x, x2, x2; \ @@ -131,16 +131,16 @@ vinserti128 $1, t2##_x, x4, x4; \ \ /* postfilter sboxes 1 and 4 */ \ - vbroadcasti128 .Lpost_tf_lo_s3, t2; \ - vbroadcasti128 .Lpost_tf_hi_s3, t3; \ + vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \ + vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \ filter_8bit(x0, t0, t1, t7, t6); \ filter_8bit(x7, t0, t1, t7, t6); \ filter_8bit(x3, t0, t1, t7, t6); \ filter_8bit(x6, t0, t1, t7, t6); \ \ /* postfilter sbox 3 */ \ - vbroadcasti128 .Lpost_tf_lo_s2, t4; \ - vbroadcasti128 .Lpost_tf_hi_s2, t5; \ + vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \ + vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \ filter_8bit(x2, t2, t3, t7, t6); \ filter_8bit(x5, t2, t3, t7, t6); \ \ @@ -475,7 +475,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) transpose_4x4(c0, c1, c2, c3, a0, a1); \ transpose_4x4(d0, d1, d2, d3, a0, a1); \ \ - vbroadcasti128 .Lshufb_16x16b, a0; \ + vbroadcasti128 .Lshufb_16x16b(%rip), a0; \ vmovdqu st1, a1; \ vpshufb a0, a2, a2; \ vpshufb a0, a3, a3; \ @@ -514,7 +514,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) #define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \ y6, y7, rio, key) \ vpbroadcastq key, x0; \ - vpshufb .Lpack_bswap, x0, x0; \ + vpshufb .Lpack_bswap(%rip), x0, x0; \ \ vpxor 0 * 32(rio), x0, y7; \ vpxor 1 * 32(rio), x0, y6; \ @@ -565,7 +565,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab) vmovdqu x0, stack_tmp0; \ \ vpbroadcastq key, x0; \ - vpshufb .Lpack_bswap, x0, x0; \ + vpshufb .Lpack_bswap(%rip), x0, x0; \ \ vpxor x0, y7, y7; \ vpxor x0, y6, y6; \ diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S index 347c059f59403d3c..816b6bb8bded7bb4 100644 --- a/arch/x86/crypto/camellia-x86_64-asm_64.S +++ b/arch/x86/crypto/camellia-x86_64-asm_64.S @@ -77,11 +77,13 @@ #define RXORbl %r9b #define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \ + leaq T0(%rip), tmp1; \ movzbl ab ## bl, tmp2 ## d; \ + xorq (tmp1, tmp2, 8), dst; \ + leaq T1(%rip), tmp2; \ movzbl ab ## bh, tmp1 ## d; \ rorq $16, ab; \ - xorq T0(, tmp2, 8), dst; \ - xorq T1(, tmp1, 8), dst; + xorq (tmp2, tmp1, 8), dst; /********************************************************************** 1-way camellia From patchwork Wed Apr 12 11:00:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208777 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE262C77B71 for ; Wed, 12 Apr 2023 11:01:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229705AbjDLLA7 (ORCPT ); Wed, 12 Apr 2023 07:00:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55774 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229647AbjDLLA5 (ORCPT ); Wed, 12 Apr 2023 07:00:57 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E4C26583 for ; Wed, 12 Apr 2023 04:00:56 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9F2E1632EC for ; Wed, 12 Apr 2023 11:00:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2378C433EF; Wed, 12 Apr 2023 11:00:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297255; bh=ZwHHUi68L0/HwbVdVnggFFtQQ/3/Z1k/X5PTHCX4P2E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ubqwvkFnkGECPmccX0ZS7iufy9eP17n2J0P5Lef1q8aOEP7v7kZP1y1/mQDU8BX4T ll1jf9npj1XR633bvvoRG+rwGhy+nJmy6UY30aXOG8gHz6st2JFB1zK5Z0cHrwrway GQXunJF8TlW/EaZysBMnyTC/sa9he5F1gg0mIaKqNsh8rGy8c2ZppoMjqpzMQtgDK7 6AHDGVEjeFmDa61VoanEZJan1V9LN2aFHyzHoXlHO4cWZxFAy6I0G5TjbOsVQHSn0Q DQIMTENbB9a8my9Ou8JqDetyCYrRzNqmdokY/nif9PP14A+T+xLAeMyDb41viIX9XY saMmUYNGRxlWw== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 05/13] crypto: x86/cast5 - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:27 +0200 Message-Id: <20230412110035.361447-6-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3646; i=ardb@kernel.org; h=from:subject; bh=ZwHHUi68L0/HwbVdVnggFFtQQ/3/Z1k/X5PTHCX4P2E=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs3yPH8rCt/+sJQX+cDN/rvJeMP2LBUSv1saHpVnSid evh+D8dpSwMYhwMsmKKLAKz/77beXqiVK3zLFmYOaxMIEMYuDgFYCIRvIwMX8LCanh6OC8k2wpP Sbddxcf59o+A/edbZw4snPzujNrFQEaGLXXrPMWzjE6+M3zU9SNYbGPPt8C1Ysws0q++VeZvVm/ iAAA= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. 
Co-developed-by: Thomas Garnier Signed-off-by: Thomas Garnier Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 38 +++++++++++--------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S index 0326a01503c3a554..b4e460a87f18ddaa 100644 --- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S @@ -84,15 +84,19 @@ #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \ movzbl src ## bh, RID1d; \ + leaq s1(%rip), RID2; \ + movl (RID2,RID1,4), dst ## d; \ movzbl src ## bl, RID2d; \ + leaq s2(%rip), RID1; \ + op1 (RID1,RID2,4), dst ## d; \ shrq $16, src; \ - movl s1(, RID1, 4), dst ## d; \ - op1 s2(, RID2, 4), dst ## d; \ movzbl src ## bh, RID1d; \ + leaq s3(%rip), RID2; \ + op2 (RID2,RID1,4), dst ## d; \ movzbl src ## bl, RID2d; \ interleave_op(il_reg); \ - op2 s3(, RID1, 4), dst ## d; \ - op3 s4(, RID2, 4), dst ## d; + leaq s4(%rip), RID1; \ + op3 (RID1,RID2,4), dst ## d; #define dummy(d) /* do nothing */ @@ -151,15 +155,15 @@ subround(l ## 3, r ## 3, l ## 4, r ## 4, f); #define enc_preload_rkr() \ - vbroadcastss .L16_mask, RKR; \ + vbroadcastss .L16_mask(%rip), RKR; \ /* add 16-bit rotation to key rotations (mod 32) */ \ vpxor kr(CTX), RKR, RKR; #define dec_preload_rkr() \ - vbroadcastss .L16_mask, RKR; \ + vbroadcastss .L16_mask(%rip), RKR; \ /* add 16-bit rotation to key rotations (mod 32) */ \ vpxor kr(CTX), RKR, RKR; \ - vpshufb .Lbswap128_mask, RKR, RKR; + vpshufb .Lbswap128_mask(%rip), RKR, RKR; #define transpose_2x4(x0, x1, t0, t1) \ vpunpckldq x1, x0, t0; \ @@ -235,9 +239,9 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16) movq %rdi, CTX; - vmovdqa .Lbswap_mask, RKM; - vmovd .Lfirst_mask, R1ST; - vmovd .L32_mask, R32; + vmovdqa .Lbswap_mask(%rip), RKM; + vmovd .Lfirst_mask(%rip), R1ST; + vmovd .L32_mask(%rip), R32; enc_preload_rkr(); inpack_blocks(RL1, RR1, RTMP, RX, RKM); @@ -271,7 +275,7 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16) popq %rbx; popq %r15; - vmovdqa .Lbswap_mask, RKM; + vmovdqa .Lbswap_mask(%rip), RKM; outunpack_blocks(RR1, RL1, RTMP, RX, RKM); outunpack_blocks(RR2, RL2, RTMP, RX, RKM); @@ -308,9 +312,9 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16) movq %rdi, CTX; - vmovdqa .Lbswap_mask, RKM; - vmovd .Lfirst_mask, R1ST; - vmovd .L32_mask, R32; + vmovdqa .Lbswap_mask(%rip), RKM; + vmovd .Lfirst_mask(%rip), R1ST; + vmovd .L32_mask(%rip), R32; dec_preload_rkr(); inpack_blocks(RL1, RR1, RTMP, RX, RKM); @@ -341,7 +345,7 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16) round(RL, RR, 1, 2); round(RR, RL, 0, 1); - vmovdqa .Lbswap_mask, RKM; + vmovdqa .Lbswap_mask(%rip), RKM; popq %rbx; popq %r15; @@ -504,8 +508,8 @@ SYM_FUNC_START(cast5_ctr_16way) vpcmpeqd RKR, RKR, RKR; vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */ - vmovdqa .Lbswap_iv_mask, R1ST; - vmovdqa .Lbswap128_mask, RKM; + vmovdqa .Lbswap_iv_mask(%rip), R1ST; + vmovdqa .Lbswap128_mask(%rip), RKM; /* load IV and byteswap */ vmovq (%rcx), RX; From patchwork Wed Apr 12 11:00:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208778 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86853C77B76 for ; Wed, 12 Apr 2023 11:01:01 +0000 (UTC) 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229647AbjDLLBA (ORCPT ); Wed, 12 Apr 2023 07:01:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229671AbjDLLA6 (ORCPT ); Wed, 12 Apr 2023 07:00:58 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99AD965B0 for ; Wed, 12 Apr 2023 04:00:57 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3415163256 for ; Wed, 12 Apr 2023 11:00:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 779E3C433D2; Wed, 12 Apr 2023 11:00:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297256; bh=qEHykw/e7t8o+96tpVB/xNNOYi4X5fvmoTrv8XRvmCs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pyVYsqkDehjfnkV3ZwaaBytDALEmhWSq23F9wtMj/RhVBGk4YhIDtZYNET+XXmUsW Ho5c+fp3OypMTnuSxL+9OPANkeX+uFpnN3RfyJMm33GOepWu07oAyKj3hpcroqAShA QOilVAlF5EqyfiH6TMP15GRNlEYs+DoofXymLPpPgu3nroZkVVDO2JX+ZB3nyf7rkI hnDcrm9eb6X+8TDp5u8pzUmg3XFEv3tBDkddVDPG2GF9BL1uINzdkpYIMosoI/Rc4y K4LLpU37Iz982rYpHqLJrqrUh6zpvduwYzg7iuunnNd2XTqS0w3q3ZqbhI9y4WTlL3 CXDMbdlKs/JpQ== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 06/13] crypto: x86/cast6 - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:28 +0200 Message-Id: <20230412110035.361447-7-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3224; i=ardb@kernel.org; h=from:subject; bh=qEHykw/e7t8o+96tpVB/xNNOYi4X5fvmoTrv8XRvmCs=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs37MnQOXLh36nBbe63uUsZL7+LOfru9C/dWLz6lbri bP0/qvsKGVhEONgkBVTZBGY/ffdztMTpWqdZ8nCzGFlAhnCwMUpABNZvZOR4Q9/UfWKXY46Qj2W KZV1rr/NeKZKBeZoJp29HqPXZmGTzvDfTUOC8/mytesMH989w3Le2zCjPcHQ1Erh2XLuIp/NWhP ZAA== X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. 
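Where the label is supplied as a preprocessor macro argument, as in the
shuffle()/preload_rkr() helpers below, the (%rip) suffix can simply be
appended at the use site inside the macro. An illustrative sketch with
made-up names:

	#define load_const(label, reg) \
		vmovdqa	label(%rip), reg;

	/* expands to: vmovdqa .Lexample_mask(%rip), %xmm0; */
	load_const(.Lexample_mask, %xmm0)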
Co-developed-by: Thomas Garnier Signed-off-by: Thomas Garnier Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 32 +++++++++++--------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S index 82b716fd5dbac65a..9e86d460b4092826 100644 --- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S +++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S @@ -84,15 +84,19 @@ #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \ movzbl src ## bh, RID1d; \ + leaq s1(%rip), RID2; \ + movl (RID2,RID1,4), dst ## d; \ movzbl src ## bl, RID2d; \ + leaq s2(%rip), RID1; \ + op1 (RID1,RID2,4), dst ## d; \ shrq $16, src; \ - movl s1(, RID1, 4), dst ## d; \ - op1 s2(, RID2, 4), dst ## d; \ movzbl src ## bh, RID1d; \ + leaq s3(%rip), RID2; \ + op2 (RID2,RID1,4), dst ## d; \ movzbl src ## bl, RID2d; \ interleave_op(il_reg); \ - op2 s3(, RID1, 4), dst ## d; \ - op3 s4(, RID2, 4), dst ## d; + leaq s4(%rip), RID1; \ + op3 (RID1,RID2,4), dst ## d; #define dummy(d) /* do nothing */ @@ -175,10 +179,10 @@ qop(RD, RC, 1); #define shuffle(mask) \ - vpshufb mask, RKR, RKR; + vpshufb mask(%rip), RKR, RKR; #define preload_rkr(n, do_mask, mask) \ - vbroadcastss .L16_mask, RKR; \ + vbroadcastss .L16_mask(%rip), RKR; \ /* add 16-bit rotation to key rotations (mod 32) */ \ vpxor (kr+n*16)(CTX), RKR, RKR; \ do_mask(mask); @@ -258,9 +262,9 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8) movq %rdi, CTX; - vmovdqa .Lbswap_mask, RKM; - vmovd .Lfirst_mask, R1ST; - vmovd .L32_mask, R32; + vmovdqa .Lbswap_mask(%rip), RKM; + vmovd .Lfirst_mask(%rip), R1ST; + vmovd .L32_mask(%rip), R32; inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM); inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM); @@ -284,7 +288,7 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8) popq %rbx; popq %r15; - vmovdqa .Lbswap_mask, RKM; + vmovdqa .Lbswap_mask(%rip), RKM; outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM); outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM); @@ -306,9 +310,9 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8) movq %rdi, CTX; - vmovdqa .Lbswap_mask, RKM; - vmovd .Lfirst_mask, R1ST; - vmovd .L32_mask, R32; + vmovdqa .Lbswap_mask(%rip), RKM; + vmovd .Lfirst_mask(%rip), R1ST; + vmovd .L32_mask(%rip), R32; inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM); inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM); @@ -332,7 +336,7 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8) popq %rbx; popq %r15; - vmovdqa .Lbswap_mask, RKM; + vmovdqa .Lbswap_mask(%rip), RKM; outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM); outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM); From patchwork Wed Apr 12 11:00:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208780 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14E81C77B6E for ; Wed, 12 Apr 2023 11:01:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229711AbjDLLBC (ORCPT ); Wed, 12 Apr 2023 07:01:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229706AbjDLLBA (ORCPT ); Wed, 12 Apr 2023 
07:01:00 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38B2A65B0 for ; Wed, 12 Apr 2023 04:00:59 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id BB33C632EE for ; Wed, 12 Apr 2023 11:00:58 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0AF39C4339E; Wed, 12 Apr 2023 11:00:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297258; bh=QIM4dvy+y3uMDNtnBsSv0HyVwFsc2IPurP1uLhVlB6M=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=rUc1jB3s4OWmcYrxzPWlTf0Hcp83OXraIUdGJLDtTxhJHmf49kYA9Cs0uA4J7vpDG zsC6npQXsyyKSabxyHpKC69XTwGMlsYE0AZPrYg/wPg9f0VpboUNvEa08w1pEO/znJ 4jSYW4kmnNv261NLqP3hlkhDODed5zD5mMbw+4XqjoJOYPLJ2CMssrZ1k4oxiwlG5N DvWfQOe/H7ZOyFvxnjG25z+KcMUk1bPaC328fwskclF+E8zF/s5NM8x6ecaPuQxIFG xvkfQWU6kezb6Sj2VMYXrpgcpL0Z8aSwAbUCtIrlYgY1tPX7CCVw3SQK8W1PEbnTQx EbBjUXrU2UAjg== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 07/13] crypto: x86/crc32c - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:29 +0200 Message-Id: <20230412110035.361447-8-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=795; i=ardb@kernel.org; h=from:subject; bh=QIM4dvy+y3uMDNtnBsSv0HyVwFsc2IPurP1uLhVlB6M=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs3zs48Ht06m4vRWkr9U2V6VXOp28zRBkkblljkt1y0 MjKY3lHKQuDGAeDrJgii8Dsv+92np4oVes8SxZmDisTyBAGLk4BmIh8HsP/8O4rNb/N6m7kOCQp h1upFH58/Zz9dPVVRabTKrb/m7NmMjIsUVdNW6x2N7tONavw3PclBzNTMoTVheP99OMDYjfkWvE BAA== X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. 
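The change below applies the same idea to a branch through a jump table: the
table base is formed with a RIP-relative leaq and the scaled index is applied
to the register. A minimal standalone sketch (labels and registers are made
up, and the kernel code branches via the JMP_NOSPEC retpoline macro rather
than a plain indirect jmp):

	.text
.Lcase0:	ret
.Lcase1:	ret

	.section .rodata
	.align 8
.Ldispatch_table:			/* hypothetical table of code pointers */
	.quad	.Lcase0, .Lcase1

	.text
	leaq	.Ldispatch_table(%rip), %r11	/* table base, no fixup in the code */
	movq	(%r11, %rax, 8), %r11		/* fetch the target selected by %rax */
	jmp	*%r11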
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S index ec35915f0901a087..5f843dce77f1de66 100644 --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S @@ -168,7 +168,8 @@ continue_block: xor crc2, crc2 ## branch into array - mov jump_table(,%rax,8), %bufp + leaq jump_table(%rip), %bufp + mov (%bufp,%rax,8), %bufp JMP_NOSPEC bufp ################################################################ From patchwork Wed Apr 12 11:00:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208782 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F1B9C7619A for ; Wed, 12 Apr 2023 11:01:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229679AbjDLLBP (ORCPT ); Wed, 12 Apr 2023 07:01:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229706AbjDLLBC (ORCPT ); Wed, 12 Apr 2023 07:01:02 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B738D6EAD for ; Wed, 12 Apr 2023 04:01:00 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5378963256 for ; Wed, 12 Apr 2023 11:01:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 93934C433D2; Wed, 12 Apr 2023 11:00:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297259; bh=5xYP4xdbKfF7TFdddJFji/S1DKTwXAgbtRcwhupiGTo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vBFt+YQR6gmlzPhkQ4TxKFCdzSUAMC8oA3yrw7B0tI9Vwr6OLm8k/1TAbst/qYayK c9pwl4w1NFdSReLf3LIiBX9S0HA/Dv4nYHDjAyQR+GKdTSzjsBzNQNhro/NrT7NZu4 xemqD6JV8KJdn8BNsLRRJAjecftteCQZdf9BQVwhq8rTnWAzW+oEzeFV51TVjv73Iy TGNxCTcuzYtMdjA58VvOQyneOuCJedbWnRMQ4a5csoIDdvfBC6Ov8qmYRLxVf4nhD9 3BFN6lyXYTdh3JLbXJ8xbu4T/eZSTASHhKy6TcfgJ7z6Y8bV1qB+Vc7DXa3Q1RIJM/ lDLhJtpLgOirw== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 08/13] crypto: x86/des3 - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:30 +0200 Message-Id: <20230412110035.361447-9-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4853; i=ardb@kernel.org; h=from:subject; bh=5xYP4xdbKfF7TFdddJFji/S1DKTwXAgbtRcwhupiGTo=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs3yd0yin/qQ/uXdFJNImbcmbbni2dc3bbabPM/DSzw 0m0kfFPRykLgxgHg6yYIovA7L/vdp6eKFXrPEsWZg4rE8gQBi5OAZjIP3WG/7FJ176J7pKZ8e3i AmsTi02dxtKRGtwS3ImnT23azJ4odYKR4ZN9pYxAUNasuxfVbkj3HPqrd/m87byUwJqZ6Vsu5dZ N4gcA X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: 
X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier Signed-off-by: Thomas Garnier Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/des3_ede-asm_64.S | 96 +++++++++++++------- 1 file changed, 64 insertions(+), 32 deletions(-) diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S index f4c760f4cade6d7b..cf21b998e77cc4ea 100644 --- a/arch/x86/crypto/des3_ede-asm_64.S +++ b/arch/x86/crypto/des3_ede-asm_64.S @@ -129,21 +129,29 @@ movzbl RW0bl, RT2d; \ movzbl RW0bh, RT3d; \ shrq $16, RW0; \ - movq s8(, RT0, 8), RT0; \ - xorq s6(, RT1, 8), to; \ + leaq s8(%rip), RW1; \ + movq (RW1, RT0, 8), RT0; \ + leaq s6(%rip), RW1; \ + xorq (RW1, RT1, 8), to; \ movzbl RW0bl, RL1d; \ movzbl RW0bh, RT1d; \ shrl $16, RW0d; \ - xorq s4(, RT2, 8), RT0; \ - xorq s2(, RT3, 8), to; \ + leaq s4(%rip), RW1; \ + xorq (RW1, RT2, 8), RT0; \ + leaq s2(%rip), RW1; \ + xorq (RW1, RT3, 8), to; \ movzbl RW0bl, RT2d; \ movzbl RW0bh, RT3d; \ - xorq s7(, RL1, 8), RT0; \ - xorq s5(, RT1, 8), to; \ - xorq s3(, RT2, 8), RT0; \ + leaq s7(%rip), RW1; \ + xorq (RW1, RL1, 8), RT0; \ + leaq s5(%rip), RW1; \ + xorq (RW1, RT1, 8), to; \ + leaq s3(%rip), RW1; \ + xorq (RW1, RT2, 8), RT0; \ load_next_key(n, RW0); \ xorq RT0, to; \ - xorq s1(, RT3, 8), to; \ + leaq s1(%rip), RW1; \ + xorq (RW1, RT3, 8), to; \ #define load_next_key(n, RWx) \ movq (((n) + 1) * 8)(CTX), RWx; @@ -355,65 +363,89 @@ SYM_FUNC_END(des3_ede_x86_64_crypt_blk) movzbl RW0bl, RT3d; \ movzbl RW0bh, RT1d; \ shrq $16, RW0; \ - xorq s8(, RT3, 8), to##0; \ - xorq s6(, RT1, 8), to##0; \ + leaq s8(%rip), RT2; \ + xorq (RT2, RT3, 8), to##0; \ + leaq s6(%rip), RT2; \ + xorq (RT2, RT1, 8), to##0; \ movzbl RW0bl, RT3d; \ movzbl RW0bh, RT1d; \ shrq $16, RW0; \ - xorq s4(, RT3, 8), to##0; \ - xorq s2(, RT1, 8), to##0; \ + leaq s4(%rip), RT2; \ + xorq (RT2, RT3, 8), to##0; \ + leaq s2(%rip), RT2; \ + xorq (RT2, RT1, 8), to##0; \ movzbl RW0bl, RT3d; \ movzbl RW0bh, RT1d; \ shrl $16, RW0d; \ - xorq s7(, RT3, 8), to##0; \ - xorq s5(, RT1, 8), to##0; \ + leaq s7(%rip), RT2; \ + xorq (RT2, RT3, 8), to##0; \ + leaq s5(%rip), RT2; \ + xorq (RT2, RT1, 8), to##0; \ movzbl RW0bl, RT3d; \ movzbl RW0bh, RT1d; \ load_next_key(n, RW0); \ - xorq s3(, RT3, 8), to##0; \ - xorq s1(, RT1, 8), to##0; \ + leaq s3(%rip), RT2; \ + xorq (RT2, RT3, 8), to##0; \ + leaq s1(%rip), RT2; \ + xorq (RT2, RT1, 8), to##0; \ xorq from##1, RW1; \ movzbl RW1bl, RT3d; \ movzbl RW1bh, RT1d; \ shrq $16, RW1; \ - xorq s8(, RT3, 8), to##1; \ - xorq s6(, RT1, 8), to##1; \ + leaq s8(%rip), RT2; \ + xorq (RT2, RT3, 8), to##1; \ + leaq s6(%rip), RT2; \ + xorq (RT2, RT1, 8), to##1; \ movzbl RW1bl, RT3d; \ movzbl RW1bh, RT1d; \ shrq $16, RW1; \ - xorq s4(, RT3, 8), to##1; \ - xorq s2(, RT1, 8), to##1; \ + leaq s4(%rip), RT2; \ + xorq (RT2, RT3, 8), to##1; \ + leaq s2(%rip), RT2; \ + xorq (RT2, RT1, 8), to##1; \ movzbl RW1bl, RT3d; \ movzbl RW1bh, RT1d; \ shrl $16, RW1d; \ - xorq s7(, RT3, 8), to##1; \ - xorq s5(, RT1, 8), to##1; \ + leaq s7(%rip), RT2; \ + xorq (RT2, RT3, 8), to##1; \ + leaq s5(%rip), RT2; \ + xorq (RT2, RT1, 8), to##1; \ movzbl RW1bl, RT3d; \ movzbl RW1bh, RT1d; \ do_movq(RW0, RW1); \ - xorq s3(, RT3, 8), to##1; \ - xorq s1(, RT1, 8), to##1; \ + leaq s3(%rip), RT2; \ + xorq (RT2, RT3, 8), to##1; \ + leaq s1(%rip), RT2; \ + xorq (RT2, RT1, 8), to##1; \ xorq from##2, RW2; \ movzbl RW2bl, RT3d; \ movzbl RW2bh, RT1d; \ shrq $16, RW2; \ - xorq s8(, RT3, 8), to##2; 
\ - xorq s6(, RT1, 8), to##2; \ + leaq s8(%rip), RT2; \ + xorq (RT2, RT3, 8), to##2; \ + leaq s6(%rip), RT2; \ + xorq (RT2, RT1, 8), to##2; \ movzbl RW2bl, RT3d; \ movzbl RW2bh, RT1d; \ shrq $16, RW2; \ - xorq s4(, RT3, 8), to##2; \ - xorq s2(, RT1, 8), to##2; \ + leaq s4(%rip), RT2; \ + xorq (RT2, RT3, 8), to##2; \ + leaq s2(%rip), RT2; \ + xorq (RT2, RT1, 8), to##2; \ movzbl RW2bl, RT3d; \ movzbl RW2bh, RT1d; \ shrl $16, RW2d; \ - xorq s7(, RT3, 8), to##2; \ - xorq s5(, RT1, 8), to##2; \ + leaq s7(%rip), RT2; \ + xorq (RT2, RT3, 8), to##2; \ + leaq s5(%rip), RT2; \ + xorq (RT2, RT1, 8), to##2; \ movzbl RW2bl, RT3d; \ movzbl RW2bh, RT1d; \ do_movq(RW0, RW2); \ - xorq s3(, RT3, 8), to##2; \ - xorq s1(, RT1, 8), to##2; + leaq s3(%rip), RT2; \ + xorq (RT2, RT3, 8), to##2; \ + leaq s1(%rip), RT2; \ + xorq (RT2, RT1, 8), to##2; #define __movq(src, dst) \ movq src, dst; From patchwork Wed Apr 12 11:00:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208781 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B007FC77B71 for ; Wed, 12 Apr 2023 11:01:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229706AbjDLLBQ (ORCPT ); Wed, 12 Apr 2023 07:01:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55984 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229749AbjDLLBN (ORCPT ); Wed, 12 Apr 2023 07:01:13 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 527616A69 for ; Wed, 12 Apr 2023 04:01:02 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id D60F4632EB for ; Wed, 12 Apr 2023 11:01:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2A791C4339C; Wed, 12 Apr 2023 11:01:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297261; bh=WCIInqLVBgl0lnWstcFe7pmbtXMzjb9G1XFch4e7MRM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VhYbnuBDrtISvotGsBEsnNMlt9hFKI5WTIhPulC6Wxf5gtgguZFWz2mFFwvz6Jwdn iNF5Ym/AwObwcTz8WJ4FmjOb93W3su0a0I21I23pHldTMfs4bUP/ev5Ga6s56DgoO7 F0AjuE0EGzyb0F5f7kCHhl774Wa7C28MJfgSsVWaKmfNqff9148rWDIapI/PsvjiBv M+Inzgyx1Iu2MIUWlxpArBgv2XeA8kMhpF4A4gFHMR8qGRv2npwSP/B3bPMojrvgwD 8avxUiQcWb5V5Zy/QGrFxxRPrdJHUAaNIgkUmnu+128YBlQiJJqetzh63G8hGGF0KO +m1UWIHW2c9fA== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 09/13] crypto: x86/ghash - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:31 +0200 Message-Id: <20230412110035.361447-10-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1040; i=ardb@kernel.org; h=from:subject; bh=WCIInqLVBgl0lnWstcFe7pmbtXMzjb9G1XFch4e7MRM=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs3/fU+S1pkq3l/RKTdpk9/yW8W0Mtb7FhfLjLxfeeT 
759jivuKGVhEONgkBVTZBGY/ffdztMTpWqdZ8nCzGFlAhnCwMUpABO5fZ3hv/uzJbWSJS/a47OC Dnjd+Cl0+Mjh4qi2BQZftK8cLT39robhf+K3cy8/RmoJXeX7zrB3+ovsZ3Y6Oz6xZ8f9+Hq7Kz8 /kxcA X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S index 257ed9446f3ee1a9..99cb983ded9e369f 100644 --- a/arch/x86/crypto/ghash-clmulni-intel_asm.S +++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S @@ -93,7 +93,7 @@ SYM_FUNC_START(clmul_ghash_mul) FRAME_BEGIN movups (%rdi), DATA movups (%rsi), SHASH - movaps .Lbswap_mask, BSWAP + movaps .Lbswap_mask(%rip), BSWAP pshufb BSWAP, DATA call __clmul_gf128mul_ble pshufb BSWAP, DATA @@ -110,7 +110,7 @@ SYM_FUNC_START(clmul_ghash_update) FRAME_BEGIN cmp $16, %rdx jb .Lupdate_just_ret # check length - movaps .Lbswap_mask, BSWAP + movaps .Lbswap_mask(%rip), BSWAP movups (%rdi), DATA movups (%rcx), SHASH pshufb BSWAP, DATA From patchwork Wed Apr 12 11:00:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208783 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DF27C77B6E for ; Wed, 12 Apr 2023 11:01:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229709AbjDLLBR (ORCPT ); Wed, 12 Apr 2023 07:01:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229765AbjDLLBO (ORCPT ); Wed, 12 Apr 2023 07:01:14 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1F876E80 for ; Wed, 12 Apr 2023 04:01:03 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6C02662889 for ; Wed, 12 Apr 2023 11:01:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2243C433D2; Wed, 12 Apr 2023 11:01:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297262; bh=Em7n8Yu3fpuklxDgt7kSen6J5HDRU2CHTf0mvOk/VXI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SLwSFHuxumLEd66E7pWBfWlk6BaxpYm6klPr7ZJCB0PFtPoNlrrMo/NGXi4ehgU// FmYjwl7hsSSV6iwoiLww6gR2Y2BIRKiC3w+4NKBhdnslxrP2OO/+zfMRCbaQLUUQAH 4J8wmdtFw7Ld3ffilsGNq8pB4tIHavU2RX4Kk+UG0u1uRxjdZm243A2pAP9wGcdwe6 1heQTzcGkrCisyUaHa+RzYlB4Q26zjBI+sDZkv7CFpC3E1Uwl7aUnAQuI8UJySzIFe hTDuibp4PLF8nOD6G/wIZGt2tl62k1xdz3S2oaPQK+KmVixrLkwUIJRDOwPqGiHXqW 6qo9iPVQzoDKw== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 10/13] crypto: x86/sha256 - Use RIP-relative addressing Date: Wed, 12 Apr 2023 13:00:32 +0200 Message-Id: <20230412110035.361447-11-ardb@kernel.org> 
X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1743; i=ardb@kernel.org; h=from:subject; bh=Em7n8Yu3fpuklxDgt7kSen6J5HDRU2CHTf0mvOk/VXI=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWs3//Y1k03+9xcdLoniqYffRH4QDFJ8a1g7qHp1u8es 8mWzNzbUcrCIMbBICumyCIw+++7nacnStU6z5KFmcPKBDKEgYtTACayyICRYUOl1OVdCoyN3PY+ zDnsX/PzH9inrZn7T8lQkcFHo8luLsNfUQ7Rqy/1vmcfNzm/3o85XFJy7dkCp/5U38mJu+dullj CAAA= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/sha256-avx2-asm.S | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S index 3eada94168526665..e2a4024fb0a3f5d5 100644 --- a/arch/x86/crypto/sha256-avx2-asm.S +++ b/arch/x86/crypto/sha256-avx2-asm.S @@ -589,19 +589,23 @@ last_block_enter: .align 16 loop1: - vpaddd K256+0*32(SRND), X0, XFER + leaq K256+0*32(%rip), INP ## reuse INP as scratch reg + vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 0*32+_XFER(%rsp, SRND) FOUR_ROUNDS_AND_SCHED _XFER + 0*32 - vpaddd K256+1*32(SRND), X0, XFER + leaq K256+1*32(%rip), INP + vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 1*32+_XFER(%rsp, SRND) FOUR_ROUNDS_AND_SCHED _XFER + 1*32 - vpaddd K256+2*32(SRND), X0, XFER + leaq K256+2*32(%rip), INP + vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 2*32+_XFER(%rsp, SRND) FOUR_ROUNDS_AND_SCHED _XFER + 2*32 - vpaddd K256+3*32(SRND), X0, XFER + leaq K256+3*32(%rip), INP + vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 3*32+_XFER(%rsp, SRND) FOUR_ROUNDS_AND_SCHED _XFER + 3*32 @@ -611,11 +615,13 @@ loop1: loop2: ## Do last 16 rounds with no scheduling - vpaddd K256+0*32(SRND), X0, XFER + leaq K256+0*32(%rip), INP + vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 0*32+_XFER(%rsp, SRND) DO_4ROUNDS _XFER + 0*32 - vpaddd K256+1*32(SRND), X1, XFER + leaq K256+1*32(%rip), INP + vpaddd (INP, SRND), X1, XFER vmovdqa XFER, 1*32+_XFER(%rsp, SRND) DO_4ROUNDS _XFER + 1*32 add $2*32, SRND From patchwork Wed Apr 12 11:00:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208784 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF624C77B72 for ; Wed, 12 Apr 2023 11:01:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229671AbjDLLBS (ORCPT ); Wed, 12 Apr 2023 07:01:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55908 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229685AbjDLLBP (ORCPT ); Wed, 12 Apr 2023 07:01:15 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E50E7689 for ; Wed, 12 Apr 2023 04:01:05 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
dfw.source.kernel.org (Postfix) with ESMTPS id 2DCC4632EB for ; Wed, 12 Apr 2023 11:01:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4949DC433EF; Wed, 12 Apr 2023 11:01:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297264; bh=3mcZYgQbypouEIMQmPkYqYgFzytdi22IcIznW9aErXQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hDH/yYtCETUgsXWoVpNSoAFbD5dYq3OdeO2Zw7q+4/5UIN1wC1dePdVyW//ibPP2W EFTon8mFN2mSTiYRQO0yRWK14nLdDbsqmYr5MzDAkoE9D3lZUXUCpy7bZrKw5RpjTN acn8gsa1yg8Ka00z502jLpM0/lq1B5kOEGGsLlEz+862+wyRtPIMAM79RkWcCQ+rbn ayn5SsEl4Kw94rXSriRcnW/3HtkLu7xE01GzaZFrBBYHi/vTThWjVsgwUumk2T2l6k JVB3GRRaRU2KXs7ZlxBcQNUtx7jGx9d3KRLI7OD52tOWLlr4WaXdBmVM2T5A+sV8TF Asv9QToxUs2EA== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 11/13] crypto: x86/aesni - Use local .L symbols for code Date: Wed, 12 Apr 2023 13:00:33 +0200 Message-Id: <20230412110035.361447-12-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=33576; i=ardb@kernel.org; h=from:subject; bh=3mcZYgQbypouEIMQmPkYqYgFzytdi22IcIznW9aErXQ=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWsP8D8p12m866TbPcFfO7cY/v2vj/9adbVhrTGtNh5w e9uTirtKGVhEONgkBVTZBGY/ffdztMTpWqdZ8nCzGFlAhnCwMUpABNZ85jhf+Wvd05rbqTdbhFa a5YSk7SbTeNqKO+vIydeiL/+ksbdPZvhvwPXjY1zWW9Z7FXWYZQ+xL6Gy2rHjGNp/MaFaY9WHDM MZQQA X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. 
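For readers unfamiliar with the convention: the GNU assembler treats any label starting with .L as a local label, so it is resolved at assembly time and never emitted into the object file's symbol table, which is what keeps it out of kallsyms and backtraces. A minimal standalone sketch of the macro-local pattern this patch converts (the zero_quads/demo_zero names are invented for illustration and are not part of the kernel sources):

        # Hypothetical example, not taken from the patch: a macro-local label
        # expanded via \@, the assembler's per-invocation counter.  Without
        # the .L prefix each expansion would emit a visible local symbol
        # (zero_loop_0, zero_loop_1, ...); with the prefix, nothing reaches
        # the object file's symbol table at all.
        .macro  zero_quads ptr, count
.Lzero_loop_\@:
        movq    $0, (\ptr)
        add     $8, \ptr
        dec     \count
        jnz     .Lzero_loop_\@
        .endm

        .text
        .globl  demo_zero
demo_zero:                      # %rdi = buffer with room for 24 quadwords
        mov     $16, %rsi
        zero_quads      %rdi, %rsi      # first expansion
        mov     $8, %rsi
        zero_quads      %rdi, %rsi      # second expansion gets a different \@
        ret                             # suffix, so the labels never collide

Assembling this and running nm over the object shows only demo_zero; doing the same to aesni-intel_asm.S before and after the conversion is an easy way to watch the generated _\@ labels disappear from the symbol table.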
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/aesni-intel_asm.S | 196 +++++++++--------- arch/x86/crypto/aesni-intel_avx-x86_64.S | 218 ++++++++++---------- 2 files changed, 207 insertions(+), 207 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index ca99a2274d551015..3ac7487ecad2d3f0 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -288,53 +288,53 @@ ALL_F: .octa 0xffffffffffffffffffffffffffffffff # Encrypt/Decrypt first few blocks and $(3<<4), %r12 - jz _initial_num_blocks_is_0_\@ + jz .L_initial_num_blocks_is_0_\@ cmp $(2<<4), %r12 - jb _initial_num_blocks_is_1_\@ - je _initial_num_blocks_is_2_\@ -_initial_num_blocks_is_3_\@: + jb .L_initial_num_blocks_is_1_\@ + je .L_initial_num_blocks_is_2_\@ +.L_initial_num_blocks_is_3_\@: INITIAL_BLOCKS_ENC_DEC %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \ %xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 5, 678, \operation sub $48, %r13 - jmp _initial_blocks_\@ -_initial_num_blocks_is_2_\@: + jmp .L_initial_blocks_\@ +.L_initial_num_blocks_is_2_\@: INITIAL_BLOCKS_ENC_DEC %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \ %xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 6, 78, \operation sub $32, %r13 - jmp _initial_blocks_\@ -_initial_num_blocks_is_1_\@: + jmp .L_initial_blocks_\@ +.L_initial_num_blocks_is_1_\@: INITIAL_BLOCKS_ENC_DEC %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \ %xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 7, 8, \operation sub $16, %r13 - jmp _initial_blocks_\@ -_initial_num_blocks_is_0_\@: + jmp .L_initial_blocks_\@ +.L_initial_num_blocks_is_0_\@: INITIAL_BLOCKS_ENC_DEC %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \ %xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 8, 0, \operation -_initial_blocks_\@: +.L_initial_blocks_\@: # Main loop - Encrypt/Decrypt remaining blocks test %r13, %r13 - je _zero_cipher_left_\@ + je .L_zero_cipher_left_\@ sub $64, %r13 - je _four_cipher_left_\@ -_crypt_by_4_\@: + je .L_four_cipher_left_\@ +.L_crypt_by_4_\@: GHASH_4_ENCRYPT_4_PARALLEL_\operation %xmm9, %xmm10, %xmm11, %xmm12, \ %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, \ %xmm7, %xmm8, enc add $64, %r11 sub $64, %r13 - jne _crypt_by_4_\@ -_four_cipher_left_\@: + jne .L_crypt_by_4_\@ +.L_four_cipher_left_\@: GHASH_LAST_4 %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \ %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8 -_zero_cipher_left_\@: +.L_zero_cipher_left_\@: movdqu %xmm8, AadHash(%arg2) movdqu %xmm0, CurCount(%arg2) mov %arg5, %r13 and $15, %r13 # %r13 = arg5 (mod 16) - je _multiple_of_16_bytes_\@ + je .L_multiple_of_16_bytes_\@ mov %r13, PBlockLen(%arg2) @@ -348,14 +348,14 @@ _zero_cipher_left_\@: movdqu %xmm0, PBlockEncKey(%arg2) cmp $16, %arg5 - jge _large_enough_update_\@ + jge .L_large_enough_update_\@ lea (%arg4,%r11,1), %r10 mov %r13, %r12 READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1 - jmp _data_read_\@ + jmp .L_data_read_\@ -_large_enough_update_\@: +.L_large_enough_update_\@: sub $16, %r11 add %r13, %r11 @@ -374,7 +374,7 @@ _large_enough_update_\@: # shift right 16-r13 bytes pshufb %xmm2, %xmm1 -_data_read_\@: +.L_data_read_\@: lea ALL_F+16(%rip), %r12 sub %r13, %r12 @@ -409,19 +409,19 @@ _data_read_\@: # Output %r13 bytes movq %xmm0, %rax cmp $8, %r13 - jle _less_than_8_bytes_left_\@ + jle .L_less_than_8_bytes_left_\@ mov %rax, (%arg3 , %r11, 1) add $8, %r11 psrldq $8, %xmm0 movq %xmm0, %rax sub $8, %r13 -_less_than_8_bytes_left_\@: +.L_less_than_8_bytes_left_\@: mov %al, (%arg3, %r11, 1) add $1, %r11 shr $8, %rax sub $1, 
%r13 - jne _less_than_8_bytes_left_\@ -_multiple_of_16_bytes_\@: + jne .L_less_than_8_bytes_left_\@ +.L_multiple_of_16_bytes_\@: .endm # GCM_COMPLETE Finishes update of tag of last partial block @@ -434,11 +434,11 @@ _multiple_of_16_bytes_\@: mov PBlockLen(%arg2), %r12 test %r12, %r12 - je _partial_done\@ + je .L_partial_done\@ GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6 -_partial_done\@: +.L_partial_done\@: mov AadLen(%arg2), %r12 # %r13 = aadLen (number of bytes) shl $3, %r12 # convert into number of bits movd %r12d, %xmm15 # len(A) in %xmm15 @@ -457,44 +457,44 @@ _partial_done\@: movdqu OrigIV(%arg2), %xmm0 # %xmm0 = Y0 ENCRYPT_SINGLE_BLOCK %xmm0, %xmm1 # E(K, Y0) pxor %xmm8, %xmm0 -_return_T_\@: +.L_return_T_\@: mov \AUTHTAG, %r10 # %r10 = authTag mov \AUTHTAGLEN, %r11 # %r11 = auth_tag_len cmp $16, %r11 - je _T_16_\@ + je .L_T_16_\@ cmp $8, %r11 - jl _T_4_\@ -_T_8_\@: + jl .L_T_4_\@ +.L_T_8_\@: movq %xmm0, %rax mov %rax, (%r10) add $8, %r10 sub $8, %r11 psrldq $8, %xmm0 test %r11, %r11 - je _return_T_done_\@ -_T_4_\@: + je .L_return_T_done_\@ +.L_T_4_\@: movd %xmm0, %eax mov %eax, (%r10) add $4, %r10 sub $4, %r11 psrldq $4, %xmm0 test %r11, %r11 - je _return_T_done_\@ -_T_123_\@: + je .L_return_T_done_\@ +.L_T_123_\@: movd %xmm0, %eax cmp $2, %r11 - jl _T_1_\@ + jl .L_T_1_\@ mov %ax, (%r10) cmp $2, %r11 - je _return_T_done_\@ + je .L_return_T_done_\@ add $2, %r10 sar $16, %eax -_T_1_\@: +.L_T_1_\@: mov %al, (%r10) - jmp _return_T_done_\@ -_T_16_\@: + jmp .L_return_T_done_\@ +.L_T_16_\@: movdqu %xmm0, (%r10) -_return_T_done_\@: +.L_return_T_done_\@: .endm #ifdef __x86_64__ @@ -563,30 +563,30 @@ _return_T_done_\@: # Clobbers %rax, DLEN and XMM1 .macro READ_PARTIAL_BLOCK DPTR DLEN XMM1 XMMDst cmp $8, \DLEN - jl _read_lt8_\@ + jl .L_read_lt8_\@ mov (\DPTR), %rax movq %rax, \XMMDst sub $8, \DLEN - jz _done_read_partial_block_\@ + jz .L_done_read_partial_block_\@ xor %eax, %eax -_read_next_byte_\@: +.L_read_next_byte_\@: shl $8, %rax mov 7(\DPTR, \DLEN, 1), %al dec \DLEN - jnz _read_next_byte_\@ + jnz .L_read_next_byte_\@ movq %rax, \XMM1 pslldq $8, \XMM1 por \XMM1, \XMMDst - jmp _done_read_partial_block_\@ -_read_lt8_\@: + jmp .L_done_read_partial_block_\@ +.L_read_lt8_\@: xor %eax, %eax -_read_next_byte_lt8_\@: +.L_read_next_byte_lt8_\@: shl $8, %rax mov -1(\DPTR, \DLEN, 1), %al dec \DLEN - jnz _read_next_byte_lt8_\@ + jnz .L_read_next_byte_lt8_\@ movq %rax, \XMMDst -_done_read_partial_block_\@: +.L_done_read_partial_block_\@: .endm # CALC_AAD_HASH: Calculates the hash of the data which will not be encrypted. 
@@ -600,8 +600,8 @@ _done_read_partial_block_\@: pxor \TMP6, \TMP6 cmp $16, %r11 - jl _get_AAD_rest\@ -_get_AAD_blocks\@: + jl .L_get_AAD_rest\@ +.L_get_AAD_blocks\@: movdqu (%r10), \TMP7 pshufb %xmm14, \TMP7 # byte-reflect the AAD data pxor \TMP7, \TMP6 @@ -609,14 +609,14 @@ _get_AAD_blocks\@: add $16, %r10 sub $16, %r11 cmp $16, %r11 - jge _get_AAD_blocks\@ + jge .L_get_AAD_blocks\@ movdqu \TMP6, \TMP7 /* read the last <16B of AAD */ -_get_AAD_rest\@: +.L_get_AAD_rest\@: test %r11, %r11 - je _get_AAD_done\@ + je .L_get_AAD_done\@ READ_PARTIAL_BLOCK %r10, %r11, \TMP1, \TMP7 pshufb %xmm14, \TMP7 # byte-reflect the AAD data @@ -624,7 +624,7 @@ _get_AAD_rest\@: GHASH_MUL \TMP7, \HASHKEY, \TMP1, \TMP2, \TMP3, \TMP4, \TMP5 movdqu \TMP7, \TMP6 -_get_AAD_done\@: +.L_get_AAD_done\@: movdqu \TMP6, AadHash(%arg2) .endm @@ -637,21 +637,21 @@ _get_AAD_done\@: AAD_HASH operation mov PBlockLen(%arg2), %r13 test %r13, %r13 - je _partial_block_done_\@ # Leave Macro if no partial blocks + je .L_partial_block_done_\@ # Leave Macro if no partial blocks # Read in input data without over reading cmp $16, \PLAIN_CYPH_LEN - jl _fewer_than_16_bytes_\@ + jl .L_fewer_than_16_bytes_\@ movups (\PLAIN_CYPH_IN), %xmm1 # If more than 16 bytes, just fill xmm - jmp _data_read_\@ + jmp .L_data_read_\@ -_fewer_than_16_bytes_\@: +.L_fewer_than_16_bytes_\@: lea (\PLAIN_CYPH_IN, \DATA_OFFSET, 1), %r10 mov \PLAIN_CYPH_LEN, %r12 READ_PARTIAL_BLOCK %r10 %r12 %xmm0 %xmm1 mov PBlockLen(%arg2), %r13 -_data_read_\@: # Finished reading in data +.L_data_read_\@: # Finished reading in data movdqu PBlockEncKey(%arg2), %xmm9 movdqu HashKey(%arg2), %xmm13 @@ -674,9 +674,9 @@ _data_read_\@: # Finished reading in data sub $16, %r10 # Determine if if partial block is not being filled and # shift mask accordingly - jge _no_extra_mask_1_\@ + jge .L_no_extra_mask_1_\@ sub %r10, %r12 -_no_extra_mask_1_\@: +.L_no_extra_mask_1_\@: movdqu ALL_F-SHIFT_MASK(%r12), %xmm1 # get the appropriate mask to mask out bottom r13 bytes of xmm9 @@ -689,17 +689,17 @@ _no_extra_mask_1_\@: pxor %xmm3, \AAD_HASH test %r10, %r10 - jl _partial_incomplete_1_\@ + jl .L_partial_incomplete_1_\@ # GHASH computation for the last <16 Byte block GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6 xor %eax, %eax mov %rax, PBlockLen(%arg2) - jmp _dec_done_\@ -_partial_incomplete_1_\@: + jmp .L_dec_done_\@ +.L_partial_incomplete_1_\@: add \PLAIN_CYPH_LEN, PBlockLen(%arg2) -_dec_done_\@: +.L_dec_done_\@: movdqu \AAD_HASH, AadHash(%arg2) .else pxor %xmm1, %xmm9 # Plaintext XOR E(K, Yn) @@ -710,9 +710,9 @@ _dec_done_\@: sub $16, %r10 # Determine if if partial block is not being filled and # shift mask accordingly - jge _no_extra_mask_2_\@ + jge .L_no_extra_mask_2_\@ sub %r10, %r12 -_no_extra_mask_2_\@: +.L_no_extra_mask_2_\@: movdqu ALL_F-SHIFT_MASK(%r12), %xmm1 # get the appropriate mask to mask out bottom r13 bytes of xmm9 @@ -724,17 +724,17 @@ _no_extra_mask_2_\@: pxor %xmm9, \AAD_HASH test %r10, %r10 - jl _partial_incomplete_2_\@ + jl .L_partial_incomplete_2_\@ # GHASH computation for the last <16 Byte block GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6 xor %eax, %eax mov %rax, PBlockLen(%arg2) - jmp _encode_done_\@ -_partial_incomplete_2_\@: + jmp .L_encode_done_\@ +.L_partial_incomplete_2_\@: add \PLAIN_CYPH_LEN, PBlockLen(%arg2) -_encode_done_\@: +.L_encode_done_\@: movdqu \AAD_HASH, AadHash(%arg2) movdqa SHUF_MASK(%rip), %xmm10 @@ -744,32 +744,32 @@ _encode_done_\@: .endif # output encrypted Bytes test %r10, %r10 - jl _partial_fill_\@ + jl 
.L_partial_fill_\@ mov %r13, %r12 mov $16, %r13 # Set r13 to be the number of bytes to write out sub %r12, %r13 - jmp _count_set_\@ -_partial_fill_\@: + jmp .L_count_set_\@ +.L_partial_fill_\@: mov \PLAIN_CYPH_LEN, %r13 -_count_set_\@: +.L_count_set_\@: movdqa %xmm9, %xmm0 movq %xmm0, %rax cmp $8, %r13 - jle _less_than_8_bytes_left_\@ + jle .L_less_than_8_bytes_left_\@ mov %rax, (\CYPH_PLAIN_OUT, \DATA_OFFSET, 1) add $8, \DATA_OFFSET psrldq $8, %xmm0 movq %xmm0, %rax sub $8, %r13 -_less_than_8_bytes_left_\@: +.L_less_than_8_bytes_left_\@: movb %al, (\CYPH_PLAIN_OUT, \DATA_OFFSET, 1) add $1, \DATA_OFFSET shr $8, %rax sub $1, %r13 - jne _less_than_8_bytes_left_\@ -_partial_block_done_\@: + jne .L_less_than_8_bytes_left_\@ +.L_partial_block_done_\@: .endm # PARTIAL_BLOCK /* @@ -813,14 +813,14 @@ _partial_block_done_\@: shr $2,%eax # 128->4, 192->6, 256->8 add $5,%eax # 128->9, 192->11, 256->13 -aes_loop_initial_\@: +.Laes_loop_initial_\@: MOVADQ (%r10),\TMP1 .irpc index, \i_seq aesenc \TMP1, %xmm\index .endr add $16,%r10 sub $1,%eax - jnz aes_loop_initial_\@ + jnz .Laes_loop_initial_\@ MOVADQ (%r10), \TMP1 .irpc index, \i_seq @@ -861,7 +861,7 @@ aes_loop_initial_\@: GHASH_MUL %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1 .endif cmp $64, %r13 - jl _initial_blocks_done\@ + jl .L_initial_blocks_done\@ # no need for precomputed values /* * @@ -908,18 +908,18 @@ aes_loop_initial_\@: mov keysize,%eax shr $2,%eax # 128->4, 192->6, 256->8 sub $4,%eax # 128->0, 192->2, 256->4 - jz aes_loop_pre_done\@ + jz .Laes_loop_pre_done\@ -aes_loop_pre_\@: +.Laes_loop_pre_\@: MOVADQ (%r10),\TMP2 .irpc index, 1234 aesenc \TMP2, %xmm\index .endr add $16,%r10 sub $1,%eax - jnz aes_loop_pre_\@ + jnz .Laes_loop_pre_\@ -aes_loop_pre_done\@: +.Laes_loop_pre_done\@: MOVADQ (%r10), \TMP2 aesenclast \TMP2, \XMM1 aesenclast \TMP2, \XMM2 @@ -963,7 +963,7 @@ aes_loop_pre_done\@: pshufb %xmm14, \XMM3 # perform a 16 byte swap pshufb %xmm14, \XMM4 # perform a 16 byte swap -_initial_blocks_done\@: +.L_initial_blocks_done\@: .endm @@ -1095,18 +1095,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 operation mov keysize,%eax shr $2,%eax # 128->4, 192->6, 256->8 sub $4,%eax # 128->0, 192->2, 256->4 - jz aes_loop_par_enc_done\@ + jz .Laes_loop_par_enc_done\@ -aes_loop_par_enc\@: +.Laes_loop_par_enc\@: MOVADQ (%r10),\TMP3 .irpc index, 1234 aesenc \TMP3, %xmm\index .endr add $16,%r10 sub $1,%eax - jnz aes_loop_par_enc\@ + jnz .Laes_loop_par_enc\@ -aes_loop_par_enc_done\@: +.Laes_loop_par_enc_done\@: MOVADQ (%r10), \TMP3 aesenclast \TMP3, \XMM1 # Round 10 aesenclast \TMP3, \XMM2 @@ -1303,18 +1303,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 operation mov keysize,%eax shr $2,%eax # 128->4, 192->6, 256->8 sub $4,%eax # 128->0, 192->2, 256->4 - jz aes_loop_par_dec_done\@ + jz .Laes_loop_par_dec_done\@ -aes_loop_par_dec\@: +.Laes_loop_par_dec\@: MOVADQ (%r10),\TMP3 .irpc index, 1234 aesenc \TMP3, %xmm\index .endr add $16,%r10 sub $1,%eax - jnz aes_loop_par_dec\@ + jnz .Laes_loop_par_dec\@ -aes_loop_par_dec_done\@: +.Laes_loop_par_dec_done\@: MOVADQ (%r10), \TMP3 aesenclast \TMP3, \XMM1 # last round aesenclast \TMP3, \XMM2 diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S index b6ca80f188ff9418..46cddd78857bd9eb 100644 --- a/arch/x86/crypto/aesni-intel_avx-x86_64.S +++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S @@ -278,68 +278,68 @@ VARIABLE_OFFSET = 16*8 mov %r13, %r12 shr $4, %r12 and $7, %r12 - jz _initial_num_blocks_is_0\@ + jz .L_initial_num_blocks_is_0\@ cmp $7, %r12 - 
je _initial_num_blocks_is_7\@ + je .L_initial_num_blocks_is_7\@ cmp $6, %r12 - je _initial_num_blocks_is_6\@ + je .L_initial_num_blocks_is_6\@ cmp $5, %r12 - je _initial_num_blocks_is_5\@ + je .L_initial_num_blocks_is_5\@ cmp $4, %r12 - je _initial_num_blocks_is_4\@ + je .L_initial_num_blocks_is_4\@ cmp $3, %r12 - je _initial_num_blocks_is_3\@ + je .L_initial_num_blocks_is_3\@ cmp $2, %r12 - je _initial_num_blocks_is_2\@ + je .L_initial_num_blocks_is_2\@ - jmp _initial_num_blocks_is_1\@ + jmp .L_initial_num_blocks_is_1\@ -_initial_num_blocks_is_7\@: +.L_initial_num_blocks_is_7\@: \INITIAL_BLOCKS \REP, 7, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*7, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_6\@: +.L_initial_num_blocks_is_6\@: \INITIAL_BLOCKS \REP, 6, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*6, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_5\@: +.L_initial_num_blocks_is_5\@: \INITIAL_BLOCKS \REP, 5, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*5, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_4\@: +.L_initial_num_blocks_is_4\@: \INITIAL_BLOCKS \REP, 4, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*4, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_3\@: +.L_initial_num_blocks_is_3\@: \INITIAL_BLOCKS \REP, 3, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*3, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_2\@: +.L_initial_num_blocks_is_2\@: \INITIAL_BLOCKS \REP, 2, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*2, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_1\@: +.L_initial_num_blocks_is_1\@: \INITIAL_BLOCKS \REP, 1, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC sub $16*1, %r13 - jmp _initial_blocks_encrypted\@ + jmp .L_initial_blocks_encrypted\@ -_initial_num_blocks_is_0\@: +.L_initial_num_blocks_is_0\@: \INITIAL_BLOCKS \REP, 0, %xmm12, %xmm13, %xmm14, %xmm15, %xmm11, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm10, %xmm0, \ENC_DEC -_initial_blocks_encrypted\@: +.L_initial_blocks_encrypted\@: test %r13, %r13 - je _zero_cipher_left\@ + je .L_zero_cipher_left\@ sub $128, %r13 - je _eight_cipher_left\@ + je .L_eight_cipher_left\@ @@ -349,9 +349,9 @@ _initial_blocks_encrypted\@: vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 -_encrypt_by_8_new\@: +.L_encrypt_by_8_new\@: cmp $(255-8), %r15d - jg _encrypt_by_8\@ + jg .L_encrypt_by_8\@ @@ -359,30 +359,30 @@ _encrypt_by_8_new\@: \GHASH_8_ENCRYPT_8_PARALLEL \REP, %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, out_order, \ENC_DEC add $128, %r11 sub $128, %r13 - jne _encrypt_by_8_new\@ + jne .L_encrypt_by_8_new\@ vpshufb SHUF_MASK(%rip), %xmm9, 
%xmm9 - jmp _eight_cipher_left\@ + jmp .L_eight_cipher_left\@ -_encrypt_by_8\@: +.L_encrypt_by_8\@: vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 add $8, %r15b \GHASH_8_ENCRYPT_8_PARALLEL \REP, %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm9, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, %xmm15, in_order, \ENC_DEC vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 add $128, %r11 sub $128, %r13 - jne _encrypt_by_8_new\@ + jne .L_encrypt_by_8_new\@ vpshufb SHUF_MASK(%rip), %xmm9, %xmm9 -_eight_cipher_left\@: +.L_eight_cipher_left\@: \GHASH_LAST_8 %xmm0, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8 -_zero_cipher_left\@: +.L_zero_cipher_left\@: vmovdqu %xmm14, AadHash(arg2) vmovdqu %xmm9, CurCount(arg2) @@ -390,7 +390,7 @@ _zero_cipher_left\@: mov arg5, %r13 and $15, %r13 # r13 = (arg5 mod 16) - je _multiple_of_16_bytes\@ + je .L_multiple_of_16_bytes\@ # handle the last <16 Byte block separately @@ -404,7 +404,7 @@ _zero_cipher_left\@: vmovdqu %xmm9, PBlockEncKey(arg2) cmp $16, arg5 - jge _large_enough_update\@ + jge .L_large_enough_update\@ lea (arg4,%r11,1), %r10 mov %r13, %r12 @@ -416,9 +416,9 @@ _zero_cipher_left\@: # able to shift 16-r13 bytes (r13 is the # number of bytes in plaintext mod 16) - jmp _final_ghash_mul\@ + jmp .L_final_ghash_mul\@ -_large_enough_update\@: +.L_large_enough_update\@: sub $16, %r11 add %r13, %r11 @@ -437,7 +437,7 @@ _large_enough_update\@: # shift right 16-r13 bytes vpshufb %xmm2, %xmm1, %xmm1 -_final_ghash_mul\@: +.L_final_ghash_mul\@: .if \ENC_DEC == DEC vmovdqa %xmm1, %xmm2 vpxor %xmm1, %xmm9, %xmm9 # Plaintext XOR E(K, Yn) @@ -466,7 +466,7 @@ _final_ghash_mul\@: # output r13 Bytes vmovq %xmm9, %rax cmp $8, %r13 - jle _less_than_8_bytes_left\@ + jle .L_less_than_8_bytes_left\@ mov %rax, (arg3 , %r11) add $8, %r11 @@ -474,15 +474,15 @@ _final_ghash_mul\@: vmovq %xmm9, %rax sub $8, %r13 -_less_than_8_bytes_left\@: +.L_less_than_8_bytes_left\@: movb %al, (arg3 , %r11) add $1, %r11 shr $8, %rax sub $1, %r13 - jne _less_than_8_bytes_left\@ + jne .L_less_than_8_bytes_left\@ ############################# -_multiple_of_16_bytes\@: +.L_multiple_of_16_bytes\@: .endm @@ -495,12 +495,12 @@ _multiple_of_16_bytes\@: mov PBlockLen(arg2), %r12 test %r12, %r12 - je _partial_done\@ + je .L_partial_done\@ #GHASH computation for the last <16 Byte block \GHASH_MUL %xmm14, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6 -_partial_done\@: +.L_partial_done\@: mov AadLen(arg2), %r12 # r12 = aadLen (number of bytes) shl $3, %r12 # convert into number of bits vmovd %r12d, %xmm15 # len(A) in xmm15 @@ -523,49 +523,49 @@ _partial_done\@: -_return_T\@: +.L_return_T\@: mov \AUTH_TAG, %r10 # r10 = authTag mov \AUTH_TAG_LEN, %r11 # r11 = auth_tag_len cmp $16, %r11 - je _T_16\@ + je .L_T_16\@ cmp $8, %r11 - jl _T_4\@ + jl .L_T_4\@ -_T_8\@: +.L_T_8\@: vmovq %xmm9, %rax mov %rax, (%r10) add $8, %r10 sub $8, %r11 vpsrldq $8, %xmm9, %xmm9 test %r11, %r11 - je _return_T_done\@ -_T_4\@: + je .L_return_T_done\@ +.L_T_4\@: vmovd %xmm9, %eax mov %eax, (%r10) add $4, %r10 sub $4, %r11 vpsrldq $4, %xmm9, %xmm9 test %r11, %r11 - je _return_T_done\@ -_T_123\@: + je .L_return_T_done\@ +.L_T_123\@: vmovd %xmm9, %eax cmp $2, %r11 - jl _T_1\@ + jl .L_T_1\@ mov %ax, (%r10) cmp $2, %r11 - je _return_T_done\@ + je .L_return_T_done\@ add $2, %r10 sar $16, %eax -_T_1\@: +.L_T_1\@: mov %al, (%r10) - jmp _return_T_done\@ + jmp .L_return_T_done\@ -_T_16\@: +.L_T_16\@: vmovdqu %xmm9, (%r10) -_return_T_done\@: +.L_return_T_done\@: .endm .macro CALC_AAD_HASH GHASH_MUL AAD 
AADLEN T1 T2 T3 T4 T5 T6 T7 T8 @@ -579,8 +579,8 @@ _return_T_done\@: vpxor \T8, \T8, \T8 vpxor \T7, \T7, \T7 cmp $16, %r11 - jl _get_AAD_rest8\@ -_get_AAD_blocks\@: + jl .L_get_AAD_rest8\@ +.L_get_AAD_blocks\@: vmovdqu (%r10), \T7 vpshufb SHUF_MASK(%rip), \T7, \T7 vpxor \T7, \T8, \T8 @@ -589,29 +589,29 @@ _get_AAD_blocks\@: sub $16, %r12 sub $16, %r11 cmp $16, %r11 - jge _get_AAD_blocks\@ + jge .L_get_AAD_blocks\@ vmovdqu \T8, \T7 test %r11, %r11 - je _get_AAD_done\@ + je .L_get_AAD_done\@ vpxor \T7, \T7, \T7 /* read the last <16B of AAD. since we have at least 4B of data right after the AAD (the ICV, and maybe some CT), we can read 4B/8B blocks safely, and then get rid of the extra stuff */ -_get_AAD_rest8\@: +.L_get_AAD_rest8\@: cmp $4, %r11 - jle _get_AAD_rest4\@ + jle .L_get_AAD_rest4\@ movq (%r10), \T1 add $8, %r10 sub $8, %r11 vpslldq $8, \T1, \T1 vpsrldq $8, \T7, \T7 vpxor \T1, \T7, \T7 - jmp _get_AAD_rest8\@ -_get_AAD_rest4\@: + jmp .L_get_AAD_rest8\@ +.L_get_AAD_rest4\@: test %r11, %r11 - jle _get_AAD_rest0\@ + jle .L_get_AAD_rest0\@ mov (%r10), %eax movq %rax, \T1 add $4, %r10 @@ -619,7 +619,7 @@ _get_AAD_rest4\@: vpslldq $12, \T1, \T1 vpsrldq $4, \T7, \T7 vpxor \T1, \T7, \T7 -_get_AAD_rest0\@: +.L_get_AAD_rest0\@: /* finalize: shift out the extra bytes we read, and align left. since pslldq can only shift by an immediate, we use vpshufb and a pair of shuffle masks */ @@ -629,12 +629,12 @@ _get_AAD_rest0\@: andq $~3, %r11 vpshufb (%r11), \T7, \T7 vpand \T1, \T7, \T7 -_get_AAD_rest_final\@: +.L_get_AAD_rest_final\@: vpshufb SHUF_MASK(%rip), \T7, \T7 vpxor \T8, \T7, \T7 \GHASH_MUL \T7, \T2, \T1, \T3, \T4, \T5, \T6 -_get_AAD_done\@: +.L_get_AAD_done\@: vmovdqu \T7, AadHash(arg2) .endm @@ -685,28 +685,28 @@ _get_AAD_done\@: vpxor \XMMDst, \XMMDst, \XMMDst cmp $8, \DLEN - jl _read_lt8_\@ + jl .L_read_lt8_\@ mov (\DPTR), %rax vpinsrq $0, %rax, \XMMDst, \XMMDst sub $8, \DLEN - jz _done_read_partial_block_\@ + jz .L_done_read_partial_block_\@ xor %eax, %eax -_read_next_byte_\@: +.L_read_next_byte_\@: shl $8, %rax mov 7(\DPTR, \DLEN, 1), %al dec \DLEN - jnz _read_next_byte_\@ + jnz .L_read_next_byte_\@ vpinsrq $1, %rax, \XMMDst, \XMMDst - jmp _done_read_partial_block_\@ -_read_lt8_\@: + jmp .L_done_read_partial_block_\@ +.L_read_lt8_\@: xor %eax, %eax -_read_next_byte_lt8_\@: +.L_read_next_byte_lt8_\@: shl $8, %rax mov -1(\DPTR, \DLEN, 1), %al dec \DLEN - jnz _read_next_byte_lt8_\@ + jnz .L_read_next_byte_lt8_\@ vpinsrq $0, %rax, \XMMDst, \XMMDst -_done_read_partial_block_\@: +.L_done_read_partial_block_\@: .endm # PARTIAL_BLOCK: Handles encryption/decryption and the tag partial blocks @@ -718,21 +718,21 @@ _done_read_partial_block_\@: AAD_HASH ENC_DEC mov PBlockLen(arg2), %r13 test %r13, %r13 - je _partial_block_done_\@ # Leave Macro if no partial blocks + je .L_partial_block_done_\@ # Leave Macro if no partial blocks # Read in input data without over reading cmp $16, \PLAIN_CYPH_LEN - jl _fewer_than_16_bytes_\@ + jl .L_fewer_than_16_bytes_\@ vmovdqu (\PLAIN_CYPH_IN), %xmm1 # If more than 16 bytes, just fill xmm - jmp _data_read_\@ + jmp .L_data_read_\@ -_fewer_than_16_bytes_\@: +.L_fewer_than_16_bytes_\@: lea (\PLAIN_CYPH_IN, \DATA_OFFSET, 1), %r10 mov \PLAIN_CYPH_LEN, %r12 READ_PARTIAL_BLOCK %r10 %r12 %xmm1 mov PBlockLen(arg2), %r13 -_data_read_\@: # Finished reading in data +.L_data_read_\@: # Finished reading in data vmovdqu PBlockEncKey(arg2), %xmm9 vmovdqu HashKey(arg2), %xmm13 @@ -755,9 +755,9 @@ _data_read_\@: # Finished reading in data sub $16, %r10 # Determine if if partial 
block is not being filled and # shift mask accordingly - jge _no_extra_mask_1_\@ + jge .L_no_extra_mask_1_\@ sub %r10, %r12 -_no_extra_mask_1_\@: +.L_no_extra_mask_1_\@: vmovdqu ALL_F-SHIFT_MASK(%r12), %xmm1 # get the appropriate mask to mask out bottom r13 bytes of xmm9 @@ -770,17 +770,17 @@ _no_extra_mask_1_\@: vpxor %xmm3, \AAD_HASH, \AAD_HASH test %r10, %r10 - jl _partial_incomplete_1_\@ + jl .L_partial_incomplete_1_\@ # GHASH computation for the last <16 Byte block \GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6 xor %eax,%eax mov %rax, PBlockLen(arg2) - jmp _dec_done_\@ -_partial_incomplete_1_\@: + jmp .L_dec_done_\@ +.L_partial_incomplete_1_\@: add \PLAIN_CYPH_LEN, PBlockLen(arg2) -_dec_done_\@: +.L_dec_done_\@: vmovdqu \AAD_HASH, AadHash(arg2) .else vpxor %xmm1, %xmm9, %xmm9 # Plaintext XOR E(K, Yn) @@ -791,9 +791,9 @@ _dec_done_\@: sub $16, %r10 # Determine if if partial block is not being filled and # shift mask accordingly - jge _no_extra_mask_2_\@ + jge .L_no_extra_mask_2_\@ sub %r10, %r12 -_no_extra_mask_2_\@: +.L_no_extra_mask_2_\@: vmovdqu ALL_F-SHIFT_MASK(%r12), %xmm1 # get the appropriate mask to mask out bottom r13 bytes of xmm9 @@ -805,17 +805,17 @@ _no_extra_mask_2_\@: vpxor %xmm9, \AAD_HASH, \AAD_HASH test %r10, %r10 - jl _partial_incomplete_2_\@ + jl .L_partial_incomplete_2_\@ # GHASH computation for the last <16 Byte block \GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6 xor %eax,%eax mov %rax, PBlockLen(arg2) - jmp _encode_done_\@ -_partial_incomplete_2_\@: + jmp .L_encode_done_\@ +.L_partial_incomplete_2_\@: add \PLAIN_CYPH_LEN, PBlockLen(arg2) -_encode_done_\@: +.L_encode_done_\@: vmovdqu \AAD_HASH, AadHash(arg2) vmovdqa SHUF_MASK(%rip), %xmm10 @@ -825,32 +825,32 @@ _encode_done_\@: .endif # output encrypted Bytes test %r10, %r10 - jl _partial_fill_\@ + jl .L_partial_fill_\@ mov %r13, %r12 mov $16, %r13 # Set r13 to be the number of bytes to write out sub %r12, %r13 - jmp _count_set_\@ -_partial_fill_\@: + jmp .L_count_set_\@ +.L_partial_fill_\@: mov \PLAIN_CYPH_LEN, %r13 -_count_set_\@: +.L_count_set_\@: vmovdqa %xmm9, %xmm0 vmovq %xmm0, %rax cmp $8, %r13 - jle _less_than_8_bytes_left_\@ + jle .L_less_than_8_bytes_left_\@ mov %rax, (\CYPH_PLAIN_OUT, \DATA_OFFSET, 1) add $8, \DATA_OFFSET psrldq $8, %xmm0 vmovq %xmm0, %rax sub $8, %r13 -_less_than_8_bytes_left_\@: +.L_less_than_8_bytes_left_\@: movb %al, (\CYPH_PLAIN_OUT, \DATA_OFFSET, 1) add $1, \DATA_OFFSET shr $8, %rax sub $1, %r13 - jne _less_than_8_bytes_left_\@ -_partial_block_done_\@: + jne .L_less_than_8_bytes_left_\@ +.L_partial_block_done_\@: .endm # PARTIAL_BLOCK ############################################################################### @@ -1051,7 +1051,7 @@ _partial_block_done_\@: vmovdqa \XMM8, \T3 cmp $128, %r13 - jl _initial_blocks_done\@ # no need for precomputed constants + jl .L_initial_blocks_done\@ # no need for precomputed constants ############################################################################### # Haskey_i_k holds XORed values of the low and high parts of the Haskey_i @@ -1193,7 +1193,7 @@ _partial_block_done_\@: ############################################################################### -_initial_blocks_done\@: +.L_initial_blocks_done\@: .endm @@ -2001,7 +2001,7 @@ SYM_FUNC_END(aesni_gcm_finalize_avx_gen2) vmovdqa \XMM8, \T3 cmp $128, %r13 - jl _initial_blocks_done\@ # no need for precomputed constants + jl .L_initial_blocks_done\@ # no need for precomputed constants 
############################################################################### # Haskey_i_k holds XORed values of the low and high parts of the Haskey_i @@ -2145,7 +2145,7 @@ SYM_FUNC_END(aesni_gcm_finalize_avx_gen2) ############################################################################### -_initial_blocks_done\@: +.L_initial_blocks_done\@: .endm From patchwork Wed Apr 12 11:00:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208786 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 170D1C77B76 for ; Wed, 12 Apr 2023 11:01:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229737AbjDLLBT (ORCPT ); Wed, 12 Apr 2023 07:01:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56002 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229735AbjDLLBO (ORCPT ); Wed, 12 Apr 2023 07:01:14 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33DD772BE for ; Wed, 12 Apr 2023 04:01:07 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id B55A762889 for ; Wed, 12 Apr 2023 11:01:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 066C3C4339B; Wed, 12 Apr 2023 11:01:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297266; bh=5OVijsRVQVtqmMk0R5SCMF+TcfjOykH+XE0rrmzmn+U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LT0lH7d+1EOQ537YAhmnobGOrnO5Z0TOPJxRCHOIDltu9hGN8VXyGEk+8aDbDtFuQ cgx6fIqTFhVhCXHsT3l6cbAgX2UVPnHfAwidld/TSZssLPdbYK5BGNoXrZ81gTmQmy G2bJcrQDl7lsv/7P5FtmhILdbKdDFZgnPko6kMUD9gQMx3NDeHULKD3yP88mEWaq3Y wmO9A4WNyiGCxcKYerrQwTcdOOSZ2joh6HPn6Ev7ZOJWvRn6Jn4jaIHtKaSXWxiWGv v2hiKzbAmFRzb+hirw1mZXJtWGny37LBz/75zNKrvL5BBplLclOAyReaFWo8I0fV4E /2B83cPFCLgtQ== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 12/13] crypto: x86/crc32 - Use local .L symbols for code Date: Wed, 12 Apr 2023 13:00:34 +0200 Message-Id: <20230412110035.361447-13-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=7697; i=ardb@kernel.org; h=from:subject; bh=5OVijsRVQVtqmMk0R5SCMF+TcfjOykH+XE0rrmzmn+U=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWsP1BA9Om/x9uPVakWtEXxfr+7S6jzyNfNd8Tqfi7YW uC9Yt2BjlIWBjEOBlkxRRaB2X/f7Tw9UarWeZYszBxWJpAhDFycAjARqQBGhk72N/nH5poe4qhy ZGfw4V4Vyex/8J/Xy9TVd8RPbi8wnszwP6d0ydXOqdedkkvtT2za4L5U2ptxlVyKuq2XhE7zlQg eJgA= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. 
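The crc_pcl conversion is slightly more interesting than a plain rename, because here the labels are also used as data: the JMPTBL_ENTRY/LABEL macros build a jump table out of them. Local .L labels work in that role too, since the .quad entries are resolved against the containing section rather than against a named symbol. A minimal standalone sketch (demo_dispatch and its table are invented for illustration):

        # Hypothetical example, not taken from the kernel: a two-entry jump
        # table whose targets are .L local labels.  The indirect jmp still
        # works, but neither .Lcase_0 nor .Lcase_1 becomes a symbol, so a
        # backtrace through either branch is attributed to demo_dispatch.
        .section .rodata
        .align  8
.Ldemo_table:
        .quad   .Lcase_0                # absolute addresses, fixed up by the
        .quad   .Lcase_1                # linker/loader as usual

        .text
        .globl  demo_dispatch
demo_dispatch:                          # %rdi = selector, must be 0 or 1
        lea     .Ldemo_table(%rip), %rax
        jmp     *(%rax, %rdi, 8)
.Lcase_0:
        xor     %eax, %eax
        ret
.Lcase_1:
        mov     $1, %eax
        ret

The one caveat is that .L labels are file-local, so such a table cannot be referenced from another translation unit, which is exactly the property the patch is after.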
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/crc32-pclmul_asm.S | 16 ++--- arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 67 ++++++++++---------- 2 files changed, 41 insertions(+), 42 deletions(-) diff --git a/arch/x86/crypto/crc32-pclmul_asm.S b/arch/x86/crypto/crc32-pclmul_asm.S index ca53e96996ac2143..5d31137e2c7dfca0 100644 --- a/arch/x86/crypto/crc32-pclmul_asm.S +++ b/arch/x86/crypto/crc32-pclmul_asm.S @@ -90,7 +90,7 @@ SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligne sub $0x40, LEN add $0x40, BUF cmp $0x40, LEN - jb less_64 + jb .Lless_64 #ifdef __x86_64__ movdqa .Lconstant_R2R1(%rip), CONSTANT @@ -98,7 +98,7 @@ SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligne movdqa .Lconstant_R2R1, CONSTANT #endif -loop_64:/* 64 bytes Full cache line folding */ +.Lloop_64:/* 64 bytes Full cache line folding */ prefetchnta 0x40(BUF) movdqa %xmm1, %xmm5 movdqa %xmm2, %xmm6 @@ -139,8 +139,8 @@ loop_64:/* 64 bytes Full cache line folding */ sub $0x40, LEN add $0x40, BUF cmp $0x40, LEN - jge loop_64 -less_64:/* Folding cache line into 128bit */ + jge .Lloop_64 +.Lless_64:/* Folding cache line into 128bit */ #ifdef __x86_64__ movdqa .Lconstant_R4R3(%rip), CONSTANT #else @@ -167,8 +167,8 @@ less_64:/* Folding cache line into 128bit */ pxor %xmm4, %xmm1 cmp $0x10, LEN - jb fold_64 -loop_16:/* Folding rest buffer into 128bit */ + jb .Lfold_64 +.Lloop_16:/* Folding rest buffer into 128bit */ movdqa %xmm1, %xmm5 pclmulqdq $0x00, CONSTANT, %xmm1 pclmulqdq $0x11, CONSTANT, %xmm5 @@ -177,9 +177,9 @@ loop_16:/* Folding rest buffer into 128bit */ sub $0x10, LEN add $0x10, BUF cmp $0x10, LEN - jge loop_16 + jge .Lloop_16 -fold_64: +.Lfold_64: /* perform the last 64 bit fold, also adds 32 zeroes * to the input stream */ pclmulqdq $0x01, %xmm1, CONSTANT /* R4 * xmm1.low */ diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S index 5f843dce77f1de66..81ce0f4db555ce03 100644 --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S @@ -49,15 +49,15 @@ ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction .macro LABEL prefix n -\prefix\n\(): +.L\prefix\n\(): .endm .macro JMPTBL_ENTRY i -.quad crc_\i +.quad .Lcrc_\i .endm .macro JNC_LESS_THAN j - jnc less_than_\j + jnc .Lless_than_\j .endm # Define threshold where buffers are considered "small" and routed to more @@ -108,30 +108,30 @@ SYM_FUNC_START(crc_pcl) neg %bufp and $7, %bufp # calculate the unalignment amount of # the address - je proc_block # Skip if aligned + je .Lproc_block # Skip if aligned ## If len is less than 8 and we're unaligned, we need to jump ## to special code to avoid reading beyond the end of the buffer cmp $8, len - jae do_align + jae .Ldo_align # less_than_8 expects length in upper 3 bits of len_dw # less_than_8_post_shl1 expects length = carryflag * 8 + len_dw[31:30] shl $32-3+1, len_dw - jmp less_than_8_post_shl1 + jmp .Lless_than_8_post_shl1 -do_align: +.Ldo_align: #### Calculate CRC of unaligned bytes of the buffer (if any) movq (bufptmp), tmp # load a quadward from the buffer add %bufp, bufptmp # align buffer pointer for quadword # processing sub %bufp, len # update buffer length -align_loop: +.Lalign_loop: crc32b %bl, crc_init_dw # compute crc32 of 1-byte shr $8, tmp # get next byte dec %bufp - jne align_loop + jne .Lalign_loop -proc_block: +.Lproc_block: ################################################################ ## 2) PROCESS BLOCKS: @@ -141,11 +141,11 @@ proc_block: movq 
len, tmp # save num bytes in tmp cmpq $128*24, len - jae full_block + jae .Lfull_block -continue_block: +.Lcontinue_block: cmpq $SMALL_SIZE, len - jb small + jb .Lsmall ## len < 128*24 movq $2731, %rax # 2731 = ceil(2^16 / 24) @@ -175,7 +175,7 @@ continue_block: ################################################################ ## 2a) PROCESS FULL BLOCKS: ################################################################ -full_block: +.Lfull_block: movl $128,%eax lea 128*8*2(block_0), block_1 lea 128*8*3(block_0), block_2 @@ -190,7 +190,6 @@ full_block: ## 3) CRC Array: ################################################################ -crc_array: i=128 .rept 128-1 .altmacro @@ -243,28 +242,28 @@ LABEL crc_ 0 ENDBR mov tmp, len cmp $128*24, tmp - jae full_block + jae .Lfull_block cmp $24, tmp - jae continue_block + jae .Lcontinue_block -less_than_24: +.Lless_than_24: shl $32-4, len_dw # less_than_16 expects length # in upper 4 bits of len_dw - jnc less_than_16 + jnc .Lless_than_16 crc32q (bufptmp), crc_init crc32q 8(bufptmp), crc_init - jz do_return + jz .Ldo_return add $16, bufptmp # len is less than 8 if we got here # less_than_8 expects length in upper 3 bits of len_dw # less_than_8_post_shl1 expects length = carryflag * 8 + len_dw[31:30] shl $2, len_dw - jmp less_than_8_post_shl1 + jmp .Lless_than_8_post_shl1 ####################################################################### ## 6) LESS THAN 256-bytes REMAIN AT THIS POINT (8-bits of len are full) ####################################################################### -small: +.Lsmall: shl $32-8, len_dw # Prepare len_dw for less_than_256 j=256 .rept 5 # j = {256, 128, 64, 32, 16} @@ -280,32 +279,32 @@ LABEL less_than_ %j # less_than_j: Length should be in crc32q i(bufptmp), crc_init # Compute crc32 of 8-byte data i=i+8 .endr - jz do_return # Return if remaining length is zero + jz .Ldo_return # Return if remaining length is zero add $j, bufptmp # Advance buf .endr -less_than_8: # Length should be stored in +.Lless_than_8: # Length should be stored in # upper 3 bits of len_dw shl $1, len_dw -less_than_8_post_shl1: - jnc less_than_4 +.Lless_than_8_post_shl1: + jnc .Lless_than_4 crc32l (bufptmp), crc_init_dw # CRC of 4 bytes - jz do_return # return if remaining data is zero + jz .Ldo_return # return if remaining data is zero add $4, bufptmp -less_than_4: # Length should be stored in +.Lless_than_4: # Length should be stored in # upper 2 bits of len_dw shl $1, len_dw - jnc less_than_2 + jnc .Lless_than_2 crc32w (bufptmp), crc_init_dw # CRC of 2 bytes - jz do_return # return if remaining data is zero + jz .Ldo_return # return if remaining data is zero add $2, bufptmp -less_than_2: # Length should be stored in the MSB +.Lless_than_2: # Length should be stored in the MSB # of len_dw shl $1, len_dw - jnc less_than_1 + jnc .Lless_than_1 crc32b (bufptmp), crc_init_dw # CRC of 1 byte -less_than_1: # Length should be zero -do_return: +.Lless_than_1: # Length should be zero +.Ldo_return: movq crc_init, %rax popq %rsi popq %rdi From patchwork Wed Apr 12 11:00:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 13208785 X-Patchwork-Delegate: herbert@gondor.apana.org.au Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A014C77B71 for ; Wed, 12 Apr 2023 11:01:21 +0000 (UTC) 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229685AbjDLLBT (ORCPT ); Wed, 12 Apr 2023 07:01:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229737AbjDLLBP (ORCPT ); Wed, 12 Apr 2023 07:01:15 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAB527680 for ; Wed, 12 Apr 2023 04:01:08 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 45D2263256 for ; Wed, 12 Apr 2023 11:01:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8F030C4339E; Wed, 12 Apr 2023 11:01:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1681297267; bh=52LINBgS3GLlIA5zBqsuswDWcuMBpD6xa5rGbvxGo5I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OZYgnng/Kgp++RGyTQl9dKrzzekbmtn5C2+qYxmUv23GMFxyx9f8YkAbjAQuE7M7E VnE93uZZeBhC+noBSFiDBxMRfh2r0Ba+FmEdHkX2v4HwgihEE7T35TKxSmiqlmJ6xA n9tTfiSwlQazjMa/BY5TsZssUX+FridgmGf/rsYGLkFe63l+bf9IbA5RVeyTK5fhfF JmuB9xO8A7PF637KANaXiX/hOJxyI/D0F0wfFvyI2t6IZl5iyrI926IJMS0lOWJYtK e4voMra2kZDCUDsKvd3BSqHPBaAV23292yij6ynTSJfaO6aEeEuPS8jDizdtt5fGGo hrubSQRBvSD7w== From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: Ard Biesheuvel , Herbert Xu , Eric Biggers , Kees Cook Subject: [PATCH v2 13/13] crypto: x86/sha - Use local .L symbols for code Date: Wed, 12 Apr 2023 13:00:35 +0200 Message-Id: <20230412110035.361447-14-ardb@kernel.org> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230412110035.361447-1-ardb@kernel.org> References: <20230412110035.361447-1-ardb@kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11505; i=ardb@kernel.org; h=from:subject; bh=52LINBgS3GLlIA5zBqsuswDWcuMBpD6xa5rGbvxGo5I=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIcWsP7hbVljxfMKO1Jttdqx8Czz3LWhSWismP3193GWFS wsn3rzYUcrCIMbBICumyCIw+++7nacnStU6z5KFmcPKBDKEgYtTACay5h4jwxKd9jeS56Na+w15 lS5VKZv4Le/TMe197/f7/+mcO03sPgz/qw/ds3jQ0jGLcU/A2fcWyxSO59yIlb8z27DqlMuay89 TGQA= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. 
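The SHA glue code uses plain, non-macro labels such as loop0, updateblock and nowork, which the assembler emits as local text symbols; kallsyms picks those up, so a backtrace or perf sample landing inside the loop body gets attributed to the label rather than to the containing transform function. A minimal standalone before/after sketch (the demo_sum functions are invented for illustration):

        # Hypothetical example, not from the kernel.  Both functions assemble
        # to the same instructions; they differ only in what the symbol table
        # ends up containing.
        .text

        .globl  demo_sum_old
demo_sum_old:                   # %rdi = n (assumed > 0), returns n+(n-1)+...+1
        xor     %eax, %eax
sum_loop:                       # plain label: emitted as a local text symbol,
        add     %rdi, %rax      # so a backtrace here reads "sum_loop+0x..."
        dec     %rdi
        jnz     sum_loop
        ret

        .globl  demo_sum_new
demo_sum_new:                   # same function, same loop
        xor     %eax, %eax
.Lsum_loop:                     # .L label: no symbol emitted at all, so a
        add     %rdi, %rax      # backtrace here reads "demo_sum_new+0x..."
        dec     %rdi
        jnz     .Lsum_loop
        ret

Running nm on the assembled object shows demo_sum_old dragging a local sum_loop entry along with it, while nothing extra appears for demo_sum_new.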
Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/sha1_avx2_x86_64_asm.S | 25 ++++---------- arch/x86/crypto/sha256-avx-asm.S | 16 ++++----- arch/x86/crypto/sha256-avx2-asm.S | 36 ++++++++++---------- arch/x86/crypto/sha256-ssse3-asm.S | 16 ++++----- arch/x86/crypto/sha512-avx-asm.S | 8 ++--- arch/x86/crypto/sha512-avx2-asm.S | 16 ++++----- arch/x86/crypto/sha512-ssse3-asm.S | 8 ++--- 7 files changed, 57 insertions(+), 68 deletions(-) diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S index a96b2fd26dab4bb5..4b49bdc9526583d6 100644 --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S @@ -485,18 +485,18 @@ xchg WK_BUF, PRECALC_BUF .align 32 -_loop: +.L_loop: /* * code loops through more than one block * we use K_BASE value as a signal of a last block, * it is set below by: cmovae BUFFER_PTR, K_BASE */ test BLOCKS_CTR, BLOCKS_CTR - jnz _begin + jnz .L_begin .align 32 - jmp _end + jmp .L_end .align 32 -_begin: +.L_begin: /* * Do first block @@ -508,9 +508,6 @@ _begin: .set j, j+2 .endr - jmp _loop0 -_loop0: - /* * rounds: * 10,12,14,16,18 @@ -545,7 +542,7 @@ _loop0: UPDATE_HASH 16(HASH_PTR), E test BLOCKS_CTR, BLOCKS_CTR - jz _loop + jz .L_loop mov TB, B @@ -562,8 +559,6 @@ _loop0: .set j, j+2 .endr - jmp _loop1 -_loop1: /* * rounds * 20+80,22+80,24+80,26+80,28+80 @@ -574,9 +569,6 @@ _loop1: .set j, j+2 .endr - jmp _loop2 -_loop2: - /* * rounds * 40+80,42+80,44+80,46+80,48+80 @@ -592,9 +584,6 @@ _loop2: /* Move to the next block only if needed*/ ADD_IF_GE BUFFER_PTR2, BLOCKS_CTR, 4, 128 - jmp _loop3 -_loop3: - /* * rounds * 60+80,62+80,64+80,66+80,68+80 @@ -623,10 +612,10 @@ _loop3: xchg WK_BUF, PRECALC_BUF - jmp _loop + jmp .L_loop .align 32 - _end: +.L_end: .endm /* diff --git a/arch/x86/crypto/sha256-avx-asm.S b/arch/x86/crypto/sha256-avx-asm.S index 5555b5d5215a449e..53de72bdd851ec8d 100644 --- a/arch/x86/crypto/sha256-avx-asm.S +++ b/arch/x86/crypto/sha256-avx-asm.S @@ -360,7 +360,7 @@ SYM_TYPED_FUNC_START(sha256_transform_avx) and $~15, %rsp # align stack pointer shl $6, NUM_BLKS # convert to bytes - jz done_hash + jz .Ldone_hash add INP, NUM_BLKS # pointer to end of data mov NUM_BLKS, _INP_END(%rsp) @@ -377,7 +377,7 @@ SYM_TYPED_FUNC_START(sha256_transform_avx) vmovdqa PSHUFFLE_BYTE_FLIP_MASK(%rip), BYTE_FLIP_MASK vmovdqa _SHUF_00BA(%rip), SHUF_00BA vmovdqa _SHUF_DC00(%rip), SHUF_DC00 -loop0: +.Lloop0: lea K256(%rip), TBL ## byte swap first 16 dwords @@ -391,7 +391,7 @@ loop0: ## schedule 48 input dwords, by doing 3 rounds of 16 each mov $3, SRND .align 16 -loop1: +.Lloop1: vpaddd (TBL), X0, XFER vmovdqa XFER, _XFER(%rsp) FOUR_ROUNDS_AND_SCHED @@ -410,10 +410,10 @@ loop1: FOUR_ROUNDS_AND_SCHED sub $1, SRND - jne loop1 + jne .Lloop1 mov $2, SRND -loop2: +.Lloop2: vpaddd (TBL), X0, XFER vmovdqa XFER, _XFER(%rsp) DO_ROUND 0 @@ -433,7 +433,7 @@ loop2: vmovdqa X3, X1 sub $1, SRND - jne loop2 + jne .Lloop2 addm (4*0)(CTX),a addm (4*1)(CTX),b @@ -447,9 +447,9 @@ loop2: mov _INP(%rsp), INP add $64, INP cmp _INP_END(%rsp), INP - jne loop0 + jne .Lloop0 -done_hash: +.Ldone_hash: mov %rbp, %rsp popq %rbp diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S index e2a4024fb0a3f5d5..9918212faf914ffc 100644 --- a/arch/x86/crypto/sha256-avx2-asm.S +++ b/arch/x86/crypto/sha256-avx2-asm.S @@ -538,12 +538,12 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx) and $-32, %rsp # align rsp to 32 byte boundary shl $6, NUM_BLKS # convert to bytes - jz done_hash + jz .Ldone_hash lea -64(INP, NUM_BLKS), 
NUM_BLKS # pointer to last block mov NUM_BLKS, _INP_END(%rsp) cmp NUM_BLKS, INP - je only_one_block + je .Lonly_one_block ## load initial digest mov (CTX), a @@ -561,7 +561,7 @@ SYM_TYPED_FUNC_START(sha256_transform_rorx) mov CTX, _CTX(%rsp) -loop0: +.Lloop0: ## Load first 16 dwords from two blocks VMOVDQ 0*32(INP),XTMP0 VMOVDQ 1*32(INP),XTMP1 @@ -580,7 +580,7 @@ loop0: vperm2i128 $0x20, XTMP3, XTMP1, X2 vperm2i128 $0x31, XTMP3, XTMP1, X3 -last_block_enter: +.Llast_block_enter: add $64, INP mov INP, _INP(%rsp) @@ -588,7 +588,7 @@ last_block_enter: xor SRND, SRND .align 16 -loop1: +.Lloop1: leaq K256+0*32(%rip), INP ## reuse INP as scratch reg vpaddd (INP, SRND), X0, XFER vmovdqa XFER, 0*32+_XFER(%rsp, SRND) @@ -611,9 +611,9 @@ loop1: add $4*32, SRND cmp $3*4*32, SRND - jb loop1 + jb .Lloop1 -loop2: +.Lloop2: ## Do last 16 rounds with no scheduling leaq K256+0*32(%rip), INP vpaddd (INP, SRND), X0, XFER @@ -630,7 +630,7 @@ loop2: vmovdqa X3, X1 cmp $4*4*32, SRND - jb loop2 + jb .Lloop2 mov _CTX(%rsp), CTX mov _INP(%rsp), INP @@ -645,17 +645,17 @@ loop2: addm (4*7)(CTX),h cmp _INP_END(%rsp), INP - ja done_hash + ja .Ldone_hash #### Do second block using previously scheduled results xor SRND, SRND .align 16 -loop3: +.Lloop3: DO_4ROUNDS _XFER + 0*32 + 16 DO_4ROUNDS _XFER + 1*32 + 16 add $2*32, SRND cmp $4*4*32, SRND - jb loop3 + jb .Lloop3 mov _CTX(%rsp), CTX mov _INP(%rsp), INP @@ -671,10 +671,10 @@ loop3: addm (4*7)(CTX),h cmp _INP_END(%rsp), INP - jb loop0 - ja done_hash + jb .Lloop0 + ja .Ldone_hash -do_last_block: +.Ldo_last_block: VMOVDQ 0*16(INP),XWORD0 VMOVDQ 1*16(INP),XWORD1 VMOVDQ 2*16(INP),XWORD2 @@ -685,9 +685,9 @@ do_last_block: vpshufb X_BYTE_FLIP_MASK, XWORD2, XWORD2 vpshufb X_BYTE_FLIP_MASK, XWORD3, XWORD3 - jmp last_block_enter + jmp .Llast_block_enter -only_one_block: +.Lonly_one_block: ## load initial digest mov (4*0)(CTX),a @@ -704,9 +704,9 @@ only_one_block: vmovdqa _SHUF_DC00(%rip), SHUF_DC00 mov CTX, _CTX(%rsp) - jmp do_last_block + jmp .Ldo_last_block -done_hash: +.Ldone_hash: mov %rbp, %rsp pop %rbp diff --git a/arch/x86/crypto/sha256-ssse3-asm.S b/arch/x86/crypto/sha256-ssse3-asm.S index 959288eecc689891..93264ee4454325b9 100644 --- a/arch/x86/crypto/sha256-ssse3-asm.S +++ b/arch/x86/crypto/sha256-ssse3-asm.S @@ -369,7 +369,7 @@ SYM_TYPED_FUNC_START(sha256_transform_ssse3) and $~15, %rsp shl $6, NUM_BLKS # convert to bytes - jz done_hash + jz .Ldone_hash add INP, NUM_BLKS mov NUM_BLKS, _INP_END(%rsp) # pointer to end of data @@ -387,7 +387,7 @@ SYM_TYPED_FUNC_START(sha256_transform_ssse3) movdqa _SHUF_00BA(%rip), SHUF_00BA movdqa _SHUF_DC00(%rip), SHUF_DC00 -loop0: +.Lloop0: lea K256(%rip), TBL ## byte swap first 16 dwords @@ -401,7 +401,7 @@ loop0: ## schedule 48 input dwords, by doing 3 rounds of 16 each mov $3, SRND .align 16 -loop1: +.Lloop1: movdqa (TBL), XFER paddd X0, XFER movdqa XFER, _XFER(%rsp) @@ -424,10 +424,10 @@ loop1: FOUR_ROUNDS_AND_SCHED sub $1, SRND - jne loop1 + jne .Lloop1 mov $2, SRND -loop2: +.Lloop2: paddd (TBL), X0 movdqa X0, _XFER(%rsp) DO_ROUND 0 @@ -446,7 +446,7 @@ loop2: movdqa X3, X1 sub $1, SRND - jne loop2 + jne .Lloop2 addm (4*0)(CTX),a addm (4*1)(CTX),b @@ -460,9 +460,9 @@ loop2: mov _INP(%rsp), INP add $64, INP cmp _INP_END(%rsp), INP - jne loop0 + jne .Lloop0 -done_hash: +.Ldone_hash: mov %rbp, %rsp popq %rbp diff --git a/arch/x86/crypto/sha512-avx-asm.S b/arch/x86/crypto/sha512-avx-asm.S index b0984f19fdb408e9..d902b8ea07218467 100644 --- a/arch/x86/crypto/sha512-avx-asm.S +++ b/arch/x86/crypto/sha512-avx-asm.S @@ -276,7 +276,7 @@ 
frame_size = frame_WK + WK_SIZE ######################################################################## SYM_TYPED_FUNC_START(sha512_transform_avx) test msglen, msglen - je nowork + je .Lnowork # Save GPRs push %rbx @@ -291,7 +291,7 @@ SYM_TYPED_FUNC_START(sha512_transform_avx) sub $frame_size, %rsp and $~(0x20 - 1), %rsp -updateblock: +.Lupdateblock: # Load state variables mov DIGEST(0), a_64 @@ -348,7 +348,7 @@ updateblock: # Advance to next message block add $16*8, msg dec msglen - jnz updateblock + jnz .Lupdateblock # Restore Stack Pointer mov %rbp, %rsp @@ -361,7 +361,7 @@ updateblock: pop %r12 pop %rbx -nowork: +.Lnowork: RET SYM_FUNC_END(sha512_transform_avx) diff --git a/arch/x86/crypto/sha512-avx2-asm.S b/arch/x86/crypto/sha512-avx2-asm.S index b1ca99055ef994f5..f08496cd68708fde 100644 --- a/arch/x86/crypto/sha512-avx2-asm.S +++ b/arch/x86/crypto/sha512-avx2-asm.S @@ -581,7 +581,7 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx) and $~(0x20 - 1), %rsp shl $7, NUM_BLKS # convert to bytes - jz done_hash + jz .Ldone_hash add INP, NUM_BLKS # pointer to end of data mov NUM_BLKS, frame_INPEND(%rsp) @@ -600,7 +600,7 @@ SYM_TYPED_FUNC_START(sha512_transform_rorx) vmovdqa PSHUFFLE_BYTE_FLIP_MASK(%rip), BYTE_FLIP_MASK -loop0: +.Lloop0: lea K512(%rip), TBL ## byte swap first 16 dwords @@ -615,7 +615,7 @@ loop0: movq $4, frame_SRND(%rsp) .align 16 -loop1: +.Lloop1: vpaddq (TBL), Y_0, XFER vmovdqa XFER, frame_XFER(%rsp) FOUR_ROUNDS_AND_SCHED @@ -634,10 +634,10 @@ loop1: FOUR_ROUNDS_AND_SCHED subq $1, frame_SRND(%rsp) - jne loop1 + jne .Lloop1 movq $2, frame_SRND(%rsp) -loop2: +.Lloop2: vpaddq (TBL), Y_0, XFER vmovdqa XFER, frame_XFER(%rsp) DO_4ROUNDS @@ -650,7 +650,7 @@ loop2: vmovdqa Y_3, Y_1 subq $1, frame_SRND(%rsp) - jne loop2 + jne .Lloop2 mov frame_CTX(%rsp), CTX2 addm 8*0(CTX2), a @@ -665,9 +665,9 @@ loop2: mov frame_INP(%rsp), INP add $128, INP cmp frame_INPEND(%rsp), INP - jne loop0 + jne .Lloop0 -done_hash: +.Ldone_hash: # Restore Stack Pointer mov %rbp, %rsp diff --git a/arch/x86/crypto/sha512-ssse3-asm.S b/arch/x86/crypto/sha512-ssse3-asm.S index c06afb5270e5f578..65be301568162654 100644 --- a/arch/x86/crypto/sha512-ssse3-asm.S +++ b/arch/x86/crypto/sha512-ssse3-asm.S @@ -278,7 +278,7 @@ frame_size = frame_WK + WK_SIZE SYM_TYPED_FUNC_START(sha512_transform_ssse3) test msglen, msglen - je nowork + je .Lnowork # Save GPRs push %rbx @@ -293,7 +293,7 @@ SYM_TYPED_FUNC_START(sha512_transform_ssse3) sub $frame_size, %rsp and $~(0x20 - 1), %rsp -updateblock: +.Lupdateblock: # Load state variables mov DIGEST(0), a_64 @@ -350,7 +350,7 @@ updateblock: # Advance to next message block add $16*8, msg dec msglen - jnz updateblock + jnz .Lupdateblock # Restore Stack Pointer mov %rbp, %rsp @@ -363,7 +363,7 @@ updateblock: pop %r12 pop %rbx -nowork: +.Lnowork: RET SYM_FUNC_END(sha512_transform_ssse3)
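For completeness, the RIP-relative patches in this series boil down to two shapes: a bare memory operand gains an explicit (%rip) base so the assembler emits a PC-relative displacement instead of a 32-bit absolute address that would need a boot-time relocation fixup, and an indexed reference is split into a leaq of the table address plus a register-based indexed access, because RIP-relative addressing cannot be combined with an index register. A minimal standalone sketch of both shapes (table and function names are invented for illustration):

        # Hypothetical example, not taken from the series; only the
        # addressing forms mirror what the patches do.
        .section .rodata
        .align  16
.Ldemo_mask:                    # stand-in for a 16-byte constant
        .octa   0x000102030405060708090a0b0c0d0e0f
.Ldemo_table:                   # stand-in for a 256-entry lookup table
        .fill   256, 8, 0

        .text
        .globl  demo_rip_relative
demo_rip_relative:              # %rdi = table index (0..255)
        # Shape 1: a bare symbol reference grows an explicit (%rip) base.
        #   was:  movdqa .Ldemo_mask, %xmm0
        movdqa  .Ldemo_mask(%rip), %xmm0

        # Shape 2: %rip cannot carry an index register, so the table address
        # is materialized with leaq and indexed through a scratch register.
        #   was:  xorq .Ldemo_table(, %rdi, 8), %rax
        xor     %eax, %eax
        leaq    .Ldemo_table(%rip), %rcx
        xorq    (%rcx, %rdi, 8), %rax
        ret

The second shape costs one scratch register and one extra instruction per table reference, which is the trade-off visible in the s-box and K256 hunks above.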