From patchwork Tue Feb 13 18:57:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 10217315 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 31762601C2 for ; Tue, 13 Feb 2018 18:57:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D14B285FB for ; Tue, 13 Feb 2018 18:57:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0ED6928709; Tue, 13 Feb 2018 18:57:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FSL_HELO_FAKE, RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 544A8285FB for ; Tue, 13 Feb 2018 18:57:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965556AbeBMS5j (ORCPT ); Tue, 13 Feb 2018 13:57:39 -0500 Received: from mail-pl0-f66.google.com ([209.85.160.66]:45088 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965509AbeBMS5i (ORCPT ); Tue, 13 Feb 2018 13:57:38 -0500 Received: by mail-pl0-f66.google.com with SMTP id p5so7060242plo.12 for ; Tue, 13 Feb 2018 10:57:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=fcrzjHOwcmxCopdO5Zy6hN5qW1PwDatNf/hSW1cZ+og=; b=dpDt/Uc+HcffXvD2QcekeR7Qek/eCauwT1S/s3Es6oaqwR1AXjH1bB221T6bUgp8He 8W66yray+ntrh4YBBV8bYLMnYlboltE0dVE5sPciVcA+RT0u93ZhybI/x3KKBfYoDN2W PJZnj/at0SXIXBsyErKVwEQlfEW7RTyxbkkmVt8au3QrqtEz1jQzaSlDUlhCagJAhgUQ 11g5tQzZbL43LBWy3ByJ3SKesQ2/OCGi4l+6WctWihiTX5Py7FCTKZSL/gg0+ACO5ql9 rRMvfst4u2BpFVNWUgZRcXT9Q+buL3m5V2NTpUujjDl0AcPMOrudXZ402yV/saOEzFZr J20Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=fcrzjHOwcmxCopdO5Zy6hN5qW1PwDatNf/hSW1cZ+og=; b=R/J0EljTKivzFEVqBNbNyt/FyQcB52LMZNjzEZ6e1ORmVrLFiLVe3UIr7rsKnvaeEo YWL/NxjUlhvJpHEHnzOHhSrbEV9dTZN+nQ3T20K2OcCWPDLrhufAKwae/iF7xmtJ1WX3 2oKMFac8dFX1Pp9SzQrqNRoNm3pwdqGJs5YjDcfS7y6I1Mb7Yl3zDW/eOm+hhoi05eAH U8rX8C/IDPuEZKwqm24v67WIl0aRFBL2WnV+GAsG0a3SB1QKRikyQIdCl+mA+hE39Gby 6TXEgJIuYUr8H7je9Yoh+nYO+GJ2z3sf6DIgB6KnqrNb5Zuf2eJeUm2aC2TcA8vmO+f6 sWog== X-Gm-Message-State: APf1xPChadNHy/BbpwX7bAmEhjiA/aDnZylXDVQwTAqJYLKC2tFwqrkT fmGk9+mjDDePKk7Jt95qR3xtnUuK3MQ= X-Google-Smtp-Source: AH8x2270iYbvzlZKHubtvIF6su0uFwcUftIOuVtGR++RcUTMhJL5cbqaBtbVpvs7OhknIkG/0vGksg== X-Received: by 2002:a17:902:24a5:: with SMTP id w34-v6mr1959063pla.221.1518548258122; Tue, 13 Feb 2018 10:57:38 -0800 (PST) Received: from google.com ([2620:15c:17:3:dc28:5c82:b905:e8a8]) by smtp.gmail.com with ESMTPSA id 206sm38113076pfy.140.2018.02.13.10.57.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 13 Feb 2018 10:57:36 -0800 (PST) Date: Tue, 13 Feb 2018 10:57:35 -0800 From: Eric Biggers To: Ard Biesheuvel Cc: "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" , Herbert Xu , linux-fscrypt@vger.kernel.org, linux-arm-kernel , Jeffrey Walton , Paul Crowley , Patrik Torstensson , Greg Kaiser , Paul Lawrence , Michael Halcrow , Alex Cope , Greg Kroah-Hartman Subject: Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS Message-ID: <20180213185735.GA243817@google.com> References: <20180212235209.117393-1-ebiggers@google.com> <20180212235209.117393-4-ebiggers@google.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-fscrypt-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fscrypt@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Ard, On Tue, Feb 13, 2018 at 11:34:36AM +0000, Ard Biesheuvel wrote: > Hi Eric, > > On 12 February 2018 at 23:52, Eric Biggers wrote: > > Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on > > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for > > Speck64. Each 128-byte chunk goes through XTS preprocessing, then is > > encrypted/decrypted (doing one cipher round for all the blocks, then the > > next round, etc.), then goes through XTS postprocessing. > > > > The performance depends on the processor but can be about 3 times faster > > than the generic code. For example, on an ARMv7 processor we observe > > the following performance with Speck128/256-XTS: > > > > xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s > > xts(speck128-generic): Encryption 32.1 MB/s, Decryption 36.6 MB/s > > > > In comparison to AES-256-XTS without the Cryptography Extensions: > > > > xts-aes-neonbs: Encryption 41.2 MB/s, Decryption 36.7 MB/s > > xts(aes-asm): Encryption 31.7 MB/s, Decryption 30.8 MB/s > > xts(aes-generic): Encryption 21.2 MB/s, Decryption 20.9 MB/s > > > > Speck64/128-XTS is even faster: > > > > xts-speck64-neon: Encryption 138.6 MB/s, Decryption 139.1 MB/s > > > > Note that as with the generic code, only the Speck128 and Speck64 > > variants are supported. Also, for now only the XTS mode of operation is > > supported, to target the disk and file encryption use cases. The NEON > > code also only handles the portion of the data that is evenly divisible > > into 128-byte chunks, with any remainder handled by a C fallback. Of > > course, other modes of operation could be added later if needed, and/or > > the NEON code could be updated to handle other buffer sizes. > > > > The XTS specification is only defined for AES which has a 128-bit block > > size, so for the GF(2^64) math needed for Speck64-XTS we use the > > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX > > paper. Of course, when possible users should use Speck128-XTS, but even > > that may be too slow on some processors; Speck64-XTS can be faster. > > > > I think this is excellent work. Speck seems an appropriate solution to > this problem, and I'm glad we are not ending up with a stream cipher > for block encryption. > > Also, I think an arm64 port would be nice. I may take a stab at this > if nobody else beats me to it. We don't really want to encourage people to use Speck over AES with the Cryptography Extensions, so that's why I didn't include an arm64 port. That being said, I suppose we can't stop people from adding an arm64 port if they really do prefer Speck, or maybe for use on arm64 CPUs that don't have the Cryptography Extensions (though I thought that almost all do). > > I did run into an issue with this code though: On big-endian, I get > > [ 0.272381] alg: skcipher: Test 1 failed (invalid result) on > encryption for xts-speck64-neon > [ 0.276151] 00000000: 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8 > [ 0.278541] 00000010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b > > so there may be a byte order corner case you missed in the rewrite (or > the issue existed before, as I did not test your v1) > To be honest I haven't tested either version on a big endian ARM CPU yet. I don't really know how to do that currently; maybe it's possible with QEMU. But assuming I haven't missed anything, in the assembly code everything is treated as byte arrays with the exception of the round keys which are 32-bit or 64-bit numbers in CPU endianness. The byte arrays are loaded and stored with vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so the assembly code *should* work correctly on a big endian CPU. However, looking over it now, I think there is a bug in the glue code for Speck64-XTS when it handles buffers not evenly divisible into 128 bytes. Namely, the tweak is treated as CPU endian when it should be little endian. Could you try the following patch? Tested-by: Ard Biesheuvel --- To unsubscribe from this list: send the line "unsubscribe linux-fscrypt" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c index 3987dd6e063e..960cc634b36f 100644 --- a/arch/arm/crypto/speck-neon-glue.c +++ b/arch/arm/crypto/speck-neon-glue.c @@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one, struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; - u64 tweak; + __le64 tweak; int err; err = skcipher_walk_virt(&walk, req, true); @@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one, } /* Handle any remainder with generic code */ - while (nbytes >= sizeof(u64)) { - *(u64 *)dst = *(u64 *)src ^ tweak; + while (nbytes >= sizeof(__le64)) { + *(__le64 *)dst = *(__le64 *)src ^ tweak; (*crypt_one)(&ctx->main_key, dst, dst); - *(u64 *)dst ^= tweak; - tweak = (tweak << 1) ^ - ((tweak & (1ULL << 63)) ? 0x1B : 0); - - dst += sizeof(u64); - src += sizeof(u64); - nbytes -= sizeof(u64); + *(__le64 *)dst ^= tweak; + tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^ + ((tweak & cpu_to_le64(1ULL << 63)) ? + 0x1B : 0)); + dst += sizeof(__le64); + src += sizeof(__le64); + nbytes -= sizeof(__le64); } err = skcipher_walk_done(&walk, nbytes); }