| Message ID | 1498055361-12493-1-git-send-email-radu.solea@nxp.com (mailing list archive) |
|---|---|
| State | RFC |
| Delegated to | Herbert Xu |
On Wed, Jun 21, 2017 at 05:29:21PM +0300, Radu Solea wrote:
> Generic GCM is likely to end up using a hardware accelerator to do
> part of the job. Allocating hash, iv and result in a contiguous memory
> area increases the risk of DMA-mapping multiple ranges on the same
> cacheline. Also, having DMA-written and CPU-written data on the same
> cacheline will cause coherence issues.
>
> Signed-off-by: Radu Solea <radu.solea@nxp.com>
> ---
> Hi!
>
> I've encountered cache coherence issues when using GCM with CAAM and
> this was one way of fixing them, but it has its drawbacks. Another
> would be to allocate each element separately instead of all at once,
> but that only decreases the likelihood of this happening. Does anyone
> know of a better way of fixing this?

I don't get it. You're modifying the software version of GCM, and the
function is marked as static. How can this patch have any effect on
CAAM?

Thanks,
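For reference, the struct in question, reconstructed from the diff at
the end of this thread, packs everything into one kzalloc'ed block, so
hash, iv and result can all land on shared cachelines:

/*
 * Pre-patch layout inside crypto_gcm_setkey(), reconstructed from the
 * diff below.  hash (16 bytes) and iv (16 bytes) occupy the first 32
 * bytes of the allocation, so they share a cacheline on any machine
 * with lines of 32 bytes or more; result follows immediately after.
 */
struct {
	be128 hash;
	u8 iv[16];

	struct crypto_gcm_setkey_result result;

	struct scatterlist sg[1];
	struct skcipher_request req;
} *data;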
On Thu, Jun 22, 2017 at 05:03:44AM +0000, Radu Solea wrote:
> I'm adding ____cacheline_aligned to iv and result. That is done so hash, iv and result never share a cache line.
Yes but you're doing it in crypto/gcm.c which isn't even used by
caam AFAIK.
Cheers,
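A standalone illustration of what the annotation buys may help here.
In the kernel, ____cacheline_aligned expands to
__attribute__((__aligned__(SMP_CACHE_BYTES))); the demo below is plain
userspace C with an assumed 64-byte line size, not kernel code:

/*
 * Minimal illustration of per-field cacheline alignment.  CACHELINE
 * is a stand-in for SMP_CACHE_BYTES; 64 is assumed for demonstration.
 */
#include <stdio.h>
#include <stddef.h>

#define CACHELINE 64	/* stand-in for SMP_CACHE_BYTES */

struct packed_fields {
	unsigned char hash[16];
	unsigned char iv[16];	/* offset 16: shares a line with hash */
};

struct aligned_fields {
	unsigned char hash[16];
	unsigned char iv[16] __attribute__((aligned(CACHELINE)));
};

int main(void)
{
	/* prints 16: iv sits on the same 64-byte line as hash */
	printf("packed:  iv at offset %zu\n",
	       offsetof(struct packed_fields, iv));
	/* prints 64: iv starts a fresh cacheline of its own */
	printf("aligned: iv at offset %zu\n",
	       offsetof(struct aligned_fields, iv));
	return 0;
}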
On Thu, 2017-06-22 at 15:53 +0800, Herbert Xu wrote:
> On Thu, Jun 22, 2017 at 05:03:44AM +0000, Radu Solea wrote:
> >
> > I'm adding ____cacheline_aligned to iv and result. That is done so
> > hash, iv and result never share a cache line.
>
> Yes but you're doing it in crypto/gcm.c which isn't even used by
> caam AFAIK.
>
> Cheers,

My bad, I need to put this in context better. First, I saw this issue
with CAAM, but it has the potential to impact any implementation that
tries to access those fields through DMA.

CAAM has a number of variations; one of them, called CAAM LP, doesn't
support GCM but can offload ctr(aes). Which it tries to do, but because
of how this memory is allocated, when the CPU writes result, the cache
mechanism overwrites hash and iv with whatever was there before the
offload happened.

This is not the only problem; DMA writing two fields on the same
cacheline is likely to cause issues too. From
https://www.kernel.org/doc/Documentation/DMA-API.txt:

  Memory coherency operates at a granularity called the cache line
  width. In order for memory mapped by this API to operate correctly,
  the mapped region must begin exactly on a cache line boundary and end
  exactly on one (to prevent two separately mapped regions from sharing
  a single cache line). Since the cache line size may not be known at
  compile time, the API will not enforce this requirement. Therefore,
  it is recommended that driver writers who don't take special care to
  determine the cache line size at run time only map virtual regions
  that begin and end on page boundaries (which are guaranteed also to
  be cache line boundaries).

There are two ways of fixing this AFAIK: the first is adding
____cacheline_aligned so those fields don't fall into the same
cacheline. The second is to kzalloc hash and iv separately. kmalloc
should honor ARCH_DMA_MINALIGN, which would make this issue go away.

Cheers,
Radu.
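A minimal sketch of that second option, assuming kernel context;
gcm_alloc_dma_bufs is a hypothetical helper, not a function in gcm.c:

/*
 * Give hash and iv their own allocations.  kmalloc-family allocations
 * are aligned to at least ARCH_KMALLOC_MINALIGN, which on arches that
 * define it is ARCH_DMA_MINALIGN, so neither buffer can share a
 * cacheline with CPU-written state.
 */
#include <linux/slab.h>
#include <crypto/b128ops.h>	/* be128 */

static int gcm_alloc_dma_bufs(be128 **hash, u8 **iv)
{
	*hash = kzalloc(sizeof(**hash), GFP_KERNEL);
	if (!*hash)
		return -ENOMEM;

	*iv = kzalloc(16, GFP_KERNEL);
	if (!*iv) {
		kfree(*hash);
		*hash = NULL;
		return -ENOMEM;
	}
	return 0;	/* caller kfree()s both once the request completes */
}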
On Thu, Jun 22, 2017 at 01:56:40PM +0000, Radu Solea wrote:
> There are two ways of fixing this AFAIK: the first is adding
> ____cacheline_aligned so those fields don't fall into the same
> cacheline. The second is to kzalloc hash and iv separately. kmalloc
> should honor ARCH_DMA_MINALIGN, which would make this issue go away.

Thanks for the explanation. I see the problem now.

The crypto API cannot rely on users providing aligned buffers. So if
your driver has an alignment requirement, it either has to use the
existing crypto API alignmask setting, which can cope with some
unaligned inputs (e.g., the IV if you use the skcipher walk mechanism),
or you must copy unaligned data yourself before performing DMA on them.

Cheers,
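For illustration, a sketch of the alignmask setting referred to above,
assuming a driver with a 16-byte DMA alignment requirement; the driver
name and the omitted callbacks are placeholders, not actual caam code:

#include <linux/module.h>
#include <crypto/internal/skcipher.h>

static struct skcipher_alg example_ctr_alg = {
	.base = {
		.cra_name		= "ctr(aes)",
		.cra_driver_name	= "ctr-aes-example", /* placeholder */
		.cra_priority		= 300,
		.cra_blocksize		= 1,
		/* API bounce-copies input/IV misaligned vs. this mask */
		.cra_alignmask		= 0xf,
		.cra_module		= THIS_MODULE,
	},
	/* .setkey/.encrypt/.decrypt/.min_keysize/.max_keysize omitted */
};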
On Fri, 2017-06-23 at 14:31 +0800, Herbert Xu wrote:
>
> The crypto API cannot rely on users providing aligned buffers. So if
> your driver has an alignment requirement, it either has to use the
> existing crypto API alignmask setting, which can cope with some
> unaligned inputs (e.g., the IV if you use the skcipher walk
> mechanism), or you must copy unaligned data yourself before
> performing DMA on them.
>
> Cheers,

Normally I would agree with you, if it were a weird requirement coming
from one piece of hardware or one driver. In this case I think it's
different: this is not a limitation coming from one driver or one
particular hardware variety. It applies to all platforms that do not
have hardware cache coherence and have a large enough cacheline.

A couple of lines below the allocation, hash is linked into a
scatterlist, a data structure with remarkably high chances of ending up
at a DMA endpoint, yet we choose to ignore all other DMA requirements?

Cheers,
Radu.
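The lines being referred to, paraphrased from crypto_gcm_setkey() in
the crypto/gcm.c of this era: hash, a field of the kzalloc'ed struct,
goes straight into a scatterlist and from there into the ctr(aes)
request, which is exactly what a DMA-capable backend will end up
mapping:

/* hash is wrapped in a one-entry scatterlist... */
sg_init_one(data->sg, &data->hash, sizeof(data->hash));
/* ...and that scatterlist is both src and dst of the ctr request */
skcipher_request_set_crypt(&data->req, data->sg, data->sg,
			   sizeof(data->hash), data->iv);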
On Fri, Jun 23, 2017 at 07:33:20AM +0000, Radu Solea wrote:
>
> Normally I would agree with you, if it were a weird requirement coming
> from one piece of hardware or one driver. In this case I think it's
> different: this is not a limitation coming from one driver or one
> particular hardware variety. It applies to all platforms that do not
> have hardware cache coherence and have a large enough cacheline.
>
> A couple of lines below the allocation, hash is linked into a
> scatterlist, a data structure with remarkably high chances of ending
> up at a DMA endpoint, yet we choose to ignore all other DMA
> requirements?

What I'm saying is that you cannot rely on crypto API users to do this
for you. Sure, we can fix this one spot in gcm.c. But any other user of
caam anywhere in the kernel can do exactly the same thing. You cannot
expect them to know to allocate IVs at cacheline boundaries.

So if you have this requirement (which the generic C version certainly
does not), then you'll need to deal with it in the driver. Of course,
if every DMA driver needed to do the same thing, then it's something
the crypto API should take care of, e.g., like we do for alignmask.

Cheers,
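A sketch of the driver-side option described above: bounce-copy a
possibly unaligned IV into a driver-owned, DMA-safe buffer before
mapping it. map_iv_bounced and the dma_iv buffer are hypothetical, not
actual caam code:

#include <linux/string.h>
#include <linux/dma-mapping.h>

/*
 * dma_iv is assumed to be a driver-owned buffer that already satisfies
 * the DMA alignment rules (e.g. its own kmalloc allocation).
 */
static dma_addr_t map_iv_bounced(struct device *dev, const u8 *req_iv,
				 u8 *dma_iv, unsigned int ivsize)
{
	memcpy(dma_iv, req_iv, ivsize);	/* realign into our own buffer */
	return dma_map_single(dev, dma_iv, ivsize, DMA_TO_DEVICE);
}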
diff --git a/crypto/gcm.c b/crypto/gcm.c
index b7ad808..657eefe 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -117,9 +117,9 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
 	struct crypto_skcipher *ctr = ctx->ctr;
 	struct {
 		be128 hash;
-		u8 iv[16];
+		u8 iv[16] ____cacheline_aligned;
 
-		struct crypto_gcm_setkey_result result;
+		struct crypto_gcm_setkey_result result ____cacheline_aligned;
 
 		struct scatterlist sg[1];
 		struct skcipher_request req;
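As a hypothetical follow-up (not part of the patch), one could assert
at build time what the annotations guarantee. The struct in
crypto_gcm_setkey() is anonymous, so a named stand-in with the same
leading layout is used here, and setkey_result stands in for struct
crypto_gcm_setkey_result:

#include <linux/cache.h>
#include <linux/bug.h>
#include <linux/completion.h>
#include <linux/stddef.h>
#include <linux/types.h>
#include <crypto/b128ops.h>

struct setkey_result {		/* stand-in for the private gcm.c type */
	int err;
	struct completion completion;
};

struct setkey_layout {		/* mirrors the patched struct */
	be128 hash;
	u8 iv[16] ____cacheline_aligned;
	struct setkey_result result ____cacheline_aligned;
};

static inline void setkey_layout_check(void)
{
	/* iv and result must each start a fresh cacheline */
	BUILD_BUG_ON(offsetof(struct setkey_layout, iv) % SMP_CACHE_BYTES);
	BUILD_BUG_ON(offsetof(struct setkey_layout, result) % SMP_CACHE_BYTES);
}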
Generic GCM is likely to end up using a hardware accelerator to do part
of the job. Allocating hash, iv and result in a contiguous memory area
increases the risk of DMA-mapping multiple ranges on the same
cacheline. Also, having DMA-written and CPU-written data on the same
cacheline will cause coherence issues.

Signed-off-by: Radu Solea <radu.solea@nxp.com>
---
Hi!

I've encountered cache coherence issues when using GCM with CAAM and
this was one way of fixing them, but it has its drawbacks. Another
would be to allocate each element separately instead of all at once,
but that only decreases the likelihood of this happening. Does anyone
know of a better way of fixing this?

Thanks,
Radu.

 crypto/gcm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)