| Message ID | 1498055361-12493-1-git-send-email-radu.solea@nxp.com (mailing list archive) |
|---|---|
| State | RFC |
| Delegated to | Herbert Xu |
On Wed, Jun 21, 2017 at 05:29:21PM +0300, Radu Solea wrote:
> Generic GCM is likely to end up using a hardware accelerator to do
> part of the job. Allocating hash, iv and result in a contiguous memory
> area increases the risk of DMA-mapping multiple ranges on the same
> cacheline. Also, having DMA-written and CPU-written data on the same
> cacheline will cause coherence issues.
>
> Signed-off-by: Radu Solea <radu.solea@nxp.com>
> ---
> Hi!
>
> I've encountered cache coherence issues when using GCM with CAAM and
> this was one way of fixing them, but it has its drawbacks. Another
> would be to allocate each element separately instead of all at once,
> but that only decreases the likelihood of this happening. Does anyone
> know of a better way of fixing this?

I don't get it. You're modifying the software version of GCM, and the
function is marked as static. How can this patch have any effect on
CAAM?

Thanks,
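For reference, the struct in question, reconstructed from the diff at
the end of this thread, packs everything into one kzalloc'ed block, so
hash, iv and result can all land on shared cachelines:

/*
 * Pre-patch layout inside crypto_gcm_setkey(), reconstructed from the
 * diff below.  hash (16 bytes) and iv (16 bytes) occupy the first 32
 * bytes of the allocation, so they share a cacheline on any machine
 * with lines of 32 bytes or more; result follows immediately after.
 */
struct {
	be128 hash;
	u8 iv[16];

	struct crypto_gcm_setkey_result result;

	struct scatterlist sg[1];
	struct skcipher_request req;
} *data;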
On Thu, Jun 22, 2017 at 05:03:44AM +0000, Radu Solea wrote:
> I'm adding ____cacheline_aligned to iv and result. That is done so hash, iv and result never share a cache line.
Yes but you're doing it in crypto/gcm.c which isn't even used by
caam AFAIK.
Cheers,
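A standalone illustration of what the annotation buys may help here.
In the kernel, ____cacheline_aligned expands to
__attribute__((__aligned__(SMP_CACHE_BYTES))); the demo below is plain
userspace C with an assumed 64-byte line size, not kernel code:

/*
 * Minimal illustration of per-field cacheline alignment.  CACHELINE
 * is a stand-in for SMP_CACHE_BYTES; 64 is assumed for demonstration.
 */
#include <stdio.h>
#include <stddef.h>

#define CACHELINE 64	/* stand-in for SMP_CACHE_BYTES */

struct packed_fields {
	unsigned char hash[16];
	unsigned char iv[16];	/* offset 16: shares a line with hash */
};

struct aligned_fields {
	unsigned char hash[16];
	unsigned char iv[16] __attribute__((aligned(CACHELINE)));
};

int main(void)
{
	/* prints 16: iv sits on the same 64-byte line as hash */
	printf("packed:  iv at offset %zu\n",
	       offsetof(struct packed_fields, iv));
	/* prints 64: iv starts a fresh cacheline of its own */
	printf("aligned: iv at offset %zu\n",
	       offsetof(struct aligned_fields, iv));
	return 0;
}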
On Thu, 2017-06-22 at 15:53 +0800, Herbert Xu wrote:
> On Thu, Jun 22, 2017 at 05:03:44AM +0000, Radu Solea wrote:
> >
> > I'm adding ____cacheline_aligned to iv and result. That is done so
> > hash, iv and result never share a cache line.
>
> Yes but you're doing it in crypto/gcm.c which isn't even used by
> caam AFAIK.
>
> Cheers,

My bad, I need to put this in context better. First, I saw this issue
with CAAM, but it has the potential to impact any implementation that
tries to access those fields through DMA.

CAAM has a number of variations; one of them, called CAAM LP, doesn't
support GCM but can offload ctr(aes). Which it tries to do, but because
of how this memory is allocated, when the CPU writes result, the cache
mechanism overwrites hash and iv with whatever was there before the
offload happened.

This is not the only problem; DMA writing two fields on the same
cacheline is likely to cause issues too. From
https://www.kernel.org/doc/Documentation/DMA-API.txt:

  Memory coherency operates at a granularity called the cache line
  width. In order for memory mapped by this API to operate correctly,
  the mapped region must begin exactly on a cache line boundary and end
  exactly on one (to prevent two separately mapped regions from sharing
  a single cache line). Since the cache line size may not be known at
  compile time, the API will not enforce this requirement. Therefore,
  it is recommended that driver writers who don't take special care to
  determine the cache line size at run time only map virtual regions
  that begin and end on page boundaries (which are guaranteed also to
  be cache line boundaries).

There are two ways of fixing this AFAIK: the first is adding
____cacheline_aligned so those fields don't fall into the same
cacheline. The second is to kzalloc hash and iv separately. kmalloc
should honor ARCH_DMA_MINALIGN, which would make this issue go away.

Cheers,
Radu.
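A minimal sketch of that second option, assuming kernel context;
gcm_alloc_dma_bufs is a hypothetical helper, not a function in gcm.c:

/*
 * Give hash and iv their own allocations.  kmalloc-family allocations
 * are aligned to at least ARCH_KMALLOC_MINALIGN, which on arches that
 * define it is ARCH_DMA_MINALIGN, so neither buffer can share a
 * cacheline with CPU-written state.
 */
#include <linux/slab.h>
#include <crypto/b128ops.h>	/* be128 */

static int gcm_alloc_dma_bufs(be128 **hash, u8 **iv)
{
	*hash = kzalloc(sizeof(**hash), GFP_KERNEL);
	if (!*hash)
		return -ENOMEM;

	*iv = kzalloc(16, GFP_KERNEL);
	if (!*iv) {
		kfree(*hash);
		*hash = NULL;
		return -ENOMEM;
	}
	return 0;	/* caller kfree()s both once the request completes */
}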
On Thu, Jun 22, 2017 at 01:56:40PM +0000, Radu Solea wrote:
> There are two ways of fixing this AFAIK: the first is adding
> ____cacheline_aligned so those fields don't fall into the same
> cacheline. The second is to kzalloc hash and iv separately. kmalloc
> should honor ARCH_DMA_MINALIGN, which would make this issue go away.

Thanks for the explanation. I see the problem now.

The crypto API cannot rely on users providing aligned buffers. So if
your driver has an alignment requirement, it either has to use the
existing crypto API alignmask setting, which can cope with some
unaligned inputs (e.g., the IV if you use the skcipher walk mechanism),
or you must copy unaligned data yourself before performing DMA on them.

Cheers,
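For illustration, a sketch of the alignmask setting referred to above,
assuming a driver with a 16-byte DMA alignment requirement; the driver
name and the omitted callbacks are placeholders, not actual caam code:

#include <linux/module.h>
#include <crypto/internal/skcipher.h>

static struct skcipher_alg example_ctr_alg = {
	.base = {
		.cra_name		= "ctr(aes)",
		.cra_driver_name	= "ctr-aes-example", /* placeholder */
		.cra_priority		= 300,
		.cra_blocksize		= 1,
		/* API bounce-copies input/IV misaligned vs. this mask */
		.cra_alignmask		= 0xf,
		.cra_module		= THIS_MODULE,
	},
	/* .setkey/.encrypt/.decrypt/.min_keysize/.max_keysize omitted */
};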
On Fri, 2017-06-23 at 14:31 +0800, Herbert Xu wrote:
>
> The crypto API cannot rely on users providing aligned buffers. So if
> your driver has an alignment requirement, it either has to use the
> existing crypto API alignmask setting, which can cope with some
> unaligned inputs (e.g., the IV if you use the skcipher walk
> mechanism), or you must copy unaligned data yourself before
> performing DMA on them.
>
> Cheers,

Normally I would agree with you, if it were a weird requirement coming
from one piece of hardware or one driver. In this case I think it's
different: this is not a limitation coming from one driver or one
particular hardware variety. It applies to all platforms that do not
have hardware cache coherence and have a large enough cacheline.

A couple of lines below the allocation, hash is linked into a
scatterlist, a data structure with remarkably high chances of ending up
at a DMA endpoint, yet we choose to ignore all other DMA requirements?

Cheers,
Radu.
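The lines being referred to, paraphrased from crypto_gcm_setkey() in
the crypto/gcm.c of this era: hash, a field of the kzalloc'ed struct,
goes straight into a scatterlist and from there into the ctr(aes)
request, which is exactly what a DMA-capable backend will end up
mapping:

/* hash is wrapped in a one-entry scatterlist... */
sg_init_one(data->sg, &data->hash, sizeof(data->hash));
/* ...and that scatterlist is both src and dst of the ctr request */
skcipher_request_set_crypt(&data->req, data->sg, data->sg,
			   sizeof(data->hash), data->iv);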
On Fri, Jun 23, 2017 at 07:33:20AM +0000, Radu Solea wrote:
>
> Normally I would agree with you, if it were a weird requirement coming
> from one piece of hardware or one driver. In this case I think it's
> different: this is not a limitation coming from one driver or one
> particular hardware variety. It applies to all platforms that do not
> have hardware cache coherence and have a large enough cacheline.
>
> A couple of lines below the allocation, hash is linked into a
> scatterlist, a data structure with remarkably high chances of ending
> up at a DMA endpoint, yet we choose to ignore all other DMA
> requirements?

What I'm saying is that you cannot rely on crypto API users to do this
for you. Sure, we can fix this one spot in gcm.c. But any other user of
caam anywhere in the kernel can do exactly the same thing. You cannot
expect them to know to allocate IVs at cacheline boundaries.

So if you have this requirement (which the generic C version certainly
does not), then you'll need to deal with it in the driver. Of course,
if every DMA driver needed to do the same thing, then it's something
the crypto API should take care of, e.g., like we do for alignmask.

Cheers,
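A sketch of the driver-side option described above: bounce-copy a
possibly unaligned IV into a driver-owned, DMA-safe buffer before
mapping it. map_iv_bounced and the dma_iv buffer are hypothetical, not
actual caam code:

#include <linux/string.h>
#include <linux/dma-mapping.h>

/*
 * dma_iv is assumed to be a driver-owned buffer that already satisfies
 * the DMA alignment rules (e.g. its own kmalloc allocation).
 */
static dma_addr_t map_iv_bounced(struct device *dev, const u8 *req_iv,
				 u8 *dma_iv, unsigned int ivsize)
{
	memcpy(dma_iv, req_iv, ivsize);	/* realign into our own buffer */
	return dma_map_single(dev, dma_iv, ivsize, DMA_TO_DEVICE);
}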
diff --git a/crypto/gcm.c b/crypto/gcm.c
index b7ad808..657eefe 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -117,9 +117,9 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
 	struct crypto_skcipher *ctr = ctx->ctr;
 	struct {
 		be128 hash;
-		u8 iv[16];
+		u8 iv[16] ____cacheline_aligned;
 
-		struct crypto_gcm_setkey_result result;
+		struct crypto_gcm_setkey_result result ____cacheline_aligned;
 
 		struct scatterlist sg[1];
 		struct skcipher_request req;
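As a hypothetical follow-up (not part of the patch), one could assert
at build time what the annotations guarantee. The struct in
crypto_gcm_setkey() is anonymous, so a named stand-in with the same
leading layout is used here, and setkey_result stands in for struct
crypto_gcm_setkey_result:

#include <linux/cache.h>
#include <linux/bug.h>
#include <linux/completion.h>
#include <linux/stddef.h>
#include <linux/types.h>
#include <crypto/b128ops.h>

struct setkey_result {		/* stand-in for the private gcm.c type */
	int err;
	struct completion completion;
};

struct setkey_layout {		/* mirrors the patched struct */
	be128 hash;
	u8 iv[16] ____cacheline_aligned;
	struct setkey_result result ____cacheline_aligned;
};

static inline void setkey_layout_check(void)
{
	/* iv and result must each start a fresh cacheline */
	BUILD_BUG_ON(offsetof(struct setkey_layout, iv) % SMP_CACHE_BYTES);
	BUILD_BUG_ON(offsetof(struct setkey_layout, result) % SMP_CACHE_BYTES);
}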
Generic GCM is likely to end up using a hardware accelerator to do part
of the job. Allocating hash, iv and result in a contiguous memory area
increases the risk of DMA-mapping multiple ranges on the same
cacheline. Also, having DMA-written and CPU-written data on the same
cacheline will cause coherence issues.

Signed-off-by: Radu Solea <radu.solea@nxp.com>
---
Hi!

I've encountered cache coherence issues when using GCM with CAAM and
this was one way of fixing them, but it has its drawbacks. Another
would be to allocate each element separately instead of all at once,
but that only decreases the likelihood of this happening. Does anyone
know of a better way of fixing this?

Thanks,
Radu.

 crypto/gcm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)