Message ID | 1583707893-23699-2-git-send-email-iuliana.prodan@nxp.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Herbert Xu |
Headers | show |
Series | crypto: engine - support for parallel and batch requests | expand |
On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote: > > ret = enginectx->op.do_one_request(engine, async_req); > - if (ret) { > - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); > - goto req_err; > + can_enq_more = ret; > + if (can_enq_more < 0) { > + dev_err(engine->dev, "Failed to do one request from queue: %d\n", > + ret); > + goto req_err_1; > + } So this now includes the case of the hardware queue being full and the request needs to be queued until space opens up again. In this case, we should not do dev_err. So you need to be able to distinguish between the hardware queue being full vs. a real fatal error on the request (e.g., out-of-memory or some hardware failure). Thanks,
On 3/12/2020 5:26 AM, Herbert Xu wrote: > On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote: >> >> ret = enginectx->op.do_one_request(engine, async_req); >> - if (ret) { >> - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); >> - goto req_err; >> + can_enq_more = ret; >> + if (can_enq_more < 0) { >> + dev_err(engine->dev, "Failed to do one request from queue: %d\n", >> + ret); >> + goto req_err_1; >> + } > > So this now includes the case of the hardware queue being full > and the request needs to be queued until space opens up again. I see no difference when compared with existing implementation: in both cases failing the transfer from SW queue to HW queue means losing the request irrespective of the error code returned by .do_one_request. This doesn't mean it shouldn't be fixed. > In this case, we should not do dev_err. So you need to be able > to distinguish between the hardware queue being full vs. a real > fatal error on the request (e.g., out-of-memory or some hardware > failure). > Yes, in case of -ENOSPC (HW queue full) the request should be put back in the SW queue. Thanks, Horia
On 3/12/2020 5:26 AM, Herbert Xu wrote: > On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote: >> >> ret = enginectx->op.do_one_request(engine, async_req); >> - if (ret) { >> - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); >> - goto req_err; >> + can_enq_more = ret; >> + if (can_enq_more < 0) { >> + dev_err(engine->dev, "Failed to do one request from queue: %d\n", >> + ret); >> + goto req_err_1; >> + } > > So this now includes the case of the hardware queue being full > and the request needs to be queued until space opens up again. > In this case, we should not do dev_err. So you need to be able > to distinguish between the hardware queue being full vs. a real > fatal error on the request (e.g., out-of-memory or some hardware > failure). > There are two aspects here: - if all requests go through crypto-engine, and, in this case, if there is no space in hw queue, do_one_req returns 0, and actually there will be no case of do_one_request() < 0; - if there are other requests (non crypto API) that go to hw for execution (in CAAM we have split key or RNG) those might occupy the hw queue, after crypto-engine returns that it still has space. This case wasn't supported before my modifications and neither with these patches. This use-case can be solved by retrying to send the request to hw - enqueue it back to crypto-engine, in the head of the queue (to keep the order, and send it again to hw). I've tried this, but it implies modifications in all drivers. For example, a driver, in case of error, it frees the resources of the request. So, will need to map again a request. Thanks, Iulia
On 3/12/2020 2:45 PM, Iuliana Prodan wrote: > On 3/12/2020 5:26 AM, Herbert Xu wrote: >> On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote: >>> >>> ret = enginectx->op.do_one_request(engine, async_req); >>> - if (ret) { >>> - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); >>> - goto req_err; >>> + can_enq_more = ret; >>> + if (can_enq_more < 0) { >>> + dev_err(engine->dev, "Failed to do one request from queue: %d\n", >>> + ret); >>> + goto req_err_1; >>> + } >> >> So this now includes the case of the hardware queue being full >> and the request needs to be queued until space opens up again. >> In this case, we should not do dev_err. So you need to be able >> to distinguish between the hardware queue being full vs. a real >> fatal error on the request (e.g., out-of-memory or some hardware >> failure). >> > There are two aspects here: > - if all requests go through crypto-engine, and, in this case, if there > is no space in hw queue, do_one_req returns 0, and actually there will > be no case of do_one_request() < 0; > - if there are other requests (non crypto API) that go to hw for > execution (in CAAM we have split key or RNG) those might occupy the hw > queue, after crypto-engine returns that it still has space. This case > wasn't supported before my modifications and neither with these patches. > This use-case can be solved by retrying to send the request to hw - > enqueue it back to crypto-engine, in the head of the queue (to keep the > order, and send it again to hw). > I've tried this, but it implies modifications in all drivers. For > example, a driver, in case of error, it frees the resources of the > request. So, will need to map again a request. > I believe this recovery mechanism should be handled in other patch (series). Herbert, what is your proposal to move forward? Thanks, Iulia
On Thu, Mar 12, 2020 at 01:05:32PM +0200, Horia Geantă wrote: > On 3/12/2020 5:26 AM, Herbert Xu wrote: > > On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote: > >> > >> ret = enginectx->op.do_one_request(engine, async_req); > >> - if (ret) { > >> - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); > >> - goto req_err; > >> + can_enq_more = ret; > >> + if (can_enq_more < 0) { > >> + dev_err(engine->dev, "Failed to do one request from queue: %d\n", > >> + ret); > >> + goto req_err_1; > >> + } > > > > So this now includes the case of the hardware queue being full > > and the request needs to be queued until space opens up again. > I see no difference when compared with existing implementation: > in both cases failing the transfer from SW queue to HW queue means > losing the request irrespective of the error code returned by .do_one_request. > > This doesn't mean it shouldn't be fixed. I don't think they are the same though. With the existing code, you only ever have one outstanding request so a new one is only given over to the hardware after the previous one has completed. That means that the only errors you expect to get from the driver are fatal ones that you cannot recover from. With parallel requests, you will be giving as many requests to the driver as it can take. In fact the error condition is now used to tell the engine to stop giving more requests. This is in no way the same as a fatal error from before. We should not print out an error in this case and we should ensure that the request is put back on the queue and reprocessed when the driver comes back for more. Cheers,
On Thu, Mar 12, 2020 at 12:45:54PM +0000, Iuliana Prodan wrote: > > There are two aspects here: > - if all requests go through crypto-engine, and, in this case, if there > is no space in hw queue, do_one_req returns 0, and actually there will > be no case of do_one_request() < 0; OK, that makes sense. However, this way of signaling for more requests can be racy. Unless you can guarantee that the driver is not taking any requests from another engine queue (or any other source), just because it returned a positive value now does not mean that it would be able to take a request the next time you come around the loop. > I've tried this, but it implies modifications in all drivers. For > example, a driver, in case of error, it frees the resources of the > request. So, will need to map again a request. I think what we are doing here is a major overhaul to the crypto engine API so while it's always a good idea to minimise the impact, we should not let the existing drivers constrain us too much. Thanks,
On 3/17/2020 5:29 AM, Herbert Xu wrote: > On Thu, Mar 12, 2020 at 12:45:54PM +0000, Iuliana Prodan wrote: >> >> There are two aspects here: >> - if all requests go through crypto-engine, and, in this case, if there >> is no space in hw queue, do_one_req returns 0, and actually there will >> be no case of do_one_request() < 0; > > OK, that makes sense. However, this way of signaling for more > requests can be racy. Unless you can guarantee that the driver > is not taking any requests from another engine queue (or any > other source), just because it returned a positive value now does > not mean that it would be able to take a request the next time > you come around the loop. > This case can happen right now, also. I can't guarantee that all drivers send all requests via crypto-engine. This is the second aspect from my other mail. There are cases, when we send requests (non crypto API) to hardware without passing to crypto-engine. To solve this, I'm thinking of adding new patches that doesn't do request dequeue from crypto-engine queue, just peek, and dequeues the request after was successfully executed by hardware (if it has MAY_BACKLOG flag, otherwise will dequeue it). What do you think? Also, the above modification will imply changes in the drivers that use crypto-engine. Thanks, Iulia >> I've tried this, but it implies modifications in all drivers. For >> example, a driver, in case of error, it frees the resources of the >> request. So, will need to map again a request. > > I think what we are doing here is a major overhaul to the crypto > engine API so while it's always a good idea to minimise the impact, > we should not let the existing drivers constrain us too much. > > Thanks, >
On Tue, Mar 17, 2020 at 01:08:18PM +0000, Iuliana Prodan wrote: > > This case can happen right now, also. I can't guarantee that all drivers > send all requests via crypto-engine. I think this is something that we should address. If a driver is going to use crypto_engine then really all requests should go through that. IOW, we should have a one-to-one relationship between engines and hardware. Thanks,
On 3/27/2020 6:44 AM, Herbert Xu wrote: > On Tue, Mar 17, 2020 at 01:08:18PM +0000, Iuliana Prodan wrote: >> >> This case can happen right now, also. I can't guarantee that all drivers >> send all requests via crypto-engine. > > I think this is something that we should address. I agree we should not have failed requests, _with_ MAY_BACKLOG, that pass through crypto-engine. All these requests must me executed. For this case I have a solution and I'll work on it. > If a driver > is going to use crypto_engine then really all requests should go > through that. IOW, we should have a one-to-one relationship between > engines and hardware. > This cannot happen right now. For non-crypto API requests (like split key or rng) I cannot pass them through crypto-engine which only supports aead, skcipher, hash and akcipher requests. So, I believe that if the first problem is solved (to have all crypto API requests, that have backlog flag executed - retry mechanism instead of reporting failure) the non-crypto API request doesn't have to be pass through crypto-engine - these can be handled by driver. Thanks, Iulia
On Fri, Mar 27, 2020 at 10:44:02AM +0000, Iuliana Prodan wrote: > > This cannot happen right now. > For non-crypto API requests (like split key or rng) I cannot pass them > through crypto-engine which only supports aead, skcipher, hash and > akcipher requests. I don't see why crypto-engine can't be extended to cover that too, right? Cheers,
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c index eb029ff..dbfd53c2 100644 --- a/crypto/crypto_engine.c +++ b/crypto/crypto_engine.c @@ -22,30 +22,27 @@ * @err: error number */ static void crypto_finalize_request(struct crypto_engine *engine, - struct crypto_async_request *req, int err) + struct crypto_async_request *req, int err) { unsigned long flags; - bool finalize_cur_req = false; int ret; struct crypto_engine_ctx *enginectx; spin_lock_irqsave(&engine->queue_lock, flags); - if (engine->cur_req == req) - finalize_cur_req = true; + /* + * Increment the number of requests completed. + * We'll need it to start the engine on pump_requests, + * if hardware can enqueue new requests. + */ + engine->cnt_finalize++; spin_unlock_irqrestore(&engine->queue_lock, flags); - if (finalize_cur_req) { - enginectx = crypto_tfm_ctx(req->tfm); - if (engine->cur_req_prepared && - enginectx->op.unprepare_request) { - ret = enginectx->op.unprepare_request(engine, req); - if (ret) - dev_err(engine->dev, "failed to unprepare request\n"); - } - spin_lock_irqsave(&engine->queue_lock, flags); - engine->cur_req = NULL; - engine->cur_req_prepared = false; - spin_unlock_irqrestore(&engine->queue_lock, flags); + enginectx = crypto_tfm_ctx(req->tfm); + if (enginectx->op.prepare_request && + enginectx->op.unprepare_request) { + ret = enginectx->op.unprepare_request(engine, req); + if (ret) + dev_err(engine->dev, "failed to unprepare request\n"); } req->complete(req, err); @@ -69,12 +66,19 @@ static void crypto_pump_requests(struct crypto_engine *engine, unsigned long flags; bool was_busy = false; int ret; + int can_enq_more = 0; struct crypto_engine_ctx *enginectx; spin_lock_irqsave(&engine->queue_lock, flags); - /* Make sure we are not already running a request */ - if (engine->cur_req) + /* + * If hardware cannot enqueue new requests, + * stop the engine, until requests are processed and + * hardware can execute new requests. + * We'll start the engine on request completion + * (crypto_finalize_request). + */ + if (engine->cnt_finalize != engine->cnt_do_req) goto out; /* If another context is idling then defer */ @@ -108,13 +112,13 @@ static void crypto_pump_requests(struct crypto_engine *engine, goto out; } +start_request: /* Get the fist request from the engine queue to handle */ backlog = crypto_get_backlog(&engine->queue); async_req = crypto_dequeue_request(&engine->queue); if (!async_req) goto out; - engine->cur_req = async_req; if (backlog) backlog->complete(backlog, -EINPROGRESS); @@ -130,7 +134,7 @@ static void crypto_pump_requests(struct crypto_engine *engine, ret = engine->prepare_crypt_hardware(engine); if (ret) { dev_err(engine->dev, "failed to prepare crypt hardware\n"); - goto req_err; + goto req_err_2; } } @@ -141,25 +145,53 @@ static void crypto_pump_requests(struct crypto_engine *engine, if (ret) { dev_err(engine->dev, "failed to prepare request: %d\n", ret); - goto req_err; + goto req_err_2; } - engine->cur_req_prepared = true; } if (!enginectx->op.do_one_request) { dev_err(engine->dev, "failed to do request\n"); ret = -EINVAL; - goto req_err; + goto req_err_1; } + ret = enginectx->op.do_one_request(engine, async_req); - if (ret) { - dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret); - goto req_err; + can_enq_more = ret; + if (can_enq_more < 0) { + dev_err(engine->dev, "Failed to do one request from queue: %d\n", + ret); + goto req_err_1; + } + + goto retry; + +req_err_1: + if (enginectx->op.unprepare_request) { + ret = enginectx->op.unprepare_request(engine, async_req); + if (ret) + dev_err(engine->dev, "failed to unprepare request\n"); } - return; -req_err: - crypto_finalize_request(engine, async_req, ret); - return; +req_err_2: + async_req->complete(async_req, ret); + +retry: + spin_lock_irqsave(&engine->queue_lock, flags); + + /* + * If hardware can still enqueue requests, + * increment the number of requests accepted by hardware. + * We'll need it to start the engine on pump_requests. + */ + if (can_enq_more >= 0) + engine->cnt_do_req++; + + /* + * We'll send new requests to engine, if there is space. + * If the 2 counters are equal, that means that all requests + * were executed, so we can send new requests. + */ + if (engine->cnt_finalize == engine->cnt_do_req || can_enq_more > 0) + goto start_request; out: spin_unlock_irqrestore(&engine->queue_lock, flags); @@ -386,15 +418,18 @@ int crypto_engine_stop(struct crypto_engine *engine) EXPORT_SYMBOL_GPL(crypto_engine_stop); /** - * crypto_engine_alloc_init - allocate crypto hardware engine structure and - * initialize it. + * crypto_engine_alloc_init_and_set - allocate crypto hardware engine structure + * and initialize it by setting the maximum number of entries in the software + * crypto-engine queue. * @dev: the device attached with one hardware engine * @rt: whether this queue is set to run as a realtime task + * @qlen: maximum size of the crypto-engine queue * * This must be called from context that can sleep. * Return: the crypto engine structure on success, else NULL. */ -struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev, + bool rt, int qlen) { struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 }; struct crypto_engine *engine; @@ -411,12 +446,13 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) engine->running = false; engine->busy = false; engine->idling = false; - engine->cur_req_prepared = false; + engine->cnt_do_req = 0; + engine->cnt_finalize = 0; engine->priv_data = dev; snprintf(engine->name, sizeof(engine->name), "%s-engine", dev_name(dev)); - crypto_init_queue(&engine->queue, CRYPTO_ENGINE_MAX_QLEN); + crypto_init_queue(&engine->queue, qlen); spin_lock_init(&engine->queue_lock); engine->kworker = kthread_create_worker(0, "%s", engine->name); @@ -433,6 +469,22 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) return engine; } +EXPORT_SYMBOL_GPL(crypto_engine_alloc_init_and_set); + +/** + * crypto_engine_alloc_init - allocate crypto hardware engine structure and + * initialize it. + * @dev: the device attached with one hardware engine + * @rt: whether this queue is set to run as a realtime task + * + * This must be called from context that can sleep. + * Return: the crypto engine structure on success, else NULL. + */ +struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) +{ + return crypto_engine_alloc_init_and_set(dev, rt, + CRYPTO_ENGINE_MAX_QLEN); +} EXPORT_SYMBOL_GPL(crypto_engine_alloc_init); /** diff --git a/include/crypto/engine.h b/include/crypto/engine.h index e29cd67..33a5be2 100644 --- a/include/crypto/engine.h +++ b/include/crypto/engine.h @@ -24,7 +24,8 @@ * @idling: the engine is entering idle state * @busy: request pump is busy * @running: the engine is on working - * @cur_req_prepared: current request is prepared + * @cnt_finalize: number of completed/finalized requests + * @cnt_do_req: number of requests accepted by hardware * @list: link with the global crypto engine list * @queue_lock: spinlock to syncronise access to request queue * @queue: the crypto queue of the engine @@ -38,14 +39,15 @@ * @kworker: kthread worker struct for request pump * @pump_requests: work struct for scheduling work to the request pump * @priv_data: the engine private data - * @cur_req: the current request which is on processing */ struct crypto_engine { char name[ENGINE_NAME_LEN]; bool idling; bool busy; bool running; - bool cur_req_prepared; + + u32 cnt_finalize; + u32 cnt_do_req; struct list_head list; spinlock_t queue_lock; @@ -61,7 +63,6 @@ struct crypto_engine { struct kthread_work pump_requests; void *priv_data; - struct crypto_async_request *cur_req; }; /* @@ -102,6 +103,8 @@ void crypto_finalize_skcipher_request(struct crypto_engine *engine, int crypto_engine_start(struct crypto_engine *engine); int crypto_engine_stop(struct crypto_engine *engine); struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt); +struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev, + bool rt, int qlen); int crypto_engine_exit(struct crypto_engine *engine); #endif /* _CRYPTO_ENGINE_H */