[v4,1/2] crypto: engine - support for parallel requests

Message ID 1583707893-23699-2-git-send-email-iuliana.prodan@nxp.com (mailing list archive)
State: Changes Requested
Delegated to: Herbert Xu
Series: crypto: engine - support for parallel and batch requests

Commit Message

Iuliana Prodan March 8, 2020, 10:51 p.m. UTC
Add support for executing multiple requests in parallel on the
crypto engine.

Two new variables are added: cnt_do_req (the number of requests
accepted by the hardware) and cnt_finalize (the number of completed/
finalized requests). Together they track whether the hardware can
enqueue new requests.
cnt_do_req is set based on the return value of do_one_request(),
which is expected to be (illustrated in the sketch after this list):
> 0: the hardware still has space in its queue. A driver can implement
     do_one_request() to return the number of free entries in the
     hardware queue;
  0: the request was accepted by the hardware, but the hardware either
     doesn't support multiple requests or has no space left in its
     queue. This keeps the backward compatibility of crypto-engine;
< 0: error.
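
For illustration, a driver-side do_one_request() following this
convention could look like the sketch below (my_hw_enqueue() and
my_hw_free_slots() are hypothetical helpers, not a real API):

	static int my_do_one_request(struct crypto_engine *engine,
				     void *areq)
	{
		struct crypto_async_request *req = areq;
		int err;

		err = my_hw_enqueue(req);	/* hand request to HW */
		if (err)
			return err;		/* < 0: fatal error */

		/*
		 * > 0: HW still has room, keep the requests coming;
		 *   0: accepted, but the HW queue is now full (or the
		 *      driver supports only one request at a time).
		 */
		return my_hw_free_slots();
	}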

cnt_finalize is incremented in crypto_finalize_request().

The new crypto_engine_alloc_init_and_set() function initializes the
crypto-engine and sets the maximum size of the crypto-engine software
queue (no longer hardcoded); cnt_do_req and cnt_finalize both start
at 0.
In crypto_pump_requests(), as long as do_one_request() returns > 0,
a new request is sent to the hardware, until there is no space left
and do_one_request() returns 0.

By default, if do_one_request() returns 0, crypto-engine works as
before: it sends requests to the hardware one by one from
crypto_pump_requests(), completes each of them in
crypto_finalize_request(), and so on.
To support multiple requests, each driver's do_one_request() needs to
be updated to return > 0 when there is space in the hardware queue;
otherwise it works as before.
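
A driver opting in would also size the software queue explicitly at
probe time; a minimal sketch (the qlen of 512 is an arbitrary example
value):

	static int my_probe(struct device *dev)
	{
		struct crypto_engine *engine;

		engine = crypto_engine_alloc_init_and_set(dev, true, 512);
		if (!engine)
			return -ENOMEM;

		return crypto_engine_start(engine);
	}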

Signed-off-by: Iuliana Prodan <iuliana.prodan@nxp.com>
---
 crypto/crypto_engine.c  | 122 ++++++++++++++++++++++++++++++++++--------------
 include/crypto/engine.h |  11 +++--
 2 files changed, 94 insertions(+), 39 deletions(-)

Comments

Herbert Xu March 12, 2020, 3:25 a.m. UTC | #1
On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote:
>
>  	ret = enginectx->op.do_one_request(engine, async_req);
> -	if (ret) {
> -		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
> -		goto req_err;
> +	can_enq_more = ret;
> +	if (can_enq_more < 0) {
> +		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
> +			ret);
> +		goto req_err_1;
> +	}

So this now includes the case of the hardware queue being full
and the request needs to be queued until space opens up again.
In this case, we should not do dev_err.  So you need to be able
to distinguish between the hardware queue being full vs. a real
fatal error on the request (e.g., out-of-memory or some hardware
failure).
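
Concretely, the check in crypto_pump_requests() could distinguish the
two cases like this (a sketch only; treating -ENOSPC as the "queue
full" code is just one possible convention):

	ret = enginectx->op.do_one_request(engine, async_req);
	if (ret == -ENOSPC) {
		/* HW queue full: expected with parallel requests,
		 * so no dev_err() here; requeue and stop pumping.
		 */
		goto requeue;	/* hypothetical label */
	} else if (ret < 0) {
		/* a real fatal error, e.g. -ENOMEM or HW failure */
		dev_err(engine->dev,
			"Failed to do one request from queue: %d\n", ret);
		goto req_err_1;
	}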

Thanks,
Horia Geanta March 12, 2020, 11:05 a.m. UTC | #2
On 3/12/2020 5:26 AM, Herbert Xu wrote:
> On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote:
>>
>>  	ret = enginectx->op.do_one_request(engine, async_req);
>> -	if (ret) {
>> -		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
>> -		goto req_err;
>> +	can_enq_more = ret;
>> +	if (can_enq_more < 0) {
>> +		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
>> +			ret);
>> +		goto req_err_1;
>> +	}
> 
> So this now includes the case of the hardware queue being full
> and the request needs to be queued until space opens up again.
I see no difference compared with the existing implementation:
in both cases, failing the transfer from the SW queue to the HW queue
means losing the request, irrespective of the error code returned by
.do_one_request.

This doesn't mean it shouldn't be fixed.

> In this case, we should not do dev_err.  So you need to be able
> to distinguish between the hardware queue being full vs. a real
> fatal error on the request (e.g., out-of-memory or some hardware
> failure).
> 
Yes, in case of -ENOSPC (HW queue full) the request should be put back
in the SW queue.
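
Something along these lines in crypto_pump_requests() (a sketch;
crypto_enqueue_request_head() is a helper that would still have to be
added to the crypto queue API):

	ret = enginectx->op.do_one_request(engine, async_req);
	if (ret == -ENOSPC) {
		/* HW queue full: put the request back at the head of
		 * the SW queue, preserving order, and stop pumping.
		 */
		spin_lock_irqsave(&engine->queue_lock, flags);
		crypto_enqueue_request_head(&engine->queue, async_req);
		spin_unlock_irqrestore(&engine->queue_lock, flags);
		goto out;
	}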

Thanks,
Horia
Iuliana Prodan March 12, 2020, 12:45 p.m. UTC | #3
On 3/12/2020 5:26 AM, Herbert Xu wrote:
> On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote:
>>
>>   	ret = enginectx->op.do_one_request(engine, async_req);
>> -	if (ret) {
>> -		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
>> -		goto req_err;
>> +	can_enq_more = ret;
>> +	if (can_enq_more < 0) {
>> +		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
>> +			ret);
>> +		goto req_err_1;
>> +	}
> 
> So this now includes the case of the hardware queue being full
> and the request needs to be queued until space opens up again.
> In this case, we should not do dev_err.  So you need to be able
> to distinguish between the hardware queue being full vs. a real
> fatal error on the request (e.g., out-of-memory or some hardware
> failure).
> 
There are two aspects here:
- if all requests go through crypto-engine then, when there is no
space in the hw queue, do_one_request() returns 0, so there is
actually no case where do_one_request() < 0;
- if there are other requests (non crypto API) that go to hw for
execution (in CAAM we have split key or RNG), those might occupy the
hw queue after crypto-engine has reported that it still has space.
This case wasn't supported before my modifications, and isn't
supported by these patches either.
This use-case can be solved by retrying to send the request to hw:
enqueue it back at the head of the crypto-engine queue (to keep the
order) and send it to hw again.
I've tried this, but it implies modifications in all drivers. For
example, on error a driver frees the resources of the request, so the
request would need to be mapped again.

Thanks,
Iulia
Iuliana Prodan March 12, 2020, 12:52 p.m. UTC | #4
On 3/12/2020 2:45 PM, Iuliana Prodan wrote:
> On 3/12/2020 5:26 AM, Herbert Xu wrote:
>> On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote:
>>>
>>>    	ret = enginectx->op.do_one_request(engine, async_req);
>>> -	if (ret) {
>>> -		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
>>> -		goto req_err;
>>> +	can_enq_more = ret;
>>> +	if (can_enq_more < 0) {
>>> +		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
>>> +			ret);
>>> +		goto req_err_1;
>>> +	}
>>
>> So this now includes the case of the hardware queue being full
>> and the request needs to be queued until space opens up again.
>> In this case, we should not do dev_err.  So you need to be able
>> to distinguish between the hardware queue being full vs. a real
>> fatal error on the request (e.g., out-of-memory or some hardware
>> failure).
>>
> There are two aspects here:
> - if all requests go through crypto-engine then, when there is no
> space in the hw queue, do_one_request() returns 0, so there is
> actually no case where do_one_request() < 0;
> - if there are other requests (non crypto API) that go to hw for
> execution (in CAAM we have split key or RNG), those might occupy the
> hw queue after crypto-engine has reported that it still has space.
> This case wasn't supported before my modifications, and isn't
> supported by these patches either.
> This use-case can be solved by retrying to send the request to hw:
> enqueue it back at the head of the crypto-engine queue (to keep the
> order) and send it to hw again.
> I've tried this, but it implies modifications in all drivers. For
> example, on error a driver frees the resources of the request, so the
> request would need to be mapped again.
> 

I believe this recovery mechanism should be handled in another patch (series).
Herbert, what is your proposal to move forward?

Thanks,
Iulia
Herbert Xu March 17, 2020, 3:22 a.m. UTC | #5
On Thu, Mar 12, 2020 at 01:05:32PM +0200, Horia Geantă wrote:
> On 3/12/2020 5:26 AM, Herbert Xu wrote:
> > On Mon, Mar 09, 2020 at 12:51:32AM +0200, Iuliana Prodan wrote:
> >>
> >>  	ret = enginectx->op.do_one_request(engine, async_req);
> >> -	if (ret) {
> >> -		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
> >> -		goto req_err;
> >> +	can_enq_more = ret;
> >> +	if (can_enq_more < 0) {
> >> +		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
> >> +			ret);
> >> +		goto req_err_1;
> >> +	}
> > 
> > So this now includes the case of the hardware queue being full
> > and the request needs to be queued until space opens up again.
> I see no difference compared with the existing implementation:
> in both cases, failing the transfer from the SW queue to the HW queue
> means losing the request, irrespective of the error code returned by
> .do_one_request.
> 
> This doesn't mean it shouldn't be fixed.

I don't think they are the same though.  With the existing code,
you only ever have one outstanding request so a new one is only
given over to the hardware after the previous one has completed.
That means that the only errors you expect to get from the driver
are fatal ones that you cannot recover from.

With parallel requests, you will be giving as many requests to
the driver as it can take.  In fact the error condition is now
used to tell the engine to stop giving more requests.  This is
in no way the same as a fatal error from before.

We should not print out an error in this case and we should ensure
that the request is put back on the queue and reprocessed when the
driver comes back for more.

Cheers,
Herbert Xu March 17, 2020, 3:29 a.m. UTC | #6
On Thu, Mar 12, 2020 at 12:45:54PM +0000, Iuliana Prodan wrote:
>
> There are two aspects here:
> - if all requests go through crypto-engine then, when there is no
> space in the hw queue, do_one_request() returns 0, so there is
> actually no case where do_one_request() < 0;

OK, that makes sense.  However, this way of signaling for more
requests can be racy.  Unless you can guarantee that the driver
is not taking any requests from another engine queue (or any
other source), just because it returned a positive value now does
not mean that it would be able to take a request the next time
you come around the loop.

> I've tried this, but it implies modifications in all drivers. For
> example, on error a driver frees the resources of the request, so the
> request would need to be mapped again.

I think what we are doing here is a major overhaul to the crypto
engine API so while it's always a good idea to minimise the impact,
we should not let the existing drivers constrain us too much.

Thanks,
Iuliana Prodan March 17, 2020, 1:08 p.m. UTC | #7
On 3/17/2020 5:29 AM, Herbert Xu wrote:
> On Thu, Mar 12, 2020 at 12:45:54PM +0000, Iuliana Prodan wrote:
>>
>> There are two aspects here:
>> - if all requests go through crypto-engine then, when there is no
>> space in the hw queue, do_one_request() returns 0, so there is
>> actually no case where do_one_request() < 0;
> 
> OK, that makes sense.  However, this way of signaling for more
> requests can be racy.  Unless you can guarantee that the driver
> is not taking any requests from another engine queue (or any
> other source), just because it returned a positive value now does
> not mean that it would be able to take a request the next time
> you come around the loop.
> 

This case can happen right now, too. I can't guarantee that all
drivers send all requests via crypto-engine.
This is the second aspect from my other mail: there are cases where we
send requests (non crypto API) to hardware without passing through
crypto-engine.

To solve this, I'm thinking of adding new patches that don't dequeue a
request from the crypto-engine queue, but just peek at it, and dequeue
the request only after it was successfully executed by the hardware
(if it has the MAY_BACKLOG flag; otherwise it is dequeued right away).
What do you think? (See the sketch below.)

Also, the above modification will imply changes in the drivers that
use crypto-engine.
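
A rough sketch of the peek-based pump (crypto_queue_peek() is a
hypothetical helper that would have to be added; the current API only
has crypto_dequeue_request()):

	spin_lock_irqsave(&engine->queue_lock, flags);
	async_req = crypto_queue_peek(&engine->queue);	/* hypothetical */
	spin_unlock_irqrestore(&engine->queue_lock, flags);
	if (!async_req)
		goto out;

	ret = enginectx->op.do_one_request(engine, async_req);
	if (ret >= 0) {
		/* accepted by HW: now it is safe to dequeue it */
		spin_lock_irqsave(&engine->queue_lock, flags);
		crypto_dequeue_request(&engine->queue);
		spin_unlock_irqrestore(&engine->queue_lock, flags);
	} else if (!(async_req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) {
		/* not backloggable: drop it instead of retrying */
		spin_lock_irqsave(&engine->queue_lock, flags);
		crypto_dequeue_request(&engine->queue);
		spin_unlock_irqrestore(&engine->queue_lock, flags);
		async_req->complete(async_req, ret);
	}
	/* a failed MAY_BACKLOG request stays queued for a retry */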

Thanks,
Iulia

>> I've tried this, but it implies modifications in all drivers. For
>> example, on error a driver frees the resources of the request, so the
>> request would need to be mapped again.
> 
> I think what we are doing here is a major overhaul to the crypto
> engine API so while it's always a good idea to minimise the impact,
> we should not let the existing drivers constrain us too much.
> 
> Thanks,
>
Herbert Xu March 27, 2020, 4:44 a.m. UTC | #8
On Tue, Mar 17, 2020 at 01:08:18PM +0000, Iuliana Prodan wrote:
> 
> This case can happen right now, also. I can't guarantee that all drivers 
> send all requests via crypto-engine.

I think this is something that we should address.  If a driver
is going to use crypto_engine then really all requests should go
through that.  IOW, we should have a one-to-one relationship between
engines and hardware.

Thanks,
Iuliana Prodan March 27, 2020, 10:44 a.m. UTC | #9
On 3/27/2020 6:44 AM, Herbert Xu wrote:
> On Tue, Mar 17, 2020 at 01:08:18PM +0000, Iuliana Prodan wrote:
>>
>> This case can happen right now, also. I can't guarantee that all drivers
>> send all requests via crypto-engine.
> 
> I think this is something that we should address.  


I agree we should not have failed requests, _with_ MAY_BACKLOG, that
pass through crypto-engine. All these requests must be executed. For
this case I have a solution and I'll work on it.

> If a driver
> is going to use crypto_engine then really all requests should go
> through that.  IOW, we should have a one-to-one relationship between
> engines and hardware.
> 
This cannot happen right now.
Non-crypto API requests (like split key or RNG) cannot be passed
through crypto-engine, which only supports aead, skcipher, hash and
akcipher requests.

So, I believe that if the first problem is solved (having all crypto
API requests that carry the backlog flag executed - a retry mechanism
instead of reporting failure), the non-crypto API requests don't have
to pass through crypto-engine - they can be handled by the driver.

Thanks,
Iulia
Herbert Xu April 3, 2020, 6:19 a.m. UTC | #10
On Fri, Mar 27, 2020 at 10:44:02AM +0000, Iuliana Prodan wrote:
>
> This cannot happen right now.
> Non-crypto API requests (like split key or RNG) cannot be passed
> through crypto-engine, which only supports aead, skcipher, hash and
> akcipher requests.

I don't see why crypto-engine can't be extended to cover that
too, right?

Cheers,

Patch

diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index eb029ff..dbfd53c2 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -22,30 +22,27 @@ 
  * @err: error number
  */
 static void crypto_finalize_request(struct crypto_engine *engine,
-			     struct crypto_async_request *req, int err)
+				    struct crypto_async_request *req, int err)
 {
 	unsigned long flags;
-	bool finalize_cur_req = false;
 	int ret;
 	struct crypto_engine_ctx *enginectx;
 
 	spin_lock_irqsave(&engine->queue_lock, flags);
-	if (engine->cur_req == req)
-		finalize_cur_req = true;
+	/*
+	 * Increment the number of requests completed.
+	 * We'll need it to start the engine on pump_requests,
+	 * if hardware can enqueue new requests.
+	 */
+	engine->cnt_finalize++;
 	spin_unlock_irqrestore(&engine->queue_lock, flags);
 
-	if (finalize_cur_req) {
-		enginectx = crypto_tfm_ctx(req->tfm);
-		if (engine->cur_req_prepared &&
-		    enginectx->op.unprepare_request) {
-			ret = enginectx->op.unprepare_request(engine, req);
-			if (ret)
-				dev_err(engine->dev, "failed to unprepare request\n");
-		}
-		spin_lock_irqsave(&engine->queue_lock, flags);
-		engine->cur_req = NULL;
-		engine->cur_req_prepared = false;
-		spin_unlock_irqrestore(&engine->queue_lock, flags);
+	enginectx = crypto_tfm_ctx(req->tfm);
+	if (enginectx->op.prepare_request &&
+	    enginectx->op.unprepare_request) {
+		ret = enginectx->op.unprepare_request(engine, req);
+		if (ret)
+			dev_err(engine->dev, "failed to unprepare request\n");
 	}
 
 	req->complete(req, err);
@@ -69,12 +66,19 @@  static void crypto_pump_requests(struct crypto_engine *engine,
 	unsigned long flags;
 	bool was_busy = false;
 	int ret;
+	int can_enq_more = 0;
 	struct crypto_engine_ctx *enginectx;
 
 	spin_lock_irqsave(&engine->queue_lock, flags);
 
-	/* Make sure we are not already running a request */
-	if (engine->cur_req)
+	/*
+	 * If hardware cannot enqueue new requests,
+	 * stop the engine, until requests are processed and
+	 * hardware can execute new requests.
+	 * We'll start the engine on request completion
+	 * (crypto_finalize_request).
+	 */
+	if (engine->cnt_finalize != engine->cnt_do_req)
 		goto out;
 
 	/* If another context is idling then defer */
@@ -108,13 +112,13 @@  static void crypto_pump_requests(struct crypto_engine *engine,
 		goto out;
 	}
 
+start_request:
 	/* Get the fist request from the engine queue to handle */
 	backlog = crypto_get_backlog(&engine->queue);
 	async_req = crypto_dequeue_request(&engine->queue);
 	if (!async_req)
 		goto out;
 
-	engine->cur_req = async_req;
 	if (backlog)
 		backlog->complete(backlog, -EINPROGRESS);
 
@@ -130,7 +134,7 @@  static void crypto_pump_requests(struct crypto_engine *engine,
 		ret = engine->prepare_crypt_hardware(engine);
 		if (ret) {
 			dev_err(engine->dev, "failed to prepare crypt hardware\n");
-			goto req_err;
+			goto req_err_2;
 		}
 	}
 
@@ -141,25 +145,53 @@  static void crypto_pump_requests(struct crypto_engine *engine,
 		if (ret) {
 			dev_err(engine->dev, "failed to prepare request: %d\n",
 				ret);
-			goto req_err;
+			goto req_err_2;
 		}
-		engine->cur_req_prepared = true;
 	}
 	if (!enginectx->op.do_one_request) {
 		dev_err(engine->dev, "failed to do request\n");
 		ret = -EINVAL;
-		goto req_err;
+		goto req_err_1;
 	}
+
 	ret = enginectx->op.do_one_request(engine, async_req);
-	if (ret) {
-		dev_err(engine->dev, "Failed to do one request from queue: %d\n", ret);
-		goto req_err;
+	can_enq_more = ret;
+	if (can_enq_more < 0) {
+		dev_err(engine->dev, "Failed to do one request from queue: %d\n",
+			ret);
+		goto req_err_1;
+	}
+
+	goto retry;
+
+req_err_1:
+	if (enginectx->op.unprepare_request) {
+		ret = enginectx->op.unprepare_request(engine, async_req);
+		if (ret)
+			dev_err(engine->dev, "failed to unprepare request\n");
 	}
-	return;
 
-req_err:
-	crypto_finalize_request(engine, async_req, ret);
-	return;
+req_err_2:
+	async_req->complete(async_req, ret);
+
+retry:
+	spin_lock_irqsave(&engine->queue_lock, flags);
+
+	/*
+	 * If hardware can still enqueue requests,
+	 * increment the number of requests accepted by hardware.
+	 * We'll need it to start the engine on pump_requests.
+	 */
+	if (can_enq_more >= 0)
+		engine->cnt_do_req++;
+
+	/*
+	 * We'll send new requests to engine, if there is space.
+	 * If the 2 counters are equal, that means that all requests
+	 * were executed, so we can send new requests.
+	 */
+	if (engine->cnt_finalize == engine->cnt_do_req || can_enq_more > 0)
+		goto start_request;
 
 out:
 	spin_unlock_irqrestore(&engine->queue_lock, flags);
@@ -386,15 +418,18 @@  int crypto_engine_stop(struct crypto_engine *engine)
 EXPORT_SYMBOL_GPL(crypto_engine_stop);
 
 /**
- * crypto_engine_alloc_init - allocate crypto hardware engine structure and
- * initialize it.
+ * crypto_engine_alloc_init_and_set - allocate crypto hardware engine structure
+ * and initialize it by setting the maximum number of entries in the software
+ * crypto-engine queue.
  * @dev: the device attached with one hardware engine
  * @rt: whether this queue is set to run as a realtime task
+ * @qlen: maximum size of the crypto-engine queue
  *
  * This must be called from context that can sleep.
  * Return: the crypto engine structure on success, else NULL.
  */
-struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
+struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
+						       bool rt, int qlen)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO / 2 };
 	struct crypto_engine *engine;
@@ -411,12 +446,13 @@  struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
 	engine->running = false;
 	engine->busy = false;
 	engine->idling = false;
-	engine->cur_req_prepared = false;
+	engine->cnt_do_req = 0;
+	engine->cnt_finalize = 0;
 	engine->priv_data = dev;
 	snprintf(engine->name, sizeof(engine->name),
 		 "%s-engine", dev_name(dev));
 
-	crypto_init_queue(&engine->queue, CRYPTO_ENGINE_MAX_QLEN);
+	crypto_init_queue(&engine->queue, qlen);
 	spin_lock_init(&engine->queue_lock);
 
 	engine->kworker = kthread_create_worker(0, "%s", engine->name);
@@ -433,6 +469,22 @@  struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
 
 	return engine;
 }
+EXPORT_SYMBOL_GPL(crypto_engine_alloc_init_and_set);
+
+/**
+ * crypto_engine_alloc_init - allocate crypto hardware engine structure and
+ * initialize it.
+ * @dev: the device attached with one hardware engine
+ * @rt: whether this queue is set to run as a realtime task
+ *
+ * This must be called from context that can sleep.
+ * Return: the crypto engine structure on success, else NULL.
+ */
+struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt)
+{
+	return crypto_engine_alloc_init_and_set(dev, rt,
+						CRYPTO_ENGINE_MAX_QLEN);
+}
 EXPORT_SYMBOL_GPL(crypto_engine_alloc_init);
 
 /**
diff --git a/include/crypto/engine.h b/include/crypto/engine.h
index e29cd67..33a5be2 100644
--- a/include/crypto/engine.h
+++ b/include/crypto/engine.h
@@ -24,7 +24,8 @@ 
  * @idling: the engine is entering idle state
  * @busy: request pump is busy
  * @running: the engine is on working
- * @cur_req_prepared: current request is prepared
+ * @cnt_finalize: number of completed/finalized requests
+ * @cnt_do_req: number of requests accepted by hardware
  * @list: link with the global crypto engine list
  * @queue_lock: spinlock to syncronise access to request queue
  * @queue: the crypto queue of the engine
@@ -38,14 +39,15 @@ 
  * @kworker: kthread worker struct for request pump
  * @pump_requests: work struct for scheduling work to the request pump
  * @priv_data: the engine private data
- * @cur_req: the current request which is on processing
  */
 struct crypto_engine {
 	char			name[ENGINE_NAME_LEN];
 	bool			idling;
 	bool			busy;
 	bool			running;
-	bool			cur_req_prepared;
+
+	u32			cnt_finalize;
+	u32			cnt_do_req;
 
 	struct list_head	list;
 	spinlock_t		queue_lock;
@@ -61,7 +63,6 @@  struct crypto_engine {
 	struct kthread_work             pump_requests;
 
 	void				*priv_data;
-	struct crypto_async_request	*cur_req;
 };
 
 /*
@@ -102,6 +103,8 @@  void crypto_finalize_skcipher_request(struct crypto_engine *engine,
 int crypto_engine_start(struct crypto_engine *engine);
 int crypto_engine_stop(struct crypto_engine *engine);
 struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt);
+struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
+						       bool rt, int qlen);
 int crypto_engine_exit(struct crypto_engine *engine);
 
 #endif /* _CRYPTO_ENGINE_H */