diff mbox

[11/14] Revert "crypto: caam - get rid of tasklet"

Message ID 1478681184-9442-12-git-send-email-horia.geanta@nxp.com (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Headers show

Commit Message

Horia Geanta Nov. 9, 2016, 8:46 a.m. UTC
This reverts commit 66d2e2028091a074aa1290d2eeda5ddb1a6c329c.

Quoting from Russell's findings:
https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html

[quote]
Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines.  I've decided to
use this way to measure the performance:

dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5

For the threaded IRQs case gives:

0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 5.36s => 25.0MB/s

and the tasklet case:

0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 4.95 => 27.1MB/s

which corresponds to an 8% slowdown for the threaded IRQ case.  So,
tasklets are indeed faster than threaded IRQs.

[...]

I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a deterimental effect of switching from tasklets to threaded IRQs.
[/quote]

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
---
 drivers/crypto/caam/intern.h |  1 +
 drivers/crypto/caam/jr.c     | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 9 deletions(-)

Comments

Thomas Gleixner Nov. 9, 2016, 11:17 p.m. UTC | #1
On Wed, 9 Nov 2016, Russell King - ARM Linux wrote:
> Please include Thomas in this.

Thanks!
 
> On Wed, Nov 09, 2016 at 10:46:21AM +0200, Horia Geantă wrote:
> > This reverts commit 66d2e2028091a074aa1290d2eeda5ddb1a6c329c.
> > 
> > Quoting from Russell's findings:
> > https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html
> > 
> > [quote]
> > Okay, I've re-tested, using a different way of measuring, because using
> > openssl speed is impractical for off-loaded engines.  I've decided to
> > use this way to measure the performance:
> > 
> > dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5
> > 
> > For the threaded IRQs case gives:
> > 
> > 0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
> > 0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
> > 0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
> > 	=> 5.36s => 25.0MB/s
> > 
> > and the tasklet case:
> > 
> > 0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
> > 0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
> > 0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
> > 	=> 4.95 => 27.1MB/s
> > 
> > which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> > tasklets are indeed faster than threaded IRQs.

Each operation involves:

 submit job
 hard irq handler
 wakeup irq thread
 context switch
 wakeup openssl
 context switch

and that repeats over and over.

Russell has provided me traces and the extra wakeup/context switch is what
causes the slowdown vs. the tasklet case which has only one wakeup/context
switch, unless it gets outsourced to ksoftirqd which might get even slower
than the threaded handler.

You might add something like that to the changelog so it's not just a
number comparison which does not explain why this happens.

Thanks,

	tglx
Thomas Gleixner Nov. 9, 2016, 11:19 p.m. UTC | #2
On Thu, 10 Nov 2016, Thomas Gleixner wrote:
> > > which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> > > tasklets are indeed faster than threaded IRQs.

Forgot to say, that this should be:

  So, tasklets are indeed faster than threaded IRQs for this particular use
  case. They are not generally faster.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index 5d4c05074a5c..e2bcacc1a921 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -41,6 +41,7 @@  struct caam_drv_private_jr {
 	struct device		*dev;
 	int ridx;
 	struct caam_job_ring __iomem *rregs;	/* JobR's register space */
+	struct tasklet_struct irqtask;
 	int irq;			/* One per queue */
 
 	/* Number of scatterlist crypt transforms active on the JobR */
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index 7331ea734f37..c8604dfadbf5 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -73,6 +73,8 @@  static int caam_jr_shutdown(struct device *dev)
 
 	ret = caam_reset_hw_jr(dev);
 
+	tasklet_kill(&jrp->irqtask);
+
 	/* Release interrupt */
 	free_irq(jrp->irq, dev);
 
@@ -128,7 +130,7 @@  static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
 
 	/*
 	 * Check the output ring for ready responses, kick
-	 * the threaded irq if jobs done.
+	 * tasklet if jobs done.
 	 */
 	irqstate = rd_reg32(&jrp->rregs->jrintstatus);
 	if (!irqstate)
@@ -150,13 +152,18 @@  static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
 	/* Have valid interrupt at this point, just ACK and trigger */
 	wr_reg32(&jrp->rregs->jrintstatus, irqstate);
 
-	return IRQ_WAKE_THREAD;
+	preempt_disable();
+	tasklet_schedule(&jrp->irqtask);
+	preempt_enable();
+
+	return IRQ_HANDLED;
 }
 
-static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
+/* Deferred service handler, run as interrupt-fired tasklet */
+static void caam_jr_dequeue(unsigned long devarg)
 {
 	int hw_idx, sw_idx, i, head, tail;
-	struct device *dev = st_dev;
+	struct device *dev = (struct device *)devarg;
 	struct caam_drv_private_jr *jrp = dev_get_drvdata(dev);
 	void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
 	u32 *userdesc, userstatus;
@@ -230,8 +237,6 @@  static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
 
 	/* reenable / unmask IRQs */
 	clrsetbits_32(&jrp->rregs->rconfig_lo, JRCFG_IMSK, 0);
-
-	return IRQ_HANDLED;
 }
 
 /**
@@ -389,10 +394,11 @@  static int caam_jr_init(struct device *dev)
 
 	jrp = dev_get_drvdata(dev);
 
+	tasklet_init(&jrp->irqtask, caam_jr_dequeue, (unsigned long)dev);
+
 	/* Connect job ring interrupt handler. */
-	error = request_threaded_irq(jrp->irq, caam_jr_interrupt,
-				     caam_jr_threadirq, IRQF_SHARED,
-				     dev_name(dev), dev);
+	error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED,
+			    dev_name(dev), dev);
 	if (error) {
 		dev_err(dev, "can't connect JobR %d interrupt (%d)\n",
 			jrp->ridx, jrp->irq);
@@ -454,6 +460,7 @@  static int caam_jr_init(struct device *dev)
 out_free_irq:
 	free_irq(jrp->irq, dev);
 out_kill_deq:
+	tasklet_kill(&jrp->irqtask);
 	return error;
 }