From patchwork Thu Oct 18 22:20:46 2012
X-Patchwork-Id: 1613781
Date: Thu, 18 Oct 2012 15:20:46 -0700
From: "Mark A. Greer"
To: linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Cc: Peter Ujfalusi, Russell King
Subject: [RFC] dmaengine: omap-dma: Allow DMA controller to prefetch data
Message-ID: <20121018222046.GA28541@animalcreek.com>

Enable DMA prefetching by setting the 'OMAP_DMA_DST_SYNC_PREFETCH' flag whenever a DMA transfer is destination synchronized (memory-to-device), in both omap_dma_prep_slave_sg() and omap_dma_prep_dma_cyclic(). Prefetching is not allowed on source-synchronized DMA transfers. Enabling prefetch can significantly improve DMA performance.
The improvement can be measured by running 'modprobe tcrypt sec=2 mode=403', which exercises the omap-sham driver, on an am37x EVM. This yields the following results:

a) With prefetch disabled

testing speed of async sha1
test  0 (  16 byte blocks,   16 bytes per update,   1 updates): 24049 opers/sec,   384784 bytes/sec
test  1 (  64 byte blocks,   16 bytes per update,   4 updates): 22030 opers/sec,  1409920 bytes/sec
test  2 (  64 byte blocks,   64 bytes per update,   1 updates): 24055 opers/sec,  1539520 bytes/sec
test  3 ( 256 byte blocks,   16 bytes per update,  16 updates):  7648 opers/sec,  1958016 bytes/sec
test  4 ( 256 byte blocks,   64 bytes per update,   4 updates):  7918 opers/sec,  2027008 bytes/sec
test  5 ( 256 byte blocks,  256 bytes per update,   1 updates):  8000 opers/sec,  2048000 bytes/sec
test  6 (1024 byte blocks,   16 bytes per update,  64 updates):  3295 opers/sec,  3374080 bytes/sec
test  7 (1024 byte blocks,  256 bytes per update,   4 updates):  3602 opers/sec,  3688960 bytes/sec
test  8 (1024 byte blocks, 1024 bytes per update,   1 updates):  3753 opers/sec,  3843072 bytes/sec
test  9 (2048 byte blocks,   16 bytes per update, 128 updates):  3239 opers/sec,  6633472 bytes/sec
test 10 (2048 byte blocks,  256 bytes per update,   8 updates):  3557 opers/sec,  7284736 bytes/sec
test 11 (2048 byte blocks, 1024 bytes per update,   2 updates):  3591 opers/sec,  7354368 bytes/sec
test 12 (2048 byte blocks, 2048 bytes per update,   1 updates):  3598 opers/sec,  7369728 bytes/sec
test 13 (4096 byte blocks,   16 bytes per update, 256 updates):  1751 opers/sec,  7174144 bytes/sec
test 14 (4096 byte blocks,  256 bytes per update,  16 updates):  2302 opers/sec,  9431040 bytes/sec
test 15 (4096 byte blocks, 1024 bytes per update,   4 updates):  2087 opers/sec,  8548352 bytes/sec
test 16 (4096 byte blocks, 4096 bytes per update,   1 updates):  2050 opers/sec,  8398848 bytes/sec
test 17 (8192 byte blocks,   16 bytes per update, 512 updates):   864 opers/sec,  7077888 bytes/sec
test 18 (8192 byte blocks,  256 bytes per update,  32 updates):   993 opers/sec,  8138752 bytes/sec
test 19 (8192 byte blocks, 1024 bytes per update,   8 updates):   936 opers/sec,  7671808 bytes/sec
test 20 (8192 byte blocks, 4096 bytes per update,   2 updates):  1048 opers/sec,  8589312 bytes/sec
test 21 (8192 byte blocks, 8192 bytes per update,   1 updates):  1274 opers/sec, 10436608 bytes/sec

b) With prefetch enabled

testing speed of async sha1
test  0 (  16 byte blocks,   16 bytes per update,   1 updates): 23868 opers/sec,   381888 bytes/sec
test  1 (  64 byte blocks,   16 bytes per update,   4 updates): 21928 opers/sec,  1403424 bytes/sec
test  2 (  64 byte blocks,   64 bytes per update,   1 updates): 23910 opers/sec,  1530272 bytes/sec
test  3 ( 256 byte blocks,   16 bytes per update,  16 updates):  7664 opers/sec,  1962112 bytes/sec
test  4 ( 256 byte blocks,   64 bytes per update,   4 updates):  7924 opers/sec,  2028672 bytes/sec
test  5 ( 256 byte blocks,  256 bytes per update,   1 updates):  8006 opers/sec,  2049536 bytes/sec
test  6 (1024 byte blocks,   16 bytes per update,  64 updates):  3276 opers/sec,  3355136 bytes/sec
test  7 (1024 byte blocks,  256 bytes per update,   4 updates):  3856 opers/sec,  3949056 bytes/sec
test  8 (1024 byte blocks, 1024 bytes per update,   1 updates):  3634 opers/sec,  3721728 bytes/sec
test  9 (2048 byte blocks,   16 bytes per update, 128 updates):  3257 opers/sec,  6670336 bytes/sec
test 10 (2048 byte blocks,  256 bytes per update,   8 updates):  3604 opers/sec,  7380992 bytes/sec
test 11 (2048 byte blocks, 1024 bytes per update,   2 updates):  3604 opers/sec,  7380992 bytes/sec
test 12 (2048 byte blocks, 2048 bytes per update,   1 updates):  3624 opers/sec,  7422976 bytes/sec
test 13 (4096 byte blocks,   16 bytes per update, 256 updates):  2698 opers/sec, 11051008 bytes/sec
test 14 (4096 byte blocks,  256 bytes per update,  16 updates):  3500 opers/sec, 14336000 bytes/sec
test 15 (4096 byte blocks, 1024 bytes per update,   4 updates):  3596 opers/sec, 14729216 bytes/sec
test 16 (4096 byte blocks, 4096 bytes per update,   1 updates):  3588 opers/sec, 14698496 bytes/sec
test 17 (8192 byte blocks,   16 bytes per update, 512 updates):  1319 opers/sec, 10809344 bytes/sec
test 18 (8192 byte blocks,  256 bytes per update,  32 updates):  1550 opers/sec, 12701696 bytes/sec
test 19 (8192 byte blocks, 1024 bytes per update,   8 updates):  1164 opers/sec,  9539584 bytes/sec
test 20 (8192 byte blocks, 4096 bytes per update,   2 updates):  1802 opers/sec, 14766080 bytes/sec
test 21 (8192 byte blocks, 8192 bytes per update,   1 updates):  1720 opers/sec, 14094336 bytes/sec

Prefetch makes little difference for blocks of 2048 bytes or less (throughput changes by between roughly -3% and +7%), but it increases throughput by a factor of roughly 1.24 to 1.75 for 4096- and 8192-byte blocks.

CC: Peter Ujfalusi
CC: Russell King
Signed-off-by: Mark A. Greer
---
This patch seems fairly stable, but I've only tested omap-sham (crypto) and omap_hsmmc (MMC) on an am37x EVM. I also enabled burst mode, but that made the system unstable when exercising either omap-sham or omap_hsmmc.

I'm not aware of any errata that would make this an unwanted modification, but I haven't checked all of the SoCs. Are there other reasons why this should not be applied?

The range of hardware I have is somewhat limited, so if you have different platforms/SoCs, please give this patch a try. It should apply cleanly against recent k.o. kernels.

Note that the current omap-sham driver doesn't use the dmaengine API, but I have a set of patches to convert it, which is what I used when testing. I will submit those patches once they're ready (in the next day or so).

Also note that an am37xx GP actually does have sham hardware, and yours might too if you look closely. If so, you'll have to hack omap_sham_mod_init() to use it.
Thanks,

Mark

 drivers/dma/omap-dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index bb2d8e7..aadddb2 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -310,7 +310,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 		dev_addr = c->cfg.dst_addr;
 		dev_width = c->cfg.dst_addr_width;
 		burst = c->cfg.dst_maxburst;
-		sync_type = OMAP_DMA_DST_SYNC;
+		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
 	} else {
 		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
 		return NULL;
@@ -387,7 +387,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_dma_cyclic(
 		dev_addr = c->cfg.dst_addr;
 		dev_width = c->cfg.dst_addr_width;
 		burst = c->cfg.dst_maxburst;
-		sync_type = OMAP_DMA_DST_SYNC;
+		sync_type = OMAP_DMA_DST_SYNC | OMAP_DMA_DST_SYNC_PREFETCH;
 	} else {
 		dev_err(chan->device->dev, "%s: bad direction?\n", __func__);
 		return NULL;