From patchwork Tue Nov 19 15:29:53 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Pieralisi X-Patchwork-Id: 3203001 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 5A7719F243 for ; Tue, 19 Nov 2013 15:30:40 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9A8F120374 for ; Tue, 19 Nov 2013 15:30:35 +0000 (UTC) Received: from casper.infradead.org (casper.infradead.org [85.118.1.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0F73E2035E for ; Tue, 19 Nov 2013 15:30:34 +0000 (UTC) Received: from merlin.infradead.org ([2001:4978:20e::2]) by casper.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1VinG0-0004Zz-SK; Tue, 19 Nov 2013 15:30:29 +0000 Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1VinFy-0002LB-Gy; Tue, 19 Nov 2013 15:30:26 +0000 Received: from service87.mimecast.com ([91.220.42.44]) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1VinFv-0002Jj-UD for linux-arm-kernel@lists.infradead.org; Tue, 19 Nov 2013 15:30:24 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 19 Nov 2013 15:30:00 +0000 Received: from e102568-lin.cambridge.arm.com ([10.1.255.212]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 19 Nov 2013 15:29:57 +0000 From: Lorenzo Pieralisi To: linux-arm-kernel@lists.infradead.org Subject: [PATCH] arm: mm: refactor v7 cache cleaning ops to use way/index sequence Date: Tue, 19 Nov 2013 15:29:53 +0000 Message-Id: <1384874993-4577-1-git-send-email-lorenzo.pieralisi@arm.com> X-Mailer: git-send-email 1.8.2.2 X-OriginalArrivalTime: 19 Nov 2013 15:29:57.0774 (UTC) FILETIME=[344022E0:01CEE53C] X-MC-Unique: 113111915300001301 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20131119_103024_146742_CAD6BA07 X-CRM114-Status: GOOD ( 10.97 ) X-Spam-Score: -2.6 (--) Cc: Lorenzo Pieralisi , Russell King , Stephen Warren , Catalin Marinas , Nicolas Pitre , Magnus Damm , Kevin Hilman , Santosh Shilimkar , Kukjin Kim , Dave Martin X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Set-associative caches on all v7 implementations map the index bits to physical addresses LSBs and tag bits to MSBs. On most systems with sane DRAM controller configurations, this means that the current v7 cache flush routine using set/way operations triggers a DRAM memory controller precharge/activate for every cache line writeback since the cache routine cleans lines by first fixing the index and then looping through ways. Given the random content of cache tags, swapping the order between indexes and ways loops do not prevent DRAM pages precharge and activate cycles but at least, on average, improves the chances that either multiple lines hit the same page or multiple lines belong to different DRAM banks, improving throughput significantly. This patch swaps the inner loops in the v7 cache flushing routine to carry out the clean operations first on all sets belonging to a given way (looping through sets) and then decrementing the way. Benchmarks showed that by swapping the ordering in which sets and ways are decremented in the v7 cache flushing routine, that uses set/way operations, time required to flush caches is reduced significantly, owing to improved writebacks throughput to the DRAM controller. Signed-off-by: Lorenzo Pieralisi Acked-by: Catalin Marinas Acked-by: Nicolas Pitre Acked-by: Santosh Shilimkar Reviewed-by: Dave Martin --- arch/arm/mm/cache-v7.S | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S index b5c467a..778bcf8 100644 --- a/arch/arm/mm/cache-v7.S +++ b/arch/arm/mm/cache-v7.S @@ -146,18 +146,18 @@ flush_levels: ldr r7, =0x7fff ands r7, r7, r1, lsr #13 @ extract max number of the index size loop1: - mov r9, r4 @ create working copy of max way size + mov r9, r7 @ create working copy of max index loop2: - ARM( orr r11, r10, r9, lsl r5 ) @ factor way and cache number into r11 - THUMB( lsl r6, r9, r5 ) + ARM( orr r11, r10, r4, lsl r5 ) @ factor way and cache number into r11 + THUMB( lsl r6, r4, r5 ) THUMB( orr r11, r10, r6 ) @ factor way and cache number into r11 - ARM( orr r11, r11, r7, lsl r2 ) @ factor index number into r11 - THUMB( lsl r6, r7, r2 ) + ARM( orr r11, r11, r9, lsl r2 ) @ factor index number into r11 + THUMB( lsl r6, r9, r2 ) THUMB( orr r11, r11, r6 ) @ factor index number into r11 mcr p15, 0, r11, c7, c14, 2 @ clean & invalidate by set/way - subs r9, r9, #1 @ decrement the way + subs r9, r9, #1 @ decrement the index bge loop2 - subs r7, r7, #1 @ decrement the index + subs r4, r4, #1 @ decrement the way bge loop1 skip: add r10, r10, #2 @ increment cache number