From patchwork Wed Jul 30 06:28:26 2014
X-Patchwork-Submitter: Joonwoo Park
X-Patchwork-Id: 4646041
From: Joonwoo Park
To: Russell King
Cc: Trilok Soni, Joonwoo Park, linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: optimize memcpy_{from,to}io() and memset_io()
Date: Tue, 29 Jul 2014 23:28:26 -0700
Message-Id: <1406701706-12808-1-git-send-email-joonwoop@codeaurora.org>

Optimize memcpy_{from,to}io() and memset_io() by transferring 64 bits at
a time wherever possible, with minimal barrier usage.  This simple
optimization brings higher throughput than the current byte-by-byte read
and write with a barrier inside the loop.  The skeleton of the code is
taken from powerpc.
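As an illustration of the technique (not part of the patch), the same
head/body/tail pattern can be sketched in plain userspace C, with a
hypothetical ALIGNED8() macro standing in for the patch's
IO_CHECK_ALIGN() and ordinary loads and stores standing in for the MMIO
accessors:

    #include <stddef.h>
    #include <stdint.h>

    #define ALIGNED8(p) ((((uintptr_t)(p)) & 7) == 0)

    /*
     * Copy byte-by-byte until both pointers are 8-byte aligned (or
     * count runs out), then move 64 bits at a time, then mop up any
     * remaining tail bytes.
     */
    static void copy64(void *to, const void *from, size_t count)
    {
            uint8_t *d = to;
            const uint8_t *s = from;

            while (count && (!ALIGNED8(s) || !ALIGNED8(d))) {
                    *d++ = *s++;
                    count--;
            }

            while (count >= 8) {
                    *(uint64_t *)d = *(const uint64_t *)s;
                    d += 8;
                    s += 8;
                    count -= 8;
            }

            while (count) {
                    *d++ = *s++;
                    count--;
            }
    }

Note that when the two pointers disagree modulo 8, the head loop never
reaches alignment and the whole transfer degrades to byte accesses,
exactly as in the patch below.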
Signed-off-by: Joonwoo Park
Acked-by: Trilok Soni
---
 arch/arm64/kernel/io.c | 72 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
index 7d37ead..c0e3ab1 100644
--- a/arch/arm64/kernel/io.c
+++ b/arch/arm64/kernel/io.c
@@ -20,18 +20,34 @@
 #include
 #include
 
+#define IO_CHECK_ALIGN(v, a) ((((unsigned long)(v)) & ((a) - 1)) == 0)
+
 /*
  * Copy data from IO memory space to "real" memory space.
  */
 void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
 {
-	unsigned char *t = to;
-	while (count) {
+	while (count && (!IO_CHECK_ALIGN(from, 8) || !IO_CHECK_ALIGN(to, 8))) {
+		*(u8 *)to = readb_relaxed(from);
+		from++;
+		to++;
 		count--;
-		*t = readb(from);
-		t++;
+	}
+
+	while (count >= 8) {
+		*(u64 *)to = readq_relaxed(from);
+		from += 8;
+		to += 8;
+		count -= 8;
+	}
+
+	while (count) {
+		*(u8 *)to = readb_relaxed(from);
 		from++;
+		to++;
+		count--;
 	}
+	__iormb();
 }
 EXPORT_SYMBOL(__memcpy_fromio);
@@ -40,12 +56,28 @@ EXPORT_SYMBOL(__memcpy_fromio);
  */
 void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
 {
-	const unsigned char *f = from;
+	void *p = (void __force *)to;
+
+	__iowmb();
+	while (count && (!IO_CHECK_ALIGN(p, 8) || !IO_CHECK_ALIGN(from, 8))) {
+		writeb_relaxed(*(volatile u8 *)from, p);
+		from++;
+		p++;
+		count--;
+	}
+
+	while (count >= 8) {
+		writeq_relaxed(*(volatile u64 *)from, p);
+		from += 8;
+		p += 8;
+		count -= 8;
+	}
+
 	while (count) {
+		writeb_relaxed(*(volatile u8 *)from, p);
+		from++;
+		p++;
 		count--;
-		writeb(*f, to);
-		f++;
-		to++;
 	}
 }
 EXPORT_SYMBOL(__memcpy_toio);
@@ -55,10 +87,30 @@ EXPORT_SYMBOL(__memcpy_toio);
  */
 void __memset_io(volatile void __iomem *dst, int c, size_t count)
 {
+	void *p = (void __force *)dst;
+	u64 qc = (u8)c;
+
+	qc |= qc << 8;
+	qc |= qc << 16;
+	qc |= qc << 32;
+
+	__iowmb();
+	while (count && !IO_CHECK_ALIGN(p, 8)) {
+		writeb_relaxed(c, p);
+		p++;
+		count--;
+	}
+
+	while (count >= 8) {
+		writeq_relaxed(qc, p);
+		p += 8;
+		count -= 8;
+	}
+
 	while (count) {
+		writeb_relaxed(c, p);
+		p++;
 		count--;
-		writeb(c, dst);
-		dst++;
 	}
 }
 EXPORT_SYMBOL(__memset_io);
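A note on the barrier savings: on arm64, readb() is essentially
readb_relaxed() followed by __iormb(), and writeb() is __iowmb()
followed by writeb_relaxed(), so the old loops paid one barrier per byte
transferred.  The patch hoists that cost out of the loop; conceptually
(a sketch only, with hypothetical buf/addr variables, not a drop-in
function):

    /* before: one read barrier per byte */
    while (count--)
            *buf++ = readb(addr++);   /* relaxed read + __iormb() each pass */

    /* after: relaxed reads in the loop, a single barrier at the end */
    while (count--)
            *buf++ = readb_relaxed(addr++);
    __iormb();

Likewise, in __memset_io() the shifts replicate the fill byte across all
eight byte lanes of a u64: for c = 0x5a, qc becomes 0x5a5a5a5a5a5a5a5a,
so each writeq_relaxed() stores eight copies of the byte at once.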