From patchwork Mon Jul 29 17:05:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Will Deacon X-Patchwork-Id: 11064235 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4C1D14E5 for ; Mon, 29 Jul 2019 17:05:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 99BAB21327 for ; Mon, 29 Jul 2019 17:05:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 88D20280B0; Mon, 29 Jul 2019 17:05:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id CD38D21327 for ; Mon, 29 Jul 2019 17:05:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:MIME-Version:Cc:List-Subscribe: List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id:Message-Id:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To: References:List-Owner; bh=yIrIlg1FCupJm/O9USCRDaNZ84Q67RXevZfkpHDukfg=; b=OMF nOcOrQoft+pbGDFeWgizPwHLSN+lXfwEDNvyw2aWYski4Mx3RX9zvzwnZJLYNLP46i1xV8CwpetkO fFWNjhUtvX1IKE5RsdMUwsrVLB9JFNaAjatk0fZPOOLtHj+iM8kc1NMLoDCIw3LfOnVMiQoF0uwii hqBmIW5kC3t22FZcfEeUHNaYVxVHSSZUVrlhF01hO4zufUCrpPIi77etw3kc3l0TV7qZqZPQAsxWZ +35AGLBq8oIm/ZKNNuazmWXQdZzqDyf3lhuOZJUW/GT017pXthhNflwJEP8Ahc3ipaklLJWo1beAw 8CB2fDtUp2aRtKVFN8WGl2xtDEuPEMQ==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92 #3 (Red Hat Linux)) id 1hs95Z-0001vZ-D6; Mon, 29 Jul 2019 17:05:33 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.92 #3 (Red Hat Linux)) id 1hs95V-0001v4-OD for linux-arm-kernel@lists.infradead.org; Mon, 29 Jul 2019 17:05:31 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7A695337; Mon, 29 Jul 2019 10:05:27 -0700 (PDT) Received: from fuggles.cambridge.arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id E20AF3F694; Mon, 29 Jul 2019 10:05:26 -0700 (PDT) From: Will Deacon To: linux-arm-kernel@lists.infradead.org Subject: [PATCH] arm64: io: Relax implicit barriers in default I/O accessors Date: Mon, 29 Jul 2019 18:05:18 +0100 Message-Id: <20190729170518.14271-1-will@kernel.org> X-Mailer: git-send-email 2.11.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20190729_100529_878092_52E4D964 X-CRM114-Status: GOOD ( 15.61 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: catalin.marinas@arm.com, Will Deacon MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP From: Will Deacon The arm64 implementation of the default I/O accessors requires barrier instructions to satisfy the memory ordering requirements documented in memory-barriers.txt [1], which are largely derived from the behaviour of I/O accesses on x86. Of particular interest are the requirements that a write to a device must be ordered against prior writes to memory, and a read from a device must be ordered against subsequent reads from memory. We satisfy these requirements using various flavours of DSB: the most expensive barrier we have, since it implies completion of prior accesses. This was deemed necessary when we first implemented the accessors, since accesses to different endpoints could propagate independently and therefore the only way to enforce order is to rely on completion guarantees [2]. Since then, the Armv8 memory model has been retrospectively strengthened to require "other-multi-copy atomicity", a property that requires memory accesses from an observer to become visible to all other observers simultaneously [3]. In other words, propagation of accesses is limited to transitioning from locally observed to globally observed. It recently became apparent that this change also has a subtle impact on our I/O accessors for shared peripherals, allowing us to use the cheaper DMB instruction instead. As a concrete example, consider the following: memcpy(dma_buffer, data, bufsz); writel(DMA_START, dev->ctrl_reg); A DMB ST instruction between the final write to the DMA buffer and the write to the control register will ensure that the writes to the DMA buffer are observed before the write to the control register by all observers. Put another way, if an observer can see the write to the control register, it can also see the writes to memory. This has always been the case and is not sufficient to provide the ordering required by Linux, since there is no guarantee that the master interface of the DMA-capable device has observed either of the accesses. However, in an other-multi-copy atomic world, we can infer two things: 1. A write arriving at an endpoint shared between multiple CPUs is visible to all CPUs 2. A write that is visible to all CPUs is also visible to all other observers in the shareability domain Pieced together, this allows us to use DMB OSHST for our default I/O write accessors and DMB OSHLD for our default I/O read accessors (the outer-shareability is for handling non-cacheable mappings) for shared devices. Memory-mapped, DMA-capable peripherals that are private to a CPU (i.e. inaccessible to other CPUs) still require the DSB, however these are few and far between and typically require special treatment anyway which is outside of the scope of the portable driver API (e.g. GIC, page-table walker, SPE profiler). Note that our mandatory barriers remain as DSBs, since there are cases where they are used to flush the store buffer of the CPU, e.g. when publishing page table updates to the SMMU. [1] https://git.kernel.org/linus/4614bbdee357 [2] https://www.youtube.com/watch?v=i6DayghhA8Q [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/ Signed-off-by: Will Deacon Reviewed-by: Catalin Marinas --- arch/arm64/include/asm/io.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h index 7ed92626949d..b0a3d5b85d4f 100644 --- a/arch/arm64/include/asm/io.h +++ b/arch/arm64/include/asm/io.h @@ -97,7 +97,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) ({ \ unsigned long tmp; \ \ - rmb(); \ + dma_rmb(); \ \ /* \ * Create a dummy control dependency from the IO read to any \ @@ -111,7 +111,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) }) #define __io_par(v) __iormb(v) -#define __iowmb() wmb() +#define __iowmb() dma_wmb() /* * Relaxed I/O memory access primitives. These follow the Device memory