From patchwork Fri Jul 7 00:22:38 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 9829251 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id BADD8602F0 for ; Fri, 7 Jul 2017 00:22:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ABBCD26E3E for ; Fri, 7 Jul 2017 00:22:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9FA7628429; Fri, 7 Jul 2017 00:22:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A804326E3E for ; Fri, 7 Jul 2017 00:22:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753067AbdGGAWn convert rfc822-to-8bit (ORCPT ); Thu, 6 Jul 2017 20:22:43 -0400 Received: from mga02.intel.com ([134.134.136.20]:35770 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752987AbdGGAWm (ORCPT ); Thu, 6 Jul 2017 20:22:42 -0400 Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Jul 2017 17:22:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,319,1496127600"; d="scan'208";a="876075685" Received: from orsmsx109.amr.corp.intel.com ([10.22.240.7]) by FMSMGA003.fm.intel.com with ESMTP; 06 Jul 2017 17:22:40 -0700 Received: from orsmsx161.amr.corp.intel.com (10.22.240.84) by ORSMSX109.amr.corp.intel.com (10.22.240.7) with Microsoft SMTP Server (TLS) id 14.3.319.2; Thu, 6 Jul 2017 17:22:39 -0700 Received: from orsmsx107.amr.corp.intel.com ([169.254.1.150]) by ORSMSX161.amr.corp.intel.com ([169.254.4.55]) with mapi id 14.03.0319.002; Thu, 6 Jul 2017 17:22:39 -0700 From: "Williams, Dan J" To: "torvalds@linux-foundation.org" CC: "linux-kernel@vger.kernel.org" , "linux-nvdimm@lists.01.org" , "linux-block@vger.kernel.org" , "viro@zeniv.linux.org.uk" , "x86@kernel.org" , "linux-fsdevel@vger.kernel.org" Subject: [GIT PULL] libnvdimm for 4.13 Thread-Topic: [GIT PULL] libnvdimm for 4.13 Thread-Index: AQHS9rcj+vhGWWihEk2YtrCnde4lMw== Date: Fri, 7 Jul 2017 00:22:38 +0000 Message-ID: <1499386957.6081.29.camel@intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.7.201.133] Content-ID: <50E9FF9960AEA045A79246ECFBE9C8CB@intel.com> MIME-Version: 1.0 Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Linus, please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13 to receive, libnvdimm updates for the latest ACPI and UEFI specifications. This pull request also includes new 'struct dax_operations' enabling to undo the abuse [1] of copy_user_nocache() for copy operations to pmem. The dax work originally missed 4.12 to address concerns raised by Al. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html All of the commits in this pull request have appeared in one or more -next releases with no errors reported, however, Stephen did report a late merge conflict between nvdimm.git and the vfs.git tree. Stephen's merge resolution is here: http://marc.info/?l=linux-kernel&m=1499064115 07301&w=2, but to match Al's changes we appear to also need the incremental change below. Please pull, I believe any straggling _flushcache() feedback at this point can be fixed up post -rc1. I include commit 0aed55af8834 "x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations" at the end of this message for reference. diff --git a/include/linux/uio.h b/include/linux/uio.h index 2f46f8d4b508..073bb1feb0d0 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -97,6 +97,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i); size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i); bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i); size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i); +size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i); bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i); static __always_inline __must_check @@ -151,7 +152,14 @@ bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the * destination is flushed from the cache on return. */ -size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i); +static __always_inline __must_check +size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) +{ + if (unlikely(!check_copy_size(addr, bytes, false))) + return bytes; + else + return _copy_from_iter_flushcache(addr, bytes, i); +} #else static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index ee82300d98b9..0d18ede56a36 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -642,7 +642,7 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) EXPORT_SYMBOL(_copy_from_iter_nocache); #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE -size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) +size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) { char *to = addr; if (unlikely(i->type & ITER_PIPE)) { @@ -660,7 +660,7 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) return bytes; } -EXPORT_SYMBOL_GPL(copy_from_iter_flushcache); +EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache); #endif bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) --- The following changes since commit 87085ff2e90ecfa91f8bb0cb0ce19ea661bd6f83: thermal: int340x_thermal: fix compile after the UUID API switch (2017-06-09 16:37:31 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.13 for you to fetch changes up to 9d92573fff3ec70785ef1815cc80573f70e7a921: Merge branch 'for-4.13/dax' into libnvdimm-for-next (2017-07-03 16:54:58 -0700) ---------------------------------------------------------------- libnvdimm for 4.13 * Introduce the _flushcache() family of memory copy helpers and use them for persistent memory write operations on x86. The _flushcache() semantic indicates that the cache is either bypassed for the copy operation (movnt) or any lines dirtied by the copy operation are written back (clwb, clflushopt, or clflush). * Extend dax_operations with ->copy_from_iter() and ->flush() operations. These operations and other infrastructure updates allow all persistent memory specific dax functionality to be pushed into libnvdimm and the pmem driver directly. It also allows dax-specific sysfs attributes to be linked to a host device, for example: /sys/block/pmem0/dax/write_cache * Add support for the new NVDIMM platform/firmware mechanisms introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace label format, extensions to the address-range-scrub command set, new error injection commands, and a new BTT (block-translation-table) layout. These updates support inter-OS and pre-OS compatibility. * Fix a longstanding memory corruption bug in nfit_test. * Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2) capable. * Miscellaneous fixes and small updates across libnvdimm and the nfit driver. Acknowledgements that came after the branch was pushed: commit 6aa734a2f38e "libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime" Reviewed-by: Toshi Kani ---------------------------------------------------------------- Arvind Yadav (1): acpi, nfit: constify *_attribute_group Dan Williams (29): x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations dm: add ->copy_from_iter() dax operation support libnvdimm, label: add v1.2 nvdimm label definitions libnvdimm, label: add v1.2 interleave-set-cookie algorithm libnvdimm, label: honor the lba size specified in v1.2 labels libnvdimm, label: populate the type_guid property for v1.2 namespaces libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces libnvdimm, label: add v1.2 label checksum support libnvdimm, label: add address abstraction identifiers libnvdimm, label: switch to using v1.2 labels by default filesystem-dax: convert to dax_copy_from_iter() dax, pmem: introduce an optional 'flush' dax_operation dm: add ->flush() dax operation support filesystem-dax: convert to dax_flush() x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush x86, dax, libnvdimm: remove wb_cache_pmem() indirection x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm x86, libnvdimm, pmem: remove global pmem api libnvdimm, pmem: fix persistence warning libnvdimm, nfit: enable support for volatile ranges dax: remove default copy_from_iter fallback dax: convert to bitmask for flags libnvdimm, pmem, dax: export a cache control attribute libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region acpi, nfit: quiet invalid block-aperture-region warnings libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime libnvdimm, namespace: record 'lbasize' for pmem namespaces Merge branch 'for-4.13/dax' into libnvdimm-for-next Jerry Hoemann (5): libnvdimm: passthru functions clear to send acpi, nfit: Enable DSM pass thru for root functions. libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru. acpi, nfit: Show bus_dsm_mask in sysfs libnvdimm: New ACPI 6.2 DSM functions Toshi Kani (3): libnvdimm, pmem: Add sysfs notifications to badblocks acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2 acpi/nfit: Issue Start ARS to retrieve existing records Vishal Verma (4): libnvdimm, btt: BTT updates for UEFI 2.7 format libnvdimm, btt: fix btt_rw_page not returning errors libnvdimm: fix the clear-error check in nsio_rw_bytes libnvdimm, btt: convert some info messages to warn/err Yasunori Goto (1): tools/testing/nvdimm: fix nfit_test buffer overflow MAINTAINERS | 4 +- arch/powerpc/sysdev/axonram.c | 8 ++ arch/x86/Kconfig | 1 + arch/x86/include/asm/pmem.h | 136 ------------------ arch/x86/include/asm/string_64.h | 5 + arch/x86/include/asm/uaccess_64.h | 11 ++ arch/x86/lib/usercopy_64.c | 134 ++++++++++++++++++ arch/x86/mm/pageattr.c | 6 + drivers/acpi/nfit/core.c | 167 ++++++++++++++++++---- drivers/acpi/nfit/mce.c | 2 +- drivers/acpi/nfit/nfit.h | 4 +- drivers/block/brd.c | 8 ++ drivers/dax/super.c | 118 +++++++++++++++- drivers/md/dm-linear.c | 30 ++++ drivers/md/dm-stripe.c | 40 ++++++ drivers/md/dm.c | 45 ++++++ drivers/nvdimm/btt.c | 45 ++++-- drivers/nvdimm/btt.h | 2 + drivers/nvdimm/btt_devs.c | 54 +++++++- drivers/nvdimm/bus.c | 15 +- drivers/nvdimm/claim.c | 38 ++++- drivers/nvdimm/core.c | 5 +- drivers/nvdimm/dax_devs.c | 10 +- drivers/nvdimm/dimm_devs.c | 10 +- drivers/nvdimm/label.c | 251 +++++++++++++++++++++++++++++---- drivers/nvdimm/label.h | 21 ++- drivers/nvdimm/namespace_devs.c | 282 ++++++++++++++++++++++++++++++++------ drivers/nvdimm/nd-core.h | 9 ++ drivers/nvdimm/nd.h | 17 ++- drivers/nvdimm/pfn_devs.c | 12 +- drivers/nvdimm/pmem.c | 63 ++++++++- drivers/nvdimm/pmem.h | 15 ++ drivers/nvdimm/region.c | 17 ++- drivers/nvdimm/region_devs.c | 88 ++++++++---- drivers/s390/block/dcssblk.c | 8 ++ fs/dax.c | 9 +- include/linux/dax.h | 12 ++ include/linux/device-mapper.h | 6 + include/linux/libnvdimm.h | 11 +- include/linux/nd.h | 13 ++ include/linux/pmem.h | 142 ------------------- include/linux/string.h | 6 + include/linux/uio.h | 15 ++ include/uapi/linux/ndctl.h | 42 +++++- lib/Kconfig | 3 + lib/iov_iter.c | 22 +++ tools/testing/nvdimm/test/nfit.c | 2 +- 47 files changed, 1504 insertions(+), 460 deletions(-) delete mode 100644 arch/x86/include/asm/pmem.h delete mode 100644 include/linux/pmem.h --- commit 0aed55af88345b5d673240f90e671d79662fb01e Author: Dan Williams Date: Mon May 29 12:22:50 2017 -0700 x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: Cc: Jan Kara Cc: Jeff Moyer Cc: Ingo Molnar Cc: Christoph Hellwig Cc: Toshi Kani Cc: "H. Peter Anvin" Cc: Al Viro Cc: Thomas Gleixner Cc: Matthew Wilcox Reviewed-by: Ross Zwisler Signed-off-by: Dan Williams diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 4ccfacc7232a..bb273b2f50b5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -54,6 +54,7 @@ config X86 select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_MMIO_FLUSH select ARCH_HAS_PMEM_API if X86_64 + select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64 select ARCH_HAS_SET_MEMORY select ARCH_HAS_SG_CHAIN select ARCH_HAS_STRICT_KERNEL_RWX diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h index 733bae07fb29..1f22bc277c45 100644 --- a/arch/x86/include/asm/string_64.h +++ b/arch/x86/include/asm/string_64.h @@ -109,6 +109,11 @@ memcpy_mcsafe(void *dst, const void *src, size_t cnt) return 0; } +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE +#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1 +void memcpy_flushcache(void *dst, const void *src, size_t cnt); +#endif + #endif /* __KERNEL__ */ #endif /* _ASM_X86_STRING_64_H */ diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h index c5504b9a472e..b16f6a1d8b26 100644 --- a/arch/x86/include/asm/uaccess_64.h +++ b/arch/x86/include/asm/uaccess_64.h @@ -171,6 +171,10 @@ unsigned long raw_copy_in_user(void __user *dst, const void __user *src, unsigne extern long __copy_user_nocache(void *dst, const void __user *src, unsigned size, int zerorest); +extern long __copy_user_flushcache(void *dst, const void __user *src, unsigned size); +extern void memcpy_page_flushcache(char *to, struct page *page, size_t offset, + size_t len); + static inline int __copy_from_user_inatomic_nocache(void *dst, const void __user *src, unsigned size) @@ -179,6 +183,13 @@ __copy_from_user_inatomic_nocache(void *dst, const void __user *src, return __copy_user_nocache(dst, src, size, 0); } +static inline int +__copy_from_user_flushcache(void *dst, const void __user *src, unsigned size) +{ + kasan_check_write(dst, size); + return __copy_user_flushcache(dst, src, size); +} + unsigned long copy_user_handle_tail(char *to, char *from, unsigned len); diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index 3b7c40a2e3e1..f42d2fd86ca3 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -7,6 +7,7 @@ */ #include #include +#include /* * Zero Userspace @@ -73,3 +74,130 @@ copy_user_handle_tail(char *to, char *from, unsigned len) clac(); return len; } + +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE +/** + * clean_cache_range - write back a cache range with CLWB + * @vaddr: virtual start address + * @size: number of bytes to write back + * + * Write back a cache range using the CLWB (cache line write back) + * instruction. Note that @size is internally rounded up to be cache + * line size aligned. + */ +static void clean_cache_range(void *addr, size_t size) +{ + u16 x86_clflush_size = boot_cpu_data.x86_clflush_size; + unsigned long clflush_mask = x86_clflush_size - 1; + void *vend = addr + size; + void *p; + + for (p = (void *)((unsigned long)addr & ~clflush_mask); + p < vend; p += x86_clflush_size) + clwb(p); +} + +long __copy_user_flushcache(void *dst, const void __user *src, unsigned size) +{ + unsigned long flushed, dest = (unsigned long) dst; + long rc = __copy_user_nocache(dst, src, size, 0); + + /* + * __copy_user_nocache() uses non-temporal stores for the bulk + * of the transfer, but we need to manually flush if the + * transfer is unaligned. A cached memory copy is used when + * destination or size is not naturally aligned. That is: + * - Require 8-byte alignment when size is 8 bytes or larger. + * - Require 4-byte alignment when size is 4 bytes. + */ + if (size < 8) { + if (!IS_ALIGNED(dest, 4) || size != 4) + clean_cache_range(dst, 1); + } else { + if (!IS_ALIGNED(dest, 8)) { + dest = ALIGN(dest, boot_cpu_data.x86_clflush_size); + clean_cache_range(dst, 1); + } + + flushed = dest - (unsigned long) dst; + if (size > flushed && !IS_ALIGNED(size - flushed, 8)) + clean_cache_range(dst + size - 1, 1); + } + + return rc; +} + +void memcpy_flushcache(void *_dst, const void *_src, size_t size) +{ + unsigned long dest = (unsigned long) _dst; + unsigned long source = (unsigned long) _src; + + /* cache copy and flush to align dest */ + if (!IS_ALIGNED(dest, 8)) { + unsigned len = min_t(unsigned, size, ALIGN(dest, 8) - dest); + + memcpy((void *) dest, (void *) source, len); + clean_cache_range((void *) dest, len); + dest += len; + source += len; + size -= len; + if (!size) + return; + } + + /* 4x8 movnti loop */ + while (size >= 32) { + asm("movq (%0), %%r8\n" + "movq 8(%0), %%r9\n" + "movq 16(%0), %%r10\n" + "movq 24(%0), %%r11\n" + "movnti %%r8, (%1)\n" + "movnti %%r9, 8(%1)\n" + "movnti %%r10, 16(%1)\n" + "movnti %%r11, 24(%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8", "r9", "r10", "r11"); + dest += 32; + source += 32; + size -= 32; + } + + /* 1x8 movnti loop */ + while (size >= 8) { + asm("movq (%0), %%r8\n" + "movnti %%r8, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 8; + source += 8; + size -= 8; + } + + /* 1x4 movnti loop */ + while (size >= 4) { + asm("movl (%0), %%r8d\n" + "movnti %%r8d, (%1)\n" + :: "r" (source), "r" (dest) + : "memory", "r8"); + dest += 4; + source += 4; + size -= 4; + } + + /* cache copy for remaining bytes */ + if (size) { + memcpy((void *) dest, (void *) source, size); + clean_cache_range((void *) dest, size); + } +} +EXPORT_SYMBOL_GPL(memcpy_flushcache); + +void memcpy_page_flushcache(char *to, struct page *page, size_t offset, + size_t len) +{ + char *from = kmap_atomic(page); + + memcpy_flushcache(to, from + offset, len); + kunmap_atomic(from); +} +#endif diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 656acb5d7166..cbd5596e7562 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1842,8 +1842,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk, } if (rw) - memcpy_to_pmem(mmio->addr.aperture + offset, - iobuf + copied, c); + memcpy_flushcache(mmio->addr.aperture + offset, iobuf + copied, c); else { if (nfit_blk->dimm_flags & NFIT_BLK_READ_FLUSH) mmio_flush_range((void __force *) diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 7ceb5fa4f2a1..b8b9c8ca7862 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -277,7 +277,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, rc = -EIO; } - memcpy_to_pmem(nsio->addr + offset, buf, size); + memcpy_flushcache(nsio->addr + offset, buf, size); nvdimm_flush(to_nd_region(ndns->dev.parent)); return rc; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index c544d466ea51..2f3aefe565c6 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include "pmem.h" @@ -80,7 +81,7 @@ static void write_pmem(void *pmem_addr, struct page *page, { void *mem = kmap_atomic(page); - memcpy_to_pmem(pmem_addr, mem + off, len); + memcpy_flushcache(pmem_addr, mem + off, len); kunmap_atomic(mem); } @@ -235,8 +236,15 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev, return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn); } +static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i) +{ + return copy_from_iter_flushcache(addr, bytes, i); +} + static const struct dax_operations pmem_dax_ops = { .direct_access = pmem_dax_direct_access, + .copy_from_iter = pmem_copy_from_iter, }; static void pmem_release_queue(void *q) @@ -294,7 +302,8 @@ static int pmem_attach_disk(struct device *dev, dev_set_drvdata(dev, pmem); pmem->phys_addr = res->start; pmem->size = resource_size(res); - if (nvdimm_has_flush(nd_region) < 0) + if (!IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) + || nvdimm_has_flush(nd_region) < 0) dev_warn(dev, "unable to guarantee persistence of writes\n"); if (!devm_request_mem_region(dev, res->start, resource_size(res), diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b550edf2571f..985b0e11bd73 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -1015,8 +1015,8 @@ void nvdimm_flush(struct nd_region *nd_region) * The first wmb() is needed to 'sfence' all previous writes * such that they are architecturally visible for the platform * buffer flush. Note that we've already arranged for pmem - * writes to avoid the cache via arch_memcpy_to_pmem(). The - * final wmb() ensures ordering for the NVDIMM flush write. + * writes to avoid the cache via memcpy_flushcache(). The final + * wmb() ensures ordering for the NVDIMM flush write. */ wmb(); for (i = 0; i < nd_region->ndr_mappings; i++) diff --git a/include/linux/dax.h b/include/linux/dax.h index 5ec1f6c47716..bbe79ed90e2b 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -16,6 +16,9 @@ struct dax_operations { */ long (*direct_access)(struct dax_device *, pgoff_t, long, void **, pfn_t *); + /* copy_from_iter: dax-driver override for default copy_from_iter */ + size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t, + struct iov_iter *); }; #if IS_ENABLED(CONFIG_DAX) diff --git a/include/linux/string.h b/include/linux/string.h index 537918f8a98e..7439d83eaa33 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -122,6 +122,12 @@ static inline __must_check int memcpy_mcsafe(void *dst, const void *src, return 0; } #endif +#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE +static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt) +{ + memcpy(dst, src, cnt); +} +#endif void *memchr_inv(const void *s, int c, size_t n); char *strreplace(char *s, char old, char new); diff --git a/include/linux/uio.h b/include/linux/uio.h index f2d36a3d3005..55cd54a0e941 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -95,6 +95,21 @@ size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i); bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i); size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i); +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE +/* + * Note, users like pmem that depend on the stricter semantics of + * copy_from_iter_flushcache() than copy_from_iter_nocache() must check for + * IS_ENABLED(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) before assuming that the + * destination is flushed from the cache on return. + */ +size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i); +#else +static inline size_t copy_from_iter_flushcache(void *addr, size_t bytes, + struct iov_iter *i) +{ + return copy_from_iter_nocache(addr, bytes, i); +} +#endif bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i); size_t iov_iter_zero(size_t bytes, struct iov_iter *); unsigned long iov_iter_alignment(const struct iov_iter *i); diff --git a/lib/Kconfig b/lib/Kconfig index 0c8b78a9ae2e..2d1c4b3a085c 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -548,6 +548,9 @@ config ARCH_HAS_SG_CHAIN config ARCH_HAS_PMEM_API bool +config ARCH_HAS_UACCESS_FLUSHCACHE + bool + config ARCH_HAS_MMIO_FLUSH bool diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f835964c9485..c9a69064462f 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -615,6 +615,28 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) } EXPORT_SYMBOL(copy_from_iter_nocache); +#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE +size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) +{ + char *to = addr; + if (unlikely(i->type & ITER_PIPE)) { + WARN_ON(1); + return 0; + } + iterate_and_advance(i, bytes, v, + __copy_from_user_flushcache((to += v.iov_len) - v.iov_len, + v.iov_base, v.iov_len), + memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len), + memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base, + v.iov_len) + ) + + return bytes; +} +EXPORT_SYMBOL_GPL(copy_from_iter_flushcache); +#endif + bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) { char *to = addr;