From patchwork Tue May 1 20:45:19 2018
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 10374455
Subject: [PATCH 2/6] x86, memcpy_mcsafe: return bytes remaining
From: Dan Williams
To: linux-nvdimm@lists.01.org
Date: Tue, 01 May 2018 13:45:19 -0700
Message-ID: <152520751957.36522.6348894783685371152.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <152520750404.36522.15462513519590065300.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <152520750404.36522.15462513519590065300.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
Cc: tony.luck@intel.com, Peter Zijlstra, x86@kernel.org,
 linux-kernel@vger.kernel.org, Andy Lutomirski, Ingo Molnar,
 Borislav Petkov, Al Viro, Thomas Gleixner, Linus Torvalds,
 Andrew Morton

Machine check safe memory copies are currently deployed in the pmem
driver whenever reading from persistent memory media, so that -EIO is
returned rather than triggering a kernel panic. While this protects
most pmem accesses, it is not complete in the filesystem-dax case. When
filesystem-dax is enabled reads may bypass the block layer and the
driver via dax_iomap_actor() and its usage of copy_to_iter().

In preparation for creating a copy_to_iter() variant that can handle
machine checks, teach memcpy_mcsafe() to return the number of bytes
remaining rather than -EFAULT when an exception occurs.
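The new calling convention can be illustrated with a stand-alone sketch
(plain C, userspace). Everything here is an invented stand-in, not kernel
code: mock_memcpy_mcsafe() simulates a machine-check fault at a
configurable offset, and read_buf() shows the caller pattern that maps a
nonzero remainder to an I/O error while keeping the byte count available
for partial-copy users.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offset of simulated poison; (size_t)-1 means no poison. */
static size_t poison_at = (size_t)-1;

/*
 * Stand-in for memcpy_mcsafe() with the new semantics: copy up to the
 * simulated poison, then return the number of bytes NOT copied
 * (0 on full success), instead of the old 0/-EFAULT convention.
 */
static unsigned long mock_memcpy_mcsafe(void *dst, const void *src, size_t cnt)
{
	size_t good = cnt;

	if (poison_at < cnt)
		good = poison_at;	/* bytes readable before the fault */
	memcpy(dst, src, good);
	return cnt - good;		/* bytes remaining, 0 == success */
}

/* Caller pattern: any nonzero remainder maps to an I/O error. */
static int read_buf(void *dst, const void *src, size_t cnt)
{
	unsigned long rem = mock_memcpy_mcsafe(dst, src, cnt);

	return rem ? -5 /* -EIO */ : 0;
}
```

A copy_to_iter() variant can instead consume the remainder directly to
report a short copy rather than failing the whole transfer.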
Given that the source buffer is aligned to 8-bytes and that x86 reports
poison in terms of cachelines, we can assume that all read faults occur
at cacheline boundaries. When an exception occurs we have succeeded in
reading some data before the poisoned cacheline. mcsafe_handle_tail()
is introduced as a common helper to complete the copy operation on the
good data, while also being careful to limit accesses to the known good
cachelines and reduce the chance of additional machine check exceptions.

Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Al Viro
Cc: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Linus Torvalds
Co-developed-by: Tony Luck
Signed-off-by: Dan Williams
---
 arch/x86/include/asm/string_64.h  |    8 ++-
 arch/x86/include/asm/uaccess_64.h |    3 +
 arch/x86/lib/memcpy_64.S          |   85 +++++++++++++++++++++++++++++++------
 arch/x86/lib/usercopy_64.c        |   12 +++++
 drivers/nvdimm/claim.c            |    3 +
 drivers/nvdimm/pmem.c             |    6 +--
 include/linux/string.h            |    4 +-
 7 files changed, 98 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 533f74c300c2..92ee5e187113 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -116,7 +116,8 @@ int strcmp(const char *cs, const char *ct);
 #endif
 
 #define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check int memcpy_mcsafe_unrolled(void *dst, const void *src, size_t cnt);
+__must_check unsigned long memcpy_mcsafe_unrolled(void *dst, const void *src,
+		size_t cnt);
 DECLARE_STATIC_KEY_FALSE(mcsafe_key);
 
 /**
@@ -131,9 +132,10 @@ DECLARE_STATIC_KEY_FALSE(mcsafe_key);
  * actually do machine check recovery. Everyone else can just
  * use memcpy().
  *
- * Return 0 for success, -EFAULT for fail
+ * Return 0 for success, or number of bytes not copied if there was an
+ * exception.
  */
-static __always_inline __must_check int
+static __always_inline __must_check unsigned long
 memcpy_mcsafe(void *dst, const void *src, size_t cnt)
 {
 #ifdef CONFIG_X86_MCE
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 62546b3a398e..c064a77e8fcb 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -194,4 +194,7 @@ __copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
 unsigned long
 copy_user_handle_tail(char *to, char *from, unsigned len);
 
+unsigned long
+mcsafe_handle_tail(char *to, char *from, unsigned len, unsigned limit);
+
 #endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 6a416a7df8ee..97b772fcf62f 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -283,22 +283,79 @@ ENDPROC(memcpy_mcsafe_unrolled)
 EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled)
 
 	.section .fixup, "ax"
-	/* Return -EFAULT for any failure */
-.L_memcpy_mcsafe_fail:
-	mov	$-EFAULT, %rax
+	/* Return number of bytes not copied for any failure */
+
+	/*
+	 * For .E_cache_{1,2,3} we have successfully read {8,16,24}
+	 * bytes before crossing into the poison cacheline. Arrange for
+	 * mcsafe_handle_tail to write those {8,16,24} bytes to the
+	 * destination without re-triggering the machine check. %ecx
+	 * contains the limit and %edx contains total bytes remaining.
+	 */
+.E_cache_1:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$8, %ecx
+	jmp	mcsafe_handle_tail
+.E_cache_2:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$16, %ecx
+	jmp	mcsafe_handle_tail
+.E_cache_3:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$24, %ecx
+	jmp	mcsafe_handle_tail
+
+	/*
+	 * In contrast to .E_cache_{1,2,3}, .E_cache_{5,6,7} have
+	 * successfully copied 32-bytes before crossing into the
+	 * poisoned cacheline.
+	 */
+.E_cache_5:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$8, %ecx
+	jmp	.E_cache_upper
+.E_cache_6:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$16, %ecx
+	jmp	.E_cache_upper
+.E_cache_7:
+	shll	$6, %ecx
+	addl	%ecx, %edx
+	movl	$24, %ecx
+	jmp	.E_cache_upper
+.E_cache_upper:
+	addq	$32, %rsi
+	addq	$32, %rdi
+	subl	$32, %edx
+	jmp	mcsafe_handle_tail
+.E_trailing_words:
+	shll	$3, %ecx
+	jmp	.E_leading_bytes
+.E_cache_4:
+	subl	$32, %edx
+.E_cache_0:
+	shll	$6, %ecx
+.E_leading_bytes:
+	addl	%edx, %ecx
+.E_trailing_bytes:
+	mov	%ecx, %eax
 	ret
 
 	.previous
 
-	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r0, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r1, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r2, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r3, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r4, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r5, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r6, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_r7, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
+	_ASM_EXTABLE_FAULT(.L_cache_r0, .E_cache_0)
+	_ASM_EXTABLE_FAULT(.L_cache_r1, .E_cache_1)
+	_ASM_EXTABLE_FAULT(.L_cache_r2, .E_cache_2)
+	_ASM_EXTABLE_FAULT(.L_cache_r3, .E_cache_3)
+	_ASM_EXTABLE_FAULT(.L_cache_r4, .E_cache_4)
+	_ASM_EXTABLE_FAULT(.L_cache_r5, .E_cache_5)
+	_ASM_EXTABLE_FAULT(.L_cache_r6, .E_cache_6)
+	_ASM_EXTABLE_FAULT(.L_cache_r7, .E_cache_7)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_words, .E_trailing_words)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
 #endif
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 75d3776123cc..e2bcc7d85436 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -75,6 +75,18 @@ copy_user_handle_tail(char *to, char *from, unsigned len)
 	return len;
 }
 
+__visible unsigned long
+mcsafe_handle_tail(char *to, char *from, unsigned len, unsigned limit)
+{
+	for (; len && limit; --len, --limit, to++, from++) {
+		unsigned long rem = memcpy_mcsafe_unrolled(to, from, 1);
+
+		if (rem)
+			break;
+	}
+	return len;
+}
+
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /**
  * clean_cache_range - write back a cache range with CLWB
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 30852270484f..2e96b34bc936 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -276,7 +276,8 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	if (rw == READ) {
 		if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
 			return -EIO;
-		return memcpy_mcsafe(buf, nsio->addr + offset, size);
+		if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
+			return -EIO;
 	}
 
 	if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..e023d6aa22b5 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -101,15 +101,15 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
 		void *pmem_addr, unsigned int len)
 {
 	unsigned int chunk;
-	int rc;
+	unsigned long rem;
 	void *mem;
 
 	while (len) {
 		mem = kmap_atomic(page);
 		chunk = min_t(unsigned int, len, PAGE_SIZE);
-		rc = memcpy_mcsafe(mem + off, pmem_addr, chunk);
+		rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
 		kunmap_atomic(mem);
-		if (rc)
+		if (rem)
 			return BLK_STS_IOERR;
 		len -= chunk;
 		off = 0;
diff --git a/include/linux/string.h b/include/linux/string.h
index dd39a690c841..4a5a0eb7df51 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -147,8 +147,8 @@ extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
 #endif
 
 #ifndef __HAVE_ARCH_MEMCPY_MCSAFE
-static inline __must_check int memcpy_mcsafe(void *dst, const void *src,
-		size_t cnt)
+static inline __must_check unsigned long memcpy_mcsafe(void *dst,
+		const void *src, size_t cnt)
 {
 	memcpy(dst, src, cnt);
 	return 0;
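
The tail-handling strategy from the changelog can be sketched as a
stand-alone model (plain C, userspace; handle_tail(), copy_one_mcsafe(),
and poison_off are invented stand-ins, not the kernel implementation):
after the fast unrolled copy faults, retry byte-by-byte over at most
`limit` known-good bytes, stopping at the first byte that would fault
again, and return the bytes not copied.

```c
#include <stddef.h>

/* Simulated poison offset within the source buffer. */
static size_t poison_off;

/* Stand-in for a 1-byte memcpy_mcsafe_unrolled(): "faults" at poison. */
static unsigned long copy_one_mcsafe(char *to, const char *from,
				     const char *src_base)
{
	if ((size_t)(from - src_base) == poison_off)
		return 1;	/* byte not copied */
	*to = *from;
	return 0;
}

static unsigned handle_tail(char *to, const char *from,
			    const char *src_base, unsigned len,
			    unsigned limit)
{
	/*
	 * The limit caps accesses to the known-good region so the
	 * retry loop never strays into further poisoned cachelines.
	 */
	for (; len && limit; --len, --limit, to++, from++) {
		if (copy_one_mcsafe(to, from, src_base))
			break;
	}
	return len;	/* bytes remaining, same convention as memcpy_mcsafe() */
}
```

The limit argument is what distinguishes this helper from
copy_user_handle_tail(): the caller computes it from the number of
cachelines read successfully before the exception.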