From patchwork Mon Jul 6 18:21:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646453 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 59AFB1398 for ; Mon, 6 Jul 2020 18:21:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4AD9C20739 for ; Mon, 6 Jul 2020 18:21:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729813AbgGFSVe (ORCPT ); Mon, 6 Jul 2020 14:21:34 -0400 Received: from mga06.intel.com ([134.134.136.31]:43341 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSVd (ORCPT ); Mon, 6 Jul 2020 14:21:33 -0400 IronPort-SDR: JGs4GxlP9PGzHSvu2eXbdJr9+O3HI+33inLLUzxC/uR91bEbvO6dNOcoazw3fg5Ay4vKswNYJr LtdkVZ0gpsaw== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="208990011" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="208990011" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:21:32 -0700 IronPort-SDR: XNFZSYdpw7Tvw3KS8Zs1yAXWoq/gtp1xa/mA6JeHfHAdhe5OxX9uUO6C7WSLDN7c4zK88SOe7c 2s5wuBqEQ52w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="283120290" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga006.jf.intel.com with ESMTP; 06 Jul 2020 11:21:32 -0700 Subject: [PATCH v3 1/6] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Fenghua Yu , Tony Luck , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:21:32 -0700 Message-ID: <159405969206.19216.9552068036045362031.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org From: Fenghua Yu Work submission instruction comes in two flavors. ENQCMD can be called both in ring 3 and ring 0 and always uses the contents of PASID MSR when shipping the command to the device. ENQCMDS allows a kernel driver to submit commands on behalf of a user process. The driver supplies the PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel as of now. The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo. Signed-off-by: Fenghua Yu Reviewed-by: Tony Luck This patch is here only for compilation reasons. Drop when Fenghua's patch series gets accepted upstream. 
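For context: once this flag is defined, a driver gates its use of ENQCMD/ENQCMDS on the flag before executing the instruction; the idxd driver does exactly that later in this series. A minimal sketch (illustrative only, not part of this patch; the helper name is made up):

	#include <asm/cpufeature.h>

	/* Sketch: only touch ENQCMD/ENQCMDS when the CPU enumerates them. */
	static bool my_drv_enqcmd_usable(void)
	{
		return boot_cpu_has(X86_FEATURE_ENQCMD);
	}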
--- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/cpuid-deps.c | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 02dabc9e77b0..4469618c410f 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -351,6 +351,7 @@ #define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */ #define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */ #define X86_FEATURE_MOVDIR64B (16*32+28) /* MOVDIR64B instruction */ +#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS instructions */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index 3cbe24ca80ab..3a02707c1f4d 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC }, { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC }, { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL }, + { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES }, {} }; From patchwork Mon Jul 6 18:21:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646455 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1974913B6 for ; Mon, 6 Jul 2020 18:21:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0B0BB20739 for ; Mon, 6 Jul 2020 18:21:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729802AbgGFSVj (ORCPT ); Mon, 6 Jul 2020 14:21:39 -0400 Received: from mga02.intel.com ([134.134.136.20]:29261 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSVj (ORCPT ); Mon, 6 Jul 2020 14:21:39 -0400 IronPort-SDR: 2zsI27iLv6wlzLiavrPmygR7VJutojq+Hhhg/B4mzKzvyOK+Iinv2PDGlsPgGoFScRG3ECIpli /v4eJCOiQ2Ow== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="135722844" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="135722844" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:21:38 -0700 IronPort-SDR: FqSu+TyLGuY3x0FnvppQHuHISp0IWyu4Rv5nY/nDQ/yCOUALg8Jgd8Np4Agne7ip8T4rQ4pbKY qxbLtvONOuJg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="482788013" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga006.fm.intel.com with ESMTP; 06 Jul 2020 11:21:38 -0700 Subject: [PATCH v3 2/6] x86/asm: move the raw asm in iosubmit_cmds512() to special_insns.h From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Tony Luck , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:21:37 -0700 Message-ID: <159405969791.19216.14520281863577841635.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: 
<159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org The MOVDIR64B instruction can be used by other wrapper instructions. Move the core asm code to special_insns.h and have iosubmit_cmds512() call the core asm function. Signed-off-by: Dave Jiang Reviewed-by: Tony Luck --- arch/x86/include/asm/io.h | 17 +++-------------- arch/x86/include/asm/special_insns.h | 17 +++++++++++++++++ 2 files changed, 20 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index e1aa17a468a8..d726459d08e5 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -401,7 +401,7 @@ extern bool phys_mem_access_encrypted(unsigned long phys_addr, /** * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units - * @__dst: destination, in MMIO space (must be 512-bit aligned) + * @dst: destination, in MMIO space (must be 512-bit aligned) * @src: source * @count: number of 512 bits quantities to submit * @@ -412,25 +412,14 @@ extern bool phys_mem_access_encrypted(unsigned long phys_addr, * Warning: Do not use this helper unless your driver has checked that the CPU * instruction is supported on the platform. */ -static inline void iosubmit_cmds512(void __iomem *__dst, const void *src, +static inline void iosubmit_cmds512(void __iomem *dst, const void *src, size_t count) { - /* - * Note that this isn't an "on-stack copy", just definition of "dst" - * as a pointer to 64-bytes of stuff that is going to be overwritten. - * In the MOVDIR64B case that may be needed as you can use the - * MOVDIR64B instruction to copy arbitrary memory around. This trick - * lets the compiler know how much gets clobbered. - */ - volatile struct { char _[64]; } *dst = __dst; const u8 *from = src; const u8 *end = from + count * 64; while (from < end) { - /* MOVDIR64B [rdx], rax */ - asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02" - : "=m" (dst) - : "d" (from), "a" (dst)); + movdir64b(dst, from); from += 64; } } diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index eb8e781c4353..fb28caec9aa0 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -234,6 +234,23 @@ static inline void clwb(volatile void *__p) #define nop() asm volatile ("nop") +static inline void movdir64b(void *__dst, const void *src) +{ + /* + * Note that this isn't an "on-stack copy", just definition of "dst" + * as a pointer to 64-bytes of stuff that is going to be overwritten. + * In the MOVDIR64B case that may be needed as you can use the + * MOVDIR64B instruction to copy arbitrary memory around. This trick + * lets the compiler know how much gets clobbered. 
+ */ + volatile struct { char _[64]; } *dst = __dst; + + /* MOVDIR64B [rdx], rax */ + asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02" + : "=m" (dst) + : "d" (src), "a" (dst)); +} + #endif /* __KERNEL__ */ From patchwork Mon Jul 6 18:21:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646457 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A8B81398 for ; Mon, 6 Jul 2020 18:21:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0CBD62075B for ; Mon, 6 Jul 2020 18:21:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729834AbgGFSVp (ORCPT ); Mon, 6 Jul 2020 14:21:45 -0400 Received: from mga18.intel.com ([134.134.136.126]:36347 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSVp (ORCPT ); Mon, 6 Jul 2020 14:21:45 -0400 IronPort-SDR: Aae92qXDZZJDx0JKW+7X19GX0jPxG4fuLq+QuOZlR2jHXTzDsSjIfVRxueq9NsTcLFn3x+8lhS FxYzcSA1Wdsw== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="134934559" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="134934559" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:21:44 -0700 IronPort-SDR: J1QudWee5dnAWHRd5trZgshDsZsH4lpMcdUiaEB1Tam/vmfrqZoww2EdUnirIuhgPP/+s5B2I/ gi1d9RNC5voQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="297090147" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga002.jf.intel.com with ESMTP; 06 Jul 2020 11:21:43 -0700 Subject: [PATCH v3 3/6] x86/asm: add enqcmds() to support ENQCMDS instruction From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Tony Luck , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:21:43 -0700 Message-ID: <159405970353.19216.17280759893651949219.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org Currently, the MOVDIR64B instruction is used to atomically submit 64-byte work descriptors to devices. Although it can encounter errors like faults, MOVDIR64B can not report back on errors from the device itself. This means that MOVDIR64B users need to separately interact with a device to see if a descriptor was successfully queued, which slows down device interactions. ENQCMD and ENQCMDS also atomically submit 64-byte work descriptors to devices. But, they *can* report back errors directly from the device, such as if the device was busy, or there was an error made in composing the descriptor. This immediate feedback from the submission instruction itself reduces the number of interactions with the device and can greatly increase efficiency. 
ENQCMD can be used at any privilege level, but can effectively only submit work on behalf of the current process. ENQCMDS is a ring0-only instruction and can explicitly specify a process context instead of being tied to the current process or needing to reprogram the IA32_PASID MSR. Use ENQCMDS for work submission within the kernel because a Process Address Space ID (PASID) is set up to translate the kernel virtual address space. This PASID is provided to ENQCMDS from the descriptor structure submitted to the device and not retrieved from the IA32_PASID MSR, which is set up for the current user address space. See the Intel Software Developer’s Manual for more information on the instructions. Add enqcmds() to x86 io.h rather than special_insns.h. The MOVDIR64B instruction can be used for other purposes, so a wrapper was introduced in io.h for its command submission usage. ENQCMDS has the single purpose of submitting 64-byte commands to supported devices and should be called directly. Signed-off-by: Dave Jiang Reviewed-by: Tony Luck --- arch/x86/include/asm/io.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index d726459d08e5..fa0ccf9a65ce 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -424,4 +424,32 @@ static inline void iosubmit_cmds512(void __iomem *dst, const void *src, } } +/** + * enqcmds - copy a 512 bits data unit to single MMIO location + * @dst: destination, in MMIO space (must be 512-bit aligned) + * @src: source + * + * Submit data from kernel space to MMIO space, in a unit of 512 bits. + * Order of data access is not guaranteed, nor is a memory barrier + * performed afterwards. The command returns false (0) on failure, and true (1) + * on success. + * + * Warning: Do not use this helper unless your driver has checked that the CPU + * instruction is supported on the platform.
+ */ +static inline bool enqcmds(void __iomem *dst, const void *src) +{ + bool retry; + + /* ENQCMDS [rdx], rax */ + asm volatile(".byte 0xf3, 0x0f, 0x38, 0xf8, 0x02, 0x66, 0x90\t\n" + "setz %0\t\n" + : "=r"(retry) : "a" (dst), "d" (src)); + /* Submission failure is indicated via EFLAGS.ZF=1 */ + if (retry) + return false; + + return true; +} + #endif /* _ASM_X86_IO_H */ From patchwork Mon Jul 6 18:21:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646459 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 731DF13B6 for ; Mon, 6 Jul 2020 18:21:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5586820702 for ; Mon, 6 Jul 2020 18:21:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729845AbgGFSVv (ORCPT ); Mon, 6 Jul 2020 14:21:51 -0400 Received: from mga14.intel.com ([192.55.52.115]:23788 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSVv (ORCPT ); Mon, 6 Jul 2020 14:21:51 -0400 IronPort-SDR: 0tDNsuk41x3OiwajnFBGNAKzyDVcruzqN/uBPz9+y2sN0ZB2QOsRliaYdvkkCyNDTvjVmdIVRZ 7loivoSIlazQ== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="146550161" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="146550161" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:21:50 -0700 IronPort-SDR: VAx4HodIhqrIGyTrOWZPSQ5N4G4nQOICkeAA30/0UwAqEjS6W/9Tqa48EpWblNo7oJkWVogxuh uSwhClqSBesQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="357531050" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga001.jf.intel.com with ESMTP; 06 Jul 2020 11:21:49 -0700 Subject: [PATCH v3 4/6] dmaengine: idxd: add shared workqueue support From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Tony Luck , Dan Williams , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:21:49 -0700 Message-ID: <159405970949.19216.8492789801701094750.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org Add shared workqueue support that includes the support of Shared Virtual memory (SVM) or in similar terms On Demand Paging (ODP). The shared workqueue uses the enqcmds command in kernel and will respond with retry if the workqueue is full. Shared workqueue only works when there is PASID support from the IOMMU. 
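The heart of the new submission path can be sketched as follows (a simplified view of the submit.c change in this patch, not a drop-in replacement for it):

	/* Dedicated WQ: MOVDIR64B-based submission always accepts the descriptor. */
	if (wq_dedicated(wq)) {
		iosubmit_cmds512(portal, desc->hw, 1);
	} else {
		/*
		 * Shared WQ: ENQCMDS reports failure (EFLAGS.ZF set) when the
		 * device does not accept the descriptor, e.g. the queue is full.
		 */
		if (!enqcmds(portal, desc->hw))
			return -ENXIO;
	}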
Signed-off-by: Dave Jiang Reviewed-by: Tony Luck Reviewed-by: Dan Williams --- drivers/dma/Kconfig | 13 ++++ drivers/dma/idxd/cdev.c | 47 +++++++++++++++- drivers/dma/idxd/device.c | 91 +++++++++++++++++++++++++++--- drivers/dma/idxd/dma.c | 9 --- drivers/dma/idxd/idxd.h | 21 +++++++ drivers/dma/idxd/init.c | 90 +++++++++++++++++++++++------- drivers/dma/idxd/registers.h | 14 +++++ drivers/dma/idxd/submit.c | 33 +++++++++-- drivers/dma/idxd/sysfs.c | 127 ++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 401 insertions(+), 44 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index b70e90765ad3..dd4e6027db8c 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -295,6 +295,19 @@ config INTEL_IDXD If unsure, say N. +# Config symbol that collects all the dependencies that's necessary to +# support shared virtual memory for the devices supported by idxd. +config INTEL_IDXD_SVM + bool "Accelerator Shared Virtual Memory Support" + depends on INTEL_IDXD + depends on INTEL_IOMMU_SVM + depends on PCI_PRI + depends on PCI_PASID + depends on PCI_IOV + +comment "SVM support not enabled for Intel Accelerator (see INTEL_IDXD_SVM)" + depends on INTEL_IDXD && !INTEL_IDXD_SVM + config INTEL_IOATDMA tristate "Intel I/OAT DMA support" depends on PCI && X86_64 diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index c3976156db2f..d4841d8c0928 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include "registers.h" #include "idxd.h" @@ -32,7 +33,9 @@ static struct idxd_cdev_context ictx[IDXD_TYPE_MAX] = { struct idxd_user_context { struct idxd_wq *wq; struct task_struct *task; + unsigned int pasid; unsigned int flags; + struct iommu_sva *sva; }; enum idxd_cdev_cleanup { @@ -75,6 +78,8 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp) struct idxd_wq *wq; struct device *dev; int rc = 0; + struct iommu_sva *sva; + unsigned int pasid; wq = inode_wq(inode); idxd = wq->idxd; @@ -95,6 +100,34 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp) ctx->wq = wq; filp->private_data = ctx; + + if (device_pasid_enabled(idxd)) { + sva = iommu_sva_bind_device(dev, current->mm, NULL); + if (IS_ERR(sva)) { + rc = PTR_ERR(sva); + dev_err(dev, "pasid allocation failed: %d\n", rc); + goto failed; + } + + pasid = iommu_sva_get_pasid(sva); + if (pasid == IOMMU_PASID_INVALID) { + iommu_sva_unbind_device(sva); + goto failed; + } + + ctx->sva = sva; + ctx->pasid = pasid; + + if (wq_dedicated(wq)) { + rc = idxd_wq_set_pasid(wq, pasid); + if (rc < 0) { + iommu_sva_unbind_device(sva); + dev_err(dev, "wq set pasid failed: %d\n", rc); + goto failed; + } + } + } + idxd_wq_get(wq); mutex_unlock(&wq->wq_lock); return 0; @@ -111,13 +144,25 @@ static int idxd_cdev_release(struct inode *node, struct file *filep) struct idxd_wq *wq = ctx->wq; struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; + int rc; dev_dbg(dev, "%s called\n", __func__); filep->private_data = NULL; /* Wait for in-flight operations to complete. 
*/ - idxd_wq_drain(wq); + if (wq_shared(wq)) { + idxd_device_drain_pasid(idxd, ctx->pasid); + } else { + if (device_pasid_enabled(idxd)) { + rc = idxd_wq_disable_pasid(wq); + if (rc < 0) + dev_err(dev, "wq disable pasid failed.\n"); + } + idxd_wq_drain(wq); + } + if (ctx->sva) + iommu_sva_unbind_device(ctx->sva); kfree(ctx); mutex_lock(&wq->wq_lock); idxd_wq_put(wq); diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 14b45853aa5f..9032b50b31af 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -273,10 +273,9 @@ int idxd_wq_map_portal(struct idxd_wq *wq) start = pci_resource_start(pdev, IDXD_WQ_BAR); start = start + wq->id * IDXD_PORTAL_SIZE; - wq->dportal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE); - if (!wq->dportal) + wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE); + if (!wq->portal) return -ENOMEM; - dev_dbg(dev, "wq %d portal mapped at %p\n", wq->id, wq->dportal); return 0; } @@ -285,7 +284,61 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) { struct device *dev = &wq->idxd->pdev->dev; - devm_iounmap(dev, wq->dportal); + devm_iounmap(dev, wq->portal); +} + +int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int rc; + union wqcfg wqcfg; + unsigned int offset; + unsigned long flags; + + rc = idxd_wq_disable(wq); + if (rc < 0) + return rc; + + offset = WQCFG_OFFSET(idxd, wq->id, 2); + spin_lock_irqsave(&idxd->dev_lock, flags); + wqcfg.bits[2] = ioread32(idxd->reg_base + offset); + wqcfg.pasid_en = 1; + wqcfg.pasid = pasid; + iowrite32(wqcfg.bits[2], idxd->reg_base + offset); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + rc = idxd_wq_enable(wq); + if (rc < 0) + return rc; + + return 0; +} + +int idxd_wq_disable_pasid(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + int rc; + union wqcfg wqcfg; + unsigned int offset; + unsigned long flags; + + rc = idxd_wq_disable(wq); + if (rc < 0) + return rc; + + offset = idxd->wqcfg_offset + wq->id * sizeof(wqcfg); + spin_lock_irqsave(&idxd->dev_lock, flags); + wqcfg.bits[2] = ioread32(idxd->reg_base + offset); + wqcfg.pasid_en = 0; + wqcfg.pasid = 0; + iowrite32(wqcfg.bits[2], idxd->reg_base + offset); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + rc = idxd_wq_enable(wq); + if (rc < 0) + return rc; + + return 0; } void idxd_wq_disable_cleanup(struct idxd_wq *wq) @@ -438,6 +491,17 @@ void idxd_device_reset(struct idxd_device *idxd) idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL); } +void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand; + + operand = pasid; + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_DRAIN_PASID, operand); + idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_PASID, operand, NULL); + dev_dbg(dev, "pasid %d drained\n", pasid); +} + /* Device configuration bits */ static void idxd_group_config_write(struct idxd_group *group) { @@ -523,11 +587,22 @@ static int idxd_wq_config_write(struct idxd_wq *wq) wq->wqcfg.wq_thresh = wq->threshold; /* byte 8-11 */ - wq->wqcfg.priv = !!(wq->type == IDXD_WQT_KERNEL); - wq->wqcfg.mode = 1; + wq->wqcfg.priv = wq->type == IDXD_WQT_KERNEL ? 
1 : 0; + if (wq_dedicated(wq)) + wq->wqcfg.mode = 1; + + if (device_pasid_enabled(idxd)) { + wq->wqcfg.pasid_en = 1; + if (wq->type == IDXD_WQT_KERNEL && wq_dedicated(wq)) + wq->wqcfg.pasid = idxd->pasid; + } wq->wqcfg.priority = wq->priority; + if (idxd->hw.gen_cap.block_on_fault && + test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags)) + wq->wqcfg.bof = 1; + /* bytes 12-15 */ wq->wqcfg.max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; wq->wqcfg.max_batch_shift = idxd->hw.gen_cap.max_batch_shift; @@ -635,8 +710,8 @@ static int idxd_wqs_setup(struct idxd_device *idxd) if (!wq->size) continue; - if (!wq_dedicated(wq)) { - dev_warn(dev, "No shared workqueue support.\n"); + if (wq_shared(wq) && !device_pasid_enabled(idxd)) { + dev_warn(dev, "No shared wq support but configured.\n"); return -EINVAL; } diff --git a/drivers/dma/idxd/dma.c b/drivers/dma/idxd/dma.c index 0c892cbd72e0..8ed2773d8285 100644 --- a/drivers/dma/idxd/dma.c +++ b/drivers/dma/idxd/dma.c @@ -61,8 +61,6 @@ static inline void idxd_prep_desc_common(struct idxd_wq *wq, u64 addr_f1, u64 addr_f2, u64 len, u64 compl, u32 flags) { - struct idxd_device *idxd = wq->idxd; - hw->flags = flags; hw->opcode = opcode; hw->src_addr = addr_f1; @@ -70,13 +68,6 @@ static inline void idxd_prep_desc_common(struct idxd_wq *wq, hw->xfer_size = len; hw->priv = !!(wq->type == IDXD_WQT_KERNEL); hw->completion_addr = compl; - - /* - * Descriptor completion vectors are 1-8 for MSIX. We will round - * robin through the 8 vectors. - */ - wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; - hw->int_handle = wq->vec_ptr; } static struct dma_async_tx_descriptor * diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index e62b4799d189..ab31e939dad6 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -59,6 +59,7 @@ enum idxd_wq_state { enum idxd_wq_flag { WQ_FLAG_DEDICATED = 0, + WQ_FLAG_BLOCK_ON_FAULT, }; enum idxd_wq_type { @@ -86,10 +87,11 @@ enum idxd_op_type { enum idxd_complete_type { IDXD_COMPLETE_NORMAL = 0, IDXD_COMPLETE_ABORT, + IDXD_COMPLETE_DEV_FAIL, }; struct idxd_wq { - void __iomem *dportal; + void __iomem *portal; struct device conf_dev; struct idxd_cdev idxd_cdev; struct idxd_device *idxd; @@ -143,6 +145,7 @@ enum idxd_device_state { enum idxd_device_flag { IDXD_FLAG_CONFIGURABLE = 0, IDXD_FLAG_CMD_RUNNING, + IDXD_FLAG_PASID_ENABLED, }; struct idxd_device { @@ -164,6 +167,9 @@ struct idxd_device { struct idxd_wq *wqs; struct idxd_engine *engines; + struct iommu_sva *sva; + unsigned int pasid; + int num_groups; u32 msix_perm_offset; @@ -216,6 +222,16 @@ static inline bool wq_dedicated(struct idxd_wq *wq) return test_bit(WQ_FLAG_DEDICATED, &wq->flags); } +static inline bool wq_shared(struct idxd_wq *wq) +{ + return !test_bit(WQ_FLAG_DEDICATED, &wq->flags); +} + +static inline bool device_pasid_enabled(struct idxd_device *idxd) +{ + return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags); +} + enum idxd_portal_prot { IDXD_PORTAL_UNLIMITED = 0, IDXD_PORTAL_LIMITED, @@ -284,6 +300,7 @@ void idxd_device_reset(struct idxd_device *idxd); void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); +void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); @@ -294,6 +311,8 @@ void idxd_wq_drain(struct idxd_wq *wq); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq 
*wq); +int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); +int idxd_wq_disable_pasid(struct idxd_wq *wq); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index c7c61974f20f..1f23df0f97c1 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -14,6 +14,8 @@ #include #include #include +#include +#include #include #include #include "../dmaengine.h" @@ -28,6 +30,7 @@ MODULE_AUTHOR("Intel Corporation"); static struct idr idxd_idrs[IDXD_TYPE_MAX]; static struct mutex idxd_idr_lock; +static bool support_enqcmd; static struct pci_device_id idxd_pci_tbl[] = { /* DSA ver 1.0 platforms */ @@ -53,6 +56,7 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) struct idxd_irq_entry *irq_entry; int i, msixcnt; int rc = 0; + union msix_perm mperm; msixcnt = pci_msix_vec_count(pdev); if (msixcnt < 0) { @@ -131,6 +135,13 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) idxd_unmask_error_interrupts(idxd); + /* Setup MSIX permission table */ + mperm.bits = 0; + mperm.pasid = idxd->pasid; + mperm.pasid_en = device_pasid_enabled(idxd); + for (i = 1; i < msixcnt; i++) + iowrite32(mperm.bits, idxd->reg_base + idxd->msix_perm_offset + i * 8); + return 0; err_no_irq: @@ -258,8 +269,7 @@ static void idxd_read_caps(struct idxd_device *idxd) } } -static struct idxd_device *idxd_alloc(struct pci_dev *pdev, - void __iomem * const *iomap) +static struct idxd_device *idxd_alloc(struct pci_dev *pdev) { struct device *dev = &pdev->dev; struct idxd_device *idxd; @@ -269,12 +279,45 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, return NULL; idxd->pdev = pdev; - idxd->reg_base = iomap[IDXD_MMIO_BAR]; spin_lock_init(&idxd->dev_lock); return idxd; } +static int idxd_enable_system_pasid(struct idxd_device *idxd) +{ + int flags; + unsigned int pasid; + struct iommu_sva *sva; + + flags = SVM_FLAG_SUPERVISOR_MODE; + + sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags); + if (IS_ERR(sva)) { + dev_warn(&idxd->pdev->dev, + "iommu sva bind failed: %ld\n", PTR_ERR(sva)); + return PTR_ERR(sva); + } + + pasid = iommu_sva_get_pasid(sva); + if (pasid == IOMMU_PASID_INVALID) { + iommu_sva_unbind_device(sva); + return -ENODEV; + } + + idxd->sva = sva; + idxd->pasid = pasid; + dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid); + return 0; +} + +static void idxd_disable_system_pasid(struct idxd_device *idxd) +{ + + iommu_sva_unbind_device(idxd->sva); + idxd->sva = NULL; +} + static int idxd_probe(struct idxd_device *idxd) { struct pci_dev *pdev = idxd->pdev; @@ -285,6 +328,14 @@ static int idxd_probe(struct idxd_device *idxd) idxd_device_init_reset(idxd); dev_dbg(dev, "IDXD reset complete\n"); + if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && support_enqcmd) { + rc = idxd_enable_system_pasid(idxd); + if (rc < 0) + dev_warn(dev, "Failed to enable PASID. 
No SVA support: %d\n", rc); + else + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags); + } + idxd_read_caps(idxd); idxd_read_table_offsets(idxd); @@ -315,29 +366,29 @@ static int idxd_probe(struct idxd_device *idxd) idxd_mask_error_interrupts(idxd); idxd_mask_msix_vectors(idxd); err_setup: + if (device_pasid_enabled(idxd)) + idxd_disable_system_pasid(idxd); return rc; } static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { - void __iomem * const *iomap; struct device *dev = &pdev->dev; struct idxd_device *idxd; int rc; - unsigned int mask; rc = pcim_enable_device(pdev); if (rc) return rc; - dev_dbg(dev, "Mapping BARs\n"); - mask = (1 << IDXD_MMIO_BAR); - rc = pcim_iomap_regions(pdev, mask, DRV_NAME); - if (rc) - return rc; + dev_dbg(dev, "Alloc IDXD context\n"); + idxd = idxd_alloc(pdev); + if (!idxd) + return -ENOMEM; - iomap = pcim_iomap_table(pdev); - if (!iomap) + dev_dbg(dev, "Mapping BARs\n"); + idxd->reg_base = pcim_iomap(pdev, IDXD_MMIO_BAR, 0); + if (!idxd->reg_base) return -ENOMEM; dev_dbg(dev, "Set DMA masks\n"); @@ -353,11 +404,6 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (rc) return rc; - dev_dbg(dev, "Alloc IDXD context\n"); - idxd = idxd_alloc(pdev, iomap); - if (!idxd) - return -ENOMEM; - idxd_set_type(idxd); dev_dbg(dev, "Set PCI master\n"); @@ -445,6 +491,8 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); idxd_cleanup_sysfs(idxd); idxd_shutdown(pdev); + if (device_pasid_enabled(idxd)) + idxd_disable_system_pasid(idxd); mutex_lock(&idxd_idr_lock); idr_remove(&idxd_idrs[idxd->type], idxd->id); mutex_unlock(&idxd_idr_lock); @@ -463,7 +511,7 @@ static int __init idxd_init_module(void) int err, i; /* - * If the CPU does not support write512, there's no point in + * If the CPU does not support MOVDIR64B or ENQCMDS, there's no point in * enumerating the device. We can not utilize it. */ if (!boot_cpu_has(X86_FEATURE_MOVDIR64B)) { @@ -471,8 +519,10 @@ static int __init idxd_init_module(void) return -ENODEV; } - pr_info("%s: Intel(R) Accelerator Devices Driver %s\n", - DRV_NAME, IDXD_DRIVER_VERSION); + if (!boot_cpu_has(X86_FEATURE_ENQCMD)) + pr_warn("Platform does not have ENQCMD(S) support.\n"); + else + support_enqcmd = true; mutex_init(&idxd_idr_lock); for (i = 0; i < IDXD_TYPE_MAX; i++) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index a39e7ae6b3d9..a0df4f3fe1fb 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -333,4 +333,18 @@ union wqcfg { }; u32 bits[8]; } __packed; + +/* + * This macro calculates the offset into the WQCFG register + * idxd - struct idxd * + * n - wq id + * ofs - the index of the 32b dword for the config register + * + * The WQCFG register block is divided into groups per each wq. The n index + * allows us to move to the register group that's for that particular wq. + * Each register is 32bits. The ofs gives us the number of register to access. 
+ */ +#define WQCFG_OFFSET(idxd_dev, n, ofs) ((idxd_dev)->wqcfg_offset +\ + (n) * sizeof(union wqcfg) +\ + sizeof(u32) * (ofs)) #endif diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index 156a1ee233aa..3e63d820a98e 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -11,11 +11,22 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu) { struct idxd_desc *desc; + struct idxd_device *idxd = wq->idxd; desc = wq->descs[idx]; memset(desc->hw, 0, sizeof(struct dsa_hw_desc)); memset(desc->completion, 0, sizeof(struct dsa_completion_record)); desc->cpu = cpu; + + if (device_pasid_enabled(idxd)) + desc->hw->pasid = idxd->pasid; + + /* + * Descriptor completion vectors are 1-8 for MSIX. We will round + * robin through the 8 vectors. + */ + wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; + desc->hw->int_handle = wq->vec_ptr; return desc; } @@ -74,14 +85,26 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) if (idxd->state != IDXD_DEV_ENABLED) return -EIO; - portal = wq->dportal + idxd_get_wq_portal_offset(IDXD_PORTAL_UNLIMITED); + portal = wq->portal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED); + /* - * The wmb() flushes writes to coherent DMA data before possibly - * triggering a DMA read. The wmb() is necessary even on UP because - * the recipient is a device. + * The wmb() flushes writes to coherent DMA data before + * possibly triggering a DMA read. The wmb() is necessary + * even on UP because the recipient is a device. */ wmb(); - iosubmit_cmds512(portal, desc->hw, 1); + if (wq_dedicated(wq)) { + iosubmit_cmds512(portal, desc->hw, 1); + } else { + /* + * It's not likely that we would receive queue full rejection + * since the descriptor allocation gates at wq size. If we + * receive a 1, that means something went wrong such as the + * device is not accepting descriptor at all. + */ + if (!enqcmds(portal, desc->hw)) + return -ENXIO; + } /* * Pending the descriptor to the lockless list for the irq_entry diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index dcba60953217..bd216160634f 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -175,6 +175,30 @@ static int idxd_config_bus_probe(struct device *dev) return -EINVAL; } + /* Shared WQ checks */ + if (wq_shared(wq)) { + if (!device_pasid_enabled(idxd)) { + dev_warn(dev, + "PASID not enabled and shared WQ.\n"); + mutex_unlock(&wq->wq_lock); + return -ENXIO; + } + /* + * Shared wq with the threshold set to 0 means the user + * did not set the threshold or transitioned from a + * dedicated wq but did not set threshold. A value + * of 0 would effectively disable the shared wq. The + * driver does not allow a value of 0 to be set for + * threshold via sysfs. 
+ */ + if (wq->threshold == 0) { + dev_warn(dev, + "Shared WQ and threshold 0.\n"); + mutex_unlock(&wq->wq_lock); + return -EINVAL; + } + } + rc = idxd_wq_alloc_resources(wq); if (rc < 0) { mutex_unlock(&wq->wq_lock); @@ -875,6 +899,8 @@ static ssize_t wq_mode_store(struct device *dev, if (sysfs_streq(buf, "dedicated")) { set_bit(WQ_FLAG_DEDICATED, &wq->flags); wq->threshold = 0; + } else if (sysfs_streq(buf, "shared") && device_pasid_enabled(idxd)) { + clear_bit(WQ_FLAG_DEDICATED, &wq->flags); } else { return -EINVAL; } @@ -973,6 +999,87 @@ static ssize_t wq_priority_store(struct device *dev, static struct device_attribute dev_attr_wq_priority = __ATTR(priority, 0644, wq_priority_show, wq_priority_store); +static ssize_t wq_block_on_fault_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + + return sprintf(buf, "%u\n", + test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags)); +} + +static ssize_t wq_block_on_fault_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + struct idxd_device *idxd = wq->idxd; + bool bof; + int rc; + + if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) + return -EPERM; + + if (wq->state != IDXD_WQ_DISABLED) + return -ENXIO; + + rc = kstrtobool(buf, &bof); + if (rc < 0) + return rc; + + if (bof) + set_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags); + else + clear_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags); + + return count; +} + +static struct device_attribute dev_attr_wq_block_on_fault = + __ATTR(block_on_fault, 0644, wq_block_on_fault_show, + wq_block_on_fault_store); + +static ssize_t wq_threshold_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + + return sprintf(buf, "%u\n", wq->threshold); +} + +static ssize_t wq_threshold_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct idxd_wq *wq = container_of(dev, struct idxd_wq, conf_dev); + struct idxd_device *idxd = wq->idxd; + unsigned int val; + int rc; + + rc = kstrtouint(buf, 0, &val); + if (rc < 0) + return -EINVAL; + + if (val > wq->size || val <= 0) + return -EINVAL; + + if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) + return -EPERM; + + if (wq->state != IDXD_WQ_DISABLED) + return -ENXIO; + + if (test_bit(WQ_FLAG_DEDICATED, &wq->flags)) + return -EINVAL; + + wq->threshold = val; + + return count; +} + +static struct device_attribute dev_attr_wq_threshold = + __ATTR(threshold, 0644, wq_threshold_show, wq_threshold_store); + static ssize_t wq_type_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1044,6 +1151,13 @@ static ssize_t wq_name_store(struct device *dev, if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0) return -EINVAL; + /* + * This is temporarily placed here until we have SVM support for + * dmaengine. 
+ */ + if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd)) + return -EOPNOTSUPP; + memset(wq->name, 0, WQ_NAME_SIZE + 1); strncpy(wq->name, buf, WQ_NAME_SIZE); strreplace(wq->name, '\n', '\0'); @@ -1071,6 +1185,8 @@ static struct attribute *idxd_wq_attributes[] = { &dev_attr_wq_mode.attr, &dev_attr_wq_size.attr, &dev_attr_wq_priority.attr, + &dev_attr_wq_block_on_fault.attr, + &dev_attr_wq_threshold.attr, &dev_attr_wq_type.attr, &dev_attr_wq_name.attr, &dev_attr_wq_cdev_minor.attr, @@ -1220,6 +1336,16 @@ static ssize_t clients_show(struct device *dev, } static DEVICE_ATTR_RO(clients); +static ssize_t pasid_enabled_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = + container_of(dev, struct idxd_device, conf_dev); + + return sprintf(buf, "%u\n", device_pasid_enabled(idxd)); +} +static DEVICE_ATTR_RO(pasid_enabled); + static ssize_t state_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1330,6 +1456,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_gen_cap.attr, &dev_attr_configurable.attr, &dev_attr_clients.attr, + &dev_attr_pasid_enabled.attr, &dev_attr_state.attr, &dev_attr_errors.attr, &dev_attr_max_tokens.attr, From patchwork Mon Jul 6 18:21:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646461 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7E7E11398 for ; Mon, 6 Jul 2020 18:22:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6999620702 for ; Mon, 6 Jul 2020 18:22:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729743AbgGFSWB (ORCPT ); Mon, 6 Jul 2020 14:22:01 -0400 Received: from mga05.intel.com ([192.55.52.43]:2390 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSWA (ORCPT ); Mon, 6 Jul 2020 14:22:00 -0400 IronPort-SDR: H9Ins1009DlVb9qOUagcPF4h2ogsaPoTuogkwuee9Vsn/53bXgSvsnu4sseC1dfGclI6VPGCB/ Y3sTeuW1LwLw== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="232334042" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="232334042" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:21:56 -0700 IronPort-SDR: vEq1dLRVboRKtmkQk7QG5u8f3HQsKMF8qQlQHOK4pwND+lrLj29S7B+c6a0eX+1Tr7A5Qnn+Wx qJjcbDeva2UA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="314037038" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga008.jf.intel.com with ESMTP; 06 Jul 2020 11:21:55 -0700 Subject: [PATCH v3 5/6] dmaengine: idxd: clean up descriptors with fault error From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Tony Luck , Dan Williams , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:21:55 -0700 Message-ID: <159405971555.19216.16912463239825977485.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: 
<159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org Add code to "complete" a descriptor when the descriptor or its completion address hit a fault error when SVA mode is being used. This error can be triggered due to bad programming by the user. A lock is introduced in order to protect the descriptor completion lists since the fault handler will run from the system work queue after being scheduled in the interrupt handler. Signed-off-by: Dave Jiang Reviewed-by: Tony Luck Reviewed-by: Dan Williams --- drivers/dma/idxd/idxd.h | 5 ++ drivers/dma/idxd/init.c | 1 drivers/dma/idxd/irq.c | 143 +++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 137 insertions(+), 12 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index ab31e939dad6..518768885d7b 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -34,6 +34,11 @@ struct idxd_irq_entry { int id; struct llist_head pending_llist; struct list_head work_list; + /* + * Lock to protect access between irq thread process descriptor + * and irq thread processing error descriptor. + */ + spinlock_t list_lock; }; struct idxd_group { diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 1f23df0f97c1..a7e1dbfcd173 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -96,6 +96,7 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) for (i = 0; i < msixcnt; i++) { idxd->irq_entries[i].id = i; idxd->irq_entries[i].idxd = idxd; + spin_lock_init(&idxd->irq_entries[i].list_lock); } msix = &idxd->msix_entries[0]; diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index b5142556cc4e..c97e08480323 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -11,6 +11,24 @@ #include "idxd.h" #include "registers.h" +enum irq_work_type { + IRQ_WORK_NORMAL = 0, + IRQ_WORK_PROCESS_FAULT, +}; + +struct idxd_fault { + struct work_struct work; + u64 addr; + struct idxd_device *idxd; +}; + +static int irq_process_work_list(struct idxd_irq_entry *irq_entry, + enum irq_work_type wtype, + int *processed, u64 data); +static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry, + enum irq_work_type wtype, + int *processed, u64 data); + void idxd_device_wqs_clear_state(struct idxd_device *idxd) { int i; @@ -56,6 +74,46 @@ static void idxd_device_reinit(struct work_struct *work) idxd_device_wqs_clear_state(idxd); } +static void idxd_device_fault_work(struct work_struct *work) +{ + struct idxd_fault *fault = container_of(work, struct idxd_fault, work); + struct idxd_irq_entry *ie; + int i; + int processed; + int irqcnt = fault->idxd->num_wq_irqs + 1; + + for (i = 1; i < irqcnt; i++) { + ie = &fault->idxd->irq_entries[i]; + irq_process_work_list(ie, IRQ_WORK_PROCESS_FAULT, + &processed, fault->addr); + if (processed) + break; + + irq_process_pending_llist(ie, IRQ_WORK_PROCESS_FAULT, + &processed, fault->addr); + if (processed) + break; + } + + kfree(fault); +} + +static int idxd_device_schedule_fault_process(struct idxd_device *idxd, + u64 fault_addr) +{ + struct idxd_fault *fault; + + fault = kmalloc(sizeof(*fault), GFP_ATOMIC); + if (!fault) + return -ENOMEM; + + fault->addr = fault_addr; + fault->idxd = idxd; + INIT_WORK(&fault->work, idxd_device_fault_work); + queue_work(idxd->wq, &fault->work); + return 0; +} + irqreturn_t idxd_irq_handler(int vec, void *data) { struct 
idxd_irq_entry *irq_entry = data; @@ -137,6 +195,16 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (!err) goto out; + /* + * This case should rarely happen and typically is due to software + * programming error by the driver. + */ + if (idxd->sw_err.valid && + idxd->sw_err.desc_valid && + idxd->sw_err.fault_addr) + idxd_device_schedule_fault_process(idxd, + idxd->sw_err.fault_addr); + gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET); if (gensts.state == IDXD_DEVICE_STATE_HALT) { idxd->state = IDXD_DEV_HALTED; @@ -164,57 +232,106 @@ irqreturn_t idxd_misc_thread(int vec, void *data) return IRQ_HANDLED; } +static bool process_fault(struct idxd_desc *desc, u64 fault_addr) +{ + if ((u64)desc->hw == fault_addr || + (u64)desc->completion == fault_addr) { + idxd_dma_complete_txd(desc, IDXD_COMPLETE_DEV_FAIL); + return true; + } + + return false; +} + +static bool complete_desc(struct idxd_desc *desc) +{ + if (desc->completion->status) { + idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL); + return true; + } + + return false; +} + static int irq_process_pending_llist(struct idxd_irq_entry *irq_entry, - int *processed) + enum irq_work_type wtype, + int *processed, u64 data) { struct idxd_desc *desc, *t; struct llist_node *head; int queued = 0; + bool completed = false; + unsigned long flags; *processed = 0; head = llist_del_all(&irq_entry->pending_llist); if (!head) - return 0; + goto out; llist_for_each_entry_safe(desc, t, head, llnode) { - if (desc->completion->status) { - idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL); + if (wtype == IRQ_WORK_NORMAL) + completed = complete_desc(desc); + else if (wtype == IRQ_WORK_PROCESS_FAULT) + completed = process_fault(desc, data); + + if (completed) { idxd_free_desc(desc->wq, desc); (*processed)++; + if (wtype == IRQ_WORK_PROCESS_FAULT) + break; } else { - list_add_tail(&desc->list, &irq_entry->work_list); + spin_lock_irqsave(&irq_entry->list_lock, flags); + list_add_tail(&desc->list, + &irq_entry->work_list); + spin_unlock_irqrestore(&irq_entry->list_lock, flags); queued++; } } + out: return queued; } static int irq_process_work_list(struct idxd_irq_entry *irq_entry, - int *processed) + enum irq_work_type wtype, + int *processed, u64 data) { struct list_head *node, *next; int queued = 0; + bool completed = false; + unsigned long flags; *processed = 0; + spin_lock_irqsave(&irq_entry->list_lock, flags); if (list_empty(&irq_entry->work_list)) - return 0; + goto out; list_for_each_safe(node, next, &irq_entry->work_list) { struct idxd_desc *desc = container_of(node, struct idxd_desc, list); - if (desc->completion->status) { + spin_unlock_irqrestore(&irq_entry->list_lock, flags); + if (wtype == IRQ_WORK_NORMAL) + completed = complete_desc(desc); + else if (wtype == IRQ_WORK_PROCESS_FAULT) + completed = process_fault(desc, data); + + if (completed) { + spin_lock_irqsave(&irq_entry->list_lock, flags); list_del(&desc->list); - /* process and callback */ - idxd_dma_complete_txd(desc, IDXD_COMPLETE_NORMAL); + spin_unlock_irqrestore(&irq_entry->list_lock, flags); idxd_free_desc(desc->wq, desc); (*processed)++; + if (wtype == IRQ_WORK_PROCESS_FAULT) + return queued; } else { queued++; } + spin_lock_irqsave(&irq_entry->list_lock, flags); } + out: + spin_unlock_irqrestore(&irq_entry->list_lock, flags); return queued; } @@ -242,12 +359,14 @@ static int idxd_desc_process(struct idxd_irq_entry *irq_entry) * 5. Repeat until no more descriptors. 
*/ do { - rc = irq_process_work_list(irq_entry, &processed); + rc = irq_process_work_list(irq_entry, IRQ_WORK_NORMAL, + &processed, 0); total += processed; if (rc != 0) continue; - rc = irq_process_pending_llist(irq_entry, &processed); + rc = irq_process_pending_llist(irq_entry, IRQ_WORK_NORMAL, + &processed, 0); total += processed; } while (rc != 0); From patchwork Mon Jul 6 18:22:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11646463 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7300E1398 for ; Mon, 6 Jul 2020 18:22:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 647492075B for ; Mon, 6 Jul 2020 18:22:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729750AbgGFSWE (ORCPT ); Mon, 6 Jul 2020 14:22:04 -0400 Received: from mga07.intel.com ([134.134.136.100]:55280 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729647AbgGFSWD (ORCPT ); Mon, 6 Jul 2020 14:22:03 -0400 IronPort-SDR: 9pD/5DerAwCwSyfQWsNu2vzvNOzf3p319ejckPWeFW08yageC1bZ2ItHsEI3NLlSuKpELhA68L Ax4hOQwE9u7Q== X-IronPort-AV: E=McAfee;i="6000,8403,9673"; a="212454138" X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="212454138" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Jul 2020 11:22:02 -0700 IronPort-SDR: oN9m+rIQCN3syuIFmMlur2gPncmerVM4+pwhjHFM5sag09qNc72pYXSYJPgdQYw9tFcjioVOOt AWvVDpXhwOUA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,320,1589266800"; d="scan'208";a="427187389" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga004.jf.intel.com with ESMTP; 06 Jul 2020 11:22:01 -0700 Subject: [PATCH v3 6/6] dmaengine: idxd: add ABI documentation for shared wq From: Dave Jiang To: vkoul@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de Cc: Jing Lin , Tony Luck , Dan Williams , dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, dan.j.williams@intel.com, ashok.raj@intel.com, fenghua.yu@intel.com, tony.luck@intel.com, jing.lin@intel.com Date: Mon, 06 Jul 2020 11:22:01 -0700 Message-ID: <159405972142.19216.202541495863164149.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> References: <159405827797.19216.15283540319201919054.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org Add the sysfs attribute bits in ABI/stable for shared wq support. 
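As a rough illustration of how userspace might consume the new attributes (the device instance name "dsa0" below is hypothetical; actual names depend on enumeration order):

	#include <stdio.h>

	int main(void)
	{
		char buf[8] = "";
		FILE *f = fopen("/sys/bus/dsa/devices/dsa0/pasid_enabled", "r");

		if (!f)
			return 1;
		if (fgets(buf, sizeof(buf), f))
			printf("pasid_enabled: %s", buf);
		fclose(f);
		return 0;
	}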
Signed-off-by: Jing Lin Signed-off-by: Dave Jiang Reviewed-by: Tony Luck Reviewed-by: Dan Williams --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index 1af9c4175213..a336d2b2009a 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -77,6 +77,13 @@ Contact: dmaengine@vger.kernel.org Description: The operation capability bit mask specify the operation types supported by the this device. +What: /sys/bus/dsa/devices/dsa/pasid_enabled +Date: Jul 5, 2020 +KernelVersion: 5.9.0 +Contact: dmaengine@vger.kernel.org +Description: To indicate if PASID (process address space identifier) is + enabled or not for this device. + What: /sys/bus/dsa/devices/dsa/state Date: Oct 25, 2019 KernelVersion: 5.6.0 @@ -116,6 +123,13 @@ Description: The maximum number of bandwidth tokens that may be in use at one time by operations that access low bandwidth memory in the device. +What: /sys/bus/dsa/devices/wq./block_on_fault +Date: Jul 5, 2020 +KernelVersion: 5.9.0 +Contact: dmaengine@vger.kernel.org +Description: To indicate block on fault is allowed or not for the work queue + to support on demand paging. + What: /sys/bus/dsa/devices/wq./group_id Date: Oct 25, 2019 KernelVersion: 5.6.0