[1/2] x86/io: add interface to reserve io memtype for a resource range. (v1.1)
diff mbox

Message ID 1477290706-7696-2-git-send-email-airlied@redhat.com
State New
Headers show

Commit Message

David Airlie Oct. 24, 2016, 6:31 a.m. UTC
A recent change to the mm code in:
87744ab3832b83ba71b931f86f9cfdb000d07da5
mm: fix cache mode tracking in vm_insert_mixed()

started enforcing checking the memory type against the registered list for
amixed pfn insertion mappings. It happens that the drm drivers for a number
of gpus relied on this being broken. Currently the driver only inserted
VRAM mappings into the tracking table when they came from the kernel,
and userspace mappings never landed in the table. This led to a regression
where all the mapping end up as UC instead of WC now.

I've considered a number of solutions but since this needs to be fixed
in fixes and not next, and some of the solutions were going to introduce
overhead that hadn't been there before I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but these API have a bunch of fast paths I didn't want to unwind to add
this to.

The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mapping that won't get degraded to UC.

v1.1: use CONFIG_X86_PAT
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Cc: mcgrof@suse.com
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
---
 arch/x86/include/asm/io.h |  6 ++++++
 arch/x86/mm/pat.c         | 13 +++++++++++++
 include/linux/io.h        | 13 +++++++++++++
 3 files changed, 32 insertions(+)

Comments

Ingo Molnar Oct. 25, 2016, 9:40 a.m. UTC | #1
* Dave Airlie <airlied@redhat.com> wrote:

> A recent change to the mm code in:
> 87744ab3832b83ba71b931f86f9cfdb000d07da5
> mm: fix cache mode tracking in vm_insert_mixed()
> 
> started enforcing checking the memory type against the registered list for
> amixed pfn insertion mappings. It happens that the drm drivers for a number
> of gpus relied on this being broken. Currently the driver only inserted
> VRAM mappings into the tracking table when they came from the kernel,
> and userspace mappings never landed in the table. This led to a regression
> where all the mapping end up as UC instead of WC now.
> 
> I've considered a number of solutions but since this needs to be fixed
> in fixes and not next, and some of the solutions were going to introduce
> overhead that hadn't been there before I didn't consider them viable at
> this stage. These mainly concerned hooking into the TTM io reserve APIs,
> but these API have a bunch of fast paths I didn't want to unwind to add
> this to.
> 
> The solution I've decided on is to add a new API like the arch_phys_wc
> APIs (these would have worked but wc_del didn't take a range), and
> use them from the drivers to add a WC compatible mapping to the table
> for all VRAM on those GPUs. This means we can then create userspace
> mapping that won't get degraded to UC.
> 
> v1.1: use CONFIG_X86_PAT
> Cc: Toshi Kani <toshi.kani@hp.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: Brian Gerst <brgerst@gmail.com>
> Cc: x86@kernel.org
> Cc: mcgrof@suse.com
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Dave Airlie <airlied@redhat.com>
> ---
>  arch/x86/include/asm/io.h |  6 ++++++
>  arch/x86/mm/pat.c         | 13 +++++++++++++
>  include/linux/io.h        | 13 +++++++++++++
>  3 files changed, 32 insertions(+)

These changes look good to me in principle:

  Acked-by: Ingo Molnar <mingo@kernel.org>

I think it would be best to merge these fixes via the DRM tree?

Thanks,

	Ingo
Thomas Gleixner Oct. 25, 2016, 11:10 a.m. UTC | #2
On Mon, 24 Oct 2016, Dave Airlie wrote:
> A recent change to the mm code in:
> 87744ab3832b83ba71b931f86f9cfdb000d07da5

nit: 12 digits of the SHA1 are sufficient :)

> +int arch_io_reserve_memtype_wc(resource_size_t start, resource_size_t size)
> +{
> +	enum page_cache_mode type = _PAGE_CACHE_MODE_WC;

Empty line between variable declaration and code please

> +	return io_reserve_memtype(start, start + size, &type);

Other than that:

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Luis Chamberlain Oct. 25, 2016, 5:31 p.m. UTC | #3
On Mon, Oct 24, 2016 at 04:31:45PM +1000, Dave Airlie wrote:
> A recent change to the mm code in:
> 87744ab3832b83ba71b931f86f9cfdb000d07da5
> mm: fix cache mode tracking in vm_insert_mixed()
> 
> started enforcing checking the memory type against the registered list for
> amixed pfn insertion mappings. It happens that the drm drivers for a number
> of gpus relied on this being broken. Currently the driver only inserted
> VRAM mappings into the tracking table when they came from the kernel,
> and userspace mappings never landed in the table. This led to a regression
> where all the mapping end up as UC instead of WC now.

Eek.

> I've considered a number of solutions but since this needs to be fixed
> in fixes and not next, and some of the solutions were going to introduce
> overhead that hadn't been there before I didn't consider them viable at
> this stage. These mainly concerned hooking into the TTM io reserve APIs,
> but these API have a bunch of fast paths I didn't want to unwind to add
> this to.
> 
> The solution I've decided on is to add a new API like the arch_phys_wc
> APIs (these would have worked but wc_del didn't take a range), and
> use them from the drivers to add a WC compatible mapping to the table
> for all VRAM on those GPUs. This means we can then create userspace
> mapping that won't get degraded to UC.

Is anything on a driver to be able to tell when this is actually needed ?
How will driver developers know? Can you add a bit of documentation to
the API? If its transitive towards a secondary solution indicating so
would help driver developers.

  Luis
Daniel Vetter Oct. 26, 2016, 5:49 a.m. UTC | #4
On Tue, Oct 25, 2016 at 07:31:29PM +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 24, 2016 at 04:31:45PM +1000, Dave Airlie wrote:
> > A recent change to the mm code in:
> > 87744ab3832b83ba71b931f86f9cfdb000d07da5
> > mm: fix cache mode tracking in vm_insert_mixed()
> > 
> > started enforcing checking the memory type against the registered list for
> > amixed pfn insertion mappings. It happens that the drm drivers for a number
> > of gpus relied on this being broken. Currently the driver only inserted
> > VRAM mappings into the tracking table when they came from the kernel,
> > and userspace mappings never landed in the table. This led to a regression
> > where all the mapping end up as UC instead of WC now.
> 
> Eek.
> 
> > I've considered a number of solutions but since this needs to be fixed
> > in fixes and not next, and some of the solutions were going to introduce
> > overhead that hadn't been there before I didn't consider them viable at
> > this stage. These mainly concerned hooking into the TTM io reserve APIs,
> > but these API have a bunch of fast paths I didn't want to unwind to add
> > this to.
> > 
> > The solution I've decided on is to add a new API like the arch_phys_wc
> > APIs (these would have worked but wc_del didn't take a range), and
> > use them from the drivers to add a WC compatible mapping to the table
> > for all VRAM on those GPUs. This means we can then create userspace
> > mapping that won't get degraded to UC.
> 
> Is anything on a driver to be able to tell when this is actually needed ?
> How will driver developers know? Can you add a bit of documentation to
> the API? If its transitive towards a secondary solution indicating so
> would help driver developers.

I'll plug the io-mapping stuff again here, and more specifically the
userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
probably move that one to the core. That way io_mapping takes care of the
full reservartion, and allows you to on-demand kmap (for kernel) and write
ptes. All nicely fast and all, and for bonus, also nicely encapsulated.
-Daniel
Dave Airlie Oct. 26, 2016, 6:12 a.m. UTC | #5
>>
>> Is anything on a driver to be able to tell when this is actually needed ?
>> How will driver developers know? Can you add a bit of documentation to
>> the API? If its transitive towards a secondary solution indicating so
>> would help driver developers.
>
> I'll plug the io-mapping stuff again here, and more specifically the
> userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
> probably move that one to the core. That way io_mapping takes care of the
> full reservartion, and allows you to on-demand kmap (for kernel) and write
> ptes. All nicely fast and all, and for bonus, also nicely encapsulated.

Yeah I think ideally we'd want to move towards that, however we don't tend
to want to ioremap the full range even on 64-bit, which is what io-mapping does.

At least on most GPUs with VRAM we rarely want to map VRAM for much,
I think page tables and fbcon are probably the main two uses for touch
it at all.

So I don't think we need to be as efficient as i915 in this area.

Dave.
Daniel Vetter Oct. 26, 2016, 7 a.m. UTC | #6
On Wed, Oct 26, 2016 at 8:12 AM, Dave Airlie <airlied@gmail.com> wrote:
>>> Is anything on a driver to be able to tell when this is actually needed ?
>>> How will driver developers know? Can you add a bit of documentation to
>>> the API? If its transitive towards a secondary solution indicating so
>>> would help driver developers.
>>
>> I'll plug the io-mapping stuff again here, and more specifically the
>> userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
>> probably move that one to the core. That way io_mapping takes care of the
>> full reservartion, and allows you to on-demand kmap (for kernel) and write
>> ptes. All nicely fast and all, and for bonus, also nicely encapsulated.
>
> Yeah I think ideally we'd want to move towards that, however we don't tend
> to want to ioremap the full range even on 64-bit, which is what io-mapping does.

Hm, I thought on 64 we have linear mappings of all the io space
anyway, and they're essentially for free. Am I wrong and there's some
overhead here too?
-Daniel

Patch
diff mbox

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index de25aad..d34bd37 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -351,4 +351,10 @@  extern void arch_phys_wc_del(int handle);
 #define arch_phys_wc_add arch_phys_wc_add
 #endif
 
+#ifdef CONFIG_X86_PAT
+extern int arch_io_reserve_memtype_wc(resource_size_t start, resource_size_t size);
+extern void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size);
+#define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
+#endif
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 170cc4f..49d1b75 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -730,6 +730,19 @@  void io_free_memtype(resource_size_t start, resource_size_t end)
 	free_memtype(start, end);
 }
 
+int arch_io_reserve_memtype_wc(resource_size_t start, resource_size_t size)
+{
+	enum page_cache_mode type = _PAGE_CACHE_MODE_WC;
+	return io_reserve_memtype(start, start + size, &type);
+}
+EXPORT_SYMBOL(arch_io_reserve_memtype_wc);
+
+void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size)
+{
+	io_free_memtype(start, start + size);
+}
+EXPORT_SYMBOL(arch_io_free_memtype_wc);
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				unsigned long size, pgprot_t vma_prot)
 {
diff --git a/include/linux/io.h b/include/linux/io.h
index e2c8419..963ab71 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -141,4 +141,17 @@  enum {
 void *memremap(resource_size_t offset, size_t size, unsigned long flags);
 void memunmap(void *addr);
 
+#ifndef arch_io_reserve_memtype_wc
+static inline int arch_io_reserve_memtype_wc(resource_size_t base,
+					     resource_size_t size)
+{
+	return 0;
+}
+
+static inline void arch_io_free_memtype_wc(resource_size_t base,
+					   resource_size_t size)
+{
+}
+#endif
+
 #endif /* _LINUX_IO_H */