drm: return false in drm_arch_can_wc_memory() for ARM

Message ID 20181220145657.304-1-alexander.deucher@amd.com
State New

Commit Message

Alex Deucher Dec. 20, 2018, 2:56 p.m. UTC
I'm not familiar enough with ARM to know if write combining
is actually an architectural limitation or if it's an issue
with the PCIe IPs used on various platforms, but so far
everyone that has tried to run radeon hardware on
ARM has had to disable it.  So let's just make it official.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 include/drm/drm_cache.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Daniel Vetter Dec. 20, 2018, 3:36 p.m. UTC | #1
On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> I'm not familiar enough with ARM to know if write combining
> is actually an architectural limitation or if it's an issue
> with the PCIe IPs used on various platforms, but so far
> everyone that has tried to run radeon hardware on
> ARM has had to disable it.  So let's just make it official.

wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
mappings and stuff, so you need to allocate your wc memory from special
pools. So probably best to just disable it until we figure this out.
 
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  include/drm/drm_cache.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639df02d..691b4c4b0587 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
>  	return false;
>  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
>  	return false;
> +#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +	return false;
>  #else
>  	return true;
>  #endif
> -- 
> 2.13.6
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Liviu Dudau Dec. 21, 2018, 2:16 p.m. UTC | #2
On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > I'm not familiar enough with ARM to know if write combining
> > is actually an architectural limitation or if it's an issue
> > with the PCIe IPs used on various platforms, but so far
> > everyone that has tried to run radeon hardware on
> > ARM has had to disable it.  So let's just make it official.
> 
> wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> mappings and stuff, so you need to allocate your wc memory from special
> pools. So probably best to just disable it until we figure this out.

I believe both of you are conflating different issues under the wrong
name. Write combining happens all the time on Arm: ARMv8 has a
weakly-ordered memory model, so hardware is allowed to re-order or
combine memory accesses as it sees fit.

A while ago I did run an AMD GPU card on my Juno dev board and it worked
(for a very limited definition of worked, I've only validated the fact
that I could get an fbcon and could run un-accelerated X11). So I would
be interested if Alex could share some of the scenarios where people are
seeing failures.

As for aliasing, yeah, having multiple aliases to the same piece of
memory is a bad thing. The problem arises when devices on the PCI bus
have memory allocated as device memory (which on Arm is non-cacheable
and non-reorderable), but the PCI bus effectively acts as a write-combiner
which changes the order of transactions. Therefore, for devices that
have local memory associated with them (i.e. more than just register
accesses), one should map that memory as Device-GRE (gathering,
reordering and early write acknowledgement) in the first place.
Otherwise, problems will surface that are not visible on x86, as that
is a strongly ordered architecture.
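
To make the Device-GRE terminology above concrete, here is a minimal
sketch in C that decodes the baseline ARMv8-A MAIR_ELx attribute byte for
Device memory (0b0000dd00) into its gathering, reordering and early write
acknowledgement properties. The enum names, struct device_attr and
decode_device_attr() are hypothetical helpers for illustration only; they
are not kernel APIs, and nothing here claims to be how any driver in this
thread sets up its mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Baseline ARMv8-A MAIR_ELx encodings for Device memory: 0b0000dd00. */
enum {
	MAIR_DEVICE_nGnRnE = 0x00,	/* strictest: no G, no R, no E */
	MAIR_DEVICE_nGnRE  = 0x04,	/* early write acknowledgement only */
	MAIR_DEVICE_nGRE   = 0x08,	/* reordering + early write ack */
	MAIR_DEVICE_GRE    = 0x0c,	/* gathering + reordering + early ack */
};

static_assert((MAIR_DEVICE_GRE & 0xf3) == 0,
	      "Device encodings only use bits [3:2]");

/* Hypothetical container for the three Device memory properties. */
struct device_attr {
	bool gathering;		/* accesses may be merged */
	bool reordering;	/* accesses may be reordered */
	bool early_ack;		/* writes may be acknowledged early */
};

/* Returns false when attr is not a Device-type encoding. */
static bool decode_device_attr(uint8_t attr, struct device_attr *out)
{
	if (attr & 0xf3)	/* bits [7:4] and [1:0] must be zero */
		return false;

	uint8_t dd = (attr >> 2) & 0x3;

	out->early_ack  = dd >= 1;
	out->reordering = dd >= 2;
	out->gathering  = dd == 3;
	return true;
}
```

Passing MAIR_DEVICE_GRE sets all three properties, while an encoding such
as 0x44 (Normal memory, non-cacheable) is rejected because it is not a
Device type at all.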

>  
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Given that this API is only used by AMD I'm OK for now with the change,
but I think in general it is misleading and we should work towards
fixing radeon and amd drivers.

Best regards,
Liviu

Alex Deucher Dec. 21, 2018, 2:48 p.m. UTC | #3
On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@arm.com> wrote:
>
> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > > I'm not familiar enough with ARM to know if write combining
> > > is actually an architectural limitation or if it's an issue
> > > with the PCIe IPs used on various platforms, but so far
> > > everyone that has tried to run radeon hardware on
> > > ARM has had to disable it.  So let's just make it official.
> >
> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> > mappings and stuff, so you need to allocate your wc memory from special
> > pools. So probably best to just disable it until we figure this out.
>
> I believe both of you are conflating different issues under the wrong
> name. Write combining happens all the time on Arm: ARMv8 has a
> weakly-ordered memory model, so hardware is allowed to re-order or
> combine memory accesses as it sees fit.
>
> A while ago I did run an AMD GPU card on my Juno dev board and it worked
> (for a very limited definition of worked, I've only validated the fact
> that I could get an fbcon and could run un-accelerated X11). So I would
> be interested if Alex could share some of the scenarios where people are
> seeing failures.

Here's an example:
https://bugs.freedesktop.org/show_bug.cgi?id=108625
But there are probably 5 or 6 other cases where people have emailed me
or our team directly with issues on ARM that were resolved by disabling
WC. Generally the driver seems to load OK, but then hangs as soon as you
try to use acceleration from userspace, or we end up with page
flipping timeouts.  Not really sure what the issue is.  Michel
suggested that ARM may have a cacheable kernel mapping of all "normal"
system memory, and that having both that mapping and another
non-cacheable mapping of the same page can result in bad behaviour.

>
> As for aliasing, yeah, having multiple aliases to the same piece of
> memory is a bad thing. The problem arises when devices on the PCI bus
> have memory allocated as device memory (which on Arm is non-cacheable
> and non-reorderable), but the PCI bus effectively acts as a write-combiner
> which changes the order of transactions. Therefore, for devices that
> have local memory associated with them (i.e. more than just register
> accesses), one should map that memory as Device-GRE (gathering,
> reordering and early write acknowledgement) in the first place.
> Otherwise, problems will surface that are not visible on x86, as that
> is a strongly ordered architecture.

PCI framebuffer BARs are mapped on the CPU with WC.  We also use
uncached WC mappings for system memory in cases where it's not likely
we will be doing any CPU reads.  When accessing system memory, the GPU
can either do a CPU cache snooped transaction or a non-snooped
transaction.  The non-snooped transaction has lower latency and better
throughput since it doesn't have to snoop the CPU cache.
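
As a rough illustration of the policy described above, the following C
sketch shows how a driver might drop a write-combining request when the
architecture check says WC is unavailable, and how the resulting CPU
mapping type decides whether GPU accesses need to snoop the CPU cache.
Everything here is an assumption made for the example: the BO_FLAG_*
bits, enum cpu_mapping and the three helper functions are hypothetical
names, and the arch_can_wc parameter merely stands in for the result of
drm_arch_can_wc_memory().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer-object flags, named for illustration only. */
#define BO_FLAG_GTT	(1u << 0)	/* buffer lives in system memory */
#define BO_FLAG_USWC	(1u << 1)	/* request uncached WC CPU mapping */

static_assert((BO_FLAG_GTT & BO_FLAG_USWC) == 0,
	      "flag bits must be distinct");

enum cpu_mapping { MAP_CACHED, MAP_WC };

/* Clear the WC request when the architecture cannot honour it. */
static uint32_t sanitize_bo_flags(uint32_t flags, bool arch_can_wc)
{
	if (!arch_can_wc)
		flags &= ~BO_FLAG_USWC;
	return flags;
}

/* Choose the CPU mapping type from the (sanitized) flags. */
static enum cpu_mapping pick_cpu_mapping(uint32_t flags)
{
	return (flags & BO_FLAG_USWC) ? MAP_WC : MAP_CACHED;
}

/*
 * A cached CPU mapping means the GPU has to use snooped transactions to
 * stay coherent; with a WC mapping it can use the faster non-snooped path.
 */
static bool gpu_must_snoop(enum cpu_mapping mapping)
{
	return mapping == MAP_CACHED;
}
```

With arch_can_wc set to false, a buffer that asked for BO_FLAG_USWC ends
up with a cached mapping and snooped GPU access, which is the trade-off
the patch accepts on ARM.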

>
> >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Given that this API is only used by AMD I'm OK for now with the change,
> but I think in general it is misleading and we should work towards
> fixing radeon and amd drivers.

Alternatively, we could just disable WC in the amdgpu driver on ARM.
I'm not sure to what extent other drivers are using WC in general or
have been tested on ARM.

Alex

Eric Anholt Dec. 21, 2018, 4:39 p.m. UTC | #4
Alex Deucher <alexdeucher@gmail.com> writes:

> On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@arm.com> wrote:
>>
>> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
>> > > I'm not familiar enough with ARM to know if write combining
>> > > is actually an architectural limitation or if it's an issue
>> > > with the PCIe IPs used on various platforms, but so far
>> > > everyone that has tried to run radeon hardware on
>> > > ARM has had to disable it.  So let's just make it official.
>> >
>> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
>> > mappings and stuff, so you need to allocate your wc memory from special
>> > pools. So probably best to just disable it until we figure this out.
>>
>> I believe both of you are conflating different issues under the wrong
>> name. Write combining happens all the time on Arm: ARMv8 has a
>> weakly-ordered memory model, so hardware is allowed to re-order or
>> combine memory accesses as it sees fit.
>>
>> A while ago I did run an AMD GPU card on my Juno dev board and it worked
>> (for a very limited definition of worked, I've only validated the fact
>> that I could get an fbcon and could run un-accelerated X11). So I would
>> be interested if Alex could share some of the scenarios where people are
>> seeing failures.
>
> Here's an example:
> https://bugs.freedesktop.org/show_bug.cgi?id=108625
> But there are probably 5 or 6 other cases where people have emailed me
> or our team directly with issues on ARM that were resolved by disabling
> WC. Generally the driver seems to load OK, but then hangs as soon as you
> try to use acceleration from userspace, or we end up with page
> flipping timeouts.  Not really sure what the issue is.  Michel
> suggested that ARM may have a cacheable kernel mapping of all "normal"
> system memory, and that having both that mapping and another
> non-cacheable mapping of the same page can result in bad behaviour.
>
>>
>> As for aliasing, yeah, having multiple aliases to the same piece of
>> memory is a bad thing. The problem arises when devices on the PCI bus
>> have memory allocated as device memory (which on Arm is non-cacheable
>> and non-reorderable), but the PCI bus effectively acts as a write-combiner
>> which changes the order of transactions. Therefore, for devices that
>> have local memory associated with them (i.e. more than just register
>> accesses), one should map that memory as Device-GRE (gathering,
>> reordering and early write acknowledgement) in the first place.
>> Otherwise, problems will surface that are not visible on x86, as that
>> is a strongly ordered architecture.
>
> PCI framebuffer BARs are mapped on the CPU with WC.  We also use
> uncached WC mappings for system memory in cases where it's not likely
> we will be doing any CPU reads.  When accessing system memory, the GPU
> can either do a CPU cache snooped transaction or a non-snooped
> transaction.  The non-snooped transaction has lower latency and better
> throughput since it doesn't have to snoop the CPU cache.
>
>>
>> >
>> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> >
>> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Given that this API is only used by AMD I'm OK for now with the change,
>> but I think in general it is misleading and we should work towards
>> fixing radeon and amd drivers.
>
> Alternatively, we could just disable WC in the amdgpu driver on ARM.
> I'm not sure to what extent other drivers are using WC in general or
> have been tested on ARM.

FWIW, I use WC mappings of BOs on V3D (shmem) and VC4 (cma).  V3D is
totally stable.  VC4 I've heard reports of stability issues long-term
but I don't think it's related.  I don't do any cached mappings of my
BOs, though.

Patch

diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639df02d..691b4c4b0587 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -47,6 +47,8 @@  static inline bool drm_arch_can_wc_memory(void)
 	return false;
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	return false;
 #else
 	return true;
 #endif