Message ID | 20181220145657.304-1-alexander.deucher@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm: return false in drm_arch_can_wc_memory() for ARM |
On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> I'm not familiar enough with ARM to know if write combining
> is actually an architectural limitation or if it's an issue
> with the PCIe IPs used on various platforms, but so far
> everyone that has tried to run radeon hardware on
> ARM has had to disable it. So let's just make it official.

wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
mappings and stuff, so you need to allocate your wc memory from special
pools. So probably best to just disable it until we figure this out.

> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  include/drm/drm_cache.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639df02d..691b4c4b0587 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
>  	return false;
>  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
>  	return false;
> +#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +	return false;
>  #else
>  	return true;
>  #endif
> --
> 2.13.6
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > I'm not familiar enough with ARM to know if write combining
> > is actually an architectural limitation or if it's an issue
> > with the PCIe IPs used on various platforms, but so far
> > everyone that has tried to run radeon hardware on
> > ARM has had to disable it. So let's just make it official.
>
> wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> mappings and stuff, so you need to allocate your wc memory from special
> pools. So probably best to just disable it until we figure this out.

I believe both of you are conflating different issues under the wrong
name. Write combining happens all the time with Arm, the ARMv8
architecture is a weakly-ordered model of memory so hardware is allowed
to re-order or combine memory access as they seem fit.

A while ago I did run an AMD GPU card on my Juno dev board and it worked
(for a very limited definition of worked, I've only validated the fact
that I could get an fbcon and could run un-accelerated X11). So I would
be interested if Alex could share some of the scenarios where people are
seeing failures.

As for aliasing, yeah, having multiple aliases to the same piece of
memory is a bad thing. The problem arises when devices on the PCI bus
have memory allocated as device memory (which on Arm is non-cacheable
and non-reorderable), but the PCI bus effectively acts as a write-combiner
which changes the order of transactions. Therefore, for devices that
have local memory associated with them (i.e. more than just register
accesses) one should allocate memory in the first place that is
Device-GRE (gathering, reordering and early-access). Otherwise, problems
will surface that are not visible on x86 as that is a strongly ordered
architecture.

> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Given that this API is only used by AMD I'm OK for now with the change,
but I think in general it is misleading and we should work towards
fixing radeon and amd drivers.

Best regards,
Liviu

> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

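To make the Device memory type names used in the message above more concrete, here is a minimal sketch that decodes an ARMv8 MAIR_ELx attribute byte into its G/R/E properties. It assumes the base ARMv8 Device encodings of the form 0b0000dd00. The names `struct dev_attr` and `decode_device_attr` are hypothetical and are not part of any kernel API. Note that the architecture expands the E attribute as Early Write Acknowledgement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: properties of one MAIR attribute byte. */
struct dev_attr {
	bool is_device;   /* true for Device memory encodings (0b0000dd00) */
	bool gathering;   /* G: multiple accesses may be merged into one */
	bool reordering;  /* R: accesses may be reordered */
	bool early_ack;   /* E: early write acknowledgement is permitted */
};

/* Decode one 8-bit MAIR attribute (assumption: base ARMv8 encodings only). */
static struct dev_attr decode_device_attr(uint8_t attr)
{
	struct dev_attr d = { false, false, false, false };
	unsigned int dd;

	/* Device encodings have the upper nibble and the low two bits clear. */
	if ((attr & 0xF3) != 0)
		return d;

	dd = (attr >> 2) & 0x3;   /* 0: nGnRnE, 1: nGnRE, 2: nGRE, 3: GRE */
	d.is_device = true;
	d.early_ack = dd >= 1;
	d.reordering = dd >= 2;
	d.gathering = dd == 3;
	return d;
}
```

With these encodings, `decode_device_attr(0x00)` describes Device-nGnRnE (the strictest type) and `decode_device_attr(0x0C)` describes Device-GRE, the type suggested above for device-local memory. Any Normal memory encoding, such as 0x44 (Normal non-cacheable), comes back with `is_device` false.
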
On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@arm.com> wrote:
>
> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > > I'm not familiar enough with ARM to know if write combining
> > > is actually an architectural limitation or if it's an issue
> > > with the PCIe IPs used on various platforms, but so far
> > > everyone that has tried to run radeon hardware on
> > > ARM has had to disable it. So let's just make it official.
> >
> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> > mappings and stuff, so you need to allocate your wc memory from special
> > pools. So probably best to just disable it until we figure this out.
>
> I believe both of you are conflating different issues under the wrong
> name. Write combining happens all the time with Arm, the ARMv8
> architecture is a weakly-ordered model of memory so hardware is allowed
> to re-order or combine memory access as they seem fit.
>
> A while ago I did run an AMD GPU card on my Juno dev board and it worked
> (for a very limited definition of worked, I've only validated the fact
> that I could get an fbcon and could run un-accelerated X11). So I would
> be interested if Alex could share some of the scenarios where people are
> seeing failures.

Here's an example:
https://bugs.freedesktop.org/show_bug.cgi?id=108625
But there are probably 5 or 6 other cases where people have emailed me
or our team directly with issues on ARM resolved by disabling WC.
Generally the driver seems to load ok, but then hangs as soon as you
try and use acceleration from userspace or we end up with page
flipping timeouts. Not really sure what the issue is. Michel
suggested maybe ARM has a cacheable kernel mapping of all "normal"
system memory, and having both that mapping and another non-cacheable
mapping of the same page can result in bad behaviour.

>
> As for aliasing, yeah, having multiple aliases to the same piece of
> memory is a bad thing. The problem arises when devices on the PCI bus
> have memory allocated as device memory (which on Arm is non-cacheable
> and non-reorderable), but the PCI bus effectively acts as a write-combiner
> which changes the order of transactions. Therefore, for devices that
> have local memory associated with them (i.e. more than just register
> accesses) one should allocate memory in the first place that is
> Device-GRE (gathering, reordering and early-access). Otherwise, problems
> will surface that are not visible on x86 as that is a strongly ordered
> architecture.

PCI framebuffer BARs are mapped on the CPU with WC. We also use
uncached WC mappings for system memory in cases where it's not likely
we will be doing any CPU reads. When accessing system memory, the GPU
can either do a CPU cache snooped transaction or a non-snooped
transaction. The non-snooped transaction has lower latency and better
throughput since it doesn't have to snoop the CPU cache.

> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Given that this API is only used by AMD I'm OK for now with the change,
> but I think in general it is misleading and we should work towards
> fixing radeon and amd drivers.

Alternatively, we could just disable WC in the amdgpu driver on ARM.
I'm not sure to what extent other drivers are using WC in general or
have been tested on ARM.

Alex

>
> Best regards,
> Liviu
>
> --
> ====================
> | I would like to |
> | fix the world,  |
> | but they're not |
> | giving me the   |
>  \ source code!  /
>   ---------------
>     ¯\_(ツ)_/¯

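The per-driver alternative mentioned above could look roughly like the following. This is a self-contained userspace model and not driver code: the flag names `BO_FLAG_VRAM` and `BO_FLAG_CPU_USWC` and the function `sanitize_bo_flags()` are hypothetical stand-ins, and the `can_wc` argument stands for whatever the driver would use to decide (for example the result of `drm_arch_can_wc_memory()`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer-object creation flags, for illustration only. */
#define BO_FLAG_VRAM      (1u << 0)  /* place the buffer in device memory */
#define BO_FLAG_CPU_USWC  (1u << 1)  /* request an uncached WC CPU mapping */

/*
 * Drop the write-combining request when the platform cannot honour it,
 * so the buffer falls back to whatever default mapping the driver uses.
 * All other flags pass through unchanged.
 */
static uint32_t sanitize_bo_flags(uint32_t flags, bool can_wc)
{
	if (!can_wc)
		flags &= ~BO_FLAG_CPU_USWC;
	return flags;
}
```

Filtering the flag at buffer creation time keeps the policy in one place inside the driver, which is the trade-off being weighed against changing the shared DRM helper for every driver.
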
Alex Deucher <alexdeucher@gmail.com> writes:

> On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@arm.com> wrote:
>>
>> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
>> > > I'm not familiar enough with ARM to know if write combining
>> > > is actually an architectural limitation or if it's an issue
>> > > with the PCIe IPs used on various platforms, but so far
>> > > everyone that has tried to run radeon hardware on
>> > > ARM has had to disable it. So let's just make it official.
>> >
>> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
>> > mappings and stuff, so you need to allocate your wc memory from special
>> > pools. So probably best to just disable it until we figure this out.
>>
>> I believe both of you are conflating different issues under the wrong
>> name. Write combining happens all the time with Arm, the ARMv8
>> architecture is a weakly-ordered model of memory so hardware is allowed
>> to re-order or combine memory access as they seem fit.
>>
>> A while ago I did run an AMD GPU card on my Juno dev board and it worked
>> (for a very limited definition of worked, I've only validated the fact
>> that I could get an fbcon and could run un-accelerated X11). So I would
>> be interested if Alex could share some of the scenarios where people are
>> seeing failures.
>
> Here's an example:
> https://bugs.freedesktop.org/show_bug.cgi?id=108625
> But there are probably 5 or 6 other cases where people have emailed me
> or our team directly with issues on ARM resolved by disabling WC.
> Generally the driver seems to load ok, but then hangs as soon as you
> try and use acceleration from userspace or we end up with page
> flipping timeouts. Not really sure what the issue is. Michel
> suggested maybe ARM has a cacheable kernel mapping of all "normal"
> system memory, and having both that mapping and another non-cacheable
> mapping of the same page can result in bad behaviour.
>
>>
>> As for aliasing, yeah, having multiple aliases to the same piece of
>> memory is a bad thing. The problem arises when devices on the PCI bus
>> have memory allocated as device memory (which on Arm is non-cacheable
>> and non-reorderable), but the PCI bus effectively acts as a write-combiner
>> which changes the order of transactions. Therefore, for devices that
>> have local memory associated with them (i.e. more than just register
>> accesses) one should allocate memory in the first place that is
>> Device-GRE (gathering, reordering and early-access). Otherwise, problems
>> will surface that are not visible on x86 as that is a strongly ordered
>> architecture.
>
> PCI framebuffer BARs are mapped on the CPU with WC. We also use
> uncached WC mappings for system memory in cases where it's not likely
> we will be doing any CPU reads. When accessing system memory, the GPU
> can either do a CPU cache snooped transaction or a non-snooped
> transaction. The non-snooped transaction has lower latency and better
> throughput since it doesn't have to snoop the CPU cache.
>
>> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> >
>> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Given that this API is only used by AMD I'm OK for now with the change,
>> but I think in general it is misleading and we should work towards
>> fixing radeon and amd drivers.
>
> Alternatively, we could just disable WC in the amdgpu driver on ARM.
> I'm not sure to what extent other drivers are using WC in general or
> have been tested on ARM.

FWIW, I use WC mappings of BOs on V3D (shmem) and VC4 (cma). V3D is
totally stable. VC4 I've heard reports of stability issues long-term
but I don't think it's related. I don't do any cached mappings of my
BOs, though.

diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639df02d..691b4c4b0587 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
 	return false;
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	return false;
 #else
 	return true;
 #endif

I'm not familiar enough with ARM to know if write combining
is actually an architectural limitation or if it's an issue
with the PCIe IPs used on various platforms, but so far
everyone that has tried to run radeon hardware on
ARM has had to disable it. So let's just make it official.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 include/drm/drm_cache.h | 2 ++
 1 file changed, 2 insertions(+)
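For readers who want to see the effect of the hunk in one piece, the following is a sketch of how the helper would read with the patch applied, written so it compiles outside the kernel. It is an approximation: the real header has at least one earlier `#if` branch above the hunk whose condition is not visible in the diff context, so that branch is left out here, and the `CONFIG_*` macros are only defined if you pass them to the compiler yourself.

```c
#include <stdbool.h>

/*
 * Userspace approximation of drm_arch_can_wc_memory() after the patch.
 * Assumption: the branch that precedes the hunk in the real header is
 * omitted because its condition does not appear in the diff context.
 */
static inline bool drm_arch_can_wc_memory(void)
{
#if defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
	return false;
#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
	/* New with this patch: never report WC support on 32-bit or 64-bit ARM. */
	return false;
#else
	return true;
#endif
}
```

Built as is, the function returns true. Building the same file with `-DCONFIG_ARM64` (or `-DCONFIG_ARM`) selects the new branch and it returns false, which is the behaviour the patch makes official for ARM kernels.
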