diff mbox

[4/5] r600g, radeonsi: Use write-combined persistent GTT mappings

Message ID 1405591275-14461-10-git-send-email-michel@daenzer.net (mailing list archive)
State New, archived
Headers show

Commit Message

Michel Dänzer July 17, 2014, 10:01 a.m. UTC
From: Michel Dänzer <michel.daenzer@amd.com>

This is hopefully safe: The kernel makes sure writes to these mappings
finish before the GPU might start reading from them, and the GPU caches
are invalidated at the start of a command stream.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Grigori Goronzy July 17, 2014, 10:56 a.m. UTC | #1
On 17.07.2014 12:01, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> This is hopefully safe: The kernel makes sure writes to these mappings
> finish before the GPU might start reading from them, and the GPU caches
> are invalidated at the start of a command stream.
>

Aren't CPU reads from write-combined GTT memory extraordinarily slow,
because they're uncached? And don't you need the right access patterns
to make write combining perform well?

Grigori

> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
> index 40917f0..c8a0723 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
>  	    res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
>  			      PIPE_RESOURCE_FLAG_MAP_COHERENT)) {
>  		res->domains = RADEON_DOMAIN_GTT;
> -		flags = 0;
> +		flags = RADEON_FLAG_GTT_WC;
>  	}
>  
>  	/* Tiled textures are unmappable. Always put them in VRAM. */
>
Marek Olšák July 17, 2014, noon UTC | #2
The resource flags actually tell you what you can do. If the COHERENT
flag is set, the mapping must be cached. If it's unset, it's up to
you.

If write-combining is faster for vertex uploads, then Glamor shouldn't
set the coherent flag.

Marek

On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> This is hopefully safe: The kernel makes sure writes to these mappings
> finish before the GPU might start reading from them, and the GPU caches
> are invalidated at the start of a command stream.
>
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
> index 40917f0..c8a0723 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
>             res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
>                               PIPE_RESOURCE_FLAG_MAP_COHERENT)) {
>                 res->domains = RADEON_DOMAIN_GTT;
> -               flags = 0;
> +               flags = RADEON_FLAG_GTT_WC;
>         }
>
>         /* Tiled textures are unmappable. Always put them in VRAM. */
> --
> 2.0.0
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Michel Dänzer July 18, 2014, 3:19 a.m. UTC | #3
On 17.07.2014 21:00, Marek Olšák wrote:
> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> This is hopefully safe: The kernel makes sure writes to these mappings
>> finish before the GPU might start reading from them, and the GPU caches
>> are invalidated at the start of a command stream.
>>
> The resource flags actually tell you what you can do. If the COHERENT
> flag is set, the mapping must be cached.

Why is that required? As I explain above, we should satisfy the
requirements of the ARB_buffer_storage extension AFAICT.


As pointed out by you and Grigori in other posts, I should probably just
drop the special treatment of persistent mappings though, so the
placement and flags are derived from the buffer usage.
Marek Olšák July 18, 2014, 11:45 a.m. UTC | #4
If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
patch is okay.

Marek


On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 17.07.2014 21:00, Marek Olšák wrote:
>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> This is hopefully safe: The kernel makes sure writes to these mappings
>>> finish before the GPU might start reading from them, and the GPU caches
>>> are invalidated at the start of a command stream.
>>>
>> The resource flags actually tell you what you can do. If the COHERENT
>> flag is set, the mapping must be cached.
>
> Why is that required? As I explain above, we should satisfy the
> requirements of the ARB_buffer_storage extension AFAICT.
>
>
> As pointed out by you and Grigori in other posts, I should probably just
> drop the special treatment of persistent mappings though, so the
> placement and flags are derived from the buffer usage.
>
>
> --
> Earthling Michel Dänzer            |                  http://www.amd.com
> Libre software enthusiast          |                Mesa and X developer
Grigori Goronzy July 18, 2014, 7:50 p.m. UTC | #5
On 18.07.2014 13:45, Marek Olšák wrote:
> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
> patch is okay.
>

Apart from correctness, I still wonder how this will affect performance,
most notably CPU reads. This change unconditionally uses write-combined,
uncached memory for MAP_COHERENT buffers. Unless I am missing something,
CPU reads will be slow, even if the buffer storage flags indicate that
the buffer will be read by the CPU. Maybe it's a good idea to avoid
write combined memory if the buffer storage flags include MAP_READ_BIT?

Grigori

> Marek
> 
> 
> On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 17.07.2014 21:00, Marek Olšák wrote:
>>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>
>>>> This is hopefully safe: The kernel makes sure writes to these mappings
>>>> finish before the GPU might start reading from them, and the GPU caches
>>>> are invalidated at the start of a command stream.
>>>>
>>> The resource flags actually tell you what you can do. If the COHERENT
>>> flag is set, the mapping must be cached.
>>
>> Why is that required? As I explain above, we should satisfy the
>> requirements of the ARB_buffer_storage extension AFAICT.
>>
>>
>> As pointed out by you and Grigori in other posts, I should probably just
>> drop the special treatment of persistent mappings though, so the
>> placement and flags are derived from the buffer usage.
>>
>>
>> --
>> Earthling Michel Dänzer            |                  http://www.amd.com
>> Libre software enthusiast          |                Mesa and X developer
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
Marek Olšák July 18, 2014, 8:52 p.m. UTC | #6
GL_MAP_READ_BIT is always unconditionally set for glBufferData, which
is the most-used function. However, st/mesa can look at the
immutability flag to distinguish between BufferData and BufferStorage
before pipe_buffer_create is called, and set the read flag if the
caller is BufferStorage and GL_MAP_READ_BIT is flagged, and not set
any flag otherwise.

Marek

On Fri, Jul 18, 2014 at 9:50 PM, Grigori Goronzy <greg@chown.ath.cx> wrote:
> On 18.07.2014 13:45, Marek Olšák wrote:
>> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
>> patch is okay.
>>
>
> Apart from correctness, I still wonder how this will affect performance,
> most notably CPU reads. This change unconditionally uses write-combined,
> uncached memory for MAP_COHERENT buffers. Unless I am missing something,
> CPU reads will be slow, even if the buffer storage flags indicate that
> the buffer will be read by the CPU. Maybe it's a good idea to avoid
> write combined memory if the buffer storage flags include MAP_READ_BIT?
>
> Grigori
>
>> Marek
>>
>>
>> On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On 17.07.2014 21:00, Marek Olšák wrote:
>>>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>>
>>>>> This is hopefully safe: The kernel makes sure writes to these mappings
>>>>> finish before the GPU might start reading from them, and the GPU caches
>>>>> are invalidated at the start of a command stream.
>>>>>
>>>> The resource flags actually tell you what you can do. If the COHERENT
>>>> flag is set, the mapping must be cached.
>>>
>>> Why is that required? As I explain above, we should satisfy the
>>> requirements of the ARB_buffer_storage extension AFAICT.
>>>
>>>
>>> As pointed out by you and Grigori in other posts, I should probably just
>>> drop the special treatment of persistent mappings though, so the
>>> placement and flags are derived from the buffer usage.
>>>
>>>
>>> --
>>> Earthling Michel Dänzer            |                  http://www.amd.com
>>> Libre software enthusiast          |                Mesa and X developer
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
Marek Olšák July 18, 2014, 8:55 p.m. UTC | #7
On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 17.07.2014 21:00, Marek Olšák wrote:
>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> This is hopefully safe: The kernel makes sure writes to these mappings
>>> finish before the GPU might start reading from them, and the GPU caches
>>> are invalidated at the start of a command stream.
>>>
>> The resource flags actually tell you what you can do. If the COHERENT
>> flag is set, the mapping must be cached.
>
> Why is that required? As I explain above, we should satisfy the
> requirements of the ARB_buffer_storage extension AFAICT.
>
>
> As pointed out by you and Grigori in other posts, I should probably just
> drop the special treatment of persistent mappings though, so the
> placement and flags are derived from the buffer usage.

Yes, please drop the special treatment of persistent mappings. Thank you.

Marek
Michel Dänzer July 23, 2014, 6:32 a.m. UTC | #8
On 18.07.2014 20:45, Marek Olšák wrote:
> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
> patch is okay.

AFAICT GL_MAP_COHERENT_BIT simply means the app doesn't need to call
glMemoryBarrier() to ensure coherency between access to the mapping and
GL commands. Since we don't need to do anything for glMemoryBarrier(),
GL_MAP_COHERENT_BIT doesn't make any difference for us.

That said, I think I need to add code in the kernel ensuring we always
flush the HDP cache before submitting a command stream to the hardware,
bump the minor version, and only use VRAM for streaming when the DRM
minor version is >= the bumped version.
Marek Olšák July 23, 2014, 11:52 a.m. UTC | #9
This sounds good to me.

Marek

On Wed, Jul 23, 2014 at 8:32 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 18.07.2014 20:45, Marek Olšák wrote:
>> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
>> patch is okay.
>
> AFAICT GL_MAP_COHERENT_BIT simply means the app doesn't need to call
> glMemoryBarrier() to ensure coherency between access to the mapping and
> GL commands. Since we don't need to do anything for glMemoryBarrier(),
> GL_MAP_COHERENT_BIT doesn't make any difference for us.
>
> That said, I think I need to add code in the kernel ensuring we always
> flush the HDP cache before submitting a command stream to the hardware,
> bump the minor version, and only use VRAM for streaming when the DRM
> minor version is >= the bumped version.
>
>
> --
> Earthling Michel Dänzer            |                  http://www.amd.com
> Libre software enthusiast          |                Mesa and X developer
diff mbox

Patch

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 40917f0..c8a0723 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -131,7 +131,7 @@  bool r600_init_resource(struct r600_common_screen *rscreen,
 	    res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
 			      PIPE_RESOURCE_FLAG_MAP_COHERENT)) {
 		res->domains = RADEON_DOMAIN_GTT;
-		flags = 0;
+		flags = RADEON_FLAG_GTT_WC;
 	}
 
 	/* Tiled textures are unmappable. Always put them in VRAM. */