Message ID | 1405591275-14461-10-git-send-email-michel@daenzer.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 17.07.2014 12:01, Michel Dänzer wrote: > From: Michel Dänzer <michel.daenzer@amd.com> > > This is hopefully safe: The kernel makes sure writes to these mappings > finish before the GPU might start reading from them, and the GPU caches > are invalidated at the start of a command stream. > Aren't CPU reads from write-combined GTT memory extraordinarily slow, because they're uncached? And don't you need the right access patterns to make write combining perform well? Grigori > Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> > --- > src/gallium/drivers/radeon/r600_buffer_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c > index 40917f0..c8a0723 100644 > --- a/src/gallium/drivers/radeon/r600_buffer_common.c > +++ b/src/gallium/drivers/radeon/r600_buffer_common.c > @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen *rscreen, > res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT | > PIPE_RESOURCE_FLAG_MAP_COHERENT)) { > res->domains = RADEON_DOMAIN_GTT; > - flags = 0; > + flags = RADEON_FLAG_GTT_WC; > } > > /* Tiled textures are unmappable. Always put them in VRAM. */ >
The resource flags actually tell you what you can do. If the COHERENT flag is set, the mapping must be cached. If it's unset, it's up to you. If write-combining is faster for vertex uploads, then Glamor shouldn't set the coherent flag. Marek On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: > From: Michel Dänzer <michel.daenzer@amd.com> > > This is hopefully safe: The kernel makes sure writes to these mappings > finish before the GPU might start reading from them, and the GPU caches > are invalidated at the start of a command stream. > > Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> > --- > src/gallium/drivers/radeon/r600_buffer_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c > index 40917f0..c8a0723 100644 > --- a/src/gallium/drivers/radeon/r600_buffer_common.c > +++ b/src/gallium/drivers/radeon/r600_buffer_common.c > @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen *rscreen, > res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT | > PIPE_RESOURCE_FLAG_MAP_COHERENT)) { > res->domains = RADEON_DOMAIN_GTT; > - flags = 0; > + flags = RADEON_FLAG_GTT_WC; > } > > /* Tiled textures are unmappable. Always put them in VRAM. */ > -- > 2.0.0 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
On 17.07.2014 21:00, Marek Olšák wrote: > On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: >> From: Michel Dänzer <michel.daenzer@amd.com> >> >> This is hopefully safe: The kernel makes sure writes to these mappings >> finish before the GPU might start reading from them, and the GPU caches >> are invalidated at the start of a command stream. >> > The resource flags actually tell you what you can do. If the COHERENT > flag is set, the mapping must be cached. Why is that required? As I explain above, we should satisfy the requirements of the ARB_buffer_storage extension AFAICT. As pointed out by you and Grigori in other posts, I should probably just drop the special treatment of persistent mappings though, so the placement and flags are derived from the buffer usage.
If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the patch is okay. Marek On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote: > On 17.07.2014 21:00, Marek Olšák wrote: >> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: >>> From: Michel Dänzer <michel.daenzer@amd.com> >>> >>> This is hopefully safe: The kernel makes sure writes to these mappings >>> finish before the GPU might start reading from them, and the GPU caches >>> are invalidated at the start of a command stream. >>> >> The resource flags actually tell you what you can do. If the COHERENT >> flag is set, the mapping must be cached. > > Why is that required? As I explain above, we should satisfy the > requirements of the ARB_buffer_storage extension AFAICT. > > > As pointed out by you and Grigori in other posts, I should probably just > drop the special treatment of persistent mappings though, so the > placement and flags are derived from the buffer usage. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer
On 18.07.2014 13:45, Marek Olšák wrote: > If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the > patch is okay. > Apart from correctness, I still wonder how this will affect performance, most notably CPU reads. This change unconditionally uses write-combined, uncached memory for MAP_COHERENT buffers. Unless I am missing something, CPU reads will be slow, even if the buffer storage flags indicate that the buffer will be read by the CPU. Maybe it's a good idea to avoid write combined memory if the buffer storage flags include MAP_READ_BIT? Grigori > Marek > > > On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote: >> On 17.07.2014 21:00, Marek Olšák wrote: >>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: >>>> From: Michel Dänzer <michel.daenzer@amd.com> >>>> >>>> This is hopefully safe: The kernel makes sure writes to these mappings >>>> finish before the GPU might start reading from them, and the GPU caches >>>> are invalidated at the start of a command stream. >>>> >>> The resource flags actually tell you what you can do. If the COHERENT >>> flag is set, the mapping must be cached. >> >> Why is that required? As I explain above, we should satisfy the >> requirements of the ARB_buffer_storage extension AFAICT. >> >> >> As pointed out by you and Grigori in other posts, I should probably just >> drop the special treatment of persistent mappings though, so the >> placement and flags are derived from the buffer usage. >> >> >> -- >> Earthling Michel Dänzer | http://www.amd.com >> Libre software enthusiast | Mesa and X developer > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev >
GL_MAP_READ_BIT is always unconditionally set for glBufferData, which is the most-used function. However, st/mesa can look at the immutability flag to distinguish between BufferData and BufferStorage before pipe_buffer_create is called, and set the read flag if the caller is BufferStorage and GL_MAP_READ_BIT is flagged, and not set any flag otherwise. Marek On Fri, Jul 18, 2014 at 9:50 PM, Grigori Goronzy <greg@chown.ath.cx> wrote: > On 18.07.2014 13:45, Marek Olšák wrote: >> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the >> patch is okay. >> > > Apart from correctness, I still wonder how this will affect performance, > most notably CPU reads. This change unconditionally uses write-combined, > uncached memory for MAP_COHERENT buffers. Unless I am missing something, > CPU reads will be slow, even if the buffer storage flags indicate that > the buffer will be read by the CPU. Maybe it's a good idea to avoid > write combined memory if the buffer storage flags include MAP_READ_BIT? > > Grigori > >> Marek >> >> >> On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote: >>> On 17.07.2014 21:00, Marek Olšák wrote: >>>> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: >>>>> From: Michel Dänzer <michel.daenzer@amd.com> >>>>> >>>>> This is hopefully safe: The kernel makes sure writes to these mappings >>>>> finish before the GPU might start reading from them, and the GPU caches >>>>> are invalidated at the start of a command stream. >>>>> >>>> The resource flags actually tell you what you can do. If the COHERENT >>>> flag is set, the mapping must be cached. >>> >>> Why is that required? As I explain above, we should satisfy the >>> requirements of the ARB_buffer_storage extension AFAICT. >>> >>> >>> As pointed out by you and Grigori in other posts, I should probably just >>> drop the special treatment of persistent mappings though, so the >>> placement and flags are derived from the buffer usage. >>> >>> >>> -- >>> Earthling Michel Dänzer | http://www.amd.com >>> Libre software enthusiast | Mesa and X developer >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > >
On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer <michel@daenzer.net> wrote: > On 17.07.2014 21:00, Marek Olšák wrote: >> On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer <michel@daenzer.net> wrote: >>> From: Michel Dänzer <michel.daenzer@amd.com> >>> >>> This is hopefully safe: The kernel makes sure writes to these mappings >>> finish before the GPU might start reading from them, and the GPU caches >>> are invalidated at the start of a command stream. >>> >> The resource flags actually tell you what you can do. If the COHERENT >> flag is set, the mapping must be cached. > > Why is that required? As I explain above, we should satisfy the > requirements of the ARB_buffer_storage extension AFAICT. > > > As pointed out by you and Grigori in other posts, I should probably just > drop the special treatment of persistent mappings though, so the > placement and flags are derived from the buffer usage. Yes, please drop the special treatment of persistent mappings. Thank you. Marek
On 18.07.2014 20:45, Marek Olšák wrote: > If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the > patch is okay. AFAICT GL_MAP_COHERENT_BIT simply means the app doesn't need to call glMemoryBarrier() to ensure coherency between access to the mapping and GL commands. Since we don't need to do anything for glMemoryBarrier(), GL_MAP_COHERENT_BIT doesn't make any difference for us. That said, I think I need to add code in the kernel ensuring we always flush the HDP cache before submitting a command stream to the hardware, bump the minor version, and only use VRAM for streaming when the DRM minor version is >= the bumped version.
This sounds good to me. Marek On Wed, Jul 23, 2014 at 8:32 AM, Michel Dänzer <michel@daenzer.net> wrote: > On 18.07.2014 20:45, Marek Olšák wrote: >> If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the >> patch is okay. > > AFAICT GL_MAP_COHERENT_BIT simply means the app doesn't need to call > glMemoryBarrier() to ensure coherency between access to the mapping and > GL commands. Since we don't need to do anything for glMemoryBarrier(), > GL_MAP_COHERENT_BIT doesn't make any difference for us. > > That said, I think I need to add code in the kernel ensuring we always > flush the HDP cache before submitting a command stream to the hardware, > bump the minor version, and only use VRAM for streaming when the DRM > minor version is >= the bumped version. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer
diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c index 40917f0..c8a0723 100644 --- a/src/gallium/drivers/radeon/r600_buffer_common.c +++ b/src/gallium/drivers/radeon/r600_buffer_common.c @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen *rscreen, res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT | PIPE_RESOURCE_FLAG_MAP_COHERENT)) { res->domains = RADEON_DOMAIN_GTT; - flags = 0; + flags = RADEON_FLAG_GTT_WC; } /* Tiled textures are unmappable. Always put them in VRAM. */