Message ID: 20201130120433.7205-2-tzimmermann@suse.de (mailing list archive)
State:      New, archived
Series:     drm/vram-helper: Lock GEM BOs while they are mapped
On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote:
> Mapping a GEM object's buffer into kernel address space prevents the
> buffer from being evicted from VRAM, which in turn may result in
> out-of-memory errors. It's therefore required to only vmap GEM BOs for
> short periods of time, unless the GEM implementation provides additional
> guarantees.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>  drivers/gpu/drm/drm_prime.c |  6 ++++++
>  include/drm/drm_gem.h       | 16 ++++++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 7db55fce35d8..9c9ece9833e0 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf);
>   * callback. Calls into &drm_gem_object_funcs.vmap for device specific handling.
>   * The kernel virtual address is returned in map.
>   *
> + * To prevent the GEM object from being relocated, callers must hold the GEM
> + * object's reservation lock from when calling this function until releasing the
> + * mapping. Holding onto a mapping and the associated reservation lock for an
> + * unbounded time may result in out-of-memory errors. Calls to drm_gem_dmabuf_vmap()
> + * should therefore be accompanied by a call to drm_gem_dmabuf_vunmap().
> + *
>   * Returns 0 on success or a negative errno code otherwise.

This is a dma-buf hook, which means just documenting the rules you'd like
to have here isn't enough. We need to roll this out at the dma-buf level,
and enforce it.

Enforce it = assert_lock_held

Roll out = review everyone. Because this goes through dma-buf it'll come
back through shmem helpers (and other helpers and other subsystems) back
to any driver using vmap for gpu buffers. This includes the media
subsystem, and the media subsystem definitely doesn't cope with just
temporarily mapping buffers. So there we need to pin them, which I think
means we'll need two versions of dma_buf_vmap - one that's temporary and
requires we hold the dma_resv lock, the other requires that the buffer is
pinned.

That's what I meant with that this approach here is very sprawling :-/
-Daniel

>   */
>  int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 5e6daa1c982f..7c34cd5ec261 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs {
>   * Returns a virtual address for the buffer. Used by the
>   * drm_gem_dmabuf_vmap() helper.
>   *
> + * Notes to implementors:
> + *
> + * - Implementations must expect pairs of @vmap and @vunmap to be
> + *   called frequently and should optimize for this case.
> + *
> + * - Implementations may expect the caller to hold the GEM object's
> + *   reservation lock to protect against concurrent calls and relocation
> + *   of the GEM object.
> + *
> + * - Implementations may provide additional guarantees (e.g., working
> + *   without holding the reservation lock).
> + *
>   * This callback is optional.
> + *
> + * See also drm_gem_dmabuf_vmap().
>   */
>  int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map);
>
> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs {
>   * drm_gem_dmabuf_vunmap() helper.
>   *
>   * This callback is optional.
> + *
> + * See also @vmap.
>   */
>  void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map *map);
>
> --
> 2.29.2
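The "enforce it = assert_lock_held" idea can be sketched outside the kernel. In this userspace model, a default (non-recursive) pthread mutex stands in for the BO's dma_resv lock; the `model_*` names and the fixed-size backing store are purely illustrative assumptions, not kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

struct model_bo {
	pthread_mutex_t resv;	/* stands in for the BO's dma_resv lock */
	char storage[64];	/* stands in for the vmap'able backing memory */
};

/* Rough equivalent of dma_resv_assert_held(): on a default POSIX mutex,
 * trylock returns EBUSY when the lock is already held (by any thread). */
static void model_resv_assert_held(struct model_bo *bo)
{
	assert(pthread_mutex_trylock(&bo->resv) == EBUSY);
}

/* The short-term vmap: refuses to run unless the lock is held, so the
 * buffer cannot be evicted or relocated while the mapping is in use. */
static void *model_vmap(struct model_bo *bo)
{
	model_resv_assert_held(bo);
	return bo->storage;
}

static void model_vunmap(struct model_bo *bo, void *map)
{
	model_resv_assert_held(bo);
	(void)map;	/* a real implementation would tear down the mapping */
}
```

The assertion leans on POSIX guaranteeing EBUSY from pthread_mutex_trylock() on an already-locked default mutex, which loosely mirrors what a dma_resv_assert_held() at the top of the vmap path would enforce.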
On 30.11.20 16:30, Daniel Vetter wrote:
> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote:
>> [...]
>
> This is a dma-buf hook, which means just documenting the rules you'd like
> to have here isn't enough. We need to roll this out at the dma-buf level,
> and enforce it.
>
> Enforce it = assert_lock_held
>
> Roll out = review everyone. Because this goes through dma-buf it'll come
> back through shmem helpers (and other helpers and other subsystems) back
> to any driver using vmap for gpu buffers. This includes the media
> subsystem, and the media subsystem definitely doesn't cope with just
> temporarily mapping buffers. So there we need to pin them, which I think
> means we'll need two versions of dma_buf_vmap - one that's temporary and
> requires we hold the dma_resv lock, the other requires that the buffer is
> pinned.

OR start properly using the dma_buf_pin/dma_buf_unpin functions, which I
added to cover this use case as well.

Cheers,
Christian.

> That's what I meant with that this approach here is very sprawling :-/
> -Daniel
>
>> [...]
On 30.11.20 16:30, Daniel Vetter wrote:
> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote:
>> [...]
>
> This is a dma-buf hook, which means just documenting the rules you'd like
> to have here isn't enough. We need to roll this out at the dma-buf level,
> and enforce it.

The documentation for the GEM vmap callbacks points here, so it was a
good place to start. I know about the dependencies on dma-buf. But
fixing everything now is unrealistic. My hope for this patch is that we
can find the necessary rules and document them.

> Enforce it = assert_lock_held

That's probably the final step of many.

Best regards
Thomas

> Roll out = review everyone. Because this goes through dma-buf it'll come
> back through shmem helpers (and other helpers and other subsystems) back
> to any driver using vmap for gpu buffers. This includes the media
> subsystem, and the media subsystem definitely doesn't cope with just
> temporarily mapping buffers. So there we need to pin them, which I think
> means we'll need two versions of dma_buf_vmap - one that's temporary and
> requires we hold the dma_resv lock, the other requires that the buffer is
> pinned.
>
> That's what I meant with that this approach here is very sprawling :-/
> -Daniel
>
>> [...]
Hi

On 30.11.20 16:33, Christian König wrote:
> On 30.11.20 16:30, Daniel Vetter wrote:
>> [...]
>> So there we need to pin them, which I think
>> means we'll need two versions of dma_buf_vmap - one that's temporary and
>> requires we hold the dma_resv lock, the other requires that the buffer
>> is pinned.
>
> OR start properly using the dma_buf_pin/dma_buf_unpin functions which I
> added to cover this use case as well.

While I generally agree, here are some thoughts:

I found all generic pin functions useless, because they don't allow for
specifying where to pin. With fbdev emulation, this means that console
buffers might never make it to VRAM for scanout. If anything, the policy
should be that pin always pins in HW-accessible memory.

Pin has quite a bit of overhead (more locking, buffer movement), so it
should be the second choice after regular vmap. To make both work
together, pin probably relies on holding the reservation lock internally.

Therefore I think we still would want some additional helpers, such as:

  pin_unlocked(), which acquires the resv lock, calls regular pin and
  then drops the resv lock. Same for unpin_unlocked().

  vmap_pinned(), which enforces that the buffer has been pinned and
  then calls regular vmap. Same for vunmap_pinned().

A typical pattern with these functions would look like this:

  drm_gem_object bo;
  dma_buf_map map;

  init() {
      pin_unlocked(bo);
      vmap_pinned(bo, map);
  }

  worker() {
      begin_cpu_access()
      // access bo via map
      end_cpu_access()
  }

  fini() {
      vunmap_pinned(bo, map);
      unpin_unlocked(bo);
  }

  init()
  while (...) {
      worker()
  }
  fini()

Is that reasonable for media drivers?

Best regards
Thomas

> Cheers,
> Christian.
>
>> [...]
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
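The proposed pin_unlocked()/vmap_pinned() helpers and the init/worker/fini pattern above can be modeled in userspace C. The `sketch_*` type, the pthread mutex standing in for the reservation lock, and the plain pin counter are assumptions for illustration, not existing DRM API:

```c
#include <assert.h>
#include <pthread.h>

struct sketch_bo {
	pthread_mutex_t resv;	/* models the BO's dma_resv lock */
	int pin_count;		/* > 0 means the BO cannot be moved */
	char storage[64];	/* models the mappable backing memory */
};

/* Takes the resv lock, pins, and drops the lock again before returning,
 * so the caller never ends up holding it by accident. */
static void pin_unlocked(struct sketch_bo *bo)
{
	pthread_mutex_lock(&bo->resv);
	bo->pin_count++;	/* the "regular pin", called under the lock */
	pthread_mutex_unlock(&bo->resv);
}

static void unpin_unlocked(struct sketch_bo *bo)
{
	pthread_mutex_lock(&bo->resv);
	assert(bo->pin_count > 0);
	bo->pin_count--;
	pthread_mutex_unlock(&bo->resv);
}

/* Enforces that the buffer is pinned instead of asserting the lock: a
 * pinned BO cannot be relocated, so no lock is needed around the map. */
static void *vmap_pinned(struct sketch_bo *bo)
{
	assert(bo->pin_count > 0);
	return bo->storage;
}

static void vunmap_pinned(struct sketch_bo *bo, void *map)
{
	assert(bo->pin_count > 0);
	(void)map;
}
```

The worker can then access the mapping without touching the reservation lock at all, which is the property the long-lived media-style use case needs.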
On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Hi
>
> On 30.11.20 16:33, Christian König wrote:
>> [...]
>> OR start properly using the dma_buf_pin/dma_buf_unpin functions which I
>> added to cover this use case as well.
>
> While I generally agree, here are some thoughts:
>
> I found all generic pin functions useless, because they don't allow for
> specifying where to pin. With fbdev emulation, this means that console
> buffers might never make it to VRAM for scanout. If anything, the policy
> should be that pin always pins in HW-accessible memory.
>
> Pin has quite a bit of overhead (more locking, buffer movement), so it
> should be the second choice after regular vmap. To make both work
> together, pin probably relies on holding the reservation lock internally.
>
> Therefore I think we still would want some additional helpers, such as:
>
>   pin_unlocked(), which acquires the resv lock, calls regular pin and
>   then drops the resv lock. Same for unpin_unlocked().
>
>   vmap_pinned(), which enforces that the buffer has been pinned and
>   then calls regular vmap. Same for vunmap_pinned().
>
> A typical pattern with these functions would look like this:
>
> [...]
>
> Is that reasonable for media drivers?

So media drivers go through dma-buf, which means we always pin into
system memory. Which I guess for vram-only display drivers makes no
sense and should be rejected, but we still need somewhat consistent
rules.

The other thing is that if you do a dma_buf_attach without dynamic mode,
dma-buf will pin things for you already. So in many cases it could be
that we don't need a separate pin (but since the pin is in the exporter,
not the dma-buf layer, we can't check for that).

I'm also not seeing why existing users need to split up their
dma_buf_vmap into a pin + vmap, they don't need them separately. I think
we could use what we've done for dynamic dma-buf attachment (which also
changed the locking rules) and just have new functions for the new way
(i.e. short-term vmap protected by the dma_resv lock). Maybe call these
dma_buf_vmap_local, in the spirit of the new kmap_local which is
currently under discussion. I think the _local suffix is better, for
otherwise people might do something silly like

  dma_resv_lock();
  dma_buf_vmap_locked();
  dma_resv_unlock();

  /* actual access maybe even in some other thread */

  dma_resv_lock();
  dma_buf_vunmap_locked();
  dma_resv_unlock();

The _local suffix is better at telling that the resulting pointer has
very limited use (essentially just local to the calling context, if you
don't change any locking or anything).

I think encouraging importers to call dma_buf_pin/unpin isn't a good
idea. Yes, dynamic ones need it, but maybe we should check for that
somehow in the exporter interface (atm only amdgpu is using it).
-Daniel

> [...]
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
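The _local semantics Daniel describes can be sketched in the same userspace model: map, access and unmap all happen inside one reservation-lock critical section. All names here (`vmap_local`, etc.) are hypothetical, and a pthread mutex again stands in for the dma_resv lock:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct local_buf {
	pthread_mutex_t resv;	/* models the dma_resv lock */
	int mapped;		/* nonzero while a _local mapping exists */
	char storage[64];	/* models the mappable backing memory */
};

static void *vmap_local(struct local_buf *buf)
{
	/* must be called with the resv lock held; EBUSY from trylock on a
	 * default POSIX mutex means the lock is currently held */
	assert(pthread_mutex_trylock(&buf->resv) == EBUSY);
	buf->mapped = 1;
	return buf->storage;
}

static void vunmap_local(struct local_buf *buf, void *map)
{
	assert(pthread_mutex_trylock(&buf->resv) == EBUSY);
	buf->mapped = 0;
	(void)map;
}

/* The point of the _local suffix: the pointer never escapes the critical
 * section. Dropping the lock between map and access (the "silly" pattern
 * quoted above) would leave a pointer the exporter may invalidate. */
static char access_locally(struct local_buf *buf)
{
	char c;

	pthread_mutex_lock(&buf->resv);
	char *map = vmap_local(buf);
	c = map[0];		/* access only while the lock is held */
	vunmap_local(buf, map);
	pthread_mutex_unlock(&buf->resv);
	return c;
}
```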
On 01.12.20 09:32, Thomas Zimmermann wrote:
> Hi
>
> On 30.11.20 16:33, Christian König wrote:
>> [...]
>> OR start properly using the dma_buf_pin/dma_buf_unpin functions which
>> I added to cover this use case as well.
>
> While I generally agree, here are some thoughts:
>
> I found all generic pin functions useless, because they don't allow
> for specifying where to pin. With fbdev emulation, this means that
> console buffers might never make it to VRAM for scanout. If anything,
> the policy should be that pin always pins in HW-accessible memory.

Yes, completely agree. The major missing part here is some kind of
usage flags for what we want to do with the buffer.

> Pin has quite a bit of overhead (more locking, buffer movement), so it
> should be the second choice after regular vmap. To make both work
> together, pin probably relies on holding the reservation lock internally.

There is a dma_resv_assert_held() at the beginning of those two functions.

> Therefore I think we still would want some additional helpers, such as:
>
>   pin_unlocked(), which acquires the resv lock, calls regular pin and
>   then drops the resv lock. Same for unpin_unlocked().
>
>   vmap_pinned(), which enforces that the buffer has been pinned and
>   then calls regular vmap. Same for vunmap_pinned().

I would rather open code that in each driver, hiding the locking logic
in some midlayer is usually not a good idea.

Regards,
Christian.

> A typical pattern with these functions would look like this:
>
> [...]
>
> Is that reasonable for media drivers?
>
> Best regards
> Thomas
Hi Am 01.12.20 um 10:13 schrieb Christian König: > Am 01.12.20 um 09:32 schrieb Thomas Zimmermann: >> Hi >> >> Am 30.11.20 um 16:33 schrieb Christian König: >>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: >>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: >>>>> Mapping a GEM object's buffer into kernel address space prevents the >>>>> buffer from being evicted from VRAM, which in turn may result in >>>>> out-of-memory errors. It's therefore required to only vmap GEM BOs for >>>>> short periods of time; unless the GEM implementation provides >>>>> additional >>>>> guarantees. >>>>> >>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> >>>>> --- >>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ >>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ >>>>> 2 files changed, 22 insertions(+) >>>>> >>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c >>>>> index 7db55fce35d8..9c9ece9833e0 100644 >>>>> --- a/drivers/gpu/drm/drm_prime.c >>>>> +++ b/drivers/gpu/drm/drm_prime.c >>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); >>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device >>>>> specific handling. >>>>> * The kernel virtual address is returned in map. >>>>> * >>>>> + * To prevent the GEM object from being relocated, callers must >>>>> hold the GEM >>>>> + * object's reservation lock from when calling this function until >>>>> releasing the >>>>> + * mapping. Holding onto a mapping and the associated reservation >>>>> lock for an >>>>> + * unbound time may result in out-of-memory errors. Calls to >>>>> drm_gem_dmabuf_vmap() >>>>> + * should therefore be accompanied by a call to >>>>> drm_gem_dmabuf_vunmap(). >>>>> + * >>>>> * Returns 0 on success or a negative errno code otherwise. >>>> This is a dma-buf hook, which means just documenting the rules you'd >>>> like >>>> to have here isn't enough. We need to roll this out at the dma-buf >>>> level, >>>> and enforce it. 
>>>> >>>> Enforce it = assert_lock_held >>>> >>>> Roll out = review everyone. Because this goes through dma-buf it'll >>>> come >>>> back through shmem helpers (and other helpers and other subsystems) >>>> back >>>> to any driver using vmap for gpu buffers. This includes the media >>>> subsystem, and the media subsystem definitely doesn't cope with just >>>> temporarily mapping buffers. So there we need to pin them, which I >>>> think >>>> means we'll need 2 version of dma_buf_vmap - one that's temporary and >>>> requires we hold dma_resv lock, the other requires that the buffer is >>>> pinned. >>> >>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions which >>> I added to cover this use case as well. >> >> While I generally agree, here are some thoughts: >> >> I found all generic pin functions useless, because they don't allow >> for specifying where to pin. With fbdev emulation, this means that >> console buffers might never make it to VRAM for scanout. If anything, >> the policy should be that pin always pins in HW-accessible memory. > > Yes, completely agree. The major missing part here are some kind of > usage flags what we want to do with the buffer. Not sure, but wasn't that not wanted by someone? I don't really remember. Given Daniel's reply about pin, it really feels like we have contradicting policies for this interface. > >> >> Pin has quite a bit of overhead (more locking, buffer movement), so it >> should be the second choice after regular vmap. To make both work >> together, pin probably relies on holding the reservation lock internally. > > There is a dma_resv_assert_held() at the beginning of those two functions. > >> >> Therefore I think we still would want some additional helpers, such as: >> >> pin_unlocked(), which acquires the resv lock, calls regular pin and >> then drops the resv lock. Same for unpin_unlocked() >> >> vmap_pinned(), which enforces that the buffer has been pinned and >> then calls regalar vmap. 
>> Same for vunmap_pinned()
>
> I would rather open code that in each driver, hiding the locking logic
> in some midlayer is usually not a good idea.

These helpers are less about hiding logic and more about making drivers
do the right thing. The idea behind pin_unlocked() is that it drops the
reservation lock immediately before returning. I suspect that some
drivers might not do that with an open-coded version. And vmap_pinned()
does check that the BO is pinned; regular vmap would assert for the
reservation lock instead.

Best regards
Thomas

>
> Regards,
> Christian.
>
>>
>> A typical pattern with these functions would look like this.
>>
>> drm_gem_object bo;
>> dma_buf_map map;
>>
>> init() {
>>     pin_unlocked(bo);
>>     vmap_pinned(bo, map);
>> }
>>
>> worker() {
>>     begin_cpu_access()
>>     // access bo via map
>>     end_cpu_access()
>> }
>>
>> fini() {
>>     vunmap_pinned(bo, map);
>>     unpin_unlocked(bo);
>> }
>>
>> init()
>> while (...) {
>>     worker()
>> }
>> fini()
>>
>> Is that reasonable for media drivers?
>>
>> Best regards
>> Thomas
>>
>>>
>>> Cheers,
>>> Christian.
>>>
>>>>
>>>> That's what I meant with that this approach here is very sprawling :-/
>>>> -Daniel
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
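[Editor's note: the semantics Thomas proposes for these helpers can be sketched in a small user-space model. This is only an illustration of the intended locking rules, not kernel code: the helper names come from the proposal above, a pthread mutex stands in for the BO's dma_resv lock, and plain counters stand in for TTM's pin and vmap state.]

```c
#include <assert.h>
#include <pthread.h>

/* Toy model of a BO; the mutex stands in for the dma_resv lock. */
struct model_bo {
	pthread_mutex_t resv;
	int pin_count;
	int vmap_count;
};

/* pin_unlocked(): acquires the resv lock, pins, and drops the lock
 * again before returning, so the caller never sees it held. */
static void pin_unlocked(struct model_bo *bo)
{
	pthread_mutex_lock(&bo->resv);
	bo->pin_count++;
	pthread_mutex_unlock(&bo->resv);
}

static void unpin_unlocked(struct model_bo *bo)
{
	pthread_mutex_lock(&bo->resv);
	bo->pin_count--;
	pthread_mutex_unlock(&bo->resv);
}

/* vmap_pinned(): instead of asserting that the resv lock is held,
 * it asserts that the buffer was pinned beforehand. */
static void vmap_pinned(struct model_bo *bo)
{
	assert(bo->pin_count > 0);
	bo->vmap_count++;
}

static void vunmap_pinned(struct model_bo *bo)
{
	assert(bo->pin_count > 0);
	bo->vmap_count--;
}
```

In this model, the init()/worker()/fini() pattern quoted above maps to pin_unlocked() + vmap_pinned() at setup and vunmap_pinned() + unpin_unlocked() at teardown, with no reservation lock held in between.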
Hi Am 01.12.20 um 10:10 schrieb Daniel Vetter: > On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: >> >> Hi >> >> Am 30.11.20 um 16:33 schrieb Christian König: >>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: >>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: >>>>> Mapping a GEM object's buffer into kernel address space prevents the >>>>> buffer from being evicted from VRAM, which in turn may result in >>>>> out-of-memory errors. It's therefore required to only vmap GEM BOs for >>>>> short periods of time; unless the GEM implementation provides additional >>>>> guarantees. >>>>> >>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> >>>>> --- >>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ >>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ >>>>> 2 files changed, 22 insertions(+) >>>>> >>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c >>>>> index 7db55fce35d8..9c9ece9833e0 100644 >>>>> --- a/drivers/gpu/drm/drm_prime.c >>>>> +++ b/drivers/gpu/drm/drm_prime.c >>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); >>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device >>>>> specific handling. >>>>> * The kernel virtual address is returned in map. >>>>> * >>>>> + * To prevent the GEM object from being relocated, callers must hold >>>>> the GEM >>>>> + * object's reservation lock from when calling this function until >>>>> releasing the >>>>> + * mapping. Holding onto a mapping and the associated reservation >>>>> lock for an >>>>> + * unbound time may result in out-of-memory errors. Calls to >>>>> drm_gem_dmabuf_vmap() >>>>> + * should therefore be accompanied by a call to >>>>> drm_gem_dmabuf_vunmap(). >>>>> + * >>>>> * Returns 0 on success or a negative errno code otherwise. >>>> This is a dma-buf hook, which means just documenting the rules you'd like >>>> to have here isn't enough. We need to roll this out at the dma-buf level, >>>> and enforce it. 
>>>> >>>> Enforce it = assert_lock_held >>>> >>>> Roll out = review everyone. Because this goes through dma-buf it'll come >>>> back through shmem helpers (and other helpers and other subsystems) back >>>> to any driver using vmap for gpu buffers. This includes the media >>>> subsystem, and the media subsystem definitely doesn't cope with just >>>> temporarily mapping buffers. So there we need to pin them, which I think >>>> means we'll need 2 version of dma_buf_vmap - one that's temporary and >>>> requires we hold dma_resv lock, the other requires that the buffer is >>>> pinned. >>> >>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions which I >>> added to cover this use case as well. >> >> While I generally agree, here are some thoughts: >> >> I found all generic pin functions useless, because they don't allow for >> specifying where to pin. With fbdev emulation, this means that console >> buffers might never make it to VRAM for scanout. If anything, the policy >> should be that pin always pins in HW-accessible memory. >> >> Pin has quite a bit of overhead (more locking, buffer movement), so it >> should be the second choice after regular vmap. To make both work >> together, pin probably relies on holding the reservation lock internally. >> >> Therefore I think we still would want some additional helpers, such as: >> >> pin_unlocked(), which acquires the resv lock, calls regular pin and >> then drops the resv lock. Same for unpin_unlocked() >> >> vmap_pinned(), which enforces that the buffer has been pinned and >> then calls regalar vmap. Same for vunmap_pinned() >> >> A typical pattern with these functions would look like this. >> >> drm_gem_object bo; >> dma_buf_map map; >> >> init() { >> pin_unlocked(bo); >> vmap_pinned(bo, map); >> } >> >> worker() { >> begin_cpu_access() >> // access bo via map >> end_cpu_access() >> } >> >> fini() { >> vunmap_pinned(bo, map); >> unpin_unlocked(bo); >> } >> >> init() >> while (...) 
{
>>     worker()
>> }
>> fini()
>>
>> Is that reasonable for media drivers?
>
> So media drivers go through dma-buf, which means we always pin into
> system memory. Which I guess for vram-only display drivers makes no
> sense and should be rejected, but we still need somewhat consistent
> rules.
>
> The other thing is that if you do a dma_buf_attach without dynamic
> mode, dma-buf will pin things for you already. So in many cases it

Do you have a pointer to code that illustrates this well?

> could be that we don't need a separate pin (but since the pin is in
> the exporter, not dma-buf layer, we can't check for that). I'm also
> not seeing why existing users need to split up their dma_buf_vmap into
> a pin + vmap, they don't need them separately.

It's two different operations, with pin having some possible overhead
and failure conditions. And during the last discussion, pin was said to
be for userspace-managed buffers. Kernel code should hold the
reservation lock.

From my POV, the current interfaces have no clear policy or semantics.
Looking through the different GEM implementations, each one seems to
have its own interpretation.

> I think we could use what we've done for dynamic dma-buf attachment
> (which also change locking rules) and just have new functions for the
> new way (i.e. short term vmap protected by dma_resv lock. Maybe call
> these dma_buf_vmap_local, in the spirit of the new kmap_local which
> are currently under discussion. I think _local suffix is better, for
> otherwise people might do something silly like
>
>     dma_resv_lock();
>     dma_buf_vmap_locked();
>     dma_resv_unlock();
>
>     /* actual access maybe even in some other thread */
>
>     dma_buf_resv_lock();
>     dma_buf_vunmap_unlocked();
>     dma_resv_unlock();
>
> _local suffix is better at telling that the resulting pointer has very
> limited use (essentially just local to the calling context, if you
> don't change any locking or anything).

_local sounds good.
Best regards Thomas > > I think encouraging importers to call dma_buf_pin/unpin isn't a good > idea. Yes dynamic ones need it, but maybe we should check for that > somehow in the exporterd interface (atm only amdgpu is using it). > -Daniel > > > > > >> Best regards >> Thomas >> >> >>> >>> Cheers, >>> Christian. >>> >>>> >>>> That's what I meant with that this approach here is very sprawling :-/ >>>> -Daniel >>>> >>>>> */ >>>>> int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map >>>>> *map) >>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h >>>>> index 5e6daa1c982f..7c34cd5ec261 100644 >>>>> --- a/include/drm/drm_gem.h >>>>> +++ b/include/drm/drm_gem.h >>>>> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs { >>>>> * Returns a virtual address for the buffer. Used by the >>>>> * drm_gem_dmabuf_vmap() helper. >>>>> * >>>>> + * Notes to implementors: >>>>> + * >>>>> + * - Implementations must expect pairs of @vmap and @vunmap to be >>>>> + * called frequently and should optimize for this case. >>>>> + * >>>>> + * - Implemenations may expect the caller to hold the GEM object's >>>>> + * reservation lock to protect against concurrent calls and >>>>> relocation >>>>> + * of the GEM object. >>>>> + * >>>>> + * - Implementations may provide additional guarantees (e.g., >>>>> working >>>>> + * without holding the reservation lock). >>>>> + * >>>>> * This callback is optional. >>>>> + * >>>>> + * See also drm_gem_dmabuf_vmap() >>>>> */ >>>>> int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map); >>>>> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs { >>>>> * drm_gem_dmabuf_vunmap() helper. >>>>> * >>>>> * This callback is optional. >>>>> + * >>>>> + * See also @vmap. 
>>>>> */ >>>>> void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map >>>>> *map); >>>>> -- >>>>> 2.29.2 >>>>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> -- >> Thomas Zimmermann >> Graphics Driver Developer >> SUSE Software Solutions Germany GmbH >> Maxfeldstr. 5, 90409 Nürnberg, Germany >> (HRB 36809, AG Nürnberg) >> Geschäftsführer: Felix Imendörffer >> > >
On Tue, Dec 1, 2020 at 10:40 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > Hi > > Am 01.12.20 um 10:10 schrieb Daniel Vetter: > > On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > >> > >> Hi > >> > >> Am 30.11.20 um 16:33 schrieb Christian König: > >>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: > >>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: > >>>>> Mapping a GEM object's buffer into kernel address space prevents the > >>>>> buffer from being evicted from VRAM, which in turn may result in > >>>>> out-of-memory errors. It's therefore required to only vmap GEM BOs for > >>>>> short periods of time; unless the GEM implementation provides additional > >>>>> guarantees. > >>>>> > >>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> > >>>>> --- > >>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ > >>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ > >>>>> 2 files changed, 22 insertions(+) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c > >>>>> index 7db55fce35d8..9c9ece9833e0 100644 > >>>>> --- a/drivers/gpu/drm/drm_prime.c > >>>>> +++ b/drivers/gpu/drm/drm_prime.c > >>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); > >>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device > >>>>> specific handling. > >>>>> * The kernel virtual address is returned in map. > >>>>> * > >>>>> + * To prevent the GEM object from being relocated, callers must hold > >>>>> the GEM > >>>>> + * object's reservation lock from when calling this function until > >>>>> releasing the > >>>>> + * mapping. Holding onto a mapping and the associated reservation > >>>>> lock for an > >>>>> + * unbound time may result in out-of-memory errors. Calls to > >>>>> drm_gem_dmabuf_vmap() > >>>>> + * should therefore be accompanied by a call to > >>>>> drm_gem_dmabuf_vunmap(). > >>>>> + * > >>>>> * Returns 0 on success or a negative errno code otherwise. 
> >>>> This is a dma-buf hook, which means just documenting the rules you'd like > >>>> to have here isn't enough. We need to roll this out at the dma-buf level, > >>>> and enforce it. > >>>> > >>>> Enforce it = assert_lock_held > >>>> > >>>> Roll out = review everyone. Because this goes through dma-buf it'll come > >>>> back through shmem helpers (and other helpers and other subsystems) back > >>>> to any driver using vmap for gpu buffers. This includes the media > >>>> subsystem, and the media subsystem definitely doesn't cope with just > >>>> temporarily mapping buffers. So there we need to pin them, which I think > >>>> means we'll need 2 version of dma_buf_vmap - one that's temporary and > >>>> requires we hold dma_resv lock, the other requires that the buffer is > >>>> pinned. > >>> > >>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions which I > >>> added to cover this use case as well. > >> > >> While I generally agree, here are some thoughts: > >> > >> I found all generic pin functions useless, because they don't allow for > >> specifying where to pin. With fbdev emulation, this means that console > >> buffers might never make it to VRAM for scanout. If anything, the policy > >> should be that pin always pins in HW-accessible memory. > >> > >> Pin has quite a bit of overhead (more locking, buffer movement), so it > >> should be the second choice after regular vmap. To make both work > >> together, pin probably relies on holding the reservation lock internally. > >> > >> Therefore I think we still would want some additional helpers, such as: > >> > >> pin_unlocked(), which acquires the resv lock, calls regular pin and > >> then drops the resv lock. Same for unpin_unlocked() > >> > >> vmap_pinned(), which enforces that the buffer has been pinned and > >> then calls regalar vmap. Same for vunmap_pinned() > >> > >> A typical pattern with these functions would look like this. 
> >> > >> drm_gem_object bo; > >> dma_buf_map map; > >> > >> init() { > >> pin_unlocked(bo); > >> vmap_pinned(bo, map); > >> } > >> > >> worker() { > >> begin_cpu_access() > >> // access bo via map > >> end_cpu_access() > >> } > >> > >> fini() { > >> vunmap_pinned(bo, map); > >> unpin_unlocked(bo); > >> } > >> > >> init() > >> while (...) { > >> worker() > >> } > >> fini() > >> > >> Is that reasonable for media drivers? > > > > So media drivers go through dma-buf, which means we always pin into > > system memory. Which I guess for vram-only display drivers makes no > > sense and should be rejected, but we still need somewhat consistent > > rules. > > > > The other thing is that if you do a dma_buf_attach without dynamic > > mode, dma-buf will pin things for you already. So in many cases it > > Do you have a pointer to code that illustrates this well? > > > could be that we don't need a separate pin (but since the pin is in > > the exporter, not dma-buf layer, we can't check for that). I'm also > > not seeing why existing users need to split up their dma_buf_vmap into > > a pin + vmap, they don't need them separately. > > It's two different operations, with pin having some possible overhead > and failure conditions. And during the last discussion, pin was say to > be for userspace-managed buffers. Kernel code should hold the > reservation lock. > > For my POV, the current interfaces have no clear policy or semantics. > Looking through the different GEM implementations, each one seems to > have its own interpretation. Yup, that's the problem really. In the past we've had vmap exclusively for permanently pinned access, with no locking requirements. Now we're trying to make this more dynamic, but in a somewhat ad-hoc fashion (generic fbdev emulation charged ahead with making the fbdev framebuffer evictable), and now it breaks at every seam. Adding more ad-hoc semantics on top doesn't seem like a good idea. 
That's why I think we should have 2 different interfaces:
- dma_buf_vmap, the one we currently have. Permanently pins the
buffer, mapping survives, no locking required.
- dma_buf_vmap_local, the new interface, the one that generic fbdev
should have used (but oh well mistakes happen), requires
dma_resv_lock, the mapping is only local to the caller

Trying to shovel both semantics into one interface, depending upon
which implementation we have backing the buffer, doesn't work indeed.

Also on the pin topic, I think neither interface should require
callers to explicitly pin anything. For existing users it should
happen automatically behind the scenes imo, that's what they're
expecting.
-Daniel

> > I think we could use what we've done for dynamic dma-buf attachment
> > (which also change locking rules) and just have new functions for the
> > new way (i.e. short term vmap protected by dma_resv lock. Maybe call
> > these dma_buf_vmap_local, in the spirit of the new kmap_local which
> > are currently under discussion. I think _local suffix is better, for
> > otherwise people might do something silly like
> >
> >     dma_resv_lock();
> >     dma_buf_vmap_locked();
> >     dma_resv_unlock();
> >
> >     /* actual access maybe even in some other thread */
> >
> >     dma_buf_resv_lock();
> >     dma_buf_vunmap_unlocked();
> >     dma_resv_unlock();
> >
> > _local suffix is better at telling that the resulting pointer has very
> > limited use (essentially just local to the calling context, if you
> > don't change any locking or anything).
>
> _local sounds good.
>
> Best regards
> Thomas
>
> > I think encouraging importers to call dma_buf_pin/unpin isn't a good
> > idea. Yes dynamic ones need it, but maybe we should check for that
> > somehow in the exporterd interface (atm only amdgpu is using it).
> > -Daniel
>
> >> Best regards
> >> Thomas
> >>
> >>> Cheers,
> >>> Christian.
> >>> > >>>> > >>>> That's what I meant with that this approach here is very sprawling :-/ > >>>> -Daniel > >>>> > >>>>> */ > >>>>> int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map > >>>>> *map) > >>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h > >>>>> index 5e6daa1c982f..7c34cd5ec261 100644 > >>>>> --- a/include/drm/drm_gem.h > >>>>> +++ b/include/drm/drm_gem.h > >>>>> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs { > >>>>> * Returns a virtual address for the buffer. Used by the > >>>>> * drm_gem_dmabuf_vmap() helper. > >>>>> * > >>>>> + * Notes to implementors: > >>>>> + * > >>>>> + * - Implementations must expect pairs of @vmap and @vunmap to be > >>>>> + * called frequently and should optimize for this case. > >>>>> + * > >>>>> + * - Implemenations may expect the caller to hold the GEM object's > >>>>> + * reservation lock to protect against concurrent calls and > >>>>> relocation > >>>>> + * of the GEM object. > >>>>> + * > >>>>> + * - Implementations may provide additional guarantees (e.g., > >>>>> working > >>>>> + * without holding the reservation lock). > >>>>> + * > >>>>> * This callback is optional. > >>>>> + * > >>>>> + * See also drm_gem_dmabuf_vmap() > >>>>> */ > >>>>> int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map); > >>>>> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs { > >>>>> * drm_gem_dmabuf_vunmap() helper. > >>>>> * > >>>>> * This callback is optional. > >>>>> + * > >>>>> + * See also @vmap. > >>>>> */ > >>>>> void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map > >>>>> *map); > >>>>> -- > >>>>> 2.29.2 > >>>>> > >>> > >>> _______________________________________________ > >>> dri-devel mailing list > >>> dri-devel@lists.freedesktop.org > >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > >> -- > >> Thomas Zimmermann > >> Graphics Driver Developer > >> SUSE Software Solutions Germany GmbH > >> Maxfeldstr. 
5, 90409 Nürnberg, Germany > >> (HRB 36809, AG Nürnberg) > >> Geschäftsführer: Felix Imendörffer > >> > > > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Felix Imendörffer >
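[Editor's note: Daniel's proposed split can likewise be sketched as a user-space model. This is purely illustrative, not the dma-buf implementation: a pthread mutex stands in for dma_resv_lock(), a flag lets the model check who holds it, and the two entry points mirror the proposed dma_buf_vmap / dma_buf_vmap_local semantics.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy model of a buffer; the mutex stands in for the dma_resv lock. */
struct model_buf {
	pthread_mutex_t resv;
	bool resv_held;   /* tracked so the model can assert on it */
	int pin_count;
};

static void model_resv_lock(struct model_buf *buf)
{
	pthread_mutex_lock(&buf->resv);
	buf->resv_held = true;
}

static void model_resv_unlock(struct model_buf *buf)
{
	buf->resv_held = false;
	pthread_mutex_unlock(&buf->resv);
}

/* dma_buf_vmap() semantics: pins behind the scenes, the mapping
 * survives, and the caller needs no locking at all. */
static void model_vmap(struct model_buf *buf)
{
	model_resv_lock(buf);
	buf->pin_count++;	/* implicit pin */
	model_resv_unlock(buf);
}

static void model_vunmap(struct model_buf *buf)
{
	model_resv_lock(buf);
	buf->pin_count--;
	model_resv_unlock(buf);
}

/* dma_buf_vmap_local() semantics: no pin; the caller must already
 * hold the reservation lock, and the mapping is only valid until
 * the lock is dropped -- "local" to the calling context. */
static void model_vmap_local(struct model_buf *buf)
{
	assert(buf->resv_held);	/* models dma_resv_assert_held() */
}

static void model_vunmap_local(struct model_buf *buf)
{
	assert(buf->resv_held);
}
```

The model makes the contrast concrete: the permanent variant keeps the buffer pinned for the lifetime of the mapping, while the _local variant relies entirely on the caller's reservation lock.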
Hi Am 01.12.20 um 11:00 schrieb Daniel Vetter: > On Tue, Dec 1, 2020 at 10:40 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: >> >> Hi >> >> Am 01.12.20 um 10:10 schrieb Daniel Vetter: >>> On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>> >>>> Hi >>>> >>>> Am 30.11.20 um 16:33 schrieb Christian König: >>>>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: >>>>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: >>>>>>> Mapping a GEM object's buffer into kernel address space prevents the >>>>>>> buffer from being evicted from VRAM, which in turn may result in >>>>>>> out-of-memory errors. It's therefore required to only vmap GEM BOs for >>>>>>> short periods of time; unless the GEM implementation provides additional >>>>>>> guarantees. >>>>>>> >>>>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> >>>>>>> --- >>>>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ >>>>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ >>>>>>> 2 files changed, 22 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c >>>>>>> index 7db55fce35d8..9c9ece9833e0 100644 >>>>>>> --- a/drivers/gpu/drm/drm_prime.c >>>>>>> +++ b/drivers/gpu/drm/drm_prime.c >>>>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); >>>>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device >>>>>>> specific handling. >>>>>>> * The kernel virtual address is returned in map. >>>>>>> * >>>>>>> + * To prevent the GEM object from being relocated, callers must hold >>>>>>> the GEM >>>>>>> + * object's reservation lock from when calling this function until >>>>>>> releasing the >>>>>>> + * mapping. Holding onto a mapping and the associated reservation >>>>>>> lock for an >>>>>>> + * unbound time may result in out-of-memory errors. Calls to >>>>>>> drm_gem_dmabuf_vmap() >>>>>>> + * should therefore be accompanied by a call to >>>>>>> drm_gem_dmabuf_vunmap(). 
>>>>>>> + * >>>>>>> * Returns 0 on success or a negative errno code otherwise. >>>>>> This is a dma-buf hook, which means just documenting the rules you'd like >>>>>> to have here isn't enough. We need to roll this out at the dma-buf level, >>>>>> and enforce it. >>>>>> >>>>>> Enforce it = assert_lock_held >>>>>> >>>>>> Roll out = review everyone. Because this goes through dma-buf it'll come >>>>>> back through shmem helpers (and other helpers and other subsystems) back >>>>>> to any driver using vmap for gpu buffers. This includes the media >>>>>> subsystem, and the media subsystem definitely doesn't cope with just >>>>>> temporarily mapping buffers. So there we need to pin them, which I think >>>>>> means we'll need 2 version of dma_buf_vmap - one that's temporary and >>>>>> requires we hold dma_resv lock, the other requires that the buffer is >>>>>> pinned. >>>>> >>>>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions which I >>>>> added to cover this use case as well. >>>> >>>> While I generally agree, here are some thoughts: >>>> >>>> I found all generic pin functions useless, because they don't allow for >>>> specifying where to pin. With fbdev emulation, this means that console >>>> buffers might never make it to VRAM for scanout. If anything, the policy >>>> should be that pin always pins in HW-accessible memory. >>>> >>>> Pin has quite a bit of overhead (more locking, buffer movement), so it >>>> should be the second choice after regular vmap. To make both work >>>> together, pin probably relies on holding the reservation lock internally. >>>> >>>> Therefore I think we still would want some additional helpers, such as: >>>> >>>> pin_unlocked(), which acquires the resv lock, calls regular pin and >>>> then drops the resv lock. Same for unpin_unlocked() >>>> >>>> vmap_pinned(), which enforces that the buffer has been pinned and >>>> then calls regalar vmap. 
Same for vunmap_pinned() >>>> >>>> A typical pattern with these functions would look like this. >>>> >>>> drm_gem_object bo; >>>> dma_buf_map map; >>>> >>>> init() { >>>> pin_unlocked(bo); >>>> vmap_pinned(bo, map); >>>> } >>>> >>>> worker() { >>>> begin_cpu_access() >>>> // access bo via map >>>> end_cpu_access() >>>> } >>>> >>>> fini() { >>>> vunmap_pinned(bo, map); >>>> unpin_unlocked(bo); >>>> } >>>> >>>> init() >>>> while (...) { >>>> worker() >>>> } >>>> fini() >>>> >>>> Is that reasonable for media drivers? >>> >>> So media drivers go through dma-buf, which means we always pin into >>> system memory. Which I guess for vram-only display drivers makes no >>> sense and should be rejected, but we still need somewhat consistent >>> rules. >>> >>> The other thing is that if you do a dma_buf_attach without dynamic >>> mode, dma-buf will pin things for you already. So in many cases it >> >> Do you have a pointer to code that illustrates this well? >> >>> could be that we don't need a separate pin (but since the pin is in >>> the exporter, not dma-buf layer, we can't check for that). I'm also >>> not seeing why existing users need to split up their dma_buf_vmap into >>> a pin + vmap, they don't need them separately. >> >> It's two different operations, with pin having some possible overhead >> and failure conditions. And during the last discussion, pin was say to >> be for userspace-managed buffers. Kernel code should hold the >> reservation lock. >> >> For my POV, the current interfaces have no clear policy or semantics. >> Looking through the different GEM implementations, each one seems to >> have its own interpretation. > > Yup, that's the problem really. In the past we've had vmap exclusively > for permanently pinned access, with no locking requirements. Now we're > trying to make this more dynamic, but in a somewhat ad-hoc fashion > (generic fbdev emulation charged ahead with making the fbdev > framebuffer evictable), and now it breaks at every seam. 
> Adding more ad-hoc semantics on top doesn't seem like a good idea.
>
> That's why I think we should have 2 different interfaces:
> - dma_buf_vmap, the one we currently have. Permanently pins the
> buffer, mapping survives, no locking required.
> - dma_buf_vmap_local, the new interface, the one that generic fbdev
> should have used (but oh well mistakes happen), requires
> dma_resv_lock, the mapping is only local to the caller

In patch 6 of this series, there's ast cursor code that acquires two
BOs' reservation locks and vmaps them afterwards. That's probably how
you intend to use dma_buf_vmap_local.

However, I think it's more logical to have a vmap callback that only
does the actual vmap. This is all that exporters would have to
implement. And with that, build one helper that pins before vmap and
one helper that takes the resv lock. I know that it might require some
work on exporting drivers. But with your proposal, we probably need
another dma-buf callback just for vmap_local. (?) That seems even less
appealing to me.

Best regards
Thomas

>
> Trying to shovel both semantics into one interface, depending upon
> which implementation we have backing the buffer, doesn't work indeed.
>
> Also on the pin topic, I think neither interface should require
> callers to explicitly pin anything. For existing users it should
> happen automatically behind the scenes imo, that's what they're
> expecting.
> -Daniel

>>> I think we could use what we've done for dynamic dma-buf attachment
>>> (which also change locking rules) and just have new functions for the
>>> new way (i.e. short term vmap protected by dma_resv lock. Maybe call
>>> these dma_buf_vmap_local, in the spirit of the new kmap_local which
>>> are currently under discussion.
I think _local suffix is better, for >>> otherwise people might do something silly like >>> >>> dma_resv_lock(); >>> dma_buf_vmap_locked(); >>> dma_resv_unlock(); >>> >>> /* actual access maybe even in some other thread */ >>> >>> dma_buf_resv_lock(); >>> dma_buf_vunmap_unlocked(); >>> dma_resv_unlock(); >>> >>> _local suffix is better at telling that the resulting pointer has very >>> limited use (essentially just local to the calling context, if you >>> don't change any locking or anything). >> >> _local sounds good. >> >> Best regards >> Thomas >> >>> >>> I think encouraging importers to call dma_buf_pin/unpin isn't a good >>> idea. Yes dynamic ones need it, but maybe we should check for that >>> somehow in the exporterd interface (atm only amdgpu is using it). >>> -Daniel >>> >>> >>> >>> >>> >>>> Best regards >>>> Thomas >>>> >>>> >>>>> >>>>> Cheers, >>>>> Christian. >>>>> >>>>>> >>>>>> That's what I meant with that this approach here is very sprawling :-/ >>>>>> -Daniel >>>>>> >>>>>>> */ >>>>>>> int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map >>>>>>> *map) >>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h >>>>>>> index 5e6daa1c982f..7c34cd5ec261 100644 >>>>>>> --- a/include/drm/drm_gem.h >>>>>>> +++ b/include/drm/drm_gem.h >>>>>>> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs { >>>>>>> * Returns a virtual address for the buffer. Used by the >>>>>>> * drm_gem_dmabuf_vmap() helper. >>>>>>> * >>>>>>> + * Notes to implementors: >>>>>>> + * >>>>>>> + * - Implementations must expect pairs of @vmap and @vunmap to be >>>>>>> + * called frequently and should optimize for this case. >>>>>>> + * >>>>>>> + * - Implemenations may expect the caller to hold the GEM object's >>>>>>> + * reservation lock to protect against concurrent calls and >>>>>>> relocation >>>>>>> + * of the GEM object. 
>>>>>>> + * >>>>>>> + * - Implementations may provide additional guarantees (e.g., >>>>>>> working >>>>>>> + * without holding the reservation lock). >>>>>>> + * >>>>>>> * This callback is optional. >>>>>>> + * >>>>>>> + * See also drm_gem_dmabuf_vmap() >>>>>>> */ >>>>>>> int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map); >>>>>>> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs { >>>>>>> * drm_gem_dmabuf_vunmap() helper. >>>>>>> * >>>>>>> * This callback is optional. >>>>>>> + * >>>>>>> + * See also @vmap. >>>>>>> */ >>>>>>> void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map >>>>>>> *map); >>>>>>> -- >>>>>>> 2.29.2 >>>>>>> >>>>> >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> >>>> -- >>>> Thomas Zimmermann >>>> Graphics Driver Developer >>>> SUSE Software Solutions Germany GmbH >>>> Maxfeldstr. 5, 90409 Nürnberg, Germany >>>> (HRB 36809, AG Nürnberg) >>>> Geschäftsführer: Felix Imendörffer >>>> >>> >>> >> >> -- >> Thomas Zimmermann >> Graphics Driver Developer >> SUSE Software Solutions Germany GmbH >> Maxfeldstr. 5, 90409 Nürnberg, Germany >> (HRB 36809, AG Nürnberg) >> Geschäftsführer: Felix Imendörffer >> > >
Am 01.12.20 um 11:27 schrieb Thomas Zimmermann: > Hi > > Am 01.12.20 um 11:00 schrieb Daniel Vetter: >> On Tue, Dec 1, 2020 at 10:40 AM Thomas Zimmermann >> <tzimmermann@suse.de> wrote: >>> >>> Hi >>> >>> Am 01.12.20 um 10:10 schrieb Daniel Vetter: >>>> On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann >>>> <tzimmermann@suse.de> wrote: >>>>> >>>>> Hi >>>>> >>>>> Am 30.11.20 um 16:33 schrieb Christian König: >>>>>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: >>>>>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: >>>>>>>> Mapping a GEM object's buffer into kernel address space >>>>>>>> prevents the >>>>>>>> buffer from being evicted from VRAM, which in turn may result in >>>>>>>> out-of-memory errors. It's therefore required to only vmap GEM >>>>>>>> BOs for >>>>>>>> short periods of time; unless the GEM implementation provides >>>>>>>> additional >>>>>>>> guarantees. >>>>>>>> >>>>>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ >>>>>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ >>>>>>>> 2 files changed, 22 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c >>>>>>>> b/drivers/gpu/drm/drm_prime.c >>>>>>>> index 7db55fce35d8..9c9ece9833e0 100644 >>>>>>>> --- a/drivers/gpu/drm/drm_prime.c >>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c >>>>>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); >>>>>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device >>>>>>>> specific handling. >>>>>>>> * The kernel virtual address is returned in map. >>>>>>>> * >>>>>>>> + * To prevent the GEM object from being relocated, callers >>>>>>>> must hold >>>>>>>> the GEM >>>>>>>> + * object's reservation lock from when calling this function >>>>>>>> until >>>>>>>> releasing the >>>>>>>> + * mapping. Holding onto a mapping and the associated reservation >>>>>>>> lock for an >>>>>>>> + * unbound time may result in out-of-memory errors. 
Calls to >>>>>>>> drm_gem_dmabuf_vmap() >>>>>>>> + * should therefore be accompanied by a call to >>>>>>>> drm_gem_dmabuf_vunmap(). >>>>>>>> + * >>>>>>>> * Returns 0 on success or a negative errno code otherwise. >>>>>>> This is a dma-buf hook, which means just documenting the rules >>>>>>> you'd like >>>>>>> to have here isn't enough. We need to roll this out at the >>>>>>> dma-buf level, >>>>>>> and enforce it. >>>>>>> >>>>>>> Enforce it = assert_lock_held >>>>>>> >>>>>>> Roll out = review everyone. Because this goes through dma-buf >>>>>>> it'll come >>>>>>> back through shmem helpers (and other helpers and other >>>>>>> subsystems) back >>>>>>> to any driver using vmap for gpu buffers. This includes the media >>>>>>> subsystem, and the media subsystem definitely doesn't cope with >>>>>>> just >>>>>>> temporarily mapping buffers. So there we need to pin them, which >>>>>>> I think >>>>>>> means we'll need 2 version of dma_buf_vmap - one that's >>>>>>> temporary and >>>>>>> requires we hold dma_resv lock, the other requires that the >>>>>>> buffer is >>>>>>> pinned. >>>>>> >>>>>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions >>>>>> which I >>>>>> added to cover this use case as well. >>>>> >>>>> While I generally agree, here are some thoughts: >>>>> >>>>> I found all generic pin functions useless, because they don't >>>>> allow for >>>>> specifying where to pin. With fbdev emulation, this means that >>>>> console >>>>> buffers might never make it to VRAM for scanout. If anything, the >>>>> policy >>>>> should be that pin always pins in HW-accessible memory. >>>>> >>>>> Pin has quite a bit of overhead (more locking, buffer movement), >>>>> so it >>>>> should be the second choice after regular vmap. To make both work >>>>> together, pin probably relies on holding the reservation lock >>>>> internally. 
>>>>> >>>>> Therefore I think we still would want some additional helpers, >>>>> such as: >>>>> >>>>> pin_unlocked(), which acquires the resv lock, calls regular >>>>> pin and >>>>> then drops the resv lock. Same for unpin_unlocked() >>>>> >>>>> vmap_pinned(), which enforces that the buffer has been pinned >>>>> and >>>>> then calls regular vmap. Same for vunmap_pinned() >>>>> >>>>> A typical pattern with these functions would look like this. >>>>> >>>>> drm_gem_object bo; >>>>> dma_buf_map map; >>>>> >>>>> init() { >>>>> pin_unlocked(bo); >>>>> vmap_pinned(bo, map); >>>>> } >>>>> >>>>> worker() { >>>>> begin_cpu_access() >>>>> // access bo via map >>>>> end_cpu_access() >>>>> } >>>>> >>>>> fini() { >>>>> vunmap_pinned(bo, map); >>>>> unpin_unlocked(bo); >>>>> } >>>>> >>>>> init() >>>>> while (...) { >>>>> worker() >>>>> } >>>>> fini() >>>>> >>>>> Is that reasonable for media drivers? >>>> >>>> So media drivers go through dma-buf, which means we always pin into >>>> system memory. Which I guess for vram-only display drivers makes no >>>> sense and should be rejected, but we still need somewhat consistent >>>> rules. >>>> >>>> The other thing is that if you do a dma_buf_attach without dynamic >>>> mode, dma-buf will pin things for you already. So in many cases it >>> >>> Do you have a pointer to code that illustrates this well? >>> >>>> could be that we don't need a separate pin (but since the pin is in >>>> the exporter, not dma-buf layer, we can't check for that). I'm also >>>> not seeing why existing users need to split up their dma_buf_vmap into >>>> a pin + vmap, they don't need them separately. >>> >>> It's two different operations, with pin having some possible overhead >>> and failure conditions. And during the last discussion, pin was said to >>> be for userspace-managed buffers. Kernel code should hold the >>> reservation lock. >>> >>> From my POV, the current interfaces have no clear policy or semantics. 
>>> Looking through the different GEM implementations, each one seems to >>> have its own interpretation. >> >> Yup, that's the problem really. In the past we've had vmap exclusively >> for permanently pinned access, with no locking requirements. Now we're >> trying to make this more dynamic, but in a somewhat ad-hoc fashion >> (generic fbdev emulation charged ahead with making the fbdev >> framebuffer evictable), and now it breaks at every seam. Adding more >> ad-hoc semantics on top doesn't seem like a good idea. >> >> That's why I think we should have 2 different interfaces: >> - dma_buf_vmap, the one we currently have. Permanently pins the >> buffer, mapping survives, no locking required. >> - dma_buf_vmap_local, the new interface, the one that generic fbdev >> should have used (but oh well mistakes happen), requires >> dma_resv_lock, the mapping is only local to the caller > > In patch 6 of this series, there's ast cursor code that acquires two > BO's reservation locks and vmaps them afterwards. That's probably how > you intend to use dma_buf_vmap_local. > > However, I think it's more logical to have a vmap callback that only > does the actual vmap. This is all that exporters would have to implement. > > And with that, build one helper that pins before vmap and one helper > that gets the resv lock. I don't think that this will work, nor is it a good approach. See the ast cursor handling for example. You need to acquire two BOs here, not just one. And this can't be done cleanly with a single vmap call. Regards, Christian. > > I know that it might require some work on exporting drivers. But with > your proposal, we probably need another dma-buf callback just for > vmap_local. (?) That seems even less appealing to me. > > Best regards > Thomas > >> >> Trying to shovel both semantics into one interface, depending upon >> which implementation we have backing the buffer, doesn't work indeed. 
>> >> Also on the pin topic, I think neither interface should require >> callers to explicitly pin anything. For existing users it should >> happen automatically behind the scenes imo, that's what they're >> expecting. >> -Daniel >> >> >>>> I think we could use what we've done for dynamic dma-buf attachment >>>> (which also change locking rules) and just have new functions for the >>>> new way (i.e. short term vmap protected by dma_resv lock. Maybe call >>>> these dma_buf_vmap_local, in the spirit of the new kmap_local which >>>> are currently under discussion. I think _local suffix is better, for >>>> otherwise people might do something silly like >>>> >>>> dma_resv_lock(); >>>> dma_buf_vmap_locked(); >>>> dma_resv_unlock(); >>>> >>>> /* actual access maybe even in some other thread */ >>>> >>>> dma_buf_resv_lock(); >>>> dma_buf_vunmap_unlocked(); >>>> dma_resv_unlock(); >>>> >>>> _local suffix is better at telling that the resulting pointer has very >>>> limited use (essentially just local to the calling context, if you >>>> don't change any locking or anything). >>> >>> _local sounds good. >>> >>> Best regards >>> Thomas >>> >>>> >>>> I think encouraging importers to call dma_buf_pin/unpin isn't a good >>>> idea. Yes dynamic ones need it, but maybe we should check for that >>>> somehow in the exporterd interface (atm only amdgpu is using it). >>>> -Daniel >>>> >>>> >>>> >>>> >>>> >>>>> Best regards >>>>> Thomas >>>>> >>>>> >>>>>> >>>>>> Cheers, >>>>>> Christian. 
>>>>>> >>>>>>> >>>>>>> That's what I meant with that this approach here is very >>>>>>> sprawling :-/ >>>>>>> -Daniel >>>>>>> >>>>>>>> */ >>>>>>>> int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct >>>>>>>> dma_buf_map >>>>>>>> *map) >>>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h >>>>>>>> index 5e6daa1c982f..7c34cd5ec261 100644 >>>>>>>> --- a/include/drm/drm_gem.h >>>>>>>> +++ b/include/drm/drm_gem.h >>>>>>>> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs { >>>>>>>> * Returns a virtual address for the buffer. Used by the >>>>>>>> * drm_gem_dmabuf_vmap() helper. >>>>>>>> * >>>>>>>> + * Notes to implementors: >>>>>>>> + * >>>>>>>> + * - Implementations must expect pairs of @vmap and >>>>>>>> @vunmap to be >>>>>>>> + * called frequently and should optimize for this case. >>>>>>>> + * >>>>>>>> + * - Implemenations may expect the caller to hold the GEM >>>>>>>> object's >>>>>>>> + * reservation lock to protect against concurrent calls and >>>>>>>> relocation >>>>>>>> + * of the GEM object. >>>>>>>> + * >>>>>>>> + * - Implementations may provide additional guarantees (e.g., >>>>>>>> working >>>>>>>> + * without holding the reservation lock). >>>>>>>> + * >>>>>>>> * This callback is optional. >>>>>>>> + * >>>>>>>> + * See also drm_gem_dmabuf_vmap() >>>>>>>> */ >>>>>>>> int (*vmap)(struct drm_gem_object *obj, struct >>>>>>>> dma_buf_map *map); >>>>>>>> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs { >>>>>>>> * drm_gem_dmabuf_vunmap() helper. >>>>>>>> * >>>>>>>> * This callback is optional. >>>>>>>> + * >>>>>>>> + * See also @vmap. 
>>>>>>>> */ >>>>>>>> void (*vunmap)(struct drm_gem_object *obj, struct >>>>>>>> dma_buf_map >>>>>>>> *map); >>>>>>>> -- >>>>>>>> 2.29.2 >>>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dri-devel mailing list >>>>>> dri-devel@lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> >>>>> -- >>>>> Thomas Zimmermann >>>>> Graphics Driver Developer >>>>> SUSE Software Solutions Germany GmbH >>>>> Maxfeldstr. 5, 90409 Nürnberg, Germany >>>>> (HRB 36809, AG Nürnberg) >>>>> Geschäftsführer: Felix Imendörffer >>>>> >>>> >>>> >>> >>> -- >>> Thomas Zimmermann >>> Graphics Driver Developer >>> SUSE Software Solutions Germany GmbH >>> Maxfeldstr. 5, 90409 Nürnberg, Germany >>> (HRB 36809, AG Nürnberg) >>> Geschäftsführer: Felix Imendörffer >>> >> >> >
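[Editor's note: the two-BO case Christian refers to can be modeled outside the kernel. The following is a minimal userspace sketch, not kernel code — all `toy_*` names are hypothetical stand-ins for a GEM BO, dma_resv_lock()/unlock() and the "only vmap, caller holds the lock" primitive. The ast-cursor-like pattern locks both buffers first and only then maps them.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GEM BO with a reservation lock; not kernel code. */
struct toy_bo {
    bool resv_locked;   /* models dma_resv_lock()/dma_resv_unlock() */
    int vmap_count;     /* how often the BO is currently mapped */
    char storage[64];   /* stands in for the BO's backing memory */
};

static void toy_resv_lock(struct toy_bo *bo)
{
    assert(!bo->resv_locked);   /* no recursion in this toy model */
    bo->resv_locked = true;
}

static void toy_resv_unlock(struct toy_bo *bo)
{
    assert(bo->resv_locked);
    bo->resv_locked = false;
}

/* The "only vmap" primitive: legal only while the lock is held. */
static void *toy_vmap_locked(struct toy_bo *bo)
{
    assert(bo->resv_locked);
    bo->vmap_count++;
    return bo->storage;
}

static void toy_vunmap_locked(struct toy_bo *bo)
{
    assert(bo->resv_locked && bo->vmap_count > 0);
    bo->vmap_count--;
}

/* ast-cursor-like update: lock both BOs, then map both, copy, undo. */
static void toy_update_cursor(struct toy_bo *src, struct toy_bo *dst)
{
    toy_resv_lock(src);
    toy_resv_lock(dst);
    char *s = toy_vmap_locked(src);
    char *d = toy_vmap_locked(dst);
    for (size_t i = 0; i < sizeof(dst->storage); i++)
        d[i] = s[i];    /* the actual cursor-image copy */
    toy_vunmap_locked(dst);
    toy_vunmap_locked(src);
    toy_resv_unlock(dst);
    toy_resv_unlock(src);
}
```

In the real kernel, taking two reservation locks requires ww_mutex acquire contexts to avoid deadlocks between concurrent lockers; the toy model ignores that detail.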
Hi Am 01.12.20 um 11:34 schrieb Christian König: >> [...] >> In patch 6 of this series, there's ast cursor code that acquires two >> BO's reservation locks and vmaps them afterwards. That's probably how >> you intend to use dma_buf_vmap_local. >> >> However, I think it's more logical to have a vmap callback that only >> does the actual vmap. This is all that exporters would have to implement. >> >> And with that, build one helper that pins before vmap and one helper >> that gets the resv lock. > > I don't think that this will work, nor is it a good approach. > > See the ast cursor handling for example. You need to acquire two BOs > here, not just one. And this can't be done cleanly with a single vmap call. That seems to be a misunderstanding. I didn't mention it explicitly, but there's of course another helper that only vmaps and nothing else. This would be useful for cases like the cursor code. So there would be: dma_buf_vmap() - pin + vmap dma_buf_vmap_local() - lock + vmap dma_buf_vmap_locked() - only vmap; caller has set up the BOs I did some conversion of drivers that use vram and shmem. They occasionally update a buffer (ast cursors) or flush a BO from system memory to HW (udl, cirrus, mgag200). In terms of these 3 interfaces: I never needed dma_buf_vmap() because pinning was never really required here. Almost all of the cases were handled by dma_buf_vmap_local(). And the ast cursor code uses the equivalent of dma_buf_vmap_locked(). The driver exporting the buffer would only have to implement vmap() and pin() interfaces. Each does only one thing and would assume that the caller holds the lock. Best regards Thomas > > Regards, > Christian. > >> >> I know that it might require some work on exporting drivers. But with >> your proposal, we probably need another dma-buf callback just for >> vmap_local. (?) That seems even less appealing to me. 
>> [...]
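[Editor's note: Thomas's proposed layering — one "only vmap" primitive plus two convenience wrappers — can be sketched in the same userspace toy style. All `toy_*` names are hypothetical; they model, not implement, the suggested dma_buf_vmap()/dma_buf_vmap_local()/dma_buf_vmap_locked() split.]

```c
#include <assert.h>
#include <stdbool.h>

struct toy_buf {
    int pin_count;      /* models dma_buf_pin()/dma_buf_unpin() */
    bool resv_locked;   /* models dma_resv_lock()/dma_resv_unlock() */
    int vmap_count;
    char storage[64];
};

/* "dma_buf_vmap_locked": only vmap; the caller has set up the buffer,
 * either by pinning it or by taking its reservation lock. */
static void *toy_vmap_locked(struct toy_buf *buf)
{
    assert(buf->resv_locked || buf->pin_count > 0);
    buf->vmap_count++;
    return buf->storage;
}

static void toy_vunmap_locked(struct toy_buf *buf)
{
    assert(buf->vmap_count > 0);
    buf->vmap_count--;
}

/* "dma_buf_vmap": pin + vmap; mapping survives, no lock required. */
static void *toy_vmap(struct toy_buf *buf)
{
    buf->pin_count++;   /* buffer can no longer be evicted */
    return toy_vmap_locked(buf);
}

static void toy_vunmap(struct toy_buf *buf)
{
    toy_vunmap_locked(buf);
    buf->pin_count--;
}

/* "dma_buf_vmap_local": lock + vmap; mapping is local to the caller
 * and only valid while the lock remains held. */
static void *toy_vmap_local(struct toy_buf *buf)
{
    assert(!buf->resv_locked);
    buf->resv_locked = true;
    return toy_vmap_locked(buf);
}

static void toy_vunmap_local(struct toy_buf *buf)
{
    toy_vunmap_locked(buf);
    buf->resv_locked = false;
}
```

The design point being debated is visible in the code: both wrappers reduce to the same primitive, so an exporter would only have to implement vmap (and pin) once.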
Hi Am 01.12.20 um 11:00 schrieb Daniel Vetter: >> [...] >> For my POV, the current interfaces have no clear policy or semantics. >> Looking through the different GEM implementations, each one seems to >> have its own interpretation. > > Yup, that's the problem really. In the past we've had vmap exclusively > for permanently pinned access, with no locking requirements. Now we're > trying to make this more dynamic, but in a somewhat ad-hoc fashion > (generic fbdev emulation charged ahead with making the fbdev > framebuffer evictable), and now it breaks at every seam. Adding more > ad-hoc semantics on top doesn't seem like a good idea. > > That's why I think we should have 2 different interfaces: > - dma_buf_vmap, the one we currently have. Permanently pins the > buffer, mapping survives, no locking required. I just looked at the implementation of dma_buf_vmap() and there's no pinning happening AFAICT. Also, none of the callback's implementations does pinning (except vram helpers). Do you mean dma_buf_attach() instead? Best regards Thomas > - dma_buf_vmap_local, the new interface, the one that generic fbdev > should have used (but oh well mistakes happen), requires > dma_resv_lock, the mapping is only local to the caller > > Trying to shovel both semantics into one interface, depending upon > which implementation we have backing the buffer, doesn't work indeed. > > Also on the pin topic, I think neither interface should require > callers to explicitly pin anything. For existing users it should > happen automatically behind the scenes imo, that's what they're > expecting. > -Daniel > > >>> I think we could use what we've done for dynamic dma-buf attachment >>> (which also change locking rules) and just have new functions for the >>> new way (i.e. short term vmap protected by dma_resv lock. Maybe call >>> these dma_buf_vmap_local, in the spirit of the new kmap_local which >>> are currently under discussion. 
>>> [...]
Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: > Hi > > Am 01.12.20 um 11:34 schrieb Christian König: >>> [...] >>> In patch 6 of this series, there's ast cursor code that acquires two >>> BO's reservation locks and vmaps them afterwards. That's probably >>> how you intend to use dma_buf_vmap_local. >>> >>> However, I think it's more logical to have a vmap callback that >>> only does the actual vmap. This is all that exporters would have to >>> implement. >>> >>> And with that, build one helper that pins before vmap and one helper >>> that gets the resv lock. >> >> I don't think that this will work, nor is it a good approach. >> >> See the ast cursor handling for example. You need to acquire two BOs >> here, not just one. And this can't be done cleanly with a single vmap >> call. > > That seems to be a misunderstanding. > > I didn't mention it explicitly, but there's of course another helper > that only vmaps and nothing else. This would be useful for cases like > the cursor code. So there would be: > > dma_buf_vmap() - pin + vmap > dma_buf_vmap_local() - lock + vmap > dma_buf_vmap_locked() - only vmap; caller has set up the BOs Well that zoo of helpers will certainly get a NAK from my side. See, interfaces like this should implement simple functions and not hide what actually needs to be done inside the drivers using this interface. What we could do is to add a pin count to the DMA-buf and then do WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) in the vmap/vunmap calls. > > I did some conversion of drivers that use vram and shmem. They > occasionally update a buffer (ast cursors) or flush a BO from system > memory to HW (udl, cirrus, mgag200). In terms of these 3 interfaces: I > never needed dma_buf_vmap() because pinning was never really required > here. Almost all of the cases were handled by dma_buf_vmap_local(). > And the ast cursor code uses the equivalent of dma_buf_vmap_locked(). Yeah, that is kind of expected. 
I was already wondering as well why we didn't use the reservation lock more extensively. Regards, Christian. > > The driver exporting the buffer would only have to implement vmap() > and pin() interfaces. Each does only one thing and would assume > that the caller holds the lock. > > Best regards > Thomas > > [...]
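[Editor's note: Christian's suggested enforcement can also be modeled in userspace. As quoted, the WARN_ON condition reads inverted; the intent is presumably to warn when the buffer is *neither* pinned *nor* reservation-locked at vmap time, and that is what the sketch below checks. All `toy_*` names are illustrative, not the real dma-buf structures.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_dmabuf {
    int pin_count;      /* the pin count Christian suggests adding */
    bool resv_locked;   /* models "reservation lock is held" */
    int vmap_count;
    int warnings;       /* counts triggered WARN_ONs in this model */
};

/* Models WARN_ON(): record and report the violation, but keep going. */
static void toy_warn_on(struct toy_dmabuf *buf, bool cond)
{
    if (cond) {
        buf->warnings++;
        fprintf(stderr, "toy WARN_ON hit\n");
    }
}

/* vmap with the suggested sanity check: the caller must have either
 * pinned the buffer or taken its reservation lock beforehand. */
static int toy_vmap(struct toy_dmabuf *buf)
{
    toy_warn_on(buf, buf->pin_count == 0 && !buf->resv_locked);
    buf->vmap_count++;
    return 0;
}
```

With a check like this in the common vmap path, every importer that maps without pinning or locking would be flagged at runtime rather than silently pinning the buffer forever.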
Hi Am 01.12.20 um 13:14 schrieb Christian König: > Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: >> Hi >> >> Am 01.12.20 um 11:34 schrieb Christian König: >>>> [...] >>>> In patch 6 of this series, there's ast cursor code that acquires two >>>> BO's reservation locks and vmaps them afterwards. That's probably >>>> how you intend to use dma_buf_vmap_local. >>>> >>>> However, I think it's more logically to have a vmap callback that >>>> only does the actual vmap. This is all that exporters would have to >>>> implement. >>>> >>>> And with that, build one helper that pins before vmap and one helper >>>> that gets the resv lock. >>> >>> I don't think that this is will work nor is it a good approach. >>> >>> See the ast cursor handling for example. You need to acquire two BOs >>> here, not just one. And this can't be done cleanly with a single vmap >>> call. >> >> That seems to be a misunderstanding. >> >> I don't mentioned it explicitly, but there's of course another helper >> that only vmaps and nothing else. This would be useful for cases like >> the cursor code. So there would be: >> >> dma_buf_vmap() - pin + vmap >> dma_buf_vmap_local() - lock + vmap >> dma_buf_vmap_locked() - only vmap; caller has set up the BOs > > Well that zoo of helpers will certainly get a NAK from my side. > > See interfaces like this should implement simple functions and not hide > what's actually needs to be done inside the drivers using this interface. If 9 of 10 invocations use the same pattern, why not put that pattern in a helper? I see nothing wrong with that. > > What we could do is to add a pin count to the DMA-buf and then do > WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) in the > vmap/vunmap calls. Most of the vmap code is either CMA or SHMEM GEM stuff. They don't need to pin. It's just baggage to them. The TTM stuff that does need pinning is the minority. > >> >> I did some conversion of drivers that use vram and shmem. 
They >> occasionally update a buffer (ast cursors) or flush a BO from system >> memory to HW (udl, cirrus, mgag200). In terms of these 3 interfaces: I >> never needed dma_buf_vmap() because pinning was never really required >> here. Almost all of the cases were handled by dma_buf_vmap_local(). >> And the ast cursor code uses the equivalent of dma_buf_vmap_locked(). > > Yeah, that is kind of expected. I was already wondering as well why we > didn't use the reservation lock more extensively. As a side note, I found only 6 trivial implementations of vmap outside of drivers/gpu/drm. I cannot find a single implementation of pin there. What am I missing? Best regards Thomas > > Regards, > Christian. > [...]
Am 01.12.20 um 13:33 schrieb Thomas Zimmermann: > Hi > > Am 01.12.20 um 13:14 schrieb Christian König: >> Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: >>> Hi >>> >>> Am 01.12.20 um 11:34 schrieb Christian König: >>>>> [...] >>>>> In patch 6 of this series, there's ast cursor code that acquires >>>>> two BO's reservation locks and vmaps them afterwards. That's >>>>> probably how you intend to use dma_buf_vmap_local. >>>>> >>>>> However, I think it's more logically to have a vmap callback that >>>>> only does the actual vmap. This is all that exporters would have >>>>> to implement. >>>>> >>>>> And with that, build one helper that pins before vmap and one >>>>> helper that gets the resv lock. >>>> >>>> I don't think that this is will work nor is it a good approach. >>>> >>>> See the ast cursor handling for example. You need to acquire two >>>> BOs here, not just one. And this can't be done cleanly with a >>>> single vmap call. >>> >>> That seems to be a misunderstanding. >>> >>> I don't mentioned it explicitly, but there's of course another >>> helper that only vmaps and nothing else. This would be useful for >>> cases like the cursor code. So there would be: >>> >>> dma_buf_vmap() - pin + vmap >>> dma_buf_vmap_local() - lock + vmap >>> dma_buf_vmap_locked() - only vmap; caller has set up the BOs >> >> Well that zoo of helpers will certainly get a NAK from my side. >> >> See interfaces like this should implement simple functions and not >> hide what's actually needs to be done inside the drivers using this >> interface. > > If 9 of 10 invocations use the same pattern, why not put that pattern > in a helper? I see nothing wrong with that. Because it hides the locking semantics inside the helper. See when you have the lock/unlock inside the driver it is obvious that you need to be careful not to take locks in different orders. 
>> What we could do is to add a pin count to the DMA-buf and then do
>> WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) in
>> the vmap/vunmap calls.
>
> Most of the vmap code is either CMA or SHMEM GEM stuff. They don't
> need to pin. It's just baggage to them. The TTM stuff that does need
> pinning is the minority.
>
>>
>>>
>>> I did some conversion of drivers that use vram and shmem. They
>>> occasionally update a buffer (ast cursors) or flush a BO from system
>>> memory to HW (udl, cirrus, mgag200). In terms of these 3 interfaces:
>>> I never needed dma_buf_vmap() because pinning was never really
>>> required here. Almost all of the cases were handled by
>>> dma_buf_vmap_local(). And the ast cursor code uses the equivalent of
>>> dma_buf_vmap_locked().
>>
>> Yeah, that is kind of expected. I was already wondering as well why
>> we didn't used the reservation lock more extensively.
>
> As a side note, I found only 6 trivial implementations of vmap outside
> of drivers/gpu/drm. I cannot find a single implementation of pin
> there. What am I missing?

Amdgpu is the only one currently implementing the new interface. So far
we didn't have the time nor the need to correctly move the locking into
the calling drivers.

That's what the whole dynamic DMA-buf patches were all about.

Regards,
Christian.

>
> Best regards
> Thomas
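The ast cursor case discussed above (take both BOs' reservation locks first, then vmap) and Christian's point about lock ordering can be sketched in a small user-space model. A plain pthread mutex stands in for dma_resv, and address ordering stands in for the ww_mutex back-off the kernel actually uses; all names here are illustrative, not the real kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* User-space model of a BO with a reservation lock; a plain mutex
 * stands in for dma_resv. Names are illustrative only. */
struct bo {
	pthread_mutex_t resv;
	int vmap_count;
};

/* Always lock the lower-addressed BO first so two concurrent callers
 * cannot deadlock on the pair; the real code uses ww_mutex acquire
 * contexts with back-off instead of a fixed order. */
static void bo_lock_pair(struct bo *a, struct bo *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct bo *t = a;
		a = b;
		b = t;
	}
	pthread_mutex_lock(&a->resv);
	pthread_mutex_lock(&b->resv);
}

static void bo_unlock_pair(struct bo *a, struct bo *b)
{
	pthread_mutex_unlock(&a->resv);
	pthread_mutex_unlock(&b->resv);
}

/* The "only vmap" flavor: the caller already holds the resv lock. */
static void bo_vmap_locked(struct bo *bo)
{
	bo->vmap_count++;
}

static void bo_vunmap_locked(struct bo *bo)
{
	bo->vmap_count--;
}

/* Cursor-update-style sequence: lock both, map, copy, unmap, unlock. */
static int update_cursor(struct bo *src, struct bo *dst)
{
	bo_lock_pair(src, dst);
	bo_vmap_locked(src);
	bo_vmap_locked(dst);
	/* ... copy the cursor image between the two mappings ... */
	bo_vunmap_locked(dst);
	bo_vunmap_locked(src);
	bo_unlock_pair(src, dst);
	return src->vmap_count == 0 && dst->vmap_count == 0;
}
```

Because the lock order is derived from the objects themselves rather than from argument order, update_cursor(a, b) and update_cursor(b, a) cannot deadlock against each other.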
Hi Am 01.12.20 um 13:38 schrieb Christian König: > Am 01.12.20 um 13:33 schrieb Thomas Zimmermann: >> Hi >> >> Am 01.12.20 um 13:14 schrieb Christian König: >>> Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: >>>> Hi >>>> >>>> Am 01.12.20 um 11:34 schrieb Christian König: >>>>>> [...] >>>>>> In patch 6 of this series, there's ast cursor code that acquires >>>>>> two BO's reservation locks and vmaps them afterwards. That's >>>>>> probably how you intend to use dma_buf_vmap_local. >>>>>> >>>>>> However, I think it's more logically to have a vmap callback that >>>>>> only does the actual vmap. This is all that exporters would have >>>>>> to implement. >>>>>> >>>>>> And with that, build one helper that pins before vmap and one >>>>>> helper that gets the resv lock. >>>>> >>>>> I don't think that this is will work nor is it a good approach. >>>>> >>>>> See the ast cursor handling for example. You need to acquire two >>>>> BOs here, not just one. And this can't be done cleanly with a >>>>> single vmap call. >>>> >>>> That seems to be a misunderstanding. >>>> >>>> I don't mentioned it explicitly, but there's of course another >>>> helper that only vmaps and nothing else. This would be useful for >>>> cases like the cursor code. So there would be: >>>> >>>> dma_buf_vmap() - pin + vmap >>>> dma_buf_vmap_local() - lock + vmap >>>> dma_buf_vmap_locked() - only vmap; caller has set up the BOs >>> >>> Well that zoo of helpers will certainly get a NAK from my side. >>> >>> See interfaces like this should implement simple functions and not >>> hide what's actually needs to be done inside the drivers using this >>> interface. >> >> If 9 of 10 invocations use the same pattern, why not put that pattern >> in a helper? I see nothing wrong with that. > > Because it hides the locking semantics inside the helper. See when you > have the lock/unlock inside the driver it is obvious that you need to be > careful not to take locks in different orders. 
> >>> What we could do is to add a pin count to the DMA-buf and then do >>> WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) in >>> the vmap/vunmap calls. >> >> Most of the vmap code is either CMA or SHMEM GEM stuff. They don't >> need to pin. It's just baggage to them. The TTM stuff that does need >> pinning is the minority. >> >>> >>>> >>>> I did some conversion of drivers that use vram and shmem. They >>>> occasionally update a buffer (ast cursors) or flush a BO from system >>>> memory to HW (udl, cirrus, mgag200). In terms of these 3 interfaces: >>>> I never needed dma_buf_vmap() because pinning was never really >>>> required here. Almost all of the cases were handled by >>>> dma_buf_vmap_local(). And the ast cursor code uses the equivalent of >>>> dma_buf_vmap_locked(). >>> >>> Yeah, that is kind of expected. I was already wondering as well why >>> we didn't used the reservation lock more extensively. >> >> As a side note, I found only 6 trivial implementations of vmap outside >> of drivers/gpu/drm. I cannot find a single implementation of pin >> there. What am I missing? > > Amdgpu is the only one currently implementing the new interface. So far > we didn't had the time nor the need to correctly move the locking into > the calling drivers. > > That's what the whole dynamic DMA-buf patches where all about. Thanks for the pointer. Best regards Thomas > > Regards, > Christian. > >> >> Best regards >> Thomas > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
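Christian's pin-count suggestion quoted above (the expression in the mail looks slightly garbled, but the apparent intent is to warn unless the buffer is either pinned or its reservation lock is held) can be modeled in user space. A bool stands in for dma_resv lock state, and the return code stands in for what would be a WARN_ON in the kernel; the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a pin-count check in vmap/vunmap: mapping is only allowed
 * while the buffer is pinned or its reservation lock is held. */
struct dmabuf {
	bool resv_held;   /* stands in for "dma_resv lock is held" */
	int pin_count;
	int vmap_count;
};

/* In the kernel this check would be a WARN_ON; here it reports failure. */
static int dmabuf_vmap(struct dmabuf *buf)
{
	if (buf->pin_count == 0 && !buf->resv_held)
		return -1;
	buf->vmap_count++;
	return 0;
}

static int dmabuf_vunmap(struct dmabuf *buf)
{
	if (buf->pin_count == 0 && !buf->resv_held)
		return -1;
	buf->vmap_count--;
	return 0;
}
```

This also illustrates Thomas's objection: exporters that never move buffers (CMA, SHMEM) satisfy the check trivially via the lock, without ever touching the pin count.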
Hi Am 01.12.20 um 13:51 schrieb Thomas Zimmermann: > Hi > > Am 01.12.20 um 13:38 schrieb Christian König: >> Am 01.12.20 um 13:33 schrieb Thomas Zimmermann: >>> Hi >>> >>> Am 01.12.20 um 13:14 schrieb Christian König: >>>> Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: >>>>> Hi >>>>> >>>>> Am 01.12.20 um 11:34 schrieb Christian König: >>>>>>> [...] >>>>>>> In patch 6 of this series, there's ast cursor code that acquires >>>>>>> two BO's reservation locks and vmaps them afterwards. That's >>>>>>> probably how you intend to use dma_buf_vmap_local. >>>>>>> >>>>>>> However, I think it's more logically to have a vmap callback that >>>>>>> only does the actual vmap. This is all that exporters would have >>>>>>> to implement. >>>>>>> >>>>>>> And with that, build one helper that pins before vmap and one >>>>>>> helper that gets the resv lock. >>>>>> >>>>>> I don't think that this is will work nor is it a good approach. >>>>>> >>>>>> See the ast cursor handling for example. You need to acquire two >>>>>> BOs here, not just one. And this can't be done cleanly with a >>>>>> single vmap call. >>>>> >>>>> That seems to be a misunderstanding. >>>>> >>>>> I don't mentioned it explicitly, but there's of course another >>>>> helper that only vmaps and nothing else. This would be useful for >>>>> cases like the cursor code. So there would be: >>>>> >>>>> dma_buf_vmap() - pin + vmap >>>>> dma_buf_vmap_local() - lock + vmap >>>>> dma_buf_vmap_locked() - only vmap; caller has set up the BOs >>>> >>>> Well that zoo of helpers will certainly get a NAK from my side. >>>> >>>> See interfaces like this should implement simple functions and not >>>> hide what's actually needs to be done inside the drivers using this >>>> interface. >>> >>> If 9 of 10 invocations use the same pattern, why not put that pattern >>> in a helper? I see nothing wrong with that. >> >> Because it hides the locking semantics inside the helper. 
See when you >> have the lock/unlock inside the driver it is obvious that you need to >> be careful not to take locks in different orders. >> >>>> What we could do is to add a pin count to the DMA-buf and then do >>>> WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) in >>>> the vmap/vunmap calls. >>> >>> Most of the vmap code is either CMA or SHMEM GEM stuff. They don't >>> need to pin. It's just baggage to them. The TTM stuff that does need >>> pinning is the minority. >>> >>>> >>>>> >>>>> I did some conversion of drivers that use vram and shmem. They >>>>> occasionally update a buffer (ast cursors) or flush a BO from >>>>> system memory to HW (udl, cirrus, mgag200). In terms of these 3 >>>>> interfaces: I never needed dma_buf_vmap() because pinning was never >>>>> really required here. Almost all of the cases were handled by >>>>> dma_buf_vmap_local(). And the ast cursor code uses the equivalent >>>>> of dma_buf_vmap_locked(). >>>> >>>> Yeah, that is kind of expected. I was already wondering as well why >>>> we didn't used the reservation lock more extensively. >>> >>> As a side note, I found only 6 trivial implementations of vmap >>> outside of drivers/gpu/drm. I cannot find a single implementation of >>> pin there. What am I missing? >> >> Amdgpu is the only one currently implementing the new interface. So >> far we didn't had the time nor the need to correctly move the locking >> into the calling drivers. >> >> That's what the whole dynamic DMA-buf patches where all about. > > Thanks for the pointer. That was not a snarky comment, although it might sound like one. I found the series in my inbox. :) Best regards Thomas > > Best regards > Thomas > >> >> Regards, >> Christian. >> >>> >>> Best regards >>> Thomas >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >
Am 01.12.20 um 13:53 schrieb Thomas Zimmermann: > Hi > > Am 01.12.20 um 13:51 schrieb Thomas Zimmermann: >> Hi >> >> Am 01.12.20 um 13:38 schrieb Christian König: >>> Am 01.12.20 um 13:33 schrieb Thomas Zimmermann: >>>> Hi >>>> >>>> Am 01.12.20 um 13:14 schrieb Christian König: >>>>> Am 01.12.20 um 12:30 schrieb Thomas Zimmermann: >>>>>> Hi >>>>>> >>>>>> Am 01.12.20 um 11:34 schrieb Christian König: >>>>>>>> [...] >>>>>>>> In patch 6 of this series, there's ast cursor code that >>>>>>>> acquires two BO's reservation locks and vmaps them afterwards. >>>>>>>> That's probably how you intend to use dma_buf_vmap_local. >>>>>>>> >>>>>>>> However, I think it's more logically to have a vmap callback >>>>>>>> that only does the actual vmap. This is all that exporters >>>>>>>> would have to implement. >>>>>>>> >>>>>>>> And with that, build one helper that pins before vmap and one >>>>>>>> helper that gets the resv lock. >>>>>>> >>>>>>> I don't think that this is will work nor is it a good approach. >>>>>>> >>>>>>> See the ast cursor handling for example. You need to acquire two >>>>>>> BOs here, not just one. And this can't be done cleanly with a >>>>>>> single vmap call. >>>>>> >>>>>> That seems to be a misunderstanding. >>>>>> >>>>>> I don't mentioned it explicitly, but there's of course another >>>>>> helper that only vmaps and nothing else. This would be useful for >>>>>> cases like the cursor code. So there would be: >>>>>> >>>>>> dma_buf_vmap() - pin + vmap >>>>>> dma_buf_vmap_local() - lock + vmap >>>>>> dma_buf_vmap_locked() - only vmap; caller has set up the BOs >>>>> >>>>> Well that zoo of helpers will certainly get a NAK from my side. >>>>> >>>>> See interfaces like this should implement simple functions and not >>>>> hide what's actually needs to be done inside the drivers using >>>>> this interface. >>>> >>>> If 9 of 10 invocations use the same pattern, why not put that >>>> pattern in a helper? I see nothing wrong with that. 
>>> >>> Because it hides the locking semantics inside the helper. See when >>> you have the lock/unlock inside the driver it is obvious that you >>> need to be careful not to take locks in different orders. >>> >>>>> What we could do is to add a pin count to the DMA-buf and then do >>>>> WARN_ON(dma_buf->pin_count || dma_resv_lock_help(dma_buf->resv)) >>>>> in the vmap/vunmap calls. >>>> >>>> Most of the vmap code is either CMA or SHMEM GEM stuff. They don't >>>> need to pin. It's just baggage to them. The TTM stuff that does >>>> need pinning is the minority. >>>> >>>>> >>>>>> >>>>>> I did some conversion of drivers that use vram and shmem. They >>>>>> occasionally update a buffer (ast cursors) or flush a BO from >>>>>> system memory to HW (udl, cirrus, mgag200). In terms of these 3 >>>>>> interfaces: I never needed dma_buf_vmap() because pinning was >>>>>> never really required here. Almost all of the cases were handled >>>>>> by dma_buf_vmap_local(). And the ast cursor code uses the >>>>>> equivalent of dma_buf_vmap_locked(). >>>>> >>>>> Yeah, that is kind of expected. I was already wondering as well >>>>> why we didn't used the reservation lock more extensively. >>>> >>>> As a side note, I found only 6 trivial implementations of vmap >>>> outside of drivers/gpu/drm. I cannot find a single implementation >>>> of pin there. What am I missing? >>> >>> Amdgpu is the only one currently implementing the new interface. So >>> far we didn't had the time nor the need to correctly move the >>> locking into the calling drivers. >>> >>> That's what the whole dynamic DMA-buf patches where all about. >> >> Thanks for the pointer. > > That was not a snarky comment, although it might sound like one. I > found the series in my inbox. :) It wasn't recognized as such. And just to be clear your work here is extremely welcomed. Regards, Christian. > > Best regards > Thomas > >> >> Best regards >> Thomas >> >>> >>> Regards, >>> Christian. 
>>> >>>> >>>> Best regards >>>> Thomas >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >
On Tue, Dec 1, 2020 at 11:27 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > Hi > > Am 01.12.20 um 11:00 schrieb Daniel Vetter: > > On Tue, Dec 1, 2020 at 10:40 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > >> > >> Hi > >> > >> Am 01.12.20 um 10:10 schrieb Daniel Vetter: > >>> On Tue, Dec 1, 2020 at 9:32 AM Thomas Zimmermann <tzimmermann@suse.de> wrote: > >>>> > >>>> Hi > >>>> > >>>> Am 30.11.20 um 16:33 schrieb Christian König: > >>>>> Am 30.11.20 um 16:30 schrieb Daniel Vetter: > >>>>>> On Mon, Nov 30, 2020 at 01:04:26PM +0100, Thomas Zimmermann wrote: > >>>>>>> Mapping a GEM object's buffer into kernel address space prevents the > >>>>>>> buffer from being evicted from VRAM, which in turn may result in > >>>>>>> out-of-memory errors. It's therefore required to only vmap GEM BOs for > >>>>>>> short periods of time; unless the GEM implementation provides additional > >>>>>>> guarantees. > >>>>>>> > >>>>>>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> > >>>>>>> --- > >>>>>>> drivers/gpu/drm/drm_prime.c | 6 ++++++ > >>>>>>> include/drm/drm_gem.h | 16 ++++++++++++++++ > >>>>>>> 2 files changed, 22 insertions(+) > >>>>>>> > >>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c > >>>>>>> index 7db55fce35d8..9c9ece9833e0 100644 > >>>>>>> --- a/drivers/gpu/drm/drm_prime.c > >>>>>>> +++ b/drivers/gpu/drm/drm_prime.c > >>>>>>> @@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf); > >>>>>>> * callback. Calls into &drm_gem_object_funcs.vmap for device > >>>>>>> specific handling. > >>>>>>> * The kernel virtual address is returned in map. > >>>>>>> * > >>>>>>> + * To prevent the GEM object from being relocated, callers must hold > >>>>>>> the GEM > >>>>>>> + * object's reservation lock from when calling this function until > >>>>>>> releasing the > >>>>>>> + * mapping. Holding onto a mapping and the associated reservation > >>>>>>> lock for an > >>>>>>> + * unbound time may result in out-of-memory errors. 
Calls to > >>>>>>> drm_gem_dmabuf_vmap() > >>>>>>> + * should therefore be accompanied by a call to > >>>>>>> drm_gem_dmabuf_vunmap(). > >>>>>>> + * > >>>>>>> * Returns 0 on success or a negative errno code otherwise. > >>>>>> This is a dma-buf hook, which means just documenting the rules you'd like > >>>>>> to have here isn't enough. We need to roll this out at the dma-buf level, > >>>>>> and enforce it. > >>>>>> > >>>>>> Enforce it = assert_lock_held > >>>>>> > >>>>>> Roll out = review everyone. Because this goes through dma-buf it'll come > >>>>>> back through shmem helpers (and other helpers and other subsystems) back > >>>>>> to any driver using vmap for gpu buffers. This includes the media > >>>>>> subsystem, and the media subsystem definitely doesn't cope with just > >>>>>> temporarily mapping buffers. So there we need to pin them, which I think > >>>>>> means we'll need 2 version of dma_buf_vmap - one that's temporary and > >>>>>> requires we hold dma_resv lock, the other requires that the buffer is > >>>>>> pinned. > >>>>> > >>>>> OR start to proper use the dma_buf_pin/dma_buf_unpin functions which I > >>>>> added to cover this use case as well. > >>>> > >>>> While I generally agree, here are some thoughts: > >>>> > >>>> I found all generic pin functions useless, because they don't allow for > >>>> specifying where to pin. With fbdev emulation, this means that console > >>>> buffers might never make it to VRAM for scanout. If anything, the policy > >>>> should be that pin always pins in HW-accessible memory. > >>>> > >>>> Pin has quite a bit of overhead (more locking, buffer movement), so it > >>>> should be the second choice after regular vmap. To make both work > >>>> together, pin probably relies on holding the reservation lock internally. > >>>> > >>>> Therefore I think we still would want some additional helpers, such as: > >>>> > >>>> pin_unlocked(), which acquires the resv lock, calls regular pin and > >>>> then drops the resv lock. 
Same for unpin_unlocked() > >>>> > >>>> vmap_pinned(), which enforces that the buffer has been pinned and > >>>> then calls regalar vmap. Same for vunmap_pinned() > >>>> > >>>> A typical pattern with these functions would look like this. > >>>> > >>>> drm_gem_object bo; > >>>> dma_buf_map map; > >>>> > >>>> init() { > >>>> pin_unlocked(bo); > >>>> vmap_pinned(bo, map); > >>>> } > >>>> > >>>> worker() { > >>>> begin_cpu_access() > >>>> // access bo via map > >>>> end_cpu_access() > >>>> } > >>>> > >>>> fini() { > >>>> vunmap_pinned(bo, map); > >>>> unpin_unlocked(bo); > >>>> } > >>>> > >>>> init() > >>>> while (...) { > >>>> worker() > >>>> } > >>>> fini() > >>>> > >>>> Is that reasonable for media drivers? > >>> > >>> So media drivers go through dma-buf, which means we always pin into > >>> system memory. Which I guess for vram-only display drivers makes no > >>> sense and should be rejected, but we still need somewhat consistent > >>> rules. > >>> > >>> The other thing is that if you do a dma_buf_attach without dynamic > >>> mode, dma-buf will pin things for you already. So in many cases it > >> > >> Do you have a pointer to code that illustrates this well? > >> > >>> could be that we don't need a separate pin (but since the pin is in > >>> the exporter, not dma-buf layer, we can't check for that). I'm also > >>> not seeing why existing users need to split up their dma_buf_vmap into > >>> a pin + vmap, they don't need them separately. > >> > >> It's two different operations, with pin having some possible overhead > >> and failure conditions. And during the last discussion, pin was say to > >> be for userspace-managed buffers. Kernel code should hold the > >> reservation lock. > >> > >> For my POV, the current interfaces have no clear policy or semantics. > >> Looking through the different GEM implementations, each one seems to > >> have its own interpretation. > > > > Yup, that's the problem really. 
In the past we've had vmap exclusively > > for permanently pinned access, with no locking requirements. Now we're > > trying to make this more dynamic, but in a somewhat ad-hoc fashion > > (generic fbdev emulation charged ahead with making the fbdev > > framebuffer evictable), and now it breaks at every seam. Adding more > > ad-hoc semantics on top doesn't seem like a good idea. > > > > That's why I think we should have 2 different interfaces: > > - dma_buf_vmap, the one we currently have. Permanently pins the > > buffer, mapping survives, no locking required. > > - dma_buf_vmap_local, the new interface, the one that generic fbdev > > should have used (but oh well mistakes happen), requires > > dma_resv_lock, the mapping is only local to the caller > > In patch 6 of this series, there's ast cursor code that acquires two > BO's reservation locks and vmaps them afterwards. That's probably how > you intend to use dma_buf_vmap_local. > > However, I think it's more logically to have a vmap callback that only > does the actual vmap. This is all that exporters would have to implement. > > And with that, build one helper that pins before vmap and one helper > that gets the resv lock. > > I know that it might require some work on exporting drivers. But with > your proposal, we probably need another dma-buf callback just for > vmap_local. (?) That seems even less appealing to me. The stuff I was explaining is what I expect the interface to look like for the importers. For exporters I think there's going to be two modes, and there I think a single vmap plus separate (optional) pin as needed sounds reasonable. Again that's pretty much what we've done for dynamic exporters too. So sounds all good to me. Btw I don't think we need a dma_buf_vmap_local_unlocked helper, imo better to inflict the dma_resv_lock onto callers explicitly. It's a bit more code, but the dma_resv is a very tricky lock which is highly constrained. Surfacing it instead of hiding it feels better. 
In general best practice is that the dma_resv is locked by the top most caller possible, and lower functions and callbacks just have an dma_resv_assert_locked. -Daniel > > Best regards > Thomas > > > > > Trying to shovel both semantics into one interface, depending upon > > which implementation we have backing the buffer, doesn't work indeed. > > > > Also on the pin topic, I think neither interface should require > > callers to explicitly pin anything. For existing users it should > > happen automatically behind the scenes imo, that's what they're > > expecting. > > -Daniel > > > > > >>> I think we could use what we've done for dynamic dma-buf attachment > >>> (which also change locking rules) and just have new functions for the > >>> new way (i.e. short term vmap protected by dma_resv lock. Maybe call > >>> these dma_buf_vmap_local, in the spirit of the new kmap_local which > >>> are currently under discussion. I think _local suffix is better, for > >>> otherwise people might do something silly like > >>> > >>> dma_resv_lock(); > >>> dma_buf_vmap_locked(); > >>> dma_resv_unlock(); > >>> > >>> /* actual access maybe even in some other thread */ > >>> > >>> dma_buf_resv_lock(); > >>> dma_buf_vunmap_unlocked(); > >>> dma_resv_unlock(); > >>> > >>> _local suffix is better at telling that the resulting pointer has very > >>> limited use (essentially just local to the calling context, if you > >>> don't change any locking or anything). > >> > >> _local sounds good. > >> > >> Best regards > >> Thomas > >> > >>> > >>> I think encouraging importers to call dma_buf_pin/unpin isn't a good > >>> idea. Yes dynamic ones need it, but maybe we should check for that > >>> somehow in the exporterd interface (atm only amdgpu is using it). > >>> -Daniel > >>> > >>> > >>> > >>> > >>> > >>>> Best regards > >>>> Thomas > >>>> > >>>> > >>>>> > >>>>> Cheers, > >>>>> Christian. 
> >>>>> > >>>>>> > >>>>>> That's what I meant with that this approach here is very sprawling :-/ > >>>>>> -Daniel > >>>>>> > >>>>>>> */ > >>>>>>> int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map > >>>>>>> *map) > >>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h > >>>>>>> index 5e6daa1c982f..7c34cd5ec261 100644 > >>>>>>> --- a/include/drm/drm_gem.h > >>>>>>> +++ b/include/drm/drm_gem.h > >>>>>>> @@ -137,7 +137,21 @@ struct drm_gem_object_funcs { > >>>>>>> * Returns a virtual address for the buffer. Used by the > >>>>>>> * drm_gem_dmabuf_vmap() helper. > >>>>>>> * > >>>>>>> + * Notes to implementors: > >>>>>>> + * > >>>>>>> + * - Implementations must expect pairs of @vmap and @vunmap to be > >>>>>>> + * called frequently and should optimize for this case. > >>>>>>> + * > >>>>>>> + * - Implemenations may expect the caller to hold the GEM object's > >>>>>>> + * reservation lock to protect against concurrent calls and > >>>>>>> relocation > >>>>>>> + * of the GEM object. > >>>>>>> + * > >>>>>>> + * - Implementations may provide additional guarantees (e.g., > >>>>>>> working > >>>>>>> + * without holding the reservation lock). > >>>>>>> + * > >>>>>>> * This callback is optional. > >>>>>>> + * > >>>>>>> + * See also drm_gem_dmabuf_vmap() > >>>>>>> */ > >>>>>>> int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map); > >>>>>>> @@ -148,6 +162,8 @@ struct drm_gem_object_funcs { > >>>>>>> * drm_gem_dmabuf_vunmap() helper. > >>>>>>> * > >>>>>>> * This callback is optional. > >>>>>>> + * > >>>>>>> + * See also @vmap. 
> >>>>>>> */ > >>>>>>> void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map > >>>>>>> *map); > >>>>>>> -- > >>>>>>> 2.29.2 > >>>>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> dri-devel mailing list > >>>>> dri-devel@lists.freedesktop.org > >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >>>> > >>>> -- > >>>> Thomas Zimmermann > >>>> Graphics Driver Developer > >>>> SUSE Software Solutions Germany GmbH > >>>> Maxfeldstr. 5, 90409 Nürnberg, Germany > >>>> (HRB 36809, AG Nürnberg) > >>>> Geschäftsführer: Felix Imendörffer > >>>> > >>> > >>> > >> > >> -- > >> Thomas Zimmermann > >> Graphics Driver Developer > >> SUSE Software Solutions Germany GmbH > >> Maxfeldstr. 5, 90409 Nürnberg, Germany > >> (HRB 36809, AG Nürnberg) > >> Geschäftsführer: Felix Imendörffer > >> > > > > > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Felix Imendörffer >
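Daniel's two-interface split (a pinning dma_buf_vmap with no locking requirement versus a lock-scoped dma_buf_vmap_local), and his earlier warning about holding a _local mapping across an unlock, can be modeled like this. The struct fields stand in for pinning and dma_resv state, and the function names are the hypothetical ones from the thread, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>

struct buffer {
	bool resv_held;     /* stands in for dma_resv lock state */
	int pin_count;
	bool mapped_local;  /* a _local mapping exists and is valid */
};

/* dma_buf_vmap() flavor: pin + map, mapping survives without locks. */
static int buf_vmap(struct buffer *b)
{
	b->pin_count++;
	return 0;
}

/* dma_buf_vmap_local() flavor: only valid under the resv lock. */
static int buf_vmap_local(struct buffer *b)
{
	if (!b->resv_held)
		return -1;             /* would be an assert in the kernel */
	b->mapped_local = true;
	return 0;
}

static void buf_unlock(struct buffer *b)
{
	b->resv_held = false;
	b->mapped_local = false;       /* the _local mapping dies here */
}

/* The "silly pattern": lock, map, unlock, then try to use the pointer
 * later. After the unlock the _local mapping is no longer valid. */
static int use_local_map_after_unlock(struct buffer *b)
{
	b->resv_held = true;
	if (buf_vmap_local(b) != 0)
		return -1;
	buf_unlock(b);
	return b->mapped_local ? 0 : -1;   /* -1: pointer is now stale */
}
```

The model makes the naming argument concrete: the _local mapping's lifetime is bounded by the lock's critical section, whereas the pinned mapping's lifetime is bounded only by the matching unpin.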
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7db55fce35d8..9c9ece9833e0 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -669,6 +669,12 @@ EXPORT_SYMBOL(drm_gem_unmap_dma_buf);
  * callback. Calls into &drm_gem_object_funcs.vmap for device specific handling.
  * The kernel virtual address is returned in map.
  *
+ * To prevent the GEM object from being relocated, callers must hold the GEM
+ * object's reservation lock from when calling this function until releasing the
+ * mapping. Holding onto a mapping and the associated reservation lock for an
+ * unbounded time may result in out-of-memory errors. Calls to drm_gem_dmabuf_vmap()
+ * should therefore be accompanied by a call to drm_gem_dmabuf_vunmap().
+ *
  * Returns 0 on success or a negative errno code otherwise.
  */
 int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 5e6daa1c982f..7c34cd5ec261 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -137,7 +137,21 @@ struct drm_gem_object_funcs {
	 * Returns a virtual address for the buffer. Used by the
	 * drm_gem_dmabuf_vmap() helper.
	 *
+	 * Notes to implementors:
+	 *
+	 * - Implementations must expect pairs of @vmap and @vunmap to be
+	 *   called frequently and should optimize for this case.
+	 *
+	 * - Implementations may expect the caller to hold the GEM object's
+	 *   reservation lock to protect against concurrent calls and relocation
+	 *   of the GEM object.
+	 *
+	 * - Implementations may provide additional guarantees (e.g., working
+	 *   without holding the reservation lock).
+	 *
	 * This callback is optional.
+	 *
+	 * See also drm_gem_dmabuf_vmap()
	 */
	int (*vmap)(struct drm_gem_object *obj, struct dma_buf_map *map);
@@ -148,6 +162,8 @@ struct drm_gem_object_funcs {
	 * drm_gem_dmabuf_vunmap() helper.
	 *
	 * This callback is optional.
+	 *
+	 * See also @vmap.
	 */
	void (*vunmap)(struct drm_gem_object *obj, struct dma_buf_map *map);
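On the caller side, the rule documented in the drm_prime.c hunk above (hold the reservation lock from vmap until vunmap, and keep that window short) looks roughly like the following user-space model. A mutex plus a flag stands in for dma_resv, the helper only asserts the lock (per Daniel's note that the topmost caller should take it), and all names and layout are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Model of the documented calling convention for drm_gem_dmabuf_vmap():
 * the topmost caller takes the reservation lock, the low-level helper
 * only asserts it, and the lock stays held from map to unmap. */
struct gem_bo {
	pthread_mutex_t resv;
	bool resv_held;
	char storage[64];      /* stands in for the BO's backing memory */
};

/* Lowest level: like a dma_resv "assert held", it never locks itself. */
static char *gem_vmap(struct gem_bo *bo)
{
	assert(bo->resv_held);
	return bo->storage;
}

static void gem_vunmap(struct gem_bo *bo, char *map)
{
	assert(bo->resv_held && map == bo->storage);
}

/* Topmost caller: lock, map, short access, unmap, unlock. */
static int flush_to_hw(struct gem_bo *bo)
{
	pthread_mutex_lock(&bo->resv);
	bo->resv_held = true;

	char *map = gem_vmap(bo);
	memcpy(map, "cursor", 7);   /* keep the mapped window short */
	gem_vunmap(bo, map);

	bo->resv_held = false;
	pthread_mutex_unlock(&bo->resv);
	return 0;
}
```

Holding the lock across the whole map/access/unmap window is what keeps the buffer from being evicted mid-access; dropping it between vmap and the access would reintroduce exactly the relocation race the documentation warns about.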
Mapping a GEM object's buffer into kernel address space prevents the
buffer from being evicted from VRAM, which in turn may result in
out-of-memory errors. It's therefore required to only vmap GEM BOs for
short periods of time, unless the GEM implementation provides additional
guarantees.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/drm_prime.c | 6 ++++++
 include/drm/drm_gem.h       | 16 ++++++++++++++++
 2 files changed, 22 insertions(+)