Message ID | 1472593821-38429-1-git-send-email-hshi@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 30, 2016 at 02:50:20PM -0700, Haixia Shi wrote: > Previously this function had a NULL pointer check for gem->map_list.map, but > that line was refactored after commit 0de23977cfeb5b357ec884ba15417ae118ff9e9bb > ("drm/gem: convert to new unified vma manager"). > > After the refactor it is still necessasry to check that the vma manager is not > NULL because udl_gem_free_object() may come in after the vma manager is destroyed. When/how does this happen? Backtrace? Destroying the vma manager before the objects are all gone sounds a bit fishy. -Daniel > > Signed-off-by: Haixia Shi <hshi@chromium.org> > Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> > --- > drivers/gpu/drm/udl/udl_gem.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c > index 818e707..72e1bd4 100644 > --- a/drivers/gpu/drm/udl/udl_gem.c > +++ b/drivers/gpu/drm/udl/udl_gem.c > @@ -204,7 +204,8 @@ void udl_gem_free_object(struct drm_gem_object *gem_obj) > if (obj->pages) > udl_gem_put_pages(obj); > > - drm_gem_free_mmap_offset(gem_obj); > + if (gem_obj->dev->vma_offset_manager) > + drm_gem_free_mmap_offset(gem_obj); > } > > /* the dumb interface doesn't work with the GEM straight MMAP > -- > 2.8.0.rc3.226.g39d4020 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
For details see https://bugs.chromium.org/p/chromium/issues/detail?id=468050 So drm_mode_config_cleanup() is called from udl_driver_unload() in which we found there's still a framebuffer left, hence the WARN in drm_crtc.c:5495. This also forcefully releases all the buffers. A bit later the actual drm_buf_release comes in which attempts to release the buffer again. On Wed, Aug 31, 2016 at 12:55 AM, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Aug 30, 2016 at 02:50:20PM -0700, Haixia Shi wrote: > > Previously this function had a NULL pointer check for gem->map_list.map, but > > that line was refactored after commit 0de23977cfeb5b357ec884ba15417ae118ff9e9bb > > ("drm/gem: convert to new unified vma manager"). > > > > After the refactor it is still necessasry to check that the vma manager is not > > NULL because udl_gem_free_object() may come in after the vma manager is destroyed. > > When/how does this happen? Backtrace? Destroying the vma manager before > the objects are all gone sounds a bit fishy. > -Daniel > > > > > Signed-off-by: Haixia Shi <hshi@chromium.org> > > Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> > > --- > > drivers/gpu/drm/udl/udl_gem.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c > > index 818e707..72e1bd4 100644 > > --- a/drivers/gpu/drm/udl/udl_gem.c > > +++ b/drivers/gpu/drm/udl/udl_gem.c > > @@ -204,7 +204,8 @@ void udl_gem_free_object(struct drm_gem_object *gem_obj) > > if (obj->pages) > > udl_gem_put_pages(obj); > > > > - drm_gem_free_mmap_offset(gem_obj); > > + if (gem_obj->dev->vma_offset_manager) > > + drm_gem_free_mmap_offset(gem_obj); > > } > > > > /* the dumb interface doesn't work with the GEM straight MMAP > > -- > > 2.8.0.rc3.226.g39d4020 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Wed, Aug 31, 2016 at 10:45 PM, Haixia Shi <hshi@chromium.org> wrote: > For details see https://bugs.chromium.org/p/chromium/issues/detail?id=468050 > > So drm_mode_config_cleanup() is called from udl_driver_unload() in > which we found there's still a framebuffer left, hence the WARN in > drm_crtc.c:5495. This also forcefully releases all the buffers. > > A bit later the actual drm_buf_release comes in which attempts to > release the buffer again. Leaving a drm_framebuffer behind on unload is definitely a bug, but not fixed with this patch here I think. The dma-buf lifetime issue is far worse, since we simply don't handling those leaking past the lifetime of the exporting drm_device at all. The dma-buf has references to a lot more than just the vma manager. What we probably need is that every exported dma-buf holds a ref on the underlying drm_device, but that means untangling the refcounting&cleanup of that vs unplugging it. -Daniel
On Wed, Aug 31, 2016 at 11:05 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Wed, Aug 31, 2016 at 10:45 PM, Haixia Shi <hshi@chromium.org> wrote: >> For details see https://bugs.chromium.org/p/chromium/issues/detail?id=468050 >> >> So drm_mode_config_cleanup() is called from udl_driver_unload() in >> which we found there's still a framebuffer left, hence the WARN in >> drm_crtc.c:5495. This also forcefully releases all the buffers. >> >> A bit later the actual drm_buf_release comes in which attempts to >> release the buffer again. > > Leaving a drm_framebuffer behind on unload is definitely a bug, but > not fixed with this patch here I think. > > The dma-buf lifetime issue is far worse, since we simply don't > handling those leaking past the lifetime of the exporting drm_device > at all. The dma-buf has references to a lot more than just the vma > manager. What we probably need is that every exported dma-buf holds a > ref on the underlying drm_device, but that means untangling the > refcounting&cleanup of that vs unplugging it. Just noticed that these problems started only when dma-buf export support was added (by you) to udl. That dma-buf support is a bit hack-ish (e.g. it leaks sg mappings since udl_unmap_dma_buf isn't implemented. Rough sketch of a fix: - fix up udl_dmabuf.c. We could/should probably put most of these into the core as a set of helpers for drivers which use plain shmem gem objects. - fix udl load/unload to no longer be midlayered, i.e. get rid of the ->load/unload hooks. There's tons of examples and drivers out there for templates. - fix up the unplug hook to correctly unregister everything. With the fixed load/unload all the unplugged tracking is probably no longer neeeded. Also with this you can drop the drm_global_mutex dance from drm_unplug_dev. Also the sequence in drm_unplug_dev is wrong like the midlayered ->unload. First it should do all the unregistering, then release internal resources. Atm it's the other way round. - make sure _all_ public objects (open files, counted by dev->open_count, dma-bufs, ...) hold a full reference onto the drm_device. For the dma-buf case this probably needs changes in drm_prime.c. - Fix that little ordering issue with the leaking fb. It's probably not getting cleanup up as it should in udl_fbdev_unplug. tada, bug fixed for real! Cheers, Daniel
Daniel Thanks! I agree the PATCH 1/2 needs some more work. What do you think about the PATCH 2/2 (suspend/resume) -- would it make sense to review it as a single standalone patch? Regards, Haixia On Wed, Aug 31, 2016 at 2:17 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Wed, Aug 31, 2016 at 11:05 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >> On Wed, Aug 31, 2016 at 10:45 PM, Haixia Shi <hshi@chromium.org> wrote: >>> For details see https://bugs.chromium.org/p/chromium/issues/detail?id=468050 >>> >>> So drm_mode_config_cleanup() is called from udl_driver_unload() in >>> which we found there's still a framebuffer left, hence the WARN in >>> drm_crtc.c:5495. This also forcefully releases all the buffers. >>> >>> A bit later the actual drm_buf_release comes in which attempts to >>> release the buffer again. >> >> Leaving a drm_framebuffer behind on unload is definitely a bug, but >> not fixed with this patch here I think. >> >> The dma-buf lifetime issue is far worse, since we simply don't >> handling those leaking past the lifetime of the exporting drm_device >> at all. The dma-buf has references to a lot more than just the vma >> manager. What we probably need is that every exported dma-buf holds a >> ref on the underlying drm_device, but that means untangling the >> refcounting&cleanup of that vs unplugging it. > > Just noticed that these problems started only when dma-buf export > support was added (by you) to udl. That dma-buf support is a bit > hack-ish (e.g. it leaks sg mappings since udl_unmap_dma_buf isn't > implemented. Rough sketch of a fix: > > - fix up udl_dmabuf.c. We could/should probably put most of these into > the core as a set of helpers for drivers which use plain shmem gem > objects. > > - fix udl load/unload to no longer be midlayered, i.e. get rid of the > ->load/unload hooks. There's tons of examples and drivers out there > for templates. > > - fix up the unplug hook to correctly unregister everything. With the > fixed load/unload all the unplugged tracking is probably no longer > neeeded. Also with this you can drop the drm_global_mutex dance from > drm_unplug_dev. Also the sequence in drm_unplug_dev is wrong like the > midlayered ->unload. First it should do all the unregistering, then > release internal resources. Atm it's the other way round. > > - make sure _all_ public objects (open files, counted by > dev->open_count, dma-bufs, ...) hold a full reference onto the > drm_device. For the dma-buf case this probably needs changes in > drm_prime.c. > > - Fix that little ordering issue with the leaking fb. It's probably > not getting cleanup up as it should in udl_fbdev_unplug. > > tada, bug fixed for real! > > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, Sep 1, 2016 at 3:22 AM, Haixia Shi <hshi@chromium.org> wrote: > Daniel > > Thanks! I agree the PATCH 1/2 needs some more work. > > What do you think about the PATCH 2/2 (suspend/resume) -- would it > make sense to review it as a single standalone patch? Sure, but I have no clue about usb or udl-specific issues (I only analyzed the generic lifetime issues in patch 1/2), I think you need someone else for this. Sean Paul could probably help you out here. -Daniel > > Regards, Haixia > > On Wed, Aug 31, 2016 at 2:17 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >> On Wed, Aug 31, 2016 at 11:05 PM, Daniel Vetter <daniel@ffwll.ch> wrote: >>> On Wed, Aug 31, 2016 at 10:45 PM, Haixia Shi <hshi@chromium.org> wrote: >>>> For details see https://bugs.chromium.org/p/chromium/issues/detail?id=468050 >>>> >>>> So drm_mode_config_cleanup() is called from udl_driver_unload() in >>>> which we found there's still a framebuffer left, hence the WARN in >>>> drm_crtc.c:5495. This also forcefully releases all the buffers. >>>> >>>> A bit later the actual drm_buf_release comes in which attempts to >>>> release the buffer again. >>> >>> Leaving a drm_framebuffer behind on unload is definitely a bug, but >>> not fixed with this patch here I think. >>> >>> The dma-buf lifetime issue is far worse, since we simply don't >>> handling those leaking past the lifetime of the exporting drm_device >>> at all. The dma-buf has references to a lot more than just the vma >>> manager. What we probably need is that every exported dma-buf holds a >>> ref on the underlying drm_device, but that means untangling the >>> refcounting&cleanup of that vs unplugging it. >> >> Just noticed that these problems started only when dma-buf export >> support was added (by you) to udl. That dma-buf support is a bit >> hack-ish (e.g. it leaks sg mappings since udl_unmap_dma_buf isn't >> implemented. Rough sketch of a fix: >> >> - fix up udl_dmabuf.c. We could/should probably put most of these into >> the core as a set of helpers for drivers which use plain shmem gem >> objects. >> >> - fix udl load/unload to no longer be midlayered, i.e. get rid of the >> ->load/unload hooks. There's tons of examples and drivers out there >> for templates. >> >> - fix up the unplug hook to correctly unregister everything. With the >> fixed load/unload all the unplugged tracking is probably no longer >> neeeded. Also with this you can drop the drm_global_mutex dance from >> drm_unplug_dev. Also the sequence in drm_unplug_dev is wrong like the >> midlayered ->unload. First it should do all the unregistering, then >> release internal resources. Atm it's the other way round. >> >> - make sure _all_ public objects (open files, counted by >> dev->open_count, dma-bufs, ...) hold a full reference onto the >> drm_device. For the dma-buf case this probably needs changes in >> drm_prime.c. >> >> - Fix that little ordering issue with the leaking fb. It's probably >> not getting cleanup up as it should in udl_fbdev_unplug. >> >> tada, bug fixed for real! >> >> Cheers, Daniel >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/udl/udl_gem.c b/drivers/gpu/drm/udl/udl_gem.c index 818e707..72e1bd4 100644 --- a/drivers/gpu/drm/udl/udl_gem.c +++ b/drivers/gpu/drm/udl/udl_gem.c @@ -204,7 +204,8 @@ void udl_gem_free_object(struct drm_gem_object *gem_obj) if (obj->pages) udl_gem_put_pages(obj); - drm_gem_free_mmap_offset(gem_obj); + if (gem_obj->dev->vma_offset_manager) + drm_gem_free_mmap_offset(gem_obj); } /* the dumb interface doesn't work with the GEM straight MMAP