Message ID | 20190415123908.28986-5-paul.kocialkowski@bootlin.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/vc4: Binner BO management improvements | expand |
Paul Kocialkowski <paul.kocialkowski@bootlin.com> writes: > The binner bo is not required until the V3D is in use, so avoid > allocating it at probe and do it on the first non-dumb BO allocation. > Keep track of which clients are using the V3D and liberate the buffer > when there is none left (through a kref) and protect it with a mutex to > avoid race conditions. > > The Out-Of-Memory (OOM) interrupt also gets some tweaking, to avoid > enabling it before having allocated a binner bo. I love your solution to this! > We also want to keep the BO alive during runtime suspend/resume to avoid > failing to allocate it at resume. This happens when the CMA pool is > full at that point and results in a hard crash. > > Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > --- > drivers/gpu/drm/vc4/vc4_bo.c | 30 ++++++++++++++++++++++ > drivers/gpu/drm/vc4/vc4_drv.c | 17 +++++++++++++ > drivers/gpu/drm/vc4/vc4_drv.h | 9 +++++++ > drivers/gpu/drm/vc4/vc4_irq.c | 6 +++-- > drivers/gpu/drm/vc4/vc4_v3d.c | 47 +++++++++++++++++++++++++---------- > 5 files changed, 94 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c > index 88ebd681d7eb..5a5276cce9a2 100644 > --- a/drivers/gpu/drm/vc4/vc4_bo.c > +++ b/drivers/gpu/drm/vc4/vc4_bo.c > @@ -799,6 +799,28 @@ vc4_prime_import_sg_table(struct drm_device *dev, > return obj; > } > > +static int vc4_grab_bin_bo(struct drm_device *dev, > + struct drm_file *file_priv) > +{ > + struct vc4_file *vc4file = file_priv->driver_priv; > + struct vc4_dev *vc4 = to_vc4_dev(dev); > + > + if (!vc4->v3d) > + return -ENODEV; > + > + if (vc4file->bin_bo_kref) > + return 0; > + > + mutex_lock(&vc4->bin_bo_lock); > + vc4file->bin_bo_kref = vc4_v3d_bin_bo_get(vc4); > + mutex_unlock(&vc4->bin_bo_lock); Interesting. I think I would have stored a bool for whether he had the kref, instead of a pointer to vc4->bin_bo_kref. Either way is fine with me, though. > + > + if (!vc4file->bin_bo_kref) > + return -ENOMEM; > + > + return 0; > +} > diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c > index d840b52b9805..b09f771384be 100644 > --- a/drivers/gpu/drm/vc4/vc4_drv.c > +++ b/drivers/gpu/drm/vc4/vc4_drv.c > @@ -126,10 +126,25 @@ static int vc4_open(struct drm_device *dev, struct drm_file *file) > return 0; > } > > +static void vc4_close_bin_bo_release(struct kref *ref) > +{ > + struct vc4_dev *vc4 = container_of(ref, struct vc4_dev, bin_bo_kref); > + > + return vc4_v3d_bin_bo_free(vc4); > +} Could we have this be the function that vc4_v3d.c publishes instead of _free, and then we don't need two separate functions for the free path? > static void vc4_close(struct drm_device *dev, struct drm_file *file) > { > + struct vc4_dev *vc4 = to_vc4_dev(dev); > struct vc4_file *vc4file = file->driver_priv; > > + mutex_lock(&vc4->bin_bo_lock); > + > + if (vc4file->bin_bo_kref) > + kref_put(vc4file->bin_bo_kref, vc4_close_bin_bo_release); > + > + mutex_unlock(&vc4->bin_bo_lock); > + > vc4_perfmon_close_file(vc4file); > kfree(vc4file); > } > @@ -313,6 +338,15 @@ int vc4_v3d_bin_bo_alloc(struct vc4_dev *vc4) > return ret; > } > > +void vc4_v3d_bin_bo_free(struct vc4_dev *vc4) > +{ > + if (!vc4->bin_bo) > + return; > + > + drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); > + vc4->bin_bo = NULL; Hmm. I'm thinking: We free the bin BO from the DRM fd close operation, but we could have jobs still being executed after the close if userspace didn't happen to wait on them before finishing (closing doesn't wait for execution to complete, and we couldn't reliably wait during close even if we wanted to). I think to resolve that we need the vc4_exec_info to keep a ref on the bin BO as well (at least, if they had a binning component to their job). > #ifdef CONFIG_PM > static int vc4_v3d_runtime_suspend(struct device *dev) > { > @@ -321,9 +355,6 @@ static int vc4_v3d_runtime_suspend(struct device *dev) > > vc4_irq_uninstall(vc4->dev); > > - drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); > - vc4->bin_bo = NULL; > - > clk_disable_unprepare(v3d->clk); > > return 0; > @@ -335,10 +366,6 @@ static int vc4_v3d_runtime_resume(struct device *dev) > struct vc4_dev *vc4 = v3d->vc4; > int ret; > > - ret = vc4_v3d_bin_bo_alloc(vc4); > - if (ret) > - return ret; > - > ret = clk_prepare_enable(v3d->clk); > if (ret != 0) > return ret;
Hi, On Mon, 2019-04-15 at 13:50 -0700, Eric Anholt wrote: > Paul Kocialkowski <paul.kocialkowski@bootlin.com> writes: > > > The binner bo is not required until the V3D is in use, so avoid > > allocating it at probe and do it on the first non-dumb BO allocation. > > Keep track of which clients are using the V3D and liberate the buffer > > when there is none left (through a kref) and protect it with a mutex to > > avoid race conditions. > > > > The Out-Of-Memory (OOM) interrupt also gets some tweaking, to avoid > > enabling it before having allocated a binner bo. > > I love your solution to this! > > > We also want to keep the BO alive during runtime suspend/resume to avoid > > failing to allocate it at resume. This happens when the CMA pool is > > full at that point and results in a hard crash. > > > > Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> > > --- > > drivers/gpu/drm/vc4/vc4_bo.c | 30 ++++++++++++++++++++++ > > drivers/gpu/drm/vc4/vc4_drv.c | 17 +++++++++++++ > > drivers/gpu/drm/vc4/vc4_drv.h | 9 +++++++ > > drivers/gpu/drm/vc4/vc4_irq.c | 6 +++-- > > drivers/gpu/drm/vc4/vc4_v3d.c | 47 +++++++++++++++++++++++++---------- > > 5 files changed, 94 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c > > index 88ebd681d7eb..5a5276cce9a2 100644 > > --- a/drivers/gpu/drm/vc4/vc4_bo.c > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c > > @@ -799,6 +799,28 @@ vc4_prime_import_sg_table(struct drm_device *dev, > > return obj; > > } > > > > +static int vc4_grab_bin_bo(struct drm_device *dev, > > + struct drm_file *file_priv) > > +{ > > + struct vc4_file *vc4file = file_priv->driver_priv; > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > + > > + if (!vc4->v3d) > > + return -ENODEV; > > + > > + if (vc4file->bin_bo_kref) > > + return 0; > > + > > + mutex_lock(&vc4->bin_bo_lock); > > + vc4file->bin_bo_kref = vc4_v3d_bin_bo_get(vc4); > > + mutex_unlock(&vc4->bin_bo_lock); > > Interesting. I think I would have stored a bool for whether he had the > kref, instead of a pointer to vc4->bin_bo_kref. Either way is fine with > me, though. Heh, I've changed it to a bool for v6 anyway, it makes the code look more symetrical. > > + > > + if (!vc4file->bin_bo_kref) > > + return -ENOMEM; > > + > > + return 0; > > +} > > diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c > > index d840b52b9805..b09f771384be 100644 > > --- a/drivers/gpu/drm/vc4/vc4_drv.c > > +++ b/drivers/gpu/drm/vc4/vc4_drv.c > > @@ -126,10 +126,25 @@ static int vc4_open(struct drm_device *dev, struct drm_file *file) > > return 0; > > } > > > > +static void vc4_close_bin_bo_release(struct kref *ref) > > +{ > > + struct vc4_dev *vc4 = container_of(ref, struct vc4_dev, bin_bo_kref); > > + > > + return vc4_v3d_bin_bo_free(vc4); > > +} > > Could we have this be the function that vc4_v3d.c publishes instead of > _free, and then we don't need two separate functions for the free path? I've made get/put helpers for v6 that shall address this as well. > > static void vc4_close(struct drm_device *dev, struct drm_file *file) > > { > > + struct vc4_dev *vc4 = to_vc4_dev(dev); > > struct vc4_file *vc4file = file->driver_priv; > > > > + mutex_lock(&vc4->bin_bo_lock); > > + > > + if (vc4file->bin_bo_kref) > > + kref_put(vc4file->bin_bo_kref, vc4_close_bin_bo_release); > > + > > + mutex_unlock(&vc4->bin_bo_lock); > > + > > vc4_perfmon_close_file(vc4file); > > kfree(vc4file); > > } > > @@ -313,6 +338,15 @@ int vc4_v3d_bin_bo_alloc(struct vc4_dev *vc4) > > return ret; > > } > > > > +void vc4_v3d_bin_bo_free(struct vc4_dev *vc4) > > +{ > > + if (!vc4->bin_bo) > > + return; > > + > > + drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); > > + vc4->bin_bo = NULL; > > Hmm. I'm thinking: > > We free the bin BO from the DRM fd close operation, but we could have > jobs still being executed after the close if userspace didn't happen to > wait on them before finishing (closing doesn't wait for execution to > complete, and we couldn't reliably wait during close even if we wanted > to). I think to resolve that we need the vc4_exec_info to keep a ref on > the bin BO as well (at least, if they had a binning component to their > job). Oh good catch! So since we assign binner slots to exec jobs when getting a tile_binning_mode_config_packet, we need to make sure that's kept alive as well. Here is something more I noticed when reworking the code: there are subsequent calls in the ioctl handlers that may fall after we got a reference to the binner BO. So we need to make sure to put in the failure path since the matching legit put won't happen if the ioctl errored out. Cheers, Paul > > #ifdef CONFIG_PM > > static int vc4_v3d_runtime_suspend(struct device *dev) > > { > > @@ -321,9 +355,6 @@ static int vc4_v3d_runtime_suspend(struct device *dev) > > > > vc4_irq_uninstall(vc4->dev); > > > > - drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); > > - vc4->bin_bo = NULL; > > - > > clk_disable_unprepare(v3d->clk); > > > > return 0; > > @@ -335,10 +366,6 @@ static int vc4_v3d_runtime_resume(struct device *dev) > > struct vc4_dev *vc4 = v3d->vc4; > > int ret; > > > > - ret = vc4_v3d_bin_bo_alloc(vc4); > > - if (ret) > > - return ret; > > - > > ret = clk_prepare_enable(v3d->clk); > > if (ret != 0) > > return ret;
diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c index 88ebd681d7eb..5a5276cce9a2 100644 --- a/drivers/gpu/drm/vc4/vc4_bo.c +++ b/drivers/gpu/drm/vc4/vc4_bo.c @@ -799,6 +799,28 @@ vc4_prime_import_sg_table(struct drm_device *dev, return obj; } +static int vc4_grab_bin_bo(struct drm_device *dev, + struct drm_file *file_priv) +{ + struct vc4_file *vc4file = file_priv->driver_priv; + struct vc4_dev *vc4 = to_vc4_dev(dev); + + if (!vc4->v3d) + return -ENODEV; + + if (vc4file->bin_bo_kref) + return 0; + + mutex_lock(&vc4->bin_bo_lock); + vc4file->bin_bo_kref = vc4_v3d_bin_bo_get(vc4); + mutex_unlock(&vc4->bin_bo_lock); + + if (!vc4file->bin_bo_kref) + return -ENOMEM; + + return 0; +} + int vc4_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { @@ -806,6 +828,10 @@ int vc4_create_bo_ioctl(struct drm_device *dev, void *data, struct vc4_bo *bo = NULL; int ret; + ret = vc4_grab_bin_bo(dev, file_priv); + if (ret) + return ret; + /* * We can't allocate from the BO cache, because the BOs don't * get zeroed, and that might leak data between users. @@ -865,6 +891,10 @@ vc4_create_shader_bo_ioctl(struct drm_device *dev, void *data, return -EINVAL; } + ret = vc4_grab_bin_bo(dev, file_priv); + if (ret) + return ret; + bo = vc4_bo_create(dev, args->size, true, VC4_BO_TYPE_V3D_SHADER); if (IS_ERR(bo)) return PTR_ERR(bo); diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c index d840b52b9805..b09f771384be 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.c +++ b/drivers/gpu/drm/vc4/vc4_drv.c @@ -126,10 +126,25 @@ static int vc4_open(struct drm_device *dev, struct drm_file *file) return 0; } +static void vc4_close_bin_bo_release(struct kref *ref) +{ + struct vc4_dev *vc4 = container_of(ref, struct vc4_dev, bin_bo_kref); + + return vc4_v3d_bin_bo_free(vc4); +} + static void vc4_close(struct drm_device *dev, struct drm_file *file) { + struct vc4_dev *vc4 = to_vc4_dev(dev); struct vc4_file *vc4file = file->driver_priv; + mutex_lock(&vc4->bin_bo_lock); + + if (vc4file->bin_bo_kref) + kref_put(vc4file->bin_bo_kref, vc4_close_bin_bo_release); + + mutex_unlock(&vc4->bin_bo_lock); + vc4_perfmon_close_file(vc4file); kfree(vc4file); } @@ -274,6 +289,8 @@ static int vc4_drm_bind(struct device *dev) drm->dev_private = vc4; INIT_LIST_HEAD(&vc4->debugfs_list); + mutex_init(&vc4->bin_bo_lock); + ret = vc4_bo_cache_init(drm); if (ret) goto dev_put; diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h index 1f0f529169a1..69dbda8e89ba 100644 --- a/drivers/gpu/drm/vc4/vc4_drv.h +++ b/drivers/gpu/drm/vc4/vc4_drv.h @@ -216,6 +216,11 @@ struct vc4_dev { * the minor is available (after drm_dev_register()). */ struct list_head debugfs_list; + + /* Mutex for binner bo allocation. */ + struct mutex bin_bo_lock; + /* Reference count for our binner bo. */ + struct kref bin_bo_kref; }; static inline struct vc4_dev * @@ -594,6 +599,8 @@ struct vc4_file { struct idr idr; struct mutex lock; } perfmon; + + struct kref *bin_bo_kref; }; static inline struct vc4_exec_info * @@ -833,7 +840,9 @@ void vc4_plane_async_set_fb(struct drm_plane *plane, extern struct platform_driver vc4_v3d_driver; extern const struct of_device_id vc4_v3d_dt_match[]; int vc4_v3d_get_bin_slot(struct vc4_dev *vc4); +struct kref *vc4_v3d_bin_bo_get(struct vc4_dev *vc4); int vc4_v3d_bin_bo_alloc(struct vc4_dev *vc4); +void vc4_v3d_bin_bo_free(struct vc4_dev *vc4); int vc4_v3d_pm_get(struct vc4_dev *vc4); void vc4_v3d_pm_put(struct vc4_dev *vc4); diff --git a/drivers/gpu/drm/vc4/vc4_irq.c b/drivers/gpu/drm/vc4/vc4_irq.c index 723dc86b4511..e20e46483346 100644 --- a/drivers/gpu/drm/vc4/vc4_irq.c +++ b/drivers/gpu/drm/vc4/vc4_irq.c @@ -252,8 +252,10 @@ vc4_irq_postinstall(struct drm_device *dev) if (!vc4->v3d) return 0; - /* Enable both the render done and out of memory interrupts. */ - V3D_WRITE(V3D_INTENA, V3D_DRIVER_IRQS); + /* Enable the render done interrupts. The out-of-memory interrupt is + * enabled as soon as we have a binner BO allocated. + */ + V3D_WRITE(V3D_INTENA, V3D_INT_FLDONE | V3D_INT_FRDONE); return 0; } diff --git a/drivers/gpu/drm/vc4/vc4_v3d.c b/drivers/gpu/drm/vc4/vc4_v3d.c index a77c8d6c7d5b..3501a83a0e91 100644 --- a/drivers/gpu/drm/vc4/vc4_v3d.c +++ b/drivers/gpu/drm/vc4/vc4_v3d.c @@ -212,6 +212,23 @@ int vc4_v3d_get_bin_slot(struct vc4_dev *vc4) return -ENOMEM; } +struct kref *vc4_v3d_bin_bo_get(struct vc4_dev *vc4) +{ + struct kref *bin_bo_kref = &vc4->bin_bo_kref; + int ret; + + if (vc4->bin_bo) { + kref_get(bin_bo_kref); + return bin_bo_kref; + } + + ret = vc4_v3d_bin_bo_alloc(vc4); + if (ret) + return NULL; + + return bin_bo_kref; +} + /** * vc4_v3d_bin_bo_alloc() - allocates the memory that will be used for * tile binning. @@ -294,6 +311,14 @@ int vc4_v3d_bin_bo_alloc(struct vc4_dev *vc4) WARN_ON_ONCE(sizeof(vc4->bin_alloc_used) * 8 != bo->base.base.size / vc4->bin_alloc_size); + kref_init(&vc4->bin_bo_kref); + + /* Enable the out-of-memory interrupt to set our + * newly-allocated binner BO, potentially from an + * already-pending-but-masked interupt. + */ + V3D_WRITE(V3D_INTENA, V3D_INT_OUTOMEM); + break; } @@ -313,6 +338,15 @@ int vc4_v3d_bin_bo_alloc(struct vc4_dev *vc4) return ret; } +void vc4_v3d_bin_bo_free(struct vc4_dev *vc4) +{ + if (!vc4->bin_bo) + return; + + drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); + vc4->bin_bo = NULL; +} + #ifdef CONFIG_PM static int vc4_v3d_runtime_suspend(struct device *dev) { @@ -321,9 +355,6 @@ static int vc4_v3d_runtime_suspend(struct device *dev) vc4_irq_uninstall(vc4->dev); - drm_gem_object_put_unlocked(&vc4->bin_bo->base.base); - vc4->bin_bo = NULL; - clk_disable_unprepare(v3d->clk); return 0; @@ -335,10 +366,6 @@ static int vc4_v3d_runtime_resume(struct device *dev) struct vc4_dev *vc4 = v3d->vc4; int ret; - ret = vc4_v3d_bin_bo_alloc(vc4); - if (ret) - return ret; - ret = clk_prepare_enable(v3d->clk); if (ret != 0) return ret; @@ -405,12 +432,6 @@ static int vc4_v3d_bind(struct device *dev, struct device *master, void *data) if (ret != 0) return ret; - ret = vc4_v3d_bin_bo_alloc(vc4); - if (ret) { - clk_disable_unprepare(v3d->clk); - return ret; - } - /* Reset the binner overflow address/size at setup, to be sure * we don't reuse an old one. */
The binner bo is not required until the V3D is in use, so avoid allocating it at probe and do it on the first non-dumb BO allocation. Keep track of which clients are using the V3D and liberate the buffer when there is none left (through a kref) and protect it with a mutex to avoid race conditions. The Out-Of-Memory (OOM) interrupt also gets some tweaking, to avoid enabling it before having allocated a binner bo. We also want to keep the BO alive during runtime suspend/resume to avoid failing to allocate it at resume. This happens when the CMA pool is full at that point and results in a hard crash. Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> --- drivers/gpu/drm/vc4/vc4_bo.c | 30 ++++++++++++++++++++++ drivers/gpu/drm/vc4/vc4_drv.c | 17 +++++++++++++ drivers/gpu/drm/vc4/vc4_drv.h | 9 +++++++ drivers/gpu/drm/vc4/vc4_irq.c | 6 +++-- drivers/gpu/drm/vc4/vc4_v3d.c | 47 +++++++++++++++++++++++++---------- 5 files changed, 94 insertions(+), 15 deletions(-)