
[5/6] drm/vc4: Fix overflow mem unreferencing when the binner runs dry.

Message ID 1469566035-22006-6-git-send-email-eric@anholt.net (mailing list archive)

Commit Message

Eric Anholt July 26, 2016, 8:47 p.m. UTC
Overflow memory handling is tricky: While it's still referenced by the
BPO registers, we want to keep it from being freed.  When we are
putting a new set of overflow memory in the registers, we need to
assign the old one to the last rendering job using it.

We were looking at "what's currently running in the binner", but since
the bin/render submission split, we may end up with the binner
completing and having no new job while the renderer is still
processing.  So, if we don't find a bin job at all, look at the
highest-seqno (last) render job to attach our overflow to.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/vc4/vc4_drv.h | 9 +++++++++
 drivers/gpu/drm/vc4/vc4_irq.c | 4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

Comments

Rob Clark July 26, 2016, 9:10 p.m. UTC | #1
On Tue, Jul 26, 2016 at 4:47 PM, Eric Anholt <eric@anholt.net> wrote:
> Overflow memory handling is tricky: While it's still referenced by the
> BPO registers, we want to keep it from being freed.  When we are
> putting a new set of overflow memory in the registers, we need to
> assign the old one to the last rendering job using it.
>
> We were looking at "what's currently running in the binner", but since
> the bin/render submission split, we may end up with the binner
> completing and having no new job while the renderer is still
> processing.  So, if we don't find a bin job at all, look at the
> highest-seqno (last) render job to attach our overflow to.

so, drive-by comment.. but can you allocate gem bo's without backing
them immediately with pages?  If so, just always allocate the bo
up-front and attach it as a dependency of the batch, and only pin it
to actual pages when you have to overflow?

BR,
-R

> Signed-off-by: Eric Anholt <eric@anholt.net>
> Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/vc4/vc4_drv.h | 9 +++++++++
>  drivers/gpu/drm/vc4/vc4_irq.c | 4 +++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index 0ced289d7696..87f727932af2 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -321,6 +321,15 @@ vc4_first_render_job(struct vc4_dev *vc4)
>                                 struct vc4_exec_info, head);
>  }
>
> +static inline struct vc4_exec_info *
> +vc4_last_render_job(struct vc4_dev *vc4)
> +{
> +       if (list_empty(&vc4->render_job_list))
> +               return NULL;
> +       return list_last_entry(&vc4->render_job_list,
> +                              struct vc4_exec_info, head);
> +}
> +
>  /**
>   * struct vc4_texture_sample_info - saves the offsets into the UBO for texture
>   * setup parameters.
> diff --git a/drivers/gpu/drm/vc4/vc4_irq.c b/drivers/gpu/drm/vc4/vc4_irq.c
> index b0104a346a74..094bc6a475c1 100644
> --- a/drivers/gpu/drm/vc4/vc4_irq.c
> +++ b/drivers/gpu/drm/vc4/vc4_irq.c
> @@ -83,8 +83,10 @@ vc4_overflow_mem_work(struct work_struct *work)
>
>                 spin_lock_irqsave(&vc4->job_lock, irqflags);
>                 current_exec = vc4_first_bin_job(vc4);
> +               if (!current_exec)
> +                       current_exec = vc4_last_render_job(vc4);
>                 if (current_exec) {
> -                       vc4->overflow_mem->seqno = vc4->finished_seqno + 1;
> +                       vc4->overflow_mem->seqno = current_exec->seqno;
>                         list_add_tail(&vc4->overflow_mem->unref_head,
>                                       &current_exec->unref_list);
>                         vc4->overflow_mem = NULL;
> --
> 2.8.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Eric Anholt July 26, 2016, 11:11 p.m. UTC | #2
Rob Clark <robdclark@gmail.com> writes:

> On Tue, Jul 26, 2016 at 4:47 PM, Eric Anholt <eric@anholt.net> wrote:
>> Overflow memory handling is tricky: While it's still referenced by the
>> BPO registers, we want to keep it from being freed.  When we are
>> putting a new set of overflow memory in the registers, we need to
>> assign the old one to the last rendering job using it.
>>
>> We were looking at "what's currently running in the binner", but since
>> the bin/render submission split, we may end up with the binner
>> completing and having no new job while the renderer is still
>> processing.  So, if we don't find a bin job at all, look at the
>> highest-seqno (last) render job to attach our overflow to.
>
> so, drive-by comment.. but can you allocate gem bo's without backing
> them immediately with pages?  If so, just always allocate the bo
> up-front and attach it as a dependency of the batch, and only pin it
> to actual pages when you have to overflow?

The amount of overflow for a given CL is arbitrary, depending on the
geometry submitted, and the overflow pool just gets streamed into by the
hardware as you submit bin jobs.  You'll end up allocating [0,n] new
overflows per bin job.  I don't see where "allocate gem BOs without
backing them immediately with pages" idea would fit into this.
Rob Clark July 26, 2016, 11:37 p.m. UTC | #3
On Tue, Jul 26, 2016 at 7:11 PM, Eric Anholt <eric@anholt.net> wrote:
> Rob Clark <robdclark@gmail.com> writes:
>
>> On Tue, Jul 26, 2016 at 4:47 PM, Eric Anholt <eric@anholt.net> wrote:
>>> Overflow memory handling is tricky: While it's still referenced by the
>>> BPO registers, we want to keep it from being freed.  When we are
>>> putting a new set of overflow memory in the registers, we need to
>>> assign the old one to the last rendering job using it.
>>>
>>> We were looking at "what's currently running in the binner", but since
>>> the bin/render submission split, we may end up with the binner
>>> completing and having no new job while the renderer is still
>>> processing.  So, if we don't find a bin job at all, look at the
>>> highest-seqno (last) render job to attach our overflow to.
>>
>> so, drive-by comment.. but can you allocate gem bo's without backing
>> them immediately with pages?  If so, just always allocate the bo
>> up-front and attach it as a dependency of the batch, and only pin it
>> to actual pages when you have to overflow?
>
> The amount of overflow for a given CL is arbitrary, depending on the
> geometry submitted, and the overflow pool just gets streamed into by the
> hardware as you submit bin jobs.  You'll end up allocating [0,n] new
> overflows per bin job.  I don't see where "allocate gem BOs without
> backing them immediately with pages" idea would fit into this.

well, even not knowing the size up front shouldn't really be a
show-stopper, unless you had to mmap it to userspace, perhaps..
normally backing pages aren't allocated until drm_gem_get_pages() so
allocating the gem bo as placeholder to track dependencies of the
batch/submit shouldn't be an issue.  But I noticed you don't use
drm_gem_get_pages().. maybe w/ cma helpers it is harder to decouple
allocation of the drm_gem_object from the backing store.

BR,
-R
Eric Anholt July 27, 2016, 5:37 a.m. UTC | #4
Rob Clark <robdclark@gmail.com> writes:

> On Tue, Jul 26, 2016 at 7:11 PM, Eric Anholt <eric@anholt.net> wrote:
>> Rob Clark <robdclark@gmail.com> writes:
>>
>>> On Tue, Jul 26, 2016 at 4:47 PM, Eric Anholt <eric@anholt.net> wrote:
>>>> Overflow memory handling is tricky: While it's still referenced by the
>>>> BPO registers, we want to keep it from being freed.  When we are
>>>> putting a new set of overflow memory in the registers, we need to
>>>> assign the old one to the last rendering job using it.
>>>>
>>>> We were looking at "what's currently running in the binner", but since
>>>> the bin/render submission split, we may end up with the binner
>>>> completing and having no new job while the renderer is still
>>>> processing.  So, if we don't find a bin job at all, look at the
>>>> highest-seqno (last) render job to attach our overflow to.
>>>
>>> so, drive-by comment.. but can you allocate gem bo's without backing
>>> them immediately with pages?  If so, just always allocate the bo
>>> up-front and attach it as a dependency of the batch, and only pin it
>>> to actual pages when you have to overflow?
>>
>> The amount of overflow for a given CL is arbitrary, depending on the
>> geometry submitted, and the overflow pool just gets streamed into by the
>> hardware as you submit bin jobs.  You'll end up allocating [0,n] new
>> overflows per bin job.  I don't see where "allocate gem BOs without
>> backing them immediately with pages" idea would fit into this.
>
> well, even not knowing the size up front shouldn't really be a
> show-stopper, unless you had to mmap it to userspace, perhaps..
> normally backing pages aren't allocated until drm_gem_get_pages() so
> allocating the gem bo as placeholder to track dependencies of the
> batch/submit shouldn't be an issue.  But I noticed you don't use
> drm_gem_get_pages().. maybe w/ cma helpers it is harder to decouple
> allocation of the drm_gem_object from the backing store.

There's no period of time between "I need to allocate an overflow BO"
and "I need pages in the BO", though.

I could have a different setup that allocated a massive (all of CMA?),
fresh overflow BO per CL and populated page ranges in it as I overflow,
but with CMA you really need to never do new allocations in the hot path
because you get to stop and wait approximately forever.  So you'd want
to chunk it up so you could cache the groups of contiguous pages of
overflow, and it turns out we already have a thing for this in the form
of GEM BOs.  Anyway, doing that means you're losing out on the rest
of the last overflow BO for the new CL, expanding the working set in
your precious 256MB CMA area.

Well, OK, actually I *do* allocate a fresh overflow BO per CL today,
because of leftover bringup code that I think I could just delete at
this point.  I'm not doing that in a -fixes commit, though.
Rob Clark July 27, 2016, 11:18 a.m. UTC | #5
On Wed, Jul 27, 2016 at 1:37 AM, Eric Anholt <eric@anholt.net> wrote:
> Rob Clark <robdclark@gmail.com> writes:
>
>> On Tue, Jul 26, 2016 at 7:11 PM, Eric Anholt <eric@anholt.net> wrote:
>>> Rob Clark <robdclark@gmail.com> writes:
>>>
>>>> On Tue, Jul 26, 2016 at 4:47 PM, Eric Anholt <eric@anholt.net> wrote:
>>>>> Overflow memory handling is tricky: While it's still referenced by the
>>>>> BPO registers, we want to keep it from being freed.  When we are
>>>>> putting a new set of overflow memory in the registers, we need to
>>>>> assign the old one to the last rendering job using it.
>>>>>
>>>>> We were looking at "what's currently running in the binner", but since
>>>>> the bin/render submission split, we may end up with the binner
>>>>> completing and having no new job while the renderer is still
>>>>> processing.  So, if we don't find a bin job at all, look at the
>>>>> highest-seqno (last) render job to attach our overflow to.
>>>>
>>>> so, drive-by comment.. but can you allocate gem bo's without backing
>>>> them immediately with pages?  If so, just always allocate the bo
>>>> up-front and attach it as a dependency of the batch, and only pin it
>>>> to actual pages when you have to overflow?
>>>
>>> The amount of overflow for a given CL is arbitrary, depending on the
>>> geometry submitted, and the overflow pool just gets streamed into by the
>>> hardware as you submit bin jobs.  You'll end up allocating [0,n] new
>>> overflows per bin job.  I don't see where "allocate gem BOs without
>>> backing them immediately with pages" idea would fit into this.
>>
>> well, even not knowing the size up front shouldn't really be a
>> show-stopper, unless you had to mmap it to userspace, perhaps..
>> normally backing pages aren't allocated until drm_gem_get_pages() so
>> allocating the gem bo as placeholder to track dependencies of the
>> batch/submit shouldn't be an issue.  But I noticed you don't use
>> drm_gem_get_pages().. maybe w/ cma helpers it is harder to decouple
>> allocation of the drm_gem_object from the backing store.
>
> There's no period of time between "I need to allocate an overflow BO"
> and "I need pages in the BO", though.

oh, ok, so this is some memory that is already being used by the GPU,
not something that starts to be used when you hit overflow condition..
I'd assumed it was something you were allocating in response to the
overflow irq, but looks like you are actually *re*allocating.

BR,
-R

> I could have a different setup that allocated a massive (all of CMA?),
> fresh overflow BO per CL and populated page ranges in it as I overflow,
> but with CMA you really need to never do new allocations in the hot path
> because you get to stop and wait approximately forever.  So you'd want
> to chunk it up so you could cache the groups of contiguous pages of
> overflow, and it turns out we already have a thing for this in the form
> of GEM BOs.  Anyway, doing that means you're losing out on the rest
> of the last overflow BO for the new CL, expanding the working set in
> your precious 256MB CMA area.
>
> Well, OK, actually I *do* allocate a fresh overflow BO per CL today,
> because of leftover bringup code that I think I could just delete at
> this point.  I'm not doing that in a -fixes commit, though.

Patch

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index 0ced289d7696..87f727932af2 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -321,6 +321,15 @@  vc4_first_render_job(struct vc4_dev *vc4)
 				struct vc4_exec_info, head);
 }
 
+static inline struct vc4_exec_info *
+vc4_last_render_job(struct vc4_dev *vc4)
+{
+	if (list_empty(&vc4->render_job_list))
+		return NULL;
+	return list_last_entry(&vc4->render_job_list,
+			       struct vc4_exec_info, head);
+}
+
 /**
  * struct vc4_texture_sample_info - saves the offsets into the UBO for texture
  * setup parameters.
diff --git a/drivers/gpu/drm/vc4/vc4_irq.c b/drivers/gpu/drm/vc4/vc4_irq.c
index b0104a346a74..094bc6a475c1 100644
--- a/drivers/gpu/drm/vc4/vc4_irq.c
+++ b/drivers/gpu/drm/vc4/vc4_irq.c
@@ -83,8 +83,10 @@  vc4_overflow_mem_work(struct work_struct *work)
 
 		spin_lock_irqsave(&vc4->job_lock, irqflags);
 		current_exec = vc4_first_bin_job(vc4);
+		if (!current_exec)
+			current_exec = vc4_last_render_job(vc4);
 		if (current_exec) {
-			vc4->overflow_mem->seqno = vc4->finished_seqno + 1;
+			vc4->overflow_mem->seqno = current_exec->seqno;
 			list_add_tail(&vc4->overflow_mem->unref_head,
 				      &current_exec->unref_list);
 			vc4->overflow_mem = NULL;