[RFC] tests/gem_ring_sync_copy: reduce memory usage

Message ID	20141128153929.GB8425@strange.ger.corp.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> Date: Fri, 28 Nov 2014 15:39:29 +0000 From: Damien Lespiau <damien.lespiau@intel.com> To: tim.gore@intel.com Message-ID: <20141128153929.GB8425@strange.ger.corp.intel.com> References: <1417185984-30815-1-git-send-email-tim.gore@intel.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1417185984-30815-1-git-send-email-tim.gore@intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory usage Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Lespiau, Damien Nov. 28, 2014, 3:39 p.m. UTC

On Fri, Nov 28, 2014 at 02:46:24PM +0000, tim.gore@intel.com wrote:
> From: Tim Gore <tim.gore@intel.com>
> 
> gem_ring_sync_copy uses a lot of memory and gets OOM
> killed on smaller systems (eg android devices).
> Most of the allocation is for "busy work" to keep the
> render rings busy and for this we can just re-use the
> same few buffers over and over. This enables the test
> to be run on low end devices.
> 
> Signed-off-by: Tim Gore <tim.gore@intel.com>

The one thing I'm wondering is how we can ensure that 32 is indeed
enough to create a dependency between the two rings.

I'm not sure why you've chosen to remove n_buffers_load (without
actually removing it from the test data. How about:

tim.gore@intel.com Nov. 28, 2014, 3:47 p.m. UTC | #1

> -----Original Message-----
> From: Lespiau, Damien
> Sent: Friday, November 28, 2014 3:39 PM
> To: Gore, Tim
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [RFC] tests/gem_ring_sync_copy: reduce memory usage
> 
> On Fri, Nov 28, 2014 at 02:46:24PM +0000, tim.gore@intel.com wrote:
> > From: Tim Gore <tim.gore@intel.com>
> >
> > gem_ring_sync_copy uses a lot of memory and gets OOM killed on smaller
> > systems (eg android devices).
> > Most of the allocation is for "busy work" to keep the render rings
> > busy and for this we can just re-use the same few buffers over and
> > over. This enables the test to be run on low end devices.
> >
> > Signed-off-by: Tim Gore <tim.gore@intel.com>
> 
> The one thing I'm wondering is how we can ensure that 32 is indeed enough
> to create a dependency between the two rings.
> 
> I'm not sure why you've chosen to remove n_buffers_load (without actually
> removing it from the test data. How about:
> 
> diff --git a/tests/gem_ring_sync_copy.c b/tests/gem_ring_sync_copy.c
> index 4a732d2..2561bac 100644
> --- a/tests/gem_ring_sync_copy.c
> +++ b/tests/gem_ring_sync_copy.c
> @@ -331,7 +331,7 @@ igt_main
>                 data.drm_fd = drm_open_any_render();
>                 data.devid = intel_get_drm_devid(data.drm_fd);
> 
> -               data.n_buffers_load = 1000;
> +               data.n_buffers_load = 32;
> 
>                 data.bufmgr = drm_intel_bufmgr_gem_init(data.drm_fd, 4096);
>                 igt_assert(data.bufmgr);
> 
> --
> Damien
> 

N_buffers_load is still used. I am still submitting 1000 buffers to the ring, its just
that I use the same buffers over and over (hence the "i % NUM_BUSY_BUFFERS").
So I only allocate 32 buffers, and each gets copied 1000/32 times, so the ring is kept
busy for as long as previously.

  Tim

> > ---
> >  tests/gem_ring_sync_copy.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/tests/gem_ring_sync_copy.c b/tests/gem_ring_sync_copy.c
> > index 4a732d2..7257188 100644
> > --- a/tests/gem_ring_sync_copy.c
> > +++ b/tests/gem_ring_sync_copy.c
> > @@ -57,6 +57,7 @@
> >
> >  #define WIDTH	512
> >  #define HEIGHT	512
> > +#define NUM_BUSY_BUFFERS 32
> >
> >  typedef struct {
> >  	int drm_fd;
> > @@ -163,11 +164,13 @@ static void render_busy(data_t *data)
> >  	size_t array_size;
> >  	int i;
> >
> > -	array_size = data->n_buffers_load * sizeof(struct igt_buf);
> > +	/* allocate 32 buffer objects and re-use them as needed */
> > +	array_size = NUM_BUSY_BUFFERS * sizeof(struct igt_buf);
> > +
> >  	data->render.srcs = malloc(array_size);
> >  	data->render.dsts = malloc(array_size);
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		scratch_buf_init(data, &data->render.srcs[i], WIDTH,
> HEIGHT,
> >  				 0xdeadbeef);
> >  		scratch_buf_init(data, &data->render.dsts[i], WIDTH,
> HEIGHT, @@
> > -177,10 +180,10 @@ static void render_busy(data_t *data)
> >  	for (i = 0; i < data->n_buffers_load; i++) {
> >  		data->render.copy(data->batch,
> >  				  NULL,			/* context */
> > -				  &data->render.srcs[i],
> > +				  &data->render.srcs[i %
> NUM_BUSY_BUFFERS],
> >  				  0, 0,			/* src_x, src_y */
> >  				  WIDTH, HEIGHT,
> > -				  &data->render.dsts[i],
> > +				  &data->render.dsts[i %
> NUM_BUSY_BUFFERS],
> >  				  0, 0			/* dst_x, dst_y */);
> >  	}
> >  }
> > @@ -189,7 +192,7 @@ static void render_busy_fini(data_t *data)  {
> >  	int i;
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		drm_intel_bo_unreference(data->render.srcs[i].bo);
> >  		drm_intel_bo_unreference(data->render.dsts[i].bo);
> >  	}
> > @@ -225,11 +228,13 @@ static void blitter_busy(data_t *data)
> >  	size_t array_size;
> >  	int i;
> >
> > -	array_size = data->n_buffers_load * sizeof(drm_intel_bo *);
> > +	/* allocate 32 buffer objects and re-use them as needed */
> > +	array_size = NUM_BUSY_BUFFERS * sizeof(drm_intel_bo *);
> > +
> >  	data->blitter.srcs = malloc(array_size);
> >  	data->blitter.dsts = malloc(array_size);
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		data->blitter.srcs[i] = bo_create(data,
> >  						  WIDTH, HEIGHT,
> >  						  0xdeadbeef);
> > @@ -240,8 +245,8 @@ static void blitter_busy(data_t *data)
> >
> >  	for (i = 0; i < data->n_buffers_load; i++) {
> >  		intel_copy_bo(data->batch,
> > -			      data->blitter.srcs[i],
> > -			      data->blitter.dsts[i],
> > +			      data->blitter.srcs[i % NUM_BUSY_BUFFERS],
> > +			      data->blitter.dsts[i % NUM_BUSY_BUFFERS],
> >  			      WIDTH*HEIGHT*4);
> >  	}
> >  }
> > @@ -250,7 +255,7 @@ static void blitter_busy_fini(data_t *data)  {
> >  	int i;
> >
> > -	for (i = 0; i < data->n_buffers_load; i++) {
> > +	for (i = 0; i < NUM_BUSY_BUFFERS; i++) {
> >  		drm_intel_bo_unreference(data->blitter.srcs[i]);
> >  		drm_intel_bo_unreference(data->blitter.dsts[i]);
> >  	}
> > --
> > 2.1.3
> >

Lespiau, Damien Nov. 28, 2014, 4:04 p.m. UTC | #2

On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> N_buffers_load is still used. I am still submitting 1000 buffers to the ring, its just
> that I use the same buffers over and over (hence the "i % NUM_BUSY_BUFFERS").
> So I only allocate 32 buffers, and each gets copied 1000/32 times, so the ring is kept
> busy for as long as previously.

Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.

Chris Wilson Nov. 28, 2014, 4:19 p.m. UTC | #3

On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > N_buffers_load is still used. I am still submitting 1000 buffers to the ring, its just
> > that I use the same buffers over and over (hence the "i % NUM_BUSY_BUFFERS").
> > So I only allocate 32 buffers, and each gets copied 1000/32 times, so the ring is kept
> > busy for as long as previously.
> 
> Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.

The ring is kept as busy, but the queue depth is drastically reduced
(from N_buffers to 32). Since both numbers are arbitrary, I am not
adverse to the change, but I would feel happier if it was demonstrated
that the new test is still capable of detecting bugs deliberately
introduced into the ring synchronisation code.
-Chris

tim.gore@intel.com Nov. 28, 2014, 4:34 p.m. UTC | #4

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, November 28, 2014 4:20 PM
> To: Lespiau, Damien
> Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> usage
> 
> On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > N_buffers_load is still used. I am still submitting 1000 buffers to
> > > the ring, its just that I use the same buffers over and over (hence the "i %
> NUM_BUSY_BUFFERS").
> > > So I only allocate 32 buffers, and each gets copied 1000/32 times,
> > > so the ring is kept busy for as long as previously.
> >
> > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> 
> The ring is kept as busy, but the queue depth is drastically reduced (from
> N_buffers to 32). Since both numbers are arbitrary, I am not adverse to the
> change, but I would feel happier if it was demonstrated that the new test is
> still capable of detecting bugs deliberately introduced into the ring
> synchronisation code.
> -Chris
> 

Excuse a rather novice question, but which queue depth is reduced?

  Tim

> --
> Chris Wilson, Intel Open Source Technology Centre

Chris Wilson Nov. 28, 2014, 4:47 p.m. UTC | #5

On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> 
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Friday, November 28, 2014 4:20 PM
> > To: Lespiau, Damien
> > Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > usage
> > 
> > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > N_buffers_load is still used. I am still submitting 1000 buffers to
> > > > the ring, its just that I use the same buffers over and over (hence the "i %
> > NUM_BUSY_BUFFERS").
> > > > So I only allocate 32 buffers, and each gets copied 1000/32 times,
> > > > so the ring is kept busy for as long as previously.
> > >
> > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > 
> > The ring is kept as busy, but the queue depth is drastically reduced (from
> > N_buffers to 32). Since both numbers are arbitrary, I am not adverse to the
> > change, but I would feel happier if it was demonstrated that the new test is
> > still capable of detecting bugs deliberately introduced into the ring
> > synchronisation code.
> > -Chris
> > 
> 
> Excuse a rather novice question, but which queue depth is reduced?

We track on the object the last read/write request. If you reuse objects
the effective depth in the read/write queue is reduced, and this queue
is implicitly used when synchronising between rings.
-Chris

tim.gore@intel.com Nov. 28, 2014, 5:05 p.m. UTC | #6

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, November 28, 2014 4:47 PM
> To: Gore, Tim
> Cc: Lespiau, Damien; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> usage
> 
> On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> >
> >
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Friday, November 28, 2014 4:20 PM
> > > To: Lespiau, Damien
> > > Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > memory usage
> > >
> > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > to the ring, its just that I use the same buffers over and over
> > > > > (hence the "i %
> > > NUM_BUSY_BUFFERS").
> > > > > So I only allocate 32 buffers, and each gets copied 1000/32
> > > > > times, so the ring is kept busy for as long as previously.
> > > >
> > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > >
> > > The ring is kept as busy, but the queue depth is drastically reduced
> > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > adverse to the change, but I would feel happier if it was
> > > demonstrated that the new test is still capable of detecting bugs
> > > deliberately introduced into the ring synchronisation code.
> > > -Chris
> > >
> >
> > Excuse a rather novice question, but which queue depth is reduced?
> 
> We track on the object the last read/write request. If you reuse objects the
> effective depth in the read/write queue is reduced, and this queue is
> implicitly used when synchronising between rings.
> -Chris
> 

OK thanks. So the test has changed but in a subtle way. It would seem that the main
thrust of the test is still there but perhaps we could do better by checking how much
memory we have and then using 1000 buffers of a size that we can accommodate.?

  Tim
> --
> Chris Wilson, Intel Open Source Technology Centre

Chris Wilson Nov. 28, 2014, 5:34 p.m. UTC | #7

On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Friday, November 28, 2014 4:47 PM
> > To: Gore, Tim
> > Cc: Lespiau, Damien; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > usage
> > 
> > On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Friday, November 28, 2014 4:20 PM
> > > > To: Lespiau, Damien
> > > > Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > > memory usage
> > > >
> > > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > > to the ring, its just that I use the same buffers over and over
> > > > > > (hence the "i %
> > > > NUM_BUSY_BUFFERS").
> > > > > > So I only allocate 32 buffers, and each gets copied 1000/32
> > > > > > times, so the ring is kept busy for as long as previously.
> > > > >
> > > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > > >
> > > > The ring is kept as busy, but the queue depth is drastically reduced
> > > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > > adverse to the change, but I would feel happier if it was
> > > > demonstrated that the new test is still capable of detecting bugs
> > > > deliberately introduced into the ring synchronisation code.
> > > > -Chris
> > > >
> > >
> > > Excuse a rather novice question, but which queue depth is reduced?
> > 
> > We track on the object the last read/write request. If you reuse objects the
> > effective depth in the read/write queue is reduced, and this queue is
> > implicitly used when synchronising between rings.
> > -Chris
> > 
> 
> OK thanks. So the test has changed but in a subtle way. It would seem that the main
> thrust of the test is still there but perhaps we could do better by checking how much
> memory we have and then using 1000 buffers of a size that we can accommodate.?

Yes. You could allocate 1000 single pages and perform lots of little copies
in each page to generate the workload with large queue depths. Good idea.
-Chris

Daniel Vetter Nov. 28, 2014, 6:24 p.m. UTC | #8

On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > 
> > > -----Original Message-----
> > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > Sent: Friday, November 28, 2014 4:47 PM
> > > To: Gore, Tim
> > > Cc: Lespiau, Damien; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > > usage
> > > 
> > > On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > Sent: Friday, November 28, 2014 4:20 PM
> > > > > To: Lespiau, Damien
> > > > > Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > > > memory usage
> > > > >
> > > > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > > > to the ring, its just that I use the same buffers over and over
> > > > > > > (hence the "i %
> > > > > NUM_BUSY_BUFFERS").
> > > > > > > So I only allocate 32 buffers, and each gets copied 1000/32
> > > > > > > times, so the ring is kept busy for as long as previously.
> > > > > >
> > > > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > > > >
> > > > > The ring is kept as busy, but the queue depth is drastically reduced
> > > > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > > > adverse to the change, but I would feel happier if it was
> > > > > demonstrated that the new test is still capable of detecting bugs
> > > > > deliberately introduced into the ring synchronisation code.
> > > > > -Chris
> > > > >
> > > >
> > > > Excuse a rather novice question, but which queue depth is reduced?
> > > 
> > > We track on the object the last read/write request. If you reuse objects the
> > > effective depth in the read/write queue is reduced, and this queue is
> > > implicitly used when synchronising between rings.
> > > -Chris
> > > 
> > 
> > OK thanks. So the test has changed but in a subtle way. It would seem that the main
> > thrust of the test is still there but perhaps we could do better by checking how much
> > memory we have and then using 1000 buffers of a size that we can accommodate.?
> 
> Yes. You could allocate 1000 single pages and perform lots of little copies
> in each page to generate the workload with large queue depths. Good idea.

I think overall we have way too many copypastas of "throw load with
depencies onto $engine with $ctx". Volunteers to extract something
autotuning and use it everwhere highgly welcome ;-)

I think Ville is working on something for kms_flip (since on vlv the
current workload takes to long and results in timeouts for flips and so
test failures). But not sure whether he'll do the full librarization.

Aside on depencies: As long as we keep things in the same (implicit)
context we won't reorder with the scheduler. Or at least that's been my
working assumption while writing tests. Only for cross-ctx/fd/engine stuff
do the actual depencies matter.
-Daniel

Ville Syrjala Nov. 28, 2014, 6:49 p.m. UTC | #9

On Fri, Nov 28, 2014 at 07:24:18PM +0100, Daniel Vetter wrote:
> On Fri, Nov 28, 2014 at 05:34:31PM +0000, Chris Wilson wrote:
> > On Fri, Nov 28, 2014 at 05:05:11PM +0000, Gore, Tim wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > Sent: Friday, November 28, 2014 4:47 PM
> > > > To: Gore, Tim
> > > > Cc: Lespiau, Damien; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce memory
> > > > usage
> > > > 
> > > > On Fri, Nov 28, 2014 at 04:34:08PM +0000, Gore, Tim wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > > > > > Sent: Friday, November 28, 2014 4:20 PM
> > > > > > To: Lespiau, Damien
> > > > > > Cc: Gore, Tim; intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [RFC] tests/gem_ring_sync_copy: reduce
> > > > > > memory usage
> > > > > >
> > > > > > On Fri, Nov 28, 2014 at 04:04:14PM +0000, Damien Lespiau wrote:
> > > > > > > On Fri, Nov 28, 2014 at 03:47:01PM +0000, Gore, Tim wrote:
> > > > > > > > N_buffers_load is still used. I am still submitting 1000 buffers
> > > > > > > > to the ring, its just that I use the same buffers over and over
> > > > > > > > (hence the "i %
> > > > > > NUM_BUSY_BUFFERS").
> > > > > > > > So I only allocate 32 buffers, and each gets copied 1000/32
> > > > > > > > times, so the ring is kept busy for as long as previously.
> > > > > > >
> > > > > > > Ah oops, yes, indeed. Looks good then, pushed, thanks for the patch.
> > > > > >
> > > > > > The ring is kept as busy, but the queue depth is drastically reduced
> > > > > > (from N_buffers to 32). Since both numbers are arbitrary, I am not
> > > > > > adverse to the change, but I would feel happier if it was
> > > > > > demonstrated that the new test is still capable of detecting bugs
> > > > > > deliberately introduced into the ring synchronisation code.
> > > > > > -Chris
> > > > > >
> > > > >
> > > > > Excuse a rather novice question, but which queue depth is reduced?
> > > > 
> > > > We track on the object the last read/write request. If you reuse objects the
> > > > effective depth in the read/write queue is reduced, and this queue is
> > > > implicitly used when synchronising between rings.
> > > > -Chris
> > > > 
> > > 
> > > OK thanks. So the test has changed but in a subtle way. It would seem that the main
> > > thrust of the test is still there but perhaps we could do better by checking how much
> > > memory we have and then using 1000 buffers of a size that we can accommodate.?
> > 
> > Yes. You could allocate 1000 single pages and perform lots of little copies
> > in each page to generate the workload with large queue depths. Good idea.
> 
> I think overall we have way too many copypastas of "throw load with
> depencies onto $engine with $ctx". Volunteers to extract something
> autotuning and use it everwhere highgly welcome ;-)
> 
> I think Ville is working on something for kms_flip (since on vlv the
> current workload takes to long and results in timeouts for flips and so
> test failures). But not sure whether he'll do the full librarization.

My current hacked up dummy load stuff still lives in kms_flip. It just
tries to tune for a 1 second delay now. And it still uses two bos to do
the copies, but those are now fixed 2kx2k and only the final copy is
aimed at the fb. Seems to working reosonably well now. I'll try to clean
it up a bit and post it to the list, but it's getting a bit late here
so that'll have to wait until early next week.

[RFC] tests/gem_ring_sync_copy: reduce memory usage

Commit Message

Comments

Patch