diff mbox

drm/i915: Trim the object sg table

Message ID 1478701802-2982-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tvrtko Ursulin Nov. 9, 2016, 2:30 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

At the moment we allocate enough sg table entries assuming we
will not be able to do any coallescing. But since in practice
we most often can, and more so very effectively, this ends up
wasting a lot of memory.

A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.

Experiment on my freshly logged and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM, or
when running a typical benchmark like gl_manhattan I have
even seen a 6 MiB saving.

v2:
 * Update commit message.
 * Use temporary sg_table on stack. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Chris Wilson Nov. 9, 2016, 2:44 p.m. UTC | #1
On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> At the moment we allocate enough sg table entries assuming we
> will not be able to do any coallescing. But since in practice
> we most often can, and more so very effectively, this ends up
> wasting a lot of memory.
> 
> A simple and effective way of trimming the over-allocated
> entries is to copy the table over to a new one allocated to the
> exact size.
> 
> Experiment on my freshly logged and idle desktop (KDE) showed
Experiments
> that by doing this we can save approximately 1 MiB of RAM, or
> when running a typical benchmark like gl_manhattan I have
> even seen a 6 MiB saving.

More complicated techniques such as only copying the last used page and
freeing the rest are left to the reader.

> v2:
>  * Update commit message.
>  * Use temporary sg_table on stack. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d2ad73d0b5b9..411aae535abe 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void)
>  #endif
>  }
>  
> +static void i915_sg_trim(struct sg_table *orig_st)
> +{
> +	struct sg_table new_st;
> +	struct scatterlist *sg, *new_sg;
> +	unsigned int i;
> +
> +	if (orig_st->nents == orig_st->orig_nents)
> +		return;
> +
> +	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
> +		return;
> +
> +	new_sg = new_st.sgl;
> +	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
> +		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
> +		new_sg = sg_next(new_sg);

Worth a
	/* called before being DMA mapped, no need to copy sg->dma_* */
?

> +	}
> +
> +	sg_free_table(orig_st);
> +	memcpy(orig_st, &new_st, sizeof(*orig_st));

I would have used *orig_st = new;

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
           ^ I remembered it this time!
Took a couple of attempts to spell my name right though.
-Chris
Tvrtko Ursulin Nov. 9, 2016, 3:07 p.m. UTC | #2
On 09/11/2016 14:44, Chris Wilson wrote:
> On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> At the moment we allocate enough sg table entries assuming we
>> will not be able to do any coallescing. But since in practice
>> we most often can, and more so very effectively, this ends up
>> wasting a lot of memory.
>>
>> A simple and effective way of trimming the over-allocated
>> entries is to copy the table over to a new one allocated to the
>> exact size.
>>
>> Experiment on my freshly logged and idle desktop (KDE) showed
> Experiments
>> that by doing this we can save approximately 1 MiB of RAM, or
>> when running a typical benchmark like gl_manhattan I have
>> even seen a 6 MiB saving.
>
> More complicated techniques such as only copying the last used page and
> freeing the rest are left to the reader.

Yes that would need to go into the core kernel since it needs access to 
static alloc/free functions and performance benefit might be quite small 
for that. Typically I see coalescing working really well so the delta in 
saved allocations and frees would be quite small. Perhaps I need to 
attempt to fragment my memory a lot to see what happens then.

>> v2:
>>  * Update commit message.
>>  * Use temporary sg_table on stack. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>  drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index d2ad73d0b5b9..411aae535abe 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void)
>>  #endif
>>  }
>>
>> +static void i915_sg_trim(struct sg_table *orig_st)
>> +{
>> +	struct sg_table new_st;
>> +	struct scatterlist *sg, *new_sg;
>> +	unsigned int i;
>> +
>> +	if (orig_st->nents == orig_st->orig_nents)
>> +		return;
>> +
>> +	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
>> +		return;
>> +
>> +	new_sg = new_st.sgl;
>> +	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
>> +		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
>> +		new_sg = sg_next(new_sg);
>
> Worth a
> 	/* called before being DMA mapped, no need to copy sg->dma_* */
> ?

Hm, or something safer than a comment. Unfortunately entries are not 
zeroed by default to enable a GEM_BUG_ON here. Unless CONFIG_GEM_DEBUG 
could mean GFP_ZERO added to some our allocations. :)

Yeah I think comment is the best option as long as this function is 
static only. Will add.

>
>> +	}
>> +
>> +	sg_free_table(orig_st);
>> +	memcpy(orig_st, &new_st, sizeof(*orig_st));
>
> I would have used *orig_st = new;

It is more readable, agreed.

>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>            ^ I remembered it this time!
> Took a couple of attempts to spell my name right though.

Thanks! I assume I can keep it for the above little changes.

Regards,

Tvrtko
Chris Wilson Nov. 9, 2016, 3:13 p.m. UTC | #3
On Wed, Nov 09, 2016 at 03:07:38PM +0000, Tvrtko Ursulin wrote:
> 
> On 09/11/2016 14:44, Chris Wilson wrote:
> >Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >           ^ I remembered it this time!
> >Took a couple of attempts to spell my name right though.
> 
> Thanks! I assume I can keep it for the above little changes.

Yes.
-Chris
Tvrtko Ursulin Nov. 10, 2016, 9:30 a.m. UTC | #4
On 09/11/2016 16:45, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Trim the object sg table (rev2)
> URL   : https://patchwork.freedesktop.org/series/15036/
> State : warning
>
> == Summary ==
>
> Series 15036v2 drm/i915: Trim the object sg table
> https://patchwork.freedesktop.org/api/1.0/series/15036/revisions/2/mbox/
>
> Test drv_module_reload_basic:
>                 pass       -> DMESG-WARN (fi-bsw-n3050)

Unrelated, raised new BZ at 
https://bugs.freedesktop.org/show_bug.cgi?id=98670.

>
> fi-bdw-5557u     total:244  pass:229  dwarn:0   dfail:0   fail:0   skip:15
> fi-bsw-n3050     total:244  pass:203  dwarn:1   dfail:0   fail:0   skip:40
> fi-bxt-t5700     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28
> fi-byt-j1900     total:244  pass:216  dwarn:0   dfail:0   fail:0   skip:28
> fi-byt-n2820     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32
> fi-hsw-4770      total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20
> fi-hsw-4770r     total:244  pass:224  dwarn:0   dfail:0   fail:0   skip:20
> fi-ilk-650       total:244  pass:191  dwarn:0   dfail:0   fail:0   skip:53
> fi-ivb-3520m     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-ivb-3770      total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-kbl-7200u     total:244  pass:222  dwarn:0   dfail:0   fail:0   skip:22
> fi-skl-6260u     total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14
> fi-skl-6700hq    total:244  pass:223  dwarn:0   dfail:0   fail:0   skip:21
> fi-skl-6700k     total:244  pass:222  dwarn:1   dfail:0   fail:0   skip:21
> fi-skl-6770hq    total:244  pass:230  dwarn:0   dfail:0   fail:0   skip:14
> fi-snb-2520m     total:244  pass:212  dwarn:0   dfail:0   fail:0   skip:32
> fi-snb-2600      total:244  pass:211  dwarn:0   dfail:0   fail:0   skip:33
>
> 74d13d4fab710f664d5eeb15fd3de821a7f46818 drm-intel-nightly: 2016y-11m-09d-15h-02m-46s UTC integration manifest
> 2af81e6 drm/i915: Trim the object sg table

Merged to dinq, thanks for the review!

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d2ad73d0b5b9..411aae535abe 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2232,6 +2232,28 @@  static unsigned int swiotlb_max_size(void)
 #endif
 }
 
+static void i915_sg_trim(struct sg_table *orig_st)
+{
+	struct sg_table new_st;
+	struct scatterlist *sg, *new_sg;
+	unsigned int i;
+
+	if (orig_st->nents == orig_st->orig_nents)
+		return;
+
+	if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
+		return;
+
+	new_sg = new_st.sgl;
+	for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
+		sg_set_page(new_sg, sg_page(sg), sg->length, 0);
+		new_sg = sg_next(new_sg);
+	}
+
+	sg_free_table(orig_st);
+	memcpy(orig_st, &new_st, sizeof(*orig_st));
+}
+
 static struct sg_table *
 i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 {
@@ -2317,6 +2339,9 @@  i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	if (sg) /* loop terminated early; short sg table */
 		sg_mark_end(sg);
 
+	/* Trim unused sg entries to avoid wasting memory. */
+	i915_sg_trim(st);
+
 	ret = i915_gem_gtt_prepare_pages(obj, st);
 	if (ret)
 		goto err_pages;