Message ID | 1478701802-2982-1-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > At the moment we allocate enough sg table entries assuming we > will not be able to do any coallescing. But since in practice > we most often can, and more so very effectively, this ends up > wasting a lot of memory. > > A simple and effective way of trimming the over-allocated > entries is to copy the table over to a new one allocated to the > exact size. > > Experiment on my freshly logged and idle desktop (KDE) showed Experiments > that by doing this we can save approximately 1 MiB of RAM, or > when running a typical benchmark like gl_manhattan I have > even seen a 6 MiB saving. More complicated techniques such as only copying the last used page and freeing the rest are left to the reader. > v2: > * Update commit message. > * Use temporary sg_table on stack. (Chris Wilson) > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index d2ad73d0b5b9..411aae535abe 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void) > #endif > } > > +static void i915_sg_trim(struct sg_table *orig_st) > +{ > + struct sg_table new_st; > + struct scatterlist *sg, *new_sg; > + unsigned int i; > + > + if (orig_st->nents == orig_st->orig_nents) > + return; > + > + if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL)) > + return; > + > + new_sg = new_st.sgl; > + for_each_sg(orig_st->sgl, sg, orig_st->nents, i) { > + sg_set_page(new_sg, sg_page(sg), sg->length, 0); > + new_sg = sg_next(new_sg); Worth a /* called before being DMA mapped, no need to copy sg->dma_* */ ? > + } > + > + sg_free_table(orig_st); > + memcpy(orig_st, &new_st, sizeof(*orig_st)); I would have used *orig_st = new; Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> ^ I remembered it this time! Took a couple of attempts to spell my name right though. -Chris
On 09/11/2016 14:44, Chris Wilson wrote: > On Wed, Nov 09, 2016 at 02:30:02PM +0000, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> At the moment we allocate enough sg table entries assuming we >> will not be able to do any coallescing. But since in practice >> we most often can, and more so very effectively, this ends up >> wasting a lot of memory. >> >> A simple and effective way of trimming the over-allocated >> entries is to copy the table over to a new one allocated to the >> exact size. >> >> Experiment on my freshly logged and idle desktop (KDE) showed > Experiments >> that by doing this we can save approximately 1 MiB of RAM, or >> when running a typical benchmark like gl_manhattan I have >> even seen a 6 MiB saving. > > More complicated techniques such as only copying the last used page and > freeing the rest are left to the reader. Yes that would need to go into the core kernel since it needs access to static alloc/free functions and performance benefit might be quite small for that. Typically I see coalescing working really well so the delta in saved allocations and frees would be quite small. Perhaps I need to attempt to fragment my memory a lot to see what happens then. >> v2: >> * Update commit message. >> * Use temporary sg_table on stack. (Chris Wilson) >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> --- >> drivers/gpu/drm/i915/i915_gem.c | 25 +++++++++++++++++++++++++ >> 1 file changed, 25 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >> index d2ad73d0b5b9..411aae535abe 100644 >> --- a/drivers/gpu/drm/i915/i915_gem.c >> +++ b/drivers/gpu/drm/i915/i915_gem.c >> @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void) >> #endif >> } >> >> +static void i915_sg_trim(struct sg_table *orig_st) >> +{ >> + struct sg_table new_st; >> + struct scatterlist *sg, *new_sg; >> + unsigned int i; >> + >> + if (orig_st->nents == orig_st->orig_nents) >> + return; >> + >> + if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL)) >> + return; >> + >> + new_sg = new_st.sgl; >> + for_each_sg(orig_st->sgl, sg, orig_st->nents, i) { >> + sg_set_page(new_sg, sg_page(sg), sg->length, 0); >> + new_sg = sg_next(new_sg); > > Worth a > /* called before being DMA mapped, no need to copy sg->dma_* */ > ? Hm, or something safer than a comment. Unfortunately entries are not zeroed by default to enable a GEM_BUG_ON here. Unless CONFIG_GEM_DEBUG could mean GFP_ZERO added to some our allocations. :) Yeah I think comment is the best option as long as this function is static only. Will add. > >> + } >> + >> + sg_free_table(orig_st); >> + memcpy(orig_st, &new_st, sizeof(*orig_st)); > > I would have used *orig_st = new; It is more readable, agreed. > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > ^ I remembered it this time! > Took a couple of attempts to spell my name right though. Thanks! I assume I can keep it for the above little changes. Regards, Tvrtko
On Wed, Nov 09, 2016 at 03:07:38PM +0000, Tvrtko Ursulin wrote: > > On 09/11/2016 14:44, Chris Wilson wrote: > >Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > ^ I remembered it this time! > >Took a couple of attempts to spell my name right though. > > Thanks! I assume I can keep it for the above little changes. Yes. -Chris
On 09/11/2016 16:45, Patchwork wrote: > == Series Details == > > Series: drm/i915: Trim the object sg table (rev2) > URL : https://patchwork.freedesktop.org/series/15036/ > State : warning > > == Summary == > > Series 15036v2 drm/i915: Trim the object sg table > https://patchwork.freedesktop.org/api/1.0/series/15036/revisions/2/mbox/ > > Test drv_module_reload_basic: > pass -> DMESG-WARN (fi-bsw-n3050) Unrelated, raised new BZ at https://bugs.freedesktop.org/show_bug.cgi?id=98670. > > fi-bdw-5557u total:244 pass:229 dwarn:0 dfail:0 fail:0 skip:15 > fi-bsw-n3050 total:244 pass:203 dwarn:1 dfail:0 fail:0 skip:40 > fi-bxt-t5700 total:244 pass:216 dwarn:0 dfail:0 fail:0 skip:28 > fi-byt-j1900 total:244 pass:216 dwarn:0 dfail:0 fail:0 skip:28 > fi-byt-n2820 total:244 pass:212 dwarn:0 dfail:0 fail:0 skip:32 > fi-hsw-4770 total:244 pass:224 dwarn:0 dfail:0 fail:0 skip:20 > fi-hsw-4770r total:244 pass:224 dwarn:0 dfail:0 fail:0 skip:20 > fi-ilk-650 total:244 pass:191 dwarn:0 dfail:0 fail:0 skip:53 > fi-ivb-3520m total:244 pass:222 dwarn:0 dfail:0 fail:0 skip:22 > fi-ivb-3770 total:244 pass:222 dwarn:0 dfail:0 fail:0 skip:22 > fi-kbl-7200u total:244 pass:222 dwarn:0 dfail:0 fail:0 skip:22 > fi-skl-6260u total:244 pass:230 dwarn:0 dfail:0 fail:0 skip:14 > fi-skl-6700hq total:244 pass:223 dwarn:0 dfail:0 fail:0 skip:21 > fi-skl-6700k total:244 pass:222 dwarn:1 dfail:0 fail:0 skip:21 > fi-skl-6770hq total:244 pass:230 dwarn:0 dfail:0 fail:0 skip:14 > fi-snb-2520m total:244 pass:212 dwarn:0 dfail:0 fail:0 skip:32 > fi-snb-2600 total:244 pass:211 dwarn:0 dfail:0 fail:0 skip:33 > > 74d13d4fab710f664d5eeb15fd3de821a7f46818 drm-intel-nightly: 2016y-11m-09d-15h-02m-46s UTC integration manifest > 2af81e6 drm/i915: Trim the object sg table Merged to dinq, thanks for the review! Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d2ad73d0b5b9..411aae535abe 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2232,6 +2232,28 @@ static unsigned int swiotlb_max_size(void) #endif } +static void i915_sg_trim(struct sg_table *orig_st) +{ + struct sg_table new_st; + struct scatterlist *sg, *new_sg; + unsigned int i; + + if (orig_st->nents == orig_st->orig_nents) + return; + + if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL)) + return; + + new_sg = new_st.sgl; + for_each_sg(orig_st->sgl, sg, orig_st->nents, i) { + sg_set_page(new_sg, sg_page(sg), sg->length, 0); + new_sg = sg_next(new_sg); + } + + sg_free_table(orig_st); + memcpy(orig_st, &new_st, sizeof(*orig_st)); +} + static struct sg_table * i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj) { @@ -2317,6 +2339,9 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj) if (sg) /* loop terminated early; short sg table */ sg_mark_end(sg); + /* Trim unused sg entries to avoid wasting memory. */ + i915_sg_trim(st); + ret = i915_gem_gtt_prepare_pages(obj, st); if (ret) goto err_pages;