[RFC,1/7] drm/i915: Extract sg creation into a helper

Message ID 1476349444-7331-2-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Oct. 13, 2016, 9:03 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In order to reuse the same logic in several places in the driver,
extract the logic which adds pages to the sg list and does the
potential coalescing into separate functions.

Code wanting to build the sg table needs to do the following:

1. Call i915_sg_create to create the state object for a given
   maximum number of pages.
2. Iterate using i915_sg_for_each_page
3. Call i915_sg_add_page to add all the pages in order
4. On success call i915_sg_complete, or on failure i915_sg_abort

In this patch only i915_gem_object_get_pages_gtt gets converted
to use the new functions.
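
For illustration, the calling pattern is roughly the sketch below;
get_next_page() is a placeholder for the caller-specific page source,
and releasing already-added pages on error is omitted (see the
converted function in this patch for the complete version):

	struct i915_sg_create_state *state;
	struct sg_table *st;

	state = i915_sg_create(page_count);
	if (IS_ERR(state))
		return PTR_ERR(state);

	i915_sg_for_each_page(state) {
		/* get_next_page() stands in for the caller's page source. */
		struct page *page = get_next_page(state->idx);

		if (IS_ERR(page)) {
			i915_sg_abort(state); /* frees the table and state */
			return PTR_ERR(page);
		}

		/* Coalesces with the previous entry where the pages allow
		 * it, and advances state->idx.
		 */
		i915_sg_add_page(state, page);
	}

	st = i915_sg_complete(state); /* marks the end, frees state */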

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 153 ++++++++++++++++++++++++++++------------
 1 file changed, 109 insertions(+), 44 deletions(-)

Comments

Chris Wilson Oct. 13, 2016, 9:20 a.m. UTC | #1
On Thu, Oct 13, 2016 at 10:03:58AM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In order to reuse the same logic in several places in the driver,
> extract the logic which adds pages to the sg list and does the
> potential coalescing into separate functions.
> 
> Code wanting to build the sg table needs to do the following:
> 
> 1. Call i915_sg_create to create the state object for a given
>    maximum number of pages.
> 2. Iterate using i915_sg_for_each_page
> 3. Call i915_sg_add_page to add all the pages in order
> 4. On success call i915_sg_complete, or on failure i915_sg_abort
> 
> In this patch only i915_gem_object_get_pages_gtt gets converted
> to use the new functions.

Yup, much happier. Though can we just allocate the state on the stack?
It is the same as what we were previously using anyway: 40 bytes.

I've also thought about doing the sg_trim, just never actually tried...

Just not convinced that recreating the compact sg is better than copying
the already compact sg (for dmabuf, partial).
-Chris
Tvrtko Ursulin Oct. 13, 2016, 9:55 a.m. UTC | #2
On 13/10/2016 10:20, Chris Wilson wrote:
> On Thu, Oct 13, 2016 at 10:03:58AM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In order to reuse the same logic in several places in the driver,
>> extract the logic which adds pages to the sg list and does the
>> potential coalescing into separate functions.
>>
>> Code wanting to build the sg table needs to do the following:
>>
>> 1. Call i915_sg_create to create the state object for a given
>>     maximum number of pages.
>> 2. Iterate using i915_sg_for_each_page
>> 3. Call i915_sg_add_page to add all the pages in order
>> 4. On success call i915_sg_complete, or on failure i915_sg_abort
>>
>> In this patch only i915_gem_object_get_pages_gtt gets converted
>> to use the new functions.
> Yup, much happier. Though can we just allocate the state on the stack?
> It is the same as what we were previously using anyway: 40 bytes.

Thought about this also and had a slight preference not to use the stack,
since we've got to have kmalloc and error handling in i915_sg_create anyway.

> I've also thought about doing the sg_trim, just never actually tried...
>
> Just not convinced that recreating the compact sg is better than copying
> the already compact sg (for dmabuf, partial).

Agreed already on dmabuf. Partials, yes, I suppose those could be
implemented more efficiently. Do you think it is worth special-casing
them rather than reusing the common API? It would have to handle a start
in the middle of an sg entry, so it wouldn't be a straightforward copy.

Regards,

Tvrtko
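
[On the stack question: a stack-based variant would amount to something
like the sketch below. The sg_table kmalloc and its error handling remain,
which is Tvrtko's point; only the kmalloc of the ~40-byte state goes away.
The name i915_sg_init is hypothetical and not part of the posted patch.]

	static int i915_sg_init(struct i915_sg_create_state *state,
				unsigned int page_count)
	{
		struct sg_table *st;

		st = kmalloc(sizeof(*st), GFP_KERNEL);
		if (!st)
			return -ENOMEM;

		if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
			kfree(st);
			return -ENOMEM;
		}

		st->nents = 0;
		state->st = st;
		state->sg = st->sgl;
		state->page_count = page_count;
		state->idx = 0;
		state->max_segment = swiotlb_max_size();
		if (!state->max_segment)
			state->max_segment = page_count * PAGE_SIZE;
		state->last_pfn = 0;

		return 0;
	}

	/* Caller:
	 *	struct i915_sg_create_state state;
	 *
	 *	ret = i915_sg_init(&state, page_count);
	 *	if (ret)
	 *		return ret;
	 */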
Chris Wilson Oct. 13, 2016, 10:12 a.m. UTC | #3
On Thu, Oct 13, 2016 at 10:55:12AM +0100, Tvrtko Ursulin wrote:
> 
> On 13/10/2016 10:20, Chris Wilson wrote:
> >On Thu, Oct 13, 2016 at 10:03:58AM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>In order to reuse the same logic in several places in the driver,
> >>extract the logic which adds pages to the sg list and does the
> >>potential coalescing into separate functions.
> >>
> >>Code wanting to build the sg table needs to do the following:
> >>
> >>1. Call i915_sg_create to create the state object for a given
> >>    maximum number of pages.
> >>2. Iterate using i915_sg_for_each_page
> >>3. Call i915_sg_add_page to add all the pages in order
> >>4. On success call i915_sg_complete, or on failure i915_sg_abort
> >>
> >>In this patch only i915_gem_object_get_pages_gtt gets converted
> >>to use the new functions.
> >Yup, much happier. Though can we just allocate the state on the stack?
> >It is the same as what we were previously using anyway: 40 bytes.
> 
> Thought about this also and had a slight preference not to use the
> stack, since we've got to have kmalloc and error handling in
> i915_sg_create anyway.
> 
> >I've also thought about doing the sg_trim, just never actually tried...
> >
> >Just not convinced that recreating the compact sg is better than copying
> >the already compact sg (for dmabuf, partial).
> 
> Agreed already on dmabuf. Partials, yes, I suppose those could be
> implemented more efficiently. Do you think it is worth special-casing
> them rather than reusing the common API? It would have to handle a
> start in the middle of an sg entry, so it wouldn't be a straightforward
> copy.

If you happen to be running perf and gem_mmap_gtt, you might want to
improve it. (You have to be using lots of large textures with mesa to
worry about it, and then you are more likely bandwidth-limited and won't
notice the load stalls as much.) It doesn't look too bad:

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=eb-fence&id=6eb7f33c14b9c408856af86c3b35c7ff878afca9

Partial pages are a bit special, as we have to copy the dma mapping
across as well.
-Chris
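
[For context, copying an already-compact table wholesale, dma mapping
included, would amount to something like the sketch below. This is an
illustration of the idea only, not the code behind the link above, and
i915_sg_clone is a made-up name. A partial view that starts in the
middle of a source entry would additionally have to adjust the offset
and length of the first and last entries, which is the complication
raised above.]

	static struct sg_table *i915_sg_clone(struct sg_table *src)
	{
		struct scatterlist *s, *d;
		struct sg_table *dst;
		unsigned int i;

		dst = kmalloc(sizeof(*dst), GFP_KERNEL);
		if (!dst)
			return ERR_PTR(-ENOMEM);

		/* One destination entry per (already coalesced) source entry. */
		if (sg_alloc_table(dst, src->nents, GFP_KERNEL)) {
			kfree(dst);
			return ERR_PTR(-ENOMEM);
		}

		d = dst->sgl;
		for_each_sg(src->sgl, s, src->nents, i) {
			sg_set_page(d, sg_page(s), s->length, s->offset);
			/* Carry the dma mapping across instead of remapping. */
			sg_dma_address(d) = sg_dma_address(s);
			sg_dma_len(d) = sg_dma_len(s);
			d = sg_next(d);
		}

		return dst;
	}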

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fdd496e6c081..93b047735e6b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2216,18 +2216,104 @@  static unsigned long swiotlb_max_size(void)
 #endif
 }
 
+struct i915_sg_create_state {
+	struct sg_table *st;
+	struct scatterlist *sg;
+	unsigned int idx;
+	unsigned int page_count;
+	unsigned long max_segment;
+	unsigned long last_pfn;
+};
+
+static struct i915_sg_create_state *
+i915_sg_create(unsigned int page_count)
+{
+	struct i915_sg_create_state *state;
+	struct sg_table *st;
+
+	state = kmalloc(sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return ERR_PTR(-ENOMEM);
+
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (!st) {
+		kfree(state);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
+		kfree(st);
+		kfree(state);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	st->nents = 0;
+	state->st = st;
+	state->sg = st->sgl;
+	state->page_count = page_count;
+	state->idx = 0;
+	state->max_segment = swiotlb_max_size();
+	if (!state->max_segment)
+		state->max_segment = page_count * PAGE_SIZE;
+	state->last_pfn = 0;
+
+	return state;
+}
+
+static void
+i915_sg_add_page(struct i915_sg_create_state *state, struct page *page)
+{
+	unsigned long pfn = page_to_pfn(page);
+	struct scatterlist *sg = state->sg;
+
+	if (!state->idx ||
+	    sg->length >= state->max_segment ||
+	    pfn != state->last_pfn + 1) {
+		if (state->idx)
+			sg = sg_next(sg);
+		state->st->nents++;
+		sg_set_page(sg, page, PAGE_SIZE, 0);
+	} else {
+		sg->length += PAGE_SIZE;
+	}
+
+	state->last_pfn = pfn;
+	state->sg = sg;
+	state->idx++;
+}
+
+static struct sg_table *
+i915_sg_complete(struct i915_sg_create_state *state)
+{
+	struct sg_table *st = state->st;
+
+	if (state->sg) /* fewer sg entries written than max allocated */
+		sg_mark_end(state->sg);
+
+	kfree(state);
+
+	return st;
+}
+
+static void
+i915_sg_abort(struct i915_sg_create_state *state)
+{
+	sg_free_table(state->st);
+	kfree(state->st);
+	kfree(state);
+}
+
+#define i915_sg_for_each_page(state) \
+	for (; (state)->idx < (state)->page_count; )
+
 static int
 i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-	int page_count, i;
 	struct address_space *mapping;
-	struct sg_table *st;
-	struct scatterlist *sg;
+	struct i915_sg_create_state *state;
 	struct sgt_iter sgt_iter;
 	struct page *page;
-	unsigned long last_pfn = 0;	/* suppress gcc warning */
-	unsigned long max_segment;
 	int ret;
 	gfp_t gfp;
 
@@ -2238,19 +2324,9 @@  i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
 	BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
 
-	max_segment = swiotlb_max_size();
-	if (!max_segment)
-		max_segment = obj->base.size;
-
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (st == NULL)
-		return -ENOMEM;
-
-	page_count = obj->base.size / PAGE_SIZE;
-	if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
-		kfree(st);
-		return -ENOMEM;
-	}
+	state = i915_sg_create(obj->base.size / PAGE_SIZE);
+	if (IS_ERR(state))
+		return PTR_ERR(state);
 
 	/* Get the list of pages out of our struct file.  They'll be pinned
 	 * at this point until we release them.
@@ -2260,47 +2336,37 @@  i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	mapping = obj->base.filp->f_mapping;
 	gfp = mapping_gfp_constraint(mapping, ~(__GFP_IO | __GFP_RECLAIM));
 	gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	sg = st->sgl;
-	st->nents = 0;
-	for (i = 0; i < page_count; i++) {
-		page = shmem_read_mapping_page_gfp(mapping, i, gfp);
+	i915_sg_for_each_page(state) {
+		page = shmem_read_mapping_page_gfp(mapping, state->idx, gfp);
 		if (IS_ERR(page)) {
 			i915_gem_shrink(dev_priv,
-					page_count,
+					state->page_count,
 					I915_SHRINK_BOUND |
 					I915_SHRINK_UNBOUND |
 					I915_SHRINK_PURGEABLE);
-			page = shmem_read_mapping_page_gfp(mapping, i, gfp);
+			page = shmem_read_mapping_page_gfp(mapping, state->idx,
+							   gfp);
 		}
 		if (IS_ERR(page)) {
 			/* We've tried hard to allocate the memory by reaping
 			 * our own buffer, now let the real VM do its job and
 			 * go down in flames if truly OOM.
 			 */
-			page = shmem_read_mapping_page(mapping, i);
+			page = shmem_read_mapping_page(mapping, state->idx);
 			if (IS_ERR(page)) {
 				ret = PTR_ERR(page);
 				goto err_pages;
 			}
 		}
-		if (!i ||
-		    sg->length >= max_segment ||
-		    page_to_pfn(page) != last_pfn + 1) {
-			if (i)
-				sg = sg_next(sg);
-			st->nents++;
-			sg_set_page(sg, page, PAGE_SIZE, 0);
-		} else {
-			sg->length += PAGE_SIZE;
-		}
-		last_pfn = page_to_pfn(page);
+
+		i915_sg_add_page(state, page);
 
 		/* Check that the i965g/gm workaround works. */
-		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
+		WARN_ON((gfp & __GFP_DMA32) &&
+			(state->last_pfn >= 0x00100000UL));
 	}
-	if (sg) /* loop terminated early; short sg table */
-		sg_mark_end(sg);
-	obj->pages = st;
+
+	obj->pages = i915_sg_complete(state);
 
 	ret = i915_gem_gtt_prepare_object(obj);
 	if (ret)
@@ -2316,11 +2382,10 @@  i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	return 0;
 
 err_pages:
-	sg_mark_end(sg);
-	for_each_sgt_page(page, sgt_iter, st)
+	sg_mark_end(state->sg);
+	for_each_sgt_page(page, sgt_iter, state->st)
 		put_page(page);
-	sg_free_table(st);
-	kfree(st);
+	i915_sg_abort(state);
 
 	/* shmemfs first checks if there is enough memory to allocate the page
 	 * and reports ENOSPC should there be insufficient, along with the usual