
[1/2,v2] drm/i915: mark GEM object pages dirty when mapped & written by the CPU

Message ID 1449676372-6988-2-git-send-email-david.s.gordon@intel.com (mailing list archive)
State New, archived

Commit Message

Dave Gordon Dec. 9, 2015, 3:52 p.m. UTC
In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.

The typical sequence is:
	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
	*(va+offset) = ...
	kunmap_atomic(va);

Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache.  This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.

Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.

A separate patch deals with the case where whole objects are (or may
be) dirtied.
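
For illustration only (a sketch, not part of the diff below), the same
update using the new helper becomes:
	va = kmap_atomic(i915_gem_object_get_dirty_page(obj, pageno));
	*(va+offset) = ...
	kunmap_atomic(va);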

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h              |  3 +++
 drivers/gpu/drm/i915/i915_gem.c              | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  4 ++--
 drivers/gpu/drm/i915/i915_gem_render_state.c |  2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c             | 11 ++++-------
 6 files changed, 26 insertions(+), 11 deletions(-)

Comments

Chris Wilson Dec. 10, 2015, 1:29 p.m. UTC | #1
On Wed, Dec 09, 2015 at 03:52:51PM +0000, Dave Gordon wrote:
> In various places, a single page of a (regular) GEM object is mapped into
> CPU address space and updated. In each such case, either the page or
> the object should be marked dirty, to ensure that the modifications are
> not discarded if the object is evicted under memory pressure.
> 
> The typical sequence is:
> 	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
> 	*(va+offset) = ...
> 	kunmap_atomic(va);
> 
> Here we introduce i915_gem_object_get_dirty_page(), which performs the
> same operation as i915_gem_object_get_page() but with the side-effect
> of marking the returned page dirty in the pagecache.  This will ensure
> that if the object is subsequently evicted (due to memory pressure),
> the changes are written to backing store rather than discarded.
> 
> Note that it works only for regular (shmfs-backed) GEM objects, but (at
> least for now) those are the only ones that are updated in this way --
> the objects in question are contexts and batchbuffers, which are always
> shmfs-backed.
> 
> A separate patch deals with the case where whole objects are (or may
> be) dirtied.
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>

I like this. There are places where we do both obj->dirty and
set_page_dirty(), but this so much more clearly shows what is going on.
All of these locations should be infrequent (or at least have patches to
make them so), so moving the call out-of-line will also be a benefit.

>  /* Allocate a new GEM object and fill it with the supplied data */
>  struct drm_i915_gem_object *
>  i915_gem_object_create_from_data(struct drm_device *dev,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index a4c243c..81796cc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -264,7 +264,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> -	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
>  				reloc->offset >> PAGE_SHIFT));
>  	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
>  
> @@ -355,7 +355,7 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> -	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
> +	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
>  				reloc->offset >> PAGE_SHIFT));
>  	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
>  

The relocation functions may dirty pairs of pages. Other than that, I
think you have the right mix of callsites.
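
As a rough sketch (simplified from the gen8 64-bit path in
relocate_entry_cpu; treat the details as illustrative), the second page
wants the same treatment when the reloc straddles a page boundary:

	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
				reloc->offset >> PAGE_SHIFT));
	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);

	if (INTEL_INFO(dev)->gen >= 8) {
		page_offset = offset_in_page(page_offset + sizeof(uint32_t));
		if (page_offset == 0) {	/* upper half spills onto the next page */
			kunmap_atomic(vaddr);
			vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
					(reloc->offset + sizeof(uint32_t)) >> PAGE_SHIFT));
		}
		*(uint32_t *)(vaddr + page_offset) = upper_32_bits(delta);
	}
	kunmap_atomic(vaddr);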
-Chris
Dave Gordon Dec. 10, 2015, 5:24 p.m. UTC | #2
On 10/12/15 13:29, Chris Wilson wrote:
> On Wed, Dec 09, 2015 at 03:52:51PM +0000, Dave Gordon wrote:
>> In various places, a single page of a (regular) GEM object is mapped into
>> CPU address space and updated. In each such case, either the page or
>> the object should be marked dirty, to ensure that the modifications are
>> not discarded if the object is evicted under memory pressure.
>>
>> The typical sequence is:
>> 	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
>> 	*(va+offset) = ...
>> 	kunmap_atomic(va);
>>
>> Here we introduce i915_gem_object_get_dirty_page(), which performs the
>> same operation as i915_gem_object_get_page() but with the side-effect
>> of marking the returned page dirty in the pagecache.  This will ensure
>> that if the object is subsequently evicted (due to memory pressure),
>> the changes are written to backing store rather than discarded.
>>
>> Note that it works only for regular (shmfs-backed) GEM objects, but (at
>> least for now) those are the only ones that are updated in this way --
>> the objects in question are contexts and batchbuffers, which are always
>> shmfs-backed.
>>
>> A separate patch deals with the case where whole objects are (or may
>> be) dirtied.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>
> I like this. There are places where we do both obj->dirty and
> set_page_dirty(), but this so much more clearly shows what is going on.
> All of these locations should be infrequent (or at least have patches to
> make them so), so moving the call out-of-line will also be a benefit.

I think there was only one place that both called set_page_dirty() AND 
set obj->dirty, which was in populate_lr_context(). You'll see that I've 
eliminated both in favour of a call to get_dirty_page() :)

>>   /* Allocate a new GEM object and fill it with the supplied data */
>>   struct drm_i915_gem_object *
>>   i915_gem_object_create_from_data(struct drm_device *dev,
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index a4c243c..81796cc 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -264,7 +264,7 @@ relocate_entry_cpu(struct drm_i915_gem_object *obj,
>>   	if (ret)
>>   		return ret;
>>
>> -	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
>> +	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
>>   				reloc->offset >> PAGE_SHIFT));
>>   	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
>>
>> @@ -355,7 +355,7 @@ relocate_entry_clflush(struct drm_i915_gem_object *obj,
>>   	if (ret)
>>   		return ret;
>>
>> -	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
>> +	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
>>   				reloc->offset >> PAGE_SHIFT));
>>   	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
>>
>
> The relocation functions may dirty pairs of pages. Other than that, I
> think you have the right mix of callsites.
> -Chris

Thanks, I've added the other two to the next (v3) version :)

.Dave.
Chris Wilson Dec. 10, 2015, 9:04 p.m. UTC | #3
On Thu, Dec 10, 2015 at 05:24:42PM +0000, Dave Gordon wrote:
> On 10/12/15 13:29, Chris Wilson wrote:
> >On Wed, Dec 09, 2015 at 03:52:51PM +0000, Dave Gordon wrote:
> >>In various places, a single page of a (regular) GEM object is mapped into
> >>CPU address space and updated. In each such case, either the page or
> >>the object should be marked dirty, to ensure that the modifications are
> >>not discarded if the object is evicted under memory pressure.
> >>
> >>The typical sequence is:
> >>	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
> >>	*(va+offset) = ...
> >>	kunmap_atomic(va);
> >>
> >>Here we introduce i915_gem_object_get_dirty_page(), which performs the
> >>same operation as i915_gem_object_get_page() but with the side-effect
> >>of marking the returned page dirty in the pagecache.  This will ensure
> >>that if the object is subsequently evicted (due to memory pressure),
> >>the changes are written to backing store rather than discarded.
> >>
> >>Note that it works only for regular (shmfs-backed) GEM objects, but (at
> >>least for now) those are the only ones that are updated in this way --
> >>the objects in question are contexts and batchbuffers, which are always
> >>shmfs-backed.
> >>
> >>A separate patch deals with the case where whole objects are (or may
> >>be) dirtied.
> >>
> >>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >I like this. There are places where we do both obj->dirty and
> >set_page_dirty(), but this so much more clearly shows what is going on.
> >All of these locations should be infrequent (or at least have patches to
> >make them so), so moving the call out-of-line will also be a benefit.
> 
> I think there was only one place that both called set_page_dirty()
> AND set obj->dirty, which was in populate_lr_context(). You'll see
> that I've eliminated both in favour of a call to get_dirty_page() :)

It was more that all GPU objects already have obj->dirty set, so really
the importance of using get_dirty_page() in the relocations and context
pinning is for documentation. Which is a very good reason, nevertheless.
-Chris
Daniel Vetter Dec. 11, 2015, 5:08 p.m. UTC | #4
On Thu, Dec 10, 2015 at 09:04:23PM +0000, Chris Wilson wrote:
> On Thu, Dec 10, 2015 at 05:24:42PM +0000, Dave Gordon wrote:
> > On 10/12/15 13:29, Chris Wilson wrote:
> > >On Wed, Dec 09, 2015 at 03:52:51PM +0000, Dave Gordon wrote:
> > >>In various places, a single page of a (regular) GEM object is mapped into
> > >>CPU address space and updated. In each such case, either the page or
> > >>the object should be marked dirty, to ensure that the modifications are
> > >>not discarded if the object is evicted under memory pressure.
> > >>
> > >>The typical sequence is:
> > >>	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
> > >>	*(va+offset) = ...
> > >>	kunmap_atomic(va);
> > >>
> > >>Here we introduce i915_gem_object_get_dirty_page(), which performs the
> > >>same operation as i915_gem_object_get_page() but with the side-effect
> > >>of marking the returned page dirty in the pagecache.  This will ensure
> > >>that if the object is subsequently evicted (due to memory pressure),
> > >>the changes are written to backing store rather than discarded.
> > >>
> > >>Note that it works only for regular (shmfs-backed) GEM objects, but (at
> > >>least for now) those are the only ones that are updated in this way --
> > >>the objects in question are contexts and batchbuffers, which are always
> > >>shmfs-backed.
> > >>
> > >>A separate patch deals with the case where whole objects are (or may
> > >>be) dirtied.
> > >>
> > >>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> > >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > >
> > >I like this. There are places where we do both obj->dirty and
> > >set_page_dirty(), but this so much more clearly shows what is going on.
> > >All of these locations should be infrequent (or at least have patches to
> > >make them so), so moving the call out-of-line will also be a benefit.
> > 
> > I think there was only one place that both called set_page_dirty()
> > AND set obj->dirty, which was in populate_lr_context(). You'll see
> > that I've eliminated both in favour of a call to get_dirty_page() :)
> 
> It was things like all GPU objects have obj->dirty set, so really the
> importance of using get_dirty_page() in the relocations and context
> pinning is for documentation. Which is a very good reason, nevertheless.

Hm, I think if you force a fault on relocs and then shrink everything
really hard before actually managing to submit the batch, you could provoke
this into a proper bug. One-in-a-billion perhaps ;-)
Chris Wilson Dec. 11, 2015, 5:27 p.m. UTC | #5
On Fri, Dec 11, 2015 at 06:08:10PM +0100, Daniel Vetter wrote:
> Hm, I think if you force a fault on relocs and then shrink everything
> really hard before actually managing to submit the batch you could provoke
> this into a proper bug. one-in-a-billion perhaps ;-)

Hmm, you would need to force the slowpath (otherwise all the objects are
reserved and so not swappable). And then we force the presumed_offset to
be invalid but only on the user side - so we don't force the relocations
in this batch. Ok, plausible. But who hits the slowpath? Sigh. Fancy
reviewing some mesa patches?
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f1a8a53..ca77392 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2894,6 +2894,9 @@  static inline int __sg_page_count(struct scatterlist *sg)
 	return sg->length >> PAGE_SHIFT;
 }
 
+struct page *
+i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n);
+
 static inline struct page *
 i915_gem_object_get_page(struct drm_i915_gem_object *obj, int n)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dfaf25b..06a5f39 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5184,6 +5184,21 @@  bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
 	return false;
 }
 
+/* Like i915_gem_object_get_page(), but mark the returned page dirty */
+struct page *
+i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
+{
+	struct page *page;
+
+	/* Only default objects have per-page dirty tracking */
+	if (WARN_ON(obj->ops != &i915_gem_object_ops))
+		return NULL;
+
+	page = i915_gem_object_get_page(obj, n);
+	set_page_dirty(page);
+	return page;
+}
+
 /* Allocate a new GEM object and fill it with the supplied data */
 struct drm_i915_gem_object *
 i915_gem_object_create_from_data(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a4c243c..81796cc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -264,7 +264,7 @@  relocate_entry_cpu(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
+	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
 				reloc->offset >> PAGE_SHIFT));
 	*(uint32_t *)(vaddr + page_offset) = lower_32_bits(delta);
 
@@ -355,7 +355,7 @@  relocate_entry_clflush(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
-	vaddr = kmap_atomic(i915_gem_object_get_page(obj,
+	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj,
 				reloc->offset >> PAGE_SHIFT));
 	clflush_write32(vaddr + page_offset, lower_32_bits(delta));
 
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 5026a62..fc7e6d5 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -103,7 +103,7 @@  static int render_state_setup(struct render_state *so)
 	if (ret)
 		return ret;
 
-	page = sg_page(so->obj->pages->sgl);
+	page = i915_gem_object_get_dirty_page(so->obj, 0);
 	d = kmap(page);
 
 	while (i < rodata->batch_items) {
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 0d23785b..05aa7e6 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -568,7 +568,7 @@  static void lr_context_update(struct drm_i915_gem_request *rq)
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
 	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4ebafab..ceccecc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -372,7 +372,7 @@  static int execlists_update_context(struct drm_i915_gem_request *rq)
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
 	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
@@ -1425,7 +1425,7 @@  static int intel_init_workaround_bb(struct intel_engine_cs *ring)
 		return ret;
 	}
 
-	page = i915_gem_object_get_page(wa_ctx->obj, 0);
+	page = i915_gem_object_get_dirty_page(wa_ctx->obj, 0);
 	batch = kmap_atomic(page);
 	offset = 0;
 
@@ -2257,7 +2257,7 @@  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
@@ -2343,9 +2343,6 @@  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	}
 
 	kunmap_atomic(reg_state);
-
-	ctx_obj->dirty = 1;
-	set_page_dirty(page);
 	i915_gem_object_unpin_pages(ctx_obj);
 
 	return 0;
@@ -2529,7 +2526,7 @@  void intel_lr_context_reset(struct drm_device *dev,
 			WARN(1, "Failed get_pages for context obj\n");
 			continue;
 		}
-		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+		page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 		reg_state = kmap_atomic(page);
 
 		reg_state[CTX_RING_HEAD+1] = 0;