[24/30] drm/i915: Refactor pwrite/pread to use single copy of get_user_pages

Message ID 1302640318-23165-25-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson April 12, 2011, 8:31 p.m. UTC
Replace the three nearly identical copies of the code with a single
function. And take advantage of the opportunity to do some
micro-optimisation: avoid the vmalloc if at all possible and also avoid
dropping the lock unless we are forced to acquire the mm semaphore.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |  132 ++++++++++++++++++++++-----------------
 1 files changed, 75 insertions(+), 57 deletions(-)
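
In outline, the helper this patch adds looks as follows (a condensed
sketch with the error handling trimmed; the full diff is at the bottom of
this page). The page array is allocated with kmalloc where possible,
falling back to the vmalloc-backed drm_malloc_ab(), and the pages are
pinned with the lockless __get_user_pages_fast() before resorting to
get_user_pages() under the mm semaphore:

  static int
  i915_gem_get_user_pages(struct drm_device *dev, unsigned long addr,
                          bool write, int *num_pages, struct page ***pages_out)
  {
          struct page **pages;
          int pinned, n = *num_pages;

          /* Avoid the vmalloc in drm_malloc_ab() if at all possible. */
          pages = kmalloc(n * sizeof(struct page *),
                          GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
          if (pages == NULL)
                  pages = drm_malloc_ab(n, sizeof(struct page *));

          /* Lockless gup first: needs no mmap_sem, so struct_mutex
           * can remain held across the call. */
          pinned = __get_user_pages_fast(addr, n, write, pages);
          if (pinned < n) {
                  /* mmap_sem nests outside struct_mutex (established by
                   * i915_gem_fault), so drop the mutex for the slow path. */
                  mutex_unlock(&dev->struct_mutex);
                  down_read(&current->mm->mmap_sem);
                  /* ... get_user_pages() for the remaining pages ... */
                  up_read(&current->mm->mmap_sem);
                  mutex_lock(&dev->struct_mutex);
          }

          *num_pages = pinned;
          *pages_out = pages;
          return pinned < n ? -EFAULT : 0;
  }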

Comments

Eric Anholt April 13, 2011, 3:59 p.m. UTC | #1
On Tue, 12 Apr 2011 21:31:52 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Replace the three nearly identical copies of the code with a single
> function. And take advantage of the opportunity to do some
> micro-optimisation: avoid the vmalloc if at all possible and also avoid
> dropping the lock unless we are forced to acquire the mm semaphore.

Could we get some performance numbers in patches that add code for
performance?
Chris Wilson April 13, 2011, 5:24 p.m. UTC | #2
On Wed, 13 Apr 2011 08:59:55 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Tue, 12 Apr 2011 21:31:52 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Replace the three nearly identical copies of the code with a single
> > function. And take advantage of the opportunity to do some
> > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > dropping the lock unless we are forced to acquire the mm semaphore.
> 
> Could we get some performance numbers in patches that add code for
> performance?

For myself, this was justified by simply refactoring the common code.
However, x11perf -aa10text on pnv:
  before: 1.28 Mglyph/sec
  after:  1.45 Mglyph/sec

I have my SNB box doing a more thorough analysis of the difference for
various pwrite sizes (assuming that the likelihood of faulting is not
totally workload-dependent).
-Chris
Daniel Vetter April 13, 2011, 7:26 p.m. UTC | #3
On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> Replace the three nearly identical copies of the code with a single
> function. And take advantage of the opportunity to do some
> micro-optimisation: avoid the vmalloc if at all possible and also avoid
> dropping the lock unless we are forced to acquire the mm semaphore.

One tiny nitpick: Perhaps put an api comment at the top of
gem_get_user_pages that this function drops the struct_mutex. That's not
something we normally do and could cause endless amounts of fun if
neglected.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Eric Anholt April 13, 2011, 7:35 p.m. UTC | #4
On Wed, 13 Apr 2011 18:24:36 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 13 Apr 2011 08:59:55 -0700, Eric Anholt <eric@anholt.net> wrote:
> > On Tue, 12 Apr 2011 21:31:52 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > Replace the three nearly identical copies of the code with a single
> > > function. And take advantage of the opportunity to do some
> > > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > > dropping the lock unless we are forced to acquire the mm semaphore.
> > 
> > Could we get some performance numbers in patches that add code for
> > performance?
> 
> For myself, this was justified by simply refactoring the common code.
> However, x11perf -aa10text on pnv:
>   before: 1.28 Mglyph/sec
>   after:  1.45 Mglyph/sec

Awesome.
Chris Wilson April 13, 2011, 7:56 p.m. UTC | #5
On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> > Replace the three nearly identical copies of the code with a single
> > function. And take advantage of the opportunity to do some
> > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > dropping the lock unless we are forced to acquire the mm semaphore.
> 
> One tiny nitpick: Perhaps put an api comment at the top of
> gem_get_user_pages that this function drops the struct_mutex. That's not
> something we normally do and could cause endless amounts of fun if
> neglected.

How about:

/**
 * Magically retrieves the pages for the user addr whilst holding the
 * dev->struct_mutex.
 *
 * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
 * due to the pre-existing lock dependency established by i915_gem_fault(),
 * we have to perform some sleight-of-hand.
 *
 * First, we try the lockless variant of gup whilst continuing to hold the
 * mutex. If that fails to get all the user pages, then we no choice but
 * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
 * to do so). The dev->struct_mutex is then re-acquired before we return.
 *
 * Returns: an error code *and* the number of user pages acquired. Even
 * on an error, you must iterate over the return pages and release them.
 */

?
-Chris
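
Seen from the caller's side, that contract works out as in the converted
callers of the patch below: the release loop runs over num_pages even when
the helper itself returned an error.

          ret = i915_gem_get_user_pages(dev, data_ptr, false,
                                        &num_pages, &user_pages);
          if (ret)
                  goto out;       /* user_pages may be partially pinned */

          /* ... copy to/from the pinned pages ... */

  out:
          for (i = 0; i < num_pages; i++)
                  page_cache_release(user_pages[i]);
          drm_free_large(user_pages);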
Daniel Vetter April 13, 2011, 8:56 p.m. UTC | #6
On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> > > Replace the three nearly identical copies of the code with a single
> > > function. And take advantage of the opportunity to do some
> > > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > > dropping the lock unless we are forced to acquire the mm semaphore.
> > 
> > One tiny nitpick: Perhaps put an api comment at the top of
> > gem_get_user_pages that this function drops the struct_mutex. That's not
> > something we normally do and could cause endless amounts of fun if
> > neglected.
> 
> How about:
> 
> /**
>  * Magically retrieves the pages for the user addr whilst holding the
>  * dev->struct_mutex.
>  *
>  * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
>  * due to the pre-existing lock dependency established by i915_gem_fault(),
>  * we have to perform some sleight-of-hand.
>  *
>  * First, we try the lockless variant of gup whilst continuing to hold the
>  * mutex. If that fails to get all the user pages, then we no choice but
>  * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
>  * to do so). The dev->struct_mutex is then re-acquired before we return.
>  *
>  * Returns: an error code *and* the number of user pages acquired. Even
>  * on an error, you must iterate over the return pages and release them.
>  */

Perfect. And this just reminded me that my review wasn't too careful; I
glossed a bit over that num_pages detail ...
-Daniel
Ben Widawsky April 14, 2011, 11:23 p.m. UTC | #7
On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> > > Replace the three nearly identical copies of the code with a single
> > > function. And take advantage of the opportunity to do some
> > > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > > dropping the lock unless we are forced to acquire the mm semaphore.
> > 
> > One tiny nitpick: Perhaps put an api comment at the top of
> > gem_get_user_pages that this function drops the struct_mutex. That's not
> > something we normally do and could cause endless amounts of fun if
> > neglected.
> 
> How about:
> 
> /**
>  * Magically retrieves the pages for the user addr whilst holding the
>  * dev->struct_mutex.
>  *
>  * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
>  * due to the pre-existing lock dependency established by i915_gem_fault(),
>  * we have to perform some sleight-of-hand.
>  *
>  * First, we try the lockless variant of gup whilst continuing to hold the
>  * mutex. If that fails to get all the user pages, then we no choice but
>  * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
>  * to do so). The dev->struct_mutex is then re-acquired before we return.
>  *
>  * Returns: an error code *and* the number of user pages acquired. Even
>  * on an error, you must iterate over the return pages and release them.
>  */
> 
> ?
> -Chris

I like this patch...

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

Paul Menzel April 15, 2011, 9:48 a.m. UTC | #8
On Thu, 14 Apr 2011 16:23 -0700, Ben Widawsky wrote:
> On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> > On Wed, 13 Apr 2011 21:26:24 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Tue, Apr 12, 2011 at 09:31:52PM +0100, Chris Wilson wrote:
> > > > Replace the three nearly identical copies of the code with a single
> > > > function. And take advantage of the opportunity to do some
> > > > micro-optimisation: avoid the vmalloc if at all possible and also avoid
> > > > dropping the lock unless we are forced to acquire the mm semaphore.
> > > 
> > > One tiny nitpick: Perhaps put an api comment at the top of
> > > gem_get_user_pages that this function drops the struct_mutex. That's not
> > > something we normally do and could cause endless amounts of fun if
> > > neglected.
> > 
> > How about:
> > 
> > /**
> >  * Magically retrieves the pages for the user addr whilst holding the
> >  * dev->struct_mutex.
> >  *
> >  * Since we can not take the mm semaphore whilst holding our dev->struct_mutex,
> >  * due to the pre-existing lock dependency established by i915_gem_fault(),
> >  * we have to perform some sleight-of-hand.
> >  *
> >  * First, we try the lockless variant of gup whilst continuing to hold the

I do not know what »gup« means.

> >  * mutex. If that fails to get all the user pages, then we no choice but

s/then we no/then we have no/

> >  * to acquire the mm semaphore (thus dropping the lock on dev->struct_mutex
> >  * to do so). The dev->struct_mutex is then re-acquired before we return.
> >  *
> >  * Returns: an error code *and* the number of user pages acquired. Even
> >  * on an error, you must iterate over the return pages and release them.
> >  */
> > 
> > ?
> > -Chris
> 
> I like this patch...
> 
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

Reviewed-by: Paul Menzel <paulepanter@users.sourceforge.net>


Thanks,

Paul
Chris Wilson April 16, 2011, 8:03 a.m. UTC | #9
On Fri, 15 Apr 2011 11:48:33 +0200, Paul Menzel <paulepanter@users.sourceforge.net> wrote:
> Am Donnerstag, den 14.04.2011, 16:23 -0700 schrieb Ben Widawsky:
> > On Wed, Apr 13, 2011 at 08:56:26PM +0100, Chris Wilson wrote:
> > >  * First, we try the lockless variant of gup whilst continuing to hold the
> 
> I do not know what »gup« means.

I guessed that people would recognise the common abbreviation for
get_user_pages, but not many people know about the different variants.
(I didn't until I dug through the headers trying to find a way
to avoid the mm semaphore.) So being explicit here helps, thanks.
-Chris
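
For reference, the two get_user_pages variants that the new helper
combines have these signatures in kernels of this era (both are visible in
the patch below); the lockless one is what makes the first attempt safe
while struct_mutex is still held:

  /* Lockless: walks the page tables directly, takes no mmap_sem and never
   * sleeps; returns how many of the requested pages it managed to pin. */
  int __get_user_pages_fast(unsigned long start, int nr_pages,
                            int write, struct page **pages);

  /* General form: the caller must hold mm->mmap_sem for read; may fault
   * pages in from backing storage and therefore sleep. */
  int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                     unsigned long start, int nr_pages,
                     int write, int force, struct page **pages,
                     struct vm_area_struct **vmas);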

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33830c9..0028f3b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -257,6 +257,56 @@ static int i915_gem_object_needs_bit17_swizzle(struct drm_i915_gem_object *obj)
 		obj->tiling_mode != I915_TILING_NONE;
 }
 
+static int
+i915_gem_get_user_pages(struct drm_device *dev,
+			unsigned long addr,
+			bool write,
+			int *num_pages,
+			struct page ***pages_out)
+{
+	struct page **pages;
+	int pinned, ret;
+	int n = *num_pages;
+
+	pages = kmalloc(n*sizeof(struct page *),
+			GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (pages == NULL) {
+		pages = drm_malloc_ab(n, sizeof(struct page *));
+		if (pages == NULL) {
+			*pages_out = NULL;
+			*num_pages = 0;
+			return -ENOMEM;
+		}
+	}
+
+	pinned = __get_user_pages_fast(addr, n, write, pages);
+	if (pinned < n) {
+		struct mm_struct *mm = current->mm;
+
+		mutex_unlock(&dev->struct_mutex);
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm,
+				     addr + (pinned << PAGE_SHIFT),
+				     n - pinned,
+				     write, 0,
+				     pages + pinned,
+				     NULL);
+		up_read(&mm->mmap_sem);
+		mutex_lock(&dev->struct_mutex);
+		if (ret > 0)
+			pinned += ret;
+	}
+
+	ret = 0;
+	if (pinned < n)
+		ret = -EFAULT;
+
+	*num_pages = pinned;
+	*pages_out = pages;
+	return ret;
+}
+
+
 static inline void
 slow_shmem_copy(struct page *dst_page,
 		int dst_offset,
@@ -398,11 +448,11 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 			  struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t offset;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index, data_page_offset;
 	int page_length;
@@ -420,20 +470,10 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 1, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, true,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_cpu_read_domain_range(obj,
 							args->offset,
@@ -494,7 +534,7 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++) {
+	for (i = 0; i < num_pages; i++) {
 		SetPageDirty(user_pages[i]);
 		mark_page_accessed(user_pages[i]);
 		page_cache_release(user_pages[i]);
@@ -679,10 +719,9 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	ssize_t remain;
 	loff_t gtt_page_base, offset;
-	loff_t first_data_page, last_data_page, num_pages;
-	loff_t pinned_pages, i;
+	loff_t first_data_page, last_data_page;
+	int num_pages, i;
 	struct page **user_pages;
-	struct mm_struct *mm = current->mm;
 	int gtt_page_offset, data_page_offset, data_page_index, page_length;
 	int ret;
 	uint64_t data_ptr = args->data_ptr;
@@ -697,28 +736,18 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out_unpin_pages;
-	}
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
+		goto out;
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	ret = i915_gem_object_put_fence(obj);
 	if (ret)
-		goto out_unpin_pages;
+		goto out;
 
 	offset = obj->gtt_offset + args->offset;
 
@@ -753,8 +782,8 @@ i915_gem_gtt_pwrite_slow(struct drm_device *dev,
 		data_ptr += page_length;
 	}
 
-out_unpin_pages:
-	for (i = 0; i < pinned_pages; i++)
+out:
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 	drm_free_large(user_pages);
 
@@ -803,11 +832,11 @@ i915_gem_shmem_pwrite_fast(struct drm_device *dev,
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
-		vaddr = kmap_atomic(page, KM_USER0);
+		vaddr = kmap_atomic(page);
 		ret = __copy_from_user_inatomic(vaddr + page_offset,
 						user_data,
 						page_length);
-		kunmap_atomic(vaddr, KM_USER0);
+		kunmap_atomic(vaddr);
 
 		set_page_dirty(page);
 		mark_page_accessed(page);
@@ -842,11 +871,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 			   struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
 	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
+	loff_t first_data_page, last_data_page, offset;
+	int num_pages, i;
 	int shmem_page_offset;
 	int data_page_index,  data_page_offset;
 	int page_length;
@@ -864,20 +892,10 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
 	num_pages = last_data_page - first_data_page + 1;
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
+	ret = i915_gem_get_user_pages(dev, data_ptr, false,
+				      &num_pages, &user_pages);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
 	if (ret)
@@ -940,7 +958,7 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++)
+	for (i = 0; i < num_pages; i++)
 		page_cache_release(user_pages[i]);
 	drm_free_large(user_pages);