[1/6] drm/i915: Clearing buffer objects via CPU/GTT

Message ID 1449665182-10054-2-git-send-email-ankitprasad.r.sharma@intel.com (mailing list archive)
State New, archived

Commit Message

ankitprasad.r.sharma@intel.com Dec. 9, 2015, 12:46 p.m. UTC
From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch adds support for clearing buffer objects via CPU/GTT. This
is particularly useful for clearing out non-shmem-backed objects. We
currently intend to use this only for buffers allocated from the
stolen region.

v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
variable assignments (Tvrtko)

v3: Map the object page by page to the gtt if pinning of the whole object
to the ggtt fails; corrected function name (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

Comments

Dave Gordon Dec. 9, 2015, 1:26 p.m. UTC | #1
On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> [snip]
> +/**
> + * i915_gem_object_clear() - Clear buffer object via CPU/GTT
> + * @obj: Buffer object to be cleared
> + *
> + * Return: 0 - success, non-zero - failure
> + */
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> +{
> +	int ret, i;
> +	char __iomem *base;
> +	size_t size = obj->base.size;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_mm_node node;
> +
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);
> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;
> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;
> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();
> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);
> +		if (base == NULL) {
> +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> +			ret = -ENOSPC;
> +			goto unpin;
> +		}
> +
> +		memset_io(base, 0, size);
> +		iounmap(base);
> +	}
> +unpin:
> +	if (node.allocated) {
> +		wmb();
> +		i915->gtt.base.clear_range(&i915->gtt.base,
> +				node.start, node.size,
> +				true);
> +		drm_mm_remove_node(&node);
> +		i915_gem_object_unpin_pages(obj);
> +	}
> +	else {
> +		i915_gem_object_ggtt_unpin(obj);
> +	}
> +out:
> +	return ret;
> +}

This is effectively two functions interleaved, as shown by the repeated 
if (node.allocated) tests. Would it not be clearer to have the mainline 
function deal only with the GTT-pinned case, and a separate function for 
the page-by-page version, called as a fallback if pinning fails?

int i915_gem_object_clear(struct drm_i915_gem_object *obj)
{
	int ret, i;
	char __iomem *base;
	size_t size = obj->base.size;
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE|PIN_NONBLOCK);
	if (ret)
		return __i915_obj_clear_by_pages(...);

	... mainline (fast) code here ...

	return ret;
}

static int __i915_obj_clear_by_pages(...)
{
	... complicated page-by-page fallback code here ...
}

.Dave.
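
For reference, the elided fallback body could be filled in almost verbatim
from the page-by-page branch of the patch above. A minimal sketch (the
helper's name and split follow Dave's outline; the body only rearranges the
posted code, with fence handling left to the caller):

static int __i915_obj_clear_by_pages(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *i915 = to_i915(obj->base.dev);
	size_t size = obj->base.size;
	struct drm_mm_node node;
	char __iomem *base;
	int ret, i;

	/* Reserve a single page-sized slot in the mappable aperture */
	memset(&node, 0, sizeof(node));
	ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
						  &node, 4096, 0,
						  I915_CACHE_NONE, 0,
						  i915->gtt.mappable_end,
						  DRM_MM_SEARCH_DEFAULT,
						  DRM_MM_CREATE_DEFAULT);
	if (ret)
		return ret;

	i915_gem_object_pin_pages(obj);

	/* Bind and clear the object one page at a time */
	for (i = 0; i < size / PAGE_SIZE; i++) {
		i915->gtt.base.insert_page(&i915->gtt.base,
				i915_gem_object_get_dma_address(obj, i),
				node.start, I915_CACHE_NONE, 0);
		wmb(); /* flush the PTE update before touching the page */
		base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
		memset_io(base, 0, 4096);
		iounmap(base);
	}

	wmb(); /* ensure the last clear has landed before unbinding */
	i915->gtt.base.clear_range(&i915->gtt.base,
				   node.start, node.size, true);
	drm_mm_remove_node(&node);
	i915_gem_object_unpin_pages(obj);

	return 0;
}
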
Tvrtko Ursulin Dec. 9, 2015, 1:30 p.m. UTC | #2
Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for clearing buffer objects via CPU/GTT. This
> is particularly useful for clearing out the non shmem backed objects.
> Currently intend to use this only for buffers allocated from stolen
> region.
>
> v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> variable assignments (Tvrtko)
>
> v3: Map object page by page to the gtt if the pinning of the whole object
> to the ggtt fails, Corrected function name (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I gave my r-b on v2, and v3 is sufficiently different that it can no
longer apply.

Regards,

Tvrtko

> [snip]
Tvrtko Ursulin Dec. 9, 2015, 1:57 p.m. UTC | #3
On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> [snip]
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);

Hm, I thought Chris's suggestion was not to even try mapping all of it
into the GTT, but just to go page by page?

If I misunderstood that, then I agree with Dave's comment that it should
be split into two helper functions.

> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);
> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;

This looks very hacky anyway and I would not recommend it.

> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;
> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();

What is this barrier for? Shouldn't the one after writing out the PTEs
and before remapping be enough?

> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);
> +		if (base == NULL) {
> +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> +			ret = -ENOSPC;
> +			goto unpin;
> +		}
> +
> +		memset_io(base, 0, size);
> +		iounmap(base);
> +	}
> +unpin:
> +	if (node.allocated) {
> +		wmb();

I don't understand this one either?

> [snip]

Regards,

Tvrtko
Chris Wilson Dec. 9, 2015, 1:57 p.m. UTC | #4
On Wed, Dec 09, 2015 at 06:16:17PM +0530, ankitprasad.r.sharma@intel.com wrote:
> [snip]
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));

Just lockdep_assert_held.
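
That is, presumably:

	lockdep_assert_held(&obj->base.dev->struct_mutex);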

> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);

Would be nice to get the PIN_NOFAULT patches in to give preference to
userspace mappings....

> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);

We use this often enough to merit a little helper.
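
For instance, something along these lines (the name and exact shape are
illustrative, not from the patch):

static int insert_mappable_node(struct drm_i915_private *i915,
				struct drm_mm_node *node, u32 size)
{
	memset(node, 0, sizeof(*node));
	return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
						   node, size, 0,
						   I915_CACHE_NONE, 0,
						   i915->gtt.mappable_end,
						   DRM_MM_SEARCH_DEFAULT,
						   DRM_MM_CREATE_DEFAULT);
}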

> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;
> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;

You only need to drop the fence when using the whole-object GGTT
mapping.
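
That is, the put_fence call could presumably move into the branch where
the whole object is pinned into the GGTT, e.g.:

	} else {
		node.start = i915_gem_obj_ggtt_offset(obj);
		node.allocated = false;

		/* Only the whole-object GGTT view is fenceable */
		ret = i915_gem_object_put_fence(obj);
		if (ret)
			goto unpin;
	}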

> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();
> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);

You should not use ioremap_wc(), as it is easy to exhaust the kernel
address space on 32-bit.

If you did a page-by-page approach for both paths, you could do this with
much less code...
-Chris
ankitprasad.r.sharma@intel.com Dec. 10, 2015, 10:02 a.m. UTC | #5
On Wed, 2015-12-09 at 13:26 +0000, Dave Gordon wrote:
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > [snip]
> 
> This is effectively two functions interleaved, as shown by the repeated 
> if (node.allocated) tests. Would it not be clearer to have the mainline 
> function deal only with the GTT-pinned case, and a separate function for 
> the page-by-page version, called as a fallback if pinning fails?
> 
> int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> {
> 	int ret, i;
> 	char __iomem *base;
> 	size_t size = obj->base.size;
> 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> 
> 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> 	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE|PIN_NONBLOCK);
> 	if (ret)
> 		return __i915_obj_clear_by_pages(...);
> 
> 	... mainline (fast) code here ...
> 
> 	return ret;
> }
> 
> static int __i915_obj_clear_by_pages(...)
> {
> 	... complicated page-by-page fallback code here ...
> }
> 
Separating out the page-by-page path is good for keeping the code from
getting messy. I also quite liked Chris's suggestion not to use
ioremap_wc(), as it could easily exhaust the kernel address space.

To make it less messy and more robust, I would prefer to use only the
page-by-page path (no need to even try mapping the full object), with
io_mapping_map_wc().

Thanks,
Ankit
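
A minimal sketch of that page-by-page-only loop, reusing the locals of the
posted function; it assumes the two-argument io_mapping_map_wc() of this
era and that i915->gtt.mappable is the aperture's struct io_mapping (both
assumptions, not code from the posted patch):

	for (i = 0; i < size / PAGE_SIZE; i++) {
		/* Point the scratch PTE at the next page of the object */
		i915->gtt.base.insert_page(&i915->gtt.base,
				i915_gem_object_get_dma_address(obj, i),
				node.start, I915_CACHE_NONE, 0);
		wmb(); /* PTE write must be visible before the CPU access */
		base = io_mapping_map_wc(i915->gtt.mappable, node.start);
		memset_io(base, 0, PAGE_SIZE);
		io_mapping_unmap(base);
	}
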
ankitprasad.r.sharma@intel.com Dec. 10, 2015, 10:23 a.m. UTC | #6
On Wed, 2015-12-09 at 13:57 +0000, Tvrtko Ursulin wrote:
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > [snip]
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Hm, I thought Chris's suggestion was not to even try mapping all of it
> into the GTT, but just to go page by page?
> 
Yes, I will modify this to use only the page-by-page approach.
> If I misunderstood that, then I agree with Dave's comment that it should
> be split into two helper functions.
> 
> > +	if (ret) {
> > +		memset(&node, 0, sizeof(node));
> > +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > +							  &node, 4096, 0,
> > +							  I915_CACHE_NONE, 0,
> > +							  i915->gtt.mappable_end,
> > +							  DRM_MM_SEARCH_DEFAULT,
> > +							  DRM_MM_CREATE_DEFAULT);
> > +		if (ret)
> > +			goto out;
> > +
> > +		i915_gem_object_pin_pages(obj);
> > +	} else {
> > +		node.start = i915_gem_obj_ggtt_offset(obj);
> > +		node.allocated = false;
> 
> This looks very hacky anyway and I would not recommend it.
> 
> > +	}
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto unpin;
> > +
> > +	if (node.allocated) {
> > +		for (i = 0; i < size/PAGE_SIZE; i++) {
> > +			wmb();
> 
> What is this barrier for? Shouldn't the one after writing out the PTEs
> and before remapping be enough?
This is to be on the safe side: to prevent the compiler from overlapping
work across iterations and to ensure the loop instructions execute
strictly in program order.

Having just one barrier would ensure that insert_page and the subsequent
ioremap are done in that order, but the end of one iteration could still
overlap the start of the next.
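
In other words, the intended per-iteration ordering is (same code as the
patch, with the rationale as comments):

	for (i = 0; i < size / PAGE_SIZE; i++) {
		wmb(); /* the previous iteration's memset_io() is done */
		i915->gtt.base.insert_page(&i915->gtt.base,
				i915_gem_object_get_dma_address(obj, i),
				node.start, I915_CACHE_NONE, 0);
		wmb(); /* the PTE update is visible before the remap */
		base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
		memset_io(base, 0, 4096);
		iounmap(base);
	}
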
> 
> > +			i915->gtt.base.insert_page(&i915->gtt.base,
> > +					i915_gem_object_get_dma_address(obj, i),
> > +					node.start,
> > +					I915_CACHE_NONE,
> > +					0);
> > +			wmb();
> > +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> > +			memset_io(base, 0, 4096);
> > +			iounmap(base);
> > +		}
> > +	} else {
> > +		/* Get the CPU virtual address of the buffer */
> > +		base = ioremap_wc(i915->gtt.mappable_base +
> > +				  node.start, size);
> > +		if (base == NULL) {
> > +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> > +			ret = -ENOSPC;
> > +			goto unpin;
> > +		}
> > +
> > +		memset_io(base, 0, size);
> > +		iounmap(base);
> > +	}
> > +unpin:
> > +	if (node.allocated) {
> > +		wmb();
> 
> I don't understand this one either?
This is to make sure the last memset_io() has completed before we move
on to clear_range().
> 
> > +		i915->gtt.base.clear_range(&i915->gtt.base,
> > +				node.start, node.size,
> > +				true);
> > +		drm_mm_remove_node(&node);
> > +		i915_gem_object_unpin_pages(obj);
> > +	}
> > +	else {
> > +		i915_gem_object_ggtt_unpin(obj);
> > +	}
> > +out:
> > +	return ret;
> > +}
> >
> 
> Regards,
> 
> Tvrtko
ankitprasad.r.sharma@intel.com Dec. 10, 2015, 10:27 a.m. UTC | #7
On Wed, 2015-12-09 at 13:57 +0000, Chris Wilson wrote:
> On Wed, Dec 09, 2015 at 06:16:17PM +0530, ankitprasad.r.sharma@intel.com wrote:
> > [snip]
> > +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> 
> Just lockdep_assert_held.
> 
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Would be nice to get the PIN_NOFAULT patches in to give preference to
> userspace mappings....
> 
Wouldn't it be better not to use two approaches, and to just do the
clearing with the insert_page function (not even trying to map the
whole object)?

Thanks,
Ankit

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 548a0eb..8e554d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2856,6 +2856,7 @@  int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
+int i915_gem_object_clear(struct drm_i915_gem_object *obj);
 
 static inline int __sg_page_count(struct scatterlist *sg)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9d2e6e3..d57e850 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5244,3 +5244,82 @@  fail:
 	drm_gem_object_unreference(&obj->base);
 	return ERR_PTR(ret);
 }
+
+/**
+ * i915_gem_object_clear() - Clear buffer object via CPU/GTT
+ * @obj: Buffer object to be cleared
+ *
+ * Return: 0 - success, non-zero - failure
+ */
+int i915_gem_object_clear(struct drm_i915_gem_object *obj)
+{
+	int ret, i;
+	char __iomem *base;
+	size_t size = obj->base.size;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_mm_node node;
+
+	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
+	if (ret) {
+		memset(&node, 0, sizeof(node));
+		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
+							  &node, 4096, 0,
+							  I915_CACHE_NONE, 0,
+							  i915->gtt.mappable_end,
+							  DRM_MM_SEARCH_DEFAULT,
+							  DRM_MM_CREATE_DEFAULT);
+		if (ret)
+			goto out;
+
+		i915_gem_object_pin_pages(obj);
+	} else {
+		node.start = i915_gem_obj_ggtt_offset(obj);
+		node.allocated = false;
+	}
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto unpin;
+
+	if (node.allocated) {
+		for (i = 0; i < size/PAGE_SIZE; i++) {
+			wmb();
+			i915->gtt.base.insert_page(&i915->gtt.base,
+					i915_gem_object_get_dma_address(obj, i),
+					node.start,
+					I915_CACHE_NONE,
+					0);
+			wmb();
+			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
+			memset_io(base, 0, 4096);
+			iounmap(base);
+		}
+	} else {
+		/* Get the CPU virtual address of the buffer */
+		base = ioremap_wc(i915->gtt.mappable_base +
+				  node.start, size);
+		if (base == NULL) {
+			DRM_ERROR("Mapping of gem object to CPU failed!\n");
+			ret = -ENOSPC;
+			goto unpin;
+		}
+
+		memset_io(base, 0, size);
+		iounmap(base);
+	}
+unpin:
+	if (node.allocated) {
+		wmb();
+		i915->gtt.base.clear_range(&i915->gtt.base,
+				node.start, node.size,
+				true);
+		drm_mm_remove_node(&node);
+		i915_gem_object_unpin_pages(obj);
+	}
+	else {
+		i915_gem_object_ggtt_unpin(obj);
+	}
+out:
+	return ret;
+}