diff mbox

[5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

Message ID 1389245462-11083-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

akash.goel@intel.com Jan. 9, 2014, 5:31 a.m. UTC
From: Akash Goel <akash.goel@intel.com>

On VLV, 64MB of system memory was being reserved for stolen
area, but ~8MB of it was being utilized.
Increased the utilization of Stolen area by allocating
User created Frame buffers(only X Tiled) from it.
User Frame buffers are suitable for allocation from stolen area,
as its very unlikely that they are not accessed from CPU side.
And its even more unlikely that the Tiled(X) buffers will be
accessed directly from the CPU side. And any allocation
from stolen area is not directly CPU accessible, but
accessible only through the aperture space.
With 1080p sized frame buffers (32 bpp), the allocation of 6-7
frame buffers itself gives almost the full utilization of
stolen area.

Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h        |  1 +
 drivers/gpu/drm/i915/i915_gem.c        | 21 ++++++++
 drivers/gpu/drm/i915/i915_gem_stolen.c | 93 ++++++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+)

Comments

Daniel Vetter Jan. 9, 2014, 7:27 a.m. UTC | #1
On Thu, Jan 09, 2014 at 11:01:02AM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> On VLV, 64MB of system memory was being reserved for stolen
> area, but ~8MB of it was being utilized.
> Increased the utilization of Stolen area by allocating
> User created Frame buffers(only X Tiled) from it.
> User Frame buffers are suitable for allocation from stolen area,
> as its very unlikely that they are not accessed from CPU side.
> And its even more unlikely that the Tiled(X) buffers will be
> accessed directly from the CPU side. And any allocation
> from stolen area is not directly CPU accessible, but
> accessible only through the aperture space.
> With 1080p sized frame buffers (32 bpp), the allocation of 6-7
> frame buffers itself gives almost the full utilization of
> stolen area.
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h        |  1 +
>  drivers/gpu/drm/i915/i915_gem.c        | 21 ++++++++
>  drivers/gpu/drm/i915/i915_gem_stolen.c | 93 ++++++++++++++++++++++++++++++++++
>  3 files changed, 115 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1bcc543..db29537 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2376,6 +2376,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>  					       u32 stolen_offset,
>  					       u32 gtt_offset,
>  					       u32 size);
> +void i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj);
>  void i915_gem_object_release_stolen(struct drm_i915_gem_object *obj);
>  
>  /* i915_gem_tiling.c */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 656406d..24f93ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3915,6 +3915,27 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  		}
>  	}
>  
> +	/* Try to allocate the physical space for the GEM object,
> +	 * representing the User frame buffer, from the stolen area.
> +	 * But if there is no sufficient free space left in stolen
> +	 * area, will fallback to shmem.
> +	 */
> +	if (obj->user_fb == 1) {
> +		if (obj->pages == NULL) {
> +			if (obj->tiling_mode == I915_TILING_X) {
> +				/* Tiled(X) Scanout buffers are more suitable
> +				 * for allocation from stolen area, as its very
> +				 * unlikely that they will be accessed directly
> +				 * from the CPU side and any allocation from
> +				 * stolen area is not directly CPU accessible,
> +				 * but accessible only through the aperture
> +				 * space.
> +				 */
> +				i915_gem_object_move_to_stolen(obj);
> +			}
> +		}
> +	}

Neat hack ;-) But I think for upstream we need to address a few more
concerns. Random comments:

- The vlv patch subject is a bit misleading here since this is about
  stolen memory usage in general across all platforms.

- object_pin is the wrong place to change the backing storage since
  someone else could already have pinned the it (e.g. through dma-buf). We
  need to do this earlier before calling down into ->get_pages.

- If we allow opportunistical placement of objects into stolen memory I
  think we should also fully support purgeable objects so that we can
  dynamically scale to workloads. Or at least to be able to kick out stuff
  in case the kernel needs the contiguous memory for something more
  important. So if we couldn't find a suitable stolen block we'd scan
  through all objects with stolen memory backing and check whether any
  purgeable objects could be kicked out.

- For your usecase, have you tried to simply reduce the stolen area as
  much as possible? Our friendly competition on ARM SoCs seems to have
  mostly moved away from gfx reserved chunks towards more flexible
  approaches like CMA. Giving stolen back to the linux page allocator
  should be possible, but I haven't really looked into that yet all that
  much ...

- For upstream I think changing the personality of buffer objects behind
  userspace's back is a bit too frisky wrt breaking something. I prefer if
  userspace opts-in explicitly by passing a flag to the create ioctl
  stating that stolen memory is allowed/preferred for this allocation.
  Chris Wilson posted an rfc patch a while back to add a create2 ioctl
  (which at the same time also allows us to set the caching, tiling and
  any other object parameters).

- We've had some "fun" last time around we've tried to use stolen more
  seriously with scanout buffers being strangely offset/corrupted.
  Hopefully this is fixed by your sg->offset patch, but I can't find the
  reports nor recall the details right now.

Cheers, Daniel
> +
>  	if (!i915_gem_obj_bound(obj, vm)) {
>  		ret = i915_gem_object_bind_to_vm(obj, vm, alignment,
>  						 map_and_fenceable,
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 5cf97d6..29c22f9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -29,6 +29,7 @@
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
> +#include <linux/shmem_fs.h>
>  
>  /*
>   * The BIOS typically reserves some of the system's memory for the exclusive
> @@ -370,6 +371,98 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>  	return NULL;
>  }
>  
> +void
> +i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_device *dev = obj->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_mm_node *stolen;
> +	u32 size = obj->base.size;
> +	int ret = 0;
> +
> +	if (!IS_VALLEYVIEW(dev)) {
> +		return;
> +	}
> +
> +	if (obj->stolen) {
> +		BUG_ON(obj->pages == NULL);
> +		return;
> +	}
> +
> +	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> +		return;
> +
> +	if (size == 0)
> +		return;
> +
> +	/* Check if already shmem space has been allocated for the object
> +	 * or not. We cannot rely upon on the value of 'pages' field for this.
> +	 * As even though if the 'pages' field is NULL, it does not actually
> +	 * indicate that the backing physical space (shmem) is currently not
> +	 * reserved for the object, as the object may not get purged/truncated
> +	 * on the calll to 'put_pages_gtt'.
> +	 */
> +	if (obj->base.filp) {
> +		struct inode *inode = file_inode(obj->base.filp);
> +		struct shmem_inode_info *info = SHMEM_I(inode);
> +		if (!inode)
> +			return;
> +		spin_lock(&info->lock);
> +		/* The alloced field stores how many data pages are
> +		 * allocated to the file.
> +		 */
> +		ret = info->alloced;
> +		spin_unlock(&info->lock);
> +		if (ret > 0) {
> +			DRM_DEBUG_DRIVER(
> +				"Already shmem space alloced, %d pges\n", ret);
> +			return;
> +		}
> +	}
> +
> +	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
> +	if (!stolen)
> +		return;
> +
> +	ret = drm_mm_insert_node(&dev_priv->mm.stolen, stolen, size,
> +				 4096, DRM_MM_SEARCH_DEFAULT);
> +	if (ret) {
> +		kfree(stolen);
> +		DRM_DEBUG_DRIVER("ran out of stolen space\n");
> +		return;
> +	}
> +
> +	/* Set up the object to use the stolen memory,
> +	 * backing store no longer managed by shmem layer */
> +	drm_gem_object_release(&(obj->base));
> +	obj->base.filp = NULL;
> +	obj->ops = &i915_gem_object_stolen_ops;
> +
> +	obj->pages = i915_pages_create_for_stolen(dev,
> +						stolen->start, stolen->size);
> +	if (obj->pages == NULL)
> +		goto cleanup;
> +
> +	i915_gem_object_pin_pages(obj);
> +	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> +	obj->has_dma_mapping = true;
> +	obj->stolen = stolen;
> +
> +	DRM_DEBUG_DRIVER("Obj moved to stolen, ptr = %p, size = %x\n",
> +			 obj, size);
> +
> +	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> +	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +
> +	/* No zeroing-out of buffers allocated from stolen area */
> +	return;
> +
> +cleanup:
> +	drm_mm_remove_node(stolen);
> +	kfree(stolen);
> +	return;
> +}
> +
>  struct drm_i915_gem_object *
>  i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>  					       u32 stolen_offset,
> -- 
> 1.8.5.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Jan. 9, 2014, 7:46 a.m. UTC | #2
One thing I've forgotten: The Threading in your patch-bomb seems
busted, the patches are somehow not linked up to the cover letter. If
you use a recent git and format-patch/send-email with default settings
it should all work out ok though.

With proper in-reply linking patch bomb threads tend to get splattered
all over the place in mail clients making it hard to keep an overview
of the discussions.
-Daniel

On Thu, Jan 9, 2014 at 8:27 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jan 09, 2014 at 11:01:02AM +0530, akash.goel@intel.com wrote:
>> From: Akash Goel <akash.goel@intel.com>
>>
>> On VLV, 64MB of system memory was being reserved for stolen
>> area, but ~8MB of it was being utilized.
>> Increased the utilization of Stolen area by allocating
>> User created Frame buffers(only X Tiled) from it.
>> User Frame buffers are suitable for allocation from stolen area,
>> as its very unlikely that they are not accessed from CPU side.
>> And its even more unlikely that the Tiled(X) buffers will be
>> accessed directly from the CPU side. And any allocation
>> from stolen area is not directly CPU accessible, but
>> accessible only through the aperture space.
>> With 1080p sized frame buffers (32 bpp), the allocation of 6-7
>> frame buffers itself gives almost the full utilization of
>> stolen area.
>>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.h        |  1 +
>>  drivers/gpu/drm/i915/i915_gem.c        | 21 ++++++++
>>  drivers/gpu/drm/i915/i915_gem_stolen.c | 93 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 115 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 1bcc543..db29537 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2376,6 +2376,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>>                                              u32 stolen_offset,
>>                                              u32 gtt_offset,
>>                                              u32 size);
>> +void i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj);
>>  void i915_gem_object_release_stolen(struct drm_i915_gem_object *obj);
>>
>>  /* i915_gem_tiling.c */
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 656406d..24f93ef 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -3915,6 +3915,27 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
>>               }
>>       }
>>
>> +     /* Try to allocate the physical space for the GEM object,
>> +      * representing the User frame buffer, from the stolen area.
>> +      * But if there is no sufficient free space left in stolen
>> +      * area, will fallback to shmem.
>> +      */
>> +     if (obj->user_fb == 1) {
>> +             if (obj->pages == NULL) {
>> +                     if (obj->tiling_mode == I915_TILING_X) {
>> +                             /* Tiled(X) Scanout buffers are more suitable
>> +                              * for allocation from stolen area, as its very
>> +                              * unlikely that they will be accessed directly
>> +                              * from the CPU side and any allocation from
>> +                              * stolen area is not directly CPU accessible,
>> +                              * but accessible only through the aperture
>> +                              * space.
>> +                              */
>> +                             i915_gem_object_move_to_stolen(obj);
>> +                     }
>> +             }
>> +     }
>
> Neat hack ;-) But I think for upstream we need to address a few more
> concerns. Random comments:
>
> - The vlv patch subject is a bit misleading here since this is about
>   stolen memory usage in general across all platforms.
>
> - object_pin is the wrong place to change the backing storage since
>   someone else could already have pinned the it (e.g. through dma-buf). We
>   need to do this earlier before calling down into ->get_pages.
>
> - If we allow opportunistical placement of objects into stolen memory I
>   think we should also fully support purgeable objects so that we can
>   dynamically scale to workloads. Or at least to be able to kick out stuff
>   in case the kernel needs the contiguous memory for something more
>   important. So if we couldn't find a suitable stolen block we'd scan
>   through all objects with stolen memory backing and check whether any
>   purgeable objects could be kicked out.
>
> - For your usecase, have you tried to simply reduce the stolen area as
>   much as possible? Our friendly competition on ARM SoCs seems to have
>   mostly moved away from gfx reserved chunks towards more flexible
>   approaches like CMA. Giving stolen back to the linux page allocator
>   should be possible, but I haven't really looked into that yet all that
>   much ...
>
> - For upstream I think changing the personality of buffer objects behind
>   userspace's back is a bit too frisky wrt breaking something. I prefer if
>   userspace opts-in explicitly by passing a flag to the create ioctl
>   stating that stolen memory is allowed/preferred for this allocation.
>   Chris Wilson posted an rfc patch a while back to add a create2 ioctl
>   (which at the same time also allows us to set the caching, tiling and
>   any other object parameters).
>
> - We've had some "fun" last time around we've tried to use stolen more
>   seriously with scanout buffers being strangely offset/corrupted.
>   Hopefully this is fixed by your sg->offset patch, but I can't find the
>   reports nor recall the details right now.
>
> Cheers, Daniel
>> +
>>       if (!i915_gem_obj_bound(obj, vm)) {
>>               ret = i915_gem_object_bind_to_vm(obj, vm, alignment,
>>                                                map_and_fenceable,
>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> index 5cf97d6..29c22f9 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>> @@ -29,6 +29,7 @@
>>  #include <drm/drmP.h>
>>  #include <drm/i915_drm.h>
>>  #include "i915_drv.h"
>> +#include <linux/shmem_fs.h>
>>
>>  /*
>>   * The BIOS typically reserves some of the system's memory for the exclusive
>> @@ -370,6 +371,98 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>>       return NULL;
>>  }
>>
>> +void
>> +i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj)
>> +{
>> +     struct drm_device *dev = obj->base.dev;
>> +     struct drm_i915_private *dev_priv = dev->dev_private;
>> +     struct drm_mm_node *stolen;
>> +     u32 size = obj->base.size;
>> +     int ret = 0;
>> +
>> +     if (!IS_VALLEYVIEW(dev)) {
>> +             return;
>> +     }
>> +
>> +     if (obj->stolen) {
>> +             BUG_ON(obj->pages == NULL);
>> +             return;
>> +     }
>> +
>> +     if (!drm_mm_initialized(&dev_priv->mm.stolen))
>> +             return;
>> +
>> +     if (size == 0)
>> +             return;
>> +
>> +     /* Check if already shmem space has been allocated for the object
>> +      * or not. We cannot rely upon on the value of 'pages' field for this.
>> +      * As even though if the 'pages' field is NULL, it does not actually
>> +      * indicate that the backing physical space (shmem) is currently not
>> +      * reserved for the object, as the object may not get purged/truncated
>> +      * on the calll to 'put_pages_gtt'.
>> +      */
>> +     if (obj->base.filp) {
>> +             struct inode *inode = file_inode(obj->base.filp);
>> +             struct shmem_inode_info *info = SHMEM_I(inode);
>> +             if (!inode)
>> +                     return;
>> +             spin_lock(&info->lock);
>> +             /* The alloced field stores how many data pages are
>> +              * allocated to the file.
>> +              */
>> +             ret = info->alloced;
>> +             spin_unlock(&info->lock);
>> +             if (ret > 0) {
>> +                     DRM_DEBUG_DRIVER(
>> +                             "Already shmem space alloced, %d pges\n", ret);
>> +                     return;
>> +             }
>> +     }
>> +
>> +     stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
>> +     if (!stolen)
>> +             return;
>> +
>> +     ret = drm_mm_insert_node(&dev_priv->mm.stolen, stolen, size,
>> +                              4096, DRM_MM_SEARCH_DEFAULT);
>> +     if (ret) {
>> +             kfree(stolen);
>> +             DRM_DEBUG_DRIVER("ran out of stolen space\n");
>> +             return;
>> +     }
>> +
>> +     /* Set up the object to use the stolen memory,
>> +      * backing store no longer managed by shmem layer */
>> +     drm_gem_object_release(&(obj->base));
>> +     obj->base.filp = NULL;
>> +     obj->ops = &i915_gem_object_stolen_ops;
>> +
>> +     obj->pages = i915_pages_create_for_stolen(dev,
>> +                                             stolen->start, stolen->size);
>> +     if (obj->pages == NULL)
>> +             goto cleanup;
>> +
>> +     i915_gem_object_pin_pages(obj);
>> +     list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
>> +     obj->has_dma_mapping = true;
>> +     obj->stolen = stolen;
>> +
>> +     DRM_DEBUG_DRIVER("Obj moved to stolen, ptr = %p, size = %x\n",
>> +                      obj, size);
>> +
>> +     obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>> +     obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
>> +
>> +     /* No zeroing-out of buffers allocated from stolen area */
>> +     return;
>> +
>> +cleanup:
>> +     drm_mm_remove_node(stolen);
>> +     kfree(stolen);
>> +     return;
>> +}
>> +
>>  struct drm_i915_gem_object *
>>  i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>>                                              u32 stolen_offset,
>> --
>> 1.8.5.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Jan. 9, 2014, 8:53 a.m. UTC | #3
Re-adding mailing-lists so everyone can learn.

On Thu, Jan 9, 2014 at 9:49 AM, Goel, Akash <akash.goel@intel.com> wrote:
> Then at last I did the following :-
> 1. git format-patch -7 --cover-letter
> 2. git send-email -identity=otc-external 0000-cover-letter.patch
> 3. git send-email -identity=otc-external 0001-drm-i915-Fix-the-offset-issue-for-the-stolen-GEM-obj
> 4. git send-email -identity=otc-external 0002-drm-i915-Resolving-the-memory-region-conflict-for-St
> And so on for the rest of the patches.

You need to send all the *.patch files out in one go, otherwise the
auto-threading won't work. To make that easier I format-patch each
patchbomb into it's own directory with the -o switch, so

$ git format-patch -7 --cover-letter -o ../patches/vlv-stolen
... edit cover letter ...
$ git send-email -identity=otc-external ../patches/vlv-stolen/*.patch

Then git will link up the mails correctly.

Cheers, Daniel
Chris Wilson Jan. 9, 2014, 9:48 a.m. UTC | #4
On Thu, Jan 09, 2014 at 11:01:02AM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> On VLV, 64MB of system memory was being reserved for stolen
> area, but ~8MB of it was being utilized.
> Increased the utilization of Stolen area by allocating
> User created Frame buffers(only X Tiled) from it.
> User Frame buffers are suitable for allocation from stolen area,
> as its very unlikely that they are not accessed from CPU side.
> And its even more unlikely that the Tiled(X) buffers will be
> accessed directly from the CPU side. And any allocation
> from stolen area is not directly CPU accessible, but
> accessible only through the aperture space.
> With 1080p sized frame buffers (32 bpp), the allocation of 6-7
> frame buffers itself gives almost the full utilization of
> stolen area.

This completely breaks the ABI. Nak.

See the create2 ioctl.
-Chris
akash.goel@intel.com Jan. 9, 2014, 6:09 p.m. UTC | #5
>> This completely breaks the ABI. Nak.

If you remember, before doing any stolen memory related changes, we had consulted you. We went ahead with these changes, after getting a nod from your side. 
This idea of allocating the User created frame buffers came from your side only.
We didn't had the option of using a new ioctl. And the GEM objects of User frame buffers were the only such objects which can be identified uniquely at driver level. Otherwise there was no way to determine the type of a GEM object at driver level and take some custom decisions based on that.
It was discussed that the stolen area has a limitation of not being directly accessible from CPU side, but it was also told that the direct access of User frame buffers (especially X tiled) from the CPU side would be very unlikely.
Also we have seen that typically the life-time of Frame buffers is more, as they are not local objects. They are always being used as a shared object between the Surface flinger & the Rendering App.
So the logic of BO cacheing on libdrm side is also not applicable to them and thus we didn't have to care about the BO purging part also.
Considering all this, we came to a conclusion that the User fb's fitted well into our requirement and we took the decision of using the stolen area as a backing store for the frame buffers whenever possible.

We admit that this is still a basic rudimentary implementation. But it can be enhanced/refined further over a period of time in subsequent patches.

Actually the problem of underutilization of Stolen memory is more pronounced on VLV-BYT platform only, as there is a Hw bug that either the allocation of Stolen area can be completely disabled or the next allocation has to be of 64 MB only. But this limitation is not present on newer platforms, like from CHV onwards. 
For low RAM based products on BYT, it became imperative for us to fully utilize the Stolen space. 

Best Regards
Akash

-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk] 
Sent: Thursday, January 09, 2014 3:18 PM
To: Goel, Akash
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

On Thu, Jan 09, 2014 at 11:01:02AM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> On VLV, 64MB of system memory was being reserved for stolen area, but 
> ~8MB of it was being utilized.
> Increased the utilization of Stolen area by allocating User created 
> Frame buffers(only X Tiled) from it.
> User Frame buffers are suitable for allocation from stolen area, as 
> its very unlikely that they are not accessed from CPU side.
> And its even more unlikely that the Tiled(X) buffers will be accessed 
> directly from the CPU side. And any allocation from stolen area is not 
> directly CPU accessible, but accessible only through the aperture 
> space.
> With 1080p sized frame buffers (32 bpp), the allocation of 6-7 frame 
> buffers itself gives almost the full utilization of stolen area.

This completely breaks the ABI. Nak.

See the create2 ioctl.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
akash.goel@intel.com Jan. 10, 2014, 8:40 a.m. UTC | #6
>> For your usecase, have you tried to simply reduce the stolen area as
>>  much as possible? Our friendly competition on ARM SoCs seems to have
>>  mostly moved away from gfx reserved chunks towards more flexible
>>  approaches like CMA. Giving stolen back to the linux page allocator
>>  should be possible, but I haven't really looked into that yet all that
>>  much ...

This is the first option we explored, as it would have made our task also simple. There is a Hw bug on BYT, due to which either the allocation of Stolen area can be completely disabled or the next allocation has to be of 64 MB only. 
But this limitation will not be present on upcoming platforms.

>> - object_pin is the wrong place to change the backing storage since
>>  someone else could already have pinned the it (e.g. through dma-buf). We
>>  need to do this earlier before calling down into ->get_pages.

We had an option to allocate the backing storage from stolen area, at the time when a GEM object is associated with a User created frame buffer (drm_mode_addfb2). 
But we often saw that pool of Frame buffer objects were getting created, but not all them were getting used. So we thought that it will be more optimal, if we reserve the stolen space when the object actually starts to gets used, i.e. when it is being mapped to GTT.
In order to handle the dma-buf case (for User frame buffers) we can add a new check, so as not to consider using stolen area for such objects.

>> - If we allow opportunistical placement of objects into stolen memory I
>>  think we should also fully support purgeable objects so that we can
>>  dynamically scale to workloads. Or at least to be able to kick out stuff
>>  in case the kernel needs the contiguous memory for something more
>>  important. So if we couldn't find a suitable stolen block we'd scan
>>  through all objects with stolen memory backing and check whether any
>>   purgeable objects could be kicked out.

Actually we didn't expected much value-add on having the purging/truncate logic for Frame buffer objects also allocated from stolen area.
We saw that Frame buffer objects were being used as shared objects only & not as local objects. So the cacheing/purging logic in libdrm will not really apply to them, 
until unless the gem_madvise ioctl call is used to truncate the objects. But on our UFO (OGL & Media) drivers side, currently the gem_madvise ioctl call is not being used. 
So until the frame buffer object itself is destroyed, it cannot be purged before that. 
On Android side, as the 'swap' is disabled, the physical space of the GEM objects cannot be reclaimed by releasing the ref count on the underlying Physical pages (put_pages).
The purging from the GEM shrinker side, will be really effective in relinquishing the backing physical space, only when the objects are marked as purgeable.
We can try to add the support to purge/truncate logic for the stolen objects, in order to create room in stolen space for a new frame buffer. 

>> For upstream I think changing the personality of buffer objects behind
>>  userspace's back is a bit too frisky wrt breaking something. I prefer if
>>  userspace opts-in explicitly by passing a flag to the create ioctl
>>  stating that stolen memory is allowed/preferred for this allocation.
>>  Chris Wilson posted an rfc patch a while back to add a create2 ioctl
>>  (which at the same time also allows us to set the caching, tiling and
>>  any other object parameters).

Yes, agree that is not cleanest of a solution, but we didn't had an option of introducing a new API/interface.
But the change is limited only to User created frame buffer GEM objects. 
What new constraints we will be introducing if we go ahead with this design for Frame buffers. 
The mmap_gtt interface can still be used for these GEM objects.

Best regards
Akash

-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Thursday, January 09, 2014 12:57 PM
To: Goel, Akash
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

On Thu, Jan 09, 2014 at 11:01:02AM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> On VLV, 64MB of system memory was being reserved for stolen area, but 
> ~8MB of it was being utilized.
> Increased the utilization of Stolen area by allocating User created 
> Frame buffers(only X Tiled) from it.
> User Frame buffers are suitable for allocation from stolen area, as 
> its very unlikely that they are not accessed from CPU side.
> And its even more unlikely that the Tiled(X) buffers will be accessed 
> directly from the CPU side. And any allocation from stolen area is not 
> directly CPU accessible, but accessible only through the aperture 
> space.
> With 1080p sized frame buffers (32 bpp), the allocation of 6-7 frame 
> buffers itself gives almost the full utilization of stolen area.
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h        |  1 +
>  drivers/gpu/drm/i915/i915_gem.c        | 21 ++++++++
>  drivers/gpu/drm/i915/i915_gem_stolen.c | 93 
> ++++++++++++++++++++++++++++++++++
>  3 files changed, 115 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> b/drivers/gpu/drm/i915/i915_drv.h index 1bcc543..db29537 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2376,6 +2376,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>  					       u32 stolen_offset,
>  					       u32 gtt_offset,
>  					       u32 size);
> +void i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj);
>  void i915_gem_object_release_stolen(struct drm_i915_gem_object *obj);
>  
>  /* i915_gem_tiling.c */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> b/drivers/gpu/drm/i915/i915_gem.c index 656406d..24f93ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3915,6 +3915,27 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  		}
>  	}
>  
> +	/* Try to allocate the physical space for the GEM object,
> +	 * representing the User frame buffer, from the stolen area.
> +	 * But if there is no sufficient free space left in stolen
> +	 * area, will fallback to shmem.
> +	 */
> +	if (obj->user_fb == 1) {
> +		if (obj->pages == NULL) {
> +			if (obj->tiling_mode == I915_TILING_X) {
> +				/* Tiled(X) Scanout buffers are more suitable
> +				 * for allocation from stolen area, as its very
> +				 * unlikely that they will be accessed directly
> +				 * from the CPU side and any allocation from
> +				 * stolen area is not directly CPU accessible,
> +				 * but accessible only through the aperture
> +				 * space.
> +				 */
> +				i915_gem_object_move_to_stolen(obj);
> +			}
> +		}
> +	}

Neat hack ;-) But I think for upstream we need to address a few more concerns. Random comments:

- The vlv patch subject is a bit misleading here since this is about
  stolen memory usage in general across all platforms.

- object_pin is the wrong place to change the backing storage since
  someone else could already have pinned the it (e.g. through dma-buf). We
  need to do this earlier before calling down into ->get_pages.

- If we allow opportunistical placement of objects into stolen memory I
  think we should also fully support purgeable objects so that we can
  dynamically scale to workloads. Or at least to be able to kick out stuff
  in case the kernel needs the contiguous memory for something more
  important. So if we couldn't find a suitable stolen block we'd scan
  through all objects with stolen memory backing and check whether any
  purgeable objects could be kicked out.

- For your usecase, have you tried to simply reduce the stolen area as
  much as possible? Our friendly competition on ARM SoCs seems to have
  mostly moved away from gfx reserved chunks towards more flexible
  approaches like CMA. Giving stolen back to the linux page allocator
  should be possible, but I haven't really looked into that yet all that
  much ...

- For upstream I think changing the personality of buffer objects behind
  userspace's back is a bit too frisky wrt breaking something. I prefer if
  userspace opts-in explicitly by passing a flag to the create ioctl
  stating that stolen memory is allowed/preferred for this allocation.
  Chris Wilson posted an rfc patch a while back to add a create2 ioctl
  (which at the same time also allows us to set the caching, tiling and
  any other object parameters).

- We've had some "fun" last time around we've tried to use stolen more
  seriously with scanout buffers being strangely offset/corrupted.
  Hopefully this is fixed by your sg->offset patch, but I can't find the
  reports nor recall the details right now.

Cheers, Daniel
> +
>  	if (!i915_gem_obj_bound(obj, vm)) {
>  		ret = i915_gem_object_bind_to_vm(obj, vm, alignment,
>  						 map_and_fenceable,
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c 
> b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 5cf97d6..29c22f9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -29,6 +29,7 @@
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
> +#include <linux/shmem_fs.h>
>  
>  /*
>   * The BIOS typically reserves some of the system's memory for the 
> exclusive @@ -370,6 +371,98 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>  	return NULL;
>  }
>  
> +void
> +i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj) {
> +	struct drm_device *dev = obj->base.dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_mm_node *stolen;
> +	u32 size = obj->base.size;
> +	int ret = 0;
> +
> +	if (!IS_VALLEYVIEW(dev)) {
> +		return;
> +	}
> +
> +	if (obj->stolen) {
> +		BUG_ON(obj->pages == NULL);
> +		return;
> +	}
> +
> +	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> +		return;
> +
> +	if (size == 0)
> +		return;
> +
> +	/* Check if already shmem space has been allocated for the object
> +	 * or not. We cannot rely upon on the value of 'pages' field for this.
> +	 * As even though if the 'pages' field is NULL, it does not actually
> +	 * indicate that the backing physical space (shmem) is currently not
> +	 * reserved for the object, as the object may not get purged/truncated
> +	 * on the calll to 'put_pages_gtt'.
> +	 */
> +	if (obj->base.filp) {
> +		struct inode *inode = file_inode(obj->base.filp);
> +		struct shmem_inode_info *info = SHMEM_I(inode);
> +		if (!inode)
> +			return;
> +		spin_lock(&info->lock);
> +		/* The alloced field stores how many data pages are
> +		 * allocated to the file.
> +		 */
> +		ret = info->alloced;
> +		spin_unlock(&info->lock);
> +		if (ret > 0) {
> +			DRM_DEBUG_DRIVER(
> +				"Already shmem space alloced, %d pges\n", ret);
> +			return;
> +		}
> +	}
> +
> +	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
> +	if (!stolen)
> +		return;
> +
> +	ret = drm_mm_insert_node(&dev_priv->mm.stolen, stolen, size,
> +				 4096, DRM_MM_SEARCH_DEFAULT);
> +	if (ret) {
> +		kfree(stolen);
> +		DRM_DEBUG_DRIVER("ran out of stolen space\n");
> +		return;
> +	}
> +
> +	/* Set up the object to use the stolen memory,
> +	 * backing store no longer managed by shmem layer */
> +	drm_gem_object_release(&(obj->base));
> +	obj->base.filp = NULL;
> +	obj->ops = &i915_gem_object_stolen_ops;
> +
> +	obj->pages = i915_pages_create_for_stolen(dev,
> +						stolen->start, stolen->size);
> +	if (obj->pages == NULL)
> +		goto cleanup;
> +
> +	i915_gem_object_pin_pages(obj);
> +	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> +	obj->has_dma_mapping = true;
> +	obj->stolen = stolen;
> +
> +	DRM_DEBUG_DRIVER("Obj moved to stolen, ptr = %p, size = %x\n",
> +			 obj, size);
> +
> +	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
> +	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
> +
> +	/* No zeroing-out of buffers allocated from stolen area */
> +	return;
> +
> +cleanup:
> +	drm_mm_remove_node(stolen);
> +	kfree(stolen);
> +	return;
> +}
> +
>  struct drm_i915_gem_object *
>  i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>  					       u32 stolen_offset,
> --
> 1.8.5.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Jan. 10, 2014, 9:26 a.m. UTC | #7
On Fri, Jan 10, 2014 at 08:40:34AM +0000, Goel, Akash wrote:
> >> For your usecase, have you tried to simply reduce the stolen area as
> >>  much as possible? Our friendly competition on ARM SoCs seems to have
> >>  mostly moved away from gfx reserved chunks towards more flexible
> >>  approaches like CMA. Giving stolen back to the linux page allocator
> >>  should be possible, but I haven't really looked into that yet all that
> >>  much ...
> 
> This is the first option we explored, as it would have made our task
> also simple. There is a Hw bug on BYT, due to which either the
> allocation of Stolen area can be completely disabled or the next
> allocation has to be of 64 MB only.  But this limitation will not be
> present on upcoming platforms.
> 
> >> - object_pin is the wrong place to change the backing storage since
> >>  someone else could already have pinned the it (e.g. through dma-buf). We
> >>  need to do this earlier before calling down into ->get_pages.
> 
> We had an option to allocate the backing storage from stolen area, at
> the time when a GEM object is associated with a User created frame
> buffer (drm_mode_addfb2).  But we often saw that pool of Frame buffer
> objects were getting created, but not all them were getting used. So we
> thought that it will be more optimal, if we reserve the stolen space
> when the object actually starts to gets used, i.e. when it is being
> mapped to GTT.  In order to handle the dma-buf case (for User frame
> buffers) we can add a new check, so as not to consider using stolen area
> for such objects.

Correct purgeable semantics with shared objects is still an open for
shared buffers. An option would be to introduce the concept of weak/strong
userspace references to dma-buf/gem objects and allow a buffer to be
purged if there's no strong reference. A bit more work though.

> >> - If we allow opportunistical placement of objects into stolen memory I
> >>  think we should also fully support purgeable objects so that we can
> >>  dynamically scale to workloads. Or at least to be able to kick out stuff
> >>  in case the kernel needs the contiguous memory for something more
> >>  important. So if we couldn't find a suitable stolen block we'd scan
> >>  through all objects with stolen memory backing and check whether any
> >>   purgeable objects could be kicked out.
> 
> Actually we didn't expected much value-add on having the
> purging/truncate logic for Frame buffer objects also allocated from
> stolen area.  We saw that Frame buffer objects were being used as shared
> objects only & not as local objects. So the cacheing/purging logic in
> libdrm will not really apply to them, until unless the gem_madvise ioctl
> call is used to truncate the objects.  But on our UFO (OGL & Media)
> drivers side, currently the gem_madvise ioctl call is not being used.
> So until the frame buffer object itself is destroyed, it cannot be
> purged before that.  On Android side, as the 'swap' is disabled, the
> physical space of the GEM objects cannot be reclaimed by releasing the
> ref count on the underlying Physical pages (put_pages).  The purging
> from the GEM shrinker side, will be really effective in relinquishing
> the backing physical space, only when the objects are marked as
> purgeable.  We can try to add the support to purge/truncate logic for
> the stolen objects, in order to create room in stolen space for a new
> frame buffer. 

Well the caching/purging will be interesting once we have use stolen in
general where it's useful, e.g. for accelarated media decode using large
page ptes.

> >> For upstream I think changing the personality of buffer objects behind
> >>  userspace's back is a bit too frisky wrt breaking something. I prefer if
> >>  userspace opts-in explicitly by passing a flag to the create ioctl
> >>  stating that stolen memory is allowed/preferred for this allocation.
> >>  Chris Wilson posted an rfc patch a while back to add a create2 ioctl
> >>  (which at the same time also allows us to set the caching, tiling and
> >>  any other object parameters).
> 
> Yes, agree that is not cleanest of a solution, but we didn't had an
> option of introducing a new API/interface.  But the change is limited
> only to User created frame buffer GEM objects.  What new constraints we
> will be introducing if we go ahead with this design for Frame buffers.
> The mmap_gtt interface can still be used for these GEM objects.

It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that
uses this on X-tiled buffers and breaking them isn't an option for
upstream really.

One thing I don't understand is why you can't use a new ioctl. You've
emphasised this point a few times by now, but I don't understand the
reasons for it ... Please elaborate.

Cheers, Daniel
akash.goel@intel.com Jan. 13, 2014, 9:17 a.m. UTC | #8
>> It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that uses this on X-tiled buffers and breaking them isn't an option for upstream really.
>> One thing I don't understand is why you can't use a new ioctl. You've emphasised this point a few times by now, but I don't understand the reasons for it ... Please elaborate.

Since pwrite/pread interfaces are being handled completely by Driver, we can modify their implementation to add support for Stolen buffers also (as Driver can access the stolen contents through the aperture space).
It's the CPU mmaps which will still remain broken. But is User space code aware & capable of handling the bit17 swizzling (since currently driver handle that for pread/pwrite), even though they can handle the X-Tile to Linear conversion ?

We were reluctant to modify the User space Driver code, to introduce a new ioctl for using the stolen area as a backing store. 
Actually Stolen memory is acting as a dedicated graphics memory, which can be used by the Driver. 
Our understanding that since kernel mode Driver is acting as a Graphics memory manager, it shall have the prerogative to optimally decide when to use the System memory(shmem) or Graphics memory for the allocation of Graphics resources. We were apprehensive about exposing the stolen memory interface, as we thought that then Users will always prefer to use the dedicated memory for their allocations which may not be optimal.

Best Regards
Akash 



-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Friday, January 10, 2014 2:57 PM
To: Goel, Akash
Cc: Daniel Vetter; Vetter, Daniel; intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 5/7] drm/i915/vlv: Increase the utilization of stolen memory on VLV.

On Fri, Jan 10, 2014 at 08:40:34AM +0000, Goel, Akash wrote:
> >> For your usecase, have you tried to simply reduce the stolen area 
> >> as  much as possible? Our friendly competition on ARM SoCs seems to 
> >> have  mostly moved away from gfx reserved chunks towards more 
> >> flexible  approaches like CMA. Giving stolen back to the linux page 
> >> allocator  should be possible, but I haven't really looked into 
> >> that yet all that  much ...
> 
> This is the first option we explored, as it would have made our task 
> also simple. There is a Hw bug on BYT, due to which either the 
> allocation of Stolen area can be completely disabled or the next 
> allocation has to be of 64 MB only.  But this limitation will not be 
> present on upcoming platforms.
> 
> >> - object_pin is the wrong place to change the backing storage since  
> >> someone else could already have pinned the it (e.g. through 
> >> dma-buf). We  need to do this earlier before calling down into ->get_pages.
> 
> We had an option to allocate the backing storage from stolen area, at 
> the time when a GEM object is associated with a User created frame 
> buffer (drm_mode_addfb2).  But we often saw that pool of Frame buffer 
> objects were getting created, but not all them were getting used. So 
> we thought that it will be more optimal, if we reserve the stolen 
> space when the object actually starts to gets used, i.e. when it is 
> being mapped to GTT.  In order to handle the dma-buf case (for User 
> frame
> buffers) we can add a new check, so as not to consider using stolen 
> area for such objects.

Correct purgeable semantics with shared objects is still an open for shared buffers. An option would be to introduce the concept of weak/strong userspace references to dma-buf/gem objects and allow a buffer to be purged if there's no strong reference. A bit more work though.

> >> - If we allow opportunistical placement of objects into stolen 
> >> memory I  think we should also fully support purgeable objects so 
> >> that we can  dynamically scale to workloads. Or at least to be able 
> >> to kick out stuff  in case the kernel needs the contiguous memory 
> >> for something more  important. So if we couldn't find a suitable 
> >> stolen block we'd scan  through all objects with stolen memory backing and check whether any
> >>   purgeable objects could be kicked out.
> 
> Actually we didn't expected much value-add on having the 
> purging/truncate logic for Frame buffer objects also allocated from 
> stolen area.  We saw that Frame buffer objects were being used as 
> shared objects only & not as local objects. So the cacheing/purging 
> logic in libdrm will not really apply to them, until unless the 
> gem_madvise ioctl call is used to truncate the objects.  But on our 
> UFO (OGL & Media) drivers side, currently the gem_madvise ioctl call is not being used.
> So until the frame buffer object itself is destroyed, it cannot be 
> purged before that.  On Android side, as the 'swap' is disabled, the 
> physical space of the GEM objects cannot be reclaimed by releasing the 
> ref count on the underlying Physical pages (put_pages).  The purging 
> from the GEM shrinker side, will be really effective in relinquishing 
> the backing physical space, only when the objects are marked as 
> purgeable.  We can try to add the support to purge/truncate logic for 
> the stolen objects, in order to create room in stolen space for a new 
> frame buffer.

Well the caching/purging will be interesting once we have use stolen in general where it's useful, e.g. for accelarated media decode using large page ptes.

> >> For upstream I think changing the personality of buffer objects 
> >> behind  userspace's back is a bit too frisky wrt breaking 
> >> something. I prefer if  userspace opts-in explicitly by passing a 
> >> flag to the create ioctl  stating that stolen memory is allowed/preferred for this allocation.
> >>  Chris Wilson posted an rfc patch a while back to add a create2 
> >> ioctl  (which at the same time also allows us to set the caching, 
> >> tiling and  any other object parameters).
> 
> Yes, agree that is not cleanest of a solution, but we didn't had an 
> option of introducing a new API/interface.  But the change is limited 
> only to User created frame buffer GEM objects.  What new constraints 
> we will be introducing if we go ahead with this design for Frame buffers.
> The mmap_gtt interface can still be used for these GEM objects.

It breaks pwrite/pread and cpu mmaps. Yes, we've had old userspace that uses this on X-tiled buffers and breaking them isn't an option for upstream really.

One thing I don't understand is why you can't use a new ioctl. You've emphasised this point a few times by now, but I don't understand the reasons for it ... Please elaborate.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1bcc543..db29537 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2376,6 +2376,7 @@  i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 stolen_offset,
 					       u32 gtt_offset,
 					       u32 size);
+void i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj);
 void i915_gem_object_release_stolen(struct drm_i915_gem_object *obj);
 
 /* i915_gem_tiling.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 656406d..24f93ef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3915,6 +3915,27 @@  i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		}
 	}
 
+	/* Try to allocate the physical space for the GEM object,
+	 * representing the User frame buffer, from the stolen area.
+	 * But if there is no sufficient free space left in stolen
+	 * area, will fallback to shmem.
+	 */
+	if (obj->user_fb == 1) {
+		if (obj->pages == NULL) {
+			if (obj->tiling_mode == I915_TILING_X) {
+				/* Tiled(X) Scanout buffers are more suitable
+				 * for allocation from stolen area, as its very
+				 * unlikely that they will be accessed directly
+				 * from the CPU side and any allocation from
+				 * stolen area is not directly CPU accessible,
+				 * but accessible only through the aperture
+				 * space.
+				 */
+				i915_gem_object_move_to_stolen(obj);
+			}
+		}
+	}
+
 	if (!i915_gem_obj_bound(obj, vm)) {
 		ret = i915_gem_object_bind_to_vm(obj, vm, alignment,
 						 map_and_fenceable,
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 5cf97d6..29c22f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -29,6 +29,7 @@ 
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
+#include <linux/shmem_fs.h>
 
 /*
  * The BIOS typically reserves some of the system's memory for the exclusive
@@ -370,6 +371,98 @@  i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
 	return NULL;
 }
 
+void
+i915_gem_object_move_to_stolen(struct drm_i915_gem_object *obj)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_mm_node *stolen;
+	u32 size = obj->base.size;
+	int ret = 0;
+
+	if (!IS_VALLEYVIEW(dev)) {
+		return;
+	}
+
+	if (obj->stolen) {
+		BUG_ON(obj->pages == NULL);
+		return;
+	}
+
+	if (!drm_mm_initialized(&dev_priv->mm.stolen))
+		return;
+
+	if (size == 0)
+		return;
+
+	/* Check if already shmem space has been allocated for the object
+	 * or not. We cannot rely upon on the value of 'pages' field for this.
+	 * As even though if the 'pages' field is NULL, it does not actually
+	 * indicate that the backing physical space (shmem) is currently not
+	 * reserved for the object, as the object may not get purged/truncated
+	 * on the calll to 'put_pages_gtt'.
+	 */
+	if (obj->base.filp) {
+		struct inode *inode = file_inode(obj->base.filp);
+		struct shmem_inode_info *info = SHMEM_I(inode);
+		if (!inode)
+			return;
+		spin_lock(&info->lock);
+		/* The alloced field stores how many data pages are
+		 * allocated to the file.
+		 */
+		ret = info->alloced;
+		spin_unlock(&info->lock);
+		if (ret > 0) {
+			DRM_DEBUG_DRIVER(
+				"Already shmem space alloced, %d pges\n", ret);
+			return;
+		}
+	}
+
+	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
+	if (!stolen)
+		return;
+
+	ret = drm_mm_insert_node(&dev_priv->mm.stolen, stolen, size,
+				 4096, DRM_MM_SEARCH_DEFAULT);
+	if (ret) {
+		kfree(stolen);
+		DRM_DEBUG_DRIVER("ran out of stolen space\n");
+		return;
+	}
+
+	/* Set up the object to use the stolen memory,
+	 * backing store no longer managed by shmem layer */
+	drm_gem_object_release(&(obj->base));
+	obj->base.filp = NULL;
+	obj->ops = &i915_gem_object_stolen_ops;
+
+	obj->pages = i915_pages_create_for_stolen(dev,
+						stolen->start, stolen->size);
+	if (obj->pages == NULL)
+		goto cleanup;
+
+	i915_gem_object_pin_pages(obj);
+	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
+	obj->has_dma_mapping = true;
+	obj->stolen = stolen;
+
+	DRM_DEBUG_DRIVER("Obj moved to stolen, ptr = %p, size = %x\n",
+			 obj, size);
+
+	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
+	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
+
+	/* No zeroing-out of buffers allocated from stolen area */
+	return;
+
+cleanup:
+	drm_mm_remove_node(stolen);
+	kfree(stolen);
+	return;
+}
+
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 stolen_offset,