
drm/radeon: fix TOPDOWN handling for bo_create

Message ID CADnq5_NeqyNb=XKxdWuEGiHbfGX+i5yqtFRyOnzg3EzyjzQmKg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Alex Deucher March 17, 2015, 3:19 p.m. UTC
On Mon, Mar 16, 2015 at 11:48 PM, Michel Dänzer <michel@daenzer.net> wrote:
> On 17.03.2015 07:32, Alex Deucher wrote:
>> On Thu, Mar 12, 2015 at 10:55 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On 12.03.2015 22:09, Alex Deucher wrote:
>>>> On Thu, Mar 12, 2015 at 5:23 AM, Christian König
>>>> <deathsimple@vodafone.de> wrote:
>>>>> On 12.03.2015 10:02, Michel Dänzer wrote:
>>>>>>
>>>>>> On 12.03.2015 06:14, Alex Deucher wrote:
>>>>>>>
>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher <alexdeucher@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Wed, Mar 11, 2015 at 2:21 PM, Christian König
>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>
>>>>>>>>> On 11.03.2015 16:44, Alex Deucher wrote:
>>>>>>>>>>
>>>>>>>>>> radeon_bo_create() calls radeon_ttm_placement_from_domain()
>>>>>>>>>> before ttm_bo_init() is called.  radeon_ttm_placement_from_domain()
>>>>>>>>>> uses the ttm bo size to determine when to select top down
>>>>>>>>>> allocation, but since the ttm bo is not initialized yet, the
>>>>>>>>>> check is always false.
>>>>>>>>>>
>>>>>>>>>> Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And I was already wondering why the heck the BOs always
>>>>>>>>> ping-ponged in memory after creation.
>>>>>>>>>
>>>>>>>>> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>>>
>>>>>>>> And fixing that promptly broke VCE due to vram location requirements.
>>>>>>>> Updated patch attached.  Thoughts?
>>>>>>>
>>>>>>> And one more take to make things a bit more explicit for static kernel
>>>>>>> driver allocations.
>>>>>>
>>>>>> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so the
>>>>>> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
>>>>>> problem is really that some BOs are expected to be within a certain
>>>>>> range from the beginning of VRAM, but lpfn isn't set accordingly. It
>>>>>> would be better to fix that by setting lpfn directly than indirectly via
>>>>>> RADEON_GEM_CPU_ACCESS.
>>>>>
>>>>>
>>>>> Yeah, agree. We should probably try to find the root cause of this instead.
>>>>>
>>>>> As far as I know VCE has no documented limitation on where buffers are
>>>>> placed (unlike UVD). So this is a bit strange. Are you sure that it isn't
>>>>> UVD which breaks here?
>>>>
>>>> It's definitely VCE, I don't know why UVD didn't have a problem.  I
>>>> considered using pin_restricted to make sure it got pinned in the CPU
>>>> visible region, but that had two problems: 1. it would end up getting
>>>> migrated when pinned,
>>>
>>> Maybe something like radeon_uvd_force_into_uvd_segment() is needed for
>>> VCE as well?
>>>
>>>
>>>> 2. it would end up at the top of the restricted
>>>> region since the top down flag is set which would end up fragmenting
>>>> vram.
>>>
>>> If that's an issue (which outweighs the supposed benefit of
>>> TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to set
>>> TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn != 0 and smaller than
>>> the whole available region, instead of checking for VRAM and
>>> RADEON_GEM_CPU_ACCESS.
>>>
>>
>> How about something like the attached patch?  I'm not really sure
>> about the restrictions for the UVD and VCE fw and stack/heap buffers,
>> but this seems to work.  It seems like the current UVD/VCE code works
>> by accident since the check for TOPDOWN fails.
>
> This patch is getting a bit messy, mixing several logically separate
> changes. Can you split it up accordingly? E.g. one patch just adding the
> new fpfn and lpfn function parameters but passing 0 for them (so no
> functional change), then one or several patches with the corresponding
> functional changes, and finally one patch adding the new size parameter
> (and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated
> BOs). I think that would help for reviewing and generally understanding
> the changes.
>
>
>> @@ -105,14 +106,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
>>                */
>>               if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
>>                   rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
>> -                     rbo->placements[c].fpfn =
>> -                             rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>> +                     if (fpfn > (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT))
>> +                             rbo->placements[c].fpfn = fpfn;
>> +                     else
>> +                             rbo->placements[c].fpfn =
>> +                                     rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>>                       rbo->placements[c++].flags = TTM_PL_FLAG_WC |
>>                                                    TTM_PL_FLAG_UNCACHED |
>>                                                    TTM_PL_FLAG_VRAM;
>>               }
>
> If (fpfn >= rbo->rdev->mc.visible_vram_size), this whole block can be
> skipped, since the next placement will be identical.
>
> OTOH, fpfn is currently always 0 anyway, so maybe it's better not to add
> that parameter in the first place.
>
>
> Other than that, looks good to me.

Broken out patches attached.  Also available here:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=topdown-fixes

Alex
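
The root cause described in the commit message above boils down to an ordering problem; a minimal sketch (simplified from radeon_object.c of this era; the threshold and the elided arguments are illustrative, not the literal source):

/* radeon_ttm_placement_from_domain() keys TOPDOWN off the ttm bo size: */
if (rbo->tbo.mem.size > 512 * 1024) {	/* still 0 at creation time */
	for (i = 0; i < c; i++)
		rbo->placements[i].flags |= TTM_PL_FLAG_TOPDOWN;
}

/* ...but radeon_bo_create() computes the placements before ttm_bo_init()
 * has filled in bo->tbo, so the check above never fires for newly
 * created BOs: */
radeon_ttm_placement_from_domain(bo, domain);	/* bo->tbo.mem.size == 0 */
r = ttm_bo_init(&rdev->mman.bdev, &bo->tbo, size, type, ...);	/* size known only here */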

Comments

Christian König March 17, 2015, 3:43 p.m. UTC | #1
On 17.03.2015 16:19, Alex Deucher wrote:
> On Mon, Mar 16, 2015 at 11:48 PM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 17.03.2015 07:32, Alex Deucher wrote:
>>> On Thu, Mar 12, 2015 at 10:55 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>> On 12.03.2015 22:09, Alex Deucher wrote:
>>>>> On Thu, Mar 12, 2015 at 5:23 AM, Christian König
>>>>> <deathsimple@vodafone.de> wrote:
>>>>>> On 12.03.2015 10:02, Michel Dänzer wrote:
>>>>>>> On 12.03.2015 06:14, Alex Deucher wrote:
>>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher <alexdeucher@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> On Wed, Mar 11, 2015 at 2:21 PM, Christian König
>>>>>>>>> <deathsimple@vodafone.de> wrote:
>>>>>>>>>> On 11.03.2015 16:44, Alex Deucher wrote:
>>>>>>>>>>> radeon_bo_create() calls radeon_ttm_placement_from_domain()
>>>>>>>>>>> before ttm_bo_init() is called.  radeon_ttm_placement_from_domain()
>>>>>>>>>>> uses the ttm bo size to determine when to select top down
>>>>>>>>>>> allocation, but since the ttm bo is not initialized yet, the
>>>>>>>>>>> check is always false.
>>>>>>>>>>>
>>>>>>>>>>> Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>>
>>>>>>>>>> And I was already wondering why the heck the BOs always
>>>>>>>>>> ping-ponged in memory after creation.
>>>>>>>>>>
>>>>>>>>>> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>>>>> And fixing that promptly broke VCE due to vram location requirements.
>>>>>>>>> Updated patch attached.  Thoughts?
>>>>>>>> And one more take to make things a bit more explicit for static kernel
>>>>>>>> driver allocations.
>>>>>>> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so the
>>>>>>> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
>>>>>>> problem is really that some BOs are expected to be within a certain
>>>>>>> range from the beginning of VRAM, but lpfn isn't set accordingly. It
>>>>>>> would be better to fix that by setting lpfn directly than indirectly via
>>>>>>> RADEON_GEM_CPU_ACCESS.
>>>>>>
>>>>>> Yeah, agree. We should probably try to find the root cause of this instead.
>>>>>>
>>>>>> As far as I know VCE has no documented limitation on where buffers are
>>>>>> placed (unlike UVD). So this is a bit strange. Are you sure that it isn't
>>>>>> UVD which breaks here?
>>>>> It's definitely VCE, I don't know why UVD didn't have a problem.  I
>>>>> considered using pin_restricted to make sure it got pinned in the CPU
>>>>> visible region, but that had two problems: 1. it would end up getting
>>>>> migrated when pinned,
>>>> Maybe something like radeon_uvd_force_into_uvd_segment() is needed for
>>>> VCE as well?
>>>>
>>>>
>>>>> 2. it would end up at the top of the restricted
>>>>> region since the top down flag is set which would end up fragmenting
>>>>> vram.
>>>> If that's an issue (which outweighs the supposed benefit of
>>>> TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to set
>>>> TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn != 0 and smaller than
>>>> the whole available region, instead of checking for VRAM and
>>>> RADEON_GEM_CPU_ACCESS.
>>>>
>>> How about something like the attached patch?  I'm not really sure
>>> about the restrictions for the UVD and VCE fw and stack/heap buffers,
>>> but this seems to work.  It seems like the current UVD/VCE code works
>>> by accident since the check for TOPDOWN fails.
>> This patch is getting a bit messy, mixing several logically separate
>> changes. Can you split it up accordingly? E.g. one patch just adding the
>> new fpfn and lpfn function parameters but passing 0 for them (so no
>> functional change), then one or several patches with the corresponding
>> functional changes, and finally one patch adding the new size parameter
>> (and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated
>> BOs). I think that would help for reviewing and generally understanding
>> the changes.
>>
>>
>>> @@ -105,14 +106,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
>>>                 */
>>>                if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
>>>                    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
>>> -                     rbo->placements[c].fpfn =
>>> -                             rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>>> +                     if (fpfn > (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT))
>>> +                             rbo->placements[c].fpfn = fpfn;
>>> +                     else
>>> +                             rbo->placements[c].fpfn =
>>> +                                     rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>>>                        rbo->placements[c++].flags = TTM_PL_FLAG_WC |
>>>                                                     TTM_PL_FLAG_UNCACHED |
>>>                                                     TTM_PL_FLAG_VRAM;
>>>                }
>> If (fpfn >= rbo->rdev->mc.visible_vram_size), this whole block can be
>> skipped, since the next placement will be identical.
>>
>> OTOH, fpfn is currently always 0 anyway, so maybe it's better not to add
>> that parameter in the first place.
>>
>>
>> Other than that, looks good to me.
> Broken out patches attached.  Also available here:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=topdown-fixes
Thinking more about it, that approach is a NAK. For limiting a BO to 
visible VRAM we want the limit to apply only to the VRAM domain entry; 
doing it this way it applies to GTT as well, which is really bad for 
handling page faults.
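
Concretely, that means applying the page window only to placement entries that target VRAM, so the GTT/system fallbacks stay unconstrained and eviction or page-fault handling can still move the BO. A hypothetical helper, for illustration only (roughly the pattern radeon_bo_pin_restricted() already uses):

/* Hypothetical sketch: constrain only the VRAM entries, leaving
 * GTT/system fallbacks unconstrained. */
static void radeon_bo_limit_vram(struct radeon_bo *rbo,
				 unsigned fpfn, unsigned lpfn)
{
	unsigned i;

	for (i = 0; i < rbo->placement.num_placement; i++) {
		if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
			rbo->placements[i].fpfn = fpfn;
			rbo->placements[i].lpfn = lpfn;
		}
	}
}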

I would rather say let us completely nuke 
radeon_ttm_placement_from_domain for internal allocations and give 
radeon_bo_create a ttm_placement pointer to use.

Driver internal allocations would then have a couple of predefined 
placements for their buffers. We might need to make a few ttm_placement 
pointers const for this, but I think that this is the better approach.
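
As a sketch of what such a predefined placement could look like (hypothetical code against the ttm_place/ttm_placement structures of this era; the names and flag combination are chosen only for illustration):

/* Hypothetical const placement for a kernel-internal buffer:
 * write-combined, uncached VRAM with no page range limit. */
static const struct ttm_place kernel_vram_place = {
	.fpfn = 0,
	.lpfn = 0,
	.flags = TTM_PL_FLAG_WC | TTM_PL_FLAG_UNCACHED |
		 TTM_PL_FLAG_VRAM,
};

static const struct ttm_placement kernel_vram_placement = {
	.num_placement = 1,
	.placement = &kernel_vram_place,
	.num_busy_placement = 1,
	.busy_placement = &kernel_vram_place,
};

One wrinkle: limits such as the visible VRAM window are per-device, so fully const tables can only cover the flag combinations; per-device fpfn/lpfn values would still need runtime setup.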

Regards,
Christian.


>
> Alex
Alex Deucher March 17, 2015, 3:50 p.m. UTC | #2
On Tue, Mar 17, 2015 at 11:43 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 17.03.2015 16:19, Alex Deucher wrote:
>
> On Mon, Mar 16, 2015 at 11:48 PM, Michel Dänzer <michel@daenzer.net> wrote:
>
> On 17.03.2015 07:32, Alex Deucher wrote:
>
> On Thu, Mar 12, 2015 at 10:55 PM, Michel Dänzer <michel@daenzer.net> wrote:
>
> On 12.03.2015 22:09, Alex Deucher wrote:
>
> On Thu, Mar 12, 2015 at 5:23 AM, Christian König
> <deathsimple@vodafone.de> wrote:
>
> On 12.03.2015 10:02, Michel Dänzer wrote:
>
> On 12.03.2015 06:14, Alex Deucher wrote:
>
> On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher <alexdeucher@gmail.com>
> wrote:
>
> On Wed, Mar 11, 2015 at 2:21 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>
> On 11.03.2015 16:44, Alex Deucher wrote:
>
> radeon_bo_create() calls radeon_ttm_placement_from_domain()
> before ttm_bo_init() is called.  radeon_ttm_placement_from_domain()
> uses the ttm bo size to determine when to select top down
> allocation, but since the ttm bo is not initialized yet, the
> check is always false.
>
> Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Cc: stable@vger.kernel.org
>
> And I was already wondering why the heck the BOs always
> ping-ponged in memory after creation.
>
> Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
>
> And fixing that promptly broke VCE due to vram location requirements.
> Updated patch attached.  Thoughts?
>
> And one more take to make things a bit more explicit for static kernel
> driver allocations.
>
> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so the
> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
> problem is really that some BOs are expected to be within a certain
> range from the beginning of VRAM, but lpfn isn't set accordingly. It
> would be better to fix that by setting lpfn directly than indirectly via
> RADEON_GEM_CPU_ACCESS.
>
> Yeah, agree. We should probably try to find the root cause of this instead.
>
> As far as I know VCE has no documented limitation on where buffers are
> placed (unlike UVD). So this is a bit strange. Are you sure that it isn't
> UVD which breaks here?
>
> It's definitely VCE, I don't know why UVD didn't have a problem.  I
> considered using pin_restricted to make sure it got pinned in the CPU
> visible region, but that had two problems: 1. it would end up getting
> migrated when pinned,
>
> Maybe something like radeon_uvd_force_into_uvd_segment() is needed for
> VCE as well?
>
>
> 2. it would end up at the top of the restricted
> region since the top down flag is set which would end up fragmenting
> vram.
>
> If that's an issue (which outweighs the supposed benefit of
> TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to set
> TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn != 0 and smaller than
> the whole available region, instead of checking for VRAM and
> RADEON_GEM_CPU_ACCESS.
>
> How about something like the attached patch?  I'm not really sure
> about the restrictions for the UVD and VCE fw and stack/heap buffers,
> but this seems to work.  It seems like the current UVD/VCE code works
> by accident since the check for TOPDOWN fails.
>
> This patch is getting a bit messy, mixing several logically separate
> changes. Can you split it up accordingly? E.g. one patch just adding the
> new fpfn and lpfn function parameters but passing 0 for them (so no
> functional change), then one or several patches with the corresponding
> functional changes, and finally one patch adding the new size parameter
> (and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated
> BOs). I think that would help for reviewing and generally understanding
> the changes.
>
>
> @@ -105,14 +106,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
>                */
>               if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
>                   rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
> -                     rbo->placements[c].fpfn =
> -                             rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
> +                     if (fpfn > (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT))
> +                             rbo->placements[c].fpfn = fpfn;
> +                     else
> +                             rbo->placements[c].fpfn =
> +                                     rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>                       rbo->placements[c++].flags = TTM_PL_FLAG_WC |
>                                                    TTM_PL_FLAG_UNCACHED |
>                                                    TTM_PL_FLAG_VRAM;
>               }
>
> If (fpfn >= rbo->rdev->mc.visible_vram_size), this whole block can be
> skipped, since the next placement will be identical.
>
> OTOH, fpfn is currently always 0 anyway, so maybe it's better not to add
> that parameter in the first place.
>
>
> Other than that, looks good to me.
>
> Broken out patches attached.  Also available here:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=topdown-fixes
>
> Thinking more about it, that approach is a NAK. For limiting a BO to
> visible VRAM we want the limit to apply only to the VRAM domain entry;
> doing it this way it applies to GTT as well, which is really bad for
> handling page faults.
>
> I would rather say let us completely nuke radeon_ttm_placement_from_domain
> for internal allocations and give radeon_bo_create a ttm_placement pointer
> to use.
>
> Driver internal allocations would then have a couple of predefined
> placements for their buffers. We might need to make a few ttm_placement
> pointers const for this, but I think that this is the better approach.

Yeah, the more I think about it, the more I'm starting to agree.
Maybe for now we just drop the topdown thing.  It seems like it can
only cause needless migration in its current state.  It also reportedly
breaks suspend/resume on some systems.

Alex

>
> Regards,
> Christian.
>
>
>
> Alex

Patch

From 7af0b96d2d6983044e914e481754afa6232e3d05 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher@amd.com>
Date: Tue, 17 Mar 2015 10:10:11 -0400
Subject: [PATCH 01/10] drm/radeon: add fpfn, lpfn to
 radeon_ttm_placement_from_domain()

This lets us set the valid page range for buffers at allocation
time.  Currently both are set to 0, so there is no functional change.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  3 ++-
 drivers/gpu/drm/radeon/radeon_gem.c    |  2 +-
 drivers/gpu/drm/radeon/radeon_mn.c     |  4 ++--
 drivers/gpu/drm/radeon/radeon_object.c | 36 ++++++++++++++++++----------------
 drivers/gpu/drm/radeon/radeon_ttm.c    |  8 ++++----
 5 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 35ab65d..7de3e21 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2980,7 +2980,8 @@  extern void radeon_surface_init(struct radeon_device *rdev);
 extern int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data);
 extern void radeon_legacy_set_clock_gating(struct radeon_device *rdev, int enable);
 extern void radeon_atom_set_clock_gating(struct radeon_device *rdev, int enable);
-extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain);
+extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain,
+					     unsigned fpfn, unsigned lpfn);
 extern bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo);
 extern int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
 				     uint32_t flags);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index ac3c131..fb50ea9 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -337,7 +337,7 @@  int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 			goto release_object;
 		}
 
-		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
+		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
 		radeon_bo_unreserve(bo);
 		up_read(&current->mm->mmap_sem);
diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index a69bd44..359f1f2 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -141,14 +141,14 @@  static void radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
 				DRM_ERROR("(%d) failed to wait for user bo\n", r);
 		}
 
-		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
+		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
 		if (r)
 			DRM_ERROR("(%d) failed to validate user bo\n", r);
 
 		radeon_bo_unreserve(bo);
 	}
-	
+
 	mutex_unlock(&rmn->lock);
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 43e0994..a439bcf 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -93,7 +93,8 @@  bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo)
 	return false;
 }
 
-void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
+void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain,
+				      unsigned fpfn, unsigned lpfn)
 {
 	u32 c = 0, i;
 
@@ -103,8 +104,9 @@  void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 		/* Try placing BOs which don't need CPU access outside of the
 		 * CPU accessible part of VRAM
 		 */
-		if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
-		    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
+		if (fpfn >= rbo->rdev->mc.visible_vram_size ||
+		    ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
+		     rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size)) {
 			rbo->placements[c].fpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
@@ -112,7 +114,7 @@  void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 						     TTM_PL_FLAG_VRAM;
 		}
 
-		rbo->placements[c].fpfn = 0;
+		rbo->placements[c].fpfn = fpfn;
 		rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 					     TTM_PL_FLAG_UNCACHED |
 					     TTM_PL_FLAG_VRAM;
@@ -120,18 +122,18 @@  void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 
 	if (domain & RADEON_GEM_DOMAIN_GTT) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;
 
 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 			   (rbo->rdev->flags & RADEON_IS_AGP)) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;
 		} else {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 						     TTM_PL_FLAG_TT;
 		}
@@ -139,24 +141,24 @@  void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 
 	if (domain & RADEON_GEM_DOMAIN_CPU) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;
 
 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 		    rbo->rdev->flags & RADEON_IS_AGP) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;
 		} else {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 						     TTM_PL_FLAG_SYSTEM;
 		}
 	}
 	if (!c) {
-		rbo->placements[c].fpfn = 0;
+		rbo->placements[c].fpfn = fpfn;
 		rbo->placements[c++].flags = TTM_PL_MASK_CACHING |
 					     TTM_PL_FLAG_SYSTEM;
 	}
@@ -171,7 +173,7 @@  void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 			rbo->placements[i].lpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
 		else
-			rbo->placements[i].lpfn = 0;
+			rbo->placements[i].lpfn = lpfn;
 	}
 
 	/*
@@ -252,7 +254,7 @@  int radeon_bo_create(struct radeon_device *rdev,
 	bo->flags &= ~RADEON_GEM_GTT_WC;
 #endif
 
-	radeon_ttm_placement_from_domain(bo, domain);
+	radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 	/* Kernel allocation are uninterruptible */
 	down_read(&rdev->pm.mclk_lock);
 	r = ttm_bo_init(&rdev->mman.bdev, &bo->tbo, size, type,
@@ -350,7 +352,7 @@  int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset,
 
 		return 0;
 	}
-	radeon_ttm_placement_from_domain(bo, domain);
+	radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 	for (i = 0; i < bo->placement.num_placement; i++) {
 		/* force to pin into visible video ram */
 		if ((bo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
@@ -557,7 +559,7 @@  int radeon_bo_list_validate(struct radeon_device *rdev,
 			}
 
 		retry:
-			radeon_ttm_placement_from_domain(bo, domain);
+			radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 			if (ring == R600_RING_TYPE_UVD_INDEX)
 				radeon_uvd_force_into_uvd_segment(bo, allowed);
 
@@ -800,7 +802,7 @@  int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 		return 0;
 
 	/* hurrah the memory is not visible ! */
-	radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM);
+	radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM, 0, 0);
 	lpfn =	rdev->mc.visible_vram_size >> PAGE_SHIFT;
 	for (i = 0; i < rbo->placement.num_placement; i++) {
 		/* Force into visible VRAM */
@@ -810,7 +812,7 @@  int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 	}
 	r = ttm_bo_validate(bo, &rbo->placement, false, false);
 	if (unlikely(r == -ENOMEM)) {
-		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
+		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 		return ttm_bo_validate(bo, &rbo->placement, false, false);
 	} else if (unlikely(r != 0)) {
 		return r;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index d02aa1d..2994bed 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -197,7 +197,7 @@  static void radeon_evict_flags(struct ttm_buffer_object *bo,
 	switch (bo->mem.mem_type) {
 	case TTM_PL_VRAM:
 		if (rbo->rdev->ring[radeon_copy_ring_index(rbo->rdev)].ready == false)
-			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU);
+			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 		else if (rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size &&
 			 bo->mem.start < (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT)) {
 			unsigned fpfn = rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
@@ -209,7 +209,7 @@  static void radeon_evict_flags(struct ttm_buffer_object *bo,
 			 * BOs to be evicted from VRAM
 			 */
 			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM |
-							 RADEON_GEM_DOMAIN_GTT);
+							 RADEON_GEM_DOMAIN_GTT, 0, 0);
 			rbo->placement.num_busy_placement = 0;
 			for (i = 0; i < rbo->placement.num_placement; i++) {
 				if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
@@ -222,11 +222,11 @@  static void radeon_evict_flags(struct ttm_buffer_object *bo,
 				}
 			}
 		} else
-			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
+			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 		break;
 	case TTM_PL_TT:
 	default:
-		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU);
+		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 	}
 	*placement = rbo->placement;
 }
-- 
1.8.3.1
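
With this patch applied, a caller can express a page range restriction at allocation or validation time instead of patching placements afterwards. A hypothetical usage example (the 256 MB window is purely illustrative, and bo is assumed to be a reserved struct radeon_bo):

/* Hypothetical: keep all placements within the first 256 MB of VRAM. */
unsigned lpfn = (256u * 1024 * 1024) >> PAGE_SHIFT;

radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_VRAM, 0, lpfn);
r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);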