From patchwork Tue Mar 17 15:19:20 2015
X-Patchwork-Submitter: Alex Deucher
X-Patchwork-Id: 6032351
In-Reply-To: <5507A41F.20003@daenzer.net>
References: <1426088652-32727-1-git-send-email-alexander.deucher@amd.com>
	<550087B6.3090307@vodafone.de> <55015640.4020008@daenzer.net>
	<55015B27.2030208@vodafone.de> <550251B2.5020000@daenzer.net>
	<5507A41F.20003@daenzer.net>
Date: Tue, 17 Mar 2015 11:19:20 -0400
Subject: Re: [PATCH] drm/radeon: fix TOPDOWN handling for bo_create
From: Alex Deucher
To: Michel Dänzer
Cc: Maling list - DRI developers <dri-devel@lists.freedesktop.org>
List-Id: Direct Rendering Infrastructure - Development

On Mon, Mar 16, 2015 at 11:48 PM, Michel Dänzer wrote:
> On 17.03.2015 07:32, Alex Deucher wrote:
>> On Thu, Mar 12, 2015 at 10:55 PM, Michel Dänzer wrote:
>>> On 12.03.2015 22:09, Alex Deucher wrote:
>>>> On Thu, Mar 12, 2015 at 5:23 AM, Christian König
>>>> wrote:
>>>>> On 12.03.2015 10:02, Michel Dänzer wrote:
>>>>>>
>>>>>> On 12.03.2015 06:14, Alex Deucher wrote:
>>>>>>>
>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Wed, Mar 11, 2015 at 2:21 PM, Christian König
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 11.03.2015 16:44, Alex Deucher wrote:
>>>>>>>>>>
>>>>>>>>>> radeon_bo_create() calls radeon_ttm_placement_from_domain()
>>>>>>>>>> before ttm_bo_init() is called.
radeon_ttm_placement_from_domain()
>>>>>>>>>> uses the ttm bo size to determine when to select top down
>>>>>>>>>> allocation but since the ttm bo is not initialized yet the
>>>>>>>>>> check is always false.
>>>>>>>>>>
>>>>>>>>>> Noticed-by: Oded Gabbay
>>>>>>>>>> Signed-off-by: Alex Deucher
>>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And I was already wondering why the heck the BOs always made this
>>>>>>>>> ping/pong in memory after creation.
>>>>>>>>>
>>>>>>>>> Patch is Reviewed-by: Christian König
>>>>>>>>
>>>>>>>> And fixing that promptly broke VCE due to vram location requirements.
>>>>>>>> Updated patch attached.  Thoughts?
>>>>>>>
>>>>>>> And one more take to make things a bit more explicit for static kernel
>>>>>>> driver allocations.
>>>>>>
>>>>>> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so
>>>>>> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
>>>>>> problem is really that some BOs are expected to be within a certain
>>>>>> range from the beginning of VRAM, but lpfn isn't set accordingly. It
>>>>>> would be better to fix that by setting lpfn directly than indirectly
>>>>>> via RADEON_GEM_CPU_ACCESS.
>>>>>
>>>>>
>>>>> Yeah, agree. We should probably try to find the root cause of this
>>>>> instead.
>>>>>
>>>>> As far as I know VCE has no documented limitation on where buffers are
>>>>> placed (unlike UVD). So this is a bit strange. Are you sure that it
>>>>> isn't UVD which breaks here?
>>>>
>>>> It's definitely VCE, I don't know why UVD didn't have a problem. I
>>>> considered using pin_restricted to make sure it got pinned in the CPU
>>>> visible region, but that had two problems: 1. it would end up getting
>>>> migrated when pinned,
>>>
>>> Maybe something like radeon_uvd_force_into_uvd_segment() is needed for
>>> VCE as well?
>>>
>>>
>>>> 2. it would end up at the top of the restricted
>>>> region since the top down flag is set which would end up fragmenting
>>>> vram.
>>>
>>> If that's an issue (which outweighs the supposed benefit of
>>> TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to set
>>> TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn != 0 and smaller than
>>> the whole available region, instead of checking for VRAM and
>>> RADEON_GEM_CPU_ACCESS.
>>>
>>
>> How about something like the attached patch?  I'm not really sure
>> about the restrictions for the UVD and VCE fw and stack/heap buffers,
>> but this seems to work.  It seems like the current UVD/VCE code works
>> by accident since the check for TOPDOWN fails.
>
> This patch is getting a bit messy, mixing several logically separate
> changes. Can you split it up accordingly? E.g. one patch just adding the
> new fpfn and lpfn function parameters but passing 0 for them (so no
> functional change), then one or several patches with the corresponding
> functional changes, and finally one patch adding the new size parameter
> (and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated
> BOs). I think that would help for reviewing and generally understanding
> the changes.
>
>
>> @@ -105,14 +106,17 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
>>  	 */
>>  	if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
>>  	    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
>> -		rbo->placements[c].fpfn =
>> -			rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>> +		if (fpfn > (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT))
>> +			rbo->placements[c].fpfn = fpfn;
>> +		else
>> +			rbo->placements[c].fpfn =
>> +				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
>>  		rbo->placements[c++].flags = TTM_PL_FLAG_WC |
>>  					     TTM_PL_FLAG_UNCACHED |
>>  					     TTM_PL_FLAG_VRAM;
>>  	}
>
> If (fpfn >= rbo->rdev->mc.visible_vram_size), this whole block can be
> skipped, since the next placement will be identical.
>
> OTOH, fpfn is currently always 0 anyway, so maybe it's better not to add
> that parameter in the first place.
>
>
> Other than that, looks good to me.
Broken out patches attached.  Also available here:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=topdown-fixes

Alex

From 7af0b96d2d6983044e914e481754afa6232e3d05 Mon Sep 17 00:00:00 2001
From: Alex Deucher
Date: Tue, 17 Mar 2015 10:10:11 -0400
Subject: [PATCH 01/10] drm/radeon: add fpfn, lpfn to
 radeon_ttm_placement_from_domain()

This lets us set the valid page range for buffers at allocation time.
Currently set to 0, 0 so no functional change.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/radeon/radeon.h        |  3 ++-
 drivers/gpu/drm/radeon/radeon_gem.c    |  2 +-
 drivers/gpu/drm/radeon/radeon_mn.c     |  4 ++--
 drivers/gpu/drm/radeon/radeon_object.c | 36 ++++++++++++++++++----------------
 drivers/gpu/drm/radeon/radeon_ttm.c    |  8 ++++----
 5 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 35ab65d..7de3e21 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2980,7 +2980,8 @@ extern void radeon_surface_init(struct radeon_device *rdev);
 extern int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data);
 extern void radeon_legacy_set_clock_gating(struct radeon_device *rdev, int enable);
 extern void radeon_atom_set_clock_gating(struct radeon_device *rdev, int enable);
-extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain);
+extern void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain,
+					     unsigned fpfn, unsigned lpfn);
 extern bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo);
 extern int radeon_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
 				     uint32_t flags);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index ac3c131..fb50ea9 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -337,7 +337,7 @@ int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 		goto release_object;
 	}

-	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
+	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 	r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
 	radeon_bo_unreserve(bo);
 	up_read(&current->mm->mmap_sem);
diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index a69bd44..359f1f2 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -141,14 +141,14 @@ static void radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
 			DRM_ERROR("(%d) failed to wait for user bo\n", r);
 		}

-		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU);
+		radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 		r = ttm_bo_validate(&bo->tbo, &bo->placement, false, false);
 		if (r)
 			DRM_ERROR("(%d) failed to validate user bo\n", r);

 		radeon_bo_unreserve(bo);
 	}
-	
+
 	mutex_unlock(&rmn->lock);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 43e0994..a439bcf 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -93,7 +93,8 @@ bool radeon_ttm_bo_is_radeon_bo(struct ttm_buffer_object *bo)
 	return false;
 }

-void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
+void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain,
+				      unsigned fpfn, unsigned lpfn)
 {
 	u32 c = 0, i;

@@ -103,8 +104,9 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 		/* Try placing BOs which don't need CPU access outside of the
 		 * CPU accessible part of VRAM
 		 */
-		if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
-		    rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size) {
+		if (fpfn >= rbo->rdev->mc.visible_vram_size ||
+		    ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&
+		     rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size)) {
 			rbo->placements[c].fpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
@@ -112,7 +114,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 					     TTM_PL_FLAG_VRAM;
 		}

-		rbo->placements[c].fpfn = 0;
+		rbo->placements[c].fpfn = fpfn;
 		rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 					     TTM_PL_FLAG_UNCACHED |
 					     TTM_PL_FLAG_VRAM;
 	}
@@ -120,18 +122,18 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)

 	if (domain & RADEON_GEM_DOMAIN_GTT) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;

 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 			   (rbo->rdev->flags & RADEON_IS_AGP)) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_TT;
 		} else {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 				TTM_PL_FLAG_TT;
 		}
@@ -139,24 +141,24 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)

 	if (domain & RADEON_GEM_DOMAIN_CPU) {
 		if (rbo->flags & RADEON_GEM_GTT_UC) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;

 		} else if ((rbo->flags & RADEON_GEM_GTT_WC) ||
 			   rbo->rdev->flags & RADEON_IS_AGP) {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_WC |
 				TTM_PL_FLAG_UNCACHED |
 				TTM_PL_FLAG_SYSTEM;
 		} else {
-			rbo->placements[c].fpfn = 0;
+			rbo->placements[c].fpfn = fpfn;
 			rbo->placements[c++].flags = TTM_PL_FLAG_CACHED |
 				TTM_PL_FLAG_SYSTEM;
 		}
 	}
 	if (!c) {
-		rbo->placements[c].fpfn = 0;
+		rbo->placements[c].fpfn = fpfn;
 		rbo->placements[c++].flags = TTM_PL_MASK_CACHING |
 			TTM_PL_FLAG_SYSTEM;
 	}
@@ -171,7 +173,7 @@ void radeon_ttm_placement_from_domain(struct radeon_bo *rbo, u32 domain)
 			rbo->placements[i].lpfn =
 				rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
 		else
-			rbo->placements[i].lpfn = 0;
+			rbo->placements[i].lpfn = lpfn;
 	}
 /*
@@ -252,7 +254,7 @@ int radeon_bo_create(struct radeon_device *rdev,
 	bo->flags &= ~RADEON_GEM_GTT_WC;
 #endif

-	radeon_ttm_placement_from_domain(bo, domain);
+	radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 	/* Kernel allocation are uninterruptible */
 	down_read(&rdev->pm.mclk_lock);
 	r = ttm_bo_init(&rdev->mman.bdev, &bo->tbo, size, type,
@@ -350,7 +352,7 @@ int radeon_bo_pin_restricted(struct radeon_bo *bo, u32 domain, u64 max_offset,
 		return 0;
 	}

-	radeon_ttm_placement_from_domain(bo, domain);
+	radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 	for (i = 0; i < bo->placement.num_placement; i++) {
 		/* force to pin into visible video ram */
 		if ((bo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
@@ -557,7 +559,7 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
 		}

 	retry:
-		radeon_ttm_placement_from_domain(bo, domain);
+		radeon_ttm_placement_from_domain(bo, domain, 0, 0);
 		if (ring == R600_RING_TYPE_UVD_INDEX)
 			radeon_uvd_force_into_uvd_segment(bo, allowed);

@@ -800,7 +802,7 @@ int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 		return 0;

 	/* hurrah the memory is not visible ! */
-	radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM);
+	radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM, 0, 0);
 	lpfn =	rdev->mc.visible_vram_size >> PAGE_SHIFT;
 	for (i = 0; i < rbo->placement.num_placement; i++) {
 		/* Force into visible VRAM */
@@ -810,7 +812,7 @@ int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 	}
 	r = ttm_bo_validate(bo, &rbo->placement, false, false);
 	if (unlikely(r == -ENOMEM)) {
-		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
+		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 		return ttm_bo_validate(bo, &rbo->placement, false, false);
 	} else if (unlikely(r != 0)) {
 		return r;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index d02aa1d..2994bed 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -197,7 +197,7 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
 	switch (bo->mem.mem_type) {
 	case TTM_PL_VRAM:
 		if (rbo->rdev->ring[radeon_copy_ring_index(rbo->rdev)].ready == false)
-			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU);
+			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 		else if (rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.real_vram_size &&
 			 bo->mem.start < (rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT)) {
 			unsigned fpfn = rbo->rdev->mc.visible_vram_size >> PAGE_SHIFT;
@@ -209,7 +209,7 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
 			 * BOs to be evicted from VRAM
 			 */
 			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_VRAM |
-							 RADEON_GEM_DOMAIN_GTT);
+							 RADEON_GEM_DOMAIN_GTT, 0, 0);
 			rbo->placement.num_busy_placement = 0;
 			for (i = 0; i < rbo->placement.num_placement; i++) {
 				if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
@@ -222,11 +222,11 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
 				}
 			}
 		} else
-			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT);
+			radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_GTT, 0, 0);
 		break;
 	case TTM_PL_TT:
 	default:
-		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU);
+		radeon_ttm_placement_from_domain(rbo, RADEON_GEM_DOMAIN_CPU, 0, 0);
 	}
 	*placement = rbo->placement;
 }
-- 
1.8.3.1