From patchwork Mon Jan 11 11:01:24 2016
X-Patchwork-Submitter: Chris Wilson <chris@chris-wilson.co.uk>
X-Patchwork-Id: 8002201
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Mon, 11 Jan 2016 11:01:24 +0000
Message-Id: <1452510091-6833-42-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 2.7.0.rc3
In-Reply-To: <1452510091-6833-1-git-send-email-chris@chris-wilson.co.uk>
References: <1452503961-14837-1-git-send-email-chris@chris-wilson.co.uk>
 <1452510091-6833-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 183/190] drm/i915/cmdparser: Use cached vmappings

The single largest factor in the overhead of parsing the commands is
the setup of the virtual mapping to provide a continuous block for the
batch buffer.

If we keep those vmappings around (against the better judgement of
mm/vmalloc.c, which we offset by handwaving and looking suggestively at
the shrinker) we can dramatically improve the performance of the parser
for small batches (such as media workloads).
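For reference, the cached mapping works along the lines of the sketch
below. The real i915_gem_object_pin_vmap() is introduced earlier in
this series, and the obj->vmapping field it caches into is the one the
final i915_cmd_parser.c hunk flushes; the body here is only an
illustration of the idea, not the actual implementation:

/*
 * Sketch of a cached vmapping helper (illustrative only; the real
 * helper lives in an earlier patch of this series).
 */
static void *pin_vmap_sketch(struct drm_i915_gem_object *obj)
{
	struct sg_page_iter sg_iter;
	struct page **pages;
	int n = 0, ret;

	ret = i915_gem_object_get_pages(obj);
	if (ret)
		return ERR_PTR(ret);

	/* Hold the pages for as long as the mapping is in use */
	i915_gem_object_pin_pages(obj);

	if (obj->vmapping != NULL)
		return obj->vmapping;	/* reuse: this is the fast path */

	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
	if (pages == NULL)
		goto err;

	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
		pages[n++] = sg_page_iter_page(&sg_iter);

	/* Cache the mapping on the object; it stays valid until the
	 * shrinker purges the pages and vunmap()s it.
	 */
	obj->vmapping = vmap(pages, n, 0, PAGE_KERNEL);
	drm_free_large(pages);
	if (obj->vmapping == NULL)
		goto err;

	return obj->vmapping;

err:
	i915_gem_object_unpin_pages(obj);
	return ERR_PTR(-ENOMEM);
}

A repeat parse of the same object then skips both the page-array
allocation and the vmap() itself, which is where the setup cost above
is spent.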
Furthermore, we can use the prepare shmem read/write functions to
determine how best we need to clflush the range (rather than every page
of the object).

Caching both the src and dst vmaps improves throughput on small batches
by +80% on ivb and +140% on byt. (Caching just the dst vmap and
iterating over the src, doing a page-by-page copy, is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying
overhead further.) For *this* simple test case, the cmdparser is now
within a factor of 2 of ideal performance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 123 ++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 ++
 2 files changed, 48 insertions(+), 81 deletions(-)
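As a reading aid for the hunks below, the flushing scheme copy_batch()
adopts amounts to the following pattern. CLFLUSH_BEFORE/CLFLUSH_AFTER
are the flags returned by the prepare_shmem helpers used in the diff;
the helper below and the reasoning in its comments are a sketch of the
intent, not driver code:

static void copy_clflush_sketch(void *dst, const void *src, u32 len,
				unsigned src_clflush, unsigned dst_clflush)
{
	/* The source may have stale lines in the CPU cache (it was last
	 * written by the GPU): invalidate just the range we read, not
	 * every page of the object.
	 */
	if (src_clflush)
		clflush_cache_range((void *)src, len);

	/* Rather than flushing the destination before writing, round the
	 * copy up to whole cachelines so the memcpy overwrites any stale
	 * line completely (the shadow batch is page-granular, so the
	 * over-copy is assumed safe).
	 */
	if (dst_clflush & CLFLUSH_BEFORE)
		len = roundup(len, boot_cpu_data.x86_clflush_size);

	memcpy(dst, src, len);

	/* On a non-coherent destination the CPU writes must be pushed
	 * out before the GPU executes the batch. The patch defers this
	 * until the parser has finished rewriting the shadow batch, via
	 * *needs_clflush_after; it is done inline here for simplicity.
	 */
	if (dst_clflush & CLFLUSH_AFTER)
		clflush_cache_range(dst, len);
}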
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 84340eb42e1b..32b83369ae4e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -866,98 +866,57 @@ find_reg(const struct drm_i915_reg_descriptor *table,
 	return NULL;
 }
 
-static u32 *vmap_batch(struct drm_i915_gem_object *obj,
-		       unsigned start, unsigned len)
-{
-	int i;
-	void *addr = NULL;
-	struct sg_page_iter sg_iter;
-	int first_page = start >> PAGE_SHIFT;
-	int last_page = (len + start + 4095) >> PAGE_SHIFT;
-	int npages = last_page - first_page;
-	struct page **pages;
-
-	pages = drm_malloc_ab(npages, sizeof(*pages));
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
-		goto finish;
-	}
-
-	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
-		pages[i++] = sg_page_iter_page(&sg_iter);
-		if (i == npages)
-			break;
-	}
-
-	addr = vmap(pages, i, 0, PAGE_KERNEL);
-	if (addr == NULL) {
-		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
-		goto finish;
-	}
-
-finish:
-	if (pages)
-		drm_free_large(pages);
-	return (u32*)addr;
-}
-
-/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
-static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+/* Returns a vmap'd pointer to dst_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dst_obj,
 		       struct drm_i915_gem_object *src_obj,
 		       u32 batch_start_offset,
-		       u32 batch_len)
+		       u32 batch_len,
+		       bool *needs_clflush_after)
 {
-	unsigned needs_clflush;
-	void *src_base, *src;
-	void *dst = NULL;
+	unsigned src_needs_clflush;
+	unsigned dst_needs_clflush;
+	void *src, *dst;
 	int ret;
 
-	if (batch_len > dest_obj->base.size ||
-	    batch_len + batch_start_offset > src_obj->base.size)
-		return ERR_PTR(-E2BIG);
-
-	if (WARN_ON(dest_obj->pages_pin_count == 0))
-		return ERR_PTR(-ENODEV);
-
-	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prepare shadow batch\n");
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &src_needs_clflush);
+	if (ret)
 		return ERR_PTR(ret);
-	}
 
-	src_base = vmap_batch(src_obj, batch_start_offset, batch_len);
-	if (!src_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		ret = -ENOMEM;
+	ret = i915_gem_obj_prepare_shmem_write(dst_obj, &dst_needs_clflush);
+	if (ret) {
+		dst = ERR_PTR(ret);
 		goto unpin_src;
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: Failed to set shadow batch to CPU\n");
-		goto unmap_src;
+	src = i915_gem_object_pin_vmap(src_obj);
+	if (IS_ERR(src)) {
+		dst = src;
+		goto unpin_dst;
 	}
 
-	dst = vmap_batch(dest_obj, 0, batch_len);
-	if (!dst) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
-		ret = -ENOMEM;
+	dst = i915_gem_object_pin_vmap(dst_obj);
+	if (IS_ERR(dst))
 		goto unmap_src;
-	}
 
-	src = src_base + offset_in_page(batch_start_offset);
-	if (needs_clflush)
-		drm_clflush_virt_range(src, batch_len);
+	src += batch_start_offset;
+	if (src_needs_clflush)
+		clflush_cache_range(src, batch_len);
+
+	if (dst_needs_clflush & CLFLUSH_BEFORE)
+		batch_len = roundup(batch_len, boot_cpu_data.x86_clflush_size);
 
 	memcpy(dst, src, batch_len);
 
+	/* dst_obj is returned with vmap pinned */
+	*needs_clflush_after = dst_needs_clflush & CLFLUSH_AFTER;
+
 unmap_src:
-	vunmap(src_base);
+	i915_gem_object_unpin_vmap(src_obj);
+unpin_dst:
+	i915_gem_object_unpin_pages(dst_obj);
 unpin_src:
 	i915_gem_object_unpin_pages(src_obj);
-
-	return ret ? ERR_PTR(ret) : dst;
+	return dst;
 }
 
 static bool check_cmd(const struct intel_engine_cs *ring,
@@ -1106,16 +1065,18 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master)
 {
-	u32 *cmd, *batch_base, *batch_end;
+	u32 *cmd, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
+	bool needs_clflush_after = false;
 	int ret = 0;
 
-	batch_base = copy_batch(shadow_batch_obj, batch_obj,
-				batch_start_offset, batch_len);
-	if (IS_ERR(batch_base)) {
+	cmd = copy_batch(shadow_batch_obj, batch_obj,
+			 batch_start_offset, batch_len,
+			 &needs_clflush_after);
+	if (IS_ERR(cmd)) {
 		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
-		return PTR_ERR(batch_base);
+		return PTR_ERR(cmd);
 	}
 
 	/*
@@ -1123,9 +1084,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 	 * large or larger and copy_batch() will write MI_NOPs to the extra
 	 * space. Parsing should be faster in some cases this way.
 	 */
-	batch_end = batch_base + (batch_len / sizeof(*batch_end));
-
-	cmd = batch_base;
+	batch_end = cmd + (batch_len / sizeof(*batch_end));
 
 	while (cmd < batch_end) {
 		const struct drm_i915_cmd_descriptor *desc;
 		u32 length;
@@ -1184,7 +1143,9 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		ret = -EINVAL;
 	}
 
-	vunmap(batch_base);
+	if (ret == 0 && needs_clflush_after)
+		clflush_cache_range(shadow_batch_obj->vmapping, batch_len);
+	i915_gem_object_unpin_vmap(shadow_batch_obj);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 733250afa139..eac3d52f790d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1616,6 +1616,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		ret = -EINVAL;
 		goto err;
 	}
+	if (args->batch_start_offset > eb.batch_vma->size ||
+	    args->batch_len > eb.batch_vma->size - args->batch_start_offset) {
+		DRM_DEBUG("Attempting to use out-of-bounds batch\n");
+		ret = -EINVAL;
+		goto err;
+	}
 
 	if (intel_engine_needs_cmd_parser(eb.engine) && args->batch_len) {
 		struct i915_vma *vma;