From: John Stultz
To: lkml
Subject: [PATCH v3 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
Date: Sat, 3 Oct 2020 04:02:50 +0000
Message-Id: <20201003040257.62768-1-john.stultz@linaro.org>
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 11814959
List-Id: Direct Rendering Infrastructure - Development
Cc:
 Sandeep Patil, dri-devel@lists.freedesktop.org, Ezequiel Garcia,
 Robin Murphy, James Jones, Liam Mark, Laura Abbott, Chris Goldsworthy,
 Hridya Valsaraju, Ørjan Eide, linux-media@vger.kernel.org,
 Suren Baghdasaryan, Daniel Mentz

Hey All,
  So this is another revision of my patch series of performance
optimizations to the dma-buf system heap.

Unfortunately, in working these up, I realized the heap-helpers
infrastructure we tried to add to minimize code duplication is
not as generic as we intended. For some heaps it makes sense to
deal with page lists, for other heaps it makes more sense to
track things with sgtables.

So this series reworks the system heap to use sgtables, and then
consolidates the pagelist method from the heap-helpers into the
CMA heap. After that, the heap-helpers logic is removed (as it
is unused). I'd still like to find a better way to avoid some of
the logic duplication in implementing the entire dma_buf_ops
handlers per heap. But unfortunately that code is tied somewhat
to how the buffer's memory is tracked.

After this, the series introduces an optimization that
Ørjan Eide implemented for ION that avoids calling sync on
attachments that don't have a mapping.

Next, an optimization to use larger order pages for the system
heap. This change brings us closer to the current performance
of the ION code.

Unfortunately, after submitting the last round, I realized that
part of the reason the page-pooling patch I had included was
providing such great performance numbers was that the
network page-pool implementation doesn't zero pages that it
pulls from the cache. This is very inappropriate for buffers we
pass to userland and was what gave it an unfair advantage
(almost constant time performance) relative to ION's allocation
performance numbers.
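
To make the zeroing concern concrete, here is a minimal userspace
sketch of a buffer cache with and without clearing on reuse. It is
only an illustration under stated assumptions: toy_pool,
toy_pool_get(), toy_pool_put(), BUF_SIZE and POOL_MAX are
hypothetical names, and this is neither the kernel page_pool API nor
code from this series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096   /* hypothetical buffer size (one 4 KiB page) */
#define POOL_MAX 8      /* hypothetical pool capacity */

/* Toy free-list cache of fixed-size buffers (not the kernel page_pool). */
struct toy_pool {
	void *bufs[POOL_MAX];
	int count;
};

/* Return a buffer to the pool; its old contents stay in memory. */
static void toy_pool_put(struct toy_pool *p, void *buf)
{
	assert(p != NULL && buf != NULL);
	if (p->count < POOL_MAX)
		p->bufs[p->count++] = buf;
	else
		free(buf);	/* pool is full, give the memory back */
}

/*
 * Take a buffer from the pool, or allocate a fresh one on a miss.
 * With zero != 0 the buffer is cleared before it is handed out; that
 * memset is the per-allocation cost a non-zeroing pool skips.
 */
static void *toy_pool_get(struct toy_pool *p, int zero)
{
	void *buf;

	assert(p != NULL);
	if (p->count > 0)
		buf = p->bufs[--p->count];	/* cache hit: recycled buffer */
	else
		buf = malloc(BUF_SIZE);		/* cache miss: fresh buffer */

	if (buf && zero)
		memset(buf, 0, BUF_SIZE);
	return buf;
}
```

If a caller fills a buffer, puts it back, and then calls
toy_pool_get(&pool, 0), it gets the previous user's bytes back
unchanged. Calling toy_pool_get(&pool, 1) instead hides them, but pays
for a full memset on every allocation, including cache hits.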
I added some patches to zero the buffers manually, similar to how
ION does it, but I found this resulted in basically no
performance improvement over the standard page allocator. Thus
I've dropped that patch in this series for now.

Unfortunately this means we still have a performance delta from
the ION system heap as measured by my microbenchmark, and this
delta comes from ION system_heap's use of deferred freeing of
pages, so less work is done in the measured interval of the
microbenchmark. I'll be looking at adding similar code
eventually, but I don't want to hold the rest of the patches up
on this, as it is still a good improvement over the current
code.

I've updated the chart I shared earlier with current numbers
(including with the unsubmitted net pagepool implementation, and
with a different unsubmitted pagepool implementation borrowed
from ION) here:
https://docs.google.com/spreadsheets/d/1-1C8ZQpmkl_0DISkI6z4xelE08MlNAN7oEu34AnO4Ao/edit?usp=sharing

I did add to this series a reworked version of my uncached
system heap implementation I was submitting a few weeks back.
Since it duplicated a lot of the now reworked system heap code,
I realized it would be much simpler to add the functionality to
the system_heap implementation itself.

While not improving the core allocation performance, the
uncached heap allocations do result in *much* improved
performance on HiKey960, as it avoids a lot of flushing and
invalidating of buffers that the CPU doesn't touch often.

Feedback on these would be great!

thanks
-john

New in v3:
* Dropped page-pool patches as after correcting the code to
  zero buffers, they provided no net performance gain.
* Added system-uncached implementation on top of reworked
  system-heap.
* Use the new sgtable mapping functions in the system and cma
  code, as suggested by Daniel Mentz
* Cleanup: Use page_size() rather than open-coding it

Cc: Sumit Semwal
Cc: Liam Mark
Cc: Laura Abbott
Cc: Brian Starkey
Cc: Hridya Valsaraju
Cc: Suren Baghdasaryan
Cc: Sandeep Patil
Cc: Daniel Mentz
Cc: Chris Goldsworthy
Cc: Ørjan Eide
Cc: Robin Murphy
Cc: Ezequiel Garcia
Cc: Simon Ser
Cc: James Jones
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org

John Stultz (7):
  dma-buf: system_heap: Rework system heap to use sgtables instead of
    pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap
    implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available
  dma-buf: dma-heap: Keep track of the heap device struct
  dma-buf: system_heap: Add a system-uncached heap re-using the system
    heap

 drivers/dma-buf/dma-heap.c           |  33 +-
 drivers/dma-buf/heaps/Makefile       |   1 -
 drivers/dma-buf/heaps/cma_heap.c     | 327 +++++++++++++++---
 drivers/dma-buf/heaps/heap-helpers.c | 271 ---------------
 drivers/dma-buf/heaps/heap-helpers.h |  53 ---
 drivers/dma-buf/heaps/system_heap.c  | 480 ++++++++++++++++++++++++---
 include/linux/dma-heap.h             |   9 +
 7 files changed, 741 insertions(+), 433 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h
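
For readers unfamiliar with the "Allocate higher order pages if
available" idea, the following userspace sketch shows one way a
page-aligned length can be split greedily into larger chunks. It is a
model under assumptions, not the code in the patch: the order list
{8, 4, 0}, the 4 KiB page size and all toy_* names are chosen here for
illustration, and real allocation failures (which would force a
fallback to a smaller order) are not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size */

/* Candidate orders, largest first (assumed values for illustration). */
static const unsigned int toy_orders[] = { 8, 4, 0 };
#define TOY_NUM_ORDERS (sizeof(toy_orders) / sizeof(toy_orders[0]))

/* Bytes covered by one chunk of the given order: page size << order. */
static unsigned long toy_chunk_size(unsigned int order)
{
	return TOY_PAGE_SIZE << order;
}

/*
 * Pick the largest candidate order that is no bigger than max_order and
 * whose chunk still fits in the remaining length. Returns -1 if none fits.
 */
static int toy_pick_order(unsigned long remaining, unsigned int max_order)
{
	size_t i;

	for (i = 0; i < TOY_NUM_ORDERS; i++) {
		if (toy_orders[i] > max_order)
			continue;
		if (toy_chunk_size(toy_orders[i]) > remaining)
			continue;
		return (int)toy_orders[i];
	}
	return -1;
}

/*
 * Greedily split a page-aligned length into chunks and return how many
 * chunks are needed. Fewer chunks means fewer entries to track and map.
 */
static unsigned long toy_count_chunks(unsigned long len)
{
	unsigned long remaining = len;
	unsigned long chunks = 0;
	unsigned int max_order = toy_orders[0];

	assert(len % TOY_PAGE_SIZE == 0);
	while (remaining > 0) {
		int order = toy_pick_order(remaining, max_order);

		if (order < 0)
			break;	/* unreachable for page-aligned lengths */
		remaining -= toy_chunk_size((unsigned int)order);
		max_order = (unsigned int)order;	/* never go back up */
		chunks++;
	}
	return chunks;
}
```

Under these assumptions a 1 MiB buffer is covered by a single order-8
chunk instead of 256 individual pages, while a 5-page buffer still
needs five order-0 chunks because no larger order fits.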