From: Maarten Lankhorst
To: dri-devel@lists.freedesktop.org, cgroups@vger.kernel.org, intel-xe@lists.freedesktop.org
Cc: Daniel Vetter, Thomas Zimmermann, intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org, Maxime Ripard, Zefan Li, Johannes Weiner, Tejun Heo, David Airlie
Date: Wed, 3 May 2023 10:34:56 +0200
Message-Id: <20230503083500.645848-1-maarten.lankhorst@linux.intel.com>
X-Patchwork-Id: 13229858
Subject: [Intel-gfx] [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

RFC as I'm looking for comments.

For long-running compute, it can be beneficial to partition the GPU memory
between cgroups, so each cgroup can use its maximum amount of memory without
interfering with other scheduled jobs. Done properly, this can alleviate the
need for eviction, which might result in a job being terminated if the GPU
doesn't support mid-thread preemption or recoverable page faults.

This is done by adding a bunch of knobs to cgroup:
drm.capacity: Shows maximum capacity of each resource region.
drm.max: Display or limit max amount of memory.
drm.current: Current amount of memory in use.

TTM has not been made cgroup-aware yet, so instead of evicting from the
current cgroup to stay within the cgroup limits, it simply returns the error
-ENOSPC to userspace.

I've used Tvrtko's cgroup controller series as a base, but it implemented
scheduling weight, not memory accounting, so I only ended up keeping the
base patch.

Xe is not upstream yet, so the driver-specific patch will only apply on
https://gitlab.freedesktop.org/drm/xe/kernel

Maarten Lankhorst (3):
  drm/cgroup: Add memory accounting to DRM cgroup
  drm/ttm: Handle -EAGAIN in ttm_resource_alloc as -ENOSPC.
  drm/xe: Add support for the drm cgroup

Tvrtko Ursulin (1):
  cgroup: Add the DRM cgroup controller

 Documentation/admin-guide/cgroup-v2.rst    |  46 ++
 Documentation/gpu/drm-compute.rst          |  54 ++
 drivers/gpu/drm/ttm/ttm_bo.c               |   4 +-
 drivers/gpu/drm/xe/xe_device.c             |   4 +
 drivers/gpu/drm/xe/xe_device_types.h       |   4 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       |  21 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   5 +
 include/linux/cgroup_drm.h                 |  90 ++++
 include/linux/cgroup_subsys.h              |   4 +
 init/Kconfig                               |   7 +
 kernel/cgroup/Makefile                     |   1 +
 kernel/cgroup/drm.c                        | 557 +++++++++++++++++++++
 12 files changed, 794 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/gpu/drm-compute.rst
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c
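
For readers who want the accounting behaviour from the cover letter in concrete form, here is a rough userspace sketch of the idea. It is not code from the series. The struct, its fields and the function names are all made up for illustration, and I am assuming that a failed charge surfaces as -EAGAIN, which the allocation path then reports as -ENOSPC because nothing gets evicted yet.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical per-cgroup view of one memory region. The three fields
 * mirror the drm.capacity, drm.max and drm.current knobs; the layout is
 * an assumption, not the structure used in kernel/cgroup/drm.c.
 */
struct region_account {
	uint64_t capacity;	/* total size of the region (drm.capacity) */
	uint64_t max;		/* limit set for this cgroup (drm.max) */
	uint64_t used;		/* bytes currently charged (drm.current) */
};

/* Charge size bytes to the cgroup, or fail with -EAGAIN if over limit. */
int region_try_charge(struct region_account *acct, uint64_t size)
{
	/* The effective limit can never exceed the region itself. */
	uint64_t limit = acct->max < acct->capacity ? acct->max : acct->capacity;

	/* Check used > limit first so the subtraction cannot wrap. */
	if (acct->used > limit || size > limit - acct->used)
		return -EAGAIN;

	acct->used += size;
	return 0;
}

/* Return size bytes to the cgroup, clamping at zero. */
void region_uncharge(struct region_account *acct, uint64_t size)
{
	acct->used -= size < acct->used ? size : acct->used;
}

/*
 * Stand-in for the allocation path: no eviction is attempted, so a
 * failed charge is reported to the caller as -ENOSPC.
 */
int alloc_resource(struct region_account *acct, uint64_t size)
{
	int ret = region_try_charge(acct, size);

	return ret == -EAGAIN ? -ENOSPC : ret;
}
```

With capacity 1024 and max 512, a 256-byte allocation succeeds, and a further 512-byte request fails with -ENOSPC until the first allocation is uncharged.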