[0/7] kernel/cgroups: Add "dev" memory accounting cgroup.

Message ID 20241023075302.27194-1-maarten.lankhorst@linux.intel.com

Message

Maarten Lankhorst Oct. 23, 2024, 7:52 a.m. UTC
New submission!
I've added documentation for each call and integrated the renaming from
drm cgroup to dev cgroup, based on Maxime Ripard's work.

Maxime has been testing this with dma-buf heaps and v4l2 too, and it seems to work.
For this initial submission, I've decided to add only the smallest possible enablement,
to reduce the chance of breaking things.

The API has changed slightly, from "$name region.$regionname=$limit" in files called
dev.min/low/max to "$subsystem/$name $regionname=$limit" in files called dev.region.min/low/max.
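
As a purely illustrative sketch of the difference (the subsystem, device and
region names and the byte value below are made up, not taken from the
patches):

  Old, one entry per device in dev.max:
    card0 region.vram0=268435456

  New, one entry per device in dev.region.max:
    drm/card0 vram0=268435456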

The new layout hopefully allows us to extend the API later on with the possibility of
setting scheduler weights on the device, as in

https://blogs.igalia.com/tursulin/drm-scheduling-cgroup-controller/

Maarten Lankhorst (5):
  kernel/cgroup: Add "dev" memory accounting cgroup
  drm/ttm: Handle cgroup based eviction in TTM
  drm/xe: Implement cgroup for vram
  drm/amdgpu: Add cgroups implementation
  [HACK] drm/xe: Hack to test with mapped pages instead of vram.

Maxime Ripard (2):
  drm/drv: Add drmm cgroup registration for dev cgroups.
  [DISCUSSION] drm/gem: Add cgroup memory accounting

 Documentation/admin-guide/cgroup-v2.rst       |  51 +
 Documentation/core-api/cgroup.rst             |   9 +
 Documentation/core-api/index.rst              |   1 +
 Documentation/gpu/drm-compute.rst             |  54 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +
 drivers/gpu/drm/drm_drv.c                     |  32 +-
 drivers/gpu/drm/drm_gem.c                     |   4 +
 drivers/gpu/drm/drm_gem_dma_helper.c          |   4 +
 drivers/gpu/drm/ttm/tests/ttm_bo_test.c       |  18 +-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |   4 +-
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  57 +-
 drivers/gpu/drm/ttm/ttm_resource.c            |  24 +-
 drivers/gpu/drm/xe/xe_device.c                |   4 +
 drivers/gpu/drm/xe/xe_device_types.h          |   4 +
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |  14 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |  10 +
 include/drm/drm_device.h                      |   4 +
 include/drm/drm_drv.h                         |   4 +
 include/drm/drm_gem.h                         |   2 +
 include/drm/ttm/ttm_resource.h                |  16 +-
 include/linux/cgroup_dev.h                    |  91 ++
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/page_counter.h                  |   2 +-
 init/Kconfig                                  |   7 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/dev.c                           | 893 ++++++++++++++++++
 mm/page_counter.c                             |   4 +-
 30 files changed, 1307 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/core-api/cgroup.rst
 create mode 100644 Documentation/gpu/drm-compute.rst
 create mode 100644 include/linux/cgroup_dev.h
 create mode 100644 kernel/cgroup/dev.c

Comments

Tejun Heo Oct. 23, 2024, 7:40 p.m. UTC | #1
Hello,

On Wed, Oct 23, 2024 at 09:52:53AM +0200, Maarten Lankhorst wrote:
> New submission!
> I've added documentation for each call, and integrated the renaming from
> drm cgroup to dev cgroup, based on Maxime Ripard's work.
> 
> Maxime has been testing this with dma-buf heaps and v4l2 too, and it seems to work.
> In the initial submission, I've decided to only add the smallest enablement possible,
> to have less chance of breaking things.
> 
> The API has been changed slightly, from "$name region.$regionname=$limit" in a file called
> dev.min/low/max to "$subsystem/$name $regionname=$limit" in a file called dev.region.min/low/max.
> 
> This hopefully allows us to perhaps extend the API later on with the possibility to
> set scheduler weights on the device, like in
> 
> https://blogs.igalia.com/tursulin/drm-scheduling-cgroup-controller/
> 
> Maarten Lankhorst (5):
>   kernel/cgroup: Add "dev" memory accounting cgroup

Yeah, let's not use the "dev" name for this. As Waiman pointed out, it conflicts
with the devices controller from cgroup1. While cgroup1 is mostly
deprecated, the same features are provided through BPF in systemd using the
same terminology, so this is going to be really confusing.

What happened with Tvrtko's weighted implementation? I've seen many proposed
patchsets in this area, but as far as I could see none could establish
consensus among the GPU crowd, and that's one of the reasons why nothing ever
landed. Is the aim of this patchset to establish such consensus?

If reaching consensus doesn't seem feasible in a predictable timeframe, my
suggestion is just extending the misc controller. If the only way forward
here is fragmented vendor(s)-specific implementations, let's throw them into
the misc controller.
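
For reference, the misc controller interface is a flat list of "$RESOURCE
$VALUE" lines in misc.max/misc.current, so device memory would have to be
folded into per-region resource keys. A rough, hypothetical sketch of what
misc.max could then contain (the region key name is made up):

  sev 100
  drm-card0-vram0 268435456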

Thanks.
Maxime Ripard Oct. 24, 2024, 7:20 a.m. UTC | #2
Hi Tejun,

Thanks a lot for your review.

On Wed, Oct 23, 2024 at 09:40:28AM -1000, Tejun Heo wrote:
> On Wed, Oct 23, 2024 at 09:52:53AM +0200, Maarten Lankhorst wrote:
> > New submission!
> > I've added documentation for each call, and integrated the renaming from
> > drm cgroup to dev cgroup, based on Maxime Ripard's work.
> > 
> > Maxime has been testing this with dma-buf heaps and v4l2 too, and it seems to work.
> > In the initial submission, I've decided to only add the smallest enablement possible,
> > to have less chance of breaking things.
> > 
> > The API has been changed slightly, from "$name region.$regionname=$limit" in a file called
> > dev.min/low/max to "$subsystem/$name $regionname=$limit" in a file called dev.region.min/low/max.
> > 
> > This hopefully allows us to perhaps extend the API later on with the possibility to
> > set scheduler weights on the device, like in
> > 
> > https://blogs.igalia.com/tursulin/drm-scheduling-cgroup-controller/
> > 
> > Maarten Lankhorst (5):
> >   kernel/cgroup: Add "dev" memory accounting cgroup
> 
> Yeah, let's not use "dev" name for this. As Waiman pointed out, it conflicts
> with the devices controller from cgroup1. While cgroup1 is mostly
> deprecated, the same features are provided through BPF in systemd using the
> same terminologies, so this is going to be really confusing.

Yeah, I agree. We switched to dev because we want to support more than
just DRM: all DMA-able memory. We have patches adding support for v4l2
and dma-buf heaps, so using the name DRM didn't feel great either.

Do you have a better name in mind? "device memory"? "dma memory"?

> What happened with Tvrtko's weighted implementation? I've seen many proposed
> patchsets in this area but as far as I could see none could establish
> consensus among GPU crowd and that's one of the reasons why nothing ever
> landed. Is the aim of this patchset establishing such consensus?

Yeah, we have a consensus by now I think. Valve, Intel, Google, and Red
Hat have been involved in that series and we all agree on the implementation.

Tvrtko aims at a different feature set though: this one is about memory
allocation limits, Tvrtko's about scheduling.

Scheduling doesn't make much sense for things outside of DRM (and even
within DRM, only for a fraction of devices), and it's pretty much orthogonal. So
I guess you can expect another series from Tvrtko, but I don't think
they should be considered equivalent or dependent on each other.

> If reaching consensus doesn't seem feasible in a predictable timeframe, my
> suggestion is just extending the misc controller. If the only way forward
> here is fragmented vendor(s)-specific implementations, let's throw them into
> the misc controller.

I don't think we have a fragmented implementation here at all. The last
patch in particular implements it for all DRM devices using the GEM
interface, which covers around 100 drivers from various vendors.

It's marked as a discussion because we don't quite know how to plumb it
in for all drivers in the current DRM framework, but it's very much what
we want to achieve.

Maxime
Tejun Heo Oct. 24, 2024, 5:06 p.m. UTC | #3
Hello,

On Thu, Oct 24, 2024 at 09:20:43AM +0200, Maxime Ripard wrote:
...
> > Yeah, let's not use "dev" name for this. As Waiman pointed out, it conflicts
> > with the devices controller from cgroup1. While cgroup1 is mostly
> > deprecated, the same features are provided through BPF in systemd using the
> > same terminologies, so this is going to be really confusing.
> 
> Yeah, I agree. We switched to dev because we want to support more than
> just DRM, but all DMA-able memory. We have patches to support for v4l2
> and dma-buf heaps, so using the name DRM didn't feel great either.
> 
> Do you have a better name in mind? "device memory"? "dma memory"?

Maybe just dma (I think the term isn't used heavily anymore, so the word is
kinda open)? But, hopefully, others have better ideas.

> > What happened with Tvrtko's weighted implementation? I've seen many proposed
> > patchsets in this area but as far as I could see none could establish
> > consensus among GPU crowd and that's one of the reasons why nothing ever
> > landed. Is the aim of this patchset establishing such consensus?
> 
> Yeah, we have a consensus by now I think. Valve, Intel, Google, and Red
> Hat have been involved in that series and we all agree on the implementation.

That's great to hear.

> Tvrtko aims at a different feature set though: this one is about memory
> allocation limits, Tvrtko's about scheduling.
> 
> Scheduling doesn't make much sense for things outside of DRM (and even
> for a fraction of all DRM devices), and it's pretty much orthogonal. So
> i guess you can expect another series from Tvrtko, but I don't think
> they should be considered equivalent or dependent on each other.

Yeah, I get that this is about memory and that one is about processing capacity,
so is the plan to go for separate controllers for each? Or would it be
better to present both under the same controller interface? Even if they're
going to be separate controllers, we at least want to be aligned on how
devices and their configurations are presented in the two controllers.
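
Purely as a hypothetical illustration of that alignment (these file names
and keys come from neither series), both controllers could identify devices
the same way, for example with the "$subsystem/$name" form used in this
posting:

  <memory controller>.region.max:  drm/card0 vram0=268435456
  <sched controller>.weight:       drm/card0 100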

Thanks.