[RFC,v2,0/5] new cgroup controller for gpu/drm subsystem

Message-ID: <20190509210410.5471-1-Kenny.Ho@amd.com>
From: Kenny Ho <Kenny.Ho@amd.com>
To: <y2kenny@gmail.com>, <Kenny.Ho@amd.com>, <cgroups@vger.kernel.org>,
 <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org>,
 <tj@kernel.org>, <sunnanyong@huawei.com>, <alexander.deucher@amd.com>,
 <brian.welty@intel.com>
Subject: [RFC PATCH v2 0/5] new cgroup controller for gpu/drm subsystem
Date: Thu, 9 May 2019 17:04:05 -0400
In-Reply-To: <20181120185814.13362-1-Kenny.Ho@amd.com>
References: <20181120185814.13362-1-Kenny.Ho@amd.com>

Series: new cgroup controller for gpu/drm subsystem
  [RFC,v2,0/5] new cgroup controller for gpu/drm subsystem
  [RFC,v2,1/5] cgroup: Introduce cgroup for drm subsystem
  [RFC,v2,2/5] cgroup: Add mechanism to register DRM devices
  [RFC,v2,3/5] drm/amdgpu: Register AMD devices for DRM cgroup
  [RFC,v2,4/5] drm, cgroup: Add total GEM buffer allocation limit
  [RFC,v2,5/5] drm, cgroup: Add peak GEM buffer allocation limit

Message

Ho, Kenny May 9, 2019, 9:04 p.m. UTC

This is a follow-up to the RFC I posted last November to introduce a cgroup controller for the GPU/DRM subsystem [a].  The goal is to provide resource management for GPU resources through mechanisms such as containers.  The cover letter from v1 is copied below for reference.

Usage examples:
# set limit for card1 to 1 GiB (1073741824 bytes)
sed -i '2s/.*/1073741824/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max

# set limit for card0 to 512 MiB (536870912 bytes)
sed -i '1s/.*/536870912/' /sys/fs/cgroup/<cgroup>/drm.buffer.total.max
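
The sed commands above suggest that drm.buffer.total.max holds one value per
line, with line N belonging to card N-1.  Under that assumption (the layout is
inferred from the examples, not confirmed here), the same update can be
sketched in Python as a pure function on the file contents.  The function name
set_card_limit is made up for illustration and is not part of the series.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def set_card_limit(contents: str, card: int, limit_bytes: int) -> str:
    """Return `contents` with the limit for `card` replaced.

    Assumes one limit per line, where line index `card` (0-based)
    belongs to cardN. This layout is an assumption taken from the
    sed examples, not a documented interface.
    """
    lines = contents.splitlines()
    if not 0 <= card < len(lines):
        raise IndexError(f"no line for card{card}")
    lines[card] = str(limit_bytes)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two registered devices, both starting from a placeholder value.
    text = "0\n0\n"
    text = set_card_limit(text, 1, 1 * GIB)    # card1 -> 1 GiB
    text = set_card_limit(text, 0, 512 * MIB)  # card0 -> 512 MiB
    print(text, end="")
```

Keeping the helper free of file I/O makes the byte arithmetic and the
line-to-card mapping easy to check before anything touches a real cgroup file.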


v2:
* Removed the vendoring concept
* Added a limit on total buffer allocation
* Added a limit on the maximum size of a single buffer allocation

TODO: process migration
TODO: documentation

[a]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html

v1: cover letter

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early one.
We are hoping to engage the community as we develop the idea.


Background
==========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting/limiting the resources which processes in a cgroup
can access[1].  Weights, limits, protections and allocations are the main
resource distribution models.  Existing cgroup controllers include cpu, memory,
io, rdma, and more.  cgroup is one of the foundational technologies that enable
the popular container application deployment and management method.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.


Motivation
==========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there is an increasing need to monitor and
regulate the GPU as a resource, like cpu, memory and io.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
(execution unit [Intel's nomenclature]) and GPU job scheduling [4] can help
sysadmins get a better understanding of applications' usage profiles.  Further
regulating the usage of these resources can also help sysadmins optimize
workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very coarse-grained,
however, as sysadmins are only able to distribute workloads on a per-GPU basis
[8].  An alternative is to use GPU virtualization (with or without SRIOV), but
it generally acts on the entire GPU instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization).

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit the
DRM resources available to the applications.

Challenges
==========
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are
vendor-specific.  To accommodate this, we borrowed the mechanism used by cgroup
to handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.
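
To make the per-device idea more concrete, here is a minimal Python sketch of
how a cgroup could track one usage counter and two limits per GPU.  It is not
the code from this series.  The class name, the field names and the rule that
an allocation must fit within the limits of every ancestor cgroup are
assumptions made for illustration; "total" and "peak" mirror the two limits
named in patches 4 and 5.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional

MIB = 1024 ** 2


@dataclass
class DrmCgroupSketch:
    """Hypothetical per-cgroup state, keyed by device name (e.g. "card0")."""
    parent: Optional["DrmCgroupSketch"] = None
    total_max: Dict[str, int] = field(default_factory=dict)   # cap on summed allocations
    peak_max: Dict[str, int] = field(default_factory=dict)    # cap on one allocation
    total_used: Dict[str, int] = field(default_factory=dict)  # bytes currently charged

    def _self_and_ancestors(self) -> Iterator["DrmCgroupSketch"]:
        node: Optional["DrmCgroupSketch"] = self
        while node is not None:
            yield node
            node = node.parent

    def try_charge(self, device: str, size: int) -> bool:
        """Charge `size` bytes on `device` if no limit up the tree is exceeded."""
        nodes = list(self._self_and_ancestors())
        for node in nodes:
            peak = node.peak_max.get(device)
            if peak is not None and size > peak:
                return False
            total = node.total_max.get(device)
            if total is not None and node.total_used.get(device, 0) + size > total:
                return False
        # All checks passed, so account the allocation at every level.
        for node in nodes:
            node.total_used[device] = node.total_used.get(device, 0) + size
        return True

    def uncharge(self, device: str, size: int) -> None:
        """Release a previously charged allocation at every level."""
        for node in self._self_and_ancestors():
            node.total_used[device] = node.total_used.get(device, 0) - size


if __name__ == "__main__":
    root = DrmCgroupSketch(total_max={"card0": 512 * MIB})
    child = DrmCgroupSketch(parent=root, peak_max={"card0": 64 * MIB})
    print(child.try_charge("card0", 128 * MIB))  # rejected by the child's peak limit
    print(child.try_charge("card0", 64 * MIB))   # fits both limits
    print(root.total_used["card0"] // MIB)       # usage is visible at the parent
```

Because every table is keyed by device, a limit on card0 has no effect on
card1, which is the property the paragraph above is after.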

Approach
========
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (5):
  cgroup: Introduce cgroup for drm subsystem
  cgroup: Add mechanism to register DRM devices
  drm/amdgpu: Register AMD devices for DRM cgroup
  drm, cgroup: Add total GEM buffer allocation limit
  drm, cgroup: Add peak GEM buffer allocation limit

 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   4 +
 drivers/gpu/drm/drm_gem.c                  |   7 +
 drivers/gpu/drm/drm_prime.c                |   9 +
 include/drm/drm_cgroup.h                   |  54 +++
 include/drm/drm_gem.h                      |  11 +
 include/linux/cgroup_drm.h                 |  47 ++
 include/linux/cgroup_subsys.h              |   4 +
 init/Kconfig                               |   5 +
 kernel/cgroup/Makefile                     |   1 +
 kernel/cgroup/drm.c                        | 497 +++++++++++++++++++++
 11 files changed, 643 insertions(+)
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

Comments

Christian König May 10, 2019, 12:31 p.m. UTC | #1

That looks better than I thought it would be.

I think it is a good approach to try to add a global limit first and 
when that's working go ahead with limiting device specific resources.

The only major issue I can see is on patch #4, see there for further 
details.

Christian.


Kenny Ho May 10, 2019, 3:07 p.m. UTC | #2

On Fri, May 10, 2019 at 8:31 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> I think it is a good approach to try to add a global limit first and
> when that's working go ahead with limiting device specific resources.
What are some of the global drm resource limit/allocation that would
be useful to implement? I would be happy to dig into those.

Regards,
Kenny


> The only major issue I can see is on patch #4, see there for further
> details.
>
> Christian.

Christian König May 10, 2019, 5:46 p.m. UTC | #3

On 10.05.19 at 17:07, Kenny Ho wrote:
> On Fri, May 10, 2019 at 8:31 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> I think it is a good approach to try to add a global limit first and
>> when that's working go ahead with limiting device specific resources.
> What are some of the global drm resource limit/allocation that would
> be useful to implement? I would be happy to dig into those.

I was thinking about device specific stuff like VRAM etc...

What I'm also not clear about is how this should interact with memcg. 
E.g. do we also need to account BOs in memcg?

In theory I would say yes.

Christian.

>
> Regards,
> Kenny
>
>
>> The only major issue I can see is on patch #4, see there for further
>> details.
>>
>> Christian.