[RFC,0/9] cgroup support for GPU devices

Message ID 20210126214626.16260-1-brian.welty@intel.com (mailing list archive)

Message

Welty, Brian Jan. 26, 2021, 9:46 p.m. UTC
We'd like to revisit the proposal of a GPU cgroup controller for managing
GPU devices, but with just a basic set of controls.  This series is based on
the prior patch series from Kenny Ho [1].  We take Kenny's base patches,
which implement the basic framework for the controller, but we propose an
alternate set of control files: a subset of the controls proposed in the
earlier discussion on the mailing list [2].

This series proposes a set of device memory controls (gpu.memory.current,
gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
(gpu.sched.runtime).  GPU time sharing controls are left as future work.
These are implemented within the GPU controller, along with integration and
usage of the device memory controls by the i915 device driver.
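
To make the proposed interface concrete, the listing below shows roughly how
these controls might appear under a cgroup directory.  The paths, example
layout, and file formats are illustrative only (for instance, whether entries
are keyed per device is still open); only the file names and their purposes
come from the proposal above.

  /sys/fs/cgroup/<group>/gpu.memory.current   # device memory currently charged
  /sys/fs/cgroup/<group>/gpu.memory.max       # device memory limit (modeled on memcg memory.max)
  /sys/fs/cgroup/<group>/gpu.memory.total     # total device memory available
  /sys/fs/cgroup/<group>/gpu.sched.runtime    # accumulated GPU time usage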

As an accelerator or GPU device is similar in many respects to a CPU with
(or without) attached system memory, the basic principle here is to copy the
semantics of existing controls from other controllers where possible and
where those controls serve the same underlying purpose.  For example,
gpu.memory.max and gpu.memory.current are based on the memory.max and
memory.current controls of the memcg controller.

Following the implementation used by the existing RDMA controller, we
introduce a general-purpose pair of exported charge and uncharge functions,
drm_cgroup_try_charge and its uncharge counterpart.  These functions are to
be used for charging and uncharging all current and future DRM resource
controls.
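
As a rough illustration of how a driver would use this pair around a device
memory allocation, consider the sketch below.  The argument list of
drm_cgroup_try_charge, the name of the uncharge function, the resource-type
constant, and the helper my_hw_alloc_vram are assumptions made for the
example, not the actual interfaces from this series.

  /*
   * Hypothetical usage sketch only; the real signatures in this series
   * may differ.  Assumed: the charge is made against the calling task's
   * cgroup for 'size' bytes of device memory on 'dev'.
   */
  static int my_vram_alloc(struct drm_device *dev, u64 size)
  {
          int ret;

          /* Fails if the charge would push the cgroup over gpu.memory.max. */
          ret = drm_cgroup_try_charge(dev, DRMCG_TYPE_MEM, size);
          if (ret)
                  return ret;

          ret = my_hw_alloc_vram(dev, size);      /* driver-specific allocation */
          if (ret)
                  drm_cgroup_uncharge(dev, DRMCG_TYPE_MEM, size);  /* roll back */

          return ret;
  }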

Patches 1 - 4 are part original work and part refactoring of the prior work
from Kenny Ho's series for the GPU / DRM controller v2 [1].

Patches 5 - 7 introduce new controls to the GPU / DRM controller for device
memory accounting and GPU time tracking.

Patch 8 introduces DRM support for associating GEM objects with a cgroup.

Patch 9 implements i915 changes to use cgroups for device memory charging
and enforcing device memory allocation limit.
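
The sketch below illustrates the combined intent of patches 8 and 9: a GEM
object records the cgroup it is charged against when it is created, so that
device memory bound to the object can be charged on allocation and uncharged
on release, with the allocation refused when the cgroup is over its
gpu.memory.max limit.  The field obj->drmcg, the helper drmcg_get_current,
the resource-type constant, the charge signature, and example_region_alloc
are placeholders for this example, not the actual interfaces added by the
series.

  /* Placeholder sketch; the real hook points and names in patches 8-9 differ. */
  static void example_gem_object_init_cgroup(struct drm_gem_object *obj)
  {
          /* Patch 8: associate the object with the creating task's cgroup. */
          obj->drmcg = drmcg_get_current();
  }

  static int example_gem_pin_device_memory(struct drm_gem_object *obj, u64 size)
  {
          int ret;

          /* Patch 9: charge before allocating from the device memory region;
           * refuse the allocation if over the cgroup's gpu.memory.max.  The
           * charge is released again when the backing memory is freed. */
          ret = drm_cgroup_try_charge(obj->dev, DRMCG_TYPE_MEM, size);
          if (ret)
                  return ret;

          ret = example_region_alloc(obj, size);  /* placeholder for i915 region alloc */
          if (ret)
                  drm_cgroup_uncharge(obj->dev, DRMCG_TYPE_MEM, size);

          return ret;
  }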

[1] https://lists.freedesktop.org/archives/dri-devel/2020-February/257052.html
[2] https://lists.freedesktop.org/archives/dri-devel/2019-November/242599.html

Brian Welty (6):
  drmcg: Add skeleton seq_show and write for drmcg files
  drmcg: Add support for device memory accounting via page counter
  drmcg: Add memory.total file
  drmcg: Add initial support for tracking gpu time usage
  drm/gem: Associate GEM objects with drm cgroup
  drm/i915: Use memory cgroup for enforcing device memory limit

Kenny Ho (3):
  cgroup: Introduce cgroup for drm subsystem
  drm, cgroup: Bind drm and cgroup subsystem
  drm, cgroup: Initialize drmcg properties

 Documentation/admin-guide/cgroup-v2.rst    |  58 ++-
 Documentation/cgroup-v1/drm.rst            |   1 +
 drivers/gpu/drm/drm_drv.c                  |  11 +
 drivers/gpu/drm/drm_gem.c                  |  89 ++++
 drivers/gpu/drm/i915/gem/i915_gem_mman.c   |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c |  23 +-
 drivers/gpu/drm/i915/intel_memory_region.c |  13 +-
 drivers/gpu/drm/i915/intel_memory_region.h |   2 +-
 include/drm/drm_cgroup.h                   |  85 ++++
 include/drm/drm_device.h                   |   7 +
 include/drm/drm_gem.h                      |  17 +
 include/linux/cgroup_drm.h                 | 113 +++++
 include/linux/cgroup_subsys.h              |   4 +
 init/Kconfig                               |   5 +
 kernel/cgroup/Makefile                     |   1 +
 kernel/cgroup/drm.c                        | 533 +++++++++++++++++++++
 16 files changed, 954 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/cgroup-v1/drm.rst
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

Comments

Xingyou Chen Jan. 29, 2021, 3 a.m. UTC | #2
On 2021/1/27 5:46 AM, Brian Welty wrote:

> We'd like to revisit the proposal of a GPU cgroup controller for managing
> GPU devices but with just a basic set of controls.  This series is based on 
> the prior patch series from Kenny Ho [1].  We take Kenny's base patches
> which implement the basic framework for the controller, but we propose an
> alternate set of control files.  Here we've taken a subset of the controls
> proposed in earlier discussion on ML here [2]. 
>
> This series proposes a set of device memory controls (gpu.memory.current,
> gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
> (gpu.sched.runtime).  GPU time sharing controls are left as future work.
> These are implemented within the GPU controller along with integration/usage
> of the device memory controls by the i915 device driver.
>
> As an accelerator or GPU device is similar in many respects to a CPU with
> (or without) attached system memory, the basic principle here is try to
> copy the semantics of existing controls from other controllers when possible
> and where these controls serve the same underlying purpose.
> For example, the memory.max and memory.current controls are based on
> same controls from MEMCG controller.

It seems this is not DRM specific, or even GPU specific. Could we have a universal
control group for any accelerator, GPGPU device, etc. that holds sharable resources
like device memory, compute utility, and bandwidth, with an extra control file to
select between devices (or vendors)?

e.g. a /cgname.device file that stores a PCI BDF, or an enum (intel, amdgpu, nvidia, ...),
defaulting to none, meaning not enabled.
Welty, Brian Feb. 1, 2021, 11:21 p.m. UTC | #3
On 1/28/2021 7:00 PM, Xingyou Chen wrote:
> On 2021/1/27 5:46 AM, Brian Welty wrote:
> 
>> We'd like to revisit the proposal of a GPU cgroup controller for managing
>> GPU devices but with just a basic set of controls.  This series is based on 
>> the prior patch series from Kenny Ho [1].  We take Kenny's base patches
>> which implement the basic framework for the controller, but we propose an
>> alternate set of control files.  Here we've taken a subset of the controls
>> proposed in earlier discussion on ML here [2]. 
>>
>> This series proposes a set of device memory controls (gpu.memory.current,
>> gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
>> (gpu.sched.runtime).  GPU time sharing controls are left as future work.
>> These are implemented within the GPU controller along with integration/usage
>> of the device memory controls by the i915 device driver.
>>
>> As an accelerator or GPU device is similar in many respects to a CPU with
>> (or without) attached system memory, the basic principle here is try to
>> copy the semantics of existing controls from other controllers when possible
>> and where these controls serve the same underlying purpose.
>> For example, the memory.max and memory.current controls are based on
>> same controls from MEMCG controller.
> 
> It seems not to be DRM specific, or even GPU specific. Would we have an universal
> control group for any accelerator, GPGPU device etc, that hold sharable resources
> like device memory, compute utility, bandwidth, with extra control file to select
> between devices(or vendors)?
> 
> e.g. /cgname.device that stores PCI BDF, or enum(intel, amdgpu, nvidia, ...),
> defaults to none, means not enabled.
> 

Hi, thanks for the feedback.  Yes, I tend to agree.  I've asked about this in
earlier work; my suggestion is to name the controller something like 'XPU' to
make clear that these controls could apply to more than just GPUs.

But at least for now, based on Tejun's reply [1], the feedback is to try and keep
this controller as small and focused as possible on just GPUs, at least until
we get some consensus on a set of controls for GPUs... but for this we need more
active input from the community.

-Brian

[1] https://lists.freedesktop.org/archives/dri-devel/2019-November/243167.html
Daniel Vetter Feb. 3, 2021, 10:18 a.m. UTC | #4
On Mon, Feb 01, 2021 at 03:21:35PM -0800, Brian Welty wrote:
> 
> On 1/28/2021 7:00 PM, Xingyou Chen wrote:
> > On 2021/1/27 5:46 AM, Brian Welty wrote:
> > 
> >> We'd like to revisit the proposal of a GPU cgroup controller for managing
> >> GPU devices but with just a basic set of controls.  This series is based on 
> >> the prior patch series from Kenny Ho [1].  We take Kenny's base patches
> >> which implement the basic framework for the controller, but we propose an
> >> alternate set of control files.  Here we've taken a subset of the controls
> >> proposed in earlier discussion on ML here [2]. 
> >>
> >> This series proposes a set of device memory controls (gpu.memory.current,
> >> gpu.memory.max, and gpu.memory.total) and accounting of GPU time usage
> >> (gpu.sched.runtime).  GPU time sharing controls are left as future work.
> >> These are implemented within the GPU controller along with integration/usage
> >> of the device memory controls by the i915 device driver.
> >>
> >> As an accelerator or GPU device is similar in many respects to a CPU with
> >> (or without) attached system memory, the basic principle here is try to
> >> copy the semantics of existing controls from other controllers when possible
> >> and where these controls serve the same underlying purpose.
> >> For example, the memory.max and memory.current controls are based on
> >> same controls from MEMCG controller.
> > 
> > It seems not to be DRM specific, or even GPU specific. Would we have an universal
> > control group for any accelerator, GPGPU device etc, that hold sharable resources
> > like device memory, compute utility, bandwidth, with extra control file to select
> > between devices(or vendors)?
> > 
> > e.g. /cgname.device that stores PCI BDF, or enum(intel, amdgpu, nvidia, ...),
> > defaults to none, means not enabled.
> > 
> 
> Hi, thanks for the feedback.  Yes, I tend to agree.  I've asked about this in
> earlier work; my suggestion is to name the controller something like 'XPU' to
> be clear that these controls could apply to more than GPU.
> 
> But at least for now, based on Tejun's reply [1], the feedback is to try and keep
> this controller as small and focused as possible on just GPU.  At least until
> we get some consensus on set of controls for GPU.....  but for this we need more
> active input from community......

There's also nothing stopping anyone from exposing any kind of XPU as a
drivers/gpu device, aside from the "full stack must be open" requirement we
have in drm. And frankly, with drm being a very confusing acronym, we could
also rename GPU to be the "general processing unit" subsystem :-)
-Daniel

> 
> -Brian
> 
> [1] https://lists.freedesktop.org/archives/dri-devel/2019-November/243167.html