From patchwork Tue Nov 20 18:58:09 2018
From: Kenny Ho
Date: Tue, 20 Nov 2018 13:58:09 -0500
Message-ID: <20181120185814.13362-1-Kenny.Ho@amd.com>
Subject: [Intel-gfx] [PATCH RFC 0/5] DRM cgroup controller

The purpose of this patch series is to start a discussion about a generic cgroup controller for the DRM subsystem. The design proposed here is at a very early stage, and we hope to engage the community as we develop the idea.

Background
==========
Control groups (cgroup) provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour, such as accounting for or limiting the resources that processes in a cgroup can access [1]. Weights, limits, protections and allocations are the main resource distribution models. Existing cgroup controllers include cpu, memory, io, rdma and more. cgroup is one of the foundational technologies behind the popular container-based method of deploying and managing applications.

The Direct Rendering Manager (DRM) contains code intended to support the needs of complex graphics devices. Graphics drivers in the kernel may use DRM functions to make tasks like memory management, interrupt handling and DMA easier, and to provide a uniform interface to applications. DRM has also grown beyond traditional graphics applications to support compute/GPGPU applications.

Motivations
===========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like data center clusters and IoT, there is an increasing need to monitor and regulate the GPU as a resource, just like CPU, memory and I/O.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the purpose of managing GPU priority using the cgroup hierarchy. While that particular use case may not warrant a standalone DRM cgroup controller, there are other use cases where having one can be useful [3].

Monitoring GPU resources such as VRAM and buffers, compute units (CU, AMD's nomenclature) or execution units (EU, Intel's nomenclature), and GPU job scheduling [4] can help sysadmins better understand applications' usage profiles. Further regulating the usage of these resources can also help sysadmins optimize workload deployment on limited GPU resources.

With the increasing importance of machine learning, data science and other cloud-based applications, GPUs are already in production use in data centers today [5,6,7]. Existing GPU resource management is very coarse-grained, however, as sysadmins can only distribute workloads on a per-GPU basis [8]. An alternative is GPU virtualization (with or without SR-IOV), but it generally acts on the entire GPU rather than on specific resources within a GPU. A DRM cgroup controller would enable alternative, fine-grained, sub-GPU resource management (in addition to whatever GPU virtualization may offer).

In addition to production use, the DRM cgroup can also help test the robustness of graphics applications by providing a means to artificially limit the DRM resources available to them.

Challenges
==========
While there is common infrastructure in DRM that is shared across many vendors (the scheduler [4], for example), some aspects of DRM are vendor-specific. To accommodate this, we borrowed the mechanism cgroup uses to handle different kinds of cgroup controllers.

DRM resources are also often device (GPU) specific rather than system-wide, and a system may contain more than one GPU. For this, we borrowed some ideas from the RDMA cgroup controller.

Approach
========
To experiment with the idea of a DRM cgroup, we would like to start with basic accounting and statistics, then continue to iterate and add regulating mechanisms into the driver.
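
To make the two borrowed ideas under "Challenges" more concrete, here is a minimal user-space sketch. It is not code from this series, and every identifier in it (DRMCG_VENDOR, struct drmcg, drmcg_charge, drmcg_uncharge, DRMCG_MAX_DEVICES, and the vendor names) is hypothetical. It assumes the mechanism borrowed from cgroup is the SUBSYS() X-macro list in include/linux/cgroup_subsys.h, which is written once and expanded several times to generate IDs and tables. It models per-device accounting loosely after the RDMA controller: each cgroup keeps one counter per device, and a charge is propagated to every ancestor so that a parent accounts for its children.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical vendor list, written once and expanded several times,
 * in the style of the SUBSYS() entries in include/linux/cgroup_subsys.h.
 * The vendor names are placeholders for illustration only.
 */
#define DRMCG_VENDOR_LIST \
	DRMCG_VENDOR(amd)     \
	DRMCG_VENDOR(intel)

/* Expansion 1: one enum ID per vendor, plus a count. */
#define DRMCG_VENDOR(name) DRMCG_VENDOR_##name,
enum drmcg_vendor_id {
	DRMCG_VENDOR_LIST
	DRMCG_VENDOR_COUNT
};
#undef DRMCG_VENDOR

/* Expansion 2: a name table indexed by the same IDs. */
#define DRMCG_VENDOR(name) [DRMCG_VENDOR_##name] = #name,
static const char *const drmcg_vendor_names[DRMCG_VENDOR_COUNT] = {
	DRMCG_VENDOR_LIST
};
#undef DRMCG_VENDOR

/* Hypothetical fixed bound on GPUs; a real controller would track devices dynamically. */
#define DRMCG_MAX_DEVICES 4

/* Hypothetical cgroup node: one usage counter per DRM device, plus a parent link. */
struct drmcg {
	struct drmcg *parent;
	unsigned long usage[DRMCG_MAX_DEVICES];
};

/*
 * Charge 'amount' against device 'dev' in cg and all of its ancestors.
 * Returns 0 on success, -1 for an out-of-range device index.
 */
static int drmcg_charge(struct drmcg *cg, unsigned int dev, unsigned long amount)
{
	if (dev >= DRMCG_MAX_DEVICES)
		return -1;
	for (; cg != NULL; cg = cg->parent)
		cg->usage[dev] += amount;
	return 0;
}

/* Undo a previous charge; the asserts catch misuse in this sketch. */
static void drmcg_uncharge(struct drmcg *cg, unsigned int dev, unsigned long amount)
{
	assert(dev < DRMCG_MAX_DEVICES);
	for (; cg != NULL; cg = cg->parent) {
		assert(cg->usage[dev] >= amount);
		cg->usage[dev] -= amount;
	}
}
```

For example, charging 10 units on device 1 to a child cgroup also raises its parent's device-1 counter to 10, while device 0 stays untouched. Keeping counters per device rather than per system is what lets a multi-GPU machine account for each GPU separately; a real controller would also need locking, dynamic device registration and limit enforcement, all of which the sketch omits.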

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757

Kenny Ho (5):
  cgroup: Introduce cgroup for drm subsystem
  cgroup: Add mechanism to register vendor specific DRM devices
  drm/amdgpu: Add DRM cgroup support for AMD devices
  drm/amdgpu: Add accounting of command submission via DRM cgroup
  drm/amdgpu: Add accounting of buffer object creation request via DRM cgroup

 drivers/gpu/drm/amd/amdgpu/Makefile         |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.c | 147 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.h |  27 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  15 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    |   5 +-
 include/drm/drm_cgroup.h                    |  39 ++++++
 include/drm/drmcgrp_vendors.h               |   8 ++
 include/linux/cgroup_drm.h                  |  58 ++++++++
 include/linux/cgroup_subsys.h               |   4 +
 include/uapi/drm/amdgpu_drm.h               |  24 +++-
 init/Kconfig                                |   5 +
 kernel/cgroup/Makefile                      |   1 +
 kernel/cgroup/drm.c                         | 130 +++++++++++++++++
 16 files changed, 484 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.h
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/drm/drmcgrp_vendors.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c