From patchwork Fri Dec 2 19:07:17 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Parav Pandit
X-Patchwork-Id: 9458991
From: Parav Pandit
To: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org,
 lizefan@huawei.com, hannes@cmpxchg.org, dledford@redhat.com, hch@lst.de,
 liranl@mellanox.com, sean.hefty@intel.com, jgunthorpe@obsidianresearch.com,
 haggaie@mellanox.com
Cc: corbet@lwn.net, james.l.morris@oracle.com, serge@hallyn.com,
 ogerlitz@mellanox.com, matanb@mellanox.com, akpm@linux-foundation.org,
 linux-security-module@vger.kernel.org, pandit.parav@gmail.com
Subject: [PATCHv13 3/3] rdmacg: Added documentation for rdmacg
Date: Fri, 2 Dec 2016 19:07:17 +0000
Message-Id: <1480705637-2986-4-git-send-email-pandit.parav@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1480705637-2986-1-git-send-email-pandit.parav@gmail.com>
References: <1480705637-2986-1-git-send-email-pandit.parav@gmail.com>

Added documentation for cgroup v1 and v2 describing the high-level design
of the rdma controller, with usage examples.

Signed-off-by: Parav Pandit
---
 Documentation/cgroup-v1/rdma.txt | 109 +++++++++++++++++++++++++++++++++++++++
 Documentation/cgroup-v2.txt      |  38 ++++++++++++++
 2 files changed, 147 insertions(+)
 create mode 100644 Documentation/cgroup-v1/rdma.txt

diff --git a/Documentation/cgroup-v1/rdma.txt b/Documentation/cgroup-v1/rdma.txt
new file mode 100644
index 0000000..af61817
--- /dev/null
+++ b/Documentation/cgroup-v1/rdma.txt
@@ -0,0 +1,109 @@
+				RDMA Controller
+				----------------
+
+Contents
+--------
+
+1. Overview
+  1-1. What is RDMA controller?
+  1-2. Why RDMA controller needed?
+  1-3. How is RDMA controller implemented?
+2. Usage Examples
+
+1. Overview
+
+1-1. What is RDMA controller?
+-----------------------------
+
+The RDMA controller lets a user limit the RDMA/IB specific resources that a
+given set of processes can use. Processes are grouped using the controller.
+
+The RDMA controller defines two resources which can be limited for the
+processes of a cgroup.
+
+1-2. Why RDMA controller needed?
+--------------------------------
+
+Currently user space applications can easily take away all the rdma verb
+specific resources such as AH, CQ, QP, MR etc. As a result, other applications
+in other cgroups or kernel space ULPs may not even get a chance to allocate any
+rdma resources. This can lead to service unavailability.
+
+An RDMA controller is therefore needed, through which the resource
+consumption of processes can be limited. Through this controller, different
+rdma resources can be accounted.
+
+1-3. How is RDMA controller implemented?
+----------------------------------------
+
+The RDMA cgroup allows limits to be configured on resources. It maintains
+resource accounting per cgroup, per device, using a resource pool structure.
+Each resource pool is limited by the rdma cgroup to at most 64 resources,
+a bound which can be extended later if required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use cases,
+but nothing prevents having more. At present hundreds of RDMA devices per
+single cgroup may not be handled optimally; however, there is no
+known use case or requirement for such a configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed by any
+of the child processes which share the address space, rdma resources are
+always owned by the creator cgroup css. This allows process migration from one
+cgroup to another without the complexity of transferring resource ownership,
+because such ownership is not really present due to the shared nature of
+rdma resources. Linking resources around the css also ensures that cgroups can
+be deleted after processes have migrated. This allows process migration with
+active resources as well, even though that is not a primary use case.
+
+Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
+the caller. The same rdma cgroup should be passed when uncharging the resource.
+This also allows a process migrated with active RDMA resources to charge
+new resources to its new owner cgroup. It also allows a resource of a process
+that has migrated to a new cgroup to be uncharged from the previously charged
+cgroup, even though that is not a primary use case.
+
+A resource pool object is created in the following situations.
+(a) The user sets a limit and no resource pool exists yet for the device
+of interest in the cgroup.
+(b) No resource limits were configured, but the IB/RDMA stack tries to
+charge a resource. The pool is created so that resources charged while
+applications run without limits are uncharged correctly if limits are
+enforced later; otherwise the usage count would drop below zero.
+
+A resource pool is destroyed when all its resource limits are set to max
+and its last resource is deallocated.
+
+The user should set all limits to max in order to remove (unconfigure)
+the resource pool for a particular device.
+
+The IB stack honors limits enforced by the rdma controller. When an application
+queries the maximum resource limits of an IB device, it returns the minimum of
+what is configured by the user for the given cgroup and what is supported by
+the IB device.
+
+The following resources can be accounted by the rdma controller.
+  hca_handle	Maximum number of HCA Handles
+  hca_object	Maximum number of HCA Objects
+
+2. Usage Examples
+-----------------
+
+(a) Configure resource limit:
+echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
+echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max
+
+(b) Query resource limit:
+cat /sys/fs/cgroup/rdma/2/rdma.max
+#Output:
+mlx4_0 hca_handle=2 hca_object=2000
+ocrdma1 hca_handle=3 hca_object=max
+
+(c) Query current usage:
+cat /sys/fs/cgroup/rdma/2/rdma.current
+#Output:
+mlx4_0 hca_handle=1 hca_object=20
+ocrdma1 hca_handle=1 hca_object=23
+
+(d) Delete resource limit:
+echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 4cc07ce..94350d7 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -47,6 +47,8 @@ CONTENTS
   5-3. IO
     5-3-1. IO Interface Files
     5-3-2. Writeback
+  5-4. RDMA
+    5-4-1. RDMA Interface Files
 6. Namespace
   6-1. Basics
   6-2. The Root and Views
@@ -1119,6 +1121,42 @@ writeback as follows.
 	vm.dirty[_background]_ratio.
 
 
+5-4. RDMA
+
+The "rdma" controller regulates the distribution and accounting of
+RDMA resources.
+
+5-4-1. RDMA Interface Files
+
+  rdma.max
+	A read-write nested-keyed file that exists for all cgroups
+	except root and describes the currently configured resource
+	limits for an RDMA/IB device.
+
+	Lines are keyed by device name and are not ordered.
+	Each line contains space separated resource names, each with
+	its configured limit that can be distributed.
+
+	The following nested keys are defined.
+
+	  hca_handle	Maximum number of HCA Handles
+	  hca_object	Maximum number of HCA Objects
+
+	An example for mlx4 and ocrdma devices follows.
+
+	  mlx4_0 hca_handle=2 hca_object=2000
+	  ocrdma1 hca_handle=3 hca_object=max
+
+  rdma.current
+	A read-only file that describes current resource usage.
+	It exists for all cgroups except root.
+
+	An example for mlx4 and ocrdma devices follows.
+
+	  mlx4_0 hca_handle=1 hca_object=20
+	  ocrdma1 hca_handle=1 hca_object=23
+
+
 6. Namespace
 
 6-1. Basics
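
As an illustration of the nested-keyed line format and of the "minimum of
configured and device-supported" rule described in the documentation above,
here is a minimal shell sketch. It is not part of the patch. The helper names
rdma_key and effective_limit and the device maximum of 64 are hypothetical,
chosen only for the example, and the sample line is the one from the
documentation rather than output read from a live system.

```shell
#!/bin/sh
# Illustrative only: helper names and the device maximum are assumptions,
# not part of the rdma controller interface.

# Print the value of one nested key from an rdma.max-style entry,
# e.g. "mlx4_0 hca_handle=2 hca_object=2000".
rdma_key() {
    entry=$1
    key=$2
    for field in $entry; do
        case $field in
            "$key"=*) printf '%s\n' "${field#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Effective limit as described in the text: the minimum of the cgroup
# limit ("max" means no cgroup limit) and what the device supports.
effective_limit() {
    configured=$1
    device_max=$2
    if [ "$configured" = "max" ] || [ "$configured" -gt "$device_max" ]; then
        printf '%s\n' "$device_max"
    else
        printf '%s\n' "$configured"
    fi
}

sample="ocrdma1 hca_handle=3 hca_object=max"
effective_limit "$(rdma_key "$sample" hca_handle)" 64   # prints 3
effective_limit "$(rdma_key "$sample" hca_object)" 64   # prints 64
```

On a system with the controller mounted, the same helpers could be fed lines
read from a cgroup's rdma.max file instead of the sample string. The device
maximum would have to come from the device itself, which this sketch does not
attempt to query.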