From patchwork Sat Jan 30 11:41:27 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Parav Pandit
X-Patchwork-Id: 8170721
From: Parav Pandit
To: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, tj@kernel.org,
	lizefan@huawei.com, hannes@cmpxchg.org, dledford@redhat.com,
	liranl@mellanox.com, sean.hefty@intel.com,
	jgunthorpe@obsidianresearch.com, haggaie@mellanox.com
Cc: corbet@lwn.net, james.l.morris@oracle.com, serge@hallyn.com,
	ogerlitz@mellanox.com, matanb@mellanox.com, raindel@mellanox.com,
	akpm@linux-foundation.org, linux-security-module@vger.kernel.org,
	pandit.parav@gmail.com
Subject: [PATCHv2 3/3] rdmacg: Added documentation for rdma controller
Date: Sat, 30 Jan 2016 17:11:27 +0530
Message-Id: <1454154087-27375-4-git-send-email-pandit.parav@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1454154087-27375-1-git-send-email-pandit.parav@gmail.com>
References: <1454154087-27375-1-git-send-email-pandit.parav@gmail.com>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org
X-Spam-Status:
	No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,
	DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD,
	T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Added documentation for rdma controller to use in legacy mode and
in the new unified hierarchy.

Signed-off-by: Parav Pandit
---
 Documentation/cgroup-v1/rdma.txt | 122 +++++++++++++++++++++++++++++++++++++++
 Documentation/cgroup-v2.txt      |  43 ++++++++++++++
 2 files changed, 165 insertions(+)
 create mode 100644 Documentation/cgroup-v1/rdma.txt

diff --git a/Documentation/cgroup-v1/rdma.txt b/Documentation/cgroup-v1/rdma.txt
new file mode 100644
index 0000000..240e34a
--- /dev/null
+++ b/Documentation/cgroup-v1/rdma.txt
@@ -0,0 +1,122 @@
+				RDMA Controller
+				----------------
+
+Contents
+--------
+
+1. Overview
+  1-1. What is RDMA controller?
+  1-2. Why is RDMA controller needed?
+  1-3. How is RDMA controller implemented?
+2. Usage Examples
+
+1. Overview
+
+1-1. What is RDMA controller?
+-------------------------------------
+
+The RDMA controller allows the user to limit RDMA/IB specific resources
+that a given set of processes can use. These processes are grouped using
+the RDMA controller.
+
+RDMA controller currently allows two different types of resource
+pools.
+(a) RDMA IB specification level verb resources defined by IB stack
+(b) HCA vendor device specific resources
+
+RDMA controller allows a maximum of 64 resources in
+a resource pool, which is an internal construct of rdma cgroup explained
+in a later part of this document.
+
+1-2. Why is RDMA controller needed?
+----------------------------------------
+
+Currently user space applications can easily take away all the rdma device
+specific resources such as AH, CQ, QP, MR etc. As a result, other applications
+in other cgroups or kernel space ULPs may not even get a chance to allocate any
+rdma resources. This leads to service unavailability.
+
+Therefore an RDMA controller is needed through which the resource consumption
+of processes can be limited. Through this controller, the various rdma
+resources described by the IB uverbs layer and any HCA vendor driver can be
+accounted.
+
+1-3. How is RDMA controller implemented?
+------------------------------------------------
+
+RDMA cgroup allows limit configuration of resources. These resources are not
+defined by the rdma controller. Instead they are defined by the IB stack
+and, optionally, by HCA device drivers.
+This provides great flexibility to allow the IB stack to define new resources
+without any changes to rdma cgroup.
+Rdma cgroup maintains resource accounting per cgroup, per device, per resource
+type using a resource pool structure. Each such resource pool is limited to
+64 resources by rdma cgroup, which can be extended
+later if required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use cases,
+but nothing prevents having more. At present hundreds of RDMA devices per
+single cgroup may not be handled optimally; however, there is no known use case
+for such a configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed by any
+of the child processes which share the address space, rdma resources are
+always owned by the creator cgroup css. This allows process migration from one
+cgroup to another without major complexity of transferring resource ownership,
+because such ownership is not really present due to the shared nature of
+rdma resources. Linking resources around css also ensures that cgroups can be
+deleted after processes migrate. This allows process migration as well with
+active resources, even though that’s not the primary use case.
+
+Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
+the caller. The same rdma cgroup should be passed while uncharging the resource.
+This also allows a process migrated with active RDMA resources to charge
+new resources to its new owner cgroup. It also allows uncharging a resource of
+a process that has migrated to a new cgroup from the previously charged cgroup,
+even though that is not a primary use case.
+
+A resource pool object is created in the following situations.
+(a) User sets the limit and no previous resource pool exists for the device
+of interest for the cgroup.
+(b) No resource limits were configured, but the IB/RDMA stack tries to
+charge the resource. This ensures resources charged while applications were
+running without limits are uncharged correctly once limits are enforced;
+otherwise the usage count would drop to negative. This is done using a default
+resource pool. Instead of implementing any sort of time markers, the default
+pool simplifies the design.
+
+A resource pool is destroyed if it was of default type (not created
+by administrative operation) and it’s the last resource getting
+deallocated. A resource pool created by administrative operation is not
+deleted, as it’s expected to be used in the near future.
+
+If a user setting tries to delete all the resource limits of a device
+with active resources, RDMA cgroup just marks the pool as a
+default pool with maximum limits for each resource; otherwise it deletes the
+default resource pool.
+
+2. Usage Examples
+-----------------
+
+(a) Configure resource limit:
+echo mlx4_0 mr=100 qp=10 ah=2 > /sys/fs/cgroup/rdma/1/rdma.verb.max
+echo ocrdma1 mr=120 qp=20 cq=10 > /sys/fs/cgroup/rdma/2/rdma.verb.max
+
+(b) Query resource limit:
+cat /sys/fs/cgroup/rdma/2/rdma.verb.max
+#Output:
+mlx4_0 mr=100 qp=10 ah=2
+ocrdma1 mr=120 qp=20 cq=10
+
+(c) Query current usage:
+cat /sys/fs/cgroup/rdma/2/rdma.verb.current
+#Output:
+mlx4_0 mr=95 qp=8 ah=2
+ocrdma1 mr=0 qp=20 cq=10
+
+(d) Delete resource limit:
+echo mlx4_0 remove > /sys/fs/cgroup/rdma/1/rdma.verb.max
+
+(e) Configure hw specific resource limit: (optional)
+echo vendor1 hw_qp=56 > /sys/fs/cgroup/rdma/2/rdma.hw.max
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 31d1f7b..6741529 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -47,6 +47,8 @@ CONTENTS
   5-3. IO
     5-3-1. IO Interface Files
     5-3-2. Writeback
+  5-4. RDMA
+    5-4-1. RDMA Interface Files
 P. Information on Kernel Programming
   P-1. Filesystem Support for Writeback
 D. Deprecated v1 Core Features
@@ -1012,6 +1014,47 @@ writeback as follows.
 	total available memory and applied the same way as
 	vm.dirty[_background]_ratio.
 
+5-4. RDMA
+
+The "rdma" controller regulates the distribution of RDMA resources.
+This controller implements both RDMA/IB verb level and RDMA HCA
+driver level resource distribution.
+
+5-4-1. RDMA Interface Files
+
+  rdma.verb.max
+	A read-write file that exists for all the cgroups except root and
+	describes the currently configured verbs resource limit for an
+	RDMA/IB device.
+
+	Lines are keyed by device name and are not ordered.
+	Each line contains space separated resource names and their
+	configured limits that can be distributed.
+
+	An example for mlx4 and ocrdma devices follows.
+
+	  mlx4_0 mr=1000 qp=104 ah=2
+	  ocrdma1 mr=900 qp=89 cq=10
+
+  rdma.verb.current
+	A read-only file that describes current resource usage.
+	It exists for all the cgroups including root.
+
+	An example for mlx4 and ocrdma devices follows.
+
+	  mlx4_0 mr=1000 qp=102 ah=2 flow=10 srq=0
+	  ocrdma1 mr=900 qp=79 cq=10 flow=0 srq=0
+
+  rdma.hw.max
+	A read-write file that exists for all the cgroups except root and
+	describes the currently configured HCA hardware resource limit for
+	an RDMA/IB device.
+
+	Lines are keyed by device name and are not ordered.
+	Each line contains space separated resource names and their
+	configured limits that can be distributed.
+
+  rdma.hw.current
+	A read-only file that describes current resource usage.
 
 P. Information on Kernel Programming
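
The line format described above for rdma.verb.max and rdma.verb.current (a
device name followed by space separated resource=value pairs) can be read
from a script. The following is a minimal sketch and is not part of this
patch: rdma_usage is a hypothetical helper name, and the code assumes only
the file layout shown in the examples above.

```shell
# rdma_usage FILE DEVICE RESOURCE
# Hypothetical helper (not provided by the kernel or by this patch).
# Prints the value recorded for RESOURCE on the line keyed by DEVICE in a
# file using the "device res=value ..." layout. Returns non-zero if the
# device or the resource is not present.
rdma_usage() {
    awk -v dev="$2" -v res="$3" '
        $1 == dev {
            # Fields 2..NF are resource=value pairs.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == res) {
                    print kv[2]
                    found = 1
                    exit
                }
            }
        }
        # Exit status 0 only when a matching pair was printed.
        END { exit !found }
    ' "$1"
}
```

For instance, "rdma_usage /sys/fs/cgroup/rdma/2/rdma.verb.current mlx4_0 qp"
would print 8 for the sample output shown in usage example (c).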