From patchwork Thu Aug 8 23:13:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11084947 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1BB114DB for ; Thu, 8 Aug 2019 23:14:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DEA2628C03 for ; Thu, 8 Aug 2019 23:14:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CF89A28C02; Thu, 8 Aug 2019 23:14:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CB29528C03 for ; Thu, 8 Aug 2019 23:14:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F1EA66B0008; Thu, 8 Aug 2019 19:13:59 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id EA6DF6B000A; Thu, 8 Aug 2019 19:13:59 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D47B86B000C; Thu, 8 Aug 2019 19:13:59 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by kanga.kvack.org (Postfix) with ESMTP id AF0246B0008 for ; Thu, 8 Aug 2019 19:13:59 -0400 (EDT) Received: by mail-qk1-f199.google.com with SMTP id f22so2864010qkg.0 for ; Thu, 08 Aug 2019 16:13:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=1e5Y+e42KorvbcxGXpQwcEigXZeGV0ULHZdDaD0hvA4=; b=mTQgQroyXFPdDozBxJGxYcr3OdD7NU7MLKyDSS0fXZ5W1OfwLRJ79HNGkVvE/UfaDO rFtyOejXfN6Lx3u40FYqspXVwH7V+mBTcEwiw7DzRh1uRLr0J/5xavVnOcVARf52RqAn TnYupJCQNAxQdDEb939OXZG/QEywftByl7r3VcsT4us7ydj9vv/k+D5A2UjfdzXPKF7h g/jT0AKtekiN8xLgJrn+DytyEV/CHCsJiA7lAo6C/moa8LWkDav6EtRE8VyMTUM8sBUn urGUfcXCFgak9V/Cw8hoJ95KqW6gYcIUNXSPnGz4yPl0tc8yBWAELSrKz96HtKognUyz ww5Q== X-Gm-Message-State: APjAAAVqjCx136fJ9/Y4tg8jxpIZUXAT8LifYutFhn20Dn/Our7nq6Cs r1D0zXt8YyZFEFJ4bTvxSll2VggfLGZ4mmPJhFlC3FiEBZ8IAc9Db5yPsuApu5VTABsQsKJlnHR +S6XW7/W9iMylnNh+QIinQwVcqZsCRrqRWiQFHFCM3PdNBEW948PXUyj+dpE0xBq5QQ== X-Received: by 2002:a05:620a:5a7:: with SMTP id q7mr16047454qkq.477.1565306039427; Thu, 08 Aug 2019 16:13:59 -0700 (PDT) X-Received: by 2002:a05:620a:5a7:: with SMTP id q7mr16047414qkq.477.1565306038697; Thu, 08 Aug 2019 16:13:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565306038; cv=none; d=google.com; s=arc-20160816; b=WnXfeaiSv8seAhGZ4V00DWqdo/b0FwzGFX+MQQ+fYAq9BaGbsuxjVDRAWzCz4Gqmkp tgX0pxE9L2NspDvI9yewo0V6m4Rpb9ypfu3qr0NQD71qXm5M5yxrVMhBJL1BWsplDxXA 5M+AnmPV3/0v+r1ViKdxz3tNCawt2BiFVZgkx1LxFSKuBfg59F2e6Or8sX42VuPt2iKi vVP2+bOOgcWIJPm5wDaA9CSgjrPWSaj2JR52TB6di+KbX5KKXTkr0MUdQzEoobLjLnn7 BhDWWY16FCnFjxxvH/VdsQ8PXM0PbqPsTkUSpgSk3EGGnUOH4oHIZS/K4HZ0jEkacs3T LQvA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:dkim-signature; bh=1e5Y+e42KorvbcxGXpQwcEigXZeGV0ULHZdDaD0hvA4=; b=eCnukQq+vvD1x4ViCpcAoP2zjcXya1z1HbMLgTQCEJBMp4EKpoTci6FS4tw/nx8rBi MyMHrxEyKBlMSmMXlMYFFahY0S+4UnT7rMcdrvQhUctqCjnGqhagD/fjUaap/GcwolXW ItXMnZ8fhLCAeJtReOyU5paw9Lz7t4gwyXOpjX17Iru8O0Ui9ikyO75L1qIN8mTN1MyF 1NdvM5EeJtzJ1FAI6E0EMPR9STOqS8Cdjj9mcY2BdyecGyjE5KVXQvcdKT6m2+QyX2Ph F7dq+DaLeRXLS6Wi5rAmrPSUuAun2KZBjBgLOoY4D4jfDJpoDV8eehm71aUYwEgw7eCV z4Vg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=b+zT0f9f; spf=pass (google.com: domain of 3tqxmxqskcdmpabphgnbxcpvddvat.rdbaxcjm-bbzkprz.dgv@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3tqxMXQsKCDMPabPhgnbXcPVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from mail-sor-f73.google.com (mail-sor-f73.google.com. [209.85.220.73]) by mx.google.com with SMTPS id f4sor7235673qti.68.2019.08.08.16.13.58 for (Google Transport Security); Thu, 08 Aug 2019 16:13:58 -0700 (PDT) Received-SPF: pass (google.com: domain of 3tqxmxqskcdmpabphgnbxcpvddvat.rdbaxcjm-bbzkprz.dgv@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) client-ip=209.85.220.73; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=b+zT0f9f; spf=pass (google.com: domain of 3tqxmxqskcdmpabphgnbxcpvddvat.rdbaxcjm-bbzkprz.dgv@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3tqxMXQsKCDMPabPhgnbXcPVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=1e5Y+e42KorvbcxGXpQwcEigXZeGV0ULHZdDaD0hvA4=; b=b+zT0f9fu3ULZ0gkfmCIj5ZnufLu5TrTyMwWV6AJcHZSMmCx7vDSJ678HGyeAZNvNR HRmHfd11DfakQ4WibruAurLDZaAKlglidQFRwRel4tJ021LGCoozToPmLIDFt7zHH1OO lsli6SbS7zTH9Mz7jEgwSq3wYd0RJVOM0WEkDUb6HeLaw3d1WT5JQ1gPstZbdXAIjhkT Zk31JWtZT3EoPeJJmuV6/efvk9XKcRoc9S2YhR65FM7YW6fHe5K9v2lNn3b3c6YPzlv6 B2F7+oVMu6Kf7+vPrHz/b7G9vi4mbpKh2MPYppFHPiPJAjkAl/4VKB1CUO2o15wdpHSt CuZQ== X-Google-Smtp-Source: APXvYqxZsZORtAFxvo/GCCxeazLSDeTloGwWD2nvNdmwV7J3BRRl3Vegc7pnzWtMZdZFgg0hWgmiIgFkIB0uV5GRKw== X-Received: by 2002:aed:3461:: with SMTP id w88mr15451018qtd.13.1565306038224; Thu, 08 Aug 2019 16:13:58 -0700 (PDT) Date: Thu, 8 Aug 2019 16:13:36 -0700 In-Reply-To: <20190808231340.53601-1-almasrymina@google.com> Message-Id: <20190808231340.53601-2-almasrymina@google.com> Mime-Version: 1.0 References: <20190808231340.53601-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog Subject: [RFC PATCH v2 1/5] hugetlb_cgroup: Add hugetlb_cgroup reservation counter From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, khalid.aziz@oracle.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP These counters will track hugetlb reservations rather than hugetlb memory faulted in. This patch only adds the counter, following patches add the charging and uncharging of the counter. --- include/linux/hugetlb.h | 2 +- mm/hugetlb_cgroup.c | 86 +++++++++++++++++++++++++++++++++++++---- 2 files changed, 80 insertions(+), 8 deletions(-) -- 2.23.0.rc1.153.gdeed80330f-goog diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index edfca42783192..6777b3013345d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -340,7 +340,7 @@ struct hstate { unsigned int surplus_huge_pages_node[MAX_NUMNODES]; #ifdef CONFIG_CGROUP_HUGETLB /* cgroup control files */ - struct cftype cgroup_files[5]; + struct cftype cgroup_files[9]; #endif char name[HSTATE_NAME_LEN]; }; diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 68c2f2f3c05b7..708103663988a 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -25,6 +25,10 @@ struct hugetlb_cgroup { * the counter to account for hugepages from hugetlb. */ struct page_counter hugepage[HUGE_MAX_HSTATE]; + /* + * the counter to account for hugepage reservations from hugetlb. + */ + struct page_counter reserved_hugepage[HUGE_MAX_HSTATE]; }; #define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) @@ -33,6 +37,15 @@ struct hugetlb_cgroup { static struct hugetlb_cgroup *root_h_cgroup __read_mostly; +static inline +struct page_counter *get_counter(struct hugetlb_cgroup *h_cg, int idx, + bool reserved) +{ + if (reserved) + return &h_cg->reserved_hugepage[idx]; + return &h_cg->hugepage[idx]; +} + static inline struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s) { @@ -256,28 +269,42 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, enum { RES_USAGE, + RES_RESERVATION_USAGE, RES_LIMIT, + RES_RESERVATION_LIMIT, RES_MAX_USAGE, + RES_RESERVATION_MAX_USAGE, RES_FAILCNT, + RES_RESERVATION_FAILCNT, }; static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css, struct cftype *cft) { struct page_counter *counter; + struct page_counter *reserved_counter; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css); counter = &h_cg->hugepage[MEMFILE_IDX(cft->private)]; + reserved_counter = &h_cg->reserved_hugepage[MEMFILE_IDX(cft->private)]; switch (MEMFILE_ATTR(cft->private)) { case RES_USAGE: return (u64)page_counter_read(counter) * PAGE_SIZE; + case RES_RESERVATION_USAGE: + return (u64)page_counter_read(reserved_counter) * PAGE_SIZE; case RES_LIMIT: return (u64)counter->max * PAGE_SIZE; + case RES_RESERVATION_LIMIT: + return (u64)reserved_counter->max * PAGE_SIZE; case RES_MAX_USAGE: return (u64)counter->watermark * PAGE_SIZE; + case RES_RESERVATION_MAX_USAGE: + return (u64)reserved_counter->watermark * PAGE_SIZE; case RES_FAILCNT: return counter->failcnt; + case RES_RESERVATION_FAILCNT: + return reserved_counter->failcnt; default: BUG(); } @@ -291,6 +318,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of, int ret, idx; unsigned long nr_pages; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of)); + bool reserved = false; if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */ return -EINVAL; @@ -303,10 +331,16 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of, idx = MEMFILE_IDX(of_cft(of)->private); nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx])); + if (MEMFILE_ATTR(of_cft(of)->private) == RES_RESERVATION_LIMIT) { + reserved = true; + } + switch (MEMFILE_ATTR(of_cft(of)->private)) { + case RES_RESERVATION_LIMIT: case RES_LIMIT: mutex_lock(&hugetlb_limit_mutex); - ret = page_counter_set_max(&h_cg->hugepage[idx], nr_pages); + ret = page_counter_set_max(get_counter(h_cg, idx, reserved), + nr_pages); mutex_unlock(&hugetlb_limit_mutex); break; default: @@ -320,18 +354,26 @@ static ssize_t hugetlb_cgroup_reset(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { int ret = 0; - struct page_counter *counter; + struct page_counter *counter, *reserved_counter; struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of)); counter = &h_cg->hugepage[MEMFILE_IDX(of_cft(of)->private)]; + reserved_counter = &h_cg->reserved_hugepage[ + MEMFILE_IDX(of_cft(of)->private)]; switch (MEMFILE_ATTR(of_cft(of)->private)) { case RES_MAX_USAGE: page_counter_reset_watermark(counter); break; + case RES_RESERVATION_MAX_USAGE: + page_counter_reset_watermark(reserved_counter); + break; case RES_FAILCNT: counter->failcnt = 0; break; + case RES_RESERVATION_FAILCNT: + reserved_counter->failcnt = 0; + break; default: ret = -EINVAL; break; @@ -357,7 +399,7 @@ static void __init __hugetlb_cgroup_file_init(int idx) struct hstate *h = &hstates[idx]; /* format the size */ - mem_fmt(buf, 32, huge_page_size(h)); + mem_fmt(buf, sizeof(buf), huge_page_size(h)); /* Add the limit file */ cft = &h->cgroup_files[0]; @@ -366,28 +408,58 @@ static void __init __hugetlb_cgroup_file_init(int idx) cft->read_u64 = hugetlb_cgroup_read_u64; cft->write = hugetlb_cgroup_write; - /* Add the usage file */ + /* Add the reservation limit file */ cft = &h->cgroup_files[1]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_limit_in_bytes", + buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_LIMIT); + cft->read_u64 = hugetlb_cgroup_read_u64; + cft->write = hugetlb_cgroup_write; + + /* Add the usage file */ + cft = &h->cgroup_files[2]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf); cft->private = MEMFILE_PRIVATE(idx, RES_USAGE); cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the reservation usage file */ + cft = &h->cgroup_files[3]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_usage_in_bytes", + buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_USAGE); + cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the MAX usage file */ - cft = &h->cgroup_files[2]; + cft = &h->cgroup_files[4]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf); cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE); cft->write = hugetlb_cgroup_reset; cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the MAX reservation usage file */ + cft = &h->cgroup_files[5]; + snprintf(cft->name, MAX_CFTYPE_NAME, + "%s.reservation_max_usage_in_bytes", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_RESERVATION_MAX_USAGE); + cft->write = hugetlb_cgroup_reset; + cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the failcntfile */ - cft = &h->cgroup_files[3]; + cft = &h->cgroup_files[6]; snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf); cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); cft->write = hugetlb_cgroup_reset; cft->read_u64 = hugetlb_cgroup_read_u64; + /* Add the reservation failcntfile */ + cft = &h->cgroup_files[7]; + snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_failcnt", buf); + cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT); + cft->write = hugetlb_cgroup_reset; + cft->read_u64 = hugetlb_cgroup_read_u64; + /* NULL terminate the last cft */ - cft = &h->cgroup_files[4]; + cft = &h->cgroup_files[8]; memset(cft, 0, sizeof(*cft)); WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys, From patchwork Thu Aug 8 23:13:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11084951 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE2FA1398 for ; Thu, 8 Aug 2019 23:14:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB4BD28826 for ; Thu, 8 Aug 2019 23:14:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 95A3B28C02; Thu, 8 Aug 2019 23:14:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E26E528826 for ; Thu, 8 Aug 2019 23:14:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ECFE56B000A; Thu, 8 Aug 2019 19:14:02 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E59586B000C; Thu, 8 Aug 2019 19:14:02 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CFB526B000D; Thu, 8 Aug 2019 19:14:02 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f199.google.com (mail-pf1-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id 92A0D6B000A for ; Thu, 8 Aug 2019 19:14:02 -0400 (EDT) Received: by mail-pf1-f199.google.com with SMTP id x10so401289pfn.2 for ; Thu, 08 Aug 2019 16:14:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=+wNNF10zN2glTEVlCyNTwfvE/ShUZWkrH2OVFnVoyK4=; b=jLmMm6/EV82sUC2nU6Fi/ERhA6eamQ5pXtL5aTyvf5Zl+6nguIlNZH0TzOziEqG7TS yOPDBBcWR2dOxhlqimjF2kJJ4sKNJo/nF3RKObl0xQt2JKHKkVFteo3/e5bxFOgkYolq 9y4pSO+5CJ3AW2DR3AyV7+yHMzZVpbWwnxLeQi8bOB6rL+4+hBouBR2N7VnkM3zeg844 qtH0AVItsJYIVuV2UgZ66f1T7M2Sw0A6jfLiGc0PyVKBIhTtVtogY3kUPPxnvXbijs03 Xj4wOBpkTYAqamt/r6Gv4i1OjUcVDynsr87GCEuSTZQcu5cR3LappCfWZpKp0oGJjvM0 Ij1Q== X-Gm-Message-State: APjAAAXPtjt926N7iYjAcVdX6tft5IFeXL1AEEilVkmyeiRtaV8+adlx BQKaATYAwQB8p59k2Ra+WhatMfpce4PHnm/Tuf4BMuzPTYBieCESPFSEqzoF/tLKCOpR8cr8Q9K 5oW6SjBX4u00q44msVDrPSXODUuP5vibBH8p5eWifKDN1ttpIxMWIfzWskO3KXsGVnA== X-Received: by 2002:a62:3347:: with SMTP id z68mr18567356pfz.174.1565306042295; Thu, 08 Aug 2019 16:14:02 -0700 (PDT) X-Received: by 2002:a62:3347:: with SMTP id z68mr18567306pfz.174.1565306041421; Thu, 08 Aug 2019 16:14:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565306041; cv=none; d=google.com; s=arc-20160816; b=ARb3cjT8yuWyjCbn+PRavtrJlLxMhhsa7VdXVKCMOB/FmxEUq7a9liksXPWnRi2RLa Bm88JkaDoYumlrE651GM3emW0HfMTXEWFzpwP8P+z6CsaLzUKlBCNWw25wBiQ+yu1Zrt aFMxfn/DeZvnGZU2fz6hftgsfEz/z4gjlYSBWL5ob6MvWI2QNWd+xzsu/fgsvoWrx7VT 4OfmRZGqUzX2VKcOalDDipVQZsy+RzDV46lqFRM9f8PLjg6LXd97z3HPfHw0gaXi/8aO W/rlXkYlI/G0QAh4kSYi9eR5hV74v/ErpqhUJpXkKhQ0mz9c0x4XzIj6BY+998ONKLCN hl5w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:dkim-signature; bh=+wNNF10zN2glTEVlCyNTwfvE/ShUZWkrH2OVFnVoyK4=; b=VclGV8U7vXPonJTDcCBs4kKWM4btqJjXuGbymbMKu9Q8ZiRBtmlG0Tw7pfRkKpBfNs T+RGypuovJZVSldn9vhDfVfUBz0bwAHJ+5HAgWvzXvkfrz9uK4RkiKbstK7iEBqNCSpJ +MgMeYyWvAApDQb7J1hzK7uRhTYXoxxkpkKKDKhYdMzRBOm/fUSgyuDClJX0/oYMcZ0c Zg7dCXKJDVLWvxXKQa/9d+jhUUEjt34Y3dIfkIaW2BdNahqvW7xVcfyHa89GIFZSJCJA CU3/zAdbsD8L9p5rPQO9xkFmgwZ5YZcvPfbWXs79ueF+cebMoK2MDJOobj0l+BW5JjMD wViA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cHvzQV+J; spf=pass (google.com: domain of 3ukxmxqskcdurcdrjipdzerxffxcv.tfdczelo-ddbmrtb.fix@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3uKxMXQsKCDURcdRjipdZeRXffXcV.TfdcZelo-ddbmRTb.fiX@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from mail-sor-f73.google.com (mail-sor-f73.google.com. [209.85.220.73]) by mx.google.com with SMTPS id 31sor67598821pgy.17.2019.08.08.16.14.01 for (Google Transport Security); Thu, 08 Aug 2019 16:14:01 -0700 (PDT) Received-SPF: pass (google.com: domain of 3ukxmxqskcdurcdrjipdzerxffxcv.tfdczelo-ddbmrtb.fix@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) client-ip=209.85.220.73; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cHvzQV+J; spf=pass (google.com: domain of 3ukxmxqskcdurcdrjipdzerxffxcv.tfdczelo-ddbmrtb.fix@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3uKxMXQsKCDURcdRjipdZeRXffXcV.TfdcZelo-ddbmRTb.fiX@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=+wNNF10zN2glTEVlCyNTwfvE/ShUZWkrH2OVFnVoyK4=; b=cHvzQV+JUEBPGkWz4oh8bEtYs18v5GHnZGHn7x45P+W1pL8U1mYr5uO38kfP7gO83B JY5jhe3eG0tn9dHHNJtCyTAF5zntEb93HcpZ+X0e2Ey+do5W1FAFscDhvED7CLnXWsOw WRgC96Lrpmi0lYydy+TK8FTduKC2ZCBxQ9etuT52ycRh/sl2m77Gl0+uQ5qlA+y0wZ0Y FuHpNiAFzLsBvdrQe6N7IPY8TbglQ/BOQhPBPA50NISWvfqb40u+ujnhGEVoSrbYvmEr uaTYD5AChzo27g5Csi8Ub4jYFIeJ3iLbExyWqlPbbusYfI1s0MAhTahg+1DYLst8cZRg 4q5g== X-Google-Smtp-Source: APXvYqxiELON7MBP5DXs3jkqLdiH5sTqXqtOk/noxLZmdDFa0RCeUeXTnlV4V5nVMv0MfW4AtOE6gp48V0EL0oVwrQ== X-Received: by 2002:a63:c013:: with SMTP id h19mr14955058pgg.108.1565306040809; Thu, 08 Aug 2019 16:14:00 -0700 (PDT) Date: Thu, 8 Aug 2019 16:13:37 -0700 In-Reply-To: <20190808231340.53601-1-almasrymina@google.com> Message-Id: <20190808231340.53601-3-almasrymina@google.com> Mime-Version: 1.0 References: <20190808231340.53601-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog Subject: [RFC PATCH v2 2/5] hugetlb_cgroup: Add interface for charge/uncharge From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, khalid.aziz@oracle.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Augments hugetlb_cgroup_charge_cgroup to be able to charge hugetlb usage or hugetlb reservation counter. Adds a new interface to uncharge a hugetlb_cgroup counter via hugetlb_cgroup_uncharge_counter. Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init, hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline. --- include/linux/hugetlb_cgroup.h | 8 +++-- mm/hugetlb.c | 3 +- mm/hugetlb_cgroup.c | 63 ++++++++++++++++++++++++++++------ 3 files changed, 61 insertions(+), 13 deletions(-) -- 2.23.0.rc1.153.gdeed80330f-goog diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 063962f6dfc6a..0725f809cd2d9 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -52,7 +52,8 @@ static inline bool hugetlb_cgroup_disabled(void) } extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, - struct hugetlb_cgroup **ptr); + struct hugetlb_cgroup **ptr, + bool reserved); extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, struct page *page); @@ -60,6 +61,9 @@ extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, struct page *page); extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg); +extern void hugetlb_cgroup_uncharge_counter(struct page_counter *p, + unsigned long nr_pages); + extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage); @@ -83,7 +87,7 @@ static inline bool hugetlb_cgroup_disabled(void) static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, - struct hugetlb_cgroup **ptr) + struct hugetlb_cgroup **ptr, bool reserved) { return 0; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ede7e7f5d1ab2..c153bef42e729 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2078,7 +2078,8 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, gbl_chg = 1; } - ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg); + ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg, + false); if (ret) goto out_subpool_put; diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 708103663988a..119176a0b2ec5 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -74,8 +74,10 @@ static inline bool hugetlb_cgroup_have_usage(struct hugetlb_cgroup *h_cg) int idx; for (idx = 0; idx < hugetlb_max_hstate; idx++) { - if (page_counter_read(&h_cg->hugepage[idx])) + if (page_counter_read(get_counter(h_cg, idx, true)) || + page_counter_read(get_counter(h_cg, idx, false))) { return true; + } } return false; } @@ -86,18 +88,27 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup, int idx; for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) { - struct page_counter *counter = &h_cgroup->hugepage[idx]; struct page_counter *parent = NULL; + struct page_counter *reserved_parent = NULL; unsigned long limit; int ret; - if (parent_h_cgroup) - parent = &parent_h_cgroup->hugepage[idx]; - page_counter_init(counter, parent); + if (parent_h_cgroup) { + parent = get_counter(parent_h_cgroup, idx, false); + reserved_parent = get_counter(parent_h_cgroup, idx, + true); + } + page_counter_init(get_counter(h_cgroup, idx, false), parent); + page_counter_init(get_counter(h_cgroup, idx, true), + reserved_parent); limit = round_down(PAGE_COUNTER_MAX, 1 << huge_page_order(&hstates[idx])); - ret = page_counter_set_max(counter, limit); + + ret = page_counter_set_max(get_counter( + h_cgroup, idx, false), limit); + ret = page_counter_set_max(get_counter( + h_cgroup, idx, true), limit); VM_BUG_ON(ret); } } @@ -127,6 +138,25 @@ static void hugetlb_cgroup_css_free(struct cgroup_subsys_state *css) kfree(h_cgroup); } +static void hugetlb_cgroup_move_parent_reservation(int idx, + struct hugetlb_cgroup *h_cg) +{ + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(h_cg); + + /* Move the reservation counters. */ + if (!parent_hugetlb_cgroup(h_cg)) { + parent = root_h_cgroup; + /* root has no limit */ + page_counter_charge( + &root_h_cgroup->reserved_hugepage[idx], + page_counter_read(get_counter(h_cg, idx, + true))); + } + + /* Take the pages off the local counter */ + page_counter_cancel(get_counter(h_cg, idx, true), + page_counter_read(get_counter(h_cg, idx, true))); +} /* * Should be called with hugetlb_lock held. @@ -181,6 +211,7 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css) do { for_each_hstate(h) { spin_lock(&hugetlb_lock); + hugetlb_cgroup_move_parent_reservation(idx, h_cg); list_for_each_entry(page, &h->hugepage_activelist, lru) hugetlb_cgroup_move_parent(idx, h_cg, page); @@ -192,7 +223,7 @@ static void hugetlb_cgroup_css_offline(struct cgroup_subsys_state *css) } int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, - struct hugetlb_cgroup **ptr) + struct hugetlb_cgroup **ptr, bool reserved) { int ret = 0; struct page_counter *counter; @@ -215,8 +246,10 @@ int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, } rcu_read_unlock(); - if (!page_counter_try_charge(&h_cg->hugepage[idx], nr_pages, &counter)) + if (!page_counter_try_charge(get_counter(h_cg, idx, reserved), + nr_pages, &counter)) { ret = -ENOMEM; + } css_put(&h_cg->css); done: *ptr = h_cg; @@ -250,7 +283,8 @@ void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, if (unlikely(!h_cg)) return; set_hugetlb_cgroup(page, NULL); - page_counter_uncharge(&h_cg->hugepage[idx], nr_pages); + page_counter_uncharge(get_counter(h_cg, idx, false), nr_pages); + return; } @@ -263,7 +297,16 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, if (huge_page_order(&hstates[idx]) < HUGETLB_CGROUP_MIN_ORDER) return; - page_counter_uncharge(&h_cg->hugepage[idx], nr_pages); + page_counter_uncharge(get_counter(h_cg, idx, false), nr_pages); +} + +void hugetlb_cgroup_uncharge_counter(struct page_counter *p, + unsigned long nr_pages) +{ + if (hugetlb_cgroup_disabled() || !p) + return; + + page_counter_uncharge(p, nr_pages); return; } From patchwork Thu Aug 8 23:13:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11084957 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7A6781398 for ; Thu, 8 Aug 2019 23:14:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6884F28826 for ; Thu, 8 Aug 2019 23:14:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5D10828C03; Thu, 8 Aug 2019 23:14:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A6A1528826 for ; Thu, 8 Aug 2019 23:14:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 77CAA6B000C; Thu, 8 Aug 2019 19:14:05 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 70ED86B000D; Thu, 8 Aug 2019 19:14:05 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 532846B000E; Thu, 8 Aug 2019 19:14:05 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-vs1-f70.google.com (mail-vs1-f70.google.com [209.85.217.70]) by kanga.kvack.org (Postfix) with ESMTP id 296796B000C for ; Thu, 8 Aug 2019 19:14:05 -0400 (EDT) Received: by mail-vs1-f70.google.com with SMTP id a11so24688641vso.9 for ; Thu, 08 Aug 2019 16:14:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=M6cyBd5hA6OsVmAelGzV5tX+uF9n4OBWzNC5c9d9ybI=; b=kY0JtrXtsdhmjOXcYMzQuHtX21nyFoe+122QBGqoNNuGeUCz29H3gVw3bZqw56bvcQ Psuar864earnGST8hibnTkl6wryWhO7kAPLAIWZE9pY8uIHThXLTdBBq+6TSVkAEZpgH unmJpo+/CBDVvKVV6s1MrZik0MdXOYXauxyVIrnXRH2Aw1IrdDInlegqosTfuXKp2il2 ZCGbS8fMS3b/XYCZQAcO0g5qyEEXKzakm1Iq35PiIxc1Do45bONQ4oVN+YJ20Sb5GJUN RJkTh/IR8LXrVHokOxNfRnc/eglJ81mp+xXRhNOAEx39LcKLiCAeOlqiSQpFA36vufPg Yqtg== X-Gm-Message-State: APjAAAVVgv/ahkuJ3mv4ZwWpHu2P/ip/aYrzQoH4p5vQ5mFoCai71tpt lz8kz3Sx4yFwtfMH18tPQ2M6r8kM7C7Qu9OFU8PThyhk8gF6Dvhbqn42dBCl0hBzVppBcNNXh2D 5U2+GfyxPOJkb3SHvdDl5RTbjzBiLQHnjZL/XrIcsxw168deUyU3YIY/iNLypm7UmWw== X-Received: by 2002:a1f:b48e:: with SMTP id d136mr6826415vkf.57.1565306044708; Thu, 08 Aug 2019 16:14:04 -0700 (PDT) X-Received: by 2002:a1f:b48e:: with SMTP id d136mr6826404vkf.57.1565306043905; Thu, 08 Aug 2019 16:14:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565306043; cv=none; d=google.com; s=arc-20160816; b=KUnWOOW0T6/ta+qBztE1NxSDYDHzG20z/RLyCnKI4KDf9PvI0zkv9RJTQq4eB2UJ2j 34RQe7cQYIweU1Uc5GBtDQPGnK+zJ9s1fBVgozw6hbI7F1NjY06paikCF4IBHeglG6nQ XPAuC63XWRQP/e0OziIjTlY4WeGs4gYM4yEm/7kumTwxz6BD9P2yQNhXhf/mYSNCtqwg kjHCgeCXr0J53ENozwd0IKBYmh6dE0vu70EXnHX3RJrsoJAxZTMDOHst/5GyoHrqSIp/ OvEY8mrnJYGQ5yhmkitbJ8kHi3weuaJprWojl7jwrnoP1bxKcj2rqYKWyrE5RYdYIXXG ZYFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:dkim-signature; bh=M6cyBd5hA6OsVmAelGzV5tX+uF9n4OBWzNC5c9d9ybI=; b=QjKqsBAL5TQTf2LpdAMgkiQt78E69gpIlCj62+KU4dnqCz0xNDP8Rb71bm8VISpVsg B3v/iZLMs3KHj/l/xkJEkR9hGb4VRjZc/0SdRhXW2pBACw/aYpLkV6yOTCfQzBbWB7+S YpPfJzCXznRxY4VeBxpn1ITEWVYRaZ8Em7LBrL/sDtRKt0nYCk1JQm8wc3eOEAP9TH0h FdfKUBPfWXdgVGTlq9tvrZAFaLsTSndTpB++vAkpknkV9J1vgP5e3WlPtzeXSDB+xNFA sTuHAGrVnSnX5f3jOa/AldFpR4PQMGXYSih1KmJ0Za/p2E01O143uYjjoNNzrUKeDBBL FE4A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=r1KEmh0q; spf=pass (google.com: domain of 3u6xmxqskcdgufgumlsgchuaiiafy.wigfchor-ggepuwe.ila@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3u6xMXQsKCDgUfgUmlsgchUaiiafY.Wigfchor-ggepUWe.ila@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from mail-sor-f73.google.com (mail-sor-f73.google.com. [209.85.220.73]) by mx.google.com with SMTPS id 197sor1812788vkg.17.2019.08.08.16.14.03 for (Google Transport Security); Thu, 08 Aug 2019 16:14:03 -0700 (PDT) Received-SPF: pass (google.com: domain of 3u6xmxqskcdgufgumlsgchuaiiafy.wigfchor-ggepuwe.ila@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) client-ip=209.85.220.73; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=r1KEmh0q; spf=pass (google.com: domain of 3u6xmxqskcdgufgumlsgchuaiiafy.wigfchor-ggepuwe.ila@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3u6xMXQsKCDgUfgUmlsgchUaiiafY.Wigfchor-ggepUWe.ila@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=M6cyBd5hA6OsVmAelGzV5tX+uF9n4OBWzNC5c9d9ybI=; b=r1KEmh0qrOpvBm1/+0oF0sFMH6EQNPOKTvWW1271fb2M52m+iVMsdTPIGKzfVVpgj5 x+Nc7IcNm8URXfh6TYEay9WEJj4RCDbRjTAMgdrTcpePccY/hODxMK3rNh/znjoVcOBh LHTQpfT5/V4EZ7R6N+KMldgXBUoH0DykmWZx9LJEzyLnRFCME7SC3gsq4nP41UK1D3az fXXxpAnfQK8rjWMgzEd+dqkIbR5A5/BX9N3Fv+dnJlUU3lDtlUjaBsCblfohOv1PHDlw HoFX3WEegCMRRcW1gySNo8nwEMo4BRcSGuyQWrwBWkybGT21t/g38QCSjpGUG2swlQU/ vkZA== X-Google-Smtp-Source: APXvYqw//j8PX12o0jMJ/sTXdB0ZGAaXDk7In1PyTrMUN50VA3e5H4NhBHXZpX/7PJT5SedxTsS+zxmp8Csw1EfK6w== X-Received: by 2002:ac5:c853:: with SMTP id g19mr100310vkm.60.1565306043479; Thu, 08 Aug 2019 16:14:03 -0700 (PDT) Date: Thu, 8 Aug 2019 16:13:38 -0700 In-Reply-To: <20190808231340.53601-1-almasrymina@google.com> Message-Id: <20190808231340.53601-4-almasrymina@google.com> Mime-Version: 1.0 References: <20190808231340.53601-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog Subject: [RFC PATCH v2 3/5] hugetlb_cgroup: Add reservation accounting for private mappings From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, khalid.aziz@oracle.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Normally the pointer to the cgroup to uncharge hangs off the struct page, and gets queried when it's time to free the page. With hugetlb_cgroup reservations, this is not possible. Because it's possible for a page to be reserved by one task and actually faulted in by another task. The best place to put the hugetlb_cgroup pointer to uncharge for reservations is in the resv_map. But, because the resv_map has different semantics for private and shared mappings, the code patch to charge/uncharge shared and private mappings is different. This patch implements charging and uncharging for private mappings. For private mappings, the counter to uncharge is in resv_map->reservation_counter. On initializing the resv_map this is set to NULL. On reservation of a region in private mapping, the tasks hugetlb_cgroup is charged and the hugetlb_cgroup is placed is resv_map->reservation_counter. On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter. --- include/linux/hugetlb.h | 8 ++++++ include/linux/hugetlb_cgroup.h | 11 ++++++++ mm/hugetlb.c | 47 ++++++++++++++++++++++++++++++++-- mm/hugetlb_cgroup.c | 12 --------- 4 files changed, 64 insertions(+), 14 deletions(-) -- 2.23.0.rc1.153.gdeed80330f-goog diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6777b3013345d..90b3c928d16c1 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -46,6 +46,14 @@ struct resv_map { long adds_in_progress; struct list_head region_cache; long region_cache_count; + #ifdef CONFIG_CGROUP_HUGETLB + /* + * On private mappings, the counter to uncharge reservations is stored + * here. If these fields are 0, then the mapping is shared. + */ + struct page_counter *reservation_counter; + unsigned long pages_per_hpage; +#endif }; extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 0725f809cd2d9..1fdde63a4e775 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -25,6 +25,17 @@ struct hugetlb_cgroup; #define HUGETLB_CGROUP_MIN_ORDER 2 #ifdef CONFIG_CGROUP_HUGETLB +struct hugetlb_cgroup { + struct cgroup_subsys_state css; + /* + * the counter to account for hugepages from hugetlb. + */ + struct page_counter hugepage[HUGE_MAX_HSTATE]; + /* + * the counter to account for hugepage reservations from hugetlb. + */ + struct page_counter reserved_hugepage[HUGE_MAX_HSTATE]; +}; static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c153bef42e729..235996aef6618 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -711,6 +711,16 @@ struct resv_map *resv_map_alloc(void) INIT_LIST_HEAD(&resv_map->regions); resv_map->adds_in_progress = 0; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * Initialize these to 0. On shared mappings, 0's here indicate these + * fields don't do cgroup accounting. On private mappings, these will be + * re-initialized to the proper values, to indicate that hugetlb cgroup + * reservations are to be un-charged from here. + */ + resv_map->reservation_counter = NULL; + resv_map->pages_per_hpage = 0; +#endif INIT_LIST_HEAD(&resv_map->region_cache); list_add(&rg->link, &resv_map->region_cache); @@ -3192,7 +3202,19 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) reserve = (end - start) - region_count(resv, start, end); - kref_put(&resv->refs, resv_map_release); +#ifdef CONFIG_CGROUP_HUGETLB + /* + * Since we check for HPAGE_RESV_OWNER above, this must a private + * mapping, and these values should be none-zero, and should point to + * the hugetlb_cgroup counter to uncharge for this reservation. + */ + WARN_ON(!resv->reservation_counter); + WARN_ON(!resv->pages_per_hpage); + + hugetlb_cgroup_uncharge_counter( + resv->reservation_counter, + (end - start) * resv->pages_per_hpage); +#endif if (reserve) { /* @@ -3202,6 +3224,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) gbl_reserve = hugepage_subpool_put_pages(spool, reserve); hugetlb_acct_memory(h, -gbl_reserve); } + + kref_put(&resv->refs, resv_map_release); } static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) @@ -4516,6 +4540,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; + struct hugetlb_cgroup *h_cg; long gbl_reserve; /* This should never happen */ @@ -4549,11 +4574,29 @@ int hugetlb_reserve_pages(struct inode *inode, chg = region_chg(resv_map, from, to); } else { + /* Private mapping. */ + chg = to - from; + + if (hugetlb_cgroup_charge_cgroup( + hstate_index(h), + chg * pages_per_huge_page(h), + &h_cg, true)) { + return -ENOMEM; + } + resv_map = resv_map_alloc(); if (!resv_map) return -ENOMEM; - chg = to - from; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * Since this branch handles private mappings, we attach the + * counter to uncharge for this reservation off resv_map. + */ + resv_map->reservation_counter = + &h_cg->reserved_hugepage[hstate_index(h)]; + resv_map->pages_per_hpage = pages_per_huge_page(h); +#endif set_vma_resv_map(vma, resv_map); set_vma_resv_flags(vma, HPAGE_RESV_OWNER); diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c index 119176a0b2ec5..06e99ae1fec81 100644 --- a/mm/hugetlb_cgroup.c +++ b/mm/hugetlb_cgroup.c @@ -19,18 +19,6 @@ #include #include -struct hugetlb_cgroup { - struct cgroup_subsys_state css; - /* - * the counter to account for hugepages from hugetlb. - */ - struct page_counter hugepage[HUGE_MAX_HSTATE]; - /* - * the counter to account for hugepage reservations from hugetlb. - */ - struct page_counter reserved_hugepage[HUGE_MAX_HSTATE]; -}; - #define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val)) #define MEMFILE_IDX(val) (((val) >> 16) & 0xffff) #define MEMFILE_ATTR(val) ((val) & 0xffff) From patchwork Thu Aug 8 23:13:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11084961 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 77F3714DB for ; Thu, 8 Aug 2019 23:14:10 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6322728826 for ; Thu, 8 Aug 2019 23:14:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 56A9C28C02; Thu, 8 Aug 2019 23:14:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7223428826 for ; Thu, 8 Aug 2019 23:14:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 46EEE6B000D; Thu, 8 Aug 2019 19:14:08 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 3FDBA6B000E; Thu, 8 Aug 2019 19:14:08 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 274E76B0010; Thu, 8 Aug 2019 19:14:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f198.google.com (mail-pg1-f198.google.com [209.85.215.198]) by kanga.kvack.org (Postfix) with ESMTP id DFB426B000D for ; Thu, 8 Aug 2019 19:14:07 -0400 (EDT) Received: by mail-pg1-f198.google.com with SMTP id x19so58580298pgx.1 for ; Thu, 08 Aug 2019 16:14:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=drCiqg5eO1p+COxGwvWoRnEMTmsA8j0da20LEkdR9+8=; b=e23Ro2Kb7zXSEAASjMTWKFDvCXGv+3xoPY6o/jUV57xo4J/Wdzr6yc9Ud/MCewK5iq 3480qfbzHeGFIlN9I3hCHgEsA6QH5iQR9ksQwBYDn4VU6FkNfjlKWbrzw0af/wxlaxEn nwHvi8Ry67omLZO2hoazDQe2hbH33USUaiCDLFVep6wzyneLHN76GbmzDUUsxyCTwH5W xnIbk1+VTdw8vZCM6yZ644X3EsJzCcZEURQ54G0qEWdUikcl2cnP+Zpyulm8vJP+5g8i 1widi5kPn7wOGmmKFaCmyOdvmMwReGCUMTyJOXdkvohKuEdauJ2gKRpoEdwdFGfZX+T9 mpuQ== X-Gm-Message-State: APjAAAXXrbhRdGqpJt9c+kb7Lbv8jNNNgJsKNxWZ8Te6MFfeXFXkgNhc TtfOA0J8q38wRJ8810uXtyuOOBg64lev9GDYksLpLYNmTA/rIAB7DPPleP2YxrvmsKpOrG1UftW YRbpefsJjSE4Io4mAUL6h2JKTYJjRFYiix+MrZjx4Zt45oYFEX3ZditQM3cj2vVmmYw== X-Received: by 2002:aa7:86cc:: with SMTP id h12mr18508642pfo.2.1565306047583; Thu, 08 Aug 2019 16:14:07 -0700 (PDT) X-Received: by 2002:aa7:86cc:: with SMTP id h12mr18508569pfo.2.1565306046344; Thu, 08 Aug 2019 16:14:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565306046; cv=none; d=google.com; s=arc-20160816; b=H1jYDOWH0f/l3UQ/XF3uX0USkd5iDMwXAJ2z9+dOQcVX0AKDPXs5b5iw3N6HTYE9yw HRzpr2yeh6vqRSmtlR3Jjc7Kfz7holXE3ukkssu5hApfn4Q6EDHrHRvnxZHXjjF8KoE0 syLfRikCLf32K86h68Oq3NmeuN16+CaWd1JUXXEM9xcK1Ny/hlY8bEwBPmjgXDQWq5M9 jFts3MNJVMRhZ9MPzbSBlKl4b6S7Vz/W2Ro5IItCMKLq1701P8gMD1ve0sFuEUWSsx3G LIatOhqbAh/Rq0kQtxwWc1BmHA0uKHE+hXNP+z0Z+HVBjO/i3cIvUL/4qKaW9necSmXx WBAg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:dkim-signature; bh=drCiqg5eO1p+COxGwvWoRnEMTmsA8j0da20LEkdR9+8=; b=MXbqN09p7hAMF2DVkd6bsNQkVEH1sL17GCHHClRdybisONpR8PfYb+mp5IilY4XsEX LlWQSN9kq1rzBm38Ro3ISv7HYv/7yWWP6xK/gEUUcgIxBhX+j9h1UpLJ//sVH4o1BOl1 I976nBE3v1dRHwBa4TzzUwsW/5DsgncS9LvhE1tjcMDOOs5iai9EXtGD2Ke3o0Q/e4z2 W3jxJC1UIhALj8q1nO3jEa4Zqjnr1HfNwk2N4KI2QbAxYOUcjRZd394hvwS6ruG9ITYK zLEFiI+jB+pYPCY3UN4jagCTSQ3M6ndJs9dK9HNHN1I+la4Ug3WSaMfqpH6/mZ2pC6lt V2vg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=HwdqrDzr; spf=pass (google.com: domain of 3vaxmxqskcdowhiwonuiejwckkcha.ykihejqt-iigrwyg.knc@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3vaxMXQsKCDoWhiWonuiejWckkcha.Ykihejqt-iigrWYg.knc@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from mail-sor-f73.google.com (mail-sor-f73.google.com. [209.85.220.73]) by mx.google.com with SMTPS id g15sor68648943pgg.19.2019.08.08.16.14.06 for (Google Transport Security); Thu, 08 Aug 2019 16:14:06 -0700 (PDT) Received-SPF: pass (google.com: domain of 3vaxmxqskcdowhiwonuiejwckkcha.ykihejqt-iigrwyg.knc@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) client-ip=209.85.220.73; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=HwdqrDzr; spf=pass (google.com: domain of 3vaxmxqskcdowhiwonuiejwckkcha.ykihejqt-iigrwyg.knc@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3vaxMXQsKCDoWhiWonuiejWckkcha.Ykihejqt-iigrWYg.knc@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=drCiqg5eO1p+COxGwvWoRnEMTmsA8j0da20LEkdR9+8=; b=HwdqrDzr13x64ZIseXlcLcodkObujd2c9PGDHKeGJvwU4h7405yqZw+dhi88K6thZ/ Hkk7kedUYi0DGhpXnUevb7H9L/5iTKlHpGXS82/zvKmrtlRX2T7iRPae99XQnuH0yp4c yGb7XS7i9QugXTcFtu0TUFjZbfZpRAeX5w1EEA/34a+DlAspSW2HuXf4Li8u7+l5HTK4 28m947qa5CJM0ibSgMCjh1HsJKFHudJg/O578scs8UgFTjIHtx+y91PaExxiJqfgs9yk nCcYu9Hrm56wOkmNUrntb1lzpcUG7yFDAq16LlE2WItq3Y0AqvYvEeCwTPwI4mAD8SfT ISMw== X-Google-Smtp-Source: APXvYqyvXv68SxmFiKFPVh/7lJ9pVVf/mq3qplP82Zyca+bEQZ85+a43+DwF33JU6YM617k79zoSdt24qkmqNAzF5w== X-Received: by 2002:a65:41c2:: with SMTP id b2mr14787518pgq.320.1565306045744; Thu, 08 Aug 2019 16:14:05 -0700 (PDT) Date: Thu, 8 Aug 2019 16:13:39 -0700 In-Reply-To: <20190808231340.53601-1-almasrymina@google.com> Message-Id: <20190808231340.53601-5-almasrymina@google.com> Mime-Version: 1.0 References: <20190808231340.53601-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog Subject: [RFC PATCH v2 4/5] hugetlb_cgroup: Add accounting for shared mappings From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, khalid.aziz@oracle.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives in the resv_map entries, in file_region->reservation_counter. When a file_region entry is added to the resv_map via region_add, we also charge the appropriate hugetlb_cgroup and put the pointer to that in file_region->reservation_counter. This is slightly delicate since we need to not modify the resv_map until we know that charging the reservation has succeeded. If charging doesn't succeed, we report the error to the caller, so that the kernel fails the reservation. On region_del, which is when the hugetlb memory is unreserved, we delete the file_region entry in the resv_map, but also uncharge the file_region->reservation_counter. --- mm/hugetlb.c | 208 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 170 insertions(+), 38 deletions(-) -- 2.23.0.rc1.153.gdeed80330f-goog diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 235996aef6618..d76e3137110ab 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -242,8 +242,72 @@ struct file_region { struct list_head link; long from; long to; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * On shared mappings, each reserved region appears as a struct + * file_region in resv_map. These fields hold the info needed to + * uncharge each reservation. + */ + struct page_counter *reservation_counter; + unsigned long pages_per_hpage; +#endif }; +/* Must be called with resv->lock held. Calling this with dry_run == true will + * count the number of pages added but will not modify the linked list. + */ +static long consume_regions_we_overlap_with(struct file_region *rg, + struct list_head *head, long f, long *t, + struct hugetlb_cgroup *h_cg, + struct hstate *h, + bool dry_run) +{ + long add = 0; + struct file_region *trg = NULL, *nrg = NULL; + + /* Consume any regions we now overlap with. */ + nrg = rg; + list_for_each_entry_safe(rg, trg, rg->link.prev, link) { + if (&rg->link == head) + break; + if (rg->from > *t) + break; + + /* If this area reaches higher then extend our area to + * include it completely. If this is not the first area + * which we intend to reuse, free it. + */ + if (rg->to > *t) + *t = rg->to; + if (rg != nrg) { + /* Decrement return value by the deleted range. + * Another range will span this area so that by + * end of routine add will be >= zero + */ + add -= (rg->to - rg->from); + if (!dry_run) { + list_del(&rg->link); + kfree(rg); + } + } + } + + add += (nrg->from - f); /* Added to beginning of region */ + add += *t - nrg->to; /* Added to end of region */ + + if (!dry_run) { + nrg->from = f; + nrg->to = *t; +#ifdef CONFIG_CGROUP_HUGETLB + nrg->reservation_counter = + &h_cg->reserved_hugepage[hstate_index(h)]; + nrg->pages_per_hpage = pages_per_huge_page(h); +#endif + } + + return add; +} + /* * Add the huge page range represented by [f, t) to the reserve * map. In the normal case, existing regions will be expanded @@ -258,11 +322,13 @@ struct file_region { * Return the number of new huge pages added to the map. This * number is greater than or equal to zero. */ -static long region_add(struct resv_map *resv, long f, long t) +static long region_add(struct hstate *h, struct resv_map *resv, long f, long t) { struct list_head *head = &resv->regions; - struct file_region *rg, *nrg, *trg; - long add = 0; + struct file_region *rg, *nrg; + long add = 0, newadd = 0; + struct hugetlb_cgroup *h_cg = NULL; + int ret = 0; spin_lock(&resv->lock); /* Locate the region we are either in or before. */ @@ -277,6 +343,23 @@ static long region_add(struct resv_map *resv, long f, long t) * from the cache and use it for this range. */ if (&rg->link == head || t < rg->from) { +#ifdef CONFIG_CGROUP_HUGETLB + /* + * If res->reservation_counter is NULL, then it means this is + * a shared mapping, and hugetlb cgroup accounting should be + * done on the file_region entries inside resv_map. + */ + if (!resv->reservation_counter) { + ret = hugetlb_cgroup_charge_cgroup( + hstate_index(h), + (t - f) * pages_per_huge_page(h), + &h_cg, true); + } + + if (ret) + goto out_locked; +#endif + VM_BUG_ON(resv->region_cache_count <= 0); resv->region_cache_count--; @@ -286,6 +369,15 @@ static long region_add(struct resv_map *resv, long f, long t) nrg->from = f; nrg->to = t; + +#ifdef CONFIG_CGROUP_HUGETLB + if (h_cg) { + nrg->reservation_counter = + &h_cg->reserved_hugepage[hstate_index(h)]; + nrg->pages_per_hpage = pages_per_huge_page(h); + } +#endif + list_add(&nrg->link, rg->link.prev); add += t - f; @@ -296,38 +388,37 @@ static long region_add(struct resv_map *resv, long f, long t) if (f > rg->from) f = rg->from; - /* Check for and consume any regions we now overlap with. */ - nrg = rg; - list_for_each_entry_safe(rg, trg, rg->link.prev, link) { - if (&rg->link == head) - break; - if (rg->from > t) - break; - - /* If this area reaches higher then extend our area to - * include it completely. If this is not the first area - * which we intend to reuse, free it. */ - if (rg->to > t) - t = rg->to; - if (rg != nrg) { - /* Decrement return value by the deleted range. - * Another range will span this area so that by - * end of routine add will be >= zero - */ - add -= (rg->to - rg->from); - list_del(&rg->link); - kfree(rg); - } +#ifdef CONFIG_CGROUP_HUGETLB + /* Count any regions we now overlap with. */ + add = consume_regions_we_overlap_with(rg, head, f, &t, NULL, NULL, + true); + + if (!resv->reservation_counter) { + ret = hugetlb_cgroup_charge_cgroup( + hstate_index(h), + add * pages_per_huge_page(h), + &h_cg, true); } - add += (nrg->from - f); /* Added to beginning of region */ - nrg->from = f; - add += t - nrg->to; /* Added to end of region */ - nrg->to = t; + if (ret) + goto out_locked; +#endif + + /* Check for and consume any regions we now overlap with. */ + newadd = consume_regions_we_overlap_with(rg, head, f, &t, h_cg, h, + false); + /* + * If these aren't equal, then there is a bug with + * consume_regions_we_overlap_with, and we're charging the wrong amount + * of memory. + */ + WARN_ON(add != newadd); out_locked: resv->adds_in_progress--; spin_unlock(&resv->lock); + if (ret) + return ret; VM_BUG_ON(add < 0); return add; } @@ -487,6 +578,10 @@ static long region_del(struct resv_map *resv, long f, long t) struct file_region *rg, *trg; struct file_region *nrg = NULL; long del = 0; +#ifdef CONFIG_CGROUP_HUGETLB + struct page_counter *reservation_counter = NULL; + unsigned long pages_per_hpage = 0; +#endif retry: spin_lock(&resv->lock); @@ -514,6 +609,14 @@ static long region_del(struct resv_map *resv, long f, long t) nrg = list_first_entry(&resv->region_cache, struct file_region, link); +#ifdef CONFIG_CGROUP_HUGETLB + /* + * Save counter information from the deleted + * node, in case we need to do an uncharge. + */ + reservation_counter = nrg->reservation_counter; + pages_per_hpage = nrg->pages_per_hpage; +#endif list_del(&nrg->link); resv->region_cache_count--; } @@ -543,6 +646,14 @@ static long region_del(struct resv_map *resv, long f, long t) if (f <= rg->from && t >= rg->to) { /* Remove entire region */ del += rg->to - rg->from; +#ifdef CONFIG_CGROUP_HUGETLB + /* + * Save counter information from the deleted node, + * in case we need to do an uncharge. + */ + reservation_counter = rg->reservation_counter; + pages_per_hpage = rg->pages_per_hpage; +#endif list_del(&rg->link); kfree(rg); continue; @@ -559,6 +670,19 @@ static long region_del(struct resv_map *resv, long f, long t) spin_unlock(&resv->lock); kfree(nrg); +#ifdef CONFIG_CGROUP_HUGETLB + /* + * If resv->reservation_counter is NULL, then this is shared + * reservation, and the reserved memory is tracked in the file_struct + * entries inside of resv_map. So we need to uncharge the memory here. + */ + if (reservation_counter && pages_per_hpage && del > 0 && + !resv->reservation_counter) { + hugetlb_cgroup_uncharge_counter( + reservation_counter, + del * pages_per_hpage); + } +#endif return del; } @@ -1930,7 +2054,7 @@ static long __vma_reservation_common(struct hstate *h, ret = region_chg(resv, idx, idx + 1); break; case VMA_COMMIT_RESV: - ret = region_add(resv, idx, idx + 1); + ret = region_add(h, resv, idx, idx + 1); break; case VMA_END_RESV: region_abort(resv, idx, idx + 1); @@ -1938,7 +2062,7 @@ static long __vma_reservation_common(struct hstate *h, break; case VMA_ADD_RESV: if (vma->vm_flags & VM_MAYSHARE) - ret = region_add(resv, idx, idx + 1); + ret = region_add(h, resv, idx, idx + 1); else { region_abort(resv, idx, idx + 1); ret = region_del(resv, idx, idx + 1); @@ -4536,7 +4660,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct vm_area_struct *vma, vm_flags_t vm_flags) { - long ret, chg; + long ret, chg, add; struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; @@ -4624,9 +4748,7 @@ int hugetlb_reserve_pages(struct inode *inode, */ ret = hugetlb_acct_memory(h, gbl_reserve); if (ret < 0) { - /* put back original number of pages, chg */ - (void)hugepage_subpool_put_pages(spool, chg); - goto out_err; + goto out_put_pages; } /* @@ -4641,7 +4763,12 @@ int hugetlb_reserve_pages(struct inode *inode, * else has to be done for private mappings here */ if (!vma || vma->vm_flags & VM_MAYSHARE) { - long add = region_add(resv_map, from, to); + add = region_add(h, resv_map, from, to); + if (add < 0) { + ret = -ENOMEM; + goto out_acct_memory; + } + if (unlikely(chg > add)) { /* @@ -4659,10 +4786,15 @@ int hugetlb_reserve_pages(struct inode *inode, } } return 0; +out_acct_memory: + hugetlb_acct_memory(h, -gbl_reserve); +out_put_pages: + /* put back original number of pages, chg */ + (void)hugepage_subpool_put_pages(spool, chg); out_err: if (!vma || vma->vm_flags & VM_MAYSHARE) - /* Don't call region_abort if region_chg failed */ - if (chg >= 0) + /* Don't call region_abort if region_chg or region_add failed */ + if (chg >= 0 && add >= 0) region_abort(resv_map, from, to); if (vma && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) kref_put(&resv_map->refs, resv_map_release); From patchwork Thu Aug 8 23:13:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11084963 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 60CFE14DB for ; Thu, 8 Aug 2019 23:14:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4CEA328826 for ; Thu, 8 Aug 2019 23:14:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 410DA28C02; Thu, 8 Aug 2019 23:14:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C22B028826 for ; Thu, 8 Aug 2019 23:14:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6B90A6B000E; Thu, 8 Aug 2019 19:14:11 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 642A06B0010; Thu, 8 Aug 2019 19:14:11 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4BE116B0266; Thu, 8 Aug 2019 19:14:11 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) by kanga.kvack.org (Postfix) with ESMTP id 03D826B000E for ; Thu, 8 Aug 2019 19:14:11 -0400 (EDT) Received: by mail-pf1-f198.google.com with SMTP id i27so60124811pfk.12 for ; Thu, 08 Aug 2019 16:14:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=d1cgxA8xN6av5hnFjkPhb8pR5IdstRIAmF2SmgVzr5k=; b=inu7ziNlCBwy7CKZaEdtwutmeQP0NGwvQUMeHfl+QqrO7i5YW92ghCUUEkJPbJuOl6 D2HbY6OLVbOwv99Wk2F8b6ldfjpIHZLhwFNhDl3htJVdFmR57mGiwTanRA7B4tMYyd9Y unOXiO/fNAlihO7U7QfWmA7wSuxPlzQmaJdehAjfNbopUBOuCmwfZ1NRuni8dnO0c0fw eTGHP2YEhNj80aXOQzt+fZdXo7MHPLorPk19QmGhPO7wud+9tsERUdRLx6SOkKB+fDk5 J/mg43r+SdHQtPEXUHcciR4NI78iv5m+c8sAz1dwUpmIKRtZ+aTt9uAoj/Fi2r/Ky60N 3dmw== X-Gm-Message-State: APjAAAV0QbsBvWMXHwqUNQfMZqGokuRx4gpWNU1CFzbK+Le/sA0qDrx7 TLhB+NGlBKWUTlk/T/HqpEWv5UoIUrqvvBXkJrCdBHsL1qeR1KTFTkbLXV6vuUndTDi0U6oXhur yEEMiUtTGixKm8b/XmtHhPCjJ/hqdXDdWUk9aLlqgQd3op+uI5q+HUyNLoxtmkJUkTw== X-Received: by 2002:a17:90a:37e9:: with SMTP id v96mr6328207pjb.10.1565306050564; Thu, 08 Aug 2019 16:14:10 -0700 (PDT) X-Received: by 2002:a17:90a:37e9:: with SMTP id v96mr6328113pjb.10.1565306048952; Thu, 08 Aug 2019 16:14:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565306048; cv=none; d=google.com; s=arc-20160816; b=AEGNs3lTQXPw7hhp7b3maEW/qbE8Zijoi0/X6kL7yAATLC1r2eLgQsYzd/bV0OREXO Z056vN0Npg75pwNrsKP5c7iff4g0dg6uh2lmIPqwCFDdjgqhhVFafiZOhIqCTlaVEoPN HMAo8p8bIKIq1EVej8FmCEtVJYoA2xv6Z+IZCa0DuxO5FRvVGuUeHtxyDhpm6YNo8TS/ 8A95OKOv5U8+wuLMYyaFxuF0Ieahd+kmk5BcqVoWnfyTJwBcnd7MqBhac/dQlgwWB5ij Aq/e79rujnD/E83WW9+UFVq55LTkB34Swh6iY9M/OMkDcCdI115wjWbIX6LMakhivi3f LxRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:dkim-signature; bh=d1cgxA8xN6av5hnFjkPhb8pR5IdstRIAmF2SmgVzr5k=; b=D6MgfMmd2DE4xTeYbLNLrnRcXhjUZhQk7CSvUwJFFaGG74hRVALC1+RUmF4FJ3YgQc T7BT64NZ6lgWHbPprg2woTE1nyOYXmgdKy+kOU8+46LT3aR1s1Yew8vJJXrNuzl66MJQ 7Y1mvCwIC1nDa+QCSOTqhDCptyHYncQyQyb8cbFE/aQNGx5pPvONwqSxSVE4NPnq0y4Z yZ2ymbXBMVx/Ub14ho+R1Xcp77TiPJ2eTDjZnfLwJn11zjg9Jb0j54NGwrxME95GzLTG /CLwq11eSpOiwsLU/W2W1h2bFj9ohQtFB3Yb+KJb4cJJ+r67uEEidezrxgm6xOIYCSzE NtAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=IwfZdbos; spf=pass (google.com: domain of 3wkxmxqskcd0zklzrqxlhmzfnnfkd.bnlkhmtw-lljuzbj.nqf@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3wKxMXQsKCD0ZklZrqxlhmZfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from mail-sor-f73.google.com (mail-sor-f73.google.com. [209.85.220.73]) by mx.google.com with SMTPS id b28sor66730890pgb.4.2019.08.08.16.14.08 for (Google Transport Security); Thu, 08 Aug 2019 16:14:08 -0700 (PDT) Received-SPF: pass (google.com: domain of 3wkxmxqskcd0zklzrqxlhmzfnnfkd.bnlkhmtw-lljuzbj.nqf@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) client-ip=209.85.220.73; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=IwfZdbos; spf=pass (google.com: domain of 3wkxmxqskcd0zklzrqxlhmzfnnfkd.bnlkhmtw-lljuzbj.nqf@flex--almasrymina.bounces.google.com designates 209.85.220.73 as permitted sender) smtp.mailfrom=3wKxMXQsKCD0ZklZrqxlhmZfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--almasrymina.bounces.google.com; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=d1cgxA8xN6av5hnFjkPhb8pR5IdstRIAmF2SmgVzr5k=; b=IwfZdbosMiu95mLiae4AwMODJo4FKPtA4JwD1pSpG1feBJrEYcuyl/qr4podVHGG6U VbpDS9JV/Ome33bFK34i1SN+rOy8JCFcruspgnyoM58PPEv+sfolFpiobFpZ2h9x7jmm RAHTAP31xUqIJn04qM3m8+a73nDofh5SHqgYxyO2fbc5BV+Tqoi9OBgnXP+Ke9d/2r5Y YoWaYwkzdKkRLFzbHVRjZfnmBlvPKu81brxtGnNygGjUlBitVM39dqsVGEf31TTaTYav fgzjJDHh99d8qUrBgK2K2/XKw3IwMy7KndZqfgQrIfzI6hYGrKvr+X+eILaQthrF+/+O 8W2A== X-Google-Smtp-Source: APXvYqwYzi9jh//Psx9g0OB/k9SlW/17A73AOt+JvBSTeJcG0YgzpaChl+Yv+VirBfKazBqnaM+6BUwBSHUzZHSslw== X-Received: by 2002:a63:fc52:: with SMTP id r18mr14828014pgk.378.1565306048234; Thu, 08 Aug 2019 16:14:08 -0700 (PDT) Date: Thu, 8 Aug 2019 16:13:40 -0700 In-Reply-To: <20190808231340.53601-1-almasrymina@google.com> Message-Id: <20190808231340.53601-6-almasrymina@google.com> Mime-Version: 1.0 References: <20190808231340.53601-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.rc1.153.gdeed80330f-goog Subject: [RFC PATCH v2 5/5] hugetlb_cgroup: Add hugetlb_cgroup reservation tests From: Mina Almasry To: mike.kravetz@oracle.com Cc: shuah@kernel.org, almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, khalid.aziz@oracle.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The tests use both shared and private mapped hugetlb memory, and monitors the hugetlb usage counter as well as the hugetlb reservation counter. They test different configurations such as hugetlb memory usage via hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without MAP_POPULATE. --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 4 + .../selftests/vm/charge_reserved_hugetlb.sh | 438 ++++++++++++++++++ .../selftests/vm/write_hugetlb_memory.sh | 22 + .../testing/selftests/vm/write_to_hugetlbfs.c | 252 ++++++++++ 5 files changed, 717 insertions(+) create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c -- 2.23.0.rc1.153.gdeed80330f-goog diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31b3c98b6d34d..d3bed9407773c 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -14,3 +14,4 @@ virtual_address_range gup_benchmark va_128TBswitch map_fixed_noreplace +write_to_hugetlbfs diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 9534dc2bc9295..8d37d5409b52c 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,6 +18,7 @@ TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range +TEST_GEN_FILES += write_to_hugetlbfs TEST_PROGS := run_vmtests @@ -29,3 +30,6 @@ include ../lib.mk $(OUTPUT)/userfaultfd: LDLIBS += -lpthread $(OUTPUT)/mlock-random-test: LDLIBS += -lcap + +# Why does adding $(OUTPUT)/ like above not apply this flag..? +write_to_hugetlbfs: CFLAGS += -static diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh new file mode 100755 index 0000000000000..bf0b6dcec9977 --- /dev/null +++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh @@ -0,0 +1,438 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +set -e + +cgroup_path=/dev/cgroup/memory +if [[ ! -e $cgroup_path ]]; then + mkdir -p $cgroup_path + mount -t cgroup -o hugetlb,memory cgroup $cgroup_path +fi + +cleanup () { + echo $$ > $cgroup_path/tasks + + set +e + if [[ "$(pgrep write_to_hugetlbfs)" != "" ]]; then + kill -2 write_to_hugetlbfs + # Wait for hugetlbfs memory to get depleted. + sleep 0.5 + fi + set -e + + if [[ -e /mnt/huge ]]; then + rm -rf /mnt/huge/* + umount /mnt/huge || echo error + rmdir /mnt/huge + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test1 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test1 + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test2 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test2 + fi + echo 0 > /proc/sys/vm/nr_hugepages + echo CLEANUP DONE +} + +cleanup + +function expect_equal() { + local expected="$1" + local actual="$2" + local error="$3" + + if [[ "$expected" != "$actual" ]]; then + echo "expected ($expected) != actual ($actual): $3" + cleanup + exit 1 + fi +} + +function setup_cgroup() { + local name="$1" + local cgroup_limit="$2" + local reservation_limit="$3" + + mkdir $cgroup_path/$name + + echo writing cgroup limit: "$cgroup_limit" + echo "$cgroup_limit" > $cgroup_path/$name/hugetlb.2MB.limit_in_bytes + + echo writing reseravation limit: "$reservation_limit" + echo "$reservation_limit" > \ + $cgroup_path/$name/hugetlb.2MB.reservation_limit_in_bytes +} + +function write_hugetlbfs_and_get_usage() { + local cgroup="$1" + local size="$2" + local populate="$3" + local write="$4" + local path="$5" + local method="$6" + local private="$7" + local expect_failure="$8" + + # Function return values. + reservation_failed=0 + oom_killed=0 + hugetlb_difference=0 + reserved_difference=0 + + local hugetlb_usage=$cgroup_path/$cgroup/hugetlb.2MB.usage_in_bytes + local reserved_usage=$cgroup_path/$cgroup/hugetlb.2MB.reservation_usage_in_bytes + + local hugetlb_before=$(cat $hugetlb_usage) + local reserved_before=$(cat $reserved_usage) + + echo + echo Starting: + echo hugetlb_usage="$hugetlb_before" + echo reserved_usage="$reserved_before" + echo expect_failure is "$expect_failure" + + set +e + if [[ "$method" == "1" ]] || [[ "$method" == 2 ]] || \ + [[ "$private" == "-r" ]] && [[ "$expect_failure" != 1 ]]; then + bash write_hugetlb_memory.sh "$size" "$populate" "$write" \ + "$cgroup" "$path" "$method" "$private" "-l" & + + local write_result=$? + # This sleep is to make sure that the script above has had enough + # time to do its thing, since it runs in the background. This may + # cause races... + sleep 0.5 + echo write_result is $write_result + else + bash write_hugetlb_memory.sh "$size" "$populate" "$write" \ + "$cgroup" "$path" "$method" "$private" + local write_result=$? + fi + set -e + + if [[ "$write_result" == 1 ]]; then + reservation_failed=1 + fi + + # On linus/master, the above process gets SIGBUS'd on oomkill, with + # return code 135. On earlier kernels, it gets actual oomkill, with return + # code 137, so just check for both conditions incase we're testing against + # an earlier kernel. + if [[ "$write_result" == 135 ]] || [[ "$write_result" == 137 ]]; then + oom_killed=1 + fi + + local hugetlb_after=$(cat $hugetlb_usage) + local reserved_after=$(cat $reserved_usage) + + echo After write: + echo hugetlb_usage="$hugetlb_after" + echo reserved_usage="$reserved_after" + + hugetlb_difference=$(($hugetlb_after - $hugetlb_before)) + reserved_difference=$(($reserved_after - $reserved_before)) +} + +function cleanup_hugetlb_memory() { + set +e + if [[ "$(pgrep write_to_hugetlbfs)" != "" ]]; then + echo kiling write_to_hugetlbfs + killall -2 write_to_hugetlbfs + # Wait for hugetlbfs memory to get depleted. + sleep 0.5 + fi + set -e + + if [[ -e /mnt/huge ]]; then + rm -rf /mnt/huge/* + umount /mnt/huge + rmdir /mnt/huge + fi +} + +function run_test() { + local size="$1" + local populate="$2" + local write="$3" + local cgroup_limit="$4" + local reservation_limit="$5" + local nr_hugepages="$6" + local method="$7" + local private="$8" + local expect_failure="$9" + + # Function return values. + hugetlb_difference=0 + reserved_difference=0 + reservation_failed=0 + oom_killed=0 + + echo nr hugepages = "$nr_hugepages" + echo "$nr_hugepages" > /proc/sys/vm/nr_hugepages + + setup_cgroup "hugetlb_cgroup_test" "$cgroup_limit" "$reservation_limit" + + mkdir -p /mnt/huge + mount -t hugetlbfs \ + -o pagesize=2M,size=256M none /mnt/huge + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test" "$size" "$populate" \ + "$write" "/mnt/huge/test" "$method" "$private" "$expect_failure" + + cleanup_hugetlb_memory + + local final_hugetlb=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.2MB.usage_in_bytes) + local final_reservation=$(cat $cgroup_path/hugetlb_cgroup_test/hugetlb.2MB.reservation_usage_in_bytes) + + expect_equal "0" "$final_hugetlb" "final hugetlb is not zero" + expect_equal "0" "$final_reservation" "final reservation is not zero" +} + +function run_multiple_cgroup_test() { + local size1="$1" + local populate1="$2" + local write1="$3" + local cgroup_limit1="$4" + local reservation_limit1="$5" + + local size2="$6" + local populate2="$7" + local write2="$8" + local cgroup_limit2="$9" + local reservation_limit2="${10}" + + local nr_hugepages="${11}" + local method="${12}" + local private="${13}" + local expect_failure="${14}" + + # Function return values. + hugetlb_difference1=0 + reserved_difference1=0 + reservation_failed1=0 + oom_killed1=0 + + hugetlb_difference2=0 + reserved_difference2=0 + reservation_failed2=0 + oom_killed2=0 + + + echo nr hugepages = "$nr_hugepages" + echo "$nr_hugepages" > /proc/sys/vm/nr_hugepages + + setup_cgroup "hugetlb_cgroup_test1" "$cgroup_limit1" "$reservation_limit1" + setup_cgroup "hugetlb_cgroup_test2" "$cgroup_limit2" "$reservation_limit2" + + mkdir -p /mnt/huge + mount -t hugetlbfs \ + -o pagesize=2M,size=256M none /mnt/huge + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test1" "$size1" \ + "$populate1" "$write1" "/mnt/huge/test1" "$method" "$private" \ + "$expect_failure" + + hugetlb_difference1=$hugetlb_difference + reserved_difference1=$reserved_difference + reservation_failed1=$reservation_failed + oom_killed1=$oom_killed + + local cgroup1_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.2MB.usage_in_bytes + local cgroup1_reservation_usage=$cgroup_path/hugetlb_cgroup_test1/hugetlb.2MB.reservation_usage_in_bytes + local cgroup2_hugetlb_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.2MB.usage_in_bytes + local cgroup2_reservation_usage=$cgroup_path/hugetlb_cgroup_test2/hugetlb.2MB.reservation_usage_in_bytes + + local usage_before_second_write=$(cat $cgroup1_hugetlb_usage) + local reservation_usage_before_second_write=$(cat \ + $cgroup1_reservation_usage) + + write_hugetlbfs_and_get_usage "hugetlb_cgroup_test2" "$size2" \ + "$populate2" "$write2" "/mnt/huge/test2" "$method" "$private" \ + "$expect_failure" + + hugetlb_difference2=$hugetlb_difference + reserved_difference2=$reserved_difference + reservation_failed2=$reservation_failed + oom_killed2=$oom_killed + + expect_equal "$usage_before_second_write" \ + "$(cat $cgroup1_hugetlb_usage)" "Usage changed." + expect_equal "$reservation_usage_before_second_write" \ + "$(cat $cgroup1_reservation_usage)" "Reservation usage changed." + + cleanup_hugetlb_memory + + local final_hugetlb=$(cat $cgroup1_hugetlb_usage) + local final_reservation=$(cat $cgroup1_reservation_usage) + + expect_equal "0" "$final_hugetlb" \ + "hugetlbt_cgroup_test1 final hugetlb is not zero" + expect_equal "0" "$final_reservation" \ + "hugetlbt_cgroup_test1 final reservation is not zero" + + local final_hugetlb=$(cat $cgroup2_hugetlb_usage) + local final_reservation=$(cat $cgroup2_reservation_usage) + + expect_equal "0" "$final_hugetlb" \ + "hugetlb_cgroup_test2 final hugetlb is not zero" + expect_equal "0" "$final_reservation" \ + "hugetlb_cgroup_test2 final reservation is not zero" +} + +for private in "" "-r" ; do +for populate in "" "-o"; do +for method in 0 1 2; do + +# Skip mmap(MAP_HUGETLB | MAP_SHARED). Doesn't seem to be supported. +if [[ "$method" == 1 ]] && [[ "$private" == "" ]]; then + continue +fi + +# Skip populated shmem tests. Doesn't seem to be supported. +if [[ "$method" == 2"" ]] && [[ "$populate" == "-o" ]]; then + continue +fi + +cleanup +echo +echo +echo +echo Test normal case. +echo private=$private, populate=$populate, method=$method +run_test $((10 * 1024 * 1024)) "$populate" "" $((20 * 1024 * 1024)) \ + $((20 * 1024 * 1024)) 10 "$method" "$private" "0" + +echo Memory charged to hugtlb=$hugetlb_difference +echo Memory charged to reservation=$reserved_difference + +if [[ "$populate" == "-o" ]]; then + expect_equal "$((10 * 1024 * 1024))" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." +else + expect_equal "0" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." +fi + +expect_equal "$((10 * 1024 * 1024))" "$reserved_difference" \ + "Reserved memory not charged to reservation usage." +echo 'PASS' + +cleanup +echo +echo +echo +echo Test normal case with write. +echo private=$private, populate=$populate, method=$method +run_test $((10 * 1024 * 1024)) "$populate" '-w' $((20 * 1024 * 1024)) \ + $((20 * 1024 * 1024)) 10 "$method" "$private" "0" + +echo Memory charged to hugtlb=$hugetlb_difference +echo Memory charged to reservation=$reserved_difference + +expect_equal "$((10 * 1024 * 1024))" "$hugetlb_difference" \ + "Reserved memory charged to hugetlb cgroup." +expect_equal "$((10 * 1024 * 1024))" "$reserved_difference" \ + "Reserved memory not charged to reservation usage." +echo 'PASS' + + +cleanup +echo +echo +echo +echo Test more than reservation case. +echo private=$private, populate=$populate, method=$method +run_test "$((10 * 1024 * 1024))" "$populate" '' "$((20 * 1024 * 1024))" \ + "$((5 * 1024 * 1024))" "10" "$method" "$private" "1" + +expect_equal "1" "$reservation_failed" "Reservation succeeded." +echo 'PASS' + +cleanup + +echo +echo +echo +echo Test more than cgroup limit case. +echo private=$private, populate=$populate, method=$method + +# Not sure if shm memory can be cleaned up when the process gets sigbus'd. +if [[ "$method" != 2 ]]; then + run_test $((10 * 1024 * 1024)) "$populate" "-w" $((5 * 1024 * 1024)) \ + $((20 * 1024 * 1024)) 10 "$method" "$private" "1" + + expect_equal "1" "$oom_killed" "Not oom killed." +fi +echo 'PASS' + +cleanup + +echo +echo +echo +echo Test normal case, multiple cgroups. +echo private=$private, populate=$populate, method=$method +run_multiple_cgroup_test "$((6 * 1024 * 1024))" "$populate" "" \ + "$((20 * 1024 * 1024))" "$((20 * 1024 * 1024))" "$((10 * 1024 * 1024))" \ + "$populate" "" "$((20 * 1024 * 1024))" "$((20 * 1024 * 1024))" "10" \ + "$method" "$private" "0" + +echo Memory charged to hugtlb1=$hugetlb_difference1 +echo Memory charged to reservation1=$reserved_difference1 +echo Memory charged to hugtlb2=$hugetlb_difference2 +echo Memory charged to reservation2=$reserved_difference2 + +expect_equal "$((6 * 1024 * 1024))" "$reserved_difference1" \ + "Incorrect reservations charged to cgroup 1." +expect_equal "$((10 * 1024 * 1024))" "$reserved_difference2" \ + "Incorrect reservation charged to cgroup 2." +if [[ "$populate" == "-o" ]]; then + expect_equal "$((6 * 1024 * 1024))" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." + expect_equal "$((10 * 1024 * 1024))" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." +else + expect_equal "0" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." + expect_equal "0" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." +fi +echo 'PASS' + +cleanup +echo +echo +echo +echo Test normal case with write, multiple cgroups. +echo private=$private, populate=$populate, method=$method +run_multiple_cgroup_test "$((6 * 1024 * 1024))" "$populate" "-w" \ + "$((20 * 1024 * 1024))" "$((20 * 1024 * 1024))" "$((10 * 1024 * 1024))" \ + "$populate" "-w" "$((20 * 1024 * 1024))" "$((20 * 1024 * 1024))" "10" \ + "$method" "$private" "0" + +echo Memory charged to hugtlb1=$hugetlb_difference1 +echo Memory charged to reservation1=$reserved_difference1 +echo Memory charged to hugtlb2=$hugetlb_difference2 +echo Memory charged to reservation2=$reserved_difference2 + +expect_equal "$((6 * 1024 * 1024))" "$hugetlb_difference1" \ + "Incorrect hugetlb charged to cgroup 1." +expect_equal "$((6 * 1024 * 1024))" "$reserved_difference1" \ + "Incorrect reservation charged to cgroup 1." +expect_equal "$((10 * 1024 * 1024))" "$hugetlb_difference2" \ + "Incorrect hugetlb charged to cgroup 2." +expect_equal "$((10 * 1024 * 1024))" "$reserved_difference2" \ + "Incorrected reservation charged to cgroup 2." + +echo 'PASS' + +done # private +done # populate +done # method + +umount $cgroup_path +rmdir $cgroup_path diff --git a/tools/testing/selftests/vm/write_hugetlb_memory.sh b/tools/testing/selftests/vm/write_hugetlb_memory.sh new file mode 100644 index 0000000000000..08f5fa5527cfd --- /dev/null +++ b/tools/testing/selftests/vm/write_hugetlb_memory.sh @@ -0,0 +1,22 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +set -e + +size=$1 +populate=$2 +write=$3 +cgroup=$4 +path=$5 +method=$6 +private=$7 +want_sleep=$8 + +echo "Putting task in cgroup '$cgroup'" +echo $$ > /dev/cgroup/memory/"$cgroup"/tasks + +echo "Method is $method" + +set +e +./write_to_hugetlbfs -p "$path" -s "$size" "$write" "$populate" -m "$method" \ + "$private" "$want_sleep" diff --git a/tools/testing/selftests/vm/write_to_hugetlbfs.c b/tools/testing/selftests/vm/write_to_hugetlbfs.c new file mode 100644 index 0000000000000..f02a897427a97 --- /dev/null +++ b/tools/testing/selftests/vm/write_to_hugetlbfs.c @@ -0,0 +1,252 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program reserves and uses hugetlb memory, supporting a bunch of + * scenorios needed by the charged_reserved_hugetlb.sh test. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Global definitions. */ +enum method { + HUGETLBFS, + MMAP_MAP_HUGETLB, + SHM, + MAX_METHOD +}; + + +/* Global variables. */ +static const char *self; +static char *shmaddr; +static int shmid; + +/* + * Show usage and exit. + */ +static void exit_usage(void) +{ + + printf("Usage: %s -p -s " + "[-m <0=hugetlbfs | 1=mmap(MAP_HUGETLB)>] [-l] [-r] " + "[-o] [-w]\n", self); + exit(EXIT_FAILURE); +} + +void sig_handler(int signo) +{ + printf("Received %d.\n", signo); + if (signo == SIGINT) { + printf("Deleting the memory\n"); + if (shmdt((const void *)shmaddr) != 0) { + perror("Detach failure"); + shmctl(shmid, IPC_RMID, NULL); + exit(4); + } + + shmctl(shmid, IPC_RMID, NULL); + printf("Done deleting the memory\n"); + } + exit(2); +} + +int main(int argc, char **argv) +{ + int fd = 0; + int key = 0; + int *ptr = NULL; + int c = 0; + int size = 0; + char path[256] = ""; + enum method method = MAX_METHOD; + int want_sleep = 0, private = 0; + int populate = 0; + int write = 0; + + unsigned long i; + + + if (signal(SIGINT, sig_handler) == SIG_ERR) + err(1, "\ncan't catch SIGINT\n"); + + /* Parse command-line arguments. */ + setvbuf(stdout, NULL, _IONBF, 0); + self = argv[0]; + + while ((c = getopt(argc, argv, "s:p:m:owlr")) != -1) { + switch (c) { + case 's': + size = atoi(optarg); + break; + case 'p': + strncpy(path, optarg, sizeof(path)); + break; + case 'm': + if (atoi(optarg) >= MAX_METHOD) { + errno = EINVAL; + perror("Invalid -m."); + exit_usage(); + } + method = atoi(optarg); + break; + case 'o': + populate = 1; + break; + case 'w': + write = 1; + break; + case 'l': + want_sleep = 1; + break; + case 'r': + private = 1; + break; + default: + errno = EINVAL; + perror("Invalid arg"); + exit_usage(); + } + } + + if (strncmp(path, "", sizeof(path)) != 0) { + printf("Writing to this path: %s\n", path); + } else { + errno = EINVAL; + perror("path not found"); + exit_usage(); + } + + if (size != 0) { + printf("Writing this size: %d\n", size); + } else { + errno = EINVAL; + perror("size not found"); + exit_usage(); + } + + if (!populate) + printf("Not populating.\n"); + else + printf("Populating.\n"); + + if (!write) + printf("Not writing to memory.\n"); + + if (method == MAX_METHOD) { + errno = EINVAL; + perror("-m Invalid"); + exit_usage(); + } else + printf("Using method=%d\n", method); + + if (!private) + printf("Shared mapping.\n"); + else + printf("Private mapping.\n"); + + + switch (method) { + case HUGETLBFS: + printf("Allocating using HUGETLBFS.\n"); + fd = open(path, O_CREAT | O_RDWR, 0777); + if (fd == -1) + err(1, "Failed to open file."); + + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, + (private ? MAP_PRIVATE : MAP_SHARED) | (populate ? + MAP_POPULATE : 0), fd, 0); + + if (ptr == MAP_FAILED) { + close(fd); + err(1, "Error mapping the file"); + } + break; + case MMAP_MAP_HUGETLB: + printf("Allocating using MAP_HUGETLB.\n"); + ptr = mmap(NULL, size, + PROT_READ | PROT_WRITE, + (private ? (MAP_PRIVATE | MAP_ANONYMOUS) : MAP_SHARED) | + MAP_HUGETLB | (populate ? + MAP_POPULATE : 0), + -1, 0); + + if (ptr == MAP_FAILED) + err(1, "mmap"); + + printf("Returned address is %p\n", ptr); + break; + case SHM: + printf("Allocating using SHM.\n"); + shmid = shmget(key, size, SHM_HUGETLB | IPC_CREAT | SHM_R | + SHM_W); + if (shmid < 0) { + shmid = shmget(++key, size, SHM_HUGETLB | IPC_CREAT | + SHM_R | SHM_W); + if (shmid < 0) + err(1, "shmget"); + + } + printf("shmid: 0x%x, shmget key:%d\n", shmid, key); + + shmaddr = shmat(shmid, NULL, 0); + if (shmaddr == (char *)-1) { + perror("Shared memory attach failure"); + shmctl(shmid, IPC_RMID, NULL); + exit(2); + } + printf("shmaddr: %p\n", shmaddr); + + break; + default: + errno = EINVAL; + err(1, "Invalid method."); + } + + if (write) { + printf("Writing to memory.\n"); + if (method != SHM) { + memset(ptr, 1, size); + } else { + printf("Starting the writes:\n"); + for (i = 0; i < size; i++) { + shmaddr[i] = (char)(i); + if (!(i % (1024 * 1024))) + printf("."); + } + printf("\n"); + + printf("Starting the Check..."); + for (i = 0; i < size; i++) + if (shmaddr[i] != (char)i) { + printf("\nIndex %lu mismatched\n", i); + exit(3); + } + printf("Done.\n"); + + + } + } + + if (want_sleep) { + /* Signal to caller that we're done. */ + printf("DONE\n"); + + /* Hold memory until external kill signal is delivered. */ + while (1) + sleep(100); + } + + close(fd); + + return 0; +}