From patchwork Thu Nov 11 23:42:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12615883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE43DC4332F for ; Thu, 11 Nov 2021 23:42:21 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D0FC61213 for ; Thu, 11 Nov 2021 23:42:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2D0FC61213 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 7BA5B6B0078; Thu, 11 Nov 2021 18:42:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7433B6B007D; Thu, 11 Nov 2021 18:42:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 596096B0081; Thu, 11 Nov 2021 18:42:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0136.hostedemail.com [216.40.44.136]) by kanga.kvack.org (Postfix) with ESMTP id 414086B0078 for ; Thu, 11 Nov 2021 18:42:20 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id DF76418544810 for ; Thu, 11 Nov 2021 23:42:19 +0000 (UTC) X-FDA: 78798275598.26.C977A1A Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) by imf17.hostedemail.com (Postfix) with ESMTP id 98AB0F0003A5 for ; Thu, 11 Nov 2021 23:42:16 +0000 (UTC) Received: by mail-pg1-f202.google.com with SMTP id x14-20020a63cc0e000000b002a5bc462947so3958079pgf.20 for ; Thu, 11 Nov 2021 
15:42:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=hhhBGiHo1owFEumWbv75KK8v4aTjrOx44gQLpz28MgY=; b=o27QKmXQPNCMqa688bbrGvW7CtGtFWwPMXw+RccYjhtzl2PwlMRHXHAniX3XlwYCDj kv8gAmHLgJbDlOKWBeqGUGX73BaFEuZjQBpNlg1QXtfFBQh1C2g7TjA5bN5hdDQhRx+8 CjxPPNornrQYu6LnQG6I+LqkxRYKcXDJckrTQtK+wDopcS/IypADOOpeqvhzJoRsptan TUDJzjz3++Y2z2kc3uUaiKmUk3fQcxVzEWwWJeQ+lH6Q3SiwfyQDF9dB7CmL13RHAZaB N4roNRazhHVckPCyCIpa8zWWDeJ+RoizBsNqzzpIQm6dPtZwfTmfBPv6t/pHUmZNcmQ6 gYxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=hhhBGiHo1owFEumWbv75KK8v4aTjrOx44gQLpz28MgY=; b=TFYsnZ+ehs78G0yOhSCgYL/NXZGd3dr8uS9Af/EPaRBwtxCupGhgD/FvAMw4vMq9ki p7tj52xN5bbfCbe9Lm9Egu3WPstyih3mCsxoQ9G01+4Phta676DNzNH/hvwgQjxxmj5E a2ph3QnzRrwb9KvGRtCDaVXHlb8gXbFfr/oPBUPH1JnDJzSP5sPgXFBuHR0TxQpH+Vu4 j7cwbkwzrgTSFDusaItA1j3GOKX/4sKz3j0P80EDfbu7JkzlPOsg6lIvHyfTowH6NTNN 6Nw2jKGHGTG37c/92fqzUisRxloMP0mGoS3RWh1Jtc0T0b5UU1caDZtjYU9HprP5VDsB /kig== X-Gm-Message-State: AOAM533dBsn1vdc+UREjrMnctMCbOp/CcwinITZIRQ0WBoZl2cwR14L2 7jZXiLYHz6FeUSr0bMElFFx+NvVZtuBgVtGslA== X-Google-Smtp-Source: ABdhPJx3FblJFq44ccOJ4GfqmDBVFIwJcFiOtzTzF7YQVBH1uNod7uW1RLDA2KYkQpmbvpJQmY6aj48cRbDPjeTP9w== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:672d:70d0:3f83:676d]) (user=almasrymina job=sendgmr) by 2002:a65:560c:: with SMTP id l12mr7176108pgs.375.1636674135556; Thu, 11 Nov 2021 15:42:15 -0800 (PST) Date: Thu, 11 Nov 2021 15:42:00 -0800 In-Reply-To: <20211111234203.1824138-1-almasrymina@google.com> Message-Id: <20211111234203.1824138-2-almasrymina@google.com> Mime-Version: 1.0 References: <20211111234203.1824138-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc1.387.gb447b232ab-goog Subject: [PATCH v3 1/4] mm/shmem: support deterministic charging of tmpfs From: Mina 
Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , Muchun Song , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 98AB0F0003A5 X-Stat-Signature: ijekwexibaauzdt7w3jcmibegc97ncn4 Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=o27QKmXQ; spf=pass (imf17.hostedemail.com: domain of 3V6qNYQsKCPQWhiWonuiejWckkcha.Ykihejqt-iigrWYg.knc@flex--almasrymina.bounces.google.com designates 209.85.215.202 as permitted sender) smtp.mailfrom=3V6qNYQsKCPQWhiWonuiejWckkcha.Ykihejqt-iigrWYg.knc@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636674136-831615 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add memcg= option to shmem mount. Users can specify this option at mount time, and all data page allocations will be charged to the supplied memcg. Processes are only allowed to direct tmpfs charges to a cgroup that they themselves can enter and allocate memory in. Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins CC: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org Reported-by: kernel test robot --- Changes in v3: - Fixed build failures/warnings Reported-by: kernel test robot Changes in v2: - Fixed Roman's email. - Added a new wrapper around charge_memcg() instead of __mem_cgroup_charge() - Merged the permission check into this patch as Roman suggested. 
- Instead of checking for an s_memcg_to_charge off the superblock in the filemap code, I call set_active_memcg() before calling into the generic fs code, as Dave suggested. - I have kept the s_memcg_to_charge in the superblock to keep the struct address_space pointer small and preserve the remount use case. --- fs/super.c | 7 ++ include/linux/fs.h | 5 ++ include/linux/memcontrol.h | 58 +++++++++++++++++ mm/memcontrol.c | 130 +++++++++++++++++++++++++++++++++++++ mm/shmem.c | 73 ++++++++++++++++++++- 5 files changed, 271 insertions(+), 2 deletions(-) -- 2.34.0.rc1.387.gb447b232ab-goog diff --git a/fs/super.c b/fs/super.c index 3bfc0f8fbd5bc..5484b08ba0025 100644 --- a/fs/super.c +++ b/fs/super.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include /* for the emergency remount stuff */ @@ -180,6 +181,9 @@ static void destroy_unused_super(struct super_block *s) up_write(&s->s_umount); list_lru_destroy(&s->s_dentry_lru); list_lru_destroy(&s->s_inode_lru); +#if CONFIG_MEMCG + mem_cgroup_set_charge_target(&s->s_memcg_to_charge, NULL); +#endif security_sb_free(s); put_user_ns(s->s_user_ns); kfree(s->s_subtype); @@ -292,6 +296,9 @@ static void __put_super(struct super_block *s) WARN_ON(s->s_dentry_lru.node); WARN_ON(s->s_inode_lru.node); WARN_ON(!list_empty(&s->s_mounts)); +#if CONFIG_MEMCG + mem_cgroup_set_charge_target(&s->s_memcg_to_charge, NULL); +#endif security_sb_free(s); fscrypt_sb_free(s); put_user_ns(s->s_user_ns); diff --git a/include/linux/fs.h b/include/linux/fs.h index 3afca821df32e..59407b3e7aee3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1567,6 +1567,11 @@ struct super_block { struct workqueue_struct *s_dio_done_wq; struct hlist_head s_pins; +#ifdef CONFIG_MEMCG + /* memcg to charge for pages allocated to this filesystem */ + struct mem_cgroup *s_memcg_to_charge; +#endif + /* * Owning user namespace and default context in which to * interpret filesystem uids, gids, quotas, device nodes, diff --git 
a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0c5c403f4be6b..8583d37c05d9b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -27,6 +27,7 @@ struct obj_cgroup; struct page; struct mm_struct; struct kmem_cache; +struct super_block; /* Cgroup-specific page state, on top of universal node page state */ enum memcg_stat_item { @@ -713,6 +714,9 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, return __mem_cgroup_charge(folio, mm, gfp); } +int mem_cgroup_charge_memcg(struct folio *folio, struct mem_cgroup *memcg, + gfp_t gfp); + int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry); void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry); @@ -923,6 +927,24 @@ static inline bool mem_cgroup_online(struct mem_cgroup *memcg) return !!(memcg->css.flags & CSS_ONLINE); } +struct mem_cgroup * +mem_cgroup_mapping_get_charge_target(struct address_space *mapping); + +static inline void mem_cgroup_put_memcg(struct mem_cgroup *memcg) +{ + if (memcg) + css_put(&memcg->css); +} + +void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg); +struct mem_cgroup *mem_cgroup_get_from_path(const char *path); +/** + * User is responsible for providing a buffer @buf of length @len and freeing + * it. 
+ */ +int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len); + void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages); @@ -1223,6 +1245,42 @@ static inline int mem_cgroup_charge(struct folio *folio, return 0; } +static inline int mem_cgroup_charge_memcg(struct folio *folio, + struct mem_cgroup *memcg, + gfp_t gfp_mask) +{ + return 0; +} + +static inline struct mem_cgroup * +mem_cgroup_mapping_get_charge_target(struct address_space *mapping) +{ + return NULL; +} + +static inline void mem_cgroup_put_memcg(struct mem_cgroup *memcg) +{ +} + +static inline void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg) +{ +} + +static inline struct mem_cgroup *mem_cgroup_get_from_path(const char *path) +{ + return NULL; +} + +static inline int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, + size_t len) +{ + if (len < 1) + return -EINVAL; + buf[0] = '\0'; + return 0; +} + static inline int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 781605e920153..b3d8f52a63d17 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -62,6 +62,7 @@ #include #include #include +#include #include "internal.h" #include #include @@ -2580,6 +2581,126 @@ void mem_cgroup_handle_over_high(void) css_put(&memcg->css); } +/* + * Non error return value must eventually be released with css_put(). 
+ */ +struct mem_cgroup *mem_cgroup_get_from_path(const char *path) +{ + static const char procs_filename[] = "/cgroup.procs"; + struct file *file, *procs; + struct cgroup_subsys_state *css; + struct mem_cgroup *memcg; + char *procs_path = + kmalloc(strlen(path) + sizeof(procs_filename), GFP_KERNEL); + + if (procs_path == NULL) + return ERR_PTR(-ENOMEM); + strcpy(procs_path, path); + strcat(procs_path, procs_filename); + + procs = filp_open(procs_path, O_WRONLY, 0); + kfree(procs_path); + + /* + * Restrict the capability for tasks to mount with memcg charging to the + * cgroup they could not join. For example, disallow: + * + * mount -t tmpfs -o memcg=root-cgroup nodev + * + * if it is a non-root task. + */ + if (IS_ERR(procs)) + return (struct mem_cgroup *)procs; + fput(procs); + + file = filp_open(path, O_DIRECTORY | O_RDONLY, 0); + if (IS_ERR(file)) + return (struct mem_cgroup *)file; + + css = css_tryget_online_from_dir(file->f_path.dentry, + &memory_cgrp_subsys); + if (IS_ERR(css)) + memcg = (struct mem_cgroup *)css; + else + memcg = container_of(css, struct mem_cgroup, css); + + fput(file); + return memcg; +} + +/* + * Get the name of the optional charge target memcg associated with @sb. This + * is the cgroup name, not the cgroup path. + */ +int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len) +{ + struct mem_cgroup *memcg; + int ret = 0; + + buf[0] = '\0'; + + rcu_read_lock(); + memcg = rcu_dereference(sb->s_memcg_to_charge); + if (memcg && !css_tryget_online(&memcg->css)) + memcg = NULL; + rcu_read_unlock(); + + if (!memcg) + return 0; + + ret = cgroup_path(memcg->css.cgroup, buf + len / 2, len / 2); + if (ret >= len / 2) + strcpy(buf, "?"); + else { + char *p = mangle_path(buf, buf + len / 2, " \t\n\\"); + + if (p) + *p = '\0'; + else + strcpy(buf, "?"); + } + + css_put(&memcg->css); + return ret < 0 ? ret : 0; +} + +/* + * Set or clear (if @memcg is NULL) charge association from file system to + * memcg. 
If @memcg != NULL, then a css reference must be held by the caller to + * ensure that the cgroup is not deleted during this operation. + */ +void mem_cgroup_set_charge_target(struct mem_cgroup **target, + struct mem_cgroup *memcg) +{ + if (memcg) + css_get(&memcg->css); + memcg = xchg(target, memcg); + if (memcg) + css_put(&memcg->css); +} + +/* + * Returns the memcg to charge for inode pages. If non-NULL is returned, caller + * must drop reference with css_put(). NULL indicates that the inode does not + * have a memcg to charge, so the default process based policy should be used. + */ +struct mem_cgroup * +mem_cgroup_mapping_get_charge_target(struct address_space *mapping) +{ + struct mem_cgroup *memcg; + + if (!mapping) + return NULL; + + rcu_read_lock(); + memcg = rcu_dereference(mapping->host->i_sb->s_memcg_to_charge); + if (memcg && !css_tryget_online(&memcg->css)) + memcg = NULL; + rcu_read_unlock(); + + return memcg; +} + static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages) { @@ -6678,6 +6799,15 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg, return ret; } +int mem_cgroup_charge_memcg(struct folio *folio, struct mem_cgroup *memcg, + gfp_t gfp) +{ + if (mem_cgroup_disabled()) + return 0; + + return charge_memcg(folio, memcg, gfp); +} + int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp) { struct mem_cgroup *memcg; diff --git a/mm/shmem.c b/mm/shmem.c index 23c91a8beb781..8b623c49ee50d 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -115,10 +115,14 @@ struct shmem_options { bool full_inums; int huge; int seen; +#if CONFIG_MEMCG + struct mem_cgroup *memcg; +#endif #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 #define SHMEM_SEEN_INUMS 8 +#define SHMEM_SEEN_MEMCG 16 }; #ifdef CONFIG_TMPFS @@ -697,6 +701,7 @@ static int shmem_add_to_page_cache(struct page *page, unsigned long i = 0; unsigned long nr = compound_nr(page); int error; + struct 
mem_cgroup *remote_memcg; VM_BUG_ON_PAGE(PageTail(page), page); VM_BUG_ON_PAGE(index != round_down(index, nr), page); @@ -709,7 +714,14 @@ static int shmem_add_to_page_cache(struct page *page, page->index = index; if (!PageSwapCache(page)) { - error = mem_cgroup_charge(page_folio(page), charge_mm, gfp); + remote_memcg = mem_cgroup_mapping_get_charge_target(mapping); + if (remote_memcg) { + error = mem_cgroup_charge_memcg(page_folio(page), + remote_memcg, gfp); + mem_cgroup_put_memcg(remote_memcg); + } else + error = mem_cgroup_charge(page_folio(page), charge_mm, + gfp); if (error) { if (PageTransHuge(page)) { count_vm_event(THP_FILE_FALLBACK); @@ -1822,6 +1834,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, int error; int once = 0; int alloced = 0; + struct mem_cgroup *remote_memcg, *old_memcg; if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT)) return -EFBIG; @@ -1834,8 +1847,21 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, sbinfo = SHMEM_SB(inode->i_sb); charge_mm = vma ? vma->vm_mm : NULL; + /* + * If we're doing a remote charge here, set the active_memcg as the + * remote memcg, so that eventually if pagecache_get_page() calls into + * filemap_add_folio(), we charge the correct memcg. 
+ */ + remote_memcg = mem_cgroup_mapping_get_charge_target(mapping); + if (remote_memcg) + old_memcg = set_active_memcg(remote_memcg); + page = pagecache_get_page(mapping, index, FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0); + if (remote_memcg) { + set_active_memcg(old_memcg); + mem_cgroup_put_memcg(remote_memcg); + } if (page && vma && userfaultfd_minor(vma)) { if (!xa_is_value(page)) { @@ -3342,6 +3368,7 @@ static const struct export_operations shmem_export_ops = { enum shmem_param { Opt_gid, Opt_huge, + Opt_memcg, Opt_mode, Opt_mpol, Opt_nr_blocks, @@ -3363,6 +3390,7 @@ static const struct constant_table shmem_param_enums_huge[] = { const struct fs_parameter_spec shmem_fs_parameters[] = { fsparam_u32 ("gid", Opt_gid), fsparam_enum ("huge", Opt_huge, shmem_param_enums_huge), + fsparam_string("memcg", Opt_memcg), fsparam_u32oct("mode", Opt_mode), fsparam_string("mpol", Opt_mpol), fsparam_string("nr_blocks", Opt_nr_blocks), @@ -3379,6 +3407,9 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) struct shmem_options *ctx = fc->fs_private; struct fs_parse_result result; unsigned long long size; +#if CONFIG_MEMCG + struct mem_cgroup *memcg; +#endif char *rest; int opt; @@ -3412,6 +3443,17 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) goto bad_value; ctx->seen |= SHMEM_SEEN_INODES; break; +#if CONFIG_MEMCG + case Opt_memcg: + if (ctx->memcg) + css_put(&ctx->memcg->css); + memcg = mem_cgroup_get_from_path(param->string); + if (IS_ERR(memcg)) + goto bad_value; + ctx->memcg = memcg; + ctx->seen |= SHMEM_SEEN_MEMCG; + break; +#endif case Opt_mode: ctx->mode = result.uint_32 & 07777; break; @@ -3573,6 +3615,14 @@ static int shmem_reconfigure(struct fs_context *fc) } raw_spin_unlock(&sbinfo->stat_lock); mpol_put(mpol); +#if CONFIG_MEMCG + if (ctx->seen & SHMEM_SEEN_MEMCG && ctx->memcg) { + mem_cgroup_set_charge_target(&fc->root->d_sb->s_memcg_to_charge, + ctx->memcg); + css_put(&ctx->memcg->css); + ctx->memcg = NULL; + } 
+#endif return 0; out: raw_spin_unlock(&sbinfo->stat_lock); @@ -3582,6 +3632,11 @@ static int shmem_reconfigure(struct fs_context *fc) static int shmem_show_options(struct seq_file *seq, struct dentry *root) { struct shmem_sb_info *sbinfo = SHMEM_SB(root->d_sb); + int err; + char *buf = __getname(); + + if (!buf) + return -ENOMEM; if (sbinfo->max_blocks != shmem_default_max_blocks()) seq_printf(seq, ",size=%luk", @@ -3625,7 +3680,13 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root) seq_printf(seq, ",huge=%s", shmem_format_huge(sbinfo->huge)); #endif shmem_show_mpol(seq, sbinfo->mpol); - return 0; + /* Memory cgroup binding: memcg=cgroup_name */ + err = mem_cgroup_get_name_from_sb(root->d_sb, buf, PATH_MAX); + if (!err && buf[0] != '\0') + seq_printf(seq, ",memcg=%s", buf); + + __putname(buf); + return err; } #endif /* CONFIG_TMPFS */ @@ -3710,6 +3771,14 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_flags |= SB_POSIXACL; #endif uuid_gen(&sb->s_uuid); +#if CONFIG_MEMCG + if (ctx->memcg) { + mem_cgroup_set_charge_target(&sb->s_memcg_to_charge, + ctx->memcg); + css_put(&ctx->memcg->css); + ctx->memcg = NULL; + } +#endif inode = shmem_get_inode(sb, NULL, S_IFDIR | sbinfo->mode, 0, VM_NORESERVE); if (!inode) From patchwork Thu Nov 11 23:42:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12615881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6A7EC433FE for ; Thu, 11 Nov 2021 23:42:22 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2C6E36108B for ; Thu, 11 Nov 2021 23:42:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2C6E36108B 
Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 1CBF46B007D; Thu, 11 Nov 2021 18:42:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1501F6B0081; Thu, 11 Nov 2021 18:42:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F32566B0082; Thu, 11 Nov 2021 18:42:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id E09C86B007D for ; Thu, 11 Nov 2021 18:42:20 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9F93418544810 for ; Thu, 11 Nov 2021 23:42:20 +0000 (UTC) X-FDA: 78798275640.13.FB52AB3 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) by imf19.hostedemail.com (Postfix) with ESMTP id 01CADB0000BC for ; Thu, 11 Nov 2021 23:42:09 +0000 (UTC) Received: by mail-pf1-f202.google.com with SMTP id k63-20020a628442000000b004812ea67c34so4712148pfd.2 for ; Thu, 11 Nov 2021 15:42:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=uSYMrm67Vam+Dj2xrSbIEx2XERRWcvK22Y5/aS6BHtY=; b=CiDhBgQvahrK+6hVk+HcAOFmXy97c2vf/1ydDDBH1xL0pfjijygM31NRMWm8S5whE6 VwFoIHYU7JCsu11Ie0nFUvS1kTau69fdABOM/7qmi9XqIpPvM4gnyKU03S0rxudHIdd4 xydVAMFmUPhqpxIziGaf67c/sT3dJ8w3NLXFbMF4+RCnD6/f/RxSiqpFh7254Gktxk1h r7WL7PoRl1aVj1lmIY5o/3Xb5FU/kFA5qsSuO+/VYs3A/t8jN9I0cEL6QIgzUeGslgbT +XNoI/2B0fbn4NtjwWsasZUm+4Kz3h1pnnI8dfhKkyAlZdQ+JxFPA6b47EQKFllu+LFD 8F0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=uSYMrm67Vam+Dj2xrSbIEx2XERRWcvK22Y5/aS6BHtY=; b=GycPnhdKY4BNIC8874TzVZjgdohQhRRV8sxMCqjQuGxhW6OAdq/R6TEqLL9HjA2KQX PIr8e1mdFPlYxAm1xeW5yVS5ahNEDIosqIMO4Xxlnw6x8a0e+KkWy+cjaqNPuNERRM9A TKD4v1ASfdYNSVKrEeFhuBd1SSmG+9+7Q9GmdwQ3xgxRoTptePbMwf6JQETOQGszlJjf MEaiAuLGTrjqsuukPYnsMoqLOEX2aseUir4asZr5vXFPsajLkzXUUJd6Tmtl63RWeI4x AJwl/FiQZh9/f4mOrV+Z6AH105HgpZiXdQVEnCPimkP4frQj/mtcmqQyXgPb00ND+bvQ CHxg== X-Gm-Message-State: AOAM532OkctkmW83nzp9maAMrjylIvx6NTMsDbY5Yt+umoFqY0mhc+xK AsY/SHEzC6CUtQquStGWRQxD5NkRx6WOIyDCvQ== X-Google-Smtp-Source: ABdhPJyp/iCYvtwqeJScKbPzvLF/N7vXMcYDBxGm3GttEQK9QMMQ4wjrI/AFO1Dp+y83jLMeVYKkH2PYDbUx9V6kDw== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:672d:70d0:3f83:676d]) (user=almasrymina job=sendgmr) by 2002:a63:7c41:: with SMTP id l1mr6912741pgn.372.1636674138045; Thu, 11 Nov 2021 15:42:18 -0800 (PST) Date: Thu, 11 Nov 2021 15:42:01 -0800 In-Reply-To: <20211111234203.1824138-1-almasrymina@google.com> Message-Id: <20211111234203.1824138-3-almasrymina@google.com> Mime-Version: 1.0 References: <20211111234203.1824138-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc1.387.gb447b232ab-goog Subject: [PATCH v3 2/4] mm/oom: handle remote ooms From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , Muchun Song , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=CiDhBgQv; spf=pass (imf19.hostedemail.com: domain of 3WqqNYQsKCPcZklZrqxlhmZfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--almasrymina.bounces.google.com designates 209.85.210.202 as permitted sender) smtp.mailfrom=3WqqNYQsKCPcZklZrqxlhmZfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--almasrymina.bounces.google.com; 
dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 01CADB0000BC X-Stat-Signature: hkbane4kza5pbdctb37h6n97i15nu61t X-HE-Tag: 1636674129-969401 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On remote ooms (OOMs due to remote charging), the oom-killer will attempt to find a task to kill in the memcg under oom. If the oom-killer is unable to find one, it should simply return ENOMEM to the allocating process. If we're in the pagefault path and unable to return ENOMEM to the allocating process, we instead kill the allocating process. Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins CC: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- Changes in v3: - Fixed build failures/warnings Reported-by: kernel test robot Changes in v2: - Moved the remote oom handling as Roman requested. - Used mem_cgroup_from_task(current) instead of grabbing the memcg from current->mm --- include/linux/memcontrol.h | 16 ++++++++++++++++ mm/memcontrol.c | 29 +++++++++++++++++++++++++++++ mm/oom_kill.c | 22 ++++++++++++++++++++++ 3 files changed, 67 insertions(+) -- 2.34.0.rc1.387.gb447b232ab-goog diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8583d37c05d9b..b7a045ace7b2c 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -944,6 +944,7 @@ struct mem_cgroup *mem_cgroup_get_from_path(const char *path); * it. 
*/ int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len); +bool is_remote_oom(struct mem_cgroup *memcg_under_oom); void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages); @@ -981,6 +982,11 @@ static inline void mem_cgroup_exit_user_fault(void) current->in_user_fault = 0; } +static inline bool is_in_user_fault(void) +{ + return current->in_user_fault; +} + static inline bool task_in_memcg_oom(struct task_struct *p) { return p->memcg_in_oom; @@ -1281,6 +1287,11 @@ static inline int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, return 0; } +static inline bool is_remote_oom(struct mem_cgroup *memcg_under_oom) +{ + return false; +} + static inline int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry) { @@ -1472,6 +1483,11 @@ static inline void mem_cgroup_exit_user_fault(void) { } +static inline bool is_in_user_fault(void) +{ + return false; +} + static inline bool task_in_memcg_oom(struct task_struct *p) { return false; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index b3d8f52a63d17..8019c396bfdd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2664,6 +2664,35 @@ int mem_cgroup_get_name_from_sb(struct super_block *sb, char *buf, size_t len) return ret < 0 ? ret : 0; } +/* + * Returns true if current's memcg is not a descendant of memcg_under_oom (nor + * equal to it). False otherwise. This is used by the oom-killer to detect + * ooms due to remote charging. 
+ */ +bool is_remote_oom(struct mem_cgroup *memcg_under_oom) +{ + struct mem_cgroup *current_memcg; + bool is_remote_oom; + + if (!memcg_under_oom) + return false; + + rcu_read_lock(); + current_memcg = mem_cgroup_from_task(current); + if (current_memcg && !css_tryget_online(&current_memcg->css)) + current_memcg = NULL; + rcu_read_unlock(); + + if (!current_memcg) + return false; + + is_remote_oom = + !mem_cgroup_is_descendant(current_memcg, memcg_under_oom); + css_put(&current_memcg->css); + + return is_remote_oom; +} + /* * Set or clear (if @memcg is NULL) charge association from file system to * memcg. If @memcg != NULL, then a css reference must be held by the caller to diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 0a7e16b16b8c3..499924efab370 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1108,6 +1108,28 @@ bool out_of_memory(struct oom_control *oc) select_bad_process(oc); /* Found nothing?!?! */ if (!oc->chosen) { + if (is_remote_oom(oc->memcg)) { + /* + * For remote ooms in userfaults, we have no choice but + * to kill the allocating process. + */ + if (is_in_user_fault() && + !oom_unkillable_task(current)) { + get_task_struct(current); + oc->chosen = current; + oom_kill_process( + oc, + "Out of memory (Killing remote allocating task)"); + return true; + } + + /* + * For remote ooms in non-userfaults, simply return + * ENOMEM to the caller. 
+ */ + return false; + } + dump_header(oc, NULL); pr_warn("Out of memory and no killable processes...\n"); /* From patchwork Thu Nov 11 23:42:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12615885 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D791C43217 for ; Thu, 11 Nov 2021 23:42:24 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3E9F360F55 for ; Thu, 11 Nov 2021 23:42:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 3E9F360F55 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 438BC6B0085; Thu, 11 Nov 2021 18:42:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3253B6B0083; Thu, 11 Nov 2021 18:42:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 19F306B0085; Thu, 11 Nov 2021 18:42:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0088.hostedemail.com [216.40.44.88]) by kanga.kvack.org (Postfix) with ESMTP id F20266B0082 for ; Thu, 11 Nov 2021 18:42:22 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id ACA011854B8DF for ; Thu, 11 Nov 2021 23:42:22 +0000 (UTC) X-FDA: 78798275724.26.87B4FEA Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) by imf24.hostedemail.com (Postfix) with ESMTP id AD081B0000B6 for ; Thu, 11 Nov 2021 23:42:21 +0000 (UTC) Received: by 
mail-pf1-f201.google.com with SMTP id 184-20020a6217c1000000b0049f9aad0040so4639782pfx.21 for ; Thu, 11 Nov 2021 15:42:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=QKU2zoRZAY9B8wj0r50tOlnkCOTcLxsxIw+JHt9jziI=; b=JocGXKO8vY/Ookh/ASf19pgzF9fbTxKzPIQc9mr3ZW9WLy37prKQp+n3wZnM/xMthM 2KQx2QAwcupd9pCNoZfYj+6FCwVVh6Jkn9m052frBDQ15Ovd03/ja8SLsW+GW7JuTW7/ I3epIUBB2H6nOAIkBN9Txbrt0YOYzxDhPkI1ChBp+uj1WGkwN6tJWejpY2LsYHDbTgct t+eExB3JvbT+XdeGFGYhdFhDmoi9K+yadiin6mJ9215TXb/QussBIvxWorlBaK7Mu5RX 61Yl8hcta58kMpcN4IZPjM68lyv1IVyZSGpliCorVZf2XwliHcMa5DkDnOuoS1MPphyS L0wA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=QKU2zoRZAY9B8wj0r50tOlnkCOTcLxsxIw+JHt9jziI=; b=QkiaUGilkG7gC4gpksN2Ri9UpqosshNtNnSh4acOxk/CLK9JJjgxAohysiqWbx6c9L 9CD6WgwWE2r3QMIWZ0VeKNd/Oiu8A0tp6wH1qbkeykh6hzD+Wmo3ZRZt+HMPi6lJFJCY DI57Ku6Xuqi4Ijomk2+WFybInldWQckPumF+M86tLlxk/BOlU5+ZXAsgcsXmKQAyHJvO xYw1ZlN0Ox5PKfisbKW7ZF7n7blwpUk2nER7wLOvH7GRkmHLw0k335xwSflG6mN1UWDL 7s7DyGO+nFz0IisDx78Pe19MMoq9VMu6LdgXLgs+r7GPEGQ8FyoL5rkBonfCv6q1mxfN cL0A== X-Gm-Message-State: AOAM5302HSq7XAw6agK9IJORfa2os7Cs+//PKyIHZl5+P3wwOiKDgD6p qMcQ4ypQPqwQ6vSY2DZPR8JgvMerTIvprlNFwg== X-Google-Smtp-Source: ABdhPJyv18UxWjHYGcxmL0Hv3Vr6h6XrUEROs+TEjWYMoubdPeYAV2S1MsBJ8jE6zKcEwPC+EOBt3F7jMzJb6sulWA== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:672d:70d0:3f83:676d]) (user=almasrymina job=sendgmr) by 2002:a62:8683:0:b0:480:edf9:33c0 with SMTP id x125-20020a628683000000b00480edf933c0mr10265886pfd.11.1636674140761; Thu, 11 Nov 2021 15:42:20 -0800 (PST) Date: Thu, 11 Nov 2021 15:42:02 -0800 In-Reply-To: <20211111234203.1824138-1-almasrymina@google.com> Message-Id: <20211111234203.1824138-4-almasrymina@google.com> Mime-Version: 1.0 References: 
<20211111234203.1824138-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc1.387.gb447b232ab-goog Subject: [PATCH v3 3/4] mm, shmem: add tmpfs memcg= option documentation From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , Muchun Song , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: AD081B0000B6 X-Stat-Signature: t96rn3xs1e7ztxifttfh49ytk14srf1q Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=JocGXKO8; spf=pass (imf24.hostedemail.com: domain of 3XKqNYQsKCPkbmnbtsznjobhpphmf.dpnmjovy-nnlwbdl.psh@flex--almasrymina.bounces.google.com designates 209.85.210.201 as permitted sender) smtp.mailfrom=3XKqNYQsKCPkbmnbtsznjobhpphmf.dpnmjovy-nnlwbdl.psh@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636674141-909104 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- Documentation/filesystems/tmpfs.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) -- 2.34.0.rc1.387.gb447b232ab-goog diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 0408c245785e3..1ab04e8fa9222 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -137,6 +137,23 @@ mount options. 
It can be added later, when the tmpfs is already mounted on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. +If CONFIG_MEMCG is enabled, tmpfs has a mount option to specify the memory +cgroup to be charged for page allocations. + +memcg=/sys/fs/cgroup/unified/test/: data page allocations are charged to +cgroup /sys/fs/cgroup/unified/test/. + +When charging memory to the remote memcg (memcg specified with memcg=) and +hitting the limit, the oom-killer will be invoked and will attempt to kill +a process in the remote memcg. If no such processes are found, the remote +charging process gets an ENOMEM. If the remote charging process is in the +pagefault path, it gets killed. + +Only processes that have access to /sys/fs/cgroup/unified/test/cgroup.procs can +mount a tmpfs with memcg=/sys/fs/cgroup/unified/test. Thus, a process is able +to charge memory to a cgroup only if it itself is able to enter that cgroup. + + To specify the initial root directory you can use the following mount options: From patchwork Thu Nov 11 23:42:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 12615887 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81102C433FE for ; Thu, 11 Nov 2021 23:42:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 20C8060F55 for ; Thu, 11 Nov 2021 23:42:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 20C8060F55 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 1C6E36B0082; Thu, 11 Nov 2021 18:42:25 -0500 (EST) Received: 
by kanga.kvack.org (Postfix, from userid 40) id 153A06B0083; Thu, 11 Nov 2021 18:42:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DFE346B0087; Thu, 11 Nov 2021 18:42:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0195.hostedemail.com [216.40.44.195]) by kanga.kvack.org (Postfix) with ESMTP id C02CD6B0082 for ; Thu, 11 Nov 2021 18:42:24 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 7EE6118549E1F for ; Thu, 11 Nov 2021 23:42:24 +0000 (UTC) X-FDA: 78798275808.12.2B1343E Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) by imf23.hostedemail.com (Postfix) with ESMTP id 60CEE90000B4 for ; Thu, 11 Nov 2021 23:42:08 +0000 (UTC) Received: by mail-pg1-f202.google.com with SMTP id z19-20020a630a53000000b002dc2f4542faso3963000pgk.13 for ; Thu, 11 Nov 2021 15:42:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:cc; bh=bJJb3LipnObduqylMFraDDSm1wjifbOFXVr66I+RhZ8=; b=EKcKi3xwn4hIude8AbYwkaaMB5MUZadnVfAkLoCepiat36Am+LfWQNO/RQlRJwJFVX OsGh0vaxXKtbkDFHjIgDBAoAmYl0N4h2oTO6vL2EehcsqcCAlkamLLHB1X67PAtXccfm Bnywi6cddZu0oSp0EyExbppuAT+y/2oMffZjbmWj4PFhy4h/yAJOhHzLTkZDGfW/C57Q dJikp42zhHMY+cU3Ii/J0/xr/53lzjNBr+PUenM7dbXlYScUd2cKnnouArpxWzwFx9Gt L7N2EfX3YtHvGYQr6stPOipojqyWpSf9RMFzrKw0p/aFKRg5cUoKFjKiH7B+yHxQI8nX wBIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:cc; bh=bJJb3LipnObduqylMFraDDSm1wjifbOFXVr66I+RhZ8=; b=PpsZWk7t24tx9ojHslIBhQj9B9ZgmdJP58eu90nMGaemd2IFJbwgA0gakse84CJGq/ j2iPgYxcOczflDqHi7CBmGKmJVhVk3ty3WH984Vsxd20rEn1r/J29dOsjPu12jVX24Du 
PjeizhBKTzY91JxJoUGPotqn9sxG2Ykk4ze0ohcWjdtLpTCoW0AnSZK6Fy2tA2b5T8FF QZEARVfuk+lj31OM6FFAgz4dOWl2q8VS+/fUlphYfd/wAo9yF1dDH5VR85DnjEDv4w+r qLM1TtCKSrt8nKFwgnR3g+IXSH5B5ghFAKO/3YXu3QpwsA0gK+dqxJEf2GPpQIc+e3Kn w73A== X-Gm-Message-State: AOAM532kYrtqB8pr8/dNZ1i1Bh4ejLA8i77xJq0EGeh13bvOzpJHyjgY 7kG/4IFnJfMNgN43J4/PRhJ86e/4xSQp8p/qyw== X-Google-Smtp-Source: ABdhPJxLO5x09WXyHwJWu958JgtkSOMtTHlHgrNKSREG+PxJ3VXuGHdsOrG0vy6ZWMnJYQCtZUrxZ5yUQM6F1eILMQ== X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2cd:202:672d:70d0:3f83:676d]) (user=almasrymina job=sendgmr) by 2002:a63:ff09:: with SMTP id k9mr7297796pgi.76.1636674143224; Thu, 11 Nov 2021 15:42:23 -0800 (PST) Date: Thu, 11 Nov 2021 15:42:03 -0800 In-Reply-To: <20211111234203.1824138-1-almasrymina@google.com> Message-Id: <20211111234203.1824138-5-almasrymina@google.com> Mime-Version: 1.0 References: <20211111234203.1824138-1-almasrymina@google.com> X-Mailer: git-send-email 2.34.0.rc1.387.gb447b232ab-goog Subject: [PATCH v3 4/4] mm, shmem, selftests: add tmpfs memcg= mount option tests From: Mina Almasry Cc: Mina Almasry , Michal Hocko , "Theodore Ts'o" , Greg Thelen , Shakeel Butt , Andrew Morton , Hugh Dickins , Roman Gushchin , Johannes Weiner , Tejun Heo , Vladimir Davydov , Muchun Song , riel@surriel.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, cgroups@vger.kernel.org X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 60CEE90000B4 X-Stat-Signature: xqzrktkhbionpssdt9cjwzxqn8i59c7b Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=EKcKi3xw; spf=pass (imf23.hostedemail.com: domain of 3X6qNYQsKCPwepqewv2qmreksskpi.gsqpmry1-qqozego.svk@flex--almasrymina.bounces.google.com designates 209.85.215.202 as permitted sender) smtp.mailfrom=3X6qNYQsKCPwepqewv2qmreksskpi.gsqpmry1-qqozego.svk@flex--almasrymina.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1636674128-244701 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, 
version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Signed-off-by: Mina Almasry Cc: Michal Hocko Cc: Theodore Ts'o Cc: Greg Thelen Cc: Shakeel Butt Cc: Andrew Morton Cc: Hugh Dickins Cc: Roman Gushchin Cc: Johannes Weiner Cc: Hugh Dickins Cc: Tejun Heo Cc: Vladimir Davydov Cc: Muchun Song Cc: riel@surriel.com Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org Cc: cgroups@vger.kernel.org --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/mmap_write.c | 103 ++++++++++++++++++++++ tools/testing/selftests/vm/tmpfs-memcg.sh | 91 +++++++++++++++++++ 3 files changed, 195 insertions(+) create mode 100644 tools/testing/selftests/vm/mmap_write.c create mode 100755 tools/testing/selftests/vm/tmpfs-memcg.sh -- 2.34.0.rc1.387.gb447b232ab-goog diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 2e7e86e852828..cb229974c5f15 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -19,6 +19,7 @@ madv_populate userfaultfd mlock-intersect-test mlock-random-test +mmap_write virtual_address_range gup_test va_128TBswitch diff --git a/tools/testing/selftests/vm/mmap_write.c b/tools/testing/selftests/vm/mmap_write.c new file mode 100644 index 0000000000000..88a8468f2128c --- /dev/null +++ b/tools/testing/selftests/vm/mmap_write.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program faults memory in tmpfs + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Global definitions. */ + +/* Global variables. */ +static const char *self; +static char *shmaddr; +static int shmid; + +/* + * Show usage and exit. 
+ */
+static void exit_usage(void)
+{
+	printf("Usage: %s -p -s \n", self);
+	exit(EXIT_FAILURE);
+}
+
+int main(int argc, char **argv)
+{
+	int fd = 0;
+	int key = 0;
+	int *ptr = NULL;
+	int c = 0;
+	int size = 0;
+	char path[256] = "";
+	int want_sleep = 0, private = 0;
+	int populate = 0;
+	int write = 0;
+	int reserve = 1;
+
+	/* Parse command-line arguments. */
+	setvbuf(stdout, NULL, _IONBF, 0);
+	self = argv[0];
+
+	while ((c = getopt(argc, argv, ":s:p:")) != -1) {
+		switch (c) {
+		case 's':
+			size = atoi(optarg);
+			break;
+		case 'p':
+			strncpy(path, optarg, sizeof(path) - 1);
+			break;
+		default:
+			errno = EINVAL;
+			perror("Invalid arg");
+			exit_usage();
+		}
+	}
+
+	printf("%s\n", path);
+	if (strncmp(path, "", sizeof(path)) != 0) {
+		printf("Writing to this path: %s\n", path);
+	} else {
+		errno = EINVAL;
+		perror("path not found");
+		exit_usage();
+	}
+
+	if (size > 0) {
+		printf("Writing this size: %d\n", size);
+	} else {
+		errno = EINVAL;
+		perror("size not found");
+		exit_usage();
+	}
+
+	fd = open(path, O_CREAT | O_RDWR, 0777);
+	if (fd == -1)
+		err(1, "Failed to open file.");
+
+	if (ftruncate(fd, size))
+		err(1, "failed to ftruncate %s", path);
+
+	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED) {
+		close(fd);
+		err(1, "Error mapping the file");
+	}
+
+	printf("Writing to memory.\n");
+	memset(ptr, 1, size);
+	printf("Done writing to memory.\n");
+	close(fd);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/tmpfs-memcg.sh b/tools/testing/selftests/vm/tmpfs-memcg.sh
new file mode 100755
index 0000000000000..30da2fad06357
--- /dev/null
+++ b/tools/testing/selftests/vm/tmpfs-memcg.sh
@@ -0,0 +1,91 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+CGROUP_PATH=/dev/cgroup/memory/tmpfs-memcg-test
+
+function cleanup() {
+	rm -rf /mnt/tmpfs/*
+	umount /mnt/tmpfs
+	rm -rf /mnt/tmpfs
+
+	rmdir $CGROUP_PATH
+
+	echo CLEANUP DONE
+}
+
+function setup() {
+	mkdir -p $CGROUP_PATH
+	echo $((10 * 1024 * 1024)) > $CGROUP_PATH/memory.limit_in_bytes
+	echo 0 > $CGROUP_PATH/cpuset.cpus
+	echo 0 > $CGROUP_PATH/cpuset.mems
+
+	mkdir -p /mnt/tmpfs
+
+	echo SETUP DONE
+}
+
+function expect_equal() {
+	local expected="$1"
+	local actual="$2"
+	local error="$3"
+
+	if [[ "$actual" != "$expected" ]]; then
+		echo "expected ($expected) != actual ($actual): $error" >&2
+		cleanup
+		exit 1
+	fi
+}
+
+function expect_ge() {
+	local expected="$1"
+	local actual="$2"
+	local error="$3"
+
+	if [[ "$actual" -lt "$expected" ]]; then
+		echo "actual ($actual) < expected ($expected): $error" >&2
+		cleanup
+		exit 1
+	fi
+}
+
+cleanup
+setup
+
+mount -t tmpfs -o memcg=$CGROUP_PATH tmpfs /mnt/tmpfs
+
+TARGET_MEMCG_USAGE=$(cat $CGROUP_PATH/memory.usage_in_bytes)
+expect_equal 0 "$TARGET_MEMCG_USAGE" "Before echo, memcg usage should be 0"
+
+# Echo to allocate a page in the tmpfs
+echo
+echo
+echo hello > /mnt/tmpfs/test
+TARGET_MEMCG_USAGE=$(cat $CGROUP_PATH/memory.usage_in_bytes)
+expect_ge 4096 "$TARGET_MEMCG_USAGE" "After echo, memcg usage should be at least 4096"
+echo "Echo test succeeded"
+
+echo
+echo
+tools/testing/selftests/vm/mmap_write -p /mnt/tmpfs/test -s $((1 * 1024 * 1024))
+TARGET_MEMCG_USAGE=$(cat $CGROUP_PATH/memory.usage_in_bytes)
+expect_ge $((1 * 1024 * 1024)) "$TARGET_MEMCG_USAGE" "After write, memcg usage should be at least 1MB"
+echo "Write succeeded"
+
+# OOM the remote container on pagefault.
+echo
+echo
+echo "OOMing the remote container using pagefault."
+echo "This will take a long time because the kernel goes through reclaim retries,"
+echo "but the task should eventually be OOM-killed by 'Out of memory (Killing remote allocating task)'"
+tools/testing/selftests/vm/mmap_write -p /mnt/tmpfs/test -s $((11 * 1024 * 1024))
+
+# OOM the remote container on a non-pagefault path.
+echo
+echo
+echo "OOMing the remote container using cat (non-pagefault)"
+echo "This will take a long time because the kernel goes through reclaim retries,"
+echo "but eventually the cat command should receive an ENOMEM"
+cat /dev/random > /mnt/tmpfs/random
+
+cleanup
+echo SUCCESS
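
The two OOM cases exercised by tmpfs-memcg.sh surface differently. Per the
documentation in patch 3/4, a task that overruns the remote memcg from the
page-fault path is killed, while a task on a non-page-fault path such as
write(2) gets ENOMEM. As a rough sketch of the second case, the helper below
fills a file with write(2) and returns the errno of the first failure.
fill_file() is a hypothetical name and is not part of this series; the
4096-byte chunk size and the 0600 mode are arbitrary choices made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch series: write `size` bytes of
 * 0x01 to `path` using write(2), so no page fault on a mapping is involved.
 * Returns 0 on success, or the errno value of the first failure.
 */
int fill_file(const char *path, size_t size)
{
	char buf[4096];
	size_t done = 0;
	int err = 0;
	int fd;

	assert(path != NULL);
	memset(buf, 1, sizeof(buf));

	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd == -1)
		return errno;

	while (done < size) {
		size_t left = size - done;
		size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the same chunk */
			err = errno;	/* e.g. ENOMEM or ENOSPC */
			break;
		}
		if (n == 0) {
			err = EIO;	/* no progress; avoid looping forever */
			break;
		}
		done += (size_t)n;
	}

	/* Report a close() failure only if nothing failed earlier. */
	if (close(fd) == -1 && err == 0)
		err = errno;

	return err;
}
```

On a kernel carrying this series, calling fill_file("/mnt/tmpfs/random",
11 * 1024 * 1024) against the test mount (10MB limit) would be expected to
return ENOMEM, going by the documentation text; that behaviour has not been
verified here.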