From patchwork Thu Apr 22 09:06:06 2021
From: Abel Wu
To: akpm@linux-foundation.org, lizefan.x@bytedance.com, tj@kernel.org,
    hannes@cmpxchg.org, corbet@lwn.net
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/3] mm/mempolicy: apply cpuset limits to tasks using default policy
Date: Thu, 22 Apr 2021 17:06:06 +0800
Message-Id: <20210422090608.7160-2-wuyun.abel@bytedance.com>
In-Reply-To: <20210422090608.7160-1-wuyun.abel@bytedance.com>
References: <20210422090608.7160-1-wuyun.abel@bytedance.com>

The nodemasks of non-default policies (pol->v) are calculated within the
restriction of task->mems_allowed, while default policies are not. This
can lead to improper results from mpol_misplaced(), which may return a
target node outside of current->mems_allowed for tasks using a default
policy. This is not a bug, since migrating pages to such an out-of-cpuset
node will eventually fail the sanity checks in the page allocator, but it
is still better to avoid the wasted effort.

This patch also slightly changes autoNUMA behavior: for tasks using a
default policy it now tends to move pages inside mems_allowed, which is
good for memory isolation.
Signed-off-by: Abel Wu
---
 mm/mempolicy.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d79fa299b70c..e0ae6997bbfb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2516,7 +2516,10 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long

 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MORON) {
-		polnid = thisnid;
+		if (node_isset(thisnid, cpuset_current_mems_allowed))
+			polnid = thisnid;
+		else
+			polnid = node_random(&cpuset_current_mems_allowed);

 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
 			goto out;
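For illustration only, the MPOL_F_MORON target selection after this change
can be modeled in user space as the short sketch below. The 64-bit bitmap
and the pick-first-set-bit fallback are simplifications standing in for the
kernel's nodemask_t and node_random(); they are not the real implementations.

#include <stdint.h>
#include <stdio.h>

/* simplified stand-in for the kernel's node_isset() on a 64-bit mask */
static int node_isset(int node, uint64_t mask)
{
	return (mask >> node) & 1;
}

/* stand-in for node_random(): any allowed node is good enough here */
static int first_allowed_node(uint64_t mask)
{
	for (int n = 0; n < 64; n++)
		if (node_isset(n, mask))
			return n;
	return -1;
}

/* Pick the migration target the way the patched MPOL_F_MORON path does:
 * prefer the referencing CPU's node, but never leave mems_allowed. */
static int pick_target_node(int referencing_node, uint64_t mems_allowed)
{
	if (node_isset(referencing_node, mems_allowed))
		return referencing_node;
	return first_allowed_node(mems_allowed);
}

int main(void)
{
	uint64_t mems_allowed = (1ULL << 2) | (1ULL << 3);	/* nodes 2-3 */

	printf("CPU on node 0 -> migrate to node %d\n",
	       pick_target_node(0, mems_allowed));
	printf("CPU on node 3 -> migrate to node %d\n",
	       pick_target_node(3, mems_allowed));
	return 0;
}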
From patchwork Thu Apr 22 09:06:07 2021

From: Abel Wu
To: akpm@linux-foundation.org, lizefan.x@bytedance.com, tj@kernel.org,
    hannes@cmpxchg.org, corbet@lwn.net
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 2/3] cgroup/cpuset: introduce cpuset.mems.migration
Date: Thu, 22 Apr 2021 17:06:07 +0800
Message-Id: <20210422090608.7160-3-wuyun.abel@bytedance.com>
In-Reply-To: <20210422090608.7160-1-wuyun.abel@bytedance.com>
References: <20210422090608.7160-1-wuyun.abel@bytedance.com>

Some of our services are quite performance sensitive and are designed to
be NUMA-aware ("numa-services"). Their SLOs can easily be violated when
they are co-located with other workloads, so they are granted exclusive
use of one or several whole NUMA nodes according to their quota. When a
NUMA node is assigned to a numa-service, the workload already on that
node needs to be moved away quickly and completely. The main requirements
on this eviction are:

 a) it should complete soon enough that the numa-service does not wait
    long enough to hurt user experience
 b) the workloads to be evicted can have massive memory usage, and
    migrating that much memory at once may cause a sudden, severe
    performance drop lasting tens of seconds that some workloads cannot
    afford
 c) the impact of the eviction should be limited to the source and
    destination nodes
 d) a cgroup interface is preferred

So we came to the following approach:

 1) fire up the numa-service without waiting for memory migration
 2) migrate the memory asynchronously, using spare memory bandwidth

AutoNUMA seems like a solution, but its scope is global, which violates
c) and d). And cpuset.memory_migrate works synchronously, which breaks
a) and b). So a mixture of the two, the new cgroup v2 interface
cpuset.mems.migration, is introduced.
The new cpuset.mems.migration supports three modes:

 - "none": migration disabled
 - "sync": exactly the same as the cgroup v1 interface cpuset.memory_migrate
 - "lazy": unlike cpuset.memory_migrate, walking through the pages only
   marks them PROT_NONE; the NUMA faults triggered when the pages are
   touched again handle the actual movement

See the next patch for detailed information.

Signed-off-by: Abel Wu
---
 kernel/cgroup/cpuset.c | 104 ++++++++++++++++++++++++++++++++---------
 mm/mempolicy.c         |   2 +-
 2 files changed, 84 insertions(+), 22 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a945504c0ae7..ee84f168eea8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -212,6 +212,7 @@ typedef enum {
 	CS_MEM_EXCLUSIVE,
 	CS_MEM_HARDWALL,
 	CS_MEMORY_MIGRATE,
+	CS_MEMORY_MIGRATE_LAZY,
 	CS_SCHED_LOAD_BALANCE,
 	CS_SPREAD_PAGE,
 	CS_SPREAD_SLAB,
@@ -248,6 +249,11 @@ static inline int is_memory_migrate(const struct cpuset *cs)
 	return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
 }
 
+static inline int is_memory_migrate_lazy(const struct cpuset *cs)
+{
+	return test_bit(CS_MEMORY_MIGRATE_LAZY, &cs->flags);
+}
+
 static inline int is_spread_page(const struct cpuset *cs)
 {
 	return test_bit(CS_SPREAD_PAGE, &cs->flags);
@@ -1594,6 +1600,7 @@ struct cpuset_migrate_mm_work {
 	struct mm_struct	*mm;
 	nodemask_t		from;
 	nodemask_t		to;
+	int			flags;
 };
 
 static void cpuset_migrate_mm_workfn(struct work_struct *work)
@@ -1602,21 +1609,29 @@ static void cpuset_migrate_mm_workfn(struct work_struct *work)
 		container_of(work, struct cpuset_migrate_mm_work, work);
 
 	/* on a wq worker, no need to worry about %current's mems_allowed */
-	do_migrate_pages(mwork->mm, &mwork->from, &mwork->to, MPOL_MF_MOVE_ALL);
+	do_migrate_pages(mwork->mm, &mwork->from, &mwork->to, mwork->flags);
 	mmput(mwork->mm);
 	kfree(mwork);
 }
 
-static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
-							const nodemask_t *to)
+static void cpuset_migrate_mm(struct cpuset *cs, struct mm_struct *mm,
+			      const nodemask_t *from, const nodemask_t *to)
 {
-	struct cpuset_migrate_mm_work *mwork;
+	struct cpuset_migrate_mm_work *mwork = NULL;
+	int flags = 0;
 
-	mwork = kzalloc(sizeof(*mwork), GFP_KERNEL);
+	if (is_memory_migrate_lazy(cs))
+		flags = MPOL_MF_LAZY;
+	else if (is_memory_migrate(cs))
+		flags = MPOL_MF_MOVE_ALL;
+
+	if (flags)
+		mwork = kzalloc(sizeof(*mwork), GFP_KERNEL);
 	if (mwork) {
 		mwork->mm = mm;
 		mwork->from = *from;
 		mwork->to = *to;
+		mwork->flags = flags;
 		INIT_WORK(&mwork->work, cpuset_migrate_mm_workfn);
 		queue_work(cpuset_migrate_mm_wq, &mwork->work);
 	} else {
@@ -1690,7 +1705,6 @@ static void update_tasks_nodemask(struct cpuset *cs)
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
 		struct mm_struct *mm;
-		bool migrate;
 
 		cpuset_change_task_nodemask(task, &newmems);
 
@@ -1698,13 +1712,8 @@ static void update_tasks_nodemask(struct cpuset *cs)
 		if (!mm)
 			continue;
 
-		migrate = is_memory_migrate(cs);
-
 		mpol_rebind_mm(mm, &cs->mems_allowed);
-		if (migrate)
-			cpuset_migrate_mm(mm, &cs->old_mems_allowed, &newmems);
-		else
-			mmput(mm);
+		cpuset_migrate_mm(cs, mm, &cs->old_mems_allowed, &newmems);
 	}
 	css_task_iter_end(&it);
 
@@ -1911,6 +1920,11 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	else
 		clear_bit(bit, &trialcs->flags);
 
+	if (bit == CS_MEMORY_MIGRATE)
+		clear_bit(CS_MEMORY_MIGRATE_LAZY, &trialcs->flags);
+	if (bit == CS_MEMORY_MIGRATE_LAZY)
+		clear_bit(CS_MEMORY_MIGRATE, &trialcs->flags);
+
 	err = validate_change(cs, trialcs);
 	if (err < 0)
 		goto out;
@@ -2237,11 +2251,8 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 			 * @old_mems_allowed is the right nodesets that we
 			 * migrate mm from.
 			 */
-			if (is_memory_migrate(cs))
-				cpuset_migrate_mm(mm, &oldcs->old_mems_allowed,
-						  &cpuset_attach_nodemask_to);
-			else
-				mmput(mm);
+			cpuset_migrate_mm(cs, mm, &oldcs->old_mems_allowed,
+					  &cpuset_attach_nodemask_to);
 		}
 	}
 
@@ -2258,6 +2269,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 
 typedef enum {
 	FILE_MEMORY_MIGRATE,
+	FILE_MEMORY_MIGRATE_LAZY,
 	FILE_CPULIST,
 	FILE_MEMLIST,
 	FILE_EFFECTIVE_CPULIST,
@@ -2275,11 +2287,8 @@ typedef enum {
 	FILE_SPREAD_SLAB,
 } cpuset_filetype_t;
 
-static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
-			    u64 val)
+static int __cpuset_write_u64(struct cpuset *cs, cpuset_filetype_t type, u64 val)
 {
-	struct cpuset *cs = css_cs(css);
-	cpuset_filetype_t type = cft->private;
 	int retval = 0;
 
 	get_online_cpus();
@@ -2305,6 +2314,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	case FILE_MEMORY_MIGRATE:
 		retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
 		break;
+	case FILE_MEMORY_MIGRATE_LAZY:
+		retval = update_flag(CS_MEMORY_MIGRATE_LAZY, cs, val);
+		break;
 	case FILE_MEMORY_PRESSURE_ENABLED:
 		cpuset_memory_pressure_enabled = !!val;
 		break;
@@ -2324,6 +2336,12 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	return retval;
 }
 
+static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
+			    u64 val)
+{
+	return __cpuset_write_u64(css_cs(css), cft->private, val);
+}
+
 static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 			    s64 val)
 {
@@ -2473,6 +2491,8 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
 		return is_sched_load_balance(cs);
 	case FILE_MEMORY_MIGRATE:
 		return is_memory_migrate(cs);
+	case FILE_MEMORY_MIGRATE_LAZY:
+		return is_memory_migrate_lazy(cs);
 	case FILE_MEMORY_PRESSURE_ENABLED:
 		return cpuset_memory_pressure_enabled;
 	case FILE_MEMORY_PRESSURE:
@@ -2555,6 +2575,40 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
 	return retval ?: nbytes;
 }
 
+static int cpuset_mm_migration_show(struct seq_file *seq, void *v)
+{
+	struct cpuset *cs = css_cs(seq_css(seq));
+
+	if (is_memory_migrate_lazy(cs))
+		seq_puts(seq, "lazy\n");
+	else if (is_memory_migrate(cs))
+		seq_puts(seq, "sync\n");
+	else
+		seq_puts(seq, "none\n");
+	return 0;
+}
+
+static ssize_t cpuset_mm_migration_write(struct kernfs_open_file *of,
+					 char *buf, size_t nbytes, loff_t off)
+{
+	struct cpuset *cs = css_cs(of_css(of));
+	cpuset_filetype_t type = FILE_MEMORY_MIGRATE;
+	int turning_on = 1;
+	int retval;
+
+	buf = strstrip(buf);
+
+	if (!strcmp(buf, "none"))
+		turning_on = 0;
+	else if (!strcmp(buf, "lazy"))
+		type = FILE_MEMORY_MIGRATE_LAZY;
+	else if (strcmp(buf, "sync"))
+		return -EINVAL;
+
+	retval = __cpuset_write_u64(cs, type, turning_on);
+	return retval ?: nbytes;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
@@ -2711,6 +2765,14 @@ static struct cftype dfl_files[] = {
 		.flags = CFTYPE_DEBUG,
 	},
 
+	{
+		.name = "mems.migration",
+		.seq_show = cpuset_mm_migration_show,
+		.write = cpuset_mm_migration_write,
+		.private = FILE_MEMORY_MIGRATE,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e0ae6997bbfb..f816b2ac5f52 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1097,7 +1097,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 	 * need migration.  Between passing in the full user address
 	 * space range and MPOL_MF_DISCONTIG_OK, this call can not fail.
 	 */
-	VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+	VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL | MPOL_MF_LAZY)));
 	queue_pages_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
 			flags | MPOL_MF_DISCONTIG_OK, &pagelist);
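Assuming cgroup v2 is mounted at /sys/fs/cgroup with the cpuset controller
enabled, the new file could be driven from user space roughly as in the
sketch below. The cgroup name "numa-batch" and the node list written to
cpuset.mems are illustrative assumptions, not part of the patch.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a cgroup control file; returns 0 on success. */
static int cg_write(const char *dir, const char *file, const char *val)
{
	char path[256];
	int fd, ret;

	snprintf(path, sizeof(path), "%s/%s", dir, file);
	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror(path);
		return -1;
	}
	ret = write(fd, val, strlen(val)) < 0 ? -1 : 0;
	if (ret)
		perror(path);
	close(fd);
	return ret;
}

int main(void)
{
	/* hypothetical cpuset-enabled cgroup; adjust to your hierarchy */
	const char *cg = "/sys/fs/cgroup/numa-batch";

	/* accepted values are "none" (default), "sync" and "lazy" */
	if (cg_write(cg, "cpuset.mems.migration", "lazy"))
		return 1;

	/*
	 * Shrinking cpuset.mems now only marks pages on the dropped
	 * nodes PROT_NONE instead of moving them synchronously; NUMA
	 * faults migrate them when they are next touched.
	 */
	return cg_write(cg, "cpuset.mems", "1") ? 1 : 0;
}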
From patchwork Thu Apr 22 09:06:08 2021

From: Abel Wu
To: akpm@linux-foundation.org, lizefan.x@bytedance.com, tj@kernel.org,
    hannes@cmpxchg.org, corbet@lwn.net
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 3/3] docs/admin-guide/cgroup-v2: add cpuset.mems.migration
Date: Thu, 22 Apr 2021 17:06:08 +0800
Message-Id: <20210422090608.7160-4-wuyun.abel@bytedance.com>
In-Reply-To: <20210422090608.7160-1-wuyun.abel@bytedance.com>
References: <20210422090608.7160-1-wuyun.abel@bytedance.com>

Add documentation for the new interface cpuset.mems.migration, most of
which is adapted from the cpuset(7) manpage.

Signed-off-by: Abel Wu
---
 Documentation/admin-guide/cgroup-v2.rst | 36 +++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index b1e81aa8598a..abf6589a390d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2079,6 +2079,42 @@ Cpuset Interface Files
 	Changing the partition state of an invalid partition root to
 	"member" is always allowed even if child cpusets are present.
 
+  cpuset.mems.migration
+	A read-write single value file which exists on non-root
+	cpuset-enabled cgroups.
+
+	Only the following migration modes are defined.
+
+	  ========	==========================================
+	  "none"	migration disabled [default]
+	  "sync"	move pages to cpuset nodes synchronously
+	  "lazy"	move pages to cpuset nodes on second touch
+	  ========	==========================================
+
+	By default, "none" mode is enabled. In this mode, once a page
+	is allocated (given a physical page of main memory), that page
+	stays on whatever node it was allocated on for as long as it
+	remains allocated, even if the cpuset's memory placement policy
+	'cpuset.mems' subsequently changes.
+
+	If "sync" mode is enabled in a cpuset, then when the
+	'cpuset.mems' setting is changed, any page in use by a process
+	in the cpuset that is on a memory node no longer allowed will
+	be migrated synchronously to a memory node that is allowed.
+	The relative placement of a migrated page within the cpuset is
+	preserved during these migration operations when possible.
+
+	The "lazy" mode is almost the same as "sync" mode, except that
+	it does not move the pages right away. Instead it marks them
+	PROT_NONE, and the NUMA faults triggered when the pages are
+	touched again handle the actual movement.
+
+	Furthermore, if a process is moved into a cpuset with migration
+	enabled ("sync" or "lazy"), any pages it uses that are on memory
+	nodes allowed in its previous cpuset, but not allowed in its new
+	cpuset, will be migrated to a memory node allowed in the new
+	cpuset.
+
 Device controller
 -----------------
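One way to observe the effect of the documented "lazy" mode from user space
is to query page placement with move_pages(2): passing nodes == NULL only
reports the node each page currently resides on, it does not move anything.
The sketch below is illustrative; it assumes libnuma's numaif.h is installed
and the program is linked with -lnuma.

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t npages = 4;
	void *pages[4];
	int status[4];
	char *buf;

	if (posix_memalign((void **)&buf, page_size, npages * page_size))
		return 1;

	for (size_t i = 0; i < npages; i++) {
		buf[i * page_size] = 1;			/* fault the page in */
		pages[i] = buf + i * page_size;
	}

	/* nodes == NULL: report current placement without migrating */
	if (move_pages(0 /* self */, npages, pages, NULL, status, 0)) {
		perror("move_pages");
		return 1;
	}

	for (size_t i = 0; i < npages; i++)
		printf("page %zu on node %d\n", i, status[i]);

	free(buf);
	return 0;
}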