
[3/3] mm: Fix missing mem cgroup soft limit tree updates

Message ID 3b6e4e9aa8b3ee1466269baf23ed82d90a8f791c.1612902157.git.tim.c.chen@linux.intel.com (mailing list archive)
State New, archived
Series Soft limit memory management bug fixes

Commit Message

Tim Chen Feb. 9, 2021, 8:29 p.m. UTC
On each node, the mem cgroup soft limit tree tracks how much a cgroup
has exceeded its memory soft limit and sorts the cgroups by their
excess usage.  On page release, the trees are not updated right away;
updates are deferred until a batch of pages belonging to the same
cgroup has been gathered.  This reduces how often the soft limit tree
and the associated cgroup have to be updated and locked.

However, a batch could contain pages from multiple nodes, while only
the soft limit tree of one node would get updated.  Change the logic
so that each batch only holds pages from a single mem cgroup and a
single memory node.  Whenever a page belonging to a different node is
encountered, the batch collected so far is flushed and an update is
issued for that batch's node.

Reviewed-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 mm/memcontrol.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
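
For context, here is a condensed sketch of why only one node's tree is
updated (paraphrased from v5.11-era mm/memcontrol.c, with error paths
and event bookkeeping trimmed): the whole batch is flushed against the
single representative ug->dummy_page, and the soft limit tree is chosen
by that one page's node.

static void uncharge_batch(const struct uncharge_gather *ug)
{
	/* ... uncharge page and memsw counters for the whole batch ... */

	/*
	 * Event accounting (and hence any soft limit tree update) is
	 * keyed off a single page standing in for the entire batch.
	 */
	memcg_check_events(ug->memcg, ug->dummy_page);
}

static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
{
	/* the tree is per node: selected via page_to_nid(page) */
	struct mem_cgroup_tree_per_node *mctz = soft_limit_tree_from_page(page);

	/* ... reinsert memcg sorted by its current soft limit excess ... */
}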

Comments

Johannes Weiner Feb. 9, 2021, 10:22 p.m. UTC | #1
Hello Tim,

On Tue, Feb 09, 2021 at 12:29:47PM -0800, Tim Chen wrote:
> @@ -6849,7 +6850,9 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  	 * exclusive access to the page.
>  	 */
>  
> -	if (ug->memcg != page_memcg(page)) {
> +	if (ug->memcg != page_memcg(page) ||
> +	    /* uncharge batch update soft limit tree on a node basis */
> +	    (ug->dummy_page && ug->nid != page_to_nid(page))) {

The fix makes sense to me.

However, unconditionally breaking up the batch by node can
unnecessarily regress workloads in cgroups that do not have a soft
limit configured, and in cgroup2, which doesn't have soft limits at
all. Consider an interleaving allocation policy, for example.

Can you please further gate on memcg->soft_limit != PAGE_COUNTER_MAX,
or at least on !cgroup_subsys_on_dfl(memory_cgrp_subsys)?

Thanks
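
A minimal sketch of the gating suggested here, assuming the check stays
in the batch-break test of uncharge_page() (placement and exact form
are assumptions, not the final patch):

	/*
	 * Sketch only: break the batch on a node change just when soft
	 * limits can matter, i.e. on cgroup v1 (the default hierarchy
	 * has no soft limits) and only if this memcg has a soft limit
	 * configured below the maximum.
	 */
	if (ug->memcg != page_memcg(page) ||
	    (ug->dummy_page && ug->nid != page_to_nid(page) &&
	     !cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
	     ug->memcg->soft_limit != PAGE_COUNTER_MAX)) {
		/* flush and restart the batch, as before */
	}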
Tim Chen Feb. 9, 2021, 10:34 p.m. UTC | #2
On 2/9/21 2:22 PM, Johannes Weiner wrote:
> Hello Tim,
> 
> On Tue, Feb 09, 2021 at 12:29:47PM -0800, Tim Chen wrote:
>> @@ -6849,7 +6850,9 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>>  	 * exclusive access to the page.
>>  	 */
>>  
>> -	if (ug->memcg != page_memcg(page)) {
>> +	if (ug->memcg != page_memcg(page) ||
>> +	    /* uncharge batch update soft limit tree on a node basis */
>> +	    (ug->dummy_page && ug->nid != page_to_nid(page))) {
> 
> The fix makes sense to me.
> 
> However, unconditionally breaking up the batch by node can
> unnecessarily regress workloads in cgroups that do not have a soft
> limit configured, and in cgroup2, which doesn't have soft limits at
> all. Consider an interleaving allocation policy, for example.
> 
> Can you please further gate on memcg->soft_limit != PAGE_COUNTER_MAX,
> or at least on !cgroup_subsys_on_dfl(memory_cgrp_subsys)?
> 

Sure.  Will fix this.

Tim
Michal Hocko Feb. 10, 2021, 10:08 a.m. UTC | #3
On Tue 09-02-21 12:29:47, Tim Chen wrote:
> On each node, the mem cgroup soft limit tree tracks how much a cgroup
> has exceeded its memory soft limit and sorts the cgroups by their
> excess usage.  On page release, the trees are not updated right away;
> updates are deferred until a batch of pages belonging to the same
> cgroup has been gathered.  This reduces how often the soft limit tree
> and the associated cgroup have to be updated and locked.
> 
> However, a batch could contain pages from multiple nodes, while only
> the soft limit tree of one node would get updated.  Change the logic
> so that each batch only holds pages from a single mem cgroup and a
> single memory node.  Whenever a page belonging to a different node is
> encountered, the batch collected so far is flushed and an update is
> issued for that batch's node.

I do agree with Johannes here. This shouldn't be done unconditionally
for all memcgs. Wouldn't it be much better to do the fixup in the
mem_cgroup_soft_reclaim path instead, and simply check the excess
before doing any reclaim?

Btw. have you seen this trigger any noticeable misbehavior? I would
expect the effect to be rather small, considering how many sources of
memcg_check_events we have.

Unless I have missed something, this was introduced by 747db954cab6
("mm: memcontrol: use page lists for uncharge batching"). Please add
a Fixes tag as well if this is really worth fixing.
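
One possible reading of the alternative suggested above, as a rough
sketch of the loop in mem_cgroup_soft_limit_reclaim() (locking and the
re-insertion of the node are elided; the exact placement is an
assumption, not a tested patch):

	do {
		mz = mem_cgroup_largest_soft_limit_node(mctz);
		if (!mz)
			break;

		/*
		 * Re-check the excess here: a cross-node uncharge batch
		 * may have left this node's tree stale, so skip cgroups
		 * that no longer exceed their soft limit.
		 */
		if (soft_limit_excess(mz->memcg))
			reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
						gfp_mask, &total_scanned);

		/* ... update totals, reinsert mz, drop css ref ... */
	} while (!nr_reclaimed);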

> Reviewed-by: Ying Huang <ying.huang@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  mm/memcontrol.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d72449eeb85a..f5a4a0e4e2ec 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6804,6 +6804,7 @@ struct uncharge_gather {
>  	unsigned long pgpgout;
>  	unsigned long nr_kmem;
>  	struct page *dummy_page;
> +	int nid;
>  };
>  
>  static inline void uncharge_gather_clear(struct uncharge_gather *ug)
> @@ -6849,7 +6850,9 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  	 * exclusive access to the page.
>  	 */
>  
> -	if (ug->memcg != page_memcg(page)) {
> +	if (ug->memcg != page_memcg(page) ||
> +	    /* uncharge batch update soft limit tree on a node basis */
> +	    (ug->dummy_page && ug->nid != page_to_nid(page))) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
> @@ -6869,6 +6872,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  		ug->pgpgout++;
>  
>  	ug->dummy_page = page;
> +	ug->nid = page_to_nid(page);
>  	page->memcg_data = 0;
>  	css_put(&ug->memcg->css);
>  }
> -- 
> 2.20.1

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d72449eeb85a..f5a4a0e4e2ec 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6804,6 +6804,7 @@ struct uncharge_gather {
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
 	struct page *dummy_page;
+	int nid;
 };
 
 static inline void uncharge_gather_clear(struct uncharge_gather *ug)
@@ -6849,7 +6850,9 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	 * exclusive access to the page.
 	 */
 
-	if (ug->memcg != page_memcg(page)) {
+	if (ug->memcg != page_memcg(page) ||
+	    /* uncharge batch update soft limit tree on a node basis */
+	    (ug->dummy_page && ug->nid != page_to_nid(page))) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
@@ -6869,6 +6872,7 @@  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		ug->pgpgout++;
 
 	ug->dummy_page = page;
+	ug->nid = page_to_nid(page);
 	page->memcg_data = 0;
 	css_put(&ug->memcg->css);
 }