
[-next] mm/vmscan: fix an undefined behavior for zone id

Message ID 20191108204407.1435-1-cai@lca.pw (mailing list archive)
State New, archived

Commit Message

Qian Cai Nov. 8, 2019, 8:44 p.m. UTC
The -next commit "mm: vmscan: simplify lruvec_lru_size()" [1] introduced
undefined behavior: zone_idx can be equal to MAX_NR_ZONES, in which case
zid goes out of range.
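
For reference, the out-of-bounds access reported below corresponds roughly to
the following per-zone LRU size lookup (a simplified sketch modeled on
mem_cgroup_get_zone_lru_size() in include/linux/memcontrol.h; names taken from
the kernel source but abbreviated, not verbatim):

	/* struct mem_cgroup_per_node roughly contains:
	 *	unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
	 */
	static inline unsigned long
	mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum lru_list lru,
				     int zone_idx)
	{
		struct mem_cgroup_per_node *mz;

		mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
		/* UBSAN fires here when zone_idx == MAX_NR_ZONES (5 in this config) */
		return mz->lru_zone_size[zone_idx][lru];
	}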

[ 5399.483257] LTP: starting mtest01w (mtest01 -p80 -w)
[ 5400.245051] ================================================================================
[ 5400.255784] UBSAN: Undefined behaviour in ./include/linux/memcontrol.h:536:26
[ 5400.265235] index 5 is out of range for type 'long unsigned int [5][5]'
[ 5400.273925] CPU: 28 PID: 455 Comm: kswapd7 Tainted: G        W         5.4.0-rc6-next-20191108 #3
[ 5400.285461] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 5400.295784] Call Trace:
[ 5400.299483]  dump_stack+0x7a/0xaa
[ 5400.304052]  ubsan_epilogue+0x9/0x26
[ 5400.309180]  __ubsan_handle_out_of_bounds.cold.13+0x2b/0x36
[ 5400.316192]  inactive_list_is_low+0x8bb/0x9f0
[ 5400.321952]  balance_pgdat+0x252/0x7d0
[ 5400.327006]  kswapd+0x251/0x590
[ 5400.331725]  ? finish_wait+0x90/0x90
[ 5400.336574]  kthread+0x12a/0x140
[ 5400.341102]  ? balance_pgdat+0x7d0/0x7d0
[ 5400.346330]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5400.352810]  ret_from_fork+0x27/0x50

[1] https://lore.kernel.org/linux-mm/20191022144803.302233-2-hannes@cmpxchg.org/

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Qian Cai Nov. 8, 2019, 9:26 p.m. UTC | #1
> On Nov 8, 2019, at 3:44 PM, Qian Cai <cai@lca.pw> wrote:
> 
> -    for (zid = 0; zid <= zone_idx; zid++) {
> +    for (zid = 0; zid < zone_idx; zid++) {
>        struct zone *zone =

Oops, I think here needs to be,

for (zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES; zid++) {

to deal with this MAX_NR_ZONES special case.
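
For context, with that bound the loop in lruvec_lru_size() would look roughly
like this (a sketch of the -next code with the loop body abbreviated):

	unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru,
				      int zone_idx)
	{
		unsigned long size = 0;
		int zid;

		/* zone_idx == MAX_NR_ZONES means "whole LRU"; keep zid in bounds */
		for (zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES; zid++) {
			struct zone *zone = &lruvec_pgdat(lruvec)->node_zones[zid];

			if (!managed_zone(zone))
				continue;

			/* ... accumulate this zone's LRU pages into 'size' ... */
		}

		return size;
	}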
Michal Hocko Nov. 11, 2019, 10:12 a.m. UTC | #2
On Fri 08-11-19 16:26:52, Qian Cai wrote:
> 
> 
> > On Nov 8, 2019, at 3:44 PM, Qian Cai <cai@lca.pw> wrote:
> > 
> > -    for (zid = 0; zid <= zone_idx; zid++) {
> > +    for (zid = 0; zid < zone_idx; zid++) {
> >        struct zone *zone =
> 
> Oops, I think here needs to be,
> 
> for (zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES; zid++) {
> 
> to deal with this MAX_NR_ZONES special case.

Yep this looks correct.

Thanks!
Chris Down Nov. 11, 2019, 1:05 p.m. UTC | #3
Qian Cai writes:
>> On Nov 8, 2019, at 3:44 PM, Qian Cai <cai@lca.pw> wrote:
>>
>> -    for (zid = 0; zid <= zone_idx; zid++) {
>> +    for (zid = 0; zid < zone_idx; zid++) {
>>        struct zone *zone =
>
>Oops, I think here needs to be,
>
>for (zid = 0; zid <= zone_idx && zid < MAX_NR_ZONES; zid++) {
>
>to deal with this MAX_NR_ZONES special case.

Ah, I just saw this in my local checkout and thought it was from my changes, 
until I saw it's also on clean mmots checkout. Thanks for the fixup!

Acked-by: Chris Down <chris@chrisdown.name>
Chris Down Nov. 11, 2019, 1:14 p.m. UTC | #4
Chris Down writes:
>Ah, I just saw this in my local checkout and thought it was from my 
>changes, until I saw it's also on clean mmots checkout. Thanks for the 
>fixup!

Also, does this mean we should change callers that may pass through 
zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then 
remove this interim fixup? I'm worried otherwise we might paper over real 
issues in future.
Michal Hocko Nov. 11, 2019, 1:28 p.m. UTC | #5
On Mon 11-11-19 13:14:27, Chris Down wrote:
> Chris Down writes:
> > Ah, I just saw this in my local checkout and thought it was from my
> > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > fixup!
> 
> Also, does this mean we should change callers that may pass through
> zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> remove this interim fixup? I'm worried otherwise we might paper over real
> issues in future.

Yes, removing this special casing is reasonable. I am not sure
MAX_NR_ZONES - 1 is a better choice though. It is error prone and
zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
outside of MM reclaim code AFAIK. It would be probably better to have
MAX_LRU_ZONE (equal to MOVABLE) and use it instead.
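
(For illustration only: no such alias exists in the tree. A sketch of the
suggestion, with MAX_LRU_ZONE being a hypothetical name:)

	/* hypothetical: highest zone the LRU/reclaim code deals with */
	#define MAX_LRU_ZONE	ZONE_MOVABLE

	anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_LRU_ZONE) +
	       lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_LRU_ZONE);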
Johannes Weiner Nov. 12, 2019, 2:59 p.m. UTC | #6
Qian, thanks for the report and the fix.

On Mon, Nov 11, 2019 at 02:28:12PM +0100, Michal Hocko wrote:
> On Mon 11-11-19 13:14:27, Chris Down wrote:
> > Chris Down writes:
> > > Ah, I just saw this in my local checkout and thought it was from my
> > > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > > fixup!
> > 
> > Also, does this mean we should change callers that may pass through
> > zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> > remove this interim fixup? I'm worried otherwise we might paper over real
> > issues in future.
> 
> Yes, removing this special casing is reasonable. I am not sure
> MAX_NR_ZONES - 1 is a better choice though. It is error prone and
> zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
> be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
> outside of MM reclaim code AFAIK. It would be probably better to have
> MAX_LRU_ZONE (equal to MOVABLE) and use it instead.

We already use MAX_NR_ZONES - 1 everywhere else in vmscan.c to mean
"no zone restrictions" - get_scan_count() is the odd one out:

- mem_cgroup_shrink_node()
- try_to_free_mem_cgroup_pages()
- balance_pgdat()
- kswapd()
- shrink_all_memory()

It's a little odd that it points to ZONE_DEVICE, but it's MUCH less
subtle than handling both inclusive and exclusive range delimiters.

So I think the better fix would be this:

---
From 1566a255eef7c2165d435125231ad1eeecac7959 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 11 Nov 2019 13:46:25 -0800
Subject: [PATCH] mm: vmscan: simplify lruvec_lru_size() fix

get_scan_count() passes MAX_NR_ZONES for the reclaim index, which is
beyond the range of valid zone indexes, but used to be handled before
the patch. Every other callsite in vmscan.c passes MAX_NR_ZONES - 1 to
express "all zones, please", so do the same here.

Reported-by: Qian Cai <cai@lca.pw>
Reported-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index df859b1d583c..34ad8a0f3f27 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2322,10 +2322,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * anon in [0], file in [1]
 	 */
 
-	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
-		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
-	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
-		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
+	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
+		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
+	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
+		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
 
 	spin_lock_irq(&pgdat->lru_lock);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
Michal Hocko Nov. 12, 2019, 3:27 p.m. UTC | #7
On Tue 12-11-19 06:59:42, Johannes Weiner wrote:
> Qian, thanks for the report and the fix.
> 
> On Mon, Nov 11, 2019 at 02:28:12PM +0100, Michal Hocko wrote:
> > On Mon 11-11-19 13:14:27, Chris Down wrote:
> > > Chris Down writes:
> > > > Ah, I just saw this in my local checkout and thought it was from my
> > > > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > > > fixup!
> > > 
> > > Also, does this mean we should change callers that may pass through
> > > zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> > > remove this interim fixup? I'm worried otherwise we might paper over real
> > > issues in future.
> > 
> > Yes, removing this special casing is reasonable. I am not sure
> > MAX_NR_ZONES - 1 is a better choice though. It is error prone and
> > zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
> > be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
> > outside of MM reclaim code AFAIK. It would be probably better to have
> > MAX_LRU_ZONE (equal to MOVABLE) and use it instead.
> 
> We already use MAX_NR_ZONES - 1 everywhere else in vmscan.c to mean
> "no zone restrictions" - get_scan_count() is the odd one out:
> 
> - mem_cgroup_shrink_node()
> - try_to_free_mem_cgroup_pages()
> - balance_pgdat()
> - kswapd()
> - shrink_all_memory()
> 
> It's a little odd that it points to ZONE_DEVICE, but it's MUCH less
> subtle than handling both inclusive and exclusive range delimiters.
> 
> So I think the better fix would be this:

lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
LRUs and git grep says there are more instances outside of
get_scan_count. So all of them have to be fixed.

I still think that MAX_NR_ZONES - 1 is a very error prone and subtle
construct IMHO and an alias would be better readable.

Anyway I definitely do agree that we do not want to use both
(MAX_NR_ZONES and MAX_NR_ZONES - 1) because that is even more confusing.

> ---
> >From 1566a255eef7c2165d435125231ad1eeecac7959 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 11 Nov 2019 13:46:25 -0800
> Subject: [PATCH] mm: vmscan: simplify lruvec_lru_size() fix
> 
> get_scan_count() passes MAX_NR_ZONES for the reclaim index, which is
> beyond the range of valid zone indexes, but used to be handled before
> the patch. Every other callsite in vmscan.c passes MAX_NR_ZONES - 1 to
> express "all zones, please", so do the same here.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Reported-by: Chris Down <chris@chrisdown.name>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index df859b1d583c..34ad8a0f3f27 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2322,10 +2322,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * anon in [0], file in [1]
>  	 */
>  
> -	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
> -	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
> +	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> +	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
>  
>  	spin_lock_irq(&pgdat->lru_lock);
>  	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> -- 
> 2.24.0
Johannes Weiner Nov. 12, 2019, 4:16 p.m. UTC | #8
On Tue, Nov 12, 2019 at 04:27:50PM +0100, Michal Hocko wrote:
> On Tue 12-11-19 06:59:42, Johannes Weiner wrote:
> > Qian, thanks for the report and the fix.
> > 
> > On Mon, Nov 11, 2019 at 02:28:12PM +0100, Michal Hocko wrote:
> > > On Mon 11-11-19 13:14:27, Chris Down wrote:
> > > > Chris Down writes:
> > > > > Ah, I just saw this in my local checkout and thought it was from my
> > > > > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > > > > fixup!
> > > > 
> > > > Also, does this mean we should change callers that may pass through
> > > > zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> > > > remove this interim fixup? I'm worried otherwise we might paper over real
> > > > issues in future.
> > > 
> > > Yes, removing this special casing is reasonable. I am not sure
> > > MAX_NR_ZONES - 1 is a better choice though. It is error prone and
> > > zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
> > > be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
> > > outside of MM reclaim code AFAIK. It would be probably better to have
> > > MAX_LRU_ZONE (equal to MOVABLE) and use it instead.
> > 
> > We already use MAX_NR_ZONES - 1 everywhere else in vmscan.c to mean
> > "no zone restrictions" - get_scan_count() is the odd one out:
> > 
> > - mem_cgroup_shrink_node()
> > - try_to_free_mem_cgroup_pages()
> > - balance_pgdat()
> > - kswapd()
> > - shrink_all_memory()
> > 
> > It's a little odd that it points to ZONE_DEVICE, but it's MUCH less
> > subtle than handling both inclusive and exclusive range delimiters.
> > 
> > So I think the better fix would be this:
> 
> lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
> LRUs and git grep says there are more instances outside of
> get_scan_count. So all of them have to be fixed.

Which ones?

[hannes@computer linux]$ git grep lruvec_lru_size
include/linux/mmzone.h:extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
mm/vmscan.c: * lruvec_lru_size -  Returns the number of pages on the given LRU list.
mm/vmscan.c:unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
mm/vmscan.c:    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
mm/vmscan.c:    file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
mm/vmscan.c:            lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
[hannes@computer linux]$

The only other user already passes sc->reclaim_idx, which always
points to a valid zone, and is initialized to MAX_NR_ZONES - 1 in many
places.
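
For example, that initialization pattern looks roughly like this in the
vmscan.c callers (a simplified sketch, other fields elided):

	struct scan_control sc = {
		.reclaim_idx	= MAX_NR_ZONES - 1,	/* "all zones" */
		/* ... */
	};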

> I still think that MAX_NR_ZONES - 1 is a very error prone and subtle
> construct IMHO and an alias would be better readable.

I wouldn't mind a follow-up patch that changes this pattern
comprehensively. As it stands, get_scan_count() is the odd one out.

The documentation bit is a good point, though. We should fix
that. Updated patch:

---

From b1b6ce306010554aba6ebd7aac0abffc1576d71a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 11 Nov 2019 13:46:25 -0800
Subject: [PATCH] mm: vmscan: simplify lruvec_lru_size() fix

get_scan_count() passes MAX_NR_ZONES for the reclaim index, which is
beyond the range of valid zone indexes, but used to be handled before
the patch. Every other callsite in vmscan.c passes MAX_NR_ZONES - 1 to
express "all zones, please", so do the same here.

Reported-by: Qian Cai <cai@lca.pw>
Reported-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index df859b1d583c..5eb96a63ad1e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -323,7 +323,7 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
  * lruvec_lru_size -  Returns the number of pages on the given LRU list.
  * @lruvec: lru vector
  * @lru: lru to use
- * @zone_idx: zones to consider (use MAX_NR_ZONES for the whole LRU list)
+ * @zone_idx: index of the highest zone to include (use MAX_NR_ZONES - 1 for all)
  */
 unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
 {
@@ -2322,10 +2322,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * anon in [0], file in [1]
 	 */
 
-	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
-		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
-	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
-		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
+	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
+		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
+	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
+		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
 
 	spin_lock_irq(&pgdat->lru_lock);
 	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
Qian Cai Nov. 12, 2019, 4:24 p.m. UTC | #9
On Tue, 2019-11-12 at 11:16 -0500, Johannes Weiner wrote:
> On Tue, Nov 12, 2019 at 04:27:50PM +0100, Michal Hocko wrote:
> > On Tue 12-11-19 06:59:42, Johannes Weiner wrote:
> > > Qian, thanks for the report and the fix.
> > > 
> > > On Mon, Nov 11, 2019 at 02:28:12PM +0100, Michal Hocko wrote:
> > > > On Mon 11-11-19 13:14:27, Chris Down wrote:
> > > > > Chris Down writes:
> > > > > > Ah, I just saw this in my local checkout and thought it was from my
> > > > > > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > > > > > fixup!
> > > > > 
> > > > > Also, does this mean we should change callers that may pass through
> > > > > zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> > > > > remove this interim fixup? I'm worried otherwise we might paper over real
> > > > > issues in future.
> > > > 
> > > > Yes, removing this special casing is reasonable. I am not sure
> > > > MAX_NR_ZONES - 1 is a better choice though. It is error prone and
> > > > zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
> > > > be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
> > > > outside of MM reclaim code AFAIK. It would be probably better to have
> > > > MAX_LRU_ZONE (equal to MOVABLE) and use it instead.
> > > 
> > > We already use MAX_NR_ZONES - 1 everywhere else in vmscan.c to mean
> > > "no zone restrictions" - get_scan_count() is the odd one out:
> > > 
> > > - mem_cgroup_shrink_node()
> > > - try_to_free_mem_cgroup_pages()
> > > - balance_pgdat()
> > > - kswapd()
> > > - shrink_all_memory()

There is also inactive_list_is_low(),

if (trace)
	trace_mm_vmscan_inactive_list_is_low(pgdat->node_id, sc->reclaim_idx,
		lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
		lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
		inactive_ratio, file);

> > > 
> > > It's a little odd that it points to ZONE_DEVICE, but it's MUCH less
> > > subtle than handling both inclusive and exclusive range delimiters.
> > > 
> > > So I think the better fix would be this:
> > 
> > lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
> > LRUs and git grep says there are more instances outside of
> > get_scan_count. So all of them have to be fixed.
> 
> Which ones?
> 
> [hannes@computer linux]$ git grep lruvec_lru_size
> include/linux/mmzone.h:extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
> mm/vmscan.c: * lruvec_lru_size -  Returns the number of pages on the given LRU list.
> mm/vmscan.c:unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
> mm/vmscan.c:    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> mm/vmscan.c:    file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
> mm/vmscan.c:            lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> [hannes@computer linux]$
> 
> The only other user already passes sc->reclaim_idx, which always
> points to a valid zone, and is initialized to MAX_NR_ZONES - 1 in many
> places.
> 
> > I still think that MAX_NR_ZONES - 1 is a very error prone and subtle
> > construct IMHO and an alias would be better readable.
> 
> I wouldn't mind a follow-up patch that changes this pattern
> comprehensively. As it stands, get_scan_count() is the odd one out.
> 
> The documentation bit is a good point, though. We should fix
> that. Updated patch:
> 
> ---
> 
> From b1b6ce306010554aba6ebd7aac0abffc1576d71a Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 11 Nov 2019 13:46:25 -0800
> Subject: [PATCH] mm: vmscan: simplify lruvec_lru_size() fix
> 
> get_scan_count() passes MAX_NR_ZONES for the reclaim index, which is
> beyond the range of valid zone indexes, but used to be handled before
> the patch. Every other callsite in vmscan.c passes MAX_NR_ZONES - 1 to
> express "all zones, please", so do the same here.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Reported-by: Chris Down <chris@chrisdown.name>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index df859b1d583c..5eb96a63ad1e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -323,7 +323,7 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   * lruvec_lru_size -  Returns the number of pages on the given LRU list.
>   * @lruvec: lru vector
>   * @lru: lru to use
> - * @zone_idx: zones to consider (use MAX_NR_ZONES for the whole LRU list)
> + * @zone_idx: index of the highest zone to include (use MAX_NR_ZONES - 1 for all)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> @@ -2322,10 +2322,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * anon in [0], file in [1]
>  	 */
>  
> -	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
> -	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
> +	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> +	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
>  
>  	spin_lock_irq(&pgdat->lru_lock);
>  	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
Michal Hocko Nov. 12, 2019, 4:31 p.m. UTC | #10
On Tue 12-11-19 11:16:58, Johannes Weiner wrote:
> On Tue, Nov 12, 2019 at 04:27:50PM +0100, Michal Hocko wrote:
> > On Tue 12-11-19 06:59:42, Johannes Weiner wrote:
> > > Qian, thanks for the report and the fix.
> > > 
> > > On Mon, Nov 11, 2019 at 02:28:12PM +0100, Michal Hocko wrote:
> > > > On Mon 11-11-19 13:14:27, Chris Down wrote:
> > > > > Chris Down writes:
> > > > > > Ah, I just saw this in my local checkout and thought it was from my
> > > > > > changes, until I saw it's also on clean mmots checkout. Thanks for the
> > > > > > fixup!
> > > > > 
> > > > > Also, does this mean we should change callers that may pass through
> > > > > zone_idx=MAX_NR_ZONES to become MAX_NR_ZONES-1 in a separate commit, then
> > > > > remove this interim fixup? I'm worried otherwise we might paper over real
> > > > > issues in future.
> > > > 
> > > > Yes, removing this special casing is reasonable. I am not sure
> > > > MAX_NR_ZONES - 1 is a better choice though. It is error prone and
> > > > zone_idx is the highest zone we should consider and MAX_NR_ZONES - 1
> > > > be ZONE_DEVICE if it is configured. But ZONE_DEVICE is really standing
> > > > outside of MM reclaim code AFAIK. It would be probably better to have
> > > > MAX_LRU_ZONE (equal to MOVABLE) and use it instead.
> > > 
> > > We already use MAX_NR_ZONES - 1 everywhere else in vmscan.c to mean
> > > "no zone restrictions" - get_scan_count() is the odd one out:
> > > 
> > > - mem_cgroup_shrink_node()
> > > - try_to_free_mem_cgroup_pages()
> > > - balance_pgdat()
> > > - kswapd()
> > > - shrink_all_memory()
> > > 
> > > It's a little odd that it points to ZONE_DEVICE, but it's MUCH less
> > > subtle than handling both inclusive and exclusive range delimiters.
> > > 
> > > So I think the better fix would be this:
> > 
> > lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
> > LRUs and git grep says there are more instances outside of
> > get_scan_count. So all of them have to be fixed.
> 
> Which ones?
> 
> [hannes@computer linux]$ git grep lruvec_lru_size
> include/linux/mmzone.h:extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
> mm/vmscan.c: * lruvec_lru_size -  Returns the number of pages on the given LRU list.
> mm/vmscan.c:unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
> mm/vmscan.c:    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> mm/vmscan.c:    file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
> mm/vmscan.c:            lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> [hannes@computer linux]$

I have checked the Linus tree but now double checked with the current
next
$ git describe next/master
next-20191112
$ git grep "lruvec_lru_size.*MAX_NR_ZONES" next/master
next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
next/master:mm/vmscan.c:        anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
next/master:mm/vmscan.c:        file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
next/master:mm/workingset.c:    active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);

are there any changes which didn't make it to linux next yet?

> The only other user already passes sc->reclaim_idx, which always
> points to a valid zone, and is initialized to MAX_NR_ZONES - 1 in many
> places.
> 
> > I still think that MAX_NR_ZONES - 1 is a very error prone and subtle
> > construct IMHO and an alias would be better readable.
> 
> I wouldn't mind a follow-up patch that changes this pattern
> comprehensively. As it stands, get_scan_count() is the odd one out.

OK, a follow up patch to unify everything makes sense to me.

> The documentation bit is a good point, though. We should fix
> that. Updated patch:
> 
> ---
> 
> >From b1b6ce306010554aba6ebd7aac0abffc1576d71a Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 11 Nov 2019 13:46:25 -0800
> Subject: [PATCH] mm: vmscan: simplify lruvec_lru_size() fix
> 
> get_scan_count() passes MAX_NR_ZONES for the reclaim index, which is
> beyond the range of valid zone indexes, but used to be handled before
> the patch. Every other callsite in vmscan.c passes MAX_NR_ZONES - 1 to
> express "all zones, please", so do the same here.
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Reported-by: Chris Down <chris@chrisdown.name>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/vmscan.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index df859b1d583c..5eb96a63ad1e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -323,7 +323,7 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   * lruvec_lru_size -  Returns the number of pages on the given LRU list.
>   * @lruvec: lru vector
>   * @lru: lru to use
> - * @zone_idx: zones to consider (use MAX_NR_ZONES for the whole LRU list)
> + * @zone_idx: index of the highest zone to include (use MAX_NR_ZONES - 1 for all)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> @@ -2322,10 +2322,10 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * anon in [0], file in [1]
>  	 */
>  
> -	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
> -	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
> -		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
> +	anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> +	file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> +		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
>  
>  	spin_lock_irq(&pgdat->lru_lock);
>  	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> -- 
> 2.24.0
Johannes Weiner Nov. 12, 2019, 6:20 p.m. UTC | #11
On Tue, Nov 12, 2019 at 05:31:56PM +0100, Michal Hocko wrote:
> On Tue 12-11-19 11:16:58, Johannes Weiner wrote:
> > On Tue, Nov 12, 2019 at 04:27:50PM +0100, Michal Hocko wrote:
> > > lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
> > > LRUs and git grep says there are more instances outside of
> > > get_scan_count. So all of them have to be fixed.
> > 
> > Which ones?
> > 
> > [hannes@computer linux]$ git grep lruvec_lru_size
> > include/linux/mmzone.h:extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
> > mm/vmscan.c: * lruvec_lru_size -  Returns the number of pages on the given LRU list.
> > mm/vmscan.c:unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
> > mm/vmscan.c:    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> > mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> > mm/vmscan.c:    file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> > mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
> > mm/vmscan.c:            lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> > [hannes@computer linux]$
> 
> I have checked the Linus tree but now double checked with the current
> next
> $ git describe next/master
> next-20191112
> $ git grep "lruvec_lru_size.*MAX_NR_ZONES" next/master
> next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
> next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
> next/master:mm/vmscan.c:        anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
> next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
> next/master:mm/vmscan.c:        file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
> next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
> next/master:mm/workingset.c:    active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
> 
> are there any changes which didn't make it to linux next yet?

Aaahh, that makes sense. I was looking at the latest mmots which
has

- mm: vmscan: detect file thrashing at the reclaim root
- mm: vmscan: enforce inactive:active ratio at the reclaim root

Those replace the inactive_is_low and the workingset callsites with
the recursive lruvec_page_state(). It looks like that isn't in next -
and while I hope it'll make it into 5.5, it might not. So we need a
fix that considers the other callsites as well.

Qian's patches that Andrew already has will be good then, as it
reduces the churn to those other callsites that are in flux.

We can clean things up when the dust settles.
Michal Hocko Nov. 12, 2019, 6:30 p.m. UTC | #12
On Tue 12-11-19 13:20:17, Johannes Weiner wrote:
> On Tue, Nov 12, 2019 at 05:31:56PM +0100, Michal Hocko wrote:
> > On Tue 12-11-19 11:16:58, Johannes Weiner wrote:
> > > On Tue, Nov 12, 2019 at 04:27:50PM +0100, Michal Hocko wrote:
> > > > lruvec_lru_size is explicitly documented to use MAX_NR_ZONES for all
> > > > LRUs and git grep says there are more instances outside of
> > > > get_scan_count. So all of them have to be fixed.
> > > 
> > > Which ones?
> > > 
> > > [hannes@computer linux]$ git grep lruvec_lru_size
> > > include/linux/mmzone.h:extern unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx);
> > > mm/vmscan.c: * lruvec_lru_size -  Returns the number of pages on the given LRU list.
> > > mm/vmscan.c:unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
> > > mm/vmscan.c:    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES - 1) +
> > > mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES - 1);
> > > mm/vmscan.c:    file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES - 1) +
> > > mm/vmscan.c:            lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES - 1);
> > > mm/vmscan.c:            lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
> > > [hannes@computer linux]$
> > 
> > I have checked the Linus tree but now double checked with the current
> > next
> > $ git describe next/master
> > next-20191112
> > $ git grep "lruvec_lru_size.*MAX_NR_ZONES" next/master
> > next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
> > next/master:mm/vmscan.c:                        lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
> > next/master:mm/vmscan.c:        anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
> > next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
> > next/master:mm/vmscan.c:        file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
> > next/master:mm/vmscan.c:                lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
> > next/master:mm/workingset.c:    active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
> > 
> > are there any changes which didn't make it to linux next yet?
> 
> Aaahh, that makes sense. I was looking at the latest mmots which
> has
> 
> - mm: vmscan: detect file thrashing at the reclaim root
> - mm: vmscan: enforce inactive:active ratio at the reclaim root
> 
> Those replace the inactive_is_low and the workingset callsites with
> the recursive lruvec_page_state(). It looks like that isn't in next -
> and while I hope it'll make it into 5.5, it might not. So we need a
> fix that considers the other callsites as well.
> 
> Qian's patches that Andrew already has will be good then, as it
> reduces the churn to those other callsites that are in flux.
> 
> We can clean things up when the dust settles.

Yeah, while I am not really super happy about the code being more complex
than necessary, the cleanup can be done on top and we can also think about
how to do it properly (I still haven't given up ;))

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d97985262dda..9485b80d6b5b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -317,7 +317,7 @@  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone
 	unsigned long size = 0;
 	int zid;
 
-	for (zid = 0; zid <= zone_idx; zid++) {
+	for (zid = 0; zid < zone_idx; zid++) {
 		struct zone *zone = &lruvec_pgdat(lruvec)->node_zones[zid];
 
 		if (!managed_zone(zone))