Message ID | 20180817231834.15959-1-guro@fb.com (mailing list archive)
---|---
State | New, archived
Series | [RFC] mm: don't miss the last page because of round-off error
On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> -			scan = div64_u64(scan * fraction[file],
> -					 denominator);
> +			if (scan > 1)
> +				scan = div64_u64(scan * fraction[file],
> +						 denominator);

Wouldn't we be better off doing a div_round_up? ie:

	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);

although i'd rather hide that in a new macro in math64.h than opencode it
here.
On Fri, Aug 17, 2018 at 06:22:13PM -0700, Matthew Wilcox wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> > -			scan = div64_u64(scan * fraction[file],
> > -					 denominator);
> > +			if (scan > 1)
> > +				scan = div64_u64(scan * fraction[file],
> > +						 denominator);
>
> Wouldn't we be better off doing a div_round_up? ie:
>
> 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

Good idea!
Will do in v2.

Thanks!
On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> -			scan = div64_u64(scan * fraction[file],
>> -					 denominator);
>> +			if (scan > 1)
>> +				scan = div64_u64(scan * fraction[file],
>> +						 denominator);
>
> Wouldn't we be better off doing a div_round_up? ie:
>
> 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>
> although i'd rather hide that in a new macro in math64.h than opencode it
> here.

All numbers here should be up to nr_pages * 200 and fit into unsigned long.
I see no reason for u64. If they overflow then u64 wouldn't help either.

There is macro DIV_ROUND_UP in kernel.h
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -			scan = div64_u64(scan * fraction[file],
> >> -					 denominator);
> >> +			if (scan > 1)
> >> +				scan = div64_u64(scan * fraction[file],
> >> +						 denominator);
> >
> > Wouldn't we be better off doing a div_round_up? ie:
> >
> > 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
>
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

Shaohua added the div64 usage initially, adding him to the cc.

> There is macro DIV_ROUND_UP in kernel.h

Indeed there is.
On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> -			scan = div64_u64(scan * fraction[file],
> >> -					 denominator);
> >> +			if (scan > 1)
> >> +				scan = div64_u64(scan * fraction[file],
> >> +						 denominator);
> >
> > Wouldn't we be better off doing a div_round_up? ie:
> >
> > 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >
> > although i'd rather hide that in a new macro in math64.h than opencode it
> > here.
>
> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> I see no reason for u64. If they overflow then u64 wouldn't help either.

It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
to four times of what's on the LRUs. That can overflow a u32 with even
small amounts of memory.
On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
>> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
>> >> -			scan = div64_u64(scan * fraction[file],
>> >> -					 denominator);
>> >> +			if (scan > 1)
>> >> +				scan = div64_u64(scan * fraction[file],
>> >> +						 denominator);
>> >
>> > Wouldn't we be better off doing a div_round_up? ie:
>> >
>> > 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
>> >
>> > although i'd rather hide that in a new macro in math64.h than opencode it
>> > here.
>>
>> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
>> I see no reason for u64. If they overflow then u64 wouldn't help either.
>
> It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> to four times of what's on the LRUs. That can overflow a u32 with even
> small amounts of memory.

Ah, this thing is inverted because it aims to proportional reactivation rate
rather than the proportional pressure to reclaimable pages.
That's not obvious. I suppose this should be in comment above it.

Well, at least denominator should fit into unsigned long. So full
64/64 division is redundant.
On Wed, Aug 22, 2018 at 09:01:19AM +0300, Konstantin Khlebnikov wrote:
> On Tue, Aug 21, 2018 at 8:15 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Tue, Aug 21, 2018 at 08:11:44AM +0300, Konstantin Khlebnikov wrote:
> >> On Sat, Aug 18, 2018 at 4:22 AM, Matthew Wilcox <willy@infradead.org> wrote:
> >> > On Fri, Aug 17, 2018 at 04:18:34PM -0700, Roman Gushchin wrote:
> >> >> -			scan = div64_u64(scan * fraction[file],
> >> >> -					 denominator);
> >> >> +			if (scan > 1)
> >> >> +				scan = div64_u64(scan * fraction[file],
> >> >> +						 denominator);
> >> >
> >> > Wouldn't we be better off doing a div_round_up? ie:
> >> >
> >> > 	scan = div64_u64(scan * fraction[file] + denominator - 1, denominator);
> >> >
> >> > although i'd rather hide that in a new macro in math64.h than opencode it
> >> > here.
> >>
> >> All numbers here should be up to nr_pages * 200 and fit into unsigned long.
> >> I see no reason for u64. If they overflow then u64 wouldn't help either.
> >
> > It is nr_pages * 200 * recent_scanned, where recent_scanned can be up
> > to four times of what's on the LRUs. That can overflow a u32 with even
> > small amounts of memory.
>
> Ah, this thing is inverted because it aims to proportional reactivation rate
> rather than the proportional pressure to reclaimable pages.
> That's not obvious. I suppose this should be in comment above it.
>
> Well, at least denominator should fit into unsigned long. So full
> 64/64 division is redundant.

In any case it's not related to the original issue and should be
treated separately. I'd like to keep the patch simple to make
backporting to stable easy. All refactorings can be done separately,
if necessary.

Thanks!
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..f85c5ec01886 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2287,9 +2287,12 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			/*
 			 * Scan types proportional to swappiness and
 			 * their relative recent reclaim efficiency.
+			 * Make sure we don't miss the last page
+			 * because of a round-off error.
 			 */
-			scan = div64_u64(scan * fraction[file],
-					 denominator);
+			if (scan > 1)
+				scan = div64_u64(scan * fraction[file],
+						 denominator);
 			break;
 		case SCAN_FILE:
 		case SCAN_ANON:
I've noticed that dying memory cgroups are often pinned in memory by a
single pagecache page. Even under moderate memory pressure they
sometimes stayed in such a state for a long time. That looked strange.

My investigation showed that the problem is caused by applying the LRU
pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator, if the initial
scan size is 1, the result is always 0.

This means the last page is not scanned and has no chances to be
reclaimed.

Fix this by skipping the balancing logic if the initial scan count is 1.

In practice this change significantly improves the speed of dying
cgroups reclaim.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
---
 mm/vmscan.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)