Message ID | 20220407031525.2368067-6-yuzhao@google.com (mailing list archive)
---|---
State | New, archived
Series | Multi-Gen LRU Framework
On Wed, 6 Apr 2022 21:15:17 -0600 Yu Zhao <yuzhao@google.com> wrote: > Evictable pages are divided into multiple generations for each lruvec. > The youngest generation number is stored in lrugen->max_seq for both > anon and file types as they are aged on an equal footing. The oldest > generation numbers are stored in lrugen->min_seq[] separately for anon > and file types as clean file pages can be evicted regardless of swap > constraints. These three variables are monotonically increasing. > > ... > > +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) There's a lot of function inlining here. Fortunately the compiler will ignore it all, because some of it looks wrong. Please review (and remeasure!). If inlining is really justified, use __always_inline, and document the reasons for doing so. > +{ > + int gen; > + unsigned long old_flags, new_flags; > + > + do { > + new_flags = old_flags = READ_ONCE(folio->flags); > + if (!(new_flags & LRU_GEN_MASK)) > + return false; > + > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio); > + VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); > + > + gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; > + > + new_flags &= ~LRU_GEN_MASK; > + /* for shrink_page_list() */ > + if (reclaiming) > + new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim)); > + else if (lru_gen_is_active(lruvec, gen)) > + new_flags |= BIT(PG_active); > + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); Clearly the cmpxchg loop is handling races against a concurrent updater. But it's unclear who that updater is, what are the dynamics here and why the designer didn't use, say, spin_lock(). The way to clarify such things is with code comments! > > +#endif /* !__GENERATING_BOUNDS_H */ > + > +/* > + * Evictable pages are divided into multiple generations. The youngest and the > + * oldest generation numbers, max_seq and min_seq, are monotonically increasing. > + * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An > + * offset within MAX_NR_GENS, gen, indexes the LRU list of the corresponding The "within MAX_NR_GENS, gen," text here is unclear? > + * generation. The gen counter in folio->flags stores gen+1 while a page is on > + * one of lrugen->lists[]. Otherwise it stores 0. > + * > + * A page is added to the youngest generation on faulting. The aging needs to > + * check the accessed bit at least twice before handing this page over to the > + * eviction. The first check takes care of the accessed bit set on the initial > + * fault; the second check makes sure this page hasn't been used since then. > + * This process, AKA second chance, requires a minimum of two generations, > + * hence MIN_NR_GENS. And to maintain ABI compatibility with the active/inactive Where is the ABI compatibility issue? Is it in some way in which the legacy LRU is presented to userspace? > + * LRU, these two generations are considered active; the rest of generations, if > + * they exist, are considered inactive. See lru_gen_is_active(). PG_active is > + * always cleared while a page is on one of lrugen->lists[] so that the aging > + * needs not to worry about it. And it's set again when a page considered active > + * is isolated for non-reclaiming purposes, e.g., migration. See > + * lru_gen_add_folio() and lru_gen_del_folio(). > + * > + * MAX_NR_GENS is set to 4 so that the multi-gen LRU can support twice of the "twice the number of"?
> + * categories of the active/inactive LRU when keeping track of accesses through > + * page tables. It requires order_base_2(MAX_NR_GENS+1) bits in folio->flags. > + */ Helpful comment, overall. > > ... > > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -909,6 +909,14 @@ config ANON_VMA_NAME > area from being merged with adjacent virtual memory areas due to the > difference in their name. > > +config LRU_GEN > + bool "Multi-Gen LRU" > + depends on MMU > + # the following options can use up the spare bits in page flags > + depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP) > + help > + A high performance LRU implementation to overcommit memory. > + > source "mm/damon/Kconfig" This is a problem. I had to jump through hoops just to be able to compile-test this. Turns out I had to figure out how to disable MAXSMP. Can we please figure out a way to ensure that more testers are at least compile testing this? Allnoconfig, defconfig, allyesconfig, allmodconfig. Also, I suggest that we actually make MGLRU the default while in linux-next. > > ... >
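The sliding-window comment quoted above is easier to follow with concrete numbers. The snippet below is a plain userspace illustration, not kernel code: it mirrors lru_gen_from_seq() and lru_gen_is_active() from the patch, and the particular min_seq/max_seq values are made up for the example.

    /* Userspace sketch of the generation window; mirrors the patch's helpers. */
    #include <stdio.h>

    #define MIN_NR_GENS     2U
    #define MAX_NR_GENS     4U

    static int lru_gen_from_seq(unsigned long seq)
    {
            return seq % MAX_NR_GENS;       /* gen is an offset within MAX_NR_GENS */
    }

    static int is_active(unsigned long max_seq, int gen)
    {
            /* the two youngest generations are reported as "active" */
            return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1);
    }

    int main(void)
    {
            unsigned long min_seq = 4, max_seq = 7; /* example window of four generations */
            unsigned long seq;

            for (seq = min_seq; seq <= max_seq; seq++)
                    printf("seq %lu -> gen %d (%s), stored in folio->flags as %d\n",
                           seq, lru_gen_from_seq(seq),
                           is_active(max_seq, lru_gen_from_seq(seq)) ? "active" : "inactive",
                           lru_gen_from_seq(seq) + 1);
            return 0;
    }

With max_seq = 7, the generations holding seqs 7 and 6 show up as active and the two older ones as inactive, which is what keeps the legacy active/inactive accounting meaningful (the ABI point Andrew asks about).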
On Mon, Apr 11, 2022 at 07:16:15PM -0700, Andrew Morton wrote: > > +{ > > + int gen; > > + unsigned long old_flags, new_flags; > > + > > + do { > > + new_flags = old_flags = READ_ONCE(folio->flags); > > + if (!(new_flags & LRU_GEN_MASK)) > > + return false; > > + > > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio); > > + VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); > > + > > + gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; > > + > > + new_flags &= ~LRU_GEN_MASK; > > + /* for shrink_page_list() */ > > + if (reclaiming) > > + new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim)); > > + else if (lru_gen_is_active(lruvec, gen)) > > + new_flags |= BIT(PG_active); > > + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); Also; please use the form: unsigned long new, old = READ_ONCE(folio->flags); do { new = /* something */ old; } while (!try_cmpxchg(&folio->flags, &old, new))
On Tue, Apr 12, 2022 at 1:06 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Apr 11, 2022 at 07:16:15PM -0700, Andrew Morton wrote: > > > > +{ > > > + int gen; > > > + unsigned long old_flags, new_flags; > > > + > > > + do { > > > + new_flags = old_flags = READ_ONCE(folio->flags); > > > + if (!(new_flags & LRU_GEN_MASK)) > > > + return false; > > > + > > > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio); > > > + VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); > > > + > > > + gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; > > > + > > > + new_flags &= ~LRU_GEN_MASK; > > > + /* for shrink_page_list() */ > > > + if (reclaiming) > > > + new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim)); > > > + else if (lru_gen_is_active(lruvec, gen)) > > > + new_flags |= BIT(PG_active); > > > + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); > > Also; please use the form: > > unsigned long new, old = READ_ONCE(folio->flags); > > do { > new = /* something */ old; > } while (!try_cmpxchg(&folio->flags, &old, new)) Sweet. Thanks. A related question: if I pass new = old to try_cmpxchg(), does it know that and avoid an unnecessary atomic op?
On Tue, Apr 19, 2022 at 5:39 PM Yu Zhao <yuzhao@google.com> wrote: > > A related question: if I pass new = old to try_cmpxchg(), does it know > that and avoid an unnecessary atomic op? No. try_cmpxchg() basically translates directly to a cmpxchg instruction (on x86) with the return value being the eflags 'Z' bit. Linus
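For concreteness, applying the form Peter suggests to the loop under discussion could look roughly like the sketch below. This is only an illustration of the idiom, not necessarily the code that was merged; note Linus's point that try_cmpxchg() still issues the cmpxchg even when new equals old, and that try_cmpxchg() refreshes old_flags on failure, so each retry recomputes everything from the current value.

    static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio,
                                         bool reclaiming)
    {
            int gen;
            unsigned long new_flags, old_flags = READ_ONCE(folio->flags);

            do {
                    /* not on a lrugen list: nothing to do */
                    if (!(old_flags & LRU_GEN_MASK))
                            return false;

                    VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
                    VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio);

                    gen = ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;

                    new_flags = old_flags & ~LRU_GEN_MASK;
                    /* for shrink_page_list() */
                    if (reclaiming)
                            new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim));
                    else if (lru_gen_is_active(lruvec, gen))
                            new_flags |= BIT(PG_active);
                    /* on failure, try_cmpxchg() reloads old_flags and we retry */
            } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));

            lru_gen_update_size(lruvec, folio, gen, -1);
            list_del(&folio->lru);

            return true;
    }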
On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Wed, 6 Apr 2022 21:15:17 -0600 Yu Zhao <yuzhao@google.com> wrote: > > > Evictable pages are divided into multiple generations for each lruvec. > > The youngest generation number is stored in lrugen->max_seq for both > > anon and file types as they are aged on an equal footing. The oldest > > generation numbers are stored in lrugen->min_seq[] separately for anon > > and file types as clean file pages can be evicted regardless of swap > > constraints. These three variables are monotonically increasing. > > > > ... > > > > +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) > > There's a lot of function inlining here. Fortunately the compiler will > ignore it all, because some of it looks wrong. Please review (and > remeasure!). If inlining is reqlly justified, use __always_inline, and > document the reasons for doing so. I totally expect modern compilers to make better decisions than I do. And personally, I'd never use __always_inline; instead, I'd strongly recommend FDO/LTO. > > +{ > > + int gen; > > + unsigned long old_flags, new_flags; > > + > > + do { > > + new_flags = old_flags = READ_ONCE(folio->flags); > > + if (!(new_flags & LRU_GEN_MASK)) > > + return false; > > + > > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio); > > + VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); > > + > > + gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; > > + > > + new_flags &= ~LRU_GEN_MASK; > > + /* for shrink_page_list() */ > > + if (reclaiming) > > + new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim)); > > + else if (lru_gen_is_active(lruvec, gen)) > > + new_flags |= BIT(PG_active); > > + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); > > Clearly the cmpxchg loop is handling races against a concurrent > updater. But it's unclear who that updater is, what are the dynamics > here and why the designer didn't use, say, spin_lock(). The way to > clarify such thigs is with code comments! Right. set_mask_bits() should suffice here. > > +#endif /* !__GENERATING_BOUNDS_H */ > > + > > +/* > > + * Evictable pages are divided into multiple generations. The youngest and the > > + * oldest generation numbers, max_seq and min_seq, are monotonically increasing. > > + * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An > > + * offset within MAX_NR_GENS, gen, indexes the LRU list of the corresponding > > The "within MAX_NR_GENS, gen," text here is unclear? Will update: "i.e., gen". > > + * generation. The gen counter in folio->flags stores gen+1 while a page is on > > + * one of lrugen->lists[]. Otherwise it stores 0. > > + * > > + * A page is added to the youngest generation on faulting. The aging needs to > > + * check the accessed bit at least twice before handing this page over to the > > + * eviction. The first check takes care of the accessed bit set on the initial > > + * fault; the second check makes sure this page hasn't been used since then. > > + * This process, AKA second chance, requires a minimum of two generations, > > + * hence MIN_NR_GENS. And to maintain ABI compatibility with the active/inactive > > Where is the ABI compatibility issue? Is it in some way in which the > legacy LRU is presented to userspace? Will update: yes, active/inactive LRU sizes in /proc/vmstat. > > + * LRU, these two generations are considered active; the rest of generations, if > > + * they exist, are considered inactive. 
See lru_gen_is_active(). PG_active is > > + * always cleared while a page is on one of lrugen->lists[] so that the aging > > + * needs not to worry about it. And it's set again when a page considered active > > + * is isolated for non-reclaiming purposes, e.g., migration. See > > + * lru_gen_add_folio() and lru_gen_del_folio(). > > + * > > + * MAX_NR_GENS is set to 4 so that the multi-gen LRU can support twice of the > > "twice the number of"? Will update. > > + * categories of the active/inactive LRU when keeping track of accesses through > > + * page tables. It requires order_base_2(MAX_NR_GENS+1) bits in folio->flags. > > + */ > > Helpful comment, overall. > > > > > ... > > > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -909,6 +909,14 @@ config ANON_VMA_NAME > > area from being merged with adjacent virtual memory areas due to the > > difference in their name. > > > > +config LRU_GEN > > + bool "Multi-Gen LRU" > > + depends on MMU > > + # the following options can use up the spare bits in page flags > > + depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP) > > + help > > + A high performance LRU implementation to overcommit memory. > > + > > source "mm/damon/Kconfig" > > This is a problem. I had to jump through hoops just to be able to > compile-test this. Turns out I had to figure out how to disable > MAXSMP. > > Can we please figure out a way to ensure that more testers are at least > compile testing this? Allnoconfig, defconfig, allyesconfig, allmodconfig. > > Also, I suggest that we actually make MGLRU the default while in linux-next. The !MAXSMP is to work around [1], which I haven't had the time to fix. That BUILD_BUG_ON() shouldn't assert sizeof(struct page) == 64 since the true size depends on WANT_PAGE_VIRTUAL as well as LAST_CPUPID_NOT_IN_PAGE_FLAGS. My plan is here [2]. [1] https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com/ [2] https://lore.kernel.org/r/Ygl1Gf+ATBuI%2Fm2q@google.com/
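As a rough sketch of the set_mask_bits() direction mentioned above: the unconditional flag update in lru_gen_add_folio() maps onto set_mask_bits() from <linux/bitops.h>, which wraps the same compare-and-exchange retry internally and returns the old value. This is illustrative only; the del path would still need its early exit and conditional PG_active/PG_reclaim handling around whatever helper is used.

    /*
     * Sketch: replace the open-coded cmpxchg loop in lru_gen_add_folio() with
     * set_mask_bits(), which atomically clears the mask and sets the new bits.
     * PG_active is cleared and the generation counter (gen + 1) is stored in
     * one shot; the returned old value feeds the same sanity check as before.
     */
    old_flags = set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active),
                              (gen + 1UL) << LRU_GEN_PGOFF);
    VM_BUG_ON_FOLIO(old_flags & LRU_GEN_MASK, folio);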
On Tue, 26 Apr 2022 16:39:07 -0600 Yu Zhao <yuzhao@google.com> wrote: > On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Wed, 6 Apr 2022 21:15:17 -0600 Yu Zhao <yuzhao@google.com> wrote: > > > > > Evictable pages are divided into multiple generations for each lruvec. > > > The youngest generation number is stored in lrugen->max_seq for both > > > anon and file types as they are aged on an equal footing. The oldest > > > generation numbers are stored in lrugen->min_seq[] separately for anon > > > and file types as clean file pages can be evicted regardless of swap > > > constraints. These three variables are monotonically increasing. > > > > > > ... > > > > > > +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) > > > > There's a lot of function inlining here. Fortunately the compiler will > > ignore it all, because some of it looks wrong. Please review (and > > remeasure!). If inlining is really justified, use __always_inline, and > > document the reasons for doing so. > > I totally expect modern compilers to make better decisions than I do. > And personally, I'd never use __always_inline; instead, I'd strongly > recommend FDO/LTO. My (badly expressed) point is that there's a lot of inlining of large functions here. For example, lru_gen_add_folio() is huge and has 4(?) call sites. This may well produce slower code due to the icache footprint. Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that file's .text from 80612 bytes to 78956. I tend to think that out-of-line regular old C functions should be the default and that the code should be inlined only when a clear benefit is demonstrable, or has at least been seriously thought about. > > > --- a/mm/Kconfig > > > +++ b/mm/Kconfig > > > @@ -909,6 +909,14 @@ config ANON_VMA_NAME > > > area from being merged with adjacent virtual memory areas due to the > > > difference in their name. > > > > > > +config LRU_GEN > > > + bool "Multi-Gen LRU" > > > + depends on MMU > > > + # the following options can use up the spare bits in page flags > > > + depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP) > > > + help > > > + A high performance LRU implementation to overcommit memory. > > > + > > > source "mm/damon/Kconfig" > > > > This is a problem. I had to jump through hoops just to be able to > > compile-test this. Turns out I had to figure out how to disable > > MAXSMP. > > > > Can we please figure out a way to ensure that more testers are at least > > compile testing this? Allnoconfig, defconfig, allyesconfig, allmodconfig. > > > > Also, I suggest that we actually make MGLRU the default while in linux-next. > > The !MAXSMP is to work around [1], which I haven't had the time to > fix. That BUILD_BUG_ON() shouldn't assert sizeof(struct page) == 64 > since the true size depends on WANT_PAGE_VIRTUAL as well as > LAST_CPUPID_NOT_IN_PAGE_FLAGS. My plan is here [2]. > > [1] https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com/ > [2] https://lore.kernel.org/r/Ygl1Gf+ATBuI%2Fm2q@google.com/ OK, thanks. This is fairly urgent for -next and -rc inclusion. If practically nobody is compiling the feature then practically nobody is testing it. Let's come up with a way to improve the expected coverage by a lot.
On Tue, Apr 26, 2022 at 5:42 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 26 Apr 2022 16:39:07 -0600 Yu Zhao <yuzhao@google.com> wrote: > > > On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Wed, 6 Apr 2022 21:15:17 -0600 Yu Zhao <yuzhao@google.com> wrote: > > > > > > > Evictable pages are divided into multiple generations for each lruvec. > > > > The youngest generation number is stored in lrugen->max_seq for both > > > > anon and file types as they are aged on an equal footing. The oldest > > > > generation numbers are stored in lrugen->min_seq[] separately for anon > > > > and file types as clean file pages can be evicted regardless of swap > > > > constraints. These three variables are monotonically increasing. > > > > > > > > ... > > > > > > > > +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) > > > > > > There's a lot of function inlining here. Fortunately the compiler will > > > ignore it all, because some of it looks wrong. Please review (and > > > remeasure!). If inlining is reqlly justified, use __always_inline, and > > > document the reasons for doing so. > > > > I totally expect modern compilers to make better decisions than I do. > > And personally, I'd never use __always_inline; instead, I'd strongly > > recommend FDO/LTO. > > My (badly expressed) point is that there's a lot of inlining of large > functions here. > > For example, lru_gen_add_folio() is huge and has 4(?) call sites. This > may well produce slower code due to the icache footprint. > > Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that > file's .text from 80612 bytes to 78956. > > I tend to think that out-of-line regular old C functions should be the > default and that the code should be inlined only when a clear benefit > is demonstrable, or has at least been seriously thought about. I can move those functions to vmscan.c if you think it would improve performance. I don't have a strong opinion here -- I was able to measure the bloat but not the performance impact. > > > > --- a/mm/Kconfig > > > > +++ b/mm/Kconfig > > > > @@ -909,6 +909,14 @@ config ANON_VMA_NAME > > > > area from being merged with adjacent virtual memory areas due to the > > > > difference in their name. > > > > > > > > +config LRU_GEN > > > > + bool "Multi-Gen LRU" > > > > + depends on MMU > > > > + # the following options can use up the spare bits in page flags > > > > + depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP) > > > > + help > > > > + A high performance LRU implementation to overcommit memory. > > > > + > > > > source "mm/damon/Kconfig" > > > > > > This is a problem. I had to jump through hoops just to be able to > > > compile-test this. Turns out I had to figure out how to disable > > > MAXSMP. > > > > > > Can we please figure out a way to ensure that more testers are at least > > > compile testing this? Allnoconfig, defconfig, allyesconfig, allmodconfig. > > > > > > Also, I suggest that we actually make MGLRU the default while in linux-next. > > > > The !MAXSMP is to work around [1], which I haven't had the time to > > fix. That BUILD_BUG_ON() shouldn't assert sizeof(struct page) == 64 > > since the true size depends on WANT_PAGE_VIRTUAL as well as > > LAST_CPUPID_NOT_IN_PAGE_FLAGS. My plan is here [2]. > > > > [1] https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com/ > > [2] https://lore.kernel.org/r/Ygl1Gf+ATBuI%2Fm2q@google.com/ > > OK, thanks. 
This is fairly urgent for -next and -rc inclusion. If > practically nobody is compiling the feature then practically nobody is > testing it. Let's come up with a way to improve the expected coverage > by a lot. Let me just remove !MAXSMP, since I wasn't able to reproduce this build error [1] anymore. [1] https://lore.kernel.org/r/1792f0b2e29.d72f70c9807100.8179330337708563324@xanmod.org/
On Tue, 26 Apr 2022 19:18:21 -0600 Yu Zhao <yuzhao@google.com> wrote: > > For example, lru_gen_add_folio() is huge and has 4(?) call sites. This > > may well produce slower code due to the icache footprint. > > > > Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that > > file's .text from 80612 bytes to 78956. > > > > I tend to think that out-of-line regular old C functions should be the > > default and that the code should be inlined only when a clear benefit > > is demonstrable, or has at least been seriously thought about. > > I can move those functions to vmscan.c if you think it would improve > performance. I don't have a strong opinion here -- I was able to > measure the bloat but not the performance impact. This seems to be more an act of faith than anything else. Unlikely that any difference will be measurable. If there is a difference, the inlined version should win on microbenchmarks because all four copies of the function will be in cache. But a more realistic, broader test might suffer a slowdown due to having to move the larger text in more frequently. And inter-build alignment changes seem to make a larger difference than anything else, thus confounding measurement attempts.
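To make the out-of-line alternative concrete: the arrangement Andrew describes would keep only prototypes (and the !CONFIG_LRU_GEN stubs) in include/linux/mm_inline.h and move the bodies, unchanged, into mm/vmscan.c. A sketch of the header side, which was not posted in this thread:

    /* include/linux/mm_inline.h -- sketch of the out-of-line arrangement */
    #ifdef CONFIG_LRU_GEN
    /* bodies move to mm/vmscan.c; only the prototypes stay in the header */
    bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming);
    bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming);
    #else
    /* the trivial stubs stay inline, so !CONFIG_LRU_GEN builds are unaffected */
    static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
    {
            return false;
    }

    static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
    {
            return false;
    }
    #endif /* CONFIG_LRU_GEN */

Whether that wins depends on the icache-versus-call-overhead trade-off discussed above; the patch as posted keeps the real implementations inline in the header.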
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 0e537e580dc1..5d36015071d2 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -777,7 +777,8 @@ static int fuse_check_page(struct page *page) 1 << PG_active | 1 << PG_workingset | 1 << PG_reclaim | - 1 << PG_waiters))) { + 1 << PG_waiters | + LRU_GEN_MASK | LRU_REFS_MASK))) { dump_page(page, "fuse: trying to steal weird page"); return 1; } diff --git a/include/linux/mm.h b/include/linux/mm.h index e34edb775334..980f568204a3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1060,6 +1060,8 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); #define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) #define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) #define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH) +#define LRU_GEN_PGOFF (KASAN_TAG_PGOFF - LRU_GEN_WIDTH) +#define LRU_REFS_PGOFF (LRU_GEN_PGOFF - LRU_REFS_WIDTH) /* * Define the bit shifts to access each section. For non-existent diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 7c9c2157e9a8..9abd72a95462 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -38,6 +38,9 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec, { struct pglist_data *pgdat = lruvec_pgdat(lruvec); + lockdep_assert_held(&lruvec->lru_lock); + WARN_ON_ONCE(nr_pages != (int)nr_pages); + __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); __mod_zone_page_state(&pgdat->node_zones[zid], NR_ZONE_LRU_BASE + lru, nr_pages); @@ -99,11 +102,178 @@ static __always_inline enum lru_list folio_lru_list(struct folio *folio) return lru; } +#ifdef CONFIG_LRU_GEN + +static inline bool lru_gen_enabled(void) +{ + return true; +} + +static inline bool lru_gen_in_fault(void) +{ + return current->in_lru_fault; +} + +static inline int lru_gen_from_seq(unsigned long seq) +{ + return seq % MAX_NR_GENS; +} + +static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen) +{ + unsigned long max_seq = lruvec->lrugen.max_seq; + + VM_BUG_ON(gen >= MAX_NR_GENS); + + /* see the comment on MIN_NR_GENS */ + return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1); +} + +static inline void lru_gen_update_size(struct lruvec *lruvec, struct folio *folio, + int old_gen, int new_gen) +{ + int type = folio_is_file_lru(folio); + int zone = folio_zonenum(folio); + int delta = folio_nr_pages(folio); + enum lru_list lru = type * LRU_INACTIVE_FILE; + struct lru_gen_struct *lrugen = &lruvec->lrugen; + + VM_BUG_ON(old_gen != -1 && old_gen >= MAX_NR_GENS); + VM_BUG_ON(new_gen != -1 && new_gen >= MAX_NR_GENS); + VM_BUG_ON(old_gen == -1 && new_gen == -1); + + if (old_gen >= 0) + WRITE_ONCE(lrugen->nr_pages[old_gen][type][zone], + lrugen->nr_pages[old_gen][type][zone] - delta); + if (new_gen >= 0) + WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone], + lrugen->nr_pages[new_gen][type][zone] + delta); + + /* addition */ + if (old_gen < 0) { + if (lru_gen_is_active(lruvec, new_gen)) + lru += LRU_ACTIVE; + __update_lru_size(lruvec, lru, zone, delta); + return; + } + + /* deletion */ + if (new_gen < 0) { + if (lru_gen_is_active(lruvec, old_gen)) + lru += LRU_ACTIVE; + __update_lru_size(lruvec, lru, zone, -delta); + return; + } +} + +static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + int gen; + unsigned long old_flags, new_flags; + int type = folio_is_file_lru(folio); + int zone = folio_zonenum(folio); + struct lru_gen_struct *lrugen = &lruvec->lrugen; + + if (folio_test_unevictable(folio)) + return false; + /* + 
* There are three common cases for this page: + * 1. If it's hot, e.g., freshly faulted in or previously hot and + * migrated, add it to the youngest generation. + * 2. If it's cold but can't be evicted immediately, i.e., an anon page + * not in swapcache or a dirty page pending writeback, add it to the + * second oldest generation. + * 3. Everything else (clean, cold) is added to the oldest generation. + */ + if (folio_test_active(folio)) + gen = lru_gen_from_seq(lrugen->max_seq); + else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) || + (folio_test_reclaim(folio) && + (folio_test_dirty(folio) || folio_test_writeback(folio)))) + gen = lru_gen_from_seq(lrugen->min_seq[type] + 1); + else + gen = lru_gen_from_seq(lrugen->min_seq[type]); + + do { + new_flags = old_flags = READ_ONCE(folio->flags); + VM_BUG_ON_FOLIO(new_flags & LRU_GEN_MASK, folio); + + /* see the comment on MIN_NR_GENS */ + new_flags &= ~(LRU_GEN_MASK | BIT(PG_active)); + new_flags |= (gen + 1UL) << LRU_GEN_PGOFF; + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); + + lru_gen_update_size(lruvec, folio, -1, gen); + /* for folio_rotate_reclaimable() */ + if (reclaiming) + list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]); + else + list_add(&folio->lru, &lrugen->lists[gen][type][zone]); + + return true; +} + +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + int gen; + unsigned long old_flags, new_flags; + + do { + new_flags = old_flags = READ_ONCE(folio->flags); + if (!(new_flags & LRU_GEN_MASK)) + return false; + + VM_BUG_ON_FOLIO(folio_test_active(folio), folio); + VM_BUG_ON_FOLIO(folio_test_unevictable(folio), folio); + + gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; + + new_flags &= ~LRU_GEN_MASK; + /* for shrink_page_list() */ + if (reclaiming) + new_flags &= ~(BIT(PG_referenced) | BIT(PG_reclaim)); + else if (lru_gen_is_active(lruvec, gen)) + new_flags |= BIT(PG_active); + } while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags); + + lru_gen_update_size(lruvec, folio, gen, -1); + list_del(&folio->lru); + + return true; +} + +#else + +static inline bool lru_gen_enabled(void) +{ + return false; +} + +static inline bool lru_gen_in_fault(void) +{ + return false; +} + +static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + return false; +} + +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + return false; +} + +#endif /* CONFIG_LRU_GEN */ + static __always_inline void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_add_folio(lruvec, folio, false)) + return; + update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); if (lru != LRU_UNEVICTABLE) @@ -121,6 +291,9 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_add_folio(lruvec, folio, true)) + return; + update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); /* This is not expected to be used on LRU_UNEVICTABLE */ @@ -138,6 +311,9 @@ void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_del_folio(lruvec, folio, false)) + return; + if (lru != LRU_UNEVICTABLE) list_del(&folio->lru); update_lru_size(lruvec, lru, folio_zonenum(folio), diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 
962b14d403e8..bde05427e2bb 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -317,6 +317,96 @@ enum lruvec_flags { */ }; +#endif /* !__GENERATING_BOUNDS_H */ + +/* + * Evictable pages are divided into multiple generations. The youngest and the + * oldest generation numbers, max_seq and min_seq, are monotonically increasing. + * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An + * offset within MAX_NR_GENS, gen, indexes the LRU list of the corresponding + * generation. The gen counter in folio->flags stores gen+1 while a page is on + * one of lrugen->lists[]. Otherwise it stores 0. + * + * A page is added to the youngest generation on faulting. The aging needs to + * check the accessed bit at least twice before handing this page over to the + * eviction. The first check takes care of the accessed bit set on the initial + * fault; the second check makes sure this page hasn't been used since then. + * This process, AKA second chance, requires a minimum of two generations, + * hence MIN_NR_GENS. And to maintain ABI compatibility with the active/inactive + * LRU, these two generations are considered active; the rest of generations, if + * they exist, are considered inactive. See lru_gen_is_active(). PG_active is + * always cleared while a page is on one of lrugen->lists[] so that the aging + * needs not to worry about it. And it's set again when a page considered active + * is isolated for non-reclaiming purposes, e.g., migration. See + * lru_gen_add_folio() and lru_gen_del_folio(). + * + * MAX_NR_GENS is set to 4 so that the multi-gen LRU can support twice of the + * categories of the active/inactive LRU when keeping track of accesses through + * page tables. It requires order_base_2(MAX_NR_GENS+1) bits in folio->flags. + */ +#define MIN_NR_GENS 2U +#define MAX_NR_GENS 4U + +#ifndef __GENERATING_BOUNDS_H + +struct lruvec; + +#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) +#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) + +#ifdef CONFIG_LRU_GEN + +enum { + LRU_GEN_ANON, + LRU_GEN_FILE, +}; + +/* + * The youngest generation number is stored in max_seq for both anon and file + * types as they are aged on an equal footing. The oldest generation numbers are + * stored in min_seq[] separately for anon and file types as clean file pages + * can be evicted regardless of swap constraints. + * + * Normally anon and file min_seq are in sync. But if swapping is constrained, + * e.g., out of swap space, file min_seq is allowed to advance and leave anon + * min_seq behind. 
+ */ +struct lru_gen_struct { + /* the aging increments the youngest generation number */ + unsigned long max_seq; + /* the eviction increments the oldest generation numbers */ + unsigned long min_seq[ANON_AND_FILE]; + /* the multi-gen LRU lists */ + struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; + /* the sizes of the above lists */ + unsigned long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; +}; + +void lru_gen_init_lruvec(struct lruvec *lruvec); + +#ifdef CONFIG_MEMCG +void lru_gen_init_memcg(struct mem_cgroup *memcg); +void lru_gen_exit_memcg(struct mem_cgroup *memcg); +#endif + +#else /* !CONFIG_LRU_GEN */ + +static inline void lru_gen_init_lruvec(struct lruvec *lruvec) +{ +} + +#ifdef CONFIG_MEMCG +static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) +{ +} + +static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg) +{ +} +#endif + +#endif /* CONFIG_LRU_GEN */ + struct lruvec { struct list_head lists[NR_LRU_LISTS]; /* per lruvec lru_lock for memcg */ @@ -334,6 +424,10 @@ struct lruvec { unsigned long refaults[ANON_AND_FILE]; /* Various lruvec state flags (enum lruvec_flags) */ unsigned long flags; +#ifdef CONFIG_LRU_GEN + /* evictable pages divided into generations */ + struct lru_gen_struct lrugen; +#endif #ifdef CONFIG_MEMCG struct pglist_data *pgdat; #endif diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h index ef1e3e736e14..c1946cdb845f 100644 --- a/include/linux/page-flags-layout.h +++ b/include/linux/page-flags-layout.h @@ -55,7 +55,8 @@ #define SECTIONS_WIDTH 0 #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + LRU_REFS_WIDTH + SECTIONS_WIDTH + NODES_SHIFT \ + <= BITS_PER_LONG - NR_PAGEFLAGS #define NODES_WIDTH NODES_SHIFT #elif defined(CONFIG_SPARSEMEM_VMEMMAP) #error "Vmemmap: No space for nodes field in page flags" @@ -89,8 +90,8 @@ #define LAST_CPUPID_SHIFT 0 #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT \ - <= BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + LRU_REFS_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \ + KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else #define LAST_CPUPID_WIDTH 0 @@ -100,8 +101,8 @@ #define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH \ - > BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + LRU_REFS_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \ + KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS #error "Not enough bits in page flags" #endif diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9d8eeaa67d05..5cbde013ce66 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -1017,7 +1017,7 @@ PAGEFLAG(Isolated, isolated, PF_ANY); 1UL << PG_private | 1UL << PG_private_2 | \ 1UL << PG_writeback | 1UL << PG_reserved | \ 1UL << PG_slab | 1UL << PG_active | \ - 1UL << PG_unevictable | __PG_MLOCKED) + 1UL << PG_unevictable | __PG_MLOCKED | LRU_GEN_MASK) /* * Flags checked when a page is prepped for return by the page allocator. @@ -1028,7 +1028,7 @@ PAGEFLAG(Isolated, isolated, PF_ANY); * alloc-free cycle to prevent from reusing the page. 
*/ #define PAGE_FLAGS_CHECK_AT_PREP \ - (PAGEFLAGS_MASK & ~__PG_HWPOISON) + ((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK) #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) diff --git a/include/linux/sched.h b/include/linux/sched.h index d5e3c00b74e1..4cd6345f62f9 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -914,6 +914,10 @@ struct task_struct { #ifdef CONFIG_MEMCG unsigned in_user_fault:1; #endif +#ifdef CONFIG_LRU_GEN + /* whether the LRU algorithm may apply to this access */ + unsigned in_lru_fault:1; +#endif #ifdef CONFIG_COMPAT_BRK unsigned brk_randomized:1; #endif diff --git a/kernel/bounds.c b/kernel/bounds.c index 9795d75b09b2..e08fb89f87f4 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -22,6 +22,13 @@ int main(void) DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS)); #endif DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t)); +#ifdef CONFIG_LRU_GEN + DEFINE(LRU_GEN_WIDTH, order_base_2(MAX_NR_GENS + 1)); + DEFINE(LRU_REFS_WIDTH, 0); +#else + DEFINE(LRU_GEN_WIDTH, 0); + DEFINE(LRU_REFS_WIDTH, 0); +#endif /* End of constants */ return 0; diff --git a/mm/Kconfig b/mm/Kconfig index 034d87953600..4595fc654181 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -909,6 +909,14 @@ config ANON_VMA_NAME area from being merged with adjacent virtual memory areas due to the difference in their name. +config LRU_GEN + bool "Multi-Gen LRU" + depends on MMU + # the following options can use up the spare bits in page flags + depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP) + help + A high performance LRU implementation to overcommit memory. + source "mm/damon/Kconfig" endmenu diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2fe38212e07c..c5dbb7eef61d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2321,7 +2321,8 @@ static void __split_huge_page_tail(struct page *head, int tail, #ifdef CONFIG_64BIT (1L << PG_arch_2) | #endif - (1L << PG_dirty))); + (1L << PG_dirty) | + LRU_GEN_MASK | LRU_REFS_MASK)); /* ->mapping in first tail page is compound_mapcount */ VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 725f76723220..f5de8be80c13 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5062,6 +5062,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) static void mem_cgroup_free(struct mem_cgroup *memcg) { + lru_gen_exit_memcg(memcg); memcg_wb_domain_exit(memcg); __mem_cgroup_free(memcg); } @@ -5120,6 +5121,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void) memcg->deferred_split_queue.split_queue_len = 0; #endif idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); + lru_gen_init_memcg(memcg); return memcg; fail: mem_cgroup_id_remove(memcg); diff --git a/mm/memory.c b/mm/memory.c index 44a1ec7a2cac..6df27b84c5aa 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4812,6 +4812,27 @@ static inline void mm_account_fault(struct pt_regs *regs, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); } +#ifdef CONFIG_LRU_GEN +static void lru_gen_enter_fault(struct vm_area_struct *vma) +{ + /* the LRU algorithm doesn't apply to sequential or random reads */ + current->in_lru_fault = !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ)); +} + +static void lru_gen_exit_fault(void) +{ + current->in_lru_fault = false; +} +#else +static void lru_gen_enter_fault(struct vm_area_struct *vma) +{ +} + +static void lru_gen_exit_fault(void) +{ +} +#endif /* CONFIG_LRU_GEN */ + /* * By the time we get here, we already hold the mm semaphore * @@ -4843,11 +4864,15 @@ vm_fault_t 
handle_mm_fault(struct vm_area_struct *vma, unsigned long address, if (flags & FAULT_FLAG_USER) mem_cgroup_enter_user_fault(); + lru_gen_enter_fault(vma); + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else ret = __handle_mm_fault(vma, address, flags); + lru_gen_exit_fault(); + if (flags & FAULT_FLAG_USER) { mem_cgroup_exit_user_fault(); /* diff --git a/mm/mm_init.c b/mm/mm_init.c index 9ddaf0e1b0ab..0d7b2bd2454a 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -65,14 +65,16 @@ void __init mminit_verify_pageflags_layout(void) shift = 8 * sizeof(unsigned long); width = shift - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH - - LAST_CPUPID_SHIFT - KASAN_TAG_WIDTH; + - LAST_CPUPID_SHIFT - KASAN_TAG_WIDTH - LRU_GEN_WIDTH - LRU_REFS_WIDTH; mminit_dprintk(MMINIT_TRACE, "pageflags_layout_widths", - "Section %d Node %d Zone %d Lastcpupid %d Kasantag %d Flags %d\n", + "Section %d Node %d Zone %d Lastcpupid %d Kasantag %d Gen %d Tier %d Flags %d\n", SECTIONS_WIDTH, NODES_WIDTH, ZONES_WIDTH, LAST_CPUPID_WIDTH, KASAN_TAG_WIDTH, + LRU_GEN_WIDTH, + LRU_REFS_WIDTH, NR_PAGEFLAGS); mminit_dprintk(MMINIT_TRACE, "pageflags_layout_shifts", "Section %d Node %d Zone %d Lastcpupid %d Kasantag %d\n", diff --git a/mm/mmzone.c b/mm/mmzone.c index 0ae7571e35ab..68e1511be12d 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -88,6 +88,8 @@ void lruvec_init(struct lruvec *lruvec) * Poison its list head, so that any operations on it would crash. */ list_del(&lruvec->lists[LRU_UNEVICTABLE]); + + lru_gen_init_lruvec(lruvec); } #if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) diff --git a/mm/swap.c b/mm/swap.c index 7e320ec08c6a..a6870ba0bd83 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -460,6 +460,11 @@ void folio_add_lru(struct folio *folio) VM_BUG_ON_FOLIO(folio_test_active(folio) && folio_test_unevictable(folio), folio); VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); + /* see the comment in lru_gen_add_folio() */ + if (lru_gen_enabled() && !folio_test_unevictable(folio) && + lru_gen_in_fault() && !(current->flags & PF_MEMALLOC)) + folio_set_active(folio); + folio_get(folio); local_lock(&lru_pvecs.lock); pvec = this_cpu_ptr(&lru_pvecs.lru_add); @@ -551,7 +556,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) { - if (PageActive(page) && !PageUnevictable(page)) { + if (!PageUnevictable(page) && (PageActive(page) || lru_gen_enabled())) { int nr_pages = thp_nr_pages(page); del_page_from_lru_list(page, lruvec); @@ -666,7 +671,7 @@ void deactivate_file_folio(struct folio *folio) */ void deactivate_page(struct page *page) { - if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { + if (PageLRU(page) && !PageUnevictable(page) && (PageActive(page) || lru_gen_enabled())) { struct pagevec *pvec; local_lock(&lru_pvecs.lock); diff --git a/mm/vmscan.c b/mm/vmscan.c index 2232cb55af41..37dd5d1c3d07 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2968,6 +2968,81 @@ static bool can_age_anon_pages(struct pglist_data *pgdat, return can_demote(pgdat->node_id, sc); } +#ifdef CONFIG_LRU_GEN + +/****************************************************************************** + * shorthand helpers + ******************************************************************************/ + +#define for_each_gen_type_zone(gen, type, zone) \ + for ((gen) = 0; (gen) < MAX_NR_GENS; (gen)++) \ + for ((type) = 0; (type) < ANON_AND_FILE; (type)++) \ + for ((zone) = 0; (zone) < 
MAX_NR_ZONES; (zone)++) + +static struct lruvec __maybe_unused *get_lruvec(struct mem_cgroup *memcg, int nid) +{ + struct pglist_data *pgdat = NODE_DATA(nid); + +#ifdef CONFIG_MEMCG + if (memcg) { + struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec; + + /* for hotadd_new_pgdat() */ + if (!lruvec->pgdat) + lruvec->pgdat = pgdat; + + return lruvec; + } +#endif + VM_BUG_ON(!mem_cgroup_disabled()); + + return pgdat ? &pgdat->__lruvec : NULL; +} + +/****************************************************************************** + * initialization + ******************************************************************************/ + +void lru_gen_init_lruvec(struct lruvec *lruvec) +{ + int gen, type, zone; + struct lru_gen_struct *lrugen = &lruvec->lrugen; + + lrugen->max_seq = MIN_NR_GENS + 1; + + for_each_gen_type_zone(gen, type, zone) + INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]); +} + +#ifdef CONFIG_MEMCG +void lru_gen_init_memcg(struct mem_cgroup *memcg) +{ +} + +void lru_gen_exit_memcg(struct mem_cgroup *memcg) +{ + int nid; + + for_each_node(nid) { + struct lruvec *lruvec = get_lruvec(memcg, nid); + + VM_BUG_ON(memchr_inv(lruvec->lrugen.nr_pages, 0, + sizeof(lruvec->lrugen.nr_pages))); + } +} +#endif + +static int __init init_lru_gen(void) +{ + BUILD_BUG_ON(MIN_NR_GENS + 1 >= MAX_NR_GENS); + BUILD_BUG_ON(BIT(LRU_GEN_WIDTH) <= MAX_NR_GENS); + + return 0; +}; +late_initcall(init_lru_gen); + +#endif /* CONFIG_LRU_GEN */ + static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) { unsigned long nr[NR_LRU_LISTS];
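Closing the loop on the folio->flags packing that the kernel/bounds.c and page-flags-layout.h hunks set up: order_base_2(MAX_NR_GENS + 1) works out to 3 bits, enough to hold gen + 1 (1..4) plus 0 for "not on a lrugen list". The quick userspace check below is illustrative only; the LRU_GEN_PGOFF and PG_active values are made up, since the real ones depend on the kernel configuration.

    /* Userspace sanity check of the gen-counter packing; not kernel code. */
    #include <stdio.h>
    #include <assert.h>

    #define MAX_NR_GENS     4UL
    #define LRU_GEN_WIDTH   3       /* order_base_2(MAX_NR_GENS + 1) */
    #define LRU_GEN_PGOFF   8       /* made-up offset; the real one is config dependent */
    #define LRU_GEN_MASK    (((1UL << LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
    #define BIT(nr)         (1UL << (nr))
    #define PG_active       5       /* arbitrary flag bit for the demo */

    int main(void)
    {
            unsigned long flags = BIT(PG_active);   /* some unrelated flag already set */
            unsigned long gen;

            for (gen = 0; gen < MAX_NR_GENS; gen++) {
                    /* store gen + 1, as lru_gen_add_folio() does */
                    unsigned long stored = (flags & ~LRU_GEN_MASK) |
                                           ((gen + 1) << LRU_GEN_PGOFF);
                    /* extract it again, as lru_gen_del_folio() does */
                    unsigned long decoded = ((stored & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;

                    assert(decoded == gen);
                    assert(stored & BIT(PG_active));        /* unrelated bits untouched */
            }
            printf("gen counter fits in %d bits; 0 means \"not on a lrugen list\"\n",
                   LRU_GEN_WIDTH);
            return 0;
    }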