Message ID: 20200603230303.kSkT62Lb5%akpm@linux-foundation.org (mailing list archive)
State: New, archived
Series: [001/131] mm/slub: fix a memory leak in sysfs_slab_add()
On 2020/6/4 7:03 AM, Andrew Morton wrote:
>
> +	/* XXX: Move to lru_cache_add() when it supports new vs putback */

Hi Hannes,

Sorry, I'm a bit lost here. Would you mind explaining your idea a bit more?

> +	spin_lock_irq(&page_pgdat(page)->lru_lock);
> +	lru_note_cost(page);
> +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
> +

What problem could we see here without the lru_lock?

Thanks
Alex
On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
>
> On 2020/6/4 7:03 AM, Andrew Morton wrote:
> >
> > +	/* XXX: Move to lru_cache_add() when it supports new vs putback */
>
> Hi Hannes,
>
> Sorry, I'm a bit lost here. Would you mind explaining your idea a bit more?
>
> > +	spin_lock_irq(&page_pgdat(page)->lru_lock);
> > +	lru_note_cost(page);
> > +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
> > +
>
> What problem could we see here without the lru_lock?

It'll just be part of the existing LRU locking in
pagevec_lru_move_fn(), when the new pages are added to the LRU in
batch. See this older patch for example:

https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/

I didn't include it in this series to reduce conflict with Joonsoo's
WIP series that also operates in this area and does something similar:

https://lkml.org/lkml/2020/4/3/63
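For reference, a rough sketch of the direction described above, assuming lru_cache_add() eventually learns to tell genuinely new pages from putback ones; the "new" cookie below is purely illustrative and is not an existing kernel parameter. The point is that pagevec_lru_move_fn() already takes pgdat->lru_lock once per batch, so a cost call made from the add callback would need no extra locking:

static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
                                 void *arg)
{
        bool new = (bool)arg;   /* hypothetical "new vs putback" cookie */

        SetPageLRU(page);
        add_page_to_lru_list(page, lruvec, page_lru(page));

        /*
         * Already under pgdat->lru_lock, taken once for the whole
         * pagevec in pagevec_lru_move_fn(), so no per-page locking
         * is needed for the cost accounting.
         */
        if (new)
                lru_note_cost(page);
}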
On Tue, Jun 9, 2020 at 11:46 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
> >
> >
> > On 2020/6/4 7:03 AM, Andrew Morton wrote:
> > >
> > > +	/* XXX: Move to lru_cache_add() when it supports new vs putback */
> >
> > Hi Hannes,
> >
> > Sorry, I'm a bit lost here. Would you mind explaining your idea a bit more?
> >
> > > +	spin_lock_irq(&page_pgdat(page)->lru_lock);
> > > +	lru_note_cost(page);
> > > +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
> > > +
> >
> > What problem could we see here without the lru_lock?
>
> It'll just be part of the existing LRU locking in
> pagevec_lru_move_fn(), when the new pages are added to the LRU in
> batch. See this older patch for example:
>
> https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/
>
> I didn't include it in this series to reduce conflict with Joonsoo's
> WIP series that also operates in this area and does something similar:

Thanks!

> https://lkml.org/lkml/2020/4/3/63

I haven't completed the rebase of my series yet, but I expect the referenced
patch "https://lkml.org/lkml/2020/4/3/63" will be removed in the next version.

Before the I/O cost model, a new anonymous page contributed to the LRU
reclaim balance. Now a new anonymous page doesn't contribute to the I/O
cost, so that adjusting patch should no longer be needed.

If anyone wants to change this part,
"/* XXX: Move to lru_cache_add() when it supports new vs putback */",
feel free to do it.

Thanks.
On 2020/6/10 1:23 PM, Joonsoo Kim wrote:
> On Tue, Jun 9, 2020 at 11:46 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Tue, Jun 09, 2020 at 05:15:33PM +0800, Alex Shi wrote:
>>>
>>>
>>> On 2020/6/4 7:03 AM, Andrew Morton wrote:
>>>>
>>>> +	/* XXX: Move to lru_cache_add() when it supports new vs putback */
>>>
>>> Hi Hannes,
>>>
>>> Sorry, I'm a bit lost here. Would you mind explaining your idea a bit more?
>>>
>>>> +	spin_lock_irq(&page_pgdat(page)->lru_lock);
>>>> +	lru_note_cost(page);
>>>> +	spin_unlock_irq(&page_pgdat(page)->lru_lock);
>>>> +
>>>
>>>
>>> What problem could we see here without the lru_lock?

The reason I want to know exactly what the lru_lock protects here is that
we currently have 5 LRU lists guarded by only one lock, which causes a lot
of contention when different applications are active on one server.

I guess we originally went with a single lru_lock because 5 separate locks
would either bounce cachelines if packed together, or waste cacheline space
if padded apart. But now that we have qspinlocks, each CPU just spins on its
own cacheline without interfering with the others, which should greatly
relieve the performance drop caused by cacheline bouncing. And we could use
bits in page->mapping to store the index of the LRU list the page is on.

As a quick thought, I guess that besides the 5 locks for the 5 lists we
would still need 1 more lock for the common lruvec data, and for everything
else that relies on the lru_lock today, like mlock, hpage_nr_pages, ...
That's the reason I want to know everything that sits under the lru_lock. :)

Any comments on this idea? :)

Thanks
Alex

>>
>> It'll just be part of the existing LRU locking in
>> pagevec_lru_move_fn(), when the new pages are added to the LRU in
>> batch. See this older patch for example:
>>
>> https://lore.kernel.org/linux-mm/20160606194836.3624-6-hannes@cmpxchg.org/
>>
>> I didn't include it in this series to reduce conflict with Joonsoo's
>> WIP series that also operates in this area and does something similar:
>
> Thanks!
>
>> https://lkml.org/lkml/2020/4/3/63
>
> I haven't completed the rebase of my series yet, but I expect the referenced
> patch "https://lkml.org/lkml/2020/4/3/63" will be removed in the next version.

Thanks a lot for the info, Johannes & Joonsoo!
A long history for an interesting idea. :)

>
> Before the I/O cost model, a new anonymous page contributed to the LRU
> reclaim balance. Now a new anonymous page doesn't contribute to the I/O
> cost, so that adjusting patch should no longer be needed.
>
> If anyone wants to change this part,
> "/* XXX: Move to lru_cache_add() when it supports new vs putback */",
> feel free to do it.
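To make the split-lock idea above a bit more concrete, here is a rough structural sketch of one way it could look. None of this exists in the kernel; the struct and field names are made up for illustration only:

/* Hypothetical layout, illustrating the per-list locking idea. */
struct lru_list_lock {
        spinlock_t lock;
} ____cacheline_aligned_in_smp;         /* one lock per cacheline, so
                                           qspinlock waiters spin locally */

struct lruvec_locks {
        /* one lock per LRU list: inactive/active x anon/file, unevictable */
        struct lru_list_lock list_locks[NR_LRU_LISTS];
        /* one more lock for shared lruvec state and the other current
         * lru_lock users: cost counters, mlock, THP accounting, ... */
        spinlock_t common_lock;
};

The list a page currently sits on would then need to be recoverable without taking any of these locks, e.g. from spare bits in page->mapping as suggested above.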
--- a/include/linux/swap.h~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/include/linux/swap.h
@@ -334,8 +334,7 @@ extern unsigned long nr_free_pagecache_p
 
 /* linux/mm/swap.c */
-extern void lru_note_cost(struct lruvec *lruvec, bool file,
-                          unsigned int nr_pages);
+extern void lru_note_cost(struct page *);
 extern void lru_cache_add(struct page *);
 extern void lru_add_page_tail(struct page *page, struct page *page_tail,
                          struct lruvec *lruvec, struct list_head *head);
--- a/mm/swap.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/swap.c
@@ -278,12 +278,15 @@ void rotate_reclaimable_page(struct page
         }
 }
 
-void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
+void lru_note_cost(struct page *page)
 {
-        if (file)
-                lruvec->file_cost += nr_pages;
+        struct lruvec *lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+
+        /* Record new data point */
+        if (page_is_file_lru(page))
+                lruvec->file_cost++;
         else
-                lruvec->anon_cost += nr_pages;
+                lruvec->anon_cost++;
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec,
--- a/mm/swap_state.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/swap_state.c
@@ -440,6 +440,11 @@ struct page *__read_swap_cache_async(swp
                 goto fail_unlock;
         }
 
+        /* XXX: Move to lru_cache_add() when it supports new vs putback */
+        spin_lock_irq(&page_pgdat(page)->lru_lock);
+        lru_note_cost(page);
+        spin_unlock_irq(&page_pgdat(page)->lru_lock);
+
         /* Caller will initiate read into locked page */
         SetPageWorkingset(page);
         lru_cache_add(page);
--- a/mm/vmscan.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/vmscan.c
@@ -1958,12 +1958,6 @@ shrink_inactive_list(unsigned long nr_to
         move_pages_to_lru(lruvec, &page_list);
 
         __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-        /*
-         * Rotating pages costs CPU without actually
-         * progressing toward the reclaim goal.
-         */
-        lru_note_cost(lruvec, 0, stat.nr_activate[0]);
-        lru_note_cost(lruvec, 1, stat.nr_activate[1]);
         item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
         if (!cgroup_reclaim(sc))
                 __count_vm_events(item, nr_reclaimed);
@@ -2079,11 +2073,6 @@ static void shrink_active_list(unsigned
          * Move pages back to the lru list.
          */
         spin_lock_irq(&pgdat->lru_lock);
-        /*
-         * Rotating pages costs CPU without actually
-         * progressing toward the reclaim goal.
-         */
-        lru_note_cost(lruvec, file, nr_rotated);
 
         nr_activate = move_pages_to_lru(lruvec, &l_active);
         nr_deactivate = move_pages_to_lru(lruvec, &l_inactive);
@@ -2298,22 +2287,23 @@ static void get_scan_count(struct lruvec
         scan_balance = SCAN_FRACT;
 
         /*
-         * With swappiness at 100, anonymous and file have the same priority.
-         * This scanning priority is essentially the inverse of IO cost.
+         * Calculate the pressure balance between anon and file pages.
+         *
+         * The amount of pressure we put on each LRU is inversely
+         * proportional to the cost of reclaiming each list, as
+         * determined by the share of pages that are refaulting, times
+         * the relative IO cost of bringing back a swapped out
+         * anonymous page vs reloading a filesystem page (swappiness).
+         *
+         * With swappiness at 100, anon and file have equal IO cost.
          */
         anon_prio = swappiness;
         file_prio = 200 - anon_prio;
 
         /*
-         * OK, so we have swap space and a fair amount of page cache
-         * pages.  We use the recently rotated / recently scanned
-         * ratios to determine how valuable each cache is.
-         *
          * Because workloads change over time (and to avoid overflow)
          * we keep these statistics as a floating average, which ends
-         * up weighing recent references more than old ones.
-         *
-         * anon in [0], file in [1]
+         * up weighing recent refaults more than old ones.
          */
 
         anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
@@ -2328,15 +2318,6 @@ static void get_scan_count(struct lruvec
                 lruvec->file_cost /= 2;
                 totalcost /= 2;
         }
-
-        /*
-         * The amount of pressure on anon vs file pages is inversely
-         * proportional to the assumed cost of reclaiming each list,
-         * as determined by the share of pages that are likely going
-         * to refault or rotate on each list (recently referenced),
-         * times the relative IO cost of bringing back a swapped out
-         * anonymous page vs reloading a filesystem page (swappiness).
-         */
         ap = anon_prio * (totalcost + 1);
         ap /= lruvec->anon_cost + 1;
--- a/mm/workingset.c~mm-balance-lru-lists-based-on-relative-thrashing
+++ a/mm/workingset.c
@@ -365,6 +365,10 @@ void workingset_refault(struct page *pag
         /* Page was active prior to eviction */
         if (workingset) {
                 SetPageWorkingset(page);
+                /* XXX: Move to lru_cache_add() when it supports new vs putback */
+                spin_lock_irq(&page_pgdat(page)->lru_lock);
+                lru_note_cost(page);
+                spin_unlock_irq(&page_pgdat(page)->lru_lock);
                 inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
         }
 out:
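To illustrate the pressure-balance arithmetic that the get_scan_count() hunk above ends with, here is a small standalone program. This is not kernel code: the file/fp side is not visible in the hunk but mirrors the anon side, and the cost numbers are made-up examples.

/* Standalone illustration of the anon/file pressure balance. */
#include <stdio.h>

int main(void)
{
        unsigned long anon_cost = 300;  /* example: anon refaults/rotations noted */
        unsigned long file_cost = 100;  /* example: file refaults/rotations noted */
        unsigned long swappiness = 60;  /* default vm.swappiness */

        unsigned long anon_prio = swappiness;
        unsigned long file_prio = 200 - anon_prio;
        unsigned long totalcost = anon_cost + file_cost;

        /* Pressure on each list is inverse to its observed reclaim cost. */
        unsigned long ap = anon_prio * (totalcost + 1) / (anon_cost + 1);
        unsigned long fp = file_prio * (totalcost + 1) / (file_cost + 1);

        /* Anon has been thrashing more here, so it gets the smaller share. */
        printf("anon pressure: %lu, file pressure: %lu\n", ap, fp);  /* 79 vs 555 */
        return 0;
}

In the kernel these two values feed the scan fractions, so the list that is refaulting more is scanned less aggressively.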