| Message ID | 20180727162143.26466-1-drake@endlessm.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Making direct reclaim fail when thrashing |
On Fri, Jul 27, 2018 at 11:21:43AM -0500, Daniel Drake wrote:
> Here I'm experimenting by adding another tag to the page cache radix tree,
> tagging pages that were activated in the refault path.

NAK. No more tags, or you blow up the memory consumption of the radix tree by 15%.

> And then in get_scan_count I'm checking how many active pages have that
> tag, and also looking at the size of the active and inactive lists.

You'd be better off using a bit in struct page somewhere ...
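The struct-page-bit alternative suggested above could be sketched roughly as follows. This is a purely illustrative user-space simulation: `PG_REFAULTED_BIT`, `struct fake_page`, and the helpers are invented for this sketch and are not real kernel symbols; the real implementation would claim a `PG_*` bit in include/linux/page-flags.h and use the atomic flag accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page flag bit, in the spirit of the PG_* flags.
 * The bit number and name are made up for illustration. */
enum { PG_REFAULTED_BIT = 18 };

struct fake_page {
	unsigned long flags;	/* stands in for struct page->flags */
};

static void set_page_refaulted(struct fake_page *page)
{
	page->flags |= 1UL << PG_REFAULTED_BIT;
}

static int page_refaulted(const struct fake_page *page)
{
	return (int)((page->flags >> PG_REFAULTED_BIT) & 1UL);
}

/* Counting refaulted pages on an LRU then becomes a plain flag test
 * per page, with no radix tree lookup and no i_pages lock. */
static int count_refaulted_pages(const struct fake_page *pages, size_t n)
{
	int nr = 0;
	size_t i;

	for (i = 0; i < n; i++)
		nr += page_refaulted(&pages[i]);
	return nr;
}
```

The upside over a new radix tree tag is that the flag lives in already-allocated storage; the cost is burning one of the scarce page flag bits.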
On Fri, Jul 27, 2018 at 11:21:43AM -0500, Daniel Drake wrote:
> Split from the thread
> [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
> where we were discussing if/how to make the direct reclaim codepath
> fail if we're excessively thrashing, so that the OOM killer might
> step in. This is potentially desirable when the thrashing is so bad
> that the UI stops responding, causing the user to pull the plug.
>
> On Tue, Jul 17, 2018 at 7:23 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > mm/workingset.c allows for tracking when an actual page got evicted.
> > workingset_refault tells us whether a given filemap fault is a recent
> > refault and activates the page if that is the case. So what you need is
> > to note how many refaulted pages we have on the active LRU list. If that
> > is a large part of the list and if the inactive list is really small
> > then we know we are thrashing. This all sounds much easier than it will
> > eventually turn out to be of course but I didn't really get to play with
> > this much.

I've mentioned it in the other thread, but whether refaults are a performance/latency problem depends 99% on your available IO capacity and the IO patterns.

On a highly contended IO device, refaults of a single unfortunately located page can lead to multi-second stalls. On an idle SSD, thousands of refaults might not be noticeable to the user.

Without measuring how much time these events take out of your day, you can't really tell if they're a problem or not. The event rate or the proportion between pages and refaults doesn't carry that signal.
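The time-based signal described above can be sketched as follows. This is an illustrative user-space sketch, not kernel code; the 10% threshold and both function names are assumptions invented for the example (the real PSI work derives such ratios from tracked task stall times).

```c
#include <assert.h>

/* Share of a measurement window spent waiting on refault IO, in percent.
 * Both arguments are in milliseconds. */
static int refault_stall_pct(unsigned long refault_wait_ms,
			     unsigned long window_ms)
{
	if (window_ms == 0)
		return 0;
	return (int)(refault_wait_ms * 100 / window_ms);
}

/* Made-up cutoff: call it thrashing when at least 10% of wall-clock
 * time in the window was lost to refault IO. */
static int looks_stalled(unsigned long refault_wait_ms,
			 unsigned long window_ms)
{
	return refault_stall_pct(refault_wait_ms, window_ms) >= 10;
}
```

With this framing, one refault that blocks for 3 seconds of a 4-second window on a contended disk registers as a severe stall, while thousands of ~20µs SSD refaults in the same window barely move the percentage, matching the SSD-vs-contended-disk contrast drawn above.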
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d85ac9d24bb3..45f94ffd1c67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,6 +462,7 @@ struct block_device {
 #define PAGECACHE_TAG_DIRTY	0
 #define PAGECACHE_TAG_WRITEBACK	1
 #define PAGECACHE_TAG_TOWRITE	2
+#define PAGECACHE_TAG_REFAULTED	3
 
 int mapping_tagged(struct address_space *mapping, int tag);
 
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 34149e8b5f73..86eccb71ef7e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -65,7 +65,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
 
 /*** radix-tree API starts here ***/
 
-#define RADIX_TREE_MAX_TAGS 3
+#define RADIX_TREE_MAX_TAGS 4
 
 #ifndef RADIX_TREE_MAP_SHIFT
 #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 250f675dcfb2..9a686570dc75 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -917,6 +917,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 	 */
 	if (!(gfp_mask & __GFP_WRITE) && shadow &&
 	    workingset_refault(shadow)) {
+		radix_tree_tag_set(&mapping->i_pages, page_index(page),
+				   PAGECACHE_TAG_REFAULTED);
 		SetPageActive(page);
 		workingset_activation(page);
 	} else
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f86f288..79bc810b43bb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,6 +2102,30 @@ enum scan_balance {
 	SCAN_FILE,
 };
 
+static int count_refaulted(struct lruvec *lruvec, enum lru_list lru)
+{
+	int nr_refaulted = 0;
+	struct page *page;
+
+	list_for_each_entry(page, &lruvec->lists[lru], lru) {
+		/* Look up the page cache entry from the page, following
+		 * the approach taken in __set_page_dirty_nobuffers */
+		unsigned long flags;
+		struct address_space *mapping = page_mapping(page);
+
+		if (!mapping)
+			continue;
+
+		xa_lock_irqsave(&mapping->i_pages, flags);
+		BUG_ON(page_mapping(page) != mapping);
+		nr_refaulted += radix_tree_tag_get(&mapping->i_pages,
+						   page_index(page),
+						   PAGECACHE_TAG_REFAULTED);
+		xa_unlock_irqrestore(&mapping->i_pages, flags);
+	}
+
+	return nr_refaulted;
+}
+
 /*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned. The relative value of each set of LRU lists is determined
@@ -2270,6 +2294,19 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		unsigned long size;
 		unsigned long scan;
 
+		if (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE) {
+			int nr_refaulted;
+			unsigned long inactive, active;
+
+			nr_refaulted = count_refaulted(lruvec, lru);
+			active = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
+			inactive = lruvec_lru_size(lruvec, lru - 1,
+						   sc->reclaim_idx);
+			pr_err("get_scan_count lru%d active=%ld inactive=%ld "
+			       "refaulted=%d\n",
+			       lru, active, inactive, nr_refaulted);
+		}
+
 		size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
 		scan = size >> sc->priority;
 		/*
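The heuristic Michal describes (refaulted pages dominate the active list while the inactive list has collapsed), which the patch's debug output is meant to feed, could be sketched like this. A user-space sketch only: the 50% and 1/8 cutoffs are invented for illustration and do not appear in the patch.

```c
#include <assert.h>

/* Decide whether the counts printed by the patch's pr_err above would
 * look like thrashing. Cutoffs are made-up assumptions:
 *  - at least half the active list got there via recent refaults, and
 *  - the inactive list is no more than 1/8 of the active list. */
static int looks_like_thrashing(unsigned long nr_refaulted,
				unsigned long active,
				unsigned long inactive)
{
	int refault_heavy, inactive_small;

	if (active == 0)
		return 0;
	refault_heavy = nr_refaulted * 2 >= active;
	inactive_small = inactive * 8 <= active;
	return refault_heavy && inactive_small;
}
```

A direct-reclaim failure path could consult such a predicate per LRU, though as noted earlier in the thread, a count-based signal like this still says nothing about how much time the refaults actually cost.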