Message ID | 20200610023939.GI19604@bombadil.infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] page cache drop-behind | expand |
Hi Matthew,

That's great!

On 2020-06-09 19:39:39 -0700, Matthew Wilcox wrote:
> Andres reported a problem recently where reading a file several times
> the size of memory causes intermittent stalls.  My suspicion is that
> page allocation eventually runs into the low watermark and starts to
> do reclaim.  Some shrinkers take a long time to run and have a low chance
> of actually freeing a page (eg the dentry cache needs to free 21 dentries
> which all happen to be on the same pair of pages to free those two pages).

That meshes with some of what I saw in profiles, IIRC.

There's a related issue that I don't think this would solve, but which
could be solved by some form of write-behind: I've observed that
reaching the dirty data limits often leads to a fairly random selection
of pages being written back.

> This patch attempts to free pages from the file that we're currently
> reading from if there are no pages readily available.  If that doesn't
> work, we'll run all the shrinkers just as we did before.
>
> This should solve Andres' problem, although it's a bit narrow in scope.
> It might be better to look through the inactive page list, regardless of
> which file they were allocated for.  That could solve the "weekly backup"
> problem with lots of little files.

I wonder if there are cases where that'd cause problems:

1) If the pages selected are still dirty, doesn't this have a
   significant potential for additional stalls?

2) For some database/postgres tasks it's pretty common to occasionally
   do in-place writes where parts of the data are already in the page
   cache, but others aren't.  I'm a bit worried that this would be more
   aggressive about throwing away pages that were already cached before
   the writes.

> I'm not really set up to do performance testing at the moment, so this
> is just me thinking hard about the problem.

The workload where I was observing the issue was creating backups of
larger postgres databases.  I've attached the test program we've used.
gcc -Wall -ggdb ~/tmp/write_and_fsync.c -o /tmp/write_and_fsync && \
  rm -f /srv/dev/bench/test* && \
  echo 3 | sudo tee /proc/sys/vm/drop_caches && \
  perf stat -a -e cpu-clock,ref-cycles,cycles,instructions \
    /tmp/write_and_fsync --blocksize $((128*1024)) --sync_file_range=0 \
      --fallocate=0 --fadvise=0 --sequential=0 \
      --filesize=$((100*1024*1024*1024)) /srv/dev/bench/test{1,2,3,4}

was the last invocation I found in my shell history :)

I found that --sync_file_range=1 --fadvise=1 often gave considerably
better performance.

Here's a pointer to the thread with more details (see also my response
downthread):
https://postgr.es/m/20200503023643.x2o2244sa4vkikyb%40alap3.anarazel.de

Greetings,

Andres Freund
diff --git a/mm/readahead.c b/mm/readahead.c
index 3c9a8dd7c56c..3531e1808e24 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -111,9 +111,24 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
 	}
 	return ret;
 }
-
 EXPORT_SYMBOL(read_cache_pages);
 
+/*
+ * Attempt to detect a streaming workload which exceeds memory and
+ * handle it by dropping the page cache behind the active part of the
+ * file.
+ */
+static void discard_behind(struct file *file, struct address_space *mapping)
+{
+	unsigned long keep = file->f_ra.ra_pages * 2;
+
+	if (mapping->nrpages < 1000)
+		return;
+	if (file->f_ra.start < keep)
+		return;
+	invalidate_mapping_pages(mapping, 0, file->f_ra.start - keep);
+}
+
 static void read_pages(struct readahead_control *rac, struct list_head *pages,
 		bool skip_page)
 {
@@ -179,6 +194,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 {
 	LIST_HEAD(page_pool);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	gfp_t light_gfp = gfp_mask & ~__GFP_DIRECT_RECLAIM;
 	struct readahead_control rac = {
 		.mapping = mapping,
 		.file = file,
@@ -219,7 +235,11 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 			continue;
 		}
 
-		page = __page_cache_alloc(gfp_mask);
+		page = __page_cache_alloc(light_gfp);
+		if (!page) {
+			discard_behind(file, mapping);
+			page = __page_cache_alloc(gfp_mask);
+		}
 		if (!page)
 			break;
 		if (mapping->a_ops->readpages) {