From patchwork Tue Oct 13 23:51:47 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 11836335
Date: Tue, 13 Oct 2020 16:51:47 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, hannes@cmpxchg.org, laoar.shao@gmail.com, linux-mm@kvack.org, mgorman@suse.de, mm-commits@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 063/181] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
Message-ID: <20201013235147.a4WzEhOpP%akpm@linux-foundation.org>
In-Reply-To: <20201013164658.3bfd96cc224d8923e66a9f4e@linux-foundation.org>
User-Agent: s-nail v14.8.16
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Yafang Shao
Subject: mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED

Our users reported random latency spikes while their RT process was
running.  We eventually found that the spikes are caused by
FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU cache
on remote CPUs and then wait for the per-cpu work to complete.  The wait
time is unpredictable and may reach tens of milliseconds.

That behavior is unreasonable, because this process is bound to a
specific CPU and the file is only accessed by itself; IOW, there should
be no pagecache pages on a per-cpu pagevec of a remote CPU.

The unreasonable behavior is partially caused by a wrong comparison
between the number of invalidated pages and the number of target pages.
For example,

	if (count < (end_index - start_index + 1))

The count above is how many pages were invalidated on the local CPU, and
(end_index - start_index + 1) is how many pages should be invalidated.
Using (end_index - start_index + 1) is incorrect, because they are
virtual addresses, which may not be mapped to pages.  Besides that, there
may be holes between start and end.  So we'd better check whether there
are still pages on a per-cpu pagevec after draining the local CPU, and
then decide whether or not to call lru_add_drain_all().

After I applied this as a hotfix in our production environment, most of
the lru_add_drain_all() calls could be avoided.

Link: https://lkml.kernel.org/r/20200923133318.14373-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao
Suggested-by: Mel Gorman
Acked-by: Mel Gorman
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
---

 include/linux/fs.h |    4 ++
 mm/fadvise.c       |    9 +++---
 mm/truncate.c      |   58 +++++++++++++++++++++++++++++--------------
 3 files changed, 49 insertions(+), 22 deletions(-)

--- a/include/linux/fs.h~mm-fadvise-improve-the-expensive-remote-lru-cache-draining-after-fadv_dontneed
+++ a/include/linux/fs.h
@@ -2581,6 +2581,10 @@ extern bool is_bad_inode(struct inode *)
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
 
+void invalidate_mapping_pagevec(struct address_space *mapping,
+				pgoff_t start, pgoff_t end,
+				unsigned long *nr_pagevec);
+
 static inline void invalidate_remote_inode(struct inode *inode)
 {
 	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
--- a/mm/fadvise.c~mm-fadvise-improve-the-expensive-remote-lru-cache-draining-after-fadv_dontneed
+++ a/mm/fadvise.c
@@ -141,7 +141,7 @@ int generic_fadvise(struct file *file, l
 		}
 
 		if (end_index >= start_index) {
-			unsigned long count;
+			unsigned long nr_pagevec = 0;
 
 			/*
 			 * It's common to FADV_DONTNEED right after
@@ -154,8 +154,9 @@ int generic_fadvise(struct file *file, l
 			 */
 			lru_add_drain();
 
-			count = invalidate_mapping_pages(mapping,
-						start_index, end_index);
+			invalidate_mapping_pagevec(mapping,
+						start_index, end_index,
+						&nr_pagevec);
 
 			/*
 			 * If fewer pages were invalidated than expected then
@@ -163,7 +164,7 @@ int generic_fadvise(struct file *file, l
 			 * a per-cpu pagevec for a remote CPU. Drain all
 			 * pagevecs and try again.
 			 */
-			if (count < (end_index - start_index + 1)) {
+			if (nr_pagevec) {
 				lru_add_drain_all();
 				invalidate_mapping_pages(mapping, start_index,
 						end_index);
--- a/mm/truncate.c~mm-fadvise-improve-the-expensive-remote-lru-cache-draining-after-fadv_dontneed
+++ a/mm/truncate.c
@@ -528,23 +528,8 @@ void truncate_inode_pages_final(struct a
 }
 EXPORT_SYMBOL(truncate_inode_pages_final);
 
-/**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
- * @mapping: the address_space which holds the pages to invalidate
- * @start: the offset 'from' which to invalidate
- * @end: the offset 'to' which to invalidate (inclusive)
- *
- * This function only removes the unlocked pages, if you want to
- * remove all the pages of one inode, you must call truncate_inode_pages.
- *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
- *
- * Return: the number of the pages that were invalidated
- */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-		pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
 {
 	pgoff_t indices[PAGEVEC_SIZE];
 	struct pagevec pvec;
@@ -610,8 +595,13 @@ unsigned long invalidate_mapping_pages(s
 			 * Invalidation is a hint that the page is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
-			if (!ret)
+			if (!ret) {
 				deactivate_file_page(page);
+				/* It is likely on the pagevec of a remote CPU */
+				if (nr_pagevec)
+					(*nr_pagevec)++;
+			}
+
 			if (PageTransHuge(page))
 				put_page(page);
 			count += ret;
@@ -623,8 +613,40 @@ unsigned long invalidate_mapping_pages(s
 	}
 	return count;
 }
+
+/**
+ * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * @mapping: the address_space which holds the pages to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function only removes the unlocked pages, if you want to
+ * remove all the pages of one inode, you must call truncate_inode_pages.
+ *
+ * invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback or mapped into
+ * pagetables.
+ *
+ * Return: the number of the pages that were invalidated
+ */
+unsigned long invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	return __invalidate_mapping_pages(mapping, start, end, NULL);
+}
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
+/**
+ * This helper is similar to the above one, except that it accounts for pages
+ * that are likely on a pagevec and counts them in @nr_pagevec, which will be
+ * used by the caller.
+ */
+void invalidate_mapping_pagevec(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
+{
+	__invalidate_mapping_pages(mapping, start, end, nr_pagevec);
+}
+
 /*
  * This is like invalidate_complete_page(), except it ignores the page's
  * refcount. We do this because invalidate_inode_pages2() needs stronger
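To illustrate the reasoning in the changelog, here is a minimal userspace
sketch in C.  It is a toy model, not kernel code: the names page_state,
invalidate_result, toy_invalidate(), old_needs_drain_all() and
new_needs_drain_all() are hypothetical, and the model assumes that every
cached page the local pass fails to invalidate is counted in nr_pagevec,
which is what the patch does.  It compares the old range-based check with
the new nr_pagevec-based check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states for this toy model; these are not kernel types. */
enum page_state {
	PAGE_ABSENT,		/* hole: nothing cached at this index */
	PAGE_INVALIDATABLE,	/* cached and freeable by the local pass */
	PAGE_BUSY,		/* cached, but invalidation fails (e.g. on a pagevec) */
};

struct invalidate_result {
	unsigned long count;		/* pages actually invalidated */
	unsigned long nr_pagevec;	/* pages found but not invalidated */
};

/* Walk the inclusive range [start, end] and try to invalidate each page. */
struct invalidate_result toy_invalidate(enum page_state *pages,
					size_t start, size_t end)
{
	struct invalidate_result res = { 0, 0 };

	for (size_t i = start; i <= end; i++) {
		switch (pages[i]) {
		case PAGE_INVALIDATABLE:
			pages[i] = PAGE_ABSENT;
			res.count++;
			break;
		case PAGE_BUSY:
			res.nr_pagevec++;
			break;
		case PAGE_ABSENT:
			break;
		}
	}
	return res;
}

/* Old check: compares invalidated pages against the size of the range. */
bool old_needs_drain_all(struct invalidate_result r, size_t start, size_t end)
{
	return r.count < (end - start + 1);
}

/* New check: drain remote CPUs only if some page could not be invalidated. */
bool new_needs_drain_all(struct invalidate_result r)
{
	return r.nr_pagevec != 0;
}
```

In this model, a range that contains holes but no busy pages makes
old_needs_drain_all() return true, because count is smaller than the size
of the range, while new_needs_drain_all() returns false.  That is the
needless lru_add_drain_all() call the changelog describes.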