From patchwork Wed Sep 23 13:33:18 2020
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 11795039
From: Yafang Shao
To: mgorman@suse.de, hannes@cmpxchg.org, mhocko@kernel.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, Yafang Shao
Subject: [PATCH v2] mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED
Date: Wed, 23 Sep 2020 21:33:18 +0800
Message-Id: <20200923133318.14373-1-laoar.shao@gmail.com>

Our users reported random latency spikes while their RT process was
running. We eventually found that the spikes were caused by
FADV_DONTNEED, which may call lru_add_drain_all() to drain the LRU
cache on remote CPUs and then wait for the per-cpu work to complete.
The wait time is unpredictable and may reach tens of milliseconds.
That behavior is unreasonable: the process is bound to a specific CPU and the file is accessed only by that process, so there should be no pagecache pages on a per-cpu pagevec of any remote CPU.

That unreasonable behavior is partially caused by an incorrect comparison between the number of invalidated pages and the number of target pages. For example,

	if (count < (end_index - start_index + 1))

count above is how many pages were invalidated, while (end_index - start_index + 1) is how many pages should have been invalidated. Using (end_index - start_index + 1) is incorrect, because these are page indices in the file, which may not all be backed by pagecache pages. Besides that, there may be holes between start and end.

So we'd better check whether there are still pages on a per-cpu pagevec after draining the local CPU, and only then decide whether or not to call lru_add_drain_all(). After applying this as a hotfix to our production environment, most of the lru_add_drain_all() calls can be avoided.

Suggested-by: Mel Gorman
Signed-off-by: Yafang Shao
Cc: Mel Gorman
Cc: Johannes Weiner
Acked-by: Mel Gorman
---
 include/linux/fs.h |  4 ++++
 mm/fadvise.c       |  9 +++----
 mm/truncate.c      | 58 ++++++++++++++++++++++++++++++++--------------
 3 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..6ba747b097c5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2591,6 +2591,10 @@ extern bool is_bad_inode(struct inode *);
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
 
+void invalidate_mapping_pagevec(struct address_space *mapping,
+				pgoff_t start, pgoff_t end,
+				unsigned long *nr_pagevec);
+
 static inline void invalidate_remote_inode(struct inode *inode)
 {
 	if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 0e66f2aaeea3..d6baa4f451c5 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -141,7 +141,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 	}
 
 	if (end_index >= start_index) {
-		unsigned long count;
+		unsigned long nr_pagevec = 0;
 
 		/*
 		 * It's common to FADV_DONTNEED right after
@@ -154,8 +154,9 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		 */
 		lru_add_drain();
 
-		count = invalidate_mapping_pages(mapping,
-				start_index, end_index);
+		invalidate_mapping_pagevec(mapping,
+				start_index, end_index,
+				&nr_pagevec);
 
 		/*
 		 * If fewer pages were invalidated than expected then
@@ -163,7 +164,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 		 * a per-cpu pagevec for a remote CPU. Drain all
 		 * pagevecs and try again.
 		 */
-		if (count < (end_index - start_index + 1)) {
+		if (nr_pagevec) {
 			lru_add_drain_all();
 			invalidate_mapping_pages(mapping, start_index,
 					end_index);
diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..6bbe0f0b3ce9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -528,23 +528,8 @@ void truncate_inode_pages_final(struct address_space *mapping)
 }
 EXPORT_SYMBOL(truncate_inode_pages_final);
 
-/**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
- * @mapping: the address_space which holds the pages to invalidate
- * @start: the offset 'from' which to invalidate
- * @end: the offset 'to' which to invalidate (inclusive)
- *
- * This function only removes the unlocked pages, if you want to
- * remove all the pages of one inode, you must call truncate_inode_pages.
- *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
- *
- * Return: the number of the pages that were invalidated
- */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-		pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
 {
 	pgoff_t indices[PAGEVEC_SIZE];
 	struct pagevec pvec;
@@ -610,8 +595,13 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			 * Invalidation is a hint that the page is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
-			if (!ret)
+			if (!ret) {
 				deactivate_file_page(page);
+				/* It is likely on the pagevec of a remote CPU */
+				if (nr_pagevec)
+					(*nr_pagevec)++;
+			}
+
 			if (PageTransHuge(page))
 				put_page(page);
 			count += ret;
@@ -623,8 +613,40 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	}
 	return count;
 }
+
+/**
+ * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * @mapping: the address_space which holds the pages to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function only removes the unlocked pages, if you want to
+ * remove all the pages of one inode, you must call truncate_inode_pages.
+ *
+ * invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback or mapped into
+ * pagetables.
+ *
+ * Return: the number of the pages that were invalidated
+ */
+unsigned long invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	return __invalidate_mapping_pages(mapping, start, end, NULL);
+}
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
+/**
+ * This helper is similar to invalidate_mapping_pages(), except that it
+ * accounts for pages that are likely on a pagevec and counts them in
+ * @nr_pagevec, which will be used by the caller.
+ */
+void invalidate_mapping_pagevec(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
+{
+	__invalidate_mapping_pages(mapping, start, end, nr_pagevec);
+}
+
 /*
  * This is like invalidate_complete_page(), except it ignores the page's
  * refcount.  We do this because invalidate_inode_pages2() needs stronger