From: Peng Zhang
Subject: [PATCH] filemap: avoid unnecessary major faults in filemap_fault()
Date: Sun, 4 Feb 2024 17:35:26 +0800
Message-ID: <20240204093526.212636-1-zhangpeng362@huawei.com>

From: ZhangPeng

A major fault occurred when an application used mlockall(MCL_CURRENT | MCL_FUTURE). This led to an unexpected performance issue [1].

The cause is a PTE that is temporarily cleared during a read/modify/write update of the PTE, e.g. in do_numa_page()/change_pte_range().

In the data segment of a user-mode program, the global variable area is a private mapping. After the file page is loaded into the page cache, a write to it triggers COW, which creates a private anonymous page.
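The following hypothetical userspace model makes the transient clear concrete. It is not kernel code. A PTE is represented as a C11 atomic 64-bit word, and 0 stands for pte_none(). The model_* names and bit values are invented for illustration. The updater follows the clear-then-install shape of a read/modify/write update: the generic ptep_modify_prot_start() clears the entry, and ptep_modify_prot_commit() then writes the new one. The demo shows that a lockless reader inside that window sees an empty entry, which is what sends the faulting access down the file fault path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PTE: 0 plays the role of pte_none(). */
typedef _Atomic uint64_t model_pte_t;

#define MODEL_PTE_PRESENT  0x1ULL /* hypothetical "present" bit */
#define MODEL_PTE_PROTNONE 0x2ULL /* hypothetical NUMA-hinting bit */

/* Read/modify/write, step 1: atomically clear the entry, return the old value. */
static uint64_t model_modify_start(model_pte_t *pte)
{
	return atomic_exchange(pte, 0);
}

/* Read/modify/write, step 2: install the modified value. */
static void model_modify_commit(model_pte_t *pte, uint64_t newval)
{
	atomic_store(pte, newval);
}

/* What a concurrent fault path sees when it reads the entry without a lock. */
static bool model_pte_none(model_pte_t *pte)
{
	return atomic_load(pte) == 0;
}

/* Walk through one update and check what a lockless reader would observe. */
static void model_demo_rmw_window(void)
{
	model_pte_t pte = MODEL_PTE_PRESENT;
	uint64_t old = model_modify_start(&pte);

	/* Inside the window the entry is indistinguishable from "no mapping". */
	assert(model_pte_none(&pte));
	model_modify_commit(&pte, old | MODEL_PTE_PROTNONE);
	/* After the commit the mapping is visible again. */
	assert(!model_pte_none(&pte));
}
```

The window is deliberately shown single-threaded so that the result is deterministic. In the real kernel, a second CPU faulting on the same address can land inside it.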
mlockall() can lock the COW pages (anonymous pages), but the original file pages are not locked and may be reclaimed. If the global variable (a private anonymous page) is accessed while vmf->pte is zeroed during a NUMA fault, a file page fault is triggered. By then, the original private file page may already have been reclaimed. If it is no longer in the page cache, a major fault is triggered and the file is read again, which causes additional overhead.

Fix this by rechecking the PTE in filemap_fault(), without acquiring the PTL, before triggering a major fault.

Read and write page fault performance for file pages was tested on ext4 and ramdisk using will-it-scale [2] on an x86 physical machine. Each value is the average change relative to mainline after the patch is applied. All results are within run-to-run fluctuation, and there is no significant difference:

                      processes  processes_idle  threads  threads_idle
ext4 file write:        -1.14%       -0.08%       -1.87%      0.13%
ext4 file read:          0.03%       -0.65%       -0.51%     -0.08%
ramdisk file write:     -1.21%       -0.21%       -1.12%      0.11%
ramdisk file read:       0.00%       -0.68%       -0.33%     -0.02%

[1] https://lore.kernel.org/linux-mm/9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com/
[2] https://github.com/antonblanchard/will-it-scale/

Suggested-by: "Huang, Ying"
Suggested-by: Yin Fengwei
Signed-off-by: ZhangPeng
Signed-off-by: Kefeng Wang
---
RFC->v1:
- Add error handling when ptep == NULL, per Huang, Ying and Matthew Wilcox
- Check the PTE without acquiring the PTL in filemap_fault(), suggested by Huang, Ying and Yin Fengwei
- Add pmd_none() check before mapping the PTE
- Update commit message and add performance test information

 mm/filemap.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 142864338ca4..b29cdeb6a03b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3238,6 +3238,24 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 			mapping_locked = true;
 		}
 	} else {
+		if (!pmd_none(*vmf->pmd)) {
+			pte_t *ptep;
+
+			ptep = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
+						     vmf->address, &vmf->ptl);
+			if (unlikely(!ptep))
+				return VM_FAULT_NOPAGE;
+			/*
+			 * Recheck pte as the pte can be cleared temporarily
+			 * during a read/modify/write update.
+			 */
+			if (unlikely(!pte_none(ptep_get_lockless(ptep))))
+				ret = VM_FAULT_NOPAGE;
+			pte_unmap(ptep);
+			if (unlikely(ret))
+				return ret;
+		}
+
 		/* No page in the page cache at all */
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
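The decision this patch adds to the page-cache miss path can be sketched with the same kind of hypothetical userspace model. The code is C11, and the model_* names are invented. VM_FAULT_NOPAGE, VM_FAULT_MAJOR and PGMAJFAULT are only mirrored here, not used. When the PMD is populated, the model re-reads the PTE without a lock. A missing PTE table or a non-empty entry returns "retry" instead of counting a major fault. When the PMD is empty, the recheck is skipped, as in the hunk above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef _Atomic uint64_t model_pte_t; /* hypothetical: 0 == pte_none() */

enum model_fault_result {
	MODEL_FAULT_NOPAGE, /* like VM_FAULT_NOPAGE: return and retry the access */
	MODEL_FAULT_MAJOR,  /* like VM_FAULT_MAJOR: go on to read the file */
};

/* Hypothetical stand-in for count_vm_event(PGMAJFAULT). */
static unsigned long model_pgmajfault;

/*
 * Page-cache miss path. pmd_populated models !pmd_none(*vmf->pmd); a NULL
 * pte models pte_offset_map_nolock() failing to map the PTE table.
 */
static enum model_fault_result model_filemap_miss(bool pmd_populated,
						  model_pte_t *pte)
{
	if (pmd_populated) {
		if (pte == NULL)
			return MODEL_FAULT_NOPAGE;
		/* Lockless recheck, in the spirit of ptep_get_lockless(). */
		if (atomic_load(pte) != 0)
			return MODEL_FAULT_NOPAGE;
	}
	/* The entry is really empty (or there is no PTE table): major fault. */
	model_pgmajfault++;
	return MODEL_FAULT_MAJOR;
}

/*
 * Scenario from the commit message: the entry was only transiently cleared
 * and has been restored by the time the recheck runs, so the access is
 * retried and no major fault is counted.
 */
static void model_demo_transient_clear(void)
{
	model_pte_t pte = 0x1; /* the updater has already restored the entry */
	unsigned long before = model_pgmajfault;

	assert(model_filemap_miss(true, &pte) == MODEL_FAULT_NOPAGE);
	assert(model_pgmajfault == before);
}
```

The recheck narrows the window but does not close it. If the entry is still cleared when it is read, the model takes the major-fault path, just like the hunk above.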