From: Yin Fengwei
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, yuzhao@google.com, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com, hughd@google.com
Cc: fengwei.yin@intel.com
Subject: [PATCH 0/3] support large folio for mlock
Date: Fri, 28 Jul 2023 15:09:26 +0800
Message-Id: <20230728070929.2487065-1-fengwei.yin@intel.com>

Yu mentioned at [1] that mlock() can't be applied to large folios. I
studied the related code and here is my understanding:

- For RLIMIT_MEMLOCK, there is no problem, because the RLIMIT_MEMLOCK
  statistics are not tied to the underlying pages. Whether the
  underlying pages are mlocked or munlocked does not affect the
  RLIMIT_MEMLOCK accounting, which is always correct.

- For keeping the pages in RAM, there is no problem either. In
  try_to_unmap_one(), once a VMA is found to have the VM_LOCKED bit set
  in vm_flags, the folio is kept whether it is mlocked or not.

So mlock of large folios already works functionally, but it is not
optimal because page reclaim still has to scan these large folios and
may split them.

This series classifies large folios under mlock into two types:
- large folios fully inside a VM_LOCKED VMA range
- large folios crossing a VM_LOCKED VMA boundary

For the first type, we mlock the large folio so page reclaim will skip
it. For the second type, we don't mlock the large folio: it is still
allowed to be picked by page reclaim and split, so the pages outside the
VM_LOCKED VMA range can be reclaimed/released.
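For illustration only, a minimal userspace sketch of the containment
check described above. The real series adds folio_in_range() and
folio_within_vma() to mm/internal.h; the structure, helper name and 4K
page size below are simplified assumptions, not the kernel API:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the VM_LOCKED VMA; illustration only. */
struct vma_range {
        unsigned long vm_start; /* VMA start address */
        unsigned long vm_end;   /* VMA end address (exclusive) */
};

/*
 * A folio mapped at 'addr' covering 'nr_pages' pages is "within" the
 * VMA only if every mapped page falls inside [vm_start, vm_end).
 */
static bool folio_within_range(unsigned long addr, unsigned long nr_pages,
                               const struct vma_range *vma)
{
        unsigned long folio_end = addr + nr_pages * 4096UL; /* assume 4K pages */

        return addr >= vma->vm_start && folio_end <= vma->vm_end;
}

int main(void)
{
        struct vma_range locked = { .vm_start = 0x100000, .vm_end = 0x140000 };

        /* 16-page (64K) folio fully inside the locked VMA: mlock it. */
        printf("inside:   %d\n", folio_within_range(0x100000, 16, &locked));
        /* 16-page folio ending past vm_end: crosses the boundary, leave
         * it unmlocked so reclaim may still split/reclaim it. */
        printf("crossing: %d\n", folio_within_range(0x138000, 16, &locked));
        return 0;
}

A folio that passes this check is mlocked as a whole; one that fails is
left for reclaim to split and partially free.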
patch1 introduces APIs to check whether a large folio is within a VMA
range.
patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
large folio mlock/munlock.
patch3 makes the mlock/munlock syscalls support large folios.

Testing done:
- kernel selftests: no extra failures introduced.

RFC v2 was posted here [2]. During the RFC v2 discussion, Yu also
mentioned a race which can leave a folio unevictable after munlock [3].
We decided that race does not block this series because:
- the race was not introduced by this series
- we have a looks-ok fix for it, which needs to wait for the mlock_count
  fixing patch, as Yosry Ahmed suggested [4]

ChangeLog from RFC v2:
- Removed RFC.
- Dropped the folio_is_large() check, as suggested by both Yu and Hugh.
- Besides the address/pgoff check, also check the page table entries
  when deciding whether the folio is within the range. This handles the
  mremap case where the address/pgoff is in range but the folio can't be
  identified as in range.
- Fixed an issue in page_add_anon_rmap() and page_add_file_rmap()
  introduced by RFC v2: these two functions can be called multiple times
  for one folio, while page_remove_rmap() may not be called the same
  number of times, which can leave mlock_count imbalanced. Fix it by
  skipping mlock of large folios in these two functions.

[1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20230712060144.3006358-1-fengwei.yin@intel.com/
[3] https://lore.kernel.org/linux-mm/CAOUHufZ6=9P_=CAOQyw0xw-3q707q-1FVV09dBNDC-hpcpj2Pg@mail.gmail.com/
[4] https://lore.kernel.org/linux-mm/CAJD7tkZJFG=7xs=9otc5CKs6odWu48daUuZP9Wd9Z-sZF07hXg@mail.gmail.com/

Yin Fengwei (3):
  mm: add functions folio_in_range() and folio_within_vma()
  mm: handle large folio when large folio in VM_LOCKED VMA range
  mm: mlock: update mlock_pte_range to handle large folio

 mm/internal.h | 87 +++++++++++++++++++++++++++++++++++++++++++++------
 mm/mlock.c    | 57 +++++++++++++++++++++++++++++++--
 mm/rmap.c     | 27 +++++++++++-----
 3 files changed, 153 insertions(+), 18 deletions(-)
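To make patch3's direction concrete, here is a rough userspace sketch of
the intended behaviour when the mlock page walk meets large folios: a
folio fully inside the VM_LOCKED range is mlocked once as a whole, while
a folio crossing the boundary is left alone. The types, names and page
size are simplified assumptions, not the actual mlock_pte_range() code:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Simplified folio descriptor: illustration only, not struct folio. */
struct demo_folio {
        unsigned long first_addr;       /* address of the folio's first page */
        unsigned long nr_pages;         /* 1 for small folio, >1 for large */
        bool mlocked;
};

/*
 * Walk the folios mapped in [start, end), mlocking each folio at most
 * once and only when it lies entirely inside the VM_LOCKED range.
 */
static void demo_mlock_range(struct demo_folio *folios, int nr_folios,
                             unsigned long start, unsigned long end)
{
        for (int i = 0; i < nr_folios; i++) {
                struct demo_folio *f = &folios[i];
                unsigned long f_end = f->first_addr + f->nr_pages * PAGE_SIZE;

                /* Folio crossing the VM_LOCKED boundary: leave it alone
                 * so reclaim can still split it and free the part that
                 * falls outside the locked range. */
                if (f->first_addr < start || f_end > end)
                        continue;

                f->mlocked = true;      /* count the whole folio once */
        }
}

int main(void)
{
        struct demo_folio folios[] = {
                { 0x200000,  1, false },        /* small folio inside range */
                { 0x201000, 16, false },        /* large folio inside range */
                { 0x23c000, 16, false },        /* large folio crossing end */
        };

        demo_mlock_range(folios, 3, 0x200000, 0x240000);

        for (int i = 0; i < 3; i++)
                printf("folio %d mlocked: %d\n", i, folios[i].mlocked);
        return 0;
}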