From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com,
    ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com,
    shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com,
    da.gomez@samsung.com, p.raghav@samsung.com,
    baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v4 0/6] add mTHP support for anonymous shmem
Date: Tue, 4 Jun 2024 18:17:44 +0800

Anonymous pages have already been supported
for multi-size THP (mTHP) allocation since commit 19eaf44954df, which
allows THP to be configured per size through the sysfs interface
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'. However,
anonymous shmem ignores the mTHP settings configured through that
interface and can only use PMD-mapped THP, which is not reasonable.

Many applications, especially databases, implement anonymous page sharing
through mmap(MAP_SHARED | MAP_ANONYMOUS). Users therefore expect to apply
a unified mTHP strategy to all anonymous pages, including anonymous shared
pages, in order to enjoy the benefits of mTHP: for example, lower latency
and less memory bloat than PMD-mapped THP, and contiguous PTEs on ARM to
reduce TLB misses.

As discussed in the bi-weekly MM meeting[1], the mTHP controls should
eventually cover all of shmem, not only anonymous shmem, but that support
will be added iteratively. This patch set therefore starts with anonymous
shmem.

The primary strategy is similar to the one used for anonymous mTHP. A new
interface, '/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled', is
introduced. It accepts almost the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', with an additional
"inherit" option and without the testing options 'force' and 'deny'. By
default, all sizes are set to "never" except the PMD size, which is set to
"inherit". This keeps backward compatibility with the top-level anonymous
shmem setting while also allowing anonymous shmem to be enabled
independently for each mTHP size.
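To make the per-size semantics concrete, here is a small user-space sketch in C (not kernel code) of how a tool might compute the effective anonymous shmem policy for one mTHP size. It assumes the usual THP sysfs convention of marking the active option in brackets; the sample strings, the enum, and the helpers `active_token()`, `effective_policy()` and `default_per_size()` are hypothetical, and the top-level testing options 'force' and 'deny' are deliberately treated as unsupported by this sketch.

```c
/*
 * Hypothetical user-space sketch (not kernel code): resolve the effective
 * anonymous shmem policy for one mTHP size from sysfs-style strings.
 * All names here are illustrative assumptions, not a kernel API.
 */
#include <assert.h>
#include <string.h>

enum shmem_policy {
	POLICY_INVALID = -1,
	POLICY_NEVER,
	POLICY_ALWAYS,
	POLICY_WITHIN_SIZE,
	POLICY_ADVISE,
	POLICY_INHERIT, /* only meaningful in a per-size file */
};

/* Copy the bracketed (active) token out of a line such as
 * "always inherit within_size advise [never]". Returns 0 on success. */
static int active_token(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t len;

	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Map a keyword to a policy. "inherit" is accepted only for per-size
 * files; 'force' and 'deny' are not modelled and map to POLICY_INVALID. */
static enum shmem_policy parse_policy(const char *tok, int per_size)
{
	if (strcmp(tok, "never") == 0)
		return POLICY_NEVER;
	if (strcmp(tok, "always") == 0)
		return POLICY_ALWAYS;
	if (strcmp(tok, "within_size") == 0)
		return POLICY_WITHIN_SIZE;
	if (strcmp(tok, "advise") == 0)
		return POLICY_ADVISE;
	if (per_size && strcmp(tok, "inherit") == 0)
		return POLICY_INHERIT;
	return POLICY_INVALID;
}

/* Effective policy for one size: "inherit" defers to the top-level value. */
static enum shmem_policy effective_policy(const char *per_size_line,
					  const char *top_level_line)
{
	char tok[32];
	enum shmem_policy p;

	assert(per_size_line != NULL && top_level_line != NULL);
	if (active_token(per_size_line, tok, sizeof(tok)) != 0)
		return POLICY_INVALID;
	p = parse_policy(tok, 1);
	if (p != POLICY_INHERIT)
		return p;
	if (active_token(top_level_line, tok, sizeof(tok)) != 0)
		return POLICY_INVALID;
	return parse_policy(tok, 0);
}

/* Default described above: "inherit" for the PMD order, "never" otherwise. */
static const char *default_per_size(unsigned int order, unsigned int pmd_order)
{
	return order == pmd_order ? "inherit" : "never";
}
```

For example, with a per-size value of `always [inherit] within_size advise never` and a top-level value whose active option is `[always]`, `effective_policy()` returns `POLICY_ALWAYS`, while a per-size `[never]` wins regardless of the top-level setting. Keeping "inherit" only at the per-size level is what lets the PMD size default to the old top-level behaviour.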

I used the page fault latency tool to measure the performance of 1G of
anonymous shmem with 32 threads on the following machine: ARM64
architecture, 32 cores, 125G of memory.

base: mm-unstable
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.04s        3.10s       83516.416                 2669684.890

mm-unstable + patchset, anon shmem mTHP disabled
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.02s        3.14s       82936.359                 2630746.027

mm-unstable + patchset, anon shmem 64K mTHP enabled
user-time    sys_time    faults_per_sec_per_cpu    faults_per_sec
0.08s        0.31s       678630.231                17082522.495

The data shows that the patch set has minimal impact when mTHP is not
enabled (some run-to-run fluctuation was observed during testing). With
64K mTHP enabled, page fault performance improves significantly:
faults_per_sec rises from about 2.67M to about 17.08M (roughly 6.4x), and
sys_time drops from 3.10s to 0.31s.

[1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@redhat.com/

Changes from v3:
 - Drop the 'force' and 'deny' testing options for each mTHP size.
 - Use the new helper update_mmu_tlb_range(), per Lance.
 - Update the documentation to drop the "anonymous thp" terminology, per David.
 - Initialize 'suitable_orders' in shmem_alloc_and_add_folio(), reported by
   the kernel test robot.
 - Fix the highest mTHP order in shmem_get_unmapped_area().
 - Update some commit messages.

Changes from v2:
 - Rebased onto mm/mm-unstable.
 - Remove the 'huge' parameter of shmem_alloc_and_add_folio(), per Lance.

Changes from v1:
 - Drop the patch that rearranges the position of highest_order() and
   next_order(), per Ryan.
 - Modify finish_fault() to fix a VA alignment issue, per Ryan and David.
 - Fix some build issues, reported by Lance and the kernel test robot.
 - Update some commit messages.

Changes from RFC:
 - Rebase the patch set onto the new mm-unstable branch, per Lance.
 - Add a new patch to export highest_order() and next_order().
 - Add a new patch to align the mTHP size in shmem_get_unmapped_area().
 - Handle the uffd case and the VMA limits case when building the mapping
   for a large folio in finish_fault(), per Ryan.
 - Remove the unnecessary 'order' variable in patch 3, per Kefeng.
 - Keep the anon shmem counter names consistent.
 - Modify the strategy for supporting mTHP for anonymous shmem, as discussed
   with Ryan and David.
 - Add the Reviewed-by tag from Barry.
 - Update the commit message.

Baolin Wang (6):
  mm: memory: extend finish_fault() to support large folio
  mm: shmem: add THP validation for PMD-mapped THP related statistics
  mm: shmem: add multi-size THP sysfs interface for anonymous shmem
  mm: shmem: add mTHP support for anonymous shmem
  mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
  mm: shmem: add mTHP counters for anonymous shmem

 Documentation/admin-guide/mm/transhuge.rst |  23 ++
 include/linux/huge_mm.h                    |  23 ++
 mm/huge_memory.c                           |  17 +-
 mm/memory.c                                |  57 +++-
 mm/shmem.c                                 | 344 ++++++++++++++++++---
 5 files changed, 403 insertions(+), 61 deletions(-)