From patchwork Fri Jul 26 09:46:18 2024
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: ying.huang@intel.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
 david@redhat.com, hannes@cmpxchg.org, hughd@google.com,
 kaleshsingh@google.com, kasong@tencent.com, linux-kernel@vger.kernel.org,
 mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com,
 ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev,
 shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com,
 willy@infradead.org, xiang@kernel.org, yosryahmed@google.com
Subject: [PATCH v5 4/4] mm: Introduce per-thpsize swapin control policy
Date: Fri, 26 Jul 2024 21:46:18 +1200
Message-Id: <20240726094618.401593-5-21cnbao@gmail.com>
In-Reply-To: <20240726094618.401593-1-21cnbao@gmail.com>
References: <20240726094618.401593-1-21cnbao@gmail.com>
From: Barry Song

Quote Ying's comment:
A user space interface can be implemented to select different swap-in
order policies, similar to the mTHP allocation order policy. We need a
distinct policy because the performance characteristics of memory
allocation differ significantly from those of swap-in. For example, SSD
read speeds can be much slower than memory allocation. With policy
selection, I believe we can implement mTHP swap-in for
non-SWAP_SYNCHRONOUS scenarios as well. However, users need to
understand the implications of their choices. I think that it's better
to start with at least "always" and "never". I believe that we will add
"auto" in the future to tune automatically, which can eventually be
used as the default.

Suggested-by: "Huang, Ying"
Signed-off-by: Barry Song
---
 Documentation/admin-guide/mm/transhuge.rst |  6 +++
 include/linux/huge_mm.h                    |  1 +
 mm/huge_memory.c                           | 44 ++++++++++++++++++++++
 mm/memory.c                                |  3 +-
 4 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..2e94e956ee12 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -144,6 +144,12 @@ hugepage sizes have enabled="never". If enabling multiple hugepage sizes,
 the kernel will select the most appropriate enabled size for a given
 allocation.
 
+Transparent Hugepage Swap-in for anonymous memory can be disabled or enabled
+per supported THP size with one of::
+
+	echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/swapin_enabled
+	echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/swapin_enabled
+
 It's also possible to limit defrag efforts in the VM to generate
 anonymous hugepages in case they're not immediately free to madvise
 regions or to never try to defrag memory and simply fallback to regular

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..25174305b17f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -92,6 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
 #define TVA_IN_PF		(1 << 1)	/* Page fault handler */
 #define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
+#define TVA_IN_SWAPIN		(1 << 3)	/* Do swap-in */
 
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0167dc27e365..41460847988c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -80,6 +80,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+unsigned long huge_anon_orders_swapin_always __read_mostly;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -88,6 +89,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 {
 	bool smaps = tva_flags & TVA_SMAPS;
 	bool in_pf = tva_flags & TVA_IN_PF;
+	bool in_swapin = tva_flags & TVA_IN_SWAPIN;
 	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
 	unsigned long supported_orders;
 
@@ -100,6 +102,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
 
 	orders &= supported_orders;
+	if (in_swapin)
+		orders &= READ_ONCE(huge_anon_orders_swapin_always);
 	if (!orders)
 		return 0;
 
@@ -523,8 +527,48 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj,
 static struct kobj_attribute thpsize_enabled_attr =
 	__ATTR(enabled, 0644, thpsize_enabled_show, thpsize_enabled_store);
 
+static DEFINE_SPINLOCK(huge_anon_orders_swapin_lock);
+
+static ssize_t thpsize_swapin_enabled_show(struct kobject *kobj,
+					   struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	const char *output;
+
+	if (test_bit(order, &huge_anon_orders_swapin_always))
+		output = "[always] never";
+	else
+		output = "always [never]";
+
+	return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_swapin_enabled_store(struct kobject *kobj,
+					    struct kobj_attribute *attr,
+					    const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	ssize_t ret = count;
+
+	if (sysfs_streq(buf, "always")) {
+		spin_lock(&huge_anon_orders_swapin_lock);
+		set_bit(order, &huge_anon_orders_swapin_always);
+		spin_unlock(&huge_anon_orders_swapin_lock);
+	} else if (sysfs_streq(buf, "never")) {
+		spin_lock(&huge_anon_orders_swapin_lock);
+		clear_bit(order, &huge_anon_orders_swapin_always);
+		spin_unlock(&huge_anon_orders_swapin_lock);
+	} else
+		ret = -EINVAL;
+
+	return ret;
+}
+
+static struct kobj_attribute thpsize_swapin_enabled_attr =
+	__ATTR(swapin_enabled, 0644, thpsize_swapin_enabled_show,
+	       thpsize_swapin_enabled_store);
+
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+	&thpsize_swapin_enabled_attr.attr,
 #ifdef CONFIG_SHMEM
 	&thpsize_shmem_enabled_attr.attr,
 #endif

diff --git a/mm/memory.c b/mm/memory.c
index 14048e9285d4..27c77f739a2c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4091,7 +4091,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * and suitable for swapping THP.
 	 */
 	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
-					  TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER) - 1);
+					  TVA_IN_PF | TVA_IN_SWAPIN | TVA_ENFORCE_SYSFS,
+					  BIT(PMD_ORDER) - 1);
 	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
 	orders = thp_swap_suitable_orders(swp_offset(entry),
 					  vmf->address, orders);