X-Patchwork-Submitter: Ge Yang
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn, yangge
Subject: [PATCH] mm/page_alloc: add one PCP list for THP
Date: Wed, 19 Jun 2024 16:21:34 +0800
Message-Id: <1718785294-26713-1-git-send-email-yangge1116@126.com>
From: yangge

Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations"), the THP-sized PCP list no longer differentiates between the migration types of its pages, so a non-movable allocation request may be given a CMA page from that list. In some cases this is not acceptable.

If a large amount of CMA memory is configured in the system (for example, CMA accounts for 50% of system memory), starting a virtual machine with device passthrough gets stuck. While the virtual machine starts, it calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin its memory. Normally, if a page is present and in a CMA area, pin_user_pages_remote() migrates it to a non-CMA area because of the FOLL_LONGTERM flag. The migration target, however, is allocated by a non-movable request, so when non-movable requests can return CMA memory, migrate_longterm_unpinnable_pages() may migrate a CMA page to another CMA page. The new page then fails the check in check_and_migrate_movable_pages() again, and the migration repeats endlessly.

Call trace:
  pin_user_pages_remote
    __gup_longterm_locked  // endless loop in this function
      __get_user_pages_locked
      check_and_migrate_movable_pages
        migrate_longterm_unpinnable_pages
          alloc_migration_target

To fix the problem, add one more PCP list for THP; this does not introduce a new cacheline. THP then has two PCP lists: one used by MOVABLE allocations (MIGRATE_MOVABLE) and the other used by UNMOVABLE allocations (MIGRATE_UNMOVABLE and MIGRATE_RECLAIMABLE).
Link: https://lore.kernel.org/all/1717492460-19457-1-git-send-email-yangge1116@126.com/
Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge
---
 include/linux/mmzone.h | 11 ++++++-----
 mm/page_alloc.c        | 14 +++++++++++---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b7546dd..7e6989d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -656,13 +656,14 @@ enum zone_watermarks {
 };
 
 /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
- * for THP which will usually be GFP_MOVABLE. Even if it is another type,
- * it should not contribute to serious fragmentation causing THP allocation
- * failures.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists
+ * are added for THP. One PCP list is used by GFP_MOVABLE, and the other PCP list
+ * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE.
  */
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define NR_PCP_THP 2
+#define PCP_THP_MOVABLE 0
+#define PCP_THP_UNMOVABLE 1
 #else
 #define NR_PCP_THP 0
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8f416a0..5eac18e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -504,16 +504,24 @@ static void bad_page(struct page *page, const char *reason)
 
 static inline unsigned int order_to_pindex(int migratetype, int order)
 {
+	int pcp_type = migratetype;
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
 		VM_BUG_ON(order != HPAGE_PMD_ORDER);
-		return NR_LOWORDER_PCP_LISTS;
+
+		if (migratetype != MIGRATE_MOVABLE)
+			pcp_type = PCP_THP_UNMOVABLE;
+		else
+			pcp_type = PCP_THP_MOVABLE;
+
+		return NR_LOWORDER_PCP_LISTS + pcp_type;
 	}
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
 #endif
 
-	return (MIGRATE_PCPTYPES * order) + migratetype;
+	return (MIGRATE_PCPTYPES * order) + pcp_type;
 }
 
 static inline int pindex_to_order(unsigned int pindex)
@@ -521,7 +529,7 @@ static inline int pindex_to_order(unsigned int pindex)
 	int order = pindex / MIGRATE_PCPTYPES;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pindex == NR_LOWORDER_PCP_LISTS)
+	if (order > PAGE_ALLOC_COSTLY_ORDER)
 		order = HPAGE_PMD_ORDER;
 #else
 	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);