From patchwork Mon Jun 13 12:56:16 2022
From: Mel Gorman
To: Andrew Morton
Cc: Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 1/7] mm/page_alloc: Add page->buddy_list and page->pcp_list
Date: Mon, 13 Jun 2022 13:56:16 +0100
Message-Id: <20220613125622.18628-2-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>
X-Rspam-User: X-Rspamd-Queue-Id: C82A7400CE Authentication-Results: imf27.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf27.hostedemail.com: domain of mgorman@techsingularity.net designates 46.22.139.222 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net X-HE-Tag: 1655130830-858729 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The page allocator uses page->lru for storing pages on either buddy or PCP lists. Create page->buddy_list and page->pcp_list as a union with page->lru. This is simply to clarify what type of list a page is on in the page allocator. No functional change intended. [minchan@kernel.org: fix page lru fields in macros] Signed-off-by: Mel Gorman Tested-by: Minchan Kim Acked-by: Minchan Kim Reviewed-by: Nicolas Saenz Julienne Acked-by: Vlastimil Babka --- include/linux/mm_types.h | 5 +++++ mm/page_alloc.c | 24 ++++++++++++------------ 2 files changed, 17 insertions(+), 12 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b34ff2cdbc4f..c09b7f0555b8 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -87,6 +87,7 @@ struct page { */ union { struct list_head lru; + /* Or, for the Unevictable "LRU list" slot */ struct { /* Always even, to negate PageTail */ @@ -94,6 +95,10 @@ struct page { /* Count page's or folio's mlocks */ unsigned int mlock_count; }; + + /* Or, free page */ + struct list_head buddy_list; + struct list_head pcp_list; }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e008a3df0485..247fa7502199 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -785,7 +785,7 @@ static inline bool set_page_guard(struct zone *zone, struct page *page, return false; __SetPageGuard(page); - INIT_LIST_HEAD(&page->lru); + INIT_LIST_HEAD(&page->buddy_list); set_page_private(page, order); /* Guard pages are not available for any usage */ __mod_zone_freepage_state(zone, -(1 << order), migratetype); @@ -928,7 +928,7 @@ static inline void add_to_free_list(struct page *page, struct zone *zone, { struct free_area *area = &zone->free_area[order]; - list_add(&page->lru, &area->free_list[migratetype]); + list_add(&page->buddy_list, &area->free_list[migratetype]); area->nr_free++; } @@ -938,7 +938,7 @@ static inline void add_to_free_list_tail(struct page *page, struct zone *zone, { struct free_area *area = &zone->free_area[order]; - list_add_tail(&page->lru, &area->free_list[migratetype]); + list_add_tail(&page->buddy_list, &area->free_list[migratetype]); area->nr_free++; } @@ -952,7 +952,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, { struct free_area *area = &zone->free_area[order]; - list_move_tail(&page->lru, &area->free_list[migratetype]); + list_move_tail(&page->buddy_list, &area->free_list[migratetype]); } static inline void del_page_from_free_list(struct page *page, struct zone *zone, @@ -962,7 +962,7 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, if (page_reported(page)) __ClearPageReported(page); - list_del(&page->lru); + list_del(&page->buddy_list); __ClearPageBuddy(page); set_page_private(page, 0); zone->free_area[order].nr_free--; @@ -1504,11 +1504,11 @@ static void free_pcppages_bulk(struct zone *zone, int count, do { int mt; - page = list_last_entry(list, struct page, lru); + page = list_last_entry(list, struct page, pcp_list); mt = 
get_pcppage_migratetype(page); /* must delete to avoid corrupting pcp list */ - list_del(&page->lru); + list_del(&page->pcp_list); count -= nr_pages; pcp->count -= nr_pages; @@ -3068,7 +3068,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * for IO devices that can merge IO requests if the physical * pages are ordered properly. */ - list_add_tail(&page->lru, list); + list_add_tail(&page->pcp_list, list); allocated++; if (is_migrate_cma(get_pcppage_migratetype(page))) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, @@ -3318,7 +3318,7 @@ void mark_free_pages(struct zone *zone) for_each_migratetype_order(order, t) { list_for_each_entry(page, - &zone->free_area[order].free_list[t], lru) { + &zone->free_area[order].free_list[t], buddy_list) { unsigned long i; pfn = page_to_pfn(page); @@ -3407,7 +3407,7 @@ static void free_unref_page_commit(struct page *page, int migratetype, __count_vm_event(PGFREE); pcp = this_cpu_ptr(zone->per_cpu_pageset); pindex = order_to_pindex(migratetype, order); - list_add(&page->lru, &pcp->lists[pindex]); + list_add(&page->pcp_list, &pcp->lists[pindex]); pcp->count += 1 << order; /* @@ -3670,8 +3670,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, return NULL; } - page = list_first_entry(list, struct page, lru); - list_del(&page->lru); + page = list_first_entry(list, struct page, pcp_list); + list_del(&page->pcp_list); pcp->count -= 1 << order; } while (check_new_pcp(page, order)); From patchwork Mon Jun 13 12:56:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12879615 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BAD0C433EF for ; Mon, 13 Jun 2022 14:33:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 98CB08D018B; Mon, 13 Jun 2022 10:33:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8C6DB8D0171; Mon, 13 Jun 2022 10:33:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 768248D018B; Mon, 13 Jun 2022 10:33:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 619F08D0171 for ; Mon, 13 Jun 2022 10:33:53 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay11.hostedemail.com (Postfix) with ESMTP id D851F80B1E for ; Mon, 13 Jun 2022 14:33:51 +0000 (UTC) X-FDA: 79573456662.14.C1D7A14 Received: from outbound-smtp20.blacknight.com (outbound-smtp20.blacknight.com [46.22.139.247]) by imf01.hostedemail.com (Postfix) with ESMTP id C35B0400A6 for ; Mon, 13 Jun 2022 14:33:48 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail06.blacknight.ie [81.17.255.152]) by outbound-smtp20.blacknight.com (Postfix) with ESMTPS id C51441C36C8 for ; Mon, 13 Jun 2022 13:56:53 +0100 (IST) Received: (qmail 28507 invoked from network); 13 Jun 2022 12:56:53 -0000 Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 13 Jun 2022 12:56:53 -0000 From: Mel Gorman To: Andrew Morton Cc: Nicolas Saenz Julienne , Marcelo Tosatti , Vlastimil Babka , Michal Hocko , Hugh Dickins , LKML , Linux-MM , Mel Gorman Subject: [PATCH 2/7] 
mm/page_alloc: Use only one PCP list for THP-sized allocations
Date: Mon, 13 Jun 2022 13:56:17 +0100
Message-Id: <20220613125622.18628-3-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>

The per_cpu_pages structure is cache-aligned on a standard x86-64 distribution
configuration, but a later patch will add a new field that would push the
structure into the next cache line. Use only one list to store THP-sized pages
on the per-cpu list. This assumes that the vast majority of THP-sized
allocations are GFP_MOVABLE, but even if another type were used, it would not
contribute to the serious fragmentation that potentially causes a later THP
allocation failure. Align per_cpu_pages on the cacheline boundary to ensure
there is no false cache sharing.

After this patch, the structure sizing is:

struct per_cpu_pages {
	int count;				/*  0    4 */
	int high;				/*  4    4 */
	int batch;				/*  8    4 */
	short int free_factor;			/* 12    2 */
	short int expire;			/* 14    2 */
	struct list_head lists[13];		/* 16  208 */

	/* size: 256, cachelines: 4, members: 6 */
	/* padding: 32 */
} __attribute__((__aligned__(64)));

Signed-off-by: Mel Gorman
Tested-by: Minchan Kim
Acked-by: Minchan Kim
Acked-by: Vlastimil Babka
---
 include/linux/mmzone.h | 11 +++++++----
 mm/page_alloc.c        |  4 ++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..4e0352cc2fcb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -355,15 +355,18 @@ enum zone_watermarks {
 };
 
 /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER plus one additional
- * for pageblock size for THP if configured.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
+ * for THP which will usually be GFP_MOVABLE.
Even if it is another type, + * it should not contribute to serious fragmentation causing THP allocation + * failures. */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define NR_PCP_THP 1 #else #define NR_PCP_THP 0 #endif -#define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + NR_PCP_THP)) +#define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1)) +#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP) /* * Shift to encode migratetype and order in the same integer, with order @@ -389,7 +392,7 @@ struct per_cpu_pages { /* Lists of pages, one per migrate type stored on the pcp-lists */ struct list_head lists[NR_PCP_LISTS]; -}; +} ____cacheline_aligned_in_smp; struct per_cpu_zonestat { #ifdef CONFIG_SMP diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 247fa7502199..febd97f4a2fc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -653,7 +653,7 @@ static inline unsigned int order_to_pindex(int migratetype, int order) #ifdef CONFIG_TRANSPARENT_HUGEPAGE if (order > PAGE_ALLOC_COSTLY_ORDER) { VM_BUG_ON(order != pageblock_order); - base = PAGE_ALLOC_COSTLY_ORDER + 1; + return NR_LOWORDER_PCP_LISTS; } #else VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER); @@ -667,7 +667,7 @@ static inline int pindex_to_order(unsigned int pindex) int order = pindex / MIGRATE_PCPTYPES; #ifdef CONFIG_TRANSPARENT_HUGEPAGE - if (order > PAGE_ALLOC_COSTLY_ORDER) + if (pindex == NR_LOWORDER_PCP_LISTS) order = pageblock_order; #else VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER); From patchwork Mon Jun 13 12:56:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12879614 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38576C43334 for ; Mon, 13 Jun 2022 14:33:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7B6CD8D018A; Mon, 13 Jun 2022 10:33:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 766948D0171; Mon, 13 Jun 2022 10:33:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 630B98D018A; Mon, 13 Jun 2022 10:33:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 523148D0171 for ; Mon, 13 Jun 2022 10:33:52 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 28273108D for ; Mon, 13 Jun 2022 14:33:52 +0000 (UTC) X-FDA: 79573456704.01.235E96F Received: from outbound-smtp26.blacknight.com (outbound-smtp26.blacknight.com [81.17.249.194]) by imf08.hostedemail.com (Postfix) with ESMTP id 8C8311600EE for ; Mon, 13 Jun 2022 14:33:44 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail06.blacknight.ie [81.17.255.152]) by outbound-smtp26.blacknight.com (Postfix) with ESMTPS id 6CD00CAB8F for ; Mon, 13 Jun 2022 13:57:04 +0100 (IST) Received: (qmail 28904 invoked from network); 13 Jun 2022 12:57:04 -0000 Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 13 Jun 2022 12:57:03 -0000 From: Mel Gorman To: Andrew Morton Cc: Nicolas Saenz Julienne , Marcelo Tosatti , Vlastimil Babka , Michal Hocko , Hugh Dickins , LKML , Linux-MM , Mel Gorman 
Subject: [PATCH 3/7] mm/page_alloc: Split out buddy removal code from rmqueue into separate helper
Date: Mon, 13 Jun 2022 13:56:18 +0100
Message-Id: <20220613125622.18628-4-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>

This is a preparation patch to allow the buddy removal code to be reused in
a later patch. No functional change.

Signed-off-by: Mel Gorman
Tested-by: Minchan Kim
Acked-by: Minchan Kim
Reviewed-by: Nicolas Saenz Julienne
Acked-by: Vlastimil Babka
---
 mm/page_alloc.c | 81 ++++++++++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 34 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index febd97f4a2fc..44d198af4b35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3637,6 +3637,43 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
 #endif
 }
 
+static __always_inline
+struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
+			   unsigned int order, unsigned int alloc_flags,
+			   int migratetype)
+{
+	struct page *page;
+	unsigned long flags;
+
+	do {
+		page = NULL;
+		spin_lock_irqsave(&zone->lock, flags);
+		/*
+		 * order-0 request can reach here when the pcplist is skipped
+		 * due to non-CMA allocation context. HIGHATOMIC area is
+		 * reserved for high-order atomic allocation, so order-0
+		 * request should skip it.
+ */ + if (order > 0 && alloc_flags & ALLOC_HARDER) + page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); + if (!page) { + page = __rmqueue(zone, order, migratetype, alloc_flags); + if (!page) { + spin_unlock_irqrestore(&zone->lock, flags); + return NULL; + } + } + __mod_zone_freepage_state(zone, -(1 << order), + get_pcppage_migratetype(page)); + spin_unlock_irqrestore(&zone->lock, flags); + } while (check_new_pages(page, order)); + + __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); + zone_statistics(preferred_zone, zone, 1); + + return page; +} + /* Remove page from the per-cpu list, caller must protect the list */ static inline struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, @@ -3717,9 +3754,14 @@ struct page *rmqueue(struct zone *preferred_zone, gfp_t gfp_flags, unsigned int alloc_flags, int migratetype) { - unsigned long flags; struct page *page; + /* + * We most definitely don't want callers attempting to + * allocate greater than order-1 page units with __GFP_NOFAIL. + */ + WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); + if (likely(pcp_allowed_order(order))) { /* * MIGRATE_MOVABLE pcplist could have the pages on CMA area and @@ -3733,35 +3775,10 @@ struct page *rmqueue(struct zone *preferred_zone, } } - /* - * We most definitely don't want callers attempting to - * allocate greater than order-1 page units with __GFP_NOFAIL. - */ - WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); - - do { - page = NULL; - spin_lock_irqsave(&zone->lock, flags); - /* - * order-0 request can reach here when the pcplist is skipped - * due to non-CMA allocation context. HIGHATOMIC area is - * reserved for high-order atomic allocation, so order-0 - * request should skip it. - */ - if (order > 0 && alloc_flags & ALLOC_HARDER) - page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); - if (!page) { - page = __rmqueue(zone, order, migratetype, alloc_flags); - if (!page) - goto failed; - } - __mod_zone_freepage_state(zone, -(1 << order), - get_pcppage_migratetype(page)); - spin_unlock_irqrestore(&zone->lock, flags); - } while (check_new_pages(page, order)); - - __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); - zone_statistics(preferred_zone, zone, 1); + page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags, + migratetype); + if (unlikely(!page)) + return NULL; out: /* Separate test+clear to avoid unnecessary atomics */ @@ -3772,10 +3789,6 @@ struct page *rmqueue(struct zone *preferred_zone, VM_BUG_ON_PAGE(page && bad_range(zone, page), page); return page; - -failed: - spin_unlock_irqrestore(&zone->lock, flags); - return NULL; } #ifdef CONFIG_FAIL_PAGE_ALLOC From patchwork Mon Jun 13 12:56:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12879688 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75630C43334 for ; Mon, 13 Jun 2022 14:55:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 06C2B8D0194; Mon, 13 Jun 2022 10:55:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 01B6B8D0171; Mon, 13 Jun 2022 10:55:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E261D8D0194; Mon, 13 Jun 2022 10:55:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: 
From: Mel Gorman
To: Andrew Morton
Cc: Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 4/7] mm/page_alloc: Remove mistaken page == NULL check in rmqueue
Date: Mon, 13 Jun 2022 13:56:19 +0100
Message-Id: <20220613125622.18628-5-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>

If a page allocation fails, the ZONE_BOOSTED_WATERMARK bit should be tested,
cleared and kswapd woken whether the allocation attempt was via the PCP or
directly via the buddy list. Remove the page == NULL check so that the
ZONE_BOOSTED_WATERMARK bit is checked unconditionally. As it is unlikely that
ZONE_BOOSTED_WATERMARK is set, mark the branch accordingly.
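For context, here is a condensed sketch of the rmqueue() tail in question. This
is an editorial illustration distilled from the hunks elsewhere in this series,
not text from the patch; all identifiers are taken from those hunks. The early
return shown below is what this patch removes, and the test_bit() branch also
gains an unlikely() annotation:

	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
			     migratetype);
	/*
	 * Removed by this patch: returning NULL here meant that a failed
	 * buddy allocation skipped the ZONE_BOOSTED_WATERMARK test below,
	 * so kswapd was not woken on the failure path.
	 */
	if (unlikely(!page))
		return NULL;

out:
	/* Separate test+clear to avoid unnecessary atomics */
	if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
	}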
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/page_alloc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44d198af4b35..7fb262eeec2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3777,12 +3777,10 @@ struct page *rmqueue(struct zone *preferred_zone,
 
 	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
 							migratetype);
-	if (unlikely(!page))
-		return NULL;
 
 out:
 	/* Separate test+clear to avoid unnecessary atomics */
-	if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
+	if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
 		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
 	}

From patchwork Mon Jun 13 12:56:20 2022
From: Mel Gorman
To: Andrew Morton
Cc: Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 5/7] mm/page_alloc: Protect PCP lists with a spinlock
Date: Mon, 13 Jun 2022 13:56:20 +0100
Message-Id: <20220613125622.18628-6-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>

Currently the PCP lists are protected by using local_lock_irqsave to prevent
migration and IRQ reentrancy, but this is inconvenient: remote draining of
the lists is impossible and requires a workqueue, and every allocation or
free must disable and then re-enable interrupts, which is expensive.

As preparation for dealing with both of those problems, protect the lists
with a spinlock. The IRQ-unsafe version of the lock is used because IRQs are
already disabled by local_lock_irqsave. spin_trylock is used in preparation
for a time when local_lock could be used instead of local_lock_irqsave.

The per_cpu_pages structure still fits within the same number of cache lines
after this patch relative to before the series.

struct per_cpu_pages {
	spinlock_t lock;			/*  0    4 */
	int count;				/*  4    4 */
	int high;				/*  8    4 */
	int batch;				/* 12    4 */
	short int free_factor;			/* 16    2 */
	short int expire;			/* 18    2 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head lists[13];		/* 24  208 */

	/* size: 256, cachelines: 4, members: 7 */
	/* sum members: 228, holes: 1, sum holes: 4 */
	/* padding: 24 */
} __attribute__((__aligned__(64)));

There is overhead in the fast path due to acquiring the spinlock even though
the spinlock is per-cpu and uncontended in the common case. Page Fault Test
(PFT) running on a 1-socket machine reported the following results.
5.18.0-rc1 5.18.0-rc1 vanilla mm-pcpdrain-v2r1 Hmean faults/sec-1 886331.5718 ( 0.00%) 885462.7479 ( -0.10%) Hmean faults/sec-3 2337706.1583 ( 0.00%) 2332130.4909 * -0.24%* Hmean faults/sec-5 2851594.2897 ( 0.00%) 2844123.9307 ( -0.26%) Hmean faults/sec-7 3543251.5507 ( 0.00%) 3516889.0442 * -0.74%* Hmean faults/sec-8 3947098.0024 ( 0.00%) 3916162.8476 * -0.78%* Stddev faults/sec-1 2302.9105 ( 0.00%) 2065.0845 ( 10.33%) Stddev faults/sec-3 7275.2442 ( 0.00%) 6033.2620 ( 17.07%) Stddev faults/sec-5 24726.0328 ( 0.00%) 12525.1026 ( 49.34%) Stddev faults/sec-7 9974.2542 ( 0.00%) 9543.9627 ( 4.31%) Stddev faults/sec-8 9468.0191 ( 0.00%) 7958.2607 ( 15.95%) CoeffVar faults/sec-1 0.2598 ( 0.00%) 0.2332 ( 10.24%) CoeffVar faults/sec-3 0.3112 ( 0.00%) 0.2587 ( 16.87%) CoeffVar faults/sec-5 0.8670 ( 0.00%) 0.4404 ( 49.21%) CoeffVar faults/sec-7 0.2815 ( 0.00%) 0.2714 ( 3.60%) CoeffVar faults/sec-8 0.2399 ( 0.00%) 0.2032 ( 15.28%) There is a small hit in the number of faults per second but given that the results are more stable, it's borderline noise. Signed-off-by: Mel Gorman Tested-by: Minchan Kim Acked-by: Minchan Kim Reviewed-by: Nicolas Saenz Julienne Acked-by: Vlastimil Babka --- include/linux/mmzone.h | 1 + mm/page_alloc.c | 167 +++++++++++++++++++++++++++++++++++------ 2 files changed, 147 insertions(+), 21 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 4e0352cc2fcb..299259cfe462 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -382,6 +382,7 @@ enum zone_watermarks { /* Fields and list protected by pagesets local_lock in page_alloc.c */ struct per_cpu_pages { + spinlock_t lock; /* Protects lists field */ int count; /* number of pages in the list */ int high; /* high watermark, emptying needed */ int batch; /* chunk size for buddy add/remove */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7fb262eeec2f..533fc5527582 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -133,6 +133,20 @@ static DEFINE_PER_CPU(struct pagesets, pagesets) = { .lock = INIT_LOCAL_LOCK(lock), }; +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT) +/* + * On SMP, spin_trylock is sufficient protection. + * On PREEMPT_RT, spin_trylock is equivalent on both SMP and UP. + */ +#define pcp_trylock_prepare(flags) do { } while (0) +#define pcp_trylock_finish(flag) do { } while (0) +#else + +/* UP spin_trylock always succeeds so disable IRQs to prevent re-entrancy. */ +#define pcp_trylock_prepare(flags) local_irq_save(flags) +#define pcp_trylock_finish(flags) local_irq_restore(flags) +#endif + #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID DEFINE_PER_CPU(int, numa_node); EXPORT_PER_CPU_SYMBOL(numa_node); @@ -3097,15 +3111,22 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, */ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) { - unsigned long flags; int to_drain, batch; - local_lock_irqsave(&pagesets.lock, flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); - if (to_drain > 0) + if (to_drain > 0) { + unsigned long flags; + + /* + * free_pcppages_bulk expects IRQs disabled for zone->lock + * so even though pcp->lock is not intended to be IRQ-safe, + * it's needed in this context. 
+ */ + spin_lock_irqsave(&pcp->lock, flags); free_pcppages_bulk(zone, to_drain, pcp, 0); - local_unlock_irqrestore(&pagesets.lock, flags); + spin_unlock_irqrestore(&pcp->lock, flags); + } } #endif @@ -3118,16 +3139,17 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) */ static void drain_pages_zone(unsigned int cpu, struct zone *zone) { - unsigned long flags; struct per_cpu_pages *pcp; - local_lock_irqsave(&pagesets.lock, flags); - pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); - if (pcp->count) - free_pcppages_bulk(zone, pcp->count, pcp, 0); + if (pcp->count) { + unsigned long flags; - local_unlock_irqrestore(&pagesets.lock, flags); + /* See drain_zone_pages on why this is disabling IRQs */ + spin_lock_irqsave(&pcp->lock, flags); + free_pcppages_bulk(zone, pcp->count, pcp, 0); + spin_unlock_irqrestore(&pcp->lock, flags); + } } /* @@ -3395,18 +3417,30 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, return min(READ_ONCE(pcp->batch) << 2, high); } -static void free_unref_page_commit(struct page *page, int migratetype, - unsigned int order) +/* Returns true if the page was committed to the per-cpu list. */ +static bool free_unref_page_commit(struct page *page, int migratetype, + unsigned int order, bool locked) { struct zone *zone = page_zone(page); struct per_cpu_pages *pcp; int high; int pindex; bool free_high; + unsigned long __maybe_unused UP_flags; __count_vm_event(PGFREE); pcp = this_cpu_ptr(zone->per_cpu_pageset); pindex = order_to_pindex(migratetype, order); + + if (!locked) { + /* Protect against a parallel drain. */ + pcp_trylock_prepare(UP_flags); + if (!spin_trylock(&pcp->lock)) { + pcp_trylock_finish(UP_flags); + return false; + } + } + list_add(&page->pcp_list, &pcp->lists[pindex]); pcp->count += 1 << order; @@ -3424,6 +3458,13 @@ static void free_unref_page_commit(struct page *page, int migratetype, free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex); } + + if (!locked) { + spin_unlock(&pcp->lock); + pcp_trylock_finish(UP_flags); + } + + return true; } /* @@ -3434,6 +3475,7 @@ void free_unref_page(struct page *page, unsigned int order) unsigned long flags; unsigned long pfn = page_to_pfn(page); int migratetype; + bool freed_pcp = false; if (!free_unref_page_prepare(page, pfn, order)) return; @@ -3455,8 +3497,11 @@ void free_unref_page(struct page *page, unsigned int order) } local_lock_irqsave(&pagesets.lock, flags); - free_unref_page_commit(page, migratetype, order); + freed_pcp = free_unref_page_commit(page, migratetype, order, false); local_unlock_irqrestore(&pagesets.lock, flags); + + if (unlikely(!freed_pcp)) + free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); } /* @@ -3465,10 +3510,19 @@ void free_unref_page(struct page *page, unsigned int order) void free_unref_page_list(struct list_head *list) { struct page *page, *next; + struct per_cpu_pages *pcp; + struct zone *locked_zone; unsigned long flags; int batch_count = 0; int migratetype; + /* + * An empty list is possible. Check early so that the later + * lru_to_page() does not potentially read garbage. + */ + if (list_empty(list)) + return; + /* Prepare pages for freeing */ list_for_each_entry_safe(page, next, list, lru) { unsigned long pfn = page_to_pfn(page); @@ -3489,8 +3543,31 @@ void free_unref_page_list(struct list_head *list) } } + /* + * Preparation could have drained the list due to failing to prepare + * or all pages are being isolated. 
+ */ + if (list_empty(list)) + return; + local_lock_irqsave(&pagesets.lock, flags); + + page = lru_to_page(list); + locked_zone = page_zone(page); + pcp = this_cpu_ptr(locked_zone->per_cpu_pageset); + spin_lock(&pcp->lock); + list_for_each_entry_safe(page, next, list, lru) { + struct zone *zone = page_zone(page); + + /* Different zone, different pcp lock. */ + if (zone != locked_zone) { + spin_unlock(&pcp->lock); + locked_zone = zone; + pcp = this_cpu_ptr(zone->per_cpu_pageset); + spin_lock(&pcp->lock); + } + /* * Non-isolated types over MIGRATE_PCPTYPES get added * to the MIGRATE_MOVABLE pcp list. @@ -3500,18 +3577,33 @@ void free_unref_page_list(struct list_head *list) migratetype = MIGRATE_MOVABLE; trace_mm_page_free_batched(page); - free_unref_page_commit(page, migratetype, 0); + + /* + * If there is a parallel drain in progress, free to the buddy + * allocator directly. This is expensive as the zone lock will + * be acquired multiple times but if a drain is in progress + * then an expensive operation is already taking place. + * + * TODO: Always false at the moment due to local_lock_irqsave + * and is preparation for converting to local_lock. + */ + if (unlikely(!free_unref_page_commit(page, migratetype, 0, true))) + free_one_page(page_zone(page), page, page_to_pfn(page), 0, migratetype, FPI_NONE); /* * Guard against excessive IRQ disabled times when we get * a large list of pages to free. */ if (++batch_count == SWAP_CLUSTER_MAX) { + spin_unlock(&pcp->lock); local_unlock_irqrestore(&pagesets.lock, flags); batch_count = 0; local_lock_irqsave(&pagesets.lock, flags); + pcp = this_cpu_ptr(locked_zone->per_cpu_pageset); + spin_lock(&pcp->lock); } } + spin_unlock(&pcp->lock); local_unlock_irqrestore(&pagesets.lock, flags); } @@ -3680,9 +3772,28 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, int migratetype, unsigned int alloc_flags, struct per_cpu_pages *pcp, - struct list_head *list) + struct list_head *list, + bool locked) { struct page *page; + unsigned long __maybe_unused UP_flags; + + /* + * spin_trylock is not necessary right now due to due to + * local_lock_irqsave and is a preparation step for + * a conversion to local_lock using the trylock to prevent + * IRQ re-entrancy. If pcp->lock cannot be acquired, the caller + * uses rmqueue_buddy. + * + * TODO: Convert local_lock_irqsave to local_lock. 
+ */ + if (unlikely(!locked)) { + pcp_trylock_prepare(UP_flags); + if (!spin_trylock(&pcp->lock)) { + pcp_trylock_finish(UP_flags); + return NULL; + } + } do { if (list_empty(list)) { @@ -3703,8 +3814,10 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, migratetype, alloc_flags); pcp->count += alloced << order; - if (unlikely(list_empty(list))) - return NULL; + if (unlikely(list_empty(list))) { + page = NULL; + goto out; + } } page = list_first_entry(list, struct page, pcp_list); @@ -3712,6 +3825,12 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, pcp->count -= 1 << order; } while (check_new_pcp(page, order)); +out: + if (!locked) { + spin_unlock(&pcp->lock); + pcp_trylock_finish(UP_flags); + } + return page; } @@ -3736,7 +3855,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, pcp = this_cpu_ptr(zone->per_cpu_pageset); pcp->free_factor >>= 1; list = &pcp->lists[order_to_pindex(migratetype, order)]; - page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list); + page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false); local_unlock_irqrestore(&pagesets.lock, flags); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); @@ -3771,7 +3890,8 @@ struct page *rmqueue(struct zone *preferred_zone, migratetype != MIGRATE_MOVABLE) { page = rmqueue_pcplist(preferred_zone, zone, order, gfp_flags, migratetype, alloc_flags); - goto out; + if (likely(page)) + goto out; } } @@ -5343,6 +5463,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, local_lock_irqsave(&pagesets.lock, flags); pcp = this_cpu_ptr(zone->per_cpu_pageset); pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)]; + spin_lock(&pcp->lock); while (nr_populated < nr_pages) { @@ -5353,11 +5474,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, } page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags, - pcp, pcp_list); + pcp, pcp_list, true); if (unlikely(!page)) { /* Try and allocate at least one page */ - if (!nr_account) + if (!nr_account) { + spin_unlock(&pcp->lock); goto failed_irq; + } break; } nr_account++; @@ -5370,6 +5493,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nr_populated++; } + spin_unlock(&pcp->lock); local_unlock_irqrestore(&pagesets.lock, flags); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); @@ -7019,6 +7143,7 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta memset(pcp, 0, sizeof(*pcp)); memset(pzstats, 0, sizeof(*pzstats)); + spin_lock_init(&pcp->lock); for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) INIT_LIST_HEAD(&pcp->lists[pindex]); From patchwork Mon Jun 13 12:56:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12879689 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 476ACC43334 for ; Mon, 13 Jun 2022 14:55:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D3DB28D0195; Mon, 13 Jun 2022 10:55:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CED308D0171; Mon, 13 Jun 2022 10:55:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BB5008D0195; Mon, 13 Jun 2022 10:55:30 -0400 (EDT) X-Delivered-To: 
linux-mm@kvack.org
From: Mel Gorman
To: Andrew Morton
Cc: Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins, LKML, Linux-MM, Mel Gorman
Subject: [PATCH 6/7] mm/page_alloc: Remotely drain per-cpu lists
Date: Mon, 13 Jun 2022 13:56:21 +0100
Message-Id: <20220613125622.18628-7-mgorman@techsingularity.net>
In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net>

From: Nicolas Saenz Julienne

Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu drain
work queued by __drain_all_pages(), so introduce a new mechanism to drain the
per-cpu lists remotely. This is made possible by remotely locking the new
per-cpu spinlocks in 'struct per_cpu_pages'. A benefit of the new scheme is
that drain operations are now migration safe.
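As a rough sketch of the mechanism (an editorial illustration based on the
diffs later in this series, not part of the commit message; all identifiers
are taken from those diffs), the remote drain amounts to walking the target
CPUs' pagesets and freeing them under their pcp->lock instead of queueing
work on every CPU:

	for_each_cpu(cpu, &cpus_with_pcps) {
		struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
		unsigned long flags;

		/*
		 * pcp->lock was added earlier in the series; IRQs are
		 * disabled because free_pcppages_bulk() expects them
		 * disabled for zone->lock.
		 */
		spin_lock_irqsave(&pcp->lock, flags);
		if (pcp->count)
			free_pcppages_bulk(zone, pcp->count, pcp, 0);
		spin_unlock_irqrestore(&pcp->lock, flags);
	}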
There was no observed performance degradation vs. the previous scheme. Both netperf and hackbench were run in parallel to triggering the __drain_all_pages(NULL, true) code path around ~100 times per second. The new scheme performs a bit better (~5%), although the important point here is there are no performance regressions vs. the previous mechanism. Per-cpu lists draining happens only in slow paths. Minchan Kim tested this independently and reported; My workload is not NOHZ CPUs but run apps under heavy memory pressure so they goes to direct reclaim and be stuck on drain_all_pages until work on workqueue run. unit: nanosecond max(dur) avg(dur) count(dur) 166713013 487511.77786438033 1283 From traces, system encountered the drain_all_pages 1283 times and worst case was 166ms and avg was 487us. The other problem was alloc_contig_range in CMA. The PCP draining takes several hundred millisecond sometimes though there is no memory pressure or a few of pages to be migrated out but CPU were fully booked. Your patch perfectly removed those wasted time. Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Mel Gorman Tested-by: Minchan Kim Acked-by: Minchan Kim Acked-by: Vlastimil Babka --- mm/page_alloc.c | 58 ++++--------------------------------------------- 1 file changed, 4 insertions(+), 54 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 533fc5527582..03882ce7765f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -165,13 +165,7 @@ DEFINE_PER_CPU(int, _numa_mem_); /* Kernel "local memory" node */ EXPORT_PER_CPU_SYMBOL(_numa_mem_); #endif -/* work_structs for global per-cpu drains */ -struct pcpu_drain { - struct zone *zone; - struct work_struct work; -}; static DEFINE_MUTEX(pcpu_drain_mutex); -static DEFINE_PER_CPU(struct pcpu_drain, pcpu_drain); #ifdef CONFIG_GCC_PLUGIN_LATENT_ENTROPY volatile unsigned long latent_entropy __latent_entropy; @@ -3105,9 +3099,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * Called from the vmstat counter updater to drain pagesets of this * currently executing processor on remote nodes after they have * expired. - * - * Note that this function must be called with the thread pinned to - * a single processor. */ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) { @@ -3132,10 +3123,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp) /* * Drain pcplists of the indicated processor and zone. - * - * The processor must either be the current processor and the - * thread pinned to the current processor or a processor that - * is not online. */ static void drain_pages_zone(unsigned int cpu, struct zone *zone) { @@ -3154,10 +3141,6 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone) /* * Drain pcplists of all zones on the indicated processor. - * - * The processor must either be the current processor and the - * thread pinned to the current processor or a processor that - * is not online. */ static void drain_pages(unsigned int cpu) { @@ -3170,9 +3153,6 @@ static void drain_pages(unsigned int cpu) /* * Spill all of this CPU's per-cpu pages back into the buddy allocator. - * - * The CPU has to be pinned. When zone parameter is non-NULL, spill just - * the single zone's pages. 
*/ void drain_local_pages(struct zone *zone) { @@ -3184,24 +3164,6 @@ void drain_local_pages(struct zone *zone) drain_pages(cpu); } -static void drain_local_pages_wq(struct work_struct *work) -{ - struct pcpu_drain *drain; - - drain = container_of(work, struct pcpu_drain, work); - - /* - * drain_all_pages doesn't use proper cpu hotplug protection so - * we can race with cpu offline when the WQ can move this from - * a cpu pinned worker to an unbound one. We can operate on a different - * cpu which is alright but we also have to make sure to not move to - * a different one. - */ - migrate_disable(); - drain_local_pages(drain->zone); - migrate_enable(); -} - /* * The implementation of drain_all_pages(), exposing an extra parameter to * drain on all cpus. @@ -3222,13 +3184,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus) */ static cpumask_t cpus_with_pcps; - /* - * Make sure nobody triggers this path before mm_percpu_wq is fully - * initialized. - */ - if (WARN_ON_ONCE(!mm_percpu_wq)) - return; - /* * Do not drain if one is already in progress unless it's specific to * a zone. Such callers are primarily CMA and memory hotplug and need @@ -3278,14 +3233,11 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus) } for_each_cpu(cpu, &cpus_with_pcps) { - struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu); - - drain->zone = zone; - INIT_WORK(&drain->work, drain_local_pages_wq); - queue_work_on(cpu, mm_percpu_wq, &drain->work); + if (zone) + drain_pages_zone(cpu, zone); + else + drain_pages(cpu); } - for_each_cpu(cpu, &cpus_with_pcps) - flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work); mutex_unlock(&pcpu_drain_mutex); } @@ -3294,8 +3246,6 @@ static void __drain_all_pages(struct zone *zone, bool force_all_cpus) * Spill all the per-cpu pages from all CPUs back into the buddy allocator. * * When zone parameter is non-NULL, spill just the single zone's pages. - * - * Note that this can be extremely slow as the draining happens in a workqueue. 
*/ void drain_all_pages(struct zone *zone) { From patchwork Mon Jun 13 12:56:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mel Gorman X-Patchwork-Id: 12879687 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBF3FC433EF for ; Mon, 13 Jun 2022 14:54:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 574EB8D0193; Mon, 13 Jun 2022 10:54:15 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4FD898D0171; Mon, 13 Jun 2022 10:54:15 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 350358D0193; Mon, 13 Jun 2022 10:54:15 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 1FF6E8D0171 for ; Mon, 13 Jun 2022 10:54:15 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id D1FF034488 for ; Mon, 13 Jun 2022 14:54:14 +0000 (UTC) X-FDA: 79573508028.18.4FD6CC7 Received: from outbound-smtp55.blacknight.com (outbound-smtp55.blacknight.com [46.22.136.239]) by imf10.hostedemail.com (Postfix) with ESMTP id 549C1C0340 for ; Mon, 13 Jun 2022 14:52:57 +0000 (UTC) Received: from mail.blacknight.com (pemlinmail06.blacknight.ie [81.17.255.152]) by outbound-smtp55.blacknight.com (Postfix) with ESMTPS id 48482FA773 for ; Mon, 13 Jun 2022 13:57:45 +0100 (IST) Received: (qmail 30201 invoked from network); 13 Jun 2022 12:57:44 -0000 Received: from unknown (HELO morpheus.112glenside.lan) (mgorman@techsingularity.net@[84.203.198.246]) by 81.17.254.9 with ESMTPA; 13 Jun 2022 12:57:44 -0000 From: Mel Gorman To: Andrew Morton Cc: Nicolas Saenz Julienne , Marcelo Tosatti , Vlastimil Babka , Michal Hocko , Hugh Dickins , LKML , Linux-MM , Mel Gorman Subject: [PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock Date: Mon, 13 Jun 2022 13:56:22 +0100 Message-Id: <20220613125622.18628-8-mgorman@techsingularity.net> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20220613125622.18628-1-mgorman@techsingularity.net> References: <20220613125622.18628-1-mgorman@techsingularity.net> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1655131977; a=rsa-sha256; cv=none; b=djcyVWK2ExM7jKdLWMd7F5f4vTFIH0UzhsKisvrD5iB+Ip7bJ/hl9szFOPg6ZWdNzZVaKZ +53Cnvz3MJn0/qmpJcIzAFTElQadf56S6fUkEwkzNpezRh79NGcdN0tuaHkqOZP3I2sVQZ NWcUOlAyMpaMf9P0FGxGUJ3Ecm1U7mQ= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of mgorman@techsingularity.net designates 46.22.136.239 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1655131977; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GbCL0IDw5Qb4VJ9sX1tMO66XxsOCvLtNurSEyZeGstw=; b=JMD10BDifIfYxJ0Rp3+aqLDQNLrKGNQFgcP1gb5SBtf6gMQRx+NiXpjhGkMVHYUkvN5rQ+ WJVJUhhArqNFSawyg+HAvx+mm1pxG5LwTIn30B0xduYilJl8R5e4B1zio78Xy8i0AJaMoO UUf6qps3FET1OONb4E5yfyvD01mU1Ww= Authentication-Results: 
imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of mgorman@techsingularity.net designates 46.22.136.239 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net; dmarc=none X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 549C1C0340 X-Stat-Signature: bes9qdarc8y3nyt1deasa4uuiey4k45a X-Rspam-User: X-HE-Tag: 1655131977-460538 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: struct per_cpu_pages is no longer strictly local as PCP lists can be drained remotely using a lock for protection. While the use of local_lock works, it goes against the intent of local_lock which is for "pure CPU local concurrency control mechanisms and not suited for inter-CPU concurrency control" (Documentation/locking/locktypes.rst) local_lock protects against migration between when the percpu pointer is accessed and the pcp->lock acquired. The lock acquisition is a preemption point so in the worst case, a task could migrate to another NUMA node and accidentally allocate remote memory. The main requirement is to pin the task to a CPU that is suitable for PREEMPT_RT and !PREEMPT_RT. Replace local_lock with helpers that pin a task to a CPU, lookup the per-cpu structure and acquire the embedded lock. It's similar to local_lock without breaking the intent behind the API. It is not a complete API as only the parts needed for PCP-alloc are implemented but in theory, the generic helpers could be promoted to a general API if there was demand for an embedded lock within a per-cpu struct with a guarantee that the per-cpu structure locked matches the running CPU and cannot use get_cpu_var due to RT concerns. PCP requires these semantics to avoid accidentally allocating remote memory. Signed-off-by: Mel Gorman Tested-by: Marek Szyprowski Reported-by: kernel test robot Signed-off-by: Mel Gorman Reported-by: Yu Zhao Signed-off-by: Andrew Morton --- mm/page_alloc.c | 226 ++++++++++++++++++++++++++---------------------- 1 file changed, 121 insertions(+), 105 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 03882ce7765f..f10782ab7cc7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -126,13 +126,6 @@ typedef int __bitwise fpi_t; static DEFINE_MUTEX(pcp_batch_high_lock); #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8) -struct pagesets { - local_lock_t lock; -}; -static DEFINE_PER_CPU(struct pagesets, pagesets) = { - .lock = INIT_LOCAL_LOCK(lock), -}; - #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT) /* * On SMP, spin_trylock is sufficient protection. @@ -147,6 +140,81 @@ static DEFINE_PER_CPU(struct pagesets, pagesets) = { #define pcp_trylock_finish(flags) local_irq_restore(flags) #endif +/* + * Locking a pcp requires a PCP lookup followed by a spinlock. To avoid + * a migration causing the wrong PCP to be locked and remote memory being + * potentially allocated, pin the task to the CPU for the lookup+lock. + * preempt_disable is used on !RT because it is faster than migrate_disable. + * migrate_disable is used on RT because otherwise RT spinlock usage is + * interfered with and a high priority task cannot preempt the allocator. + */ +#ifndef CONFIG_PREEMPT_RT +#define pcpu_task_pin() preempt_disable() +#define pcpu_task_unpin() preempt_enable() +#else +#define pcpu_task_pin() migrate_disable() +#define pcpu_task_unpin() migrate_enable() +#endif + +/* + * Generic helper to lookup and a per-cpu variable with an embedded spinlock. 
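The helpers introduced in the remainder of this hunk reduce to the following sequence (an illustrative sketch, not literal kernel code; the IRQ-saving and trylock variants differ only in which spin_lock flavour is used):

	struct per_cpu_pages *pcp;

	pcpu_task_pin();			/* preempt_disable() on !RT, migrate_disable() on RT */
	pcp = this_cpu_ptr(zone->per_cpu_pageset);
	spin_lock(&pcp->lock);			/* the lock embedded in the per-cpu structure */

	/* ... operate on this CPU's pcp lists ... */

	spin_unlock(&pcp->lock);
	pcpu_task_unpin();

Pinning before the this_cpu_ptr() lookup is the point of the exercise: it guarantees that the structure that ends up locked belongs to the CPU the task is running on, so a migration between the lookup and the lock acquisition cannot leave the task allocating another node's memory.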
+ * Return value should be used with equivalent unlock helper. + */ +#define pcpu_spin_lock(type, member, ptr) \ +({ \ + type *_ret; \ + pcpu_task_pin(); \ + _ret = this_cpu_ptr(ptr); \ + spin_lock(&_ret->member); \ + _ret; \ +}) + +#define pcpu_spin_lock_irqsave(type, member, ptr, flags) \ +({ \ + type *_ret; \ + pcpu_task_pin(); \ + _ret = this_cpu_ptr(ptr); \ + spin_lock_irqsave(&_ret->member, flags); \ + _ret; \ +}) + +#define pcpu_spin_trylock_irqsave(type, member, ptr, flags) \ +({ \ + type *_ret; \ + pcpu_task_pin(); \ + _ret = this_cpu_ptr(ptr); \ + if (!spin_trylock_irqsave(&_ret->member, flags)) \ + _ret = NULL; \ + _ret; \ +}) + +#define pcpu_spin_unlock(member, ptr) \ +({ \ + spin_unlock(&ptr->member); \ + pcpu_task_unpin(); \ +}) + +#define pcpu_spin_unlock_irqrestore(member, ptr, flags) \ +({ \ + spin_unlock_irqrestore(&ptr->member, flags); \ + pcpu_task_unpin(); \ +}) + +/* struct per_cpu_pages specific helpers. */ +#define pcp_spin_lock(ptr) \ + pcpu_spin_lock(struct per_cpu_pages, lock, ptr) + +#define pcp_spin_lock_irqsave(ptr, flags) \ + pcpu_spin_lock_irqsave(struct per_cpu_pages, lock, ptr, flags) + +#define pcp_spin_trylock_irqsave(ptr, flags) \ + pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, ptr, flags) + +#define pcp_spin_unlock(ptr) \ + pcpu_spin_unlock(lock, ptr) + +#define pcp_spin_unlock_irqrestore(ptr, flags) \ + pcpu_spin_unlock_irqrestore(lock, ptr, flags) #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID DEFINE_PER_CPU(int, numa_node); EXPORT_PER_CPU_SYMBOL(numa_node); @@ -1481,10 +1549,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, /* Ensure requested pindex is drained first. */ pindex = pindex - 1; - /* - * local_lock_irq held so equivalent to spin_lock_irqsave for - * both PREEMPT_RT and non-PREEMPT_RT configurations. - */ + /* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */ spin_lock(&zone->lock); isolated_pageblocks = has_isolate_pageblock(zone); @@ -3052,10 +3117,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, { int i, allocated = 0; - /* - * local_lock_irq held so equivalent to spin_lock_irqsave for - * both PREEMPT_RT and non-PREEMPT_RT configurations. - */ + /* Caller must hold IRQ-safe pcp->lock so IRQs are disabled. */ spin_lock(&zone->lock); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -3367,30 +3429,17 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, return min(READ_ONCE(pcp->batch) << 2, high); } -/* Returns true if the page was committed to the per-cpu list. */ -static bool free_unref_page_commit(struct page *page, int migratetype, - unsigned int order, bool locked) +static void free_unref_page_commit(struct per_cpu_pages *pcp, struct zone *zone, + struct page *page, int migratetype, + unsigned int order) { - struct zone *zone = page_zone(page); - struct per_cpu_pages *pcp; int high; int pindex; bool free_high; - unsigned long __maybe_unused UP_flags; __count_vm_event(PGFREE); - pcp = this_cpu_ptr(zone->per_cpu_pageset); pindex = order_to_pindex(migratetype, order); - if (!locked) { - /* Protect against a parallel drain. 
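A caller pairs the struct per_cpu_pages wrappers above as in the hypothetical helper below (illustration only; the function name is invented). Note that, as written above, a failed pcpu_spin_trylock_irqsave() returns NULL without unpinning the task, so the fallback branch needs to take care that the pin does not leak.

	/* Hypothetical caller, for illustration only. */
	static bool pcp_try_op(struct zone *zone)
	{
		unsigned long flags;
		struct per_cpu_pages *pcp;

		pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
		if (!pcp)
			return false;		/* lock contended, e.g. by a remote drain */

		/* ... pcp->lists and pcp->count can be touched with IRQs off ... */

		pcp_spin_unlock_irqrestore(pcp, flags);	/* releases the lock, IRQ state and pin */
		return true;
	}

Also worth noting is the resulting nesting: free_pcppages_bulk() and rmqueue_bulk() take zone->lock inside such an IRQ-disabled pcp section, which is why their comments are updated above to require an IRQ-safe pcp->lock from the caller rather than relying on local_lock_irq.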
*/ - pcp_trylock_prepare(UP_flags); - if (!spin_trylock(&pcp->lock)) { - pcp_trylock_finish(UP_flags); - return false; - } - } - list_add(&page->pcp_list, &pcp->lists[pindex]); pcp->count += 1 << order; @@ -3408,13 +3457,6 @@ static bool free_unref_page_commit(struct page *page, int migratetype, free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex); } - - if (!locked) { - spin_unlock(&pcp->lock); - pcp_trylock_finish(UP_flags); - } - - return true; } /* @@ -3422,10 +3464,12 @@ static bool free_unref_page_commit(struct page *page, int migratetype, */ void free_unref_page(struct page *page, unsigned int order) { - unsigned long flags; + struct per_cpu_pages *pcp; + struct zone *zone; unsigned long pfn = page_to_pfn(page); int migratetype; - bool freed_pcp = false; + unsigned long flags; + unsigned long __maybe_unused UP_flags; if (!free_unref_page_prepare(page, pfn, order)) return; @@ -3446,12 +3490,16 @@ void free_unref_page(struct page *page, unsigned int order) migratetype = MIGRATE_MOVABLE; } - local_lock_irqsave(&pagesets.lock, flags); - freed_pcp = free_unref_page_commit(page, migratetype, order, false); - local_unlock_irqrestore(&pagesets.lock, flags); - - if (unlikely(!freed_pcp)) + zone = page_zone(page); + pcp_trylock_prepare(UP_flags); + pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock, zone->per_cpu_pageset, flags); + if (pcp) { + free_unref_page_commit(pcp, zone, page, migratetype, order); + pcp_spin_unlock_irqrestore(pcp, flags); + } else { free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE); + } + pcp_trylock_finish(UP_flags); } /* @@ -3500,20 +3548,20 @@ void free_unref_page_list(struct list_head *list) if (list_empty(list)) return; - local_lock_irqsave(&pagesets.lock, flags); - page = lru_to_page(list); locked_zone = page_zone(page); - pcp = this_cpu_ptr(locked_zone->per_cpu_pageset); - spin_lock(&pcp->lock); + pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags); list_for_each_entry_safe(page, next, list, lru) { struct zone *zone = page_zone(page); /* Different zone, different pcp lock. */ if (zone != locked_zone) { + /* Leave IRQs enabled as a new lock is acquired. */ spin_unlock(&pcp->lock); locked_zone = zone; + + /* Preemption disabled by pcp_spin_lock_irqsave. */ pcp = this_cpu_ptr(zone->per_cpu_pageset); spin_lock(&pcp->lock); } @@ -3528,33 +3576,19 @@ void free_unref_page_list(struct list_head *list) trace_mm_page_free_batched(page); - /* - * If there is a parallel drain in progress, free to the buddy - * allocator directly. This is expensive as the zone lock will - * be acquired multiple times but if a drain is in progress - * then an expensive operation is already taking place. - * - * TODO: Always false at the moment due to local_lock_irqsave - * and is preparation for converting to local_lock. - */ - if (unlikely(!free_unref_page_commit(page, migratetype, 0, true))) - free_one_page(page_zone(page), page, page_to_pfn(page), 0, migratetype, FPI_NONE); + free_unref_page_commit(pcp, zone, page, migratetype, 0); /* * Guard against excessive IRQ disabled times when we get * a large list of pages to free. 
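Condensed from the free_unref_page() hunk above, the single-page free fast path now tries the pcp lock and falls back to the buddy allocator when it cannot get it (a sketch; the prepare and migratetype handling earlier in the function are unchanged and omitted):

	zone = page_zone(page);
	pcp_trylock_prepare(UP_flags);	/* protects the UP case, where spin_trylock always succeeds */
	pcp = pcpu_spin_trylock_irqsave(struct per_cpu_pages, lock,
					zone->per_cpu_pageset, flags);
	if (pcp) {
		free_unref_page_commit(pcp, zone, page, migratetype, order);
		pcp_spin_unlock_irqrestore(pcp, flags);
	} else {
		/* pcp lock held elsewhere, e.g. by a remote drain */
		free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
	}
	pcp_trylock_finish(UP_flags);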
*/ if (++batch_count == SWAP_CLUSTER_MAX) { - spin_unlock(&pcp->lock); - local_unlock_irqrestore(&pagesets.lock, flags); + pcp_spin_unlock_irqrestore(pcp, flags); batch_count = 0; - local_lock_irqsave(&pagesets.lock, flags); - pcp = this_cpu_ptr(locked_zone->per_cpu_pageset); - spin_lock(&pcp->lock); + pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags); } } - spin_unlock(&pcp->lock); - local_unlock_irqrestore(&pagesets.lock, flags); + pcp_spin_unlock_irqrestore(pcp, flags); } /* @@ -3722,28 +3756,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, int migratetype, unsigned int alloc_flags, struct per_cpu_pages *pcp, - struct list_head *list, - bool locked) + struct list_head *list) { struct page *page; - unsigned long __maybe_unused UP_flags; - - /* - * spin_trylock is not necessary right now due to due to - * local_lock_irqsave and is a preparation step for - * a conversion to local_lock using the trylock to prevent - * IRQ re-entrancy. If pcp->lock cannot be acquired, the caller - * uses rmqueue_buddy. - * - * TODO: Convert local_lock_irqsave to local_lock. - */ - if (unlikely(!locked)) { - pcp_trylock_prepare(UP_flags); - if (!spin_trylock(&pcp->lock)) { - pcp_trylock_finish(UP_flags); - return NULL; - } - } do { if (list_empty(list)) { @@ -3776,10 +3791,6 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } while (check_new_pcp(page, order)); out: - if (!locked) { - spin_unlock(&pcp->lock); - pcp_trylock_finish(UP_flags); - } return page; } @@ -3794,19 +3805,29 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct list_head *list; struct page *page; unsigned long flags; + unsigned long __maybe_unused UP_flags; - local_lock_irqsave(&pagesets.lock, flags); + /* + * spin_trylock_irqsave is not necessary right now as it'll only be + * true when contending with a remote drain. It's in place as a + * preparation step before converting pcp locking to spin_trylock + * to protect against IRQ reentry. + */ + pcp_trylock_prepare(UP_flags); + pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags); + if (!pcp) + return NULL; /* * On allocation, reduce the number of pages that are batch freed. * See nr_pcp_free() where free_factor is increased for subsequent * frees. 
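The allocation side mirrors this: rmqueue_pcplist() now takes the pcp lock opportunistically and simply reports failure when it is contended (currently only by a remote drain, as the new comment notes), leaving the caller to fall back to the buddy lists. Condensed from the hunk (sketch only):

	pcp_trylock_prepare(UP_flags);
	pcp = pcp_spin_trylock_irqsave(zone->per_cpu_pageset, flags);
	if (!pcp)
		return NULL;		/* caller falls back to rmqueue_buddy() */

	pcp->free_factor >>= 1;
	list = &pcp->lists[order_to_pindex(migratetype, order)];
	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
	pcp_spin_unlock_irqrestore(pcp, flags);
	pcp_trylock_finish(UP_flags);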
*/ - pcp = this_cpu_ptr(zone->per_cpu_pageset); pcp->free_factor >>= 1; list = &pcp->lists[order_to_pindex(migratetype, order)]; - page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list, false); - local_unlock_irqrestore(&pagesets.lock, flags); + page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list); + pcp_spin_unlock_irqrestore(pcp, flags); + pcp_trylock_finish(UP_flags); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); zone_statistics(preferred_zone, zone, 1); @@ -5410,10 +5431,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, goto failed; /* Attempt the batch allocation */ - local_lock_irqsave(&pagesets.lock, flags); - pcp = this_cpu_ptr(zone->per_cpu_pageset); + pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags); pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)]; - spin_lock(&pcp->lock); while (nr_populated < nr_pages) { @@ -5424,13 +5443,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, } page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags, - pcp, pcp_list, true); + pcp, pcp_list); if (unlikely(!page)) { /* Try and allocate at least one page */ - if (!nr_account) { - spin_unlock(&pcp->lock); + if (!nr_account) goto failed_irq; - } break; } nr_account++; @@ -5443,8 +5460,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, nr_populated++; } - spin_unlock(&pcp->lock); - local_unlock_irqrestore(&pagesets.lock, flags); + pcp_spin_unlock_irqrestore(pcp, flags); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(ac.preferred_zoneref->zone, zone, nr_account); @@ -5453,7 +5469,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, return nr_populated; failed_irq: - local_unlock_irqrestore(&pagesets.lock, flags); + pcp_spin_unlock_irqrestore(pcp, flags); failed: page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
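Finally, __alloc_pages_bulk() keeps a single lock/unlock pair around the whole batch; condensed from the hunks above (sketch; the zone selection and the pre-lock "failed" path are unchanged):

	pcp = pcp_spin_lock_irqsave(zone->per_cpu_pageset, flags);
	pcp_list = &pcp->lists[order_to_pindex(ac.migratetype, 0)];

	while (nr_populated < nr_pages) {
		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
					 pcp, pcp_list);
		if (unlikely(!page)) {
			/* Try to allocate at least one page */
			if (!nr_account)
				goto failed_irq;	/* failed_irq now drops the pcp lock */
			break;
		}
		nr_account++;
		/* ... prep the page, store it in page_array, nr_populated++ ... */
	}

	pcp_spin_unlock_irqrestore(pcp, flags);

As at the other call sites, the vmstat updates happen after the lock has been released.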