From patchwork Mon Aug 12 22:48:23 2024
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13761096
Date: Mon, 12 Aug 2024 16:48:23 -0600
Message-ID: <20240812224823.3914837-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2] mm/hugetlb_vmemmap: batch HVO work when demoting
From: Yu Zhao
To: Andrew Morton, Muchun Song
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao

Batch the HVO work, including de-HVO of the source and HVO of the
destination hugeTLB folios, to speed up demotion.
After commit bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with
speculative PFN walkers"), each request of HVO or de-HVO, batched or
not, invokes synchronize_rcu() once. For example, when not batched,
demoting one 1GB hugeTLB folio to 512 2MB hugeTLB folios invokes
synchronize_rcu() 513 times (1 de-HVO plus 512 HVO requests), whereas
when batched, only twice (1 de-HVO plus 1 HVO request). The
performance difference between the two cases is significant, e.g.,

  echo 2048kB >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote_size
  time echo 100 >/sys/kernel/mm/hugepages/hugepages-1048576kB/demote

Before this patch:
  real     8m58.158s
  user     0m0.009s
  sys      0m5.900s

After this patch:
  real     0m0.900s
  user     0m0.000s
  sys      0m0.851s

Note that this patch changes the behavior of the `demote` interface
when de-HVO fails. Before, the interface aborts immediately upon
failure; now, it tries to finish an entire batch, meaning it can make
extra progress if the rest of the batch contains folios that do not
need de-HVO.
Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao
Reviewed-by: Muchun Song
---
 mm/hugetlb.c | 156 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 92 insertions(+), 64 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1fdd9eab240c..d2b9555e6c45 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3921,100 +3921,124 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	return 0;
 }
 
-static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
+static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst,
+				       struct list_head *src_list)
 {
-	int i, nid = folio_nid(folio);
-	struct hstate *target_hstate;
-	struct page *subpage;
-	struct folio *inner_folio;
-	int rc = 0;
+	long rc;
+	struct folio *folio, *next;
+	LIST_HEAD(dst_list);
+	LIST_HEAD(ret_list);
 
-	target_hstate = size_to_hstate(PAGE_SIZE << h->demote_order);
-
-	remove_hugetlb_folio(h, folio, false);
-	spin_unlock_irq(&hugetlb_lock);
-
-	/*
-	 * If vmemmap already existed for folio, the remove routine above would
-	 * have cleared the hugetlb folio flag.  Hence the folio is technically
-	 * no longer a hugetlb folio.  hugetlb_vmemmap_restore_folio can only be
-	 * passed hugetlb folios and will BUG otherwise.
-	 */
-	if (folio_test_hugetlb(folio)) {
-		rc = hugetlb_vmemmap_restore_folio(h, folio);
-		if (rc) {
-			/* Allocation of vmemmmap failed, we can not demote folio */
-			spin_lock_irq(&hugetlb_lock);
-			add_hugetlb_folio(h, folio, false);
-			return rc;
-		}
-	}
-
-	/*
-	 * Use destroy_compound_hugetlb_folio_for_demote for all huge page
-	 * sizes as it will not ref count folios.
-	 */
-	destroy_compound_hugetlb_folio_for_demote(folio, huge_page_order(h));
+	rc = hugetlb_vmemmap_restore_folios(src, src_list, &ret_list);
+	list_splice_init(&ret_list, src_list);
 
 	/*
 	 * Taking target hstate mutex synchronizes with set_max_huge_pages.
 	 * Without the mutex, pages added to target hstate could be marked
 	 * as surplus.
 	 *
-	 * Note that we already hold h->resize_lock. To prevent deadlock,
+	 * Note that we already hold src->resize_lock. To prevent deadlock,
 	 * use the convention of always taking larger size hstate mutex first.
 	 */
-	mutex_lock(&target_hstate->resize_lock);
-	for (i = 0; i < pages_per_huge_page(h);
-				i += pages_per_huge_page(target_hstate)) {
-		subpage = folio_page(folio, i);
-		inner_folio = page_folio(subpage);
-		if (hstate_is_gigantic(target_hstate))
-			prep_compound_gigantic_folio_for_demote(inner_folio,
-							target_hstate->order);
-		else
-			prep_compound_page(subpage, target_hstate->order);
-		folio_change_private(inner_folio, NULL);
-		prep_new_hugetlb_folio(target_hstate, inner_folio, nid);
-		free_huge_folio(inner_folio);
+	mutex_lock(&dst->resize_lock);
+
+	list_for_each_entry_safe(folio, next, src_list, lru) {
+		int i;
+
+		if (folio_test_hugetlb_vmemmap_optimized(folio))
+			continue;
+
+		list_del(&folio->lru);
+		/*
+		 * Use destroy_compound_hugetlb_folio_for_demote for all huge page
+		 * sizes as it will not ref count folios.
+		 */
+		destroy_compound_hugetlb_folio_for_demote(folio, huge_page_order(src));
+
+		for (i = 0; i < pages_per_huge_page(src); i += pages_per_huge_page(dst)) {
+			struct page *page = folio_page(folio, i);
+
+			if (hstate_is_gigantic(dst))
+				prep_compound_gigantic_folio_for_demote(page_folio(page),
+									dst->order);
+			else
+				prep_compound_page(page, dst->order);
+			set_page_private(page, 0);
+
+			init_new_hugetlb_folio(dst, page_folio(page));
+			list_add(&page->lru, &dst_list);
+		}
 	}
-	mutex_unlock(&target_hstate->resize_lock);
 
-	spin_lock_irq(&hugetlb_lock);
+	prep_and_add_allocated_folios(dst, &dst_list);
 
-	/*
-	 * Not absolutely necessary, but for consistency update max_huge_pages
-	 * based on pool changes for the demoted page.
-	 */
-	h->max_huge_pages--;
-	target_hstate->max_huge_pages +=
-		pages_per_huge_page(h) / pages_per_huge_page(target_hstate);
+	mutex_unlock(&dst->resize_lock);
 
 	return rc;
 }
 
-static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
+static long demote_pool_huge_page(struct hstate *src, nodemask_t *nodes_allowed,
+				  unsigned long nr_to_demote)
 	__must_hold(&hugetlb_lock)
 {
 	int nr_nodes, node;
-	struct folio *folio;
+	struct hstate *dst;
+	long rc = 0;
+	long nr_demoted = 0;
 
 	lockdep_assert_held(&hugetlb_lock);
 
 	/* We should never get here if no demote order */
-	if (!h->demote_order) {
+	if (!src->demote_order) {
 		pr_warn("HugeTLB: NULL demote order passed to demote_pool_huge_page.\n");
 		return -EINVAL;		/* internal error */
 	}
+	dst = size_to_hstate(PAGE_SIZE << src->demote_order);
 
-	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
-		list_for_each_entry(folio, &h->hugepage_freelists[node], lru) {
+	for_each_node_mask_to_free(src, nr_nodes, node, nodes_allowed) {
+		LIST_HEAD(list);
+		struct folio *folio, *next;
+
+		list_for_each_entry_safe(folio, next, &src->hugepage_freelists[node], lru) {
 			if (folio_test_hwpoison(folio))
 				continue;
-			return demote_free_hugetlb_folio(h, folio);
+
+			remove_hugetlb_folio(src, folio, false);
+			list_add(&folio->lru, &list);
+
+			if (++nr_demoted == nr_to_demote)
+				break;
 		}
+
+		spin_unlock_irq(&hugetlb_lock);
+
+		rc = demote_free_hugetlb_folios(src, dst, &list);
+
+		spin_lock_irq(&hugetlb_lock);
+
+		list_for_each_entry_safe(folio, next, &list, lru) {
+			list_del(&folio->lru);
+			add_hugetlb_folio(src, folio, false);
+
+			nr_demoted--;
+		}
+
+		if (rc < 0 || nr_demoted == nr_to_demote)
+			break;
 	}
 
+	/*
+	 * Not absolutely necessary, but for consistency update max_huge_pages
+	 * based on pool changes for the demoted page.
+	 */
+	src->max_huge_pages -= nr_demoted;
+	dst->max_huge_pages += nr_demoted << (huge_page_order(src) - huge_page_order(dst));
+
+	if (rc < 0)
+		return rc;
+
+	if (nr_demoted)
+		return nr_demoted;
 	/*
 	 * Only way to get here is if all pages on free lists are poisoned.
 	 * Return -EBUSY so that caller will not retry.
@@ -4249,6 +4273,8 @@ static ssize_t demote_store(struct kobject *kobj,
 	spin_lock_irq(&hugetlb_lock);
 
 	while (nr_demote) {
+		long rc;
+
 		/*
 		 * Check for available pages to demote each time thorough the
 		 * loop as demote_pool_huge_page will drop hugetlb_lock.
@@ -4261,11 +4287,13 @@ static ssize_t demote_store(struct kobject *kobj,
 		if (!nr_available)
 			break;
 
-		err = demote_pool_huge_page(h, n_mask);
-		if (err)
+		rc = demote_pool_huge_page(h, n_mask, nr_demote);
+		if (rc < 0) {
+			err = rc;
 			break;
+		}
 
-		nr_demote--;
+		nr_demote -= rc;
 	}
 
 	spin_unlock_irq(&hugetlb_lock);