From patchwork Thu Jul 4 06:52:24 2024
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13723324
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com,
    baolin.wang@linux.alibaba.com,
    aneesh.kumar@linux.ibm.com, liuzixing@hygon.cn, yangge <yangge1116@126.com>
Subject: [PATCH V4] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Date: Thu, 4 Jul 2024 14:52:24 +0800
Message-Id: <1720075944-27201-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge <yangge1116@126.com>

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of total system memory), starting
a virtual machine with device passthrough will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin its memory.
Normally, if a page is present and located in a CMA area,
pin_user_pages_remote() will migrate it from the CMA area to a non-CMA
area because of the FOLL_LONGTERM flag. But the current code can make
this migration fail due to an unexpected page refcount, which
eventually causes the virtual machine to fail to start.

Adding a page to an LRU batch increases its refcount by one; removing
it from the batch decreases it by one. Page migration requires that a
page not be referenced by anything other than its page mapping. Before
migrating a page, we should try to drain it from any LRU batch it may
be in; however, folio_test_lru() is not sufficient to tell whether the
page is in an LRU batch, and if it still is, the migration fails.

To solve the problem above, we modify the logic of adding to an LRU
batch.
Before adding a page to an LRU batch, we now clear its LRU flag, so
that folio_test_lru() reliably tells us whether the page is in an LRU
batch. This is valuable because we likely do not want to blindly drain
the LRU batches simply because there is some unexpected reference on a
page, as described above.

This change makes the LRU flag of a page invisible for longer, which
may impact some callers: as long as a page is in an LRU batch, we can
neither isolate it nor check whether it is an LRU page. Further, a page
can now be in at most one LRU batch. This does not seem to matter much,
because when a new page is allocated from the buddy allocator and added
to an LRU batch, or is isolated, its LRU flag may likewise be invisible
for a long time. (A sketch of the resulting batching pattern follows
the change log below.)

Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Cc: <stable@vger.kernel.org>
Signed-off-by: yangge <yangge1116@126.com>
---
 mm/swap.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

V4: Adjust commit message according to David's comments
V3: Add fixes tag
V2: Adjust code and commit message according to David's comments
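For illustration only (this is not part of the patch): every batching
site in the diff below now follows the same pattern, sketched here as a
hypothetical helper. The helper name is invented; cpu_fbatches,
folio_batch_add_and_move() and the folio flag helpers are the mm/swap.c
internals the patch actually touches.

/*
 * Hypothetical helper illustrating the new pattern (not part of the
 * patch). The caller has already tested the relevant folio flags.
 */
static void lru_batch_add_example(struct folio_batch *fbatch,
				  struct folio *folio, move_fn_t move_fn)
{
	/* Hold a reference on behalf of the batch. */
	folio_get(folio);

	/*
	 * Clear the LRU flag before batching. If it was already clear,
	 * the folio is not on the LRU (it may be sitting in another
	 * batch), so drop the reference and back off. After this,
	 * folio_test_lru() == false reliably covers "in an LRU batch".
	 */
	if (!folio_test_clear_lru(folio)) {
		folio_put(folio);
		return;
	}

	local_lock(&cpu_fbatches.lock);
	folio_batch_add_and_move(fbatch, folio, move_fn);
	local_unlock(&cpu_fbatches.lock);
}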
diff --git a/mm/swap.c b/mm/swap.c
index dc205bd..9caf6b0 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -211,10 +211,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	for (i = 0; i < folio_batch_count(fbatch); i++) {
 		struct folio *folio = fbatch->folios[i];
 
-		/* block memcg migration while the folio moves between lru */
-		if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
-			continue;
-
 		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
 		move_fn(lruvec, folio);
 
@@ -255,11 +251,16 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 void folio_rotate_reclaimable(struct folio *folio)
 {
 	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
-	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
+	    !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 		unsigned long flags;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		fbatch = this_cpu_ptr(&lru_rotate.fbatch);
 		folio_batch_add_and_move(fbatch, folio, lru_move_tail_fn);
@@ -352,11 +353,15 @@ static void folio_activate_drain(int cpu)
 
 void folio_activate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_active(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (!folio_test_active(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.activate);
 		folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
@@ -700,6 +705,11 @@ void deactivate_file_folio(struct folio *folio)
 		return;
 
 	folio_get(folio);
+	if (!folio_test_clear_lru(folio)) {
+		folio_put(folio);
+		return;
+	}
+
 	local_lock(&cpu_fbatches.lock);
 	fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate_file);
 	folio_batch_add_and_move(fbatch, folio, lru_deactivate_file_fn);
@@ -716,11 +726,16 @@ void deactivate_file_folio(struct folio *folio)
  */
 void folio_deactivate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
-	    (folio_test_active(folio) || lru_gen_enabled())) {
+	if (!folio_test_unevictable(folio) && (folio_test_active(folio) ||
+	    lru_gen_enabled())) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate);
 		folio_batch_add_and_move(fbatch, folio, lru_deactivate_fn);
@@ -737,12 +752,16 @@ void folio_deactivate(struct folio *folio)
  */
 void folio_mark_lazyfree(struct folio *folio)
 {
-	if (folio_test_lru(folio) && folio_test_anon(folio) &&
-	    folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+	    !folio_test_swapcache(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_lazyfree);
 		folio_batch_add_and_move(fbatch, folio, lru_lazyfree_fn);
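As a usage illustration (a hypothetical caller, not from this patch):
with the LRU flag now cleared while a folio sits in a batch, a
migration or longterm-pinning path can use folio_test_lru() as a cheap
hint that draining the per-CPU batches is worth trying before treating
the extra reference as a failure. folio_isolate_lru() and
lru_add_drain_all() are existing kernel APIs; the wrapper name is
invented.

/* Hypothetical caller sketch (not part of the patch). */
static bool try_isolate_for_migration(struct folio *folio)
{
	/*
	 * A clear LRU flag now also means "possibly in an LRU batch";
	 * drain the per-CPU batches so the folio returns to the LRU
	 * and the batch's extra reference is dropped.
	 */
	if (!folio_test_lru(folio))
		lru_add_drain_all();

	/* Returns true on success; the folio is off the LRU afterwards. */
	return folio_isolate_lru(folio);
}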