From patchwork Tue Jul 23 06:44:28 2024
X-Patchwork-Submitter: "Zhijian Li (Fujitsu)"
X-Patchwork-Id: 13739440
From: Li Zhijian
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Yasunori Gotou, Li Zhijian, David Hildenbrand, Vlastimil Babka, Yao Xingtao
Subject: [PATCH v3] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
Date: Tue, 23 Jul 2024 14:44:28 +0800
Message-Id: <20240723064428.1179519-1-lizhijian@fujitsu.com>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
It's expected that no pages should be left in the pcp_list after calling zone_pcp_disable() in offline_pages(). Previously, it was observed that offline_pages() gets stuck [1] due to some pages remaining in the pcp_list.

Cause:
There is a race condition between drain_pages_zone() and __rmqueue_pcplist() involving the pcp->count variable.
See below scenario:

	CPU0					CPU1
	----------------			---------------
						spin_lock(&pcp->lock);
						__rmqueue_pcplist() {
	zone_pcp_disable() {
						  /* list is empty */
						  if (list_empty(list)) {
						    /* add pages to pcp_list */
						    alloced = rmqueue_bulk()
	  mutex_lock(&pcp_batch_high_lock)
	  ...
	  __drain_all_pages() {
	    drain_pages_zone() {
	      /* read pcp->count, it's 0 here */
	      count = READ_ONCE(pcp->count)
	      /* 0 means nothing to drain */
						    /* update pcp->count */
						    pcp->count += alloced << order;
	      ...
						...
						spin_unlock(&pcp->lock);

In this case, even after zone_pcp_disable() has been called, there are still some pages in the pcp_list. Since these pages are neither movable nor isolated, offline_pages() gets stuck as a result.

Solution:
Expand the scope of the pcp->lock to also protect pcp->count in drain_pages_zone(), to ensure no pages are left in the pcp list after zone_pcp_disable().

[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/

Cc: David Hildenbrand
Cc: Vlastimil Babka (SUSE)
Reported-by: Yao Xingtao
Signed-off-by: Li Zhijian
Reviewed-by: Vlastimil Babka
---
V3:
  Read pcp->count inside the loop with the lock held, to avoid wasteful extra spin_lock() calls
V2:
  - Narrow the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
  - In the above scenario, it's sufficient to read pcp->count once with the lock held; this fully fixed my issue [1] across thousands of runs (it previously occurred in more than 5% of runs).
RFC: https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/

Signed-off-by: Li Zhijian
---
 mm/page_alloc.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..a32289ec4768 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2323,16 +2323,20 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	int count = READ_ONCE(pcp->count);
-
-	while (count) {
-		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
-		count -= to_drain;
+	int count;
 
+	do {
 		spin_lock(&pcp->lock);
-		free_pcppages_bulk(zone, to_drain, pcp, 0);
+		count = pcp->count;
+		if (count) {
+			int to_drain = min(count,
+					   pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+
+			free_pcppages_bulk(zone, to_drain, pcp, 0);
+			count -= to_drain;
+		}
 		spin_unlock(&pcp->lock);
-	}
+	} while (count);
 }
 
 /*