From patchwork Fri Dec 15 08:12:04 2023
X-Patchwork-Submitter: Qiuxu Zhuo
X-Patchwork-Id: 13494104
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
To: naoya.horiguchi@nec.com
Cc: linmiaohe@huawei.com, akpm@linux-foundation.org, tony.luck@intel.com,
    ying.huang@intel.com, qiuxu.zhuo@intel.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 1/1] mm: memory-failure: Re-split hw-poisoned huge page on -EAGAIN
Date: Fri, 15 Dec 2023 16:12:04 +0800
Message-Id: <20231215081204.8802-1-qiuxu.zhuo@intel.com>
X-Mailer: git-send-email 2.17.1

While a hw-poisoned huge page is being split, its reference count can be
increased by the threads of the affected process, so the split fails with
-EAGAIN. This issue can be reproduced by injecting a memory error into a huge
page used by a multi-threaded process. The call path that returned -EAGAIN
during testing is shown below:

  memory_failure()
    try_to_split_thp_page()
      split_huge_page()
        split_huge_page_to_list()
        {
            ...
            Step A: can_split_folio()  - Checked that the thp can be split.
            Step B: unmap_folio()
            Step C: folio_ref_freeze() - Failed and returned -EAGAIN.
            ...
        }

The test logs showed that some huge pages were split successfully via the call
path above (Step C succeeded for them). Other huge pages, however, failed to
split because Step C failed: their reference counts were observed to increase
between Step A and Step C.
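For illustration only, here is a minimal sketch of the Step C pattern. It is
not the actual split_huge_page_to_list() code; the helper name and the
expected_refs parameter are invented for this sketch:

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/page_ref.h>

/* Illustrative only: how a raised refcount defeats the freeze step. */
static int freeze_step_sketch(struct folio *folio, int expected_refs)
{
	/*
	 * folio_ref_freeze() atomically sets the refcount to zero only if
	 * it still equals expected_refs.  A concurrent folio_get() from
	 * another thread of the affected process between Step A and Step C
	 * makes the comparison fail, so the split backs out with -EAGAIN.
	 */
	if (!folio_ref_freeze(folio, expected_refs))
		return -EAGAIN;

	return 0;
}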
Testing also showed that, after receiving -EAGAIN, simply re-splitting the
hw-poisoned huge page within memory_failure() always returns the same -EAGAIN.
This is because memory_failure() runs in the context of the affected process:
before that process exits memory_failure() and is terminated, its threads can
still increase the reference count of the hw-poisoned page.

To address this, use a kernel worker to re-split the hw-poisoned huge page. By
the time the worker begins re-splitting, the affected process has already been
terminated, so its threads can no longer increase the reference count.
Experimental results consistently show that this worker re-splits these
hw-poisoned huge pages successfully on its first attempt.

The kernel log (before):
[ 1116.862895] Memory failure: 0x4097fa7: recovery action for unsplit thp: Ignored

The kernel log (after):
[ 793.573536] Memory failure: 0x2100dda: recovery action for unsplit thp: Delayed
[ 793.574666] Memory failure: 0x2100dda: split unsplit thp successfully.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 mm/memory-failure.c | 73 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 660c21859118..0db4cf712a78 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -72,6 +72,60 @@ atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
 
+#define SPLIT_THP_MAX_RETRY_CNT		10
+#define SPLIT_THP_INIT_DELAYED_MS	1
+
+static bool split_thp_pending;
+
+struct split_thp_req {
+	struct delayed_work work;
+	struct page *thp;
+	int retries;
+};
+
+static void split_thp_work_fn(struct work_struct *work)
+{
+	struct split_thp_req *req = container_of(work, typeof(*req), work.work);
+	int ret;
+
+	/* Split the thp. */
+	get_page(req->thp);
+	lock_page(req->thp);
+	ret = split_huge_page(req->thp);
+	unlock_page(req->thp);
+	put_page(req->thp);
+
+	/* Retry with an exponential backoff. */
+	if (ret && ++req->retries < SPLIT_THP_MAX_RETRY_CNT) {
+		schedule_delayed_work(to_delayed_work(work),
+				      msecs_to_jiffies(SPLIT_THP_INIT_DELAYED_MS << req->retries));
+		return;
+	}
+
+	pr_err("%#lx: split unsplit thp %ssuccessfully.\n", page_to_pfn(req->thp), ret ? "un" : "");
+	kfree(req);
+	split_thp_pending = false;
+}
+
+static bool split_thp_delayed(struct page *thp)
+{
+	struct split_thp_req *req;
+
+	if (split_thp_pending)
+		return false;
+
+	req = kmalloc(sizeof(*req), GFP_ATOMIC);
+	if (!req)
+		return false;
+
+	req->thp = thp;
+	req->retries = 0;
+	INIT_DELAYED_WORK(&req->work, split_thp_work_fn);
+	split_thp_pending = true;
+	schedule_delayed_work(&req->work, msecs_to_jiffies(SPLIT_THP_INIT_DELAYED_MS));
+	return true;
+}
+
 static DEFINE_MUTEX(mf_mutex);
 
 void num_poisoned_pages_inc(unsigned long pfn)
@@ -2275,8 +2329,23 @@ int memory_failure(unsigned long pfn, int flags)
 		 * page is a valid handlable page.
 		 */
 		SetPageHasHWPoisoned(hpage);
-		if (try_to_split_thp_page(p) < 0) {
-			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
+		res = try_to_split_thp_page(p);
+		if (res < 0) {
+			/*
+			 * Re-attempting try_to_split_thp_page() here could consistently
+			 * yield -EAGAIN, as the threads of the process may increment the
+			 * reference count of the huge page before the process exits
+			 * memory_failure() and terminates.
+			 *
+			 * Employ the kernel worker to re-split the huge page. By the time
+			 * this worker initiates the re-splitting process, the affected
+			 * process has already been terminated, preventing its threads from
+			 * incrementing the reference count.
+			 */
+			if (res == -EAGAIN && split_thp_delayed(p))
+				res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_DELAYED);
+			else
+				res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
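
For context, a hypothetical userspace reproducer sketch in the spirit of the
test described above: a multi-threaded process plus software error injection
via madvise(MADV_HWPOISON). This is not the actual injection tool behind the
quoted results; the mapping size and the poisoned offset are arbitrary, and
the program needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE:

/* repro_sketch.c - hypothetical; build with: gcc -O2 -pthread repro_sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE	(64UL << 20)	/* 64 MiB of THP-eligible anon memory */
#define NR_THREADS	8

static char *region;

static void *toucher(void *arg)
{
	(void)arg;
	/* Keep the mapping busy so page references keep moving during the split. */
	for (;;)
		for (size_t off = 0; off < MAP_SIZE; off += 4096)
			(void)*(volatile char *)(region + off);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];
	long psize = sysconf(_SC_PAGESIZE);

	region = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(region, MAP_SIZE, MADV_HUGEPAGE);	/* ask for THPs */
	memset(region, 0xaa, MAP_SIZE);			/* fault them in */

	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, toucher, NULL);
	sleep(1);

	/* Poison one base page inside a huge page (needs CAP_SYS_ADMIN). */
	if (madvise(region + (2UL << 20) + psize, psize, MADV_HWPOISON))
		perror("madvise(MADV_HWPOISON)");

	pause();	/* the kernel kills the process once the error is handled */
	return 0;
}

Whether the initial split fails with -EAGAIN depends on timing, so several
runs may be needed to hit the race described in the commit message.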