From patchwork Mon Mar 27 21:15:48 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13190091
Date: Mon, 27 Mar 2023 14:15:48 -0700
In-Reply-To: <20230327211548.462509-1-jiaqiyan@google.com>
References: <20230327211548.462509-1-jiaqiyan@google.com>
Message-ID: <20230327211548.462509-4-jiaqiyan@google.com>
Subject: [PATCH v11 3/3] mm/khugepaged: recover from poisoned file-backed memory
From: Jiaqi Yan
To: kirill.shutemov@linux.intel.com, kirill@shutemov.name,
 shy828301@gmail.com, tongtiangen@huawei.com, tony.luck@intel.com
Cc: naoya.horiguchi@nec.com, linmiaohe@huawei.com, jiaqiyan@google.com,
 linux-mm@kvack.org, akpm@linux-foundation.org, osalvador@suse.de,
 wangkefeng.wang@huawei.com, stevensd@chromium.org, hughd@google.com

Make collapse_file roll back when copying pages fails. More concretely:

- extract the copying operations into a separate loop
- postpone the updates of nr_none until both scanning and copying have
  succeeded
- postpone joining the small xarray entries until both scanning and
  copying have succeeded
- postpone the updates to the NR_XXX_THPS counters until both scanning
  and copying have succeeded
- for non-SHMEM files, roll back filemap_nr_thps_inc if scanning
  succeeded but copying failed

Tested manually:
0. Enable khugepaged on the system under test. Mount tmpfs at
   /mnt/ramdisk.
1. Start a two-thread application. Each thread allocates a chunk of
   non-huge memory buffer from /mnt/ramdisk.
2. Pick 4 random buffer addresses (2 in each thread) and inject
   uncorrectable memory errors at the corresponding physical addresses.
3. Signal both threads to make their memory buffers collapsible, i.e.
   by calling madvise(MADV_HUGEPAGE).
4. Wait and then check the kernel log: khugepaged is able to recover
   from the poisoned pages by skipping them.
5. Signal both threads to inspect their buffer contents and make sure
   there is no data corruption.

A simplified sketch of what each test thread does is included below.
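For illustration only, the sketch below shows roughly what one test
thread does, compressed into a single-threaded program. It is not the
program used in the test above: it substitutes madvise(MADV_HWPOISON)
(which requires CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE) for the
hardware-level error injection at physical addresses, and the file
name under /mnt/ramdisk, the buffer size, the poisoned page index, and
the 60-second wait are placeholder assumptions.

/*
 * Illustrative sketch only -- not the actual test program described
 * above.  Uses madvise(MADV_HWPOISON) as a stand-in for hardware
 * error injection; path, sizes and offsets are arbitrary.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE	(16UL << 20)	/* 16MB tmpfs-backed buffer */
#define PAGE_SZ		4096UL
#define POISON_PG	5UL		/* page to poison inside the buffer */

int main(void)
{
	int fd = open("/mnt/ramdisk/buf", O_CREAT | O_RDWR, 0600);
	char *buf;
	size_t i;

	if (fd < 0 || ftruncate(fd, BUF_SIZE))
		return 1;
	buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		return 1;

	/* Populate the buffer with a known pattern (step 1). */
	memset(buf, 0x5a, BUF_SIZE);

	/* Poison one page inside a collapsible range (step 2). */
	if (madvise(buf + POISON_PG * PAGE_SZ, PAGE_SZ, MADV_HWPOISON))
		perror("MADV_HWPOISON");

	/* Make the buffer collapsible (step 3). */
	if (madvise(buf, BUF_SIZE, MADV_HUGEPAGE))
		perror("MADV_HUGEPAGE");

	/* Give khugepaged time to scan and attempt the collapse (step 4). */
	sleep(60);

	/* Verify the non-poisoned pages are intact (step 5). */
	for (i = 0; i < BUF_SIZE; i++) {
		if (i / PAGE_SZ == POISON_PG)	/* touching it would SIGBUS */
			continue;
		if (buf[i] != 0x5a) {
			fprintf(stderr, "corruption at offset %zu\n", i);
			return 1;
		}
	}
	return 0;
}

With khugepaged enabled and this patch applied, the expectation is that
the collapse attempt skips the poisoned page (step 4) and the final
read-back of the non-poisoned pages finds the original pattern intact
(step 5).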
Signed-off-by: Jiaqi Yan
Reviewed-by: Yang Shi
---
 mm/khugepaged.c | 86 +++++++++++++++++++++++++++++++------------------
 1 file changed, 54 insertions(+), 32 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index bef68286345c8..38c1655ce0a9e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1874,6 +1874,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *hpage;
+	struct page *page;
+	struct page *tmp;
+	struct folio *folio;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1918,8 +1921,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	xas_set(&xas, start);
 	for (index = start; index < end; index++) {
-		struct page *page = xas_next(&xas);
-		struct folio *folio;
+		page = xas_next(&xas);
 
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
@@ -2099,12 +2101,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		put_page(page);
 		goto xa_unlocked;
 	}
-	nr = thp_nr_pages(hpage);
 
-	if (is_shmem)
-		__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
-	else {
-		__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+	if (!is_shmem) {
 		filemap_nr_thps_inc(mapping);
 		/*
 		 * Paired with smp_mb() in do_dentry_open() to ensure
@@ -2115,21 +2113,9 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		smp_mb();
 		if (inode_is_open_for_write(mapping->host)) {
 			result = SCAN_FAIL;
-			__mod_lruvec_page_state(hpage, NR_FILE_THPS, -nr);
 			filemap_nr_thps_dec(mapping);
-			goto xa_locked;
 		}
 	}
-
-	if (nr_none) {
-		__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
-		/* nr_none is always 0 for non-shmem. */
-		__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
-	}
-
-	/* Join all the small entries into a single multi-index entry */
-	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
-	xas_store(&xas, hpage);
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked:
@@ -2142,21 +2128,36 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	try_to_unmap_flush();
 
 	if (result == SCAN_SUCCEED) {
-		struct page *page, *tmp;
-		struct folio *folio;
-
 		/*
 		 * Replacing old pages with new one has succeeded, now we
-		 * need to copy the content and free the old pages.
+		 * attempt to copy the contents.
 		 */
 		index = start;
-		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
+		list_for_each_entry(page, &pagelist, lru) {
 			while (index < page->index) {
 				clear_highpage(hpage + (index % HPAGE_PMD_NR));
 				index++;
 			}
-			copy_highpage(hpage + (page->index % HPAGE_PMD_NR),
-				      page);
+			if (copy_mc_highpage(hpage + (page->index % HPAGE_PMD_NR),
+					     page) > 0) {
+				result = SCAN_COPY_MC;
+				break;
+			}
+			index++;
+		}
+		while (result == SCAN_SUCCEED && index < end) {
+			clear_highpage(hpage + (index % HPAGE_PMD_NR));
+			index++;
+		}
+	}
+
+	nr = thp_nr_pages(hpage);
+	if (result == SCAN_SUCCEED) {
+		/*
+		 * Copying old pages to huge one has succeeded, now we
+		 * need to free the old pages.
+		 */
+		list_for_each_entry_safe(page, tmp, &pagelist, lru) {
 			list_del(&page->lru);
 			page->mapping = NULL;
 			page_ref_unfreeze(page, 1);
@@ -2164,12 +2165,23 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			ClearPageUnevictable(page);
 			unlock_page(page);
 			put_page(page);
-			index++;
 		}
-		while (index < end) {
-			clear_highpage(hpage + (index % HPAGE_PMD_NR));
-			index++;
+
+		xas_lock_irq(&xas);
+		if (is_shmem)
+			__mod_lruvec_page_state(hpage, NR_SHMEM_THPS, nr);
+		else
+			__mod_lruvec_page_state(hpage, NR_FILE_THPS, nr);
+
+		if (nr_none) {
+			__mod_lruvec_page_state(hpage, NR_FILE_PAGES, nr_none);
+			/* nr_none is always 0 for non-shmem. */
+			__mod_lruvec_page_state(hpage, NR_SHMEM, nr_none);
 		}
+		/* Join all the small entries into a single multi-index entry. */
+		xas_set_order(&xas, start, HPAGE_PMD_ORDER);
+		xas_store(&xas, hpage);
+		xas_unlock_irq(&xas);
 
 		folio = page_folio(hpage);
 		folio_mark_uptodate(folio);
@@ -2187,8 +2199,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 		unlock_page(hpage);
 		hpage = NULL;
 	} else {
-		struct page *page;
-
 		/* Something went wrong: roll back page cache changes */
 		xas_lock_irq(&xas);
 		if (nr_none) {
@@ -2222,6 +2232,18 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			xas_lock_irq(&xas);
 		}
 		VM_BUG_ON(nr_none);
+		/*
+		 * Undo the updates of filemap_nr_thps_inc for non-SHMEM
+		 * file only. This undo is not needed unless failure is
+		 * due to SCAN_COPY_MC.
+		 *
+		 * Paired with smp_mb() in do_dentry_open() to ensure the
+		 * update to nr_thps is visible.
+		 */
+		smp_mb();
+		if (!is_shmem && result == SCAN_COPY_MC)
+			filemap_nr_thps_dec(mapping);
+
 		xas_unlock_irq(&xas);
 
 		hpage->mapping = NULL;