From patchwork Sun Jan 19 18:06:06 2025
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13944557
Date: Sun, 19 Jan 2025 18:06:06 +0000
Message-ID: <20250119180608.2132296-1-jiaqiyan@google.com>
Subject: [RFC PATCH v1 0/2] How HugeTLB handle HWPoison page at truncation
From: Jiaqi Yan
To: nao.horiguchi@gmail.com, linmiaohe@huawei.com,
 sidhartha.kumar@oracle.com, muchun.song@linux.dev
Cc: jane.chu@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
 rientjes@google.com, jthoughton@google.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Jiaqi Yan

While I was working on userspace MFR via memfd [1], I spent some time
understanding what the current kernel does when a HugeTLB-backed memfd is
truncated. My expectation is that if a HWPoison HugeTLB folio is mapped
via the memfd to userspace, it will be unmapped right away but still be
kept in the page cache [2]; however, when the memfd is truncated to zero
or after the memfd is closed, the kernel should dissolve the HWPoison
folio in the page cache and free only the clean raw pages to the buddy
allocator, excluding the poisoned raw page.

So I wrote a hugetlb-mfr-base.c selftest and expected:

0. Say nr_hugepages is initially 64 as the system configuration.
1. After MADV_HWPOISON, nr_hugepages should still be 64, as we keep even
   a HWPoison huge folio in the page cache. free_hugepages should be
   nr_hugepages minus however many huge pages are in use.
2. After truncating the memfd to zero, nr_hugepages should be reduced to
   63, as the kernel dissolved and freed the HWPoison huge folio.
   free_hugepages should also be 63.

However, when testing at the head of mm-stable, commit 2877a83e4a0a
("mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()"), I found
that although free_hugepages is reduced to 63, nr_hugepages is not
reduced and stays at 64.

Is my expectation outdated? Or is this some kind of bug?

I assumed this is a bug and dug a little more. It seems there are two
issues, or two things I don't really understand.

1. During try_memory_failure_hugetlb, we should have increased the
   target in-use folio's refcount via get_hwpoison_hugetlb_folio.
   However, by the end of try_memory_failure_hugetlb, this refcount has
   not been put. I can make sense of this given that we keep the in-use
   huge folio in the page cache. However, I failed to find where this
   refcount is put along the remove_inode_hugepages path. Is the
   refcount decrement missing? At least my testcase suggests yes.

   In folios_put_refs, I added a dump_page:

        if (!folio_ref_sub_and_test(folio, nr_refs)) {
                dump_page(&folio->page, "track hwpoison folio's ref");
                continue;
        }

   [ 1069.320976] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2780000
   [ 1069.320978] head: order:18 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
   [ 1069.320980] flags: 0x400000000100044(referenced|head|hwpoison|node=0|zone=1)
   [ 1069.320982] page_type: f4(hugetlb)
   [ 1069.320984] raw: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
   [ 1069.320985] raw: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
   [ 1069.320987] head: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
   [ 1069.320988] head: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
   [ 1069.320990] head: 0400000000000012 ffffdd53de000001 ffffffffffffffff 0000000000000000
   [ 1069.320991] head: 0000000000040000 0000000000000000 00000000ffffffff 0000000000000000
   [ 1069.320992] page dumped because: track hwpoison folio's ref

2. Even if the folio's refcount does drop to zero and we get into
   free_huge_folio, it is not clear to me which part of free_huge_folio
   handles the case where the folio is HWPoison. In my test, what I
   observed is that eventually the folio is enqueue_hugetlb_folio()-ed.

I tried to fix both issues with a very immature patch, and with it
hugetlb-mfr-base.c passes. The patch shows the two things I think are
currently missing.

I want to use this RFC to better understand what behavior I should
expect and, if this is indeed an issue, to discuss fixes. Thanks.
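
For reference, the counter expectation in steps 0-2 can be sketched as a
small C helper. This is only an illustration and not the code in
hugetlb-mfr-base.c: the struct and function names are hypothetical, and
the caller is assumed to pass paths such as the nr_hugepages and
free_hugepages files under /sys/kernel/mm/hugepages/hugepages-<size>kB/.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical snapshot of the two HugeTLB pool counters discussed above. */
struct hugetlb_counters {
	long nr_hugepages;
	long free_hugepages;
};

/* Read one decimal counter from a sysfs-style file; returns -1 on failure. */
long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Step 1: after MADV_HWPOISON the poisoned folio stays in the page cache,
 * so the pool size is expected to be unchanged.
 */
int check_after_poison(const struct hugetlb_counters *before,
		       const struct hugetlb_counters *after)
{
	assert(before && after);
	return after->nr_hugepages == before->nr_hugepages;
}

/*
 * Step 2: after truncating the memfd to zero, the pool is expected to
 * shrink by the number of poisoned folios, and every remaining huge page
 * is expected to be free.
 */
int check_after_truncate(const struct hugetlb_counters *before,
			 const struct hugetlb_counters *after,
			 long nr_poisoned)
{
	assert(before && after);
	return after->nr_hugepages == before->nr_hugepages - nr_poisoned &&
	       after->free_hugepages == after->nr_hugepages;
}
```

With the numbers above, a snapshot of nr_hugepages=63 and
free_hugepages=63 after truncation satisfies check_after_truncate(),
while the observed nr_hugepages=64 and free_hugepages=63 does not.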

[1] https://lore.kernel.org/linux-mm/20250118231549.1652825-1-jiaqiyan@google.com/T
[2] https://lore.kernel.org/all/20221018200125.848471-1-jthoughton@google.com/T/#u

Jiaqi Yan (2):
  selftest/mm: test HWPoison hugetlb truncation behavior
  mm/hugetlb: immature fix to handle HWPoisoned folio

 mm/hugetlb.c                                  |   6 +
 mm/swap.c                                     |   9 +-
 tools/testing/selftests/mm/Makefile           |   1 +
 tools/testing/selftests/mm/hugetlb-mfr-base.c | 240 ++++++++++++++++++
 4 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-mfr-base.c