[RFC,v1,0/2] How HugeTLB handle HWPoison page at truncation

Message ID 20250119180608.2132296-1-jiaqiyan@google.com (mailing list archive)

Jiaqi Yan Jan. 19, 2025, 6:06 p.m. UTC
While working on userspace MFR via memfd [1], I spent some time
understanding what the current kernel does when a HugeTLB-backed memfd is
truncated. My expectation is that if a HWPoison HugeTLB folio is
mapped via the memfd to userspace, it will be unmapped right away but
still be kept in the page cache [2]; however, when the memfd is truncated
to zero or after the memfd is closed, the kernel should dissolve the
HWPoison folio in the page cache and free only the clean raw pages to the
buddy allocator, excluding the poisoned raw page.
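
As a back-of-the-envelope check of that expectation, the number of raw
pages that would go back to the buddy allocator follows from the folio
order. This is a minimal userspace sketch, not kernel code: both helper
names are hypothetical, and the caller supplies the number of poisoned
raw pages.

```c
#include <stdint.h>

/* Hypothetical helper: number of raw (base) pages in a folio of this order. */
static inline uint64_t raw_pages_in_folio(unsigned int order)
{
	return (uint64_t)1 << order;
}

/*
 * Hypothetical helper: raw pages expected to reach the buddy allocator
 * when the folio is dissolved, i.e. everything except the poisoned ones.
 */
static inline uint64_t clean_pages_after_dissolve(unsigned int order,
						  uint64_t nr_poisoned)
{
	uint64_t total = raw_pages_in_folio(order);

	return nr_poisoned > total ? 0 : total - nr_poisoned;
}
```

For an order-18 folio with a single poisoned raw page, such as the one in
the dump_page output further down, this gives 262143 clean raw pages.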

So I wrote a hugetlb-mfr-base.c selftest and expected the following:
0. Say nr_hugepages is initially 64 per the system configuration.
1. After MADV_HWPOISON, nr_hugepages should still be 64, as we keep even
   the HWPoison huge folio in the page cache. free_hugepages should be
   nr_hugepages minus the number of huge pages in use.
2. After truncating the memfd to zero, nr_hugepages should be reduced to
   63, as the kernel dissolved and freed the HWPoison huge folio.
   free_hugepages should also be 63.
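
The two checks above can be sketched as follows. This is not the code in
hugetlb-mfr-base.c; it is a minimal illustration. The struct and function
names are hypothetical, and the second check assumes no other huge pages
are in use on the system.

```c
#include <stdio.h>

/* Hypothetical container for the two counters discussed above. */
struct hugepage_counts {
	long nr_hugepages;
	long free_hugepages;
};

/*
 * Read a single integer from a file such as
 * /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
 * (the 1 GiB size is an assumption about the test machine).
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_counter(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Expectation 1: after MADV_HWPOISON the pool size is unchanged. */
static int check_after_poison(struct hugepage_counts before,
			      struct hugepage_counts now, long in_use)
{
	return now.nr_hugepages == before.nr_hugepages &&
	       now.free_hugepages == now.nr_hugepages - in_use;
}

/* Expectation 2: after truncation the poisoned folio has left the pool. */
static int check_after_truncate(struct hugepage_counts before,
				struct hugepage_counts now)
{
	return now.nr_hugepages == before.nr_hugepages - 1 &&
	       now.free_hugepages == now.nr_hugepages;
}
```

With 64 huge pages configured and one in use by the test, {64, 63} passes
the first check, and only {63, 63} passes the second.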

However, when testing at the head of mm-stable, commit 2877a83e4a0a
("mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()"), I found
that although free_hugepages is reduced to 63, nr_hugepages is not reduced
and stays at 64.

Is my expectation outdated? Or is this some kind of bug?

I assumed this is a bug and dug a little deeper. It seems there
are two issues, or two things I don't really understand.

1. During try_memory_failure_hugetlb, we should have increased the target
   in-use folio's refcount via get_hwpoison_hugetlb_folio. However,
   by the end of try_memory_failure_hugetlb, this refcount has not been
   put. That makes sense to me, given that we keep the in-use huge folio
   in the page cache. However, I failed to find where this refcount is
   put on the remove_inode_hugepages path. Is the refcount decrement
   missing? At least my test case suggests so. In folios_put_refs, I
   added a dump_page:
   if (!folio_ref_sub_and_test(folio, nr_refs)) {
   	dump_page(&folio->page, "track hwpoison folio's ref");
   	continue;
   }
[ 1069.320976] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2780000
[ 1069.320978] head: order:18 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 1069.320980] flags: 0x400000000100044(referenced|head|hwpoison|node=0|zone=1)
[ 1069.320982] page_type: f4(hugetlb)
[ 1069.320984] raw: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
[ 1069.320985] raw: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
[ 1069.320987] head: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
[ 1069.320988] head: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
[ 1069.320990] head: 0400000000000012 ffffdd53de000001 ffffffffffffffff 0000000000000000
[ 1069.320991] head: 0000000000040000 0000000000000000 00000000ffffffff 0000000000000000
[ 1069.320992] page dumped because: track hwpoison folio's ref

2. Even if the folio's refcount does drop to zero and we get into
   free_huge_folio, it is not clear to me which part of free_huge_folio
   handles the case where the folio is HWPoison. In my test, what I
   observed is that eventually the folio is enqueue_hugetlb_folio()-ed.
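
To make the two questions concrete, here is a toy userspace model of the
accounting. It is not kernel code and not the patch in this series: every
name is hypothetical, and it assumes the poisoned folio holds exactly two
references at truncation time (one from the page cache, one taken by
memory failure handling).

```c
#include <stdbool.h>

/* Toy stand-ins for the hugetlb pool counters and a huge folio. */
struct toy_pool {
	long nr_hugepages;
	long free_hugepages;
};

struct toy_folio {
	int refcount;
	bool hwpoison;
};

/* Drop one reference; returns true when the last reference is gone. */
static bool toy_folio_put(struct toy_folio *f)
{
	return --f->refcount == 0;
}

/*
 * Toy free path. With handle_hwpoison set, a poisoned folio is dissolved
 * (it leaves the pool); otherwise it is put back on the free list.
 */
static void toy_free_huge_folio(struct toy_pool *p, struct toy_folio *f,
				bool handle_hwpoison)
{
	if (f->hwpoison && handle_hwpoison) {
		p->nr_hugepages--;
		return;
	}
	p->free_hugepages++;
}

/*
 * Toy truncation: always drops the page cache reference, and drops the
 * memory failure reference only if put_mf_ref is set.
 */
static void toy_truncate(struct toy_pool *p, struct toy_folio *f,
			 bool put_mf_ref, bool handle_hwpoison)
{
	bool last = toy_folio_put(f);

	if (!last && put_mf_ref)
		last = toy_folio_put(f);
	if (last)
		toy_free_huge_folio(p, f, handle_hwpoison);
}
```

Starting from a pool of {64, 63} and a poisoned folio with refcount 2, the
model gives {64, 63} with refcount 1 when neither flag is set, which
matches what I observed. Setting only put_mf_ref gives {64, 64}, i.e. the
poisoned folio lands back in the free pool. Setting both gives the
{63, 63} I expected.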

I tried to fix both issues with a very immature patch, with which
hugetlb-mfr-base.c passes. The patch shows the two things I think are
currently missing.

I want to use this RFC to better understand what behavior I should expect
and, if this is indeed an issue, to discuss fixes. Thanks.

[1] https://lore.kernel.org/linux-mm/20250118231549.1652825-1-jiaqiyan@google.com/T
[2] https://lore.kernel.org/all/20221018200125.848471-1-jthoughton@google.com/T/#u

Jiaqi Yan (2):
  selftest/mm: test HWPoison hugetlb truncation behavior
  mm/hugetlb: immature fix to handle HWPoisoned folio

 mm/hugetlb.c                                  |   6 +
 mm/swap.c                                     |   9 +-
 tools/testing/selftests/mm/Makefile           |   1 +
 tools/testing/selftests/mm/hugetlb-mfr-base.c | 240 ++++++++++++++++++
 4 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-mfr-base.c