diff mbox series

mm,swapops: Update check in is_pfn_swap_entry for hwpoison entries

Message ID 20240407130537.16977-1-osalvador@suse.de (mailing list archive)
State New
Headers show
Series mm,swapops: Update check in is_pfn_swap_entry for hwpoison entries | expand

Commit Message

Oscar Salvador April 7, 2024, 1:05 p.m. UTC
Tony reported that the Machine check recovery was broken in v6.9-rc1,
as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
to DRAM.
After some more digging and debugging on his side, he realized that this
went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
swp_offset_pfn() to fetch PFN from swap entry")'.
That commit, among other things, introduced swp_offset_pfn(), replacing
hwpoison_entry_to_pfn() in its favour.

The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
means that we would hit the VM_BUG_ON whenever we would call
swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
Fix this by updating the check to cover hwpoison entries as well, and update
the comment while we are it.

Reported-by: Tony Luck <tony.luck@intel.com>
Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
Tested-by: Tony Luck <tony.luck@intel.com>
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/swapops.h | 65 +++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 32 deletions(-)

Comments

Oscar Salvador April 7, 2024, 1:56 p.m. UTC | #1
On Sun, Apr 07, 2024 at 03:05:37PM +0200, Oscar Salvador wrote:
> Tony reported that the Machine check recovery was broken in v6.9-rc1,
> as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> to DRAM.
> After some more digging and debugging on his side, he realized that this
> went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> swp_offset_pfn() to fetch PFN from swap entry")'.
> That commit, among other things, introduced swp_offset_pfn(), replacing
> hwpoison_entry_to_pfn() in its favour.
> 
> The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> means that we would hit the VM_BUG_ON whenever we would call
> swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> Fix this by updating the check to cover hwpoison entries as well, and update
> the comment while we are it.
> 
> Reported-by: Tony Luck <tony.luck@intel.com>
> Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> Tested-by: Tony Luck <tony.luck@intel.com>
> Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
> Cc: <stable@vger.kernel.org> # 6.1.x

I think I need to clarify why the stable.

It is my understanding that some distros ship their kernel with
CONFIG_DEBUG_VM set by default (I think Fedora comes to my mind?).
I am fine with backing down if people think that this is an
overreaction.
Peter Xu April 7, 2024, 7:19 p.m. UTC | #2
On Sun, Apr 07, 2024 at 03:56:17PM +0200, Oscar Salvador wrote:
> On Sun, Apr 07, 2024 at 03:05:37PM +0200, Oscar Salvador wrote:
> > Tony reported that the Machine check recovery was broken in v6.9-rc1,
> > as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> > to DRAM.
> > After some more digging and debugging on his side, he realized that this
> > went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> > swp_offset_pfn() to fetch PFN from swap entry")'.
> > That commit, among other things, introduced swp_offset_pfn(), replacing
> > hwpoison_entry_to_pfn() in its favour.
> > 
> > The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> > but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> > means that we would hit the VM_BUG_ON whenever we would call
> > swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> > Fix this by updating the check to cover hwpoison entries as well, and update
> > the comment while we are it.
> > 
> > Reported-by: Tony Luck <tony.luck@intel.com>
> > Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")

Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
Obviously even until now I assumed hwpoison is accounted as pfn swap entry
but it's just missing..

Since this commit didn't really change is_pfn_swap_entry() itself, I was
thinking maybe an older fix tag would apply, but then I noticed the old
code indeed should work well even if hwpoison entry is missing.  For
example, it's a grey area on whether a hwpoisoned page should be accounted
in smaps.  So I think the Fixes tag is correct, and thanks for fixing this.

Reviewed-by: Peter Xu <peterx@redhat.com>

> > Cc: <stable@vger.kernel.org> # 6.1.x
> 
> I think I need to clarify why the stable.
> 
> It is my understanding that some distros ship their kernel with
> CONFIG_DEBUG_VM set by default (I think Fedora comes to my mind?).
> I am fine with backing down if people think that this is an
> overreaction.

Fedora stopped having DEBUG_VM for some time, but not sure about when it's
still in the 6.1 trees.  It looks like cc stable is still reasonable from
that regard.

A side note is that when I'm looking at this, I went back and see why in
some cases we need the pfn maintained for the poisoned, then I saw the only
user is check_hwpoisoned_entry() who wants to do fast kills in some
contexts and that includes a double check on the pfns in a poisoned entry.
Then afaict this path is just too rarely used and buggy.

A few things we may need fixing, maybe someone in the loop would have time
to have a look:

- check_hwpoisoned_entry()
  - pte_none check is missing
  - all the rest swap types are missing (e.g., we want to kill the proc too
    if the page is during migration)
- check_hwpoisoned_pmd_entry()
  - need similar care like above (pmd_none is covered not others)

I copied Naoya too.

Thanks,
Oscar Salvador April 7, 2024, 8:31 p.m. UTC | #3
> Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
> Obviously even until now I assumed hwpoison is accounted as pfn swap entry
> but it's just missing..
> 
> Since this commit didn't really change is_pfn_swap_entry() itself, I was
> thinking maybe an older fix tag would apply, but then I noticed the old
> code indeed should work well even if hwpoison entry is missing.  For
> example, it's a grey area on whether a hwpoisoned page should be accounted
> in smaps.  So I think the Fixes tag is correct, and thanks for fixing this.
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks Peter

> Fedora stopped having DEBUG_VM for some time, but not sure about when it's
> still in the 6.1 trees.  It looks like cc stable is still reasonable from
> that regard.

Good to know, thanks for the info.

> A side note is that when I'm looking at this, I went back and see why in
> some cases we need the pfn maintained for the poisoned, then I saw the only
> user is check_hwpoisoned_entry() who wants to do fast kills in some
> contexts and that includes a double check on the pfns in a poisoned entry.
> Then afaict this path is just too rarely used and buggy.

Yes, unfortunately memory-failure code does not get exercised that much,
and so there might be subtly bugs lurking in there for quite some time.

> A few things we may need fixing, maybe someone in the loop would have time
> to have a look:
> 
> - check_hwpoisoned_entry()
>   - pte_none check is missing
>   - all the rest swap types are missing (e.g., we want to kill the proc too
>     if the page is during migration)
> - check_hwpoisoned_pmd_entry()
>   - need similar care like above (pmd_none is covered not others)

I will have a look and see what needs fixing, thanks for bringing it up.
David Hildenbrand April 8, 2024, 7:44 a.m. UTC | #4
On 07.04.24 15:05, Oscar Salvador wrote:
> Tony reported that the Machine check recovery was broken in v6.9-rc1,
> as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> to DRAM.
> After some more digging and debugging on his side, he realized that this
> went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> swp_offset_pfn() to fetch PFN from swap entry")'.
> That commit, among other things, introduced swp_offset_pfn(), replacing
> hwpoison_entry_to_pfn() in its favour.
> 
> The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> means that we would hit the VM_BUG_ON whenever we would call
> swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> Fix this by updating the check to cover hwpoison entries as well, and update
> the comment while we are it.
> 
> Reported-by: Tony Luck <tony.luck@intel.com>
> Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> Tested-by: Tony Luck <tony.luck@intel.com>
> Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
> Cc: <stable@vger.kernel.org> # 6.1.x
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>
Miaohe Lin April 8, 2024, 8:31 a.m. UTC | #5
On 2024/4/7 21:05, Oscar Salvador wrote:
> Tony reported that the Machine check recovery was broken in v6.9-rc1,
> as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> to DRAM.
> After some more digging and debugging on his side, he realized that this
> went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> swp_offset_pfn() to fetch PFN from swap entry")'.
> That commit, among other things, introduced swp_offset_pfn(), replacing
> hwpoison_entry_to_pfn() in its favour.
> 
> The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> means that we would hit the VM_BUG_ON whenever we would call
> swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> Fix this by updating the check to cover hwpoison entries as well, and update
> the comment while we are it.
> 
> Reported-by: Tony Luck <tony.luck@intel.com>
> Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> Tested-by: Tony Luck <tony.luck@intel.com>
> Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
> Cc: <stable@vger.kernel.org> # 6.1.x
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
Miaohe Lin April 10, 2024, 8:16 a.m. UTC | #6
On 2024/4/8 4:31, Oscar Salvador wrote:
>> Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
>> Obviously even until now I assumed hwpoison is accounted as pfn swap entry
>> but it's just missing..
>>
>> Since this commit didn't really change is_pfn_swap_entry() itself, I was
>> thinking maybe an older fix tag would apply, but then I noticed the old
>> code indeed should work well even if hwpoison entry is missing.  For
>> example, it's a grey area on whether a hwpoisoned page should be accounted
>> in smaps.  So I think the Fixes tag is correct, and thanks for fixing this.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Thanks Peter

Thanks both.

> 
>> Fedora stopped having DEBUG_VM for some time, but not sure about when it's
>> still in the 6.1 trees.  It looks like cc stable is still reasonable from
>> that regard.
> 
> Good to know, thanks for the info.
> 
>> A side note is that when I'm looking at this, I went back and see why in
>> some cases we need the pfn maintained for the poisoned, then I saw the only
>> user is check_hwpoisoned_entry() who wants to do fast kills in some
>> contexts and that includes a double check on the pfns in a poisoned entry.
>> Then afaict this path is just too rarely used and buggy.
> 
> Yes, unfortunately memory-failure code does not get exercised that much,
> and so there might be subtly bugs lurking in there for quite some time.

There're many memory-failure testcases but some code paths still didn't get
exercised. That's a pity. :(

> 
>> A few things we may need fixing, maybe someone in the loop would have time
>> to have a look:
>>
>> - check_hwpoisoned_entry()
>>   - pte_none check is missing
>>   - all the rest swap types are missing (e.g., we want to kill the proc too
>>     if the page is during migration)

Firstly, I thought rest swap types just won't exist in this code path. But after second thought,
it seems it's possible. For example, when page is being isolated for migration, memory_failure
will fails to isolate it. And the second MCE event will goes to kill_accessing_process() and
see a migrate swap entry.

>> - check_hwpoisoned_pmd_entry()
>>   - need similar care like above (pmd_none is covered not others)
> 
> I will have a look and see what needs fixing, thanks for bringing it up.

Thanks for your time.
.

> 
>
diff mbox series

Patch

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 48b700ba1d18..a5c560a2f8c2 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -390,6 +390,35 @@  static inline bool is_migration_entry_dirty(swp_entry_t entry)
 }
 #endif	/* CONFIG_MIGRATION */
 
+#ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Support for hardware poisoned pages
+ */
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+	BUG_ON(!PageLocked(page));
+	return swp_entry(SWP_HWPOISON, page_to_pfn(page));
+}
+
+static inline int is_hwpoison_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_HWPOISON;
+}
+
+#else
+
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+	return swp_entry(0, 0);
+}
+
+static inline int is_hwpoison_entry(swp_entry_t swp)
+{
+	return 0;
+}
+#endif
+
 typedef unsigned long pte_marker;
 
 #define  PTE_MARKER_UFFD_WP			BIT(0)
@@ -483,8 +512,9 @@  static inline struct folio *pfn_swap_entry_folio(swp_entry_t entry)
 
 /*
  * A pfn swap entry is a special type of swap entry that always has a pfn stored
- * in the swap offset. They are used to represent unaddressable device memory
- * and to restrict access to a page undergoing migration.
+ * in the swap offset. They can either be used to represent unaddressable device
+ * memory, to restrict access to a page undergoing migration or to represent a
+ * pfn which has been hwpoisoned and unmapped.
  */
 static inline bool is_pfn_swap_entry(swp_entry_t entry)
 {
@@ -492,7 +522,7 @@  static inline bool is_pfn_swap_entry(swp_entry_t entry)
 	BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);
 
 	return is_migration_entry(entry) || is_device_private_entry(entry) ||
-	       is_device_exclusive_entry(entry);
+	       is_device_exclusive_entry(entry) || is_hwpoison_entry(entry);
 }
 
 struct page_vma_mapped_walk;
@@ -561,35 +591,6 @@  static inline int is_pmd_migration_entry(pmd_t pmd)
 }
 #endif  /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
 
-#ifdef CONFIG_MEMORY_FAILURE
-
-/*
- * Support for hardware poisoned pages
- */
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
-	BUG_ON(!PageLocked(page));
-	return swp_entry(SWP_HWPOISON, page_to_pfn(page));
-}
-
-static inline int is_hwpoison_entry(swp_entry_t entry)
-{
-	return swp_type(entry) == SWP_HWPOISON;
-}
-
-#else
-
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
-	return swp_entry(0, 0);
-}
-
-static inline int is_hwpoison_entry(swp_entry_t swp)
-{
-	return 0;
-}
-#endif
-
 static inline int non_swap_entry(swp_entry_t entry)
 {
 	return swp_type(entry) >= MAX_SWAPFILES;