Message ID | 20210930215311.240774-1-shy828301@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Solve silent data loss caused by poisoned page cache (shmem/tmpfs) | expand |
On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote: > Yang Shi (5): > mm: hwpoison: remove the unnecessary THP check > mm: filemap: check if THP has hwpoisoned subpage for PMD page fault > mm: hwpoison: refactor refcount check handling > mm: shmem: don't truncate page if memory failure happens > mm: hwpoison: handle non-anonymous THP correctly Today I just noticed one more thing: unpoison path has (unpoison_memory): if (page_mapping(page)) { unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n", pfn, &unpoison_rs); return 0; } I _think_ it was used to make sure we ignore page that was not successfully poisoned/offlined before (for anonymous), so raising this question up on whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned for debugging purposes.
On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote: > > On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote: > > Yang Shi (5): > > mm: hwpoison: remove the unnecessary THP check > > mm: filemap: check if THP has hwpoisoned subpage for PMD page fault > > mm: hwpoison: refactor refcount check handling > > mm: shmem: don't truncate page if memory failure happens > > mm: hwpoison: handle non-anonymous THP correctly > > Today I just noticed one more thing: unpoison path has (unpoison_memory): > > if (page_mapping(page)) { > unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n", > pfn, &unpoison_rs); > return 0; > } > > I _think_ it was used to make sure we ignore page that was not successfully > poisoned/offlined before (for anonymous), so raising this question up on > whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned > for debugging purposes. Yes, not only mapping, the refcount check is not right if page cache page is kept in page cache instead of being truncated after this series. But actually unpoison has been broken since commit 0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make get_hwpoison_page() call get_any_page()"). And Naoya said in the commit "unpoison_memory() is also unchanged because it's broken and need thorough fixes (will be done later)." I do have some fixes in my tree to unblock tests and fix unpoison for this series (just make it work for testing). Naoya may have some ideas in mind and it is just a debugging feature so I don't think it must be fixed in this series. It could be done later. I could add a TODO section in the cover letter to make this more clear. > > -- > Peter Xu >
On Tue, Oct 12, 2021 at 08:09:24PM -0700, Yang Shi wrote: > On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote: > > > Yang Shi (5): > > > mm: hwpoison: remove the unnecessary THP check > > > mm: filemap: check if THP has hwpoisoned subpage for PMD page fault > > > mm: hwpoison: refactor refcount check handling > > > mm: shmem: don't truncate page if memory failure happens > > > mm: hwpoison: handle non-anonymous THP correctly > > > > Today I just noticed one more thing: unpoison path has (unpoison_memory): > > > > if (page_mapping(page)) { > > unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n", > > pfn, &unpoison_rs); > > return 0; > > } > > > > I _think_ it was used to make sure we ignore page that was not successfully > > poisoned/offlined before (for anonymous), so raising this question up on > > whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned > > for debugging purposes. > > Yes, not only mapping, the refcount check is not right if page cache > page is kept in page cache instead of being truncated after this > series. But actually unpoison has been broken since commit > 0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make > get_hwpoison_page() call get_any_page()"). And Naoya said in the > commit "unpoison_memory() is also unchanged because it's broken and > need thorough fixes (will be done later)." > > I do have some fixes in my tree to unblock tests and fix unpoison for > this series (just make it work for testing). Naoya may have some ideas > in mind and it is just a debugging feature so I don't think it must be > fixed in this series. It could be done later. I could add a TODO > section in the cover letter to make this more clear. I see, that sounds good enough to me. Thanks,
On Tue, Oct 12, 2021 at 08:09:24PM -0700, Yang Shi wrote: > On Tue, Oct 12, 2021 at 7:41 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, Sep 30, 2021 at 02:53:06PM -0700, Yang Shi wrote: > > > Yang Shi (5): > > > mm: hwpoison: remove the unnecessary THP check > > > mm: filemap: check if THP has hwpoisoned subpage for PMD page fault > > > mm: hwpoison: refactor refcount check handling > > > mm: shmem: don't truncate page if memory failure happens > > > mm: hwpoison: handle non-anonymous THP correctly > > > > Today I just noticed one more thing: unpoison path has (unpoison_memory): > > > > if (page_mapping(page)) { > > unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n", > > pfn, &unpoison_rs); > > return 0; > > } > > > > I _think_ it was used to make sure we ignore page that was not successfully > > poisoned/offlined before (for anonymous), so raising this question up on > > whether we should make sure e.g. shmem hwpoisoned pages still can be unpoisoned > > for debugging purposes. > > Yes, not only mapping, the refcount check is not right if page cache > page is kept in page cache instead of being truncated after this > series. But actually unpoison has been broken since commit > 0ed950d1f28142ccd9a9453c60df87853530d778 ("mm,hwpoison: make > get_hwpoison_page() call get_any_page()"). And Naoya said in the > commit "unpoison_memory() is also unchanged because it's broken and > need thorough fixes (will be done later)." > Yeah, I have a patch but still not make it merged to upstream. Sorry about that... > I do have some fixes in my tree to unblock tests and fix unpoison for > this series (just make it work for testing). Naoya may have some ideas > in mind and it is just a debugging feature so I don't think it must be > fixed in this series. It could be done later. I could add a TODO > section in the cover letter to make this more clear. Yes, it's fine to me that you leave this unsolved in this patchset. Thanks, Naoya Horiguchi