mbox series

[v1,0/3] Improve hugetlbfs read on HWPOISON hugepages

Message ID 20230517160948.811355-1-jiaqiyan@google.com (mailing list archive)
Headers show
Series Improve hugetlbfs read on HWPOISON hugepages | expand

Message

Jiaqi Yan May 17, 2023, 4:09 p.m. UTC
Today when hardware memory is corrupted in a hugetlb hugepage,
kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
read will suject to silent data corruption. This is implemented by
returning -EIO from hugetlb_read_iter immediately if the hugepage has
HWPOISON flag set.

Since memory_failure already tracks the raw HWPOISON subpages in a
hugepage, a natural improvement is possible: if userspace only asks for
healthy subpages in the pagecache, kernel can return these data.

This patchset implements this improvement. It consist of three parts.
The 1st commit exports the functionality to tell if a subpage inside a
hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches
hugetlbfs_read_iter to return as many healthy bytes as possible.
The 3rd commit properly tests this new feature.

[1] commit 8625147cafaa ("hugetlbfs: don't delete error page from pagecache")

Jiaqi Yan (3):
  mm/hwpoison: find subpage in hugetlb HWPOISON list
  hugetlbfs: improve read HWPOISON hugepage
  selftests/mm: add tests for HWPOISON hugetlbfs read

 fs/hugetlbfs/inode.c                          |  62 +++-
 include/linux/mm.h                            |  23 ++
 mm/memory-failure.c                           |  26 +-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 6 files changed, 419 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c

Comments

Mike Kravetz May 17, 2023, 11:30 p.m. UTC | #1
On 05/17/23 16:09, Jiaqi Yan wrote:
> Today when hardware memory is corrupted in a hugetlb hugepage,
> kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> read will suject to silent data corruption. This is implemented by
> returning -EIO from hugetlb_read_iter immediately if the hugepage has
> HWPOISON flag set.
> 
> Since memory_failure already tracks the raw HWPOISON subpages in a
> hugepage, a natural improvement is possible: if userspace only asks for
> healthy subpages in the pagecache, kernel can return these data.

Thanks for putting this together.

I recall discussing this some time back, and deciding to wait and see
how HGM would progress.  Since it may be some time before HGM goes
upstream, it would be reasonable to consider this again.

One quick question.
Do you have an actual use case for this?  It certainly is an improvement
over existing functionality.  However, I am not aware of too many (?any?)
users actually doing read() calls on hugetlb files.
Jiaqi Yan May 18, 2023, 4:02 p.m. UTC | #2
On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com>
wrote:

> On 05/17/23 16:09, Jiaqi Yan wrote:
> > Today when hardware memory is corrupted in a hugetlb hugepage,
> > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > read will suject to silent data corruption. This is implemented by
> > returning -EIO from hugetlb_read_iter immediately if the hugepage has
> > HWPOISON flag set.
> >
> > Since memory_failure already tracks the raw HWPOISON subpages in a
> > hugepage, a natural improvement is possible: if userspace only asks for
> > healthy subpages in the pagecache, kernel can return these data.
>
> Thanks for putting this together.
>
> I recall discussing this some time back, and deciding to wait and see
> how HGM would progress.  Since it may be some time before HGM goes
> upstream, it would be reasonable to consider this again.
>

This improvement actually does NOT depend on HGM at all. No page table
related stuff involved here. The other RFC [2] I sent earlier DOES require
HGM. This improvement was brought up by James when we were working on [2].
In "Future Work" section of the cover letter, I thought HGM was needed but
soon I found I was wrong.


> One quick question.
> Do you have an actual use case for this?  It certainly is an improvement
> over existing functionality.  However, I am not aware of too many (?any?)
> users actually doing read() calls on hugetlb files.
>

I don't have any use case. I did search on Github for around half a hour
and all the hugetlb usages are done via mmap.


> --
> Mike Kravetz
>
> > This patchset implements this improvement. It consist of three parts.
> > The 1st commit exports the functionality to tell if a subpage inside a
> > hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches
> > hugetlbfs_read_iter to return as many healthy bytes as possible.
> > The 3rd commit properly tests this new feature.
> >
> > [1] commit 8625147cafaa ("hugetlbfs: don't delete error page from
> pagecache")
> >
> > Jiaqi Yan (3):
> >   mm/hwpoison: find subpage in hugetlb HWPOISON list
> >   hugetlbfs: improve read HWPOISON hugepage
> >   selftests/mm: add tests for HWPOISON hugetlbfs read
> >
> >  fs/hugetlbfs/inode.c                          |  62 +++-
> >  include/linux/mm.h                            |  23 ++
> >  mm/memory-failure.c                           |  26 +-
> >  tools/testing/selftests/mm/.gitignore         |   1 +
> >  tools/testing/selftests/mm/Makefile           |   1 +
> >  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
> >  6 files changed, 419 insertions(+), 16 deletions(-)
> >  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> >
> > --
> > 2.40.1.606.ga4b1b128d6-goog
> >


[2]
https://lore.kernel.org/linux-mm/20230428004139.2899856-6-jiaqiyan@google.com/T/#m97c6edef8ad0cc9b064e1fd9369b8521dcfa43de
Jiaqi Yan May 18, 2023, 4:10 p.m. UTC | #3
On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 05/17/23 16:09, Jiaqi Yan wrote:
> > Today when hardware memory is corrupted in a hugetlb hugepage,
> > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > read will suject to silent data corruption. This is implemented by
> > returning -EIO from hugetlb_read_iter immediately if the hugepage has
> > HWPOISON flag set.
> >
> > Since memory_failure already tracks the raw HWPOISON subpages in a
> > hugepage, a natural improvement is possible: if userspace only asks for
> > healthy subpages in the pagecache, kernel can return these data.
>
> Thanks for putting this together.
>
> I recall discussing this some time back, and deciding to wait and see
> how HGM would progress.  Since it may be some time before HGM goes
> upstream, it would be reasonable to consider this again.

This improvement actually does NOT depend on HGM at all. No page table
related stuff involved here. The other RFC [2] I sent earlier DOES
require HGM. This improvement was brought up by James when we were
working on [2]. In "Future Work" section of the cover letter, I
thought HGM was needed but soon when I code it up, I found I was
wrong.

>
> One quick question.
> Do you have an actual use case for this?  It certainly is an improvement
> over existing functionality.  However, I am not aware of too many (?any?)
> users actually doing read() calls on hugetlb files.

I don't have any use case. I did search on Github for around half a
hour and all the hugetlb usages are done via mmap.

> --
> Mike Kravetz
>
> > This patchset implements this improvement. It consist of three parts.
> > The 1st commit exports the functionality to tell if a subpage inside a
> > hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches
> > hugetlbfs_read_iter to return as many healthy bytes as possible.
> > The 3rd commit properly tests this new feature.
> >
> > [1] commit 8625147cafaa ("hugetlbfs: don't delete error page from pagecache")

[2] https://lore.kernel.org/linux-mm/20230428004139.2899856-6-jiaqiyan@google.com/T/#m97c6edef8ad0cc9b064e1fd9369b8521dcfa43de

> >
> > Jiaqi Yan (3):
> >   mm/hwpoison: find subpage in hugetlb HWPOISON list
> >   hugetlbfs: improve read HWPOISON hugepage
> >   selftests/mm: add tests for HWPOISON hugetlbfs read
> >
> >  fs/hugetlbfs/inode.c                          |  62 +++-
> >  include/linux/mm.h                            |  23 ++
> >  mm/memory-failure.c                           |  26 +-
> >  tools/testing/selftests/mm/.gitignore         |   1 +
> >  tools/testing/selftests/mm/Makefile           |   1 +
> >  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
> >  6 files changed, 419 insertions(+), 16 deletions(-)
> >  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> >
> > --
> > 2.40.1.606.ga4b1b128d6-goog
> >

(Sorry if you received twice, was sent in a wrong way a while ago)
Mike Kravetz May 18, 2023, 10:24 p.m. UTC | #4
On 05/18/23 09:10, Jiaqi Yan wrote:
> On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 05/17/23 16:09, Jiaqi Yan wrote:
> > > Today when hardware memory is corrupted in a hugetlb hugepage,
> > > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > > read will suject to silent data corruption. This is implemented by
> > > returning -EIO from hugetlb_read_iter immediately if the hugepage has
> > > HWPOISON flag set.
> > >
> > > Since memory_failure already tracks the raw HWPOISON subpages in a
> > > hugepage, a natural improvement is possible: if userspace only asks for
> > > healthy subpages in the pagecache, kernel can return these data.
> >
> > Thanks for putting this together.
> >
> > I recall discussing this some time back, and deciding to wait and see
> > how HGM would progress.  Since it may be some time before HGM goes
> > upstream, it would be reasonable to consider this again.
> 
> This improvement actually does NOT depend on HGM at all. No page table
> related stuff involved here. The other RFC [2] I sent earlier DOES
> require HGM. This improvement was brought up by James when we were
> working on [2]. In "Future Work" section of the cover letter, I
> thought HGM was needed but soon when I code it up, I found I was
> wrong.

Right, this has no HGM dependencies and is actually the only way I can think
of for users to extract some information from a poisoned hugetlb page.

> >
> > One quick question.
> > Do you have an actual use case for this?  It certainly is an improvement
> > over existing functionality.  However, I am not aware of too many (?any?)
> > users actually doing read() calls on hugetlb files.
> 
> I don't have any use case. I did search on Github for around half a
> hour and all the hugetlb usages are done via mmap.
> 

Ok, I was mostly curious as mmap seems to be the most common way of
accessing hugetlb pages.

Even though there is not a known use case today, I think this could be
useful for the reason above: extracting data from a poisoned hugetlb page.
Without HGM this is the only way to extract such data.

Unfortunately, read() is not an option for sysV shared memory or private
mappings.  HGM would help there.