[0/6] hugetlbfs: support free page reporting

Message ID 20210106034623.GA1128@open-light-1.localdomain (mailing list archive)
Series hugetlbfs: support free page reporting

Message

Liang Li Jan. 6, 2021, 3:46 a.m. UTC
A typical usage of hugetlbfs is to reserve an amount of memory
during the kernel boot stage; the reserved pages are unlikely to
return to the buddy system. When an application needs hugepages,
the kernel allocates them from the reserved pool. When the
application terminates, the huge pages return to the reserved pool
and are kept on the hugetlbfs free list; these free pages will not
return to the buddy free list unless the size of the reserved pool
is changed.
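To make the pool lifecycle described above concrete, here is a small
userspace model in C. It is only an illustration under stated
assumptions: `struct hpool`, `hpool_alloc()`, `hpool_free()` and
`hpool_shrink()` are hypothetical names, not kernel APIs, and the
kernel's real bookkeeping (per-node free lists, surplus pages, the
`nr_hugepages` sysctl) is considerably more involved. The point it
shows is that freeing a huge page only moves it back onto the
hugetlbfs free list; memory reaches the buddy allocator only when
the pool itself is shrunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hugetlbfs pool; not the kernel's data structures. */
#define POOL_MAX 8

struct hpool {
    bool in_use[POOL_MAX]; /* true while a page is handed out to an application */
    size_t size;           /* pages reserved at "boot" (cf. nr_hugepages) */
    size_t nr_free;        /* pages on the hugetlbfs free list */
    size_t buddy_pages;    /* huge-page-sized chunks given back to the buddy allocator */
};

/* Reserve the pool once, as a boot-time reservation would. */
static void hpool_init(struct hpool *p, size_t size)
{
    if (size > POOL_MAX)
        size = POOL_MAX;
    for (size_t i = 0; i < POOL_MAX; i++)
        p->in_use[i] = false;
    p->size = size;
    p->nr_free = size;
    p->buddy_pages = 0;
}

/* Hand out a free page from the pool; returns its index, or -1 if none is free. */
static int hpool_alloc(struct hpool *p)
{
    for (size_t i = 0; i < p->size; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            p->nr_free--;
            return (int)i;
        }
    }
    return -1;
}

/* Freeing only puts the page back on the pool's free list, never into buddy. */
static void hpool_free(struct hpool *p, int idx)
{
    p->in_use[idx] = false;
    p->nr_free++;
}

/*
 * Only shrinking the pool gives free pages back to the buddy allocator.
 * This model releases unused pages from the end of the array only.
 */
static void hpool_shrink(struct hpool *p, size_t new_size)
{
    while (p->size > new_size && !p->in_use[p->size - 1]) {
        p->size--;
        p->nr_free--;
        p->buddy_pages++;
    }
}
```

After an allocate/free cycle the pool is full again but `buddy_pages`
is still zero; only `hpool_shrink()` moves pages out of the pool.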

Free page reporting only supports buddy pages; it can't report the
free pages reserved for hugetlbfs. On the other hand, hugetlbfs is
a good choice for systems with a huge amount of RAM, because it can
help reduce memory management overhead and improve system
performance.

This patch series adds support for reporting hugepages on the
hugetlbfs free list. It can be used by the virtio_balloon driver
for memory overcommit, and to pre-zero free pages to speed up
memory population and page fault handling.
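A rough sketch of what reporting hugepages on the hugetlbfs free
list could look like, again as a userspace C model. Assumptions:
`struct hpage`, `report_free_hugepages()` and the `report_fn`
callback are hypothetical names (the callback stands in for a driver
such as virtio_balloon handing pages to the host), and `batch_size`
only loosely mirrors the idea behind "mm: Add batch size for free
page reporting"; it is not that patch's interface. Pages stay on the
free list while they are reported, and a per-page flag keeps them
from being reported twice until they are reallocated and freed.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 16

/* Hypothetical per-page state for a hugetlb free list (not kernel structures). */
struct hpage {
    bool free;     /* on the hugetlbfs free list */
    bool reported; /* reported to the host since it was last freed */
};

/* Hypothetical driver hook, e.g. what a balloon driver would register. */
typedef void (*report_fn)(const size_t *idx, size_t n, void *ctx);

/* Hand one batch to the driver and mark its pages as reported. */
static size_t flush_batch(struct hpage *pages, const size_t *batch, size_t n,
                          report_fn report, void *ctx)
{
    if (n == 0)
        return 0;
    report(batch, n, ctx);
    for (size_t j = 0; j < n; j++)
        pages[batch[j]].reported = true;
    return n;
}

/*
 * Walk the free list and report not-yet-reported free pages in batches of
 * at most batch_size. Pages stay on the free list. Returns pages reported.
 */
static size_t report_free_hugepages(struct hpage *pages, size_t nr,
                                    size_t batch_size, report_fn report,
                                    void *ctx)
{
    size_t batch[NR_PAGES];
    size_t n = 0, total = 0;

    if (batch_size == 0 || batch_size > NR_PAGES)
        batch_size = NR_PAGES;

    for (size_t i = 0; i < nr; i++) {
        if (!pages[i].free || pages[i].reported)
            continue;
        batch[n++] = i;
        if (n == batch_size) {
            total += flush_batch(pages, batch, n, report, ctx);
            n = 0;
        }
    }
    total += flush_batch(pages, batch, n, report, ctx); /* partial last batch */
    return total;
}

/* Allocating a page takes it off the free list and clears its reported state. */
static void hpage_alloc(struct hpage *pg)
{
    pg->free = false;
    pg->reported = false;
}

/* A freed page is back on the free list and eligible for the next report. */
static void hpage_free(struct hpage *pg)
{
    pg->free = true;
}
```

With five free pages and a batch size of 2, the driver is called
three times (2 + 2 + 1 pages); a second pass reports nothing until a
page goes through another allocate/free cycle.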

Most of the code is 'copied' from free page reporting because the
two work the same way, so the code can be refined to remove the
duplication. That can be done later.

Since some people have concerns about the side effects of the
'buddy free page pre zero out' feature, I have removed it from this
series.

Liang Li (6):
  mm: Add batch size for free page reporting
  mm: let user decide page reporting option
  hugetlb: add free page reporting support
  hugetlb: avoid allocation failed when page reporting is on going
  virtio-balloon: reporting hugetlb free page to host
  hugetlb: support free hugepage pre zero out

 drivers/virtio/virtio_balloon.c |  58 +++++-
 include/linux/hugetlb.h         |   5 +
 include/linux/page-flags.h      |  12 ++
 include/linux/page_reporting.h  |   7 +
 mm/Kconfig                      |  11 ++
 mm/huge_memory.c                |   3 +-
 mm/hugetlb.c                    | 271 +++++++++++++++++++++++++++
 mm/memory.c                     |   4 +
 mm/page_reporting.c             | 315 +++++++++++++++++++++++++++++++-
 mm/page_reporting.h             |  50 ++++-
 10 files changed, 725 insertions(+), 11 deletions(-)

Comments

David Hildenbrand Jan. 6, 2021, 9:41 a.m. UTC | #1
On 06.01.21 04:46, Liang Li wrote:
> A typical usage of hugetlbfs is to reserve an amount of memory
> during the kernel boot stage; the reserved pages are unlikely to
> return to the buddy system. When an application needs hugepages,
> the kernel allocates them from the reserved pool. When the
> application terminates, the huge pages return to the reserved pool
> and are kept on the hugetlbfs free list; these free pages will not
> return to the buddy free list unless the size of the reserved pool
> is changed.
>
> Free page reporting only supports buddy pages; it can't report the
> free pages reserved for hugetlbfs. On the other hand, hugetlbfs is
> a good choice for systems with a huge amount of RAM, because it can
> help reduce memory management overhead and improve system
> performance.
>
> This patch series adds support for reporting hugepages on the
> hugetlbfs free list. It can be used by the virtio_balloon driver
> for memory overcommit, and to pre-zero free pages to speed up
> memory population and page fault handling.

You should lay out the use case + measurements. Further, you should
describe what this patch set actually does, how its behavior can be
tuned, pros and cons, etc. And you should most probably keep this as
an RFC.

> 
> Most of the code is 'copied' from free page reporting because the
> two work the same way, so the code can be refined to remove the
> duplication. That can be done later.

Nothing speaks against getting it right from the beginning; otherwise
it will most likely never happen.

> 
> Since some people have concerns about the side effects of the
> 'buddy free page pre zero out' feature, I have removed it from this
> series.

You should really point out what changed since the last version. I
remember Alex and Mike had some pretty solid points about what they
don't want to see (especially: don't use the free page reporting
infrastructure, and don't temporarily allocate huge pages for
processing them).

I am not convinced that we want to use the free page reporting
infrastructure for this (pre-zeroing huge pages). What speaks against
a thread simply iterating over huge pages one at a time, zeroing them?
The whole free page reporting infrastructure was invented because we
have to do expensive coordination (+ locking) when going via the
hypervisor. For the main use case of zeroing huge pages in the
background, I don't see a real need for that. If you believe this is
the right thing to do, please add a discussion regarding this.
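One possible reading of the simpler alternative suggested above, as a
userspace C model with pthreads: a single worker walks the free pool,
takes one dirty free page at a time, zeroes it with the lock dropped,
and marks it zeroed. Everything here is hypothetical (`struct zpool`,
`zero_free_pages()`, `zpool_alloc()`, `zpool_free()`, and a 4 KiB
stand-in for a 2 MiB huge page); it is not hugetlb code and not
necessarily what was meant in the review. It illustrates why no
batching or hypervisor-style coordination is needed for background
zeroing: at most one page is ever out of the allocator's reach.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_HPAGES 4
#define HPAGE_BYTES 4096 /* scaled down from 2 MiB to keep the model small */

/* Hypothetical model of a hugetlb free pool; names are illustrative only. */
struct zpage {
    bool free;   /* on the free list */
    bool zeroed; /* contents known to be zero */
    bool busy;   /* being zeroed right now; the allocator skips it */
    unsigned char data[HPAGE_BYTES];
};

struct zpool {
    pthread_mutex_t lock;
    struct zpage pages[NR_HPAGES];
};

/* Under the lock, pick one free, not-yet-zeroed page and mark it busy. */
static struct zpage *grab_dirty_page(struct zpool *p)
{
    struct zpage *pg = NULL;

    pthread_mutex_lock(&p->lock);
    for (size_t i = 0; i < NR_HPAGES; i++) {
        struct zpage *c = &p->pages[i];
        if (c->free && !c->zeroed && !c->busy) {
            c->busy = true;
            pg = c;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return pg;
}

/* Background worker: zero free pages one at a time until none are dirty. */
static void *zero_free_pages(void *arg)
{
    struct zpool *p = arg;
    struct zpage *pg;

    while ((pg = grab_dirty_page(p)) != NULL) {
        memset(pg->data, 0, HPAGE_BYTES); /* expensive part, lock not held */
        pthread_mutex_lock(&p->lock);
        pg->zeroed = true;
        pg->busy = false;
        pthread_mutex_unlock(&p->lock);
    }
    return NULL;
}

/*
 * Allocation prefers an already-zeroed page; otherwise it returns a dirty one
 * that the caller must zero itself. It never waits for the worker.
 */
static struct zpage *zpool_alloc(struct zpool *p)
{
    struct zpage *best = NULL;

    pthread_mutex_lock(&p->lock);
    for (size_t i = 0; i < NR_HPAGES; i++) {
        struct zpage *c = &p->pages[i];
        if (!c->free || c->busy)
            continue;
        if (c->zeroed) {
            best = c;
            break;
        }
        if (best == NULL)
            best = c;
    }
    if (best != NULL)
        best->free = false;
    pthread_mutex_unlock(&p->lock);
    return best;
}

/* Freed pages go back dirty: the application may have left data in them. */
static void zpool_free(struct zpool *p, struct zpage *pg)
{
    pthread_mutex_lock(&p->lock);
    pg->free = true;
    pg->zeroed = false;
    pthread_mutex_unlock(&p->lock);
}
```

In this model an allocation can only miss a page if the single busy
page happens to be the last free one; that appears to be the same kind
of failure the "avoid allocation failed when page reporting is on
going" patch addresses, so a real design would still need to wait or
retry in that corner case.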
Liang Li Jan. 7, 2021, 1:50 a.m. UTC | #2
On Wed, Jan 6, 2021 at 5:41 PM David Hildenbrand <david@redhat.com> wrote:
> [...]
I will take all your advice and give more detail in the next revision.
Thanks for your comments!

Liang