[v4,04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate

Message ID 20201113105952.11638-5-songmuchun@bytedance.com (mailing list archive)
Series: Free some vmemmap pages of hugetlb page

Commit Message

Muchun Song Nov. 13, 2020, 10:59 a.m. UTC
If the size of HugeTLB page is 2MB, we need 512 struct page structures
(8 pages) to be associated with it. As far as I know, we only use the
first 4 struct page structures. Use of first 4 struct page structures
comes from HUGETLB_CGROUP_MIN_ORDER.

For tail pages, the value of compound_head is the same. So we can reuse
first page of tail page structs. We map the virtual addresses of the
remaining 6 pages of tail page structs to the first tail page struct,
and then free these 6 pages. Therefore, we need to reserve at least 2
pages as vmemmap areas.

So we introduce a new nr_free_vmemmap_pages field in the hstate to
indicate how many vmemmap pages associated with a HugeTLB page that we
can free to buddy system.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |   3 ++
 mm/Makefile             |   1 +
 mm/hugetlb.c            |   3 ++
 mm/hugetlb_vmemmap.c    | 108 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h    |  20 +++++++++
 5 files changed, 135 insertions(+)
 create mode 100644 mm/hugetlb_vmemmap.c
 create mode 100644 mm/hugetlb_vmemmap.h

Comments

Oscar Salvador Nov. 16, 2020, 1:33 p.m. UTC | #1
On Fri, Nov 13, 2020 at 06:59:35PM +0800, Muchun Song wrote:
> If the size of HugeTLB page is 2MB, we need 512 struct page structures
> (8 pages) to be associated with it. As far as I know, we only use the
> first 4 struct page structures. Use of first 4 struct page structures
> comes from HUGETLB_CGROUP_MIN_ORDER.

Once you mention the 2MB HugeTLB page and its specifics, I would also mention
1GB HugeTLB pages, maybe something along those lines.
I would suppress "As far as I know"; we __know__ that we only use
the first 4 struct page structures to track metadata information.
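
For reference, the arithmetic behind both cases (a standalone sketch assuming
4KB base pages and a 64-byte struct page, as on x86_64; illustrative only, not
the patch's code):

#include <stdio.h>

/* Assumptions: 4KB base pages, 64-byte struct page (x86_64). */
#define BASE_PAGE_SHIFT		12
#define STRUCT_PAGE_SIZE	64UL
#define RESERVED_VMEMMAP	2UL	/* head page struct + first tail page struct */

static void show(const char *name, unsigned int order)
{
	unsigned long structs = 1UL << order;	/* one struct page per base page */
	unsigned long vmemmap = (structs * STRUCT_PAGE_SIZE) >> BASE_PAGE_SHIFT;

	printf("%s: %lu struct pages, %lu vmemmap pages, %lu freeable\n",
	       name, structs, vmemmap, vmemmap - RESERVED_VMEMMAP);
}

int main(void)
{
	show("2MB", 9);		/* 512 struct pages, 8 vmemmap pages, 6 freeable */
	show("1GB", 18);	/* 262144 struct pages, 4096 vmemmap pages, 4094 freeable */
	return 0;
}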

> +/*
> + * There are 512 struct page structures(8 pages) associated with each 2MB
> + * hugetlb page. For tail pages, the value of compound_head is the same.
> + * So we can reuse first page of tail page structures. We map the virtual
> + * addresses of the remaining 6 pages of tail page structures to the first
> + * tail page struct, and then free these 6 pages. Therefore, we need to
> + * reserve at least 2 pages as vmemmap areas.
> + */
> +#define RESERVE_VMEMMAP_NR		2U

Either I would include the 1GB specifics there as well, or I would not add
any specifics at all and just say that the first two pages are used,
and the rest can be remapped to the first page that contains the tails.


> +void __init hugetlb_vmemmap_init(struct hstate *h)
> +{
> +	unsigned int order = huge_page_order(h);
> +	unsigned int vmemmap_pages;
> +
> +	vmemmap_pages = ((1 << order) * sizeof(struct page)) >> PAGE_SHIFT;
> +	/*
> +	 * The head page and the first tail page are not to be freed to buddy
> +	 * system, the others page will map to the first tail page. So there
"the remaining pages" might be more clear.

> +	 * are (@vmemmap_pages - RESERVE_VMEMMAP_NR) pages can be freed.
"that can be freed"

> +	 *
> +	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? This is
> +	 * not expected to happen unless the system is corrupted. So on the
> +	 * safe side, it is only a safety net.
> +	 */
> +	if (likely(vmemmap_pages > RESERVE_VMEMMAP_NR))
> +		h->nr_free_vmemmap_pages = vmemmap_pages - RESERVE_VMEMMAP_NR;
> +	else
> +		h->nr_free_vmemmap_pages = 0;

This made me think of something.
Since struct hstate hstates is global, all the fields are guaranteed to be initialized to 0.
So, the following assignments in hugetlb_add_hstate:

        h->nr_huge_pages = 0;
        h->free_huge_pages = 0;

should not be needed.
Actually, we do not initialize other values like resv_huge_pages
or surplus_huge_pages.

If that is the case, the "else" could go.
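
(Standalone illustration of that point: C guarantees that objects with static
storage duration are zero-initialized, which is why the explicit assignments,
and the "else" branch above, are redundant. A sketch, not kernel code:)

#include <stdio.h>

struct hstate_like {
	unsigned int nr_huge_pages;
	unsigned int free_huge_pages;
	unsigned int nr_free_vmemmap_pages;
};

/* Static storage duration: guaranteed zero-initialized by the C
 * standard, just like the kernel's global hstates array. */
static struct hstate_like hstates_like[2];

int main(void)
{
	/* Prints 0 without any explicit assignment. */
	printf("%u\n", hstates_like[0].nr_free_vmemmap_pages);
	return 0;
}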

Mike?

The changes themselves look good to me.
I think that putting all the vmemmap stuff into hugetlb_vmemmap.* was
the right choice.
Muchun Song Nov. 16, 2020, 3:40 p.m. UTC | #2
On Mon, Nov 16, 2020 at 9:33 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Fri, Nov 13, 2020 at 06:59:35PM +0800, Muchun Song wrote:
> > If the size of HugeTLB page is 2MB, we need 512 struct page structures
> > (8 pages) to be associated with it. As far as I know, we only use the
> > first 4 struct page structures. Use of first 4 struct page structures
> > comes from HUGETLB_CGROUP_MIN_ORDER.
>
> Once you mention the 2MB HugeTLB page and its specifics, I would also mention
> 1GB HugeTLB pages, maybe something along those lines.
> I would suppress "As far as I know"; we __know__ that we only use
> the first 4 struct page structures to track metadata information.

Thanks. Will do.

>
> > +/*
> > + * There are 512 struct page structures(8 pages) associated with each 2MB
> > + * hugetlb page. For tail pages, the value of compound_head is the same.
> > + * So we can reuse first page of tail page structures. We map the virtual
> > + * addresses of the remaining 6 pages of tail page structures to the first
> > + * tail page struct, and then free these 6 pages. Therefore, we need to
> > + * reserve at least 2 pages as vmemmap areas.
> > + */
> > +#define RESERVE_VMEMMAP_NR           2U
>
> Either I would include the 1GB specifics there as well, or I would not add
> any specifics at all and just say that the first two pages are used,
> and the rest can be remapped to the first page that contains the tails.

Thanks. Will do.

>
>
> > +void __init hugetlb_vmemmap_init(struct hstate *h)
> > +{
> > +     unsigned int order = huge_page_order(h);
> > +     unsigned int vmemmap_pages;
> > +
> > +     vmemmap_pages = ((1 << order) * sizeof(struct page)) >> PAGE_SHIFT;
> > +     /*
> > +      * The head page and the first tail page are not to be freed to buddy
> > +      * system, the others page will map to the first tail page. So there
> "the remaining pages" might be more clear.

Thanks.

>
> > +      * are (@vmemmap_pages - RESERVE_VMEMMAP_NR) pages can be freed.
> "that can be freed"

Thanks.

>
> > +      *
> > +      * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? This is
> > +      * not expected to happen unless the system is corrupted. So on the
> > +      * safe side, it is only a safety net.
> > +      */
> > +     if (likely(vmemmap_pages > RESERVE_VMEMMAP_NR))
> > +             h->nr_free_vmemmap_pages = vmemmap_pages - RESERVE_VMEMMAP_NR;
> > +     else
> > +             h->nr_free_vmemmap_pages = 0;
>
> This made me think of something.
> Since struct hstate hstates is global, all the fields are guaranteed to be initialized to 0.
> So, the following assignments in hugetlb_add_hstate:
>
>         h->nr_huge_pages = 0;
>         h->free_huge_pages = 0;
>
> should not be needed.
> Actually, we do not initialize other values like resv_huge_pages
> or surplus_huge_pages.
>
> If that is the case, the "else" could go.

Yeah, I agree with you.

>
> Mike?
>
> The changes themselves look good to me.
> I think that putting all the vmemmap stuff into hugetlb_vmemmap.* was
> the right choice.
>
>
> --
> Oscar Salvador
> SUSE L3
Mike Kravetz Nov. 18, 2020, 10:54 p.m. UTC | #3
On 11/16/20 5:33 AM, Oscar Salvador wrote:
> On Fri, Nov 13, 2020 at 06:59:35PM +0800, Muchun Song wrote:
>> +void __init hugetlb_vmemmap_init(struct hstate *h)
>> +{
>> +	unsigned int order = huge_page_order(h);
>> +	unsigned int vmemmap_pages;
>> +
>> +	vmemmap_pages = ((1 << order) * sizeof(struct page)) >> PAGE_SHIFT;
>> +	/*
>> +	 * The head page and the first tail page are not to be freed to buddy
>> +	 * system, the others page will map to the first tail page. So there
> "the remaining pages" might be more clear.
> 
>> +	 * are (@vmemmap_pages - RESERVE_VMEMMAP_NR) pages can be freed.
> "that can be freed"
> 
>> +	 *
>> +	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? This is
>> +	 * not expected to happen unless the system is corrupted. So on the
>> +	 * safe side, it is only a safety net.
>> +	 */
>> +	if (likely(vmemmap_pages > RESERVE_VMEMMAP_NR))
>> +		h->nr_free_vmemmap_pages = vmemmap_pages - RESERVE_VMEMMAP_NR;
>> +	else
>> +		h->nr_free_vmemmap_pages = 0;
> 
> This made me think of something.
> Since struct hstate hstates is global, all the fields are guaranteed to be initialized to 0.
> So, the following assignments in hugetlb_add_hstate:
> 
>         h->nr_huge_pages = 0;
>         h->free_huge_pages = 0;
> 
> should not be needed.
> Actually, we do not initialize other values like resv_huge_pages
> or surplus_huge_pages.
> 
> If that is the case, the "else" could go.
> 
> Mike?

Correct.  Those assignments have been in the code for a very long time.

> The changes themselves look good to me.
> I think that putting all the vmemmap stuff into hugetlb_vmemmap.* was
> the right choice.

Agree!
Mike Kravetz Nov. 18, 2020, 11:48 p.m. UTC | #4
On 11/13/20 2:59 AM, Muchun Song wrote:
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> new file mode 100644
> index 000000000000..a6c9948302e2
> --- /dev/null
> +++ b/mm/hugetlb_vmemmap.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Free some vmemmap pages of HugeTLB
> + *
> + * Copyright (c) 2020, Bytedance. All rights reserved.
> + *
> + *     Author: Muchun Song <songmuchun@bytedance.com>
> + *

Oscar has already made some suggestions to change comments.  I would suggest
changing the below text to something like the following.

> + * Nowadays we track the status of physical page frames using struct page
> + * structures arranged in one or more arrays. And here exists one-to-one
> + * mapping between the physical page frame and the corresponding struct page
> + * structure.
> + *
> + * The HugeTLB support is built on top of multiple page size support that
> + * is provided by most modern architectures. For example, x86 CPUs normally
> + * support 4K and 2M (1G if architecturally supported) page sizes. Every
> + * HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> + * struct page structure and 1G HugeTLB has 4096 struct page structures. But
> + * in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
> + * structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
> + * store metadata associated with each HugeTLB. The rest of the struct page
> + * structures are usually read the compound_head field which are all the same
> + * value. If we can free some struct page memory to buddy system so that we
> + * can save a lot of memory.
> + *

struct page structures (page structs) are used to describe a physical page
frame.  By default, there is a one-to-one mapping from a page frame to
its corresponding page struct.

HugeTLB pages consist of multiple base page size pages and are supported by
many architectures. See hugetlbpage.rst in the Documentation directory for
more details.  On the x86 architecture, HugeTLB pages of size 2MB and 1GB
are currently supported.  Since the base page size on x86 is 4KB, a 2MB
HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
262144 base pages.  For each base page, there is a corresponding page struct.

Within the HugeTLB subsystem, only the first 4 page structs are used to
contain unique information about a HugeTLB page.  HUGETLB_CGROUP_MIN_ORDER
provides this upper limit.  The only 'useful' information in the remaining
page structs is the compound_head field, and this field is the same for all
tail pages.

By removing redundant page structs for HugeTLB pages, memory can be returned
to the buddy allocator for other uses.

> + * When the system boot up, every 2M HugeTLB has 512 struct page structures
> + * which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
> + *
> + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> |     2     |
> + * |           |                     |     3     | -------------> |     3     |
> + * |           |                     |     4     | -------------> |     4     |
> + * |     2M    |                     |     5     | -------------> |     5     |
> + * |           |                     |     6     | -------------> |     6     |
> + * |           |                     |     7     | -------------> |     7     |
> + * |           |                     +-----------+                +-----------+
> + * |           |
> + * |           |
> + * +-----------+
> + *
> + *

I think we want the description before the next diagram.

Reworded description here:

The value of compound_head is the same for all tail pages.  The first page of
page structs (page 0) associated with the HugeTLB page contains the 4 page
structs necessary to describe the HugeTLB.  The only use of the remaining pages
of page structs (page 1 to page 7) is to point to compound_head.  Therefore,
we can remap pages 2 to 7 to page 1.  Only 2 pages of page structs will be used
for each HugeTLB page.  This will allow us to free the remaining 6 pages to 
the buddy allocator.  

Here is how things look after remapping.

> + *
> + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> +-----------+
> + * |           |                     |     3     | -----------------^ ^ ^ ^ ^
> + * |           |                     |     4     | -------------------+ | | |
> + * |     2M    |                     |     5     | ---------------------+ | |
> + * |           |                     |     6     | -----------------------+ |
> + * |           |                     |     7     | -------------------------+
> + * |           |                     +-----------+
> + * |           |
> + * |           |
> + * +-----------+
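
(Editorial aside on why the tail page structs are interchangeable: each tail
page's compound_head stores the head page pointer with bit 0 set, so every
tail page struct of a given HugeTLB page holds the same value. A simplified
userspace model of that encoding, not the kernel's actual definitions:)

#include <stdio.h>

/* Simplified model of struct page's compound_head encoding. */
struct page_like {
	unsigned long compound_head;	/* tails: (head address) | 1 */
};

static void set_compound_head(struct page_like *tail, struct page_like *head)
{
	tail->compound_head = (unsigned long)head + 1;
}

static struct page_like *head_of(struct page_like *p)
{
	if (p->compound_head & 1)
		return (struct page_like *)(p->compound_head - 1);
	return p;	/* head pages are their own head */
}

int main(void)
{
	static struct page_like pages[512];	/* one per base page of a 2MB HugeTLB */
	int i;

	for (i = 1; i < 512; i++)
		set_compound_head(&pages[i], &pages[0]);

	/* All tails hold the identical value, so the pages holding tail
	 * structs are byte-for-byte duplicates and can share one frame. */
	printf("%d\n", pages[1].compound_head == pages[511].compound_head);
	printf("%d\n", head_of(&pages[511]) == &pages[0]);
	return 0;
}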
Muchun Song Nov. 19, 2020, 3 a.m. UTC | #5
On Thu, Nov 19, 2020 at 7:48 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 11/13/20 2:59 AM, Muchun Song wrote:
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > new file mode 100644
> > index 000000000000..a6c9948302e2
> > --- /dev/null
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -0,0 +1,108 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Free some vmemmap pages of HugeTLB
> > + *
> > + * Copyright (c) 2020, Bytedance. All rights reserved.
> > + *
> > + *     Author: Muchun Song <songmuchun@bytedance.com>
> > + *
>
> Oscar has already made some suggestions to change comments.  I would suggest
> changing the below text to something like the following.

Thanks, Mike. I will change the comments below.

>
> > + * Nowadays we track the status of physical page frames using struct page
> > + * structures arranged in one or more arrays. And here exists one-to-one
> > + * mapping between the physical page frame and the corresponding struct page
> > + * structure.
> > + *
> > + * The HugeTLB support is built on top of multiple page size support that
> > + * is provided by most modern architectures. For example, x86 CPUs normally
> > + * support 4K and 2M (1G if architecturally supported) page sizes. Every
> > + * HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> > + * struct page structure and 1G HugeTLB has 4096 struct page structures. But
> > + * in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
> > + * structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
> > + * store metadata associated with each HugeTLB. The rest of the struct page
> > + * structures are usually read the compound_head field which are all the same
> > + * value. If we can free some struct page memory to buddy system so that we
> > + * can save a lot of memory.
> > + *
>
> struct page structures (page structs) are used to describe a physical page
> frame.  By default, there is a one-to-one mapping from a page frame to
> its corresponding page struct.
>
> HugeTLB pages consist of multiple base page size pages and are supported by
> many architectures. See hugetlbpage.rst in the Documentation directory for
> more details.  On the x86 architecture, HugeTLB pages of size 2MB and 1GB
> are currently supported.  Since the base page size on x86 is 4KB, a 2MB
> HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
> 262144 base pages.  For each base page, there is a corresponding page struct.
>
> Within the HugeTLB subsystem, only the first 4 page structs are used to
> contain unique information about a HugeTLB page.  HUGETLB_CGROUP_MIN_ORDER
> provides this upper limit.  The only 'useful' information in the remaining
> page structs is the compound_head field, and this field is the same for all
> tail pages.
>
> By removing redundant page structs for HugeTLB pages, memory can be returned
> to the buddy allocator for other uses.
>
> > + * When the system boot up, every 2M HugeTLB has 512 struct page structures
> > + * which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
> > + *
> > + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > + * |           |                     |     0     | -------------> |     0     |
> > + * |           |                     |     1     | -------------> |     1     |
> > + * |           |                     |     2     | -------------> |     2     |
> > + * |           |                     |     3     | -------------> |     3     |
> > + * |           |                     |     4     | -------------> |     4     |
> > + * |     2M    |                     |     5     | -------------> |     5     |
> > + * |           |                     |     6     | -------------> |     6     |
> > + * |           |                     |     7     | -------------> |     7     |
> > + * |           |                     +-----------+                +-----------+
> > + * |           |
> > + * |           |
> > + * +-----------+
> > + *
> > + *
>
> I think we want the description before the next diagram.
>
> Reworded description here:
>
> The value of compound_head is the same for all tail pages.  The first page of
> page structs (page 0) associated with the HugeTLB page contains the 4 page
> structs necessary to describe the HugeTLB.  The only use of the remaining pages
> of page structs (page 1 to page 7) is to point to compound_head.  Therefore,
> we can remap pages 2 to 7 to page 1.  Only 2 pages of page structs will be used
> for each HugeTLB page.  This will allow us to free the remaining 6 pages to
> the buddy allocator.
>
> Here is how things look after remapping.
>
> > + *
> > + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > + * |           |                     |     0     | -------------> |     0     |
> > + * |           |                     |     1     | -------------> |     1     |
> > + * |           |                     |     2     | -------------> +-----------+
> > + * |           |                     |     3     | -----------------^ ^ ^ ^ ^
> > + * |           |                     |     4     | -------------------+ | | |
> > + * |     2M    |                     |     5     | ---------------------+ | |
> > + * |           |                     |     6     | -----------------------+ |
> > + * |           |                     |     7     | -------------------------+
> > + * |           |                     +-----------+
> > + * |           |
> > + * |           |
> > + * +-----------+
>
> --
> Mike Kravetz

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d5cc5f802dd4..eed3dd3bd626 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -492,6 +492,9 @@  struct hstate {
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+	unsigned int nr_free_vmemmap_pages;
+#endif
 #ifdef CONFIG_CGROUP_HUGETLB
 	/* cgroup control files */
 	struct cftype cgroup_files_dfl[7];
diff --git a/mm/Makefile b/mm/Makefile
index 752111587c99..2a734576bbc0 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -71,6 +71,7 @@  obj-$(CONFIG_FRONTSWAP)	+= frontswap.o
 obj-$(CONFIG_ZSWAP)	+= zswap.o
 obj-$(CONFIG_HAS_DMA)	+= dmapool.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
+obj-$(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)	+= hugetlb_vmemmap.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)	+= sparse.o
 obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 81a41aa080a5..f88032c24667 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -42,6 +42,7 @@ 
 #include <linux/userfaultfd_k.h>
 #include <linux/page_owner.h>
 #include "internal.h"
+#include "hugetlb_vmemmap.h"
 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
@@ -3285,6 +3286,8 @@  void __init hugetlb_add_hstate(unsigned int order)
 	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
 					huge_page_size(h)/1024);
 
+	hugetlb_vmemmap_init(h);
+
 	parsed_hstate = h;
 }
 
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
new file mode 100644
index 000000000000..a6c9948302e2
--- /dev/null
+++ b/mm/hugetlb_vmemmap.c
@@ -0,0 +1,108 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Free some vmemmap pages of HugeTLB
+ *
+ * Copyright (c) 2020, Bytedance. All rights reserved.
+ *
+ *     Author: Muchun Song <songmuchun@bytedance.com>
+ *
+ * Nowadays we track the status of physical page frames using struct page
+ * structures arranged in one or more arrays. And here exists one-to-one
+ * mapping between the physical page frame and the corresponding struct page
+ * structure.
+ *
+ * The HugeTLB support is built on top of multiple page size support that
+ * is provided by most modern architectures. For example, x86 CPUs normally
+ * support 4K and 2M (1G if architecturally supported) page sizes. Every
+ * HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
+ * struct page structure and 1G HugeTLB has 4096 struct page structures. But
+ * in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
+ * structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
+ * store metadata associated with each HugeTLB. The rest of the struct page
+ * structures are usually read the compound_head field which are all the same
+ * value. If we can free some struct page memory to buddy system so that we
+ * can save a lot of memory.
+ *
+ * When the system boot up, every 2M HugeTLB has 512 struct page structures
+ * which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
+ *
+ *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+ * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
+ * |           |                     |     0     | -------------> |     0     |
+ * |           |                     |     1     | -------------> |     1     |
+ * |           |                     |     2     | -------------> |     2     |
+ * |           |                     |     3     | -------------> |     3     |
+ * |           |                     |     4     | -------------> |     4     |
+ * |     2M    |                     |     5     | -------------> |     5     |
+ * |           |                     |     6     | -------------> |     6     |
+ * |           |                     |     7     | -------------> |     7     |
+ * |           |                     +-----------+                +-----------+
+ * |           |
+ * |           |
+ * +-----------+
+ *
+ *
+ * When a HugeTLB is preallocated, we can change the mapping from above to
+ * below.
+ *
+ *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
+ * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
+ * |           |                     |     0     | -------------> |     0     |
+ * |           |                     |     1     | -------------> |     1     |
+ * |           |                     |     2     | -------------> +-----------+
+ * |           |                     |     3     | -----------------^ ^ ^ ^ ^
+ * |           |                     |     4     | -------------------+ | | |
+ * |     2M    |                     |     5     | ---------------------+ | |
+ * |           |                     |     6     | -----------------------+ |
+ * |           |                     |     7     | -------------------------+
+ * |           |                     +-----------+
+ * |           |
+ * |           |
+ * +-----------+
+ *
+ * For tail pages, the value of compound_head is the same. So we can reuse
+ * first page of tail page structures. We map the virtual addresses of the
+ * remaining 6 pages of tail page structures to the first tail page struct,
+ * and then free these 6 page frames. Therefore, we need to reserve at least 2
+ * pages as vmemmap areas.
+ *
+ * When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
+ * vmemmap pages and restore the previous mapping relationship.
+ */
+#define pr_fmt(fmt)	"HugeTLB Vmemmap: " fmt
+
+#include "hugetlb_vmemmap.h"
+
+/*
+ * There are 512 struct page structures(8 pages) associated with each 2MB
+ * hugetlb page. For tail pages, the value of compound_head is the same.
+ * So we can reuse first page of tail page structures. We map the virtual
+ * addresses of the remaining 6 pages of tail page structures to the first
+ * tail page struct, and then free these 6 pages. Therefore, we need to
+ * reserve at least 2 pages as vmemmap areas.
+ */
+#define RESERVE_VMEMMAP_NR		2U
+
+void __init hugetlb_vmemmap_init(struct hstate *h)
+{
+	unsigned int order = huge_page_order(h);
+	unsigned int vmemmap_pages;
+
+	vmemmap_pages = ((1 << order) * sizeof(struct page)) >> PAGE_SHIFT;
+	/*
+	 * The head page and the first tail page are not to be freed to buddy
+	 * system, the others page will map to the first tail page. So there
+	 * are (@vmemmap_pages - RESERVE_VMEMMAP_NR) pages can be freed.
+	 *
+	 * Could RESERVE_VMEMMAP_NR be greater than @vmemmap_pages? This is
+	 * not expected to happen unless the system is corrupted. So on the
+	 * safe side, it is only a safety net.
+	 */
+	if (likely(vmemmap_pages > RESERVE_VMEMMAP_NR))
+		h->nr_free_vmemmap_pages = vmemmap_pages - RESERVE_VMEMMAP_NR;
+	else
+		h->nr_free_vmemmap_pages = 0;
+
+	pr_debug("can free %d vmemmap pages for %s\n", h->nr_free_vmemmap_pages,
+		 h->name);
+}
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
new file mode 100644
index 000000000000..40c0c7dfb60d
--- /dev/null
+++ b/mm/hugetlb_vmemmap.h
@@ -0,0 +1,20 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Free some vmemmap pages of HugeTLB
+ *
+ * Copyright (c) 2020, Bytedance. All rights reserved.
+ *
+ *     Author: Muchun Song <songmuchun@bytedance.com>
+ */
+#ifndef _LINUX_HUGETLB_VMEMMAP_H
+#define _LINUX_HUGETLB_VMEMMAP_H
+#include <linux/hugetlb.h>
+
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+void __init hugetlb_vmemmap_init(struct hstate *h);
+#else
+static inline void hugetlb_vmemmap_init(struct hstate *h)
+{
+}
+#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */
+#endif /* _LINUX_HUGETLB_VMEMMAP_H */
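
(Editorial note on the header above: the config-gated static inline stub is
what lets hugetlb_add_hstate() call hugetlb_vmemmap_init() unconditionally,
with no #ifdef at the call site; when the option is off, the empty inline
compiles away. A minimal standalone sketch of the idiom, using made-up names:)

#include <stdio.h>

struct ctx { int ready; };

#ifdef CONFIG_FEATURE_X
static void feature_x_init(struct ctx *c)	/* real implementation */
{
	c->ready = 1;
}
#else
static inline void feature_x_init(struct ctx *c)	/* compiles to nothing */
{
	(void)c;
}
#endif

int main(void)
{
	struct ctx c = { 0 };

	feature_x_init(&c);	/* call site is identical either way */
	printf("%d\n", c.ready);
	return 0;
}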