Message ID | 20210225132130.26451-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
Series | Free some vmemmap pages of HugeTLB page |
On 26/2/21 12:21 am, Muchun Song wrote:
> Hi all,
>
> This patch series will free some vmemmap pages (struct page structures)
> associated with each hugetlbpage when preallocated to save memory.
>
> To make the first version easier to review, from this version on we
> disable PMD/huge page mapping of vmemmap when this feature is enabled.
> This actually eliminates a bunch of the complex code doing page table
> manipulation. Once this patch series is solid, we can add the vmemmap
> page table manipulation code in the future.
>
> The struct page structures (page structs) are used to describe a physical
> page frame. By default, there is a one-to-one mapping from a page frame to
> its corresponding page struct.
>
> HugeTLB pages consist of multiple base page size pages and are supported
> by many architectures. See hugetlbpage.rst in the Documentation directory
> for more details. On the x86 architecture, HugeTLB pages of size 2MB and 1GB
> are currently supported. Since the base page size on x86 is 4KB, a 2MB
> HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
> 262144 base pages. For each base page, there is a corresponding page struct.
>
> Within the HugeTLB subsystem, only the first 4 page structs are used to
> contain unique information about a HugeTLB page. HUGETLB_CGROUP_MIN_ORDER
> provides this upper limit. The only 'useful' information in the remaining
> page structs is the compound_head field, and this field is the same for all
> tail pages.

HUGETLB_CGROUP_MIN_ORDER only applies when CGROUP_HUGETLB is enabled, but I
guess that does not matter.

>
> By removing redundant page structs for HugeTLB pages, memory can be returned
> to the buddy allocator for other uses.
>
> When the system boots up, every 2MB HugeTLB page has 512 struct page structs,
> which occupy 8 pages (sizeof(struct page) * 512 / PAGE_SIZE).
>
>    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> |           |                     |     0     | -------------> |     0     |
> |           |                     +-----------+                +-----------+
> |           |                     |     1     | -------------> |     1     |
> |           |                     +-----------+                +-----------+
> |           |                     |     2     | -------------> |     2     |
> |           |                     +-----------+                +-----------+
> |           |                     |     3     | -------------> |     3     |
> |           |                     +-----------+                +-----------+
> |           |                     |     4     | -------------> |     4     |
> |    2MB    |                     +-----------+                +-----------+
> |           |                     |     5     | -------------> |     5     |
> |           |                     +-----------+                +-----------+
> |           |                     |     6     | -------------> |     6     |
> |           |                     +-----------+                +-----------+
> |           |                     |     7     | -------------> |     7     |
> |           |                     +-----------+                +-----------+
> |           |
> |           |
> |           |
> +-----------+
>
> The value of page->compound_head is the same for all tail pages. The first
> page of page structs (page 0) associated with the HugeTLB page contains the 4
> page structs necessary to describe the HugeTLB. The only use of the remaining
> pages of page structs (page 1 to page 7) is to point to page->compound_head.
> Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
> will be used for each HugeTLB page. This will allow us to free the remaining
> 6 pages to the buddy allocator.

What is page 1 used for? Page 0 carries the 4 struct pages needed; does
compound_head need a full page? IOW, why do we need two full pages? Maybe the
patches have the answer to something I am missing.

>
> Here is how things look after remapping.
>
>    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> |           |                     |     0     | -------------> |     0     |
> |           |                     +-----------+                +-----------+
> |           |                     |     1     | -------------> |     1     |
> |           |                     +-----------+                +-----------+
> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> |           |                     +-----------+                   | | | | |
> |           |                     |     3     | ------------------+ | | | |
> |           |                     +-----------+                     | | | |
> |           |                     |     4     | --------------------+ | | |
> |    2MB    |                     +-----------+                       | | |
> |           |                     |     5     | ----------------------+ | |
> |           |                     +-----------+                         | |
> |           |                     |     6     | ------------------------+ |
> |           |                     +-----------+                           |
> |           |                     |     7     | --------------------------+
> |           |                     +-----------+
> |           |
> |           |
> |           |
> +-----------+
>
> When a HugeTLB page is freed to the buddy system, we should allocate 6 pages
> for vmemmap pages and restore the previous mapping relationship.
>

Can these 6 pages come from the HugeTLB page itself? When you say 6 pages,
I presume you mean 6 pages of PAGE_SIZE.

> Apart from the 2MB HugeTLB page, we also have the 1GB HugeTLB page. It is
> similar to the 2MB HugeTLB page, and we can use the same approach to free
> its vmemmap pages.
>
> In this case, for the 1GB HugeTLB page, we can save 4094 pages. This is a
> very substantial gain. On our server, we run some SPDK/QEMU applications
> which use 1024GB of hugetlbpage. With this feature enabled, we can save
> ~16GB (1G hugepage)/~12GB (2MB hugepage) memory.

Thanks,
Balbir Singh
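The page counts quoted in the cover letter can be checked with a little arithmetic. The sketch below is not taken from the patch series: it assumes 4 KiB base pages and a 64-byte struct page (the usual x86-64 values, which is what makes 512 page structs fill 8 pages), and the function names are made up for illustration.

```python
# Assumptions (not stated explicitly in the thread): 4 KiB base pages and a
# 64-byte struct page, the usual values on x86-64.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64
KEPT_VMEMMAP_PAGES = 2  # vmemmap pages 0 and 1 stay mapped in this series

MB = 1 << 20
GB = 1 << 30


def vmemmap_pages(hugepage_size):
    """Number of vmemmap pages holding the page structs of one huge page."""
    base_pages = hugepage_size // PAGE_SIZE
    return base_pages * STRUCT_PAGE_SIZE // PAGE_SIZE


def freed_vmemmap_pages(hugepage_size):
    """vmemmap pages that can go back to the buddy allocator per huge page."""
    return vmemmap_pages(hugepage_size) - KEPT_VMEMMAP_PAGES


def total_saved_bytes(pool_size, hugepage_size):
    """Memory saved for a HugeTLB pool of pool_size bytes."""
    huge_pages = pool_size // hugepage_size
    return huge_pages * freed_vmemmap_pages(hugepage_size) * PAGE_SIZE


saved_2mb = total_saved_bytes(1024 * GB, 2 * MB)
saved_1gb = total_saved_bytes(1024 * GB, 1 * GB)
print(freed_vmemmap_pages(2 * MB), freed_vmemmap_pages(1 * GB))
print(saved_2mb / GB, saved_1gb / GB)
```

Under these assumptions a 1024GB pool saves exactly 12 GiB with 2MB pages and just under 16 GiB with 1GB pages, which agrees with the ~12GB and ~16GB figures in the cover letter.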
On Thu, Mar 4, 2021 at 11:14 AM Singh, Balbir <bsingharora@gmail.com> wrote:
>
> On 26/2/21 12:21 am, Muchun Song wrote:
> > Hi all,
> >
> > This patch series will free some vmemmap pages (struct page structures)
> > associated with each hugetlbpage when preallocated to save memory.
> >
> > To make the first version easier to review, from this version on we
> > disable PMD/huge page mapping of vmemmap when this feature is enabled.
> > This actually eliminates a bunch of the complex code doing page table
> > manipulation. Once this patch series is solid, we can add the vmemmap
> > page table manipulation code in the future.
> >
> > The struct page structures (page structs) are used to describe a physical
> > page frame. By default, there is a one-to-one mapping from a page frame to
> > its corresponding page struct.
> >
> > HugeTLB pages consist of multiple base page size pages and are supported
> > by many architectures. See hugetlbpage.rst in the Documentation directory
> > for more details. On the x86 architecture, HugeTLB pages of size 2MB and 1GB
> > are currently supported. Since the base page size on x86 is 4KB, a 2MB
> > HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
> > 262144 base pages. For each base page, there is a corresponding page struct.
> >
> > Within the HugeTLB subsystem, only the first 4 page structs are used to
> > contain unique information about a HugeTLB page. HUGETLB_CGROUP_MIN_ORDER
> > provides this upper limit. The only 'useful' information in the remaining
> > page structs is the compound_head field, and this field is the same for all
> > tail pages.
>
> HUGETLB_CGROUP_MIN_ORDER only applies when CGROUP_HUGETLB is enabled, but I
> guess that does not matter.

Agree.

>
> >
> > By removing redundant page structs for HugeTLB pages, memory can be returned
> > to the buddy allocator for other uses.
> >
> > When the system boots up, every 2MB HugeTLB page has 512 struct page structs,
> > which occupy 8 pages (sizeof(struct page) * 512 / PAGE_SIZE).
> >
> >    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > |           |                     |     0     | -------------> |     0     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     1     | -------------> |     1     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     2     | -------------> |     2     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     3     | -------------> |     3     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     4     | -------------> |     4     |
> > |    2MB    |                     +-----------+                +-----------+
> > |           |                     |     5     | -------------> |     5     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     6     | -------------> |     6     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     7     | -------------> |     7     |
> > |           |                     +-----------+                +-----------+
> > |           |
> > |           |
> > |           |
> > +-----------+
> >
> > The value of page->compound_head is the same for all tail pages. The first
> > page of page structs (page 0) associated with the HugeTLB page contains the 4
> > page structs necessary to describe the HugeTLB. The only use of the remaining
> > pages of page structs (page 1 to page 7) is to point to page->compound_head.
> > Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
> > will be used for each HugeTLB page. This will allow us to free the remaining
> > 6 pages to the buddy allocator.
>
> What is page 1 used for? Page 0 carries the 4 struct pages needed; does
> compound_head need a full page? IOW, why do we need two full pages? Maybe the
> patches have the answer to something I am missing.

Yeah. We really can free 7 pages, but that needs some extra work. Why?

Currently, for a 2MB HugeTLB page, we only free 6 vmemmap pages. If we freed
7 vmemmap pages, 8 of the 512 struct page structures would appear to have the
PG_head flag set. If we adjust compound_head() slightly so that it returns the
real head struct page when the parameter is a tail struct page with the
PG_head flag set, freeing the seventh page becomes possible.

To keep the code evolution route clear, this feature can be sent out as a
separate patch after this patchset is solid and applied.

> >
> > Here is how things look after remapping.
> >
> >    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > |           |                     |     0     | -------------> |     0     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     1     | -------------> |     1     |
> > |           |                     +-----------+                +-----------+
> > |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> > |           |                     +-----------+                   | | | | |
> > |           |                     |     3     | ------------------+ | | | |
> > |           |                     +-----------+                     | | | |
> > |           |                     |     4     | --------------------+ | | |
> > |    2MB    |                     +-----------+                       | | |
> > |           |                     |     5     | ----------------------+ | |
> > |           |                     +-----------+                         | |
> > |           |                     |     6     | ------------------------+ |
> > |           |                     +-----------+                           |
> > |           |                     |     7     | --------------------------+
> > |           |                     +-----------+
> > |           |
> > |           |
> > |           |
> > +-----------+
> >
> > When a HugeTLB page is freed to the buddy system, we should allocate 6 pages
> > for vmemmap pages and restore the previous mapping relationship.
> >
>
> Can these 6 pages come from the HugeTLB page itself? When you say 6 pages,
> I presume you mean 6 pages of PAGE_SIZE.

There was a decent discussion about this in a previous version of the
series starting here:

https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/

In this thread various other options were suggested and discussed.

Thanks.

> > Apart from the 2MB HugeTLB page, we also have the 1GB HugeTLB page. It is
> > similar to the 2MB HugeTLB page, and we can use the same approach to free
> > its vmemmap pages.
> >
> > In this case, for the 1GB HugeTLB page, we can save 4094 pages. This is a
> > very substantial gain. On our server, we run some SPDK/QEMU applications
> > which use 1024GB of hugetlbpage. With this feature enabled, we can save
> > ~16GB (1G hugepage)/~12GB (2MB hugepage) memory.
>
> Thanks,
> Balbir Singh
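The PG_head remark in the reply above can be illustrated with a toy model of the vmemmap mapping. This is only a sketch, not kernel code: it assumes 64 page structs per vmemmap page, represents each page struct as a small dictionary, and all names in it are hypothetical.

```python
STRUCTS_PER_PAGE = 64  # assumption: 4096-byte page / 64-byte struct page
VMEMMAP_PAGES = 8      # vmemmap pages backing one 2MB HugeTLB page


def build_frames():
    """Build 8 physical frames of 64 modelled struct pages each."""
    frames = [
        [{"PG_head": False} for _ in range(STRUCTS_PER_PAGE)]
        for _ in range(VMEMMAP_PAGES)
    ]
    frames[0][0]["PG_head"] = True  # only the first struct page is the head
    return frames


def read_struct_page(frames, mapping, index):
    """Read struct page `index` through the virtual-page -> frame mapping."""
    frame = mapping[index // STRUCTS_PER_PAGE]
    return frames[frame][index % STRUCTS_PER_PAGE]


def count_pg_head(frames, mapping):
    """Count how many of the 512 struct pages appear to have PG_head set."""
    total = VMEMMAP_PAGES * STRUCTS_PER_PAGE
    return sum(
        1 for i in range(total) if read_struct_page(frames, mapping, i)["PG_head"]
    )


frames = build_frames()
identity = list(range(VMEMMAP_PAGES))  # before remapping
free_six = [0, 1, 1, 1, 1, 1, 1, 1]    # this series: pages 2..7 -> frame 1
free_seven = [0] * VMEMMAP_PAGES       # hypothetical: pages 1..7 -> frame 0

heads_before = count_pg_head(frames, identity)
heads_free_six = count_pg_head(frames, free_six)
heads_free_seven = count_pg_head(frames, free_seven)
print(heads_before, heads_free_six, heads_free_seven)
```

In this model, remapping pages 2 to 7 onto page 1 still leaves exactly one apparent head, while remapping everything onto page 0 makes the first struct page of every vmemmap page alias the real head. That is the situation a modified compound_head() would have to detect.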
On Thu, Mar 04, 2021 at 11:36:44AM +0800, Muchun Song wrote:
> On Thu, Mar 4, 2021 at 11:14 AM Singh, Balbir <bsingharora@gmail.com> wrote:
> >
> > On 26/2/21 12:21 am, Muchun Song wrote:
> > > Hi all,
> > >
> > > This patch series will free some vmemmap pages (struct page structures)
> > > associated with each hugetlbpage when preallocated to save memory.
> > >
> > > To make the first version easier to review, from this version on we
> > > disable PMD/huge page mapping of vmemmap when this feature is enabled.
> > > This actually eliminates a bunch of the complex code doing page table
> > > manipulation. Once this patch series is solid, we can add the vmemmap
> > > page table manipulation code in the future.
> > >
> > > The struct page structures (page structs) are used to describe a physical
> > > page frame. By default, there is a one-to-one mapping from a page frame to
> > > its corresponding page struct.
> > >
> > > HugeTLB pages consist of multiple base page size pages and are supported
> > > by many architectures. See hugetlbpage.rst in the Documentation directory
> > > for more details. On the x86 architecture, HugeTLB pages of size 2MB and 1GB
> > > are currently supported. Since the base page size on x86 is 4KB, a 2MB
> > > HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
> > > 262144 base pages. For each base page, there is a corresponding page struct.
> > >
> > > Within the HugeTLB subsystem, only the first 4 page structs are used to
> > > contain unique information about a HugeTLB page. HUGETLB_CGROUP_MIN_ORDER
> > > provides this upper limit. The only 'useful' information in the remaining
> > > page structs is the compound_head field, and this field is the same for all
> > > tail pages.
> >
> > HUGETLB_CGROUP_MIN_ORDER only applies when CGROUP_HUGETLB is enabled, but I
> > guess that does not matter.
>
> Agree.
>
> >
> > >
> > > By removing redundant page structs for HugeTLB pages, memory can be returned
> > > to the buddy allocator for other uses.
> > >
> > > When the system boots up, every 2MB HugeTLB page has 512 struct page structs,
> > > which occupy 8 pages (sizeof(struct page) * 512 / PAGE_SIZE).
> > >
> > >    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > > +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > > |           |                     |     0     | -------------> |     0     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     1     | -------------> |     1     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     2     | -------------> |     2     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     3     | -------------> |     3     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     4     | -------------> |     4     |
> > > |    2MB    |                     +-----------+                +-----------+
> > > |           |                     |     5     | -------------> |     5     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     6     | -------------> |     6     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     7     | -------------> |     7     |
> > > |           |                     +-----------+                +-----------+
> > > |           |
> > > |           |
> > > |           |
> > > +-----------+
> > >
> > > The value of page->compound_head is the same for all tail pages. The first
> > > page of page structs (page 0) associated with the HugeTLB page contains the 4
> > > page structs necessary to describe the HugeTLB. The only use of the remaining
> > > pages of page structs (page 1 to page 7) is to point to page->compound_head.
> > > Therefore, we can remap pages 2 to 7 to page 1. Only 2 pages of page structs
> > > will be used for each HugeTLB page. This will allow us to free the remaining
> > > 6 pages to the buddy allocator.
> >
> > What is page 1 used for? Page 0 carries the 4 struct pages needed; does
> > compound_head need a full page? IOW, why do we need two full pages? Maybe the
> > patches have the answer to something I am missing.
>
> Yeah. We really can free 7 pages, but that needs some extra work. Why?
>
> Currently, for a 2MB HugeTLB page, we only free 6 vmemmap pages. If we freed
> 7 vmemmap pages, 8 of the 512 struct page structures would appear to have the
> PG_head flag set. If we adjust compound_head() slightly so that it returns the
> real head struct page when the parameter is a tail struct page with the
> PG_head flag set, freeing the seventh page becomes possible.
>
> To keep the code evolution route clear, this feature can be sent out as a
> separate patch after this patchset is solid and applied.
>

Makes sense!

> > >
> > > Here is how things look after remapping.
> > >
> > >    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> > > +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> > > |           |                     |     0     | -------------> |     0     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     1     | -------------> |     1     |
> > > |           |                     +-----------+                +-----------+
> > > |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
> > > |           |                     +-----------+                   | | | | |
> > > |           |                     |     3     | ------------------+ | | | |
> > > |           |                     +-----------+                     | | | |
> > > |           |                     |     4     | --------------------+ | | |
> > > |    2MB    |                     +-----------+                       | | |
> > > |           |                     |     5     | ----------------------+ | |
> > > |           |                     +-----------+                         | |
> > > |           |                     |     6     | ------------------------+ |
> > > |           |                     +-----------+                           |
> > > |           |                     |     7     | --------------------------+
> > > |           |                     +-----------+
> > > |           |
> > > |           |
> > > |           |
> > > +-----------+
> > >
> > > When a HugeTLB page is freed to the buddy system, we should allocate 6 pages
> > > for vmemmap pages and restore the previous mapping relationship.
> > >
> >
> > Can these 6 pages come from the HugeTLB page itself? When you say 6 pages,
> > I presume you mean 6 pages of PAGE_SIZE.
>
> There was a decent discussion about this in a previous version of the
> series starting here:
>
> https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
>
> In this thread various other options were suggested and discussed.
>

Thanks,
Balbir Singh
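The free-and-restore cycle described in the cover letter (remap pages 2 to 7 onto page 1, then allocate 6 pages and restore the mapping when the HugeTLB page is freed) can be modelled in a few lines. This is a hedged sketch with hypothetical names, not the series' implementation. In particular, filling the newly allocated pages by copying the shared tail page is an assumption made here so the model is complete, and only the compound_head field of each page struct is modelled.

```python
STRUCTS_PER_PAGE = 64  # assumption: 4096-byte page / 64-byte struct page
VMEMMAP_PAGES = 8      # vmemmap pages backing one 2MB HugeTLB page
KEPT_PAGES = 2         # pages 0 and 1 stay mapped to their own frames


class VmemmapModel:
    """Toy model: virtual vmemmap page number -> physical frame id."""

    def __init__(self):
        # frame id -> modelled struct pages (only compound_head is kept)
        self.frames = {
            f: [{"compound_head": 0} for _ in range(STRUCTS_PER_PAGE)]
            for f in range(VMEMMAP_PAGES)
        }
        self.mapping = list(range(VMEMMAP_PAGES))

    def free_tail_pages(self):
        """Remap pages 2..7 onto page 1's frame and return the freed frames."""
        shared = self.mapping[KEPT_PAGES - 1]
        freed = []
        for vpage in range(KEPT_PAGES, VMEMMAP_PAGES):
            freed.append(self.mapping[vpage])
            del self.frames[self.mapping[vpage]]
            self.mapping[vpage] = shared
        return freed

    def restore(self, new_frames):
        """Give pages 2..7 their own frames again (6 frames required)."""
        needed = VMEMMAP_PAGES - KEPT_PAGES
        if len(new_frames) != needed:
            raise ValueError(f"need {needed} frames, got {len(new_frames)}")
        shared = self.mapping[KEPT_PAGES - 1]
        for vpage, frame in zip(range(KEPT_PAGES, VMEMMAP_PAGES), new_frames):
            # Assumption: new pages are filled from the shared tail page.
            self.frames[frame] = [dict(sp) for sp in self.frames[shared]]
            self.mapping[vpage] = frame


model = VmemmapModel()
freed = model.free_tail_pages()
mapping_after_free = list(model.mapping)
model.restore([100, 101, 102, 103, 104, 105])
mapping_after_restore = list(model.mapping)
print(freed, mapping_after_free, mapping_after_restore)
```

The frame ids 100 to 105 simply stand in for whatever pages the allocator hands back; where those pages should come from is the question raised in the thread and is not settled by this model.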