Message ID | 20201113105952.11638-6-songmuchun@bytedance.com (mailing list archive)
---|---
State | New, archived
Series | Free some vmemmap pages of hugetlb page
On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> +#define page_huge_pte(page)	((page)->pmd_huge_pte)

Seems you do not need this one anymore.

> +void vmemmap_pgtable_free(struct page *page)
> +{
> +	struct page *pte_page, *t_page;
> +
> +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
> +		list_del(&pte_page->lru);
> +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
> +	}
> +}
> +
> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> +{
> +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> +
> +	/* Store preallocated pages on huge page lru list */
> +	INIT_LIST_HEAD(&page->lru);
> +
> +	while (nr--) {
> +		pte_t *pte_p;
> +
> +		pte_p = pte_alloc_one_kernel(&init_mm);
> +		if (!pte_p)
> +			goto out;
> +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> +	}

Definitely this looks better and easier to handle.
Btw, did you explore Matthew's hint about instead of allocating a new page,
using one of the ones you are going to free to store the ptes?
I am not sure whether it is feasible at all though.

> --- a/mm/hugetlb_vmemmap.h
> +++ b/mm/hugetlb_vmemmap.h
> @@ -9,12 +9,24 @@
>  #ifndef _LINUX_HUGETLB_VMEMMAP_H
>  #define _LINUX_HUGETLB_VMEMMAP_H
>  #include <linux/hugetlb.h>
> +#include <linux/mm.h>

Why do we need this here?
On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> > +#define page_huge_pte(page)	((page)->pmd_huge_pte)

Yeah, I forgot to remove it. Thanks.

> Seems you do not need this one anymore.
>
> > +void vmemmap_pgtable_free(struct page *page)
> > +{
> > +	struct page *pte_page, *t_page;
> > +
> > +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
> > +		list_del(&pte_page->lru);
> > +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
> > +	}
> > +}
> > +
> > +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> > +{
> > +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> > +
> > +	/* Store preallocated pages on huge page lru list */
> > +	INIT_LIST_HEAD(&page->lru);
> > +
> > +	while (nr--) {
> > +		pte_t *pte_p;
> > +
> > +		pte_p = pte_alloc_one_kernel(&init_mm);
> > +		if (!pte_p)
> > +			goto out;
> > +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> > +	}
>
> Definitely this looks better and easier to handle.
> Btw, did you explore Matthew's hint about instead of allocating a new page,
> using one of the ones you are going to free to store the ptes?

Oh, sorry for missing his reply. It is a good idea. I will start an
investigation. Thanks for reminding me.

> I am not sure whether it is feasible at all though.
>
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -9,12 +9,24 @@
> >  #ifndef _LINUX_HUGETLB_VMEMMAP_H
> >  #define _LINUX_HUGETLB_VMEMMAP_H
> >  #include <linux/hugetlb.h>
> > +#include <linux/mm.h>
>
> Why do we need this here?

Yeah, that can also be removed :).

> --
> Oscar Salvador
> SUSE L3

--
Yours,
Muchun
On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> > +#define page_huge_pte(page)	((page)->pmd_huge_pte)
>
> Seems you do not need this one anymore.
>
> > +void vmemmap_pgtable_free(struct page *page)
> > +{
> > +	struct page *pte_page, *t_page;
> > +
> > +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
> > +		list_del(&pte_page->lru);
> > +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
> > +	}
> > +}
> > +
> > +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> > +{
> > +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> > +
> > +	/* Store preallocated pages on huge page lru list */
> > +	INIT_LIST_HEAD(&page->lru);
> > +
> > +	while (nr--) {
> > +		pte_t *pte_p;
> > +
> > +		pte_p = pte_alloc_one_kernel(&init_mm);
> > +		if (!pte_p)
> > +			goto out;
> > +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> > +	}
>
> Definitely this looks better and easier to handle.
> Btw, did you explore Matthew's hint about instead of allocating a new page,
> using one of the ones you are going to free to store the ptes?
> I am not sure whether it is feasible at all though.

Hi Oscar and Matthew,

I have investigated this and I think it is not feasible. If we reuse a
vmemmap page frame as a page table, then when we split the PMD we first
have to fill that page frame with 512 pte entries. At that point anyone
reading a tail struct page of the HugeTLB page could see arbitrary
values (I am not sure this actually happens in practice, but the memory
compaction code might do such reads). So, to be on the safe side, I
think allocating a new page is the better choice.

Thanks.

>
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -9,12 +9,24 @@
> >  #ifndef _LINUX_HUGETLB_VMEMMAP_H
> >  #define _LINUX_HUGETLB_VMEMMAP_H
> >  #include <linux/hugetlb.h>
> > +#include <linux/mm.h>
>
> Why do we need this here?
>
> --
> Oscar Salvador
> SUSE L3
On 11/18/20 10:17 PM, Muchun Song wrote:
> On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador <osalvador@suse.de> wrote:
>>
>> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
>>> +#define page_huge_pte(page)	((page)->pmd_huge_pte)
>>
>> Seems you do not need this one anymore.
>>
>>> +void vmemmap_pgtable_free(struct page *page)
>>> +{
>>> +	struct page *pte_page, *t_page;
>>> +
>>> +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
>>> +		list_del(&pte_page->lru);
>>> +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
>>> +	}
>>> +}
>>> +
>>> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
>>> +{
>>> +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
>>> +
>>> +	/* Store preallocated pages on huge page lru list */
>>> +	INIT_LIST_HEAD(&page->lru);
>>> +
>>> +	while (nr--) {
>>> +		pte_t *pte_p;
>>> +
>>> +		pte_p = pte_alloc_one_kernel(&init_mm);
>>> +		if (!pte_p)
>>> +			goto out;
>>> +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
>>> +	}
>>
>> Definitely this looks better and easier to handle.
>> Btw, did you explore Matthew's hint about instead of allocating a new page,
>> using one of the ones you are going to free to store the ptes?
>> I am not sure whether it is feasible at all though.
>
> Hi Oscar and Matthew,
>
> I have investigated this and I think it is not feasible. If we reuse a
> vmemmap page frame as a page table, then when we split the PMD we first
> have to fill that page frame with 512 pte entries. At that point anyone
> reading a tail struct page of the HugeTLB page could see arbitrary
> values (I am not sure this actually happens in practice, but the memory
> compaction code might do such reads). So, to be on the safe side, I
> think allocating a new page is the better choice.

Thanks for looking into this.

If I understand correctly, the issue is that you need the pte page to set
up the new mappings. In your current code, this is done before removing
the pages of struct pages. This keeps everything 'consistent' as things
are remapped.

If you want to use one of the 'pages of struct pages' for the new pte
page, then there will be a period of time when things are inconsistent.
Before setting up the mapping, some code could potentially access those
pages of struct pages.

I tend to agree that allocating a new page is the safest thing to do
here. Or, perhaps someone can think of a way to make this safe.
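To make the ordering Mike describes concrete, below is a rough sketch of how a vmemmap PMD split stays consistent: fill a freshly allocated pte page while the old PMD mapping is still live, and only then install it. This is not the code from this series; the helper name split_vmemmap_pmd and the PAGE_KERNEL protection are illustrative assumptions.

        /* Illustrative only -- not the code from this series. */
        static void split_vmemmap_pmd(pmd_t *pmd, unsigned long start, pte_t *pgtable)
        {
                unsigned long addr = start;
                unsigned long pfn = pmd_pfn(*pmd);
                unsigned int i;

                /*
                 * Fill the freshly allocated page table while the old PMD
                 * mapping is still live, so every struct page in the range
                 * stays readable the whole time.
                 */
                for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE, pfn++)
                        set_pte_at(&init_mm, addr, pgtable + i, pfn_pte(pfn, PAGE_KERNEL));

                /* Only then replace the huge mapping with the page table. */
                pmd_populate_kernel(&init_mm, pmd, pgtable);
                flush_tlb_kernel_range(start, start + PMD_SIZE);
        }

If the pte page were instead one of the vmemmap pages being remapped, the set_pte_at() loop above would be writing over memory that concurrent readers still interpret as struct pages, which is exactly the inconsistency window Muchun and Mike point out.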
On 11/13/20 2:59 AM, Muchun Song wrote:
> On x86_64, vmemmap is always PMD mapped if the machine has hugepages
> support and if we have 2MB contiguos pages and PMD aligned. If we want

                           contiguous           alignment

> to free the unused vmemmap pages, we have to split the huge pmd firstly.
> So we should pre-allocate pgtable to split PMD to PTE.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/hugetlb_vmemmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/hugetlb_vmemmap.h | 12 +++++++++
>  2 files changed, 85 insertions(+)

Thanks for the cleanup. Oscar made some other comments. I only have one
additional minor comment below. With those minor cleanups,

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
...
> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> +{
> +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> +
> +	/* Store preallocated pages on huge page lru list */

Let's expand the above comment to something like this:

	/*
	 * Use the huge page lru list to temporarily store the preallocated
	 * pages. The preallocated pages are used and the list is emptied
	 * before the huge page is put into use. When the huge page is put
	 * into use by prep_new_huge_page() the list will be reinitialized.
	 */

> +	INIT_LIST_HEAD(&page->lru);
> +
> +	while (nr--) {
> +		pte_t *pte_p;
> +
> +		pte_p = pte_alloc_one_kernel(&init_mm);
> +		if (!pte_p)
> +			goto out;
> +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> +	}
> +
> +	return 0;
> +out:
> +	vmemmap_pgtable_free(page);
> +	return -ENOMEM;
> +}
On Fri, Nov 20, 2020 at 7:22 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 11/18/20 10:17 PM, Muchun Song wrote:
> > On Tue, Nov 17, 2020 at 11:06 PM Oscar Salvador <osalvador@suse.de> wrote:
> >>
> >> On Fri, Nov 13, 2020 at 06:59:36PM +0800, Muchun Song wrote:
> >>> +#define page_huge_pte(page)	((page)->pmd_huge_pte)
> >>
> >> Seems you do not need this one anymore.
> >>
> >>> +void vmemmap_pgtable_free(struct page *page)
> >>> +{
> >>> +	struct page *pte_page, *t_page;
> >>> +
> >>> +	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
> >>> +		list_del(&pte_page->lru);
> >>> +		pte_free_kernel(&init_mm, page_to_virt(pte_page));
> >>> +	}
> >>> +}
> >>> +
> >>> +int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> >>> +{
> >>> +	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> >>> +
> >>> +	/* Store preallocated pages on huge page lru list */
> >>> +	INIT_LIST_HEAD(&page->lru);
> >>> +
> >>> +	while (nr--) {
> >>> +		pte_t *pte_p;
> >>> +
> >>> +		pte_p = pte_alloc_one_kernel(&init_mm);
> >>> +		if (!pte_p)
> >>> +			goto out;
> >>> +		list_add(&virt_to_page(pte_p)->lru, &page->lru);
> >>> +	}
> >>
> >> Definitely this looks better and easier to handle.
> >> Btw, did you explore Matthew's hint about instead of allocating a new page,
> >> using one of the ones you are going to free to store the ptes?
> >> I am not sure whether it is feasible at all though.
> >
> > Hi Oscar and Matthew,
> >
> > I have investigated this and I think it is not feasible. If we reuse a
> > vmemmap page frame as a page table, then when we split the PMD we first
> > have to fill that page frame with 512 pte entries. At that point anyone
> > reading a tail struct page of the HugeTLB page could see arbitrary
> > values (I am not sure this actually happens in practice, but the memory
> > compaction code might do such reads). So, to be on the safe side, I
> > think allocating a new page is the better choice.
>
> Thanks for looking into this.
>
> If I understand correctly, the issue is that you need the pte page to set
> up the new mappings. In your current code, this is done before removing
> the pages of struct pages. This keeps everything 'consistent' as things
> are remapped.
>
> If you want to use one of the 'pages of struct pages' for the new pte
> page, then there will be a period of time when things are inconsistent.
> Before setting up the mapping, some code could potentially access those
> pages of struct pages.

Yeah, you are right.

> I tend to agree that allocating a new page is the safest thing to do
> here. Or, perhaps someone can think of a way to make this safe.
>
> --
> Mike Kravetz
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a6c9948302e2..b7dfa97b4ea9 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -71,6 +71,8 @@
  */
 #define pr_fmt(fmt)	"HugeTLB Vmemmap: " fmt

+#include <linux/list.h>
+#include <asm/pgalloc.h>
 #include "hugetlb_vmemmap.h"

 /*
@@ -83,6 +85,77 @@
  */
 #define RESERVE_VMEMMAP_NR	2U

+#ifndef VMEMMAP_HPAGE_SHIFT
+#define VMEMMAP_HPAGE_SHIFT	HPAGE_SHIFT
+#endif
+#define VMEMMAP_HPAGE_ORDER	(VMEMMAP_HPAGE_SHIFT - PAGE_SHIFT)
+#define VMEMMAP_HPAGE_NR	(1 << VMEMMAP_HPAGE_ORDER)
+#define VMEMMAP_HPAGE_SIZE	((1UL) << VMEMMAP_HPAGE_SHIFT)
+#define VMEMMAP_HPAGE_MASK	(~(VMEMMAP_HPAGE_SIZE - 1))
+
+#define page_huge_pte(page)	((page)->pmd_huge_pte)
+
+static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
+{
+	return h->nr_free_vmemmap_pages;
+}
+
+static inline unsigned int vmemmap_pages_per_hpage(struct hstate *h)
+{
+	return free_vmemmap_pages_per_hpage(h) + RESERVE_VMEMMAP_NR;
+}
+
+static inline unsigned long vmemmap_pages_size_per_hpage(struct hstate *h)
+{
+	return (unsigned long)vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
+}
+
+static inline unsigned int pgtable_pages_to_prealloc_per_hpage(struct hstate *h)
+{
+	unsigned long vmemmap_size = vmemmap_pages_size_per_hpage(h);
+
+	/*
+	 * No need pre-allocate page tables when there is no vmemmap pages
+	 * to free.
+	 */
+	if (!free_vmemmap_pages_per_hpage(h))
+		return 0;
+
+	return ALIGN(vmemmap_size, VMEMMAP_HPAGE_SIZE) >> VMEMMAP_HPAGE_SHIFT;
+}
+
+void vmemmap_pgtable_free(struct page *page)
+{
+	struct page *pte_page, *t_page;
+
+	list_for_each_entry_safe(pte_page, t_page, &page->lru, lru) {
+		list_del(&pte_page->lru);
+		pte_free_kernel(&init_mm, page_to_virt(pte_page));
+	}
+}
+
+int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
+{
+	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+	/* Store preallocated pages on huge page lru list */
+	INIT_LIST_HEAD(&page->lru);
+
+	while (nr--) {
+		pte_t *pte_p;
+
+		pte_p = pte_alloc_one_kernel(&init_mm);
+		if (!pte_p)
+			goto out;
+		list_add(&virt_to_page(pte_p)->lru, &page->lru);
+	}
+
+	return 0;
+out:
+	vmemmap_pgtable_free(page);
+	return -ENOMEM;
+}
+
 void __init hugetlb_vmemmap_init(struct hstate *h)
 {
 	unsigned int order = huge_page_order(h);
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 40c0c7dfb60d..2a72d2f62411 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -9,12 +9,24 @@
 #ifndef _LINUX_HUGETLB_VMEMMAP_H
 #define _LINUX_HUGETLB_VMEMMAP_H
 #include <linux/hugetlb.h>
+#include <linux/mm.h>

 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
 void __init hugetlb_vmemmap_init(struct hstate *h);
+int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page);
+void vmemmap_pgtable_free(struct page *page);
 #else
 static inline void hugetlb_vmemmap_init(struct hstate *h)
 {
 }
+
+static inline int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
+{
+	return 0;
+}
+
+static inline void vmemmap_pgtable_free(struct page *page)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */
 #endif /* _LINUX_HUGETLB_VMEMMAP_H */
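For a sense of scale, the arithmetic behind pgtable_pages_to_prealloc_per_hpage() works out as follows on x86_64, assuming 4 KiB base pages and a 64-byte struct page (illustrative numbers, not taken from the patch itself):

        2 MiB huge page:  512 struct pages * 64 B = 32 KiB of vmemmap
                          = 8 vmemmap pages
                          ALIGN(32 KiB, 2 MiB) >> 21 = 1 page table preallocated

        1 GiB huge page:  262144 struct pages * 64 B = 16 MiB of vmemmap
                          = 4096 vmemmap pages
                          ALIGN(16 MiB, 2 MiB) >> 21 = 8 page tables preallocated

So the preallocation cost is a single 4 KiB page per 2 MiB huge page and eight pages per 1 GiB huge page, and it is paid only when there are vmemmap pages to free.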
On x86_64, vmemmap is always PMD mapped if the machine has hugepages
support and if we have 2MB contiguos pages and PMD aligned. If we want
to free the unused vmemmap pages, we have to split the huge pmd firstly.
So we should pre-allocate pgtable to split PMD to PTE.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/hugetlb_vmemmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h | 12 +++++++++
 2 files changed, 85 insertions(+)
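For context, a sketch of how these helpers are expected to be called from the huge page allocation path. The call site in alloc_fresh_huge_page(), the simplified error path, and the omission of the gigantic-page case are assumptions for illustration; the actual wiring and the consumption of the preallocated pte pages happen in later patches of the series.

        /* Hypothetical caller -- the real hook-up happens in a later patch. */
        static struct page *alloc_fresh_huge_page(struct hstate *h,
                                                  gfp_t gfp_mask, int nid,
                                                  nodemask_t *nmask,
                                                  nodemask_t *node_alloc_noretry)
        {
                struct page *page;

                page = alloc_buddy_huge_page(h, gfp_mask, nid, nmask,
                                              node_alloc_noretry);
                if (!page)
                        return NULL;

                /* Grab the page tables needed to split the vmemmap PMDs later. */
                if (vmemmap_pgtable_prealloc(h, page)) {
                        put_page(page);
                        return NULL;
                }

                /*
                 * A later patch consumes the preallocated pte pages when it
                 * remaps the vmemmap, before page->lru is reinitialized here.
                 */
                prep_new_huge_page(h, page, page_to_nid(page));

                return page;
        }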