Message ID | 20200902180628.4052244-2-zi.yan@sent.com
State      | New, archived
Series     | 1GB THP support on x86_64
On 9/2/20 11:06 AM, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
> This, however, prevents us depositing PMD pages with ->lru, which is
> currently used by depositing PTE pages for 2MB THPs. So add a new
> pagechain container for PMD pages.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>  create mode 100644 include/linux/pagechain.h
>
> diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
> new file mode 100644
> index 000000000000..be536142b413
> --- /dev/null
> +++ b/include/linux/pagechain.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * include/linux/pagechain.h
> + *
> + * In many places it is efficient to batch an operation up against multiple
> + * pages. A pagechain is a multipage container which is used for that.
> + */
> +
> +#ifndef _LINUX_PAGECHAIN_H
> +#define _LINUX_PAGECHAIN_H
> +
> +#include <linux/slab.h>
> +
> +/* 14 pointers + two long's align the pagechain structure to a power of two */
> +#define PAGECHAIN_SIZE	13

OK, I'll bite. I see neither 14 pointers nor 2 longs below.
Is the comment out of date, or am I just confused?

Update: struct list_head is 2 pointers, so I see 15 pointers & one
unsigned int. Where are the 2 longs?

> +
> +struct page;
> +
> +struct pagechain {
> +	struct list_head list;
> +	unsigned int nr;
> +	struct page *pages[PAGECHAIN_SIZE];
> +};

thanks.
On 2 Sep 2020, at 16:29, Randy Dunlap wrote:

> On 9/2/20 11:06 AM, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
>> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
>> This, however, prevents us depositing PMD pages with ->lru, which is
>> currently used by depositing PTE pages for 2MB THPs. So add a new
>> pagechain container for PMD pages.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 73 insertions(+)
>>  create mode 100644 include/linux/pagechain.h
>>
>> diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
>> new file mode 100644
>> index 000000000000..be536142b413
>> --- /dev/null
>> +++ b/include/linux/pagechain.h
>> @@ -0,0 +1,73 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * include/linux/pagechain.h
>> + *
>> + * In many places it is efficient to batch an operation up against multiple
>> + * pages. A pagechain is a multipage container which is used for that.
>> + */
>> +
>> +#ifndef _LINUX_PAGECHAIN_H
>> +#define _LINUX_PAGECHAIN_H
>> +
>> +#include <linux/slab.h>
>> +
>> +/* 14 pointers + two long's align the pagechain structure to a power of two */
>> +#define PAGECHAIN_SIZE	13
>
> OK, I'll bite. I see neither 14 pointers nor 2 longs below.
> Is the comment out of date, or am I just confused?
>
> Update: struct list_head is 2 pointers, so I see 15 pointers & one
> unsigned int. Where are the 2 longs?

My bad. Will change this to:

/* 15 pointers + one long align the pagechain structure to a power of two */
#define PAGECHAIN_SIZE	13

struct page;

struct pagechain {
	struct list_head list;
	unsigned long nr;
	struct page *pages[PAGECHAIN_SIZE];
};

Thanks for checking.

—
Best Regards,
Yan Zi
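As an aside, the arithmetic behind the corrected comment checks out on an
LP64 target, where both variants come to 128 bytes (the original
"unsigned int nr" lands there via 4 bytes of tail padding). A quick
standalone userspace sketch, with struct list_head mocked as two pointers
(not kernel code):

/*
 * Standalone sketch checking the size arithmetic on LP64: a two-pointer
 * list_head (16 bytes), an unsigned long counter (8 bytes), and 13 page
 * pointers (104 bytes) give 128 bytes, a power of two.
 */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };	/* 2 pointers */
struct page;						/* opaque here */

#define PAGECHAIN_SIZE 13

struct pagechain {
	struct list_head list;			/* 16 bytes */
	unsigned long nr;			/* 8 bytes */
	struct page *pages[PAGECHAIN_SIZE];	/* 104 bytes */
};

_Static_assert(sizeof(struct pagechain) == 128, "not 128 bytes");

int main(void)
{
	printf("sizeof(struct pagechain) = %zu\n", sizeof(struct pagechain));
	return 0;
}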
On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
> This, however, prevents us depositing PMD pages with ->lru, which is
> currently used by depositing PTE pages for 2MB THPs. So add a new
> pagechain container for PMD pages.

But you've allocated a page for the PMD table. Why can't you use that 4kB
to store pointers to the 512 PTE tables? You could also use an existing
data structure like the XArray (although not a pagevec).
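For concreteness, that idea might look something like the following
sketch (hypothetical helper names, not an existing kernel API; it assumes
the PMD table page holds no live entries while the PUD is mapped by the
huge page, so its 4kB payload is free):

/*
 * Hypothetical sketch of the suggestion above. While the 1GB range is
 * mapped by a PUD leaf entry, the deposited PMD table page holds no live
 * PMD entries, so its 4kB payload (512 x 8-byte slots) can record the
 * 512 deposited PTE table pages directly. All names are illustrative.
 */
static void pud_deposit_pte_tables(struct page *pmd_table_page,
				   struct page *pte_tables[512])
{
	struct page **slots = (struct page **)page_address(pmd_table_page);
	int i;

	for (i = 0; i < 512; i++)
		slots[i] = pte_tables[i];
}

static struct page *pud_withdraw_pte_table(struct page *pmd_table_page, int i)
{
	struct page **slots = (struct page **)page_address(pmd_table_page);

	return slots[i];
}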
On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
> This, however, prevents us depositing PMD pages with ->lru, which is
> currently used by depositing PTE pages for 2MB THPs. So add a new
> pagechain container for PMD pages.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Just deposit it to a linked list in the mm_struct as we do for PMD if
split ptl disabled.
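For context, the existing deposit path being referred to is roughly this
(abridged from mm/pgtable-generic.c around v5.9): deposited PTE tables
are chained through page->lru, and with split PMD ptlocks disabled,
pmd_huge_pte() resolves to a single per-mm field:

/*
 * Abridged from mm/pgtable-generic.c (circa v5.9). Deposited PTE tables
 * are chained through page->lru; pmd_huge_pte(mm, pmdp) is simply
 * mm->pmd_huge_pte when split PMD ptlocks are disabled, i.e. a per-mm
 * deposit list, which is what the suggestion extends to PMD tables.
 */
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
				pgtable_t pgtable)
{
	assert_spin_locked(pmd_lockptr(mm, pmdp));

	/* FIFO */
	if (!pmd_huge_pte(mm, pmdp))
		INIT_LIST_HEAD(&pgtable->lru);
	else
		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
	pmd_huge_pte(mm, pmdp) = pgtable;
}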
On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:

> On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
>> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
>> This, however, prevents us depositing PMD pages with ->lru, which is
>> currently used by depositing PTE pages for 2MB THPs. So add a new
>> pagechain container for PMD pages.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Just deposit it to a linked list in the mm_struct as we do for PMD if
> split ptl disabled.

Thank you for checking the patches. Since we don’t have a PUD split lock
yet, I store the PMD page table pages in a newly added linked list head
in mm_struct, as you suggested above.

I was too vague about my pagechain design for depositing page table pages
for PUD THPs. Sorry about the confusion. Let me clarify why I am using a
pagechain here. I am sure there are other possible designs and I am happy
to change my code.

In my design, I do not store all page table pages in a single list.
I first deposit the 512 PTE pages in one PMD page table page’s
pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
page to a newly added linked list in mm_struct. Since pmd_huge_pte shares
space with half of lru in struct page, we cannot use lru to link all PMD
pages together; as a result, I added pagechain. This way we also avoid
two things:

1. When we withdraw the PMD page during a PUD THP split, we do not need
to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
pages in that PMD page.

2. We do not mix PMD page table pages and PTE page table pages in a
single list, since they are initialized in different ways. Otherwise, we
would need to maintain a subtle rule in the single page table page list
that in every 513 pages, the first one is a PMD page table page and the
rest are PTE page table pages.

As I am typing, I also realize that my current design does not work when
the PMD split lock is disabled, so I will fix it. I will store PMD pages
and PTE pages in two separate lists in mm_struct.

Any comments?

—
Best Regards,
Yan Zi
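Expressed as code, the two-level scheme described above might look like
this sketch (helper names hypothetical; only the shape of the design is
taken from the description):

/*
 * Hypothetical sketch of the two-level deposit described above; the
 * helper names are illustrative, not from the patch.
 */
static void pud_thp_deposit(struct mm_struct *mm, struct page *pmd_page,
			    struct page *pte_pages[512])
{
	int i;

	/*
	 * Level 1: hang the 512 PTE tables off the PMD table page's
	 * pmd_huge_pte, reusing the 2MB THP deposit convention.
	 */
	for (i = 0; i < 512; i++)
		deposit_pte_on_pmd_page(pmd_page, pte_pages[i]);	/* hypothetical */

	/*
	 * Level 2: the PMD page cannot be linked through page->lru
	 * (pmd_huge_pte overlaps half of it), so stash it in a pagechain
	 * hung off a new list head in mm_struct.
	 */
	deposit_pmd_page_on_mm(mm, pmd_page);				/* hypothetical */
}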
On Mon, Sep 07, 2020 at 11:11:05AM -0400, Zi Yan wrote:
> On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:
>
>> On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
>>> From: Zi Yan <ziy@nvidia.com>
>>>
>>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>>> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
>>> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
>>> This, however, prevents us depositing PMD pages with ->lru, which is
>>> currently used by depositing PTE pages for 2MB THPs. So add a new
>>> pagechain container for PMD pages.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>
>> Just deposit it to a linked list in the mm_struct as we do for PMD if
>> split ptl disabled.
>
> Thank you for checking the patches. Since we don’t have a PUD split lock
> yet, I store the PMD page table pages in a newly added linked list head
> in mm_struct, as you suggested above.
>
> I was too vague about my pagechain design for depositing page table pages
> for PUD THPs. Sorry about the confusion. Let me clarify why I am using a
> pagechain here. I am sure there are other possible designs and I am happy
> to change my code.
>
> In my design, I do not store all page table pages in a single list.
> I first deposit the 512 PTE pages in one PMD page table page’s
> pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
> page to a newly added linked list in mm_struct. Since pmd_huge_pte shares
> space with half of lru in struct page, we cannot use lru to link all PMD
> pages together; as a result, I added pagechain. This way we also avoid
> two things:
>
> 1. When we withdraw the PMD page during a PUD THP split, we do not need
> to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
> pages in that PMD page.
>
> 2. We do not mix PMD page table pages and PTE page table pages in a
> single list, since they are initialized in different ways. Otherwise, we
> would need to maintain a subtle rule in the single page table page list
> that in every 513 pages, the first one is a PMD page table page and the
> rest are PTE page table pages.
>
> As I am typing, I also realize that my current design does not work when
> the PMD split lock is disabled, so I will fix it. I will store PMD pages
> and PTE pages in two separate lists in mm_struct.
>
> Any comments?

Okay, fair enough.

Although, I think you can get away without a new data structure. We do
not need a double-linked list to deposit page tables. You can rework the
PTE table deposit code to use a single-linked list through one pointer of
->lru (with a proper name) and make the PMD table deposit use the other
pointer. This way you avoid the conflict over ->lru.

Does it make sense?
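A minimal sketch of that reworking, assuming LIFO deposit/withdraw
semantics (the field use is illustrative; PMD tables would do the same
through lru.prev):

/*
 * Hypothetical sketch: deposit/withdraw are stack operations, so a
 * singly linked list suffices. PTE-table chains use only lru.next,
 * leaving lru.prev free for an independent PMD-table chain, which
 * removes the conflict over ->lru.
 */
static void deposit_pte_table(struct page **list, struct page *pte_page)
{
	pte_page->lru.next = (struct list_head *)*list;	/* single link */
	*list = pte_page;
}

static struct page *withdraw_pte_table(struct page **list)
{
	struct page *page = *list;

	if (page)
		*list = (struct page *)page->lru.next;
	return page;
}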
On 9 Sep 2020, at 9:46, Kirill A. Shutemov wrote:

> On Mon, Sep 07, 2020 at 11:11:05AM -0400, Zi Yan wrote:
>> On 7 Sep 2020, at 8:22, Kirill A. Shutemov wrote:
>>
>>> On Wed, Sep 02, 2020 at 02:06:13PM -0400, Zi Yan wrote:
>>>> From: Zi Yan <ziy@nvidia.com>
>>>>
>>>> When depositing page table pages for 1GB THPs, we need 512 PTE pages +
>>>> 1 PMD page. Instead of counting and depositing 513 pages, we can use the
>>>> PMD page as a leader page and chain the rest 512 PTE pages with ->lru.
>>>> This, however, prevents us depositing PMD pages with ->lru, which is
>>>> currently used by depositing PTE pages for 2MB THPs. So add a new
>>>> pagechain container for PMD pages.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>
>>> Just deposit it to a linked list in the mm_struct as we do for PMD if
>>> split ptl disabled.
>>
>> Thank you for checking the patches. Since we don’t have a PUD split lock
>> yet, I store the PMD page table pages in a newly added linked list head
>> in mm_struct, as you suggested above.
>>
>> I was too vague about my pagechain design for depositing page table pages
>> for PUD THPs. Sorry about the confusion. Let me clarify why I am using a
>> pagechain here. I am sure there are other possible designs and I am happy
>> to change my code.
>>
>> In my design, I do not store all page table pages in a single list.
>> I first deposit the 512 PTE pages in one PMD page table page’s
>> pmd_huge_pte using pgtable_trans_huge_deposit(), then deposit the PMD
>> page to a newly added linked list in mm_struct. Since pmd_huge_pte shares
>> space with half of lru in struct page, we cannot use lru to link all PMD
>> pages together; as a result, I added pagechain. This way we also avoid
>> two things:
>>
>> 1. When we withdraw the PMD page during a PUD THP split, we do not need
>> to withdraw 513 pages, set up one PMD page, and then deposit 512 PTE
>> pages in that PMD page.
>>
>> 2. We do not mix PMD page table pages and PTE page table pages in a
>> single list, since they are initialized in different ways. Otherwise, we
>> would need to maintain a subtle rule in the single page table page list
>> that in every 513 pages, the first one is a PMD page table page and the
>> rest are PTE page table pages.
>>
>> As I am typing, I also realize that my current design does not work when
>> the PMD split lock is disabled, so I will fix it. I will store PMD pages
>> and PTE pages in two separate lists in mm_struct.
>>
>> Any comments?
>
> Okay, fair enough.
>
> Although, I think you can get away without a new data structure. We do
> not need a double-linked list to deposit page tables. You can rework the
> PTE table deposit code to use a single-linked list through one pointer of
> ->lru (with a proper name) and make the PMD table deposit use the other
> pointer. This way you avoid the conflict over ->lru.
>
> Does it make sense?

Yes. Thanks. Will do this in the next version. I think the single-linked
list from llist.h can be used.

—
Best Regards,
Yan Zi
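A sketch of what using llist.h might look like. This is heavily hedged:
it assumes an llist_node can be overlaid on the deposited page's unused
->lru space (which would need a real union member in struct page), and
deposited_pmd_tables is an assumed new mm_struct field; neither exists in
this patch.

#include <linux/llist.h>

/*
 * Hypothetical sketch only: deposit PMD table pages on a singly linked
 * llist hung off mm_struct, with the llist_node overlaid on the page's
 * otherwise-unused ->lru space.
 */
static void deposit_pmd_table(struct mm_struct *mm, struct page *pmd_page)
{
	struct llist_node *node = (struct llist_node *)&pmd_page->lru;

	llist_add(node, &mm->deposited_pmd_tables);	/* assumed field */
}

static struct page *withdraw_pmd_table(struct mm_struct *mm)
{
	struct llist_node *node = llist_del_first(&mm->deposited_pmd_tables);

	if (!node)
		return NULL;
	return container_of((struct list_head *)node, struct page, lru);
}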
diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
new file mode 100644
index 000000000000..be536142b413
--- /dev/null
+++ b/include/linux/pagechain.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/pagechain.h
+ *
+ * In many places it is efficient to batch an operation up against multiple
+ * pages. A pagechain is a multipage container which is used for that.
+ */
+
+#ifndef _LINUX_PAGECHAIN_H
+#define _LINUX_PAGECHAIN_H
+
+#include <linux/slab.h>
+
+/* 14 pointers + two long's align the pagechain structure to a power of two */
+#define PAGECHAIN_SIZE	13
+
+struct page;
+
+struct pagechain {
+	struct list_head list;
+	unsigned int nr;
+	struct page *pages[PAGECHAIN_SIZE];
+};
+
+static inline void pagechain_init(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+	INIT_LIST_HEAD(&pchain->list);
+}
+
+static inline void pagechain_reinit(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+}
+
+static inline unsigned int pagechain_count(struct pagechain *pchain)
+{
+	return pchain->nr;
+}
+
+static inline unsigned int pagechain_space(struct pagechain *pchain)
+{
+	return PAGECHAIN_SIZE - pchain->nr;
+}
+
+static inline bool pagechain_empty(struct pagechain *pchain)
+{
+	return pchain->nr == 0;
+}
+
+/*
+ * Add a page to a pagechain. Returns the number of slots still available.
+ */
+static inline unsigned int pagechain_deposit(struct pagechain *pchain,
+					     struct page *page)
+{
+	VM_BUG_ON(!pagechain_space(pchain));
+	pchain->pages[pchain->nr++] = page;
+	return pagechain_space(pchain);
+}
+
+static inline struct page *pagechain_withdraw(struct pagechain *pchain)
+{
+	if (!pagechain_count(pchain))
+		return NULL;
+	return pchain->pages[--pchain->nr];
+}
+
+void __init pagechain_cache_init(void);
+struct pagechain *pagechain_alloc(void);
+void pagechain_free(struct pagechain *pchain);
+
+#endif /* _LINUX_PAGECHAIN_H */
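For illustration, a sketch of how the posted API would be exercised when
depositing a batch of page-table pages (error handling abridged;
pagechain_alloc() is declared above but implemented elsewhere in the
series, and the deposit list is assumed to live in mm_struct):

/*
 * Usage sketch for the API above: deposit nr_tables page-table pages
 * into pagechains hung off a caller-provided list, opening a fresh
 * 13-slot chain whenever the current one fills up.
 */
static int deposit_page_tables(struct list_head *deposit_list,
			       struct page *tables[], int nr_tables)
{
	struct pagechain *pchain = NULL;
	int i;

	for (i = 0; i < nr_tables; i++) {
		if (!pchain || !pagechain_space(pchain)) {
			pchain = pagechain_alloc();
			if (!pchain)
				return -ENOMEM;
			pagechain_init(pchain);
			list_add(&pchain->list, deposit_list);
		}
		pagechain_deposit(pchain, tables[i]);
	}
	return 0;
}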