| Message ID | 20160914071901.8127-3-juerg.haefliger@hpe.com (mailing list archive) |
| --- | --- |
| State | New, archived |
On 09/14/2016 12:19 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.

Hi, I had some questions about this the last time you posted it. Maybe
you want to address them now.

--

But kernel allocations do allocate from these pools, right? Does this
just mean that kernel allocations usually have to pay the penalty to
convert a page?

So, what's the logic here? You're assuming that order-0 kernel
allocations are more rare than allocations for userspace?
Hi Dave,

On 09/14/2016 04:33 PM, Dave Hansen wrote:
> On 09/14/2016 12:19 AM, Juerg Haefliger wrote:
>> Allocating a page to userspace that was previously allocated to the
>> kernel requires an expensive TLB shootdown. To minimize this, we only
>> put non-kernel pages into the hot cache to favor their allocation.
>
> Hi, I had some questions about this the last time you posted it. Maybe
> you want to address them now.

I did reply: https://lkml.org/lkml/2016/9/5/249

...Juerg

> --
>
> But kernel allocations do allocate from these pools, right? Does this
> just mean that kernel allocations usually have to pay the penalty to
> convert a page?
>
> So, what's the logic here? You're assuming that order-0 kernel
> allocations are more rare than allocations for userspace?
>
> On 09/02/2016 10:39 PM, Dave Hansen wrote:
>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>> Does this
>> just mean that kernel allocations usually have to pay the penalty to
>> convert a page?
>
> Only pages that are allocated for userspace (gfp & GFP_HIGHUSER == GFP_HIGHUSER) which were
> previously allocated for the kernel (gfp & GFP_HIGHUSER != GFP_HIGHUSER) have to pay the penalty.
>
>> So, what's the logic here? You're assuming that order-0 kernel
>> allocations are more rare than allocations for userspace?
>
> The logic is to put reclaimed kernel pages into the cold cache to
> postpone their allocation as long as possible to minimize (potential)
> TLB flushes.

OK, but if we put them in the cold area but kernel allocations pull them
from the hot cache, aren't we virtually guaranteeing that kernel
allocations will have to do a TLB shootdown to convert a page?

It seems like you also need to convert all kernel allocations to pull
from the cold area.
On 09/14/2016 04:48 PM, Dave Hansen wrote:
>> On 09/02/2016 10:39 PM, Dave Hansen wrote:
>>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>>> Does this
>>> just mean that kernel allocations usually have to pay the penalty to
>>> convert a page?
>>
>> Only pages that are allocated for userspace (gfp & GFP_HIGHUSER == GFP_HIGHUSER) which were
>> previously allocated for the kernel (gfp & GFP_HIGHUSER != GFP_HIGHUSER) have to pay the penalty.
>>
>>> So, what's the logic here? You're assuming that order-0 kernel
>>> allocations are more rare than allocations for userspace?
>>
>> The logic is to put reclaimed kernel pages into the cold cache to
>> postpone their allocation as long as possible to minimize (potential)
>> TLB flushes.
>
> OK, but if we put them in the cold area but kernel allocations pull them
> from the hot cache, aren't we virtually guaranteeing that kernel
> allocations will have to do a TLB shootdown to convert a page?

No. Allocations for the kernel never require a TLB shootdown. Only
allocations for userspace (and only if the page was previously a kernel
page).

> It seems like you also need to convert all kernel allocations to pull
> from the cold area.

Kernel allocations can continue to pull from the hot cache. Maybe
introduce another cache for the userspace pages? But I'm not sure what
other implications this might have.

...Juerg
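To make the condition discussed above concrete, here is a minimal sketch of the conversion check being described. The helper name xpfo_needs_tlb_flush is invented for illustration and is not part of the series; xpfo_page_is_kernel() is the helper this patch adds.

```c
#include <linux/gfp.h>
#include <linux/mm_types.h>
#include <linux/xpfo.h>

/*
 * Illustration only (xpfo_needs_tlb_flush is not part of the series):
 * the penalty described above applies exactly when a page is being
 * handed to userspace ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) and it was
 * last allocated to the kernel, as tracked by xpfo_page_is_kernel()
 * from this patch.
 */
static inline bool xpfo_needs_tlb_flush(struct page *page, gfp_t gfp)
{
	bool for_user   = (gfp & GFP_HIGHUSER) == GFP_HIGHUSER;
	bool was_kernel = xpfo_page_is_kernel(page);

	return for_user && was_kernel;
}
```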
diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h
index 77187578ca33..077d1cfadfa2 100644
--- a/include/linux/xpfo.h
+++ b/include/linux/xpfo.h
@@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp);
 extern void xpfo_free_page(struct page *page, int order);
 
 extern bool xpfo_page_is_unmapped(struct page *page);
+extern bool xpfo_page_is_kernel(struct page *page);
 
 #else /* !CONFIG_XPFO */
 
@@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { }
 static inline void xpfo_free_page(struct page *page, int order) { }
 
 static inline bool xpfo_page_is_unmapped(struct page *page) { return false; }
+static inline bool xpfo_page_is_kernel(struct page *page) { return false; }
 
 #endif /* CONFIG_XPFO */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0241c8a7e72a..83404b41e52d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
+	/*
+	 * XPFO: Allocating a page to userspace that was previously allocated
+	 * to the kernel requires an expensive TLB shootdown. To minimize this,
+	 * we only put non-kernel pages into the hot cache to favor their
+	 * allocation.
+	 */
+	if (!cold && !xpfo_page_is_kernel(page))
 		list_add(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
diff --git a/mm/xpfo.c b/mm/xpfo.c
index ddb1be05485d..f8dffda0c961 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page)
 
 	return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags);
 }
+
+inline bool xpfo_page_is_kernel(struct page *page)
+{
+	if (!static_branch_unlikely(&xpfo_inited))
+		return false;
+
+	return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags);
+}
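For context on why head vs. tail placement matters, here is a rough sketch of how the v4.8-era per-cpu fast path picks pages; the function name pcp_pick_page is made up for illustration and this is not code from the patch. Hot allocations take from the head of the list and cold ones from the tail, so pages queued with list_add_tail() (after this patch, all kernel-backed pages) are reused last, postponing the kernel-to-userspace conversion and its TLB shootdown.

```c
#include <linux/list.h>
#include <linux/mm_types.h>

/*
 * Illustration only (pcp_pick_page is not a real kernel function):
 * a simplified view of how the v4.8-era allocator fast path takes a
 * page off a per-cpu free list.  Hot allocations consume the head,
 * cold allocations consume the tail, so pages that free_hot_cold_page()
 * queued with list_add_tail() are the last to be handed out again.
 */
static struct page *pcp_pick_page(struct list_head *list, bool cold)
{
	if (list_empty(list))
		return NULL;

	return cold ? list_last_entry(list, struct page, lru)
		    : list_first_entry(list, struct page, lru);
}
```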
Allocating a page to userspace that was previously allocated to the
kernel requires an expensive TLB shootdown. To minimize this, we only
put non-kernel pages into the hot cache to favor their allocation.

Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>
---
 include/linux/xpfo.h | 2 ++
 mm/page_alloc.c      | 8 +++++++-
 mm/xpfo.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)