Message ID | 20190905182348.5319-2-willy@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Large pages in the page cache | expand |
> On Sep 5, 2019, at 11:23 AM, Matthew Wilcox <willy@infradead.org> wrote: > > From: "Matthew Wilcox (Oracle)" <willy@infradead.org> > > This new function allows page cache pages to be allocated that are > larger than an order-0 page. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > include/linux/pagemap.h | 14 +++++++++++--- > mm/filemap.c | 11 +++++++---- > 2 files changed, 18 insertions(+), 7 deletions(-) > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 103205494ea0..d2147215d415 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count) > } > > #ifdef CONFIG_NUMA > -extern struct page *__page_cache_alloc(gfp_t gfp); > +extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order); I guess we need __page_cache_alloc(gfp_t gfp) here for CONFIG_NUMA. > #else > -static inline struct page *__page_cache_alloc(gfp_t gfp) > +static inline > +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) > { > - return alloc_pages(gfp, 0); > + if (order > 0) > + gfp |= __GFP_COMP; > + return alloc_pages(gfp, order); > } > #endif > > +static inline struct page *__page_cache_alloc(gfp_t gfp) > +{ > + return __page_cache_alloc_order(gfp, 0); Maybe "return alloc_pages(gfp, 0);" here to avoid checking "order > 0"? 
> +} > + > static inline struct page *page_cache_alloc(struct address_space *x) > { > return __page_cache_alloc(mapping_gfp_mask(x)); > diff --git a/mm/filemap.c b/mm/filemap.c > index 05a5aa82cd32..041c77c4ca56 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -957,24 +957,27 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping, > EXPORT_SYMBOL_GPL(add_to_page_cache_lru); > > #ifdef CONFIG_NUMA > -struct page *__page_cache_alloc(gfp_t gfp) > +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) > { > int n; > struct page *page; > > + if (order > 0) > + gfp |= __GFP_COMP; > + I think it will be good to have separate __page_cache_alloc() for order 0, so that we avoid checking "order > 0", but that may require too much duplication. So I am on the fence for this one. Thanks, Song > if (cpuset_do_page_mem_spread()) { > unsigned int cpuset_mems_cookie; > do { > cpuset_mems_cookie = read_mems_allowed_begin(); > n = cpuset_mem_spread_node(); > - page = __alloc_pages_node(n, gfp, 0); > + page = __alloc_pages_node(n, gfp, order); > } while (!page && read_mems_allowed_retry(cpuset_mems_cookie)); > > return page; > } > - return alloc_pages(gfp, 0); > + return alloc_pages(gfp, order); > } > -EXPORT_SYMBOL(__page_cache_alloc); > +EXPORT_SYMBOL(__page_cache_alloc_order); > #endif > > /* > -- > 2.23.0.rc1 >
On Thu, Sep 05, 2019 at 06:58:53PM +0000, Song Liu wrote: > > On Sep 5, 2019, at 11:23 AM, Matthew Wilcox <willy@infradead.org> wrote: > > This new function allows page cache pages to be allocated that are > > larger than an order-0 page. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > --- > > include/linux/pagemap.h | 14 +++++++++++--- > > mm/filemap.c | 11 +++++++---- > > 2 files changed, 18 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > > index 103205494ea0..d2147215d415 100644 > > --- a/include/linux/pagemap.h > > +++ b/include/linux/pagemap.h > > @@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count) > > } > > > > #ifdef CONFIG_NUMA > > -extern struct page *__page_cache_alloc(gfp_t gfp); > > +extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order); > > I guess we need __page_cache_alloc(gfp_t gfp) here for CONFIG_NUMA. ... no? The __page_cache_alloc() below is outside the ifdef/else/endif, so it's the same for both NUMA and non-NUMA. > > #else > > -static inline struct page *__page_cache_alloc(gfp_t gfp) > > +static inline > > +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) > > { > > - return alloc_pages(gfp, 0); > > + if (order > 0) > > + gfp |= __GFP_COMP; > > + return alloc_pages(gfp, order); > > } > > #endif > > > > +static inline struct page *__page_cache_alloc(gfp_t gfp) > > +{ > > + return __page_cache_alloc_order(gfp, 0); > > Maybe "return alloc_pages(gfp, 0);" here to avoid checking "order > 0"? For non-NUMA cases, the __page_cache_alloc_order() will be inlined into __page_cache_alloc() and the compiler will eliminate the test. 
Or you need a better compiler ;-) > > -struct page *__page_cache_alloc(gfp_t gfp) > > +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) > > { > > int n; > > struct page *page; > > > > + if (order > 0) > > + gfp |= __GFP_COMP; > > + > > I think it will be good to have separate __page_cache_alloc() for order 0, > so that we avoid checking "order > 0", but that may require too much > duplication. So I am on the fence for this one. We're about to dive into the page allocator ... two extra instructions here aren't going to be noticeable.
> On Sep 5, 2019, at 12:02 PM, Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, Sep 05, 2019 at 06:58:53PM +0000, Song Liu wrote: >>> On Sep 5, 2019, at 11:23 AM, Matthew Wilcox <willy@infradead.org> wrote: >>> This new function allows page cache pages to be allocated that are >>> larger than an order-0 page. >>> >>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>> --- >>> include/linux/pagemap.h | 14 +++++++++++--- >>> mm/filemap.c | 11 +++++++---- >>> 2 files changed, 18 insertions(+), 7 deletions(-) >>> >>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h >>> index 103205494ea0..d2147215d415 100644 >>> --- a/include/linux/pagemap.h >>> +++ b/include/linux/pagemap.h >>> @@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count) >>> } >>> >>> #ifdef CONFIG_NUMA >>> -extern struct page *__page_cache_alloc(gfp_t gfp); >>> +extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order); >> >> I guess we need __page_cache_alloc(gfp_t gfp) here for CONFIG_NUMA. > > ... no? The __page_cache_alloc() below is outside the ifdef/else/endif, so > it's the same for both NUMA and non-NUMA. You are right. I misread this one. > >>> #else >>> -static inline struct page *__page_cache_alloc(gfp_t gfp) >>> +static inline >>> +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) >>> { >>> - return alloc_pages(gfp, 0); >>> + if (order > 0) >>> + gfp |= __GFP_COMP; >>> + return alloc_pages(gfp, order); >>> } >>> #endif >>> >>> +static inline struct page *__page_cache_alloc(gfp_t gfp) >>> +{ >>> + return __page_cache_alloc_order(gfp, 0); >> >> Maybe "return alloc_pages(gfp, 0);" here to avoid checking "order > 0"? > > For non-NUMA cases, the __page_cache_alloc_order() will be inlined into > __page_cache_alloc() and the compiler will eliminate the test. 
Or you > need a better compiler ;-) > >>> -struct page *__page_cache_alloc(gfp_t gfp) >>> +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) >>> { >>> int n; >>> struct page *page; >>> >>> + if (order > 0) >>> + gfp |= __GFP_COMP; >>> + >> >> I think it will be good to have separate __page_cache_alloc() for order 0, >> so that we avoid checking "order > 0", but that may require too much >> duplication. So I am on the fence for this one. > > We're about to dive into the page allocator ... two extra instructions > here aren't going to be noticeable. True. Thanks for the explanation. Acked-by: Song Liu <songliubraving@fb.com>
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 103205494ea0..d2147215d415 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page *page, int count) } #ifdef CONFIG_NUMA -extern struct page *__page_cache_alloc(gfp_t gfp); +extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order); #else -static inline struct page *__page_cache_alloc(gfp_t gfp) +static inline +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) { - return alloc_pages(gfp, 0); + if (order > 0) + gfp |= __GFP_COMP; + return alloc_pages(gfp, order); } #endif +static inline struct page *__page_cache_alloc(gfp_t gfp) +{ + return __page_cache_alloc_order(gfp, 0); +} + static inline struct page *page_cache_alloc(struct address_space *x) { return __page_cache_alloc(mapping_gfp_mask(x)); diff --git a/mm/filemap.c b/mm/filemap.c index 05a5aa82cd32..041c77c4ca56 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -957,24 +957,27 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping, EXPORT_SYMBOL_GPL(add_to_page_cache_lru); #ifdef CONFIG_NUMA -struct page *__page_cache_alloc(gfp_t gfp) +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order) { int n; struct page *page; + if (order > 0) + gfp |= __GFP_COMP; + if (cpuset_do_page_mem_spread()) { unsigned int cpuset_mems_cookie; do { cpuset_mems_cookie = read_mems_allowed_begin(); n = cpuset_mem_spread_node(); - page = __alloc_pages_node(n, gfp, 0); + page = __alloc_pages_node(n, gfp, order); } while (!page && read_mems_allowed_retry(cpuset_mems_cookie)); return page; } - return alloc_pages(gfp, 0); + return alloc_pages(gfp, order); } -EXPORT_SYMBOL(__page_cache_alloc); +EXPORT_SYMBOL(__page_cache_alloc_order); #endif /*