
[net-next,v9,03/13] mm: page_frag: use initial zero offset for page_frag_alloc_align()

Message ID 20240625135216.47007-4-linyunsheng@huawei.com (mailing list archive)
State Deferred
Delegated to: Netdev Maintainers
Series First try to replace page_frag with page_frag_cache

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 895 this patch: 895
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: almasrymina@google.com
netdev/build_clang success Errors and warnings before: 961 this patch: 961
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5930 this patch: 5930
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 81 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 59 this patch: 59
netdev/source_inline success Was 0 now: 0

Commit Message

Yunsheng Lin June 25, 2024, 1:52 p.m. UTC
We are about to use the page_frag_alloc_*() API not only to
allocate memory for skb->data, but also to do the memory
allocation for skb frags. Currently the page_frag
implementation in the mm subsystem runs the offset as a
countdown rather than a count-up value; there may be several
advantages to that, as mentioned in [1], but it also has some
disadvantages: for example, it may prevent skb frag coalescing
and more accurate cache prefetching.

We have a trade-off to make in order to have a unified
implementation and API for page_frag, so use an initial zero
offset in this patch; the following patch will try to make
some optimizations to avoid the disadvantages as much as
possible.

As the offset is advanced to satisfy the alignment requirement
before actually checking whether the cache is big enough, this
might be exploitable if a caller mistakenly passes an align
value bigger than 32K. As we allow the order-3 page allocation
to fail easily under low-memory conditions, an align value
bigger than PAGE_SIZE is not really allowed, so add an
'align > PAGE_SIZE' check in page_frag_alloc_align() to catch
that.

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/page_frag_cache.h |  2 +-
 include/linux/skbuff.h          |  4 ++--
 mm/page_frag_cache.c            | 26 +++++++++++---------------
 3 files changed, 14 insertions(+), 18 deletions(-)

Comments

Alexander H Duyck July 1, 2024, 11:27 p.m. UTC | #1
On Tue, 2024-06-25 at 21:52 +0800, Yunsheng Lin wrote:
> We are above to use page_frag_alloc_*() API to not just
"about to use", not "above to use"

> allocate memory for skb->data, but also use them to do
> the memory allocation for skb frag too. Currently the
> implementation of page_frag in mm subsystem is running
> the offset as a countdown rather than count-up value,
> there may have several advantages to that as mentioned
> in [1], but it may have some disadvantages, for example,
> it may disable skb frag coaleasing and more correct cache
> prefetching
> 
> We have a trade-off to make in order to have a unified
> implementation and API for page_frag, so use a initial zero
> offset in this patch, and the following patch will try to
> make some optimization to aovid the disadvantages as much
> as possible.
> 
> As offsets is added due to alignment requirement before
> actually checking if the cache is enough, which might make
> it exploitable if caller passes a align value bigger than
> 32K mistakenly. As we are allowing order 3 page allocation
> to fail easily under low memory condition, align value bigger
> than PAGE_SIZE is not really allowed, so add a 'align >
> PAGE_SIZE' checking in page_frag_alloc_va_align() to catch
> that.
> 
> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
> 
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/page_frag_cache.h |  2 +-
>  include/linux/skbuff.h          |  4 ++--
>  mm/page_frag_cache.c            | 26 +++++++++++---------------
>  3 files changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 3a44bfc99750..b9411f0db25a 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -32,7 +32,7 @@ static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
>  					  unsigned int fragsz, gfp_t gfp_mask,
>  					  unsigned int align)
>  {
> -	WARN_ON_ONCE(!is_power_of_2(align));
> +	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>  	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
>  }
>  
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index eb8ae8292c48..d1fea23ec386 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3320,7 +3320,7 @@ static inline void *netdev_alloc_frag(unsigned int fragsz)
>  static inline void *netdev_alloc_frag_align(unsigned int fragsz,
>  					    unsigned int align)
>  {
> -	WARN_ON_ONCE(!is_power_of_2(align));
> +	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>  	return __netdev_alloc_frag_align(fragsz, -align);
>  }
>  
> @@ -3391,7 +3391,7 @@ static inline void *napi_alloc_frag(unsigned int fragsz)
>  static inline void *napi_alloc_frag_align(unsigned int fragsz,
>  					  unsigned int align)
>  {
> -	WARN_ON_ONCE(!is_power_of_2(align));
> +	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
>  	return __napi_alloc_frag_align(fragsz, -align);
>  }
>  
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 88f567ef0e29..da244851b8a4 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>  		if (!page)
>  			return NULL;
>  
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -		/* if size can vary use size else just use PAGE_SIZE */
> -		size = nc->size;
> -#endif
>  		/* Even if we own the page, we do not use atomic_set().
>  		 * This would break get_page_unless_zero() users.
>  		 */
> @@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>  		/* reset page count bias and offset to start of new frag */
>  		nc->pfmemalloc = page_is_pfmemalloc(page);
>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -		nc->offset = size;
> +		nc->offset = 0;
>  	}
>  
> -	offset = nc->offset - fragsz;
> -	if (unlikely(offset < 0)) {
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	/* if size can vary use size else just use PAGE_SIZE */
> +	size = nc->size;
> +#endif
> +
> +	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
> +	if (unlikely(offset + fragsz > size)) {

The fragsz check below could be moved to here.

>  		page = virt_to_page(nc->va);
>  
>  		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>  			goto refill;
>  		}
>  
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -		/* if size can vary use size else just use PAGE_SIZE */
> -		size = nc->size;
> -#endif
>  		/* OK, page count is 0, we can safely set it */
>  		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>  
>  		/* reset page count bias and offset to start of new frag */
>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -		offset = size - fragsz;
> -		if (unlikely(offset < 0)) {
> +		offset = 0;
> +		if (unlikely(fragsz > PAGE_SIZE)) {

Since we aren't taking advantage of the flag that is left after the
subtraction we might just want to look at moving this piece up to just
after the offset + fragsz check. That should prevent us from trying to
refill if we have a request that is larger than a single page. In
addition we could probably just drop the 3 PAGE_SIZE checks above as
they would be redundant.

>  			/*
>  			 * The caller is trying to allocate a fragment
>  			 * with fragsz > PAGE_SIZE but the cache isn't big
> @@ -124,8 +121,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>  	}
>  
>  	nc->pagecnt_bias--;
> -	offset &= align_mask;
> -	nc->offset = offset;
> +	nc->offset = offset + fragsz;
>  
>  	return nc->va + offset;
>  }
Yunsheng Lin July 2, 2024, 12:28 p.m. UTC | #2
On 2024/7/2 7:27, Alexander H Duyck wrote:
> On Tue, 2024-06-25 at 21:52 +0800, Yunsheng Lin wrote:
>> We are above to use page_frag_alloc_*() API to not just
> "about to use", not "above to use"

Ack.

> 
>> allocate memory for skb->data, but also use them to do
>> the memory allocation for skb frag too. Currently the
>> implementation of page_frag in mm subsystem is running
>> the offset as a countdown rather than count-up value,
>> there may have several advantages to that as mentioned
>> in [1], but it may have some disadvantages, for example,
>> it may disable skb frag coaleasing and more correct cache
>> prefetching
>>

...

>> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
>> index 88f567ef0e29..da244851b8a4 100644
>> --- a/mm/page_frag_cache.c
>> +++ b/mm/page_frag_cache.c
>> @@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>  		if (!page)
>>  			return NULL;
>>  
>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> -		/* if size can vary use size else just use PAGE_SIZE */
>> -		size = nc->size;
>> -#endif
>>  		/* Even if we own the page, we do not use atomic_set().
>>  		 * This would break get_page_unless_zero() users.
>>  		 */
>> @@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>  		/* reset page count bias and offset to start of new frag */
>>  		nc->pfmemalloc = page_is_pfmemalloc(page);
>>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> -		nc->offset = size;
>> +		nc->offset = 0;
>>  	}
>>  
>> -	offset = nc->offset - fragsz;
>> -	if (unlikely(offset < 0)) {
>> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> +	/* if size can vary use size else just use PAGE_SIZE */
>> +	size = nc->size;
>> +#endif
>> +
>> +	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
>> +	if (unlikely(offset + fragsz > size)) {
> 
> The fragsz check below could be moved to here.
> 
>>  		page = virt_to_page(nc->va);
>>  
>>  		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>  			goto refill;
>>  		}
>>  
>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> -		/* if size can vary use size else just use PAGE_SIZE */
>> -		size = nc->size;
>> -#endif
>>  		/* OK, page count is 0, we can safely set it */
>>  		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>  
>>  		/* reset page count bias and offset to start of new frag */
>>  		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>> -		offset = size - fragsz;
>> -		if (unlikely(offset < 0)) {
>> +		offset = 0;
>> +		if (unlikely(fragsz > PAGE_SIZE)) {
> 
> Since we aren't taking advantage of the flag that is left after the
> subtraction we might just want to look at moving this piece up to just
> after the offset + fragsz check. That should prevent us from trying to
> refill if we have a request that is larger than a single page. In
> addition we could probably just drop the 3 PAGE_SIZE checks above as
> they would be redundant.

I am not sure I understand the 'drop the 3 PAGE_SIZE checks' part and
the 'redundant' part, where is the '3 PAGE_SIZE checks'? And why they
are redundant?

> 
>>  			/*
>>  			 * The caller is trying to allocate a fragment
>>  			 * with fragsz > PAGE_SIZE but the cache isn't big
>> @@ -124,8 +121,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>  	}
>>  
>>  	nc->pagecnt_bias--;
>> -	offset &= align_mask;
>> -	nc->offset = offset;
>> +	nc->offset = offset + fragsz;
>>  
>>  	return nc->va + offset;
>>  }
> 
Alexander H Duyck July 2, 2024, 4 p.m. UTC | #3
On Tue, Jul 2, 2024 at 5:28 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/7/2 7:27, Alexander H Duyck wrote:
> > On Tue, 2024-06-25 at 21:52 +0800, Yunsheng Lin wrote:
> >> We are above to use page_frag_alloc_*() API to not just
> > "about to use", not "above to use"
>
> Ack.
>
> >
> >> allocate memory for skb->data, but also use them to do
> >> the memory allocation for skb frag too. Currently the
> >> implementation of page_frag in mm subsystem is running
> >> the offset as a countdown rather than count-up value,
> >> there may have several advantages to that as mentioned
> >> in [1], but it may have some disadvantages, for example,
> >> it may disable skb frag coaleasing and more correct cache
> >> prefetching
> >>
>
> ...
>
> >> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> >> index 88f567ef0e29..da244851b8a4 100644
> >> --- a/mm/page_frag_cache.c
> >> +++ b/mm/page_frag_cache.c
> >> @@ -72,10 +72,6 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>              if (!page)
> >>                      return NULL;
> >>
> >> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> -            /* if size can vary use size else just use PAGE_SIZE */
> >> -            size = nc->size;
> >> -#endif
> >>              /* Even if we own the page, we do not use atomic_set().
> >>               * This would break get_page_unless_zero() users.
> >>               */
> >> @@ -84,11 +80,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>              /* reset page count bias and offset to start of new frag */
> >>              nc->pfmemalloc = page_is_pfmemalloc(page);
> >>              nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >> -            nc->offset = size;
> >> +            nc->offset = 0;
> >>      }
> >>
> >> -    offset = nc->offset - fragsz;
> >> -    if (unlikely(offset < 0)) {
> >> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> +    /* if size can vary use size else just use PAGE_SIZE */
> >> +    size = nc->size;
> >> +#endif
> >> +
> >> +    offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
> >> +    if (unlikely(offset + fragsz > size)) {
> >
> > The fragsz check below could be moved to here.
> >
> >>              page = virt_to_page(nc->va);
> >>
> >>              if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> >> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
> >>                      goto refill;
> >>              }
> >>
> >> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> -            /* if size can vary use size else just use PAGE_SIZE */
> >> -            size = nc->size;
> >> -#endif
> >>              /* OK, page count is 0, we can safely set it */
> >>              set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
> >>
> >>              /* reset page count bias and offset to start of new frag */
> >>              nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >> -            offset = size - fragsz;
> >> -            if (unlikely(offset < 0)) {
> >> +            offset = 0;
> >> +            if (unlikely(fragsz > PAGE_SIZE)) {
> >
> > Since we aren't taking advantage of the flag that is left after the
> > subtraction we might just want to look at moving this piece up to just
> > after the offset + fragsz check. That should prevent us from trying to
> > refill if we have a request that is larger than a single page. In
> > addition we could probably just drop the 3 PAGE_SIZE checks above as
> > they would be redundant.
>
> I am not sure I understand the 'drop the 3 PAGE_SIZE checks' part and
> the 'redundant' part, where is the '3 PAGE_SIZE checks'? And why they
> are redundant?

I was referring to the addition of the checks for align > PAGE_SIZE in
the alloc functions at the start of this diff. I guess I had dropped
them from the first half of it with the "...". Also looking back
through the patch you misspelled "avoid" as "aovid".

The issue is there is a ton of pulling things forward that don't
necessarily make sense into these diffs. Now that I have finished
looking through the set I have a better idea of why those are there
and they might make sense. It is just difficult to review since code
is being added for things that aren't applicable to the patch being
reviewed.
Yunsheng Lin July 3, 2024, 11:25 a.m. UTC | #4
On 2024/7/3 0:00, Alexander Duyck wrote:

...

>>>> +
>>>> +    offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
>>>> +    if (unlikely(offset + fragsz > size)) {
>>>
>>> The fragsz check below could be moved to here.
>>>
>>>>              page = virt_to_page(nc->va);
>>>>
>>>>              if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>>>> @@ -99,17 +100,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>>>>                      goto refill;
>>>>              }
>>>>
>>>> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>>>> -            /* if size can vary use size else just use PAGE_SIZE */
>>>> -            size = nc->size;
>>>> -#endif
>>>>              /* OK, page count is 0, we can safely set it */
>>>>              set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>>>>
>>>>              /* reset page count bias and offset to start of new frag */
>>>>              nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>>>> -            offset = size - fragsz;
>>>> -            if (unlikely(offset < 0)) {
>>>> +            offset = 0;
>>>> +            if (unlikely(fragsz > PAGE_SIZE)) {
>>>
>>> Since we aren't taking advantage of the flag that is left after the
>>> subtraction we might just want to look at moving this piece up to just
>>> after the offset + fragsz check. That should prevent us from trying to
>>> refill if we have a request that is larger than a single page. In
>>> addition we could probably just drop the 3 PAGE_SIZE checks above as
>>> they would be redundant.
>>
>> I am not sure I understand the 'drop the 3 PAGE_SIZE checks' part and
>> the 'redundant' part, where is the '3 PAGE_SIZE checks'? And why they
>> are redundant?
> 
> I was referring to the addition of the checks for align > PAGE_SIZE in
> the alloc functions at the start of this diff. I guess I had dropped
> them from the first half of it with the "...". Also looking back
> through the patch you misspelled "avoid" as "aovid".
> 
> The issue is there is a ton of pulling things forward that don't
> necessarily make sense into these diffs. Now that I have finished
> looking through the set I have a better idea of why those are there
> and they might make sense. It is just difficult to review since code
> is being added for things that aren't applicable to the patch being
> reviewed.

As you mentioned in the other thread, perhaps the 'remaining' change
does need to be incorporated into this patch.


Patch

diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
index 3a44bfc99750..b9411f0db25a 100644
--- a/include/linux/page_frag_cache.h
+++ b/include/linux/page_frag_cache.h
@@ -32,7 +32,7 @@  static inline void *page_frag_alloc_align(struct page_frag_cache *nc,
 					  unsigned int fragsz, gfp_t gfp_mask,
 					  unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __page_frag_alloc_align(nc, fragsz, gfp_mask, -align);
 }
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eb8ae8292c48..d1fea23ec386 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3320,7 +3320,7 @@  static inline void *netdev_alloc_frag(unsigned int fragsz)
 static inline void *netdev_alloc_frag_align(unsigned int fragsz,
 					    unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __netdev_alloc_frag_align(fragsz, -align);
 }
 
@@ -3391,7 +3391,7 @@  static inline void *napi_alloc_frag(unsigned int fragsz)
 static inline void *napi_alloc_frag_align(unsigned int fragsz,
 					  unsigned int align)
 {
-	WARN_ON_ONCE(!is_power_of_2(align));
+	WARN_ON_ONCE(!is_power_of_2(align) || align > PAGE_SIZE);
 	return __napi_alloc_frag_align(fragsz, -align);
 }
 
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 88f567ef0e29..da244851b8a4 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -72,10 +72,6 @@  void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		if (!page)
 			return NULL;
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
@@ -84,11 +80,16 @@  void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = 0;
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	/* if size can vary use size else just use PAGE_SIZE */
+	size = nc->size;
+#endif
+
+	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
+	if (unlikely(offset + fragsz > size)) {
 		page = virt_to_page(nc->va);
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -99,17 +100,13 @@  void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
+		offset = 0;
+		if (unlikely(fragsz > PAGE_SIZE)) {
 			/*
 			 * The caller is trying to allocate a fragment
 			 * with fragsz > PAGE_SIZE but the cache isn't big
@@ -124,8 +121,7 @@  void *__page_frag_alloc_align(struct page_frag_cache *nc,
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->offset = offset + fragsz;
 
 	return nc->va + offset;
 }