
mm/page_alloc: remove prefetchw() on freeing page to buddy system

Message ID 20240702020931.7061-1-richard.weiyang@gmail.com (mailing list archive)
State New
Series mm/page_alloc: remove prefetchw() on freeing page to buddy system

Commit Message

Wei Yang July 2, 2024, 2:09 a.m. UTC
The prefetchw() was introduced by an ancient patch[1].

The change log says:

    The basic idea is to free higher order pages instead of going
    through every single one.  Also, some unnecessary atomic operations
    are done away with and replaced with non-atomic equivalents, and
    prefetching is done where it helps the most.  For a more in-depth
    discusion of this patch, please see the linux-ia64 archives (topic
    is "free bootmem feedback patch").

So that patch made several changes to improve bootmem freeing, the most
basic of which is freeing higher-order pages. And as Matthew says,
"Itanium CPUs of this era had no prefetchers."

I did 10 rounds of bootup tests before and after this change; the data
doesn't show that prefetchw() speeds up bootmem freeing. The sum of the
10 rounds of bootmem freeing time is even 5.2% faster after the
prefetchw() removal than before.

[1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
CC: David Hildenbrand <david@redhat.com>

---
The patch is based on mm-stable with David's change.
---
 mm/page_alloc.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

Comments

Andrew Morton July 2, 2024, 6:22 a.m. UTC | #1
On Tue,  2 Jul 2024 02:09:31 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> The prefetchw() is introduced from an ancient patch[1].
> 
> The change log says:
> 
>     The basic idea is to free higher order pages instead of going
>     through every single one.  Also, some unnecessary atomic operations
>     are done away with and replaced with non-atomic equivalents, and
>     prefetching is done where it helps the most.  For a more in-depth
>     discusion of this patch, please see the linux-ia64 archives (topic
>     is "free bootmem feedback patch").
> 
> So there are several changes improve the bootmem freeing, in which the
> most basic idea is freeing higher order pages. And as Matthew says,
> "Itanium CPUs of this era had no prefetchers."
> 
> I did 10 round bootup tests before and after this change, the data
> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> faster than before.

I don't think I've ever seen prefetch make a damn bit of difference.

> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> CC: David Hildenbrand <david@redhat.com>
> 
> ---
> The patch is based on mm-stable with David's change.

Oh help.  David makes many changes.  Please identify patches with much
care.  Fully quoting the email title works, as does a link.
David Hildenbrand July 2, 2024, 6:57 a.m. UTC | #2
On 02.07.24 04:09, Wei Yang wrote:
> The prefetchw() is introduced from an ancient patch[1].
> 
> The change log says:
> 
>      The basic idea is to free higher order pages instead of going
>      through every single one.  Also, some unnecessary atomic operations
>      are done away with and replaced with non-atomic equivalents, and
>      prefetching is done where it helps the most.  For a more in-depth
>      discusion of this patch, please see the linux-ia64 archives (topic
>      is "free bootmem feedback patch").
> 
> So there are several changes improve the bootmem freeing, in which the
> most basic idea is freeing higher order pages. And as Matthew says,
> "Itanium CPUs of this era had no prefetchers."
> 
> I did 10 round bootup tests before and after this change, the data
> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> faster than before.

I suspect this is noise, though.

> 
> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> CC: David Hildenbrand <david@redhat.com>
> 
> ---
> The patch is based on mm-stable with David's change.
> ---
>   mm/page_alloc.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 116ee33fd1ce..c46aedfc9a12 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>   	 */
>   	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>   	    unlikely(context == MEMINIT_HOTPLUG)) {
> -		prefetchw(p);
> -		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> -			prefetchw(p + 1);
> +		for (loop = 0; loop < nr_pages; loop++, p++) {
>   			VM_WARN_ON_ONCE(PageReserved(p));
>   			__ClearPageOffline(p);
>   			set_page_count(p, 0);
>   		}

Something like:

for (;;) {
	...
	if (++loop >= nr_pages)
		break;
	p++;
}


Might generate slightly better code, because we know that we execute the 
loop body at least once. We use that in set_ptes(), for example.

> -		VM_WARN_ON_ONCE(PageReserved(p));
> -		__ClearPageOffline(p);
> -		set_page_count(p, 0);
>   
>   		/*
>   		 * Freeing the page with debug_pagealloc enabled will try to
> @@ -1255,14 +1250,10 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>   		debug_pagealloc_map_pages(page, nr_pages);
>   		adjust_managed_page_count(page, nr_pages);
>   	} else {
> -		prefetchw(p);
> -		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
> -			prefetchw(p + 1);
> +		for (loop = 0; loop < nr_pages; loop++, p++) {
>   			__ClearPageReserved(p);
>   			set_page_count(p, 0);
>   		}
> -		__ClearPageReserved(p);
> -		set_page_count(p, 0);
>   
>   		/* memblock adjusts totalram_pages() manually. */
>   		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);

Much better

Reviewed-by: David Hildenbrand <david@redhat.com>
Wei Yang July 3, 2024, 12:01 a.m. UTC | #3
On Mon, Jul 01, 2024 at 11:22:24PM -0700, Andrew Morton wrote:
>On Tue,  2 Jul 2024 02:09:31 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> The prefetchw() is introduced from an ancient patch[1].
>> 
>> The change log says:
>> 
>>     The basic idea is to free higher order pages instead of going
>>     through every single one.  Also, some unnecessary atomic operations
>>     are done away with and replaced with non-atomic equivalents, and
>>     prefetching is done where it helps the most.  For a more in-depth
>>     discusion of this patch, please see the linux-ia64 archives (topic
>>     is "free bootmem feedback patch").
>> 
>> So there are several changes improve the bootmem freeing, in which the
>> most basic idea is freeing higher order pages. And as Matthew says,
>> "Itanium CPUs of this era had no prefetchers."
>> 
>> I did 10 round bootup tests before and after this change, the data
>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>> faster than before.
>
>I don't think I've ever seen prefetch make a damn bit of difference.
>
>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> CC: David Hildenbrand <david@redhat.com>
>> 
>> ---
>> The patch is based on mm-stable with David's change.
>
>Oh help.  David makes many changes.  Please identify patches with much
>care.  Fully quoting the email title works, as does a link.
>

The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04

  mm: pass meminit_context to __free_pages_core()

The link, if I am correct, is:
https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/

BTW, how do we track down the mail link? Just search on lore.kernel.org?
Wei Yang July 3, 2024, 12:12 a.m. UTC | #4
On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
>On 02.07.24 04:09, Wei Yang wrote:
>> The prefetchw() is introduced from an ancient patch[1].
>> 
>> The change log says:
>> 
>>      The basic idea is to free higher order pages instead of going
>>      through every single one.  Also, some unnecessary atomic operations
>>      are done away with and replaced with non-atomic equivalents, and
>>      prefetching is done where it helps the most.  For a more in-depth
>>      discusion of this patch, please see the linux-ia64 archives (topic
>>      is "free bootmem feedback patch").
>> 
>> So there are several changes improve the bootmem freeing, in which the
>> most basic idea is freeing higher order pages. And as Matthew says,
>> "Itanium CPUs of this era had no prefetchers."
>> 
>> I did 10 round bootup tests before and after this change, the data
>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>> faster than before.
>
>I suspect this is noise, though.
>
>> 
>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> CC: David Hildenbrand <david@redhat.com>
>> 
>> ---
>> The patch is based on mm-stable with David's change.
>> ---
>>   mm/page_alloc.c | 13 ++-----------
>>   1 file changed, 2 insertions(+), 11 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 116ee33fd1ce..c46aedfc9a12 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>   	 */
>>   	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>>   	    unlikely(context == MEMINIT_HOTPLUG)) {
>> -		prefetchw(p);
>> -		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>> -			prefetchw(p + 1);
>> +		for (loop = 0; loop < nr_pages; loop++, p++) {
>>   			VM_WARN_ON_ONCE(PageReserved(p));
>>   			__ClearPageOffline(p);
>>   			set_page_count(p, 0);
>>   		}
>
>Something like:
>
>for (;;) {
>	...
>	if (++loop >= nr_pages)
>		break;
>	p++;
>}
>

So you would prefer another version in this format? Something like this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c46aedfc9a12..5235015eba3d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1224,7 +1224,7 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
 {
 	unsigned int nr_pages = 1 << order;
 	struct page *p = page;
-	unsigned int loop;
+	unsigned int loop = 0;
 
 	/*
 	 * When initializing the memmap, __init_single_page() sets the refcount
@@ -1236,10 +1236,13 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
 	 */
 	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
 	    unlikely(context == MEMINIT_HOTPLUG)) {
-		for (loop = 0; loop < nr_pages; loop++, p++) {
+		for (;;) {
 			VM_WARN_ON_ONCE(PageReserved(p));
 			__ClearPageOffline(p);
 			set_page_count(p, 0);
+			if (++loop >= nr_pages)
+				break;
+			p++;
 		}
 
 		/*
@@ -1250,9 +1253,12 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
 		debug_pagealloc_map_pages(page, nr_pages);
 		adjust_managed_page_count(page, nr_pages);
 	} else {
-		for (loop = 0; loop < nr_pages; loop++, p++) {
+		for (;;) {
 			__ClearPageReserved(p);
 			set_page_count(p, 0);
+			if (++loop >= nr_pages)
+				break;
+			p++;
 		}
 
 		/* memblock adjusts totalram_pages() manually. */
>
>Might generate slightly better code, because we know that we execute the loop
>body at least once. We use that in set_ptes(), for example.
>
>> -		VM_WARN_ON_ONCE(PageReserved(p));
>> -		__ClearPageOffline(p);
>> -		set_page_count(p, 0);
>>   		/*
>>   		 * Freeing the page with debug_pagealloc enabled will try to
>> @@ -1255,14 +1250,10 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>   		debug_pagealloc_map_pages(page, nr_pages);
>>   		adjust_managed_page_count(page, nr_pages);
>>   	} else {
>> -		prefetchw(p);
>> -		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>> -			prefetchw(p + 1);
>> +		for (loop = 0; loop < nr_pages; loop++, p++) {
>>   			__ClearPageReserved(p);
>>   			set_page_count(p, 0);
>>   		}
>> -		__ClearPageReserved(p);
>> -		set_page_count(p, 0);
>>   		/* memblock adjusts totalram_pages() manually. */
>>   		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
>
>Much better
>
>Reviewed-by: David Hildenbrand <david@redhat.com>
>
>-- 
>Cheers,
>
>David / dhildenb
Andrew Morton July 3, 2024, 12:49 a.m. UTC | #5
On Wed, 3 Jul 2024 00:01:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:

> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
> >> CC: David Hildenbrand <david@redhat.com>
> >> 
> >> ---
> >> The patch is based on mm-stable with David's change.
> >
> >Oh help.  David makes many changes.  Please identify patches with much
> >care.  Fully quoting the email title works, as does a link.
> >
> 
> The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04
> 
>   mm: pass meminit_context to __free_pages_core()

OK, that's in mm-stable.

> The link is, if I am correct.
> https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/
> 
> BTW, how we track the mail link? Just search in lore.kernel.org?

I'm not sure what you mean?  I extract the message-id from the email
and concat it with "https://lkml.kernel.org/r".
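
For example, applied to this patch's Message-ID from the header above,
that rule gives:

	Message-ID: 20240702020931.7061-1-richard.weiyang@gmail.com
	link:       https://lkml.kernel.org/r/20240702020931.7061-1-richard.weiyang@gmail.com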
Wei Yang July 3, 2024, 12:55 a.m. UTC | #6
On Tue, Jul 02, 2024 at 05:49:39PM -0700, Andrew Morton wrote:
>On Wed, 3 Jul 2024 00:01:42 +0000 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> >> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> >> CC: David Hildenbrand <david@redhat.com>
>> >> 
>> >> ---
>> >> The patch is based on mm-stable with David's change.
>> >
>> >Oh help.  David makes many changes.  Please identify patches with much
>> >care.  Fully quoting the email title works, as does a link.
>> >
>> 
>> The commit is 3dadec1babf9eee0c67c967df931d6f0cb124a04
>> 
>>   mm: pass meminit_context to __free_pages_core()
>
>OK, that's in mm-stable.
>
>> The link is, if I am correct.
>> https://lore.kernel.org/all/20240607090939.89524-2-david@redhat.com/
>> 
>> BTW, how we track the mail link? Just search in lore.kernel.org?
>
>I'm not sure what you mean?  I extract the message-id from the email
>and concat it with "https://lkml.kernel.org/r".

Thanks, I learned something new :-)
David Hildenbrand July 3, 2024, 8:40 a.m. UTC | #7
On 03.07.24 02:12, Wei Yang wrote:
> On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
>> On 02.07.24 04:09, Wei Yang wrote:
>>> The prefetchw() is introduced from an ancient patch[1].
>>>
>>> The change log says:
>>>
>>>       The basic idea is to free higher order pages instead of going
>>>       through every single one.  Also, some unnecessary atomic operations
>>>       are done away with and replaced with non-atomic equivalents, and
>>>       prefetching is done where it helps the most.  For a more in-depth
>>>       discusion of this patch, please see the linux-ia64 archives (topic
>>>       is "free bootmem feedback patch").
>>>
>>> So there are several changes improve the bootmem freeing, in which the
>>> most basic idea is freeing higher order pages. And as Matthew says,
>>> "Itanium CPUs of this era had no prefetchers."
>>>
>>> I did 10 round bootup tests before and after this change, the data
>>> doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
>>> the 10 round bootmem freeing time after prefetchw() removal even 5.2%
>>> faster than before.
>>
>> I suspect this is noise, though.
>>
>>>
>>> [1]: https://lore.kernel.org/linux-ia64/40F46962.4090604@sgi.com/
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>>> CC: David Hildenbrand <david@redhat.com>
>>>
>>> ---
>>> The patch is based on mm-stable with David's change.
>>> ---
>>>    mm/page_alloc.c | 13 ++-----------
>>>    1 file changed, 2 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 116ee33fd1ce..c46aedfc9a12 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1236,16 +1236,11 @@ void __meminit __free_pages_core(struct page *page, unsigned int order,
>>>    	 */
>>>    	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
>>>    	    unlikely(context == MEMINIT_HOTPLUG)) {
>>> -		prefetchw(p);
>>> -		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
>>> -			prefetchw(p + 1);
>>> +		for (loop = 0; loop < nr_pages; loop++, p++) {
>>>    			VM_WARN_ON_ONCE(PageReserved(p));
>>>    			__ClearPageOffline(p);
>>>    			set_page_count(p, 0);
>>>    		}
>>
>> Something like:
>>
>> for (;;) {
>> 	...
>> 	if (++loop >= nr_pages)
>> 		break;
>> 	p++;
>> }
>>
> 
> So you prefer to have another version with this format? Sth like this?

Whatever you prefer, just pointing out that we can do slightly better :)
Matthew Wilcox July 4, 2024, 3:30 a.m. UTC | #8
On Tue, Jul 02, 2024 at 08:57:57AM +0200, David Hildenbrand wrote:
> > I did 10 round bootup tests before and after this change, the data
> > doesn't prove prefetchw() help speeding up bootmem freeing. The sum of
> > the 10 round bootmem freeing time after prefetchw() removal even 5.2%
> > faster than before.
> 
> I suspect this is noise, though.

I think it's real, though small.  Each prefetchw() is an instruction,
and if we can avoid issuing an instruction, we should.
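
(For context, a rough sketch of what prefetchw() amounts to when an
architecture doesn't provide its own version; the generic fallback is
approximately the compiler builtin, while e.g. x86 substitutes a
dedicated prefetch-for-write instruction, so each call really does cost
an instruction:)

	/* approximate generic fallback, quoted from memory;
	 * see include/linux/prefetch.h */
	#ifndef ARCH_HAS_PREFETCHW
	#define prefetchw(x) __builtin_prefetch(x, 1)
	#endif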

> Something like:
> 
> for (;;) {
> 	...
> 	if (++loop >= nr_pages)
> 		break;
> 	p++;
> }
> 
> 
> Might generate slightly better code, because we know that we execute the
> loop body at least once. We use that in set_ptes(), for example.

I don't think it's worth doing.  Keep the loop simple and obvious.
set_ptes() is different because we actually expect to execute the loop
exactly once (ie most folios are small).  So two compares per call to
set_ptes() instead of one makes a difference.  Here, we're expecting
to execute this loop, what, a million times?  Doing a million-and-one
compares instead of a million makes no observable difference.
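
(Rough numbers, assuming early freeing hands __free_pages_core() blocks
of the default MAX_PAGE_ORDER, i.e. order 10:

	nr_pages per call        = 1 << 10 = 1024
	compares, plain for loop = 1025 per call
	compares, do-while form  = 1024 per call

so the do-while form saves about one compare in a thousand, and none of
the flag writes in the loop body.)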

I would like to see v2 of this patch dropped, please Andrew.

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 116ee33fd1ce..c46aedfc9a12 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1236,16 +1236,11 @@  void __meminit __free_pages_core(struct page *page, unsigned int order,
 	 */
 	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG) &&
 	    unlikely(context == MEMINIT_HOTPLUG)) {
-		prefetchw(p);
-		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-			prefetchw(p + 1);
+		for (loop = 0; loop < nr_pages; loop++, p++) {
 			VM_WARN_ON_ONCE(PageReserved(p));
 			__ClearPageOffline(p);
 			set_page_count(p, 0);
 		}
-		VM_WARN_ON_ONCE(PageReserved(p));
-		__ClearPageOffline(p);
-		set_page_count(p, 0);
 
 		/*
 		 * Freeing the page with debug_pagealloc enabled will try to
@@ -1255,14 +1250,10 @@  void __meminit __free_pages_core(struct page *page, unsigned int order,
 		debug_pagealloc_map_pages(page, nr_pages);
 		adjust_managed_page_count(page, nr_pages);
 	} else {
-		prefetchw(p);
-		for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
-			prefetchw(p + 1);
+		for (loop = 0; loop < nr_pages; loop++, p++) {
 			__ClearPageReserved(p);
 			set_page_count(p, 0);
 		}
-		__ClearPageReserved(p);
-		set_page_count(p, 0);
 
 		/* memblock adjusts totalram_pages() manually. */
 		atomic_long_add(nr_pages, &page_zone(page)->managed_pages);