
[v3,09/10] PM / Hibernate: Publish pages restored in-place to arch code

Message ID 1448559168-8363-10-git-send-email-james.morse@arm.com (mailing list archive)
State New, archived

Commit Message

James Morse Nov. 26, 2015, 5:32 p.m. UTC
Some architectures require code written to memory as if it were data to be
'cleaned' from any data caches before the processor can fetch it as new
instructions.

During resume from hibernate, the snapshot code copies some pages directly,
meaning these architectures do not get a chance to perform their cache
maintenance. Create a new list of pages that were restored in place, so
that the arch code can perform this maintenance when necessary.

Signed-off-by: James Morse <james.morse@arm.com>
---
 include/linux/suspend.h |  1 +
 kernel/power/snapshot.c | 42 ++++++++++++++++++++++++++++--------------
 2 files changed, 29 insertions(+), 14 deletions(-)
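For context, a minimal sketch of how an architecture's resume path might consume the new list. The function name is hypothetical, and the use of flush_icache_range() over each page is only an assumption about what the arch-specific maintenance would be; it is not part of this series.

#include <linux/mm.h>
#include <linux/suspend.h>
#include <asm/cacheflush.h>

/* Hypothetical arch-side consumer: walk the pages that snapshot.c
 * restored in place and clean them so the CPU can safely fetch any
 * new instructions they contain.
 */
static void arch_clean_restored_inplace_pages(void)
{
	struct pbe *pbe;

	for (pbe = restored_inplace_pblist; pbe; pbe = pbe->next) {
		unsigned long addr = (unsigned long)pbe->address;

		/* pbe->address points at one linear-map page */
		flush_icache_range(addr, addr + PAGE_SIZE);
	}
}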

Comments

Lorenzo Pieralisi Dec. 3, 2015, 12:09 p.m. UTC | #1
On Thu, Nov 26, 2015 at 05:32:47PM +0000, James Morse wrote:

[...]

> @@ -2427,25 +2434,31 @@ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
>  	if (PageHighMem(page))
>  		return get_highmem_page_buffer(page, ca);

I know it is not a problem for arm64, but you should export the
"restored" highmem pages list too because arch code may need to use
it for the same reasons this patch was implemented.

> -	if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> -		/* We have allocated the "original" page frame and we can
> -		 * use it directly to store the loaded page.
> -		 */
> -		return page_address(page);
> -
> -	/* The "original" page frame has not been allocated and we have to
> -	 * use a "safe" page frame to store the loaded page.
> -	 */
>  	pbe = chain_alloc(ca, sizeof(struct pbe));
>  	if (!pbe) {
>  		swsusp_free();
>  		return ERR_PTR(-ENOMEM);
>  	}
> -	pbe->orig_address = page_address(page);
> -	pbe->address = safe_pages_list;
> -	safe_pages_list = safe_pages_list->next;
> -	pbe->next = restore_pblist;
> -	restore_pblist = pbe;
> +
> +	if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) {
> +		/* We have allocated the "original" page frame and we can
> +		 * use it directly to store the loaded page.
> +		 */
> +		pbe->orig_address = NULL;
> +		pbe->address = page_address(page);
> +		pbe->next = restored_inplace_pblist;
> +		restored_inplace_pblist = pbe;
> +	} else {
> +		/* The "original" page frame has not been allocated and we
> +		 * have to use a "safe" page frame to store the loaded page.
> +		 */
> +		pbe->orig_address = page_address(page);
> +		pbe->address = safe_pages_list;
> +		safe_pages_list = safe_pages_list->next;
> +		pbe->next = restore_pblist;
> +		restore_pblist = pbe;
> +	}

This makes sense to me, more so given that the pbe data is already
pre-allocated in prepare_image() regardless (i.e. it should not change
the current behaviour apart from calling chain_alloc() for every page we
are restoring); you are just adding a pointer to stash that information.
So, if it is ok for Pavel and Rafael, I think this patch can be considered
for merging.

Feedback appreciated.

Thanks,
Lorenzo

> +
>  	return pbe->address;
>  }
>  
> @@ -2513,6 +2526,7 @@ int snapshot_write_next(struct snapshot_handle *handle)
>  			chain_init(&ca, GFP_ATOMIC, PG_SAFE);
>  			memory_bm_position_reset(&orig_bm);
>  			restore_pblist = NULL;
> +			restored_inplace_pblist = NULL;
>  			handle->buffer = get_buffer(&orig_bm, &ca);
>  			handle->sync_read = 0;
>  			if (IS_ERR(handle->buffer))
> -- 
> 2.6.2
>
James Morse Dec. 4, 2015, 4:26 p.m. UTC | #2
Hi Lorenzo,

On 03/12/15 12:09, Lorenzo Pieralisi wrote:
> On Thu, Nov 26, 2015 at 05:32:47PM +0000, James Morse wrote:
>> @@ -2427,25 +2434,31 @@ static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
>>  	if (PageHighMem(page))
>>  		return get_highmem_page_buffer(page, ca);
> 
> I know it is not a problem for arm64, but you should export the
> "restored" highmem pages list too because arch code may need to use
> it for the same reasons this patch was implemented.

I'm not sure it can be for the same reasons:
kernel/power/snapshot.c:swap_two_pages_data() does the copy for highmem
pages, and then kunmap-s them... if the page is to be used for
execution, it needs to be re-mapped first, possibly at a different address.
The place to do cache maintenance is in the kunmap/kmap calls.
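As a rough illustration of that point (a hypothetical helper, not kernel code): the kernel-virtual address used for the copy only exists between the map and unmap, so any by-VA cache maintenance would have to happen inside the copy path itself.

#include <linux/highmem.h>
#include <linux/mm.h>

/* Hypothetical sketch of restoring one highmem page: the virtual
 * address used for the copy disappears at kunmap_atomic(), so cache
 * maintenance by VA cannot be deferred to a list walked later.
 */
static void restore_highmem_page_sketch(struct page *dst, void *src)
{
	void *vaddr = kmap_atomic(dst);		/* temporary mapping */

	copy_page(vaddr, src);
	/* an arch cache-maintenance hook would have to go here */
	kunmap_atomic(vaddr);			/* the mapping, and its VA, go away */
}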


Thanks,

James
Pavel Machek Dec. 5, 2015, 9:35 a.m. UTC | #3
On Thu 2015-11-26 17:32:47, James Morse wrote:
> Some architectures require code written to memory as if it were data to be
> 'cleaned' from any data caches before the processor can fetch it as new
> instructions.
> 
> During resume from hibernate, the snapshot code copies some pages directly,
> meaning these architectures do not get a chance to perform their cache
> maintenance. Create a new list of pages that were restored in place, so
> that the arch code can perform this maintenance when necessary.

Umm. Could the copy function be modified to do the necessary
flushing, instead?

Alternatively, can you just clean the whole cache before jumping to
the new kernel?
								Pavel
James Morse Dec. 7, 2015, 11:28 a.m. UTC | #4
Hi Pavel,

On 05/12/15 09:35, Pavel Machek wrote:
> On Thu 2015-11-26 17:32:47, James Morse wrote:
>> Some architectures require code written to memory as if it were data to be
>> 'cleaned' from any data caches before the processor can fetch it as new
>> instructions.
>>
>> During resume from hibernate, the snapshot code copies some pages directly,
>> meaning these architectures do not get a chance to perform their cache
>> maintenance. Create a new list of pages that were restored in place, so
>> that the arch code can perform this maintenance when necessary.
> 
> Umm. Could the copy function be modified to do the necessary
> flushing, instead?

The copying is done by load_image_lzo() using memcpy() if you have
compression enabled, and by load_image() using swap_read_page() if you
don't.

I didn't do it here as it would clean every page copied, which was the
worrying part of the previous approach. If there is an architecture
where this cache-clean operation is expensive, it would slow down
restore. I was trying to benchmark the impact of this on 32bit arm when
I spotted it was broken.

This allocated-same-page code path doesn't happen very often, so we
don't want this to have an impact on the 'normal' code path. On 32bit
arm I saw ~20 of these allocations out of ~60,000 pages.

This new way allocates a few extra pages during restore, and doesn't
assume that flush_cache_range() needs calling. It should have no impact
on architectures that aren't using the new list.


> Alternatively, can you just clean the whole cache before jumping to
> the new kernel?

On arm64, cleaning the whole cache means cleaning all of memory by
virtual address, which would be a high price to pay when we only need to
clean the pages we copied. The current implementation does clean all the
pages it copies; the problem is the ~0.03% that are copied behind its
back. This patch publishes where those pages are.



Thanks!

James
Pavel Machek Dec. 8, 2015, 8:19 a.m. UTC | #5
Hi!

> > Umm. Could the copy function be modified to do the necessary
> > flushing, instead?
> 
> The copying is done by load_image_lzo() using memcpy() if you have
> compression enabled, and by load_image() using swap_read_page() if you
> don't.
> 
> I didn't do it here as it would clean every page copied, which was the
> worrying part of the previous approach. If there is an architecture
> where this cache-clean operation is expensive, it would slow down
> restore. I was trying to benchmark the impact of this on 32bit arm when
> I spotted it was broken.

You have just loaded the page from slow storage (hard drive,
MMC). Cleaning a page should be pretty fast compared to that.

> This allocated-same-page code path doesn't happen very often, so we
> don't want this to have an impact on the 'normal' code path. On 32bit
> arm I saw ~20 of these allocations out of ~60,000 pages.
> 
> This new way allocates a few extra pages during restore, and doesn't
> assume that flush_cache_range() needs calling. It should have no impact
> on architectures that aren't using the new list.

It is also complex.

> > Alternatively, can you just clean the whole cache before jumping to
> > the new kernel?
> 
> On arm64, cleaning the whole cache means cleaning all of memory by
> virtual address, which would be a high price to pay when we only need to
> clean the pages we copied. The current implementation does clean all

How high a price to pay? I mean, hibernation/restore takes
_seconds_. Paying milliseconds to have cleaner code is an acceptable
price.


									Pavel
James Morse Dec. 16, 2015, 9:55 a.m. UTC | #6
Hi Pavel,

On 08/12/15 08:19, Pavel Machek wrote:
>> I didn't do it here as it would clean every page copied, which was the
>> worrying part of the previous approach. If there is an architecture
>> where this cache-clean operation is expensive, it would slow down
>> restore. I was trying to benchmark the impact of this on 32bit arm when
>> I spotted it was broken.
> 
> You have just loaded the page from slow storage (hard drive,
> MMC). Cleaning a page should be pretty fast compared to that.

(One day I hope to own a laptop that hibernates to almost-memory speed
nvram!)

Speed is one issue. Another is that I don't think it's correct to assume that
any architecture with a flush_icache_range() function can/should have
that called on any page of data.

There is also the possibility that an architecture needs to do something
other than flush_icache_range() on the pages that were copied. (I can
see lots of s390 hooks for 'page keys'; Intel's memory protection keys
may want something similar...)

This patch is the general-purpose fix, matching the existing list of
'these pages need copying' with a 'these pages were already copied'.


>> This allocated-same-page code path doesn't happen very often, so we
>> don't want this to have an impact on the 'normal' code path. On 32bit
>> arm I saw ~20 of these allocations out of ~60,000 pages.
>>
>> This new way allocates a few extra pages during restore, and doesn't
>> assume that flush_cache_range() needs calling. It should have no impact
>> on architectures that aren't using the new list.
> 
> It is also complex.

It's symmetric with the existing restore_pblist code; I think that this
is the simplest way of doing it.


>>> Alternatively, can you just clean the whole cache before jumping to
>>> the new kernel?
>>
>> On arm64, cleaning the whole cache means cleaning all of memory by
>> virtual address, which would be a high price to pay when we only need to
>> clean the pages we copied. The current implementation does clean all
> 
> How high a price to pay? I mean, hibernation/restore takes
> _seconds_. Paying milliseconds to have cleaner code is an acceptable
> price.

I agree, but the code to clean all 8GB of memory on Juno takes ~3
seconds, and this will probably scale linearly. We only need to clean
the ~250MB that was copied by hibernate (and of that, only the
executable pages). The sticking point is the few pages it copies but
doesn't tell us about.

I will put together the flush_icache_range()-during-decompression
version of this patch... it looks like powerpc will suffer the most from
this; judging from the comments, its flush_icache_range() code pushes data all
the way out to memory...
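
A minimal sketch of that alternative, assuming the maintenance is simply done as each page is loaded (the helper name and call placement are illustrative, not the actual load_image()/load_image_lzo() code):

#include <linux/mm.h>
#include <linux/string.h>
#include <asm/cacheflush.h>

/* Hypothetical: clean each page as it is loaded/decompressed instead of
 * publishing a list afterwards.  The cost of flush_icache_range() would
 * then be paid for every restored page, not just the handful that were
 * restored in place.
 */
static void load_and_clean_page(void *dst_page, const void *src_buf)
{
	memcpy(dst_page, src_buf, PAGE_SIZE);
	flush_icache_range((unsigned long)dst_page,
			   (unsigned long)dst_page + PAGE_SIZE);
}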


Thanks,

James

Patch

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8b6ec7ef0854..b17cf6081bca 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -384,6 +384,7 @@  extern bool system_entering_hibernation(void);
 extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
+extern struct pbe *restored_inplace_pblist;
 #else /* CONFIG_HIBERNATION */
 static inline void register_nosave_region(unsigned long b, unsigned long e) {}
 static inline void register_nosave_region_late(unsigned long b, unsigned long e) {}
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 3a970604308f..f251f5af49fb 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -74,6 +74,11 @@  void __init hibernate_image_size_init(void)
  */
 struct pbe *restore_pblist;
 
+/* List of PBEs that were restored in place. modified-harvard architectures
+ * need to 'clean' these pages before they can be executed.
+ */
+struct pbe *restored_inplace_pblist;
+
 /* Pointer to an auxiliary buffer (1 page) */
 static void *buffer;
 
@@ -1359,6 +1364,7 @@  out:
 	nr_copy_pages = 0;
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
+	restored_inplace_pblist = NULL;
 	buffer = NULL;
 	alloc_normal = 0;
 	alloc_highmem = 0;
@@ -2072,6 +2078,7 @@  load_header(struct swsusp_info *info)
 	int error;
 
 	restore_pblist = NULL;
+	restored_inplace_pblist = NULL;
 	error = check_header(info);
 	if (!error) {
 		nr_copy_pages = info->image_pages;
@@ -2427,25 +2434,31 @@  static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
 	if (PageHighMem(page))
 		return get_highmem_page_buffer(page, ca);
 
-	if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
-		/* We have allocated the "original" page frame and we can
-		 * use it directly to store the loaded page.
-		 */
-		return page_address(page);
-
-	/* The "original" page frame has not been allocated and we have to
-	 * use a "safe" page frame to store the loaded page.
-	 */
 	pbe = chain_alloc(ca, sizeof(struct pbe));
 	if (!pbe) {
 		swsusp_free();
 		return ERR_PTR(-ENOMEM);
 	}
-	pbe->orig_address = page_address(page);
-	pbe->address = safe_pages_list;
-	safe_pages_list = safe_pages_list->next;
-	pbe->next = restore_pblist;
-	restore_pblist = pbe;
+
+	if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page)) {
+		/* We have allocated the "original" page frame and we can
+		 * use it directly to store the loaded page.
+		 */
+		pbe->orig_address = NULL;
+		pbe->address = page_address(page);
+		pbe->next = restored_inplace_pblist;
+		restored_inplace_pblist = pbe;
+	} else {
+		/* The "original" page frame has not been allocated and we
+		 * have to use a "safe" page frame to store the loaded page.
+		 */
+		pbe->orig_address = page_address(page);
+		pbe->address = safe_pages_list;
+		safe_pages_list = safe_pages_list->next;
+		pbe->next = restore_pblist;
+		restore_pblist = pbe;
+	}
+
 	return pbe->address;
 }
 
@@ -2513,6 +2526,7 @@  int snapshot_write_next(struct snapshot_handle *handle)
 			chain_init(&ca, GFP_ATOMIC, PG_SAFE);
 			memory_bm_position_reset(&orig_bm);
 			restore_pblist = NULL;
+			restored_inplace_pblist = NULL;
 			handle->buffer = get_buffer(&orig_bm, &ca);
 			handle->sync_read = 0;
 			if (IS_ERR(handle->buffer))