diff mbox

[RFC] PM / Hibernate: Add sysfs knob to control size of memory for drivers

Message ID 201105100059.25372.rjw@sisk.pl (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Rafael Wysocki May 9, 2011, 10:59 p.m. UTC
From: Rafael J. Wysocki <rjw@sisk.pl>

Martin reports that on his system hibernation occasionally fails due
to the lack of memory, because the radeon driver apparently allocates
too much of it during the device freeze stage.  It turns out that the
amount of memory allocated by radeon during hibernation (and
presumably during system suspend too) depends on the utilization of
the GPU (e.g. hibernating while there are two KDE 4 sessions with
compositing enabled causes radeon to allocate more memory than for
one KDE 4 session).

In principle it should be possible to use image_size to make the
memory preallocation mechanism free enough memory for the radeon
driver, but in practice it is not easy to guess the right value
because of the way the preallocation code uses image_size.  For this
reason, it seems reasonable to allow users to control the amount of
memory reserved for driver allocations made after the preallocation,
which currently is constant and amounts to 1 MB.

Introduce a new sysfs file, /sys/power/reserved_size, whose value
will be used as the amount of memory to reserve for the
post-preallocation reservations made by device drivers, in bytes.
For backwards compatibility, set its default (and initial) value to
the currently used number (1 MB).

References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/ABI/testing/sysfs-power |   14 ++++++++++++++
 kernel/power/hibernate.c              |   23 +++++++++++++++++++++++
 kernel/power/main.c                   |    1 +
 kernel/power/power.h                  |    4 ++++
 kernel/power/snapshot.c               |   25 ++++++++++++++++++++-----
 5 files changed, 62 insertions(+), 5 deletions(-)

Comments

Rafael Wysocki May 14, 2011, 10:56 p.m. UTC | #1
Hi,

On Tuesday, May 10, 2011, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Martin reports that on his system hibernation occasionally fails due
> to the lack of memory, because the radeon driver apparently allocates
> too much of it during the device freeze stage.  It turns out that the
> amount of memory allocated by radeon during hibernation (and
> presumably during system suspend too) depends on the utilization of
> the GPU (e.g. hibernating while there are two KDE 4 sessions with
> compositing enabled causes radeon to allocate more memory than for
> one KDE 4 session).
> 
> In principle it should be possible to use image_size to make the
> memory preallocation mechanism free enough memory for the radeon
> driver, but in practice it is not easy to guess the right value
> because of the way the preallocation code uses image_size.  For this
> reason, it seems reasonable to allow users to control the amount of
> memory reserved for driver allocations made after the preallocation,
> which currently is constant and amounts to 1 MB.
> 
> Introduce a new sysfs file, /sys/power/reserved_size, whose value
> will be used as the amount of memory to reserve for the
> post-preallocation reservations made by device drivers, in bytes.
> For backwards compatibility, set its default (and initial) value to
> the currently used number (1 MB).
> 
> References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
> Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

OK, there are no comments, so my understanding is that everyone is fine
with this patch and I can add it to my linux-next branch.

Thanks,
Rafael


> ---
>  Documentation/ABI/testing/sysfs-power |   14 ++++++++++++++
>  kernel/power/hibernate.c              |   23 +++++++++++++++++++++++
>  kernel/power/main.c                   |    1 +
>  kernel/power/power.h                  |    4 ++++
>  kernel/power/snapshot.c               |   25 ++++++++++++++++++++-----
>  5 files changed, 62 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/kernel/power/hibernate.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/hibernate.c
> +++ linux-2.6/kernel/power/hibernate.c
> @@ -968,10 +968,33 @@ static ssize_t image_size_store(struct k
>  
>  power_attr(image_size);
>  
> +static ssize_t reserved_size_show(struct kobject *kobj,
> +				  struct kobj_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%lu\n", reserved_size);
> +}
> +
> +static ssize_t reserved_size_store(struct kobject *kobj,
> +				   struct kobj_attribute *attr,
> +				   const char *buf, size_t n)
> +{
> +	unsigned long size;
> +
> +	if (sscanf(buf, "%lu", &size) == 1) {
> +		reserved_size = size;
> +		return n;
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +power_attr(reserved_size);
> +
>  static struct attribute * g[] = {
>  	&disk_attr.attr,
>  	&resume_attr.attr,
>  	&image_size_attr.attr,
> +	&reserved_size_attr.attr,
>  	NULL,
>  };
>  
> Index: linux-2.6/kernel/power/power.h
> ===================================================================
> --- linux-2.6.orig/kernel/power/power.h
> +++ linux-2.6/kernel/power/power.h
> @@ -15,6 +15,7 @@ struct swsusp_info {
>  
>  #ifdef CONFIG_HIBERNATION
>  /* kernel/power/snapshot.c */
> +extern void __init hibernate_reserved_size_init(void);
>  extern void __init hibernate_image_size_init(void);
>  
>  #ifdef CONFIG_ARCH_HIBERNATION_HEADER
> @@ -55,6 +56,7 @@ extern int hibernation_platform_enter(vo
>  
>  #else /* !CONFIG_HIBERNATION */
>  
> +static inline void hibernate_reserved_size_init(void) {}
>  static inline void hibernate_image_size_init(void) {}
>  #endif /* !CONFIG_HIBERNATION */
>  
> @@ -72,6 +74,8 @@ static struct kobj_attribute _name##_att
>  
>  /* Preferred image size in bytes (default 500 MB) */
>  extern unsigned long image_size;
> +/* Size of memory reserved for drivers (default SPARE_PAGES x PAGE_SIZE) */
> +extern unsigned long reserved_size;
>  extern int in_suspend;
>  extern dev_t swsusp_resume_device;
>  extern sector_t swsusp_resume_block;
> Index: linux-2.6/kernel/power/snapshot.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/snapshot.c
> +++ linux-2.6/kernel/power/snapshot.c
> @@ -41,6 +41,18 @@ static void swsusp_set_page_forbidden(st
>  static void swsusp_unset_page_forbidden(struct page *);
>  
>  /*
> + * Number of bytes to reserve for memory allocations made by device drivers
> + * from their ->freeze() and ->freeze_noirq() callbacks so that they don't
> + * cause image creation to fail (tunable via /sys/power/reserved_size).
> + */
> +unsigned long reserved_size;
> +
> +void __init hibernate_reserved_size_init(void)
> +{
> +	reserved_size = SPARE_PAGES * PAGE_SIZE;
> +}
> +
> +/*
>   * Preferred image size in bytes (tunable via /sys/power/image_size).
>   * When it is set to N, the image creating code will do its best to
>   * ensure the image size will not exceed N bytes, but if that is
> @@ -1263,11 +1275,13 @@ static unsigned long minimum_image_size(
>   * frame in use.  We also need a number of page frames to be free during
>   * hibernation for allocations made while saving the image and for device
>   * drivers, in case they need to allocate memory from their hibernation
> - * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> - * respectively, both of which are rough estimates).  To make this happen, we
> - * compute the total number of available page frames and allocate at least
> + * callbacks (these two numbers are given by PAGES_FOR_IO (which is a rough
> + * estimate) and reserverd_size divided by PAGE_SIZE (which is tunable through
> + * /sys/power/reserved_size, respectively).  To make this happen, we compute the
> + * total number of available page frames and allocate at least
>   *
> - * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2
> + *  + 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE)
>   *
>   * of them, which corresponds to the maximum size of a hibernation image.
>   *
> @@ -1322,7 +1336,8 @@ int hibernate_preallocate_memory(void)
>  	count -= totalreserve_pages;
>  
>  	/* Compute the maximum number of saveable pages to leave in memory. */
> -	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
> +	max_size = (count - (size + PAGES_FOR_IO)) / 2
> +			- 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);
>  	/* Compute the desired number of image pages specified by image_size. */
>  	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
>  	if (size > max_size)
> Index: linux-2.6/kernel/power/main.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/main.c
> +++ linux-2.6/kernel/power/main.c
> @@ -337,6 +337,7 @@ static int __init pm_init(void)
>  	if (error)
>  		return error;
>  	hibernate_image_size_init();
> +	hibernate_reserved_size_init();
>  	power_kobj = kobject_create_and_add("power", NULL);
>  	if (!power_kobj)
>  		return -ENOMEM;
> Index: linux-2.6/Documentation/ABI/testing/sysfs-power
> ===================================================================
> --- linux-2.6.orig/Documentation/ABI/testing/sysfs-power
> +++ linux-2.6/Documentation/ABI/testing/sysfs-power
> @@ -158,3 +158,17 @@ Description:
>  		successful, will make the kernel abort a subsequent transition
>  		to a sleep state if any wakeup events are reported after the
>  		write has returned.
> +
> +What:		/sys/power/reserved_size
> +Date:		May 2011
> +Contact:	Rafael J. Wysocki <rjw@sisk.pl>
> +Description:
> +		The /sys/power/reserved_size file allows user space to control
> +		the amount of memory reserved for allocations made by device
> +		drivers during the "device freeze" stage of hibernation.  It can
> +		be written a string representing a non-negative integer that
> +		will be used as the amount of memory to reserve for allocations
> +		made by device drivers' "freeze" callbacks, in bytes.
> +
> +		Reading from this file will display the current value, which is
> +		set to 1 MB by default.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
>
Martin Steigerwald May 15, 2011, 8:51 a.m. UTC | #2
Am Sonntag, 15. Mai 2011 schrieb Rafael J. Wysocki:
> Hi,

Hi Rafael,

> On Tuesday, May 10, 2011, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Martin reports that on his system hibernation occasionally fails due
> > to the lack of memory, because the radeon driver apparently allocates
> > too much of it during the device freeze stage.  It turns out that the
> > amount of memory allocated by radeon during hibernation (and
> > presumably during system suspend too) depends on the utilization of
> > the GPU (e.g. hibernating while there are two KDE 4 sessions with
> > compositing enabled causes radeon to allocate more memory than for
> > one KDE 4 session).
> > 
> > In principle it should be possible to use image_size to make the
> > memory preallocation mechanism free enough memory for the radeon
> > driver, but in practice it is not easy to guess the right value
> > because of the way the preallocation code uses image_size.  For this
> > reason, it seems reasonable to allow users to control the amount of
> > memory reserved for driver allocations made after the preallocation,
> > which currently is constant and amounts to 1 MB.
> > 
> > Introduce a new sysfs file, /sys/power/reserved_size, whose value
> > will be used as the amount of memory to reserve for the
> > post-preallocation reservations made by device drivers, in bytes.
> > For backwards compatibility, set its default (and initial) value to
> > the currently used number (1 MB).
> > 
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
> > Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> OK, there are no comments, so my understanding is that everyone is fine
> with this patch and I can add it to my linux-next branch.

Extensively

Tested-by: Martin Steigerwald <Martin@Lichtvoll.de>

This patch makes a complete difference for me. Instead of not knowing 
whether my ThinkPad T42 will hibernate with lots of applications open and 
thus closing applications prior to hibernation preventively now it simply 
will. *Always*.

I even tested it with two KDE 4 sessions with running desktop search 
indexing on one. It took ages, cause KDE 4.6 desktop search / nepomuk stuff 
seems to I/O load the machine beyond anything (bugs reported there), but 
it worked.

16 MiB reserved_size has been enough for one KDE session. With 128 MiB the 
linux kernel hibernated two KDE sessions.

Drivers allocating their memory via suspend/hibernate notifiers according 
to Rafael should fix the root cause, but until that is done, this will do. 
Adjusting imagesize instead never gave me such a reliable result.

Thanks,
Rafael Wysocki May 15, 2011, 9:36 a.m. UTC | #3
On Sunday, May 15, 2011, Martin Steigerwald wrote:
> Am Sonntag, 15. Mai 2011 schrieb Rafael J. Wysocki:
> > Hi,
> 
> Hi Rafael,

Hi,

> > On Tuesday, May 10, 2011, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Martin reports that on his system hibernation occasionally fails due
> > > to the lack of memory, because the radeon driver apparently allocates
> > > too much of it during the device freeze stage.  It turns out that the
> > > amount of memory allocated by radeon during hibernation (and
> > > presumably during system suspend too) depends on the utilization of
> > > the GPU (e.g. hibernating while there are two KDE 4 sessions with
> > > compositing enabled causes radeon to allocate more memory than for
> > > one KDE 4 session).
> > > 
> > > In principle it should be possible to use image_size to make the
> > > memory preallocation mechanism free enough memory for the radeon
> > > driver, but in practice it is not easy to guess the right value
> > > because of the way the preallocation code uses image_size.  For this
> > > reason, it seems reasonable to allow users to control the amount of
> > > memory reserved for driver allocations made after the preallocation,
> > > which currently is constant and amounts to 1 MB.
> > > 
> > > Introduce a new sysfs file, /sys/power/reserved_size, whose value
> > > will be used as the amount of memory to reserve for the
> > > post-preallocation reservations made by device drivers, in bytes.
> > > For backwards compatibility, set its default (and initial) value to
> > > the currently used number (1 MB).
> > > 
> > > References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
> > > Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > OK, there are no comments, so my understanding is that everyone is fine
> > with this patch and I can add it to my linux-next branch.
> 
> Extensively
> 
> Tested-by: Martin Steigerwald <Martin@Lichtvoll.de>
> 
> This patch makes a complete difference for me. Instead of not knowing 
> whether my ThinkPad T42 will hibernate with lots of applications open and 
> thus closing applications prior to hibernation preventively now it simply 
> will. *Always*.
> 
> I even tested it with two KDE 4 sessions with running desktop search 
> indexing on one. It took ages, cause KDE 4.6 desktop search / nepomuk stuff 
> seems to I/O load the machine beyond anything (bugs reported there), but 
> it worked.
> 
> 16 MiB reserved_size has been enough for one KDE session. With 128 MiB the 
> linux kernel hibernated two KDE sessions.
> 
> Drivers allocating their memory via suspend/hibernate notifiers according 
> to Rafael should fix the root cause, but until that is done, this will do. 
> Adjusting imagesize instead never gave me such a reliable result.

In fact, if drivers allocated their memory from suspend/hibernate notifiers,
that would be practically equivalent to setting reserved_size to the total
amount of memory reserved by the drivers.  However, it may be difficult
for drivers to predict how much memory they will need at the time the
notifiers are called (they are called before freezing user space).

Thus I'm considering a change that will cause device drivers' ->prepare()
callbacks to be executed before the preallocation of memory takes place.
In that case the drivers may allocate memory from their ->prepare()
callbacks _after_ user space has been frozen and that will make more
sense overall.

For now, however, I think that exposing reserved_size is the right choice.

Thanks,
Rafael
Nigel Cunningham May 16, 2011, 9:34 p.m. UTC | #4
Hi.

On 15/05/11 19:36, Rafael J. Wysocki wrote:
> In fact, if drivers allocated their memory from suspend/hibernate notifiers,
> that would be practically equivalent to setting reserved_size to the total
> amount of memory reserved by the drivers.  However, it may be difficult
> for drivers to predict how much memory they will need at the time the
> notifiers are called (they are called before freezing user space).
> 
> Thus I'm considering a change that will cause device drivers' ->prepare()
> callbacks to be executed before the preallocation of memory takes place.
> In that case the drivers may allocate memory from their ->prepare()
> callbacks _after_ user space has been frozen and that will make more
> sense overall.
> 
> For now, however, I think that exposing reserved_size is the right choice.

Sorry for not commenting earlier - too busy with Drupal development and
only came across this thread by chance (yes, I'm still subscribed to the
PM list, but haven't been reading it. Hibernation isn't high on my list
of priorities at the moment because TOI is feature complete and stable.
I know I'm supposed to be sending you patches, but other things have
been taking the time that would be used for that).

Anyway...

This sounds to me like a great development. As far as TuxOnIce goes,
we've had a knob for ages that has allowed the user to specify an amount
of memory to be kept aside for driver allocations, and we calculate and
report how much they used in the debugging info sysfs entry. Because
TuxOnIce works differently to [u]swsusp, this is the only source of
potential out-of-memory related failures, and the measures just
mentioned made things much more reliable.

If things went in the direction you're suggesting here, they'd get
better again. I'm all in favour!

Regards,

Nigel
Rafael Wysocki May 16, 2011, 11:22 p.m. UTC | #5
On Monday, May 16, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 15/05/11 19:36, Rafael J. Wysocki wrote:
> > In fact, if drivers allocated their memory from suspend/hibernate notifiers,
> > that would be practically equivalent to setting reserved_size to the total
> > amount of memory reserved by the drivers.  However, it may be difficult
> > for drivers to predict how much memory they will need at the time the
> > notifiers are called (they are called before freezing user space).
> > 
> > Thus I'm considering a change that will cause device drivers' ->prepare()
> > callbacks to be executed before the preallocation of memory takes place.
> > In that case the drivers may allocate memory from their ->prepare()
> > callbacks _after_ user space has been frozen and that will make more
> > sense overall.
> > 
> > For now, however, I think that exposing reserved_size is the right choice.
> 
> Sorry for not commenting earlier - too busy with Drupal development and
> only came across this thread by chance (yes, I'm still subscribed to the
> PM list, but haven't been reading it. Hibernation isn't high on my list
> of priorities at the moment because TOI is feature complete and stable.
> I know I'm supposed to be sending you patches, but other things have
> been taking the time that would be used for that).
> 
> Anyway...
> 
> This sounds to me like a great development. As far as TuxOnIce goes,
> we've had a knob for ages that has allowed the user to specify an amount
> of memory to be kept aside for driver allocations, and we calculate and
> report how much they used in the debugging info sysfs entry. Because
> TuxOnIce works differently to [u]swsusp, this is the only source of
> potential out-of-memory related failures, and the measures just
> mentioned made things much more reliable.
> 
> If things went in the direction you're suggesting here, they'd get
> better again. I'm all in favour!

Thanks Nigel!
Pavel Machek May 18, 2011, 5:27 p.m. UTC | #6
Hi!

> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Martin reports that on his system hibernation occasionally fails due
> > to the lack of memory, because the radeon driver apparently allocates
> > too much of it during the device freeze stage.  It turns out that the
> > amount of memory allocated by radeon during hibernation (and
> > presumably during system suspend too) depends on the utilization of
> > the GPU (e.g. hibernating while there are two KDE 4 sessions with
> > compositing enabled causes radeon to allocate more memory than for
> > one KDE 4 session).
> > 
> > In principle it should be possible to use image_size to make the
> > memory preallocation mechanism free enough memory for the radeon
> > driver, but in practice it is not easy to guess the right value
> > because of the way the preallocation code uses image_size.  For this
> > reason, it seems reasonable to allow users to control the amount of
> > memory reserved for driver allocations made after the preallocation,
> > which currently is constant and amounts to 1 MB.
> > 
> > Introduce a new sysfs file, /sys/power/reserved_size, whose value
> > will be used as the amount of memory to reserve for the
> > post-preallocation reservations made by device drivers, in bytes.
> > For backwards compatibility, set its default (and initial) value to
> > the currently used number (1 MB).
> > 
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
> > Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> OK, there are no comments, so my understanding is that everyone is fine
> with this patch and I can add it to my linux-next branch.

Actually no, I don't like it. Yes, knob might be useful for debugging,
but having it as part of official kernel interface...

									Pavel
Martin Steigerwald May 18, 2011, 6:09 p.m. UTC | #7
Am Mittwoch, 18. Mai 2011 schrieb Pavel Machek:
> Hi!
> 
> > > From: Rafael J. Wysocki <rjw@sisk.pl>
> > > 
> > > Martin reports that on his system hibernation occasionally fails
> > > due to the lack of memory, because the radeon driver apparently
> > > allocates too much of it during the device freeze stage.  It turns
> > > out that the amount of memory allocated by radeon during
> > > hibernation (and presumably during system suspend too) depends on
> > > the utilization of the GPU (e.g. hibernating while there are two
> > > KDE 4 sessions with compositing enabled causes radeon to allocate
> > > more memory than for one KDE 4 session).
> > > 
> > > In principle it should be possible to use image_size to make the
> > > memory preallocation mechanism free enough memory for the radeon
> > > driver, but in practice it is not easy to guess the right value
> > > because of the way the preallocation code uses image_size.  For
> > > this reason, it seems reasonable to allow users to control the
> > > amount of memory reserved for driver allocations made after the
> > > preallocation, which currently is constant and amounts to 1 MB.
> > > 
> > > Introduce a new sysfs file, /sys/power/reserved_size, whose value
> > > will be used as the amount of memory to reserve for the
> > > post-preallocation reservations made by device drivers, in bytes.
> > > For backwards compatibility, set its default (and initial) value to
> > > the currently used number (1 MB).
> > > 
> > > References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
> > > Reported-by: Martin Steigerwald <Martin@Lichtvoll.de>
> > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > OK, there are no comments, so my understanding is that everyone is
> > fine with this patch and I can add it to my linux-next branch.
> 
> Actually no, I don't like it. Yes, knob might be useful for debugging,
> but having it as part of official kernel interface...

Well I and people with similar setups it is actually quite useful. It 
makes the difference between does hibernate *every time* versus does not 
hibernate sometimes. And I don't see why it can't go again, when the issue 
is taken care of elsewise in the future. I think that autotuning / drivers 
allocating their memory via whatnot is better, but until such a mechanism 
is agreed, developed and included in official kernel I do think that this 
knob does help.

So or so I will patch this in for the kernels for my ThinkPad T42, whether 
its part of the official kernel or not. And I build my own kernels anyway. 
So when described issue really only happens on my setup and nowhere 
else...

I think missing is some documentation so that the advanced user can figure 
out this knob.
Pavel Machek May 18, 2011, 6:36 p.m. UTC | #8
Hi!

> > > OK, there are no comments, so my understanding is that everyone is
> > > fine with this patch and I can add it to my linux-next branch.
> > 
> > Actually no, I don't like it. Yes, knob might be useful for debugging,
> > but having it as part of official kernel interface...
> 
> Well I and people with similar setups it is actually quite useful. It 
> makes the difference between does hibernate *every time* versus does not 
> hibernate sometimes. And I don't see why it can't go again, when the issue 
> is taken care of elsewise in the future. I think that autotuning / drivers 
> allocating their memory via whatnot is better, but until such a mechanism 
> is agreed, developed and included in official kernel I do think that this 
> knob does help.

Yes, autotuning would be better, and yes, knob is useful for you now.

I'd say the knob is for debugging and should go to debugfs somewhere.

> I think missing is some documentation so that the advanced user can figure 
> out this knob.

...so we can easily delete the knob when it is no longer neccessary.
									Pavel
Rafael Wysocki May 18, 2011, 8:23 p.m. UTC | #9
On Wednesday, May 18, 2011, Pavel Machek wrote:
> Hi!
> 
> > > > OK, there are no comments, so my understanding is that everyone is
> > > > fine with this patch and I can add it to my linux-next branch.
> > > 
> > > Actually no, I don't like it. Yes, knob might be useful for debugging,
> > > but having it as part of official kernel interface...
> > 
> > Well I and people with similar setups it is actually quite useful. It 
> > makes the difference between does hibernate *every time* versus does not 
> > hibernate sometimes. And I don't see why it can't go again, when the issue 
> > is taken care of elsewise in the future. I think that autotuning / drivers 
> > allocating their memory via whatnot is better, but until such a mechanism 
> > is agreed, developed and included in official kernel I do think that this 
> > knob does help.
> 
> Yes, autotuning would be better, and yes, knob is useful for you now.
> 
> I'd say the knob is for debugging and should go to debugfs somewhere.
> 
> > I think missing is some documentation so that the advanced user can figure 
> > out this knob.
> 
> ...so we can easily delete the knob when it is no longer neccessary.

_If_ drivers start to allocate memory in .prepare(), the knob won't be
necessary any more and _then_ we may consider removing it.  I guess a lot
of time will pass before that happens, though. :-)

That said, I'm not sure we should recommend using it beyond the cases in which
nothing else works.

Thanks,
Rafael
diff mbox

Patch

Index: linux-2.6/kernel/power/hibernate.c
===================================================================
--- linux-2.6.orig/kernel/power/hibernate.c
+++ linux-2.6/kernel/power/hibernate.c
@@ -968,10 +968,33 @@  static ssize_t image_size_store(struct k
 
 power_attr(image_size);
 
+static ssize_t reserved_size_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%lu\n", reserved_size);
+}
+
+static ssize_t reserved_size_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t n)
+{
+	unsigned long size;
+
+	if (sscanf(buf, "%lu", &size) == 1) {
+		reserved_size = size;
+		return n;
+	}
+
+	return -EINVAL;
+}
+
+power_attr(reserved_size);
+
 static struct attribute * g[] = {
 	&disk_attr.attr,
 	&resume_attr.attr,
 	&image_size_attr.attr,
+	&reserved_size_attr.attr,
 	NULL,
 };
 
Index: linux-2.6/kernel/power/power.h
===================================================================
--- linux-2.6.orig/kernel/power/power.h
+++ linux-2.6/kernel/power/power.h
@@ -15,6 +15,7 @@  struct swsusp_info {
 
 #ifdef CONFIG_HIBERNATION
 /* kernel/power/snapshot.c */
+extern void __init hibernate_reserved_size_init(void);
 extern void __init hibernate_image_size_init(void);
 
 #ifdef CONFIG_ARCH_HIBERNATION_HEADER
@@ -55,6 +56,7 @@  extern int hibernation_platform_enter(vo
 
 #else /* !CONFIG_HIBERNATION */
 
+static inline void hibernate_reserved_size_init(void) {}
 static inline void hibernate_image_size_init(void) {}
 #endif /* !CONFIG_HIBERNATION */
 
@@ -72,6 +74,8 @@  static struct kobj_attribute _name##_att
 
 /* Preferred image size in bytes (default 500 MB) */
 extern unsigned long image_size;
+/* Size of memory reserved for drivers (default SPARE_PAGES x PAGE_SIZE) */
+extern unsigned long reserved_size;
 extern int in_suspend;
 extern dev_t swsusp_resume_device;
 extern sector_t swsusp_resume_block;
Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -41,6 +41,18 @@  static void swsusp_set_page_forbidden(st
 static void swsusp_unset_page_forbidden(struct page *);
 
 /*
+ * Number of bytes to reserve for memory allocations made by device drivers
+ * from their ->freeze() and ->freeze_noirq() callbacks so that they don't
+ * cause image creation to fail (tunable via /sys/power/reserved_size).
+ */
+unsigned long reserved_size;
+
+void __init hibernate_reserved_size_init(void)
+{
+	reserved_size = SPARE_PAGES * PAGE_SIZE;
+}
+
+/*
  * Preferred image size in bytes (tunable via /sys/power/image_size).
  * When it is set to N, the image creating code will do its best to
  * ensure the image size will not exceed N bytes, but if that is
@@ -1263,11 +1275,13 @@  static unsigned long minimum_image_size(
  * frame in use.  We also need a number of page frames to be free during
  * hibernation for allocations made while saving the image and for device
  * drivers, in case they need to allocate memory from their hibernation
- * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
- * respectively, both of which are rough estimates).  To make this happen, we
- * compute the total number of available page frames and allocate at least
+ * callbacks (these two numbers are given by PAGES_FOR_IO (which is a rough
+ * estimate) and reserverd_size divided by PAGE_SIZE (which is tunable through
+ * /sys/power/reserved_size, respectively).  To make this happen, we compute the
+ * total number of available page frames and allocate at least
  *
- * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
+ * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2
+ *  + 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE)
  *
  * of them, which corresponds to the maximum size of a hibernation image.
  *
@@ -1322,7 +1336,8 @@  int hibernate_preallocate_memory(void)
 	count -= totalreserve_pages;
 
 	/* Compute the maximum number of saveable pages to leave in memory. */
-	max_size = (count - (size + PAGES_FOR_IO)) / 2 - 2 * SPARE_PAGES;
+	max_size = (count - (size + PAGES_FOR_IO)) / 2
+			- 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);
 	/* Compute the desired number of image pages specified by image_size. */
 	size = DIV_ROUND_UP(image_size, PAGE_SIZE);
 	if (size > max_size)
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -337,6 +337,7 @@  static int __init pm_init(void)
 	if (error)
 		return error;
 	hibernate_image_size_init();
+	hibernate_reserved_size_init();
 	power_kobj = kobject_create_and_add("power", NULL);
 	if (!power_kobj)
 		return -ENOMEM;
Index: linux-2.6/Documentation/ABI/testing/sysfs-power
===================================================================
--- linux-2.6.orig/Documentation/ABI/testing/sysfs-power
+++ linux-2.6/Documentation/ABI/testing/sysfs-power
@@ -158,3 +158,17 @@  Description:
 		successful, will make the kernel abort a subsequent transition
 		to a sleep state if any wakeup events are reported after the
 		write has returned.
+
+What:		/sys/power/reserved_size
+Date:		May 2011
+Contact:	Rafael J. Wysocki <rjw@sisk.pl>
+Description:
+		The /sys/power/reserved_size file allows user space to control
+		the amount of memory reserved for allocations made by device
+		drivers during the "device freeze" stage of hibernation.  It can
+		be written a string representing a non-negative integer that
+		will be used as the amount of memory to reserve for allocations
+		made by device drivers' "freeze" callbacks, in bytes.
+
+		Reading from this file will display the current value, which is
+		set to 1 MB by default.