mbox series

[0/2] Improve dmesg output for swapfile+hibernation

Message ID 20240522074658.2420468-1-Sukrit.Bhatnagar@sony.com (mailing list archive)
Headers show
Series Improve dmesg output for swapfile+hibernation | expand

Message

Sukrit.Bhatnagar@sony.com May 22, 2024, 7:46 a.m. UTC
While trying to use a swapfile for hibernation, I noticed that the suspend
process was failing when it tried to search for the swap to use for snapshot.
I had created the swapfile on ext4 and got the starting physical block offset
using the filefrag command.

Upon looking at the swap activation code, I realized that the iomap part is
doing some rounding up for the physical offset of the swapfile.
Then I checked the block size of the filesystem, which was actually set to
1KB by default in my environment.
(This was in buildroot, using the genimage utility to create the VM disk
partitions, filesystems etc.)

The block offset is rounded-up and stored in the swap extents metadata by
iomap code, but as the exact value is lost for 1KB-block filesystem, hibernate
cannot read back the swap header after it finishes writing data pages to swap.

Note that this is not a bug in my understanding. Both swapfile and hibernate
subsystems have the correct handling of this edge case, individually.

Another observation was that we need to rely on external commands, such as
filefrag for getting the swapfile offset value. This value can be conveniently
printed in dmesg output when doing swapon.

Sukrit Bhatnagar (2):
  iomap: swap: print warning for unaligned swapfile
  mm: swap: print starting physical block offset in swapon

 fs/iomap/swapfile.c | 10 ++++++++++
 mm/swapfile.c       |  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

Comments

Pavel Machek May 23, 2024, 7:45 p.m. UTC | #1
Hi!

> While trying to use a swapfile for hibernation, I noticed that the suspend
> process was failing when it tried to search for the swap to use for snapshot.
> I had created the swapfile on ext4 and got the starting physical block offset
> using the filefrag command.

How is swapfile for hibernation supposed to work? I'm afraid that
can't work, and we should just not allow hibernation if there's
anything else than just one swap partition.

Best regards,
								Pavel
Sukrit.Bhatnagar@sony.com May 27, 2024, 11:06 a.m. UTC | #2
Hi Pavel!

On 2024-05-24 04:45, Pavel Machek wrote:
> Hi!
> 
>> While trying to use a swapfile for hibernation, I noticed that the suspend
>> process was failing when it tried to search for the swap to use for snapshot.
>> I had created the swapfile on ext4 and got the starting physical block offset
>> using the filefrag command.
> 
> How is swapfile for hibernation supposed to work? I'm afraid that
> can't work, and we should just not allow hibernation if there's
> anything else than just one swap partition.

I am not sure what you mean.

We can pass the starting physical block offset of a swapfile into
/sys/power/resume_offset, and hibernate can directly read/write
into it using the swap extents information created by iomap during
swapon. On resume, the kernel would read this offset value from
the commandline parameters, and then access the swapfile.

I find having a swapfile option for hibernate useful, in the scenarios
where it is hard to modify the partitioning scheme, or to have a
dedicated swap partition.

Are there any plans to remove swapfile support from hibernation?

--
Sukrit
Christoph Hellwig May 27, 2024, 11:20 a.m. UTC | #3
On Mon, May 27, 2024 at 11:06:11AM +0000, Sukrit.Bhatnagar@sony.com wrote:
> We can pass the starting physical block offset of a swapfile into
> /sys/power/resume_offset, and hibernate can directly read/write
> into it using the swap extents information created by iomap during
> swapon. On resume, the kernel would read this offset value from
> the commandline parameters, and then access the swapfile.

Reading a physical address from userspace is not a proper interface.
What is this code even trying to do with it?
Sukrit.Bhatnagar@sony.com May 27, 2024, 12:51 p.m. UTC | #4
Hi Christoph,

On 2024-05-27 20:20, Christoph Hellwig wrote:
> On Mon, May 27, 2024 at 11:06:11AM +0000, Sukrit.Bhatnagar@sony.com wrote:
>> We can pass the starting physical block offset of a swapfile into
>> /sys/power/resume_offset, and hibernate can directly read/write
>> into it using the swap extents information created by iomap during
>> swapon. On resume, the kernel would read this offset value from
>> the commandline parameters, and then access the swapfile.
> 
> Reading a physical address from userspace is not a proper interface.
> What is this code even trying to do with it?

I understand your point. Ideally, the low-level stuff such as finding
the physical block offset should not be handled in the userspace.

In my understanding, the resume offset in hibernate is used as follows.

Suspend
- Hibernate looks up the swap/swapfile using the details we pass in the
  sysfs entries, in the function swsusp_swap_check():
  * /sys/power/resume - path/uuid/major:minor of the swap partition (or
                        non-swap partition for swapfile)
  * /sys/power/resume_offset - physical offset of the swapfile in that
                               partition
  * If no resume device is specified, it just uses the first available swap!
- It then proceeds to write the image to the specified swap.
  (The allocation of swap pages is done by the swapfile code internally.)
- When writing is finished, the swap header needs to be updated with some
  metadata, in the function mark_swapfiles().
  * Hibernate creates bio requests to read/write the header (which is the
    first page of swap) using that physical block offset.

Resume
- Hibernate gets the partition and offset values from kernel command-line
  parameters "resume" and "resume_offset" (which must be set from
  userspace, not ideal).
- It checks for valid hibernate swap signature by reading the swap header.
  * Hibernate creates bio requests again, using the physical block offset,
    but the one from kernel command-line this time.
- Then it restores image and resumes into the previously saved kernel.

--
Sukrit
Christoph Hellwig May 27, 2024, 12:58 p.m. UTC | #5
On Mon, May 27, 2024 at 12:51:07PM +0000, Sukrit.Bhatnagar@sony.com wrote:
> In my understanding, the resume offset in hibernate is used as follows.
> 
> Suspend
> - Hibernate looks up the swap/swapfile using the details we pass in the
>   sysfs entries, in the function swsusp_swap_check():
>   * /sys/power/resume - path/uuid/major:minor of the swap partition (or
>                         non-swap partition for swapfile)
>   * /sys/power/resume_offset - physical offset of the swapfile in that
>                                partition
>   * If no resume device is specified, it just uses the first available swap!
> - It then proceeds to write the image to the specified swap.
>   (The allocation of swap pages is done by the swapfile code internally.)

Where "it" is userspace code?  If so, that already seems unsafe for
a swap device, but definitely is a no-go for a swapfile.

> - Hibernate gets the partition and offset values from kernel command-line
>   parameters "resume" and "resume_offset" (which must be set from
>   userspace, not ideal).

Or is it just for these parameters?  In which case we "only" need to
specify the swap file, which would then need code in the file system
driver to resolve the logical to physical mapping as swap files don't
need to be contiguous.
Sukrit.Bhatnagar@sony.com May 31, 2024, 2:15 p.m. UTC | #6
On 2024-05-27 21:58, Christoph Hellwig wrote:
> On Mon, May 27, 2024 at 12:51:07PM +0000, Sukrit.Bhatnagar@sony.com wrote:
>> In my understanding, the resume offset in hibernate is used as follows.
>> 
>> Suspend
>> - Hibernate looks up the swap/swapfile using the details we pass in the
>>   sysfs entries, in the function swsusp_swap_check():
>>   * /sys/power/resume - path/uuid/major:minor of the swap partition (or
>>                         non-swap partition for swapfile)
>>   * /sys/power/resume_offset - physical offset of the swapfile in that
>>                                partition
>>   * If no resume device is specified, it just uses the first available
>>   swap! - It then proceeds to write the image to the specified swap.
>>   (The allocation of swap pages is done by the swapfile code
>>   internally.)
> 
> Where "it" is userspace code?  If so, that already seems unsafe for
> a swap device, but definitely is a no-go for a swapfile.

By "it", I meant the hibernate code running in kernel space.
Once userspace triggers hibernation by `echo disk > /sys/power/state`
or a systemd wrapper program etc., and userspace tasks are frozen,
everything happens within kernel context.

>> - Hibernate gets the partition and offset values from kernel command-line
>>   parameters "resume" and "resume_offset" (which must be set from
>>   userspace, not ideal).
> 
> Or is it just for these parameters?  In which case we "only" need to
> specify the swap file, which would then need code in the file system
> driver to resolve the logical to physical mapping as swap files don't
> need to be contiguous.

Yes, it is just for setting these parameters in sysfs entries and in kernel
commandline.
I think specifying the swapfile path *may* not work because when we resume
from hibernation, the filesystems are not yet mounted (except for the case
when someone is resuming from initramfs stage).
Using the block device + physical offset, this procedure becomes Independent
of the filesystem and the mounted status.
And since the system swap information is lost on reboot/shutdown, the kernel
which loads the hibernation image will not know what swaps were enabled
when the image was created.

--
Sukrit