Message ID | cover.1598042152.git.anchalag@amazon.com (mailing list archive) |
---|---|
Series | Fix PM hibernation in Xen guests |

On Fri, Aug 21, 2020 at 10:22:43PM +0000, Anchal Agarwal wrote:
> Hello,
> This series fixes PM hibernation for HVM guests running on the Xen hypervisor.
> A running guest can now be hibernated and resumed successfully at a
> later time. The fixes for PM hibernation are added to the block and
> network device drivers, i.e. xen-blkfront and xen-netfront. Any other driver
> that needs to add S4 support, if not already present, can follow the same
> method of introducing freeze/thaw/restore callbacks.
> The patches have been tested against the upstream kernel and Xen 4.11. Large
> scale testing was also done on Xen based Amazon EC2 instances. All this testing
> involved running a memory exhausting workload in the background.
>
> Guest hibernation does not involve any support from the hypervisor, and
> this way the guest has complete control over its state. Infrastructure
> restrictions for saving guest state can be overcome by guest initiated
> hibernation.
>
> These patches were sent out as RFC before and all the feedback has been
> incorporated in the patches. The last v1 & v2 can be found here:
>
> [v1]: https://lkml.org/lkml/2020/5/19/1312
> [v2]: https://lkml.org/lkml/2020/7/2/995
> All comments and feedback from v2 have been incorporated in the v3 series.
>
> Known issues:
> 1. KASLR causes intermittent hibernation failures. The VM fails to resume and
> has to be restarted. I will investigate this issue separately; it shouldn't
> be a blocker for this patch series.
> 2. During hibernation, I sometimes observed that freezing of tasks fails due
> to busy XFS workqueues [xfs-cil/xfs-sync]. This is also intermittent, maybe 1
> out of 200 runs, and hibernation is aborted in this case. Retrying hibernation
> may work. Also, this is a known issue with hibernation, and some
> filesystems like XFS have been discussed by the community for years without an
> effective resolution at this point.
>
> Testing How to:
> ---------------
> 1. Set up the Xen hypervisor on a physical machine [I used Ubuntu 16.04 + upstream
> xen-4.11]
> 2. Bring up an HVM guest with a kernel compiled with the hibernation patches
> [I used ubuntu18.04 netboot bionic images and also Amazon Linux on-prem images].
> 3. Create a swap file size=RAM size
> 4. Update grub parameters and reboot
> 5. Trigger pm-hibernation from within the VM
>
> Example:
> Set up a file-backed swap space. Swap file size>=Total memory on the system
> sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
> sudo chmod 600 /swap
> sudo mkswap /swap
> sudo swapon /swap
>
> Update resume device/resume offset in grub if using swap file:
> resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
>
> Execute:
> --------
> sudo pm-hibernate
> OR
> echo disk > /sys/power/state && echo reboot > /sys/power/disk
>
> Compute resume offset code:
> "
> #!/usr/bin/env python
> import sys
> import array
> import fcntl
>
> # swap file, passed as the first argument
> f = open(sys.argv[1], 'r')
> buf = array.array('L', [0])
>
> # FIBMAP ioctl: map logical block 0 of the file to its physical block
> ret = fcntl.ioctl(f.fileno(), 0x01, buf)
> print(buf[0])
> "
>
> Aleksei Besogonov (1):
>   PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
>
> Anchal Agarwal (4):
>   x86/xen: Introduce new function to map HYPERVISOR_shared_info on
>     Resume
>   x86/xen: save and restore steal clock during PM hibernation
>   xen: Introduce wrapper for save/restore sched clock offset
>   xen: Update sched clock offset to avoid system instability in
>     hibernation
>
> Munehisa Kamata (5):
>   xen/manage: keep track of the on-going suspend mode
>   xenbus: add freeze/thaw/restore callbacks support
>   x86/xen: add system core suspend and resume callbacks
>   xen-blkfront: add callbacks for PM suspend and hibernation
>   xen-netfront: add callbacks for PM suspend and hibernation
>
> Thomas Gleixner (1):
>   genirq: Shutdown irq chips in suspend/resume during hibernation
>
>  arch/x86/xen/enlighten_hvm.c      |   7 +++
>  arch/x86/xen/suspend.c            |  63 ++++++++++++++++++++
>  arch/x86/xen/time.c               |  15 ++++-
>  arch/x86/xen/xen-ops.h            |   3 +
>  drivers/block/xen-blkfront.c      | 122 ++++++++++++++++++++++++++++++++++++--
>  drivers/net/xen-netfront.c        |  96 +++++++++++++++++++++++++++++-
>  drivers/xen/events/events_base.c  |   1 +
>  drivers/xen/manage.c              |  46 ++++++++++++++
>  drivers/xen/xenbus/xenbus_probe.c |  96 +++++++++++++++++++++++++-----
>  include/linux/irq.h               |   2 +
>  include/xen/xen-ops.h             |   3 +
>  include/xen/xenbus.h              |   3 +
>  kernel/irq/chip.c                 |   2 +-
>  kernel/irq/internals.h            |   1 +
>  kernel/irq/pm.c                   |  31 +++++++---
>  kernel/power/user.c               |   7 ++-
>  16 files changed, 464 insertions(+), 34 deletions(-)
>
> --
> 2.16.6
>
A gentle ping on the series in case there is any more feedback, or can we plan to
merge this? I can then send the series with the minor fixes pointed out by tglx@.

Thanks,
Anchal
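
The testing how-to in the cover letter asks for a swap file at least as large as RAM and for resume= and resume_offset= kernel parameters. A minimal sketch of a pre-hibernation check along those lines follows. It is not part of the series: the helper names (parse_meminfo_kib, swap_covers_ram, resume_cmdline) are hypothetical, and it assumes the usual "Name:   value kB" layout of /proc/meminfo. MemTotal is used as a stand-in for "RAM size", which is an approximation.

```python
#!/usr/bin/env python3
"""Pre-hibernation sanity check (illustrative sketch, not part of the series)."""
import re


def parse_meminfo_kib(text):
    """Return a dict of the /proc/meminfo fields that are reported in kB."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "MemTotal:        4039000 kB"; others are skipped.
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def swap_covers_ram(meminfo_text):
    """True if SwapTotal >= MemTotal, the sizing rule from the how-to."""
    fields = parse_meminfo_kib(meminfo_text)
    return fields["SwapTotal"] >= fields["MemTotal"]


def resume_cmdline(device, offset):
    """Build the kernel parameters the how-to adds to grub for a swap file."""
    return "resume={} resume_offset={} no_console_suspend=1".format(device, offset)
```

On a guest, swap_covers_ram() would be fed the contents of /proc/meminfo, and resume_cmdline() the resume device together with the offset printed by the FIBMAP script from the cover letter, for example resume_cmdline("/dev/xvda1", 200704).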

On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal <anchalag@amazon.com> wrote:
>
> On Fri, Aug 21, 2020 at 10:22:43PM +0000, Anchal Agarwal wrote:
> > Hello,
> > This series fixes PM hibernation for hvm guests running on xen hypervisor.
> > [...]
> >
> A gentle ping on the series in case there is any more feedback or can we plan to
> merge this? I can then send the series with minor fixes pointed by tglx@

Some more time, please!

On Fri, Aug 28, 2020 at 08:29:24PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal <anchalag@amazon.com> wrote:
> > [...]
> > A gentle ping on the series in case there is any more feedback or can we plan to
> > merge this? I can then send the series with minor fixes pointed by tglx@
>
> Some more time, please!
>
Sure, happy to answer any more questions and fix more BUGS!!

--
Anchal

On 8/21/20 6:22 PM, Anchal Agarwal wrote:
>
> Known issues:
> 1.KASLR causes intermittent hibernation failures. VM fails to resumes and
> has to be restarted. I will investigate this issue separately and shouldn't
> be a blocker for this patch series.

Is there any change in status for this? This has been noted since January.

-boris

> 2. During hibernation, I observed sometimes that freezing of tasks fails due
> to busy XFS workqueuei[xfs-cil/xfs-sync]. This is also intermittent may be 1
> out of 200 runs and hibernation is aborted in this case. Re-trying hibernation
> may work. Also, this is a known issue with hibernation and some
> filesystems like XFS has been discussed by the community for years with not an
> effectve resolution at this point.
>

On Fri, Aug 28, 2020 at 06:39:45PM +0000, Anchal Agarwal wrote:
> On Fri, Aug 28, 2020 at 08:29:24PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal <anchalag@amazon.com> wrote:
> > > [...]
> > > A gentle ping on the series in case there is any more feedback or can we plan to
> > > merge this? I can then send the series with minor fixes pointed by tglx@
> >
> > Some more time, please!
> >
> Sure happy to answer any more questions and fix more BUGS!!
>
> --
> Anchal
>
A gentle ping on this one again!

--
Anchal