[v7,0/4] RISC-V Hibernation Support

Message ID: 20230323045604.536099-1-jeeheng.sia@starfivetech.com
Series: RISC-V Hibernation Support

Message

Sia Jee Heng March 23, 2023, 4:56 a.m. UTC
This series adds hibernation (suspend-to-disk) support for RISC-V.
Low-level arch functions are added to support hibernation.
swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
the CPU state onto the stack, and then calls swsusp_save() to save the
memory image.

An arch-specific hibernation header is implemented and is used by the
arch_hibernation_header_save() and arch_hibernation_header_restore()
functions. The arch-specific header consists of satp, hartid, and the
cpu_resume address. The kernel build version is also saved in the
hibernation image header to ensure that an image is only restored by
the same kernel that created it.
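A minimal userspace sketch of the header round trip described above may help reviewers. The struct layout, field names, version string and helpers (hdr_save, hdr_restore) are hypothetical and are not taken from the patch; the sketch only shows the idea that the header carries satp, hartid and the resume address, and that restore rejects an image whose recorded build version differs from the running kernel's.

```c
#include <errno.h>
#include <string.h>

#define VERSION_LEN 65 /* hypothetical size of the build-version field */

/* Hypothetical layout, named after the cover letter, not the patch. */
struct hib_hdr_sketch {
	char version[VERSION_LEN];     /* kernel build version at suspend time */
	unsigned long hartid;          /* hart that performed the suspend */
	unsigned long satp;            /* saved page-table root (satp CSR) */
	unsigned long cpu_resume_addr; /* where the restored kernel resumes */
};

/* Stand-in for the running kernel's build version (assumption). */
static const char running_version[] = "#1 SMP PREEMPT v6.3-rc3";

/* Save path: write the header into a caller-provided buffer. */
static int hdr_save(void *buf, size_t max_size, unsigned long hartid,
		    unsigned long satp, unsigned long resume_addr)
{
	struct hib_hdr_sketch *hdr = buf;

	if (max_size < sizeof(*hdr))
		return -EOVERFLOW;
	memset(hdr, 0, sizeof(*hdr)); /* also guarantees NUL termination */
	strncpy(hdr->version, running_version, VERSION_LEN - 1);
	hdr->hartid = hartid;
	hdr->satp = satp;
	hdr->cpu_resume_addr = resume_addr;
	return 0;
}

/* Restore path: refuse images written by a different kernel build. */
static int hdr_restore(const void *buf, unsigned long *hartid,
		       unsigned long *satp, unsigned long *resume_addr)
{
	const struct hib_hdr_sketch *hdr = buf;

	if (strncmp(hdr->version, running_version, VERSION_LEN) != 0)
		return -EINVAL;
	*hartid = hdr->hartid;
	*satp = hdr->satp;
	*resume_addr = hdr->cpu_resume_addr;
	return 0;
}
```

The version check is the part that enforces "same kernel only": everything else in the header is meaningless if the resuming kernel's layout differs from the one that wrote the image.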

swsusp_arch_resume() creates a temporary page table that covers only
the linear map. It copies the restore code to a 'safe' page, then starts
to restore the memory image. Once that is complete, it restores the
original kernel's page table and calls into __hibernate_cpu_resume() to
restore the CPU context. Finally, it follows the normal hibernation path
back to the hibernation core.
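The page-copy step can be pictured with the userspace sketch below. It assumes the image pages are described by a singly linked list shaped like the kernel's struct pbe (address, orig_address, next); struct pbe_sketch, restore_image_sketch and PAGE_SIZE_SKETCH are hypothetical stand-ins. The patch appears to do this copy in assembly (see hibernate-asm.S and the copy_page macro mentioned in the changelog), not in C.

```c
#include <string.h>

#define PAGE_SIZE_SKETCH 4096 /* assumed 4 KiB pages */

/* Shaped like the kernel's struct pbe; names here are hypothetical. */
struct pbe_sketch {
	void *address;          /* page holding image data read from swap */
	void *orig_address;     /* page the data must be copied back into */
	struct pbe_sketch *next;
};

/*
 * Copy every image page back to its original location. In the real
 * resume path this loop runs from a 'safe' page under the temporary
 * page table, so it cannot overwrite its own code while copying.
 */
static void restore_image_sketch(const struct pbe_sketch *list)
{
	for (; list; list = list->next)
		memcpy(list->orig_address, list->address, PAGE_SIZE_SKETCH);
}
```

Because the loop may overwrite any page of the running kernel, including the pages that hold the normal kernel text, it must execute from memory that is not a copy target; that is the reason the restore code is relocated to a 'safe' page first.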

To enable hibernation/suspend-to-disk on RISC-V, the following configs
need to be enabled:
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

At a high level, this series includes the following changes:
1) Make suspend_save_csrs() and suspend_restore_csrs() public
   functions, as they are common to suspend and hibernation. (patch 1)
2) Refactor the code shared by the __cpu_resume_enter() and
   __hibernate_cpu_resume() functions. The common code is used by both
   hibernation and suspend. (patch 2)
3) Enhance the kernel_page_present() function to support huge pages. (patch 3)
4) Add arch/riscv low-level functions to support
   hibernation/suspend-to-disk. (patch 4)
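Patch 3 concerns pages mapped by a leaf entry above the last page-table level. In an Sv39-style walk, a valid entry with any of R, W or X set is a leaf, so a 1 GiB or 2 MiB mapping ends the walk early instead of being followed as a pointer to the next table. The userspace sketch below models that rule; the table layout (a direct next-table pointer in place of a PPN), the flag macros and page_present_sketch are hypothetical, and the real kernel_page_present() uses the kernel's own page-table accessors.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_V (1u << 0) /* valid */
#define PTE_R (1u << 1) /* readable */
#define PTE_W (1u << 2) /* writable */
#define PTE_X (1u << 3) /* executable */
#define PTE_LEAF_MASK (PTE_R | PTE_W | PTE_X)
#define ENTRIES 512

/* Hypothetical model: flags plus a direct pointer to the next table. */
struct pt_sketch {
	uint64_t flags[ENTRIES];
	struct pt_sketch *next[ENTRIES];
};

/* Sv39-style walk over 1 GiB, 2 MiB and 4 KiB levels. */
static bool page_present_sketch(const struct pt_sketch *root, uint64_t va)
{
	const struct pt_sketch *table = root;

	for (int level = 2; level >= 0; level--) {
		unsigned int idx = (va >> (12 + 9 * level)) & (ENTRIES - 1);
		uint64_t pte = table->flags[idx];

		if (!(pte & PTE_V))
			return false; /* hole at this level */
		if (pte & PTE_LEAF_MASK)
			return true;  /* leaf: 4 KiB page or a huge page */
		table = table->next[idx];
		if (!table)
			return false; /* non-leaf without a next-level table */
	}
	return false; /* valid but non-leaf entry at the last level */
}
```

Without the early leaf check, a walker that assumes every upper-level entry points to another table would misinterpret a huge mapping and report the page as absent (or dereference garbage).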

The above patches are based on kernel v6.3-rc3 and have been tested on
the StarFive VF2 SBC board and on QEMU.
ACPI platform mode is not supported in this series.

Changes since v6:
- Rebased to kernel v6.3-rc3
- Resolved nit

Changes since v5:
- Rebased to kernel v6.3-rc2
- Removed an extra line in the commit message
- Added a comment explaining why the kernel address is mapped

Changes since v4:
- Rebased to kernel v6.3-rc1
- Resolved typo(s)
- Removed unnecessary helper function
- Removed unnecessary "addr" local variable
- Removed typecast of 'int'
- Used def_bool HIBERNATION
- Used "mv a0, zero" instead of "add a0, zero, zero"
- Made the linear region executable and writable when restoring the
  image

Changes since v3:
- Rebased to kernel v6.2
- Refactored the temporary page table code with reference to ARM64
- Fixed typos and grammar
- Resolved documentation errors
- Resolved clang build issue
- Removed unnecessary comments
- Used kzalloc instead of kcalloc

Changes since v2:
- Rebased to kernel v6.2-rc5
- Refactored the common code used by hibernation and suspend
- Created a copy_page macro
- Addressed other comments from Andrew and Conor

Changes since v1:
- Rebased to kernel v6.2-rc3
- Fixed bot's compilation error

Sia Jee Heng (4):
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public
    function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Add arch functions to support hibernation/suspend-to-disk

 arch/riscv/Kconfig                 |   6 +
 arch/riscv/include/asm/assembler.h |  82 ++++++
 arch/riscv/include/asm/suspend.h   |  22 ++
 arch/riscv/kernel/Makefile         |   1 +
 arch/riscv/kernel/asm-offsets.c    |   5 +
 arch/riscv/kernel/hibernate-asm.S  |  77 ++++++
 arch/riscv/kernel/hibernate.c      | 427 +++++++++++++++++++++++++++++
 arch/riscv/kernel/suspend.c        |   4 +-
 arch/riscv/kernel/suspend_entry.S  |  34 +--
 arch/riscv/mm/pageattr.c           |   8 +
 10 files changed, 633 insertions(+), 33 deletions(-)
 create mode 100644 arch/riscv/include/asm/assembler.h
 create mode 100644 arch/riscv/kernel/hibernate-asm.S
 create mode 100644 arch/riscv/kernel/hibernate.c


base-commit: fff5a5e7f528b2ed2c335991399a766c2cf01103

Comments

Andrew Jones March 27, 2023, 1:13 p.m. UTC | #1
On Thu, Mar 23, 2023 at 12:56:00PM +0800, Sia Jee Heng wrote:
> [...]

I tested this on QEMU, but, FYI, I had to use a raw backing file for
the swap disk, rather than a qcow2 backing file, otherwise it didn't
resume. It's probably worth looking into why that is.

Thanks,
drew
Sia Jee Heng March 28, 2023, 6:37 a.m. UTC | #2
> -----Original Message-----
> From: Andrew Jones <ajones@ventanamicro.com>
> Sent: Monday, March 27, 2023 9:14 PM
> To: JeeHeng Sia <jeeheng.sia@starfivetech.com>
> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> Subject: Re: [PATCH v7 0/4] RISC-V Hibernation Support
> 
> On Thu, Mar 23, 2023 at 12:56:00PM +0800, Sia Jee Heng wrote:
> > [...]
> 
> I tested this on QEMU, but, FYI, I had to use a raw backing file for
> the swap disk, rather than a qcow2 backing file, otherwise it didn't
> resume. It's probably worth looking into why that is.
Thanks for your time. The raw file format is closer to an actual physical disk. I can look into the qcow2 format for QEMU in the near future, but it shouldn't block this patch series from being upstreamed.
> 
> Thanks,
> drew
Sia Jee Heng March 29, 2023, 10:21 a.m. UTC | #3
> -----Original Message-----
> From: JeeHeng Sia
> Sent: Tuesday, March 28, 2023 2:37 PM
> To: 'Andrew Jones' <ajones@ventanamicro.com>
> Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> Subject: RE: [PATCH v7 0/4] RISC-V Hibernation Support
> 
> 
> 
> > -----Original Message-----
> > From: Andrew Jones <ajones@ventanamicro.com>
> > Sent: Monday, March 27, 2023 9:14 PM
> > To: JeeHeng Sia <jeeheng.sia@starfivetech.com>
> > Cc: paul.walmsley@sifive.com; palmer@dabbelt.com; aou@eecs.berkeley.edu; linux-riscv@lists.infradead.org; linux-kernel@vger.kernel.org; Leyfoon Tan <leyfoon.tan@starfivetech.com>; Mason Huo <mason.huo@starfivetech.com>
> > Subject: Re: [PATCH v7 0/4] RISC-V Hibernation Support
> >
> > On Thu, Mar 23, 2023 at 12:56:00PM +0800, Sia Jee Heng wrote:
> > > [...]
> >
> > I tested this on QEMU, but, FYI, I had to use a raw backing file for
> > the swap disk, rather than a qcow2 backing file, otherwise it didn't
> > resume. It's probably worth looking into why that is.
> Thanks for your time. The raw file format is closer to the actual physical disk. Although I can look into the qcow2 format for QEMU in
> the near future, it shouldn't be a blocking factor for this patch series to be upstreamed.

FYI, I managed to reproduce the hibernation issue that Andrew reported. Resume failed while retrieving pages from the disk, specifically in swap_read_page() in kernel/power/swap.c and in snapshot_write_next() in kernel/power/snapshot.c. I found that adding a delay to these functions (by adding a printk) allowed the page retrieval process to progress further. This leads me to suspect a coherency issue between the hibernation core and the QEMU qcow2 driver. I will add it to my AR list and investigate the issue in the near future.
> >
> > Thanks,
> > drew