diff mbox

kvm-arm: Unmap shadow pagetables properly

Message ID 1473348349-5011-1-git-send-email-suzuki.poulose@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suzuki K Poulose Sept. 8, 2016, 3:25 p.m. UTC
On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
the userspace buffer gets unmapped. However, when the Hypervisor
process exits without explicit unmap of the guest buffers, the only
notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release
) which does nothing on arm. Later this causes us to access pages that
were already released [via exit_mmap() -> unmap_vmas()] when we actually
get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
which unmaps any free'd pages from the linear map.

 [  757.644120] Unable to handle kernel paging request at virtual address
  ffff800661e00000
 [  757.652046] pgd = ffff20000b1a2000
 [  757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
  *pmd=00000047fcc7c003, *pte=00e8004661e00712
 [  757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
 [  757.672041] Modules linked in:
 [  757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G      D
 4.8.0-rc1 #3
 [  757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
  BIOS 3.06.15 Aug 19 2016
 [  757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
 [  757.698840] PC is at __flush_dcache_area+0x1c/0x40
 [  757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
 [  757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
 ...
 [  758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
 [  758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
 [  758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
 [  758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
 [  758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
 [  758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
 [  758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
 [  758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
 [  758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
 [  758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
 [  758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
 [  758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
 [  758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
 [  758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
 [  758.432608] [<ffff200008083624>] work_pending+0x10/0x14
 [  758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)

This patch fixes the issue by moving the kvm_free_stage2_pgd() to
kvm_arch_flush_shadow_all().

Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
Reported-by: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kvm/arm.c | 2 --
 arch/arm/kvm/mmu.c | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

Comments

Itaru Kitayama Sept. 9, 2016, 6:02 a.m. UTC | #1
VM now shuts down normally on Mustang backed by DEBUG_PAGEALLOC=y
4.8.0-rc5 kernel.

On 9/9/16 12:25 AM, Suzuki K Poulose wrote:
> On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
> mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
> the userspace buffer gets unmapped. However, when the Hypervisor
> process exits without explicit unmap of the guest buffers, the only
> notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release
> ) which does nothing on arm. Later this causes us to access pages that
> were already released [via exit_mmap() -> unmap_vmas()] when we actually
> get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
> kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
> which unmaps any free'd pages from the linear map.
>
>   [  757.644120] Unable to handle kernel paging request at virtual address
>    ffff800661e00000
>   [  757.652046] pgd = ffff20000b1a2000
>   [  757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
>    *pmd=00000047fcc7c003, *pte=00e8004661e00712
>   [  757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
>   [  757.672041] Modules linked in:
>   [  757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G      D
>   4.8.0-rc1 #3
>   [  757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
>    BIOS 3.06.15 Aug 19 2016
>   [  757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
>   [  757.698840] PC is at __flush_dcache_area+0x1c/0x40
>   [  757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
>   [  757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
>   ...
>   [  758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
>   [  758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
>   [  758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
>   [  758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
>   [  758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
>   [  758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
>   [  758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
>   [  758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
>   [  758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
>   [  758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
>   [  758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
>   [  758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
>   [  758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
>   [  758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
>   [  758.432608] [<ffff200008083624>] work_pending+0x10/0x14
>   [  758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)
>
> This patch fixes the issue by moving the kvm_free_stage2_pgd() to
> kvm_arch_flush_shadow_all().
>
> Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
> Reported-by: James Morse <james.morse@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   arch/arm/kvm/arm.c | 2 --
>   arch/arm/kvm/mmu.c | 1 +
>   2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..c94b90d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -158,8 +158,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   {
>   	int i;
>   
> -	kvm_free_stage2_pgd(kvm);
> -
>   	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>   		if (kvm->vcpus[i]) {
>   			kvm_arch_vcpu_free(kvm->vcpus[i]);
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 29d0b23..d626d08 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1893,6 +1893,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslots *slots)
>   
>   void kvm_arch_flush_shadow_all(struct kvm *kvm)
>   {
> +	kvm_free_stage2_pgd(kvm);
>   }
>   
>   void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
Christoffer Dall Sept. 9, 2016, 10:38 a.m. UTC | #2
Hi Suzuki,

On Thu, Sep 08, 2016 at 04:25:49PM +0100, Suzuki K Poulose wrote:
> On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via
> mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when
> the userspace buffer gets unmapped. However, when the Hypervisor
> process exits without explicit unmap of the guest buffers, the only
> notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release
> ) which does nothing on arm. Later this causes us to access pages that
> were already released [via exit_mmap() -> unmap_vmas()] when we actually
> get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() ->
> kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC,
> which unmaps any free'd pages from the linear map.
> 
>  [  757.644120] Unable to handle kernel paging request at virtual address
>   ffff800661e00000
>  [  757.652046] pgd = ffff20000b1a2000
>  [  757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003,
>   *pmd=00000047fcc7c003, *pte=00e8004661e00712
>  [  757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP
>  [  757.672041] Modules linked in:
>  [  757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G      D
>  4.8.0-rc1 #3
>  [  757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board,
>   BIOS 3.06.15 Aug 19 2016
>  [  757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000
>  [  757.698840] PC is at __flush_dcache_area+0x1c/0x40
>  [  757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70
>  [  757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145
>  ...
>  [  758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40
>  [  758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0
>  [  758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60
>  [  758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68
>  [  758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358
>  [  758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40
>  [  758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8
>  [  758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18
>  [  758.400869] [<ffff200008104658>] task_work_run+0x108/0x138
>  [  758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8
>  [  758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130
>  [  758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18
>  [  758.421943] [<ffff20000808a098>] do_signal+0x158/0x860
>  [  758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88
>  [  758.432608] [<ffff200008083624>] work_pending+0x10/0x14
>  [  758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20)
> 
> This patch fixes the issue by moving the kvm_free_stage2_pgd() to
> kvm_arch_flush_shadow_all().

Awesome, thanks!

I suppose this is a cc stable patch since as I can tell this bug has
been here forever.

-Christoffer

> 
> Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp>
> Reported-by: James Morse <james.morse@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  arch/arm/kvm/arm.c | 2 --
>  arch/arm/kvm/mmu.c | 1 +
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 75f130e..c94b90d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -158,8 +158,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  {
>  	int i;
>  
> -	kvm_free_stage2_pgd(kvm);
> -
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
>  			kvm_arch_vcpu_free(kvm->vcpus[i]);
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 29d0b23..d626d08 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1893,6 +1893,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslots *slots)
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  {
> +	kvm_free_stage2_pgd(kvm);
>  }
>  
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> -- 
> 2.7.4
>
Suzuki K Poulose Sept. 9, 2016, 10:43 a.m. UTC | #3
On 09/09/16 11:38, Christoffer Dall wrote:
> Hi Suzuki,
>
> On Thu, Sep 08, 2016 at 04:25:49PM +0100, Suzuki K Poulose wrote:

>> This patch fixes the issue by moving the kvm_free_stage2_pgd() to
>> kvm_arch_flush_shadow_all().
>
> Awesome, thanks!
>
> I suppose this is a cc stable patch since as I can tell this bug has
> been here forever.

Yes, I think the code moved around a bit in the past. I will submit the
patches separately for each version once I get ack on this.

Cheers
Suzuki
Christoffer Dall Sept. 9, 2016, 10:54 a.m. UTC | #4
On Fri, Sep 09, 2016 at 11:43:42AM +0100, Suzuki K Poulose wrote:
> On 09/09/16 11:38, Christoffer Dall wrote:
> >Hi Suzuki,
> >
> >On Thu, Sep 08, 2016 at 04:25:49PM +0100, Suzuki K Poulose wrote:
> 
> >>This patch fixes the issue by moving the kvm_free_stage2_pgd() to
> >>kvm_arch_flush_shadow_all().
> >
> >Awesome, thanks!
> >
> >I suppose this is a cc stable patch since as I can tell this bug has
> >been here forever.
> 
> Yes, I think the code moved around a bit in the past. I will submit the
> patches separately for each version once I get ack on this.
> 
I don't think that's necessary, I think we just CC stable and see which
versions it applies to, and fix anything missing up at that time if
required.

Thanks,
-Christoffer
diff mbox

Patch

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..c94b90d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -158,8 +158,6 @@  void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	int i;
 
-	kvm_free_stage2_pgd(kvm);
-
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 29d0b23..d626d08 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1893,6 +1893,7 @@  void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslots *slots)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
+	kvm_free_stage2_pgd(kvm);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,