[v3,5/6] KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy

Message ID 20200422161844.3848063-6-maz@kernel.org (mailing list archive)
State New, archived
Series KVM: arm: vgic fixes for 5.7

Commit Message

Marc Zyngier April 22, 2020, 4:18 p.m. UTC
From: Zenghui Yu <yuzenghui@huawei.com>

It's likely that the vcpu fails to handle all virtual interrupts if
userspace decides to destroy it, leaving the pending ones in the
ap_list. If an unhandled one is an LPI, its vgic_irq structure will
eventually be leaked because of an extra refcount increment in
vgic_queue_irq_unlock().

This was detected by kmemleak on almost every guest destroy, the
backtrace is as follows:

unreferenced object 0xffff80725aed5500 (size 128):
comm "CPU 5/KVM", pid 40711, jiffies 4298024754 (age 166366.512s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 08 01 a9 73 6d 80 ff ff ...........sm...
c8 61 ee a9 00 20 ff ff 28 1e 55 81 6c 80 ff ff .a... ..(.U.l...
backtrace:
[<000000004bcaa122>] kmem_cache_alloc_trace+0x2dc/0x418
[<0000000069c7dabb>] vgic_add_lpi+0x88/0x418
[<00000000bfefd5c5>] vgic_its_cmd_handle_mapi+0x4dc/0x588
[<00000000cf993975>] vgic_its_process_commands.part.5+0x484/0x1198
[<000000004bd3f8e3>] vgic_its_process_commands+0x50/0x80
[<00000000b9a65b2b>] vgic_mmio_write_its_cwriter+0xac/0x108
[<0000000009641ebb>] dispatch_mmio_write+0xd0/0x188
[<000000008f79d288>] __kvm_io_bus_write+0x134/0x240
[<00000000882f39ac>] kvm_io_bus_write+0xe0/0x150
[<0000000078197602>] io_mem_abort+0x484/0x7b8
[<0000000060954e3c>] kvm_handle_guest_abort+0x4cc/0xa58
[<00000000e0d0cd65>] handle_exit+0x24c/0x770
[<00000000b44a7fad>] kvm_arch_vcpu_ioctl_run+0x460/0x1988
[<0000000025fb897c>] kvm_vcpu_ioctl+0x4f8/0xee0
[<000000003271e317>] do_vfs_ioctl+0x160/0xcd8
[<00000000e7f39607>] ksys_ioctl+0x98/0xd8

Fix it by retiring all pending LPIs in the ap_list on the destroy path.

p.s. I can also reproduce it on a normal guest shutdown. It is because
userspace still sends LPIs to the vcpu (through the KVM_SIGNAL_MSI ioctl)
while the guest is being shut down and is unable to handle them. A little
strange, though, and I haven't dug further...
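
(For illustration only, not part of the patch: a minimal userspace sketch of
that injection path, where vm_fd, doorbell, device_id and event_id are
hypothetical placeholders.)

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Build a fake MSI; with a GICv3 ITS the device ID must be
	 * supplied so the ITS can translate it to an LPI. */
	struct kvm_msi msi = {
		.address_lo = (__u32)(doorbell & 0xffffffff),
		.address_hi = (__u32)(doorbell >> 32),
		.data       = event_id,
		.devid      = device_id,
		.flags      = KVM_MSI_VALID_DEVID,
	};

	/* Returns >0 if delivered, 0 if the guest blocked it, <0 on error. */
	int ret = ioctl(vm_fd, KVM_SIGNAL_MSI, &msi);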

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200414030349.625-2-yuzenghui@huawei.com
---
 virt/kvm/arm/vgic/vgic-init.c | 6 ++++++
 1 file changed, 6 insertions(+)
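
For reference, vgic_flush_pending_lpis() itself is not shown in the diff; it
walks the vcpu's ap_list and, for each LPI still queued, drops the reference
taken in vgic_queue_irq_unlock(). A paraphrase of the v5.7 implementation
(trimmed, so treat it as a sketch rather than the exact code):

	void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
	{
		struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
		struct vgic_irq *irq, *tmp;
		unsigned long flags;

		raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);

		list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
			if (irq->intid >= VGIC_MIN_LPI) {
				/* Unqueue the LPI and drop the ap_list
				 * reference; this releases the vgic_irq that
				 * would otherwise be leaked on vcpu destroy. */
				raw_spin_lock(&irq->irq_lock);
				list_del(&irq->ap_list);
				irq->vcpu = NULL;
				raw_spin_unlock(&irq->irq_lock);
				vgic_put_irq(vcpu->kvm, irq);
			}
		}

		raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
	}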

Comments

James Morse April 23, 2020, 11:35 a.m. UTC | #1
Hi Marc, Zenghui,

On 22/04/2020 17:18, Marc Zyngier wrote:
> From: Zenghui Yu <yuzenghui@huawei.com>
> 
> It's likely that the vcpu fails to handle all virtual interrupts if
> userspace decides to destroy it, leaving the pending ones in the
> ap_list. If an unhandled one is an LPI, its vgic_irq structure will
> eventually be leaked because of an extra refcount increment in
> vgic_queue_irq_unlock().

> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index a963b9d766b73..53ec9b9d9bc43 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -348,6 +348,12 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  
> +	/*
> +	 * Retire all pending LPIs on this vcpu anyway as we're
> +	 * going to destroy it.
> +	 */

Looking at the other caller, do we need something like:
|	if (vgic_cpu->lpis_enabled)

?

> +	vgic_flush_pending_lpis(vcpu);
> +

Otherwise, I get this on a gic-v2 machine!:
[ 1742.187139] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x250/0x2c0
[ 1742.194302] Read of size 8 at addr ffff0008e1bf1f28 by task qemu-system-aar/542
[ 1742.203140] CPU: 2 PID: 542 Comm: qemu-system-aar Not tainted 5.7.0-rc2-00006-g4fb0f7bb0e27 #2
[ 1742.211780] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 30 2018
[ 1742.222596] Call trace:
[ 1742.225059]  dump_backtrace+0x0/0x328
[ 1742.228738]  show_stack+0x18/0x28
[ 1742.232071]  dump_stack+0x134/0x1b0
[ 1742.235578]  print_address_description.isra.0+0x6c/0x350
[ 1742.240910]  __kasan_report+0x10c/0x180
[ 1742.244763]  kasan_report+0x4c/0x68
[ 1742.248268]  __asan_report_load8_noabort+0x30/0x48
[ 1742.253081]  vgic_flush_pending_lpis+0x250/0x2c0
[ 1742.257718]  __kvm_vgic_destroy+0x1cc/0x478
[ 1742.261919]  kvm_vgic_destroy+0x30/0x48
[ 1742.265773]  kvm_arch_destroy_vm+0x20/0x128
[ 1742.269976]  kvm_put_kvm+0x3e0/0x8d0
[ 1742.273567]  kvm_vm_release+0x3c/0x60
[ 1742.277248]  __fput+0x218/0x630
[ 1742.280406]  ____fput+0x10/0x20
[ 1742.283565]  task_work_run+0xd8/0x1f0
[ 1742.287245]  do_exit+0x87c/0x2640
[ 1742.290575]  do_group_exit+0xd0/0x258
[ 1742.294254]  __arm64_sys_exit_group+0x3c/0x48
[ 1742.298631]  el0_svc_common.constprop.0+0x10c/0x348
[ 1742.303529]  do_el0_svc+0x48/0xd0
[ 1742.306861]  el0_sync_handler+0x11c/0x1b8
[ 1742.310888]  el0_sync+0x158/0x180
[ 1742.315716] The buggy address belongs to the page:
[ 1742.320529] page:fffffe002366fc40 refcount:0 mapcount:0 mapping:000000007e21d29f index:0x0
[ 1742.328821] flags: 0x2ffff00000000000()
[ 1742.332678] raw: 2ffff00000000000 0000000000000000 ffffffff23660401 0000000000000000
[ 1742.340449] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1742.348215] page dumped because: kasan: bad access detected
[ 1742.355304] Memory state around the buggy address:
[ 1742.360115]  ffff0008e1bf1e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1742.367360]  ffff0008e1bf1e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1742.374606] >ffff0008e1bf1f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1742.381851]                                   ^
[ 1742.386399]  ffff0008e1bf1f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1742.393645]  ffff0008e1bf2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 1742.400889] ==================================================================
[ 1742.408132] Disabling lock debugging due to kernel taint


With that:
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
Zenghui Yu April 23, 2020, 11:57 a.m. UTC | #2
Hi James,

Thanks for having a look and testing it!

On 2020/4/23 19:35, James Morse wrote:
> Hi Marc, Zenghui,
> 
> On 22/04/2020 17:18, Marc Zyngier wrote:
>> From: Zenghui Yu <yuzenghui@huawei.com>
>>
>> It's likely that the vcpu fails to handle all virtual interrupts if
>> userspace decides to destroy it, leaving the pending ones in the
>> ap_list. If an unhandled one is an LPI, its vgic_irq structure will
>> eventually be leaked because of an extra refcount increment in
>> vgic_queue_irq_unlock().
> 
>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>> index a963b9d766b73..53ec9b9d9bc43 100644
>> --- a/virt/kvm/arm/vgic/vgic-init.c
>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>> @@ -348,6 +348,12 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>   
>> +	/*
>> +	 * Retire all pending LPIs on this vcpu anyway as we're
>> +	 * going to destroy it.
>> +	 */
> 
> Looking at the other caller, do we need something like:
> |	if (vgic_cpu->lpis_enabled)
> 
> ?

If LPIs are disabled at the redistributor level, yes, there should be no
pending LPIs in the ap_list. But I'm not sure how the following
use-after-free BUG can be triggered.

> 
>> +	vgic_flush_pending_lpis(vcpu);
>> +
> 
> Otherwise, I get this on a gic-v2 machine!:

I don't have a gic-v2 one and thus can't reproduce it :-(

> [ 1742.187139] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x250/0x2c0
> [ 1742.194302] Read of size 8 at addr ffff0008e1bf1f28 by task qemu-system-aar/542
> [ 1742.203140] CPU: 2 PID: 542 Comm: qemu-system-aar Not tainted 5.7.0-rc2-00006-g4fb0f7bb0e27 #2
> [ 1742.211780] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 30 2018
> [ 1742.222596] Call trace:
> [ 1742.225059]  dump_backtrace+0x0/0x328
> [ 1742.228738]  show_stack+0x18/0x28
> [ 1742.232071]  dump_stack+0x134/0x1b0
> [ 1742.235578]  print_address_description.isra.0+0x6c/0x350
> [ 1742.240910]  __kasan_report+0x10c/0x180
> [ 1742.244763]  kasan_report+0x4c/0x68
> [ 1742.248268]  __asan_report_load8_noabort+0x30/0x48
> [ 1742.253081]  vgic_flush_pending_lpis+0x250/0x2c0

Could you please show the result of

./scripts/faddr2line vmlinux vgic_flush_pending_lpis+0x250/0x2c0

on your setup?

> [ 1742.257718]  __kvm_vgic_destroy+0x1cc/0x478
> [ 1742.261919]  kvm_vgic_destroy+0x30/0x48
> [ 1742.265773]  kvm_arch_destroy_vm+0x20/0x128
> [ 1742.269976]  kvm_put_kvm+0x3e0/0x8d0
> [ 1742.273567]  kvm_vm_release+0x3c/0x60
> [ 1742.277248]  __fput+0x218/0x630
> [ 1742.280406]  ____fput+0x10/0x20
> [ 1742.283565]  task_work_run+0xd8/0x1f0
> [ 1742.287245]  do_exit+0x87c/0x2640
> [ 1742.290575]  do_group_exit+0xd0/0x258
> [ 1742.294254]  __arm64_sys_exit_group+0x3c/0x48
> [ 1742.298631]  el0_svc_common.constprop.0+0x10c/0x348
> [ 1742.303529]  do_el0_svc+0x48/0xd0
> [ 1742.306861]  el0_sync_handler+0x11c/0x1b8
> [ 1742.310888]  el0_sync+0x158/0x180
> [ 1742.315716] The buggy address belongs to the page:
> [ 1742.320529] page:fffffe002366fc40 refcount:0 mapcount:0 mapping:000000007e21d29f index:0x0
> [ 1742.328821] flags: 0x2ffff00000000000()
> [ 1742.332678] raw: 2ffff00000000000 0000000000000000 ffffffff23660401 0000000000000000
> [ 1742.340449] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 1742.348215] page dumped because: kasan: bad access detected
> [ 1742.355304] Memory state around the buggy address:
> [ 1742.360115]  ffff0008e1bf1e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.367360]  ffff0008e1bf1e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.374606] >ffff0008e1bf1f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.381851]                                   ^
> [ 1742.386399]  ffff0008e1bf1f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.393645]  ffff0008e1bf2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.400889] ==================================================================
> [ 1742.408132] Disabling lock debugging due to kernel taint
> 
> 
> With that:
> Reviewed-by: James Morse <james.morse@arm.com>

Thanks a lot!

Zenghui
Marc Zyngier April 23, 2020, 12:03 p.m. UTC | #3
Hi James,

Thanks for the heads up.

On 2020-04-23 12:35, James Morse wrote:
> Hi Marc, Zenghui,
> 
> On 22/04/2020 17:18, Marc Zyngier wrote:
>> From: Zenghui Yu <yuzenghui@huawei.com>
>> 
>> It's likely that the vcpu fails to handle all virtual interrupts if
>> userspace decides to destroy it, leaving the pending ones in the
>> ap_list. If an unhandled one is an LPI, its vgic_irq structure will
>> eventually be leaked because of an extra refcount increment in
>> vgic_queue_irq_unlock().
> 
>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>> index a963b9d766b73..53ec9b9d9bc43 100644
>> --- a/virt/kvm/arm/vgic/vgic-init.c
>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>> @@ -348,6 +348,12 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> 
>> +	/*
>> +	 * Retire all pending LPIs on this vcpu anyway as we're
>> +	 * going to destroy it.
>> +	 */
> 
> Looking at the other caller, do we need something like:
> |	if (vgic_cpu->lpis_enabled)
> 
> ?

Huh... On its own, this call is absolutely harmless even if you
don't have LPIs. But see below.

> 
>> +	vgic_flush_pending_lpis(vcpu);
>> +
> 
> Otherwise, I get this on a gic-v2 machine!:
> [ 1742.187139] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x250/0x2c0
> [ 1742.194302] Read of size 8 at addr ffff0008e1bf1f28 by task qemu-system-aar/542
> [ 1742.203140] CPU: 2 PID: 542 Comm: qemu-system-aar Not tainted 5.7.0-rc2-00006-g4fb0f7bb0e27 #2
> [ 1742.211780] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 30 2018
> [ 1742.222596] Call trace:
> [ 1742.225059]  dump_backtrace+0x0/0x328
> [ 1742.228738]  show_stack+0x18/0x28
> [ 1742.232071]  dump_stack+0x134/0x1b0
> [ 1742.235578]  print_address_description.isra.0+0x6c/0x350
> [ 1742.240910]  __kasan_report+0x10c/0x180
> [ 1742.244763]  kasan_report+0x4c/0x68
> [ 1742.248268]  __asan_report_load8_noabort+0x30/0x48
> [ 1742.253081]  vgic_flush_pending_lpis+0x250/0x2c0
> [ 1742.257718]  __kvm_vgic_destroy+0x1cc/0x478
> [ 1742.261919]  kvm_vgic_destroy+0x30/0x48
> [ 1742.265773]  kvm_arch_destroy_vm+0x20/0x128
> [ 1742.269976]  kvm_put_kvm+0x3e0/0x8d0
> [ 1742.273567]  kvm_vm_release+0x3c/0x60
> [ 1742.277248]  __fput+0x218/0x630
> [ 1742.280406]  ____fput+0x10/0x20
> [ 1742.283565]  task_work_run+0xd8/0x1f0
> [ 1742.287245]  do_exit+0x87c/0x2640
> [ 1742.290575]  do_group_exit+0xd0/0x258
> [ 1742.294254]  __arm64_sys_exit_group+0x3c/0x48
> [ 1742.298631]  el0_svc_common.constprop.0+0x10c/0x348
> [ 1742.303529]  do_el0_svc+0x48/0xd0
> [ 1742.306861]  el0_sync_handler+0x11c/0x1b8
> [ 1742.310888]  el0_sync+0x158/0x180
> [ 1742.315716] The buggy address belongs to the page:
> [ 1742.320529] page:fffffe002366fc40 refcount:0 mapcount:0 mapping:000000007e21d29f index:0x0
> [ 1742.328821] flags: 0x2ffff00000000000()
> [ 1742.332678] raw: 2ffff00000000000 0000000000000000 ffffffff23660401 0000000000000000
> [ 1742.340449] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 1742.348215] page dumped because: kasan: bad access detected
> [ 1742.355304] Memory state around the buggy address:
> [ 1742.360115]  ffff0008e1bf1e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.367360]  ffff0008e1bf1e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.374606] >ffff0008e1bf1f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.381851]                                   ^
> [ 1742.386399]  ffff0008e1bf1f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.393645]  ffff0008e1bf2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 1742.400889] ==================================================================
> [ 1742.408132] Disabling lock debugging due to kernel taint
> 
> 
> With that:
> Reviewed-by: James Morse <james.morse@arm.com>

I think this is slightly more concerning. The issue is that we have
started freeing parts of the interrupt state already (we free the
SPIs early in kvm_vgic_dist_destroy()).

If an SPI was pending or active at this stage (i.e. present in the
ap_list), we are going to iterate over memory that has been freed
already. This is bad, and this can happen on GICv3 as well.
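
To make the ordering problem concrete: a trimmed, paraphrased sketch of the
kvm_vgic_dist_destroy() side (only the SPI part; the ITS/redistributor
teardown is elided):

	static void kvm_vgic_dist_destroy(struct kvm *kvm)
	{
		struct vgic_dist *dist = &kvm->arch.vgic;

		/* This frees the array backing every SPI's vgic_irq. An SPI
		 * still sitting on a vcpu's ap_list now points into freed
		 * memory, so the per-vcpu teardown must happen first. */
		kfree(dist->spis);
		dist->spis = NULL;

		/* ... remaining distributor/ITS teardown elided ... */
	}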

I think this should solve it, but I need to test it on a GICv2 system:

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 53ec9b9d9bc43..30dbec9fe0b4a 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -365,10 +365,10 @@ static void __kvm_vgic_destroy(struct kvm *kvm)

  	vgic_debug_destroy(kvm);

-	kvm_vgic_dist_destroy(kvm);
-
  	kvm_for_each_vcpu(i, vcpu, kvm)
  		kvm_vgic_vcpu_destroy(vcpu);
+
+	kvm_vgic_dist_destroy(kvm);
  }

  void kvm_vgic_destroy(struct kvm *kvm)

Thanks,

         M.
Zenghui Yu April 23, 2020, 12:18 p.m. UTC | #4
On 2020/4/23 20:03, Marc Zyngier wrote:
> 
> I think this is slightly more concerning. The issue is that we have
> started freeing parts of the interrupt state already (we free the
> SPIs early in kvm_vgic_dist_destroy()).
> 
> If an SPI was pending or active at this stage (i.e. present in the
> ap_list), we are going to iterate over memory that has been freed
> already. This is bad, and this can happen on GICv3 as well.

Ah, I think this should be the case.

> 
> I think this should solve it, but I need to test it on a GICv2 system:

Agreed.

> 
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 53ec9b9d9bc43..30dbec9fe0b4a 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -365,10 +365,10 @@ static void __kvm_vgic_destroy(struct kvm *kvm)
> 
>       vgic_debug_destroy(kvm);
> 
> -    kvm_vgic_dist_destroy(kvm);
> -
>       kvm_for_each_vcpu(i, vcpu, kvm)
>           kvm_vgic_vcpu_destroy(vcpu);
> +
> +    kvm_vgic_dist_destroy(kvm);
>   }
> 
>   void kvm_vgic_destroy(struct kvm *kvm)

Thanks for the fix,
Zenghui
James Morse April 23, 2020, 2:34 p.m. UTC | #5
Hi Zenghui,

On 23/04/2020 12:57, Zenghui Yu wrote:
> On 2020/4/23 19:35, James Morse wrote:
>> On 22/04/2020 17:18, Marc Zyngier wrote:
>>> From: Zenghui Yu <yuzenghui@huawei.com>
>>>
>>> It's likely that the vcpu fails to handle all virtual interrupts if
>>> userspace decides to destroy it, leaving the pending ones in the
>>> ap_list. If an unhandled one is an LPI, its vgic_irq structure will
>>> eventually be leaked because of an extra refcount increment in
>>> vgic_queue_irq_unlock().
>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>>> index a963b9d766b73..53ec9b9d9bc43 100644
>>> --- a/virt/kvm/arm/vgic/vgic-init.c
>>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>>> @@ -348,6 +348,12 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>   {
>>>       struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>
>>> +    /*
>>> +     * Retire all pending LPIs on this vcpu anyway as we're
>>> +     * going to destroy it.
>>> +     */
>>
>> Looking at the other caller, do we need something like:
>> |    if (vgic_cpu->lpis_enabled)
>>
>> ?
> 
> If LPIs are disabled at redistributor level, yes there should be no
> pending LPIs in the ap_list. But I'm not sure how can the following
> use-after-free BUG be triggered.
> 
>>
>>> +    vgic_flush_pending_lpis(vcpu);
>>> +
>>
>> Otherwise, I get this on a gic-v2 machine!:
> 
> I don't have a gic-v2 one and thus can't reproduce it :-(
> 
>> [ 1742.187139] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x250/0x2c0
>> [ 1742.194302] Read of size 8 at addr ffff0008e1bf1f28 by task qemu-system-aar/542
>> [ 1742.203140] CPU: 2 PID: 542 Comm: qemu-system-aar Not tainted
>> 5.7.0-rc2-00006-g4fb0f7bb0e27 #2
>> [ 1742.211780] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
>> Platform, BIOS EDK II Jul 30 2018
>> [ 1742.222596] Call trace:
>> [ 1742.225059]  dump_backtrace+0x0/0x328
>> [ 1742.228738]  show_stack+0x18/0x28
>> [ 1742.232071]  dump_stack+0x134/0x1b0
>> [ 1742.235578]  print_address_description.isra.0+0x6c/0x350
>> [ 1742.240910]  __kasan_report+0x10c/0x180
>> [ 1742.244763]  kasan_report+0x4c/0x68
>> [ 1742.248268]  __asan_report_load8_noabort+0x30/0x48
>> [ 1742.253081]  vgic_flush_pending_lpis+0x250/0x2c0
> 
> Could you please show the result of
> 
> ./scripts/faddr2line vmlinux vgic_flush_pending_lpis+0x250/0x2c0
> 
> on your setup?

vgic_flush_pending_lpis+0x250/0x2c0:
vgic_flush_pending_lpis at arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic.c:159

Which is:
|	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {


I think this confirms Marc's view of the KASAN splat.


>> With that:
>> Reviewed-by: James Morse <james.morse@arm.com>
> 
> Thanks a lot!

Heh, it looks like I had the wrong end of the stick with this... I wrongly assumed that
calling this on gicv2 would walk structures that held something else.


Thanks,

James
James Morse April 23, 2020, 2:34 p.m. UTC | #6
Hi guys,

On 23/04/2020 13:03, Marc Zyngier wrote:
> On 2020-04-23 12:35, James Morse wrote:
>> On 22/04/2020 17:18, Marc Zyngier wrote:
>>> From: Zenghui Yu <yuzenghui@huawei.com>
>>>
>>> It's likely that the vcpu fails to handle all virtual interrupts if
>>> userspace decides to destroy it, leaving the pending ones in the
>>> ap_list. If an unhandled one is an LPI, its vgic_irq structure will
>>> eventually be leaked because of an extra refcount increment in
>>> vgic_queue_irq_unlock().
>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>>> index a963b9d766b73..53ec9b9d9bc43 100644
>>> --- a/virt/kvm/arm/vgic/vgic-init.c
>>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>>> @@ -348,6 +348,12 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>>  {
>>>      struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>
>>> +    /*
>>> +     * Retire all pending LPIs on this vcpu anyway as we're
>>> +     * going to destroy it.
>>> +     */
>>
>> Looking at the other caller, do we need something like:
>> |    if (vgic_cpu->lpis_enabled)
>>
>> ?
> 
> Huh... On its own, this call is absolutely harmless even if you
> don't have LPIs. But see below.
> 
>>
>>> +    vgic_flush_pending_lpis(vcpu);
>>> +
>>
>> Otherwise, I get this on a gic-v2 machine!:
>> [ 1742.187139] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x250/0x2c0
>> [ 1742.194302] Read of size 8 at addr ffff0008e1bf1f28 by task qemu-system-aar/542
>> [ 1742.203140] CPU: 2 PID: 542 Comm: qemu-system-aar Not tainted 5.7.0-rc2-00006-g4fb0f7bb0e27 #2
>> [ 1742.211780] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 30 2018
>> [ 1742.222596] Call trace:
>> [ 1742.225059]  dump_backtrace+0x0/0x328
>> [ 1742.228738]  show_stack+0x18/0x28
>> [ 1742.232071]  dump_stack+0x134/0x1b0
>> [ 1742.235578]  print_address_description.isra.0+0x6c/0x350
>> [ 1742.240910]  __kasan_report+0x10c/0x180
>> [ 1742.244763]  kasan_report+0x4c/0x68
>> [ 1742.248268]  __asan_report_load8_noabort+0x30/0x48
>> [ 1742.253081]  vgic_flush_pending_lpis+0x250/0x2c0
>> [ 1742.257718]  __kvm_vgic_destroy+0x1cc/0x478
>> [ 1742.261919]  kvm_vgic_destroy+0x30/0x48
>> [ 1742.265773]  kvm_arch_destroy_vm+0x20/0x128
>> [ 1742.269976]  kvm_put_kvm+0x3e0/0x8d0
>> [ 1742.273567]  kvm_vm_release+0x3c/0x60
>> [ 1742.277248]  __fput+0x218/0x630
>> [ 1742.280406]  ____fput+0x10/0x20
>> [ 1742.283565]  task_work_run+0xd8/0x1f0
>> [ 1742.287245]  do_exit+0x87c/0x2640
>> [ 1742.290575]  do_group_exit+0xd0/0x258
>> [ 1742.294254]  __arm64_sys_exit_group+0x3c/0x48
>> [ 1742.298631]  el0_svc_common.constprop.0+0x10c/0x348
>> [ 1742.303529]  do_el0_svc+0x48/0xd0
>> [ 1742.306861]  el0_sync_handler+0x11c/0x1b8
>> [ 1742.310888]  el0_sync+0x158/0x180

>> [ 1742.348215] page dumped because: kasan: bad access detected

> I think this is slightly more concerning. The issue is that we have
> started freeing parts of the interrupt state already (we free the
> SPIs early in kvm_vgic_dist_destroy()).

(I took this to be some wild pointer access. Previously for use-after-free I've seen it
print where it was allocated and where it was freed).


> If an SPI was pending or active at this stage (i.e. present in the
> ap_list), we are going to iterate over memory that has been freed
> already. This is bad, and this can happen on GICv3 as well.


> I think this should solve it, but I need to test it on a GICv2 system:
> 
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 53ec9b9d9bc43..30dbec9fe0b4a 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -365,10 +365,10 @@ static void __kvm_vgic_destroy(struct kvm *kvm)
> 
>      vgic_debug_destroy(kvm);
> 
> -    kvm_vgic_dist_destroy(kvm);
> -
>      kvm_for_each_vcpu(i, vcpu, kvm)
>          kvm_vgic_vcpu_destroy(vcpu);
> +
> +    kvm_vgic_dist_destroy(kvm);
>  }
>>
>>  void kvm_vgic_destroy(struct kvm *kvm)

This works for me on Juno.


Thanks,

James
Marc Zyngier April 23, 2020, 3:13 p.m. UTC | #7
Hi James,

On 2020-04-23 15:34, James Morse wrote:
> Hi guys,
> 
> On 23/04/2020 13:03, Marc Zyngier wrote:
>> On 2020-04-23 12:35, James Morse wrote:

[...]

>>> [ 1742.348215] page dumped because: kasan: bad access detected
> 
>> I think this is slightly more concerning. The issue is that we have
>> started freeing parts of the interrupt state already (we free the
>> SPIs early in kvm_vgic_dist_destroy()).
> 
> (I took this to be some wild pointer access. Previously for
> use-after-free I've seen it
> print where it was allocated and where it was freed).

This is indeed what I managed to trigger by forcing a pending
SPI (the kvmtool UART interrupt) in the guest and forcefully
terminating it:

[ 3807.084237] ==================================================================
[ 3807.086516] BUG: KASAN: use-after-free in vgic_flush_pending_lpis+0x54/0x198
[ 3807.088027] Read of size 8 at addr ffff00085514a328 by task ioeventfd-worke/231
[ 3807.089771]
[ 3807.090911] CPU: 4 PID: 231 Comm: ioeventfd-worke Not tainted 5.7.0-rc2-00086-g2100c066e9a78 #200
[ 3807.092864] Hardware name: FVP Base RevC (DT)
[ 3807.094003] Call trace:
[ 3807.095180]  dump_backtrace+0x0/0x268
[ 3807.096445]  show_stack+0x1c/0x28
[ 3807.097961]  dump_stack+0xe8/0x144
[ 3807.099374]  print_address_description.isra.11+0x6c/0x354
[ 3807.101002]  __kasan_report+0x110/0x1c8
[ 3807.102332]  kasan_report+0x48/0x60
[ 3807.103769]  __asan_load8+0x9c/0xc0
[ 3807.105113]  vgic_flush_pending_lpis+0x54/0x198
[ 3807.107187]  __kvm_vgic_destroy+0x120/0x278
[ 3807.108814]  kvm_vgic_destroy+0x30/0x48
[ 3807.110443]  kvm_arch_destroy_vm+0x20/0xa8
[ 3807.111868]  kvm_put_kvm+0x234/0x460
[ 3807.113697]  kvm_vm_release+0x34/0x48
[ 3807.115162]  __fput+0x104/0x2f8
[ 3807.116464]  ____fput+0x14/0x20
[ 3807.117929]  task_work_run+0xbc/0x188
[ 3807.119419]  do_exit+0x514/0xff8
[ 3807.120859]  do_group_exit+0x78/0x108
[ 3807.122323]  get_signal+0x164/0xcc0
[ 3807.123951]  do_notify_resume+0x244/0x5e0
[ 3807.125416]  work_pending+0x8/0x10
[ 3807.126392]
[ 3807.126969] Allocated by task 229:
[ 3807.128834]  save_stack+0x24/0x50
[ 3807.130462]  __kasan_kmalloc.isra.10+0xc4/0xe0
[ 3807.132134]  kasan_kmalloc+0xc/0x18
[ 3807.133554]  __kmalloc+0x174/0x270
[ 3807.135182]  vgic_init.part.2+0xe0/0x4f0
[ 3807.136809]  vgic_init+0x48/0x58
[ 3807.138095]  vgic_set_common_attr.isra.4+0x2fc/0x388
[ 3807.140081]  vgic_v3_set_attr+0x8c/0x350
[ 3807.141692]  kvm_device_ioctl_attr+0x124/0x190
[ 3807.143260]  kvm_device_ioctl+0xe8/0x170
[ 3807.144947]  ksys_ioctl+0xb8/0xf8
[ 3807.146575]  __arm64_sys_ioctl+0x48/0x60
[ 3807.148365]  el0_svc_common.constprop.1+0xc8/0x1c8
[ 3807.150015]  do_el0_svc+0x94/0xa0
[ 3807.151605]  el0_sync_handler+0x120/0x190
[ 3807.152922]  el0_sync+0x140/0x180
[ 3807.153899]
[ 3807.154784] Freed by task 231:
[ 3807.156178]  save_stack+0x24/0x50
[ 3807.157805]  __kasan_slab_free+0x10c/0x188
[ 3807.159433]  kasan_slab_free+0x10/0x18
[ 3807.160897]  kfree+0x88/0x350
[ 3807.162570]  __kvm_vgic_destroy+0x5c/0x278
[ 3807.164153]  kvm_vgic_destroy+0x30/0x48
[ 3807.165780]  kvm_arch_destroy_vm+0x20/0xa8
[ 3807.167408]  kvm_put_kvm+0x234/0x460
[ 3807.168691]  kvm_vm_release+0x34/0x48
[ 3807.170281]  __fput+0x104/0x2f8
[ 3807.171870]  ____fput+0x14/0x20
[ 3807.173268]  task_work_run+0xbc/0x188
[ 3807.174733]  do_exit+0x514/0xff8
[ 3807.176242]  do_group_exit+0x78/0x108
[ 3807.177434]  get_signal+0x164/0xcc0
[ 3807.179289]  do_notify_resume+0x244/0x5e0
[ 3807.180755]  work_pending+0x8/0x10
[ 3807.181731]
[ 3807.182707] The buggy address belongs to the object at ffff00085514a000
[ 3807.182707]  which belongs to the cache kmalloc-4k of size 4096
[ 3807.185381] The buggy address is located 808 bytes inside of
[ 3807.185381]  4096-byte region [ffff00085514a000, ffff00085514b000)
[ 3807.187591] The buggy address belongs to the page:
[ 3807.189381] page:fffffe0021345200 refcount:1 mapcount:0 mapping:00000000090b1068 index:0x0 head:fffffe0021345200 order:3 compound_mapcount:0 compound_pincount:0
[ 3807.192148] flags: 0x2ffff00000010200(slab|head)
[ 3807.194123] raw: 2ffff00000010200 dead000000000100 dead000000000122 ffff00085a00f200
[ 3807.196379] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
[ 3807.198097] page dumped because: kasan: bad access detected
[ 3807.199289]
[ 3807.200123] Memory state around the buggy address:
[ 3807.201750]  ffff00085514a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3807.203704]  ffff00085514a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3807.205657] >ffff00085514a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3807.207285]                                   ^
[ 3807.208826]  ffff00085514a380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3807.210812]  ffff00085514a400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 3807.212402] ==================================================================

>> If an SPI was pending or active at this stage (i.e. present in the
>> ap_list), we are going to iterate over memory that has been freed
>> already. This is bad, and this can happen on GICv3 as well.
> 
> 
>> I think this should solve it, but I need to test it on a GICv2 system:
>> 
>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>> index 53ec9b9d9bc43..30dbec9fe0b4a 100644
>> --- a/virt/kvm/arm/vgic/vgic-init.c
>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>> @@ -365,10 +365,10 @@ static void __kvm_vgic_destroy(struct kvm *kvm)
>> 
>>      vgic_debug_destroy(kvm);
>> 
>> -    kvm_vgic_dist_destroy(kvm);
>> -
>>      kvm_for_each_vcpu(i, vcpu, kvm)
>>          kvm_vgic_vcpu_destroy(vcpu);
>> +
>> +    kvm_vgic_dist_destroy(kvm);
>>  }
>>
>>  void kvm_vgic_destroy(struct kvm *kvm)
> 
> This works for me on Juno.

I've verified that the above splat disappears on the FVP too.
I'll squash the fix in, add your RB (which I assume still stands)
and send the whole thing as a lockdown present to Paolo!
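
For clarity, the teardown order once the fix above is squashed in, as
reconstructed from the diff (vcpus, and with them any pending LPIs, are
flushed before the distributor state is freed):

	static void __kvm_vgic_destroy(struct kvm *kvm)
	{
		struct kvm_vcpu *vcpu;
		int i;

		vgic_debug_destroy(kvm);

		/* Per-vcpu teardown first: kvm_vgic_vcpu_destroy() now calls
		 * vgic_flush_pending_lpis(), emptying each ap_list while the
		 * interrupt state it references is still alive. */
		kvm_for_each_vcpu(i, vcpu, kvm)
			kvm_vgic_vcpu_destroy(vcpu);

		/* Only then free the distributor state, including dist->spis. */
		kvm_vgic_dist_destroy(kvm);
	}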

Thanks,

          M.

Patch

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index a963b9d766b73..53ec9b9d9bc43 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -348,6 +348,12 @@  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
+	/*
+	 * Retire all pending LPIs on this vcpu anyway as we're
+	 * going to destroy it.
+	 */
+	vgic_flush_pending_lpis(vcpu);
+
 	INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
 }