
[2/2] KVM: arm/arm64: vgic: Fix irq refcount leak in kvm_vgic_set_owner()

Message ID 1559818688-20638-3-git-send-email-Dave.Martin@arm.com
State New, archived
Series KVM: arm/arm64: vgic: A couple of memory leak fixes

Commit Message

Dave Martin June 6, 2019, 10:58 a.m. UTC
kvm_vgic_set_owner() leaks a reference on the vgic_irq descriptor,
which does not seem to match up with any vgic_put_irq() that I can
find.

Since the irq pointer is not passed out, and the caller must in any case
subsequently use vgic_get_irq() when it wants a pointer, it is not
clear why we should have a dangling refcount here.

The refcount is still needed inside kvm_vgic_set_owner() to prevent
the vgic_irq struct from disappearing while it is being
manipulated.

So, keep the vgic_get_irq() here, but add the matching
vgic_put_irq() before returning.
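The get/put pairing described above can be sketched as a small userspace model. It assumes a single-threaded setting with no locking, and every name in it (model_irq, model_get_irq() and so on) is hypothetical; it illustrates the reference-counting rule only, not the kernel's actual vgic_get_irq()/vgic_put_irq() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified userspace model of a refcounted interrupt
 * descriptor. model_irq, model_get_irq(), model_put_irq() and
 * model_set_owner() are illustrative names, not kernel APIs.
 */
struct model_irq {
	unsigned int intid;
	int refcount;
	void *owner;
};

/* Number of model_irq objects allocated and not yet freed. */
static int live_irqs;

static struct model_irq *model_alloc_irq(unsigned int intid)
{
	struct model_irq *irq = calloc(1, sizeof(*irq));

	if (!irq)
		abort();
	irq->intid = intid;
	irq->refcount = 1;	/* initial reference, held by the creator */
	live_irqs++;
	return irq;
}

/* Take an extra reference so the object cannot be freed under us. */
static struct model_irq *model_get_irq(struct model_irq *irq)
{
	irq->refcount++;
	return irq;
}

/* Drop a reference; free the object when the last one goes away. */
static void model_put_irq(struct model_irq *irq)
{
	if (--irq->refcount == 0) {
		free(irq);
		live_irqs--;
	}
}

/*
 * Mirrors the shape of the patched function: take a reference, update
 * the owner, then drop the reference before returning. No locking is
 * modelled here.
 */
static int model_set_owner(struct model_irq *target, void *owner)
{
	struct model_irq *irq = model_get_irq(target);
	int ret = 0;

	if (irq->owner && irq->owner != owner)
		ret = -1;	/* stands in for -EEXIST */
	else
		irq->owner = owner;

	model_put_irq(irq);	/* the matching put that the patch adds */
	return ret;
}
```

With the final model_put_irq() in place, allocating an IRQ, calling model_set_owner() and then dropping the creator's reference brings live_irqs back to 0. Deleting that put leaves the refcount one too high, so the object is never freed, which is the leak pattern the commit message describes.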

unreferenced object 0xffff800b6365ab80 (size 128):
  comm "qemu-system-aar", pid 14414, jiffies 4300822606 (age 84.436s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 e1 e0 38 00 00 ff ff  ...........8....
    b0 e1 e0 38 00 00 ff ff 78 e6 ad dd 0a 80 ff ff  ...8....x.......
  backtrace:
    [<00000000a08b80e2>] kmem_cache_alloc+0x178/0x208
    [<00000000114591cb>] vgic_add_lpi.part.5+0x34/0x190
    [<00000000ec1425ae>] vgic_its_cmd_handle_mapi+0x320/0x348
    [<00000000935c5c32>] vgic_its_process_commands.part.14+0x350/0x8b8
    [<00000000dc256d2c>] vgic_mmio_write_its_cwriter+0x78/0x98
    [<000000008659acd2>] dispatch_mmio_write+0xd4/0x120

[...]

Cc: Christoffer Dall <christoffer.dall@arm.com>
Fixes: c6ccd30e0de3 ("KVM: arm/arm64: Introduce an allocator for in-kernel irq lines")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

Based on the limited testing I've done so far, the patch _appears_ to
fix the bug.

However, I still don't understand why the bug is intermittent, or why
the arch_timer or pmu (the only apparent users of kvm_vgic_set_owner())
are claiming an LPI in the first place.

So there may be other bugs in the mix, or I may have misunderstood
something...

The bug (and fix) were observed with native qemu on ThunderX2, on a
merge of v5.1 with kvmarm/next commit 9eecfc22e0bf ("KVM: arm64: Fix
ptrauth ID register masking logic").

My qemu invocation was:

$ qemu-system-aarch64 -machine virt,accel=kvm,gic_version=3 -cpu host \
    -smp 4 -nographic \
    -drive id=vblock,file=block.qcow2,format=qcow2,if=none \
    -device virtio-blk-device,drive=vblock \
    -kernel Image -append 'root=/dev/vda1 ro'
---
 virt/kvm/arm/vgic/vgic.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Marc Zyngier June 6, 2019, 12:06 p.m. UTC | #1
On 06/06/2019 11:58, Dave Martin wrote:
> [...]
> Based on the limited testing I've done so far, the patch _appears_ to
> fix the bug.
> 
> However, I still don't understand why the bug is intermittent, or why
> the arch_timer or pmu (the only apparent users of kvm_vgic_set_owner())
> are claiming an LPI in the first place.
> 
> So there may be other bugs in the mix, or I may have misunderstood
> something...

Yeah, this doesn't make much sense. Both timer and PMU are using PPIs,
which are not refcounted, so this vgic_put_irq() is effectively a NOP.
It doesn't invalidate the patch itself; it is just that I seriously
doubt it fixes anything.

LPIs do not use the owner field so far, so we must have another get/put
mismatch somewhere.

Thanks,

	M.
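The point that the extra put is a no-op for the timer and PMU can be illustrated with a similarly hypothetical model of the put side. It assumes, as the reply above states, that only LPIs carry a refcount, and that LPI INTIDs start at 8192 as in the GICv3 architecture; sketch_irq and sketch_put_irq() are illustrative names, not the kernel's vgic_put_irq().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model: only LPIs are refcounted. In GICv3, LPI INTIDs
 * start at 8192; SGIs (0-15), PPIs (16-31) and SPIs (32-1019) are
 * assumed here to be preallocated and never released by a put.
 */
#define SKETCH_MIN_LPI 8192u

struct sketch_irq {
	unsigned int intid;
	int refcount;	/* only meaningful for LPIs */
};

/* Number of LPI descriptors released by sketch_put_irq(). */
static int lpis_freed;

static void sketch_put_irq(struct sketch_irq *irq)
{
	/* SGI/PPI/SPI: nothing to release, so a "put" is a no-op. */
	if (irq->intid < SKETCH_MIN_LPI)
		return;

	/* LPI: drop the reference and free on the last put. */
	if (--irq->refcount == 0) {
		free(irq);
		lpis_freed++;
	}
}
```

Calling sketch_put_irq() on a PPI descriptor leaves it untouched however many times it is called, which is why an extra put on the timer or PMU interrupt cannot change anything. An unbalanced get/put on an LPI, by contrast, decides whether the descriptor is ever freed, hence the suspicion of a mismatch elsewhere.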
Dave Martin June 6, 2019, 12:34 p.m. UTC | #2
On Thu, Jun 06, 2019 at 01:06:33PM +0100, Marc Zyngier wrote:
> On 06/06/2019 11:58, Dave Martin wrote:
> > [...]
> 
> Yeah, this doesn't make much sense. Both timer and PMU are using PPIs,
> which are not refcounted, so this vgic_put_irq() is effectively a NOP.
> It doesn't invalidate the patch itself; it is just that I seriously
> doubt it fixes anything.
> 
> LPIs do not use the owner field so far, so we must have another get/put
> mismatch somewhere.

No argument from me.

As I say, this change _appeared_ to make this leak go away, but I
couldn't understand why, and didn't kick it very thoroughly.  So it
may well be a red herring.

Cheers
---Dave

Patch

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 191decc..930319c 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -599,6 +599,7 @@  int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner)
 	else
 		irq->owner = owner;
 	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+	vgic_put_irq(vcpu->kvm, irq);
 
 	return ret;
 }