From patchwork Tue Apr 26 16:13:49 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Tokarev <mjt@tls.msk.ru>
X-Patchwork-Id: 733332
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p3QGaSKB017619
	for <patchwork-kvm@patchwork.kernel.org>;
	Tue, 26 Apr 2011 16:36:29 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755423Ab1DZQg0 (ORCPT
	<rfc822;patchwork-kvm@patchwork.kernel.org>);
	Tue, 26 Apr 2011 12:36:26 -0400
Received: from isrv.corpit.ru ([86.62.121.231]:46800 "EHLO isrv.corpit.ru"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751710Ab1DZQgZ (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 26 Apr 2011 12:36:25 -0400
Received: from gandalf.tls.msk.ru (mjt.vpn.tls.msk.ru [192.168.177.99])
	by isrv.corpit.ru (Postfix) with ESMTP id BA6A9A3246;
	Tue, 26 Apr 2011 20:36:23 +0400 (MSD)
Received: by gandalf.tls.msk.ru (Postfix, from userid 1000)
	id 8F71E13393; Tue, 26 Apr 2011 20:36:23 +0400 (MSD)
From: Michael Tokarev <mjt@tls.msk.ru>
Date: Tue, 26 Apr 2011 20:13:49 +0400
Subject: [PATCH] fix long-standing crash in 32bit migration
To: kvm@vger.kernel.org
Cc: mjt@tls.msk.ru, Bruce Rogers <brogers@novell.com>
Message-Id: <20110426163623.8F71E13393@gandalf.tls.msk.ru>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Tue, 26 Apr 2011 16:36:29 +0000 (UTC)

This change fixes a long-standing immediate crash (memory corruption
and abort in glibc malloc code) in migration on 32bits.

The bug is present since this commit:

  commit 692d9aca97b865b0f7903565274a52606910f129
  Author: Bruce Rogers <brogers@novell.com>
  Date:   Wed Sep 23 16:13:18 2009 -0600

    qemu-kvm: allocate correct size for dirty bitmap

    The dirty bitmap copied out to userspace is stored in a long array,
    and gets copied out to userspace accordingly.  This patch accounts
    for that correctly.  Currently I'm seeing kvm crashing due to writing
    beyond the end of the alloc'd dirty bitmap memory, because the buffer
    has the wrong size.

    Signed-off-by: Bruce Rogers
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
 -            buf = qemu_malloc((slots[i].len / 4096 + 7) / 8 + 2);
 +            buf = qemu_malloc(BITMAP_SIZE(slots[i].len));
             r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf);

BITMAP_SIZE is now open-coded in that function, like this:

 size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;

The problem is that HOST_LONG_BITS in 32bit userspace is 32
but it's 64 in 64bit kernel.  So userspace aligns this to
32, and kernel to 64, but since no length is passed from
userspace to kernel on ioctl, kernel uses its size calculation
and copies 4 extra bytes to userspace, corrupting memory.

Here's how it looks like during migrate execution:

our=20, kern=24
our=4, kern=8
...
our=4, kern=8
our=4064, kern=4064
our=512, kern=512
our=4, kern=8
our=20, kern=24
our=4, kern=8
...
our=4, kern=8
our=4064, kern=4064
*** glibc detected *** ./x86_64-softmmu/qemu-system-x86_64: realloc(): invalid next size: 0x08f20528 ***

(our is userspace size above, kern is the size as calculated
by the kernel).

Fix this by always aligning to 64 in a hope that no platform will
have sizeof(long)>8 any time soon, and add a comment describing it
all.  It's a small price to pay for bad kernel design.

Alternatively it's possible to fix that in the kernel by using

different size calculation depending on the current process.
But this becomes quite ugly.

Special thanks goes to Stefan Hajnoczi for spotting the fundamental
cause of the issue, and to Alexander Graf for his support in #qemu.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
CC: Bruce Rogers <brogers@novell.com>
---
 kvm-all.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7e407f0..3e75e9e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -382,7 +382,19 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
             break;
         }
 
-        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8;
+	/* XXX bad kernel interface alert
+         * For dirty bitmap, kernel allocates array of size aligned to
+         * bits-per-long.  But for case when the kernel is 64bits and
+         * the userspace is 32bits, userspace can't align to the same
+         * bits-per-long, since sizeof(long) is different between kernel
+         * and user space.  This way, userspace will provide buffer which
+         * may be 4 bytes less than the kernel will use, resulting in
+         * userspace memory corruption (which is not detectable by valgrind
+         * too, in most cases).
+         * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
+         * a hope that sizeof(long) wont become >8 any time soon.
+         */
+        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), /*HOST_LONG_BITS*/ 64) / 8;
         if (!d.dirty_bitmap) {
             d.dirty_bitmap = qemu_malloc(size);
         } else if (size > allocated_size) {