From patchwork Thu Feb 14 04:39:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Gibson X-Patchwork-Id: 10811795 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 996576C2 for ; Thu, 14 Feb 2019 04:46:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8443A2DC15 for ; Thu, 14 Feb 2019 04:46:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 736EA2DC87; Thu, 14 Feb 2019 04:46:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id CA4312DC15 for ; Thu, 14 Feb 2019 04:46:44 +0000 (UTC) Received: from localhost ([127.0.0.1]:40040 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gu8v6-00077K-5r for patchwork-qemu-devel@patchwork.kernel.org; Wed, 13 Feb 2019 23:46:44 -0500 Received: from eggs.gnu.org ([209.51.188.92]:46354) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gu8pS-0002YX-B2 for qemu-devel@nongnu.org; Wed, 13 Feb 2019 23:40:56 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gu8pE-0007CR-UL for qemu-devel@nongnu.org; Wed, 13 Feb 2019 23:40:48 -0500 Received: from ozlabs.org ([2401:3900:2:1::2]:40795) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gu8pE-0006v3-0a; Wed, 13 Feb 2019 23:40:40 -0500 Received: by ozlabs.org (Postfix, from userid 1007) id 440Nvr4ctfz9sN8; Thu, 14 Feb 2019 15:39:20 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1550119160; bh=2Lp42hE+I2N0ishsTEnbN0moixP3X2aGKuoJOni2jo8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dbHESOD7+BjcS5V0pULYmGFceRscyKKXTamyspACdvwzOVfbc0lW6XngR3OOpEHdz HZ1ludrrk4F2WxwrIWuCI7dQ7hVCcLyck36KsvNoi6ylmtkv25Eq1jSXe5xC5GJ0HZ rZeo72slcAa6zsSV/DekdH72ZWG6yYsT/TRKgASU= From: David Gibson To: mst@redhat.com, qemu-devel@nongnu.org Date: Thu, 14 Feb 2019 15:39:16 +1100 Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190214043916.22128-1-david@gibson.dropbear.id.au> References: <20190214043916.22128-1-david@gibson.dropbear.id.au> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2401:3900:2:1::2 Subject: [Qemu-devel] [PATCH 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-ppc@nongnu.org, David Gibson Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but we can only actually discard memory in units of the host page size. Now, we handle this very badly: we silently ignore balloon requests that aren't host page aligned, and for requests that are host page aligned we discard the entire host page. The latter can corrupt guest memory if its page size is smaller than the host's. The obvious choice would be to disable the balloon if the host page size is not 4kiB. However, that would break the special case where host and guest have the same page size, but that's larger than 4kiB. That case currently works by accident[1] - and is used in practice on many production POWER systems where 64kiB has long been the Linux default page size on both host and guest. To make the balloon safe, without breaking that useful special case, we need to accumulate 4kiB balloon requests until we have a whole contiguous host page to discard. We could in principle do that across all guest memory, but it would require a large bitmap to track. This patch represents a compromise: we track ballooned subpages for a single contiguous host page at a time. This means that if the guest discards all 4kiB chunks of a host page in succession, we will discard it. This is the expected behaviour in the (host page) == (guest page) != 4kiB case we want to support. If the guest scatters 4kiB requests across different host pages, we don't discard anything, and issue a warning. Not ideal, but at least we don't corrupt guest memory as the previous version could. Warning reporting is kind of a compromise here. Determining whether we're in a problematic state at realize() time is tricky, because we'd have to look at the host pagesizes of all memory backends, but we can't really know if some of those backends could be for special purpose memory that's not subject to ballooning. Reporting only when the guest tries to balloon a partial page also isn't great because if the guest page size happens to line up it won't indicate that we're in a non ideal situation. It could also cause alarming repeated warnings whenever a migration is attempted. So, what we do is warn the first time the guest attempts balloon a partial host page, whether or not it will end up ballooning the rest of the page immediately afterwards. [1] Because when the guest attempts to balloon a page, it will submit requests for each 4kiB subpage. Most will be ignored, but the one which happens to be host page aligned will discard the whole lot. Signed-off-by: David Gibson --- hw/virtio/virtio-balloon.c | 69 +++++++++++++++++++++++++----- include/hw/virtio/virtio-balloon.h | 3 ++ 2 files changed, 62 insertions(+), 10 deletions(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index e4cd8d566b..65f861cbef 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -33,33 +33,82 @@ #define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT) +typedef struct PartiallyBalloonedPage { + RAMBlock *rb; + ram_addr_t base; + unsigned long bitmap[]; +} PartiallyBalloonedPage; + static void balloon_inflate_page(VirtIOBalloon *balloon, MemoryRegion *mr, hwaddr offset) { void *addr = memory_region_get_ram_ptr(mr) + offset; RAMBlock *rb; size_t rb_page_size; - ram_addr_t ram_offset; + int subpages; + ram_addr_t ram_offset, host_page_base; /* XXX is there a better way to get to the RAMBlock than via a * host address? */ rb = qemu_ram_block_from_host(addr, false, &ram_offset); rb_page_size = qemu_ram_pagesize(rb); + host_page_base = ram_offset & ~(rb_page_size - 1); + + if (rb_page_size == BALLOON_PAGE_SIZE) { + /* Easy case */ - /* Silently ignore hugepage RAM blocks */ - if (rb_page_size != getpagesize()) { + ram_block_discard_range(rb, ram_offset, rb_page_size); + /* We ignore errors from ram_block_discard_range(), because it + * has already reported them, and failing to discard a balloon + * page is not fatal */ return; } - /* Silently ignore unaligned requests */ - if (ram_offset & (rb_page_size - 1)) { - return; + /* Hard case + * + * We've put a piece of a larger host page into the balloon - we + * need to keep track until we have a whole host page to + * discard + */ + warn_report_once( +"Balloon used with backing page size > 4kiB, this may not be reliable"); + + subpages = rb_page_size / BALLOON_PAGE_SIZE; + + if (balloon->pbp + && (rb != balloon->pbp->rb + || host_page_base != balloon->pbp->base)) { + /* We've partially ballooned part of a host page, but now + * we're trying to balloon part of a different one. Too hard, + * give up on the old partial page */ + free(balloon->pbp); + balloon->pbp = NULL; } - ram_block_discard_range(rb, ram_offset, rb_page_size); - /* We ignore errors from ram_block_discard_range(), because it has - * already reported them, and failing to discard a balloon page is - * not fatal */ + if (!balloon->pbp) { + /* Starting on a new host page */ + size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long); + balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen); + balloon->pbp->rb = rb; + balloon->pbp->base = host_page_base; + } + + bitmap_set(balloon->pbp->bitmap, + (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE, + subpages); + + if (bitmap_full(balloon->pbp->bitmap, subpages)) { + /* We've accumulated a full host page, we can actually discard + * it now */ + + ram_block_discard_range(rb, balloon->pbp->base, rb_page_size); + /* We ignore errors from ram_block_discard_range(), because it + * has already reported them, and failing to discard a balloon + * page is not fatal */ + + free(balloon->pbp); + balloon->pbp = NULL; + } } static const char *balloon_stat_names[] = { diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h index e0df3528c8..99dcd6d105 100644 --- a/include/hw/virtio/virtio-balloon.h +++ b/include/hw/virtio/virtio-balloon.h @@ -30,6 +30,8 @@ typedef struct virtio_balloon_stat_modern { uint64_t val; } VirtIOBalloonStatModern; +typedef struct PartiallyBalloonedPage PartiallyBalloonedPage; + typedef struct VirtIOBalloon { VirtIODevice parent_obj; VirtQueue *ivq, *dvq, *svq; @@ -42,6 +44,7 @@ typedef struct VirtIOBalloon { int64_t stats_last_update; int64_t stats_poll_interval; uint32_t host_features; + PartiallyBalloonedPage *pbp; } VirtIOBalloon; #endif