From patchwork Thu Oct 6 03:03:07 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Gibson
X-Patchwork-Id: 9363633
From: David Gibson
To: qemu-ppc@nongnu.org
Date: Thu, 6 Oct 2016 14:03:07 +1100
Message-Id: <1475722987-18644-5-git-send-email-david@gibson.dropbear.id.au>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1475722987-18644-1-git-send-email-david@gibson.dropbear.id.au>
References: <1475722987-18644-1-git-send-email-david@gibson.dropbear.id.au>
Subject: [Qemu-devel] [RFC 4/4] spapr: Improved placement of PCI host bridges
 in guest memory map
Cc: lvivier@redhat.com, thuth@redhat.com, mdroth@linux.vnet.ibm.com,
 nikunj@linux.vnet.ibm.com, mst@redhat.com, aik@ozlabs.ru,
 qemu-devel@nongnu.org, agraf@suse.de, abologna@redhat.com,
 bharata@linux.vnet.ibm.com, mpolednik@redhat.com, David Gibson

Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit
and 64-bit MMIO windows.

This scheme has several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
    now)
  - It limits the total MMIO window to 64 GiB.
    This is not always enough for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that
    naturally aligning things within there will waste more address
    space.
In addition there was a miscalculation in some of the defaults, which
meant that the MMIO windows for each PHB actually slightly overran the
64 GiB region for that PHB.  We got away without nasty consequences
because the overrun fit within an unused area at the beginning of the
next PHB's region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and
is also closer to what bare metal hardware and pHyp guests generally
use.

Because some guest versions (including most current distro kernels)
can't access PCI MMIO above 64 TiB, we put all the PCI windows between
32 TiB and 64 TiB.  This range is broken into 1 TiB chunks.  The first
1 TiB chunk contains the PIO (64 kiB) and 32-bit MMIO (2 GiB) windows
for all of the PHBs.  Each subsequent 1 TiB chunk contains a naturally
aligned 64-bit MMIO window for one PHB.

This reduces the number of allowed PHBs (without full manual
configuration of all the windows) from 256 to 31, but this should still
be plenty in practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Signed-off-by: David Gibson
---
 hw/ppc/spapr.c              | 118 +++++++++++++++++++++++++++++++++++---------
 hw/ppc/spapr_pci.c          |   5 +-
 include/hw/pci-host/spapr.h |   8 ++-
 3 files changed, 106 insertions(+), 25 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 864a48b..d842306 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2376,22 +2376,42 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 hwaddr *mmio64, hwaddr *mmio64_size,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
+    /*
+     * New-style PHB window placement.
+     *
+     * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window
+     * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO
+     * windows.
+     *
+     * Some guest kernels can't work with MMIO windows above 1<<46
+     * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB
+     *
+     * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all
+     * PHBs.  33..34TiB has the 64-bit MMIO window for PHB0, 34..35
+     * has the 64-bit window for PHB1 and so forth.
+     */
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
-    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
-    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
-    const uint32_t max_index = 255;
-    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
+    const hwaddr mmio64_win_size = (1ULL << 40); /* 1 TiB */
 
-    uint64_t max_hotplug_addr = spapr->hotplug_memory.base +
-        memory_region_size(&spapr->hotplug_memory.mr);
-    hwaddr phb0_base = QEMU_ALIGN_UP(max_hotplug_addr, phb0_alignment);
-    hwaddr phb_base;
+    int max_phbs = (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / mmio64_win_size - 1;
+    hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
+    hwaddr mmio64_base = SPAPR_PCI_BASE + mmio64_win_size;
     int i;
 
-    if (index > max_index) {
+    /* Sanity check natural alignments */
+    assert((SPAPR_PCI_BASE % mmio64_win_size) == 0);
+    assert((SPAPR_PCI_LIMIT % mmio64_win_size) == 0);
+    assert((mmio64_win_size % SPAPR_PCI_MEM32_WIN_SIZE) == 0);
+    assert((SPAPR_PCI_MEM32_WIN_SIZE % SPAPR_PCI_IO_WIN_SIZE) == 0);
+    /* Sanity check bounds */
+    assert((SPAPR_PCI_BASE + max_phbs * SPAPR_PCI_IO_WIN_SIZE)
+           <= mmio32_base);
+    assert(mmio32_base + max_phbs * SPAPR_PCI_MEM32_WIN_SIZE
+           <= mmio64_base);
+
+    if (index >= max_phbs) {
         error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
-                   max_index);
+                   max_phbs - 1);
         return;
     }
 
@@ -2400,16 +2420,14 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
         liobns[i] = SPAPR_PCI_LIOBN(index, i);
     }
 
-    phb_base = phb0_base + index * phb_spacing;
-    *pio = phb_base + pio_offset;
+    *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE;
     *pio_size = SPAPR_PCI_IO_WIN_SIZE;
-    *mmio32 = phb_base + mmio_offset;
-    *mmio32_size = SPAPR_PCI_MMIO_WIN_SIZE;
-    /*
-     * We don't set the 64-bit MMIO window, relying on the PHB's
-     * fallback behaviour of automatically splitting a large "32-bit"
-     * window into contiguous 32-bit and 64-bit windows
-     */
+
+    *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE;
+    *mmio32_size = SPAPR_PCI_MEM32_WIN_SIZE;
+
+    *mmio64 = mmio64_base + index * mmio64_win_size;
+    *mmio64_size = mmio64_win_size;
 }
 
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
@@ -2513,8 +2531,63 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
 /*
  * pseries-2.7
  */
-#define SPAPR_COMPAT_2_7 \
-    HW_COMPAT_2_7 \
+#define SPAPR_COMPAT_2_7                                \
+    HW_COMPAT_2_7                                       \
+    {                                                   \
+        .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,           \
+        .property = "mem_win_size",                     \
+        .value = stringify(SPAPR_PCI_2_7_MMIO_WIN_SIZE),\
+    },                                                  \
+    {                                                   \
+        .driver = TYPE_SPAPR_PCI_HOST_BRIDGE,           \
+        .property = "mem64_win_size",                   \
+        .value = "0",                                   \
+    },
+
+
+
+static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
+                              uint64_t *buid, hwaddr *pio, hwaddr *pio_size,
+                              hwaddr *mmio32, hwaddr *mmio32_size,
+                              hwaddr *mmio64, hwaddr *mmio64_size,
+                              unsigned n_dma, uint32_t *liobns, Error **errp)
+{
+    /* Legacy PHB placement for pseries-2.7 and earlier machine types */
+    const uint64_t base_buid = 0x800000020000000ULL;
+    const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
+    const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
+    const hwaddr pio_offset = 0x80000000; /* 2 GiB */
+    const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
+
+    uint64_t max_hotplug_addr = spapr->hotplug_memory.base +
+        memory_region_size(&spapr->hotplug_memory.mr);
+    hwaddr phb0_base = QEMU_ALIGN_UP(max_hotplug_addr, phb0_alignment);
+    hwaddr phb_base;
+    int i;
+
+    if (index > max_index) {
+        error_setg(errp, "\"index\" for PAPR PHB is too large (max %u)",
+                   max_index);
+        return;
+    }
+
+    *buid = base_buid + index;
+    for (i = 0; i < n_dma; ++i) {
+        liobns[i] = SPAPR_PCI_LIOBN(index, i);
+    }
+
+    phb_base = phb0_base + index * phb_spacing;
+    *pio = phb_base + pio_offset;
+    *pio_size = SPAPR_PCI_IO_WIN_SIZE;
+    *mmio32 = phb_base + mmio_offset;
+    *mmio32_size = SPAPR_PCI_2_7_MMIO_WIN_SIZE;
+    /*
+     * We don't set the 64-bit MMIO window, relying on the PHB's
+     * fallback behaviour of automatically splitting a large "32-bit"
+     * window into contiguous 32-bit and 64-bit windows
+     */
+}
 
 static void spapr_machine_2_7_instance_options(MachineState *machine)
 {
@@ -2527,6 +2600,7 @@ static void spapr_machine_2_7_class_options(MachineClass *mc)
     spapr_machine_2_8_class_options(mc);
     smc->tcg_default_cpu = "POWER7";
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_7);
+    smc->phb_placement = phb_placement_2_7;
 }
 
 DEFINE_SPAPR_MACHINE(2_7, "2.7", false);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index c4579c3..2cd9e6e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1565,9 +1565,10 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_UINT32("liobn64", sPAPRPHBState, dma_liobn[1], -1),
     DEFINE_PROP_UINT64("mem_win_addr", sPAPRPHBState, mem_win_addr, -1),
     DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState, mem_win_size,
-                       SPAPR_PCI_MMIO_WIN_SIZE),
+                       SPAPR_PCI_MEM32_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_addr", sPAPRPHBState, mem64_win_addr, -1),
-    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size, 0),
+    DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState, mem64_win_size,
+                       SPAPR_PCI_MEM64_WIN_SIZE),
     DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState, mem64_win_pciaddr,
                        -1),
     DEFINE_PROP_UINT64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 5324d4c..239082b 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -83,8 +83,14 @@ struct sPAPRPHBState {
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 #define SPAPR_PCI_MEM32_WIN_SIZE ((1ULL <<32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
+#define SPAPR_PCI_MEM64_WIN_SIZE 0x10000000000ULL /* 1 TiB */
 
-#define SPAPR_PCI_MMIO_WIN_SIZE 0xf80000000
+/* Without manual configuration, all PCI outbound windows will be
+ * within this range */
+#define SPAPR_PCI_BASE (1ULL << 45) /* 32 TiB */
+#define SPAPR_PCI_LIMIT (1ULL << 46) /* 64 TiB */
+
+#define SPAPR_PCI_2_7_MMIO_WIN_SIZE 0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE 0x10000
 
 #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL