From patchwork Wed Mar 2 18:49:20 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joao Martins X-Patchwork-Id: 8484741 Return-Path: X-Original-To: patchwork-xen-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id EE532C0553 for ; Wed, 2 Mar 2016 18:52:11 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id C04BD200FE for ; Wed, 2 Mar 2016 18:52:10 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7FBE920382 for ; Wed, 2 Mar 2016 18:52:09 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xen.org with esmtp (Exim 4.84) (envelope-from ) id 1abBq7-0004tU-EF; Wed, 02 Mar 2016 18:49:39 +0000 Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.84) (envelope-from ) id 1abBq5-0004tM-O5 for xen-devel@lists.xen.org; Wed, 02 Mar 2016 18:49:37 +0000 Received: from [193.109.254.147] by server-8.bemta-14.messagelabs.com id BA/D2-03645-1C537D65; Wed, 02 Mar 2016 18:49:37 +0000 X-Env-Sender: joao.m.martins@oracle.com X-Msg-Ref: server-8.tower-27.messagelabs.com!1456944574!24497352!1 X-Originating-IP: [156.151.31.81] X-SpamReason: No, hits=0.0 required=7.0 tests=sa_preprocessor: VHJ1c3RlZCBJUDogMTU2LjE1MS4zMS44MSA9PiAyODgzMzk=\n X-StarScan-Received: X-StarScan-Version: 8.11; banners=-,-,- X-VirusChecked: Checked Received: (qmail 6438 invoked from network); 2 Mar 2016 18:49:36 -0000 Received: from userp1040.oracle.com (HELO userp1040.oracle.com) (156.151.31.81) by server-8.tower-27.messagelabs.com with DHE-RSA-AES256-GCM-SHA384 encrypted SMTP; 2 Mar 2016 18:49:36 -0000 Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u22InScM000308 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 2 Mar 2016 18:49:28 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u22InRdp026263 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Wed, 2 Mar 2016 18:49:28 GMT Received: from abhmp0007.oracle.com (abhmp0007.oracle.com [141.146.116.13]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u22InQ6Q015647; Wed, 2 Mar 2016 18:49:27 GMT Received: from [192.168.0.104] (/89.181.14.119) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 02 Mar 2016 10:49:26 -0800 From: Joao Martins To: Jan Beulich References: <1456174934-22973-1-git-send-email-joao.m.martins@oracle.com> <1456174934-22973-2-git-send-email-joao.m.martins@oracle.com> <56CF41F302000078000D6595@prv-mh.provo.novell.com> Message-ID: <56D735B0.8040402@oracle.com> Date: Wed, 2 Mar 2016 18:49:20 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.4.0 MIME-Version: 1.0 In-Reply-To: <56CF41F302000078000D6595@prv-mh.provo.novell.com> X-Source-IP: userv0022.oracle.com [156.151.31.74] Cc: Wei Liu , Ian Campbell , Stefano Stabellini , Andrew Cooper , Ian Jackson , xen-devel@lists.xen.org, Keir Fraser Subject: Re: [Xen-devel] [PATCH RFC 1/8] x86/hvm: set initial apicid to vcpu_id X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 02/25/2016 05:03 PM, Jan Beulich wrote: >>>> On 22.02.16 at 22:02, wrote: >> Currently the initial_apicid is set vcpu_id * 2 which makes it difficult >> for the toolstack to manage how is the topology seen by the guest. >> Instead of forcing procpkg and proccount to be VCPUID * 2, instead we >> set it to max vcpuid on proccount to max_vcpu_id + 1 (logical number of >> logical cores) and procpkg to max_vcpu_id (max cores minus 1) > > I'm afraid it takes more than this to explain why the change is > needed or at least desirable. Apologies for my clumsiness in the commit message. I should have explained properly why we need this for this series in the first place. Currently use initial_apicid as vcpu_id * 2, and doubled the leafs 1 and 4 values (proccount and procpkg) which means we will address 8 LAPICIDs (tohugh only 4 will be used). Example topology and algorithm below to facilitate discussion: # Maximum logical addressable IDs (logical processors in a package) proccount = CPUID.1:EBX[23:16] # Maximum core addressable IDs - 1 (maximum cores in a package - 1) procpkg = CPUID.(4,0):EAX[31:26] # MSB (Calculate most significant bit) SMT_Mask_width = MSB(proccount / (procpkg + 1)) Core_Mask_width = MSB(procpkg + 1) CoreSMT_Mask_width = SMT_Mask_width + Core_Mask_width Pkg_Mask_width = 1 << CoreSMT_Mask_width SMT_ID = APICID & ((1 << SMT_Mask_width) - 1) Core_ID = (APICID >> SMT_Mask_width) & ((1 << Core_Mask_width) - 1) Pkg_ID = (APICID & Pkg_Mask_width) >> CoreSMT_Mask_width So as it is right now, the topology on a 4 vcpu HVM guest looks like: => topology(proccount = 16, procpkg = 7) # current APICID=0 SMT_ID=0 CORE_ID=0 PKG_ID=0 # VCPU 0 APICID=1 SMT_ID=1 CORE_ID=0 PKG_ID=0 APICID=2 SMT_ID=0 CORE_ID=1 PKG_ID=0 # VCPU 1 APICID=3 SMT_ID=1 CORE_ID=1 PKG_ID=0 APICID=4 SMT_ID=0 CORE_ID=2 PKG_ID=0 # VCPU 2 APICID=5 SMT_ID=1 CORE_ID=2 PKG_ID=0 APICID=6 SMT_ID=0 CORE_ID=3 PKG_ID=0 # VCPU 3 APICID=7 SMT_ID=1 CORE_ID=3 PKG_ID=0 [...] APICID=14 SMT_ID=0 CORE_ID=7 PKG_ID=0 APICID=15 SMT_ID=1 CORE_ID=7 PKG_ID=0 As you know, APICID describes the SMT, Core and PKG. Problem with having APICID in even numbers (0, 2, 4, ... N) is that we can't describe the SMT/siblings in the topology. Thus turning the APICID ID space into (0, 1, 2 .. N) like this patch proposes means we can know calculate all possibilities on both topology kinds. Note that is a prerequisite patch so that a later patch in this series sets the proccount and procpkg to enable seeing some cores as SMT siblings. => topology(proccount = 4, procpkg = 3) # with this patch APICID=0 SMT_ID=0 CORE_ID=0 PKG_ID=0 APICID=1 SMT_ID=0 CORE_ID=1 PKG_ID=0 APICID=2 SMT_ID=0 CORE_ID=2 PKG_ID=0 APICID=3 SMT_ID=0 CORE_ID=3 PKG_ID=0 x2APIC isn't addressed here for this RFC but it has the same issue (and consequently exposure of FEATURE_XTOPOLOGY, CPUID.(EAX=0xB, ECX=N)). One difference is that the SMT,Core,Pkg mask widths are fetched from each subleaf directly as opposed to a calculation between procpkg and proccount. > In particular I'd like to suggest that > you do some archeology to understand why things are the way > they are. Digging in the history and threads, this behavior seems to be introduced by commit c21d85b ("[HVM] Change VCPU->LAPIC_ID mapping so that VCPU0 has ID0") where the main issue looked like a conflict between VCPU 0 LAPICID and IOAPIC ID. Previous commits (a41ba62, facdf41) made IOAPIC on 0 and vLAPIC on 1..N but it broke on old kernels (for the lack of LAPIC 0), so it ended up having a vLAPIC ID space with 0, 2, 4, 6 and assign vIOAPICID = 1. This way all of {L,IO}APICs have unique IDs - this thread (http://lists.xen.org/archives/html/xen-devel/2008-09/msg00986.html) seems to mention something along these lines too. But the manuals aren't exactly clear on this ID uniqueness between LAPICs and I/O APICs on more recent processors. Any lights on this would be great. Intel 82093AA (IOAPIC) datasheet [0] says the following: "This register contains the 4-bit APIC ID. The ID serves as a physical name of the IOAPIC. All APIC devices using the APIC bus should have a unique APIC ID." Though looking at the Intel SDM Volume 3, Chapter 10.1, Figure 10-2 and 10-3, the APIC bus seems to be used only up to P6 family processors (Figure 10-2) and it's indeed shared between I/OAPIC and LAPIC . For its successor (Pentium 4 and later) it's no longer the case (Figure 10-3). My Broadwell machine in fact have conflicting APIC IDs between IOAPIC and LAPIC in my MADT table. And it does seem that it's the case too for SeaBIOS (commit e39b938 ("report real I/O APIC ID (0) on MADT and MP-table (v3)") ) and QEMU. Though it wouldn't justify as reason for doing this on Xen. [0] http://www.intel.com/design/chipsets/datashts/29056601.pdf >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -4633,7 +4633,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, >> unsigned int *ebx, >> case 0x1: >> /* Fix up VLAPIC details. */ >> *ebx &= 0x00FFFFFFu; >> - *ebx |= (v->vcpu_id * 2) << 24; >> + *ebx |= (v->vcpu_id) << 24; >> if ( vlapic_hw_disabled(vcpu_vlapic(v)) ) >> __clear_bit(X86_FEATURE_APIC & 31, edx); > > In no case is this sufficient as adjustment to the hypervisor side, > as it gets things out of sync with e.g. hvm/vlapic.c. Yes, and also the firmware like this chunk below (tested too): diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h index b838cf9..9c1f2f7 100644 --- a/tools/firmware/hvmloader/config.h +++ b/tools/firmware/hvmloader/config.h @@ -47,7 +47,7 @@ extern struct bios_config ovmf_config; #define IOAPIC_VERSION 0x11 #define LAPIC_BASE_ADDRESS 0xfee00000 -#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2) +#define LAPIC_ID(vcpu_id) (vcpu_id) #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */ #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */ diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c index 01a8430..80fc6a1 100644 --- a/xen/arch/x86/hvm/vlapic.c +++ b/xen/arch/x86/hvm/vlapic.c @@ -1206,7 +1206,7 @@ void vlapic_reset(struct vlapic *vlapic) if ( !has_vlapic(v->domain) ) return; - vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24); + vlapic_set_reg(vlapic, APIC_ID, v->vcpu_id << 24); vlapic_set_reg(vlapic, APIC_LVR, VLAPIC_VERSION); for ( i = 0; i < 8; i++ )