From patchwork Tue May 2 05:45:01 2017
X-Patchwork-Submitter: Chao Gao <chao.gao@intel.com>
X-Patchwork-Id: 9707435
Date: Tue, 2 May 2017 13:45:01 +0800
From: Chao Gao <chao.gao@intel.com>
To: George Dunlap, Jan Beulich, Kevin Tian
Message-ID: <20170502054459.GA13105@skl-2s3.sh.intel.com>
References: <1493167967-74144-1-git-send-email-chao.gao@intel.com>
 <15f405cc-04aa-ac3d-8ae2-17f684b21d36@citrix.com>
Content-Disposition: inline
In-Reply-To: <15f405cc-04aa-ac3d-8ae2-17f684b21d36@citrix.com>
User-Agent: Mutt/1.8.0 (2017-02-23)
Cc: Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson,
 xen-devel@lists.xen.org, Jun Nakajima
Subject: Re: [Xen-devel] [PATCH 0/4] mitigate the per-pCPU blocking list
 may be too long

On Wed, Apr 26, 2017 at 05:39:57PM +0100, George Dunlap wrote:
>On 26/04/17 01:52, Chao Gao wrote:
>> I compared the maximum of #entry in one list and #event (adding an entry
>> to the PI blocking list) with and without the three latter patches. Here
>> is the result:
>>
>> -------------------------------------------------
>> |     Items       | Maximum of #entry | #event  |
>> -------------------------------------------------
>> | W/ the patches  |         6         |  22740  |
>> -------------------------------------------------
>> | W/O the patches |        128        |  46481  |
>> -------------------------------------------------
>
>Any chance you could trace how long the list traversal took?  It would
>be good for future reference to have an idea what kinds of timescales
>we're talking about.

Hi.

I made a simple test to measure the time consumed by the list traversal.
Apply the patch below, then create one HVM guest with 128 vCPUs and a
passthrough 40G NIC, with all guest vCPUs pinned to one pCPU. Collect
data with 'xentrace -D -e 0x82000 -T 300 trace.bin' and decode it with
xentrace_format.

When the list length is about 128, the traversal time is in the range of
1750 cycles to 39330 cycles. The physical CPU's frequency is 1795.788MHz,
so the time consumed is in the range of 1us to 22us.
If 0.5ms is the upper bound the system can tolerate, at most about 2900
vCPUs can be added into the list. I hope there is no error in the test
and analysis.

Thanks
Chao

---8<---
From 504fd32bc042670812daad41efcd982434b98cd4 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Wed, 26 Apr 2017 03:39:06 +0800
Subject: [PATCH] xentrace: trace PI-related events.

This patch adds TRC_HVM_VT_D_PI_BLOCK, TRC_HVM_PI_WAKEUP_START and
TRC_HVM_PI_WAKEUP_END to track PI-related events. Specifically,
TRC_HVM_VT_D_PI_BLOCK tracks adding one entry to the per-pCPU blocking
list. TRC_HVM_PI_WAKEUP_{START, END} mark the start and end of the PI
blocking list traversal. Also introduce a 'counter' to track the number
of entries in the list.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/xentrace/formats          |  3 +++
 xen/arch/x86/hvm/vmx/vmx.c      | 13 ++++++++++++-
 xen/include/asm-x86/hvm/trace.h |  3 +++
 xen/include/public/trace.h      |  3 +++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/tools/xentrace/formats b/tools/xentrace/formats
index 8b31780..34ed9e4 100644
--- a/tools/xentrace/formats
+++ b/tools/xentrace/formats
@@ -125,6 +125,9 @@
 0x00082020  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  INTR_WINDOW [ value = 0x%(1)08x ]
 0x00082021  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  NPF         [ gpa = 0x%(2)08x%(1)08x mfn = 0x%(4)08x%(3)08x qual = 0x%(5)04x p2mt = 0x%(6)04x ]
 0x00082023  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  TRAP        [ vector = 0x%(1)02x ]
+0x00082026  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  PI_BLOCK_LIST   [ domid = 0x%(1)04x vcpu = 0x%(2)04x, pcpu = 0x%(3)04x, #entry = 0x%(4)04x ]
+0x00082027  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  PI_WAKEUP_START [ list_len = 0x%(1)04x ]
+0x00082028  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  PI_WAKEUP_END

 0x0010f001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_map      [ domid = %(1)d ]
 0x0010f002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_unmap    [ domid = %(1)d ]
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8ef18a..3a6640b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,7 @@ static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 struct vmx_pi_blocking_vcpu {
     struct list_head     list;
     spinlock_t           lock;
+    atomic_t             counter;
 };

 /*
@@ -119,6 +120,9 @@ static void vmx_vcpu_block(struct vcpu *v)
      */
     ASSERT(old_lock == NULL);

+    atomic_inc(&per_cpu(vmx_pi_blocking, v->processor).counter);
+    HVMTRACE_4D(VT_D_PI_BLOCK, v->domain->domain_id, v->vcpu_id, v->processor,
+                atomic_read(&per_cpu(vmx_pi_blocking, v->processor).counter));
     list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
                   &per_cpu(vmx_pi_blocking, v->processor).list);
     spin_unlock_irqrestore(pi_blocking_list_lock, flags);
@@ -186,6 +190,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
     {
         ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
         list_del(&v->arch.hvm_vmx.pi_blocking.list);
+        atomic_dec(&container_of(pi_blocking_list_lock,
+                                 struct vmx_pi_blocking_vcpu, lock)->counter);
         v->arch.hvm_vmx.pi_blocking.lock = NULL;
     }
@@ -234,6 +240,7 @@ void vmx_pi_desc_fixup(unsigned int cpu)
         if ( pi_test_on(&vmx->pi_desc) )
         {
             list_del(&vmx->pi_blocking.list);
+            atomic_dec(&per_cpu(vmx_pi_blocking, cpu).counter);
             vmx->pi_blocking.lock = NULL;
             vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
         }
@@ -2360,13 +2367,15 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
     struct arch_vmx_struct *vmx, *tmp;
     spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
     struct list_head *blocked_vcpus =
-		&per_cpu(vmx_pi_blocking, smp_processor_id()).list;
+        &per_cpu(vmx_pi_blocking, smp_processor_id()).list;

     ack_APIC_irq();
     this_cpu(irq_count)++;

     spin_lock(lock);
+    TRACE_1D(TRC_HVM_PI_WAKEUP_START,
+             atomic_read(&per_cpu(vmx_pi_blocking, smp_processor_id()).counter));

     /*
      * XXX: The length of the list depends on how many vCPU is current
      * blocked on this specific pCPU. This may hurt the interrupt latency
@@ -2377,11 +2386,13 @@ static void pi_wakeup_interrupt(struct cpu_user_regs *regs)
         if ( pi_test_on(&vmx->pi_desc) )
         {
             list_del(&vmx->pi_blocking.list);
+            atomic_dec(&per_cpu(vmx_pi_blocking, smp_processor_id()).counter);
             ASSERT(vmx->pi_blocking.lock == lock);
             vmx->pi_blocking.lock = NULL;
             vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
         }
     }
+    TRACE_0D(TRC_HVM_PI_WAKEUP_END);

     spin_unlock(lock);
 }
diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-x86/hvm/trace.h
index de802a6..afe8b75 100644
--- a/xen/include/asm-x86/hvm/trace.h
+++ b/xen/include/asm-x86/hvm/trace.h
@@ -54,6 +54,9 @@
 #define DO_TRC_HVM_TRAP             DEFAULT_HVM_MISC
 #define DO_TRC_HVM_TRAP_DEBUG       DEFAULT_HVM_MISC
 #define DO_TRC_HVM_VLAPIC           DEFAULT_HVM_MISC
+#define DO_TRC_HVM_VT_D_PI_BLOCK    DEFAULT_HVM_MISC
+#define DO_TRC_HVM_PI_WAKEUP_START  DEFAULT_HVM_MISC
+#define DO_TRC_HVM_PI_WAKEUP_END    DEFAULT_HVM_MISC

 #define TRC_PAR_LONG(par) ((par)&0xFFFFFFFF),((par)>>32)
diff --git a/xen/include/public/trace.h b/xen/include/public/trace.h
index 7f2e891..c5b95ee 100644
--- a/xen/include/public/trace.h
+++ b/xen/include/public/trace.h
@@ -234,6 +234,9 @@
 #define TRC_HVM_TRAP             (TRC_HVM_HANDLER + 0x23)
 #define TRC_HVM_TRAP_DEBUG       (TRC_HVM_HANDLER + 0x24)
 #define TRC_HVM_VLAPIC           (TRC_HVM_HANDLER + 0x25)
+#define TRC_HVM_VT_D_PI_BLOCK    (TRC_HVM_HANDLER + 0x26)
+#define TRC_HVM_PI_WAKEUP_START  (TRC_HVM_HANDLER + 0x27)
+#define TRC_HVM_PI_WAKEUP_END    (TRC_HVM_HANDLER + 0x28)

 #define TRC_HVM_IOPORT_WRITE     (TRC_HVM_HANDLER + 0x216)
 #define TRC_HVM_IOMEM_WRITE      (TRC_HVM_HANDLER + 0x217)